NVIDIA DGX Station for Windows — GB300 Grace Blackwell Ultra, 20 Petaflops FP4, 748GB Memory, Trillion-Parameter AI Agents on the Desktop

NVIDIA has announced DGX Station for Windows, a deskside AI supercomputer designed to run frontier AI models of up to 1 trillion parameters locally, directly within the Windows ecosystem. Announced at NVIDIA GTC Taipei and developed in collaboration with Microsoft, DGX Station for Windows is built on the NVIDIA GB300 Grace Blackwell Ultra Desktop Superchip and is expected to be available from ASUS, Dell Technologies, GIGABYTE, HP, MSI, and Supermicro in Q4 2026.

The system targets enterprise developers, researchers, engineers, designers, and data scientists who need frontier-class AI compute — historically only available in data centers running Linux — connected directly to the Windows applications and workflows they already use. DGX Station for Windows can run hundreds of agents simultaneously, supports pretraining and fine-tuning of large models, and scales workloads seamlessly to GB300 systems in the data center or cloud.

Key capabilities include up to 20 petaflops of FP4 AI performance, up to 748GB of coherent memory, 800Gb/s networking via ConnectX-8 SuperNIC, support for Windows security primitives and NVIDIA OpenShell, and optional pairing with an NVIDIA RTX PRO 6000 Blackwell Workstation GPU for physical AI workflows combining frontier compute with ray-traced visualization.

GB300 Grace Blackwell Ultra — The Superchip Inside

DGX Station for Windows is powered by the NVIDIA GB300 Grace Blackwell Ultra Desktop Superchip, which connects an NVIDIA Blackwell Ultra GPU to a 72-core NVIDIA Grace CPU via NVLink-C2C chip-to-chip interconnect. The unified memory pool reaches up to 748GB of coherent memory, accessible by both CPU and GPU without data transfer overhead, enabling the system to load and run trillion-parameter AI models locally.

AI compute tops out at up to 20 petaflops of FP4 performance, which NVIDIA positions as sufficient for pretraining, fine-tuning, large-scale inference, and multi-agent deployment on a single deskside unit.

The system also integrates the NVIDIA ConnectX-8 SuperNIC, supporting network speeds of up to 800Gb/s. This enables fast data ingestion from enterprise storage and allows multiple DGX Station units to be connected for even larger distributed workloads.

AI Workflows DGX Station for Windows Supports

The system is designed to handle the full range of enterprise AI workloads, all within the Windows environment:

AI Agents — Build and run multiple frontier agents in parallel, connected directly to enterprise Windows applications and workflows. Hundreds of agents can execute simultaneously on a single DGX Station.

AI Development — Pretrain, fine-tune, and iterate on large AI models within Windows, with access to Linux AI toolchains via Windows Subsystem for Linux (WSL).

Data Science — Ingest large datasets directly into up to 748GB of coherent memory, removing data movement bottlenecks across data preparation, machine learning, and analytics pipelines.

AI Inference — Run high-throughput inference on AI models, including models up to 1 trillion parameters.

Physical AI — Pair the GB300 Superchip with an additional NVIDIA RTX PRO 6000 Blackwell Workstation GPU to combine frontier AI compute with ray-traced visualization and simulation in a single deskside unit, for agents that operate across virtual-to-physical environments.

DGX Station for Windows can function as a dedicated AI supercomputer for a single developer or as a shared local compute node for entire teams, with workloads scaling to GB300 data center systems or the cloud.

NVIDIA OpenShell — Secure Agent Runtime on Windows

Autonomous agents need a runtime that governs how they act, use tools, and interact with other system components. DGX Station for Windows supports NVIDIA OpenShell, an open-source, secure-by-design agent runtime built on the new Windows security and containment primitives from Microsoft.

OpenShell creates an individual, isolated sandbox for each agent and separates application-layer operations from infrastructure-layer policy enforcement. Security and privacy policies are applied at the system level — outside the agent’s reach — rather than relying on behavioral system prompts that agents could potentially bypass. The goal is to enforce constraints on the environment the agent runs in, preventing credential leaks or private data exposure.

For enterprise IT teams, this means agents deploy and operate within the same managed Windows environment, governed through familiar Microsoft security, compliance, and fleet management tools. Linux workloads receive the same manageability support through Windows Subsystem for Linux.

Enterprise IT and Fleet Management

One of the design priorities for DGX Station for Windows is integration with existing enterprise IT infrastructure. Organizations running Windows environments can manage DGX Station deployments using the same tools they already use for fleet management, deployment, and system updates — without building separate Linux-based infrastructure for AI workloads.

The system is positioned as both a dedicated workstation for individual developers and a shared local compute node for teams, making it applicable to engineering groups, research labs, design studios, and data science teams within the same organization.

Availability

OEM PartnerAvailability
ASUSQ4 2026
Dell TechnologiesQ4 2026
GIGABYTEQ4 2026
HPQ4 2026
MSIQ4 2026
SupermicroQ4 2026

DGX Station for Windows extends the NVIDIA and Microsoft collaboration that also covers NVIDIA RTX Spark, the superchip for slim Windows laptops and compact desktops targeting personal AI agents, creative workloads, and gaming.

FAQ / Common Questions

What is NVIDIA DGX Station for Windows?
It is a deskside AI supercomputer designed for enterprise developers, researchers, and data scientists. Built on the NVIDIA GB300 Grace Blackwell Ultra Desktop Superchip, it brings data-center-class AI compute into the Windows environment, capable of running AI models up to 1 trillion parameters locally.

What are the key specs of DGX Station for Windows?
The system delivers up to 20 petaflops of FP4 AI performance, up to 748GB of coherent unified memory, a 72-core NVIDIA Grace CPU, a Blackwell Ultra GPU, and ConnectX-8 SuperNIC networking at up to 800Gb/s.

What is NVIDIA OpenShell and why does it matter for enterprises?
OpenShell is an open-source secure runtime for autonomous agents. It uses new Windows security and containment primitives to create isolated sandboxes for each agent and enforces security policies at the system level rather than relying on behavioral prompts. This allows enterprises to deploy agents within their existing Windows compliance and fleet management frameworks.

When will DGX Station for Windows be available?
It is expected from ASUS, Dell Technologies, GIGABYTE, HP, MSI, and Supermicro in Q4 2026.

Can DGX Station for Windows run existing Linux AI toolchains?
Yes. Access to Linux AI toolchains is available through Windows Subsystem for Linux, allowing developers to use Python-based frameworks, model training libraries, and other Linux-native tools within the Windows environment.

How does DGX Station for Windows relate to NVIDIA RTX Spark?
The two products form the ends of NVIDIA and Microsoft’s joint agent platform for Windows. RTX Spark targets slim laptops and compact desktops for personal agents and creative work. DGX Station for Windows targets enterprise deskside deployments requiring frontier-class AI compute and multi-agent infrastructure.


Note: Details above are based on NVIDIA’s announcement at GTC Taipei 2026 and are subject to change. Final feature availability, rollout timing, and supported configurations may vary. Verify against NVIDIA’s and the respective manufacturers’ official channels before relying on any specific detail.

Disclaimer: This post summarizes an NVIDIA product announcement for informational purposes. It is not affiliated with or endorsed by NVIDIA, Microsoft, or any manufacturer mentioned.


NVIDIA RTX Spark Superchip Unveiled — 1 Petaflop AI, 128GB Unified Memory, Windows-Native Agents, Blackwell GPU + Grace CPU in One Chip

NVIDIA has unveiled RTX Spark, a new superchip designed to bring personal AI agents, creative workloads, and gaming to slim Windows laptops and compact desktop PCs. The announcement was made at NVIDIA GTC Taipei, alongside a collaboration with Microsoft to deliver a native Windows platform for on-device agents. RTX Spark-powered devices from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI are expected to arrive this fall, with models from Acer and GIGABYTE following.

RTX Spark combines an NVIDIA Blackwell RTX GPU with 6,144 CUDA cores and fifth-generation Tensor Cores with a 20-core NVIDIA Grace CPU, connected via NVLink-C2C chip-to-chip interconnect. The superchip delivers 1 petaflop of AI compute and supports up to 128GB of unified memory. MediaTek collaborated with NVIDIA on the custom CPU design, contributing to power efficiency and connectivity.

The chip targets three simultaneous use cases: running 120B-parameter LLMs locally with 1 million token context, handling creative workflows including 12K 4:2:2 video editing and 90GB+ 3D scene rendering, and playing AAA games at 1440p at over 100 frames per second with ray tracing, DLSS, and Reflex.

Related blog to check out: NVIDIA’s Vera CPU for AI Agents — 1.8x Faster Than x86, 88 Olympus Cores, Adopted by Anthropic, OpenAI, Oracle Cloud, Dell, HPE, and More.

Blackwell GPU + Grace CPU — The RTX Spark Architecture

RTX Spark is built around two interconnected components on a single package. The GPU side carries the NVIDIA Blackwell RTX architecture with 6,144 CUDA cores, fifth-generation Tensor Cores with FP4 precision, and a new Blackwell video decoder capable of handling 12K 4:2:2 content. The CPU side is a 20-core NVIDIA Grace processor, co-designed with MediaTek for efficiency and connectivity in thin-and-light form factors.

The two dies communicate via NVLink-C2C, NVIDIA’s chip-to-chip interconnect, which enables a single unified memory pool of up to 128GB accessible by both the CPU and GPU simultaneously. This unified memory architecture is what allows RTX Spark to run frontier-class language models locally — models that would otherwise require GPU memory and system RAM to be managed separately.

The full NVIDIA AI and graphics stack ships with RTX Spark: CUDA, TensorRT, OptiX, DLSS, Reflex, and G-SYNC are all supported.

Windows-Native Agents — NVIDIA OpenShell and Microsoft Security Primitives

NVIDIA and Microsoft are partnering to bring a secure, on-device agent platform to Windows. The collaboration centers on two components.

New Windows security primitives provide identity, containment, policy, and end-to-end security for agents running natively on the device. These primitives are being built into Windows and are designed to let agents execute tasks across applications, run code, and handle files while remaining under user control.

NVIDIA OpenShell is a runtime layer that adds additional policy controls on top of the Windows primitives. It lets users define what agents can and cannot access, intelligently routes queries to local models based on privacy policies, and can strip or mask personal information before any query is sent to a cloud model.

Agent developers OpenClaw and Hermes Agent (from Nous Research) are among the first to adopt OpenShell and the Microsoft security primitives in their Windows apps. From the Windows taskbar, users will be able to invoke agents that can execute tasks inside applications, run cross-app workflows, generate images and video, write code, and search local files semantically.

Microsoft CEO Satya Nadella described the goal as delivering “unmetered intelligence to every home and every desk with Windows.”

Creative Capabilities — Adobe Rearchitects Premiere and Photoshop for RTX Spark

Adobe is rebuilding Photoshop and Premiere specifically for RTX Spark, targeting up to 2x faster AI, editing, coloring, and effects performance compared with existing workflows.

Adobe Premiere is getting a new video pipeline that uses RTX Spark’s unified memory, Blackwell GPU, and TensorRT software stack. The reworked pipeline targets real-time performance for editing and color correction, GPU-accelerated AI effects, and more efficient rendering of complex timelines. Adobe Substance 3D Painter and Stager will also run natively on RTX Spark.

Adobe Photoshop’s next-generation engine is being optimized for GPU-accelerated compositing, live filters, high dynamic range workflows, and natural brushing. The engine is built to use TensorRT. Both Premiere and Photoshop will also integrate with Windows agents, allowing creators to offload tasks to an on-device AI assistant from within the apps.

Firefly-powered Generative Fill in Photoshop and Generative Extend in Premiere are among the tools that will see direct performance gains from RTX Spark. Updates are expected to roll out alongside RTX Spark device availability in fall 2026.

Other software partners include Blackmagic Design, Blender (with DLSS 4.5 Ray Reconstruction coming to version 5.3), ComfyUI (which gains 4K AI video generation via RTX Video with 4x Frame Generation), OTOY Octane, CapCut, and llama.cpp for optimized local model inference.

Gaming on RTX Spark — DLSS 4.5, Ray Tracing, G-SYNC

For gaming, RTX Spark supports AAA titles at 1440p and over 100 frames per second with ray tracing, DLSS, and Reflex. RTX technology is active in over 1,000 games and applications, and over 100 Windows software providers are embracing the platform.

New RTX capabilities coming with RTX Spark include DLSS 4.5 Ray Reconstruction, which uses a second-generation transformer model and is coming to Blender 5.3 and dozens of games. RTX Video with 4x Frame Generation is coming to ComfyUI.

Game developers embracing the platform include KRAFTON, NetEase (NARAKA: BLADEPOINT), Remedy Entertainment, Riot Games, and XBOX. NetEase noted that RTX Spark enables its titles to run as intended on ultrathin, high-performance laptops.

Device Form Factors — Slim Laptops and Compact Desktops

RTX Spark laptops are engineered to be as slim as 14mm and as light as three pounds, available in 14- to 16-inch sizes. The chassis uses precision-machined aluminum. Displays are color-accurate tandem OLED panels with NVIDIA G-SYNC, targeting both creative color work and gaming visuals. All-day battery life is a stated design goal for the laptop line.

Compact RTX Spark desktop PCs are also in development, positioned for agentic AI workloads, creative production, gaming, and everyday productivity in a small-footprint chassis.

Named devices and OEM commitments:

  • Dell XPS 16 Creator Edition — RTX Spark with large unified memory, designed for creators
  • HP OmniBook — described as one of the thinnest RTX Spark laptops
  • Microsoft Surface Laptop Ultra — targeting creators, developers, and engineers
  • Additional designs from ASUS, Lenovo, MSI, with Acer and GIGABYTE following

NVIDIA DGX Station for Windows will extend the Blackwell architecture to enterprise developers who need a deskside AI supercomputer for running agents at scale.

Rollout Timing — What’s Live When

PhaseWhenScope
AnnouncementGTC Taipei 2026 (now)RTX Spark superchip unveiled
Windows agent developer detailsMicrosoft Build, June 2–3, 2026Security primitives, OpenShell for developers
RTX Spark devices availableFall 2026Laptops and compact desktops from ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI
Acer and GIGABYTE modelsAfter fall 2026Additional OEM devices to follow
Adobe app updatesAlongside fall 2026 RTX Spark availabilityPremiere, Photoshop, Substance 3D updates

FAQ / Common Questions

What is NVIDIA RTX Spark?
RTX Spark is a superchip that combines an NVIDIA Blackwell RTX GPU and a 20-core NVIDIA Grace CPU on a single package, connected via NVLink-C2C. It is designed for Windows laptops and compact desktops, targeting AI agent execution, creative workloads, and gaming in thin, portable form factors.

How much AI compute and memory does RTX Spark offer?
The superchip delivers 1 petaflop of AI compute and supports up to 128GB of unified memory shared between the GPU and CPU. This unified memory pool allows it to run 120-billion-parameter language models locally with 1 million token context.

Which laptops and PCs will use RTX Spark?
RTX Spark-powered devices are confirmed from ASUS, Dell (XPS 16 Creator Edition), HP (OmniBook), Lenovo, Microsoft Surface (Surface Laptop Ultra), and MSI for fall 2026. Acer and GIGABYTE will follow with additional models.

What is NVIDIA OpenShell?
OpenShell is a runtime for on-device agents that works alongside new Windows security primitives from Microsoft. It lets users set policies for what agents can access, routes queries to local or cloud models based on privacy preferences, and masks personal information before sending queries externally.

Will Adobe apps like Photoshop and Premiere work differently on RTX Spark?
Adobe is rebuilding both apps specifically for RTX Spark. The new engines use TensorRT, the Blackwell GPU, and unified memory to target up to 2x faster AI and graphics performance. Updates are expected to roll out when RTX Spark devices ship in fall 2026.

When will RTX Spark devices be available?
Laptops and compact desktops powered by RTX Spark are expected to be available from system builders and cloud partners starting fall 2026.


Note: Details above are based on NVIDIA’s announcement at GTC Taipei 2026 and are subject to change. Final feature availability, rollout timing, and supported devices may vary by region. Verify against NVIDIA’s and the respective manufacturers’ official channels before relying on any specific detail.

Disclaimer: This post summarizes an NVIDIA product announcement for informational purposes. It is not affiliated with or endorsed by NVIDIA, Microsoft, Adobe, or any device manufacturer mentioned.

NVIDIA’s Vera CPU for AI Agents — 1.8x Faster Than x86, 88 Olympus Cores, Adopted by Anthropic, OpenAI, Oracle Cloud, Dell, HPE, and More

NVIDIA has unveiled Vera, its first CPU built specifically for AI agents. Now in full production, Vera is a new class of processor designed to handle the CPU-side workloads that modern AI factories generate — agentic task execution, reinforcement learning, code compilation, Python and Java runtimes, and data processing pipelines. The announcement was made at NVIDIA GTC Taipei.

Key capabilities include 1.8x faster task completion compared with x86 CPUs, a custom Olympus CPU core engineered for AI factory workloads, 88 cores with Spatial Multithreading, and up to 1.2TB/s of LPDDR5X memory bandwidth. Vera also serves as the host CPU for NVIDIA’s Vera Rubin GPU platforms via second-generation NVLink-C2C, delivering up to 1.8TB/s of coherent CPU-to-GPU bandwidth.

NVIDIA positions Vera as the successor to its Grace CPU line, which has shipped nearly 2.5 million units to date. The shift in AI factory economics — from cores per dollar to tokens per dollar — is driving the need for CPUs that can complete orchestration, tool use, and sandbox execution faster and at greater concurrency.

Olympus Core — NVIDIA’s Custom CPU Architecture

At the heart of Vera is Olympus, a custom CPU core NVIDIA engineered specifically for the workloads that sit on the critical path of AI agent execution. These include Python runtimes, sandboxed code execution, orchestration logic, and analytics pipelines — the steps that happen between GPU kernel calls and determine how quickly agents can complete tasks.

Vera features 88 Olympus cores paired with Spatial Multithreading, a technique for processing more instructions across large numbers of concurrent environments, queries, and data processing tasks simultaneously. The LPDDR5X memory subsystem delivers up to 1.2TB/s of bandwidth, reducing the time agents spend waiting on CPU-bound steps and keeping accelerators active.

According to benchmarks from Phoronix, Vera delivered the fastest overall performance across agentic workloads — including code compilation, Python, Java, and database processing — compared with competing processors tested.

Vera in the AI Factory — From Standalone Servers to GPU-Coupled Systems

Vera is designed to run across the entire AI factory stack, not just in one configuration. It powers three distinct system types:

  • Standalone Vera CPU servers — standard CPU-only configurations for data processing, orchestration, and agentic AI workloads, offered by Dell Technologies, HPE, Lenovo, and Supermicro as an alternative to x86
  • NVIDIA Vera Rubin systems — Vera serves as the host CPU tightly coupled to Rubin GPUs via second-generation NVLink-C2C, providing up to 1.8TB/s of coherent bandwidth between processor and GPU
  • NVIDIA Vera BlueField-4 STX — integrates Vera with high-performance networking, storage acceleration, and in-silicon security for AI-native storage platforms

Vera also extends NVIDIA Confidential Computing at rack scale, protecting agentic workloads end-to-end across the data center.

Deployment Plans — AI Labs, Hyperscalers, and NYSE

A broad set of customers are planning to adopt Vera for production workloads.

Anthropic, the company behind Claude, is evaluating Vera for CPU-intensive agentic workloads. James Bradbury, head of compute at Anthropic, noted that scaling compute is an important accelerant for model growth and called Vera a promising part of the ecosystem for agentic workloads.

Oracle Cloud Infrastructure is planning to deploy Vera CPUs to support high-throughput reasoning and data processing across next-generation AI environments. Mahesh Thiagarajan, EVP of OCI, described it as the next frontier in hyperscale AI supercomputing.

NYSE is collaborating with Redpanda and HPE to use Vera CPUs to scale capacity and further optimize latency across its market infrastructure, which processes more than 1.1 trillion messages per day.

Other customers exploring or planning to deploy Vera include OpenAI, SpaceXAI, ByteDance, CoreWeave, Lambda, Nebius, Nscale, and Cloudflare, among others.

System Builders and Cloud Providers

Vera CPUs are available in two form factors: dense, liquid-cooled racks for large-scale agentic AI and reinforcement learning environments, and flexible two-socket air-cooled systems for enterprise, cloud, data processing, and AI factory deployments.

Infrastructure providers building Vera-based systems include Aivres, ASRock Rack, ASUS, Compal, Dell Technologies, Foxconn, GIGABYTE, HPE, Hyve Solutions, Inventec, Lenovo, MiTAC Computing, MSI, Pegatron, Quanta Cloud Technology (QCT), Supermicro, Wistron, and Wiwynn.

Cloud service providers planning to offer Vera CPU capacity include Akamai, ByteDance, Cloudflare, CoreWeave, Crusoe, Lambda, Nebius, Nscale, Oracle Cloud Infrastructure, Redpanda, Starburst, Together AI, and Vultr.

Rollout Timing — What’s Live When

PhaseWhenScope
ProductionNow (announced at GTC Taipei 2026)Vera CPU in full production
System availabilityFall 2026Vera-based servers from system builders and cloud partners

FAQ / Common Questions

What is NVIDIA Vera and what does it do?
Vera is NVIDIA’s first CPU designed specifically for AI agent workloads. It handles the CPU-intensive tasks in AI factories — orchestration, code execution, Python and Java runtimes, data processing — and is built to complete these steps 1.8x faster than x86 processors, keeping GPU accelerators busy and improving agent throughput.

What makes Vera different from NVIDIA’s Grace CPU?
Vera is built on Olympus, a new custom CPU core NVIDIA engineered from the ground up for AI agent execution. Grace focused on general high-performance computing in data centers; Vera targets the token-per-dollar economics of AI factories, with Spatial Multithreading and LPDDR5X memory bandwidth optimized for concurrent agent environments.

Which companies are planning to use NVIDIA Vera?
AI labs including Anthropic, OpenAI, and SpaceXAI are evaluating or planning to adopt Vera. Hyperscalers ByteDance, CoreWeave, Lambda, Nebius, Nscale, and Oracle Cloud Infrastructure are also among the planned deployments. NYSE is using Vera in collaboration with HPE and Redpanda for its market infrastructure.

When will Vera-based servers be available?
Vera systems from system builders and cloud partners are expected to be available starting Fall 2026.

What is Vera BlueField-4 STX?
It is a processor that integrates the Vera CPU with high-performance networking, storage acceleration, and in-silicon security, creating a secure-by-design AI-native data platform for storage workloads in AI factories.


Note: Details above are based on NVIDIA’s announcement at GTC Taipei 2026, and are subject to change. Final feature availability, rollout timing, and supported configurations may vary. Verify against NVIDIA’s official channels before relying on any specific detail.

Disclaimer: This post summarizes an NVIDIA product announcement for informational purposes. It is not affiliated with or endorsed by NVIDIA or any manufacturer mentioned.


Anthropic’s Claude Opus 4.8 Announced — Better Honesty, Dynamic Workflows in Claude Code, Effort Control, 3x Cheaper Fast Mode

Anthropic has announced Claude Opus 4.8, an upgrade to the Opus model line that builds on Opus 4.7 with improvements across coding, agentic tasks, reasoning, and practical knowledge work. The model is available today at the same pricing as Opus 4.7: $5 per million input tokens and $25 per million output tokens for standard usage. Fast mode pricing drops to $10 per million input tokens and $50 per million output tokens — three times cheaper than fast mode was for previous Opus models.

Key additions include improved honesty and uncertainty flagging, Dynamic Workflows in Claude Code, effort control on claude.ai, and mid-task system prompt updates via the Messages API.

The release lands alongside several platform-level updates, including access to claude.ai Cowork and a new Messages API feature that lets developers update Claude’s instructions mid-task without breaking the prompt cache.

What Changed in Opus 4.8

One of the headline changes Anthropic highlights is honesty. The company notes that Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in generated code pass without flagging them. Early testers describe the model as more likely to surface uncertainties rather than assert progress it hasn’t made — a pattern sometimes called “sycophantic confidence” in AI evaluations.

Anthropic’s Alignment team assessed the model before release and concluded it “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” Rates of misaligned behavior — including deception and cooperation with misuse — are described as substantially lower than Opus 4.7 and comparable to Claude Mythos Preview, currently the company’s most safety-assessed model.

Benchmark results across coding, agentic performance, reasoning, and knowledge tasks show improvements over Opus 4.7. Detailed evaluation figures appear in the Claude Opus 4.8 System Card.

Dynamic Workflows in Claude Code

Dynamic Workflows is a new research preview feature in Claude Code that expands the scale of tasks Claude can handle within a single session. Claude plans the work, then runs hundreds of parallel subagents — each handling a piece of the larger problem — and verifies outputs before reporting back.

Anthropic’s stated example use case is codebase-scale migrations: moving hundreds of thousands of lines of code from kickoff to merge, using the existing test suite as the quality bar. The feature uses Opus 4.8 as its backbone, and the agents can run for longer sessions than prior versions allowed.

Dynamic Workflows is available in Claude Code for Enterprise, Team, and Max plan subscribers.

Effort Control on claude.ai

A new effort control appears alongside the model selector on claude.ai and Cowork. Users can select how much computational effort Claude applies to a given response:

  • Lower effort — faster responses, slower rate limit consumption
  • Default (high effort) — Anthropic’s recommended balance of quality and speed
  • Extra / Max — more tokens spent for better results on difficult tasks or long-running asynchronous work

In Claude Code, the effort levels map to high, xhigh, and max. Anthropic has raised rate limits in Claude Code to accommodate higher token usage at elevated effort levels. The effort control is available across all plans.

Opus 4.8 defaults to high effort. Anthropic notes that on coding tasks, this effort level spends a similar number of tokens as Opus 4.7’s default while delivering better performance.

Messages API — System Prompts Mid-Task

The Messages API now accepts system entries inside the messages array. Previously, developers could only set system instructions at the start of a conversation. The new capability lets a developer update Claude’s instructions — permissions, token budgets, environment context — at any point during an agent run, without breaking the prompt cache or routing the update through a user turn.

This is aimed at agentic use cases where conditions change during task execution.

What Comes Next — Mythos-Class Models

Anthropic notes that Claude Mythos Preview is currently in limited deployment for cybersecurity work through Project Glasswing. The company describes Mythos-class models as having higher intelligence than Opus, but requiring stronger cyber safeguards before general release. Anthropic says it is making progress on developing those safeguards and expects to bring Mythos-class models to all customers “in the coming weeks.”

Availability and Pricing

Claude Opus 4.8 is available today via the Claude API using the model ID claude-opus-4-8. It is also accessible on claude.ai across all plans.

Usage TypeInput (per 1M tokens)Output (per 1M tokens)
Standard$5$25
Fast Mode$10$50

Fast mode pricing is unchanged in dollar terms from the listed rates above, but Anthropic notes this represents a 3× cost reduction compared to fast mode pricing on previous Opus versions.

FAQ / Common Questions

When is Claude Opus 4.8 available?
It launched on May 28, 2026 and is available immediately via the Claude API (claude-opus-4-8) and on claude.ai.

What is the price of Claude Opus 4.8?
Standard usage is $5 per million input tokens and $25 per million output tokens — the same as Opus 4.7. Fast mode is $10 per million input tokens and $50 per million output tokens, which is three times cheaper than fast mode was on prior Opus models.

What are Dynamic Workflows in Claude Code?
Dynamic Workflows is a research preview feature that lets Claude plan large tasks and run hundreds of parallel subagents within a single Claude Code session. It targets large-scale work like codebase migrations. Available for Enterprise, Team, and Max plan users.

What is effort control on claude.ai?
A new setting that lets users choose how much computational effort Claude applies to a response. Lower effort is faster and uses less of your rate limit; extra and max effort spend more tokens for better results on hard tasks.

What is Claude Mythos Preview?
Anthropic describes it as a model with higher intelligence than Opus, currently deployed to a small number of organizations for cybersecurity work under Project Glasswing. Broader availability is expected in the coming weeks.

Note: Details above are based on Anthropic’s announcement on May 28, 2026, and are subject to change. Final feature availability, rollout timing, and supported plans may vary. Verify against Anthropic’s official channels before relying on any specific detail.

Disclaimer: This post summarizes an Anthropic product announcement for informational purposes. It is not affiliated with or endorsed by Anthropic.

Google’s Antigravity 2.0 Built a Working OS with AI Agents — 93 Subagents, Gemini 3.5 Flash

Google has unveiled what its Antigravity 2.0 multi-agent platform can do when left to run asynchronously: a team of AI agents, powered entirely by Gemini 3.5 Flash, built a functional operating system from scratch — kernel, process and memory management, filesystem, and video and keyboard drivers — capable of running FreeDoom. The whole thing ran from a single high-level prompt, with no human corrections along the way.

Key highlights include 93 subagents working in parallel, 15,314 model calls, over 339 million input tokens (2.6 billion+ total with cache reads, output, and thinking), and a total API cost of $916.92. The same team has since built a working AlphaZero implementation, a photo editing suite, a real-time messaging app, and a multi-user collaboration platform.

The findings come alongside the launch of /teamwork-preview, a new slash command in Antigravity that gives users access to the same agent orchestration used in these experiments.


Synchronous vs Asynchronous — Why This Distinction Matters

The Google blog post draws a clean line between two ways of working with AI agents. In synchronous (human-in-the-loop) workflows, the personality and behaviour of the model matters — whether it thinks enough or too much, whether it takes unnecessary steps, whether it can be steered mid-task. These qualities build trust even when the final output would be identical either way.

In asynchronous (fire-and-forget) workflows, none of that matters. The only variable is raw intelligence. If the model is smart enough to reason through ambiguity and recover from failure on its own, it can run independently. If it isn’t, no amount of orchestration compensates.

Gemini 3.5 models, according to the post, cross that threshold. Gemini 3.1 Pro was unable to complete the OS build. Gemini 3.5 Flash — the lighter, more economical model — succeeded.


Building an OS from a Single Prompt

The operating system was built end-to-end without human guidance after the initial prompt. The agent team produced a working kernel with process and memory management, a filesystem, and video and keyboard drivers. FreeDoom ran on it.

The scale of the run:

  • 93 subagents across specialised roles
  • 15,314 model calls
  • 339M+ input tokens (2.6B+ total including cache reads, thinking tokens, and output)
  • $916.92 at standard API pricing

The OS has real limitations — no floating-point math support, no hardware acceleration, no complex multi-threading, no sandboxing, no JIT compilation, and no complex audio or video decoding. It’s nowhere near a modern production OS. But it was built from nothing, by an agent team, for under $1,000, from a single prompt.

One detail worth noting: the first run completed unusually fast. Investigation revealed the agents were referencing context from previous runs that hadn’t been cleared. Anti-cheating measures and guardrails were added. The clean run built the same result without any prior context to draw from.


AlphaZero, a Photo Editor, and More

After the OS, the team ran a second experiment: reproduce the AlphaZero paper. The agents built the reinforcement learning pipeline in JAX and Flax, trained a ResNet model from scratch via self-play using multi-TPU pods, and built a full-stack web app for users to play against the trained model. The pipeline scaled from small local training loops up to 9×9 board training on multi-TPU infrastructure.

Following those two, the same agent orchestration was applied to:

  • A photo editing suite
  • A real-time messaging app
  • A multi-user collaboration platform

The results are described as functional starting points — not commercial-grade, not production-ready, but usable and built autonomously.


The Seven-Agent Architecture

Rather than one agent handling everything, the system uses seven specialised agent types with defined scopes:

  • Sentinel — the front-desk manager. Structures the user’s intent, spawns the Orchestrator, supervises overall completion. Does not write code or make technical decisions.
  • Orchestrator — dispatch-only. Decomposes requirements into milestones, kicks off other agents, synthesises reports. Never writes code or runs builds itself.
  • Explorer — reads requirements and previous logs, writes formal strategies for the Orchestrator to act on. Never writes code.
  • Worker — the coder. Implements strategies, builds the code, runs tests.
  • Reviewer — independently reviews the Worker’s changes for design correctness, edge cases, and interface contract compliance.
  • Critic — stress-tests the solution, runs adversarial tests to find coverage gaps.
  • Auditor — an independent investigator that verifies the authenticity and robustness of generated solutions.

The separation of concerns is deliberate. Keeping analysis, coding, reviewing, and auditing in distinct agents prevents any single role from becoming a single point of failure or a source of unchecked shortcuts.


Three Technical Tricks That Made It Work

Running 93 parallel agents over a task of this complexity surfaces problems that simpler setups never hit. Three specific mechanisms kept things on track.

Self-succession for context length. Large, long-running tasks fill up context windows. The Orchestrator tracks its cumulative subagent spawn count. Once it hits a limit, it dumps its full state to handoff files, terminates its background tasks, and spawns a successor with the same goals and permissions. The successor picks up from the files; the original terminates. Context resets cleanly without losing progress.

Scheduled crons for stuck processes. With many parallel subagents, any one of them can enter an infinite loop, hang on a compile, or stall on blocked I/O. A background cron — using Antigravity’s Scheduled Tasks primitive — monitors progress files that subagents write to. If a file’s timestamp goes stale past a threshold, the Sentinel terminates and respawns the blocked agent automatically.

An Auditor to catch LLM laziness. When a task is difficult enough, a model may take shortcuts — hardcoding a test output, writing a mock facade that makes tests pass without implementing the underlying logic. The Auditor runs strict static analysis checks, independent of whether the code works. Before the Sentinel marks any task complete and notifies the user, a final audit is forced. If the Auditor finds cheating, the cycle continues.


/teamwork-preview — Now Available in Antigravity

The same orchestration used in these experiments is now accessible through a new slash command: /teamwork-preview. It’s a research preview, available to Antigravity users on the Google AI Ultra plan ($200/month). It uses the same core primitives — parallel subagents, asynchronous tasks, hooks, and scheduled tasks — with no special internal version of the product.

A few practical notes from the announcement:

  • Recommended model: Gemini 3.5 Flash. Using a larger model will substantially increase costs.
  • Quota: Even with Gemini 3.5 Flash on AI Ultra, a single complex task can exhaust a full weekly quota. Users can purchase additional AI credits.
  • Resuming mid-task: If the agent team stops due to a quota or credit issue, users can purchase more credits and send “Continue” — the team picks up from where it stopped.
  • Local machine required: Since the agents run locally, the machine must stay awake for the duration of the run, even if the user isn’t actively monitoring it.

The post describes the current state as a research preview, with ongoing iteration on orchestration, UI, performance, reliability, and observability.


FAQ / Common Questions

What is Google Antigravity 2.0?
Antigravity is Google’s AI agent platform. Version 2.0 introduces new primitives including parallel-running subagents, asynchronous tasks, hooks, and scheduled tasks. The OS and AlphaZero experiments were built using these same primitives, with no special internal tooling.

Which Gemini Flash model was used, and why not a larger one?
Gemini 3.5 Flash was used. Gemini 3.1 Pro was attempted but could not complete the task. The post notes that even Flash — the lighter model — succeeded, which the team sees as evidence of a significant jump in underlying model intelligence rather than orchestration alone.

What are the limitations of the OS that was built?
The OS lacks floating-point math support, hardware acceleration, complex multi-threading, sandboxing, JIT compilation, and complex audio/video decoding. It is a functional barebones OS, not comparable to a modern production operating system.

Who can access /teamwork-preview?
It’s available to Antigravity users on the Google AI Ultra plan ($200/month) as a research preview. The post recommends pairing it with Gemini 3.5 Flash and warns that complex tasks will consume significant quota, possibly within the first run.


Note: Details above are based on Google’s announcement published at antigravity.google/blog, and are subject to change. Final feature availability, rollout timing, and supported plans may vary. Verify against Google’s official channels before relying on any specific detail.

Disclaimer: This post summarises a Google product announcement for informational purposes. It is not affiliated with or endorsed by Google or any platform mentioned.

Google’s Gemini Intelligence on Android — Multi-Step App Automation, Rambler, Create My Widget Coming First to Galaxy S26 and Pixel 10

Google has announced / introduced Gemini Intelligence on Android, a layer that brings proactive Gemini-powered features to a curated set of Android devices. The rollout starts this summer (2026) on the Samsung Galaxy S26 and Google Pixel 10, with the feature set expanding to Wear OS watches, cars, Android XR glasses, and Android-powered laptops later in the year.

Key additions include multi-step task automation across apps, the new Rambler voice-to-text feature with built-in Hindi-English code-mixing support, Create My Widget for natural-language widget building, smarter Autofill tied to Gemini’s Personal Intelligence, and Gemini in Chrome for research and browsing tasks (rolling out late June).

Android is moving from an operating system into an intelligence system.

The framing from Google: Android is moving from an operating system into an intelligence system. Privacy and control are part of the pitch — Gemini acts only on explicit commands, audio for Rambler isn’t stored, and the Autofill–Gemini link is opt-in.

Multi-Step Task Automation Across Apps

Google’s spent the last few months tuning Gemini’s multi-step automation on the Galaxy S26 and Pixel 10, with food-delivery and rideshare apps as the launch focus. You hand off the logistics — booking a spin-class bike, finding a Gmail-attached class syllabus and ordering the books, walking through a delivery app’s checkout — and Gemini drives the in-app steps for you.

Screen and image context unlock more of this. Long-press the power button over a notes-app grocery list and Gemini can turn it into a shopping cart for delivery. Snap a photo of a travel brochure in a hotel lobby and ask it to find a comparable tour on Expedia for six people. Notifications track each task as Gemini works in the background.

The control model is the part to pay attention to. Gemini only acts on an explicit command, runs until the task is done, and stops. A final confirmation step stays with you.

Gemini in Chrome — Late June Rollout

Starting in late June, Chrome on Android gets a Gemini browsing layer. The assistant can summarize a page, compare information across multiple tabs, and answer research-style queries from inside the browser.

There’s also Chrome auto browse — Gemini taking care of repetitive web tasks on its own. Two examples called out by Google: booking an appointment and reserving a parking spot. Same control model as app automation — explicit commands, defined stop points.

Bento Blog header 5.6.26 .width 1200.format webp

Smarter Autofill, Now Tied to Gemini

Autofill with Google has handled the obvious fields for a while. The Gemini version goes after the messier ones — complex forms with multiple sections, vague labels, and the kind of context the older Autofill couldn’t reason about. Gemini pulls the relevant data from your connected apps and fills fields without making you flip between screens.

The Gemini connection here is strictly opt-in. You choose whether to link Gemini to Autofill with Google, and a toggle in settings lets you turn it off at any point.

Rambler — Speech to Polished Text, With Hindi-English Mixing

Gboard’s voice-to-text has been fine for clean dictation. Rambler is built for the way most people actually talk — with um, ah, like, mid-sentence corrections, and the habit of changing direction halfway through. You speak naturally; Rambler keeps the substance and drops the filler, returning a tighter written version.

The India-relevant detail: Rambler handles multi-lingual input in a single message. The example Google flagged is English-Hindi code-mixing — the kind of switching most Indian users do every day in WhatsApp and email. Gemini’s multi-lingual model reads context across the switch and produces a clean message that keeps the original mix.

On privacy: Rambler shows a clear indicator while it’s active, audio is transcribed in real time, and nothing is stored or saved after the transcription is done.

Create My Widget — Generative UI on the Home Screen

Create My Widget is the first generative-UI step on Android. Describe what you want in plain language, and Gemini builds the widget.

Two examples from Google’s post:

  • A meal-prep widget told to “suggest three high-protein recipes every week” — Gemini builds a resizable dashboard you can drop on the home screen.
  • A cyclist asking for a weather widget that surfaces only wind speed and rain — Gemini strips the standard weather card down to just those data points.

The widgets work on Gemini Intelligence Android phones and on Wear OS watches. A watch widget can be a stripped-down view of the same generated layout, so the data you care about stays one glance away.

A UI Built on Material 3 Expressive

Gemini Intelligence ships with an updated visual system layered on Material 3 Expressive. Animations are tied to purpose — confirming a task, showing progress, transitioning between states — and the design aims to reduce ambient distractions rather than add to them.

Rollout Timing — What’s Live When

WaveWhenWhat
PhonesSummer 2026Samsung Galaxy S26, Google Pixel 10
Chrome on AndroidLate June 2026Gemini in Chrome, Chrome auto browse
Watches, cars, Android XR glasses, laptopsLater in 2026Gemini Intelligence features expand across categories

FAQ / Common Questions

When does Gemini Intelligence start rolling out on Android?
This summer (2026), starting on the latest Samsung Galaxy and Google Pixel phones. Other Android device categories — watches, cars, glasses, and laptops — get the features later in 2026.

Which phones get Gemini Intelligence first?
Google has named the Samsung Galaxy S26 and the Google Pixel 10 as the launch devices.

Does Rambler support Hindi?
Yes — Rambler handles multi-lingual input in a single message, with English-Hindi code-mixing specifically called out. Other language combinations work too, as long as Gemini’s multi-lingual model supports them.

Will Gemini Intelligence work on Wear OS watches?
Yes, but later. Watches are part of the next wave, alongside cars, Android XR glasses, and laptops, expected later in 2026. Create My Widget is one of the features confirmed for the watch.

Is the Autofill–Gemini connection automatic?
No — it’s strictly opt-in. You choose whether to link the two, and a toggle in settings lets you turn it off whenever you want.

Does Rambler store my voice recordings?
No. Audio is transcribed in real time and is not stored or saved.


Note: Details above are based on Google’s announcement on May 12, 2026, and are subject to change. Final feature availability, rollout timing, and supported devices may vary by region. Verify against Google’s official channels before relying on any specific detail.

Disclaimer: This post summarizes a Google product announcement for informational purposes. It is not affiliated with or endorsed by Google or any device manufacturer mentioned.

Source: Google blog.

Exit mobile version