For the first time on a major AI platform release, security shipped at launch — not bolted on 18 months later. At Nvidia GTC this week, five security vendors announced protection for Nvidia’s agentic AI stack, four with active deployments, one with validated early integration.The timing reflects how fast the threat has moved: 48% of cybersecurity professionals rank agentic AI as the top attack vector heading into 2026. Only 29% of organizations feel fully ready to deploy these technologies securely. Machine identities outnumber human employees 82 to 1 in the average enterprise. And IBM’s 2026 X-Force Threat Intelligence Index documented a 44% surge in attacks exploiting public-facing applications, accelerated by AI-enabled vulnerability scanning.Nvidia CEO Jensen Huang made the case from the GTC keynote stage on Monday: “Agentic systems in the corporate network can access sensitive information, execute code, and communicate externally. Obviously, this can’t possibly be allowed.” Nvidia defined a unified threat model designed to flex and adapt for the unique strengths of five different vendors. Nvidia also names Google, Microsoft Security and TrendAI as Nvidia OpenShell security collaborators. This article maps the five vendors with embargoed GTC announcements and verifiable deployment commitments on record, an analyst-synthesized reference architecture, not Nvidia’s official canonical stack.No single vendor covers all five governance layers. Security leaders can evaluate CrowdStrike for agent decisions and identity, Palo Alto Networks for cloud runtime, JFrog for supply chain provenance, Cisco for prompt-layer inspection, and WWT for pre-production validation. The audit matrix below maps who covers what. Three or more unanswered vendor questions mean ungoverned agents in production.The five-layer governance frameworkThis framework draws from the five vendor announcements and the OWASP Agentic Top 10. The left column is the governance layer. The right column is the question every security leader’s vendor should answer. If they can’t answer it, that layer is ungoverned.Governance LayerWhat To DeployRisk If NotVendor QuestionWho Maps HereAgent DecisionsReal-time guardrails on every prompt, response, and actionPoisoned input triggers privileged actionDetect state drift across sessions?CrowdStrike Falcon AIDR, Cisco AI Defense [runtime enforcement]Local ExecutionBehavioral monitoring for on-device agentsLocal agent runs unprotectedAgent baselines beyond process monitoring?CrowdStrike Falcon Endpoint [runtime enforcement]; WWT ARMOR [pre-prod validation]Cloud OpsRuntime enforcement across cloud deploymentsAgent-to-agent privilege escalationTrust policies between agents?CrowdStrike Falcon Cloud Security [runtime enforcement]; Palo Alto Prisma AIRS [AI Factory validated design]IdentityScoped privileges per agent identityInherited creds; delegation compoundsPrivilege inheritance in delegation?CrowdStrike Falcon Identity [runtime enforcement]; Palo Alto Networks/CyberArk [identity governance platform]Supply ChainModel scanning + provenance before deployCompromised model hits productionProvenance from registry to runtime?JFrog Agent Skills Registry [pre-deployment]; CrowdStrike FalconFive-layer governance audit matrix. Three or more unanswered vendor questions indicate ungoverned agents in production. [runtime enforcement] = inline controls active during agent execution. [pre-deployment] = controls applied before artifacts reach runtime. [pre-prod validation] = proving-ground testing before production rollout. [AI Factory validated design] = Nvidia reference architecture integration, not OpenShell-launch coupling.CrowdStrike’s Falcon platform embeds at four distinct enforcement points in the Nvidia OpenShell runtime: AIDR at the prompt-response-action layer, Falcon Endpoint on DGX Spark and DGX Station hosts, Falcon Cloud Security across AI-Q Blueprint deployments, and Falcon Identity for agent privilege boundaries. Palo Alto Networks enforces at the BlueField DPU hardware layer within Nvidia’s AI Factory validated design. JFrog governs the artifact supply chain from the registry through signing. WWT validates the full stack pre-production in a live environment. Cisco runs an independent guardrail at the prompt layer.CrowdStrike and Nvidia are also building what they call intent-aware controls. That phrase matters. An agent constrained to certain data is access-controlled. An agent whose planning loop is monitored for behavioral drift is governed. Those are different security postures, and the gap between them is where the 4% error rate at 5x speed becomes dangerous.Why the blast radius math changedDaniel Bernard, CrowdStrike’s chief business officer, told VentureBeat in an exclusive interview what the blast radius of a compromised AI agent looks like compared to a compromised human credential.“Anything we could think about from a blast radius before is unbounded,” Bernard said. “The human attacker needs to sleep a couple of hours a day. In the agentic world, there’s no such thing as a workday. It’s work-always.”That framing tracks with architectural reality. A human insider with stolen credentials works within biological limits: typing speed, attention span, a schedule. An AI agent with inherited credentials operates at compute speed across every API, database, and downstream agent it can reach. No fatigue. No shift change. CrowdStrike’s 2026 Global Threat Report puts the fastest observed eCrime breakout at 27 seconds and average breakout times at 29 minutes. An agentic adversary doesn’t have an average. It runs until you stop it.When VentureBeat asked Bernard about the 96% accuracy number and what happens in the 4%, his answer was operational, not promotional: “Having the right kill switches and fail-safes so that if the wrong thing is decided, you’re able to quickly get to the right thing.” The implication is worth sitting on. 96% accuracy at 5x speed means the errors that get through arrive five times faster than they used to. The oversight architecture has to match the detection speed. Most SOCs are not designed for that.Bernard’s broader prescription: “The opportunity for customers is to transform their SOCs from history museums into autonomous fighting machines.” Walk into the average enterprise SOC and inventory what’s running there. He’s not wrong.On analyst oversight when agents get it wrong, Bernard drew the governance line: “We want to keep not only agents in the loop, but also humans in the loop of the actions that the SOC is taking when that variance in what normal is realized. We’re on the same team.”The full vendor stackEach of the five vendors occupies a different enforcement point the other four do not. CrowdStrike’s architectural depth in the matrix reflects four announced OpenShell integration points; security leaders should weigh all five based on their existing tooling and threat model.Cisco shipped Secure AI Factory with AI Defense, extending Hybrid Mesh Firewall enforcement to Nvidia BlueField DPUs and adding AI Defense guardrails to the OpenShell runtime. In multi-vendor deployments, Cisco AI Defense and Falcon AIDR run as parallel guardrails: AIDR enforcing inside the OpenShell sandbox, AI Defense enforcing at the network perimeter. A poisoned prompt that evades one still hits the other.Palo Alto Networks runs Prisma AIRS on Nvidia BlueField DPUs as part of the Nvidia AI Factory validated design, offloading inspection to the data processing unit at the network hardware layer, below the hypervisor and outside the host OS kernel. This integration is best understood as a validated reference architecture pairing rather than a tight OpenShell runtime coupling. Palo Alto intercepts east-west agent traffic on the wire; CrowdStrike monitors agent process behavior inside the runtime. Same cloud runtime row, different integration model and maturity stage.JFrog announced the Agent Skills Registry, a system of record for MCP servers, models, agent skills, and agentic binary assets within Nvidia’s AI-Q architecture. Early integration with Nvidia has been validated, with full OpenShell support in active development. JFrog Artifactory will serve as a governed registry for AI skills, scanning, verifying, and signing every skill before agents can adopt it. This is the only pre-deployment enforcement point in the stack. As Chief Strategy Officer Gal Marder put it: “Just as a malicious software package can compromise an application, an unvetted skill can guide an agent to perform harmful actions.”Worldwide Technology launched a Securing AI Lab inside its Advanced Technology Center, built on Nvidia AI factories and the Falcon platform. WWT’s vendor-agnostic ARMOR framework is a pre-production validation and proving-ground capability, not an inline runtime control. It validates how the integrated stack behaves in a live AI factory environment before any agent touches production data, surfacing control interactions, failure modes, and policy conflicts before they become incidents.Three MDR numbers: what they actually measureOn the MDR side, CrowdStrike fine-tuned Nvidia Nemotron models on first-party threat data and operational SOC data from Falcon Complete engagements. Internal benchmarks show 5x faster investigations, 3x higher triage accuracy in high-confidence benign classification, and 96% accuracy in generating investigation queries within Falcon LogScale. Kroll, a global risk advisory and managed security firm that runs Falcon Complete as its MDR backbone, confirmed the results in production. Because Kroll operates Falcon Complete as its core MDR platform rather than as a neutral third-party evaluator, their validation is operationally meaningful but not independent in the audit sense. Industry-wide third-party benchmarks for agentic SOC accuracy do not yet exist. Treat reported numbers as indicative, not audited.The 5x investigation speed compares average agentic investigation time (8.5 minutes) against the longest observed human investigation in CrowdStrike’s internal testing: a ceiling, not a mean. The 3x triage accuracy measures one internal model against another. The 96% accuracy applies specifically to generating Falcon LogScale investigation queries via natural language, not to overall threat detection or alert classification.JFrog’s Agent Skills Registry operates beneath all four CrowdStrike enforcement layers, scanning, signing, and governing every model and skill before any agent can adopt it — with early Nvidia integration validated and full OpenShell support in active development.Six enterprises are already in deploymentEY selected the CrowdStrike-Nvidia stack to power Agentic SOC services for global enterprises. Nebius ships with Falcon integrated into its AI cloud from day one. CoreWeave CISO Jim Higgins signed off on the Blueprint. Mondelēz North America Regional CISO Emmett Koen said the capability lets his team “focus on higher-value response and decision-making.” MGM Resorts International CISO Bryan Green endorsed WWT’s validated testing environments, saying enterprises need “validated environments that embed protection from the start.” These range from vendor selection and platform validation to production integration. The signal is converging across buyer types, not uniform at-scale deployment.What the five-vendor stack does not coverThe governance framework above represents real progress. It also has three holes that every security leader deploying agentic AI will eventually hit. No vendor at GTC closed any of them. Knowing where they are is as important as knowing what shipped.Agent-to-agent trust. When agents delegate to other agents, credentials compound. The OWASP Top 10 for Agentic Applications lists tool call hijacking and orchestrator manipulation as top-tier risks. Independent research from BlueRock Security scanning over 7,000 MCP servers found 36.7% contain vulnerabilities. An arXiv preprint study across 847 scenarios found a 23 to 41% increase in attack success rates in MCP integrations versus non-MCP. No vendor at GTC demonstrated a complete trust policy framework for agent-to-agent delegation. This is the layer where the 82:1 identity ratio becomes a governance crisis, not just an inventory problem.Memory integrity. Agents with persistent memory create an attack surface that stateless LLM deployments do not have. Poison an agent’s long-term memory once. Influence its decisions weeks later. The OWASP Agentic Top 10 flags this explicitly. CrowdStrike’s intent-aware controls are the closest architectural response announced at GTC. Implementation details remain forward-looking.Registry-to-runtime provenance. JFrog’s Agent Skills Registry addresses the registry side of this problem. The gap that remains is the last mile: end-to-end provenance requires proving the model executing in production is the exact artifact scanned and signed in the registry. That cryptographic continuity from registry to runtime is still an engineering problem, not a solved capability.What running five vendors actually costsThe governance matrix is a coverage map, not an implementation plan. Running five vendors across five enforcement layers introduces real operational overhead that the GTC announcements did not address. Someone has to own policy orchestration: deciding which vendor’s guardrail wins when AIDR and AI Defense return conflicting verdicts on the same prompt. Someone has to normalize telemetry across Falcon LogScale, Prisma AIRS, and JFrog Artifactory into a single incident workflow. And someone has to manage change control when one vendor ships a runtime update that shifts how another vendor’s enforcement layer behaves.A realistic phased rollout looks like this: start with the supply chain layer (JFrog), because it operates pre-deployment and has no runtime dependencies on the other four. Add identity governance (Falcon Identity) second, because scoped agent credentials limit blast radius before you instrument the runtime. Then instrument the agent decision layer (Falcon AIDR or Cisco AI Defense, depending on your existing vendor footprint), then cloud runtime, then local execution. Running all five simultaneously from day one is an integration project, not a configuration task. Budget for it accordingly.What to do before your next board meetingHere is what every CISO should be able to say after running the framework above: “We have audited every autonomous agent against five governance layers. Here is what’s in place, and here are the five questions we are holding vendors to.” If you cannot say that today, the issue is not that you are behind schedule. The issue is that no schedule existed. Five vendors just shipped the architectural scaffolding for one.Do four things before your next board meeting:Run the five-layer audit. Pull every autonomous agent your organization has in production or staging. Map each one against the five governance rows above. Mark which vendor questions you can answer and which you cannot.Count the unanswered questions. Three or more means ungoverned agents in production. That is your board number, not a backlog item.Pressure-test the three open gaps. Ask your vendors, explicitly: How do you handle agent-to-agent trust across MCP delegation chains? How do you detect memory poisoning in persistent agent stores? Can you show a cryptographic binding between the registry scan and the runtime load? None of the five vendors at GTC has a complete answer. That is not an accusation. It is where the next year of agentic security gets built.Establish the oversight model before you scale. Bernard put it plainly: keep agents and humans in the loop. 96% accuracy at 5x speed means errors arrive faster than any SOC designed for human-speed detection can catch them. The kill switches and fail-safes have to be in place before the agents run at scale, not after the first missed breach.The scaffolding is necessary. It is not sufficient. Whether it changes your posture depends on whether you treat the five-layer framework as a working instrument or skip past it in the vendor deck.
Venture Beat
Nvidia lets its ‘claws’ out: NemoClaw brings security, scale to the agent platform taking over AI
Every few years, a piece of open-source software arrives that rewires how the industry thinks about computing. Linux did it for servers. Docker did it for deployment. OpenClaw — the autonomous AI agent platform that went from niche curiosity to the fastest-growing open-source project in history in a matter of weeks — may be doing it for software itself. Nvidia CEO and co-founder Jensen Huang made his position plain at GTC 2026 this week: “OpenClaw is the operating system for personal AI. This is the moment the industry has been waiting for — the beginning of a new renaissance in software.” And Nvidia wants to be the company that makes it enterprise-ready.At its annual large GTC 2026 conference in San Jose this week, Nvidia unveiled NemoClaw, a software stack that integrates directly with OpenClaw and installs in a single command. Along with it came Nvidia OpenShell, an open-source security runtime designed to give autonomous AI agents — or “claws”, as the industry is increasingly calling them — the guardrails they need to operate inside real enterprise environments. Alongside both, the company announced an expanded Nvidia Agent Toolkit, a full-stack platform for building and running production-grade agentic workflows.The message from Jensen Huang was unambiguous. “Claude Code and OpenClaw have sparked the agent inflection point — extending AI beyond generation and reasoning into action,” the Nvidia CEO said ahead of the conference. “Employees will be supercharged by teams of frontier, specialized and custom-built agents they deploy and manage.” Watch my video overview of it below and read on for more:Why ‘claws’ — and why it matters that Nvidia is using the wordThe terminology shift happening inside enterprise AI circles is subtle but significant. Internally, teams building with OpenClaw and similar platforms have taken to calling individual autonomous agents claws — a nod to the platform name, but also a useful shorthand for a new class of software that differs fundamentally from the chatbots and copilots of the last two years.As Kari Briski, Nvidia’s VP of generative AI software, put it during a Sunday briefing: “Claws are autonomous agents that can plan, act, and execute tasks on their own — they’ve gone from just thinking and executing on tasks to achieving entire missions.”That framing matters for IT decision-makers. Claws are not just assistants. They are persistent, tool-using programs that can write code, browse the web, manipulate files, call APIs, and chain actions together over hours or days without human input. The productivity upside is substantial. So is the attack surface. Which is precisely the problem Nvidia is positioning NemoClaw to solve.The enterprise demand is not hypothetical. Harrison Chase, founder of LangChain — whose open-source agent frameworks have been downloaded more than a billion times — put it bluntly in a recent episode of VentureBeat’s Beyond the Pilot podcast: “I guarantee that every enterprise developer out there wants to put a safe version of OpenClaw onto onto their computer or expose it to their users.” The bottleneck, he made clear, has never been interest. It has been the absence of a credible security and governance layer underneath it. NemoClaw is Nvidia’s answer to that gap — and notably, LangChain is one of the launch partners for the Agent Toolkit and OpenShell integration.What NemoClaw actually does — and what it doesn’t replaceNemoClaw is not a competitor to OpenClaw (or the now many alternatives). It is best understood as an enterprise wrapper around it — a distribution that ships with the components a security-conscious organization actually needs before letting an autonomous agent near production systems.The stack has two core components. The first is Nvidia Nemotron, Nvidia’s family of open models, which can run locally on dedicated hardware rather than routing queries through external APIs. Nemotron-3-Super, scored the highest out of all open models on PinchBench, a benchmark that tests the types of tasks and tools calls needed by OpenClaw. The second is OpenShell, the new open-source security runtime that runs each claw inside an isolated sandbox — effectively a Docker container with configurable policy controls written in YAML. Administrators can define precisely which files an agent can access, which network connections it can make, and which cloud services it can call. Everything outside those bounds is blocked.Nvidia describes OpenShell as providing the missing infrastructure layer beneath claws — giving them the access they need to be productive while enforcing policy-based security, network, and privacy guardrails.For organizations that have been watching OpenClaw’s rise with a mixture of excitement and dread, this is a meaningful development. OpenClaw’s early iterations were, by general consensus, a security liability — powerful and fast-moving, but essentially unconstrained. NemoClaw is the first attempt by a major hardware vendor to make that power manageable at enterprise scale.The hardware angle: always-on agents need dedicated computeOne aspect of NemoClaw that deserves more attention than it has received is the hardware strategy underneath it. Claws, by design, are always-on — they do not wait for a human to open a browser tab. They run continuously, monitoring inboxes, executing tasks, building tools, and completing multi-step workflows around the clock.That requires dedicated compute that does not compete with the rest of the organization’s workloads. Nvidia has a clear interest in pointing enterprises toward its own hardware for this purpose.NemoClaw is designed to run on Nvidia GeForce RTX PCs and laptops, RTX PRO workstations, and the company’s DGX Spark and DGX Station AI supercomputers. The hybrid architecture allows agents to use locally-running Nemotron models for sensitive workloads, with a privacy router directing queries to frontier cloud models when higher capability is needed — without exposing private data to those external endpoints.It is an elegant solution to a real problem: many enterprises are not yet ready to send customer data, internal documents, or proprietary code to cloud AI providers, but they still need model capability that exceeds what runs locally. NemoClaw’s privacy router architecture threads that needle, at least in principle.What claws actually look like in the enterprise Before evaluating the platform, it helps to understand what a claw doing real work looks like in practice. Two partner integrations announced alongside NemoClaw offer the clearest window into where this is heading.Box is perhaps the most illustrative case for organizations that manage large volumes of unstructured enterprise content. Box is integrating Nvidia Agent Toolkit to enable claws that use the Box file system as their primary working environment, with pre-built skills for Invoice Extraction, Contract Lifecycle Management, RFP sourcing, and GTM workflows. The architecture supports hierarchical agent management: a parent claw — such as a Client Onboarding Agent — can spin up specialized sub-agents to handle discrete tasks, all governed by the same OpenShell Policy Engine.Critically, an agent’s access to files in Box follows the exact same permissions model that governs human employees — enforced through OpenShell’s gateway layer before any data is exchanged. Every action is logged and attributable; no shadow copies accumulate in agent memory. As Box puts it in their announcement blog, “organizations need to know which agent touched which file, when, and why — and they need the ability to revoke access instantly if something goes wrong.”Cisco’s integration offers perhaps the most visceral illustration of what OpenShell guardrails enable in practice. The Cisco security team has published a scenario in which a zero-day vulnerability advisory drops on a Friday evening. Rather than triggering a weekend-long manual scramble — pulling asset lists, pinging on-call engineers, mapping blast radius — a claw running inside OpenShell autonomously queries the configuration database, maps impacted devices against the network topology, generates a prioritized remediation plan, and produces an audit-grade trace of every decision it made. Cisco AI Defense verifies every tool call against approved policy in real time. The entire response completes in roughly an hour, with a complete record that satisfies compliance requirements. “We are not trusting the model to do the right thing,” the Cisco team noted in their technical writeup. “We are constraining it so that the right thing is the only thing it can do.”An ecosystem play: the partners behind the stackNvidia is not building this alone. The Agent Toolkit and OpenShell announcements came with a significant roster of enterprise partners — Box, Cisco, Atlassian, Salesforce, SAP, Adobe, CrowdStrike, Cohesity, IQVIA, ServiceNow, and more than a dozen others — whose integration depth signals how seriously the broader software industry is treating the agentic shift.On the infrastructure side, OpenShell is available today on build.nvidia.com, supported by cloud inference providers including CoreWeave, Together AI, Fireworks, and DigitalOcean, and deployable on-premises on servers from Cisco, Dell, HPE, Lenovo, and Supermicro. Agents built within OpenShell can also continuously acquire new skills using coding agents including Claude Code, Codex, and Cursor — with every newly acquired capability subject to the same policy controls as the original deployment.Separately, Nvidia announced the Nemotron Coalition — a collaborative initiative bringing together Mistral AI, Perplexity, Cursor, and LangChain to co-develop open frontier models. The coalition’s first project is a base model co-developed with Mistral that will underpin the upcoming Nemotron 4 family, aimed specifically at agentic use cases.What enterprise leaders should be watchingThe NemoClaw announcement marks a turning point in how enterprise AI is likely to be discussed in boardrooms and procurement meetings over the next twelve months. The question is no longer whether organizations will deploy autonomous agents. The industry has clearly moved past that debate. The question is now how — with what controls, on what hardware, using which models, and with what audit trail.Nvidia’s answer is a vertically integrated stack that spans silicon, runtime, model, and security policy. For IT leaders evaluating their agentic roadmap, NemoClaw represents a significant attempt to provide all four layers from a single vendor, with meaningful third-party security integrations already in place.The risks are not trivial. OpenShell’s YAML-based policy model will require operational maturity that most organizations are still building. Claws that can self-evolve and acquire new skills — as Nvidia’s architecture explicitly enables — raise governance questions that no sandbox can fully resolve. And the concentration of agentic infrastructure in a single vendor’s stack carries familiar platform risks.That said the direction is clear. Claws are coming to the enterprise. Nvidia just made its bet on being the platform they run on — and the guardrails that keep them in bounds.
Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthropic and Meta on board
Nvidia on Monday took the wraps off Vera Rubin, a sweeping new computing platform built from seven chips now in full production — and backed by an extraordinary lineup of customers that includes Anthropic, OpenAI, Meta and Mistral AI, along with every major cloud provider.The message to the AI industry, and to investors, was unmistakable: Nvidia is not slowing down. The Vera Rubin platform claims up to 10x more inference throughput per watt and one-tenth the cost per token compared with the Blackwell systems that only recently began shipping. CEO Jensen Huang, speaking at the company’s annual GTC conference, called it “a generational leap” that would kick off “the greatest infrastructure buildout in history.” Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will all offer the platform, and more than 80 manufacturing partners are building systems around it.”Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI,” Huang declared. “The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.”In any other industry, such rhetoric might be dismissed as keynote theater. But Nvidia occupies a singular position in the global economy — a company whose products have become so essential to the AI boom that its market capitalization now rivals the GDP of mid-sized nations. When Huang says the infrastructure buildout is historic, the CEOs of the companies actually writing the checks are standing behind him, nodding.Dario Amodei, the chief executive of Anthropic, said Nvidia’s platform “gives us the compute, networking and system design to keep delivering while advancing the safety and reliability our customers depend on.” Sam Altman, the chief executive of OpenAI, said that “with Nvidia Vera Rubin, we’ll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people.”Inside the seven-chip architecture designed to power the age of AI agentsThe Vera Rubin platform brings together the Nvidia Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch and the newly integrated Groq 3 LPU — a purpose-built inference accelerator. Nvidia organized these into five interlocking rack-scale systems that function as a unified supercomputer.The flagship NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. Nvidia says it can train large mixture-of-experts models using one-quarter the GPUs required on Blackwell, a claim that, if validated in production, would fundamentally alter the economics of building frontier AI systems.The Vera CPU rack packs 256 liquid-cooled processors into a single rack, sustaining more than 22,500 concurrent CPU environments — the sandboxes where AI agents execute code, validate results and iterate. Nvidia describes the Vera CPU as the first processor purpose-built for agentic AI and reinforcement learning, featuring 88 custom-designed Olympus cores and LPDDR5X memory delivering 1.2 terabytes per second of bandwidth at half the power of conventional server CPUs.The Groq 3 LPX rack, housing 256 inference processors with 128 gigabytes of on-chip SRAM, targets the low-latency demands of trillion-parameter models with million-token contexts. The BlueField-4 STX storage rack provides what Nvidia calls “context memory” — high-speed storage for the massive key-value caches that agentic systems generate as they reason across long, multi-step tasks. And the Spectrum-6 SPX Ethernet rack ties it all together with co-packaged optics delivering 5x greater optical power efficiency than traditional transceivers.Why Nvidia is betting the future on autonomous AI agents — and rebuilding its stack around themThe strategic logic binding every announcement Monday into a single narrative is Nvidia’s conviction that the AI industry is crossing a threshold. The era of chatbots — AI that responds to a prompt and stops — is giving way to what Huang calls “agentic AI”: systems that reason autonomously for hours or days, write and execute software, call external tools, and continuously improve.This isn’t just a branding exercise. It represents a genuine architectural shift in how computing infrastructure must be designed. A chatbot query might consume milliseconds of GPU time. An agentic system orchestrating a drug discovery pipeline or debugging a complex codebase might run continuously, consuming CPU cycles to execute code, GPU cycles to reason, and massive storage to maintain context across thousands of intermediate steps. That demands not just faster chips, but a fundamentally different balance of compute, memory, storage and networking.Nvidia addressed this with the launch of its Agent Toolkit, which includes OpenShell, a new open-source runtime that enforces security and privacy guardrails for autonomous agents. The enterprise adoption list is remarkable: Adobe, Atlassian, Box, Cadence, Cisco, CrowdStrike, Dassault Systèmes, IQVIA, Red Hat, Salesforce, SAP, ServiceNow, Siemens and Synopsys are all integrating the toolkit into their platforms. Nvidia also launched NemoClaw, an open-source stack that lets users install its Nemotron models and OpenShell runtime in a single command to run secure, always-on AI assistants on everything from RTX laptops to DGX Station supercomputers.The company separately announced Dynamo 1.0, open-source software it describes as the first “operating system” for AI inference at factory scale. Dynamo orchestrates GPU and memory resources across clusters and has already been adopted by AWS, Azure, Google Cloud, Oracle, Cursor, Perplexity, PayPal and Pinterest. Nvidia says it boosted Blackwell inference performance by up to 7x in recent benchmarks.The Nemotron coalition and Nvidia’s play to shape the open-source AI landscapeIf Vera Rubin represents Nvidia’s hardware ambition, the Nemotron Coalition represents its software ambition. Announced Monday, the coalition is a global collaboration of AI labs that will jointly develop open frontier models trained on Nvidia’s DGX Cloud. The inaugural members — Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam and Thinking Machines Lab, the startup led by former OpenAI executive Mira Murati — will contribute data, evaluation frameworks and domain expertise.The first model will be co-developed by Mistral AI and Nvidia and will underpin the upcoming Nemotron 4 family. “Open models are the lifeblood of innovation and the engine of global participation in the AI revolution,” Huang said.Nvidia also expanded its own open model portfolio significantly. Nemotron 3 Ultra delivers what the company calls frontier-level intelligence with 5x throughput efficiency on Blackwell. Nemotron 3 Omni integrates audio, vision and language understanding. Nemotron 3 VoiceChat supports real-time, simultaneous conversations. And the company previewed GR00T N2, a next-generation robot foundation model that it says helps robots succeed at new tasks in new environments more than twice as often as leading alternatives, currently ranking first on the MolmoSpaces and RoboArena benchmarks.The open-model push serves a dual purpose. It cultivates the developer ecosystem that drives demand for Nvidia hardware, and it positions Nvidia as a neutral platform provider rather than a competitor to the AI labs building on its chips — a delicate balancing act that grows more complex as Nvidia’s own models grow more capable.From operating rooms to orbit: how Vera Rubin’s reach extends far beyond the data centerThe vertical breadth of Monday’s announcements was almost disorienting. Roche revealed it is deploying more than 3,500 Blackwell GPUs across hybrid cloud and on-premises environments in the U.S. and Europe — the largest announced GPU footprint in the pharmaceutical industry. The company is using the infrastructure for biological foundation models, drug discovery and digital twins of manufacturing facilities, including its new GLP-1 facility in North Carolina. Nearly 90 percent of Genentech’s eligible small-molecule programs now integrate AI, Roche said, with one oncology molecule designed 25 percent faster and a backup candidate delivered in seven months instead of more than two years.In autonomous vehicles, BYD, Geely, Isuzu and Nissan are building Level 4-ready vehicles on Nvidia’s Drive Hyperion platform. Nvidia and Uber expanded their partnership to launch autonomous vehicles across 28 cities on four continents by 2028, starting with Los Angeles and San Francisco in the first half of 2027. The company introduced Alpamayo 1.5, a reasoning model for autonomous driving already downloaded by more than 100,000 automotive developers, and Nvidia Halos OS, a safety architecture built on ASIL D-certified foundations for production-grade autonomy.Nvidia also released the first domain-specific physical AI platform for healthcare robotics, anchored by Open-H — the world’s largest healthcare robotics dataset, with over 700 hours of surgical video. CMR Surgical, Johnson & Johnson MedTech and Medtronic are among the adopters.And then there was space. The Vera Rubin Space Module delivers up to 25x more AI compute for orbital inferencing compared with the H100 GPU. Aetherflux, Axiom Space, Kepler Communications, Planet Labs and Starcloud are building on it. “Space computing, the final frontier, has arrived,” Huang said, deploying the kind of line that, from another executive, might draw eye-rolls — but from the CEO of a company whose chips already power the majority of the world’s AI workloads, lands differently.The deskside supercomputer and Nvidia’s quiet push into enterprise hardwareAmid the spectacle of trillion-parameter models and orbital data centers, Nvidia made a quieter but potentially consequential move: it launched the DGX Station, a deskside system powered by the GB300 Grace Blackwell Ultra Desktop Superchip that delivers 748 gigabytes of coherent memory and up to 20 petaflops of AI compute performance. The system can run open models of up to one trillion parameters from a desk.Snowflake, Microsoft Research, Cornell, EPRI and Sungkyunkwan University are among the early users. DGX Station supports air-gapped configurations for regulated industries, and applications built on it move seamlessly to Nvidia’s data center systems without rearchitecting — a design choice that creates a natural on-ramp from local experimentation to large-scale deployment.Nvidia also updated DGX Spark, its more compact system, with support for clustering up to four units into a “desktop data center” with linear performance scaling. Both systems ship preconfigured with NemoClaw and the Nvidia AI software stack, and support models including Nemotron 3, Google Gemma 3, Qwen3, DeepSeek V3.2, Mistral Large 3 and others.Adobe and Nvidia separately announced a strategic partnership to develop the next generation of Firefly models using Nvidia’s computing technology and libraries. Adobe will also build a cloud-native 3D digital twin solution for marketing on Nvidia Omniverse and integrate Nemotron capabilities into Adobe Acrobat. The partnership spans creative tools including Photoshop, Premiere Pro, Frame.io and Adobe Experience Platform.Building the factories that build intelligence: Nvidia’s AI infrastructure blueprintPerhaps the most telling indicator of where Nvidia sees the industry heading is the Vera Rubin DSX AI Factory reference design — essentially a blueprint for constructing entire buildings optimized to produce AI. The reference design outlines how to integrate compute, networking, storage, power and cooling into a system that maximizes what Nvidia calls “tokens per watt,” along with an Omniverse DSX Blueprint for creating digital twins of these facilities before they are built.The software stack includes DSX Max-Q for dynamic power provisioning — which Nvidia says enables 30 percent more AI infrastructure within a fixed-power data center — and DSX Flex, which connects AI factories to power-grid services to unlock what the company estimates is 100 gigawatts of stranded grid capacity. Energy leaders Emerald AI, GE Vernova, Hitachi and Siemens Energy are using the architecture. Nscale and Caterpillar are building one of the world’s largest AI factories in West Virginia using the Vera Rubin reference design.Industry partners Cadence, Dassault Systèmes, Eaton, Jacobs, Schneider Electric, Siemens, PTC, Switch, Trane Technologies and Vertiv are contributing simulation-ready assets and integrating their platforms. CoreWeave is using Nvidia’s DSX Air to run operational rehearsals of AI factories in the cloud before physical delivery.”In the age of AI, intelligence tokens are the new currency, and AI factories are the infrastructure that generates them,” Huang said. It is the kind of formulation — tokens as currency, factories as mints — that reveals how Nvidia thinks about its place in the emerging economic order.What Nvidia’s grand vision gets right — and what remains unprovenThe scale and coherence of Monday’s announcements are genuinely impressive. No other company in the semiconductor industry — and arguably no other technology company, period — can present an integrated stack spanning custom silicon, systems architecture, networking, storage, inference software, open models, agent frameworks, safety runtimes, simulation platforms, digital twin infrastructure and vertical applications from drug discovery to autonomous driving to orbital computing.But scale and coherence are not the same as inevitability. The performance claims for Vera Rubin, while dramatic, remain largely unverified by independent benchmarks. The agentic AI thesis that underpins the entire platform — the idea that autonomous, long-running AI agents will become the dominant computing workload — is a bet on a future that has not yet fully materialized. And Nvidia’s expanding role as a provider of models, software, and reference architectures raises questions about how long its hardware customers will remain comfortable depending so heavily on a single supplier for so many layers of their stack.Competitors are not standing still. AMD continues to close the gap on data center GPU performance. Google’s TPUs power some of the world’s largest AI training runs. Amazon’s Trainium chips are gaining traction inside AWS. And a growing cohort of startups is attacking various pieces of the AI infrastructure puzzle.Yet none of them showed up at GTC on Monday with endorsements from the CEOs of Anthropic and OpenAI. None of them announced seven new chips in full production simultaneously. And none of them presented a vision this comprehensive for what comes next.There is a scene that repeats at every GTC: Huang, in his trademark leather jacket, holds up a chip the way a jeweler holds up a diamond, rotating it slowly under the stage lights. It is part showmanship, part sermon. But the congregation keeps growing, the chips keep getting faster, and the checks keep getting larger. Whether Nvidia is building the greatest infrastructure in history or simply the most profitable one may, in the end, be a distinction without a difference.
Nvidia BlueField-4 STX adds a context memory layer to storage to close the agentic AI throughput gap
When an AI agent loses context mid-task because traditional storage can’t keep pace with inference, it is not a model problem — it is a storage problem. At GTC 2026, Nvidia announced BlueField-4 STX, a modular reference architecture that inserts a dedicated context memory layer between GPUs and traditional storage, claiming 5x the token throughput, 4x the energy efficiency and 2x the data ingestion speed of conventional CPU-based storage.The bottleneck STX targets is key-value cache data. KV cache is the stored record of what a model has already processed — the intermediate calculations an LLM saves so it does not have to recompute attention across the entire context on every inference step. It is what allows an agent to maintain coherent working memory across sessions, tool calls and reasoning steps. As context windows grow and agents take more steps, that cache grows with them. When it has to traverse a traditional storage path to get back to the GPU, inference slows and GPU utilization drops.STX is not a product Nvidia sells directly. It is a reference architecture the company is distributing to its storage partner ecosystem so vendors can build AI-native infrastructure around it.STX puts a context memory layer between GPU and diskThe architecture is built around a new storage-optimized BlueField-4 processor that combines Nvidia’s Vera CPU with the ConnectX-9 SuperNIC. It runs on Spectrum-X Ethernet networking and is programmable through Nvidia’s DOCA software platform.The first rack-scale implementation is the Nvidia CMX context memory storage platform. CMX extends GPU memory with a high-performance context layer designed specifically for storing and retrieving KV cache data generated by large language models during inference. Keeping that cache accessible without forcing a round trip through general-purpose storage is what CMX is designed to do.”Traditional data centers provide high-capacity, general-purpose storage, but generally lack the responsiveness required for interaction with AI agents that need to work across many steps, tools and different sessions,” Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing said in a briefing with press and analysts.In response to a question from VentureBeat, Buck confirmed that STX also ships with a software reference platform alongside the hardware architecture. Nvidia is expanding DOCA to include a new component referred to in the briefing as DOCA Memo. “Our storage providers can leverage the programmability of the BlueField-4 processor to optimize storage for the agentic AI factory,” Buck said. “In addition to having a reference rack architecture, we’re also providing a reference software platform for them to deliver those innovations and optimizations for their customers.”Storage partners building on STX get both a hardware reference design and a software reference platform — a programmable foundation for context-optimized storage.Nvidia’s partner list spans storage incumbents and AI-native cloud providersStorage providers co-designing STX-based infrastructure include Cloudian, DDN, Dell Technologies, Everpure, Hitachi Vantara, HPE, IBM, MinIO, NetApp, Nutanix, VAST Data and WEKA. Manufacturing partners building STX-based systems include AIC, Supermicro and Quanta Cloud Technology.On the cloud and AI side, CoreWeave, Crusoe, IREN, Lambda, Mistral AI, Nebius, Oracle Cloud Infrastructure and Vultr have all committed to STX for context memory storage.That combination of enterprise storage incumbents and AI-native cloud providers is the signal worth watching. Nvidia is not positioning STX as a specialty product for hyperscalers. It is positioning it as the reference standard for anyone building storage infrastructure that has to serve agentic AI workloads — which, within the next two to three years, is likely to include most enterprise AI deployments running multi-step inference at scale.STX-based platforms will be available from partners in the second half of 2026.IBM shows what the data layer problem looks like in productionIBM sits on both sides of the STX announcement. It is listed as a storage provider co-designing STX-based infrastructure, and Nvidia separately confirmed that it has selected IBM Storage Scale System 6000 — certified and validated on Nvidia DGX platforms — as the high-performance storage foundation for its own GPU-native analytics infrastructure.IBM also announced a broader expanded collaboration with Nvidia at GTC, including GPU-accelerated integration between IBM’s watsonx.data Presto SQL engine and Nvidia’s cuDF library. A production proof of concept with Nestlé put numbers on what that acceleration looks like: a data refresh cycle across the company’s Order-to-Cash data mart, covering 186 countries and 44 tables, dropped from 15 minutes to three minutes. IBM reported 83% cost savings and a 30x price-performance improvement.The Nestlé result is a structured analytics workload. It does not directly demonstrate agentic inference performance. But it makes IBM and Nvidia’s shared argument concrete: the data layer is where enterprise AI performance is currently constrained, and GPU-accelerating it produces material results in production.Why the storage layer is becoming a first-class infrastructure decisionSTX is a signal that the storage layer is becoming a first-class concern in enterprise AI infrastructure planning, not an afterthought to GPU procurement.
General-purpose NAS and object storage were not designed to serve KV cache data at inference latency requirements. STX-based systems from partners including Dell, HPE, NetApp and VAST Data are what Nvidia is putting forward as the practical alternative, with the DOCA software platform providing the programmability layer to tune storage behavior for specific agentic workloads.The performance claims — 5x token throughput, 4x energy efficiency, 2x data ingestion — are measured against traditional CPU-based storage architectures. Nvidia has not specified the exact baseline configuration for those comparisons. Before those numbers drive infrastructure decisions, the baseline is worth pinning down.Platforms are expected from partners in the second half of 2026. Given that most major storage vendors are already co-designing on STX, enterprises evaluating storage refreshes for AI infrastructure in the next 12 months should expect STX-based options to be available from their existing vendor relationships.
z.ai debuts faster, cheaper GLM-5 Turbo model for agents and ‘claws’ — but it’s not open-source
Chinese AI startup Z.ai, known for its powerful, open source GLM family of large language models (LLMs), has introduced GLM-5-Turbo, a new, proprietary variant of its open source GLM-5 model aimed at agent-driven workflows, with the company positioning it as a faster model tuned for OpenClaw-style tasks such as tool use, long-chain execution and persistent automation. It’s available now through Z.ai’s application programming interface (API) on third-party provider OpenRouter with roughly a 202.8K-token context window, 131.1K max output, and listed pricing of $0.96 per million input tokens and $3.20 per million output tokens. That makes it about $0.04 cheaper per total input and output cost (at 1 million tokens) than its predecessor, according to our calculations. ModelInputOutputTotal CostSourceGrok 4.1 Fast$0.20$0.50$0.70xAIGemini 3 Flash$0.50$3.00$3.50GoogleKimi-K2.5$0.60$3.00$3.60MoonshotGLM-5-Turbo$0.96$3.20$4.16OpenRouterGLM-5$1.00$3.20$4.20Z.aiClaude Haiku 4.5$1.00$5.00$6.00AnthropicQwen3-Max$1.20$6.00$7.20Alibaba CloudGemini 3 Pro$2.00$12.00$14.00GoogleGPT-5.2$1.75$14.00$15.75OpenAIGPT-5.4$2.50$15.00$17.50OpenAIClaude Sonnet 4.5$3.00$15.00$18.00AnthropicClaude Opus 4.6$5.00$25.00$30.00AnthropicGPT-5.4 Pro$30.00$180.00$210.00OpenAISecond, Z.ai is also adding the model to its GLM Coding subscription product, which is its packaged coding assistant service. That service has three tiers: Lite at $27 per quarter, Pro at $81 per quarter, and Max at $216 per quarter. Z.ai’s March 15 rollout note says Pro subscribers get GLM-5-Turbo in March, while Lite subscribers get the base GLM-5 in March and must wait until April for GLM-5-Turbo. The company is also taking early-access applications for enterprises via a Google Form, which suggests some users may get access ahead of that schedule depending on capacity.z.ai describes GLM-5-Turbo as designed for “fast inference” and “deeply optimized for real-world agent workflows involving long execution chains,” with improvements in complex instruction decomposition, tool use, scheduled and persistent execution, and stability across extended tasks.The release offers developers a new option for building OpenClaw-style autonomous AI agents, and serves as a signal about where model vendors think enterprise demand is heading: away from chat interfaces and toward systems that can reliably execute multi-step work. That is now where much of the competition is moving, as well, especially among vendors trying to win developers and enterprise teams building internal assistants, workflow orchestrators and coding agents. Built for execution, not just conversationZ.ai’s materials frame GLM-5-Turbo as a model for production-like agent behavior rather than static prompt-response use. The pitch centers on reliability in practical task flows: better command following, stronger tool invocation, improved handling of scheduled and persistent tasks, and faster execution across longer logical chains. That positioning puts the model squarely in the market for agents that do more than answer questions. It is aimed at systems that can gather information, call tools, break down instructions and keep working through complex task sequences with less supervision.Rather than a straightforward successor to GLM-5, GLM-5-Turbo appears to be a more execution-focused variant: tuned for speed, tool use and long-chain agent stability, while the base GLM-5 remains Z.ai’s broader open-source flagship. GLM-5-Turbo appears especially competitive in OpenClaw scenarios such as information search and gathering, office and daily tasks, data analysis, development and operations, and automation. Those are company-supplied materials, not independent validation, but they make the intended product positioning clear.Background: z.ai and GLM-5 set the stage for TurboFounded in 2019 as a Tsinghua University spinoff in Beijing, Z.ai — formerly Zhipu AI — is now one of China’s best-known foundation model companies. The company remains headquartered in Beijing and is led by CEO Zhang PengZ.ai listed on the Hong Kong Stock Exchange on January 8, 2026, with shares priced at HK$116.20 and opening at HK$120, for a stated market capitalization of HK$52.83 billion, making it China’s largest independent large language model developer.As of September 30, 2025 its models had reportedly been used by more than 12,000 enterprise customers, more than 80 million end-user devices and more than 45 million developers worldwide.Z.ai’s last major release, GLM-5, which debuted in February 2026, gives useful context for what the company is now trying to do with GLM-5-Turbo.GLM-5 is an open-source flagship model carrying an MIT license, posting a record-low hallucination score on the AA-Omniscience Index, and debuted a native “Agent Mode” that could turn prompts or source materials into ready-to-use .docx, .pdf and .xlsx files. That earlier release was also framed as a major technical step up for the company. GLM-5 scaled to 744 billion parameters with 40 billion active per token in a mixture-of-experts architecture, used 28.5 trillion pretraining tokens, and relied on a new asynchronous reinforcement-learning infrastructure called “slime” to reduce training bottlenecks and support more complex agentic behavior. In that light, GLM-5-Turbo looks less like a replacement for GLM-5 than a narrower commercial offshoot: a variant that keeps the long-context, agentic orientation of the flagship line but emphasizes speed, stability and execution in real-world agent chains.Developer features and model packagingOn the technical side, Z.ai has been packaging the GLM-5 family with the kinds of capabilities developers now expect from serious agent-facing models, including long context handling, tools, reasoning support and structured integrations. OpenRouter’s GLM-5-Turbo page lists support for tools, tool choice and response formatting, while also surfacing live performance data including average throughput and latency. OpenRouter’s provider telemetry adds a useful deployment-level comparison between GLM-5 and GLM-5-Turbo, though the data is not perfectly apples-to-apples because GLM-5 appears across several providers while GLM-5-Turbo is shown only through Z.ai. On throughput, GLM-5-Turbo averages 48 tokens per second on OpenRouter, which puts it below the fastest GLM-5 endpoints shown in the screenshots, including Fireworks at 70 tok/s and Friendli at 58 tok/s, but above Together’s 40 tok/s. On raw first-token latency, GLM-5-Turbo is slower in the available data, posting 2.92 seconds versus 0.41 seconds for Friendli’s GLM-5 endpoint, 1.00 second for Parasail and 1.08 seconds for DeepInfra. But the picture improves on end-to-end completion time: GLM-5-Turbo is shown at 8.16 seconds, faster than the GLM-5 endpoints, which range from 9.34 seconds on Fireworks to 11.23 seconds on DeepInfra. The most notable operational advantage is in tool reliability. GLM-5-Turbo shows a 0.67% tool call error rate, materially lower than the GLM-5 providers shown, where error rates range from 2.33% to 6.41%. For enterprise teams, that suggests a model that may not win on initial responsiveness in its current OpenRouter routing, but could still be better suited to longer agent runs where completion stability and lower tool failure matter more than the fastest first token.Benchmarking and pricingA ZClawBench radar chart released by z.ai shows GLM-5-Turbo as especially competitive in OpenClaw scenarios such as information search and gathering, office and daily tasks, data analysis, development and operations, and automation. Those are company-supplied benchmark visuals, not independent validation, but they do help explain how Z.ai wants the two models understood: GLM-5 as the broader coding and open flagship, and Turbo as the more targeted agent-execution variant.A more nuanced licensing signalOne notable caveat is licensing. Z.ai says GLM-5-Turbo is currently closed-source, but it also says the model’s capabilities and findings will be folded into its next open-source model release. That is an important distinction. The company is not clearly promising to open-source GLM-5-Turbo itself. Instead, it is saying that lessons, techniques and improvements from this release will inform a future open model. That makes the launch more nuanced than a clean break from openness.Z.ai’s earlier GLM strategy leaned heavily on open releases and open-weight distribution, which helped it build visibility among developers. China’s AI market may be rebalancing away from open sourceGLM-5-Turbo’s licensing posture also lands in a wider Chinese market context that makes the launch more notable than a simple product update. In recent weeks, reporting around Alibaba’s Qwen unit has raised fresh questions about how China’s leading AI labs will balance open releases with commercial pressure.Earlier this month, Qwen division head Lin Junyang stepped down, becoming the third senior Qwen executive to leave in 2026, even though Alibaba’s Qwen family remains one of the most prolific open-model efforts anywhere, with more than 400 open-source models released since 2023 and more than 1 billion downloads. Reuters then reported on March 16 that Alibaba CEO Eddie Wu would take direct control of a newly formed AI-focused business group consolidating Qwen and other units, amid scrutiny over strategy, profitability and the brutal price competition surrounding open-model offerings in China. Even without overstating those developments, they help frame the broader question hanging over the sector: whether the economics of frontier AI are starting to push even historically open-leaning Chinese labs toward a more segmented strategy. That does not mean Chinese labs are abandoning open source. But the pattern is becoming harder to ignore: open models help drive adoption, developer goodwill and ecosystem reach, while certain high-value variants aimed at enterprise agents, coding workflows and other commercially attractive use cases may increasingly arrive first as proprietary products. In that sense, GLM-5-Turbo fits a larger possible shift in China’s AI market, one that looks increasingly similar to the playbook used by OpenAI, Anthropic and Google in the U.S.: openness as distribution, proprietary systems as business.Seen in that light, GLM-5-Turbo looks like more than a speed-focused product update. It may be another sign that parts of China’s AI sector are moving toward the same hybrid model already common in the U.S.: openness as distribution, proprietary systems as business. That would not mark the end of open-source AI from Chinese labs, but it could mean their most strategically important agent-focused offerings appear first behind closed access, even if some of their underlying advances later make their way into open releases.For developers evaluating agent platforms, that makes GLM-5-Turbo both a product launch and a useful signal. Z.ai is still speaking the language of open models. But with this release, it is also showing that some of its most commercially relevant work may arrive first as proprietary infrastructure for enterprise-grade agent systems.
OpenClaw can bypass your EDR, DLP and IAM without triggering a single alert
An attacker embeds a single instruction inside a forwarded email. An OpenClaw agent summarizes that email as part of a normal task. The hidden instruction tells the agent to forward credentials to an external endpoint. The agent complies — through a sanctioned API call, using its own OAuth tokens. The firewall logs HTTP 200. EDR records a normal process. No signature fires. Nothing went wrong by any definition your security stack understands.
That is the problem. Six independent security teams shipped six OpenClaw defense tools in 14 days. Three attack surfaces survived every one of them. The exposure picture is already worse than most security teams know. Token Security found that 22% of its enterprise customers have employees running OpenClaw without IT approval, and Bitsight counted more than 30,000 publicly exposed instances in two weeks, up from roughly 1,000. Snyk’s ToxicSkills audit adds another dimension: 36% of all ClawHub skills contain security flaws. Jamieson O’Reilly, founder of Dvuln and now security adviser to the OpenClaw project, has been one of the researchers pushing fixes hardest from inside. His credential leakage research on exposed instances was among the earliest warnings the community received. Since then, he has worked directly with founder Peter Steinberger to ship dual-layer malicious skill detection and is now driving a capabilities specification proposal through the agentskills standards body. The team is clear-eyed about the security gaps, he told VentureBeat. “It wasn’t designed from the ground up to be as secure as possible,” O’Reilly said. “That’s understandable given the origins, and we’re owning it without excuses.”None of it closes the three gaps that matter most.Three attack surfaces your stack cannot seeThe first is runtime semantic exfiltration. The attack encodes malicious behavior in meaning, not in binary patterns, which is exactly what the current defense stack cannot see.Palo Alto Networks mapped OpenClaw to every category in the OWASP Top 10 for Agentic Applications and identified what security researcher Simon Willison calls a “lethal trifecta”: private data access, untrusted content exposure, and external communication capabilities in a single process. EDR monitors process behavior. The agent’s behavior looks normal because it is normal. The credentials are real, and the API calls are sanctioned, so EDR reads it as a credentialed user doing expected work. Nothing in the current defense ecosystem tracks what the agent decided to do with that access, or why.The second is cross-agent context leakage. When multiple agents or skills share session context, a prompt injection in one channel poisons decisions across the entire chain. Giskard researchers demonstrated this in January 2026, showing that agents silently appended attacker-controlled instructions to their own workspace files and waited for commands from external servers. The injected prompt becomes a sleeper payload. Palo Alto Networks researchers Sailesh Mishra and Sean P. Morgan warned that persistent memory turns these attacks into stateful, delayed-execution chains. A malicious instruction hidden inside a forwarded message sits in the agent’s context weeks later, activating during an unrelated task.O’Reilly identified cross-agent context leakage as the hardest of these gaps to close. “This one is especially difficult because it is so tightly bound to prompt injection, a systemic vulnerability that is far bigger than OpenClaw and affects every LLM-powered agent system in the industry,” he told VentureBeat. “When context flows unchecked between agents and skills, a single injected prompt can poison or hijack behavior across the entire chain.” No tool in the current ecosystem provides cross-agent context isolation. IronClaw sandboxes individual skill execution. ClawSec monitors file integrity. Neither tracks how context propagates between agents in the same workflow.The third is agent-to-agent trust chains with zero mutual authentication. When OpenClaw agents delegate tasks to other agents or external MCP servers, no identity verification exists between them. A compromised agent in a multi-agent workflow inherits the trust of every agent it communicates with. Compromise one through prompt injection, and it can issue instructions to every agent in the chain using trust relationships that the legitimate agent already built. Microsoft’s security team published guidance in February calling OpenClaw untrusted code execution with persistent credentials, noting the runtime ingests untrusted text, downloads and executes skills from external sources, and performs actions using whatever credentials it holds. Kaspersky’s enterprise risk assessment added that even agents on personal devices threaten organizational security because those devices store VPN configs, browser tokens, and credentials for corporate services. The Moltbook social network for OpenClaw agents already demonstrated the spillover risk: Wiz researchers found a misconfigured database that exposed 1.5 million API authentication tokens and 35,000 email addresses.What 14 days of emergency patching actually closedThe defense ecosystem split into three approaches. Two tools harden OpenClaw in place. ClawSec, from Prompt Security (a SentinelOne company), wraps agents in continuous verification, monitoring critical files for drift and enforcing zero-trust egress by default. OpenClaw’s VirusTotal integration, shipped jointly by Steinberger, O’Reilly, and VirusTotal’s Bernardo Quintero, scans every published ClawHub skill and blocks known malicious packages.Two tools are full architectural rewrites. IronClaw, NEAR AI’s Rust reimplementation, runs all untrusted tools inside WebAssembly sandboxes where tool code starts with zero permissions and must explicitly request network, filesystem, or API access. Credentials get injected at the host boundary and never touch agent code, with built-in leak detection scanning requests and responses. Carapace, an independent open-source project, inverts every dangerous OpenClaw default with fail-closed authentication and OS-level subprocess sandboxing.Two tools focus on scanning and auditability: Cisco’s open-source scanner combines static, behavioral, and LLM semantic analysis, while NanoClaw reduces the entire codebase to roughly 500 lines of TypeScript, running each session in an isolated Docker container.O’Reilly put the supply chain failure in direct terms. “Right now, the industry basically created a brand-new executable format written in plain human language and forgot every control that should come with it,” he said. His response has been hands-on. He shipped the VirusTotal integration before skills.sh, a much larger repository, adopted a similar pattern. Koi Security’s audit validates the urgency: 341 malicious skills found in early February grew to 824 out of 10,700 on ClawHub by mid-month, with the ClawHavoc campaign planting the Atomic Stealer macOS infostealer inside skills disguised as cryptocurrency trading tools, harvesting crypto wallets, SSH credentials, and browser passwords.OpenClaw Security Defense Evaluation MatrixDimensionClawSecVirusTotal IntegrationIronClawCarapaceNanoClawCisco ScannerDiscoveryAgents onlyClawHub onlyNomDNS scanNoNoRuntime ProtectionConfig driftNoWASM sandboxOS sandbox + prompt guardContainer isolationNoSupply ChainChecksum verifySignature scanCapability grantsEd25519 signedManual audit (~500 LOC)Static + LLM + behavioralCredential IsolationNoNoWASM boundary injectionOS keychain + AES-256-GCMMount-restricted dirsNoAuditabilityDrift logsScan verdictsPermission grant logsPrometheus + audit log500 lines totalScan reportsSemantic MonitoringNoNoNoNoNoNoSource: VentureBeat analysis based on published documentation and security audits, March 2026.The capabilities spec that treats skills like executablesO’Reilly submitted a skills specification standards update to the agentskills maintainers, led primarily by Anthropic and Vercel, that is in active discussion. The proposal requires every skill to declare explicit, user-visible capabilities before execution. Think mobile app permission manifests. He noted the proposal is getting strong early feedback from the security community because it finally treats skills like the executables they are.“The other two gaps can be meaningfully hardened with better isolation primitives and runtime guardrails, but truly closing context leakage requires deep architectural changes to how untrusted multi-agent memory and prompting are handled,” O’Reilly said. “The new capabilities spec is the first real step toward solving these challenges proactively instead of bolting on band-aids later.”What to do on Monday morningAssume OpenClaw is already in your environment. The 22% shadow deployment rate is a floor. These six steps close what can be closed and document what cannot.Inventory what is running. Scan for WebSocket traffic on port 18789 and mDNS broadcasts on port 5353. Watch corporate authentication logs for new App ID registrations, OAuth consent events, and Node.js User-Agent strings. Any instance running a version before v2026.2.25 is vulnerable to the ClawJacked remote takeover flaw.Mandate isolated execution. No agent runs on a device connected to production infrastructure. Require container-based deployment with scoped credentials and explicit tool whitelists.Deploy ClawSec on every agent instance and run every ClawHub skill through VirusTotal and Cisco’s open-source scanner before installation. Both are free. Treat skills as third-party executables, because that is what they are.Require human-in-the-loop approval for sensitive agent actions. OpenClaw’s exec approval settings support three modes: security, ask, and allowlist. Set sensitive tools to ask so the agent pauses and requests confirmation before executing shell commands, writing to external APIs, or modifying files outside its workspace. Any action that touches credentials, changes configurations, or sends data to an external endpoint should stop and wait for a human to approve it.Map the three surviving gaps against your risk register. Document whether your organization accepts, mitigates, or blocks each one: runtime semantic exfiltration, cross-agent context leakage, and agent-to-agent trust chains.Bring the evaluation table to your next board meeting. Frame it not as an AI experiment but as a critical bypass of your existing DLP and IAM investments. Every agentic AI platform that follows will face this same defense cycle. The framework transfers to every agent tool your team will assess for the next two years.The security stack you built for applications and endpoints catches malicious code. It does not catch an agent following a malicious instruction through a legitimate API call. That is where these three gaps live.
Rethinking AEO when software agents navigate the web on behalf of users
For more than two decades, digital businesses have relied on a simple assumption: When someone interacts with a website, that activity reflects a human making a conscious choice. Clicks are treated as signals of interest. Time on page is assumed to indicate engagement. Movement through a funnel is interpreted as intent. Entire growth strategies, marketing budgets, and product decisions have been built on this premise.Today, that assumption is quietly beginning to erode.As AI-powered tools increasingly interact with the web on behalf of users, many of the signals organizations depend on are becoming harder to interpret. The data itself is still accurate — pages are viewed, buttons are clicked, actions are recorded — but the meaning behind those actions is changing. This shift isn’t theoretical or limited to edge cases. It’s already influencing how leaders read dashboards, forecast demand, and evaluate performance.The challenge ahead isn’t stopping AI-driven interactions. It’s learning how to interpret digital behavior in a world where human and automated activity increasingly overlap.A changing assumption about web trafficFor decades, the foundation of the internet rested on a quiet, human-centric model. Behind every scroll, form submission, or purchase flow was a person acting out of curiosity, need, or intent. Analytics platforms evolved to capture these behaviors. Security systems focused on separating “legitimate users” from clearly scripted automation. Even digital advertising economics assumed that engagement equaled human attention.Over the last few years, that model has begun to shift. Advances in large language models (LLMs), browser automation, and AI-driven agents have made it possible for software systems to navigate the web in ways that feel fluid and context-aware. Pages are explored, options are compared, workflows are completed — often without obvious signs of automation.This doesn’t mean the web is becoming less human. Instead, it’s becoming more hybrid. AI systems are increasingly embedded in everyday workflows, acting as research assistants, comparison tools, or task completers on behalf of people. As a result, the line between a human interacting directly with a site and software acting for them is becoming less distinct.The challenge isn’t automation itself. It’s the ambiguity this overlap introduces into the signals businesses rely on.What do we mean by AI-generated traffic?When people hear “automated traffic,” they often think of the bots of the past — rigid scripts that followed predefined paths and broke the moment an interface changed. Those systems were repetitive, predictable, and relatively easy to identify.AI-generated traffic is different.Modern AI agents combine machine learning (ML) with automated browsing capabilities. They can interpret page layouts, adapt to interface changes, and complete multi-step tasks. In many cases, language models guide decision-making, allowing these systems to adjust behavior based on context rather than fixed rules. The result is interaction that appears far more natural than earlier automation.Importantly, this kind of traffic is not inherently problematic. Automation has long played a productive role on the web, from search indexing and accessibility tools to testing frameworks and integrations. Newer AI agents simply extend this evolution — helping users summarize content, compare products, or gather information across multiple sites.The issue is not intent, but interpretation. When AI agents interact with a site successfully on behalf of users, traditional engagement metrics may no longer reflect the same meaning they once did.Why AI-generated traffic is becoming harder to distinguishHistorically, detecting automated activity relied on spotting technical irregularities. Systems flagged behavior that moved too fast, followed perfectly consistent paths, or lacked standard browser features. Automation exposed “tells” that made classification straightforward.AI-driven systems change this dynamic. They operate through standard browsers. They pause, scroll, and navigate non-linearly. They vary timing and interaction sequences. Because these agents are designed to interact with the web as it was built — for humans — their behavior increasingly blends into normal usage patterns.As a result, the challenge shifts from identifying errors to interpreting behavior. The question becomes less about whether an interaction is automated and more about how it unfolds over time. Many of the signals that once separated humans from software are converging, making binary classification less effective.When engagement stops meaning what we thinkConsider a common e-commerce scenario.A retail team notices a sustained increase in product views and “add to cart” actions. Historically, this would be a clear signal of growing demand, prompting increased ad spend or inventory expansion.Now imagine that a portion of this activity is generated by AI agents performing price monitoring or product comparison on behalf of users. The interactions occurred. The metrics are accurate. But the underlying intent is different. The funnel no longer represents a straightforward path toward purchase.Nothing is “wrong” with the data — but the meaning has shifted.Similar patterns are appearing across industries:Digital publishers see spikes in article engagement without corresponding ad revenue.SaaS companies observe heavy feature exploration with limited conversion.Travel platforms record increased search activity that doesn’t translate into bookings.In each case, organizations risk optimizing for activity rather than value.Why this is a data and analytics problemAt its core, AI-generated traffic introduces ambiguity into the assumptions underlying analytics and modeling. Many systems assume that observed behavior maps cleanly to human intent. When automated interactions are mixed into datasets, that assumption weakens.Behavioral data may now include:Exploration without purchase intentResearch-driven navigationTask completion without conversionRepeated patterns driven by automation goalsFor analytics teams, this introduces noise into labels, weakens proxy metrics, and increases the risk of feedback loops. Models trained on mixed signals may learn to optimize for volume rather than outcomes that matter to the business.This doesn’t invalidate analytics. It raises the bar for interpretation.Data integrity in a machine-to-machine worldAs behavioral data increasingly feeds ML systems that shape user experience, the composition of that data matters. If a growing share of interactions comes from automated agents, platforms may begin to optimize for machine navigation rather than human experience.Over time, this can subtly reshape the web. Interfaces may become efficient for extraction and summarization while losing the irregularities that make them intuitive or engaging for people. Preserving a meaningful human signal requires moving beyond raw volume and focusing on interaction context.From exclusion to interpretationFor years, the default response to automation was exclusion. CAPTCHAs, rate limits, and static thresholds worked well when automated behavior was clearly distinct.That approach is becoming less effective. AI-driven agents often provide real value to users, and blanket blocking can degrade user experience without improving outcomes. As a result, many organizations are shifting from exclusion toward interpretation.Rather than asking how to keep automation out, teams are asking how to understand different types of traffic and respond appropriately — serving purpose-aligned experiences without assuming a single definition of legitimacy.Behavioral context as a complementary signalOne promising approach is focusing on behavioral context. Instead of centering analysis on identity, systems examine how interactions unfold over time.Human behavior is inconsistent and inefficient. People hesitate, backtrack, and explore unpredictably. Automated agents, even when adaptive, tend to exhibit a more structured internal logic. By observing navigation flow, timing variability, and interaction sequencing, teams can infer intent probabilistically rather than categorically.This allows organizations to remain open while gaining a more nuanced understanding of activity.Ethics, privacy, and responsible interpretationAs analysis becomes more sophisticated, ethical boundaries become more important. Understanding interaction patterns is not the same as tracking individuals.The most resilient approaches rely on aggregated, anonymized signals and transparent practices. The goal is to protect platform integrity while respecting user expectations. Trust remains a foundational requirement, not an afterthought.The future: A spectrum of agencyLooking ahead, web interactions increasingly fall along a spectrum. On one end humans are browsing directly, in the middle users are assisted by AI tools, on the other end agents are acting independently on a user’s behalf.This evolution reflects a maturing digital ecosystem. It also demands a shift in how success is measured. Simple counts of clicks or visits are no longer sufficient. Value must be assessed in context.What business leaders should focus on nowAI-generated traffic is not a problem to eliminate — it’s a reality to understand.Leaders who adapt successfully will:Reevaluate how engagement metrics are interpretedSeparate activity from intent in analytics reviewsInvest in contextual and probabilistic measurement approachesPreserve data quality as AI participation growsTreat trust and privacy as design principlesThe web has evolved before, and it will evolve again. The question is whether organizations are prepared to evolve how they read the signals it produces.Shashwat Jain is a senior software engineer at Amazon.
Fixing AI failure: Three changes enterprises should make now
Recent reports about AI project failure rates have raised uncomfortable questions for organizations investing heavily in AI. Much of the discussion has focused on technical factors like model accuracy and data quality, but after watching dozens of AI initiatives launch, I’ve noticed that the biggest opportunities for improvement are often cultural, not technical.Internal projects that struggle tend to share common issues. For example, engineering teams build models that product managers don’t know how to use. Data scientists build prototypes that operations teams struggle to maintain. And AI applications sit unused because the people they were built for weren’t involved in deciding what “useful” really meant.In contrast, organizations that achieve meaningful value with AI have figured out how to create the right kind of collaboration across departments, and established shared accountability for outcomes. The technology matters, but the organizational readiness matters just as much.Here are three practices I’ve observed that address the cultural and organizational barriers that can impede AI success.Expand AI literacy beyond engineeringWhen only engineers understand how an AI system works and what it’s capable of, collaboration breaks down. Product managers can’t evaluate trade-offs they don’t understand. Designers can’t create interfaces for capabilities they can’t articulate. Analysts can’t validate outputs they can’t interpret.The solution isn’t making everyone a data scientist. It’s helping each role understand how AI applies to their specific work. Product managers need to grasp what kinds of generated content, predictions or recommendations are realistic given available data. Designers need to understand what the AI can actually do so they can design features users will find useful. Analysts need to know which AI outputs require human validation versus which can be trusted.When teams share this working vocabulary, AI stops being something that happens in the engineering department and becomes a tool the entire organization can use effectively.Establish clear rules for AI autonomyThe second challenge involves knowing where AI can act on its own versus where human approval is required. Many organizations default to extremes, either bottlenecking every AI decision through human review, or letting AI systems operate without guardrails.What’s needed is a clear framework that defines where and how AI can act autonomously. This means establishing rules upfront: Can AI approve routine configuration changes? Can it recommend schema updates but not implement them? Can it deploy code to staging environments but not production?These rules should include three elements: auditability (can you trace how the AI reached its decision?), reproducibility (can you recreate the decision path?), and observability (can teams monitor AI behavior as it happens?). Without this framework, you either slow down to the point where AI provides no advantage, or you create systems making decisions nobody can explain or control.Create cross-functional playbooksThe third step is codifying how different teams actually work with AI systems. When every department develops its own approach, you get inconsistent results and redundant effort.Cross-functional playbooks work best when teams develop them together rather than having them imposed from above. These playbooks answer concrete questions like: How do we test AI recommendations before putting them into production? What’s our fallback procedure when an automated deployment fails – does it hand off to human operators or try a different approach first? Who needs to be involved when we override an AI decision? How do we incorporate feedback to improve the system?The goal isn’t to add bureaucracy. It’s ensuring everyone understands how AI fits into their existing work, and what to do when results don’t match expectations.Moving forwardTechnical excellence in AI remains important, but enterprises that over-index on model performance while ignoring organizational factors are setting themselves up for avoidable challenges. The successful AI deployments I’ve seen treat cultural transformation and workflows just as seriously as technical implementation.The question isn’t whether your AI technology is sophisticated enough. It’s whether your organization is ready to work with it.Adi Polak is director for advocacy and developer experience engineering at Confluent.
NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents
NanoClaw, the open-source AI agent platform created by Gavriel Cohen, is partnering with the containerized development platform Docker to let teams run agents inside Docker Sandboxes, a move aimed at one of the biggest obstacles to enterprise adoption: how to give agents room to act without giving them room to damage the systems around them.The announcement matters because the market for AI agents is shifting from novelty to deployment. It is no longer enough for an agent to write code, answer questions or automate a task. For CIOs, CTOs and platform leaders, the harder question is whether that agent can safely connect to live data, modify files, install packages and operate across business systems without exposing the host machine, adjacent workloads or other agents.That is the problem NanoClaw and Docker say they are solving together.A security argument, not just a packaging updateNanoClaw launched as a security-first alternative in the rapidly growing “claw” ecosystem, where agent frameworks promise broad autonomy across local and cloud environments. The project’s core argument has been that many agent systems rely too heavily on software-level guardrails while running too close to the host machine.This Docker integration pushes that argument down into infrastructure.“The partnership with Docker is integrating NanoClaw with Docker Sandboxes,” Cohen said in an interview. “The initial version of NanoClaw used Docker containers for isolating each agent, but Docker Sandboxes is the proper enterprise-ready solution for rolling out agents securely.”That progression matters because the central issue in enterprise agent deployment is isolation. Agents do not behave like traditional applications. They mutate their environments, install dependencies, create files, launch processes and connect to outside systems. That breaks many of the assumptions underlying ordinary container workflows.Cohen framed the issue in direct terms: “You want to unlock the full potential of these highly capable agents, but you don’t want security to be based on trust. You have to have isolated environments and hard boundaries.”That line gets at the broader challenge facing enterprises now experimenting with agents in production-like settings. The more useful agents become, the more access they need. They need tools, memory, external connections and the freedom to take actions on behalf of users and teams. But each gain in capability raises the stakes around containment. A compromised or badly behaving agent cannot be allowed to spill into the host environment, expose credentials or access another agent’s state.Why agents strain conventional infrastructureDocker president and COO Mark Cavage said that reality forced the company to rethink some of the assumptions built into standard developer infrastructure.“Fundamentally, we had to change the isolation and security model to work in the world of agents,” Cavage said. “It feels like normal Docker, but it’s not.”He explained why the old model no longer holds. “Agents break effectively every model we’ve ever known,” Cavage said. “Containers assume immutability, but agents break that on the very first call. The first thing they want to do is install packages, modify files, spin up processes, spin up databases — they want full mutability and a full machine to run in.”That is a useful framing for enterprise technical decision-makers. The promise of agents is not that they behave like static software with a chatbot front end. The promise is that they can perform open-ended work. But open-ended work is exactly what creates new security and governance problems. An agent that can install a package, rewrite a file tree, start a database process or access credentials is more operationally useful than a static assistant. It is also more dangerous if it is running in the wrong environment.Docker’s answer is Docker Sandboxes, which use MicroVM-based isolation while preserving familiar Docker packaging and workflows. According to the companies, NanoClaw can now run inside that infrastructure with a single command, giving teams a more secure execution layer without forcing them to redesign their agent stack from scratch.Cavage put the value proposition plainly: “What that gets you is a much stronger security boundary. When something breaks out — because agents do bad things — it’s truly bounded in something provably secure.”That emphasis on containment rather than trust lines up closely with NanoClaw’s original thesis. In earlier coverage of the project, NanoClaw was positioned as a leaner, more auditable alternative to broader and more permissive frameworks. The argument was not just that it was open source, but that its simplicity made it easier to reason about, secure and customize for production use.Cavage extended that argument beyond any single product. “Security is defense in depth,” he said. “You need every layer of the stack: a secure foundation, a secure framework to run in, and secure things users build on top.”That is likely to resonate with enterprise infrastructure teams that are less interested in model novelty than in blast radius, auditability and layered control. Agents may still rely on the intelligence of frontier models, but what matters operationally is whether the surrounding system can absorb mistakes, misfires or adversarial behavior without turning one compromised process into a wider incident.The enterprise case for many agents, not oneThe NanoClaw-Docker partnership also reflects a broader shift in how vendors are beginning to think about agent deployment at scale. Instead of one central AI system doing everything, the model emerging here is many bounded agents operating across teams, channels and tasks.“What OpenClaw and the claws have shown is how to get tremendous value from coding agents and general-purpose agents that are available today,” Cohen said. “Every team is going to be managing a team of agents.”He pushed that idea further in the interview, sketching a future closer to organizational systems design than to the consumer assistant model that still dominates much of the AI conversation. “In businesses, every employee is going to have their personal assistant agent, but teams will manage a team of agents, and a high-performing team will manage hundreds or thousands of agents,” Cohen said.That is a more useful enterprise lens than the usual consumer framing. In a real organization, agents are likely to be attached to distinct workflows, data stores and communication surfaces. Finance, support, sales engineering, developer productivity and internal operations may all have different automations, different memory and different access rights. A secure multi-agent future depends less on generalized intelligence than on boundaries: who can see what, which process can touch which file system, and what happens when one agent fails or is compromised.NanoClaw’s product design is built around that kind of orchestration. The platform sits on top of Claude Code and adds persistent memory, scheduled tasks, messaging integrations and routing logic so agents can be assigned work across channels such as WhatsApp, Telegram, Slack and Discord. The release says this can all be configured from a phone, without writing custom agent code, while each agent remains isolated inside its own container runtime.Cohen said one practical goal of the Docker integration is to make that deployment model easier to adopt. “People will be able to go to the NanoClaw GitHub, clone the repository, and run a single command,” he said. “That will get their Docker Sandbox set up running NanoClaw.”That ease of setup matters because many enterprise AI deployments still fail at the point where promising demos have to become stable systems. Security features that are too hard to deploy or maintain often end up bypassed. A packaging model that lowers friction without weakening boundaries is more likely to survive internal adoption.An open-source partnership with strategic weightThe partnership is also notable for what it is not. It is not being positioned as an exclusive commercial alliance or a financially engineered enterprise bundle.“There’s no money involved,” Cavage said. “We found this through the foundation developer community. NanoClaw is open source, and Docker has a long history in open source.”That may strengthen the announcement rather than weaken it. In infrastructure, the most credible integrations often emerge because two systems fit technically before they fit commercially. Cohen said the relationship began when a Docker developer advocate got NanoClaw running in Docker Sandboxes and demonstrated that the combination worked.“We were able to put NanoClaw into Docker Sandboxes without making any architecture changes to NanoClaw,” Cohen said. “It just works, because we had a vision of how agents should be deployed and isolated, and Docker was thinking about the same security concerns and arrived at the same design.”For enterprise buyers, that origin story signals that the integration was not forced into existence by a go-to-market arrangement. It suggests genuine architectural compatibility.Docker is also careful not to cast NanoClaw as the only framework it will support. Cavage said the company plans to work broadly across the ecosystem, even as NanoClaw appears to be the first “claw” included in Docker’s official packaging. The implication is that Docker sees a wider market opportunity around secure agent runtime infrastructure, while NanoClaw gains a more recognizable enterprise foundation for its security posture.The bigger story: infrastructure catching up to agentsThe deeper significance of this announcement is that it shifts attention from model capability to runtime design. That may be where the real enterprise competition is heading.The AI industry has spent the last two years proving that models can reason, code and orchestrate tasks with growing sophistication. The next phase is proving that these systems can be deployed in ways security teams, infrastructure leaders and compliance owners can live with.NanoClaw has argued from the start that agent security cannot be bolted on at the application layer. Docker is now making a parallel argument from the runtime side. “The world is going to need a different set of infrastructure to catch up to what agents and AI demand,” Cavage said. “They’re clearly going to get more and more autonomous.”That could turn out to be the central story here. Enterprises do not just need more capable agents. They need better boxes to put them in.For organizations experimenting with AI agents today, the NanoClaw-Docker integration offers a concrete picture of what that box might look like: open-source orchestration on top, MicroVM-backed isolation underneath, and a deployment model designed around containment rather than trust.In that sense, this is more than a product integration. It is an early blueprint for how enterprise agent infrastructure may evolve: less emphasis on unconstrained autonomy, more emphasis on bounded autonomy that can survive contact with real production systems.
Y Combinator-backed Random Labs launches Slate V1, claiming the first ‘swarm-native’ coding agent
The software engineering world is currently wrestling with a fundamental paradox of the AI era: as models become more capable, the “systems problem” of managing them has become the primary bottleneck to real-world productivity. While a developer might have access to the raw intelligence of a frontier model, that intelligence often degrades the moment a task requires a long horizon or a deep context window. But help appears to be on the way: San Francisco-based, Y Combinator-backed startup Random Labs has officially launched Slate V1, described as the industry’s first “swarm native” autonomous coding agent designed to execute massively parallel, complex engineering tasks.Emerging from an open beta, the tool utilizes a “dynamic pruning algorithm” to maintain context in large codebases while scaling output to enterprise complexity. Co-founded by Kiran and Mihir Chintawar in 2024, the company aims to bridge the global engineering shortage by positioning Slate as a collaborative tool for the “next 20 million engineers” rather than a replacement for human developers.With the release of Slate V1, the team at Random Labs is attempting to architect a way out of this zone by introducing the first “swarm-native” agentic coding environment. Slate is not merely a wrapper or a chatbot with file access; it is an implementation of a “hive mind” philosophy designed to scale agentic work with the complexity of a human organization. By leveraging a novel architectural primitive called Thread Weaving, Slate moves beyond the rigid task trees and lossy compaction methods that have defined the first generation of AI coding assistants.Strategy: Action spaceAt the heart of Slate’s effectiveness is a deep engagement with Recursive Language Models (RLM). In a traditional setup, an agent might be asked to “fix a bug,” a prompt that forces the model to juggle high-level strategy and low-level execution simultaneously. Random Labs identifies this as a failure to tap into “Knowledge Overhang”—the latent intelligence a model possesses but cannot effectively access when it is tactically overwhelmed.Slate solves this by using a central orchestration thread that essentially “programs in action space”. This orchestrator doesn’t write the code directly; instead, it uses a TypeScript-based DSL to dispatch parallel worker threads to handle specific, bounded tasks. This creates a clear separation between the “kernel”—which manages the execution graph and maintains strategic alignment—and the worker “processes” that execute tactical operations in the terminal. By mapping onto an OS-style framework, inspired by Andrej Karpathy’s “LLM OS” concept, Slate is able to treat the limited context window of a model as precious RAM, actively, intelligently managing what is retained and what is discarded.Episodic memory and the swarmThe true innovation of the “Thread Weaving” approach lies in how it handles memory. Most agents today rely on “compaction,” which is often just a fancy term for lossy compression that risks dropping critical project state. Slate instead generates “episodes”. When a worker thread completes a task, it doesn’t return a sprawling transcript of every failed attempt; it returns a compressed summary of the successful tool calls and conclusions.Because these episodes share context directly with the orchestrator rather than relying on brittle message passing, the system maintains a “swarm” intelligence. This architecture allows for massive parallelism. A developer can have Claude Sonnet orchestrating a complex refactor while GPT-5.4 executes code, and GLM 5—a favorite for its agentic search capabilities—simultaneously researches library documentation in the background. It’s a similar approach taken by Perplexity with its new Computer multi-model agent By selecting the “right model for the job,” Slate ensures that users aren’t overspending on intelligence for simple tactical steps while still benefiting from the strategic depth of the world’s most powerful models.The business of autonomyFrom a commercial perspective, Random Labs is navigating the early beta period with a mix of transparency and strategic ambiguity. While the company has not yet published a fixed-price subscription sheet, the Slate CLI documentation confirms a shift toward a usage-based credit model. Commands like /usage and /billing allow users to monitor their credit burn in real-time, and the inclusion of organization-level billing toggles suggests a clear focus on professional engineering teams rather than solo hobbyists.There is also a significant play toward integration. Random Labs recently announced that direct support for OpenAI’s Codex and Anthropic’s Claude Code is slated for release next week. This suggests that Slate isn’t trying to compete with these models’ native interfaces, but rather to act as the superior orchestration layer that allows engineers to use all of them at once, safely and cost-effectively.I’ve reached out to Architecturally, the system is designed to maximize caching through subthread reuse, a “novel context engineering” trick that the team claims keeps the swarm approach from becoming a financial burden for users.Stability AIPerhaps the most compelling argument for the Slate architecture is its stability. In internal testing, an early version of this threading system managed to pass 2/3 of the tests on the make-mips-interpreter task within the Terminal Bench 2.0 suite.This is a task where even the newest frontier models, like Opus 4.6, often succeed less than 20% of the time when used in standard, non-orchestrated harnesses.This success in a “mutated” or changing environment is what separates a tool from a partner. According to Random Labs’ documentation, one fintech founder in NYC described Slate as their “best debugging tool,” a sentiment that echoes the broader goal of Random Labs: to build agents that don’t just complete a prompt, but scale like an organization. As the industry moves past simple “chat with your code” interfaces, the “Thread Weaving” of Slate V1 offers a glimpse into a future where the primary role of the human engineer is to direct a hive mind of specialized models, each working in concert to solve the long-horizon problems of modern software.