Chinese AI startup Z.ai, known for its powerful, open-source GLM family of large language models (LLMs), has introduced GLM-5-Turbo, a new proprietary variant of its open-source GLM-5 model aimed at agent-driven workflows. The company positions it as a faster model tuned for OpenClaw-style tasks such as tool use, long-chain execution and persistent automation.

It’s available now through Z.ai’s application programming interface (API) on third-party provider OpenRouter with roughly a 202.8K-token context window, 131.1K max output, and listed pricing of $0.96 per million input tokens and $3.20 per million output tokens. That makes it about $0.04 cheaper than its predecessor on combined input and output cost (at 1 million tokens each), according to our calculations.

| Model | Input | Output | Total cost | Source |
|---|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 | xAI |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 | Google |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM-5-Turbo | $0.96 | $3.20 | $4.16 | OpenRouter |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI |

Z.ai is also adding the model to its GLM Coding subscription product, its packaged coding assistant service. That service has three tiers: Lite at $27 per quarter, Pro at $81 per quarter, and Max at $216 per quarter. Z.ai’s March 15 rollout note says Pro subscribers get GLM-5-Turbo in March, while Lite subscribers get the base GLM-5 in March and must wait until April for GLM-5-Turbo.
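For teams budgeting against these rates, per-call cost is simple arithmetic: tokens multiplied by the per-million-token rate. A minimal sketch using the listed figures from the table above (the model keys and helper are illustrative, not an API):

```python
# API pricing is quoted in USD per million tokens, so a call costs
# (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000.
# Rates below are the article's listed figures (input, output).
RATES = {
    "glm-5-turbo": (0.96, 3.20),  # via OpenRouter
    "glm-5": (1.00, 3.20),        # via Z.ai
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single call at the listed rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1M input + 1M output tokens on each model:
print(round(call_cost("glm-5-turbo", 1_000_000, 1_000_000), 2))  # 4.16
print(round(call_cost("glm-5", 1_000_000, 1_000_000), 2))        # 4.2
```

At 1 million tokens each way, the two rates differ by the $0.04 noted above; at smaller, input-heavy workloads the gap shrinks accordingly.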
The company is also taking early-access applications for enterprises via a Google Form, which suggests some users may get access ahead of that schedule depending on capacity.

Z.ai describes GLM-5-Turbo as designed for “fast inference” and “deeply optimized for real-world agent workflows involving long execution chains,” with improvements in complex instruction decomposition, tool use, scheduled and persistent execution, and stability across extended tasks.

The release offers developers a new option for building OpenClaw-style autonomous AI agents, and serves as a signal about where model vendors think enterprise demand is heading: away from chat interfaces and toward systems that can reliably execute multi-step work. That is now where much of the competition is moving as well, especially among vendors trying to win developers and enterprise teams building internal assistants, workflow orchestrators and coding agents.

Built for execution, not just conversation

Z.ai’s materials frame GLM-5-Turbo as a model for production-like agent behavior rather than static prompt-response use. The pitch centers on reliability in practical task flows: better command following, stronger tool invocation, improved handling of scheduled and persistent tasks, and faster execution across longer logical chains.

That positioning puts the model squarely in the market for agents that do more than answer questions. It is aimed at systems that can gather information, call tools, break down instructions and keep working through complex task sequences with less supervision.

Rather than a straightforward successor to GLM-5, GLM-5-Turbo appears to be a more execution-focused variant: tuned for speed, tool use and long-chain agent stability, while the base GLM-5 remains Z.ai’s broader open-source flagship.

Background: Z.ai and GLM-5 set the stage for Turbo

Founded in 2019 as a Tsinghua University spinoff in Beijing, Z.ai — formerly Zhipu AI — is now one of China’s best-known foundation model companies. The company remains headquartered in Beijing and is led by CEO Zhang Peng. Z.ai listed on the Hong Kong Stock Exchange on January 8, 2026, with shares priced at HK$116.20 and opening at HK$120, for a stated market capitalization of HK$52.83 billion, making it China’s largest independent large language model developer. As of September 30, 2025, its models had reportedly been used by more than 12,000 enterprise customers, more than 80 million end-user devices and more than 45 million developers worldwide.

Z.ai’s last major release, GLM-5, which debuted in February 2026, gives useful context for what the company is now trying to do with GLM-5-Turbo. GLM-5 is an open-source flagship model carrying an MIT license; it posted a record-low hallucination score on the AA-Omniscience Index and debuted a native “Agent Mode” that could turn prompts or source materials into ready-to-use .docx, .pdf and .xlsx files.

That earlier release was also framed as a major technical step up for the company. GLM-5 scaled to 744 billion parameters with 40 billion active per token in a mixture-of-experts architecture, used 28.5 trillion pretraining tokens, and relied on a new asynchronous reinforcement-learning infrastructure called “slime” to reduce training bottlenecks and support more complex agentic behavior.
In that light, GLM-5-Turbo looks less like a replacement for GLM-5 than a narrower commercial offshoot: a variant that keeps the long-context, agentic orientation of the flagship line but emphasizes speed, stability and execution in real-world agent chains.

Developer features and model packaging

On the technical side, Z.ai has been packaging the GLM-5 family with the kinds of capabilities developers now expect from serious agent-facing models, including long context handling, tools, reasoning support and structured integrations. OpenRouter’s GLM-5-Turbo page lists support for tools, tool choice and response formatting, while also surfacing live performance data including average throughput and latency.

OpenRouter’s provider telemetry adds a useful deployment-level comparison between GLM-5 and GLM-5-Turbo, though the data is not perfectly apples-to-apples because GLM-5 appears across several providers while GLM-5-Turbo is served only through Z.ai.

On throughput, GLM-5-Turbo averages 48 tokens per second on OpenRouter, which puts it below the fastest GLM-5 endpoints in OpenRouter’s listings, including Fireworks at 70 tok/s and Friendli at 58 tok/s, but above Together’s 40 tok/s. On raw first-token latency, GLM-5-Turbo is slower in the available data, posting 2.92 seconds versus 0.41 seconds for Friendli’s GLM-5 endpoint, 1.00 second for Parasail and 1.08 seconds for DeepInfra. But the picture improves on end-to-end completion time: GLM-5-Turbo is shown at 8.16 seconds, faster than the GLM-5 endpoints, which range from 9.34 seconds on Fireworks to 11.23 seconds on DeepInfra.

The most notable operational advantage is in tool reliability. GLM-5-Turbo shows a 0.67% tool call error rate, materially lower than the GLM-5 providers shown, where error rates range from 2.33% to 6.41%.
For enterprise teams, that suggests a model that may not win on initial responsiveness in its current OpenRouter routing, but could still be better suited to longer agent runs where completion stability and lower tool failure matter more than the fastest first token.

Benchmarking and pricing

A ZClawBench radar chart released by Z.ai shows GLM-5-Turbo as especially competitive in OpenClaw scenarios such as information search and gathering, office and daily tasks, data analysis, development and operations, and automation. Those are company-supplied benchmark visuals, not independent validation, but they do help explain how Z.ai wants the two models understood: GLM-5 as the broader coding and open flagship, and Turbo as the more targeted agent-execution variant.

A more nuanced licensing signal

One notable caveat is licensing. Z.ai says GLM-5-Turbo is currently closed-source, but it also says the model’s capabilities and findings will be folded into its next open-source model release. That is an important distinction. The company is not clearly promising to open-source GLM-5-Turbo itself. Instead, it is saying that lessons, techniques and improvements from this release will inform a future open model. That makes the launch more nuanced than a clean break from openness.

Z.ai’s earlier GLM strategy leaned heavily on open releases and open-weight distribution, which helped it build visibility among developers.

China’s AI market may be rebalancing away from open source

GLM-5-Turbo’s licensing posture also lands in a wider Chinese market context that makes the launch more notable than a simple product update.
In recent weeks, reporting around Alibaba’s Qwen unit has raised fresh questions about how China’s leading AI labs will balance open releases with commercial pressure. Earlier this month, Qwen division head Lin Junyang stepped down, becoming the third senior Qwen executive to leave in 2026, even though Alibaba’s Qwen family remains one of the most prolific open-model efforts anywhere, with more than 400 open-source models released since 2023 and more than 1 billion downloads. Reuters then reported on March 16 that Alibaba CEO Eddie Wu would take direct control of a newly formed AI-focused business group consolidating Qwen and other units, amid scrutiny over strategy, profitability and the brutal price competition surrounding open-model offerings in China.

Even without overstating those developments, they help frame the broader question hanging over the sector: whether the economics of frontier AI are starting to push even historically open-leaning Chinese labs toward a more segmented strategy. That does not mean Chinese labs are abandoning open source. But the pattern is becoming harder to ignore: open models help drive adoption, developer goodwill and ecosystem reach, while certain high-value variants aimed at enterprise agents, coding workflows and other commercially attractive use cases may increasingly arrive first as proprietary products.

Seen in that light, GLM-5-Turbo looks like more than a speed-focused product update. It may be another sign that parts of China’s AI sector are moving toward the same hybrid playbook already used by OpenAI, Anthropic and Google in the U.S.: openness as distribution, proprietary systems as business.
That would not mark the end of open-source AI from Chinese labs, but it could mean their most strategically important agent-focused offerings appear first behind closed access, even if some of their underlying advances later make their way into open releases.

For developers evaluating agent platforms, that makes GLM-5-Turbo both a product launch and a useful signal. Z.ai is still speaking the language of open models. But with this release, it is also showing that some of its most commercially relevant work may arrive first as proprietary infrastructure for enterprise-grade agent systems.
VentureBeat
OpenClaw can bypass your EDR, DLP and IAM without triggering a single alert
An attacker embeds a single instruction inside a forwarded email. An OpenClaw agent summarizes that email as part of a normal task. The hidden instruction tells the agent to forward credentials to an external endpoint. The agent complies — through a sanctioned API call, using its own OAuth tokens. The firewall logs HTTP 200. EDR records a normal process. No signature fires. Nothing went wrong by any definition your security stack understands.
That is the problem. Six independent security teams shipped six OpenClaw defense tools in 14 days. Three attack surfaces survived every one of them.

The exposure picture is already worse than most security teams know. Token Security found that 22% of its enterprise customers have employees running OpenClaw without IT approval, and Bitsight counted more than 30,000 publicly exposed instances in two weeks, up from roughly 1,000. Snyk’s ToxicSkills audit adds another dimension: 36% of all ClawHub skills contain security flaws.

Jamieson O’Reilly, founder of Dvuln and now security adviser to the OpenClaw project, has been one of the researchers pushing fixes hardest from inside. His credential leakage research on exposed instances was among the earliest warnings the community received. Since then, he has worked directly with founder Peter Steinberger to ship dual-layer malicious skill detection and is now driving a capabilities specification proposal through the agentskills standards body. The team is clear-eyed about the security gaps, he told VentureBeat. “It wasn’t designed from the ground up to be as secure as possible,” O’Reilly said. “That’s understandable given the origins, and we’re owning it without excuses.”

None of it closes the three gaps that matter most.

Three attack surfaces your stack cannot see

The first is runtime semantic exfiltration. The attack encodes malicious behavior in meaning, not in binary patterns, which is exactly what the current defense stack cannot see. Palo Alto Networks mapped OpenClaw to every category in the OWASP Top 10 for Agentic Applications and identified what security researcher Simon Willison calls a “lethal trifecta”: private data access, untrusted content exposure, and external communication capabilities in a single process. EDR monitors process behavior. The agent’s behavior looks normal because it is normal. The credentials are real, and the API calls are sanctioned, so EDR reads it as a credentialed user doing expected work.
Nothing in the current defense ecosystem tracks what the agent decided to do with that access, or why.

The second is cross-agent context leakage. When multiple agents or skills share session context, a prompt injection in one channel poisons decisions across the entire chain. Giskard researchers demonstrated this in January 2026, showing that agents silently appended attacker-controlled instructions to their own workspace files and waited for commands from external servers. The injected prompt becomes a sleeper payload. Palo Alto Networks researchers Sailesh Mishra and Sean P. Morgan warned that persistent memory turns these attacks into stateful, delayed-execution chains. A malicious instruction hidden inside a forwarded message sits in the agent’s context weeks later, activating during an unrelated task.

O’Reilly identified cross-agent context leakage as the hardest of these gaps to close. “This one is especially difficult because it is so tightly bound to prompt injection, a systemic vulnerability that is far bigger than OpenClaw and affects every LLM-powered agent system in the industry,” he told VentureBeat. “When context flows unchecked between agents and skills, a single injected prompt can poison or hijack behavior across the entire chain.”

No tool in the current ecosystem provides cross-agent context isolation. IronClaw sandboxes individual skill execution. ClawSec monitors file integrity. Neither tracks how context propagates between agents in the same workflow.

The third is agent-to-agent trust chains with zero mutual authentication. When OpenClaw agents delegate tasks to other agents or external MCP servers, no identity verification exists between them. A compromised agent in a multi-agent workflow inherits the trust of every agent it communicates with. Compromise one through prompt injection, and it can issue instructions to every agent in the chain using trust relationships that the legitimate agent already built.
Microsoft’s security team published guidance in February calling OpenClaw untrusted code execution with persistent credentials, noting the runtime ingests untrusted text, downloads and executes skills from external sources, and performs actions using whatever credentials it holds. Kaspersky’s enterprise risk assessment added that even agents on personal devices threaten organizational security because those devices store VPN configs, browser tokens, and credentials for corporate services. The Moltbook social network for OpenClaw agents already demonstrated the spillover risk: Wiz researchers found a misconfigured database that exposed 1.5 million API authentication tokens and 35,000 email addresses.

What 14 days of emergency patching actually closed

The defense ecosystem split into three approaches. Two tools harden OpenClaw in place. ClawSec, from Prompt Security (a SentinelOne company), wraps agents in continuous verification, monitoring critical files for drift and enforcing zero-trust egress by default. OpenClaw’s VirusTotal integration, shipped jointly by Steinberger, O’Reilly, and VirusTotal’s Bernardo Quintero, scans every published ClawHub skill and blocks known malicious packages.

Two tools are full architectural rewrites. IronClaw, NEAR AI’s Rust reimplementation, runs all untrusted tools inside WebAssembly sandboxes where tool code starts with zero permissions and must explicitly request network, filesystem, or API access. Credentials get injected at the host boundary and never touch agent code, with built-in leak detection scanning requests and responses.
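The zero-permission model IronClaw describes can be illustrated generically. This is a hedged sketch of capability-gated tool execution under a default-deny policy, not IronClaw’s actual API; every name here is hypothetical:

```python
# Hypothetical capability-gated tool runner: tool code starts with zero
# permissions, and each privileged operation checks an explicit grant set.
class CapabilityError(PermissionError):
    """Raised when a tool uses a capability it was never granted."""

class ToolContext:
    def __init__(self, granted: set[str]):
        self.granted = granted  # e.g., {"network", "fs:read"}

    def require(self, capability: str) -> None:
        if capability not in self.granted:
            raise CapabilityError(f"capability {capability!r} not granted")

def fetch_url(ctx: ToolContext, url: str) -> str:
    ctx.require("network")   # denied unless explicitly granted
    return f"GET {url}"      # placeholder for a real request

# Default-deny: an ungranted tool call fails before touching the network.
try:
    fetch_url(ToolContext(granted=set()), "https://example.com")
except CapabilityError as err:
    print(err)

# The same call succeeds once "network" is explicitly granted.
print(fetch_url(ToolContext(granted={"network"}), "https://example.com"))
```

The design point is that the check happens at the host boundary, before any side effect, which is what lets a sandbox log or veto every grant.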
Carapace, an independent open-source project, inverts every dangerous OpenClaw default with fail-closed authentication and OS-level subprocess sandboxing. Two tools focus on scanning and auditability: Cisco’s open-source scanner combines static, behavioral, and LLM semantic analysis, while NanoClaw reduces the entire codebase to roughly 500 lines of TypeScript, running each session in an isolated Docker container.

O’Reilly put the supply chain failure in direct terms. “Right now, the industry basically created a brand-new executable format written in plain human language and forgot every control that should come with it,” he said. His response has been hands-on. He shipped the VirusTotal integration before skills.sh, a much larger repository, adopted a similar pattern. Koi Security’s audit validates the urgency: 341 malicious skills found in early February grew to 824 out of 10,700 on ClawHub by mid-month, with the ClawHavoc campaign planting the Atomic Stealer macOS infostealer inside skills disguised as cryptocurrency trading tools, harvesting crypto wallets, SSH credentials, and browser passwords.

OpenClaw Security Defense Evaluation Matrix

| Dimension | ClawSec | VirusTotal Integration | IronClaw | Carapace | NanoClaw | Cisco Scanner |
|---|---|---|---|---|---|---|
| Discovery | Agents only | ClawHub only | No | mDNS scan | No | No |
| Runtime Protection | Config drift | No | WASM sandbox | OS sandbox + prompt guard | Container isolation | No |
| Supply Chain | Checksum verify | Signature scan | Capability grants | Ed25519 signed | Manual audit (~500 LOC) | Static + LLM + behavioral |
| Credential Isolation | No | No | WASM boundary injection | OS keychain + AES-256-GCM | Mount-restricted dirs | No |
| Auditability | Drift logs | Scan verdicts | Permission grant logs | Prometheus + audit log | 500 lines total | Scan reports |
| Semantic Monitoring | No | No | No | No | No | No |

Source: VentureBeat analysis based on published documentation and security audits, March 2026.

The capabilities spec that treats skills like executables

O’Reilly submitted a skills specification standards update to the agentskills maintainers, led primarily by Anthropic and
Vercel, that is in active discussion. The proposal requires every skill to declare explicit, user-visible capabilities before execution. Think mobile app permission manifests. He noted the proposal is getting strong early feedback from the security community because it finally treats skills like the executables they are.

“The other two gaps can be meaningfully hardened with better isolation primitives and runtime guardrails, but truly closing context leakage requires deep architectural changes to how untrusted multi-agent memory and prompting are handled,” O’Reilly said. “The new capabilities spec is the first real step toward solving these challenges proactively instead of bolting on band-aids later.”

What to do on Monday morning

Assume OpenClaw is already in your environment. The 22% shadow deployment rate is a floor. These six steps close what can be closed and document what cannot.

1. Inventory what is running. Scan for WebSocket traffic on port 18789 and mDNS broadcasts on port 5353. Watch corporate authentication logs for new App ID registrations, OAuth consent events, and Node.js User-Agent strings. Any instance running a version before v2026.2.25 is vulnerable to the ClawJacked remote takeover flaw.

2. Mandate isolated execution. No agent runs on a device connected to production infrastructure. Require container-based deployment with scoped credentials and explicit tool whitelists.

3. Deploy ClawSec on every agent instance and run every ClawHub skill through VirusTotal and Cisco’s open-source scanner before installation. Both are free. Treat skills as third-party executables, because that is what they are.

4. Require human-in-the-loop approval for sensitive agent actions. OpenClaw’s exec approval settings support three modes: security, ask, and allowlist. Set sensitive tools to ask so the agent pauses and requests confirmation before executing shell commands, writing to external APIs, or modifying files outside its workspace. Any action that touches credentials, changes configurations, or sends data to an external endpoint should stop and wait for a human to approve it.

5. Map the three surviving gaps against your risk register. Document whether your organization accepts, mitigates, or blocks each one: runtime semantic exfiltration, cross-agent context leakage, and agent-to-agent trust chains.

6. Bring the evaluation table to your next board meeting. Frame it not as an AI experiment but as a critical bypass of your existing DLP and IAM investments. Every agentic AI platform that follows will face this same defense cycle. The framework transfers to every agent tool your team will assess for the next two years.

The security stack you built for applications and endpoints catches malicious code. It does not catch an agent following a malicious instruction through a legitimate API call. That is where these three gaps live.
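The inventory step above can be partially automated with nothing but the standard library. A hedged sketch, assuming only that exposed instances answer TCP on port 18789; the subnet in the commented example is a placeholder, and mDNS and auth-log review still need your existing network tooling:

```python
import socket

GATEWAY_PORT = 18789  # OpenClaw WebSocket port noted in the inventory step

def port_open(host: str, port: int = GATEWAY_PORT, timeout: float = 0.5) -> bool:
    """Return True if `host` accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def sweep(hosts, port: int = GATEWAY_PORT):
    """Return the subset of `hosts` answering on the gateway port."""
    return [h for h in hosts if port_open(h, port)]

# Example: sweep a placeholder subnet; substitute your own ranges.
# suspects = sweep(f"10.0.0.{i}" for i in range(1, 255))
```

A hit only means something is listening on that port; confirm it is actually an OpenClaw instance before acting, and feed confirmed hosts into the isolation and ClawSec steps that follow.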
Rethinking AEO when software agents navigate the web on behalf of users
For more than two decades, digital businesses have relied on a simple assumption: When someone interacts with a website, that activity reflects a human making a conscious choice. Clicks are treated as signals of interest. Time on page is assumed to indicate engagement. Movement through a funnel is interpreted as intent. Entire growth strategies, marketing budgets, and product decisions have been built on this premise.

Today, that assumption is quietly beginning to erode.

As AI-powered tools increasingly interact with the web on behalf of users, many of the signals organizations depend on are becoming harder to interpret. The data itself is still accurate — pages are viewed, buttons are clicked, actions are recorded — but the meaning behind those actions is changing. This shift isn’t theoretical or limited to edge cases. It’s already influencing how leaders read dashboards, forecast demand, and evaluate performance.

The challenge ahead isn’t stopping AI-driven interactions. It’s learning how to interpret digital behavior in a world where human and automated activity increasingly overlap.

A changing assumption about web traffic

For decades, the foundation of the internet rested on a quiet, human-centric model. Behind every scroll, form submission, or purchase flow was a person acting out of curiosity, need, or intent. Analytics platforms evolved to capture these behaviors. Security systems focused on separating “legitimate users” from clearly scripted automation. Even digital advertising economics assumed that engagement equaled human attention.

Over the last few years, that model has begun to shift. Advances in large language models (LLMs), browser automation, and AI-driven agents have made it possible for software systems to navigate the web in ways that feel fluid and context-aware. Pages are explored, options are compared, workflows are completed — often without obvious signs of automation.

This doesn’t mean the web is becoming less human.
Instead, it’s becoming more hybrid. AI systems are increasingly embedded in everyday workflows, acting as research assistants, comparison tools, or task completers on behalf of people. As a result, the line between a human interacting directly with a site and software acting for them is becoming less distinct.

The challenge isn’t automation itself. It’s the ambiguity this overlap introduces into the signals businesses rely on.

What do we mean by AI-generated traffic?

When people hear “automated traffic,” they often think of the bots of the past — rigid scripts that followed predefined paths and broke the moment an interface changed. Those systems were repetitive, predictable, and relatively easy to identify.

AI-generated traffic is different. Modern AI agents combine machine learning (ML) with automated browsing capabilities. They can interpret page layouts, adapt to interface changes, and complete multi-step tasks. In many cases, language models guide decision-making, allowing these systems to adjust behavior based on context rather than fixed rules. The result is interaction that appears far more natural than earlier automation.

Importantly, this kind of traffic is not inherently problematic. Automation has long played a productive role on the web, from search indexing and accessibility tools to testing frameworks and integrations. Newer AI agents simply extend this evolution — helping users summarize content, compare products, or gather information across multiple sites.

The issue is not intent, but interpretation. When AI agents interact with a site successfully on behalf of users, traditional engagement metrics may no longer reflect the same meaning they once did.

Why AI-generated traffic is becoming harder to distinguish

Historically, detecting automated activity relied on spotting technical irregularities. Systems flagged behavior that moved too fast, followed perfectly consistent paths, or lacked standard browser features.
Automation exposed “tells” that made classification straightforward.

AI-driven systems change this dynamic. They operate through standard browsers. They pause, scroll, and navigate non-linearly. They vary timing and interaction sequences. Because these agents are designed to interact with the web as it was built — for humans — their behavior increasingly blends into normal usage patterns.

As a result, the challenge shifts from identifying errors to interpreting behavior. The question becomes less about whether an interaction is automated and more about how it unfolds over time. Many of the signals that once separated humans from software are converging, making binary classification less effective.

When engagement stops meaning what we think

Consider a common e-commerce scenario. A retail team notices a sustained increase in product views and “add to cart” actions. Historically, this would be a clear signal of growing demand, prompting increased ad spend or inventory expansion.

Now imagine that a portion of this activity is generated by AI agents performing price monitoring or product comparison on behalf of users. The interactions occurred. The metrics are accurate. But the underlying intent is different. The funnel no longer represents a straightforward path toward purchase. Nothing is “wrong” with the data — but the meaning has shifted.

Similar patterns are appearing across industries:

- Digital publishers see spikes in article engagement without corresponding ad revenue.
- SaaS companies observe heavy feature exploration with limited conversion.
- Travel platforms record increased search activity that doesn’t translate into bookings.

In each case, organizations risk optimizing for activity rather than value.

Why this is a data and analytics problem

At its core, AI-generated traffic introduces ambiguity into the assumptions underlying analytics and modeling. Many systems assume that observed behavior maps cleanly to human intent.
When automated interactions are mixed into datasets, that assumption weakens. Behavioral data may now include:

- Exploration without purchase intent
- Research-driven navigation
- Task completion without conversion
- Repeated patterns driven by automation goals

For analytics teams, this introduces noise into labels, weakens proxy metrics, and increases the risk of feedback loops. Models trained on mixed signals may learn to optimize for volume rather than outcomes that matter to the business. This doesn’t invalidate analytics. It raises the bar for interpretation.

Data integrity in a machine-to-machine world

As behavioral data increasingly feeds ML systems that shape user experience, the composition of that data matters. If a growing share of interactions comes from automated agents, platforms may begin to optimize for machine navigation rather than human experience. Over time, this can subtly reshape the web. Interfaces may become efficient for extraction and summarization while losing the irregularities that make them intuitive or engaging for people. Preserving a meaningful human signal requires moving beyond raw volume and focusing on interaction context.

From exclusion to interpretation

For years, the default response to automation was exclusion. CAPTCHAs, rate limits, and static thresholds worked well when automated behavior was clearly distinct. That approach is becoming less effective. AI-driven agents often provide real value to users, and blanket blocking can degrade user experience without improving outcomes. As a result, many organizations are shifting from exclusion toward interpretation. Rather than asking how to keep automation out, teams are asking how to understand different types of traffic and respond appropriately — serving purpose-aligned experiences without assuming a single definition of legitimacy.

Behavioral context as a complementary signal

One promising approach is focusing on behavioral context.
Instead of centering analysis on identity, systems examine how interactions unfold over time. Human behavior is inconsistent and inefficient. People hesitate, backtrack, and explore unpredictably. Automated agents, even when adaptive, tend to exhibit a more structured internal logic. By observing navigation flow, timing variability, and interaction sequencing, teams can infer intent probabilistically rather than categorically. This allows organizations to remain open while gaining a more nuanced understanding of activity.

Ethics, privacy, and responsible interpretation

As analysis becomes more sophisticated, ethical boundaries become more important. Understanding interaction patterns is not the same as tracking individuals. The most resilient approaches rely on aggregated, anonymized signals and transparent practices. The goal is to protect platform integrity while respecting user expectations. Trust remains a foundational requirement, not an afterthought.

The future: A spectrum of agency

Looking ahead, web interactions increasingly fall along a spectrum: on one end, humans browsing directly; in the middle, users assisted by AI tools; on the other end, agents acting independently on a user’s behalf. This evolution reflects a maturing digital ecosystem. It also demands a shift in how success is measured. Simple counts of clicks or visits are no longer sufficient. Value must be assessed in context.

What business leaders should focus on now

AI-generated traffic is not a problem to eliminate — it’s a reality to understand. Leaders who adapt successfully will:

- Reevaluate how engagement metrics are interpreted
- Separate activity from intent in analytics reviews
- Invest in contextual and probabilistic measurement approaches
- Preserve data quality as AI participation grows
- Treat trust and privacy as design principles

The web has evolved before, and it will evolve again.
The question is whether organizations are prepared to evolve how they read the signals it produces.

Shashwat Jain is a senior software engineer at Amazon.
Fixing AI failure: Three changes enterprises should make now
Recent reports about AI project failure rates have raised uncomfortable questions for organizations investing heavily in AI. Much of the discussion has focused on technical factors like model accuracy and data quality, but after watching dozens of AI initiatives launch, I’ve noticed that the biggest opportunities for improvement are often cultural, not technical.

Internal projects that struggle tend to share common issues. For example, engineering teams build models that product managers don’t know how to use. Data scientists build prototypes that operations teams struggle to maintain. And AI applications sit unused because the people they were built for weren’t involved in deciding what “useful” really meant.

In contrast, organizations that achieve meaningful value with AI have figured out how to create the right kind of collaboration across departments and established shared accountability for outcomes. The technology matters, but organizational readiness matters just as much.

Here are three practices I’ve observed that address the cultural and organizational barriers that can impede AI success.

Expand AI literacy beyond engineering

When only engineers understand how an AI system works and what it’s capable of, collaboration breaks down. Product managers can’t evaluate trade-offs they don’t understand. Designers can’t create interfaces for capabilities they can’t articulate. Analysts can’t validate outputs they can’t interpret.

The solution isn’t making everyone a data scientist. It’s helping each role understand how AI applies to their specific work. Product managers need to grasp what kinds of generated content, predictions or recommendations are realistic given available data. Designers need to understand what the AI can actually do so they can design features users will find useful.
Analysts need to know which AI outputs require human validation versus which can be trusted.

When teams share this working vocabulary, AI stops being something that happens in the engineering department and becomes a tool the entire organization can use effectively.

Establish clear rules for AI autonomy

The second challenge involves knowing where AI can act on its own versus where human approval is required. Many organizations default to extremes: either bottlenecking every AI decision through human review, or letting AI systems operate without guardrails.

What’s needed is a clear framework that defines where and how AI can act autonomously. This means establishing rules upfront: Can AI approve routine configuration changes? Can it recommend schema updates but not implement them? Can it deploy code to staging environments but not production?

These rules should include three elements: auditability (can you trace how the AI reached its decision?), reproducibility (can you recreate the decision path?) and observability (can teams monitor AI behavior as it happens?). Without this framework, you either slow down to the point where AI provides no advantage, or you create systems making decisions nobody can explain or control.

Create cross-functional playbooks

The third step is codifying how different teams actually work with AI systems. When every department develops its own approach, you get inconsistent results and redundant effort. Cross-functional playbooks work best when teams develop them together rather than having them imposed from above.

These playbooks answer concrete questions: How do we test AI recommendations before putting them into production? What’s our fallback procedure when an automated deployment fails – does it hand off to human operators or try a different approach first? Who needs to be involved when we override an AI decision? How do we incorporate feedback to improve the system?

The goal isn’t to add bureaucracy.
It’s ensuring everyone understands how AI fits into their existing work, and what to do when results don’t match expectations.

Moving forward

Technical excellence in AI remains important, but enterprises that over-index on model performance while ignoring organizational factors are setting themselves up for avoidable challenges. The successful AI deployments I’ve seen treat cultural transformation and workflows just as seriously as technical implementation.

The question isn’t whether your AI technology is sophisticated enough. It’s whether your organization is ready to work with it.

Adi Polak is director for advocacy and developer experience engineering at Confluent.
NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents
NanoClaw, the open-source AI agent platform created by Gavriel Cohen, is partnering with the containerized development platform Docker to let teams run agents inside Docker Sandboxes, a move aimed at one of the biggest obstacles to enterprise adoption: how to give agents room to act without giving them room to damage the systems around them.

The announcement matters because the market for AI agents is shifting from novelty to deployment. It is no longer enough for an agent to write code, answer questions or automate a task. For CIOs, CTOs and platform leaders, the harder question is whether that agent can safely connect to live data, modify files, install packages and operate across business systems without exposing the host machine, adjacent workloads or other agents.

That is the problem NanoClaw and Docker say they are solving together.

A security argument, not just a packaging update

NanoClaw launched as a security-first alternative in the rapidly growing “claw” ecosystem, where agent frameworks promise broad autonomy across local and cloud environments. The project’s core argument has been that many agent systems rely too heavily on software-level guardrails while running too close to the host machine. This Docker integration pushes that argument down into infrastructure.

“The partnership with Docker is integrating NanoClaw with Docker Sandboxes,” Cohen said in an interview. “The initial version of NanoClaw used Docker containers for isolating each agent, but Docker Sandboxes is the proper enterprise-ready solution for rolling out agents securely.”

That progression matters because the central issue in enterprise agent deployment is isolation. Agents do not behave like traditional applications. They mutate their environments, install dependencies, create files, launch processes and connect to outside systems.
That breaks many of the assumptions underlying ordinary container workflows.

Cohen framed the issue in direct terms: “You want to unlock the full potential of these highly capable agents, but you don’t want security to be based on trust. You have to have isolated environments and hard boundaries.”

That line gets at the broader challenge facing enterprises now experimenting with agents in production-like settings. The more useful agents become, the more access they need. They need tools, memory, external connections and the freedom to take actions on behalf of users and teams. But each gain in capability raises the stakes around containment. A compromised or badly behaving agent cannot be allowed to spill into the host environment, expose credentials or access another agent’s state.

Why agents strain conventional infrastructure

Docker president and COO Mark Cavage said that reality forced the company to rethink some of the assumptions built into standard developer infrastructure. “Fundamentally, we had to change the isolation and security model to work in the world of agents,” Cavage said. “It feels like normal Docker, but it’s not.”

He explained why the old model no longer holds. “Agents break effectively every model we’ve ever known,” Cavage said. “Containers assume immutability, but agents break that on the very first call. The first thing they want to do is install packages, modify files, spin up processes, spin up databases — they want full mutability and a full machine to run in.”

That is a useful framing for enterprise technical decision-makers. The promise of agents is not that they behave like static software with a chatbot front end. The promise is that they can perform open-ended work. But open-ended work is exactly what creates new security and governance problems. An agent that can install a package, rewrite a file tree, start a database process or access credentials is more operationally useful than a static assistant.
It is also more dangerous if it is running in the wrong environment.

Docker’s answer is Docker Sandboxes, which use MicroVM-based isolation while preserving familiar Docker packaging and workflows. According to the companies, NanoClaw can now run inside that infrastructure with a single command, giving teams a more secure execution layer without forcing them to redesign their agent stack from scratch.

Cavage put the value proposition plainly: “What that gets you is a much stronger security boundary. When something breaks out — because agents do bad things — it’s truly bounded in something provably secure.”

That emphasis on containment rather than trust lines up closely with NanoClaw’s original thesis. In earlier coverage of the project, NanoClaw was positioned as a leaner, more auditable alternative to broader and more permissive frameworks. The argument was not just that it was open source, but that its simplicity made it easier to reason about, secure and customize for production use.

Cavage extended that argument beyond any single product. “Security is defense in depth,” he said. “You need every layer of the stack: a secure foundation, a secure framework to run in, and secure things users build on top.”

That is likely to resonate with enterprise infrastructure teams that are less interested in model novelty than in blast radius, auditability and layered control. Agents may still rely on the intelligence of frontier models, but what matters operationally is whether the surrounding system can absorb mistakes, misfires or adversarial behavior without turning one compromised process into a wider incident.

The enterprise case for many agents, not one

The NanoClaw-Docker partnership also reflects a broader shift in how vendors are beginning to think about agent deployment at scale.
Instead of one central AI system doing everything, the model emerging here is many bounded agents operating across teams, channels and tasks.

“What OpenClaw and the claws have shown is how to get tremendous value from coding agents and general-purpose agents that are available today,” Cohen said. “Every team is going to be managing a team of agents.”

He pushed that idea further in the interview, sketching a future closer to organizational systems design than to the consumer assistant model that still dominates much of the AI conversation. “In businesses, every employee is going to have their personal assistant agent, but teams will manage a team of agents, and a high-performing team will manage hundreds or thousands of agents,” Cohen said.

That is a more useful enterprise lens than the usual consumer framing. In a real organization, agents are likely to be attached to distinct workflows, data stores and communication surfaces. Finance, support, sales engineering, developer productivity and internal operations may all have different automations, different memory and different access rights. A secure multi-agent future depends less on generalized intelligence than on boundaries: who can see what, which process can touch which file system, and what happens when one agent fails or is compromised.

NanoClaw’s product design is built around that kind of orchestration. The platform sits on top of Claude Code and adds persistent memory, scheduled tasks, messaging integrations and routing logic so agents can be assigned work across channels such as WhatsApp, Telegram, Slack and Discord. The release says this can all be configured from a phone, without writing custom agent code, while each agent remains isolated inside its own container runtime.

Cohen said one practical goal of the Docker integration is to make that deployment model easier to adopt. “People will be able to go to the NanoClaw GitHub, clone the repository, and run a single command,” he said.
“That will get their Docker Sandbox set up running NanoClaw.”

That ease of setup matters because many enterprise AI deployments still fail at the point where promising demos have to become stable systems. Security features that are too hard to deploy or maintain often end up bypassed. A packaging model that lowers friction without weakening boundaries is more likely to survive internal adoption.

An open-source partnership with strategic weight

The partnership is also notable for what it is not. It is not being positioned as an exclusive commercial alliance or a financially engineered enterprise bundle. “There’s no money involved,” Cavage said. “We found this through the foundation developer community. NanoClaw is open source, and Docker has a long history in open source.”

That may strengthen the announcement rather than weaken it. In infrastructure, the most credible integrations often emerge because two systems fit technically before they fit commercially. Cohen said the relationship began when a Docker developer advocate got NanoClaw running in Docker Sandboxes and demonstrated that the combination worked.

“We were able to put NanoClaw into Docker Sandboxes without making any architecture changes to NanoClaw,” Cohen said. “It just works, because we had a vision of how agents should be deployed and isolated, and Docker was thinking about the same security concerns and arrived at the same design.”

For enterprise buyers, that origin story signals that the integration was not forced into existence by a go-to-market arrangement. It suggests genuine architectural compatibility.

Docker is also careful not to cast NanoClaw as the only framework it will support. Cavage said the company plans to work broadly across the ecosystem, even as NanoClaw appears to be the first “claw” included in Docker’s official packaging.
The implication is that Docker sees a wider market opportunity around secure agent runtime infrastructure, while NanoClaw gains a more recognizable enterprise foundation for its security posture.

The bigger story: infrastructure catching up to agents

The deeper significance of this announcement is that it shifts attention from model capability to runtime design. That may be where the real enterprise competition is heading. The AI industry has spent the last two years proving that models can reason, code and orchestrate tasks with growing sophistication. The next phase is proving that these systems can be deployed in ways security teams, infrastructure leaders and compliance owners can live with.

NanoClaw has argued from the start that agent security cannot be bolted on at the application layer. Docker is now making a parallel argument from the runtime side. “The world is going to need a different set of infrastructure to catch up to what agents and AI demand,” Cavage said. “They’re clearly going to get more and more autonomous.”

That could turn out to be the central story here. Enterprises do not just need more capable agents. They need better boxes to put them in.

For organizations experimenting with AI agents today, the NanoClaw-Docker integration offers a concrete picture of what that box might look like: open-source orchestration on top, MicroVM-backed isolation underneath, and a deployment model designed around containment rather than trust.

In that sense, this is more than a product integration. It is an early blueprint for how enterprise agent infrastructure may evolve: less emphasis on unconstrained autonomy, more emphasis on bounded autonomy that can survive contact with real production systems.
Y Combinator-backed Random Labs launches Slate V1, claiming the first ‘swarm-native’ coding agent
The software engineering world is currently wrestling with a fundamental paradox of the AI era: as models become more capable, the “systems problem” of managing them has become the primary bottleneck to real-world productivity. While a developer might have access to the raw intelligence of a frontier model, that intelligence often degrades the moment a task requires a long horizon or a deep context window. But help appears to be on the way: San Francisco-based, Y Combinator-backed startup Random Labs has officially launched Slate V1, described as the industry’s first “swarm-native” autonomous coding agent designed to execute massively parallel, complex engineering tasks.

Emerging from an open beta, the tool uses a “dynamic pruning algorithm” to maintain context in large codebases while scaling output to enterprise complexity. Co-founded by Kiran and Mihir Chintawar in 2024, the company aims to bridge the global engineering shortage by positioning Slate as a collaborative tool for the “next 20 million engineers” rather than a replacement for human developers.

With the release of Slate V1, the team at Random Labs is attempting to architect a way out of this bottleneck by introducing the first “swarm-native” agentic coding environment. Slate is not merely a wrapper or a chatbot with file access; it is an implementation of a “hive mind” philosophy designed to scale agentic work with the complexity of a human organization. By leveraging a novel architectural primitive called Thread Weaving, Slate moves beyond the rigid task trees and lossy compaction methods that have defined the first generation of AI coding assistants.

Strategy: Action space

At the heart of Slate’s effectiveness is a deep engagement with recursive language models (RLMs). In a traditional setup, an agent might be asked to “fix a bug,” a prompt that forces the model to juggle high-level strategy and low-level execution simultaneously.
Random Labs identifies this as a failure to tap into “knowledge overhang” — the latent intelligence a model possesses but cannot effectively access when it is tactically overwhelmed.

Slate addresses this with a central orchestration thread that essentially “programs in action space.” This orchestrator doesn’t write the code directly; instead, it uses a TypeScript-based DSL to dispatch parallel worker threads to handle specific, bounded tasks. This creates a clear separation between the “kernel,” which manages the execution graph and maintains strategic alignment, and the worker “processes” that execute tactical operations in the terminal. By mapping onto an OS-style framework, inspired by Andrej Karpathy’s “LLM OS” concept, Slate treats the limited context window of a model as precious RAM, actively and intelligently managing what is retained and what is discarded.

Episodic memory and the swarm

The true innovation of the Thread Weaving approach lies in how it handles memory. Most agents today rely on “compaction,” which is often just a fancy term for lossy compression that risks dropping critical project state. Slate instead generates “episodes.” When a worker thread completes a task, it doesn’t return a sprawling transcript of every failed attempt; it returns a compressed summary of the successful tool calls and conclusions.

Because these episodes share context directly with the orchestrator rather than relying on brittle message passing, the system maintains a “swarm” intelligence. This architecture allows for massive parallelism. A developer can have Claude Sonnet orchestrating a complex refactor while GPT-5.4 executes code, and GLM 5 — a favorite for its agentic search capabilities — simultaneously researches library documentation in the background.
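The kernel/worker split and episode-style memory described above can be sketched in a few dozen lines. This is a minimal illustrative reconstruction, not Random Labs' actual implementation (Slate's real DSL is TypeScript-based, and every name below is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Episode:
    """Compressed record a worker returns instead of a full transcript."""
    task: str
    tool_calls: list[str]   # only the successful calls, not every failed attempt
    conclusion: str

def run_worker(task: str) -> Episode:
    # Stand-in for a worker thread that would drive a model plus tools.
    # It does the messy tactical work internally, then returns a summary.
    calls = [f"read_file({task!r})", f"apply_patch({task!r})"]
    return Episode(task=task, tool_calls=calls, conclusion=f"{task}: done")

def orchestrate(tasks: list[str]) -> list[Episode]:
    """The 'kernel': dispatches bounded tasks in parallel, keeping only
    compact episodes in its own context (treating it like scarce RAM)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_worker, tasks))

episodes = orchestrate(["fix auth bug", "update docs", "add retry logic"])
context = [e.conclusion for e in episodes]  # what the orchestrator retains
```

The point of the sketch is the asymmetry: the orchestrator's retained context grows with the number of one-line conclusions, not with the full transcript of each worker's attempts.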
It’s a similar approach to the one Perplexity takes with its new Computer multi-model agent. By selecting the “right model for the job,” Slate ensures that users aren’t overspending on intelligence for simple tactical steps while still benefiting from the strategic depth of the world’s most powerful models.

The business of autonomy

From a commercial perspective, Random Labs is navigating the early beta period with a mix of transparency and strategic ambiguity. While the company has not yet published a fixed-price subscription sheet, the Slate CLI documentation confirms a shift toward a usage-based credit model. Commands like /usage and /billing allow users to monitor their credit burn in real time, and the inclusion of organization-level billing toggles suggests a clear focus on professional engineering teams rather than solo hobbyists.

There is also a significant play toward integration. Random Labs recently announced that direct support for OpenAI’s Codex and Anthropic’s Claude Code is slated for release next week. This suggests that Slate isn’t trying to compete with these models’ native interfaces, but rather to act as the superior orchestration layer that allows engineers to use all of them at once, safely and cost-effectively. Architecturally, the system is designed to maximize caching through subthread reuse, a “novel context engineering” trick that the team claims keeps the swarm approach from becoming a financial burden for users.

Stability

Perhaps the most compelling argument for the Slate architecture is its stability. In internal testing, an early version of this threading system managed to pass 2/3 of the tests on the make-mips-interpreter task within the Terminal Bench 2.0 suite. This is a task where even the newest frontier models, like Opus 4.6, often succeed less than 20% of the time when used in standard, non-orchestrated harnesses.

This success in a “mutated,” or changing, environment is what separates a tool from a partner.
According to Random Labs’ documentation, one fintech founder in NYC described Slate as their “best debugging tool,” a sentiment that echoes the broader goal of Random Labs: to build agents that don’t just complete a prompt, but scale like an organization. As the industry moves past simple “chat with your code” interfaces, the “Thread Weaving” of Slate V1 offers a glimpse into a future where the primary role of the human engineer is to direct a hive mind of specialized models, each working in concert to solve the long-horizon problems of modern software.