The hit open source autonomous AI agent OpenClaw may have just gotten mogged by Anthropic. Today, Anthropic announced Claude Code Channels, a way to hook up its powerful Claude Code AI agentic harness to a human user’s Discord or Telegram messaging applications, letting them message Claude Code directly whenever they want while on the go and instruct it to write code for them. Official documentation is here.

This isn’t just a new UI; it is a fundamental shift in how developers interact with AI agents, moving from a synchronous “ask-and-wait” model to an asynchronous, autonomous partnership. Previously, Claude Code users were limited to interacting with the agentic harness in the Claude desktop application, the terminal or a supported developer environment, and the Claude mobile app through a somewhat flaky (in my experience) interconnection setting called Remote Control.

Now, Anthropic is offering some of the same core functionality that drove OpenClaw’s rapid adoption among software developers and vibe coders following its release in November 2025 by Austrian developer Peter Steinberger. (Ironically, Steinberger originally called his project “Clawd” in honor of Anthropic’s own Claude model, which powered it initially, until Anthropic sent him a cease-and-desist for potential trademark violations. He has since been hired by Anthropic’s rival OpenAI.)

Central to OpenClaw’s appeal was its ability to give users a persistent, personal AI worker they can message 24/7, whenever they feel like, over common messaging apps such as iMessage, Slack, Telegram, WhatsApp and Discord, and have their AI message them back — not just to chat with, but to perform real work on its own, from writing, sending and organizing email and files, to creating whole applications, applying for jobs on the user’s behalf, to managing complete ongoing social marketing campaigns.
When the AI finishes a task, it can immediately alert the human user over their preferred messaging platform.

But OpenClaw also came with a high degree of security risk (since it could be given access to a user’s hard drive and file system, or other personal information, and run amok) and difficulty for non-technical users, inspiring a wave of offshoots promising greater ease and security, including NanoClaw, KiloClaw and Nvidia’s recently announced NemoClaw.

By giving Claude Code this same basic functionality — the ability for users to message it from the popular third-party apps Discord and Telegram, and have it message them back when it finishes a task — Anthropic has effectively countered OpenClaw’s appeal and offered something it does not: the Anthropic brand name, with its commitment to AI security and safety, and ease of use right out of the box for less technically inclined users.

Technology: The Bridge of the Model Context Protocol

At the heart of this update is the Model Context Protocol (MCP), the open source standard that Anthropic introduced back in 2024. Think of MCP as a universal USB-C port for AI: it provides a standardized way for an AI model to connect to external data and tools. In the new “Channels” architecture, an MCP server acts as a two-way bridge.

When a developer starts a Claude Code session with the --channels flag, they aren’t just opening a chat; they are spinning up a polling service. Using the Bun runtime — known for its speed in executing JavaScript — Claude Code monitors specific plugins (currently Telegram and Discord). When a message arrives, it is injected directly into the active session as an event. Claude can then use its internal tools to execute code, run tests, or fix bugs, and reply to the external platform using a specialized reply tool.

The technical achievement here is persistence.
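Anthropic hasn’t published the bridge’s internals, but the polling flow described above can be sketched in miniature. Everything below is an illustrative stand-in, not Anthropic’s API: a real bridge would poll the Telegram Bot API or Discord gateway over HTTP rather than an in-memory queue, and hand each message to the live model session.

```python
import queue
from dataclasses import dataclass


@dataclass
class Message:
    channel: str  # e.g. "telegram" or "discord"
    sender: str
    text: str


# Stand-in for the plugin's network poller: messages from the outside
# world land here while the session idles.
inbox: "queue.Queue[Message]" = queue.Queue()


def handle_event(msg: Message) -> str:
    """Inject an inbound message into the session and produce a reply.
    A real session would hand this to the model; here we just echo."""
    return f"[{msg.channel}] ack: {msg.text}"


def poll_once(replies: list) -> None:
    """One iteration of the polling loop: drain pending messages,
    inject each as an event, and send the reply back out."""
    while not inbox.empty():
        msg = inbox.get_nowait()
        replies.append(handle_event(msg))  # stand-in for the reply tool


# Simulate a message arriving from Telegram while the session is idle.
inbox.put(Message("telegram", "alice", "run the test suite"))
replies: list = []
poll_once(replies)
print(replies[0])
```

In the real system, this loop is what keeps a session responsive to a “ping” from a phone hours after the developer last touched a keyboard.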
Unlike a standard web chat that times out, a Claude Code session can now run in a background terminal or on a persistent server (like a VPS), waiting for a “ping” to spring into action.

How to set up Claude Code Connectors on Telegram and Discord

Setting up these native connectors requires Claude Code v2.1.80 or later and the Bun runtime installed on your desktop PC or Mac. Follow the instructions here or below.

1. Setting up Telegram

Create your bot: Open BotFather in Telegram and use the /newbot command to generate a unique bot and access token.
Install the plugin: Inside your Claude Code terminal, run: /plugin install telegram@claude-plugins-official
Configure the token: Run /telegram:configure to save your credentials.
Restart with Channels: Exit Claude and restart using the channels flag: claude --channels plugin:telegram@claude-plugins-official
Pair your account: DM your new bot on Telegram to receive a pairing code, then enter it in your terminal: /telegram:access pair

2. Setting up Discord

Create an application: Go to the Discord Developer Portal, create a “New Application,” and reset the bot token to copy it.
Enable intents: In the Bot settings, you must enable Message Content Intent under “Privileged Gateway Intents.”
Install and configure: In Claude Code, run /plugin install discord@claude-plugins-official followed by /discord:configure.
Launch and pair: Restart with claude --channels plugin:discord@claude-plugins-official. DM your bot on Discord and use the /discord:access pair command to finish the link.

Product: From Desktop to “Everywhere”

The immediate practical impact is the democratization of mobile AI coding. Previously, if a developer wanted to check a build status or run a quick fix while away from their desk, they had to rely on complex self-hosted setups like OpenClaw.

With Channels, the setup is native. A developer can create a Telegram bot via BotFather, link it to Claude Code with a /telegram:configure command, and “pair” their account with a security code.
Once configured, the phone becomes a remote control for the development environment.

The product also introduces a “Fakechat” demo — a local-only chat UI that allows developers to test the “push” logic on their own machine before connecting to external servers. This reflects Anthropic’s cautious, “research preview” approach, ensuring developers understand the flow of events before exposing their terminal to the internet.

Licensing: Proprietary Power on Open Standards

The licensing implications of this release highlight a growing trend in the AI industry: proprietary engines running on open tracks. Claude Code remains a proprietary product tied to Anthropic’s commercial subscriptions (Pro, Max, and Enterprise). However, by building on the open source Model Context Protocol, Anthropic is encouraging a developer ecosystem to build the “connectors” that make its model more useful.

While the core Claude “brain” is closed, the plugins for Telegram and Discord are hosted on GitHub under official Anthropic repositories, likely allowing for community contributions or forks. This strategy lets Anthropic maintain the security and quality of the model while benefiting from the rapid innovation of the open source community — a direct challenge to the “free” but often fragmented nature of purely open source agent frameworks. And because the system is built on MCP, the community can now build connectors for Slack or WhatsApp themselves, rather than waiting for Anthropic to ship them.

Community Reactions: ‘The OpenClaw Killer’

The response from users, especially AI observers on X, was swift and definitive. The sentiment was best captured by Ejaaz (@cryptopunk7213), who noted that Anthropic’s speed of shipping — incorporating texting, thousands of MCP skills, and autonomous bug-fixing in just four weeks — was “fucking crazy.”

For many, this update renders local-first agent frameworks obsolete. BentoBoi (@BentoBoiNFT) observed, “Claude just killed OpenClaw with this update.
You no longer need to buy a Mac Mini. I say this as someone who owns one lol,” referring to the common practice of developers buying dedicated hardware to run open source agents like OpenClaw 24/7. By moving this persistence into the Claude Code environment, Anthropic has removed the “hardware tax” on autonomy.

AI YouTuber Matthew Berman summarized the shift succinctly: “They’ve BUILT OpenClaw.” The consensus among early adopters is that Anthropic has successfully internalized the most desirable features of the open source movement — multi-channel support and long-term memory — while maintaining the reliability of a tier-one AI provider.

While Anthropic’s Claude has long been a favorite for its reasoning, it remained a “brain in a jar” — a stateless entity that waited for a user to type before it could think. Meanwhile, open source projects like OpenClaw thrived by offering “always-on” persistence, allowing developers to message their AI from Telegram or Discord to trigger complex workflows.

Now, with Anthropic closing the gap, it’s up to users to choose which approach is best for them.
VentureBeat
Why enterprises are replacing generic AI with tools that know their users
The future of AI isn’t just agentic; it’s deep personalization. Rather than simple recommender systems that correlate user behavior to identify patterns and apply them to individual workflows, large language models (LLMs) and AI agents can analyze users directly to create deeply personalized experiences. It’s this kind of aggressive customization that users are increasingly demanding — and the savviest enterprises that provide it (and soon) will win.

The goal is: “Don’t try to randomize, or guess who I am. I tell you, this is what I care about,” Lijuan Qin, head of product at Zoom AI, explains in a new Beyond the Pilot podcast.

How Zoom is incorporating personalization

Zoom is one company that has adapted to this trend: Its generative assistant, AI Companion, goes beyond basic summarization, smart recordings and after-meeting action items to opinion-divergence and user-alignment tracking. Users can customize meeting summaries based on their specific interests, and create targeted templates for follow-up emails to different personas (whether a salesperson or an account executive). The AI assistant can then automatically populate these documents post-call. Meanwhile, a custom dictionary in Zoom AI Studio can process unique enterprise terminology and vocabulary for more relevant AI outputs, and a deep research mode can quickly deliver comprehensive analyses based on “internal expertise and external insights.”

Control is key here; the human can be “very specific [and] nail down” agent permissioning, Qin explained. Users have “very clear controls” over follow-up actions, such as: Can the agent automatically send emails to specific recipients? Or will it trigger a verification step when it recognizes that transcripts contain sensitive information (as dictated by the user)?

Knowing that AI can go off the rails at times, human users can track agent behavior in Zoom, enable and disable features, and control data access. This can help prevent outputs that are inaccurate or off-target.
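Zoom hasn’t published its permissioning schema, but the kind of control Qin describes reduces to a simple policy check. A minimal sketch, in which the field names and the three outcomes are illustrative assumptions, not Zoom’s actual API:

```python
from dataclasses import dataclass


@dataclass
class AgentPolicy:
    # Illustrative controls modeled on the article: which recipients the
    # agent may email automatically, and whether sensitive transcripts
    # force a human verification step.
    allowed_recipients: set
    verify_on_sensitive: bool


def next_action(policy: AgentPolicy, recipient: str, transcript_sensitive: bool) -> str:
    """Decide whether a follow-up email goes out automatically,
    waits for human verification, or is blocked outright."""
    if recipient not in policy.allowed_recipients:
        return "block"
    if transcript_sensitive and policy.verify_on_sensitive:
        return "ask_human"
    return "send"


policy = AgentPolicy(allowed_recipients={"sales@example.com"}, verify_on_sensitive=True)
print(next_action(policy, "sales@example.com", transcript_sensitive=False))  # send
print(next_action(policy, "sales@example.com", transcript_sensitive=True))   # ask_human
print(next_action(policy, "rival@example.com", transcript_sensitive=False))  # block
```

The point of the sketch is the ordering: identity checks first, then content sensitivity, with automatic sending only as the final fall-through.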
“The most important thing is we do not assume AI is smart enough to get everything right,” Qin emphasized.

Getting context right

In this new agentic AI age, there is essentially a “land grab for context,” Sam Witteveen, co-founder of Red Dragon AI and Beyond the Pilot host, explains in the podcast. “Definitely knowing your users is the big thing, right? Knowing what apps they are living in, what day-to-day tasks are they constantly doing?” he said. “Companies realize the more they have about you, the better the [AI] memory can get, the better they can customize.”

Claude Cowork is one app that is “really shining” at this, Witteveen says; OpenClaw is another. Models are good enough that they can begin to make decisions for users and respond to directions like: “You know a bunch of things about me. You’ve got all this context. Go and generate the skills that are going to help me do a better job.”

“With something like OpenClaw, you can customize it in any way you want, right? You can chat with it, you can tell it, ‘Hey, at 4 o’clock I want you to do this,’” Witteveen said.

However, token usage and security must always be taken into account, he advised. OpenClaw has been plagued by security issues since its launch. This has prompted many enterprises to uninstall the autonomous agent or outright ban its use; however, these uninstalls must be done correctly so that IT leaders don’t inadvertently delete their entire enterprise stack. Meanwhile, in terms of token budget, personalization can run up costs. “You need to think about the metrics you are tracking,” Witteveen said. “This is very different from product to product, but metrics around these things are gonna be key.”

Watch the podcast to hear more about:

Why the companies that don’t experiment with AI skills right now “may be toast”
How Zoom built an AI companion that tracks opinion divergence — not just action items — in your meetings
Why the build vs. buy question just got a lot more urgent for enterprise software
Why “skills” may matter more than MCP for the future of enterprise AI

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
Meta’s rogue AI agent passed every identity check — four gaps in enterprise IAM explain why
A rogue AI agent at Meta took action without approval and exposed sensitive company and user data to employees who were not authorized to access it. Meta confirmed the incident to The Information on March 18 but said no user data was ultimately mishandled. The exposure still triggered a major security alert internally.

The available evidence suggests the failure occurred after authentication, not during it. The agent held valid credentials, operated inside authorized boundaries and passed every identity check.

Summer Yue, director of alignment at Meta Superintelligence Labs, described a different but related failure in a viral post on X last month. She asked an OpenClaw agent to review her email inbox with clear instructions to confirm before acting. The agent began deleting emails on its own. Yue sent it “Do not do that,” then “Stop don’t do anything,” then “STOP OPENCLAW.” It ignored every command. She had to physically rush to another device to halt the process.

When asked if she had been testing the agent’s guardrails, Yue was blunt. “Rookie mistake tbh,” she replied. “Turns out alignment researchers aren’t immune to misalignment.” (VentureBeat could not independently verify the incident.)

Yue blamed context compaction: The agent’s context window shrank and dropped her safety instructions. The March 18 Meta exposure hasn’t been publicly explained at a forensic level yet.

Both incidents share the same structural problem for security leaders. An AI agent operated with privileged access, took actions its operator did not approve, and the identity infrastructure had no mechanism to intervene after authentication succeeded. The agent held valid credentials the entire time, and nothing in the identity stack could distinguish an authorized request from a rogue one.

Security researchers call this pattern the confused deputy. An agent with valid credentials executes the wrong instruction, and every identity check says the request is fine.
That is one failure class inside a broader problem: post-authentication agent control does not exist in most enterprise stacks.

Four gaps make this possible: no inventory of which agents are running; static credentials with no expiration; zero intent validation after authentication succeeds; and agents delegating to other agents with no mutual verification.

Four vendors shipped controls against these gaps in recent months. The governance matrix below maps all four layers to the five questions a security leader brings to the board before RSAC opens Monday.

Why the Meta incident changes the calculus

The confused deputy is the sharpest version of this problem: a trusted program with high privileges tricked into misusing its own authority. But the broader failure class includes any scenario where an agent with valid access takes actions that its operator did not authorize. Adversarial manipulation, context loss and misaligned autonomy all share the same identity gap: Nothing in the stack validates what happens after authentication succeeds.

Elia Zaitsev, CTO of CrowdStrike, described the underlying pattern in an exclusive interview with VentureBeat. Traditional security controls assume trust once access is granted and lack visibility into what happens inside live sessions, Zaitsev said. The identities, roles and services attackers use are indistinguishable from legitimate activity at the control plane.

The 2026 CISO AI Risk Report from Saviynt (n=235 CISOs) found that 47% of respondents observed AI agents exhibiting unintended or unauthorized behavior. Only 5% felt confident they could contain a compromised AI agent. Read those two numbers together.
AI agents already function as a new class of insider risk, holding persistent credentials and operating at machine scale.

Three findings from a single report — Cloud Security Alliance and Oasis Security’s survey of 383 IT and security professionals — frame the scale of the problem: 79% have moderate or low confidence in preventing NHI-based attacks, 92% lack confidence that their legacy IAM tools can manage AI and NHI risks specifically, and 78% have no documented policies for creating or removing AI identities.

The attack surface is not hypothetical. CVE-2026-27826 and CVE-2026-27825 hit mcp-atlassian in late February with SSRF and arbitrary file write through the trust boundaries the Model Context Protocol (MCP) creates by design. mcp-atlassian has over 4 million downloads, according to Pluto Security’s disclosure. Anyone on the same local network could execute code on the victim’s machine by sending two HTTP requests. No authentication required.

Jake Williams, a faculty member at IANS Research, has been direct about the trajectory. MCP will be the defining AI security issue of 2026, he told the IANS community, warning that developers are building authentication patterns that belong in introductory tutorials, not enterprise applications.

Four vendors shipped AI agent identity controls in recent months. Nobody mapped them into one governance framework. The matrix below does.

The four-layer identity governance matrix

None of these four vendors replaces a security leader’s existing IAM stack. Each closes a specific identity gap that legacy IAM cannot see. Other vendors, including CyberArk, Oasis Security and Astrix, ship relevant NHI controls; this matrix focuses on the four that most directly map to the post-authentication failure class the Meta incident exposed.
[Runtime enforcement] means inline controls active during agent execution.

Governance layer: Agent Discovery
Should be in place: Real-time inventory of every agent, its credentials, and its systems.
Risk if not: Shadow agents with inherited privileges nobody audited. Enterprise shadow AI deployment rates continue to climb as employees adopt agent tools without IT approval.
Who ships it now: CrowdStrike Falcon Shield [runtime]: AI agent inventory across SaaS platforms. Palo Alto Networks AI-SPM [runtime]: continuous AI asset discovery. Erik Trexler, Palo Alto Networks SVP: “The collapse between identity and attack surface will define 2026.”
Vendor question: Which agents are running that we did not provision?

Governance layer: Credential Lifecycle
Should be in place: Ephemeral scoped tokens, automatic rotation, zero standing privileges.
Risk if not: A stolen static key means permanent access at full permissions; long-lived API keys give attackers persistent access indefinitely. Non-human identities already outnumber humans by wide margins — Palo Alto Networks cited 82-to-1 in its 2026 predictions, the Cloud Security Alliance 100-to-1 in its March 2026 cloud assessment.
Who ships it now: CrowdStrike SGNL [runtime]: zero standing privileges, dynamic authorization across human, NHI and agent identities. Acquired January 2026 (expected to close FQ1 2027). Danny Brickman, CEO of Oasis Security: “AI turns identity into a high-velocity system where every new agent mints credentials in minutes.”
Vendor question: Any agent authenticating with a key older than 90 days?

Governance layer: Post-Auth Intent
Should be in place: Behavioral validation that authorized requests match legitimate intent.
Risk if not: The agent passes every check and executes the wrong instruction through the sanctioned API. This is the Meta failure pattern, and legacy IAM has no detection category for it.
Who ships it now: SentinelOne Singularity Identity [runtime], launched Feb. 25: identity threat detection and response across human and non-human activity, correlating identity, endpoint and workload signals to detect misuse inside authorized sessions. Jeff Reed, CTO: “Identity risk no longer begins and ends at authentication.”
Vendor question: What validates intent between authentication and action?

Governance layer: Threat Intelligence
Should be in place: Agent-specific attack pattern recognition and behavioral baselines for agent sessions.
Risk if not: An attack inside an authorized session fires no signature; the SOC sees normal traffic and dwell time extends indefinitely. Your EDR baselines human behavior, and agent behavior is harder to distinguish from legitimate automation.
Who ships it now: Cisco AI Defense [runtime]: agent-specific threat patterns. Lavi Lazarovitz, CyberArk VP of cyber research: “Think of AI agents as a new class of digital coworkers” that “make decisions, learn from their environment, and act autonomously.”
Vendor question: What does a confused deputy look like in our telemetry?

The matrix reveals a progression. Discovery and credential lifecycle are closable now with shipping products. Post-authentication intent validation is partially closable: SentinelOne detects identity threats across human and non-human activity after access is granted, but no vendor fully validates whether the instruction behind an authorized request matches legitimate intent. Cisco provides the threat intelligence layer, but detection signatures for post-authentication agent failures barely exist. SOC teams trained on human behavior baselines face agent traffic that is faster, more uniform and harder to distinguish from legitimate automation.

The gap that remains architecturally open

No major security vendor ships mutual agent-to-agent authentication as a production product. Protocols, including Google’s A2A and a March 2026 IETF draft, describe how to build it. When Agent A delegates to Agent B, no identity verification happens between them. A compromised agent inherits the trust of every agent it communicates with: Compromise one through prompt injection, and it issues instructions to the entire chain using the trust the legitimate agent has already built. The MCP specification forbids token passthrough.
Developers do it anyway. The OWASP February 2026 Practical Guide for Secure MCP Server Development cataloged the confused deputy as a named threat class. Production-grade controls have not caught up. This is the fifth question a security leader brings to the board.

What to do before your next board meeting

Inventory every AI agent and MCP server connection. Any agent authenticating with a static API key older than 90 days is a post-authentication failure waiting to happen.
Kill static API keys. Move every agent to scoped, ephemeral tokens with automatic rotation.
Deploy runtime discovery. You cannot audit the identity of an agent you do not know exists. Shadow deployment rates are climbing.
Test for confused deputy exposure. For every MCP server connection, check whether the server enforces per-user authorization or grants identical access to every caller. If every agent gets the same permissions regardless of who triggered the request, the confused deputy is already exploitable.
Bring the governance matrix to your next board meeting. Four controls deployed, one architectural gap documented, and a procurement timeline attached.

The identity stack you built for human employees catches stolen passwords and blocks unauthorized logins. It does not catch an AI agent following a malicious instruction through a legitimate API call with valid credentials. The Meta incident proved the risk is not theoretical: It happened at a company with one of the largest AI safety teams in the world. Four vendors shipped the first controls designed to find it. The fifth layer does not exist yet. Whether that changes your posture depends on whether you treat this matrix as a working audit instrument or skip past it in the vendor deck.
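Two of those checklist items are mechanically checkable today: the key-age audit and the confused-deputy probe. A minimal sketch, assuming you can export an inventory of key-issue dates and a per-caller permission lookup (both data structures below are illustrative, not any vendor’s schema):

```python
from datetime import date, timedelta

MAX_KEY_AGE = timedelta(days=90)


def stale_keys(inventory: dict, today: date) -> list:
    """Key-age audit: agents still holding a static key past 90 days."""
    return sorted(a for a, issued in inventory.items() if today - issued > MAX_KEY_AGE)


def confused_deputy_exposed(grants_for: dict, callers: list) -> bool:
    """Confused-deputy probe: if every caller resolves to one identical,
    non-empty permission set, the server ignores who triggered the request."""
    seen = {frozenset(grants_for.get(c, ())) for c in callers}
    return len(seen) == 1 and any(seen)


# Hypothetical inventory: when each agent's static key was issued.
inventory = {"deploy-bot": date(2026, 3, 1), "inbox-agent": date(2025, 9, 10)}
print(stale_keys(inventory, date(2026, 3, 18)))  # ['inbox-agent']

# One MCP server granting everyone the same access vs. per-user scoping.
shared = {"alice": ["read", "write", "delete"], "bob": ["read", "write", "delete"]}
scoped = {"alice": ["read"], "bob": ["read", "write"]}
print(confused_deputy_exposed(shared, ["alice", "bob"]))  # True
print(confused_deputy_exposed(scoped, ["alice", "bob"]))  # False
```

The probe is deliberately crude: identical grants across distinct callers is not proof of exploitability, only the precondition the checklist tells you to look for.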
Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost
Chinese electronics and car manufacturer Xiaomi surprised the global AI community today with the release of MiMo-V2-Pro, a new 1-trillion-parameter foundation model with benchmarks approaching those of U.S. AI giants OpenAI and Anthropic, at roughly one-sixth to one-seventh the cost when accessed over its proprietary API — and, importantly, when sending fewer than 256,000 tokens’ worth of information back and forth.

Led by Fuli Luo, a veteran of the disruptive DeepSeek R1 project, the release represents what Luo characterizes as a “quiet ambush” on the global frontier. Luo also stated in an X post that the company does plan to open source a model variant from this latest release “when the models are stable enough to deserve it.”

By focusing on the “action space” of intelligence — moving from code generation to the autonomous operation of digital “claws” — Xiaomi is attempting to leapfrog the conversational paradigm entirely.

Prior to this foray into frontier AI, Beijing-based Xiaomi established itself as a titan of the Internet of Things and consumer hardware. Globally recognized as the world’s third-largest smartphone manufacturer, Xiaomi spent the early 2020s executing a high-stakes entry into the automotive sector. Its electric vehicles (EVs), such as the SU7 and the recently launched YU7 SUV, have turned the company into a vertically integrated powerhouse capable of merging hardware, software, and now, advanced reasoning. This pedigree in physical-world engineering informs MiMo-V2-Pro’s architecture: It is built to be the “brain” of complex systems, whether those systems are managing global supply chains or navigating the intricate scaffolds of an autonomous coding agent.

Technology: The architecture of agency

The central challenge of the “Agent Era” is maintaining high-fidelity reasoning over massive spans of data without incurring a prohibitive “intelligence tax” in latency or cost.
MiMo-V2-Pro addresses this through a sparse architecture: While it houses 1T total parameters, only 42B are active during any single forward pass, making it roughly three times the size of its predecessor, MiMo-V2-Flash.

The model’s efficiency is rooted in an evolved hybrid attention mechanism. Standard transformers typically face a quadratic increase in compute requirements as context grows; MiMo-V2-Pro uses a 7:1 hybrid ratio (increased from 5:1 in the Flash version) to manage its massive 1M-token context window. This architectural choice allows the model to maintain a deep “memory” of long-running tasks without the performance degradation usually seen in frontier models.

The analogy: Think of the model not as a student reading a book page by page, but as an expert researcher in a vast library. The 7:1 ratio lets the model “skim” roughly seven-eighths of the data for context while applying high-density attention to the remaining eighth most relevant to the task at hand. This is paired with a lightweight Multi-Token Prediction (MTP) layer, which allows the model to anticipate and generate multiple tokens simultaneously, drastically reducing the latency of the “thinking” phases of agentic workflows. According to Luo, these structural decisions were made months in advance, specifically to provide a “structural advantage” for the unexpected speed at which the industry shifted toward agents.

Product and benchmarking: A third-party reality check

Xiaomi’s internal data paints a picture of a model that excels at “real-world” tasks over synthetic benchmarks. On GDPval-AA, a benchmark measuring performance on agentic real-world work tasks, MiMo-V2-Pro achieved an Elo of 1426, placing it ahead of major Chinese peers like GLM-5 (1406) and Kimi K2.5 (1283).
While it still trails Western “max effort” models like Claude Sonnet 4.6 (1633) in raw Elo, it represents the highest recorded performance for a Chinese-origin model in this category.

The third-party benchmarking organization Artificial Analysis verified these claims, placing MiMo-V2-Pro at #10 on its global Intelligence Index with a score of 49. This places it in the same tier as GPT-5.2 Codex and ahead of Grok 4.20 Beta. These results suggest that Xiaomi has successfully built a model capable of the high-level reasoning required for engineering and production tasks.

Key metrics from Artificial Analysis highlight a significant leap over the previous open-weights version, MiMo-V2-Flash (which scored 41):

Hallucination rate: The Pro model reduced hallucination rates to 30%, a sharp improvement over the Flash model’s 48%.
Omniscience index: It scored a +5, placing it ahead of GLM-5 (+2) and Kimi K2.5 (-8).
Token efficiency: To run the entire Intelligence Index, MiMo-V2-Pro required only 77M output tokens, significantly less than GLM-5 (109M) or Kimi K2.5 (89M), indicating a more concise and efficient reasoning process.

Xiaomi’s own charts further emphasize the model’s “General Agent” and “Coding Agent” capabilities. On ClawEval, a benchmark for agentic scaffolds, the model scored 61.5, approaching the performance of Claude Opus 4.6 (66.3) and significantly outpacing GPT-5.2 (50.0). In coding-specific environments like Terminal-Bench 2.0, it achieved an 86.7, suggesting high reliability when executing commands in a live terminal environment.

How enterprises should evaluate MiMo-V2-Pro for usage

For the personas outlined in contemporary AI organizations — from infrastructure to security — MiMo-V2-Pro represents a paradigm shift in the price-quality curve. Infrastructure decision-makers will find MiMo-V2-Pro a compelling candidate for the Pareto frontier of intelligence vs. cost.
Artificial Analysis reported that running its index cost only $348 for MiMo-V2-Pro, compared to $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6. For organizations managing GPU clusters or procurement, the ability to access top-10 global intelligence at roughly one-seventh the cost of Western incumbents is a powerful incentive for production-scale testing.

Data decision-makers can leverage the 1M-token context window for RAG-ready architectures, allowing them to feed entire enterprise codebases or documentation sets into a single prompt without the fragmentation required by smaller-context models.

A systems/orchestration decision-maker should evaluate MiMo-V2-Pro as a primary “brain” for multi-agent coordination. Because the model is optimized for OpenClaw and Claude Code, it can handle long-horizon planning and precise tool use without the constant human intervention that plagues earlier models. Its high ranking on GDPval-AA suggests it is particularly well suited for the workflow and orchestration layer needed to scale AI across the enterprise, allowing for the creation of systems that can move beyond simple automation into complex, multi-step problem solving.

However, security decision-makers must exercise caution. The very “agentic” nature that makes the model powerful — its ability to use terminals and manipulate files — increases the surface area for prompt injection and unauthorized model access. While its low hallucination rate (30%) is a defensive boon, the lack of public weights (unlike the Flash version) means internal security teams cannot perform the deep “model-level” audits sometimes required for highly sensitive deployments. Any enterprise implementation must be accompanied by robust monitoring and auditability protocols.

Pricing, availability, and the path forward

Xiaomi has priced MiMo-V2-Pro to dominate the developer market.
The pricing is tiered based on context usage, with competitive rates for caching to support high-frequency reasoning tasks:

MiMo-V2-Pro (up to 256K): $1 per 1M input tokens and $3 per 1M output tokens
MiMo-V2-Pro (256K-1M): $2 per 1M input tokens and $6 per 1M output tokens
Cache read: $0.20 per 1M tokens for the lower tier and $0.40 for the higher tier
Cache write: Temporarily free ($0)

Here’s how it stacks up to other leading frontier models around the world (prices per 1M tokens):

Model | Input | Output | Total cost | Source
Grok 4.1 Fast | $0.20 | $0.50 | $0.70 | xAI
MiniMax M2.7 | $0.30 | $1.20 | $1.50 | MiniMax
Gemini 3 Flash | $0.50 | $3.00 | $3.50 | Google
Kimi-K2.5 | $0.60 | $3.00 | $3.60 | Moonshot
MiMo-V2-Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi MiMo
GLM-5-Turbo | $0.96 | $3.20 | $4.16 | OpenRouter
GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai
Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic
Qwen3-Max | $1.20 | $6.00 | $7.20 | Alibaba Cloud
Gemini 3 Pro | $2.00 | $12.00 | $14.00 | Google
GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI
Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic
Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic
GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI

This aggressive positioning is designed to encourage the high-intensity application flows that define the next generation of software. The model is currently available via Xiaomi’s first-party API only, with no current support for image or multimodal input — a notable omission in an era of “Omni” models, though Xiaomi has teased a separate MiMo-V2-Omni for those needs.

The “Hunter Alpha” period on OpenRouter proved that the market has a high appetite for this specific blend of efficiency and reasoning. Fuli Luo’s philosophy — that research velocity is fueled by a “genuine love for the world you’re building for” — has resulted in a model that ranks 2nd in China and 8th worldwide on established intelligence indices. Whether it remains a “quiet” ambush or becomes the foundation for a global realignment of AI power depends on how quickly developers adopt the “action space” over the “chat window”.
For now, Xiaomi has moved the goalposts: the question is no longer just “can it talk?” but “can it act?”
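The tiered rates reported above can be turned into a simple per-request cost estimate. The sketch below is illustrative, assuming the tier is picked by the size of the input context and that cache writes remain free during the promotional period; the function name is hypothetical, not part of any Xiaomi SDK.

```python
# Sketch of MiMo-V2-Pro's tiered per-request pricing as reported above.
# Assumptions: tier is chosen by input context size; cache writes cost $0.

def mimo_v2_pro_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request under the published tiers."""
    if input_tokens <= 256_000:            # lower tier (up to 256K context)
        in_rate, out_rate, cache_rate = 1.00, 3.00, 0.20
    else:                                  # higher tier (256K to 1M context)
        in_rate, out_rate, cache_rate = 2.00, 6.00, 0.40
    fresh = input_tokens - cached_tokens   # tokens billed at the full input rate
    cost = fresh * in_rate + cached_tokens * cache_rate + output_tokens * out_rate
    return cost / 1_000_000                # all rates are per 1M tokens

# A 100K-token prompt producing 2K tokens of output lands in the lower tier:
print(mimo_v2_pro_cost(100_000, 2_000))   # 0.106
```

Cache reads matter at scale: the same 100K-token prompt with half its tokens served from cache costs roughly 40% less under these rates.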
New MiniMax M2.7 proprietary AI model is ‘self-evolving’ and can perform 30-50% of reinforcement learning research workflow
In the last few years, Chinese AI startup MiniMax has become one of the most exciting companies in the crowded global AI marketplace, carving out a reputation for delivering frontier-level large language models (LLMs) under open source licenses and, before that, high-quality AI video generation models (Hailuo). The release of MiniMax M2.7 today — a new proprietary LLM designed to perform well powering AI agents and serving as the backend to third-party harnesses and tools like Claude Code, Kilo Code and OpenClaw — marks yet another milestone: rather than relying solely on human-led fine-tuning, MiniMax has leveraged M2.7 to build, monitor, and optimize its own reinforcement learning harnesses. This move toward recursive self-improvement signals a shift in the industry: a future where the models we use are as much the architects of their progress as they are the products of human research. The model is categorized as a reasoning-only text model that delivers intelligence comparable to other leading systems while maintaining significantly higher cost efficiency.

However, with M2.7 being proprietary for now, it is a sign once again that Chinese AI startups — for much of the last year, the standard-bearers of the open source AI frontier, appealing to enterprises globally for their low (or no) costs and customization — are shifting strategy toward proprietary frontier models, as U.S. leaders OpenAI, Google, and Anthropic have done for years. MiniMax becomes the second Chinese startup to release a proprietary cutting-edge LLM in recent months, following z.ai with its GLM-5 Turbo, amid rumors that Alibaba’s Qwen team is also shifting to proprietary development in the wake of the departure of senior leadership and other researchers.

Technical achievement: The self-evolution loop

The defining characteristic of MiniMax M2.7 is its role in its own creation.
According to company documentation, earlier versions of the model were used to build a research agent harness capable of managing data pipelines, training environments, and evaluation infrastructure. By autonomously triggering log-reading, debugging, and metric analysis, M2.7 handled between 30 and 50 percent of its own development workflow. This is not merely automation of rote tasks; the model optimized its own programming performance by analyzing failure trajectories and planning code modifications over iterative loops of 100 rounds or more.

“We intentionally trained the model to be better at planning and at clarifying requirements with the user,” explained MiniMax Head of Engineering Skyler Miao on the social network X. “Next step is a more complex user simulator to push this even further.”

This capability extends to complex environments via MLE Bench Lite, a series of machine learning competitions designed to test autonomous research skills. In these trials, M2.7 achieved a medal rate of 66.6 percent, a performance level that ties Google’s new Gemini 3.1 and approaches the current state-of-the-art benchmarks set by Anthropic’s Claude Opus 4.6. The goal, according to MiniMax, is a transition toward full autonomy in model training and inference architecture without human involvement.

Performance evolution: MiniMax M2.7 vs. M2.5

When compared to its predecessor M2.5, released in February 2026, M2.7 demonstrates significant gains in high-stakes software engineering and professional office tasks.
While M2.5 was celebrated for polyglot code mastery, M2.7 is designed for real-world engineering—tasks requiring causal reasoning within live production systems. Key performance metrics include:

- Software engineering: M2.7 scored 56.22 percent on the SWE-Pro benchmark, matching the highest levels of global competitors like GPT-5.3-Codex.
- Professional office delivery: In document processing, M2.7 achieved an Elo score of 1495 on GDPval-AA, which the company claims is the highest among open-source-accessible models.
- Hallucination reduction: The model scores +1 on the AA-Omniscience Index, a massive leap from the -40 score held by M2.5.
- Hallucination rate: M2.7 achieves a hallucination rate of 34 percent, lower than the 46 percent of Claude Sonnet 4.6 and the 50 percent of Gemini 3.1 Pro Preview.
- System comprehension: On Terminal Bench 2, the model scored 57.0 percent, demonstrating a deep understanding of complex operational logic rather than simple code generation.
- Skill adherence: On the MM Claw evaluation, which tests 40 complex skills exceeding 2,000 tokens each, M2.7 maintained a 97 percent adherence rate, a substantial improvement over the M2.5 baseline.
- Intelligence parity: The model’s reasoning capabilities are considered equivalent to GLM-5, yet it uses 20 percent fewer output tokens to achieve similar results.

The model’s evolution is further evidenced by its score of 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over its predecessor in just one month, and by its 8th-place ranking overall globally across benchmarking tasks in various domains.

Not all independent, third-party benchmarks show improvement for M2.7 over M2.5: on BridgeBench, a set of tasks designed by agentic AI coding startup BridgeMind to test a model’s performance at “vibe coding,” or turning natural language into working code, M2.5 placed 12th while M2.7 placed 19th.

Access, pricing, and integration

MiniMax M2.7 is a proprietary model available through the MiniMax API and MiniMax Agent creation platforms. While the core model weights for M2.7 remain closed, the company continues to contribute to the ecosystem through the open source interactive project OpenRoom. For direct API integration, and via third-party provider OpenRouter, MiniMax M2.7 maintains a cost-leading price point of $0.30 per 1 million input tokens and $1.20 per 1 million output tokens, unchanged from the pricing for M2.5.

To support different usage scales and modalities, MiniMax offers a structured Token Plan with various subscription tiers. These plans allow users to access models across text, speech, video, image, and music under a single unified quota. To further drive adoption, MiniMax has launched an Invite and Earn referral program, providing a 10 percent discount to new invitees and a 10 percent rebate voucher to the inviter.

Monthly standard Token Plan pricing, designed for entry-level developers through heavy regular users:

- Starter: $10 per month for 1,500 requests per 5 hours
- Plus: $20 per month for 4,500 requests per 5 hours
- Max: $50 per month for 15,000 requests per 5 hours

Monthly high-speed Token Plan pricing, for production-scale workloads requiring the M2.7-highspeed variant:

- Plus-Highspeed: $40 per month for 4,500 requests per 5 hours
- Max-Highspeed: $80 per month for 15,000 requests per 5 hours
- Ultra-High-Speed: $150 per month for 30,000 requests per 5 hours

Yearly Token Plan pricing, with significant discounts for long-term commitment:

- Standard Starter: $100 per year (saves $20)
- Standard Plus: $200 per year (saves $40)
- Standard Max: $500 per year (saves $100)
- High-Speed Plus: $400 per year (saves $80)
- High-Speed Max: $800 per year (saves $160)
- High-Speed Ultra: $1,500 per year (saves $300)

One request in these plans is
roughly equivalent to one call to MiniMax M2.7, though other models in the suite, such as video or high-definition speech, consume requests at a higher rate.

Official tool integrations

To ensure seamless adoption, MiniMax has provided official documentation for integrating M2.7 into more than 11 major developer tools and agent harnesses, including widely used platforms such as Claude Code, Cursor, Trae, and Zed. Other officially supported tools include OpenCode, Kilo Code, Cline, Roo Code, Droid, Grok CLI, and Codex CLI.

Additionally, the model supports the Model Context Protocol, allowing it to natively use tools like Web Search and Understand Image for multimodal reasoning. Developers using the Anthropic SDK can integrate M2.7 by modifying the ANTHROPIC_BASE_URL to point to the MiniMax endpoint. When using MiniMax as a provider in tools like OpenClaw, image understanding capabilities are automatically configured via the model’s VLM API endpoint, requiring no extra setup from the user.

With its deep bench of integrations and its pioneering approach to recursive self-evolution, MiniMax M2.7 represents a significant step toward an AI-native future where models are as involved in their own progress as the humans who guide them.

Strategic implications for enterprise decision-makers

Technical decision-makers should interpret the M2.7 release as evidence that agentic AI has moved from theoretical prototyping to production-ready utility.
The model’s ability to reduce recovery time for live production incidents to under three minutes by autonomously correlating monitoring metrics with code repositories suggests a paradigm shift for SRE and DevOps teams. Enterprises currently facing pressure to adopt AI-driven efficiencies must decide whether they are content with AI as a sophisticated assistant or whether they are ready to integrate native agent teams capable of end-to-end project delivery.

From a financial perspective, M2.7 represents a significant breakthrough in cost efficiency for high-level reasoning. Analysis indicates that M2.7 costs less than one-third as much to run as GLM-5 at equivalent intelligence levels: running a standard intelligence index cost $176 on M2.7, compared to $547 for GLM-5 and $371 for Kimi K2.5. This aggressive pricing strategy places M2.7 on the Pareto frontier of the intelligence-versus-cost chart, offering enterprise-level reasoning at a fraction of the market rate.

The current market is saturated with high-performance models, many of which still hold slight edges in general reasoning scores. But M2.7’s specific optimization for office-suite fidelity in Excel, PowerPoint, and Word, and its high performance on the GDPval-AA benchmark, make it a primary candidate for organizations focused on professional document workflows and financial modeling. Decision-makers must weigh the benefits of a general-purpose frontier model against a specialized engine like M2.7, which is built to interact with complex internal scaffolds and toolsets.

Ultimately, the fact that the model is fielded by a Chinese company (headquartered in Shanghai), is subject to that country’s laws in addition to the user’s, and is not yet available for offline or local usage may make it a tough sell for enterprises operating in the U.S. and the West — especially those in highly regulated or government-facing industries.
Nonetheless, the shift toward self-evolving models suggests that the ROI of AI investment will increasingly be tied to the recursive gains of the system itself. Organizations that adopt models capable of improving their own harnesses may find themselves on a faster iteration curve than those relying on static, human-only refinement. With MiniMax’s aggressive integration into the modern developer stack, the barrier to testing these autonomous workflows has dropped significantly, placing pressure on competitors to deliver similar native agent capabilities.
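The cost-efficiency claim above is easy to verify with quick arithmetic, using the intelligence-index run costs cited for each model earlier in this piece:

```python
# Checking the "less than one-third the cost of GLM-5" claim, using the
# index-run costs (USD) reported above for each model.
index_run_cost = {"MiniMax M2.7": 176, "GLM-5": 547, "Kimi K2.5": 371}

vs_glm5 = index_run_cost["MiniMax M2.7"] / index_run_cost["GLM-5"]
vs_kimi = index_run_cost["MiniMax M2.7"] / index_run_cost["Kimi K2.5"]

print(f"M2.7 runs the index at {vs_glm5:.1%} of GLM-5's cost")     # ~32.2%
print(f"M2.7 runs the index at {vs_kimi:.1%} of Kimi K2.5's cost")  # ~47.4%
```

At roughly 32% of GLM-5's cost, the "less than one-third" framing holds, though only narrowly.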
Enterprise AI agents keep operating from different versions of reality — Microsoft says Fabric IQ is the fix
In 2026, data engineers working with multi-agent systems are hitting a familiar problem: agents built on different platforms, by different teams, do not share a common understanding of how the business actually operates. Each one carries its own interpretation of what a customer, an order or a region means. When those definitions diverge across a workforce of agents, decisions break down. The result isn’t model failure — it’s hallucination driven by fragmented context.

A set of announcements from Microsoft this week directly targets that problem. The centerpiece is a significant expansion of Fabric IQ, the semantic intelligence layer the company debuted in November 2025. Fabric IQ’s business ontology is now accessible via MCP to any agent from any vendor, not just Microsoft’s. Alongside that, Microsoft is adding enterprise planning to Fabric IQ, unifying historical data, real-time signals and formal organizational goals in one queryable layer. The new Database Hub brings Azure SQL, Cosmos DB, PostgreSQL, MySQL and SQL Server under a single management plane inside Fabric. Fabric data agents reach general availability. The overall goal is a unified platform where all data and semantics are available and accessible by any agent to get the context that enterprises require.

Amir Netz, CTO of Microsoft Fabric, reached for a film analogy to explain why the shared context layer matters. “It’s a little bit like the girl from 50 First Dates,” Netz told VentureBeat. “Every morning they wake up and they forget everything and you have to explain it again. This is the explanation that you give them every morning.”

Why MCP access changes the equation

Making the ontology MCP-accessible is the step that moves Fabric IQ from a Fabric-specific feature into shared infrastructure for multi-vendor agent deployments.
Netz was explicit about the design intent. “It doesn’t really matter whose agent it is, how it was built, what the role is,” Netz said. “There’s certain common knowledge, certain common context that all the agents will share.”

That shared context is also where Netz draws a clear line between what the ontology does and what RAG does. He did not dismiss retrieval-augmented generation as a technique — he placed it specifically. RAG handles large document bodies such as regulations, company handbooks and technical documentation, where on-demand retrieval is more practical than loading everything into context.
“We don’t expect humans to remember everything by heart,” he said. “When somebody asks a question, you have to know to go and do a little bit of a search, find the right relevant part and bring it back.”

But RAG does not solve for real-time business state, he argued. It does not tell an agent which planes are in the air right now, whether a crew has enough rest hours, or what the current priority is on a given product line.
“The mistake of the past was they thought one technology can just give you everything,” Netz said. “The cognitive model of the agents is similar to humans. You have to have things that are available out of memory, things that are available on demand, things that are constantly observed and detected in real time.”

The execution gap analysts say Microsoft still has to close

Industry analysts see the logic behind Microsoft’s direction but have questions about what comes next. Robert Kramer, analyst at Moor Insights & Strategy, noted that Microsoft’s broad stack gives it a structural advantage in the race to become the default platform for enterprise agent deployments. “Fabric ties into Power BI, Microsoft 365, Dynamics and Azure services. That gives Microsoft a natural path to connect enterprise data with business users, operational workflows and now AI systems operating across that environment,” he said. The trade-off, Kramer said, is that Microsoft is competing across a wider surface area than Databricks or Snowflake, which built their reputations on the depth of the data platform itself.

The more immediate question for data teams, Kramer said, is whether MCP access actually reduces integration work. “Most enterprises do not operate in a single AI environment. Finance might be using one set of tools, engineering another, supply chain something else,” Kramer told VentureBeat. “If Fabric IQ can act as a common data context layer those agents can access, it starts to reduce some of the fragmentation that typically shows up around enterprise data.” But, he said, “If it just adds another protocol that still requires a lot of engineering work, adoption will be slower.”

Whether the engineering work is the harder problem is open to debate. Independent analyst Sanjeev Mohan told VentureBeat that the bigger challenge is organizational, not technical. “I don’t think they fully understand the implications yet,” he said of enterprise data teams.
“This is a classical capabilities overhang — capabilities are expanding faster than people’s imagination to use them. The harder work will be ensuring that the context layer is reliable and trustworthy.”

Holger Mueller, principal analyst at Constellation Research, sees MCP as the right mechanism but urges caution on execution.
“For enterprise to benefit from AI, they need to get access to their data — that is in many places unorganized, siloed — and they want that in a way that makes it easy for AI in a standard way to get there. That is what MCP does,” Mueller told VentureBeat. “The devil is in the details. How good is the access, how well does it perform and what does it cost. Access and governance still need to be sorted out.”

The Database Hub and the competitive picture

The Fabric IQ announcements arrive alongside the Database Hub, now in early access, which brings Azure SQL, Azure Cosmos DB, PostgreSQL, MySQL and SQL Server under a single management and observability layer inside Fabric. The intent is to give data operations teams one place to monitor, govern and optimize their database estate without changing how each service is deployed.

Devin Pratt, research director at IDC, said the integrated direction tracks with where the broader market is heading. IDC expects that by 2029, 60% of enterprise data platforms will unify transactional and analytical workloads.
“Microsoft’s angle is to bring more of those pieces together in one coordinated approach, while rivals are moving along similar lines from different starting points,” Pratt told VentureBeat.

What this means for enterprise data teams

For data engineers responsible for making pipelines AI-ready, the practical implication of this week’s announcements is a shift in where the hard work lives.
Connecting data sources to a platform is a solved problem. Defining what that data means in business terms, and making that definition consistently available to every agent that queries it, is not.

That shift has a concrete implication for data professionals. The semantic layer — the ontology that maps business entities, relationships and operational rules — is becoming production infrastructure. It will need to be built, versioned, governed and maintained with the same discipline as a data pipeline. That is a new category of responsibility for data engineering teams, and most organizations have not yet staffed or structured for it.

The broader trend this week’s announcements reflect is that the data platform race in 2026 is no longer primarily about compute or storage. It is about which platform can deliver the most reliable shared context to the widest range of agents.
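The shared-context idea behind a semantic layer can be illustrated with a toy sketch. This is not Fabric IQ's actual API; the class and field names are hypothetical, chosen only to show why agents resolving terms against one versioned ontology cannot drift into private, divergent definitions.

```python
# Toy illustration (not Fabric IQ's API) of a versioned semantic layer:
# every agent resolves business terms against one shared ontology, so a
# term is either consistently defined or fails loudly -- never silently
# reinterpreted per agent.
from dataclasses import dataclass, field

@dataclass
class Ontology:
    version: str
    entities: dict = field(default_factory=dict)

    def define(self, term: str, meaning: str) -> None:
        self.entities[term] = meaning

    def resolve(self, term: str) -> str:
        # Undefined terms raise instead of letting an agent improvise.
        if term not in self.entities:
            raise KeyError(f"'{term}' is not defined in ontology {self.version}")
        return self.entities[term]

shared = Ontology(version="2026.02")
shared.define("customer", "account with >=1 paid order in trailing 12 months")

# Agents from different vendors consult the same layer:
finance_view = shared.resolve("customer")
support_view = shared.resolve("customer")
assert finance_view == support_view  # one version of reality
```

The operational point is the `version` field: treating the ontology as versioned, governed infrastructure is exactly the new responsibility the article describes.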
Open source Mamba 3 arrives to surpass Transformer architecture with nearly 4% improved language modeling, reduced latency
The generative AI era began for most people with the launch of OpenAI’s ChatGPT in late 2022, but the underlying technology — the “Transformer” neural network architecture that allows AI models to weigh the importance of different words in a sentence (or pixels in an image) differently and to train on information in parallel — dates back to Google’s seminal 2017 paper “Attention Is All You Need.”

Yet while Transformers deliver unparalleled model quality and have underpinned most of the major generative AI models in use today, they are computationally gluttonous, burdened by quadratic compute and linear memory demands that make large-scale inference an expensive, often prohibitive, endeavor. Hence the desire by some researchers to improve on them, which led to a new architecture, Mamba, in 2023; it has since been included in hybrid Mamba-Transformer models like Nvidia’s Nemotron 3 Super.

Now the researchers behind the original Mamba architecture, including leaders Albert Gu of Carnegie Mellon and Tri Dao of Princeton, have released its latest version, Mamba-3, as a language model under a permissive Apache 2.0 open source license — making it immediately available to developers, including enterprises, for commercial purposes. A technical paper has also been published on arXiv.org.

The model signals a paradigm shift from training efficiency to an “inference-first” design. As Gu noted in the official announcement, while Mamba-2 focused on breaking pretraining bottlenecks, Mamba-3 aims to solve the “cold GPU” problem: the reality that during decoding, modern hardware often sits idle, waiting for memory movement rather than performing computation.

Perplexity (no, not the company) and the newfound efficiency of Mamba-3

Mamba, including Mamba-3, is a type of State Space Model (SSM). These are effectively a high-speed “summary machine” for AI.
While many popular models (like the ones behind ChatGPT) have to re-examine every single word they’ve already seen to understand what comes next—which gets slower and more expensive the longer the conversation lasts—an SSM maintains a compact, ever-changing internal state. This state is essentially a digital “mental snapshot” of the entire history of the data. As new information flows in, the model simply updates this snapshot instead of re-reading everything from the beginning. This allows the AI to process massive amounts of information, like entire libraries of books or long strands of DNA, with incredible speed and much lower memory requirements.

To appreciate the leap Mamba-3 represents, one must first understand perplexity, the primary metric used in the research to measure model quality. In the context of language modeling, perplexity is a measure of how “surprised” a model is by new data. Think of a model as a professional gambler. If a model has high perplexity, it is unsure where to place its bets; it sees many possible next words as equally likely. A lower perplexity score indicates that the model is more “certain”—it has a better grasp of the underlying patterns of human language. For AI builders, perplexity serves as a high-fidelity proxy for intelligence.

The breakthrough reported in the Mamba-3 research is that it achieves comparable perplexity to its predecessor, Mamba-2, while using only half the state size. This means a model can be just as smart while being twice as efficient to run.

A new philosophy

The philosophy guiding Mamba-3 is a fundamental shift in how we think about AI “intelligence” versus the speed of the hardware it runs on.
While the previous generation, Mamba-2, was designed to be trained at record-breaking speeds, Mamba-3 is an “inference-first” architecture — inference referring to the way AI models are served to end users, through websites like ChatGPT or Google Gemini, or through application programming interfaces (APIs). Mamba-3’s primary goal is to maximize every second the computer chip (GPU) is active, ensuring that the model is thinking as hard as possible without making the user wait for an answer.

In the world of language models, every point of accuracy is hard-won. At the 1.5-billion-parameter scale, the most advanced “MIMO” variant of Mamba-3 achieved a 57.6% average accuracy across benchmarks, representing a 2.2-percentage-point leap over the industry-standard Transformer. While a two-point jump might sound modest, it actually represents a nearly 4% relative increase in language modeling capability compared to the Transformer baseline. Even more impressively, as alluded to above, Mamba-3 can match the predictive quality of its predecessor while using only half the internal “state size,” effectively delivering the same level of intelligence with significantly less memory overhead.

For years, efficient alternatives to Transformers suffered from a “logic gap”—they often failed at simple reasoning tasks, like keeping track of patterns or solving basic arithmetic, because their internal math was too rigid. Mamba-3 solves this by introducing complex-valued states. This mathematical upgrade acts like an internal compass, allowing the model to represent “rotational” logic. By using this “rotary” approach, Mamba-3 can near-perfectly solve logic puzzles and state-tracking tasks that its predecessors could only guess at, finally bringing the reasoning power of linear models on par with the most advanced systems.

The final piece of the puzzle is how Mamba-3 interacts with physical hardware.
Most AI models today are “memory-bound,” meaning the computer chip spends most of its time idle, waiting for data to move from memory to the processor. Mamba-3 introduces a Multi-Input, Multi-Output (MIMO) formulation that fundamentally changes this dynamic. By performing up to four times more mathematical operations in parallel during each step, Mamba-3 utilizes that previously “idle” power. This allows the model to do significantly more “thinking” for every word it generates without increasing the actual time a user spends waiting for a response. More on these changes below.

Three new technological leaps

The appeal of linear models has always been their constant memory requirements and linear compute scaling. However, as the Mamba-3 authors point out, there is “no free lunch”: by fixing the state size to ensure efficiency, these models are forced to compress all historical context into a single representation—the exact opposite of a Transformer’s ever-growing KV cache. Mamba-3 pulls three specific levers to make that fixed state do more work.

1. Exponential-Trapezoidal Discretization

State Space Models are fundamentally continuous-time systems that must be “discretized” to handle the discrete sequences of digital data. Previous iterations relied on “Exponential-Euler” discretization—a heuristic that provided only a first-order approximation of the system. Mamba-3 introduces a generalized trapezoidal rule, providing a second-order accurate approximation. This isn’t just a mathematical refinement; it induces an “implicit convolution” within the core recurrence. By combining this with explicit B and C bias terms, the researchers were able to remove the short causal convolution that has been a staple of recurrent architectures for years.

2. Complex-Valued SSMs and the “RoPE Trick”

One of the most persistent criticisms of linear models has been their inability to solve simple state-tracking tasks, such as determining the parity of a bit sequence.
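Parity makes the limitation concrete. A recurrence whose state can rotate (here, simply flip sign) tracks parity exactly in constant memory; the sketch below is purely illustrative and is not Mamba-3's actual update rule.

```python
# Toy state-tracking task: the parity of a bit sequence. A recurrent state
# that can "rotate" (flip sign, a 180-degree rotation) counts 1s mod 2
# exactly. Illustrative only; not Mamba-3's actual recurrence.

def parity(bits):
    state = 1.0                  # +1 means an even count of 1s so far, -1 odd
    for b in bits:
        if b == 1:
            state = -state       # rotate the state half a turn per 1-bit
    return 0 if state > 0 else 1

print(parity([1, 0, 1, 1]))      # three 1s, so odd parity: prints 1
```

A decay-only real-valued state, by contrast, can only shrink toward zero and loses the count, which is the failure mode the complex-valued upgrade addresses.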
This failure stems from restricting the transition matrix to real numbers, which prevents the model from representing “rotational” dynamics. Mamba-3 overcomes this by viewing the underlying SSM as complex-valued. Using what the team calls the “RoPE trick,” they demonstrate that a complex-valued state update is mathematically equivalent to a data-dependent rotary embedding (RoPE) applied to the input and output projections. This allows Mamba-3 to solve synthetic reasoning tasks that were impossible for Mamba-2.

3. MIMO: Boosting Arithmetic Intensity

The most significant leap in inference efficiency comes from the transition from Single-Input, Single-Output (SISO) to Multi-Input, Multi-Output (MIMO) SSMs. In a standard SSM, the state update is an outer-product operation that is heavily memory-bound. By switching to a matrix-multiplication-based state update, Mamba-3 increases the “arithmetic intensity” of the model—the ratio of FLOPs to memory traffic. This allows the model to perform more computation during the memory-bound decoding phase. Essentially, Mamba-3 utilizes the “idle” compute cores of the GPU to increase model power for “free,” maintaining the same decoding speed as its simpler predecessors.

What Mamba-3 means for enterprises and AI builders

For enterprises, Mamba-3 represents a strategic shift in the total cost of ownership (TCO) of AI deployments.

- Cost vs. performance: At matched parameter counts, Mamba-3 (MIMO) matches the perplexity of Mamba-2 while using half the state size. For enterprise deployment, this effectively doubles inference throughput for the same hardware footprint.
- Agentic workflows: As organizations move toward parallel, agentic workflows (like automated coding or real-time customer service agents), the demand for low-latency generation increases exponentially. Mamba-3 is designed specifically to prevent GPU hardware from sitting “cold” during these tasks.
- The hybrid advantage: The researchers predict that the future of enterprise AI lies in hybrid models. By interleaving Mamba-3 with self-attention, organizations can combine the efficient “memory” of SSMs with the precise “database” storage of Transformers.

Availability, licensing, and usage

Mamba-3 is not merely a theoretical research paper; it is a fully realized open source release, with model code published on GitHub. The project is released under the Apache 2.0 license, a permissive, business-friendly license that allows free usage, modification, and commercial distribution without requiring the disclosure of proprietary source code. The release is well suited to developers building long-context applications or real-time reasoning agents, and to those seeking to reduce GPU costs in high-volume production environments.

Leading the State Space Model (SSM) revolution

The release was met with enthusiasm on social media, particularly regarding the “student-led” nature of the project. Gu, whose X/Twitter bio describes him as “leading the ssm revolution,” gave full credit to the student leads, including Aakash Lahoti and Kevin Y. Li. Gu’s thread highlighted the team’s satisfaction with the design: “We’re quite happy with the final model design! The three core methodological changes are inspired by (imo) some elegant math and methods.”

As agentic workflows push inference demand “through the roof,” the arrival of Mamba-3 suggests that the future of AI may not just be about having the biggest model, but about having the most efficient one. Mamba-3 has successfully re-aligned the SSM with the realities of modern hardware, proving that even in the age of the Transformer, the principles of classical control theory still have a vital role to play.
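The constant-memory "summary machine" recurrence at the heart of SSMs, described earlier in this piece, can be sketched in a few lines. This is a toy scalar SISO recurrence for intuition only; Mamba-3's real update is data-dependent and matrix-valued.

```python
# Toy SISO state-space recurrence: h[t] = a*h[t-1] + b*x[t], y[t] = c*h[t].
# The state h is a fixed-size summary of the entire history, so memory stays
# constant however long the sequence grows, unlike a Transformer's ever-growing
# KV cache. Illustrative only; not Mamba-3's actual parameterization.

def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    h = 0.0                      # the model's one-number "mental snapshot"
    ys = []
    for x in xs:
        h = a * h + b * x        # fold the new input into the summary
        ys.append(c * h)         # read the prediction out of the summary
    return ys

# An impulse decays geometrically through the state; memory use never grows:
out = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

The transition scalar `a` here is real-valued, which is exactly the restriction (no rotation, only decay) that Mamba-3's complex-valued states lift.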
Mistral AI launches Forge to help companies build proprietary AI models, challenging cloud giants
Mistral AI on Monday launched Forge, an enterprise model training platform that allows organizations to build, customize, and continuously improve AI models using their own proprietary data — a move that positions the French AI lab squarely against the hyperscale cloud providers in one of the most consequential and least understood markets in enterprise technology.

The announcement caps a remarkably aggressive week for Mistral, which also released its Mistral Small 4 model, unveiled Leanstral — an open-source code agent for formal verification — and joined the newly formed Nvidia Nemotron Coalition as a co-developer of the coalition’s first open frontier base model. Together, these moves paint the picture of a company that is no longer content to compete on model benchmarks alone and is instead racing to become the infrastructure backbone for organizations that want to own their AI rather than rent it.

Forge goes significantly beyond the fine-tuning APIs that Mistral and its competitors have offered for the past year. The platform supports the full model training lifecycle: pre-training on large internal datasets; post-training through supervised fine-tuning, DPO, and ODPO; and — critically — reinforcement learning pipelines designed to align models with internal policies, evaluation criteria, and operational objectives over time.

“Forge is Mistral’s model training platform,” said Maliena Guy, head of product at Mistral AI, in an exclusive interview with VentureBeat ahead of the launch. “We’ve been building this out behind the scenes with our AI scientists.
What Forge actually brings to the table is that it lets enterprises and governments customize AI models for their specific needs.”

Why Mistral says fine-tuning APIs are no longer enough for serious enterprise AI

The distinction Mistral is drawing — between lightweight fine-tuning and full-cycle model training — is central to understanding why Forge exists and whom it serves.

For the past two years, most enterprise AI adoption has followed a familiar pattern: companies select a general-purpose model from OpenAI, Anthropic, Google, or an open-source provider, then apply fine-tuning through a cloud API to adjust the model’s behavior for a narrow set of tasks. This approach works well for proof-of-concept deployments and many production use cases. But Guy argues that it fundamentally plateaus when organizations try to solve their hardest problems.

“We had a fine-tuning API relying on supervised fine-tuning. I think it was kind of what was the standard a couple of months ago,” Guy told VentureBeat. “It gets you to a proof-of-concept state. Whenever you actually want to have the performance that you’re targeting, you need to go beyond. AI scientists today are not using fine-tuning APIs. They’re using much more advanced tools, and that’s what Forge is bringing to the table.”

What Forge packages, in Guy’s telling, is the training methodology that Mistral’s own AI scientists use internally to build the company’s flagship models — including data mixing strategies, data generation pipelines, distributed computing optimizations, and battle-tested training recipes. She drew a sharp line between Forge and the open-source tools and community tutorials that are freely available today.

“There’s no platform out there that provides you real-world training recipes that work,” Guy said.
“Other open-source repositories or other tools can give you generic configurations or community tutorials, but they don’t give you the recipe that’s been validated — that we’ve been doing for all of our flagship models today.”

From ancient manuscripts to hedge fund quant languages, early customers reveal what off-the-shelf AI can’t do

The obvious question facing any product like Forge is demand. In a market where GPT-5, Claude, Gemini, and a growing fleet of open-source models can handle an enormous range of tasks, why would an enterprise invest the time, compute, and expertise required to train its own model from scratch?

Guy acknowledged the question head-on but argued that the need emerges quickly once companies move beyond generic use cases. “A lot of the existing models can get you very far,” she said. “But when you’re looking at what’s going to make you competitive compared to your competition — everyone can adopt and use the models that are out there. When you want to go a step beyond that, you actually need to create your own models. You need to leverage your proprietary information.”

The real-world examples she cited illustrate the edges of the current model ecosystem. In one case, Mistral worked with a public institution that had ancient manuscripts with missing text from damaged sections. “The models that were available were not able to do this because they’ve never seen the data,” Guy explained. “Digitization was not very good. There were some unique patterns and characters, and so we actually created a model for them to fill in the spans. This is now used by their researchers, and it’s accelerating their publication and understanding of these documents.”

In another engagement, Mistral partnered with Ericsson to customize its Codestral model for legacy-to-modern code translation.
Ericsson, Guy said, has built up half a decade of proprietary knowledge around an internal calling language — a codebase so specialized that no off-the-shelf model has ever encountered it. “The concrete impact is like turning a year-long manual migration process, where each engineer needs six months of onboarding, to something that’s really more scalable and faster,” she said.

Perhaps the most telling example involves hedge funds. Guy described working with financial firms to customize models for proprietary quantitative languages — the kind of deeply guarded intellectual property that these firms keep on-premises and never expose to cloud-hosted AI services. Using Forge’s reinforcement learning capabilities, Mistral helped one hedge fund develop custom benchmarks and then trained the model to outperform on them, producing what Guy called “a unique model that was able to give them the competitive edge that was needed.”

How Forge makes money: license fees, data pipelines, and embedded AI scientists

Forge’s business model reflects the complexity of enterprise model training. According to Guy, it operates across several revenue streams. For customers who run training jobs on their own GPU clusters — a common requirement in highly regulated or IP-sensitive industries — Mistral does not charge for compute. Instead, the company charges a license fee for the Forge platform itself, along with optional fees for data pipeline services and what Mistral calls “forward-deployed scientists” — embedded AI researchers who work alongside the customer’s team.

“No competitor out there today is kind of selling this embedded scientist as part of their training platform offering,” Guy said.

This model has clear echoes of Palantir’s early playbook, where forward-deployed engineers served as the critical bridge between powerful software and the messy reality of enterprise data.
It also suggests that Mistral recognizes a fundamental truth about the current state of enterprise AI: the technology alone is not enough. Most organizations lack the internal expertise to design effective training recipes, curate data at scale, or navigate the treacherous optimization landscape of distributed GPU training.

The infrastructure itself is flexible. Training can happen on Mistral’s own clusters, on Mistral Compute (the company’s dedicated infrastructure offering), or entirely on-premises within the customer’s own data centers. “We have all these different cases, and we support everything,” Guy said.

Keeping proprietary data off the cloud is Forge’s sharpest selling point

One of the sharpest points of differentiation Mistral is pressing with Forge is data privacy. When customers train on their own infrastructure, Guy emphasized that Mistral never sees the data at all.

“It’s on their clusters, it’s with their data — we don’t see anything of it, and so it’s completely under their control,” she said. “I think this is something that sets us apart from the competition, where you actually need to upload your data, and you have a black box effect.”

This matters enormously in sectors like defense, intelligence, financial services, and healthcare, where the legal and reputational risks of exposing proprietary data to a third-party cloud service can be deal-breakers. Mistral has already partnered with organizations including ASML, DSO National Laboratories Singapore, the European Space Agency, Home Team Science and Technology Agency Singapore, and Reply — a roster that suggests the company is deliberately targeting the most data-sensitive corners of the enterprise market.

Forge also includes data pipeline capabilities that Mistral has developed through its own model training: data acquisition, curation, and synthetic data generation. “Data is a critical piece of any training job today,” Guy said. “You need to have good data.
You need to have a good amount of data to make sure that the model is going to be good performing. We’ve acquired, as a company, really great knowledge building out these data pipelines.”

In the age of AI agents, Mistral argues that custom models still matter more than MCP servers

The timing of Forge’s launch raises an important strategic question. The AI industry in 2026 has been consumed by agents — autonomous AI systems that can use tools, navigate multi-step workflows, and take actions on behalf of users. If the future belongs to agents, why does the underlying model matter? Can’t companies simply plug into the best available frontier model through an MCP server or API and focus their energy on orchestration?

Guy pushed back on this framing with conviction. “The customers that we’ve been working on — some of these specific problems are things that no MCP server would ever solve,” she said. “You actually need that intelligence. You actually need to create that model that will help you solve your most critical business problem.”

She also argued that model customization is essential even in purely agentic architectures. “There are some agentic behaviors that you need to bring to the model,” Guy said. “It can be about reasoning patterns, specific types of documentation, making sure that you have the right reasoning traces. Even in these cases where people are going completely agentic, you still need model customization — like reinforcement learning techniques — to actually get the right level of performance.”

Mistral’s press release makes this connection explicit, arguing that custom models make enterprise agents more reliable by providing deeper understanding of internal environments: more precise tool selection, more dependable multi-step workflows, and decisions that reflect internal policies rather than generic assumptions.

The platform also supports an “agent-first” design philosophy.
Forge exposes interfaces that allow autonomous agents — including Mistral’s own Vibe coding agent — to launch training experiments, find optimal hyperparameters, schedule jobs, and generate synthetic data. “We’ve actually been building Forge in an AI-native way,” Guy said. “We’re already testing out how autonomous agents can actually launch training experiments.”

Mistral Small 4, Leanstral, and the Nvidia coalition: the week that redefined the company’s ambitions

To fully appreciate Forge’s significance, it helps to view it alongside the other announcements Mistral made in the same week — a barrage of releases that together represent the most ambitious expansion in the company’s short history.

Just yesterday, Mistral released Leanstral, the first open-source code agent for Lean 4, the proof assistant used in formal mathematics and software verification. Leanstral operates with just 6 billion active parameters and is designed for realistic formal repositories — not isolated math competition problems. On the same day, Mistral launched Mistral Small 4, a mixture-of-experts model with 119 billion total parameters but only 6 billion active per query, running 40 percent faster than its predecessor while handling three times more queries per second. Both models ship under the Apache 2.0 license — the most permissive open-source license in wide use.

And then there is the Nvidia Nemotron Coalition. Announced at Nvidia’s GTC conference, the coalition is a first-of-its-kind collaboration between Nvidia and a group of AI labs — including Mistral, Perplexity, LangChain, Cursor, Black Forest Labs, Reflection AI, Sarvam, and Thinking Machines Lab — to co-develop open frontier models.
The coalition’s first project is a base model co-developed specifically by Mistral AI and Nvidia, trained on Nvidia DGX Cloud, which will underpin the upcoming Nvidia Nemotron 4 family of open models.

“Open frontier models are how AI becomes a true platform,” said Arthur Mensch, cofounder and CEO of Mistral AI, in Nvidia’s announcement. “Together with Nvidia, we will take a leading role in training and advancing frontier models at scale.”

This coalition role is strategically significant. It positions Mistral not merely as a consumer of Nvidia’s compute infrastructure but as a co-creator of the foundational models that the broader ecosystem will build upon. For a company that is still a fraction of the size of its American competitors, this is an outsized seat at the table.

Forge takes aim at Amazon, Microsoft, and Google — and says they can’t go deep enough

Forge enters a market that is already crowded — at least on the surface. Amazon Bedrock, Microsoft Azure AI Foundry, and Google Cloud Vertex AI all offer model training and customization capabilities. But Guy argued that these offerings are fundamentally limited in two respects.

First, they are cloud-only. “In one set of cases, it’s very easy to answer — they want to run this on their premises, and so all these tools that are available on the cloud are just not available for them,” Guy said. Second, she argued that the hyperscalers’ training tools largely offer simplified API interfaces that don’t provide the depth of control that serious model training requires.

There is also the dependency question. Guy described digital-native companies that had built products on top of closed-source models, only to have a new model release — more verbose than its predecessor — crash their production pipelines.
“When you’re relying on closed-source models, you are also super dependent on the updates of the model that have side effects,” she warned.

This argument resonates with the broader sovereignty narrative that has powered Mistral’s rise in Europe and beyond. The company has positioned itself as the alternative for organizations that want to own their AI stack rather than lease it from American hyperscalers. Forge extends that argument from inference to training: not just running models you own, but building them in the first place.

The open-source foundation matters here, too. Mistral has been releasing models under permissive licenses since its founding, and Guy emphasized that the company is building Forge as an open platform. While it currently works with Mistral’s own models, she confirmed that support for other open-source architectures is planned. “We’re deeply rooted into open source. This has been part of our DNA since the beginning, and we have been building Forge to be an open platform — it’s just a question of a matter of time that we’ll be opening this to other open-source models.”

A co-founder’s departure to xAI underscores why Mistral is turning expertise into a product

The timing of Forge’s launch also arrives against a backdrop of fierce talent competition. As FinTech Weekly reported on March 14, Devendra Singh Chaplot — a co-founder of Mistral AI who headed the company’s multimodal group and contributed to training Mistral 7B, Mixtral 8x7B, and Mistral Large — left to join Elon Musk’s xAI, where he will work on Grok model training. Chaplot had previously also been a founding member of Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati.

The loss of a co-founder is never insignificant, but Mistral appears to be compensating with institutional capability rather than individual brilliance.
Forge is, in essence, a productization of the company’s collective training expertise — the recipes, the pipelines, the distributed computing optimizations — in a form that can scale beyond any single researcher. By packaging this knowledge into a platform and pairing it with forward-deployed scientists, Mistral is attempting to build a durable competitive asset that doesn’t walk out the door when a key hire departs.

Mistral’s big bet: the companies that own their AI models will be the ones that win

Forge is a bet on a specific theory of the enterprise AI future: that the most valuable AI systems will be those trained on proprietary knowledge, governed by internal policies, and operated under the organization’s direct control. This stands in contrast to the prevailing paradigm of the past two years, in which enterprises have largely consumed AI as a cloud service — powerful but generic, convenient but uncontrolled.

The question is whether enough enterprises will be willing to make the investment. Model training is expensive, technically demanding, and requires sustained organizational commitment. Forge lowers the barriers — through its infrastructure automation, its battle-tested recipes, and its embedded scientists — but it does not eliminate them.

What Mistral appears to be banking on is that the organizations with the most to gain from AI — the ones sitting on decades of proprietary knowledge in highly specialized domains — are precisely the ones for whom generic models are least sufficient. These are the companies where the gap between what a general-purpose model can do and what the business actually needs is widest, and where the competitive advantage of closing that gap is greatest.

Forge supports both dense and mixture-of-experts architectures, accommodating different trade-offs between performance, cost, and operational constraints. It handles multimodal inputs.
It is designed for continuous adaptation rather than one-time training, with built-in evaluation frameworks that let enterprises test models against internal benchmarks before production deployment.

For the past two years, the enterprise AI playbook has been straightforward: pick a model, call an API, ship a feature. Mistral is now asking a harder question — whether the organizations willing to do the difficult, expensive, unglamorous work of training their own models will end up with something the API-callers never get.

An unfair advantage.
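Forge’s internals are not public, but the post-training methods the article says it supports are standard in the literature. As background, the DPO objective mentioned above reduces to a one-line loss over a preference pair; the toy Python sketch below uses scalar log-probabilities in place of real model outputs and is illustrative only, not Forge’s implementation:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss on one preference pair.

    Each argument is the (summed) log-probability of a response under the policy
    being trained or the frozen reference model. Scalars stand in for model outputs.
    """
    # Log-ratio of policy vs. reference for each response, scaled by beta
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # -log sigmoid(margin): small when the policy prefers the chosen response
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that has learned the preference incurs a lower loss than one
# that is indistinguishable from the reference model:
learned = dpo_loss(-4.0, -9.0, -6.0, -6.0)       # chosen up, rejected down
indifferent = dpo_loss(-6.0, -6.0, -6.0, -6.0)   # identical to reference
assert learned < indifferent
```

Full-cycle platforms of the kind described here wrap this objective (and its relatives) in data pipelines, distributed training, and evaluation; the loss itself is the small core.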
The authorization problem that could break enterprise AI
When an AI agent needs to log into your CRM, pull records from your database, and send an email on your behalf, whose identity is it using? And what happens when no one knows the answer? Alex Stamos, chief product officer at Corridor, and Nancy Wang, CTO at 1Password, joined the VB AI Impact Salon Series to dig into the new identity framework challenges that come along with the benefits of agentic AI.

“At a high level, it’s not just who this agent belongs to or which organization this agent belongs to, but what is the authority under which this agent is acting, which then translates into authorization and access,” Wang said.

How 1Password ended up at the center of the agent identity problem

Wang traced 1Password’s path into this territory through its own product history. The company started as a consumer password manager, and its enterprise footprint grew organically as employees brought tools they already trusted into their workplaces. “Once those people got used to the interface, and really enjoyed the security and privacy standards that we provide as guarantees for our customers, then they brought it into the enterprise,” she said. The same dynamic is now happening with AI, she added. “Agents also have secrets, or passwords, just like humans do.”

Internally, 1Password is navigating the same tension it helps customers manage: how to let engineers move fast without creating a security mess. Wang said the company actively tracks the ratio of incidents to AI-generated code as engineers use tools like Claude Code and Cursor. “That’s a metric we track intently to make sure we’re generating quality code.”

How developers are incurring major security risks

Stamos said one of the most common behaviors Corridor observes is developers pasting credentials directly into prompts, which is a huge security risk.
Corridor flags it and sends the developer back toward proper secrets management. “The standard thing is you just go grab an API key or take your username and password and you just paste it into the prompt,” he said. “We find this all the time because we’re hooked in and grabbing the prompt.”

Wang described 1Password’s approach as working on the output side, scanning code as it is written and vaulting any plain-text credentials before they persist. This tendency toward cut-and-paste system access directly shapes 1Password’s design choices, which favor avoiding security tooling that creates friction. “If it’s too hard to use, to bootstrap, to get onboarded, it’s not going to be secure because frankly people will just bypass it and not use it,” she said.

Why you cannot treat a coding agent like a traditional security scanner

Another challenge in building feedback loops between security agents and coding models is false positives, which friendly, agreeable large language models are prone to accept uncritically — and a false positive from a security scanner can derail an entire coding session. “If you tell it this is a flaw, it’ll be like, yes sir, it’s a total flaw!” Stamos said. But, he added, “You cannot screw up and have a false positive, because if you tell it that and you’re wrong, you will completely ruin its ability to write correct code.” That tradeoff between precision and recall is structurally different from what traditional static analysis tools are designed to optimize for, and it has required significant engineering to get right at the required latency, on the order of a few hundred milliseconds per scan.

Authentication is easy, but authorization is where things get hard

“An agent typically has a lot more access than any other software in your environment,” noted Spiros Xanthos, founder and CEO at Resolve AI, in an earlier session at the event. “So, it is understandable why security teams are very concerned about that.
Because if that attack vector gets utilized, then it can both result in a data breach, but even worse, maybe you have something in there that can take action on behalf of an attacker.”

So how do you give autonomous agents scoped, auditable, time-limited identities? Wang pointed to SPIFFE and SPIRE, workload identity standards developed for containerized environments, as candidates being tested in agentic contexts. But she acknowledged the fit is rough. “We’re kind of force-fitting a square peg into a round hole,” she said.

But authentication is only half of it. Once an agent has a credential, what is it actually allowed to do? Here is where the principle of least privilege should be applied to tasks rather than roles. “You wouldn’t want to give a human a key card to an entire building that has access to every room in the building,” she explained. “You also don’t want to give an agent the keys to the kingdom, an API key to do whatever it needs to do forever. It needs to be time-bound and also bound to the task you want that agent to do.”

In enterprise environments, it won’t be enough to grant scoped access; organizations will also need to know which agent acted, under what authority, and with what credentials. Stamos pointed to OIDC extensions as the current frontrunner in standards conversations, while dismissing the crop of proprietary solutions. “There are 50 startups that believe their proprietary patented solution will be the winner,” he said. “None of those will win, by the way, so I would not recommend.”

At a billion users, edge cases are not edge cases anymore

On the consumer side, Stamos predicted the identity problem will consolidate around a small number of trusted providers, most likely the platforms that already anchor consumer authentication. Drawing on his time as CISO at Facebook, where the team handled roughly 700,000 account takeovers per day, he reframed what scale does to the concept of an edge case.
“When you’re the CISO of a company that has a billion users, corner case is something that means real human harm,” he explained. “And so identity, for normal people, for agents, going forward is going to be a humongous problem.”

Ultimately, the challenges CTOs face on the agent side stem from incomplete standards for agent identity, improvised tooling, and enterprises deploying agents faster than the frameworks meant to govern them can be written. The path forward requires building identity infrastructure from scratch around what agents actually are, not retrofitting what was built for the humans who created them.
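The property Wang describes — credentials that are both time-bound and task-bound — can be sketched with nothing more than an HMAC-signed token. All names below (mint_agent_token, authorize) are hypothetical, not any vendor’s API; a production system would use a standard such as SPIFFE or OIDC rather than hand-rolled tokens:

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-signing-key"  # in practice: a managed secret, never hard-coded

def mint_agent_token(agent_id, task, ttl_seconds, now=None):
    """Issue a token scoped to one task and valid for a limited window."""
    now = time.time() if now is None else now
    claims = {"agent": agent_id, "task": task, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def authorize(token, requested_task, now=None):
    """Allow the action only if signature, expiry, and task scope all check out."""
    now = time.time() if now is None else now
    payload, _, sig = token.partition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or foreign token
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now > claims["exp"]:
        return False  # time-bound: expired tokens are dead
    return claims["task"] == requested_task  # task-bound: no key to the whole building

token = mint_agent_token("billing-agent", "read:invoices", ttl_seconds=300, now=1000.0)
assert authorize(token, "read:invoices", now=1100.0)         # in scope, in time
assert not authorize(token, "delete:customers", now=1100.0)  # wrong task
assert not authorize(token, "read:invoices", now=2000.0)     # expired
```

The audit trail the panel calls for falls out of the same structure: the claims name which agent acted and under what scope, so every authorization decision can be logged against them.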
Nvidia’s agentic AI stack is the first major platform to ship with security at launch, but governance gaps remain
For the first time on a major AI platform release, security shipped at launch — not bolted on 18 months later. At Nvidia GTC this week, five security vendors announced protection for Nvidia’s agentic AI stack, four with active deployments, one with validated early integration.

The timing reflects how fast the threat has moved: 48% of cybersecurity professionals rank agentic AI as the top attack vector heading into 2026. Only 29% of organizations feel fully ready to deploy these technologies securely. Machine identities outnumber human employees 82 to 1 in the average enterprise. And IBM’s 2026 X-Force Threat Intelligence Index documented a 44% surge in attacks exploiting public-facing applications, accelerated by AI-enabled vulnerability scanning.

Nvidia CEO Jensen Huang made the case from the GTC keynote stage on Monday: “Agentic systems in the corporate network can access sensitive information, execute code, and communicate externally. Obviously, this can’t possibly be allowed.” Nvidia defined a unified threat model designed to flex and adapt to the unique strengths of five different vendors. Nvidia also names Google, Microsoft Security and TrendAI as Nvidia OpenShell security collaborators. This article maps the five vendors with embargoed GTC announcements and verifiable deployment commitments on record; the result is an analyst-synthesized reference architecture, not Nvidia’s official canonical stack.

No single vendor covers all five governance layers. Security leaders can evaluate CrowdStrike for agent decisions and identity, Palo Alto Networks for cloud runtime, JFrog for supply chain provenance, Cisco for prompt-layer inspection, and WWT for pre-production validation. The audit matrix below maps who covers what. Three or more unanswered vendor questions mean ungoverned agents in production.

The five-layer governance framework

This framework draws from the five vendor announcements and the OWASP Agentic Top 10. The left column is the governance layer.
The right column is the question every security leader’s vendor should answer. If they can’t answer it, that layer is ungoverned.

| Governance Layer | What To Deploy | Risk If Not | Vendor Question | Who Maps Here |
| --- | --- | --- | --- | --- |
| Agent Decisions | Real-time guardrails on every prompt, response, and action | Poisoned input triggers privileged action | Detect state drift across sessions? | CrowdStrike Falcon AIDR, Cisco AI Defense [runtime enforcement] |
| Local Execution | Behavioral monitoring for on-device agents | Local agent runs unprotected | Agent baselines beyond process monitoring? | CrowdStrike Falcon Endpoint [runtime enforcement]; WWT ARMOR [pre-prod validation] |
| Cloud Ops | Runtime enforcement across cloud deployments | Agent-to-agent privilege escalation | Trust policies between agents? | CrowdStrike Falcon Cloud Security [runtime enforcement]; Palo Alto Prisma AIRS [AI Factory validated design] |
| Identity | Scoped privileges per agent identity | Inherited creds; delegation compounds | Privilege inheritance in delegation? | CrowdStrike Falcon Identity [runtime enforcement]; Palo Alto Networks/CyberArk [identity governance platform] |
| Supply Chain | Model scanning + provenance before deploy | Compromised model hits production | Provenance from registry to runtime? | JFrog Agent Skills Registry [pre-deployment]; CrowdStrike Falcon |

Five-layer governance audit matrix. Three or more unanswered vendor questions indicate ungoverned agents in production. [runtime enforcement] = inline controls active during agent execution. [pre-deployment] = controls applied before artifacts reach runtime. [pre-prod validation] = proving-ground testing before production rollout.
[AI Factory validated design] = Nvidia reference architecture integration, not OpenShell-launch coupling.

CrowdStrike’s Falcon platform embeds at four distinct enforcement points in the Nvidia OpenShell runtime: AIDR at the prompt-response-action layer, Falcon Endpoint on DGX Spark and DGX Station hosts, Falcon Cloud Security across AI-Q Blueprint deployments, and Falcon Identity for agent privilege boundaries. Palo Alto Networks enforces at the BlueField DPU hardware layer within Nvidia’s AI Factory validated design. JFrog governs the artifact supply chain from the registry through signing. WWT validates the full stack pre-production in a live environment. Cisco runs an independent guardrail at the prompt layer.

CrowdStrike and Nvidia are also building what they call intent-aware controls. That phrase matters. An agent constrained to certain data is access-controlled. An agent whose planning loop is monitored for behavioral drift is governed. Those are different security postures, and the gap between them is where the 4% error rate at 5x speed becomes dangerous.

Why the blast radius math changed

Daniel Bernard, CrowdStrike’s chief business officer, told VentureBeat in an exclusive interview what the blast radius of a compromised AI agent looks like compared to a compromised human credential.

“Anything we could think about from a blast radius before is unbounded,” Bernard said. “The human attacker needs to sleep a couple of hours a day. In the agentic world, there’s no such thing as a workday. It’s work-always.”

That framing tracks with architectural reality. A human insider with stolen credentials works within biological limits: typing speed, attention span, a schedule. An AI agent with inherited credentials operates at compute speed across every API, database, and downstream agent it can reach. No fatigue. No shift change. CrowdStrike’s 2026 Global Threat Report puts the fastest observed eCrime breakout at 27 seconds and average breakout times at 29 minutes.
An agentic adversary doesn’t have an average. It runs until you stop it.

When VentureBeat asked Bernard about the 96% accuracy number and what happens in the 4%, his answer was operational, not promotional: “Having the right kill switches and fail-safes so that if the wrong thing is decided, you’re able to quickly get to the right thing.” The implication is worth sitting on. 96% accuracy at 5x speed means the errors that get through arrive five times faster than they used to. The oversight architecture has to match the detection speed. Most SOCs are not designed for that.

Bernard’s broader prescription: “The opportunity for customers is to transform their SOCs from history museums into autonomous fighting machines.” Walk into the average enterprise SOC and inventory what’s running there. He’s not wrong.

On analyst oversight when agents get it wrong, Bernard drew the governance line: “We want to keep not only agents in the loop, but also humans in the loop of the actions that the SOC is taking when that variance in what normal is realized. We’re on the same team.”

The full vendor stack

Each of the five vendors occupies a different enforcement point the other four do not. CrowdStrike’s architectural depth in the matrix reflects four announced OpenShell integration points; security leaders should weigh all five based on their existing tooling and threat model.

Cisco shipped Secure AI Factory with AI Defense, extending Hybrid Mesh Firewall enforcement to Nvidia BlueField DPUs and adding AI Defense guardrails to the OpenShell runtime. In multi-vendor deployments, Cisco AI Defense and Falcon AIDR run as parallel guardrails: AIDR enforcing inside the OpenShell sandbox, AI Defense enforcing at the network perimeter.
A poisoned prompt that evades one still hits the other.

Palo Alto Networks runs Prisma AIRS on Nvidia BlueField DPUs as part of the Nvidia AI Factory validated design, offloading inspection to the data processing unit at the network hardware layer, below the hypervisor and outside the host OS kernel. This integration is best understood as a validated reference architecture pairing rather than a tight OpenShell runtime coupling. Palo Alto intercepts east-west agent traffic on the wire; CrowdStrike monitors agent process behavior inside the runtime. Same cloud runtime row, different integration model and maturity stage.

JFrog announced the Agent Skills Registry, a system of record for MCP servers, models, agent skills, and agentic binary assets within Nvidia’s AI-Q architecture. Early integration with Nvidia has been validated, with full OpenShell support in active development. JFrog Artifactory will serve as a governed registry for AI skills, scanning, verifying, and signing every skill before agents can adopt it. This is the only pre-deployment enforcement point in the stack. As Chief Strategy Officer Gal Marder put it: “Just as a malicious software package can compromise an application, an unvetted skill can guide an agent to perform harmful actions.”

World Wide Technology launched a Securing AI Lab inside its Advanced Technology Center, built on Nvidia AI factories and the Falcon platform. WWT’s vendor-agnostic ARMOR framework is a pre-production validation and proving-ground capability, not an inline runtime control. It validates how the integrated stack behaves in a live AI factory environment before any agent touches production data, surfacing control interactions, failure modes, and policy conflicts before they become incidents.

Three MDR numbers: what they actually measure

On the MDR side, CrowdStrike fine-tuned Nvidia Nemotron models on first-party threat data and operational SOC data from Falcon Complete engagements.
Internal benchmarks show 5x faster investigations, 3x higher triage accuracy in high-confidence benign classification, and 96% accuracy in generating investigation queries within Falcon LogScale. Kroll, a global risk advisory and managed security firm that runs Falcon Complete as its MDR backbone, confirmed the results in production. Because Kroll operates Falcon Complete as its core MDR platform rather than as a neutral third-party evaluator, its validation is operationally meaningful but not independent in the audit sense. Industry-wide third-party benchmarks for agentic SOC accuracy do not yet exist. Treat reported numbers as indicative, not audited.

The 5x investigation speed compares average agentic investigation time (8.5 minutes) against the longest observed human investigation in CrowdStrike’s internal testing: a ceiling, not a mean. The 3x triage accuracy measures one internal model against another. The 96% accuracy applies specifically to generating Falcon LogScale investigation queries via natural language, not to overall threat detection or alert classification.

JFrog’s Agent Skills Registry operates beneath all four CrowdStrike enforcement layers, scanning, signing, and governing every model and skill before any agent can adopt it — with early Nvidia integration validated and full OpenShell support in active development.

Six enterprises are already in deployment

EY selected the CrowdStrike-Nvidia stack to power Agentic SOC services for global enterprises. Nebius ships with Falcon integrated into its AI cloud from day one. CoreWeave CISO Jim Higgins signed off on the Blueprint.
Mondelēz North America Regional CISO Emmett Koen said the capability lets his team “focus on higher-value response and decision-making.” MGM Resorts International CISO Bryan Green endorsed WWT’s validated testing environments, saying enterprises need “validated environments that embed protection from the start.”

These engagements range from vendor selection and platform validation to production integration. The signal is converging across buyer types, not uniform at-scale deployment.

What the five-vendor stack does not cover

The governance framework above represents real progress. It also has three holes that every security leader deploying agentic AI will eventually hit. No vendor at GTC closed any of them. Knowing where they are is as important as knowing what shipped.

Agent-to-agent trust. When agents delegate to other agents, credentials compound. The OWASP Top 10 for Agentic Applications lists tool call hijacking and orchestrator manipulation as top-tier risks. Independent research from BlueRock Security, scanning over 7,000 MCP servers, found that 36.7% contain vulnerabilities. An arXiv preprint study across 847 scenarios found a 23% to 41% increase in attack success rates in MCP integrations versus non-MCP. No vendor at GTC demonstrated a complete trust policy framework for agent-to-agent delegation. This is the layer where the 82:1 identity ratio becomes a governance crisis, not just an inventory problem.

Memory integrity. Agents with persistent memory create an attack surface that stateless LLM deployments do not have. Poison an agent’s long-term memory once. Influence its decisions weeks later. The OWASP Agentic Top 10 flags this explicitly. CrowdStrike’s intent-aware controls are the closest architectural response announced at GTC. Implementation details remain forward-looking.

Registry-to-runtime provenance. JFrog’s Agent Skills Registry addresses the registry side of this problem.
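The registry-side check amounts to comparing a candidate artifact against the digest captured at scan-and-sign time. A minimal sketch, assuming a hypothetical registry record and adoption gate (all names, digests, and flow invented, not JFrog’s actual API):

```python
import hashlib

# Hypothetical registry record: digest and signature status captured when
# the skill was scanned and signed.
REGISTRY = {
    "summarize-alerts-skill": {
        "sha256": hashlib.sha256(b"skill bytes as scanned").hexdigest(),
        "signed": True,
    },
}

def verify_before_adoption(skill_name: str, artifact: bytes) -> bool:
    """Allow adoption only if the artifact the agent is about to load
    matches the digest that was scanned and signed in the registry."""
    record = REGISTRY.get(skill_name)
    if record is None or not record["signed"]:
        return False  # unknown or unsigned skill: fail closed
    return hashlib.sha256(artifact).hexdigest() == record["sha256"]

# The exact bytes that were scanned pass; tampered bytes do not.
assert verify_before_adoption("summarize-alerts-skill", b"skill bytes as scanned")
assert not verify_before_adoption("summarize-alerts-skill", b"tampered bytes")
```

Note that this sketch only covers the registry side: it proves nothing about what the runtime actually loads after the check passes, which is exactly the gap discussed next.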
The gap that remains is the last mile: end-to-end provenance requires proving that the model executing in production is the exact artifact scanned and signed in the registry. That cryptographic continuity from registry to runtime is still an engineering problem, not a solved capability.

What running five vendors actually costs

The governance matrix is a coverage map, not an implementation plan. Running five vendors across five enforcement layers introduces real operational overhead that the GTC announcements did not address. Someone has to own policy orchestration: deciding which vendor’s guardrail wins when AIDR and AI Defense return conflicting verdicts on the same prompt. Someone has to normalize telemetry across Falcon LogScale, Prisma AIRS, and JFrog Artifactory into a single incident workflow. And someone has to manage change control when one vendor ships a runtime update that shifts how another vendor’s enforcement layer behaves.

A realistic phased rollout looks like this: start with the supply chain layer (JFrog), because it operates pre-deployment and has no runtime dependencies on the other four. Add identity governance (Falcon Identity) second, because scoped agent credentials limit blast radius before you instrument the runtime. Then instrument the agent decision layer (Falcon AIDR or Cisco AI Defense, depending on your existing vendor footprint), then cloud runtime, then local execution. Running all five simultaneously from day one is an integration project, not a configuration task. Budget for it accordingly.

What to do before your next board meeting

Here is what every CISO should be able to say after running the framework above: “We have audited every autonomous agent against five governance layers. Here is what’s in place, and here are the five questions we are holding vendors to.” If you cannot say that today, the issue is not that you are behind schedule. The issue is that no schedule existed.
Five vendors just shipped the architectural scaffolding for one.

Do four things before your next board meeting:

1. Run the five-layer audit. Pull every autonomous agent your organization has in production or staging. Map each one against the five governance rows above. Mark which vendor questions you can answer and which you cannot.

2. Count the unanswered questions. Three or more means ungoverned agents in production. That is your board number, not a backlog item.

3. Pressure-test the three open gaps. Ask your vendors, explicitly: How do you handle agent-to-agent trust across MCP delegation chains? How do you detect memory poisoning in persistent agent stores? Can you show a cryptographic binding between the registry scan and the runtime load? None of the five vendors at GTC has a complete answer. That is not an accusation. It is where the next year of agentic security gets built.

4. Establish the oversight model before you scale. Bernard put it plainly: keep agents and humans in the loop. 96% accuracy at 5x speed means errors arrive faster than any SOC designed for human-speed detection can catch them. The kill switches and fail-safes have to be in place before the agents run at scale, not after the first missed breach.

The scaffolding is necessary. It is not sufficient. Whether it changes your posture depends on whether you treat the five-layer framework as a working instrument or skip past it in the vendor deck.
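The first two steps above, the audit and the count, amount to a simple tally. A minimal sketch, where the layer names come from the framework in this article but the agent inventory and its coverage sets are placeholders:

```python
# The five governance layers named in the framework above.
LAYERS = {"supply chain", "identity", "agent decision",
          "cloud runtime", "local execution"}

def audit(agents: dict) -> dict:
    """Count, per agent, the governance layers with no answered question."""
    return {name: len(LAYERS - covered) for name, covered in agents.items()}

# Placeholder inventory: each agent maps to the layers you can answer for.
agents = {
    "triage-bot": {"supply chain", "identity", "agent decision", "cloud runtime"},
    "report-writer": {"identity"},
}

gaps = audit(agents)
# Three or more unanswered layers is the "board number" threshold.
flagged = [name for name, n in gaps.items() if n >= 3]

assert gaps == {"triage-bot": 1, "report-writer": 4}
assert flagged == ["report-writer"]  # an ungoverned agent in production
```

The point of the exercise is not the code; it is that the audit produces a single defensible number per agent that a board can act on.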