NanoClaw, the open-source AI agent platform created by Gavriel Cohen, is partnering with the containerized development platform Docker to let teams run agents inside Docker Sandboxes, a move aimed at one of the biggest obstacles to enterprise adoption: how to give agents room to act without giving them room to damage the systems around them.

The announcement matters because the market for AI agents is shifting from novelty to deployment. It is no longer enough for an agent to write code, answer questions or automate a task. For CIOs, CTOs and platform leaders, the harder question is whether that agent can safely connect to live data, modify files, install packages and operate across business systems without exposing the host machine, adjacent workloads or other agents.

That is the problem NanoClaw and Docker say they are solving together.

A security argument, not just a packaging update

NanoClaw launched as a security-first alternative in the rapidly growing “claw” ecosystem, where agent frameworks promise broad autonomy across local and cloud environments. The project’s core argument has been that many agent systems rely too heavily on software-level guardrails while running too close to the host machine.

This Docker integration pushes that argument down into infrastructure.

“The partnership with Docker is integrating NanoClaw with Docker Sandboxes,” Cohen said in an interview. “The initial version of NanoClaw used Docker containers for isolating each agent, but Docker Sandboxes is the proper enterprise-ready solution for rolling out agents securely.”

That progression matters because the central issue in enterprise agent deployment is isolation. Agents do not behave like traditional applications. They mutate their environments, install dependencies, create files, launch processes and connect to outside systems.
That breaks many of the assumptions underlying ordinary container workflows.

Cohen framed the issue in direct terms: “You want to unlock the full potential of these highly capable agents, but you don’t want security to be based on trust. You have to have isolated environments and hard boundaries.”

That line gets at the broader challenge facing enterprises now experimenting with agents in production-like settings. The more useful agents become, the more access they need. They need tools, memory, external connections and the freedom to take actions on behalf of users and teams. But each gain in capability raises the stakes around containment. A compromised or badly behaving agent cannot be allowed to spill into the host environment, expose credentials or access another agent’s state.

Why agents strain conventional infrastructure

Docker president and COO Mark Cavage said that reality forced the company to rethink some of the assumptions built into standard developer infrastructure.

“Fundamentally, we had to change the isolation and security model to work in the world of agents,” Cavage said. “It feels like normal Docker, but it’s not.”

He explained why the old model no longer holds. “Agents break effectively every model we’ve ever known,” Cavage said. “Containers assume immutability, but agents break that on the very first call. The first thing they want to do is install packages, modify files, spin up processes, spin up databases — they want full mutability and a full machine to run in.”

That is a useful framing for enterprise technical decision-makers. The promise of agents is not that they behave like static software with a chatbot front end. The promise is that they can perform open-ended work. But open-ended work is exactly what creates new security and governance problems. An agent that can install a package, rewrite a file tree, start a database process or access credentials is more operationally useful than a static assistant.
It is also more dangerous if it is running in the wrong environment.

Docker’s answer is Docker Sandboxes, which use MicroVM-based isolation while preserving familiar Docker packaging and workflows. According to the companies, NanoClaw can now run inside that infrastructure with a single command, giving teams a more secure execution layer without forcing them to redesign their agent stack from scratch.

Cavage put the value proposition plainly: “What that gets you is a much stronger security boundary. When something breaks out — because agents do bad things — it’s truly bounded in something provably secure.”

That emphasis on containment rather than trust lines up closely with NanoClaw’s original thesis. In earlier coverage of the project, NanoClaw was positioned as a leaner, more auditable alternative to broader and more permissive frameworks. The argument was not just that it was open source, but that its simplicity made it easier to reason about, secure and customize for production use.

Cavage extended that argument beyond any single product. “Security is defense in depth,” he said. “You need every layer of the stack: a secure foundation, a secure framework to run in, and secure things users build on top.”

That is likely to resonate with enterprise infrastructure teams that are less interested in model novelty than in blast radius, auditability and layered control. Agents may still rely on the intelligence of frontier models, but what matters operationally is whether the surrounding system can absorb mistakes, misfires or adversarial behavior without turning one compromised process into a wider incident.

The enterprise case for many agents, not one

The NanoClaw-Docker partnership also reflects a broader shift in how vendors are beginning to think about agent deployment at scale.
Instead of one central AI system doing everything, the model emerging here is many bounded agents operating across teams, channels and tasks.

“What OpenClaw and the claws have shown is how to get tremendous value from coding agents and general-purpose agents that are available today,” Cohen said. “Every team is going to be managing a team of agents.”

He pushed that idea further in the interview, sketching a future closer to organizational systems design than to the consumer assistant model that still dominates much of the AI conversation. “In businesses, every employee is going to have their personal assistant agent, but teams will manage a team of agents, and a high-performing team will manage hundreds or thousands of agents,” Cohen said.

That is a more useful enterprise lens than the usual consumer framing. In a real organization, agents are likely to be attached to distinct workflows, data stores and communication surfaces. Finance, support, sales engineering, developer productivity and internal operations may all have different automations, different memory and different access rights. A secure multi-agent future depends less on generalized intelligence than on boundaries: who can see what, which process can touch which file system, and what happens when one agent fails or is compromised.

NanoClaw’s product design is built around that kind of orchestration. The platform sits on top of Claude Code and adds persistent memory, scheduled tasks, messaging integrations and routing logic so agents can be assigned work across channels such as WhatsApp, Telegram, Slack and Discord. The release says this can all be configured from a phone, without writing custom agent code, while each agent remains isolated inside its own container runtime.

Cohen said one practical goal of the Docker integration is to make that deployment model easier to adopt. “People will be able to go to the NanoClaw GitHub, clone the repository, and run a single command,” he said.
“That will get their Docker Sandbox set up running NanoClaw.”

That ease of setup matters because many enterprise AI deployments still fail at the point where promising demos have to become stable systems. Security features that are too hard to deploy or maintain often end up bypassed. A packaging model that lowers friction without weakening boundaries is more likely to survive internal adoption.

An open-source partnership with strategic weight

The partnership is also notable for what it is not. It is not being positioned as an exclusive commercial alliance or a financially engineered enterprise bundle.

“There’s no money involved,” Cavage said. “We found this through the foundation developer community. NanoClaw is open source, and Docker has a long history in open source.”

That may strengthen the announcement rather than weaken it. In infrastructure, the most credible integrations often emerge because two systems fit technically before they fit commercially. Cohen said the relationship began when a Docker developer advocate got NanoClaw running in Docker Sandboxes and demonstrated that the combination worked.

“We were able to put NanoClaw into Docker Sandboxes without making any architecture changes to NanoClaw,” Cohen said. “It just works, because we had a vision of how agents should be deployed and isolated, and Docker was thinking about the same security concerns and arrived at the same design.”

For enterprise buyers, that origin story signals that the integration was not forced into existence by a go-to-market arrangement. It suggests genuine architectural compatibility.

Docker is also careful not to cast NanoClaw as the only framework it will support. Cavage said the company plans to work broadly across the ecosystem, even as NanoClaw appears to be the first “claw” included in Docker’s official packaging.
The implication is that Docker sees a wider market opportunity around secure agent runtime infrastructure, while NanoClaw gains a more recognizable enterprise foundation for its security posture.

The bigger story: infrastructure catching up to agents

The deeper significance of this announcement is that it shifts attention from model capability to runtime design. That may be where the real enterprise competition is heading.

The AI industry has spent the last two years proving that models can reason, code and orchestrate tasks with growing sophistication. The next phase is proving that these systems can be deployed in ways security teams, infrastructure leaders and compliance owners can live with.

NanoClaw has argued from the start that agent security cannot be bolted on at the application layer. Docker is now making a parallel argument from the runtime side. “The world is going to need a different set of infrastructure to catch up to what agents and AI demand,” Cavage said. “They’re clearly going to get more and more autonomous.”

That could turn out to be the central story here. Enterprises do not just need more capable agents. They need better boxes to put them in.

For organizations experimenting with AI agents today, the NanoClaw-Docker integration offers a concrete picture of what that box might look like: open-source orchestration on top, MicroVM-backed isolation underneath, and a deployment model designed around containment rather than trust.

In that sense, this is more than a product integration. It is an early blueprint for how enterprise agent infrastructure may evolve: less emphasis on unconstrained autonomy, more emphasis on bounded autonomy that can survive contact with real production systems.
VentureBeat
Y Combinator-backed Random Labs launches Slate V1, claiming the first ‘swarm-native’ coding agent
The software engineering world is currently wrestling with a fundamental paradox of the AI era: as models become more capable, the “systems problem” of managing them has become the primary bottleneck to real-world productivity. While a developer might have access to the raw intelligence of a frontier model, that intelligence often degrades the moment a task requires a long horizon or a deep context window. But help appears to be on the way: San Francisco-based, Y Combinator-backed startup Random Labs has officially launched Slate V1, described as the industry’s first “swarm-native” autonomous coding agent designed to execute massively parallel, complex engineering tasks.

Emerging from an open beta, the tool utilizes a “dynamic pruning algorithm” to maintain context in large codebases while scaling output to enterprise complexity. Co-founded by Kiran and Mihir Chintawar in 2024, the company aims to bridge the global engineering shortage by positioning Slate as a collaborative tool for the “next 20 million engineers” rather than a replacement for human developers.

With the release of Slate V1, the team at Random Labs is attempting to architect a way out of this degradation zone by introducing the first “swarm-native” agentic coding environment. Slate is not merely a wrapper or a chatbot with file access; it is an implementation of a “hive mind” philosophy designed to scale agentic work with the complexity of a human organization. By leveraging a novel architectural primitive called Thread Weaving, Slate moves beyond the rigid task trees and lossy compaction methods that have defined the first generation of AI coding assistants.

Strategy: Action space

At the heart of Slate’s effectiveness is a deep engagement with Recursive Language Models (RLM). In a traditional setup, an agent might be asked to “fix a bug,” a prompt that forces the model to juggle high-level strategy and low-level execution simultaneously.
Random Labs identifies this as a failure to tap into “knowledge overhang” — the latent intelligence a model possesses but cannot effectively access when it is tactically overwhelmed.

Slate solves this by using a central orchestration thread that essentially “programs in action space.” This orchestrator doesn’t write the code directly; instead, it uses a TypeScript-based DSL to dispatch parallel worker threads to handle specific, bounded tasks. This creates a clear separation between the “kernel” — which manages the execution graph and maintains strategic alignment — and the worker “processes” that execute tactical operations in the terminal. By mapping onto an OS-style framework, inspired by Andrej Karpathy’s “LLM OS” concept, Slate is able to treat the limited context window of a model as precious RAM, actively and intelligently managing what is retained and what is discarded.

Episodic memory and the swarm

The true innovation of the “Thread Weaving” approach lies in how it handles memory. Most agents today rely on “compaction,” which is often just a fancy term for lossy compression that risks dropping critical project state. Slate instead generates “episodes.” When a worker thread completes a task, it doesn’t return a sprawling transcript of every failed attempt; it returns a compressed summary of the successful tool calls and conclusions.

Because these episodes share context directly with the orchestrator rather than relying on brittle message passing, the system maintains a “swarm” intelligence. This architecture allows for massive parallelism. A developer can have Claude Sonnet orchestrating a complex refactor while GPT-5.4 executes code, and GLM 5 — a favorite for its agentic search capabilities — simultaneously researches library documentation in the background.
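The episode pattern is easy to sketch. The toy below is purely illustrative (all names are invented, and Slate's real orchestrator uses a TypeScript DSL, not Python); it shows the one property that matters: workers can do arbitrary amounts of work, but the orchestrator only ever sees compact episode summaries, never raw transcripts.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Episode:
    task: str
    conclusion: str   # compressed summary, not the full transcript
    tool_calls: int   # how many tool invocations the worker needed

def run_worker(task: str) -> Episode:
    # A real worker would run an LLM in a terminal loop; here we fake the
    # work and return only the compressed "episode" summary.
    transcript = [f"attempt {i} on {task}" for i in range(3)]  # discarded
    return Episode(task=task, conclusion=f"{task}: done", tool_calls=len(transcript))

def orchestrate(tasks: list[str]) -> list[Episode]:
    # The orchestrator never sees worker transcripts, only episodes, so its
    # context stays small no matter how many workers run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_worker, tasks))

episodes = orchestrate(["fix bug", "write tests", "update docs"])
```

Running `orchestrate` over three tasks returns three small `Episode` records; the per-worker transcripts are produced and discarded inside the workers, which is the property that keeps the orchestrator's context bounded.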
It’s a similar approach to the one Perplexity takes with its new Computer multi-model agent. By selecting the “right model for the job,” Slate ensures that users aren’t overspending on intelligence for simple tactical steps while still benefiting from the strategic depth of the world’s most powerful models.

The business of autonomy

From a commercial perspective, Random Labs is navigating the early beta period with a mix of transparency and strategic ambiguity. While the company has not yet published a fixed-price subscription sheet, the Slate CLI documentation confirms a shift toward a usage-based credit model. Commands like /usage and /billing allow users to monitor their credit burn in real time, and the inclusion of organization-level billing toggles suggests a clear focus on professional engineering teams rather than solo hobbyists.

There is also a significant play toward integration. Random Labs recently announced that direct support for OpenAI’s Codex and Anthropic’s Claude Code is slated for release next week. This suggests that Slate isn’t trying to compete with these models’ native interfaces, but rather to act as the superior orchestration layer that allows engineers to use all of them at once, safely and cost-effectively.

Architecturally, the system is designed to maximize caching through subthread reuse, a “novel context engineering” trick that the team claims keeps the swarm approach from becoming a financial burden for users.

Stability

Perhaps the most compelling argument for the Slate architecture is its stability. In internal testing, an early version of this threading system managed to pass two-thirds of the tests on the make-mips-interpreter task within the Terminal Bench 2.0 suite. This is a task where even the newest frontier models, like Opus 4.6, often succeed less than 20% of the time when used in standard, non-orchestrated harnesses.

This success in a “mutated” or changing environment is what separates a tool from a partner.
According to Random Labs’ documentation, one fintech founder in NYC described Slate as their “best debugging tool,” a sentiment that echoes the broader goal of Random Labs: to build agents that don’t just complete a prompt, but scale like an organization.

As the industry moves past simple “chat with your code” interfaces, the “Thread Weaving” of Slate V1 offers a glimpse into a future where the primary role of the human engineer is to direct a hive mind of specialized models, each working in concert to solve the long-horizon problems of modern software.
Agents need vector search more than RAG ever did
What’s the role of vector databases in the agentic AI world? That’s a question organizations have been grappling with in recent months.
The narrative had real momentum. As large language models scaled to million-token context windows, a credible argument circulated among enterprise architects: purpose-built vector search was a stopgap, not infrastructure. Agentic memory would absorb the retrieval problem. Vector databases were a RAG-era artifact.

The production evidence is running the other way.

Qdrant, the Berlin-based open source vector search company, announced a $50 million Series B on Thursday, two years after a $28 million Series A. The timing is not incidental. The company is also shipping version 1.17 of its platform. Together, they reflect a specific argument: the retrieval problem did not shrink when agents arrived. It scaled up and got harder.

“Humans make a few queries every few minutes,” Andre Zayarni, Qdrant’s CEO and co-founder, told VentureBeat. “Agents make hundreds or even thousands of queries per second, just gathering information to be able to make decisions.”

That shift changes the infrastructure requirements in ways that RAG-era deployments were never designed to handle.

Why agents need a retrieval layer that memory can’t replace

Agents operate on information they were never trained on: proprietary enterprise data, current information, millions of documents that change continuously. Context windows manage session state. They don’t provide high-recall search across that data, maintain retrieval quality as it changes, or sustain the query volumes autonomous decision-making generates.

“The majority of AI memory frameworks out there are using some kind of vector storage,” Zayarni said. The implication is direct: even the tools positioned as memory alternatives rely on retrieval infrastructure underneath.

Three failure modes surface when that retrieval layer isn’t purpose-built for the load. At document scale, a missed result is not a latency problem — it is a quality-of-decision problem that compounds across every retrieval pass in a single agent turn.
Under write load, relevance degrades because newly ingested data sits in unoptimized segments before indexing catches up, making searches over the freshest data slower and less accurate precisely when current information matters most. Across distributed infrastructure, a single slow replica pushes latency across every parallel tool call in an agent turn — a delay a human user absorbs as an inconvenience but an autonomous agent cannot.

Qdrant’s 1.17 release addresses each directly. A relevance feedback query improves recall by adjusting similarity scoring on the next retrieval pass using lightweight model-generated signals, without retraining the embedding model. A delayed fan-out feature queries a second replica when the first exceeds a configurable latency threshold. A new cluster-wide telemetry API replaces node-by-node troubleshooting with a single view across the entire cluster.

Why Qdrant doesn’t want to be called a vector database anymore

Nearly every major database now supports vectors as a data type — from hyperscalers to traditional relational systems. That shift has changed the competitive question. The data type is now table stakes. What remains specialized is retrieval quality at production scale.

That distinction is why Zayarni no longer wants Qdrant called a vector database.

“We’re building an information retrieval layer for the AI age,” he said. “Databases are for storing user data. If the quality of search results matters, you need a search engine.”

His advice for teams starting out: use whatever vector support is already in your stack. The teams that migrate to purpose-built retrieval do so when scale forces the issue.
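The delayed fan-out described above is a variant of the classic hedged-request pattern, and the pattern itself can be sketched generically in a few lines of Python (this is an illustration of the idea, not Qdrant's API; replica names and timings are invented):

```python
import asyncio

async def query_replica(name: str, delay: float) -> str:
    # Stand-in for a network round trip to one replica.
    await asyncio.sleep(delay)
    return f"results from {name}"

async def hedged_search(threshold: float = 0.05) -> str:
    # Query the primary replica; if it has not answered within `threshold`
    # seconds, send the same query to a backup replica and return whichever
    # response arrives first.
    primary = asyncio.ensure_future(query_replica("primary", 0.2))
    try:
        return await asyncio.wait_for(asyncio.shield(primary), timeout=threshold)
    except asyncio.TimeoutError:
        backup = asyncio.ensure_future(query_replica("backup", 0.01))
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        return done.pop().result()

result = asyncio.run(hedged_search())
```

Because the slow primary here never answers within the threshold, the backup's response wins; in the common case the primary answers in time and the backup is never contacted, so the extra load is paid only on the latency tail.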
“We see companies come to us every day saying they started with Postgres and thought it was good enough — and it’s not,” Zayarni said.

Qdrant’s architecture, written in Rust, gives it memory efficiency and low-level performance control that higher-level languages don’t match at the same cost. The open source foundation compounds that advantage — community feedback and developer adoption are what allow a company at Qdrant’s scale to compete with vendors that have far larger engineering resources.
“Without it, we wouldn’t be where we are right now at all,” Zayarni said.

How two production teams found the limits of general-purpose databases

The companies building production AI systems on Qdrant are making the same argument from different directions: agents need a retrieval layer, and conversational or contextual memory is not a substitute for it.

GlassDollar helps enterprises including Siemens and Mahle evaluate startups. Search is the core product: a user describes a need in natural language and gets back a ranked shortlist from a corpus of millions of companies. The architecture runs query expansion on every request — a single prompt fans out into multiple parallel queries, each retrieving candidates from a different angle, before results are combined and re-ranked. That is an agentic retrieval pattern, not a RAG pattern, and it requires purpose-built search infrastructure to sustain it at volume.

The company migrated from Elasticsearch as it scaled toward 10 million indexed documents. After moving to Qdrant it cut infrastructure costs by roughly 40%, dropped a keyword-based compensation layer it had maintained to offset Elasticsearch’s relevance gaps, and saw a 3x increase in user engagement.

“We measure success by recall,” Kamen Kanev, GlassDollar’s head of product, told VentureBeat. “If the best companies aren’t in the results, nothing else matters. The user loses trust.”

Agentic memory and extended context windows aren’t enough to absorb the workload that GlassDollar needs, either. “That’s an infrastructure problem, not a conversation state management task,” Kanev said. “It’s not something you solve by extending a context window.”

Another Qdrant user is &AI, which is building infrastructure for patent litigation. Its AI agent, Andy, runs semantic search across hundreds of millions of documents spanning decades and multiple jurisdictions.
Patent attorneys will not act on AI-generated legal text, which means every result the agent surfaces has to be grounded in a real document.

“Our whole architecture is designed to minimize hallucination risk by making retrieval the core primitive, not generation,” Herbie Turner, &AI’s founder and CTO, told VentureBeat.

For &AI, the agent layer and the retrieval layer are distinct by design. “Andy, our patent agent, is built on top of Qdrant,” Turner said. “The agent is the interface. The vector database is the ground truth.”

Three signals it’s time to move off your current setup

The practical starting point: use whatever vector capability is already in your stack. The evaluation question isn’t whether to add vector search — it’s when your current setup stops being adequate. Three signals mark that point: retrieval quality is directly tied to business outcomes; query patterns involve expansion, multi-stage re-ranking or parallel tool calls; or data volume crosses into the tens of millions of documents.

At that point the evaluation shifts to operational questions: how much visibility does your current setup give you into what’s happening across a distributed cluster, and how much performance headroom does it have when agent query volumes increase?

“There’s a lot of noise right now about what replaces the retrieval layer,” Kanev said. “But for anyone building a product where retrieval quality is the product, where missing a result has real business consequences, you need dedicated search infrastructure.”
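The expansion-and-re-rank workload described above can be reduced to a small sketch. Everything below is hypothetical (a toy tag-overlap corpus stands in for embeddings, and the expansions are hard-coded where a real system would generate them with an LLM); the point is the shape of the workload: one user request becomes several retrievals plus a merge, which is why agentic query volume multiplies.

```python
# Toy corpus: document id -> the set of tags it matches.
CORPUS = {
    "doc1": {"payments", "fintech"},
    "doc2": {"fintech", "ai"},
    "doc3": {"logistics"},
}

def expand(prompt: str) -> list[set[str]]:
    # A real system would ask an LLM to restate the prompt from several
    # angles; hard-coded here for illustration.
    return [{"fintech"}, {"ai"}, {"payments"}]

def search(angle: set[str]) -> dict[str, int]:
    # Score every document by tag overlap with one query angle.
    return {doc: len(tags & angle) for doc, tags in CORPUS.items()}

def fan_out_search(prompt: str, k: int = 2) -> list[str]:
    # Merge per-angle scores, then re-rank by the combined score.
    combined: dict[str, int] = {}
    for angle in expand(prompt):
        for doc, score in search(angle).items():
            combined[doc] = combined.get(doc, 0) + score
    return sorted(combined, key=lambda d: combined[d], reverse=True)[:k]

top = fan_out_search("startups doing AI payments")
```

One prompt here triggers three searches; at production scale each angle would hit the vector index in parallel, and that per-request multiplication is exactly the load that pushes teams toward dedicated retrieval infrastructure.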
The team behind continuous batching says your idle GPUs should be running inference, not sitting dark
Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.

The obvious workaround is spot GPU markets — renting spare capacity to whoever needs it. But spot instances mean the cloud vendor is still the one doing the renting, and engineers buying that capacity are still paying for raw compute with no inference stack attached. FriendliAI’s answer is different: run inference directly on the unused hardware, optimize for token throughput, and split the revenue with the operator.

FriendliAI was founded by Byung-Gon Chun, the researcher whose paper on continuous batching became foundational to vLLM, the open source inference engine used across most production deployments today. Chun spent over a decade as a professor at Seoul National University studying efficient execution of machine learning models at scale. That research produced a paper called Orca, which introduced continuous batching. The technique processes inference requests dynamically rather than waiting to fill a fixed batch before executing. It is now industry standard and is the core mechanism inside vLLM.

This week, FriendliAI is launching a new platform called InferenceSense. Just as publishers use Google AdSense to monetize unsold ad inventory, neocloud operators can use InferenceSense to fill unused GPU cycles with paid AI inference workloads and collect a share of the token revenue. The operator’s own jobs always take priority — the moment a scheduler reclaims a GPU, InferenceSense yields.

“What we are providing is that instead of letting GPUs be idle, by running inferences they can monetize those idle GPUs,” Chun told VentureBeat.

How a Seoul National University lab built the engine inside vLLM

Chun founded FriendliAI in 2021, before most of the industry had shifted attention from training to inference.
The company’s primary product is a dedicated inference endpoint service for AI startups and enterprises running open-weight models. FriendliAI also appears as a deployment option on Hugging Face alongside Azure, AWS and GCP, and currently supports more than 500,000 open-weight models from the platform. InferenceSense now extends that inference engine to the capacity problem GPU operators face between workloads.

How it works

InferenceSense runs on top of Kubernetes, which most neocloud operators are already using for resource orchestration. An operator allocates a pool of GPUs to a Kubernetes cluster managed by FriendliAI — declaring which nodes are available and under what conditions they can be reclaimed. Idle detection runs through Kubernetes itself.

“We have our own orchestrator that runs on the GPUs of these neocloud — or just cloud — vendors,” Chun said. “We definitely take advantage of Kubernetes, but the software running on top is a really highly optimized inference stack.”

When GPUs are unused, InferenceSense spins up isolated containers serving paid inference workloads on open-weight models including DeepSeek, Qwen, Kimi, GLM and MiniMax. When the operator’s scheduler needs hardware back, the inference workloads are preempted and the GPUs are returned. FriendliAI says the handoff happens within seconds.

Demand is aggregated through FriendliAI’s direct clients and through inference aggregators like OpenRouter. The operator supplies the capacity; FriendliAI handles the demand pipeline, model optimization and serving stack. There are no upfront fees and no minimum commitments. A real-time dashboard shows operators which models are running, tokens being processed and revenue accrued.

Why token throughput beats raw capacity rental

Spot GPU markets from providers like CoreWeave, Lambda Labs and RunPod involve the cloud vendor renting out its own hardware to a third party.
InferenceSense runs on hardware the neocloud operator already owns, with the operator defining which nodes participate and setting scheduling agreements with FriendliAI in advance. The distinction matters: spot markets monetize capacity; InferenceSense monetizes tokens.

Token throughput per GPU-hour determines how much InferenceSense can actually earn during unused windows. FriendliAI claims its engine delivers two to three times the throughput of a standard vLLM deployment, though Chun notes the figure varies by workload type.
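Continuous batching itself, the technique Chun's Orca paper introduced, is simple to illustrate. The toy scheduler below (an illustration of the idea, not FriendliAI's engine) admits waiting requests into the batch the moment a slot frees up, instead of draining the whole batch first:

```python
def continuous_batching(requests, max_batch=2):
    # Each request is (name, decode_steps_needed). At every decode step,
    # finished sequences leave the batch and waiting requests join
    # immediately; nobody waits for the whole batch to drain.
    waiting = [list(r) for r in requests]
    running, finished, steps = [], [], 0
    while waiting or running:
        while waiting and len(running) < max_batch:
            running.append(waiting.pop(0))   # fill any free batch slot
        steps += 1                           # one decode step for the whole batch
        for seq in running:
            seq[1] -= 1
        finished += [name for name, left in running if left == 0]
        running = [seq for seq in running if seq[1] > 0]
    return finished, steps

order, steps = continuous_batching([("a", 1), ("b", 3), ("c", 1)])
```

With requests needing 1, 3 and 1 decode steps and a batch size of 2, this finishes in 3 steps; a static scheduler that waits for each full batch to drain would need 4, because the first batch runs max(1, 3) = 3 steps before "c" can even start. That gap is the throughput continuous batching recovers, and it widens as request lengths vary more.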
Most competing inference stacks are built on Python-based open source frameworks. FriendliAI’s engine is written in C++ and uses custom GPU kernels rather than Nvidia’s cuDNN library. The company has built its own model representation layer for partitioning and executing models across hardware, with its own implementations of speculative decoding, quantization and KV-cache management.

Since FriendliAI’s engine processes more tokens per GPU-hour than a standard vLLM stack, operators should generate more revenue per unused cycle than they could by standing up their own inference service.

What AI engineers evaluating inference costs should watch

For AI engineers evaluating where to run inference workloads, the neocloud versus hyperscaler decision has typically come down to price and availability. InferenceSense adds a new consideration: if neoclouds can monetize idle capacity through inference, they have more economic incentive to keep token prices competitive.

That is not a reason to change infrastructure decisions today — it is still early. But engineers tracking total inference cost should watch whether neocloud adoption of platforms like InferenceSense puts downward pressure on API pricing for models like DeepSeek and Qwen over the next 12 months.
“When we have more efficient suppliers, the overall cost will go down,” Chun said. “With InferenceSense we can contribute to making those models cheaper.”
Nvidia’s new open-weights Nemotron 3 Super combines three different architectures to beat gpt-oss and Qwen in throughput
Multi-agent systems, designed to handle long-horizon tasks like software engineering or cybersecurity triaging, can generate up to 15 times the token volume of standard chats — threatening their cost-effectiveness in handling enterprise tasks. But today, Nvidia sought to help solve this problem with the release of Nemotron 3 Super, a 120-billion-parameter hybrid model, with weights posted on Hugging Face.

By merging disparate architectural philosophies — state-space models, transformers and a novel “latent” mixture-of-experts design — Nvidia is attempting to provide the specialized depth required for agentic workflows without the bloat typical of dense reasoning models, all available for commercial usage under mostly open weights.

Triple hybrid architecture

At the core of Nemotron 3 Super is a sophisticated architectural triad that balances memory efficiency with precision reasoning. The model utilizes a hybrid Mamba-Transformer backbone, which interleaves Mamba-2 layers with strategic Transformer attention layers.

To understand the implications for enterprise production, consider the “needle in a haystack” problem. Mamba-2 layers act like a “fast-travel” highway system, handling the vast majority of sequence processing with linear-time complexity. This allows the model to maintain a massive 1-million-token context window without the memory footprint of the KV cache exploding. However, pure state-space models often struggle with associative recall. To fix this, Nvidia strategically inserts Transformer attention layers as “global anchors,” ensuring the model can precisely retrieve specific facts buried deep within a codebase or a stack of financial reports.

Beyond the backbone, the model introduces Latent Mixture-of-Experts (LatentMoE). Traditional Mixture-of-Experts (MoE) designs route tokens to experts in their full hidden dimension, which creates a computational bottleneck as models scale.
LatentMoE solves this by projecting tokens into a compressed space before routing them to specialists. This “expert compression” allows the model to consult four times as many specialists for the exact same computational cost. This granularity is vital for agents that must switch between Python syntax, SQL logic, and conversational reasoning within a single turn.

Further accelerating the model is Multi-Token Prediction (MTP). While standard models predict a single next token, MTP predicts several future tokens simultaneously. This serves as a “built-in draft model,” enabling native speculative decoding that can deliver up to 3x wall-clock speedups for structured generation tasks like code or tool calls.

The Blackwell advantage

For enterprises, the most significant technical leap in Nemotron 3 Super is its optimization for the Nvidia Blackwell GPU platform. By pre-training natively in NVFP4 (4-bit floating point), Nvidia has achieved a breakthrough in production efficiency. On Blackwell, the model delivers 4x faster inference than 8-bit models running on the previous Hopper architecture, with no loss in accuracy.

In practical performance, Nemotron 3 Super is a specialized tool for agentic reasoning. It currently holds the No. 1 position on the DeepResearch Bench, a benchmark measuring an AI’s ability to conduct thorough, multi-step research across large document sets.
| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| General Knowledge | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| Reasoning | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb25 (with tools) | 94.73 | 89.55 | — |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | — | 80.09 |
| LiveCodeBench (v5 2024-07↔2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | — | 19.0 |
| Agentic | | | |
| Terminal Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal Bench Core 2.0 | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.9 |
| SWE-Bench (OpenCode) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| SWE-Bench Multilingual (OpenHands) | 45.78 | — | 30.80 |
| TauBench V2: Airline | 56.25 | 66.0 | 49.2 |
| TauBench V2: Retail | 62.83 | 62.6 | 67.80 |
| TauBench V2: Telecom | 64.36 | 95.00 | 66.00 |
| TauBench V2: Average | 61.15 | 74.53 | 61.0 |
| BrowseComp with Search | 31.28 | — | 33.89 |
| BIRD Bench | 41.80 | — | 38.25 |
| Chat & Instruction Following | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI Multi-Challenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| Long Context | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| Multilingual | | | |
| MMLU-ProX (avg over langs) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

It also demonstrates significant throughput advantages, achieving up to 2.2x higher throughput than gpt-oss-120B and 7.5x higher than Qwen3.5-122B in high-volume settings.

Custom ‘open’ license — commercial usage but with important caveats

The release of Nemotron 3 Super under the Nvidia Open Model License Agreement (updated October 2025) provides a permissive framework for enterprise adoption, though it carries distinct “safeguard” clauses that differentiate it from pure open-source licenses like MIT or Apache 2.0.

Key Provisions for Enterprise Users:

Commercial Usability: The license explicitly states that models are “commercially usable” and grants a perpetual, worldwide, royalty-free license to sell and distribute products built on the model.

Ownership of Output: Nvidia makes no claim to the outputs generated by the model; the responsibility for those outputs—and the ownership of them—rests entirely with the user.

Derivative Works: Enterprises are free to create and own “Derivative Models” (fine-tuned versions), provided they include the required attribution notice: “Licensed by Nvidia Corporation under the Nvidia Open Model License.”

The “Red Lines”: The license includes two critical termination triggers that production teams must monitor:

Safety Guardrails: The license automatically terminates if a user bypasses or circumvents the model’s “Guardrails” (technical limitations or safety hyperparameters) without implementing a “substantially similar” replacement appropriate for the use case.

Litigation Trigger: If a user institutes copyright or patent litigation against Nvidia alleging that the model infringes on their IP, their license to use the model terminates immediately.

This structure allows Nvidia to foster a commercial ecosystem while protecting itself from “IP trolling” and ensuring that the model isn’t stripped of its safety features for malicious use.

‘The team really cooked’

The release has generated significant buzz within the developer community. Chris Alexiuk, a Senior Product Research Engineer at Nvidia, heralded the launch on X under his handle @llm_wizard as a “SUPER DAY,” emphasizing the model’s speed and transparency. “Model is: FAST. Model is: SMART.
Model is: THE MOST OPEN MODEL WE’VE DONE YET,” Chris posted, highlighting the release of not just weights, but 10 trillion tokens of training data and recipes.

Industry adoption reflects this enthusiasm:

Cloud and Hardware: The model is being deployed as an Nvidia NIM microservice, allowing it to run on-premises via the Dell AI Factory or HPE, as well as across Google Cloud, Oracle, and shortly, AWS and Azure.

Production Agents: Companies like CodeRabbit (software development) and Greptile are integrating the model to handle large-scale codebase analysis, while industrial leaders like Siemens and Palantir are deploying it to automate complex workflows in manufacturing and cybersecurity.

As Kari Briski, Nvidia VP of AI Software, noted: “As companies move beyond chatbots and into multi-agent applications, they encounter… context explosion.” Nemotron 3 Super is Nvidia’s answer to that explosion—a model that provides the “brainpower” of a 120B-parameter system with the operational efficiency of a much smaller specialist. For the enterprise, the message is clear: the “thinking tax” is finally coming down.
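The LatentMoE “expert compression” described earlier can be sketched in a few lines of numpy. This is a toy illustration with invented shapes and weights, not Nemotron’s actual implementation; the point is only the order of operations, in which routing and expert math happen in a compressed latent space:

```python
import numpy as np

# Toy Latent Mixture-of-Experts routing. All dimensions and weights
# are invented for illustration; this is not Nvidia's implementation.
rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 64, 16, 8, 2

W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)   # compress the token
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)    # decompress the result
router = rng.normal(size=(d_latent, n_experts))                    # routing lives in latent space
experts = rng.normal(size=(n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x):
    z = x @ W_down                        # project into the compressed latent space
    logits = z @ router                   # router scores computed on the cheap latent vector
    chosen = np.argsort(logits)[-top_k:]  # keep only the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                  # softmax over the selected experts
    mixed = sum(g * (z @ experts[i]) for g, i in zip(gates, chosen))
    return mixed @ W_up                   # project back to the model dimension

y = latent_moe(rng.normal(size=d_model))
print(y.shape)  # (64,)
```

Because the per-expert matrices operate on the 16-dimensional latent vector rather than the 64-dimensional token, each expert is far cheaper, which is what lets a model afford many more specialists at the same compute budget.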
Anthropic gives Claude shared context across Microsoft Excel and PowerPoint, enabling reusable workflows in multiple applications
Anthropic has upgraded its Claude AI model with new capabilities for Microsoft Excel and PowerPoint, marking a strategic move to expand its enterprise footprint and potentially challenging Microsoft’s newly launched Copilot Cowork — which Claude also partially powers. The updated add-ins are available to Mac and Windows users on paid Claude plans starting today, March 11.

Anthropic is also expanding how enterprises can deploy the tools. Claude for Excel and Claude for PowerPoint can now be accessed either through a Claude account or through an existing LLM gateway routing to Claude models on Amazon Bedrock, Google Cloud Vertex AI or Microsoft Foundry. That gives enterprises more flexibility to use the add-ins within cloud and compliance setups they may already have in place.

Shared context across Office apps

Starting March 11, paid Claude users on Mac and Windows can access a new beta experience in which Claude for Excel and Claude for PowerPoint share the full context of a user’s conversation with the AI model between the two applications — no need to manually copy and paste it over.

That means Claude can carry information, instructions and task history between an open spreadsheet and an open presentation in a single continuous session. For example, Claude can write formulas to extract data from an Excel workbook and immediately apply it to a stylized PowerPoint slide in the same session.

“In practice: a financial analyst can ask Claude to pull comparable company financials from an open workbook, build out a trading comps table in Excel, drop the valuation summary into the pitch deck, and draft the email to the MD—without switching tabs or re-explaining the dataset at each step,” Anthropic said in a press release.
This builds on Anthropic’s release of a Claude plugin for Excel back in October 2025.

Repeatable workflows inside applications

A central feature of this launch is Skills, which allows teams to build and save repeatable workflows directly inside the Excel and PowerPoint sidebars. Rather than re-uploading references or re-prompting instructions, users can save standardized processes—such as specific variance analyses or approved slide templates—as one-click actions available to the entire organization.

That could include workflows for recurring financial analysis, preparing presentations in a preferred house style or running common review steps that would otherwise need to be rewritten as prompts each time. Anthropic said every Skill, whether personal or organization-wide, will work inside the add-ins the same way MCP connectors do. “Workflows that previously lived in one person’s head become one-click actions available to the whole organization,” the company said.

Anthropic distinguishes these Skills from Instructions, which let users set persistent preferences across the add-ins, such as preferred number formatting in Excel or presentation-writing rules in PowerPoint.

Anthropic is also shipping a preloaded starter set of Skills, including:

Excel: Auditing models for formula errors, populating DCF and LBO templates, and cleaning messy data ranges.

PowerPoint: Building competitive landscape decks and reviewing investment banking materials for narrative alignment.

Similarly, Microsoft’s new Copilot Cowork capability introduced on Monday enables enterprise users to deploy agents to complete tasks across Microsoft applications such as Excel and PowerPoint.
The software giant openly stated that it was built in conjunction with Anthropic. Anthropic itself released a stand-alone Claude Cowork application for Mac and Windows earlier this year, offering a way for Claude to access, edit, create and move information between files on a user’s computer, autonomously, at the user’s direction.

Previously, even with autonomous tools like the standalone Claude Cowork app, users often had to ask the AI to complete tasks in separate steps for each application. Now, Claude maintains a continuous session that reads live data and writes formulas across both apps simultaneously.

Battle of the enterprise app agents

Ever since the launch of Claude Cowork earlier this year, Anthropic has been making a case to be the chat and productivity platform of choice for enterprises. Competitors like Google, with its close association with Google Workspace, which includes Gmail and Google Docs, and Microsoft, with its continued leadership in the Office suite, can directly bring AI capabilities to users’ workflows.

Anthropic did not present the new Skills feature as equivalent to the more autonomous, agentic behavior Microsoft is now emphasizing with its own Copilot Cowork. But the release does show Anthropic steadily expanding beyond chatbot use cases and into more structured, repeatable work inside the applications many business users already rely on.

Anthropic, through Claude Cowork, Claude Code and the Claude model family, has worked its way into many organizations’ systems, leveraging its strong performance in coding benchmarks and general knowledge to navigate a computer and complete knowledge work rapidly, at scale, with high quality. OpenClaw, the open-source AI agent that has taken the developer world by storm, owes much of its existence to Claude Code.

The result is another sign that the battle over enterprise AI is no longer just about which model performs best on benchmarks.
It is increasingly about what AI tools and systems enterprises trust to get real work done across their existing applications, files, and workflows.
Google’s Gemini Embedding 2 arrives with native multimodal support to cut costs and speed up your enterprise data stack
Yesterday, amid a flurry of enterprise AI product updates, Google announced arguably its most significant one for enterprise customers: the public preview availability of Gemini Embedding 2, its new embeddings model — a significant evolution in how machines represent and retrieve information across different media types. While previous embedding models were largely restricted to text, this new model natively integrates text, images, video, audio, and documents into a single numerical space — reducing latency by as much as 70% for some customers and reducing total cost for enterprises who use AI models powered by their own data to complete business tasks.

Who needs and uses an embedding model?

For those who have encountered the term “embeddings” in AI discussions but find it abstract, a useful analogy is that of a universal library. In a traditional library, books are organized by metadata: author, title, or genre. In the “embedding space” of an AI, information is organized by ideas.

Imagine a library where books aren’t organized by the Dewey Decimal System, but by their “vibe” or “essence.” In this library, a biography of Steve Jobs would physically fly across the room to sit next to a technical manual for a Macintosh. A poem about a sunset would drift toward a photography book of the Pacific Coast, with all thematically similar content organized in beautiful hovering “clouds” of books. This is essentially what an embedding model does.

An embedding model takes complex data—like a sentence, a photo of a sunset, or a snippet of a podcast—and converts it into a long list of numbers called a vector. These numbers represent coordinates in a high-dimensional map. If two items are “semantically” similar (e.g., a photo of a golden retriever and the text “man’s best friend”), the model places their coordinates very close to each other in this map.
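That geometric intuition can be made concrete with a few toy vectors. The numbers below are invented purely for illustration (real embedding models output thousands of dimensions), but they show how cosine similarity turns “closeness of meaning” into simple arithmetic:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: close to 1.0 means same direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 4-dimensional "embeddings"; real models use thousands of dimensions.
dog_photo   = np.array([0.9, 0.8, 0.1, 0.0])   # a photo of a golden retriever
dog_caption = np.array([0.8, 0.9, 0.2, 0.1])   # the text "man's best friend"
tax_form    = np.array([0.0, 0.1, 0.9, 0.8])   # an unrelated document

# The semantically related pair sits far closer together in the vector space.
assert cosine(dog_photo, dog_caption) > cosine(dog_photo, tax_form)
```

A retrieval system is, at heart, this comparison repeated across millions of stored vectors: embed the query, then return the items whose coordinates score highest.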
Today, these models are the invisible engine behind:

Search Engines: Finding results based on what you mean, not just the specific words you typed.

Recommendation Systems: Netflix or Spotify suggesting content because its “coordinates” are near things you already like.

Enterprise AI: Large companies use them for Retrieval-Augmented Generation (RAG), where an AI assistant “looks up” a company’s internal PDFs to answer an employee’s question accurately.

The concept of mapping words to vectors dates back to the 1950s with linguists like John Rupert Firth, but the modern “vector revolution” began in the early 2000s when Yoshua Bengio’s team first used the term “word embeddings.” The real breakthrough for the industry was Word2Vec, released by a team at Google led by Tomas Mikolov in 2013.

Today, the market is led by a handful of major players:

OpenAI: Known for its widely used text-embedding-3 series.

Google: With the new Gemini and previous Gecko models.

Anthropic and Cohere: Providing specialized models for enterprise search and developer workflows.

By moving beyond text to a natively multimodal architecture, Google is attempting to create a singular, unified map for the sum of human digital expression—text, images, video, audio, and documents—all residing in the same mathematical neighborhood.

Why Gemini Embedding 2 is such a big deal

Most leading models are still “text-first.” If you want to search a video library, the AI usually has to transcribe the video into text first, then embed that text. Google’s Gemini Embedding 2 is natively multimodal. As Logan Kilpatrick of Google DeepMind posted on X, the model allows developers to “bring text, images, video, audio, and docs into the same embedding space.”

It understands audio as sound waves and video as motion directly, without needing to turn them into text first.
This reduces “translation” errors and captures nuances that text alone might miss. For developers and enterprises, the “natively multimodal” nature of Gemini Embedding 2 represents a shift toward more efficient AI pipelines. By mapping all media into a single 3,072-dimensional space, developers no longer need separate systems for image search and text search; they can perform “cross-modal” retrieval—using a text query to find a specific moment in a video or an image that matches a specific sound.

And unlike its predecessors, Gemini Embedding 2 can process requests that mix modalities. A developer can send a request containing both an image of a vintage car and the text “What is the engine type?” The model doesn’t process them separately; it treats them as a single, nuanced concept. This allows for a much deeper understanding of real-world data where the “meaning” is often found in the intersection of what we see and what we say.

One of the model’s more technical features is Matryoshka Representation Learning. Named after Russian nesting dolls, this technique allows the model to “nest” the most important information in the first few numbers of the vector. An enterprise can choose to use the full 3,072 dimensions for maximum precision, or “truncate” them down to 768 or 1,536 dimensions to save on database storage costs with minimal loss in accuracy.

Benchmarking the performance gains of moving to multimodal

Gemini Embedding 2 establishes a new performance ceiling for multimodal depth, specifically outperforming previous industry leaders across text, image, and video evaluation tasks. The model’s most significant lead is in video and audio retrieval, where its native architecture allows it to bypass the performance degradation typically associated with text-based transcription pipelines.
Specifically, in video-to-text and text-to-video retrieval tasks, the model demonstrates a measurable performance gap over existing industry leaders, accurately mapping motion and temporal data into a unified semantic space. The technical results show a distinct advantage in the following standardized categories:

Multimodal Retrieval: Gemini Embedding 2 consistently outperforms leading text and vision models in complex retrieval tasks that require understanding the relationship between visual elements and textual queries.

Speech and Audio Depth: The model introduces a new standard for native audio embeddings, achieving higher accuracy in capturing phonetic and tonal intent compared to models that rely on intermediate text transcription.

Contextual Scaling: In text-based benchmarks, the model maintains high precision while utilizing its expansive 8,192-token context window, ensuring that long-form documents are embedded with the same semantic density as shorter snippets.

Dimension Flexibility: Testing across the Matryoshka Representation Learning (MRL) layers reveals that even when truncated to 768 dimensions, the model retains a significant majority of its 3,072-dimension performance, outperforming fixed-dimension models of similar size.

What it means for enterprise databases

For the modern enterprise, information is often a fragmented mess. A single customer issue might involve a recorded support call (audio), a screenshot of an error (image), a PDF of a contract (document), and a series of emails (text). In previous years, searching across these formats required four different pipelines. With Gemini Embedding 2, an enterprise can create a Unified Knowledge Base.
This enables a more advanced form of RAG, wherein a company’s internal AI doesn’t just look up facts, but understands the relationship between them regardless of format. Early partners are already reporting drastic efficiency gains:

Sparkonomy, a creator economy platform, reported that the model’s native multimodality slashed their latency by up to 70%. By removing the need for intermediate LLM “inference” (the step where one model explains a video to another), they nearly doubled their semantic similarity scores for matching creators with brands.

Everlaw, a legal tech firm, is using the model to navigate the “high-stakes setting” of litigation discovery. In legal cases where millions of records must be parsed, Gemini’s ability to index images and videos alongside text allows legal professionals to find “smoking gun” evidence that traditional text search would miss.

Understanding the limits

In its announcement, Google was upfront about some of the current limitations of Gemini Embedding 2. The new model can vectorize individual inputs that comprise as many as 8,192 text tokens, 6 images (in a single batch), 128 seconds of video (2 minutes, 8 seconds), 80 seconds of native audio (1 minute, 20 seconds), and a 6-page PDF.

It is vital to clarify that these are input limits per request, not a cap on what the system can remember or store. Think of it like a scanner. If a scanner has a limit of “one page at a time,” it doesn’t mean you can only ever scan one page; it means you have to feed the pages in one by one.

Individual File Size: You cannot “embed” a 100-page PDF in a single call. You must “chunk” the document—splitting it into segments of 6 pages or fewer—and send each segment to the model individually.

Cumulative Knowledge: Once those chunks are converted into vectors, they can all live together in your database.
You can have a database containing ten million 6-page PDFs, and the model will be able to search across all of them simultaneously.

Video and Audio: Similarly, if you have a 10-minute video, you would break it into 128-second segments to create a searchable “timeline” of embeddings.

Licensing, pricing, and availability

As of March 10, 2026, Gemini Embedding 2 is officially in Public Preview. For developers and enterprise leaders, this means the model is accessible for immediate testing and production integration, though it is still subject to the iterative refinements typical of “preview” software before it reaches General Availability (GA).

The model is deployed across Google’s two primary AI gateways, each catering to a different scale of operation:

Gemini API: Targeted at rapid prototyping and individual developers, this path offers a simplified pricing structure.

Vertex AI (Google Cloud): The enterprise-grade environment designed for massive scale, offering advanced security controls and integration with the broader Google Cloud ecosystem.

It’s also already integrated with the heavy hitters of AI infrastructure: LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, and ChromaDB.

In the Gemini API, Google has introduced a tiered pricing model that distinguishes between “standard” data (text, images, and video) and “native” audio.

The Free Tier: Developers can experiment with the model at no cost, though this tier comes with rate limits (typically 60 requests per minute) and uses data to improve Google’s products.

The Paid Tier: For production-level volume, the cost is calculated per million tokens.
For text, image, and video inputs, the rate is $0.25 per 1 million tokens.

The “Audio Premium”: Because the model natively ingests audio data without intermediate transcription—a more computationally intensive task—the rate for audio inputs is doubled to $0.50 per 1 million tokens.

For large-scale deployments on Vertex AI, the pricing follows an enterprise-centric “Pay-as-you-go” (PayGo) model. This allows organizations to pay for exactly what they use across different processing modes:

Flex PayGo: Best for unpredictable, bursty workloads.

Provisioned Throughput: Designed for enterprises that require guaranteed capacity and consistent latency for high-traffic applications.

Batch Prediction: Ideal for re-indexing massive historical archives, where time-sensitivity is lower but volume is extremely high.

By making the model available through these diverse channels and integrating it natively with libraries like LangChain, LlamaIndex, and Weaviate, Google has ensured that the “switching cost” for businesses isn’t just a matter of price, but of operational ease. Whether a startup is building its first RAG-based assistant or a multinational is unifying decades of disparate media archives, the infrastructure is now live and globally accessible.

In addition, the official Gemini API and Vertex AI Colab notebooks, which contain the Python code necessary to implement these features, are licensed under the Apache License, Version 2.0. The Apache 2.0 license is highly regarded in the tech community because it is “permissive.” It allows developers to take Google’s implementation code, modify it, and use it in their own commercial products without having to pay royalties or “open source” their own proprietary code in return.

How enterprises should respond: migrate to Gemini Embedding 2 or not?

For Chief Data Officers and technical leads, the decision to migrate to Gemini Embedding 2 hinges on the transition from a “text-plus” strategy to a “natively multimodal” one.
If your organization currently relies on fragmented pipelines — where images and videos are first transcribed or tagged by separate models before being indexed — the upgrade is likely a strategic necessity. This model eliminates the “translation tax” of using intermediate LLMs to describe visual or auditory data, a move that partners like Sparkonomy found reduced latency by up to 70% while doubling semantic similarity scores. For businesses managing massive, diverse datasets, this isn’t just a performance boost; it is a structural simplification that reduces the number of points where “meaning” can be lost or distorted.

The effort to switch from a text-only foundation is lower than one might expect due to what early users describe as excellent “API continuity.” Because the model integrates with industry-standard frameworks like LangChain, LlamaIndex, and Vector Search, it can often be “dropped into” existing workflows with minimal code changes. However, the real cost and energy investment lies in re-indexing. Moving to this model requires re-embedding your existing corpus to ensure all data points exist in the same 3,072-dimensional space. While this is a one-time computational hurdle, it is the prerequisite for unlocking cross-modal search — where a simple text query can suddenly “see” into your video archives or “hear” specific customer sentiment in call recordings.

The primary trade-off for data leaders to weigh is the balance between high-fidelity retrieval and long-term storage economics. Gemini Embedding 2 addresses this directly through Matryoshka Representation Learning (MRL), which allows you to truncate vectors from 3,072 dimensions down to 768 without a proportional drop in quality.

This gives CDOs a tactical lever: you can choose maximum precision for high-stakes legal or medical discovery — as seen in Everlaw’s 20% lift in recall — while utilizing smaller, more efficient vectors for lower-priority recommendation engines to keep cloud storage costs in check.
Ultimately, the ROI is found in the “lift” of accuracy; in a landscape where an AI’s value is defined by its context, the ability to natively index a 6-page PDF or 128 seconds of video directly into a knowledge base provides a depth of insight that text-only models simply cannot replicate.
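The Matryoshka truncation lever discussed above is mechanically simple: keep a vector’s leading coordinates and re-normalize. A minimal sketch, assuming the usual MRL convention (the minimal quality loss comes from how the model was trained, not from the slicing itself):

```python
import numpy as np

def truncate_mrl(vec, dims):
    """Keep the leading `dims` coordinates of an MRL-trained embedding
    and re-normalize to unit length, per the usual Matryoshka convention."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(1)
full = rng.normal(size=3072)      # stand-in for a full-precision embedding
full /= np.linalg.norm(full)

small = truncate_mrl(full, 768)   # 4x cheaper to store and compare
print(small.shape)  # (768,)
```

The storage math follows directly: at float32, a 3,072-dimensional vector costs about 12 KB, while its 768-dimensional truncation costs about 3 KB, so the same database budget holds four times as many items.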
Manufact raises $6.3M as MCP becomes the ‘USB-C for AI’ powering ChatGPT and Claude apps
For decades, software companies designed their products for a single type of customer: a human being staring at a screen. Every button, menu, and dashboard existed to translate a person’s intention into a machine’s action. But a small startup based in San Francisco and Zurich believes that era is ending — and that the future belongs to companies that build software not for people, but for the artificial intelligence agents that increasingly act on their behalf.

Manufact, a three-person company that emerged from Y Combinator’s Summer 2025 batch, announced in February that it raised $6.3 million in seed funding led by Peak XV, the venture capital firm formerly known as Sequoia Capital India and Southeast Asia, which now manages more than $10 billion in assets. Liquid 2 Ventures, Ritual Capital, Pioneer Fund, and Y Combinator also participated in the round, alongside angel investors including the co-founder and chief operating officer of Supabase.

The company’s thesis is deceptively simple and potentially enormous: as AI agents take over more of the work that humans perform inside software applications — filing expense reports, managing customer support tickets, writing code, booking travel — every software product on earth will need a new kind of interface designed specifically for those agents. Manufact is building the open-source tools and cloud infrastructure to make that transition possible.

“Software products are already being accessed by and will be accessed mainly by AI agents, or by users through chat interfaces,” Luigi Pederzani, co-founder and co-CEO of Manufact, said in an interview with VentureBeat. “That’s our bet. That’s our thesis.
And that’s what we are really rooting our company on.”

How Anthropic’s Model Context Protocol became the universal standard for AI agents

To understand Manufact, you first have to understand the technology it is built on: the Model Context Protocol, or MCP, an open standard introduced by Anthropic in late 2024 that has rapidly become the dominant way for AI agents to communicate with external software tools and data sources.

Before MCP, connecting an AI agent to a company’s software required custom integration work for every single tool — a bespoke connector for Slack, another for Salesforce, another for a database. It was tedious, expensive, and fragile. MCP standardized this process into a single protocol, functioning as what CIO magazine recently called “the USB-C of AI” — a universal connector that lets any AI model plug into any software system through a single, consistent interface.

The adoption has been explosive. In December 2025, Anthropic donated MCP to the Linux Foundation’s new Agentic AI Foundation, co-founded with Block and OpenAI, with support from Google, Microsoft, Amazon Web Services, and Cloudflare. More than 10,000 active public MCP servers now operate across the ecosystem. ChatGPT, Cursor, Google Gemini, Microsoft Copilot, and Visual Studio Code all support the protocol. Enterprise-grade deployment infrastructure exists from AWS, Cloudflare, Google Cloud, and Microsoft Azure. An estimated 7 million downloads of MCP servers occur every month.

“Great protocols are as good as their adoption,” Pederzani said, drawing a comparison to the mobile revolution. “We saw the same transition with mobile, right? In the beginning, companies were just creating a pretty simple mobile app. Who would have bought a hotel or a flight or used a bank account from a mobile app? But as time passed, the web became mobile first. What we think is that software products will be MCP first, or chat first.”

The stakes are high.
The global AI agents market reached $7.84 billion in 2025 and is projected to surge to $52.62 billion by 2030, according to industry analysts. The MCP Dev Summit, the largest conference dedicated to the protocol, takes place April 2–3 in New York City under the Linux Foundation’s banner, with speakers from Docker, Workato, and major cloud providers — and Manufact will be among the companies presenting.

Two Italian founders, a Zurich co-working space, and an open-source library that went viral

Manufact’s origin story reads like a case study in the power of open-source communities to validate a startup idea before a single dollar of venture capital is raised.

Pietro Zullo and Luigi Pederzani, both originally from Italy, met at a co-working space in Zurich — the same space that produced Browser Use, Bloom, and other startups that went through YC in previous batches. Zullo was studying at ETH Zurich; Pederzani was working at Morgen, an ETH spin-off AI startup used by teams at Spotify, GitHub, and Linear, after leading a 12-engineer team at Accenture Switzerland. Both were winding down previous projects in early 2025 when MCP launched.

“We both wrote agents in the past, and it was such a mess to write the tools, the integrations,” Zullo recalled. “When MCP came out, it looked like the perfect fit for what we were trying to do. But only Cursor, Claude Code, a few closed-source applications allowed you to actually use the protocol. I don’t think I’m going to do groceries or browse the internet or check my emails from Cursor — it’s like, not the right code, right? So we wrote an open-source library to basically do what you could do in Cursor with MCP servers, but on your own machine, on your own application, in your own terms.”

They called the library mcp-use, with a slogan that resonated across the developer community: “Connect any MCP to any LLM in six lines of code.” The repository attracted 2,000 to 2,500 GitHub stars within weeks.
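Part of why a small library can connect “any MCP to any LLM” is that the protocol underneath is plain JSON-RPC 2.0. A minimal sketch of the messages an agent exchanges with an MCP server; the method names (“tools/list”, “tools/call”) come from the published MCP specification, while the tool name and arguments here are hypothetical:

```python
import json

# Illustrative MCP-style JSON-RPC 2.0 messages. The method names
# ("tools/list", "tools/call") follow the published MCP spec; the
# tool name and its arguments are invented for this example.
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_tickets",               # hypothetical tool exposed by a server
        "arguments": {"query": "open bugs"},
    },
}

wire = json.dumps(call_tool)  # what actually travels between agent and server
print(json.loads(wire)["method"])  # tools/call
```

Because every server speaks this same message shape, a client library only has to implement the envelope once; each new tool is just a different `name` and `arguments` payload rather than a bespoke integration.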
Today, the SDK has surpassed 5 million downloads and 9,000 GitHub stars. Organizations including NASA, Nvidia, and SAP use the library, and Manufact claims that 20 percent of the US 500 have experimented with it.

“The amount of power that you can put in six lines of code was really staggering,” Zullo said. The pair applied to Y Combinator on the day of the deadline. “We were super spontaneous because we had this open-source vibe and just enjoyed the process. We had so much energy from the community that was lifting us up, and we knew it was going to be fine.”

Inside Manufact’s plan to become the ‘Vercel for MCP’ — from SDK to cloud in 60 seconds

Manufact’s strategy borrows directly from the playbook that turned Vercel into a multi-billion-dollar company by providing hosting and developer tools for front-end web applications. The analogy is deliberate: just as Vercel made it trivially easy to deploy a Next.js app, Manufact wants to make it trivially easy to build, test, and deploy the MCP servers and MCP apps that AI agents need to interact with software.

The company offers three core products. First, the open-source mcp-use SDK, available in both Python and TypeScript, lets developers spin up a fully functional AI agent connected to MCP tools in as few as six lines of code. It supports any large language model, including local models, and integrates with LangChain and other popular frameworks. Second, a built-in inspector and testing suite lets developers visually debug their MCP servers in a browser, view raw JSON-RPC traffic, and test tool execution in a sandbox — without connecting to a live AI agent. Third, the Manufact Cloud platform handles deployment, scaling, authentication, access control, and observability, allowing teams to go from a GitHub push to a production MCP server in under 60 seconds.

“As software becomes more agentic, the hard part isn’t the model anymore — it’s everything around it,” Zullo said.
“We started Manufact because developers were spending too much time on plumbing instead of building and shipping their products.”

The company has also moved aggressively into MCP apps, a newer extension of the protocol that lets developers render interactive user interface components — React widgets, data visualizations, input forms — directly inside chat clients like ChatGPT and Claude. Manufact’s SDK lets a developer scaffold an MCP app with a single terminal command, edit React widgets, and deploy to ChatGPT in under a minute. This positions the company at the center of a potentially massive new distribution channel: ChatGPT alone has more than 800 million users.

5 million downloads, zero revenue, and a crowded field of cloud giants

Every open-source company faces the same fundamental tension: the community that makes the project valuable is not the same thing as a paying customer base. Manufact has been candid about this challenge.

Pederzani said the company made a deliberate decision after Y Combinator to focus entirely on the open-source product and community rather than rushing to monetize. “A lot of open-source projects jump immediately on the monetization part and kind of betray the community,” he said. While NASA, Nvidia, and other prominent organizations use the SDK, Pederzani acknowledged they are not paying customers. Manufact’s target is to reach $2 million to $3 million in annual recurring revenue by the end of 2026, which would position it for a Series A fundraise.

The competitive landscape is crowding fast. AWS, Cloudflare, Vercel, and Docker have all launched MCP hosting features. But Manufact’s founders argue they sit in a complementary position relative to the model providers. “Anthropic and OpenAI are betting that their own chat products — Claude and ChatGPT — will become the primary interfaces through which people access all software,” Pederzani said. “If that bet plays out, we will serve these systems.
That’s going to be massive.”

Why software companies without MCP servers risk becoming “dumb databases” for AI agents

Behind Manufact’s optimism lies a darker observation about the software industry, one that gives their pitch urgency. Pederzani argued that companies that fail to make their products accessible to AI agents risk being reduced to “systems of record” — dumb databases that agents query but that no longer own the user experience or the customer relationship.

“Now we have customers that come to us and say that their customers are choosing to adopt their product over a competitor because they offer an MCP server,” Pederzani said. “At the same time, there is a threat here that could put companies to become just systems of records. And this is really something that a lot of companies are scared of.”

In late February, Manufact co-hosted what it called the largest MCP apps hackathon to date at Y Combinator’s headquarters in San Francisco. The event drew 650 applications and 300 builders. OpenAI, Cloudflare, and Anthropic all sponsored it. Perhaps the most telling detail: eight employees from Anthropic attended — more people than Manufact’s own three-person team. The model providers, it appears, view Manufact as an ally rather than a threat.

Three employees, $6.3 million, and the ambition to capture a share of every AI tool call on Earth

For all its momentum, Manufact faces significant headwinds. The company has just three employees and has not yet demonstrated a scalable revenue model. Its most high-profile users are not paying customers. The $6.3 million seed round provides limited runway in an industry where infrastructure companies often require substantial capital to reach profitability.
And the cloud providers that have launched MCP hosting features already own the customer relationships and billing infrastructure that enterprise buyers rely on.

But when asked what success looks like in two years, both founders pointed to a single metric: the percentage of global AI tool calls that flow through their infrastructure. “Our metric is the global tool calls or servers that run on Manufact — how many tool calls are passing through Manufact, made by agents,” Pederzani said. “Like Stripe is doing for the global GDP. We’re going to win if we can get a great number for it.”

The Stripe analogy is ambitious — Stripe processes hundreds of billions of dollars annually and is valued at roughly $90 billion — but it captures the scope of what Manufact’s founders believe is at stake. If MCP becomes the universal standard through which AI agents interact with all software, the company that provides the infrastructure for building and deploying MCP servers could occupy a position of outsized influence.

“In the end, what matters is to make something agents want,” Zullo said, riffing on Y Combinator’s famous dictum to “make something people want.” “What we’re focusing on and what we’re building is to help this transition of building for agents instead of building for humans.”
Anthropic and OpenAI just exposed SAST’s structural blind spot with free tools
OpenAI launched Codex Security on March 6, entering the application security market that Anthropic had disrupted 14 days earlier with Claude Code Security. Both scanners use LLM reasoning instead of pattern matching. Both proved that traditional static application security testing (SAST) tools are structurally blind to entire vulnerability classes. The enterprise security stack is caught in the middle.

Anthropic and OpenAI independently released reasoning-based vulnerability scanners, and both found bug classes that pattern-matching SAST was never designed to detect. The competitive pressure between two labs with a combined private-market valuation exceeding $1.1 trillion means detection quality will improve faster than any single vendor can deliver alone. Neither Claude Code Security nor Codex Security replaces your existing stack, but both tools change procurement math permanently. Right now, both are free to enterprise customers. The head-to-head comparison and seven actions below are what you need before the board of directors asks which scanner you are piloting and why.

How Anthropic and OpenAI reached the same conclusion from different architectures

Anthropic published its zero-day research on February 5 alongside the release of Claude Opus 4.6. Anthropic said Claude Opus 4.6 found more than 500 previously unknown high-severity vulnerabilities in production open-source codebases that had survived decades of expert review and millions of hours of fuzzing. In the CGIF library, Claude discovered a heap buffer overflow by reasoning about the LZW compression algorithm, a flaw that coverage-guided fuzzing could not catch even with 100% code coverage. Anthropic shipped Claude Code Security as a limited research preview on February 20, available to Enterprise and Team customers, with free expedited access for open-source maintainers.
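The structural limitation both labs are exploiting can be shown in miniature. The toy scanner below is a stand-in for signature-based SAST — real engines use ASTs and taint tracking, and the `eval` example is invented for illustration — but the limitation is the same in kind: a pattern matches form, not behavior, so one level of indirection hides an identical flaw that only data-flow reasoning would recover.

```python
import re

# Toy signature-based scanner: flags direct eval() calls by text pattern.
# Illustrative stand-in for pattern-matching SAST, not a real tool.
def pattern_scan(source: str) -> list[str]:
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if re.search(r"\beval\s*\(", line):
            findings.append(f"line {lineno}: eval() on untrusted input")
    return findings

direct = "user = input()\neval(user)\n"
# Same vulnerability, one level of indirection: the sink no longer
# matches the textual pattern, so finding it requires tracing where
# the alias f came from rather than matching a signature.
indirect = "user = input()\nf = eval\nf(user)\n"

assert pattern_scan(direct)        # caught: the signature matches
assert not pattern_scan(indirect)  # missed: needs reasoning, not matching
```

Production SAST engines handle far more than this sketch, but every signature set has an edge like the `indirect` case; the article's claim is that reasoning models move that edge outward.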
Gabby Curtis, Anthropic’s communications lead, told VentureBeat in an exclusive interview that Anthropic built Claude Code Security to make defensive capabilities more widely available.

OpenAI’s numbers come from a different architecture and a wider scanning surface. Codex Security evolved from Aardvark, an internal tool powered by GPT-5 that entered private beta in 2025. During the Codex Security beta period, OpenAI’s agent scanned more than 1.2 million commits across external repositories, surfacing what OpenAI said were 792 critical findings and 10,561 high-severity findings. OpenAI reported vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, resulting in 14 assigned CVEs. Codex Security’s false positive rates fell more than 50% across all repositories during beta, according to OpenAI, and over-reported severity dropped more than 90%.

Checkmarx Zero researchers demonstrated that moderately complicated vulnerabilities sometimes escaped Claude Code Security’s detection, and that developers could trick the agent into ignoring vulnerable code. In a full production-grade codebase scan, Checkmarx Zero found that Claude identified eight vulnerabilities, but only two were true positives. If moderately complex obfuscation defeats the scanner, the detection ceiling is lower than the headline numbers suggest. Neither Anthropic nor OpenAI has submitted its detection claims to an independent third-party audit, so security leaders should treat the reported numbers as indicative, not audited.

Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat that the competitive scanner race compresses the window for everyone.
Baer advised security teams to prioritize patches based on exploitability in their runtime context rather than CVSS scores alone, to shorten the window between discovery, triage, and patch, and to maintain software bill of materials (SBOM) visibility so they know instantly where a vulnerable component runs.

Different methods, almost no overlap in the codebases they scanned, yet the same conclusion: pattern-matching SAST has a ceiling, and LLM reasoning extends detection past it. When two competing labs distribute that capability at the same time, the dual-use math gets uncomfortable. Any financial institution or fintech running a commercial codebase should assume that if Claude Code Security and Codex Security can find these bugs, adversaries with API access can find them, too. Baer put it bluntly: open-source vulnerabilities surfaced by reasoning models should be treated closer to zero-day-class discoveries, not backlog items. The window between discovery and exploitation just compressed, and most vulnerability management programs are still triaging on CVSS alone.

What the vendor responses prove

Snyk, the developer security platform used by engineering teams to find and fix vulnerabilities in code and open-source dependencies, acknowledged the technical breakthrough but argued that finding vulnerabilities has never been the hard part. The bottleneck is fixing them at scale, across hundreds of repositories, without breaking anything. Snyk pointed to research showing AI-generated code is 2.74 times more likely to introduce security vulnerabilities than human-written code, according to Veracode’s 2025 GenAI Code Security Report. The same models finding hundreds of zero-days also introduce new vulnerability classes when they write code.

Cycode CTO Ronen Slavin wrote that Claude Code Security represents a genuine technical advancement in static analysis, but that AI models are probabilistic by nature.
Slavin argued that security teams need consistent, reproducible, audit-grade results, and that a scanning capability embedded in an IDE is useful but does not constitute infrastructure. Slavin’s position: SAST is one discipline within a much broader scope, and free scanning does not displace platforms that handle governance, pipeline integrity, and runtime behavior at enterprise scale.

“If code reasoning scanners from major AI labs are effectively free to enterprise customers, then static code scanning commoditizes overnight,” Baer told VentureBeat. Over the next 12 months, Baer expects budget to move toward three areas: runtime and exploitability layers, including runtime protection and attack path analysis; AI governance and model security, including guardrails, prompt injection defenses, and agent oversight; and remediation automation. “The net effect is that AppSec spending probably doesn’t shrink, but the center of gravity shifts away from traditional SAST licenses and toward tooling that shortens remediation cycles,” Baer said.

Seven things to do before your next board meeting

Run both scanners against a representative codebase subset. Compare Claude Code Security and Codex Security findings against your existing SAST output. Start with a single representative repository, not your entire codebase. Both tools are in research preview with access constraints that make full-estate scanning premature. The delta is your blind spot inventory.

Build the governance framework before the pilot, not after. Baer told VentureBeat to treat either tool like a new data processor for the crown jewels, which is your source code. Baer’s governance model includes a formal data-processing agreement with clear statements on training exclusion, data retention, and subprocessor use; a segmented submission pipeline so only the repos you intend to scan are transmitted; and an internal classification policy that distinguishes code that can leave your boundary from code that cannot.
In interviews with more than 40 CISOs, VentureBeat found that formal governance frameworks for reasoning-based scanning tools barely exist yet. Baer flagged derived IP as the blind spot most teams have not addressed: can model providers retain embeddings or reasoning traces, and are those artifacts considered your intellectual property? The other gap is data residency for code, which historically was not regulated like customer data but increasingly falls under export control and national security review.

Map what neither tool covers. Software composition analysis, container scanning, infrastructure-as-code, DAST, and runtime detection and response all sit outside their scope. Claude Code Security and Codex Security operate at the code-reasoning layer; your existing stack handles everything else. That stack’s pricing power is what shifted.

Quantify the dual-use exposure. Every zero-day Anthropic and OpenAI surfaced lives in an open-source project that enterprise applications depend on. Both labs are disclosing and patching responsibly, but the window between their discovery and your adoption of those patches is exactly where attackers operate. AI security startup AISLE independently discovered all 12 zero-day vulnerabilities in OpenSSL’s January 2026 security patch, including a stack buffer overflow (CVE-2025-15467) that is potentially remotely exploitable without valid key material. Fuzzers ran against OpenSSL for years and missed every one. Assume adversaries are running the same models against the same codebases.

Prepare the board comparison before they ask. Claude Code Security reasons about code contextually, traces data flows, and uses multi-stage self-verification. Codex Security builds a project-specific threat model before scanning and validates findings in sandboxed environments. Each tool is in research preview and requires human approval before any patch is applied. The board needs side-by-side analysis, not a single-vendor pitch.
When the conversation turns to why your existing suite missed what Anthropic found, Baer offered framing that works at the board level. Pattern-matching SAST solved a different generation of problems, Baer told VentureBeat. It was designed to detect known anti-patterns, and that capability still matters and still reduces risk. But reasoning models can evaluate multi-file logic, state transitions, and developer intent, which is where many modern bugs live. Baer’s board-ready summary: “We bought the right tools for the threats of the last decade; the technology just advanced.”

Track the competitive cycle. Both companies are heading toward IPOs, and enterprise security wins drive the growth narrative. When one scanner misses a blind spot, it lands on the other lab’s feature roadmap within weeks. Both labs ship model updates on monthly cycles, a cadence that will outrun any single vendor’s release calendar. Baer said that running both is the right move: “Different models reason differently, and the delta between them can reveal bugs neither tool alone would consistently catch. In the short term, using both isn’t redundancy. It’s defense through diversity of reasoning systems.”

Set a 30-day pilot window. Before February 20, this test did not exist. Run Claude Code Security and Codex Security against the same codebase and let the delta drive the procurement conversation with empirical data instead of vendor marketing. Thirty days gives you that data.

Fourteen days separated Anthropic and OpenAI. The gap between the next releases will be shorter. Attackers are watching the same calendar.
OpenAI upgrades ChatGPT with interactive learning tools as lawsuits and Pentagon backlash mount
OpenAI on Monday launched a set of interactive visual tools inside ChatGPT that let users manipulate mathematical and scientific formulas in real time — a genuinely impressive education feature that also serves as the company’s most direct attempt yet to change the subject during the worst ten days of its corporate life.

The new experience covers more than 70 core math and science concepts, from the Pythagorean theorem to Ohm’s law to compound interest. When a user asks ChatGPT to explain one of these topics, the chatbot now generates a dynamic module with adjustable sliders alongside its written response. Drag a variable, and the equations, graphs, and diagrams update instantly. The feature is available today to all logged-in users worldwide, across every plan, including free.

OpenAI tells VentureBeat that 140 million people already use ChatGPT each week for math and science learning. That is a staggering number — and it goes a long way toward explaining why the company chose this particular week to ship a product designed to make those users’ experience meaningfully better. Since late February, OpenAI has been sued by the family of a 12-year-old mass shooting victim who alleges the company knew the attacker was planning violence through ChatGPT; lost its head of robotics over a Pentagon deal that triggered a near-300% spike in app uninstalls; watched more than 30 of its own employees file a legal brief supporting rival Anthropic against the U.S. government; and scrapped plans with Oracle to expand a flagship data center in Texas. Its chief competitor’s app, Claude, now sits atop the App Store.

The interactive learning tools are, on their merits, a strong product.
But they arrive at a company fighting on every front simultaneously — and burning through an estimated $15 billion in cash this year to do it.

How the new ChatGPT learning tools actually work

The feature is built on a simple pedagogical premise: students understand formulas better when they can see what happens as the inputs change.

Ask ChatGPT “help me understand the Pythagorean theorem,” and the system now responds with a written explanation alongside an interactive panel. On the left, the formula $a^2 + b^2 = c^2$ appears in clean notation with sliders for sides $a$ and $b$. On the right, a geometric visualization — a right triangle with squares drawn on each side — reshapes dynamically as you adjust the values. The computed hypotenuse updates in real time. The same treatment applies across topics: voltage and resistance for Ohm’s law, pressure and temperature for the ideal gas equation, radius and height for cone volume.

OpenAI’s initial roster of more than 70 topics targets high school and introductory college material: binomial squares, Charles’ law, circle equations, Coulomb’s law, cylinder volume, degrees of freedom, exponential decay, Hooke’s law, kinetic energy, the lens equation, linear equations, slope-intercept form, surface area of a sphere, trigonometric angle sum identities, and others.

The company cited research suggesting that “visual, interaction-based learning can lead to stronger conceptual understanding than traditional instruction for many students,” and pointed to a recent Gallup survey in which more than half of U.S. adults said they struggle with math.
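The update loop behind such a module reduces to a pure function of the slider values. The sketch below is hypothetical — OpenAI has not published its implementation — but it shows the core step for the Pythagorean case: each slider drag re-evaluates $c = \sqrt{a^2 + b^2}$ and the view redraws from the result.

```python
import math

# Core update step of a hypothetical interactive Pythagorean module:
# sliders supply legs a and b; the view recomputes the hypotenuse.
# Structure is illustrative; OpenAI has not published its implementation.
def hypotenuse(a: float, b: float) -> float:
    return math.hypot(a, b)  # numerically stable sqrt(a*a + b*b)

# Dragging a slider simply re-invokes the same pure function:
for a, b in [(3.0, 4.0), (5.0, 12.0), (8.0, 15.0)]:
    print(f"a={a:g}, b={b:g} -> c={hypotenuse(a, b):g}")
```

Keeping the math in a pure function like this is what makes the instant slider-to-diagram feedback cheap: the UI layer only ever re-renders the output of a deterministic computation.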
In early testing, OpenAI said, students reported the modules helped them grasp how variables relate to one another, and parents described using them to work through problems alongside their children.

Anjini Grover, a high school mathematics teacher quoted in OpenAI’s announcement, praised “how strongly this feature emphasizes conceptual understanding.” Raquel Gibson, a high school algebra teacher, called it “a step towards empowering students to independently explore abstract concepts.”

The tools build on ChatGPT’s existing education features — a “study mode” for step-by-step problem solving and a quizzes feature for exam prep — and OpenAI said it plans to expand interactive learning to additional subjects. The company also said it intends to publish research through its NextGenAI initiative and OpenAI Learning Lab to study how AI shapes learning outcomes over time.

A lawsuit alleging OpenAI knew a mass shooter was planning an attack

The education launch shares the calendar with the most serious legal challenge OpenAI has ever faced.

On Monday, the mother of 12-year-old Maya Gebala filed a civil lawsuit against OpenAI in B.C. Supreme Court, alleging the company had “specific knowledge of the shooter’s long-range planning of a mass casualty event” through ChatGPT interactions and “took no steps to act upon this knowledge.” Gebala was shot three times during a mass shooting in Tumbler Ridge, British Columbia on February 10 that killed eight people and the 18-year-old attacker. She suffered what the lawsuit describes as a catastrophic traumatic brain injury with permanent cognitive and physical disabilities.

The claim paints a damning picture of how the shooter used ChatGPT.
It alleges the platform functioned as a “counsellor, pseudo-therapist, trusted confidante, friend, and ally” and was “intentionally designed to foster psychological dependency between the user and ChatGPT.” The shooter was under 18 when they began using the service, the suit states, and despite OpenAI’s requirement that minors obtain parental consent, the company “took no steps to implement age verification or consent procedures.”

OpenAI has separately acknowledged that it suspended the shooter’s account months before the attack but did not alert Canadian law enforcement — a decision that provoked sharp political fallout. B.C. Premier David Eby said after a virtual meeting with Altman that the CEO agreed to apologize to the people of Tumbler Ridge and work with the provincial government on AI regulation recommendations.

None of the claims have been proven in court, and OpenAI has not publicly commented on the lawsuit. But the case poses a question that transcends any single legal proceeding: when an AI company’s own internal systems identify a user as dangerous enough to ban, what obligation does it have to tell someone?

The Pentagon deal that split OpenAI from the inside

The Tumbler Ridge lawsuit is unfolding against the backdrop of an internal crisis that has already cost OpenAI key talent and millions of users.

On February 28, CEO Sam Altman announced a deal giving the Pentagon access to OpenAI’s AI models inside secure government computing systems. The agreement came days after Anthropic CEO Dario Amodei publicly refused similar terms, saying his company could not proceed without assurances against autonomous weapons and mass domestic surveillance. The Pentagon responded by designating Anthropic a “supply-chain risk” — a classification normally reserved for foreign adversaries — and Defense Secretary Pete Hegseth barred any military contractor from conducting commercial activity with the company.

The reaction inside OpenAI was immediate.
Caitlin Kalinowski, who joined from Meta in 2024 to build out the company’s robotics hardware division, resigned on principle. “AI has an important role in national security,” she wrote publicly. “But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got.” Research scientist Aidan McLaughlin wrote on social media that he “personally don’t think this deal was worth it.” Another employee told CNN that many OpenAI staffers “really respect” Anthropic for walking away.

The reaction outside the company was even more dramatic. ChatGPT uninstalls spiked more than 295% on the day the deal was announced. Anthropic’s Claude surged to No. 1 among free apps on the U.S. Apple App Store and remained there as of this past weekend. Protesters gathered outside OpenAI’s San Francisco headquarters calling for a “QuitGPT” movement.

And in the most extraordinary development, more than 30 OpenAI and Google DeepMind employees — including DeepMind chief scientist Jeff Dean — filed an amicus brief Monday supporting Anthropic’s lawsuit against the Defense Department. The brief argued that the Pentagon’s actions, “if allowed to proceed,” would “undoubtedly have consequences for the United States’ industrial and scientific competitiveness in the field of artificial intelligence and beyond.” The employees signed in their personal capacity, but the spectacle of OpenAI’s own researchers rallying to a competitor’s legal defense against the same government their company just partnered with has no real precedent in the industry.

Altman, to his credit, has not pretended the situation is fine. In an internal memo later shared publicly, he admitted the deal “was definitely rushed” and “just looked opportunistic and sloppy.” He revised the contract to include explicit prohibitions against mass domestic surveillance and the use of OpenAI technology on commercially acquired data.
He also said publicly that enforcing the supply-chain risk designation against Anthropic “would be very bad for our industry and our country.”

Meanwhile, Anthropic warned in court filings that the Pentagon’s blacklisting could cost it up to $5 billion in lost business — roughly equivalent to its total revenue since commercializing its AI technology in 2023. The company is seeking a temporary court order to continue working with military contractors while the case proceeds.

Why OpenAI’s $15 billion cash burn makes every user count

Strip away the lawsuits and the politics, and OpenAI still has a math problem of its own.

The company is expected to burn through approximately $15 billion in cash this year, up from $9 billion in 2025. It has roughly 910 million weekly users, about 95% of whom pay nothing. Subscriptions alone cannot bridge that gap, which is why OpenAI is simultaneously building out an internal advertising infrastructure and leaning on partners like Criteo — and reportedly The Trade Desk — to bring advertisers into ChatGPT.

The company is hiring aggressively for this effort: a monetization infrastructure engineer, an engineering manager, a product designer for the ads experience, a senior manager for ad revenue accounting, and a trust and safety specialist dedicated to the ads product, all based at headquarters in San Francisco. The compensation bands run as high as $385,000 — the kind of investment a company makes when it plans to own its ad stack, not rent it.

But advertising inside ChatGPT introduces a trust problem that compounds the ones OpenAI is already managing. Users who abandoned the app over the Pentagon deal demonstrated that loyalty to ChatGPT is thinner than its market share suggests.
Adding commercial messages to a product already under fire for its military ties and its handling of a mass shooter’s data will require OpenAI to navigate user sentiment with a precision it has not recently demonstrated.

The infrastructure picture is equally unsettled. Oracle and OpenAI recently scrapped plans to expand a flagship AI data center in Abilene, Texas, after negotiations stalled over financing and OpenAI’s evolving needs. Meta and Nvidia moved quickly to explore the site — a reminder that in the current AI arms race, any gap in execution gets filled by a competitor within days.

Why interactive learning is OpenAI’s strongest remaining argument

This is where the education feature becomes more than a product announcement.

Education has always been ChatGPT’s cleanest use case — the application where the technology most obviously augments human capability rather than surveilling it, weaponizing it, or monetizing the attention of people who came looking for help. It is the use case that resonates across demographics: students prepping for the SAT, parents revisiting algebra at the kitchen table, adults circling back to concepts they never quite understood. And it is the use case where ChatGPT still holds a clear lead. Google’s Gemini, Anthropic’s Claude, and xAI’s Grok are all investing in education, but none has shipped anything comparable to real-time interactive formula visualization embedded in a conversational interface.

OpenAI acknowledged that the “research landscape on how AI affects learning is still taking shape,” but pointed to its own early findings on study mode as showing “promising early signals.” The company said it will continue working with educators and researchers through its NextGenAI initiative and OpenAI Learning Lab, and plans to publish findings and expand into additional subjects.

Somewhere tonight, a ninth-grader will open ChatGPT, drag a slider, and watch a hypotenuse lengthen across her screen.
The Pythagorean theorem will make sense for the first time. She will not know about the Pentagon deal, or the Tumbler Ridge lawsuit, or the 295% spike in uninstalls, or the $15 billion cash burn underwriting the server that just rendered her triangle. She will only know that it worked. For OpenAI, that may have to be enough — for now.