February 12, 2025

Scaled Cognition introduces the Agentic Pretrained Transformer, topping agentic benchmarks.

Today, we’re introducing the Agentic Pretrained Transformer, a new modeling paradigm designed specifically for agentic applications. Our first system, APT-1, outperforms existing foundation models on the two most challenging agentic benchmarks, Tau-Bench and ComplexFuncBench. Both the APT-1 system and our Agent Builder development environment are now available in research preview on our website. We're backed by Khosla Ventures, and Vinod Khosla has joined our board.

Agentic Benchmarks

* Benchmark details at https://github.com/sierra-research/tau-bench. Reported numbers are averaged over 10 runs.
** Benchmark details at https://github.com/THUDM/ComplexFuncBench.

Actions Speak Louder Than Tokens

Agents are AI systems that don’t just converse – they take actions in the real world, like processing insurance claims, booking flights, or issuing refunds. Agents can help people get things done more easily than ever, but to achieve that potential they need to be more accurate, reliable, and consistent than today’s LLM technologies allow. Existing AI models, while powerful, are fundamentally built to predict tokens, not actions. We've taken a new approach that instead refocuses the entire agentic stack around decision making and action taking.

Scaled Cognition’s APT technology is focused on agentic performance in exactly this way. It is full stack, including synthetic data generation, action modeling, training innovations, and evaluation solutions, all specialized to agentic workflows. 

A fully agentic system like APT-1 ultimately requires training data that contains not only conversations but also the actions that should be taken in the context of those conversations. Most of the text that traditional LLMs train on is scraped from the web, so while it is broad, plentiful, and rich in language, it lacks the grounded actions at the center of agentic behavior. Neither the web nor enterprise datastores have the right data, so we had to generate it. We built a fully synthetic training data pipeline that combines simulation and reinforcement learning. It applies to any agentic application and teaches the system to take actions correctly, subject to policies and instructions.
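To make the idea of conversation-grounded action data concrete, here is a minimal, hypothetical sketch of a generator that pairs each simulated user utterance with the API action it should trigger. Every name here (the booking scenario, `book_flight`, the catalog) is an illustrative assumption; Scaled Cognition's actual pipeline, including its reinforcement learning component, is not public.

```python
import random

# Toy flight catalog standing in for a real backend (illustrative only).
CATALOG = {"NYC-LAX": 320, "BOS-SFO": 410}

def simulated_user(rng):
    """A scripted user who asks to book a random flight."""
    route = rng.choice(sorted(CATALOG))
    return f"I'd like to book {route}"

def reference_agent(utterance):
    """Maps the request to the grounded API action that should follow it."""
    route = utterance.rsplit(" ", 1)[-1]
    return {"api": "book_flight", "args": {"route": route, "price": CATALOG[route]}}

def generate_examples(n, seed=0):
    """Roll out n simulated turns, pairing each utterance with its action label."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        utt = simulated_user(rng)
        examples.append({"conversation": [utt], "action": reference_agent(utt)})
    return examples

examples = generate_examples(3)
```

The point of the sketch is the shape of the data: each training example carries both the conversational context and the action taken in that context, which web text alone never provides.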

Another important element of our system is agent-to-agent self play. Self play has been a powerful tool in AI, but it has primarily been applied to games like Chess and Go, where interactions are simple and outcomes are clear. Getting self play to work for real-world agentic tasks is much harder: the interactions between users, agents, and APIs can be extremely complex, and conversation outcomes do not divide neatly into wins and losses. Our system brings a new RL approach to agentic self play that allows it to learn robustly from agent-to-agent simulations before interacting with any real humans.
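A hedged sketch of what self play with a graded (rather than win/loss) signal could look like: a scripted "user agent" poses refund requests, a "task agent" handles them, and each episode is scored on several partial criteria at once. The agents, the policy limit, and the scoring weights below are all made-up assumptions for illustration, not Scaled Cognition's actual method.

```python
import random

def user_agent(rng):
    """Simulated user: emits a goal with a requested refund amount."""
    return {"goal": "refund", "amount": rng.randint(10, 200)}

def task_agent(request, limit=100):
    """Toy task agent: approves refunds up to a policy limit, else escalates."""
    if request["amount"] <= limit:
        return {"action": "refund", "turns": 2}
    return {"action": "escalate", "turns": 4}

def score_episode(request, outcome):
    """Graded reward: task success, policy compliance, and brevity all count."""
    solved = 1.0 if outcome["action"] == "refund" else 0.5  # partial credit
    violated = outcome["action"] == "refund" and request["amount"] > 100
    compliant = 0.0 if violated else 1.0
    brevity = 1.0 / outcome["turns"]
    return 0.5 * solved + 0.3 * compliant + 0.2 * brevity

def self_play(episodes, seed=0):
    """Run agent-to-agent episodes and collect their graded rewards."""
    rng = random.Random(seed)
    rewards = []
    for _ in range(episodes):
        req = user_agent(rng)
        out = task_agent(req)
        rewards.append(score_episode(req, out))
    return rewards

rewards = self_play(5)
```

Unlike a Chess result, each episode yields a score on a continuum, which is the kind of signal an RL learner needs when outcomes do not divide neatly into wins and losses.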

Finally, our models are optimized for action-level objectives rather than token-level ones, yielding training regimes better suited to agentic tasks. Useful agents must strictly follow developer-provided business logic in order to make appropriate decisions – not merely plausible ones. Enterprise agents in particular must operate over far more complex business logic than simple chatbots, so we designed APTs to combine deterministic controls for crucial business policies with the general language-understanding capabilities of modern LLMs.
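One way to picture "deterministic controls for crucial business policies": a proposed action passes through a hard rule check before it can execute, rather than relying on the model to weigh the policy as a soft preference. The refund rule and its limit below are hypothetical examples, not APT internals.

```python
AUTO_REFUND_LIMIT = 100  # hypothetical business rule: larger refunds need approval

def policy_check(action):
    """Deterministic guard applied to every proposed action."""
    if action["api"] == "issue_refund" and action["args"]["amount"] > AUTO_REFUND_LIMIT:
        return False, "refund exceeds auto-approval limit"
    return True, "ok"

def dispatch(proposed_action):
    """Execute only if the deterministic policy allows it; otherwise escalate."""
    allowed, reason = policy_check(proposed_action)
    if not allowed:
        return {"status": "escalated", "reason": reason}
    return {"status": "executed", "action": proposed_action}

small = dispatch({"api": "issue_refund", "args": {"amount": 40}})
large = dispatch({"api": "issue_refund", "args": {"amount": 500}})
```

The design choice this illustrates is that the policy outcome is the same every time for the same input, whatever the model proposes – the determinism lives in the gate, while the language model handles the open-ended parts of the conversation.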

These research breakthroughs have led to APT-1, a specialist system for agentic tasks. APT-1 behaves more deterministically, hallucinates less, and outperforms standard LLMs on agentic workflows. Because it is pretrained on a wide range of scenarios, it requires no customer-specific fine-tuning to deploy. In our first release, APT-1 is targeted at conversational AI applications like customer support, with plans to expand to other verticals in the future.

Agent Builder: A Better Way To Build Agents

We are also introducing our Agent Builder platform, which lets developers create, test, and deploy enterprise-grade agents in less than an hour. Our AI-powered tooling ingests business logic and builds structured agent specifications, making it seamless to develop robust agents. Agent Builder also includes GenAPI™, a new model-backed technology that makes it possible to test agent behaviors without needing to integrate with real APIs during development.
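To illustrate the pattern of testing an agent without real API integrations, here is a minimal stub-backend harness. GenAPI itself is model-backed and proprietary; this hand-written mock only shows the testing idea, and every endpoint name here is a made-up example.

```python
class MockBackend:
    """Records calls and returns canned responses per endpoint (illustrative)."""
    CANNED = {
        "lookup_order": {"status": "shipped"},
        "issue_refund": {"ok": True},
    }

    def __init__(self):
        self.calls = []

    def call(self, endpoint, **kwargs):
        self.calls.append((endpoint, kwargs))
        return self.CANNED.get(endpoint, {"error": "unknown endpoint"})

def toy_agent(backend, order_id):
    """A toy agent flow: look up the order, then refund it if it shipped."""
    status = backend.call("lookup_order", order_id=order_id)
    if status.get("status") == "shipped":
        return backend.call("issue_refund", order_id=order_id)
    return {"ok": False}

backend = MockBackend()
result = toy_agent(backend, order_id="A123")
```

Because the mock records every call, a developer can assert not just the final outcome but also the exact sequence of API actions the agent took – without a live backend in the loop.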

APT-backed agents can be deployed on-prem, via a virtual private cloud, or through our hosted service. All our systems support both text and voice, in a range of languages.

Our team has been working on what are now called agents for decades. Scaled Cognition is the fourth company I’ve co-founded, and the second with our CTO, UC Berkeley AI Professor Dan Klein. Our previous startup, Semantic Machines, was a pioneer in agentic AI and was acquired by Microsoft in 2018, where I became CVP of Conversational AI. At Microsoft, Professor Klein served as a Microsoft Technical Fellow for AI/NLP. Today, Scaled Cognition is a team of 20 exceptional researchers, engineers, and entrepreneurs, all with proven track records in fundamental AI development with experience ranging from top industry labs at Microsoft, Google, and the Allen Institute for AI, to leading academic institutions like UC Berkeley, CMU, Stanford, and MIT.

Agent Builder is currently in early access. To register for an account, join the waitlist or get in touch at info@scaledcognition.com. If you are passionate about what we’re building, consider joining our team.

Dan Roth

Co-Founder & CEO, Scaled Cognition

Dan is Co-Founder and CEO of Scaled Cognition. Before starting Scaled Cognition, he was CVP of Conversational AI at Microsoft and Co-Founder and CEO of Semantic Machines (acquired by Microsoft), Shaser BioScience (acquired by Spectrum Brands), and Voice Signal Technologies (acquired by Nuance).