An AI that takes action produces effects in the world rather than text in a chat. The leap from a system that produces text to a system that produces effects is the central engineering challenge of this generation of AI. That leap requires reliability, not just intelligence, and it rests on four architectural commitments. In February 2026, Anthropic published research titled Measuring AI agent autonomy in practice, which found that production agents pause for clarification more than twice as often as users interrupt them on complex tasks. moccet is being built around the same discipline.
This essay explains what taking action actually requires, why most products that claim it have not built it, and how to tell the difference.
Why is taking action different from answering questions?
A system that produces text needs to be smart. The output is the product, so the output must be good. Engineering effort is concentrated in the underlying model and in the prompts that elicit good behaviour. If the text is wrong, the user reads it, recognises that it is wrong, and either corrects it or discards it. The system does not have to be reliable in the strong sense. The user supplies the reliability by serving as a check on the output before anything irreversible happens.
A system that produces effects needs to be smart and reliable. Reliability is a different property from intelligence. A brilliant agent that succeeds 80 percent of the time and fails catastrophically 20 percent of the time is worse than a less brilliant agent that succeeds 95 percent of the time and fails gracefully when it does. Catastrophic failures are not bad text the user can ignore. Catastrophic failures are emails sent to the wrong people, meetings booked at the wrong times, payments dispatched to the wrong accounts.
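To make the trade-off concrete, here is a back-of-the-envelope calculation in Python. The cost figures are assumptions chosen purely for illustration; the point is the asymmetry between graceful and catastrophic failure, not the particular numbers.

```python
# Illustrative expected-cost comparison. The benefit and cost figures are
# assumptions made for this sketch, not measurements from any deployment.
BENEFIT_OF_SUCCESS = 10          # value of one successfully completed task
COST_GRACEFUL_FAILURE = 2        # agent stops, asks, or rolls back cleanly
COST_CATASTROPHIC_FAILURE = 500  # wrong email sent, wrong payment dispatched

def expected_value(p_success: float, failure_cost: float) -> float:
    """Expected value per task given a success rate and a cost per failure."""
    return p_success * BENEFIT_OF_SUCCESS - (1 - p_success) * failure_cost

brilliant_but_brittle = expected_value(0.80, COST_CATASTROPHIC_FAILURE)
modest_but_graceful = expected_value(0.95, COST_GRACEFUL_FAILURE)

print(f"80% success, catastrophic failures: {brilliant_but_brittle:+.1f} per task")
print(f"95% success, graceful failures:     {modest_but_graceful:+.1f} per task")
# 0.80 * 10 - 0.20 * 500 = -92.0   versus   0.95 * 10 - 0.05 * 2 = +9.4
```

Under these assumed costs, the more capable agent loses value on every task it touches, because the rare catastrophic failure dominates the arithmetic.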
The gap between intelligence and reliability is what most agent products underestimate. Improving the underlying language model raises capability. It does not, on its own, raise reliability. Reliability requires sandboxing, confirmation steps, observability, idempotent operations, rollback paths, and the discipline to refuse to act when uncertain. None of these is a feature of the model. All of them are properties of the architecture around the model. A team that has built a smarter model has not, by virtue of that fact, built a more reliable agent. It has built a smarter unreliable one, which is in some respects more dangerous.
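What that architectural layer looks like is easier to see in a sketch than in a list. The Python below is a hypothetical illustration, not moccet's implementation or any vendor's published API: a gateway that adds confirmation for high-risk actions, idempotency so retries do not duplicate effects, a rollback path, and an audit log for observability.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical sketch of the layer around the model. Names, risk levels,
# and the confirmation policy are illustrative assumptions.

@dataclass
class ActionRequest:
    description: str
    execute: Callable[[], str]                   # performs the side effect
    rollback: Optional[Callable[[], None]] = None
    risk: str = "low"                            # "low" or "high"
    idempotency_key: str = field(default_factory=lambda: uuid.uuid4().hex)

class ActionGateway:
    def __init__(self, confirm: Callable[[str], bool]):
        self.confirm = confirm                   # asks a human before high-risk actions
        self.completed: dict[str, str] = {}      # idempotency_key -> result
        self.audit_log: list[str] = []           # observability trail

    def run(self, request: ActionRequest) -> str:
        # Idempotency: a retried request with the same key is not re-executed.
        if request.idempotency_key in self.completed:
            return self.completed[request.idempotency_key]

        # Confirmation: high-risk effects require an explicit yes.
        if request.risk == "high" and not self.confirm(request.description):
            self.audit_log.append(f"DECLINED: {request.description}")
            raise PermissionError("User declined the action")

        try:
            result = request.execute()
        except Exception:
            # Rollback path: undo partial effects, then surface the failure.
            if request.rollback:
                request.rollback()
            self.audit_log.append(f"FAILED+ROLLED_BACK: {request.description}")
            raise

        self.completed[request.idempotency_key] = result
        self.audit_log.append(f"DONE: {request.description}")
        return result

# Usage: wiring a hypothetical "send invoice" effect through the gateway.
gateway = ActionGateway(confirm=lambda desc: input(f"Allow: {desc}? [y/N] ") == "y")
gateway.run(ActionRequest(
    description="Send invoice #4821 to accounts@example.com",
    execute=lambda: "sent",
    rollback=lambda: None,   # e.g. recall the draft; trivial here
    risk="high",
))
```

Nothing in this sketch makes the model smarter. Every line exists to constrain what a failure can cost.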
What does the research show about reliable AI agents?
Anthropic's February 2026 research paper Measuring AI agent autonomy in practice drew on the company's data about how Claude was being used as an agent across customer deployments. The findings were specific and modest.
Most agent actions on Anthropic's API were low-risk and reversible. Software engineering accounted for nearly half of all agentic activity. Agent-initiated stops, where the agent paused to ask the user a clarifying question rather than forging ahead, occurred more than twice as often as human-initiated interruptions on the most complex tasks. The conclusion the researchers drew was that effective oversight of agents in production looks less like a system that confidently completes tasks and more like a system that frequently knows when not to.
The last finding contains the engineering insight the marketing has obscured. Acting is not what the AI industry has spent the past two years getting good at. Refusing to act, intelligently, is.
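Refusing to act can be expressed as an ordinary engineering decision. The sketch below is an assumption-laden illustration, not the mechanism described in the Anthropic paper: the agent estimates its confidence that it has understood the task and, below a threshold, returns a clarifying question instead of an action.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of an agent-initiated stop. The threshold and the
# confidence estimate are assumptions made for this example.

CONFIDENCE_THRESHOLD = 0.85  # illustrative; tuned per task class in practice

@dataclass
class Decision:
    act: bool
    clarifying_question: Optional[str] = None

def decide(task_confidence: float, ambiguity: Optional[str]) -> Decision:
    """Act only when confident and unambiguous; otherwise pause and ask."""
    if task_confidence < CONFIDENCE_THRESHOLD or ambiguity is not None:
        question = ambiguity or "Can you confirm exactly what outcome you want?"
        return Decision(act=False, clarifying_question=question)
    return Decision(act=True)

# Example: the agent is unsure which of two calendars the meeting belongs on.
print(decide(0.6, "Should this go on your work calendar or your personal one?"))
```

The intelligence is in knowing which question to ask. The reliability is in being built to stop and ask it.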
The discipline that produces reliability has a recognisable shape, and the shape is the orchestrator-worker pattern that has emerged as the consensus architecture for production multi-agent systems. A central orchestrator agent receives a goal, decomposes it into subtasks, routes each subtask to a specialised worker, evaluates the results, and decides what to do next. Microsoft's Azure architecture documentation, Anthropic's research papers on multi-agent design, and the open-source frameworks consolidated into the Agentic AI Foundation in December 2025 have all converged on variants of this pattern. A fuller account is in the essay on the orchestrator-agents pattern.
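In skeletal form, the pattern looks like this. The worker functions, the decomposition, and the evaluation rule below are placeholders for illustration; production systems route subtasks to specialised models and tools rather than to stub functions.

```python
from typing import Callable

# Minimal sketch of the orchestrator-worker pattern. Worker names, the task
# decomposition, and the evaluation rule are hypothetical placeholders.

Worker = Callable[[str], str]

WORKERS: dict[str, Worker] = {
    "research": lambda task: f"findings for: {task}",
    "draft":    lambda task: f"draft based on: {task}",
    "review":   lambda task: f"review of: {task}",
}

def decompose(goal: str) -> list[tuple[str, str]]:
    """Split a goal into (worker, subtask) pairs. Hard-coded for the sketch."""
    return [("research", goal), ("draft", goal), ("review", goal)]

def evaluate(result: str) -> bool:
    """Decide whether a worker's result is acceptable. Trivial placeholder."""
    return len(result) > 0

def orchestrate(goal: str) -> list[str]:
    results = []
    for worker_name, subtask in decompose(goal):
        result = WORKERS[worker_name](subtask)
        if not evaluate(result):
            # The orchestrator decides what to do next: retry, re-plan,
            # or stop and ask the user. Here it simply stops.
            break
        results.append(result)
    return results

print(orchestrate("summarise the February 2026 autonomy paper"))
```

The value of the pattern is that the loop, not any single worker, owns the decision about whether to continue, which is exactly where the refusal logic belongs.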
