AI Agents Never Graduate from Onboarding
They learn the manual. They never get to day two.
This is the second post in the series The Headless Firm. The first post, The Mediocrity Tax, argued that the economic logic behind the integrated software suite has collapsed. This post examines the most immediate obstacle to what comes next. It draws on a paper we co-authored: “The Headless Firm: How AI Reshapes Enterprise Boundaries”.
Think about the last time your organisation rolled out a new system. There was documentation, probably a training session, and then weeks of people figuring out how things actually worked in practice: the shortcuts, the workarounds, the sequences nobody wrote down because everyone on the team already knew them. That gap between the official process and the real one is not a failure of implementation. It is how every enterprise environment works. The living process and the documented process are always different, and it is the living one that actually runs the business. Now consider what happens when you deploy an AI agent into that environment and ask it to automate something.
The Conversation We Are Not Having
The current debate about enterprise AI focuses almost entirely on model capability: which model reasons better, which one is faster, which one handles longer context. That is the wrong conversation, or at least an incomplete one. The limiting factor in most enterprise AI deployments is not what the model can do in isolation. It is what the model can actually see.
Most enterprise workflows exist only in the user interface of the applications running them. There is no underlying data feed an agent can tap into, no process map it can read, no structural description of how the software behaves under different conditions. The workflow lives in the sequence of clicks, form fields, and screen states that a trained human navigates every day without thinking about it. To an agent, that whole layer is invisible.
The data on this is sobering. MuleSoft’s Connectivity Benchmark Report finds that the average enterprise runs nearly 900 applications and that 71% of them remain unintegrated with other systems, not because the technology to connect them does not exist, but because the coordination cost was never worth paying. Celonis’s 2026 Process Optimization Report, which surveyed over 1,600 senior business leaders, found that 82% believe AI will fail to deliver meaningful returns if it does not understand how the business actually runs. McKinsey’s CFO Pulse survey adds a sharper angle: 41% of CFOs report that one-quarter or fewer of their processes are currently automated, despite nearly all of them having invested in automation technology. The ambition is real. The operational foundation is not.
Why Agents Break in Production
We have seen this play out in three recurring failure modes across enterprise deployments. The first is context blindness. An agent attempting to automate a procurement approval does not know that one field in the form behaves differently depending on supplier category, or that submissions after the 25th of the month route to a different approval queue. A human learns this on day two. An agent has no way to learn it unless someone manually encodes every exception, which defeats the purpose of deploying an agent in the first place. The practical consequence is that agents either get so constrained they cannot do useful work, or they operate with enough latitude to cause problems that are only discovered after the fact.
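To make context blindness concrete, here is a minimal sketch of the kind of hidden routing logic described above. Every name in it is hypothetical; the point is that nothing on the screen the agent sees reveals any of this.

```python
from datetime import date

# Illustrative only: routing logic of the kind that lives inside an
# application, learned by humans through experience but invisible to
# an agent driving the interface. All names and rules are hypothetical.
def approval_queue(supplier_category: str, submitted: date) -> str:
    if submitted.day > 25:
        return "month-end"      # late submissions route to a different queue
    if supplier_category == "regulated":
        return "compliance"     # one form field changes the path entirely
    return "standard"

print(approval_queue("regulated", date(2025, 3, 10)))  # -> compliance
print(approval_queue("regulated", date(2025, 3, 28)))  # -> month-end
```

Two otherwise identical submissions land in different queues, and the only way to know why is to already know the rule.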
The second is permission bypass. Enterprise applications have permission models that are often highly granular: this user can approve up to a certain threshold, that user cannot access a particular supplier category, and certain records require dual sign-off. These constraints live in the application logic, and when an agent navigates the interface without understanding that logic, it can trigger actions that would have been blocked for a human user. That is not just a broken workflow. It is a compliance failure, the kind that surfaces in audits, creates regulatory exposure, and tends to end AI programmes rather than just individual deployments.
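A sketch of the same kind, with invented names and thresholds, shows how much constraint logic a permission model can carry. None of it is visible in the interface the agent navigates.

```python
from dataclasses import dataclass, field

# Illustrative only: a permission model of the kind enforced inside
# application logic. All names and thresholds are hypothetical.
@dataclass
class User:
    approval_limit: float
    blocked_categories: set[str] = field(default_factory=set)

DUAL_SIGNOFF_THRESHOLD = 50_000  # assumption made for the sketch

def can_approve(user: User, amount: float, category: str,
                has_second_signer: bool) -> bool:
    if category in user.blocked_categories:
        return False  # category-level access control
    if amount > user.approval_limit:
        return False  # per-user approval threshold
    if amount > DUAL_SIGNOFF_THRESHOLD and not has_second_signer:
        return False  # dual sign-off required above a threshold
    return True

analyst = User(approval_limit=10_000, blocked_categories={"regulated"})
print(can_approve(analyst, 75_000, "standard", has_second_signer=False))  # -> False
```

An agent navigating the interface without a model of this logic can trigger exactly the actions these checks exist to refuse.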
The third is brittleness on exceptions. Real business processes branch. A standard purchase order follows one path, one requiring a regulatory check follows another, and one flagged for a policy exception follows a third. Agents trained on the happy path fail when they encounter a branch they were not prepared for, and in enterprise environments, exceptions are not rare edge cases: they are a significant share of the actual workload. The result is an agent that performs well in controlled conditions but unpredictably in production, a pattern that erodes organisational trust in AI faster than any technical failure.

None of these failures can be traced back to insufficient model intelligence. They trace back to insufficient process knowledge. The agent is not stupid. It is blind.
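A minimal sketch makes that blindness concrete. Everything below is invented for the example: an agent that has only ever seen the happy path replays it, and the first unfamiliar branch stops it cold.

```python
# Illustrative only: a happy-path agent reduced to a recorded
# sequence of screens. All screen names are hypothetical.
HAPPY_PATH = ["open_order", "fill_fields", "submit", "confirm"]

def replay(steps: list[str], screens: list[str]) -> str:
    for step, screen in zip(steps, screens):
        if step != screen:
            return f"stuck: expected '{step}', saw '{screen}'"
    return "completed"

# A regulatory check inserts a screen the agent has never seen:
print(replay(HAPPY_PATH,
             ["open_order", "fill_fields", "regulatory_check", "submit"]))
# -> stuck: expected 'submit', saw 'regulatory_check'
```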
What Has to Come First
Before agents can be trusted to act, someone or something has to have built a reliable model of how the environment they are operating in actually works. Not a process document written two years ago, not a workflow diagram describing the intended sequence, but a model grounded in how software is actually used by real users in production, including the edge cases and exceptions that never made it into any documentation.
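To show the shape of such a model, not how to build one: a minimal sketch that counts the transitions actually observed in production traces. The trace format is an assumption made for the example; real process mining has to handle noise, concurrency, and change over time, which is exactly the upkeep burden described below.

```python
from collections import Counter

# Minimal sketch, not a process-mining implementation: derive the
# observed structure of a workflow from traces of real usage.
# The trace format is an assumption made for the example.
def observed_transitions(traces: list[list[str]]) -> Counter:
    counts: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

traces = [
    ["open_form", "fill_supplier", "submit", "approve"],
    ["open_form", "fill_supplier", "submit", "flag_exception", "manual_review"],
]
for edge, n in observed_transitions(traces).items():
    print(edge, n)
```

The second trace records an exception branch that no process document mentions; a model built from observation captures it, a model built from documentation never will.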
Building that model manually is expensive and fragile. It requires process mining initiatives, annotation, and constant upkeep as applications change. Most organisations do not have the capacity to do it properly, and the ones that try find the model out of date almost immediately. This is why so many enterprise AI deployments work well in controlled demos but fail in production. The model of the environment was never there to begin with.
What makes this particularly difficult is that the problem is invisible until it fails. A demo runs against a clean, well-understood workflow with known inputs and predictable outputs. Production runs against everything else. The gap between the two is not a testing problem or an engineering problem in the conventional sense. It is an epistemological one: the agent simply does not know what it does not know, and nobody has given it a way to find out.
The Missing Precondition
The first post in this series described the headless firm as an hourglass: an intent layer at the top, a competitive market of vertical agents at the bottom, and a thin trust layer in the middle. That architecture only functions if the agents at the bottom can operate reliably in real enterprise environments. Right now, the process foundation that would make that possible is the part of the stack nobody has built properly. Not because the problem is unsolvable, but because the industry’s attention has been almost entirely on the model layer, where the progress is visible and measurable, rather than on the infrastructure layer beneath it, where the real friction lives.
The consequence is a widening gap between what enterprise AI promises and what it actually delivers in the field. Closing that gap is the precondition for the transition described in the first post to actually happen rather than stall indefinitely at the pilot stage. The next post examines what that foundation needs to look like and why it must exist as independent infrastructure rather than something bolted onto the platforms already in the room.
Next in this series: The Missing Layer of the Agentic Stack — on why the process foundation for enterprise AI has to exist as independent infrastructure, and what that means for the companies currently trying to build it into their existing products.
The full paper is available here.
Tassilo Klein and Sebastian Wieczorek are co-founders of Mantix.