
The hard problem in multi-agent systems is context transfer

agents · context-engineering · tool-design · infrastructure

A developer posted a 15-stage multi-agent pipeline that ships 2,800 lines a day through Claude Code. The internet focused on the agent count. I think they’re looking at the wrong thing.

Loops work because context stays

The pipeline’s quality loops - review up to 5 times, test up to 10 - are effective. But not because iteration is magic. They work because a single agent looping on its own work retains full context. It remembers what it tried, what failed, why. Every iteration builds on the last.

This is test-time compute in practice. More thinking time on the same problem, with the same context, produces better results. No surprise there.
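
A minimal sketch of what that loop looks like in a harness, assuming hypothetical run_agent and run_tests callables (they stand in for whatever orchestration you use). The only point it illustrates is that the same agent sees its full attempt history on every pass.

```python
from typing import Callable

def refine_until_green(
    task: str,
    run_agent: Callable[..., str],          # hypothetical: one model call, returns code
    run_tests: Callable[[str], list[str]],  # hypothetical: returns failure messages
    max_passes: int = 10,
) -> str:
    """Single-agent loop: every retry sees everything the agent already tried."""
    history: list[str] = []
    code = run_agent(task=task, context="")
    for attempt in range(1, max_passes + 1):
        failures = run_tests(code)
        if not failures:
            return code                     # tests green, stop iterating
        # Context compounds: failures from earlier passes stay in the prompt,
        # so each pass narrows the search space instead of starting over.
        history.append(f"attempt {attempt} failed: {'; '.join(failures)}")
        code = run_agent(task=task, context="\n".join(history))
    return code                             # best effort after max_passes
```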

The lossy handoff

The moment you introduce a second agent, you have a context transfer problem. Agent A built the feature. Agent B reviews it. Agent B doesn’t know what Agent A considered and rejected. It doesn’t know the constraints that shaped the implementation. It’s reviewing code with half the story.
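
To make the loss concrete, here is an illustrative sketch (the field names are mine, not from the original post): everything the implementing agent accumulated, versus what usually survives the handoff.

```python
from dataclasses import dataclass

@dataclass
class ImplementerContext:
    # Everything Agent A built up while implementing the feature.
    requirements: str
    constraints: list[str]           # e.g. "payload must stay backwards compatible"
    rejected_approaches: list[str]   # what it tried and why it backed out
    final_diff: str

@dataclass
class ReviewHandoff:
    # What typically crosses the agent boundary: the artifact and nothing else.
    final_diff: str

def hand_off(ctx: ImplementerContext) -> ReviewHandoff:
    # Lossy compression: constraints and rejected approaches never reach Agent B,
    # so the reviewer judges the code with half the story.
    return ReviewHandoff(final_diff=ctx.final_diff)
```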

This is the mythical man-month for agents. Adding more agents to a problem adds coordination overhead that can exceed the value they provide. Every agent boundary is a lossy compression of context.

Anthropic showed this when they had 16 parallel agents build a C compiler. The parallel agents worked - but only after investing heavily in the decomposition. The lexer agent produced tokens in a format that made sense given its internal constraints. The parser agent expected a different structure. Neither agent was wrong. They just didn’t share context about why each made its decisions. The fix wasn’t more agents or smarter prompts. It was defining boundaries so clean that agents didn’t need each other’s context to do their jobs. That interface design work took longer than writing the actual agent prompts.
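
As an illustration of what a clean boundary means here (my sketch, not Anthropic's actual decomposition): the contract between a lexer agent and a parser agent is a data type both are prompted against, so neither needs the other's reasoning.

```python
from dataclasses import dataclass
from enum import Enum

class TokenKind(Enum):
    KEYWORD = "keyword"
    IDENT = "ident"
    INT_LITERAL = "int_literal"
    PUNCT = "punct"

@dataclass(frozen=True)
class Token:
    kind: TokenKind
    text: str
    line: int
    col: int

# The lexer agent's job: source text in, list[Token] out.
# The parser agent's job: list[Token] in, syntax tree out.
# The contract is the interface; neither agent needs to know why the
# other made its internal choices, only what shape the tokens take.
```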

The same thing happens at smaller scales. Two agents doing code review and implementation. The reviewer flags a function as “too complex” and sends it back. The implementer simplifies it but breaks an edge case the reviewer doesn’t know about, because the context for why the function was complex in the first place got lost in the handoff. Three rounds later you’re back where you started.

When to loop vs. when to split

So when does adding an agent actually help?

Loop when the task benefits from refinement. Same context, deeper thinking. A single agent iterating on test failures has full history of what it tried. Each pass narrows the search space. This is where test-time compute shines - the context compounds.

Split when the task requires a genuinely different capability. A code writer and a security auditor look at the same code with different eyes. A frontend agent and a backend agent work in different domains. The key: the boundary between them must be a clean interface, not a shared context. If agent B needs to understand agent A’s reasoning to do its job, you don’t have two tasks - you have one task with a bad seam.

The inflection point is context dependency. Ask: does the next step need to know why the previous step made its choices, or just what it produced? If the output is self-explanatory - a test suite, an API schema, a compiled artifact - split freely. If understanding the output requires understanding the reasoning, keep it in one agent and loop.

The agent harness matters more than the agent count. A good harness preserves context across handoffs. A bad one loses it. Most multi-agent failures aren’t intelligence failures. They’re context transfer failures.
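
One way to build that into the harness, as a sketch with field names I am assuming rather than quoting from any particular tool: make the handoff a structured record that carries constraints and rejected approaches alongside the artifact, and have the harness fold them into the next agent's prompt.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    artifact: str                                         # what was produced: a diff, a schema, a test suite
    constraints: list[str] = field(default_factory=list)  # what shaped it
    rejected: list[str] = field(default_factory=list)     # approaches tried and abandoned, and why

def reviewer_prompt(h: Handoff) -> str:
    # The harness, not the agents, is responsible for carrying context forward.
    return "\n\n".join([
        "Review the following change.",
        f"Change:\n{h.artifact}",
        "Constraints that shaped it:\n- " + "\n- ".join(h.constraints or ["(none recorded)"]),
        "Approaches already tried and rejected:\n- " + "\n- ".join(h.rejected or ["(none recorded)"]),
    ])
```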

Fix the handoff, and the pipeline works. Add more agents without fixing the handoff, and you just multiply the confusion.