What 16 parallel agents building a C compiler teaches about coordination

agents · claude-code · infrastructure · context-engineering

Anthropic put 16 Claude agents on a shared Git repo and told them to write a C compiler in Rust. Two weeks and $20,000 later, the compiler builds Linux 6.9, SQLite, PostgreSQL, and FFmpeg. 100,000 lines of code, 99% pass rate on the GCC torture test suite.

The result is impressive. The coordination problems are more interesting.

Git as a coordination primitive

The agents didn’t use a message bus or a task queue. They used Git. Each agent claims a task by writing a lock file, e.g. current_tasks/parse_if_statement.txt. If two agents try to claim the same task, Git’s merge conflict tells the second one to pick something else.
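A minimal sketch of that claim protocol, in Python over the git CLI. The agent id, the main branch name, and the back-off behavior are assumptions, not details from the writeup:

```python
import subprocess
from pathlib import Path

def claim_task(task: str, agent_id: str = "agent-07") -> bool:
    """Try to claim a task by committing a lock file and pushing.

    The push is the atomic step: if another agent pushed a claim for
    the same task first, ours is rejected and we back off.
    """
    lock = Path("current_tasks") / f"{task}.txt"
    lock.parent.mkdir(exist_ok=True)
    lock.write_text(f"claimed by {agent_id}\n")

    subprocess.run(["git", "add", str(lock)], check=True)
    subprocess.run(["git", "commit", "-m", f"claim: {task}"], check=True)

    push = subprocess.run(["git", "push"], capture_output=True)
    if push.returncode != 0:
        # Lost the race (or the branch simply moved): drop the local
        # claim and let the caller pick a different task.
        subprocess.run(["git", "fetch", "origin"], check=True)
        subprocess.run(["git", "reset", "--hard", "origin/main"], check=True)
        return False
    return True
```

The push is the whole mutex: Git refuses a non-fast-forward push, so whichever agent pushes second learns it lost the race.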

This is elegant and brutal. No central scheduler. No leader election. Just the filesystem and merge semantics. It works because Git already solves the hard distributed systems problems: conflict detection, atomic commits, history. The agents just inherited those guarantees.

The tricky part: merge conflicts happened constantly. Not from lock contention, but from 16 agents pushing changes to overlapping files. Claude resolved them autonomously. That’s a nontrivial capability. Merge conflict resolution requires understanding the intent behind both sides of the diff. It’s the kind of agentic task that breaks most automation.

The single-task bottleneck

Here’s the failure mode that matters. When the compiler tried to build the Linux kernel (one giant task), all 16 agents hit the same bugs, fixed them independently, then overwrote each other’s changes. Parallelism collapsed to zero.

The fix was clever: use GCC as an oracle. Compile most kernel files with GCC and send only a random subset to the Claude compiler. Now each agent works on different files, and failures are isolated.
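A sketch of that routing. The writeup only says the split was random; the claude-cc binary name, the 5% fraction, and the hash-based assignment are assumptions:

```python
import hashlib

def pick_compiler(source_file: str, claude_fraction: float = 0.05) -> str:
    """Route a small, stable random subset of files to the Claude-built
    compiler and everything else to GCC (the oracle).

    Hashing the path instead of calling random() keeps the assignment
    deterministic, so a failing file keeps going to the same compiler
    across runs instead of bouncing between the two.
    """
    h = int.from_bytes(hashlib.sha256(source_file.encode()).digest()[:8], "big")
    return "claude-cc" if (h % 10_000) < claude_fraction * 10_000 else "gcc"
```

Because the Claude subset is sparse, agents rarely chase the same failing file, which is what restores parallelism.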

This is a general principle for agent harness design. Parallel agents need decomposable tasks. If your problem doesn’t decompose, throwing more agents at it makes things worse, not better. The hard work isn’t running agents in parallel. It’s splitting the problem so parallel work is possible.

Context as infrastructure

The harness was designed around Claude’s constraints, not a human engineer’s. Verbose output was minimized because it burns context window. Important data went to files the agent could selectively retrieve. A --fast flag sampled 1-10% of tests at random so agents wouldn’t burn hours on full suite runs.
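The --fast behavior reduces to a few lines. This sketch assumes a flat list of test names and a 5% default sample, neither of which is specified in the writeup:

```python
import random

def select_tests(all_tests: list[str], fast: bool, fraction: float = 0.05) -> list[str]:
    """With --fast, run a random sample of the suite: enough signal to
    steer the agent at a fraction of the wall-clock and context cost.

    A failing sample is a real failure; a passing sample is only
    probabilistic, so a full run is still needed before trusting it.
    """
    if not fast or not all_tests:
        return all_tests
    k = max(1, int(len(all_tests) * fraction))
    return random.sample(all_tests, k)
```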

Fresh containers meant agents needed to orient themselves constantly. The system maintained READMEs and progress files so each agent could figure out where things stood. This is context engineering in practice: designing the information environment so the agent can stay effective across long sessions.
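Orientation can be as cheap as gathering whatever earlier sessions left behind. The PROGRESS.md name is hypothetical; current_tasks is the lock directory from earlier:

```python
from pathlib import Path

def orient(repo: Path = Path(".")) -> str:
    """Rebuild situational awareness at the start of a fresh session:
    read the notes earlier agents maintained instead of re-deriving
    project state from 100,000 lines of code."""
    notes = []
    for name in ("README.md", "PROGRESS.md", "current_tasks"):
        p = repo / name
        if p.is_dir():
            # List open task claims so the agent knows what's taken.
            notes.append("open tasks: " + ", ".join(f.name for f in p.iterdir()))
        elif p.is_file():
            notes.append(p.read_text())
    return "\n\n".join(notes)
```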

The researcher said something that stuck: “I was writing this test harness for Claude and not for myself.” If you’re building multi-agent systems and your harness still assumes a human operator, you’re building the wrong thing.

What this actually means

Agent teams is now a Claude Code feature. You can spin up multiple agents that coordinate peer-to-peer on a shared codebase. The compiler was the stress test.

The patterns from this experiment generalize: Git for coordination, file locks for task claims, oracle-based decomposition for monolithic problems, context-aware harness design. These aren’t specific to compilers. They’re the primitives of multi-agent architecture.

The $20,000 price tag sounds steep until you consider what it replaced: a team of engineers over weeks, or more likely, the project never happening at all. The cost curve only goes in one direction.

The interesting question isn’t whether agents can build a compiler. It’s what happens when this coordination pattern gets applied to problems that actually decompose well. Microservices. Test suites. Documentation. Migration scripts. The compiler was the hard case. The easy cases are coming.