Multi-Agent Architecture
How Coding Agents Spawn and Coordinate Specialized Workers
Some coding tasks are too large or complex for a single agent loop to handle well. Researching a sprawling codebase, refactoring multiple modules simultaneously, or tasks that need both deep exploration and careful editing can exhaust a single agent's context window or take too long running sequentially. Multi-agent architecture solves this by letting an orchestrator agent delegate sub-tasks to specialized workers that run independently, each with their own context window and tool access.
Think of it like a lead engineer who breaks a project into work items and assigns them to team members. The lead doesn't need to know every line each person reads -- they just need the final result. This is the core insight: encapsulation of intermediate work.
Interactive: Orchestrator and Sub-Agents
Watch the orchestrator delegate tasks to sub-agents and receive results. Toggle between parallel and sequential execution modes.
Execution Timeline
Why a Single Agent Isn't Always Enough
Context Window Pressure
When an agent searches across dozens of files, the accumulated tool results fill the context window rapidly. A sub-agent can absorb all that noise, distill it into a paragraph, and return only the relevant findings. The orchestrator's context stays clean for decision-making.
Wall-Clock Time
If two research tasks are independent -- say, understanding the auth module and checking test coverage -- running them in parallel halves the wait time. A single agent would have to do them one after another, doubling the latency.
Expertise Specialization
Different tasks benefit from different tool configurations and system prompts. An exploration agent needs fast search tools and read-only access. An implementation agent needs file editing and shell access. Splitting these concerns produces better results than one generalist trying to do everything.
Failure Isolation
If a sub-agent goes down a wrong path or runs into an error, the orchestrator can catch the failure, adjust the prompt, and retry. The main session is never corrupted by a sub-agent's mistakes. This is the same principle as process isolation in operating systems.
The Orchestrator Pattern
How the main agent decides what to delegate, to whom, and how to integrate results.
Task Decomposition
The orchestrator analyzes the user's request and identifies sub-tasks that can be delegated. It considers dependencies between tasks: if Task B needs the output of Task A, they must run sequentially. Independent tasks are candidates for parallel execution.
Brief Writing
For each sub-task, the orchestrator writes a detailed prompt that includes: what to accomplish, which files or areas to focus on, what format the result should take, and any constraints. A well-written brief is the single biggest factor in sub-agent success.
Agent Spawning
Each sub-agent is created with a fresh context window containing only the brief and its tool definitions. It has no knowledge of the orchestrator's conversation history, other sub-agents, or the broader task. This isolation is intentional -- it prevents context pollution and keeps each agent focused.
Execution and Monitoring
Sub-agents run their own agentic loops: reading files, searching code, making edits, running tests. The orchestrator waits for results. In parallel mode, it waits for all agents to complete. In sequential mode, it processes results one at a time, potentially adjusting later briefs based on earlier results.
Result Integration
Each sub-agent returns a concise summary. The orchestrator reads all summaries, synthesizes them into a coherent picture, and either delivers the final answer to the user or spawns additional agents for follow-up work. The sub-agents' full internal traces are discarded.
Context Isolation: The Key Insight
The orchestrator pays only 200 tokens for what was 19K tokens of work. This 95% compression is why delegation scales -- you can run five sub-agents and still use fewer tokens in the orchestrator's context than if it had done all the research itself.
Specialized Agent Types
Explorer Agent
Optimized for fast, read-only codebase research. Has search and file reading tools but no ability to modify files or run commands. Ideal for tasks like "find all usages of the UserAuth class" or "understand the database schema." Low risk, high speed.
Planner Agent
Analyzes architecture and proposes changes without executing them. Reads code to understand structure, then produces a plan: which files to modify, what changes to make, and in what order. Useful for complex refactors where you want to review the plan before execution.
Implementer Agent
Full tool access for making code changes. Can read files, edit them, create new files, and run shell commands (tests, linters, builds). Usually runs in a worktree to isolate changes. The most capable but also highest-risk agent type.
Reviewer Agent
Reviews changes made by other agents. Can read the diff, run the test suite, check for common issues, and produce a review report. Often runs after an implementer agent finishes, providing a second set of eyes before changes are accepted.
Worktree Isolation for Safe Parallel Edits
When multiple agents need to modify code simultaneously, they can't all edit the same working directory -- they'd overwrite each other's changes. The solution is git worktree isolation: each agent gets its own checkout of the repository at a separate filesystem path.
How It Works
Git worktrees allow multiple working directories to share the same repository history. Each worktree can be on a different branch or at a different commit. The agents edit files in their own worktree, and when they finish, their changes are merged back to the main branch -- similar to how developers work on feature branches.
Merge Conflicts
If two agents edit the same file, the merge step may produce conflicts. The orchestrator can resolve these automatically (using the LLM's understanding of both changes) or flag them for the user. Careful task decomposition minimizes overlap: assign different files or modules to different agents when possible.
Communication: Prompt In, Summary Out
This is encapsulation applied to AI reasoning. The orchestrator doesn't see the sub-agent's grep results, file contents, or internal reasoning steps -- only the final answer. This boundary is what makes the architecture scalable.
Failure Handling and Recovery
If a sub-agent takes too long (stuck in a loop, overly broad search), the orchestrator can kill it and either retry with a more specific prompt or skip that sub-task entirely.
If the returned summary seems incomplete or contradictory, the orchestrator can spawn a new agent with a refined brief. "The previous search missed test files -- please also check the __tests__ directory."
If a sub-agent's tool calls fail (file not found, command error), the sub-agent handles it within its own loop. If it can't recover, it reports the failure. The orchestrator decides whether to retry or adapt.
If a sub-agent's context window fills up before it finishes, it returns a partial result with what it found so far. The orchestrator can spawn a continuation agent to pick up where it left off.
Parallel vs. Sequential: When to Use Which
Parallel Execution
Use when tasks are independent
- "Research the auth system" + "Check test coverage" -- no shared data dependency
- "Refactor module A" + "Refactor module B" -- different files, no conflicts
- "Search for security issues" + "Audit performance" -- different concerns, same codebase
Sequential Execution
Use when later tasks depend on earlier results
- "Understand the schema" then "Write migration based on schema" -- B needs A's output
- "Identify bug location" then "Fix the bug" -- must find before fixing
- "Write implementation" then "Review implementation" -- review needs the code
Frequently Asked Questions
When should a coding agent delegate to sub-agents instead of doing everything itself?
Delegation pays off when the main agent's context window would overflow from intermediate results, when independent tasks can run in parallel to save wall-clock time, or when a task requires a different expertise profile (e.g., a read-only search vs. a code modification). If the task is linear and fits in context, a single agent loop is simpler and faster.
How do sub-agents avoid conflicting file edits?
The two main strategies are role-based separation and worktree isolation. Role-based separation assigns different files or directories to each agent. Worktree isolation gives each agent its own copy of the repository via git worktrees, so they can edit the same files independently and merge changes afterward, much like parallel feature branches.
Can sub-agents spawn their own sub-agents?
In principle yes, but in practice this is usually limited to one level of nesting. Deeply nested delegation creates hard-to-debug chains, makes error recovery complex, and multiplies token costs. Most implementations cap recursion depth to keep the system predictable.
What happens to the sub-agent's context after it finishes?
The sub-agent's full internal context (all the files it read, searches it ran, intermediate reasoning) is discarded. Only the final summary it produces is returned to the orchestrator. This is the key benefit: the orchestrator gets a concise result without paying the context cost of all that exploration.
How does multi-agent coordination differ from microservices?
Both decompose complex work into independent units, but the communication model is different. Microservices use defined APIs and persistent state. Multi-agent systems use natural language prompts and ephemeral context windows. Sub-agents are stateless and short-lived, more like serverless functions than long-running services.