Multi-Agent Architecture

How Coding Agents Spawn and Coordinate Specialized Workers

Some coding tasks are too large or complex for a single agent loop to handle well. Researching a sprawling codebase, refactoring multiple modules simultaneously, or tasks that need both deep exploration and careful editing can exhaust a single agent's context window or take too long running sequentially. Multi-agent architecture solves this by letting an orchestrator agent delegate sub-tasks to specialized workers that run independently, each with their own context window and tool access.

Think of it like a lead engineer who breaks a project into work items and assigns them to team members. The lead doesn't need to know every line each person reads -- they just need the final result. This is the core insight: encapsulation of intermediate work.

Interactive: Orchestrator and Sub-Agents

Watch the orchestrator delegate tasks to sub-agents and receive results. Toggle between parallel and sequential execution modes.

Orchestrator Explorer Agent Planner Agent Implementer Agent Reviewer Agent

Execution Timeline

0s --

Why a Single Agent Isn't Always Enough

Context Window Pressure

When an agent searches across dozens of files, the accumulated tool results fill the context window rapidly. A sub-agent can absorb all that noise, distill it into a paragraph, and return only the relevant findings. The orchestrator's context stays clean for decision-making.

Wall-Clock Time

If two research tasks are independent -- say, understanding the auth module and checking test coverage -- running them in parallel halves the wait time. A single agent would have to do them one after another, doubling the latency.

Expertise Specialization

Different tasks benefit from different tool configurations and system prompts. An exploration agent needs fast search tools and read-only access. An implementation agent needs file editing and shell access. Splitting these concerns produces better results than one generalist trying to do everything.

Failure Isolation

If a sub-agent goes down a wrong path or runs into an error, the orchestrator can catch the failure, adjust the prompt, and retry. The main session is never corrupted by a sub-agent's mistakes. This is the same principle as process isolation in operating systems.

The Orchestrator Pattern

How the main agent decides what to delegate, to whom, and how to integrate results.

1

Task Decomposition

The orchestrator analyzes the user's request and identifies sub-tasks that can be delegated. It considers dependencies between tasks: if Task B needs the output of Task A, they must run sequentially. Independent tasks are candidates for parallel execution.

2

Brief Writing

For each sub-task, the orchestrator writes a detailed prompt that includes: what to accomplish, which files or areas to focus on, what format the result should take, and any constraints. A well-written brief is the single biggest factor in sub-agent success.

3

Agent Spawning

Each sub-agent is created with a fresh context window containing only the brief and its tool definitions. It has no knowledge of the orchestrator's conversation history, other sub-agents, or the broader task. This isolation is intentional -- it prevents context pollution and keeps each agent focused.

4

Execution and Monitoring

Sub-agents run their own agentic loops: reading files, searching code, making edits, running tests. The orchestrator waits for results. In parallel mode, it waits for all agents to complete. In sequential mode, it processes results one at a time, potentially adjusting later briefs based on earlier results.

5

Result Integration

Each sub-agent returns a concise summary. The orchestrator reads all summaries, synthesizes them into a coherent picture, and either delivers the final answer to the user or spawns additional agents for follow-up work. The sub-agents' full internal traces are discarded.

Context Isolation: The Key Insight

Orchestrator Context
User request System prompt Brief for Agent A Brief for Agent B Result from Agent A (200 tokens) Result from Agent B (150 tokens)
Total: ~2K tokens used for delegation
vs.
What Agent A Actually Processed
Brief from orchestrator grep results: 15 files (8K tokens) Read auth.ts (3K tokens) Read middleware.ts (2K tokens) Read test files (5K tokens) Bash output (1K tokens) Final summary (200 tokens)
Total: ~19K tokens -- only 200 returned

The orchestrator pays only 200 tokens for what was 19K tokens of work. This 95% compression is why delegation scales -- you can run five sub-agents and still use fewer tokens in the orchestrator's context than if it had done all the research itself.

Specialized Agent Types

🔍

Explorer Agent

Tools: Read, Grep, Glob

Optimized for fast, read-only codebase research. Has search and file reading tools but no ability to modify files or run commands. Ideal for tasks like "find all usages of the UserAuth class" or "understand the database schema." Low risk, high speed.

Read-only Fast Low risk
📋

Planner Agent

Tools: Read, Grep, Glob

Analyzes architecture and proposes changes without executing them. Reads code to understand structure, then produces a plan: which files to modify, what changes to make, and in what order. Useful for complex refactors where you want to review the plan before execution.

Read-only Analytical Plan output
🔨

Implementer Agent

Tools: Read, Edit, Write, Bash, Grep

Full tool access for making code changes. Can read files, edit them, create new files, and run shell commands (tests, linters, builds). Usually runs in a worktree to isolate changes. The most capable but also highest-risk agent type.

Full access Worktree isolated Higher risk

Reviewer Agent

Tools: Read, Grep, Bash (test-only)

Reviews changes made by other agents. Can read the diff, run the test suite, check for common issues, and produce a review report. Often runs after an implementer agent finishes, providing a second set of eyes before changes are accepted.

Verification Test runner Quality gate

Worktree Isolation for Safe Parallel Edits

When multiple agents need to modify code simultaneously, they can't all edit the same working directory -- they'd overwrite each other's changes. The solution is git worktree isolation: each agent gets its own checkout of the repository at a separate filesystem path.

Main Repo
/project (branch: main)
Orchestrator
Worktree A
/tmp/wt-agent-1
Agent 1: refactor auth
Worktree B
/tmp/wt-agent-2
Agent 2: add tests
Worktree C
/tmp/wt-agent-3
Agent 3: update docs

How It Works

Git worktrees allow multiple working directories to share the same repository history. Each worktree can be on a different branch or at a different commit. The agents edit files in their own worktree, and when they finish, their changes are merged back to the main branch -- similar to how developers work on feature branches.

Merge Conflicts

If two agents edit the same file, the merge step may produce conflicts. The orchestrator can resolve these automatically (using the LLM's understanding of both changes) or flag them for the user. Careful task decomposition minimizes overlap: assign different files or modules to different agents when possible.

Communication: Prompt In, Summary Out

Orchestrator
Prompt: "Search the auth module for all JWT validation logic. List each file and function, note any security concerns."
Sub-Agent
Internal (not visible to orchestrator)
Grep for JWT... Read auth/validate.ts Read auth/middleware.ts Read auth/refresh.ts Analyze patterns...
Sub-Agent
Result: "Found 3 files with JWT validation. validate.ts handles token verification, middleware.ts checks on each request, refresh.ts manages rotation. Security concern: no expiry check in refresh flow."
Orchestrator

This is encapsulation applied to AI reasoning. The orchestrator doesn't see the sub-agent's grep results, file contents, or internal reasoning steps -- only the final answer. This boundary is what makes the architecture scalable.

Failure Handling and Recovery

Sub-Agent Timeout

If a sub-agent takes too long (stuck in a loop, overly broad search), the orchestrator can kill it and either retry with a more specific prompt or skip that sub-task entirely.

Wrong Results

If the returned summary seems incomplete or contradictory, the orchestrator can spawn a new agent with a refined brief. "The previous search missed test files -- please also check the __tests__ directory."

Tool Errors

If a sub-agent's tool calls fail (file not found, command error), the sub-agent handles it within its own loop. If it can't recover, it reports the failure. The orchestrator decides whether to retry or adapt.

Context Overflow

If a sub-agent's context window fills up before it finishes, it returns a partial result with what it found so far. The orchestrator can spawn a continuation agent to pick up where it left off.

Parallel vs. Sequential: When to Use Which

Parallel Execution

Use when tasks are independent

  • "Research the auth system" + "Check test coverage" -- no shared data dependency
  • "Refactor module A" + "Refactor module B" -- different files, no conflicts
  • "Search for security issues" + "Audit performance" -- different concerns, same codebase
Benefit: Wall-clock time = slowest agent, not sum of all agents

Sequential Execution

Use when later tasks depend on earlier results

  • "Understand the schema" then "Write migration based on schema" -- B needs A's output
  • "Identify bug location" then "Fix the bug" -- must find before fixing
  • "Write implementation" then "Review implementation" -- review needs the code
Benefit: Each agent builds on confirmed results from prior agents

Frequently Asked Questions

When should a coding agent delegate to sub-agents instead of doing everything itself?

Delegation pays off when the main agent's context window would overflow from intermediate results, when independent tasks can run in parallel to save wall-clock time, or when a task requires a different expertise profile (e.g., a read-only search vs. a code modification). If the task is linear and fits in context, a single agent loop is simpler and faster.

How do sub-agents avoid conflicting file edits?

The two main strategies are role-based separation and worktree isolation. Role-based separation assigns different files or directories to each agent. Worktree isolation gives each agent its own copy of the repository via git worktrees, so they can edit the same files independently and merge changes afterward, much like parallel feature branches.

Can sub-agents spawn their own sub-agents?

In principle yes, but in practice this is usually limited to one level of nesting. Deeply nested delegation creates hard-to-debug chains, makes error recovery complex, and multiplies token costs. Most implementations cap recursion depth to keep the system predictable.

What happens to the sub-agent's context after it finishes?

The sub-agent's full internal context (all the files it read, searches it ran, intermediate reasoning) is discarded. Only the final summary it produces is returned to the orchestrator. This is the key benefit: the orchestrator gets a concise result without paying the context cost of all that exploration.

How does multi-agent coordination differ from microservices?

Both decompose complex work into independent units, but the communication model is different. Microservices use defined APIs and persistent state. Multi-agent systems use natural language prompts and ephemeral context windows. Sub-agents are stateless and short-lived, more like serverless functions than long-running services.