The Agent Loop
How agentic coding assistants use a read-think-act-observe cycle to solve multi-step programming tasks
Why the Loop Pattern Exists
Large language models are powerful reasoners, but they are fundamentally text-in, text-out systems. They cannot open a file, run a terminal command, or check if their code compiles. They can only produce strings of characters. This creates a gap: the model can figure out what needs to be done but has no way to do it.
The agent loop bridges this gap by wrapping the LLM in a cycle. A harness program sits between the model and the outside world. When the model decides it needs to read a file, it emits a structured tool call. The harness executes that call against the real filesystem and feeds the result back as a new message. Now the model can see what the file contains, reason about the next step, and emit another tool call. This continues until the model decides the task is complete and responds with plain text instead of a tool call.
Think of it this way: the LLM is a brain in a jar. The tool system gives it hands to manipulate code and eyes to observe the results. The loop is the nervous system connecting them. Without it, the model can only give you advice. With it, the model can actually fix your code.
The Core Loop Visualized
Watch a token circulate through the agent loop. Each phase lights up as the agent processes a real task: the user asks to fix a failing test, and the agent searches, reads, diagnoses, edits, and verifies.
Anatomy of a Single Iteration
Each trip around the loop has four distinct phases. Understanding them is key to understanding why agents behave the way they do.
Think
The model receives the full conversation so far, including all prior tool results. It reasons about what to do next. This is where chain-of-thought happens. The model might consider multiple approaches, weigh trade-offs, or revise its plan based on new information. The output is either a tool call (continuing the loop) or a text response (ending it).
Select Tool
Rather than producing free-form text, the model emits a structured JSON object specifying which tool to call and with what arguments. This is not just string matching; the model has internalized the tool schemas from the system prompt and generates valid calls. It can call file search, code editing, terminal execution, web fetching, or spawn sub-agents.
Execute
The harness program (not the model) executes the tool call. This is the only point where real side effects happen. A file edit modifies actual bytes on disk. A bash command runs a real process. The model never touches the filesystem directly; the harness acts as a privileged intermediary, often checking permissions before executing.
Observe
The tool's output (file contents, command output, error messages, search results) is appended to the conversation as a new message. The model now has fresh information it did not have before. This is what makes the loop powerful: each observation changes the model's understanding, allowing it to adapt its strategy in real time.
When Does the Loop Stop?
The loop is not infinite. Several conditions cause it to terminate, and understanding these explains many observed agent behaviors.
Task Complete
The model decides it has finished. Instead of emitting a tool call, it produces a text response summarizing what was done. This is the happy path. The model has agency over when to stop, which means it sometimes stops too early (missing edge cases) or too late (over-engineering).
Unrecoverable Error
A tool returns an error the model cannot work around, such as a permission denial on a protected file, a network timeout when fetching a required resource, or a compilation failure it cannot diagnose. The model will typically report the error to the user.
User Interruption
The user presses Escape or a stop button. Because the loop streams its reasoning in real time, users can watch the agent's thought process and interrupt if it goes off track. This is a critical feedback mechanism that prevents runaway loops.
Token Budget Exhausted
The context window fills up. On a long task with many tool results, the conversation can exceed 200K tokens. At that point, older messages must be evicted or summarized. If the essential context gets evicted, the agent may lose track of the task. Some systems set a hard turn limit as a guardrail.
Planning vs. Step-by-Step Execution
There are two dominant strategies for how agents approach multi-step tasks, and most real systems use a hybrid.
Plan-First Approach
Strengths
- Provides a roadmap the user can review before execution starts
- Helps prevent forgetting steps in complex tasks
- Makes progress visible and predictable
Weaknesses
- Plans often become stale as the agent discovers new information
- Consumes tokens maintaining a todo structure
- Can lead to rigid execution when flexibility is needed
Reactive Step-by-Step
Strengths
- Adapts naturally as new information is discovered
- No overhead from maintaining a plan structure
- Works well for exploratory and debugging tasks
Weaknesses
- Can lose the forest for the trees on large tasks
- Harder for users to predict what the agent will do next
- Risk of going in circles without a high-level map
In practice, effective agents blend both approaches. They form a loose mental plan during the first "think" phase but stay flexible enough to deviate when reality diverges from expectations. The best results come from planning at the right granularity: broad enough to maintain direction, detailed enough for the immediate next step.
Backtracking and Error Recovery
One of the most powerful properties of the agent loop is its ability to recover from mistakes. Because every tool result (including errors) is fed back to the model, the agent can see exactly what went wrong and try something different. This is fundamentally different from a script, which would simply fail and stop.
auth.ts line 42TypeError: Cannot read property 'token' of undefinedsession.ts, not in auth.tsauth.ts change and fixes the null check in session.tsThis backtracking ability is why agent loops handle ambiguous tasks far better than simple code generation. The model does not need to get it right on the first try. It just needs to be able to recognize failure and adjust.
Streaming: Transparency Builds Trust
A critical UX decision in agent systems is making the loop visible to the user. Rather than running silently and returning a final result, streaming agents show their reasoning token by token. The user watches the agent think through the problem, sees it pick a tool, observes the tool output, and follows the next reasoning step.
Streaming serves three purposes. First, it lets users verify the agent's understanding before it acts. If the model misunderstands the task, the user can interrupt immediately rather than waiting for an incorrect result. Second, it builds trust by showing the agent's work. A black box that silently edits your codebase is terrifying; one that explains each step is a collaborator. Third, it reduces perceived latency. A 30-second task feels faster when you can watch progress in real time.
Frequently Asked Questions
What happens if the agent gets stuck in an infinite loop?
Several safeguards prevent this. Most agent systems impose a maximum turn count (often 50-200 iterations). The token budget acts as a natural ceiling since every iteration consumes context window space. Users can also interrupt at any time via keyboard shortcuts. Additionally, well-tuned models learn to recognize circular patterns in their own output and break out by trying a different approach or asking the user for clarification.
How does the agent decide which tool to use?
The model sees tool descriptions in its system prompt, which include the tool name, parameter schema, and usage guidelines. Based on its current reasoning about the task, it generates a tool call as structured output. This is not a lookup table or decision tree; it is the same next-token prediction process the model uses for all text generation, but constrained to output valid tool-call JSON. The model learns tool selection patterns during training and from the detailed instructions in its prompt.
Why not just generate all the code at once instead of using a loop?
One-shot code generation works for small, well-defined tasks (write a sort function, create a React component). But real coding tasks are rarely self-contained. You need to understand the existing codebase, find the right files, check how functions are called, verify your change does not break anything, and handle edge cases you discover along the way. The loop lets the agent gather information incrementally, exactly like a human developer who reads code, makes a change, runs tests, and iterates. Without the loop, you would have to stuff the entire codebase into the prompt, which is impractical for any non-trivial project.
How much of the context window do tool results consume?
On a typical task, tool results account for roughly 70-85% of all tokens in the conversation. A single file read can be 500-5000 tokens. A grep search across a codebase might return 2000 tokens of matches. Command output from running a test suite can easily be 3000+ tokens. This is why context window management is so critical; the agent must be selective about which tools it calls and smart about which results to keep as the conversation grows. Most harnesses truncate large outputs and may summarize older results to free up space.
Can the agent undo its own changes if something goes wrong?
Yes, in several ways. The agent can use git to revert files, it can re-edit a file to restore its previous content, or it can use version control to reset to a known good state. Some harnesses also keep internal snapshots of file state before modifications, providing an implicit undo capability. The backtracking pattern described above is the model-level version of undo: the agent sees a test fail, recognizes its change was wrong, and applies a different fix. This combination of model-level reasoning and system-level safeguards makes the loop robust against mistakes.