The ReAct Loop — Agentic · Engineering Playbook

What it is#

ReAct — Reason + Act — is the foundational agent pattern: at each step, the model produces both a reasoning trace (“here’s what I’m thinking”) and an action (“here’s the tool I’m calling”); the action’s result becomes part of the next step’s input, and the loop continues until the model decides it’s done.

The pattern came out of a 2022 paper that argued the interleaving was load-bearing: separating “thinking” from “acting” by a turn boundary lets the model use one to inform the other. Pure-reasoning models could plan but couldn’t ground their plans in environment state. Pure-acting models could execute but drifted off-task with no reflection. ReAct combines them and is, in some form, the substrate of nearly every modern agent — coding agents, web agents, research agents, multi-modal agents.

When to use it#

ReAct is the default pattern for any agent that interacts with an environment. Specifically:

Multi-step tasks with observable intermediate state. The agent does X, sees what happens, decides Y. Coding agents (read file, edit, run tests, repeat), web agents (load page, take screenshot, click, repeat), research agents (search, read, search more).
Tasks where tool results inform the next decision. A database query’s result shapes the next query. An API call’s error message shapes the retry. ReAct’s interleaving is what makes this work.
Tasks with discoverable structure. When the agent doesn’t know up-front how many steps the task takes, ReAct lets it discover the structure as it goes.

Don’t use ReAct when:

The task is one-shot. A single tool call with no follow-up doesn’t need a loop. Function-calling-with-one-call is simpler and cheaper.
The task has a known fixed plan. If you can write the steps out as a deterministic pipeline, do that — ReAct’s flexibility is wasted compute.
Latency is critical and the task is bounded. Each ReAct step is a round-trip to the model. Hard latency budgets often rule out long loops.

How it works#

The two-message-per-step interleave#

A clean ReAct step looks like this in the conversation transcript:

[Assistant]
Thought: I need to find out the current weather in Tokyo before
suggesting an outfit. I'll call the weather API.
Action: get_weather(city="Tokyo")

[Tool result]
{"temp_c": 14, "conditions": "rainy", "wind_kph": 22}

[Assistant]
Thought: Rainy and cool. A light waterproof jacket plus a long-sleeve
shirt should work. No more API calls needed.
Action: respond(message="Light waterproof jacket + long-sleeve shirt — it's 14°C and rainy in Tokyo.")

Two things to notice:

The Thought is visible and structured. Modern frameworks may hide it from the user, but it’s there in the model’s output, and it’s load-bearing — without it, the model often misjudges what to do next.
The Action is typed. Each tool call has a schema; the model isn’t producing free-form text and hoping it parses. Function-calling APIs enforce this.

Step boundaries and the loop control#

A ReAct step is one Thought-Action-Observation triple. The loop runs steps until an exit condition fires:

Goal reached. The model emits a designated “I’m done” action (often final_answer or respond).
Step budget exhausted. The orchestrator counted N steps; that’s the maximum it’ll allow.
Stagnation detected. The same action with the same input was called too many times in a row, or the observations show no progress.
Error budget exceeded. Too many tool errors in a row.
Time budget exceeded. Wall-clock deadline.

The loop control is not in the model — it’s in the orchestrator code wrapping the model. Get this right and your agent is well-behaved under load; get it wrong and you’ll have agents that run for hours or terminate prematurely.

Why the reasoning step matters#

A frequent question: does the Thought actually help, or is it ceremony? The empirical answer is yes, it helps — across most published benchmarks, ReAct (Thought + Action) outperforms Act-only (just emit actions) on multi-step tasks. The mechanism is roughly:

The Thought serves as scratch space — the model writes out its plan and then references it.
The Thought makes the agent legible to debugging — when something goes wrong, the trace shows what the model was trying to do.
The Thought lets the model recover from observations — “that didn’t work; let me try X instead” is hard to do without first naming what didn’t work.

Modern reasoning-trained models (the o1-family, Claude with extended thinking, Gemini Thinking) have moved part of the Thought inside the model’s hidden reasoning trace, before the action emerges. The pattern is the same; the surface changes.

Variants#

The pattern is stable; the variants are stacks on top:

ReAct + Reflection. After the loop ends (or periodically during it), the model is asked to critique its own trajectory. Useful when the task has a verifier (tests pass / fail, output matches schema / doesn’t). Diminishing returns past one reflection cycle.
Plan-then-ReAct. A separate planning step decomposes the goal into sub-goals before the loop begins. The loop then iterates on the sub-goals. Useful for long-horizon work where the model’s ad-hoc planning is unreliable.
Hierarchical ReAct. A high-level loop sets sub-goals; a low-level loop executes each one. Better at long-horizon tasks; harder to debug because you have a nested state machine.
Multi-agent ReAct. Each specialised agent runs its own ReAct loop; a router or supervisor coordinates handoffs. Useful when sub-tasks need distinct system prompts or tool surfaces.
Speculative ReAct. The model proposes multiple candidate actions; the orchestrator runs them in parallel; the best result is taken forward. Aggressive use of compute to reduce latency.

Example systems#

ReAct underlies most named systems in this workbook:

WebVoyager — a multimodal ReAct loop where each step is “look at screenshot, decide on click/type/scroll, execute, observe new screenshot.”
NVIDIA Eureka — a ReAct loop wrapping reward-function generation, with reflection cycles and evolutionary candidate selection.
Claude Code, Cursor, Aider, and most agentic coding tools — ReAct loops over file/shell/search tools.
Autonomous AI Agents (Gen AI) — the conceptual gateway in the Gen AI workbook covers the same pattern from the model-side view.

If you look closely at any modern agent, you’ll find a ReAct loop in there somewhere. The named system’s contribution is usually what’s in the Thought, what’s in the Action surface, or what wraps the loop — not a different fundamental pattern.

Trade-offs#

ReAct (Thought + Action interleaved) — better task success on multi-step problems; trace is debuggable; recovers from observations. Token cost is higher (Thought tokens add up); latency is higher (more round-trips); requires good loop control to avoid runaway.

Act-only (no Thought, just emit actions) — cheaper, faster per step. Lower task success on multi-step problems; trace is harder to debug; recovery is reactive instead of reflective. Acceptable for short loops with strong tool semantics.

Other axes worth knowing:

Visible vs hidden Thought. Showing the Thought to the user is great for debug, terrible for UX in user-facing products. Most production systems log Thought server-side and never show it.
Step budget vs no budget. No budget is a recipe for cost overruns. A budget that’s too low cuts off real work. The right answer is per-task — easy tasks need ~3 steps, hard ones need 20+.
Single tool per step vs parallel tools. ReAct as originally described is sequential — one tool per step. Modern function-calling APIs allow parallel calls, which lets the model batch independent operations. The pattern still works; the per-step cost goes down.