Reasoning and Planning

Chain-of-thought, tree-of-thought, plan-then-execute. Where the agent's 'thinking' happens before it acts.

Concept Intermediate
8 min read
reasoning planning chain-of-thought react

Summary

Reasoning is the model’s internal deliberation between observing the world and choosing an action. Planning is the agent-level extension of that — building a multi-step roadmap, possibly with sub-goals, before committing to actions. The two blur together in practice: a chain-of-thought is a one-shot plan, a plan-then-execute pipeline is structured reasoning amortised across steps.

The shape of the reasoning step decides almost everything about the agent’s behaviour. Inline chain-of-thought feels lightweight but is opaque and re-runs every step. Explicit plans are heavier but inspectable, repairable, and reusable. Modern reasoning models (o1-style architectures) push more of this into the model itself — at the cost of latency and visibility. Picking the right reasoning shape for the task is a key design lever.

Why it matters

Three concrete reasons reasoning-and-planning deserves dedicated design:

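  • Reasoning is often the most expensive token category. Thinking tokens grow with how long the model deliberates, not with the size of the input or the final answer, so they can outpace IO tokens quickly, especially with reasoning models. A misconfigured reasoning step can multiply the bill tenfold for marginal capability gains.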
  • Plans are where you intervene. A bad action is hard to undo. A bad plan is easy to edit before any action runs. Agents that expose an explicit plan let humans (or other agents) review it before anything irreversible happens.
  • The model is not always the right planner. For some tasks (deterministic workflows, regulated processes) a hand-written plan with the model in worker-only roles outperforms anything the model would generate. Knowing when to trust the model with planning is itself a skill.

A common anti-pattern is to slap “let’s think step by step” on every prompt and assume reasoning is solved. It isn’t — that produces reasoning tokens but no plan-as-artefact, no inspection point, no reuse, no repair.

How it works

The reasoning spectrum

From least to most structured:

  1. Implicit (no chain-of-thought). The model just answers. Cheap and fast; fragile on multi-step tasks. Reserved for trivial steps where the model has already internalised the procedure.
  2. Inline chain-of-thought. The model produces a reasoning trace (“first I’ll… then I’ll…”) and an answer in one response. Cheap, simple, opaque — the trace is not a structured artefact, just text the model wrote.
  3. ReAct-style interleaved thought + action. Each step is “thought → action → observation”. The thought is short, action-oriented, and re-emitted every step. This is the default for most production agents because it composes with tool-use naturally.
  4. Explicit plan-then-execute. A separate planning call produces a structured plan (numbered steps, sub-goals, dependencies). A separate executor walks the plan, calling tools. The plan is a first-class artefact that can be logged, reviewed, edited, replayed.
  5. Tree-of-thought / search. Multiple candidate next-thoughts are generated, scored, and explored. A search procedure (BFS, DFS, MCTS, beam) picks the best path. Expensive; reserved for problems where wrong early decisions are very costly.
  6. Reasoning-model reasoning. o1-style models do extended internal reasoning before emitting any output. From the outside it’s a single call; inside, the model works through a long deliberation of its own. Higher latency, often higher quality, opaque traces (many providers hide or only summarise the reasoning text).
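
As a rough illustration of item 5, the sketch below runs a beam search over candidate "thoughts". It is a toy under stated assumptions: `propose` and `score` are hypothetical stand-ins for model calls (here they just do arithmetic towards a target number), and `beam_width` and `max_depth` are illustrative parameters, not recommended values.

```python
from typing import List, Tuple

# A state is (current value, thoughts taken so far).
State = Tuple[int, Tuple[str, ...]]


def propose(state: State) -> List[State]:
    """Hypothetical stand-in for an LLM call that proposes next thoughts."""
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]


def score(state: State, target: int) -> float:
    """Hypothetical stand-in for a judge; higher means closer to the goal."""
    return -abs(target - state[0])


def beam_search(start: int, target: int,
                beam_width: int = 2, max_depth: int = 6) -> State:
    frontier: List[State] = [(start, ())]
    best = frontier[0]
    for _ in range(max_depth):
        # Expand every state on the frontier, then keep only the top few.
        candidates = [nxt for s in frontier for nxt in propose(s)]
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam_width]
        if score(frontier[0], target) > score(best, target):
            best = frontier[0]
        if best[0] == target:
            break
    return best
```

Each level costs one `propose` call per frontier state plus one `score` call per candidate, which is where the expense of this shape comes from: cost grows with both beam width and depth.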

Most production agents pick a primary shape (ReAct, plan-then-execute, or reasoning-model) and combine it selectively with the others. Hybrid example: use a reasoning model for the initial plan and a faster, cheaper model for per-step ReAct execution.

Where planning lives

Planning can live in several places in the architecture:

  • In the prompt. “Before you act, list your steps.” Cheapest planning; least durable.
  • As an output artefact. A separate tool call writes the plan to a scratchpad or returns it as a structured object. Inspectable, persists across steps.
  • In a dedicated planner agent. A first agent produces a plan, a second agent executes it. The classic supervisor + worker shape. Useful when planning and execution need different models or contexts.
  • In code. The “plan” is a hand-written workflow, and the model fills in steps. Common for regulated or deterministic processes — the agent is doing inference inside a fixed loop, not deciding the loop’s shape.

The choice is mostly about who needs to read the plan. If only the model needs it, in-prompt is fine. If humans need to review it before execution, it must be an artefact. If the plan is reused across runs, it belongs in code or a procedural-memory store.
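
To make "plan as an artefact" concrete, here is a minimal sketch of a plan object that can be serialised for review, reloaded, and walked in dependency order. The `Step` and `Plan` classes, their field names, and the `approved` flag are all assumptions made for illustration; they do not come from any particular framework.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Step:
    id: str
    action: str
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Plan:
    goal: str
    steps: List[Step]
    approved: bool = False  # set by a human or reviewer agent before execution

    def to_json(self) -> str:
        """Serialise the plan so it can be logged, reviewed, or edited."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Plan":
        raw = json.loads(text)
        steps = [Step(**s) for s in raw["steps"]]
        return cls(goal=raw["goal"], steps=steps, approved=raw["approved"])

    def execution_order(self) -> List[str]:
        """Return step ids so that every step follows its dependencies."""
        done: List[str] = []
        pending = {s.id: set(s.depends_on) for s in self.steps}
        while pending:
            ready = [sid for sid, deps in pending.items() if deps <= set(done)]
            if not ready:
                raise ValueError("plan has a cycle or a missing dependency")
            for sid in ready:
                done.append(sid)
                del pending[sid]
        return done
```

Because the plan is plain data, a reviewer can edit the JSON and flip `approved` before an executor ever calls a tool, and the same file can be replayed on a later run.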

Plan repair

Plans go wrong. They make wrong assumptions, miss dependencies, propose actions that fail at runtime. Three repair strategies:

  • Replan from scratch. When an action fails, throw out the rest of the plan and generate a new one from the current state. Robust but expensive.
  • Local patch. Repair only the affected step(s). Cheaper; assumes the rest of the plan is still valid, which it often isn’t.
  • Reflect-and-revise. After failure, the model writes a short reflection (“I assumed X, but the API requires Y”) and includes it in the prompt for the next attempt. Effective when the failure carries useful information.

A robust agent has at least one repair pathway and a bound on how often it runs: an agent that always replans from scratch, with no cap on attempts, can loop forever on a recurring environment error.
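
One way to sketch reflect-and-revise with a bounded number of attempts is shown below. `run_with_reflection`, `fake_attempt`, and `fake_reflect` are hypothetical names: in a real agent the attempt would call the model and a tool, and the reflection would be written by the model rather than by a format string.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    attempt: Callable[[List[str]], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[bool, List[str]]:
    """Retry `attempt`, feeding reflections on earlier failures back in."""
    reflections: List[str] = []
    for _ in range(max_attempts):
        ok, detail = attempt(reflections)
        if ok:
            return True, reflections
        reflections.append(reflect(detail))
    # Budget exhausted: hand off to a human or another strategy, do not loop.
    return False, reflections


def fake_attempt(reflections: List[str]) -> Tuple[bool, str]:
    """Hypothetical environment: succeeds once the date format is known."""
    if any("ISO 8601" in note for note in reflections):
        return True, "created"
    return False, "400: date must be ISO 8601"


def fake_reflect(error: str) -> str:
    """Hypothetical stand-in for a model-written reflection."""
    return f"I assumed a free-form date, but the API said: {error}"
```

The `max_attempts` cap is what keeps a recurring environment error from turning into an endless repair loop; the returned reflections double as a log of what the agent learned.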

Variants and trade-offs

ReAct (interleaved): short reasoning per step, tool call, observation, repeat. Pros: natural fit for tool-use, easy to log, recovers from per-step errors gracefully. Cons: no global plan, can wander on long horizons, each step re-derives context.

Plan-then-execute: generate a plan once, execute against it. Pros: inspectable plan artefact, fewer reasoning calls, easier human review. Cons: brittle if environment changes mid-execution, replanning logic is non-trivial, planning quality dominates outcome.
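
The ReAct shape can be sketched as a small loop. This is a minimal illustration, not a production runtime: `scripted_policy` is a hypothetical stand-in for the model, `TOOLS` holds a single fake tool, and the transcript format is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A policy maps the transcript so far to (thought, action name, argument).
Policy = Callable[[List[str]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def react_loop(policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 5) -> Tuple[str, List[str]]:
    transcript: List[str] = []
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"thought: {thought}")
        if action == "finish":
            return arg, transcript
        tool = tools.get(action)
        # An unknown tool becomes an observation the policy can react to.
        observation = tool(arg) if tool else f"unknown tool {action}"
        transcript.append(f"action: {action}({arg})")
        transcript.append(f"observation: {observation}")
    return "step budget exhausted", transcript


# Hypothetical tool registry with one fake tool.
TOOLS: Dict[str, Tool] = {"lookup_order": lambda order_id: "shipped"}


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for the model's per-step decision."""
    prefix = "observation: "
    seen = [line[len(prefix):] for line in transcript if line.startswith(prefix)]
    if not seen:
        return "I need the order status first.", "lookup_order", "A123"
    return "I have the status and can answer.", "finish", f"Order A123 is {seen[-1]}"
```

Calling `react_loop(scripted_policy, TOOLS)` returns the final answer together with the transcript. Note that the policy receives the whole transcript on every step, which is the "each step re-derives context" cost listed under Cons.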

Other dimensions:

  • Single-model vs router. A router sends “easy” steps to a cheap model and “hard” steps to a reasoning model. Saves cost when the task has mixed difficulty; introduces routing-quality bugs.
  • Tree-of-thought vs single-path. Tree-of-thought explores multiple candidate next-steps and picks the best. Worth it on problems with high downstream cost of bad decisions (planning a research strategy, choosing an architecture). Overkill on tool-heavy day-to-day tasks.
  • Self-consistency. Run the same reasoning step N times and majority-vote on the answer. Cheap variance-reduction, often more effective than fancier search; works when the task has a discrete answer.
  • Reasoning-model vs prompt-engineered reasoning. A reasoning model does its own search internally; a prompt-engineered chain-of-thought on a standard (non-reasoning) model gives you visibility but less raw capability per token. As reasoning models mature, the trade-off is shifting toward “let the model do it” for most cases.
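
Self-consistency from the list above is simple enough to sketch in a few lines. The `sample` callable is a hypothetical stand-in for one stochastic model call that returns a discrete answer; the normalisation step (strip and lowercase) is an assumption that would need adapting to the real answer format.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def self_consistent_answer(sample: Callable[[], str],
                           n: int = 5) -> Tuple[str, float]:
    """Sample n answers and return the majority answer with its vote share."""
    answers = [sample().strip().lower() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


def make_fake_sampler(outputs: Iterable[str]) -> Callable[[], str]:
    """Hypothetical sampler that replays canned outputs instead of a model."""
    iterator = iter(outputs)
    return lambda: next(iterator)
```

The vote share is a cheap confidence signal: a low share suggests the step is a candidate for a stronger model or a human check. The approach only makes sense when answers can be compared for equality, which is why it suits discrete answers.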

When each shape actually wins

  • ReAct: tool-heavy short-to-medium horizon tasks (coding agent, web agent, support bot). The default; pick something else only if you have a reason.
  • Plan-then-execute: tasks with more than about 10 dependent steps, tasks where humans must approve before execution, tasks where the same plan will run repeatedly.
  • Tree-of-thought: high-stakes one-shot decisions where the cost of acting on the wrong path is high (strategy generation, architecture proposals, research planning).
  • Reasoning model: math, code, complex multi-hop reasoning, anywhere the per-step quality matters more than visibility.
  • Hand-coded workflow + LLM inside: regulated processes, deterministic pipelines, anything where you’d never let the model decide the workflow shape — give it a fixed structure and let it fill in.

The mistake is treating these as a hierarchy (“reasoning models are best”) rather than a menu. They solve different problems.

When this is asked in interviews

Reasoning-and-planning is one of the most common agent design questions, typically coming right after “what is an agent”. Interviewers are looking for taxonomy plus opinion.

  • AI-product design loops. “How does your agent decide what to do?” — the expected answer is a specific shape (ReAct / plan-then-execute / etc.) with a reason for picking it.
  • ML engineering loops. “When would you use a reasoning model vs a base model + CoT?” — task structure, latency budget, visibility requirements, per-step quality vs end-to-end quality.
  • Senior platform loops. “Where do plans live in your agent runtime?” — in-prompt, scratchpad, dedicated planner agent, or code. The answer tells the interviewer how seriously you take reviewability and replay.

Common follow-ups:

  • “How would you reduce reasoning cost?” — caching the prompt prefix, routing easy steps to a cheap model, dropping CoT on steps that don’t benefit from it, summarising old trajectory instead of resending raw.
  • “How do you keep the agent from looping forever in its reasoning?” — step budget, stagnation detector (no new state for K steps), explicit “are you stuck?” predicate the agent must answer periodically.
  • “How do you evaluate plan quality?” — independent of execution: have a human or judge-model grade the plan against the goal before any action runs. If the plan is bad, the trajectory will be too.
  • “Have you used tree-of-thought in production?” — most candidates say no, and that’s fine; the right answer is “we considered it for X, decided the latency wasn’t worth the gain”. Showing you can spot when not to use a fancy method is more valuable than name-dropping.
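
The step-budget and stagnation-detector answer can be sketched as a small guard object that the agent loop consults after every step. `LoopGuard` and its parameters are hypothetical, and "no new state for K steps" is interpreted here as the same state appearing K times in a row, which is only one reasonable reading.

```python
from typing import Hashable, List


class LoopGuard:
    """Stops an agent loop on a step budget or on repeated identical states."""

    def __init__(self, max_steps: int = 20, stagnation_window: int = 3) -> None:
        self.max_steps = max_steps
        self.stagnation_window = stagnation_window
        self.history: List[Hashable] = []

    def should_stop(self, state: Hashable) -> str:
        """Record the state; return a stop reason, or "" to keep going."""
        self.history.append(state)
        if len(self.history) >= self.max_steps:
            return "step budget exhausted"
        recent = self.history[-self.stagnation_window:]
        if len(recent) == self.stagnation_window and len(set(recent)) == 1:
            return "stagnation: no new state"
        return ""
```

The state passed in could be a hash of the scratchpad, the last tool call and its arguments, or any other summary that changes when the agent makes progress; choosing that summary is the real design decision.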