Multi-Agent Orchestration

Sequential, router, swarm, supervisor-worker. The four shapes of multi-agent coordination and what each costs.

Pattern Intermediate

What it is

Multi-agent orchestration is the pattern of composing two or more specialised agents into a single system that solves a task no single agent would solve as well. Each constituent agent has its own role, its own prompt, its own tool surface, and often its own model — and the orchestration layer decides who talks to whom, in what order, and when the work is done.

The motivation has three threads. First, specialisation: a researcher prompt is different from a writer prompt is different from a critic prompt; bundling them into one mega-prompt degrades each. Second, tool surface management: agents with distinct tool sets are easier to design, audit, and rate-limit than one agent with everything. Third, bounded context: each agent maintains only the conversation history relevant to its role, so context-window pressure scales with the largest single role, not with the sum.

The cost is orchestration overhead. Coordinating multiple agents adds latency, complexity, and failure modes (deadlocks, infinite handoffs, lost messages) that single-agent designs don’t have. Multi-agent is the right answer when the specialisation gains outweigh the coordination cost — and that line is further out than the framework marketing suggests.

When to use it

Multi-agent is the right shape when:

  • Roles are clearly distinct. Researcher, writer, critic. Triage, diagnostician, second-opinion. Planner, executor, verifier. If you can write a different job description for each, you have a candidate for multi-agent.
  • Tool surfaces are too big for one agent. Past roughly 20 tools, even well-described ones, tool-selection accuracy tends to degrade (the exact threshold varies by model). Splitting into a sub-agent per domain (one for code, one for search, one for files) keeps each agent's tool list short and recovers much of that accuracy.
  • Context-window pressure is the bottleneck. A task that needs to remember 50K tokens of source material and produce 5K tokens of output can be split into a reader-agent that summarises and a writer-agent that uses the summary — each well within its context budget.
  • Different sub-tasks need different models. A vision-and-text task split between a vision model (for grounding) and a text model (for reasoning). A safety-critical task split between a strong but expensive model for the dangerous step and a cheap model for the rest.

Don’t use multi-agent when:

  • Sub-roles are illusory. “Brainstormer, writer, editor” as three agents for a 200-word output is overkill — the same model with one prompt does it cheaper and faster.
  • State sharing dominates. If every agent needs every other agent’s context, the handoff overhead eats the specialisation gains. Use flat ReAct with a memory store instead.
  • Latency is critical. Each handoff is a round-trip. A user-facing chat that needs to answer in under 2 seconds won’t tolerate three sequential agent calls.

How it works

Four canonical shapes

The literature converges on roughly four orchestration topologies. They cover most real systems.

1. Sequential pipeline. Agents fire in a fixed order; each produces input for the next.

[User input]
      ↓
[Researcher agent] → research notes
      ↓
[Writer agent] → draft
      ↓
[Editor agent] → final
      ↓
[Output]

Used when: roles are stable, order is known, and each stage’s output is the next stage’s input. Closest cousin of a traditional ETL pipeline; easiest to reason about, hardest to make adaptive.
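
A sequential pipeline is little more than function composition. The sketch below assumes each agent is a plain function from text to text; researcher, writer, and editor are hypothetical offline stubs standing in for model calls with role-specific prompts.

```python
from typing import Callable

# An "agent" here is any function from input text to output text.
Agent = Callable[[str], str]

# Hypothetical offline stubs; real stages would each call a model with a role prompt.
def researcher(task: str) -> str:
    return f"notes on {task}"

def writer(notes: str) -> str:
    return f"draft from {notes}"

def editor(draft: str) -> str:
    return f"final: {draft}"

def run_pipeline(stages: list[Agent], user_input: str) -> str:
    """Fire each stage in a fixed order; each output becomes the next stage's input."""
    payload = user_input
    for stage in stages:
        payload = stage(payload)
    return payload

print(run_pipeline([researcher, writer, editor], "solar panels"))
# prints: final: draft from notes on solar panels
```

Because the order lives in a fixed list, adapting the flow (say, skipping the editor for short outputs) means changing code, which is the "hardest to make adaptive" trade-off in concrete form.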

2. Router. A router agent inspects the input and dispatches to one of N specialist agents.

[User input]
      ↓
[Router] → "this is a code question"
      ↓
[Code agent]
      ↓
[Output]

Used when: the input space is heterogeneous and each branch needs distinct handling. Classic example: a customer-support assistant routing to billing, technical-support, or sales sub-agents.
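
A minimal router sketch for the customer-support example. The keyword classifier is a hypothetical stand-in for the router agent (a real one would ask a model to pick a label), and the three specialists are stubs. The design point it shows: the router emits one label from a fixed set, and an unknown label falls back to a default branch instead of failing.

```python
from typing import Callable

# Hypothetical specialist agents; real ones would each have their own prompt and tools.
def billing_agent(msg: str) -> str:
    return "billing: " + msg

def tech_agent(msg: str) -> str:
    return "tech: " + msg

def sales_agent(msg: str) -> str:
    return "sales: " + msg

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "technical": tech_agent,
    "sales": sales_agent,
}

def classify(msg: str) -> str:
    """Stand-in for the router agent; keyword matching keeps the sketch deterministic."""
    text = msg.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "sales"

def route(msg: str) -> str:
    label = classify(msg)
    # Fall back to a default branch if the router emits a label we don't know.
    agent = SPECIALISTS.get(label, sales_agent)
    return agent(msg)
```

For example, `route("I need a refund")` returns `"billing: I need a refund"`.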

3. Supervisor-worker (hierarchical). A supervisor agent decomposes the goal into sub-tasks and delegates to worker agents; workers report back; supervisor decides what’s next.

            [Supervisor]
          /      |       \
[Worker A]  [Worker B]  [Worker C]
          \      |       /
            [Supervisor]

Used when: the task is long-horizon, decomposable, and the plan may change based on worker results. This is the multi-agent expression of Hierarchical Planning.
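
The supervisor loop can be sketched as a work queue that the supervisor revises after each worker result. Everything here is hypothetical: the two workers are offline stubs, the initial decomposition is hard-coded where a real supervisor would ask a model, and the re-planning rule (broaden an empty search) is just one illustration of changing the plan based on worker output. The step budget keeps a bad re-plan from looping forever.

```python
from typing import Callable

Worker = Callable[[str], str]

def search_worker(query: str) -> str:
    # Hypothetical worker: finds nothing for "narrow" queries.
    return "" if query.startswith("narrow") else f"hits for '{query}'"

def summarise_worker(text: str) -> str:
    return f"summary of {text}"

def supervise(goal: str, workers: dict[str, Worker], max_steps: int = 10) -> list[tuple[str, str, str]]:
    """Delegate sub-tasks, inspect each result, and re-plan when needed."""
    plan = [("search", goal)]                # initial decomposition (a model would produce this)
    log: list[tuple[str, str, str]] = []     # (worker, subtask, result) for each step
    while plan and len(log) < max_steps:     # step budget as a backstop
        name, subtask = plan.pop(0)
        result = workers[name](subtask)
        log.append((name, subtask, result))
        if name == "search" and not result:
            # Worker came back empty: broaden the query and retry first.
            plan.insert(0, ("search", subtask.replace("narrow", "broad", 1)))
        elif name == "search":
            # Search succeeded: schedule a summary of what it found.
            plan.append(("summarise", result))
    return log
```

Calling `supervise("narrow query", {"search": search_worker, "summarise": summarise_worker})` takes three steps: an empty search, a broadened search, then a summary.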

4. Swarm / peer-to-peer. Agents communicate directly with each other; no central orchestrator. Handoffs are explicit calls one agent makes to another.

[Agent A] ←→ [Agent B]
      ↘       ↙
       [Agent C]

Used when: the coordination is complex enough that a pre-defined topology is too rigid, and agents need to decide handoffs based on context. Riskier (no central place to enforce termination), harder to debug, but more flexible. Seen in some research-style multi-agent setups and in frameworks that explicitly support agent-to-agent handoff (e.g., OpenAI’s experimental Swarm library, AutoGen’s Swarm team).
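
A minimal sketch of peer-to-peer handoffs, assuming each agent returns either a final answer or a handoff naming the next peer. The agents (triage, booking, flights) and the Handoff/Final types are hypothetical. The runner only follows whatever handoff the current agent chose; it never picks the next speaker itself, which is what separates this from a supervisor. Because nothing central enforces termination, the loop carries a hop limit.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Handoff:
    to: str        # name of the peer agent that takes over
    message: str

@dataclass
class Final:
    answer: str

Step = Union[Handoff, Final]

# Hypothetical peers: each decides for itself whether to answer or hand off.
def triage(msg: str) -> Step:
    return Handoff("booking", msg) if "book" in msg else Final("triage: " + msg)

def booking(msg: str) -> Step:
    return Handoff("flights", msg) if "flight" in msg else Final("booked: " + msg)

def flights(msg: str) -> Step:
    return Final("flight search: " + msg)

AGENTS: dict[str, Callable[[str], Step]] = {"triage": triage, "booking": booking, "flights": flights}

def run_swarm(start: str, msg: str, max_hops: int = 5) -> str:
    current = start
    for _ in range(max_hops):
        step = AGENTS[current](msg)
        if isinstance(step, Final):
            return step.answer
        current, msg = step.to, step.message  # follow the agent's own handoff choice
    raise RuntimeError(f"no final answer after {max_hops} hops")
```

`run_swarm("triage", "book a flight")` passes through all three agents and returns `"flight search: book a flight"`.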

Communication mechanics

Across topologies, agents communicate by passing messages — structured payloads that include at minimum a sender, a recipient (explicit or implicit), and a content payload. The structure of the message is the schema of the multi-agent system.

Three common message types:

  • Task assignment — supervisor to worker: “Do X. Return Y.”
  • Result return — worker to supervisor: “Done. Here’s Y.”
  • Query — peer to peer: “Hey, can you help with X?”

Some frameworks expose these as primitives (Google ADK’s transfer calls, AutoGen’s group-chat messages, LangChain’s supervisor pattern). Others let you build them on top of a function-calling primitive: each agent has a handoff_to_X tool that, when called, dispatches the message to agent X.

The schema matters. Free-form prose handoffs (“hey writer, take this and polish it”) lose structure across the boundary. Typed handoffs ({ task: 'edit', target: writer, content: draft, constraints: { word_count: 500 } }) preserve it.
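
One way to enforce a typed handoff, sketched with the standard library only: the agent emits JSON shaped like the example in the previous paragraph, and the orchestrator validates it before dispatch. The field names follow that example; KNOWN_AGENTS, REQUIRED, and HandoffMessage are hypothetical. Raising on bad input, rather than returning nothing, is what prevents a malformed handoff from being dropped silently.

```python
import json
from dataclasses import dataclass, field

KNOWN_AGENTS = {"researcher", "writer", "editor"}  # hypothetical roster
REQUIRED = {"task", "target", "content"}

@dataclass
class HandoffMessage:
    task: str
    target: str
    content: str
    constraints: dict = field(default_factory=dict)

def parse_handoff(raw: str) -> HandoffMessage:
    """Parse a model-emitted handoff; raise instead of silently dropping it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed handoff JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if data["target"] not in KNOWN_AGENTS:
        raise ValueError(f"unknown target agent: {data['target']!r}")
    return HandoffMessage(
        task=data["task"],
        target=data["target"],
        content=data["content"],
        constraints=data.get("constraints", {}),
    )

msg = parse_handoff(
    '{"task": "edit", "target": "writer", "content": "first draft",'
    ' "constraints": {"word_count": 500}}'
)
print(msg.target, msg.constraints["word_count"])  # writer 500
```

In a function-calling setup, the same validation would run on the arguments of a handoff tool call before the orchestrator dispatches to the target agent.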

Shared vs isolated state

Two design choices interact:

  • Shared state. All agents see all messages. Easy to implement (a single conversation log). Risk: context bloat; by message 50, every agent may be rereading a transcript that no longer fits in its context window.
  • Isolated state. Each agent sees only the messages addressed to it (plus a system prompt). Bounded context per agent. Risk: no agent has the full picture, so coordination requires explicit information sharing.

The pragmatic answer is usually a hybrid: a small shared scratchpad (current goal, key facts) plus per-agent isolated histories. The supervisor manages what goes on the scratchpad.
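
A sketch of that hybrid layout, assuming messages are plain strings: a shared scratchpad that only the supervisor may write, plus a per-agent history that grows only when a message is addressed to that agent. OrchestratorState and its methods are hypothetical names, not any framework's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class OrchestratorState:
    """Hybrid state: a small shared scratchpad plus per-agent isolated histories."""
    scratchpad: dict = field(default_factory=dict)                      # shared: goal, key facts
    histories: dict = field(default_factory=lambda: defaultdict(list))  # isolated: one list per agent

    def send(self, sender: str, recipient: str, content: str) -> None:
        # Only the recipient's history grows; other agents never see this message.
        self.histories[recipient].append({"from": sender, "content": content})

    def set_fact(self, author: str, key: str, value: str) -> None:
        # Only the supervisor writes the scratchpad, which also serialises writes.
        if author != "supervisor":
            raise PermissionError(f"{author} cannot write the shared scratchpad")
        self.scratchpad[key] = value

    def context_for(self, agent: str) -> dict:
        """What an agent sees on its next turn: the scratchpad plus its own history."""
        return {"scratchpad": dict(self.scratchpad), "history": list(self.histories[agent])}
```

Each agent's context is then bounded by the scratchpad size plus its own traffic, rather than by the whole transcript.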

Termination

When does the system stop? Three common approaches:

  • Terminal agent. One agent in the topology is designated as the terminator. When it speaks, the system returns its output to the user. In sequential pipelines this is the last stage; in supervisor systems it’s the supervisor producing a final answer.
  • Termination predicate. A simple condition checked after each message: “Did the supervisor say ‘DONE’?” or “Has the goal been verified?” Often implemented as a special tool the terminator calls.
  • Budget exhaustion. Step / message / cost limit. The orchestrator stops the system and returns whatever the latest output is.

Most production systems use a combination: a terminal agent or predicate, plus a budget as a backstop. Letting a swarm run without a budget invites a runaway loop with a four-figure bill.
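
Combining a termination predicate with a budget backstop might look like the sketch below. The "DONE" prefix convention, the Budget fields, and next_message (a hypothetical hook returning a message and the cost of producing it) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Budget:
    max_messages: int
    max_cost_usd: float
    messages: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        self.messages += 1
        self.cost_usd += cost_usd

    def exhausted(self) -> bool:
        return self.messages >= self.max_messages or self.cost_usd >= self.max_cost_usd

def run_until_done(next_message: Callable[[int], tuple[str, float]], budget: Budget) -> tuple[str, str]:
    """Stop on the termination predicate, or on budget exhaustion as a backstop.

    next_message(turn) is a hypothetical hook returning (message, cost_of_that_call).
    """
    latest = ""
    turn = 0
    while not budget.exhausted():
        latest, cost = next_message(turn)
        budget.charge(cost)
        turn += 1
        if latest.startswith("DONE"):   # termination predicate
            return "done", latest
    return "budget_exhausted", latest   # return whatever the latest output is
```

Returning the stop reason alongside the output lets the caller distinguish a verified finish from a budget cut-off.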

Failure modes

Multi-agent designs introduce failure modes that single-agent designs don’t have:

  • Handoff loop. A sends to B, B sends back to A, A sends back to B. Each pass adds nothing. Mitigation: track handoff history; force a third party (or a terminator) in after N round trips.
  • Lost message. Agent A intends a handoff but emits malformed JSON; the orchestrator fails to parse; the message is dropped silently. Mitigation: strict schema validation, explicit error handling on parse failure.
  • Conflicting state. Two agents update shared state concurrently with incompatible changes. Mitigation: serialise state writes through the supervisor, or designate one agent as the canonical state owner.
  • Specialisation collapse. Agents start sounding the same — the researcher writes prose, the writer adds citations. The system has the cost of multi-agent without the benefit of specialisation. Mitigation: stronger role prompts, distinct tool surfaces, periodic prompt audits.
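
The handoff-loop mitigation needs little more than a counter per agent pair. In this hypothetical HandoffTracker, A→B and B→A count toward the same pair, and record returns True once the pair exceeds the allowed round trips; the orchestrator would then route to a third agent or the terminator instead of dispatching the handoff.

```python
from collections import Counter

class HandoffTracker:
    """Detect ping-pong handoffs (A -> B -> A -> B ...) between the same pair of agents."""

    def __init__(self, max_round_trips: int = 2):
        self.max_round_trips = max_round_trips
        self.pair_counts: Counter = Counter()

    def record(self, sender: str, recipient: str) -> bool:
        """Record one handoff; return True if this pair has now exceeded the limit."""
        pair = frozenset((sender, recipient))  # direction doesn't matter for a loop
        self.pair_counts[pair] += 1
        # One round trip is two handoffs within the pair.
        return self.pair_counts[pair] > 2 * self.max_round_trips

tracker = HandoffTracker(max_round_trips=2)
hops = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "A"), ("A", "B")]
flags = [tracker.record(s, r) for s, r in hops]
# flags == [False, False, False, False, True]: the fifth hop triggers the escape
```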

Variants

  • Group chat. A single shared transcript with multiple named participants and an outer controller that picks who speaks next. Used in AutoGen-style frameworks. Easy to set up, hard to make efficient at scale.
  • Manager-worker with parallel workers. Supervisor dispatches independent sub-tasks to worker agents in parallel. Latency wins are real when the sub-tasks are truly independent.
  • Critic + actor pair. A two-agent system: the actor produces, the critic reviews. Equivalent to the reflection pattern with explicit role separation.
  • Tool-as-agent. A “tool” is itself an agent — e.g., the main agent has a search tool that’s actually a small ReAct loop with its own browser-driving capability. The boundary between “tool” and “sub-agent” blurs; pick the framing that fits.
  • Hand-off chains. A user-facing agent that hands off to a specialist (booking agent), which hands off to a more specialised agent (flight-search agent). Common in customer-support flows. Each hand-off has its own prompt and authority scope.
  • Voting / ensemble. N agents each independently attempt the task; an aggregator picks the best answer (by vote, by score, or by combining the candidates). Closer to an ML-style ensemble than to traditional multi-agent; expensive, but raises the ceiling on hard tasks.
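
The parallel-worker and voting variants combine naturally: fan the same task out to N agents concurrently, then aggregate. A standard-library sketch, assuming each agent is a blocking function (as a model API call typically is, which is why threads help); majority vote is used as the aggregator, one of the options listed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def ensemble(task: str, agents: list[Callable[[str], str]]) -> str:
    """Run N agents independently in parallel, then return the majority answer."""
    if not agents:
        raise ValueError("need at least one agent")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # map() preserves input order, so results line up with `agents`.
        answers = list(pool.map(lambda agent: agent(task), agents))
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

With I/O-bound agents, wall-clock latency is roughly that of the slowest agent rather than the sum, while cost still scales with N.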

When 'multi-agent' is actually a single agent with personas

A practical observation: a lot of “multi-agent systems” in production are actually a single underlying model that gets re-prompted with different system messages for different roles. There is no architectural multi-agent runtime — just a single API client switching prompts. This is fine, and often correct. The pattern is multi-agent in design (role separation, schema’d handoffs, bounded per-role context); the implementation is single-model. Use the lighter implementation unless you have a specific reason (different models per role, parallelism across roles, isolation for security) to make it heavier.
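
The lighter implementation can be as small as a function that fixes a system prompt around one shared client. `complete` is a hypothetical signature, (system prompt, user message) to reply text, standing in for whichever chat API you use; `fake_complete` is an offline stub so the sketch runs, and the role prompts are illustrative.

```python
from typing import Callable

# Hypothetical client signature: (system_prompt, user_message) -> reply text.
Complete = Callable[[str, str], str]

ROLE_PROMPTS = {
    "researcher": "You are a researcher. Return bullet-point notes with sources.",
    "writer": "You are a writer. Turn notes into clear prose. Do not add facts.",
    "critic": "You are a critic. List concrete problems with the draft.",
}

def as_agent(complete: Complete, role: str) -> Callable[[str], str]:
    """Wrap one shared model client as a role-specific 'agent' by fixing its system prompt."""
    system_prompt = ROLE_PROMPTS[role]
    return lambda message: complete(system_prompt, message)

def fake_complete(system_prompt: str, message: str) -> str:
    # Offline stub that tags the reply with the role, so the sketch runs without an API key.
    role = system_prompt.split(".")[0].removeprefix("You are a ")
    return f"[{role}] {message}"

researcher = as_agent(fake_complete, "researcher")
writer = as_agent(fake_complete, "writer")
print(writer(researcher("tidal power")))  # [writer] [researcher] tidal power
```

Role separation and bounded per-role context survive; the only thing missing is a separate runtime, which can be added later if a role needs its own model or isolation.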

Example systems

  • MACRS — a literal multi-agent conversational recommender. Different agents handle act planning, recommendation generation, and reflection-on-user-feedback; the orchestration is supervisor-worker with explicit role distinction.
  • ChainBuddy — multi-agent pipeline generator: a requirement-gathering agent collects user intent, then specialist agents generate each component of the pipeline; a final assembly agent composes the result.
  • OpenClaw — sub-agents per modality (calendar, mail, search, reminders) behind a single conversational surface. The user-facing agent routes intent to the right specialist; specialists do not see each other directly.
  • Design exercise: Multi-Agent Medical Diagnosis System — triage agent, diagnosis agent, second-opinion agent, uncertainty-handling agent. The case study walks the full multi-agent design for a safety-critical setting.

Trade-offs

Multi-agent (specialised roles)
  • Strengths: better task quality on complex tasks; bounded context per role; parallelism possible; clear audit boundaries.
  • Costs: higher orchestration complexity; more failure modes; handoff schemas to design and maintain; per-task cost roughly N times that of a single agent, for N agents.
Single agent (one prompt, all tools)
  • Strengths: simplest to build; single trace to debug; lowest latency.
  • Costs: tool-selection accuracy tends to degrade past roughly 20 tools; mega-prompts blur specialisation; context-window pressure scales with total complexity.

Other axes:

  • Number of agents. Three is often the sweet spot — enough specialisation to matter, not enough to make coordination painful. Past five, scrutinise hard. Past ten, you have a distributed system, not an agent.
  • Same model vs different models. Models can be assigned per role for cost (a cheap model on most agents, a frontier model on the planner) or for capability (a vision model for visual tasks, a text model for reasoning). Mixing models adds operational complexity (different APIs, different rate limits, different failure modes).
  • Synchronous vs asynchronous orchestration. Synchronous (each agent blocks the next) is easier to reason about. Asynchronous (agents emit and consume from a shared queue) scales better for high-throughput systems. Most product agents are synchronous; some research-style systems are asynchronous.
  • Framework choice. Google ADK, LangChain (LangGraph), AutoGen, CrewAI, and many others all offer multi-agent primitives. Each makes some topologies easy and others awkward. Match framework to your dominant topology — sequential pipelines fit some, group-chat-style fits others, supervisor-worker fits a third.
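
For the asynchronous option, a shared queue lets a downstream agent start on the first item before the upstream agent has finished. A minimal asyncio sketch with two hypothetical stub agents; the None sentinel and the bounded queue size are illustrative choices, not requirements.

```python
import asyncio

async def producer_agent(out_q: asyncio.Queue, items: list[str]) -> None:
    """Hypothetical upstream agent: emits work items, then a sentinel."""
    for item in items:
        await out_q.put(item)
    await out_q.put(None)  # sentinel: no more work

async def consumer_agent(in_q: asyncio.Queue, results: list[str]) -> None:
    """Hypothetical downstream agent: processes items as they arrive."""
    while True:
        item = await in_q.get()
        if item is None:
            break
        results.append(f"processed {item}")

async def main(items: list[str]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)  # bounded, so a fast producer waits
    results: list[str] = []
    await asyncio.gather(producer_agent(queue, items), consumer_agent(queue, results))
    return results

print(asyncio.run(main(["a", "b", "c"])))
# prints: ['processed a', 'processed b', 'processed c']
```

The synchronous equivalent would call the consumer once per item inside the producer's loop; the queue version decouples the two so each can run at its own pace.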