Human-in-the-Loop

When and how to ask for human confirmation, feedback, or override. Designing the handoff so it's neither annoying nor unsafe.

What it is

Human-in-the-loop (HITL) is the pattern where an agent’s autonomous run is interrupted, at chosen points, to ask a human for input — confirmation, correction, choice, or veto — before continuing. The agent does not stop being autonomous; it stops being unilateral. The human is a callable participant in the loop, not a final reviewer at the end.

The motivation is the gap between “the agent can do this” and “the agent should do this without supervision”. Many actions are reversible enough or low-stakes enough that asking the human first is overkill. Other actions are irreversible (send the email, charge the card, delete the row) or high-stakes enough (medical advice, financial transfer, legal commitment) that pure autonomy is wrong even if the model would be right 99% of the time. HITL is the pattern that decides which is which and routes each action accordingly.

Done well, HITL is invisible — the human is asked exactly the questions that matter, no more. Done badly, HITL is either constant interruption (annoying to the point of useless) or rare interruption on the wrong things (the agent did everything dangerous unilaterally and asked for confirmation on the trivial step at the end).

When to use it

HITL is essential when:

  • The action is irreversible. Sending external communications, committing financial transactions, deleting data, executing trades. The cost of being wrong is non-recoverable.
  • The action is high-stakes. Medical recommendations, legal advice, hiring decisions, security-critical operations. Even with high model accuracy, the tail risk justifies a human check.
  • Trust is being established. New agent in production, new user, new tool — early uses benefit from confirmation gates that get relaxed as confidence builds.
  • The model is uncertain. The agent itself flagged that it’s unsure. Asking the human is cheaper than guessing wrong.
  • Regulation or policy requires it. Some domains (medical, financial, legal) have explicit human-oversight requirements that the system must satisfy by construction.

HITL is overkill when:

  • The action is fully reversible and low-stakes. Reading data, jotting a note to working memory, calling a free-tier API with no side effects.
  • Latency demands prevent it. A real-time chat surface can’t pause to ask “are you sure?” before every response. Async approval workflows fit some products, but not all.
  • The human approver has no useful information. If the human can only say “yes” without context, the gate adds friction without adding safety — there’s no judgement happening. Better to remove the gate or fix the framing.

How it works

The three handoff types

Not every human-in-the-loop interaction is the same. Distinguish three patterns:

1. Confirmation. The agent has a proposed action; the human says yes or no. The exchange is a single round trip (agent → human → agent) anchored on the agent’s proposal, and the human’s decision space is bounded (approve / reject / approve-with-edit).

Example: “I’m about to send this email to your team. OK to proceed?”

2. Choice. The agent presents N candidate actions or answers; the human picks one. The agent contributed the candidates; the human contributed the selection.

Example: “I found three flights that match. Which would you like to book?”

3. Open input. The agent stops and asks for free-form information it doesn’t have. The human provides the input; the agent resumes with it.

Example: “I need a budget for this trip. What’s the maximum you’d like to spend?”

Each shape costs the human a different amount of attention. Confirmation is cheapest (one bit of input). Choice is medium (a few bits). Open input is expensive (cognitive load, typing). A well-designed HITL system minimises the bits requested — open input only when the agent genuinely couldn’t infer it.
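Below is a minimal Python sketch of the three shapes as request types. It includes a hypothetical `build_handoff` helper that asks for as little as it can. With one candidate it asks for confirmation, with several it offers a choice, and it falls back to open input only when it has nothing to propose. The class and field names are illustrative and not taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical request types for the three handoff shapes.
@dataclass
class Confirmation:
    proposed_action: str
    # Bounded decision space: approve / reject / approve-with-edit.
    responses: tuple = ("approve", "reject", "edit")

@dataclass
class Choice:
    question: str
    candidates: List[str] = field(default_factory=list)

@dataclass
class OpenInput:
    question: str

HandoffRequest = Union[Confirmation, Choice, OpenInput]

def build_handoff(question: str, candidates: List[str]) -> HandoffRequest:
    """Request the fewest bits the situation allows."""
    if len(candidates) == 1:
        # One candidate: the human only confirms it.
        return Confirmation(proposed_action=candidates[0])
    if candidates:
        # Several candidates: the human picks one.
        return Choice(question=question, candidates=list(candidates))
    # Nothing to propose: ask for free-form input.
    return OpenInput(question=question)
```

Given three flight options, the helper returns a `Choice`. Given an empty list, it returns an `OpenInput`.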

The handoff lifecycle

A clean handoff has four phases:

  1. Pause. The agent recognises a HITL point in its loop and stops emitting actions. State (current task, plan, intermediate findings) is checkpointed.
  2. Ask. The agent emits a structured prompt to the human — context, proposed action (if any), the specific question, optional candidates.
  3. Wait. The agent does nothing until the human responds. Depending on the product, this might take seconds (chat surface), minutes (a notification on a phone), or hours to days (async approval workflow).
  4. Resume. The human’s response is parsed; the agent’s state is restored; the loop continues with the response incorporated.

The pause/resume mechanics are where the engineering goes. In a synchronous chat surface, pause is trivial — the next assistant turn just happens to be a question. In an async workflow (the agent paused 6 hours ago waiting for approval), pause means persisting full agent state to durable storage and rehydrating on resume. The longer the pause, the heavier the persistence.
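The four phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions:

- The hypothetical `AgentState` fields mirror the current task, plan, and intermediate findings described above.
- A plain dict stands in for durable storage.

State is serialised to JSON, so the same shape would carry over if the dict were replaced by a database keyed by token.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class AgentState:
    # Hypothetical checkpoint contents: task, plan, intermediate findings.
    task: str
    plan: List[str]
    findings: List[str] = field(default_factory=list)

def pause_and_ask(state: AgentState, question: str, store: Dict[str, str]) -> dict:
    """Pause + Ask: checkpoint the state and return a structured prompt."""
    token = uuid.uuid4().hex
    store[token] = json.dumps(asdict(state))  # dict stands in for durable storage
    return {"token": token, "task": state.task, "question": question}

# Between the two calls is the Wait phase: seconds in chat, hours in an async workflow.

def resume(token: str, human_response: str, store: Dict[str, str]) -> AgentState:
    """Resume: rehydrate the checkpoint and fold in the human's answer."""
    state = AgentState(**json.loads(store.pop(token)))
    state.findings.append(f"human: {human_response}")
    return state
```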

Picking the interruption points

The hardest design question: where does the agent pause? Five common triggers:

  • Action allowlist / blocklist. Certain tool calls are always gated (“send_email” requires confirmation; “search” does not). Static, easy to reason about, mirrored in the safety layer.
  • Risk score. Each proposed action is scored for risk by a (small) evaluator; calls above a threshold are gated. More flexible than allowlists; needs careful calibration to avoid score drift.
  • Confidence threshold. The agent emits a self-reported confidence with each major decision; low-confidence decisions are gated for human input. Useful but rests on the model’s confidence being well-calibrated, which it often isn’t.
  • Critical path checkpoints. Specific milestones in a longer task (“plan finalised”, “draft ready for sending”, “transaction about to commit”) have fixed gates regardless of the specific actions. Most predictable for users.
  • User-driven pause. The user can interject at any time (“stop”, “wait, change X”). The agent must support out-of-band interruption without losing state.

Most production systems combine several — an action allowlist for the hard cases, checkpoints for the major milestones, and user-driven pause as a backstop. Confidence and risk score are tempting but rarely robust enough to be primary triggers.
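Here is a sketch of a combined trigger policy of the kind the paragraph recommends. It uses a static set of gated tools, fixed milestone checkpoints, and user-driven pause as an override. The tool and milestone names are hypothetical. Risk scores and self-reported confidence are deliberately left out, since the text argues they are rarely robust enough to be primary triggers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration: tools that always need a human, and named milestones.
GATED_TOOLS = {"send_email", "charge_card", "delete_row"}
CHECKPOINTS = {"plan_finalised", "draft_ready", "transaction_commit"}

@dataclass
class ProposedAction:
    tool: str
    milestone: Optional[str] = None  # set when this step completes a milestone

def pause_reason(action: ProposedAction, user_interrupted: bool) -> Optional[str]:
    """Return why the agent should pause before this action, or None to proceed."""
    if user_interrupted:
        return "user_interrupt"  # backstop: the user can always stop the run
    if action.tool in GATED_TOOLS:
        return "gated_tool"      # static allowlist for the hard cases
    if action.milestone in CHECKPOINTS:
        return "checkpoint"      # fixed gate at a major milestone
    return None
```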

Pre-action vs post-action handoff

Two different shapes that look similar:

  • Pre-action. Agent says “I plan to do X. Approve?” Human approves; agent executes; result observed. Lowest risk — nothing happens without consent. Higher friction — the human is asked even when X turns out to be fine.
  • Post-action. Agent does X; agent says “I did X. Result: Y. OK?” Human reviews; if rejected, agent attempts undo. Higher risk — the action happened first. Lower friction — most actions succeed silently.

Pick pre-action for irreversible / high-stakes actions; post-action for reversible / low-stakes ones. The line is the cost-of-undo curve: if undo is free (delete a draft, retract a working-memory note), post-action is fine; if undo is impossible (send an email, refund a payment), pre-action is mandatory.
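One way to encode the cost-of-undo rule is sketched below in Python. If the caller supplies an `undo` callback, the action is treated as cheaply reversible and runs post-action. Otherwise it runs pre-action. The `approve` callback stands in for whatever surface actually asks the human. All names are hypothetical.

```python
from typing import Callable, Optional

def run_with_handoff(
    execute: Callable[[], str],
    approve: Callable[[str], bool],
    description: str,
    undo: Optional[Callable[[], None]] = None,
) -> str:
    if undo is None:
        # No undo available: pre-action. Nothing happens without consent.
        if not approve(f"I plan to: {description}. Approve?"):
            return "rejected"
        return execute()
    # Cheap undo: post-action. Act first, ask afterwards, roll back on rejection.
    result = execute()
    if not approve(f"I did: {description}. Result: {result}. OK?"):
        undo()
        return "undone"
    return result
```

Passing an `undo` callback is a declaration by the caller, not a guarantee. Post-action is only safe if that callback genuinely restores the prior state.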

Asynchronous HITL

Many real workflows can’t wait synchronously. The agent kicks off in the morning; needs human approval on the third step; the human is in a meeting; the approval comes 90 minutes later. Async HITL is the pattern:

  • Agent emits an approval request to a queue (Slack message, email, mobile push, ticket in an approvals dashboard).
  • State is checkpointed to durable storage, keyed by an approval-token.
  • Human responds at their convenience via the same surface (button click, reply, action in the dashboard).
  • Agent is rehydrated when the response arrives, sees the human input, and resumes.

The engineering here is similar to long-running workflow systems generally — durable state, idempotent resume, timeout handling (what if the human never replies?). Frameworks differ in how much of this they give you out of the box.
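Here is a minimal sketch of the async bookkeeping. It covers:

- an approval token;
- first-answer-wins responses;
- a resume that fires at most once per token;
- an expiry for the human who never replies.

The in-memory `ApprovalStore` is a hypothetical stand-in for a durable approvals table or a workflow engine. Timestamps are passed in explicitly to keep the logic testable.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ApprovalRequest:
    token: str
    summary: str
    created_at: float
    decision: Optional[str] = None  # "approve", "reject", or "expired"
    resumed: bool = False

class ApprovalStore:
    """Hypothetical in-memory stand-in for durable approval storage."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._requests: Dict[str, ApprovalRequest] = {}

    def request(self, summary: str, now: Optional[float] = None) -> str:
        token = uuid.uuid4().hex
        created = now if now is not None else time.time()
        self._requests[token] = ApprovalRequest(token, summary, created)
        return token

    def respond(self, token: str, decision: str) -> bool:
        """Record the human's answer; duplicate clicks are ignored (first answer wins)."""
        req = self._requests[token]
        if req.decision is not None:
            return False
        req.decision = decision
        return True

    def claim_for_resume(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Hand the decision to the agent exactly once; None means keep waiting."""
        req = self._requests[token]
        now = now if now is not None else time.time()
        if req.resumed:
            return None  # already resumed: idempotent
        if req.decision is None:
            if now - req.created_at < self.timeout_s:
                return None  # still waiting on the human
            req.decision = "expired"  # the human never replied
        req.resumed = True
        return req.decision
```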

Showing your work

When asking for confirmation, show the human what they need to confirm. Three pieces of information are usually load-bearing:

  • What’s about to happen. Concretely. “Send this email to alice@x.com” beats “send the email”.
  • Why. A one-liner from the agent’s reasoning. “Because she asked about the report.”
  • What could go wrong. Failure modes the human should consider. “Note: this email mentions Q4 numbers that haven’t been published externally yet.”

Hiding the third one is a common product mistake: the human is asked to confirm without enough information to refuse intelligently, and the confirmation becomes pro forma. The whole point of HITL is that the human is making a decision. Make sure they can.
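Below is a small sketch of a confirmation prompt that carries all three pieces, using a hypothetical `ConfirmationPrompt` type. When no risks are flagged, the prompt says so explicitly rather than silently dropping the line, so the reviewer can see the absence.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConfirmationPrompt:
    action: str                                      # what's about to happen, concretely
    reason: str                                      # one-liner from the agent's reasoning
    risks: List[str] = field(default_factory=list)   # what could go wrong

    def render(self) -> str:
        lines = [f"About to: {self.action}", f"Why: {self.reason}"]
        if self.risks:
            lines += [f"Note: {risk}" for risk in self.risks]
        else:
            lines.append("Note: no known risks flagged.")
        lines.append("Approve, reject, or edit?")
        return "\n".join(lines)
```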

Variants

  • Confirmation gate (default). Agent pauses before specific actions and asks yes/no. The simplest and most common shape.
  • Approval workflow. Async confirmation routed through a separate approver (a manager, a security team, a designated reviewer). Useful when the user of the agent is not the right approver.
  • Co-pilot / interactive. The human is in the loop continuously, not at gates. The agent suggests; the human accepts, rejects, or edits each step. Common in coding agents and writing agents.
  • Escalation on failure. Fully autonomous by default; pause to a human only when the agent fails repeatedly or hits a confidence floor. Lowest friction; depends on the agent reliably knowing when it’s failing.
  • Override / brake. The human can interject anytime to stop the agent. Always-on safety valve, even in mostly-autonomous designs. The hard part is preserving state across the interruption.
  • Sampled review. The agent runs autonomously; a fraction of its actions are sampled for human review after the fact. Useful for quality monitoring; not a safety mechanism on its own.
  • Active learning loop. The human’s corrections are captured and used to retrain or fine-tune the agent over time. The handoff produces both an immediate decision and a longer-term improvement signal.
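Sampled review needs a sampling rule. A common approach, sketched here as an assumption rather than anything the text prescribes, hashes a stable action ID into a bucket. The action is reviewed if its bucket falls below the sampling rate. The same action always gets the same answer, which keeps reruns and audits consistent. The action has already run by the time it is sampled, so this is monitoring, not a gate.

```python
import hashlib

def sampled_for_review(action_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of actions for after-the-fact review."""
    digest = hashlib.sha256(action_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return bucket < rate * 2**64                # int-vs-float comparison is exact in Python
```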

The 'alert fatigue' failure mode

A specific HITL anti-pattern worth knowing: gating too many actions on confirmation. Users habituate. By week two of “agent asks 30 times a day”, the user is clicking approve without reading. The gate is now safety theatre — present in the system, absent in practice.

The cure is to be aggressive about removing gates that don’t earn their keep. Audit your HITL points monthly: how often is the human asked? How often do they actually reject? A gate with a 99% approval rate and no recent rejections is a gate you can probably remove (or replace with sampled review). Reserve HITL for the actions where the human meaningfully changes the outcome.
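The monthly audit can be sketched as a function over a log of `(gate, approved)` decisions. The thresholds are illustrative defaults, not recommendations from the text:

- at least 100 asks;
- at least 99% approval;
- no rejections in the last 50 asks.

A flagged gate is a candidate for removal or for sampled review, not an automatic deletion.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def removable_gates(decisions: Iterable[Tuple[str, bool]],
                    min_asks: int = 100,
                    approval_threshold: float = 0.99,
                    recent: int = 50) -> List[str]:
    """Flag gates whose human decisions rarely change the outcome.

    decisions: (gate_name, approved) pairs in chronological order.
    """
    history: Dict[str, List[bool]] = defaultdict(list)
    for gate, approved in decisions:
        history[gate].append(approved)
    flagged = []
    for gate, outcomes in history.items():
        if len(outcomes) < min_asks:
            continue  # too little data to judge this gate
        approval_rate = sum(outcomes) / len(outcomes)
        no_recent_rejections = all(outcomes[-recent:])
        if approval_rate >= approval_threshold and no_recent_rejections:
            flagged.append(gate)
    return sorted(flagged)
```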

Example systems

  • OpenClaw — sensitive actions (send mail, create calendar event) go through explicit user confirmation; pure lookups (search calendar, read mail) do not. The action allowlist is the central design choice.
  • MACRS — every recommendation surfaces back to the user before the system commits; the user’s reaction is then fed into reflection. HITL is the dominant interaction shape, not an exception.
  • Coding agents (Claude Code, Cursor, Aider). Confirmation is typically gated on file writes and/or shell commands; reads and searches are unprompted. Many also offer an auto-approve mode for commands the user has marked as safe, so the set of gated actions shrinks as trust builds.
  • Design exercise: Multi-Agent Medical Diagnosis System — explicit human-in-the-loop gates at triage and at every critical recommendation point, because the safety profile of the domain demands it.

Trade-offs

  • HITL on the right gates. Strengths: safety on irreversible / high-stakes actions; the user retains control; trust builds over time; a lower risk burden on the model. Costs: latency overhead per gate; design effort to pick the right gates; risk of alert fatigue if over-applied.
  • Pure autonomy. Strengths: fastest, lowest friction, scales without a human bottleneck. Costs: full risk on every action; mistakes are unilateral; trust is harder to build; unsuitable for irreversible / high-stakes domains without strong external safety mechanisms.

Other axes:

  • Static gates vs adaptive gates. Static gates (allowlist of actions) are predictable but coarse. Adaptive gates (risk-scored, confidence-thresholded) are sharper but harder to calibrate. Most systems start static and refine into adaptive once they have enough operational data.
  • In-loop vs out-of-loop confirmation. In-loop blocks until the human responds. Out-of-loop sends a notification but proceeds, which suits “FYI, I did this” rather than “please approve before I do this”. Don’t confuse them; the safety profile is different.
  • Synchronous vs asynchronous resume. Sync resume is simple but limits agent flexibility. Async resume requires durable state but unlocks long-running workflows. Pick the latter only if your domain genuinely needs it.
  • Default-deny vs default-approve. Default-deny on a confirmation gate is safer (silence is rejection); default-approve is friendlier (silence is acceptance). For irreversible actions, default-deny is the right setting; for low-stakes “are you sure?” prompts, default-approve after a timeout can be OK.
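Here is a sketch of how a gate might resolve silence. It assumes the caller already knows whether the action is irreversible and whether the timeout has elapsed. Irreversible actions default to deny, and low-stakes prompts default to approve after the timeout. The `Decision` enum and `resolve` function are hypothetical names.

```python
from enum import Enum
from typing import Optional

class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"

def resolve(response: Optional[Decision], timed_out: bool,
            irreversible: bool) -> Optional[Decision]:
    """Apply the gate's default when the human stays silent; None means keep waiting."""
    if response is not None:
        return response  # an explicit answer always wins
    if not timed_out:
        return None
    # Default-deny for irreversible actions (silence is rejection);
    # default-approve for low-stakes prompts (silence is acceptance).
    return Decision.REJECT if irreversible else Decision.APPROVE
```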