Loop Control and Exit Conditions

When to stop. Step budgets, success predicates, stagnation detection, escape-hatch handoffs — the under-loved half of every agent.

Implementation Intermediate
11 min read
implementation adk loop-control safety

Goal#

Every agent has two halves: the part that does the work, and the part that decides when the work is done. The first half is glamorous — tools, prompts, models, multimodal magic. The second half is what separates demos from systems. An agent that doesn’t know when to stop will, given enough budget, do something you don’t want.

This writeup is about the second half. By the end you should have: a working library of exit-condition predicates, a stagnation detector that handles the common cases, a step- and time-budget pattern that integrates with ADK’s Runner, and a sense of when to add a human-in-the-loop escape hatch instead of trying to make the agent decide.

The examples use ADK, but the patterns are framework-agnostic. If you’re using LangGraph, AutoGen, or a hand-rolled loop, the same exit predicates apply.

Prerequisites#

Before starting:

Step-by-step#

1. Inventory of what can go wrong#

Before writing exit conditions, name the failures they prevent. In rough order of how often they hit production agents:

  • Step explosion. The agent gets stuck in a tool-call/respond loop and never converges. Token cost and latency both climb.
  • Stagnation. The agent calls the same tool with the same arguments, or oscillates between two states, without making progress.
  • Hallucinated completion. The agent declares victory while the task is incomplete — for example, the search tool returned nothing useful but the agent writes a confident answer.
  • Silent partial failure. The agent completes but skipped a required step (e.g., didn’t actually send the email it claimed to send).
  • Runaway side effects. The agent’s actions are succeeding but at a rate or volume that shouldn’t be allowed (sending too many emails, calling an API beyond rate limits).
  • Wall-clock blow-up. A tool blocks for minutes; the agent’s step count is small but its time-spent is large.

Each failure has a different exit-condition shape. A single max_steps catches step explosion but misses everything else.

2. The three universal knobs#

ADK’s Runner accepts these; if your framework doesn’t, build them.

universal_loop_config.py
from google.adk.runners import RunnerConfig

config = RunnerConfig(
    max_steps=20,         # step budget: hard cap
    max_tokens=200_000,   # token budget: defends against context blow-up
    timeout_seconds=120,  # wall-clock cap
)

These are the floor. Every agent in production has them. Tune the numbers per use case, but never ship without all three.

A few rules:

  • max_steps should be at least 2× the expected step count for the hardest legitimate task. Set it tight enough that a runaway is noticeable, loose enough that hard tasks complete.
  • max_tokens scales with how rich the context is — multimodal agents need more; single-tool text agents need less.
  • timeout_seconds is the one users feel directly. For interactive use, 30 seconds is patient; 5 minutes is rude. For batch jobs, an hour is fine.
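
If your framework has none of these knobs, the three caps can be enforced in a hand-rolled loop. The sketch below is one minimal way to do it, under stated assumptions: `step_fn` is a hypothetical callable that runs a single agent step and returns `(tokens_used, done)`, and `Budget` and `run_with_caps` are illustrative names, not ADK APIs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_steps: int = 20
    max_tokens: int = 200_000
    timeout_seconds: float = 120.0


def run_with_caps(
    step_fn: Callable[[], Tuple[int, bool]],
    budget: Budget,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run step_fn until it reports done or a cap is hit; return the exit reason."""
    start = clock()
    steps = 0
    tokens = 0
    while True:
        # Caps are checked before each step, so max_steps=N allows at most N steps.
        if steps >= budget.max_steps:
            return "max_steps"
        if tokens >= budget.max_tokens:
            return "max_tokens"
        if clock() - start >= budget.timeout_seconds:
            return "timeout"
        used, done = step_fn()  # hypothetical: one agent step
        steps += 1
        tokens += used
        if done:
            return "success"
```

The wall-clock check here only runs between steps, so a single tool call that blocks for minutes still needs its own timeout at the tool level.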

3. Success predicates#

Beyond budget caps, the right way to exit is because the task is done. ADK’s exit_predicate accepts a callable that’s evaluated after each step.

predicates.py
import os
from typing import Callable


def has_answer_action(state: dict) -> bool:
    """Exit when the last action is `answer` (for web agents, Q&A agents)."""
    history = state.get("history") or []
    return bool(history) and history[-1].get("action") == "answer"


def output_matches_schema(schema) -> Callable:
    """Exit when the agent's last output validates against the schema."""
    def predicate(state):
        out = state.get("last_response")
        try:
            schema.model_validate(out)
            return True
        except Exception:
            # Any validation failure means "not done yet".
            return False
    return predicate


def file_written(path: str) -> Callable:
    """Exit when the agent wrote a specific, non-empty file."""
    def predicate(state):
        return os.path.exists(path) and os.path.getsize(path) > 0
    return predicate

A success predicate is part of the task specification, not the model’s job. The model decides what to do; you decide when to stop watching.

Wire one into the runner:

runner = Runner(
    agent=agent,
    config=RunnerConfig(
        max_steps=20,
        exit_predicate=output_matches_schema(MyOutputSchema),
    ),
)
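
The `output_matches_schema` predicate above assumes a schema object with a Pydantic-style `model_validate` method. Where that dependency isn't available, a plain field-and-type check can play the same role. The following is a dependency-free sketch; `has_required_fields` is a hypothetical helper, and it assumes the agent's last output is stored as a dict under `last_response`, as in the predicates above.

```python
from typing import Any, Callable, Dict


def has_required_fields(required: Dict[str, type]) -> Callable[[dict], bool]:
    """Build a predicate that passes when state['last_response'] is a dict
    containing every required key with a value of the expected type."""
    def predicate(state: dict) -> bool:
        out: Any = state.get("last_response")
        if not isinstance(out, dict):
            return False
        return all(
            key in out and isinstance(out[key], expected)
            for key, expected in required.items()
        )
    return predicate


# Example: the task counts as done once the output has a title and a score.
is_complete = has_required_fields({"title": str, "score": float})
```

Like the schema version, this checks the shape of the output, not whether the content is correct; pair it with a side-effect check when the task has one.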

4. Stagnation detection#

Stagnation is the most common silent failure. The agent isn’t crashing or timing out — it’s just not progressing. Detect by comparing recent state to older state and flagging when they look identical.

stagnation.py
from dataclasses import dataclass


@dataclass
class StagnationCheck:
    """Detects when the agent has stalled.

    Looks at the last `window` action signatures and fires when they
    are all identical.
    """
    window: int = 4

    def __call__(self, state: dict) -> bool:
        history = state.get("history", [])
        if len(history) < self.window:
            return False
        sigs = [self._signature(h) for h in history[-self.window:]]
        return len(set(sigs)) == 1

    @staticmethod
    def _signature(step: dict) -> str:
        """Reduce a step to a comparable string: the tool call plus its args."""
        action = step.get("action", "")
        args = step.get("args", {})
        return f"{action}({sorted(args.items())})"

A subtler stagnation pattern is oscillation — the agent toggles between two states (e.g., clicks A then clicks B then clicks A again). Detect with a longer window and looser equality.

def oscillation(state, window=6, oscillation_set=2) -> bool:
    history = state.get("history", [])
    if len(history) < window:
        return False
    sigs = [StagnationCheck._signature(h) for h in history[-window:]]
    # Fires when the window contains at most `oscillation_set` distinct signatures.
    return len(set(sigs)) <= oscillation_set

When the detector fires, the agent is probably stuck. The right response depends on the system: abort and surface the failure, force a different action via a hint, or hand off to a human.
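
Two refinements are worth considering. The `oscillation` check above fires on any window with at most two distinct signatures, which includes A A A B B B, a pattern that may be legitimate progress. And `sorted(args.items())` orders only top-level keys, so nested argument dicts with a different key order produce different signatures. The sketch below addresses both; `canonical_signature` and `is_alternating` are illustrative names, and the step shape (`action`, `args`) is assumed to match the history entries used above.

```python
import json
from typing import List


def canonical_signature(step: dict) -> str:
    """Serialize action + args with sorted keys so nested dicts compare equal
    regardless of insertion order. Non-JSON values fall back to str()."""
    payload = {"action": step.get("action", ""), "args": step.get("args", {})}
    return json.dumps(payload, sort_keys=True, default=str)


def is_alternating(history: List[dict], window: int = 6) -> bool:
    """True when the last `window` steps strictly alternate between exactly
    two distinct signatures (A, B, A, B, ...)."""
    if window < 4 or len(history) < window:
        return False
    sigs = [canonical_signature(h) for h in history[-window:]]
    if len(set(sigs)) != 2:
        return False
    # Strict alternation: every step repeats the one two positions earlier.
    return all(sigs[i] == sigs[i - 2] for i in range(2, window))
```

The strict check trades recall for precision: it misses longer cycles (A, B, C, A, B, C), so it complements the looser detector but doesn't replace it.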

5. Combining predicates#

In practice you want several exit conditions checked together after every step. The first one to fire wins.

combine.py
from typing import Callable


def any_of(*predicates: Callable) -> Callable:
    """Fires as soon as any predicate fires (checked in the order given)."""
    def combined(state):
        return any(p(state) for p in predicates)
    return combined


def all_of(*predicates: Callable) -> Callable:
    """Fires only when every predicate fires."""
    def combined(state):
        return all(p(state) for p in predicates)
    return combined


# Use:
runner = Runner(
    agent=agent,
    config=RunnerConfig(
        max_steps=30,
        exit_predicate=any_of(
            has_answer_action,
            StagnationCheck(window=4),
            output_matches_schema(MyOutputSchema),
        ),
    ),
)

For richer logic, the predicate can return a tagged value — success, stagnation, budget — so the post-loop code knows why the loop ended. The downstream UX depends on whether the agent stopped because it finished, gave up, or ran out of time.

tagged_exit.py
class ExitReason:
    SUCCESS = "success"
    STAGNATION = "stagnation"
    BUDGET = "budget"
    USER_INTERVENTION = "user_intervention"


def tagged_predicate(state):
    # .get avoids a KeyError on the first step, before any history exists.
    history = state.get("history") or []
    if history and history[-1].get("action") == "answer":
        return ExitReason.SUCCESS
    if StagnationCheck(window=4)(state):
        return ExitReason.STAGNATION
    return None  # keep going
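
The tagged idea generalizes to any list of checks. A minimal sketch, assuming each check is paired with a reason string: `first_reason` is a hypothetical combinator that returns the tag of the first predicate to fire, and the two predicates below are simplified stand-ins for `has_answer_action` and `StagnationCheck` so the block runs on its own.

```python
from typing import Callable, Optional, Sequence, Tuple

TaggedPredicate = Tuple[str, Callable[[dict], bool]]


def first_reason(checks: Sequence[TaggedPredicate]) -> Callable[[dict], Optional[str]]:
    """Return a function that reports the tag of the first check that fires, or None."""
    def evaluate(state: dict) -> Optional[str]:
        for reason, predicate in checks:
            if predicate(state):
                return reason
        return None
    return evaluate


def last_action_is_answer(state: dict) -> bool:
    history = state.get("history") or []
    return bool(history) and history[-1].get("action") == "answer"


def repeated_last_four(state: dict) -> bool:
    history = state.get("history") or []
    if len(history) < 4:
        return False
    recent = [(h.get("action"), repr(sorted(h.get("args", {}).items())))
              for h in history[-4:]]
    return len(set(recent)) == 1


exit_reason = first_reason([
    ("success", last_action_is_answer),  # listed first, so success outranks stagnation
    ("stagnation", repeated_last_four),
])
```

Order is policy here: listing success first means a run that ends on an answer is never reported as stuck. Whether a given runner accepts a non-boolean return from its exit predicate depends on the framework, so check before wiring this in directly.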

6. The escape hatch — handing off to a human#

When the agent can’t make progress and the task is high-value, the best exit isn’t a failure; it’s a handoff.

handoff.py
class HumanHandoff(Exception):
    def __init__(self, reason: str, state: dict, suggested_action: str | None = None):
        super().__init__(reason)  # so str(exc) and tracebacks show the reason
        self.reason = reason
        self.state = state
        self.suggested_action = suggested_action


def maybe_handoff(state) -> bool:
    """Raise HumanHandoff when the agent looks stuck; otherwise return False."""
    if StagnationCheck(window=4)(state):
        raise HumanHandoff(
            reason="agent appears stuck",
            state=state,
            suggested_action=(
                "review the last few actions; either provide a hint or abort"
            ),
        )
    return False

The handoff isn’t an exit predicate in the same sense — it’s an interrupt. The agent doesn’t decide to hand off; the loop infrastructure does. This separation matters: the agent doesn’t get to claim “I tried my best” while doing the wrong thing.

Handoff design choices:

  • Synchronous (interactive) — the loop blocks waiting for a human response. Right for assistant-shaped agents where a user is present.
  • Asynchronous (queue) — the agent posts a “needs review” record and exits. A human picks it up later. Right for batch agents.
  • Tiered — first hand off to the user, then escalate to support, then escalate to engineering. Each tier has different latency expectations.
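
For the asynchronous case, the record that lands in the queue is the whole interface between agent and reviewer, so it should carry everything a human needs. The sketch below is one possible shape, not a prescribed one: `HandoffRecord` and `post_for_review` are hypothetical names, and the in-process `queue.Queue` stands in for whatever durable queue or ticketing system you actually use.

```python
import json
import queue
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HandoffRecord:
    task: str
    reason: str
    history: List[dict]
    suggested_action: Optional[str] = None
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # default=str keeps non-JSON values (e.g. datetimes) from breaking serialization.
        return json.dumps(asdict(self), default=str)


review_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for a durable queue


def post_for_review(task: str, reason: str, history: List[dict],
                    suggested_action: Optional[str] = None,
                    keep_last: int = 10) -> HandoffRecord:
    """Enqueue a needs-review record carrying the tail of the history, then return it."""
    tail = history[-keep_last:] if keep_last > 0 else []
    record = HandoffRecord(task, reason, tail, suggested_action)
    review_queue.put(record.to_json())
    return record
```

Truncating the history to the last few steps keeps the record readable; if reviewers routinely need more context, store the full trace elsewhere and link to it from the record.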

7. Budget exhaustion as a first-class failure mode#

When the agent runs out of budget, the failure record matters. Don’t just return None.

exhaustion.py
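import time
from dataclasses import dataclass
from typing import Any

KNOWN_REASONS = {"success", "budget", "stagnation", "error", "handoff"}


@dataclass
class AgentResult:
    reason: str  # success | budget | stagnation | timeout | error | handoff
    output: Any
    steps_used: int
    tokens_used: int
    seconds_elapsed: float
    history: list


def run_with_budget(runner, query, budget):
    # `budget` is not read here; the caps themselves live in the RunnerConfig.
    start = time.monotonic()
    try:
        result = runner.run(query)
    except TimeoutError:
        # The runner raised before returning a result, so no output or counts
        # are available; the zeros below are placeholders, not measurements.
        return AgentResult(
            reason="timeout",
            output=None,
            steps_used=0,
            tokens_used=0,
            seconds_elapsed=time.monotonic() - start,
            history=[],
        )
    # Pass recognized exit reasons through so "stagnation" is not reported
    # as "budget"; anything unrecognized is treated as a budget exit.
    reason = result.exit_reason if result.exit_reason in KNOWN_REASONS else "budget"
    return AgentResult(
        reason=reason,
        output=result.final_response,
        steps_used=len(result.steps),
        tokens_used=result.token_count,
        seconds_elapsed=time.monotonic() - start,
        history=result.steps,
    )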

The structured result lets the caller distinguish “completed” from “ran out of time” from “got stuck” — three different downstream behaviours, three different log lines, three different alerts.

8. Observability for exits#

You can’t tune exit conditions you can’t observe. Log per run:

  • Final exit reason (the tagged value)
  • Steps used vs budget
  • Tokens used vs budget
  • Time elapsed vs budget
  • Stagnation detector triggers and the action signatures that triggered them
  • Handoff occurrences and the suggested action that was attached

A weekly aggregation tells you whether max_steps is too tight (lots of budget exits, missing real success) or too loose (most runs use a fraction of the budget but the few runaway runs are expensive). Tune from data, not from intuition.
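
One way to get there is a single structured log line per run plus a small aggregation over those lines. This is a sketch under assumptions: `log_exit` and `summarize` are hypothetical helpers, the field names are illustrative, and step utilization is used as the example metric (the same pattern applies to tokens and time).

```python
import json
import logging
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List

logger = logging.getLogger("agent.exits")


def log_exit(reason: str, steps_used: int, max_steps: int,
             tokens_used: int, max_tokens: int,
             seconds_elapsed: float, timeout_seconds: float) -> dict:
    """Emit one JSON log line per run and return the record."""
    record = {
        "reason": reason,
        "steps_used": steps_used, "max_steps": max_steps,
        "tokens_used": tokens_used, "max_tokens": max_tokens,
        "seconds_elapsed": round(seconds_elapsed, 3),
        "timeout_seconds": timeout_seconds,
    }
    logger.info(json.dumps(record))
    return record


def summarize(records: Iterable[dict]) -> Dict[str, object]:
    """Aggregate exit reasons and step-budget utilization across runs.
    Assumes every record has max_steps > 0."""
    rows: List[dict] = list(records)
    if not rows:
        return {"runs": 0, "reasons": {}, "budget_exit_rate": None,
                "median_step_utilization": None}
    reasons = Counter(r["reason"] for r in rows)
    utilization = [r["steps_used"] / r["max_steps"] for r in rows]
    return {
        "runs": len(rows),
        "reasons": dict(reasons),
        "budget_exit_rate": reasons["budget"] / len(rows),
        "median_step_utilization": median(utilization),
    }
```

A high budget exit rate suggests `max_steps` is too tight; a low median utilization alongside a few expensive outliers suggests it is too loose.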

Code structure#

A reusable shape for an agent project’s loop-control module:

my-agent/
├── agent.py
├── loop/
│ ├── __init__.py
│ ├── config.py (default RunnerConfig values)
│ ├── predicates.py (has_answer_action, output_matches_schema, ...)
│ ├── stagnation.py (StagnationCheck, oscillation)
│ ├── handoff.py (HumanHandoff exception + runner)
│ ├── observability.py (structured logging hooks)
│ └── result.py (AgentResult dataclass)
└── main.py (assembles everything)

Top-level usage:

main.py
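import time

from google.adk.runners import Runner, RunnerConfig

from agent import build_agent
from loop.handoff import HumanHandoff
from loop.predicates import has_answer_action
from loop.result import AgentResult
from loop.stagnation import StagnationCheck


def run(query: str) -> AgentResult:
    agent = build_agent()
    stagnated = StagnationCheck(window=4)
    runner = Runner(
        agent=agent,
        config=RunnerConfig(
            max_steps=20,
            max_tokens=200_000,
            timeout_seconds=120,
            exit_predicate=lambda s: has_answer_action(s) or stagnated(s),
        ),
    )
    start = time.monotonic()
    try:
        result = runner.run(query)
    except HumanHandoff as h:
        # The exception carries the state, not usage counts; tokens_used is a placeholder.
        history = h.state.get("history", [])
        return AgentResult(
            reason="handoff", output=None, steps_used=len(history), tokens_used=0,
            seconds_elapsed=time.monotonic() - start, history=history,
        )
    # Wrap the runner's result so the caller always receives an AgentResult.
    return AgentResult(
        reason=result.exit_reason,
        output=result.final_response,
        steps_used=len(result.steps),
        tokens_used=result.token_count,
        seconds_elapsed=time.monotonic() - start,
        history=result.steps,
    )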

This shape keeps the agent code (tools, prompts, model) cleanly separated from the loop code (budgets, predicates, exits). When the agent’s capability changes, you don’t touch the loop; when the loop policy changes (tighter budgets, new exit conditions), you don’t touch the agent.

Loop control and exit conditions#

This section, in a writeup about loop control, is the meta-summary — the rules at a glance.

Budget              Success             Failure
─────────────────   ─────────────────   ─────────────────
max_steps           answer action       stagnation
max_tokens          output schema       oscillation
timeout             file written        tool error rate
rate limit          custom predicate    budget exhausted
                                        user intervention

Run all three columns. Budget caps protect against runaway. Success predicates exit cleanly when the task is done. Failure predicates surface the silent failures.

Common pitfalls#

The recurring failures of poorly-designed loop control:

  • max_steps only. The single most common mistake. Step budget alone misses time blow-ups, token blow-ups, and stagnation. You need all three budget caps plus a stagnation check.
  • Default budgets too generous. “We’ll tighten it later” never happens. Default to safe; loosen specifically when measurements demand it.
  • Predicates that read the agent’s own claim. A predicate that exits when the agent says “I’m done” is the agent grading its own homework. Validate against an observable side effect (file exists, API returned 2xx, schema validates) — not against text the agent generated.
  • No tagged exit reasons. Returning None on failure forces the caller to guess what went wrong. A structured AgentResult with a reason tag is the minimum viable contract.
  • Stagnation detection with window=2. Two identical actions in a row are normal noise. Use window=4 or more.
  • Handoffs that don’t carry state. A human picks up a handoff and has to re-derive what the agent was doing. The handoff record must include task, history, and a suggested next action.
  • Treating budget exhaustion as success. If the agent burned through max_steps, the final answer is suspect. Don’t return it as if the agent had completed normally; tag it.
  • No observability on exits. You can’t tune what you can’t measure. Log every exit with its reason; aggregate weekly; tune from data.
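
To make the point about observable side effects concrete: a file-based predicate that only checks existence will fire immediately if the file was already there before the run started. A stricter sketch compares the file's modification time against the run's start. `file_written_since` is a hypothetical helper, and it assumes the filesystem's modification timestamps are fine-grained enough for your run lengths.

```python
import os
import time
from typing import Callable, Optional


def file_written_since(path: str, started_at: Optional[float] = None) -> Callable[[dict], bool]:
    """Build a predicate that passes when the file exists, is non-empty,
    and was modified at or after started_at (default: when this is called)."""
    cutoff = time.time() if started_at is None else started_at

    def predicate(state: dict) -> bool:
        try:
            info = os.stat(path)
        except OSError:
            # Missing or unreadable file: the side effect has not been observed.
            return False
        return info.st_size > 0 and info.st_mtime >= cutoff

    return predicate
```

Create the predicate just before starting the run so the cutoff reflects the run's start, not module import time.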

Why is loop control under-loved?

Two reasons. First, it isn’t where the capability lives — the agent’s intelligence is in the model, tools, and prompts. Exit conditions feel like plumbing. Second, in development the budgets are rarely hit because you’re driving the agent through clean trajectories on small inputs. The budget-and-stagnation cases only surface at scale, in production, when an edge-case input puts the agent into an unfamiliar regime. The discipline is to write the budgets and exit predicates first, before you fully understand which capabilities the agent will have — because by the time you know, the budgets have already silently saved you a dozen times.
