Agent Development Kit (ADK) Overview
Google's Agent Development Kit. What's in the box — agent primitives, tools, orchestration, evaluation harness, and the workflow it expects.
Goal#
This writeup is the entry-point to the Implementations topic — what Google’s Agent Development Kit (ADK) is, what primitives it gives you, and how to think about it before you write any code. The follow-on writeups build concrete systems with it (Eureka-style reward loop, multimodal web agent); this one is the orientation.
By the end you should be able to: read an ADK example without confusion, decide whether ADK is the right tool for your problem, and start a hello-world agent project without copy-pasting from the docs.
Prerequisites#
Before you start with ADK, internalise these:
- What Is an AI Agent? — the four-component framing (model, tools, memory, instructions). ADK’s primitives map directly onto these.
- The ReAct Loop — ADK’s default orchestration is ReAct-shaped. If you don’t have the loop in your head, ADK’s
Runnerabstraction won’t make sense. - A working Python environment with model access. ADK is a Python SDK (with a TypeScript variant); you need credentials for at least one supported model provider — Google Gemini (the most thoroughly tested with ADK), OpenAI, Anthropic, or an open-weights model via LiteLLM.
That’s it. ADK doesn’t require you to have used LangChain, AutoGen, or any other framework. Coming in clean is often easier than coming in with prior framework habits.
Step-by-step#
This isn’t a code walkthrough (the next writeup is); it’s the conceptual walkthrough of what ADK has you assemble.
1. Define the goal in words#
Before any code: write down what success looks like for the agent. One paragraph. “The agent receives a research question, searches the web, reads up to five sources, synthesises an answer with citations, and stops.” This is the same first step as any agent design — ADK doesn’t change it.
2. Pick the primary pattern#
ADK supports several orchestration patterns out of the box:
- Single agent with tools. One ReAct loop, one tool surface. The default and most common shape.
- Sequential workflow. Pre-defined steps; each step’s output feeds the next. Useful when the pipeline is fixed.
- Multi-agent (supervisor/worker, router, swarm). Multiple agents collaborating; ADK provides primitives for hand-offs and message-passing.
- Loop with custom controller. You write the loop yourself; ADK provides the per-step primitives.
For a first build, pick single-agent. Move to multi-agent only when the problem genuinely demands it.
3. Design the tool surface#
A tool in ADK is a Python function with a typed signature (the parameters become the JSON schema the model sees) and a docstring (the model reads it to know when to call the tool). Decide:
- What can the agent do (write operations)?
- What can the agent see (read operations)?
- What is off-limits (operations not exposed)?
Five well-named tools beat fifteen vague ones. The model’s context spent describing tools is a tax on every call.
4. Wire memory#
Two layers:
- Working memory. ADK’s
Sessioncarries conversation state across turns. For most loops this is enough. - Long-term memory. ADK doesn’t ship a built-in vector store, but it has hooks for one — typically you’d add a
retrieveandremembertool that the agent calls when relevant. For early builds, skip this layer.
5. Set instructions#
ADK takes a system prompt (the role, the rules, the format, the safety constraints) and in-context examples. Write the system prompt the way you’d write a CLAUDE.md: terse, rule-shaped, with the things that aren’t obvious from the tools.
6. Pick the model#
ADK supports Gemini natively, and other providers via adapters. Model choice affects capability, latency, cost — and the prompt-engineering shape, since each frontier model has slightly different conventions for tool-call format and reasoning style.
For development: a smaller, faster model. For production: whatever your evaluation tells you works.
7. Run the agent#
ADK’s Runner is the loop. You give it the agent, the input, and (optionally) constraints — step budget, time budget, callback hooks. The Runner drives the ReAct loop until exit.
8. Add evaluation#
ADK ships an evaluation harness — you write or generate test cases (input + expected end-state), and the harness runs the agent against them and grades the trajectories. This is the part most homegrown agent systems forget; ADK makes it cheap to start.
9. Add observability#
ADK emits traces — every step, every tool call, every token-usage report. Pipe them to your logging/observability of choice. Don’t ship an agent you can’t trace.
10. Iterate#
Re-run the eval. Look at the failures. Adjust prompts, tools, memory, or model. Re-run. This is the only step that takes long; the previous nine are setup.
Code structure#
A minimal ADK agent project, at the directory level:
my-agent/├── pyproject.toml (deps: google-adk, ...)├── agent.py (Agent definition: model, tools, instructions)├── tools/ (Python modules, one tool per file is fine)│ ├── search.py│ ├── read_url.py│ └── synthesise.py├── prompts/│ └── system.md (system prompt — kept as text, not inline)├── eval/│ ├── cases.jsonl (test cases: input + expected end-state)│ └── run_eval.py (runs the eval harness)└── main.py (entry-point: instantiate Runner, dispatch)This layout is convention, not framework requirement — but the separation is. Tools, prompts, and evals each want to live in their own files. An agent.py with 600 lines of inline tool definitions and a string-literal system prompt is the road to an unmaintainable agent. Set the structure on day one.
Loop control and exit conditions#
ADK’s Runner accepts several knobs that map to the loop-control concerns we’ve discussed:
max_steps. Hard cap on the number of ReAct steps. The single most important knob; don’t ship without it.max_tokens. Cap on total tokens consumed across the run. Defends against accidental long-context blow-ups.timeout. Wall-clock cap. Critical for any agent embedded in a request/response path.exit_predicate. A Python callable evaluated after each step; returningTrueexits. Use it for task-specific success criteria.on_stepcallback. Fires after every step. Use it for logging, audit, or stagnation detection (compare the current state to the previous N).
If you only ever set max_steps and timeout, you’ve covered the worst failure modes. The other knobs are quality-of-life.
Common pitfalls#
The pitfalls that hit ADK-first-time users most often:
- Skipping the eval harness. ADK gives it to you for free. Use it from day one. “I’ll add tests later” never happens.
- One giant tool. A
do_anything(action: str, params: dict)tool defeats the purpose — the model can’t tell from its description when to call it. Split into specific tools with specific docstrings. - System prompt as a wall of text. A 2000-word system prompt eats context and confuses the model. Tighten it. The numbered-rules format beats the prose format.
- Wrong model for development. Using a top-tier model for development iteration is slow and expensive. Use a fast model while iterating, evaluate on the production model.
- Ignoring streaming. ADK supports streaming. For user-facing agents, streaming the Thought + Action as they emerge is a huge UX win. Don’t ship a blocking response when streaming is one parameter away.
- No observability. ADK emits traces; if you don’t capture them, you can’t debug. Wire your logging on day one, not after the first production incident.
- Building multi-agent before single-agent works. A single capable agent with a good tool surface usually beats three coordinated agents. Don’t multi-agent until you’ve hit a single-agent ceiling.
Related implementations#