Custom Sub-Agents — Claude Code · Engineering Playbook

What it is

A custom sub-agent is a named worker process you can dispatch from the main conversation. It runs in its own fresh context window, with its own system prompt, its own tool allowlist, and its own model selection. When it completes, it returns a single text payload (typically a summary) back to the parent conversation — and only that payload, not the work-in-progress reasoning, lands in the parent’s context.

Sub-agents are the primary mechanism for bounded, parallelisable, context-isolated work. The built-in agents (Explore, general-purpose) ship with Claude Code; custom sub-agents extend the catalogue with project-specific workers — a migration-runner, a flaky-test-bisector, a release-notes-author — that the main agent can dispatch the same way it dispatches Explore.

A custom sub-agent is defined as a markdown file with frontmatter, very similar to a slash command — but the lifecycle is fundamentally different: a slash command continues the current conversation, while a sub-agent starts a new one.

When to use it

Reach for a custom sub-agent when:

Context isolation matters. The work touches a lot of files or produces a lot of intermediate output; you don’t want that noise in the parent’s context.
Parallel fan-out helps. You want to run three searches simultaneously, or audit five services in parallel. Sub-agents can be dispatched concurrently.
The work needs a tighter tool set than the parent. A read-only Explore agent can’t accidentally edit a file. An agent with no Bash access can’t accidentally push.
The work is recurring. You write release notes the same way every week; the prompt is stable; it deserves a named home.
The work needs a different model. Run a cheap fast model in the main session and dispatch expensive deep-reasoning work to a sub-agent only when needed.

Don’t reach for a sub-agent when:

The work needs the parent conversation’s context. Sub-agents start fresh.
The work is trivial. Just do it in the main loop.
You need a hard guarantee that something runs. That’s a hook, not a sub-agent.
The work should be invocable interactively as a one-liner. That’s a slash command.

Slash command. Loaded into the current conversation. Inherits session context, sees everything so far. Output appears inline. Cannot run in parallel with the main loop.

Sub-agent. Runs in a fresh, isolated context. Returns a summary to the parent. Can be dispatched in parallel batches. Tool allowlist and system prompt independent of the parent.

How it works

The dispatch contract

A sub-agent is invoked via the Task tool by the main agent. The parent provides a brief — what the sub-agent should do — and any inputs needed. The sub-agent then runs its own conversation loop: it sees its system prompt, the brief, and an empty conversation history. It uses its allowed tools to do the work. When done, it returns a single final text message.

The parent only ever sees that final text. The intermediate steps — searches, reads, tool calls — happen in the sub-agent’s private context window. This is the whole point: the parent’s context stays clean even if the sub-agent does a lot of work.

A typical interaction:

The parent decides it needs to find every call site of getUser() across a large monorepo.
The parent dispatches an Explore sub-agent: "Find all callers of getUser. Group by file. Return a list."
The sub-agent runs grep, reads files, navigates references — possibly twenty tool calls.
The sub-agent returns: "Found 47 callers across 12 files. Here they are: ..."
The parent’s context now has one new turn — the summary — not 47 grep outputs.
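The isolation in the steps above can be modelled in a few lines. This is a toy sketch, not how Claude Code is implemented: `Conversation`, `dispatch`, and `find_callers` are hypothetical names, and the "tool calls" are placeholder strings. It only illustrates that the sub-agent's intermediate turns stay in its own history while the parent gains a single turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    """A minimal stand-in for a context window: an ordered list of turns."""
    turns: List[str] = field(default_factory=list)


def dispatch(parent: Conversation, brief: str,
             worker: Callable[[Conversation], str]) -> str:
    """Run `worker` in a fresh context and append only its final text to the parent."""
    child = Conversation(turns=[brief])   # fresh history: the brief and nothing else
    summary = worker(child)               # intermediate turns accumulate in `child`
    parent.turns.append(summary)          # the parent gains exactly one turn
    return summary


def find_callers(ctx: Conversation) -> str:
    """Hypothetical worker: records twenty fake tool calls, then summarises."""
    for i in range(20):
        ctx.turns.append(f"tool call {i}: grep output ...")
    return "Found 47 callers across 12 files."


parent = Conversation(turns=["user: where is getUser called?"])
dispatch(parent, "Find all callers of getUser. Group by file.", find_callers)
print(len(parent.turns))  # 2: the original turn plus the summary
```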

Anatomy of a custom sub-agent file

A sub-agent is a markdown file with frontmatter, stored in ~/.claude/agents/<name>.md (user-global) or <repo>/.claude/agents/<name>.md (project-scoped):

---
name: pr-reviewer
description: Reviews a pull request for correctness, style, and risk. Read-only.
model: claude-opus-4-7
tools: ["Bash", "Read", "Grep"]
---

You are a senior reviewer. Your job is to review pull requests with a
high signal-to-noise ratio: catch real problems, do not nitpick.

Given a PR number, fetch the diff with `gh pr view`, read the surrounding
code, and produce a review with three sections:

1. Correctness issues (bugs, missing cases).
2. Style observations (only those worth changing).
3. Risk assessment (blast radius, rollout).

End with one of: APPROVE / APPROVE WITH NITS / REQUEST CHANGES.

You do not have access to Edit or Write. You cannot fix issues directly;
you can only report them.

The body is the system prompt for the sub-agent’s session. Everything in it shapes how the sub-agent behaves across every invocation.
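To make the two-part structure concrete, here is a minimal sketch that splits an agent file into its frontmatter fields and its system-prompt body. It is a deliberately simplified parser written for the examples in this page (flat `key: value` lines, `tools` as a JSON-style array), not a YAML implementation and not the loader Claude Code uses; `parse_agent_file` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Tuple


def parse_agent_file(text: str) -> Tuple[Dict[str, Any], str]:
    """Split an agent file into (frontmatter fields, system prompt body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)       # position of the closing fence
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---'") from None

    fields: Dict[str, Any] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")   # split on the first colon only
        value = value.strip()
        # `tools` is written as a JSON-style array in the examples on this page.
        fields[key.strip()] = json.loads(value) if value.startswith("[") else value
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


EXAMPLE = """---
name: pr-reviewer
description: Reviews a pull request for correctness, style, and risk.
tools: ["Bash", "Read", "Grep"]
---

You are a senior reviewer.
"""

fields, body = parse_agent_file(EXAMPLE)
print(fields["tools"])  # ['Bash', 'Read', 'Grep']
```

A check like this can run in CI to catch a missing `name` or an unparseable `tools` line before a broken agent file reaches the team.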

Tool allowlists are first-class

The tools array in frontmatter is more than a hint — Claude Code enforces it. A sub-agent listed without Edit literally cannot call Edit. This means you can confidently build read-only or audit-only agents whose behaviour is structurally bounded, not just norm-bounded.

Common patterns:

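Read-only auditor: ["Bash", "Read", "Grep"] (no Edit or Write). Bash can still modify files through shell commands, so this agent is read-only by instruction rather than by construction; drop Bash when you need a structural guarantee.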
Search specialist: ["Grep", "Read"] — no Bash, no edits.
Migration runner: ["Bash", "Read", "Edit", "Write"] — full toolset, scoped to a specific kind of work.
MCP-dispatcher: a curated subset of MCP tools, no file tools at all.
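The difference between a structural bound and a norm can be pictured as a gate that sits between the agent and its tools. The sketch below is an illustration under assumptions, not Claude Code's enforcement code: `ToolGate`, `ToolNotAllowed`, and the lambda "tools" are all hypothetical.

```python
from typing import Callable, Dict, Iterable


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


class ToolGate:
    """Hypothetical dispatcher: every tool call passes through the allowlist check."""

    def __init__(self, allowed: Iterable[str],
                 registry: Dict[str, Callable[..., str]]):
        self.allowed = frozenset(allowed)
        self.registry = registry

    def call(self, name: str, *args: str) -> str:
        if name not in self.allowed:
            # The check lives outside the model, so no prompt can argue past it.
            raise ToolNotAllowed(f"{name} is not in this agent's tools list")
        return self.registry[name](*args)


# Stand-in tool implementations, for illustration only.
registry = {
    "Read": lambda path: f"contents of {path}",
    "Edit": lambda path: f"edited {path}",
}
search_specialist = ToolGate(["Grep", "Read"], registry)
print(search_specialist.call("Read", "src/user.py"))  # contents of src/user.py
```

Calling `search_specialist.call("Edit", "src/user.py")` raises `ToolNotAllowed`, whatever the system prompt says.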

Model selection

Per-sub-agent model selection lets you tier your compute. Cheap fast model for routine grep, expensive deep model for review work. The main agent can dispatch to either without your having to switch the session model.

Configuration

Directory layout

~/.claude/agents/             # personal agents
  notebook-explainer.md
.claude/agents/               # project-shared agents (commit these)
  pr-reviewer.md
  migration-runner.md
  flaky-test-bisector.md
  release-notes-author.md

Project agents shadow user agents with the same name. Project agents should be committed alongside the codebase so the whole team gets identical workers.
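The shadowing rule amounts to a lookup that checks the project directory before the personal one. The sketch below assumes the file name matches the agent's `name` field, as in the layout above; `resolve_agent` is a hypothetical helper for illustration, not part of Claude Code.

```python
from pathlib import Path
from typing import Optional


def resolve_agent(name: str, project_root: Path, home: Path) -> Optional[Path]:
    """Return the agent file that wins for `name`, or None if none exists."""
    candidates = [
        project_root / ".claude" / "agents" / f"{name}.md",  # project agents first
        home / ".claude" / "agents" / f"{name}.md",          # then personal agents
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


# Which pr-reviewer would be used from the current directory, if any?
print(resolve_agent("pr-reviewer", Path.cwd(), Path.home()))
```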

Frontmatter fields

Field	Purpose
`name`	The identifier used in `Task` dispatch. Kebab-case.
`description`	One-line summary. Surfaces in autocomplete and helps the main agent decide when to dispatch.
`model`	Optional model override for this sub-agent.
`tools`	Allowlist of tool names. The sub-agent cannot call anything outside this list. If the field is omitted, the sub-agent inherits the main conversation’s tools.
`color`	Optional UI hint for the agent’s badge.

The description field deserves care. It’s literally what the main agent reads when deciding “is there a sub-agent for this task?” A vague description means the main agent won’t reach for the sub-agent when it should. A description like "Reviews PRs for correctness, style, and risk. Read-only." is far more useful than "Helps with PRs."

Project conventions

The same brand-scrub / commit-and-share rules that apply to slash commands apply here. A project .claude/agents/ directory acts as the shared catalogue of project-specific workers. Each one should be:

Named for its job, not its mechanism (pr-reviewer, not read-only-grep-agent).
Documented in the description field clearly enough that future-you knows when to dispatch it.
Tested by being dispatched manually a few times before relying on the main agent to auto-dispatch it.

Examples

A read-only review agent

---
name: pr-reviewer
description: Reviews a pull request for correctness, style, and risk. Read-only — cannot modify files.
model: claude-opus-4-7
tools: ["Bash", "Read", "Grep"]
---

You are a senior code reviewer. Given a PR number or URL, produce a
high-signal review.

Workflow:
1. Run `gh pr view <id> --json title,body,files,additions,deletions`.
2. Run `gh pr diff <id>` for the actual changes.
3. For each touched file, `Read` the surrounding code to understand context.
4. Optionally `Grep` for callers of changed functions.

Output format:
- **Summary** — one paragraph.
- **Correctness** — bullet list, line-numbered, only real bugs.
- **Style** — bullet list, only nits worth raising.
- **Risk** — one paragraph: blast radius, reversibility.
- **Verdict** — APPROVE / APPROVE WITH NITS / REQUEST CHANGES.

Be specific. Cite filenames and line numbers. If you have no concerns
in a section, write "None."

Dispatch: the main agent uses the Task tool with subagent_type: "pr-reviewer" and a brief like "Review PR 123." The review comes back as a single summary turn. Several reviews can be dispatched in parallel.

A migration runner with full file access

---
name: migration-runner
description: Applies a mechanical refactor pattern across many files. Has Edit and Write — supervise carefully.
model: claude-opus-4-7
tools: ["Bash", "Read", "Edit", "Write", "Grep"]
---

You apply mechanical migrations: rename a symbol, swap a deprecated
API for its replacement, update an import path across the repo.

Workflow:
1. Confirm the pattern is mechanical. If the migration requires
   judgement per call site, refuse and ask for a more specific brief.
2. Grep for every call site.
3. Edit each one. Do not batch — one Edit per file, verified.
4. After all edits, run the project's test suite. Report pass / fail.
5. If tests fail, do not attempt to fix — return the failure for the
   parent to triage.

You will be supervised. If anything looks ambiguous, stop and ask.

This sub-agent intentionally has destructive tools — that’s the point. The brief from the parent will be specific ("Rename getUser to fetchUser across src/"); the agent’s system prompt narrows what kinds of work it accepts.

A flaky-test bisector

---
name: flaky-test-bisector
description: Given a flaky test, finds the commit that introduced the flake via git bisect.
tools: ["Bash", "Read", "Grep"]
---

You are a flaky-test detective. Given a test name that fails
intermittently, your job is to find the commit that introduced the flake.

Workflow:
1. Establish a failure-rate baseline: run the test 20 times on HEAD and
   count failures. If it fails 0/20, you cannot proceed; report this.
2. Identify a known-good commit (the user provides one or you ask).
3. `git bisect start <bad> <good>`, then `git bisect run` a script that
   runs the test 10 times per step and marks the step bad if any run fails.
4. When bisect finishes, read the offending commit's diff and propose
   the most likely cause in one paragraph.

Return: the commit SHA, the diff summary, and your hypothesis.
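The per-step script that the workflow hands to `git bisect run` could look like the sketch below. `git bisect run` treats exit code 0 as good and 1 as bad; everything else here is an assumption for illustration: the file name `flaky_check.py`, the helper names, and the use of pytest in the usage comment.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_once(cmd: Sequence[str]) -> bool:
    """Return True when the test command exits 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def bisect_verdict(cmd: Sequence[str], runs: int = 10,
                   runner: Callable[[Sequence[str]], bool] = run_once) -> int:
    """Exit code for `git bisect run`: 0 marks the commit good, 1 marks it bad."""
    for _ in range(runs):
        if not runner(cmd):
            return 1      # any single failure marks this step bad
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   git bisect run python flaky_check.py pytest tests/test_x.py::test_y
    sys.exit(bisect_verdict(sys.argv[1:]))
```

Ten clean runs do not prove a commit is good. A test that fails 5% of the time passes ten times in a row roughly 60% of the time (0.95^10 ≈ 0.60), so the run count should be sized against the baseline measured in step 1.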

Parallel fan-out

The main agent can dispatch multiple sub-agents simultaneously by issuing several Task calls in one assistant turn. Example: auditing five services for a common pattern.

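// Pseudocode: what the main agent effectively does in one assistant turn.
// Each brief restates the pattern in full, because sub-agents share no context.
parallel([
  Task("Audit service-auth for sync DB calls inside async handlers."),
  Task("Audit service-billing for sync DB calls inside async handlers."),
  Task("Audit service-notifications for sync DB calls inside async handlers."),
  Task("Audit service-search for sync DB calls inside async handlers."),
  Task("Audit service-storefront for sync DB calls inside async handlers."),
]);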

Five sub-agents run concurrently. Each one’s grep + read activity stays in its own context. The main agent collects five summary payloads and synthesises them.
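As a runnable analogy for the fan-out and collect step, the sketch below uses `asyncio.gather` with a stub in place of a real dispatch. `dispatch_subagent` is hypothetical: it stands in for one Task call and returns a placeholder summary. The aim is only to show the shape: independent briefs go out together and one summary string comes back per brief, in input order.

```python
import asyncio
from typing import List, Sequence

SERVICES = [
    "service-auth",
    "service-billing",
    "service-notifications",
    "service-search",
    "service-storefront",
]


async def dispatch_subagent(brief: str) -> str:
    """Hypothetical stand-in for one Task dispatch; returns the summary text."""
    await asyncio.sleep(0)   # placeholder for the sub-agent's private work
    return f"summary for: {brief}"


async def audit_all(services: Sequence[str]) -> List[str]:
    # Every brief is self-contained: a sub-agent cannot see its sibling briefs.
    briefs = [f"Audit {s} for sync DB calls inside async handlers." for s in services]
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(dispatch_subagent(b) for b in briefs))


summaries = asyncio.run(audit_all(SERVICES))
print(len(summaries))  # 5
```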

Gotchas

Sub-agents do not see the parent’s context. Whatever the parent learned earlier in the conversation is not visible to the sub-agent. If the work depends on prior context, include the relevant facts in the brief.
The return payload is a single string. No structured object, no list-of-files, no JSON guaranteed. If you need structured output, ask for JSON in the system prompt and parse it in the parent — and be prepared for the occasional malformed return.
Sub-agents cannot dispatch further sub-agents in the same tree. A sub-agent is a leaf. If you need a multi-level decomposition, structure it from the parent.
Tool allowlists are real constraints, not advisory. Forget to include Bash in tools and the sub-agent literally cannot run a command. This is good for safety, surprising the first time.
Each dispatch is expensive. A sub-agent spins up its own session: its system prompt is processed again, its context starts empty, and it has to rediscover anything the brief leaves out. Don’t dispatch a sub-agent for a single grep; the overhead exceeds the savings.
The description field is load-bearing. The main agent uses it to decide whether to dispatch. A vague description means under-utilised sub-agents.
Project agents shadow user agents. A repo’s pr-reviewer.md will mask your personal one. Usually intentional, sometimes surprising.
Long-running sub-agents block the parent. A flaky-test-bisector that takes 20 minutes leaves the parent waiting. For truly long work, consider the Agent SDK with a job-tracking pattern instead.
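For the single-string return payload noted in the list above, a defensive parse on the parent side might look like this sketch. It assumes the sub-agent was asked to return one JSON object, possibly wrapped in prose; `parse_subagent_json` is a hypothetical helper, and returning `None` on failure is just one reasonable policy.

```python
import json
from typing import Any, Optional


def parse_subagent_json(payload: str) -> Optional[Any]:
    """Best-effort: extract a JSON object from a sub-agent's text return, else None."""
    start, end = payload.find("{"), payload.rfind("}")
    if start == -1 or end <= start:
        return None                      # no object-shaped span at all
    try:
        return json.loads(payload[start:end + 1])
    except json.JSONDecodeError:
        return None                      # malformed return: let the caller decide


reply = 'Here is the result:\n{"verdict": "APPROVE", "issues": []}\nDone.'
print(parse_subagent_json(reply))  # {'verdict': 'APPROVE', 'issues': []}
```

When the result is `None`, the parent can re-dispatch with a stricter brief or fall back to treating the payload as plain text.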

The two design failure modes I see most often

Failure one: making the sub-agent too general. A helper sub-agent with every tool and a vague description never gets dispatched, because the main agent has no idea when it should reach for it. Sub-agents earn their keep by being specific: a tighter description and a smaller tool allowlist make them more useful, not less.

Failure two: making the sub-agent too procedural. A 30-step playbook in the system prompt fights the model’s planning ability and produces brittle agents. State the goal and the constraints; let the agent plan within them.