Code Review Workflow — Claude Code

The workflow#

Run a self-review before opening the PR. Use Claude as the first reviewer, in a fresh session, with no context other than the diff and the PR description. Iterate on the findings. Push. Let human reviewers do the higher-order work — taste, architecture, business fit — without burning their attention on issues a machine could catch.

Claude is a remarkably good first-pass reviewer because the things it’s best at — pattern matching, consistency checking, spotting missed edge cases, finding sibling bugs — are exactly the things human reviewers find tedious and skip. Claude is a remarkably bad only reviewer because it lacks judgment about architecture, business context, and team conventions. The workflow leverages the first without overweighting the second.

When to reach for it#

Pre-PR review with Claude is the right default for:

Anything you’d merge to main. Self-review catches the embarrassing stuff before a human sees it.
Refactors and migrations. A second pass catches the call site you missed, the test you didn’t update, the import you left dangling.
Bug fixes. Claude will catch the obvious “fix the symptom, missed the cause” pattern and flag it before review.
PRs in a codebase you don’t know well. When you don’t have the muscle memory for the conventions, a reviewer that’s read the whole repo is a force multiplier.
PRs where the diff is large. Even if you can’t split (sometimes you can’t), getting an exhaustive first-pass before a human reviewer is a courtesy that gets faster human reviews.

Reach for human-as-only-reviewer — skipping Claude — for:

Trivial changes (typo, README tweak, version bump). Don’t waste cycles.
Highly-sensitive changes where the conversation itself is the review — e.g., a CISO sign-off, a launch checklist with named owners.

Reach for /ultrareview (the deep-review slash command) when the diff has architectural stakes — a new module boundary, a public API change, a security-sensitive surface, a performance-critical path. /ultrareview is the heavy version of the workflow; the default version is what you want 80% of the time.

Step-by-step#

1. Self-review the diff first#

Before opening Claude or pushing to the remote, read your own diff. Use git diff main...HEAD or whatever your base is. Don’t skip this.

Self-review catches the obvious stuff: leftover console.logs, commented-out blocks, unused imports, half-finished docstrings. If you delegate this to Claude or to a human, you’ve wasted their attention on things only you could write better.

2. Open a fresh session for review#

Start a new Claude Code session. Not the session you wrote the code in. The session you wrote the code in is contaminated — Claude already knows what you intended and will rationalize what it sees toward that intent.

A fresh reviewer reads the diff with no priors. That’s what you want.

3. Give it the diff and the PR description#

The right input is: the PR description (or your draft of it), the diff, and a one-sentence framing.

Review this PR as if you were the staff engineer on the team. The PR description is at the top; the diff follows. Surface concerns by severity: blocking, important, nit. Don’t suggest stylistic changes the linter would catch.

Three things matter in that prompt:

“Staff engineer on the team” — sets the bar. Not “any reviewer,” not “an expert in this domain Claude doesn’t know” — a thoughtful senior generalist.
“By severity” — forces Claude to triage. Without it, you get a flat list where blocking issues sit next to nits and you have to re-sort.
“Not stylistic changes the linter would catch” — Claude has an enormous appetite for nits if you don’t constrain it. Constrain it.

4. Read the findings as if a colleague wrote them#

When Claude returns findings, read them with two filters:

Is this finding correct? Sometimes Claude flags something that looks wrong but isn’t, because it doesn’t have context — a deprecation that’s actually intentional, a pattern that’s deliberate in your codebase. Push back when you’re confident, but don’t dismiss without looking.
Is this finding worth acting on? “You could add a comment explaining this constant” is correct and not worth acting on if the constant’s meaning is clear from context. Triage ruthlessly.

A good rule: every blocking finding gets addressed before pushing. Every “important” finding gets considered. Nits get ignored unless they’re cheap.

5. Iterate, don’t batch-fix#

When you find a real issue, fix it in the original session (the one with the code context), not the review session. Then re-run the review on the new diff.

This is annoying once and worth it. Each iteration usually surfaces fewer findings, and the diminishing returns are how you know you’re done.

6. Reach for `/ultrareview` when stakes are high#

/ultrareview is the deep version of this workflow — a slash command that runs a more rigorous, multi-pass review against the diff. Reach for it when:

The diff introduces a new public API surface that other teams will depend on.
The diff is security-sensitive (auth, authz, secrets handling, user input).
The diff is performance-critical (hot path, big-O changes, allocation-sensitive).
You’re about to merge something you’re not personally sure about.

/ultrareview is more expensive in tokens and in your time — it surfaces more, including more false positives. Use it deliberately, not by default.

7. Open the PR with Claude’s findings already addressed#

When you push, you’ve already done what would otherwise be the first round of human review comments. Human reviewers can spend their attention on architecture, naming, and the things only humans can judge well.

In the PR description, it’s fine to mention “ran self-review with Claude Code, here’s what I changed” — colleagues appreciate seeing the upstream filter applied. Don’t paste the raw findings; that’s noise.

8. After merge, optionally have Claude review the merged state#

This is a luxury step, not a requirement. After merge, ask Claude to review the merged state in context — looking at the integrated code, not the diff. This sometimes surfaces issues that only appear when the new code interacts with the surrounding system, especially for refactors and integrations.

Treat the findings as next-PR fodder, not as a blocker on what just shipped.

Anti-patterns#

The shapes that don’t work:

Asking Claude to review in the same session that wrote the code. Contaminated context. You’ll get a rubber stamp instead of a review. Always open a fresh session.
Treating Claude’s findings as ground truth. Some findings are wrong. Some are nits dressed up as concerns. Some apply only to a generic codebase, not yours. Read with judgment; don’t apply blindly.
Letting Claude propose code changes without re-running the review. A proposed fix can introduce its own issues. Apply the fix, then re-review. Don’t accept-and-push from the review session.
Using Claude as the only reviewer. The first-pass-then-human structure is the whole point. Skipping the human pass leaves architectural and business-context issues unaddressed.
Using Claude as the last reviewer. Some teams use Claude review as a post-approval rubber stamp. That inverts the value — Claude is best as upstream filter, not downstream double-check.
Pasting raw findings into the PR description. Reviewers don’t want to read Claude’s output verbatim. Summarize in one sentence what you addressed; let them spend attention on the actual code.
Reaching for /ultrareview on everything. It’s the heavy hammer. Routine PRs don’t need it; daily use dulls your sense of when it’s actually warranted. Reserve it for genuine stakes.
Skipping self-review because “Claude will catch it.” Claude is a force multiplier on careful work, not a substitute for it. If you push a sloppy diff with the expectation that Claude will clean it up, you’re outsourcing your own discipline and the diff will ship sloppier than you’d like.
Ignoring findings about test coverage. Claude is genuinely good at spotting “this code path has no test.” Those findings usually deserve action, even when they feel like nits at the time.

Evaluation#

How do you know the review workflow is working?

Human reviewer comments shift upward. Less “you forgot a semicolon,” more “have you considered a different boundary here?” If human reviewers are still catching nits, your Claude review pass isn’t catching enough.
Self-review changes the diff materially. If you self-review and change nothing on most PRs, your self-review is performative. Push the bar higher.
Time-to-merge goes down. Counter-intuitive metric, but real — better upstream review means fewer back-and-forth rounds with the human reviewer.
Regressions per merged PR goes down. The truest evaluation. If you’re still shipping the same rate of “this used to work” bugs, your review process isn’t catching enough.
You disagree with Claude’s findings sometimes. A workflow where you agree with everything is a workflow where you’re not reading carefully. Push back when you disagree; learn from when Claude was right and you weren’t.
You reach for /ultrareview on the right PRs. Roughly 5-10% of your PRs warrant it. Less than that and you might be underusing; more and you might be over-investing in low-stakes changes.
Reviewers thank you. Soft signal, real signal. When colleagues mention that your PRs are easy to review, the upstream filter is doing its job.

Claude-first review. Self-review, fresh-session Claude review, iterate, then human review. Higher upstream quality; faster human turnaround; better caught nits. Best for any PR worth more than a glance.

Human-only review. Self-review, push, wait for human comments. Faster to open the PR; slower to merge because review rounds catch nits. Human attention burned on issues a machine would catch.

The exact review prompt I keep saved

“You are reviewing this PR as a thoughtful senior engineer on the team. The PR description and diff follow. Surface concerns in three buckets: BLOCKING (correctness, security, data loss, breaking change to public surface), IMPORTANT (missing tests for non-trivial paths, error-handling gaps, sibling bugs in nearby code, contract or naming issues), and NITS (style only if not linter-catchable). For each finding, cite the file and line, explain the concern in one sentence, and suggest the smallest change that would resolve it. Don’t suggest stylistic changes the linter would catch. Be honest about confidence — flag when you’re guessing.” That prompt gets me usable review output in under a minute, every time.