Claude Code handoffs fail when the run record is vague
A Claude Code run can look successful and still leave a mess behind.
The tests are green. The patch is small enough. The final answer says what changed. Someone opens the PR, reads the summary, and thinks, “Fine, probably.” Then a question appears: why did it touch this file? Which command proved the fix? Was that skipped check harmless or awkward? What happens if we need to undo it?
Now review has turned into archaeology.
That is the handoff problem. Claude Code is good at producing an answer. Production engineering needs something duller and more useful: a run record that a human who was not in the session can still trust.
The final summary is not the handoff
I do not trust polished summaries by default. They are useful, but they arrive after the model has already selected the story it wants to tell.
A handoff needs the rawer bits:
- what the task boundary was before the run started
- which files were read for context
- which files changed
- which commands ran
- which checks passed
- which checks were skipped
- where the agent hit a boundary
- what the agent is least sure about
- how to roll back the change
That sounds basic. Good. Basic is what survives when the reviewer is tired, the incident channel is noisy, or the person approving the change did not watch the agent work.
In financial-services engineering, I learned to respect boring evidence. You do not want a clever explanation when a payment workflow or risk control changes. You want to know what changed, why it changed, what proved it, and how to reverse it. Coding agents do not get a discount on that discipline just because they move faster.
The person reviewing the patch is often not the person who ran it
Solo experiments hide this problem. The same developer prompts Claude Code, watches the run, reads the output, and reviews the patch. Their memory fills in the gaps.
Teams do not work that neatly.
One engineer may start the run. Another may review the PR. A platform lead may check the risk. A future incident reviewer may ask why the change was approved. If the handoff only says “fixed validation and updated tests,” every later human has to rebuild context from the diff.
That is expensive. It also rewards confidence over evidence.
A better handoff says:
- Task boundary: fix null handling in invoice reference parsing only.
- Files changed: src/invoice/reference_parser.ts, tests/invoice/reference_parser.test.ts.
- Files read: docs/invoice-format.md, src/invoice/types.ts.
- Commands run: npm test -- invoice/reference_parser.
- Checks skipped: full test suite, because this path has a targeted test and no shared types changed.
- Risk: parser behavior for legacy imported invoices still depends on upstream normalization.
- Rollback: revert this commit. No migration or data rewrite.
That note is not fancy. It is useful. A reviewer can challenge it, approve it, or ask for the full test suite without guessing what happened.
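The same note can be sketched as a structured record. This is a minimal illustration, not an official Claude Code schema; the type name and fields are my own, chosen to mirror the note above.

```typescript
// Hypothetical shape for a run record. Field names are illustrative,
// not part of any Claude Code API; the point is that every field the
// reviewer needs has a named slot, so omissions are visible.
interface RunRecord {
  taskBoundary: string;
  filesRead: string[];
  filesChanged: string[];
  commandsRun: string[];
  checksSkipped: { check: string; reason: string }[];
  risk: string;
  rollback: string;
}

// The example handoff note from above, as data:
const record: RunRecord = {
  taskBoundary: "fix null handling in invoice reference parsing only",
  filesRead: ["docs/invoice-format.md", "src/invoice/types.ts"],
  filesChanged: [
    "src/invoice/reference_parser.ts",
    "tests/invoice/reference_parser.test.ts",
  ],
  commandsRun: ["npm test -- invoice/reference_parser"],
  checksSkipped: [
    {
      check: "full test suite",
      reason: "targeted test exists and no shared types changed",
    },
  ],
  risk: "legacy imported invoices still depend on upstream normalization",
  rollback: "revert this commit; no migration or data rewrite",
};
```

A record like this can be rendered into the PR description, so the reviewer reads prose while the team keeps a machine-checkable trail.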
Record the boundary before the agent starts
The best handoff begins before the run.
If the task starts with “make this better,” the record will be vague because the work was vague. Claude Code may still produce a decent patch, but the reviewer has no clean way to tell whether the agent stayed inside the job.
I prefer a boundary that names the boring constraints:
- Goal: fix one failing validation path.
- Write scope: invoice parser and parser tests only.
- Read-only context: invoice docs and type definitions.
- Do not touch: billing totals, migrations, auth, customer data, generated clients.
- Stop if: the fix needs a shared contract change.
- Evidence required: targeted test command, risk note, rollback note.
Now the handoff has a ruler. Did the run stay inside scope or not? Did it stop at the shared contract boundary? Did it provide the evidence that was requested?
Without that ruler, review becomes taste. One person sees a helpful agent. Another sees scope creep. Both may be right, but the team has no operating standard.
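The ruler can even be mechanical. Here is a sketch of a scope check that compares the files a run changed against the declared boundary. The path prefixes and the helper name are hypothetical, assumed from the invoice example; a real team would load them from wherever the boundary is recorded.

```typescript
// Illustrative boundary, assumed from the invoice example above.
const writeScope = ["src/invoice/", "tests/invoice/"];
const doNotTouch = ["src/billing/", "migrations/", "src/auth/"];

// Return every changed file that falls outside the declared write scope
// or inside a forbidden area. An empty result means the run stayed
// inside the job; anything else is a named question for the reviewer.
function scopeViolations(changedFiles: string[]): string[] {
  return changedFiles.filter(
    (f) =>
      doNotTouch.some((p) => f.startsWith(p)) ||
      !writeScope.some((p) => f.startsWith(p))
  );
}

// A run that stayed inside the boundary:
const clean = scopeViolations(["src/invoice/reference_parser.ts"]);
// A run that drifted into billing:
const drift = scopeViolations(["src/billing/totals.ts"]);
```

This does not replace judgment. It turns "did the agent stay inside scope?" from taste into a list of filenames the reviewer can argue about.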
Make skipped checks visible
Skipped checks are not automatically bad. Hidden skipped checks are bad.
There are good reasons not to run every command on every small change. Some suites are slow. Some tests need services. Some checks belong in CI. But if Claude Code skips a check, the reviewer should see the reason in plain language.
I want notes like this:
- Skipped: npm run e2e.
- Reason: change is isolated to parser unit behavior, no browser or API route touched.
- Reviewer decision needed: run e2e if you think invoice import touches UI display.
That is a real handoff. It gives the human a decision instead of a shrug.
The weaker version is “all relevant tests pass.” Relevant according to whom? The model? The prompt? The exhausted developer who wanted to ship before lunch?
Name the skipped checks. Name the reason. If the reason is weak, the reviewer will catch it.
Risk notes should include doubt
This is where many agent handoffs sound too smooth.
A good human reviewer does not only want to know what worked. They want to know where the change might be wrong. Claude Code should be pushed to write down its doubt.
Useful risk notes are specific:
- “This assumes invoice references are normalized before parsing. I verified the main path but not the legacy import path.”
- “The fix changes retry behavior for API timeouts. I did not inspect mobile clients that may depend on the old error message.”
- “The new permission check covers write access. Read access is still controlled elsewhere.”
Those notes are not weakness. They are review handles.
A vague risk note such as “low risk” is almost useless unless it says why. Low risk because the diff is tiny? Because the path is covered by tests? Because the feature is behind a flag? Because rollback is a one-line revert?
Make the claim earn its place.
Rollback belongs in the handoff, not in the incident
Rollback notes are easy to skip when the patch is small. That is exactly when teams should build the habit.
The note can be short:
Rollback: revert this commit. No schema change, no cache flush, no data migration.
Or it can warn the reviewer:
Rollback: revert commit plus remove the new feature flag value from staging config. No production data touched.
The point is not ceremony. The point is to stop pretending rollback will be obvious later. Later is when people are rushed, annoyed, and reading the change with less context than the agent had during the run.
Claude Code is often used because it helps move work faster. Fine. Then the workflow needs to make reversal faster too.
A handoff template I would use
For production Claude Code work, I would start with this:
- Task boundary:
- Files read:
- Files changed:
- Commands run:
- Checks passed:
- Checks skipped and why:
- MCP tools used:
- Surprises during the run:
- Risk notes:
- Rollback:
- Recommended reviewer focus:
The last line matters. “Recommended reviewer focus” tells the human where to spend attention. Maybe it is the permission edge. Maybe it is the test fixture. Maybe it is the fact that the agent had to infer behavior from old code because docs were stale.
That is the difference between a model saying “please review” and a useful teammate saying “look here first.”
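One way to keep the template honest is a small pre-PR gate that refuses a handoff with blank fields. This is a sketch under my own assumptions: the field labels mirror the list above, and the function name is hypothetical, not a real tool.

```typescript
// Required fields, mirroring the handoff template above.
const REQUIRED = [
  "Task boundary",
  "Files read",
  "Files changed",
  "Commands run",
  "Checks passed",
  "Checks skipped and why",
  "Risk notes",
  "Rollback",
  "Recommended reviewer focus",
];

// Return the labels that are missing or left blank, so the gap is
// named before the PR opens instead of discovered during an incident.
function missingFields(handoff: Record<string, string>): string[] {
  return REQUIRED.filter((k) => !handoff[k] || handoff[k].trim() === "");
}

// A handoff that only filled in the boundary still owes eight fields:
const gaps = missingFields({ "Task boundary": "fix invoice parsing only" });
```

Whether this runs as a pre-commit hook, a CI step, or a checklist in the PR description matters less than the habit: a blank field is a decision someone made, and it should be visible.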
For a compact version of the pre-flight checks, use the Claude Code production checklist. For the fuller operating model around permissions, review packets, evals, observability, and rollback, see Claude Code: From Vibe Coding to Production.
Read the book: Claude Code: From Vibe Coding to Production on Amazon Kindle.
Want the operating checklist first? Start with the free Claude Code production checklist.