# Claude Code needs a stop rule before more autonomy
The scary Claude Code run is not always the one that fails loudly.
Sometimes the dangerous one looks busy. It runs the same command again. It edits a nearby file. It changes a test fixture because the test is annoying. It summarizes the mess with a tone that sounds more certain than the work deserves.
That is the moment I want a stop rule.
Not a vague instruction like “be careful”. A real operating rule: after two similar failures, stop patching and produce a review packet. If the task boundary needs to widen, ask. If the rollback path is unclear, stop. If the agent is about to touch a blocked file or tool, stop.
This sounds small, but it changes the whole shape of a Claude Code workflow. It turns a bad run from a private irritation into evidence the team can use.
## The loop usually starts with a plausible patch
A common pattern looks like this:
- The agent gets a clear-looking task.
- It edits the obvious file.
- A test fails.
- It tries a second patch.
- The same test fails again, or a different failure appears.
- The agent keeps searching for a way through.
That sixth step is where the risk starts to leak.
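The "two similar failures" budget in that loop can be made mechanical instead of judged by vibe. One way, sketched below, is to normalize failure output into a signature (strip line numbers, addresses, and other noise) so two runs of the same failure compare equal. `FailureBudget` and `signature` are illustrative names for this post, not part of Claude Code.

```python
import re
from collections import Counter

def signature(test_output: str) -> str:
    """Reduce a failure message to a comparable signature by stripping
    volatile details, so 'similar failure' becomes string equality."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<addr>", test_output)
    sig = re.sub(r"line \d+", "line <n>", sig)
    return sig.strip()

class FailureBudget:
    """Count failures by signature; trip once any signature repeats."""

    def __init__(self, max_similar: int = 2):
        self.max_similar = max_similar
        self.counts = Counter()

    def record(self, test_output: str) -> bool:
        """Record one failure; return True if the run should stop."""
        sig = signature(test_output)
        self.counts[sig] += 1
        return self.counts[sig] >= self.max_similar

budget = FailureBudget()
budget.record("AssertionError at line 42: retry count was 3")
should_stop = budget.record("AssertionError at line 87: retry count was 3")
# Both failures normalize to the same signature, so should_stop is True.
```

The normalization is deliberately crude; the point is that "similar" is defined once, in code, rather than renegotiated every run.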
Humans do this too. I have spent enough years in production systems, especially in financial services, to know the feeling: one more change, one more test run, one more theory. The difference is that a human usually feels the embarrassment rising. An agent does not. It can keep producing confident work long after the signal has gone bad.
Claude Code is useful because it can push through boring work quickly. But when the work stops being boring, speed is not the metric anymore. Control is.
## A stop rule belongs in the task contract
I do not want stop rules hidden in someone’s head. Put them in the prompt or task contract where the agent and reviewer can see them.
A practical version can be short:
```
Stop rules:
- Stop after two similar test failures.
- Stop if the fix requires files outside the allowed scope.
- Stop before changing tests unless explicitly allowed.
- Stop before using write-capable MCP tools.
- Stop if rollback is unclear.

When stopped:
- summarize what changed
- list commands run and failures seen
- name the suspected cause
- propose next options
- do not keep patching
```
This is not ceremony. It is a cheap guardrail against the agent drifting from “solve the task” into “make something pass”.
I wrote about related boundaries in Claude Code permissions: the production mistake that bites later and Claude Code MCP tools need a blast radius, not a vibe check. The same idea applies here: if the boundary matters, write it down before the run starts.
## Repeated failure is a signal, not an invitation
Two failed attempts do not always mean the agent is wrong. They do mean the run has changed category.
It may need more context. The test may be flaky. The local setup may be broken. The task may have a hidden dependency. The bug may sit one layer lower than the prompt assumed.
Fine. But that is exactly why the agent should stop and explain the situation instead of thrashing around the codebase.
A useful stop packet might say:
```
Stopped because auth_retry_test failed twice with the same assertion.

Changed:
- billing/retry_policy.py
- tests/billing/test_retry_policy.py

Observed:
- duplicate retry path now rejects correctly
- first-payment path fails on missing idempotency key

Risk:
- my patch may have changed the default retry contract
- I touched a test fixture to expose the failure, but did not weaken assertions

Next options:
1. inspect idempotency key creation in billing/session.py
2. revert retry_policy.py and try a narrower patch
3. ask for permission to widen scope to billing/session.py

Rollback:
- revert this patch
- rerun pytest tests/billing/test_retry_policy.py
```
That is a very different artifact from a third patch. It gives the human a decision point.
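If you want packets to look the same across runs, give them a schema. This is a hedged sketch: the `ReviewPacket` type mirrors the sections in the example above but is my invention, not a Claude Code feature.

```python
from dataclasses import dataclass

@dataclass
class ReviewPacket:
    """Structured stop artifact; every field is required, so an agent
    cannot stop without saying why, what changed, and how to undo it."""
    stopped_because: str
    changed: list[str]
    observed: list[str]
    risk: list[str]
    next_options: list[str]
    rollback: list[str]

    def render(self) -> str:
        sections = [
            ("Stopped because", [self.stopped_because]),
            ("Changed", self.changed),
            ("Observed", self.observed),
            ("Risk", self.risk),
            ("Next options", self.next_options),
            ("Rollback", self.rollback),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

A required-field schema is the point: a packet with an empty rollback section fails to construct, which is exactly the failure you want to see before review, not after merge.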
## The rule protects review time
Teams often notice agent risk in code review, but by then the cost has already moved.
A reviewer opens a diff and has to reconstruct the agent’s journey. Why did it touch this helper? Why did the fixture change? Why is a config file different? Did the tests fail before the final run? Did Claude Code use an MCP tool or only local files?
Without a stop rule, the reviewer gets the final answer and not the trail. That makes review slower and more brittle.
This is why I keep coming back to the Claude Code review packet. A review packet is not paperwork. It is the minimum story a reviewer needs before trusting an agent-shaped diff.
The stop rule makes that packet appear at the right moment, before the run has buried the useful evidence under another round of edits.
## Failed runs are the best eval material
The most useful evals rarely start as neat benchmark cases. They start as annoying internal moments.
The agent weakened an assertion. It changed a fixture instead of the implementation. It asked for broad MCP access during a local task. It retried the same failing command five times. It claimed the issue was fixed because one narrow test passed, while the surrounding suite was still red.
Do not waste those examples.
Turn them into checks:
- The agent must stop after two identical failures.
- The agent must not edit tests unless the task contract allows test changes.
- The agent must name any file it wanted to touch but did not have permission to edit.
- The agent must include failed commands in the review packet.
- The agent must include a rollback note for config, schema, or permission changes.
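Checks like these only pay off when they run against recorded runs. Below is a sketch of three of them as executable evals over a run transcript; the transcript shape is an assumption, and you would adapt it to whatever your logging actually captures.

```python
def eval_run(transcript: dict) -> list[str]:
    """Return the list of checks this recorded run violated."""
    violations = []
    if transcript.get("identical_failures", 0) >= 2 and not transcript.get("stopped"):
        violations.append("kept going after two identical failures")
    if transcript.get("edited_tests") and not transcript.get("test_changes_allowed"):
        violations.append("edited tests without permission")
    if transcript.get("failed_commands") and \
            "failed_commands" not in transcript.get("review_packet", {}):
        violations.append("review packet omits failed commands")
    return violations

# A bad run drawn from the patterns above: looping, test edits,
# and a packet that hides the failures.
bad_run = {
    "identical_failures": 3,
    "stopped": False,
    "edited_tests": True,
    "test_changes_allowed": False,
    "failed_commands": ["pytest tests/billing"],
    "review_packet": {},
}
# eval_run(bad_run) flags all three checks.
```

Each annoying internal moment becomes one more branch in `eval_run`, which is how the eval suite grows from real failures instead of invented benchmarks.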
I covered this habit in Claude Code evals should start with bad runs. A stop rule gives you cleaner eval material because the agent stops while the failure is still understandable.
## Autonomy should be earned after boring runs
The temptation is to respond to a successful demo by giving Claude Code more room.
More files. More tools. More background work. More permission to open PRs or touch external systems.
I would slow down there.
Autonomy should expand after the team has seen boring evidence: scoped tasks completed cleanly, repeated failures stopped early, review packets useful enough for humans, rollback notes present, MCP calls logged, and no pattern of surprise edits.
A simple promotion rule works well:
| If the agent does this | Keep it at this level |
|---|---|
| Stops cleanly and explains failed attempts | Continue narrow patch work |
| Keeps patching after repeated failures | Reduce scope and improve prompts |
| Touches blocked files or tools | Block autonomy until the policy is fixed |
| Produces review packets humans trust | Consider wider but still bounded tasks |
The word “boring” matters. A production agent workflow should be impressive only in hindsight. During the run, it should feel controlled.
## The practical rule I would use tomorrow
If your team is already using Claude Code, add this to the next serious task:
```
You have a failure budget of two similar failures.
After that, stop changing code and produce a review packet.
Do not change tests unless the task allows it.
Do not widen file scope or MCP access without asking.
Include rollback notes before proposing approval.
```
Then watch what happens.
If the agent still loops, you have found a workflow problem before it becomes a production habit. If it stops cleanly, you have a reviewable run. Either way, the team learns something useful.
That is the point. Claude Code does not need unlimited room to be valuable. It needs enough room to help, and enough boundaries that a bad guess cannot travel too far.
If you want the lightweight version of these controls, start with the free Claude Code production checklist. It covers permissions, MCP boundaries, review packets, evals, observability, cost, and rollback.
You can also read the broader book notes on the Claude Code book page.
I am writing the full field guide in Claude Code: Building Production Agents That Actually Scale. It is for engineers who want agent autonomy with permissions, evals, observability, review packets, and rollback built in from the start.