Before you let Claude Code loose on a serious repo, ask a less glamorous question:
Can the workflow fail safely?
This checklist is the short version of the operating model I am writing about in Claude Code: Building Production Agents That Actually Work. It is not meant to slow you down forever. It is meant to stop you from discovering the blast radius after the agent has already found it.
1. Workspace boundaries
- The agent starts in a known working directory.
- It cannot write outside the intended repo or task folder.
- Generated scratch files go into a predictable location.
- The workflow has a clean reset path.
- The agent cannot silently modify local config, shell profiles, SSH files, or credential stores.
If you cannot explain where the agent can write, you do not have a workspace boundary. You have hope.
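One way to make the write boundary explainable is to check every path the agent proposes against a single workspace root before anything touches disk. What follows is a minimal sketch in Python, not a feature of Claude Code itself; the function name `is_inside_workspace` and the example root `/tmp/agent-workspace` are hypothetical.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace_root: str) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace_root`."""
    root = Path(workspace_root).resolve()
    # Joining first means relative paths are interpreted against the root;
    # an absolute candidate replaces the root entirely, so it is judged as-is.
    target = (root / candidate).resolve()
    # resolve() collapses ".." segments and follows existing symlinks, so
    # escapes such as "src/../../etc/hosts" fail the containment check.
    return target == root or root in target.parents


if __name__ == "__main__":
    root = "/tmp/agent-workspace"  # hypothetical task folder
    for path in ["src/main.py", "../outside.txt", "/etc/passwd"]:
        print(path, is_inside_workspace(path, root))
```

A check like this belongs in the layer that executes writes, not in the prompt. The prompt is a request; the check is the boundary.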
2. Tool permissions
- Every tool has a named reason to exist.
- File read/write permissions are scoped by path.
- Shell commands are constrained or reviewed.
- Network access is off by default or allowlisted.
- Dangerous commands require explicit human approval.
- The workflow records which tools were available during the run.
Do not confuse a friendly chat interface with a safe operating environment. If the agent has tools, the tools are the risk surface.
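To make "constrained or reviewed" concrete, here is a rough sketch of a command classifier that sorts a proposed shell command into allow, needs approval, or deny. Everything in it is an assumption for illustration: the command sets, the `classify_command` name, and the rule that any shell operator forces human review. It is not how Claude Code implements permissions.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical policy: every entry should have a named reason to exist.
ALLOWED = {"ls", "cat", "git", "pytest"}
NEEDS_APPROVAL = {"rm", "curl", "ssh"}
GIT_APPROVAL_SUBCOMMANDS = {"push", "reset", "clean"}
SHELL_OPERATORS = (";", "|", "&", "`", "$(", ">", "<")


def classify_command(command: str) -> Decision:
    """Classify a proposed shell command against the policy above."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Decision.DENY  # unbalanced quotes: refuse rather than guess
    if not tokens:
        return Decision.DENY
    # Chaining, pipes, and redirects make a single-program check meaningless,
    # so anything containing them goes to a human.
    if any(op in command for op in SHELL_OPERATORS):
        return Decision.NEEDS_APPROVAL
    program = tokens[0]
    if program in NEEDS_APPROVAL:
        return Decision.NEEDS_APPROVAL
    if program == "git" and len(tokens) > 1 and tokens[1] in GIT_APPROVAL_SUBCOMMANDS:
        return Decision.NEEDS_APPROVAL
    if program in ALLOWED:
        return Decision.ALLOW
    return Decision.DENY  # default-deny anything without a named reason
```

The design choice that matters is the last line: unknown commands are denied, not waved through.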
3. Secrets and sensitive data
- The agent cannot read .env, SSH keys, cloud credentials, or token files unless the workflow explicitly requires it.
- Secret paths are blocked, not merely undocumented.
- Logs and traces do not capture secrets.
- Generated diffs are checked for accidental credential exposure.
- Test fixtures use fake values that look fake.
A model does not need bad intent to leak a secret. It only needs access and a reason that sounds plausible.
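Two of the checks above are easy to sketch: refusing reads on secret-looking paths, and scanning the added lines of a diff for credential-shaped strings. The patterns below are illustrative assumptions, not a complete rule set, and the names `is_secret_path` and `scan_diff` are hypothetical. A dedicated secret scanner will catch far more.

```python
import re
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical block list; extend it for your own stack.
BLOCKED_NAMES = (".env", ".env.*", "*.pem", "id_rsa*", "id_ed25519*", "credentials")
BLOCKED_DIRS = {".ssh", ".aws"}

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?[A-Za-z0-9/+_\-]{16,}"
    ),
}


def is_secret_path(path: str) -> bool:
    """Return True if the path sits in a blocked directory or has a blocked name."""
    p = PurePosixPath(path)
    if BLOCKED_DIRS.intersection(p.parts):
        return True
    return any(fnmatch(p.name, pattern) for pattern in BLOCKED_NAMES)


def scan_diff(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for each added diff line that looks like a secret."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Treat a finding as a reason to stop the handoff, not as a warning to scroll past.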
4. MCP and integration boundaries
- MCP servers are treated as production integrations.
- Each MCP server has a documented owner and purpose.
- Read/write capabilities are separated where possible.
- The agent cannot call broad business systems without approval gates.
- Tool output is logged in enough detail to reconstruct what happened.
MCP is powerful because it gives the agent reach. That is also why it needs governance.
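Governance here can start as a small registry: each server gets an owner, a purpose, and separate read and write tool lists, with writes gated on approval. The sketch below assumes that shape. The server name `issue-tracker`, the tool names, and the `authorize` function are all hypothetical and do not correspond to any real MCP server or SDK call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerPolicy:
    owner: str
    purpose: str
    read_tools: frozenset = frozenset()
    write_tools: frozenset = frozenset()


# Hypothetical registry: one documented entry per MCP server.
POLICIES = {
    "issue-tracker": ServerPolicy(
        owner="platform-team",
        purpose="Look up and update tickets linked to a change",
        read_tools=frozenset({"get_issue", "search_issues"}),
        write_tools=frozenset({"update_issue"}),
    ),
}


def authorize(server: str, tool: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether a tool call may proceed under the registry above."""
    policy = POLICIES.get(server)
    if policy is None:
        return False  # undocumented servers are not callable at all
    if tool in policy.read_tools:
        return True
    if tool in policy.write_tools:
        return bool(approved_by)  # writes pass only with a named approver
    return False  # tools not listed for this server are denied
```

For example, `authorize("issue-tracker", "update_issue")` is refused until someone is named in `approved_by`, which also gives the audit trail a person to point to.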
5. Review and merge discipline
- Agent-generated diffs are reviewed like human diffs.
- Reviewers know which files were touched by the agent.
- Large agent changes are split before review.
- The agent explains why the diff exists, not just what changed.
- Approval is not a reflex click after a wall of generated text.
The danger is not that humans stay in the loop. The danger is that they stay in the loop as decoration.
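"Large" is easier to enforce when it is a number. As a sketch, the function below reads the output of `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files) and flags a change that exceeds a size budget. The thresholds of 15 files and 400 lines are arbitrary assumptions, and the function names are hypothetical.

```python
def summarize_numstat(numstat: str) -> tuple[int, int]:
    """Return (files changed, lines changed) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        added, deleted, _path = parts
        files += 1
        # Binary files report "-" for both counts; count the file, not lines.
        if added.isdigit() and deleted.isdigit():
            lines += int(added) + int(deleted)
    return files, lines


def needs_split(numstat: str, max_files: int = 15, max_lines: int = 400) -> bool:
    """Return True if the change is too large to review in one pass."""
    files, lines = summarize_numstat(numstat)
    return files > max_files or lines > max_lines
```

Pick the budget your reviewers can actually hold in their heads, then let the check say no on their behalf.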
6. Evals and regression checks
- The workflow has task-level acceptance checks.
- Tests run before merge or handoff.
- Known failure modes are turned into eval cases.
- The agent cannot declare success without evidence.
- Repeated tasks have a baseline success rate.
If you do not measure agent quality, you are managing vibes.
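A baseline success rate does not need a platform to get started. The sketch below assumes each eval case yields a pass or fail result and compares the current rate against a stored baseline. `EvalResult`, `success_rate`, `has_regressed`, and the 0.05 tolerance are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    passed: bool


def success_rate(results: list[EvalResult]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        # No results means no evidence, so refuse to report a rate at all.
        raise ValueError("no eval results: success cannot be claimed without evidence")
    return sum(r.passed for r in results) / len(results)


def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Return True if the current rate fell below the baseline by more than the tolerance."""
    return current < baseline - tolerance


if __name__ == "__main__":
    run = [
        EvalResult("rename-function", True),
        EvalResult("add-unit-test", True),
        EvalResult("fix-flaky-import", False),
        EvalResult("update-changelog", True),
    ]
    rate = success_rate(run)
    print(f"success rate: {rate:.2f}, regressed: {has_regressed(rate, baseline=0.90)}")
```

Each known failure mode becomes another `EvalResult` case, so the baseline gets harder to fake over time.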
7. Observability and audit trail
- Each run has a session ID or trace ID.
- Prompts, tool calls, approvals, and changed files are recorded where appropriate.
- The final output includes enough context for another engineer to inspect it.
- Failures are categorized, not buried in logs.
- You can answer: what was the agent allowed to do at the time?
This is boring until production breaks. Then it is archaeology or evidence. Choose early.
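The items above map onto a small per-run record. This is a minimal sketch, assuming one JSON line per run is enough for your audit needs. The `RunRecord` class and its field names are hypothetical, not a Claude Code log format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    # What the agent was allowed to do, captured when the run starts.
    allowed_tools: list
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    tool_calls: list = field(default_factory=list)
    approvals: list = field(default_factory=list)
    changed_files: list = field(default_factory=list)

    def log_tool_call(self, tool: str, summary: str) -> None:
        """Record a tool call with a short, secret-free summary of its output."""
        self.tool_calls.append({"tool": tool, "summary": summary})

    def log_approval(self, action: str, approver: str) -> None:
        """Record who approved which action."""
        self.approvals.append({"action": action, "approver": approver})

    def to_json_line(self) -> str:
        """Serialize the whole run as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because `allowed_tools` is stored with the run, the question "what was the agent allowed to do at the time?" has an answer that does not depend on the current config.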
8. Rollback and incident response
- Every agent-assisted change has a rollback note.
- Deploy-affecting changes are clearly marked.
- CI/deploy files require extra review.
- The team knows how to disable the workflow quickly.
- Incidents feed back into permissions, evals, and review policy.
The agent moving fast is not the problem. The team being unable to reverse it is.
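Marking deploy-affecting changes can be automated with a path check over the files a run touched. The patterns below are examples only, and the label strings and function names are hypothetical. Adjust them to wherever your CI and deploy configuration actually lives.

```python
from fnmatch import fnmatchcase

# Example patterns; "*" in fnmatch also matches "/" so "*.tf" covers nested files.
DEPLOY_PATTERNS = (
    ".github/workflows/*",
    ".gitlab-ci.yml",
    "Dockerfile",
    "*/Dockerfile",
    "deploy/*",
    "*.tf",
)


def deploy_affecting(changed_files: list[str]) -> list[str]:
    """Return the changed files that match a CI or deploy pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in DEPLOY_PATTERNS)
    ]


def review_label(changed_files: list[str]) -> str:
    """Pick a review label based on whether any deploy-affecting file changed."""
    if deploy_affecting(changed_files):
        return "needs-extra-review"
    return "standard-review"
```

The label is only useful if something enforces it, for instance a required reviewer on anything tagged for extra review.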
9. Team adoption
- Start with low-risk repeatable workflows.
- Keep risky one-off work behind review.
- Teach people the operating loop, not just the tool commands.
- Track where Claude Code saves time and where it creates cleanup.
- Stop workflows that produce low-trust diffs.
The goal is not maximum automation. The goal is useful autonomy with a short leash where the risk demands it.
The deeper playbook
This checklist is a starting point. The book goes deeper into how to design the operating loop around Claude Code: permissions, hooks, MCP, evals, observability, cost controls, review gates, and rollback.