Before you buy the agent platform, write the delegation policy
The easiest mistake in an AI agent rollout is buying the tool before naming the boundary.
A vendor demo makes the future look tidy. The agent reads the ticket, edits the code, opens the pull request, updates the workflow, and leaves a neat summary. Claude Code can make that loop feel very close. So can MCP servers, RAG systems, observability tools, and internal automation.
Then somebody asks the boring question that decides whether this can survive contact with production: “What exactly are we allowing the agent to do?”
That question is where Thomas De Vos’s two books fit together. Claude Code: Building Production Agents That Actually Scale is about making Claude Code useful in real engineering work: scoped tasks, review packets, evals, observability, cost control, and rollback. The book page routes readers to the Kindle edition on Amazon, with other formats available for people who prefer them. Securing Enterprise AI Agents covers the security side: identity, permissions, MCP and RAG boundaries, policy gates, audit evidence, and incident response. The Enterprise AI Agents in Production bundle is for teams that need the delivery model and the control model together.
A platform is not an operating model
Agent platforms are useful. They can centralize tools, record actions, apply policy, manage approvals, and make agent runs easier to replay. I would rather have that than a mess of local scripts and undocumented prompts.
But a platform cannot decide the hard parts for you.
It cannot know whether your onboarding service is safe for autonomous edits. It cannot know whether “read Jira” includes customer escalation detail. It cannot know whether a pull request that touches payments needs a staff engineer, a security architect, or both. It cannot know when your team would rather lose speed than widen the blast radius.
Those choices belong in a delegation policy.
The policy does not need to be a 40-page governance document. A document that long usually goes unread. Start with one page that answers five questions:
Which work can an agent do alone?
Which work can an agent prepare but not finish?
Which tools and data sources can it use for each class of work?
What evidence must come back with the change?
Who approves release when the agent touched something sensitive?
If the team cannot answer those questions in plain language, the platform will only make the confusion faster.
Classify the work before classifying the tool
Most agent governance discussions start with tools. Can the agent use GitHub? Can it query logs? Can it read tickets? Can it call a deployment system?
That is backwards.
Start with the work.
A safe classification can be simple:
Class 1: read-only analysis
The agent can inspect approved material and produce a recommendation.
Class 2: low-risk code preparation
The agent can prepare a patch in scoped files, run targeted tests, and open a PR.
Class 3: sensitive code preparation
The agent can prepare work that touches auth, payments, customer data paths, deployment, or regulated workflows, but release needs named approval.
Class 4: operational action
The agent can change state outside Git only through a tightly approved workflow.
Class 5: forbidden for now
The agent cannot perform the work until the team has better controls.
This is not glamorous, but it is the sort of thing that keeps a rollout honest. It lets the engineering team talk to the security team without pretending every task has the same risk.
Claude Code might be perfectly reasonable for a Class 2 refactor in one repo. The same agent, with the same model, may be unacceptable for a Class 4 workflow that can change production state through an MCP tool. The difference is not the model. The difference is the authority.
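The five classes are small enough to write down as a type. The sketch below is one way to do that in Python, not a prescribed schema. The names WorkClass, needs_human_gate, and is_permitted are hypothetical, and treating Class 4 as gated is my reading of "tightly approved workflow".

```python
from enum import IntEnum


class WorkClass(IntEnum):
    """The five work classes from the list above."""
    READ_ONLY_ANALYSIS = 1
    LOW_RISK_CODE_PREP = 2
    SENSITIVE_CODE_PREP = 3
    OPERATIONAL_ACTION = 4
    FORBIDDEN_FOR_NOW = 5


# Hypothetical one-line summaries paraphrasing the list above.
# Adjust the wording to your own policy.
AUTHORITY = {
    WorkClass.READ_ONLY_ANALYSIS: "inspect approved material, produce a recommendation",
    WorkClass.LOW_RISK_CODE_PREP: "patch scoped files, run targeted tests, open a PR",
    WorkClass.SENSITIVE_CODE_PREP: "prepare sensitive changes, release needs named approval",
    WorkClass.OPERATIONAL_ACTION: "change state outside Git only via an approved workflow",
    WorkClass.FORBIDDEN_FOR_NOW: "no agent work until controls improve",
}


def is_permitted(work_class: WorkClass) -> bool:
    """Class 5 is the only class the agent may not attempt at all."""
    return work_class is not WorkClass.FORBIDDEN_FOR_NOW


def needs_human_gate(work_class: WorkClass) -> bool:
    """Assumption: Class 3 and Class 4 both stop for a human before taking effect."""
    return work_class in (
        WorkClass.SENSITIVE_CODE_PREP,
        WorkClass.OPERATIONAL_ACTION,
    )
```

The useful property is that the class travels with the task, not with the model or the tool, so the same agent can be checked against different authority on different runs.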
Give every tool a price
MCP makes agents more useful because it gives them reach. It also makes them more dangerous for the same reason.
A connector name hides too much. “Docs search” could mean public API documentation or private incident notes. “Logs” could mean synthetic test output or production telemetry. “Ticket access” could mean a sanitized bug report or a thread full of customer detail. “GitHub” could mean read-only repo access, branch creation, pull request comments, workflow dispatch, or release tags.
Write the business capability next to the tool:
Tool: repo.read
Capability: inspect approved source files
Default price: low
Allowed for: Class 1, Class 2, Class 3
Tool: logs.query
Capability: inspect production-like telemetry
Default price: high
Allowed for: Class 1 with redaction, Class 3 with approval
Tool: workflow.dispatch
Capability: start automation that changes shared state
Default price: very high
Allowed for: Class 4 only
The word “price” matters. It stops the team from treating all tool calls as technical plumbing. A tool spends authority. Some authority is cheap. Some is not.
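The three entries above can be kept as data and checked before a tool call is made. This is a minimal sketch, assuming work classes are plain integers 1 to 5. ToolGrant, TOOLS, and check_tool are hypothetical names, not part of MCP or any platform API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ToolGrant:
    capability: str
    price: str
    # Maps an allowed work class to the condition attached to it.
    # None means the class may use the tool with no extra condition.
    allowed: Dict[int, Optional[str]]


# The three example tools from the text, transcribed as data.
TOOLS = {
    "repo.read": ToolGrant(
        "inspect approved source files", "low",
        {1: None, 2: None, 3: None},
    ),
    "logs.query": ToolGrant(
        "inspect production-like telemetry", "high",
        {1: "redaction", 3: "approval"},
    ),
    "workflow.dispatch": ToolGrant(
        "start automation that changes shared state", "very high",
        {4: None},
    ),
}


def check_tool(tool: str, work_class: int) -> Tuple[bool, Optional[str]]:
    """Return (allowed, condition). Unknown tools are denied by default."""
    grant = TOOLS.get(tool)
    if grant is None or work_class not in grant.allowed:
        return (False, None)
    return (True, grant.allowed[work_class])
```

For example, check_tool("logs.query", 1) returns (True, "redaction"), while check_tool("workflow.dispatch", 2) returns (False, None). Denying unknown tools by default is a design choice in this sketch. It means a new connector spends no authority until someone has priced it.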
Require evidence that matches the risk
A low-risk agent run does not need a courtroom transcript. It does need a receipt.
For routine Claude Code work, the receipt can be short:
Task contract
Files inspected
Files changed
Commands run
Tests passed and failed
Assumptions
Reviewer questions
Rollback note
For sensitive enterprise agent work, add the security evidence:
Identity used by the agent
MCP servers and capabilities used
Data classes accessed
Policy checks applied
Approval owner
Residual risk
Incident or rollback path
The point is not paperwork. The point is review quality.
A reviewer should not have to reconstruct the agent’s path from a diff and a cheerful summary. If the agent touched risky context, the evidence should say so. If it stayed inside the fence, the evidence should prove that too.
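One way to make the receipt checkable is to treat both lists as required keys and reject a run that omits any of them. The sketch below assumes the receipt arrives as a plain dictionary. The field names and the missing_evidence function are hypothetical spellings of the items listed above.

```python
# Field names are hypothetical spellings of the two lists in the text.
ROUTINE_FIELDS = [
    "task_contract",
    "files_inspected",
    "files_changed",
    "commands_run",
    "test_results",          # tests passed and failed
    "assumptions",
    "reviewer_questions",
    "rollback_note",
]

SENSITIVE_FIELDS = [
    "agent_identity",
    "mcp_capabilities_used",
    "data_classes_accessed",
    "policy_checks_applied",
    "approval_owner",
    "residual_risk",
    "incident_or_rollback_path",
]


def missing_evidence(receipt: dict, sensitive: bool) -> list:
    """List the required fields the receipt does not contain.

    A field counts as present if its key exists, even when the value is
    empty. An empty "assumptions" list is a legitimate answer; a missing
    key means nobody answered the question.
    """
    required = ROUTINE_FIELDS + (SENSITIVE_FIELDS if sensitive else [])
    return [name for name in required if name not in receipt]
```

A CI step or PR template check could call missing_evidence and refuse to mark the run reviewable until the list comes back empty. The check only proves the fields exist. Whether their contents are honest is still the reviewer's job.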
Put approval where the risk changes
Many teams make approval too broad or too late.
Too broad means every agent run needs a committee. People route around that. Too late means the agent has already used the tool, read the data, or changed the state before anyone notices the risk changed.
Put approval at the point where the class changes.
If a Class 2 coding task starts to touch authentication, it becomes Class 3 and stops for a human. If a read-only investigation needs production telemetry, it stops before querying logs. If a patch requires a deployment workflow, it stops before any write-capable automation runs.
This is the discipline production teams already use in other forms. You do not let a batch job silently become a payment job. You do not let a support query silently become a data export. Agent work deserves the same boundary.
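The first of those stops, a Class 2 task drifting into sensitive code, can be sketched as a path check that runs before the agent writes anything. The directory names in SENSITIVE_DIRS are assumptions for illustration, as are the function names. A real repository would need its own list, and the log and deployment cases need separate checks that this sketch does not cover.

```python
from pathlib import PurePosixPath

# Hypothetical directory names that mark sensitive code in a repo.
SENSITIVE_DIRS = {"auth", "payments", "migrations", "deploy"}


def touches_sensitive(path: str) -> bool:
    """True if any component of the path is a sensitive directory name."""
    return any(part in SENSITIVE_DIRS for part in PurePosixPath(path).parts)


def effective_class(declared_class: int, touched_paths: list) -> int:
    """Raise a Class 2 task to Class 3 when it reaches sensitive paths.

    Other classes are returned unchanged: Class 3 and above are already
    gated, and Class 1 does not edit files.
    """
    if declared_class == 2 and any(touches_sensitive(p) for p in touched_paths):
        return 3
    return declared_class


def must_stop(declared_class: int, touched_paths: list) -> bool:
    """The run stops for a human exactly when the class has changed."""
    return effective_class(declared_class, touched_paths) != declared_class
```

With these assumptions, must_stop(2, ["services/auth/login.py"]) is true and must_stop(2, ["docs/readme.md"]) is false. The check belongs before the edit, not in the PR review, because by review time the agent has already spent the authority.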
The policy should be small enough to use
A workable delegation policy fits into the daily flow. Developers can paste it into prompts. Security can audit against it. Platform teams can turn parts of it into enforcement later.
A first version can be this plain:
For Claude Code:
Use task contracts for every run.
Limit edits to named files or packages.
Run targeted tests and capture output.
Open PRs with review packets.
Stop before auth, payments, customer data, deploy config, or migrations unless approved.
For enterprise agents:
Map each MCP server to a business capability.
Default sensitive tools to read-only or blocked.
Log identity, data class, tool use, and approval owner.
Require a rollback or incident path for state-changing actions.
Review policy exceptions weekly.
That is enough to start. It will be wrong in places, but wrong on paper is better than implicit in a Slack thread after an incident.
Buy the tool after you know what it must enforce
There is a cleaner way to evaluate agent platforms.
Do not start with feature checklists. Start with your delegation policy and ask which parts the platform can enforce, record, or make easier to review.
Can it restrict tools by task class? Can it separate read and write authority? Can it show which data classes the agent touched? Can it attach evidence to a pull request? Can it stop when risk changes? Can it replay the run? Can it make rollback boring?
If some of the answers are no, that may still be fine. You may decide to enforce some controls in prompts, some in CI, some in MCP server configuration, and some in human review. At least you will know what you are buying.
The platform should serve the operating model. Not the other way around.
If you are building this inside an engineering team, start with Claude Code: Building Production Agents That Actually Scale. If you are responsible for the security model around agents, read Securing Enterprise AI Agents. If your team owns both delivery and risk, the Enterprise AI Agents in Production bundle is the practical route.