Observability

A Claude Code run replay record connecting the task contract, agent actions, decision points, evidence packet, human review, and incident replay

A diff cannot replay a Claude Code run

A Claude Code diff shows the final change, but it leaves out the starting state, failed commands, MCP results, skipped tests, and plan changes. Capture a compact replay record before the run disappears.
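A replay record can be as small as one typed object saved next to the diff. This is a minimal sketch assuming Python 3.9+. The field names (`base_commit`, `mcp_results`, `plan_changes`, and so on) and the `issue-tracker` MCP server in the usage are hypothetical; this is not a format Claude Code emits. The point is that failed commands, MCP results, skipped tests, and plan changes are stored with the task instead of vanishing when the session ends.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CommandResult:
    # One shell command the agent ran, kept even when it failed.
    command: str
    exit_code: int
    stderr_tail: str = ""


@dataclass
class ReplayRecord:
    # Hypothetical schema: field names are illustrative, not a Claude Code format.
    task: str
    base_commit: str  # the starting state a diff leaves out
    commands: list[CommandResult] = field(default_factory=list)
    mcp_results: list[dict] = field(default_factory=list)
    skipped_tests: list[str] = field(default_factory=list)
    plan_changes: list[str] = field(default_factory=list)

    def failed_commands(self) -> list[CommandResult]:
        # Failed attempts never appear in the final diff, so surface them explicitly.
        return [c for c in self.commands if c.exit_code != 0]

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so the output is plain JSON.
        return json.dumps(asdict(self), indent=2)


record = ReplayRecord(task="Fix flaky retry test", base_commit="abc1234")
record.commands.append(CommandResult("pytest tests/test_retry.py", 1, "1 failed"))
record.commands.append(CommandResult("pytest tests/test_retry.py", 0))
# Hypothetical MCP server and tool names, for illustration only.
record.mcp_results.append({"server": "issue-tracker", "tool": "get_issue", "ok": True})
record.skipped_tests.append("tests/test_network.py::test_timeout")
record.plan_changes.append("Switched from mocking time.sleep to injecting a clock")
print(len(record.failed_commands()))  # 1
```

The final diff would show only the passing fix; the record also keeps the first failing run and the change of plan that led to it.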

A Claude Code flight recorder diagram with task contract, tool calls, evidence, review, and rollback

Claude Code needs a flight recorder

Claude Code can produce a clean patch from a messy run. Production teams need a flight recorder: the task contract, tool calls, permission pressure, tests, assumptions, and rollback notes that explain how the patch was made.
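One way to keep a flight recorder is an append-only JSON Lines file with one event per line. This is a minimal sketch under assumptions: `FlightRecorder` and its event kinds are hypothetical, and the commands in the usage are illustrative (`Bash` is used only as a tool label). Appending instead of rewriting means the log keeps the denied permission request and the stated assumption, not just the final state.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


class FlightRecorder:
    """Append-only JSON Lines log for one agent run (hypothetical helper, not a Claude Code API)."""

    # Illustrative event kinds mirroring the flight recorder contents listed above.
    KINDS = {"task_contract", "tool_call", "permission", "test", "assumption", "rollback_note"}

    def __init__(self, path: Path):
        self.path = path

    def record(self, kind: str, **data) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        event = {"ts": time.time(), "kind": kind, **data}
        # Append mode: earlier events are never rewritten, only added to.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def events(self, kind: Optional[str] = None) -> list[dict]:
        # Read every event back, optionally filtered by kind.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
        return [r for r in rows if kind is None or r["kind"] == kind]


log = FlightRecorder(Path(tempfile.mkdtemp()) / "run.jsonl")
log.record("task_contract", goal="Add retries to the payment client", scope=["src/payments/"])
log.record("permission", tool="Bash", command="rm -rf build/", decision="denied")
log.record("tool_call", tool="Bash", command="pytest -q", exit_code=0)
log.record("assumption", text="Upstream API is idempotent for GET")
log.record("rollback_note", text="Revert the single commit; no schema changes")

# Permission pressure: every request the human had to refuse.
denied = [e for e in log.events("permission") if e["decision"] == "denied"]
```

A plain text log like this is easy to grep during an incident and easy to attach to a pull request.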

Diagram showing metric-only LLM observability versus a replayable production AI trace

LLM observability is not a dashboard. It is a replayable trail.

A latency chart will not explain why an AI answer was wrong. Production LLM systems need traces, sources, tool calls, prompt versions, eval results, and human decisions.
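To make this concrete, here is a hypothetical trace schema for one answer, plus a `triage` function that turns a trace into leads a reviewer can follow. The schema, field names, and function are assumptions for illustration, not any observability product's API. Note that `latency_ms` is recorded but never used to explain what went wrong.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnswerTrace:
    # Hypothetical trace schema for one production LLM answer.
    request_id: str
    prompt_version: str
    latency_ms: float
    sources: list[str] = field(default_factory=list)  # retrieved document IDs
    tool_calls: list[dict] = field(default_factory=list)
    eval_results: dict[str, bool] = field(default_factory=dict)  # check name -> passed
    human_decision: Optional[str] = None  # e.g. "accepted" or "corrected"


def triage(trace: AnswerTrace) -> list[str]:
    """Return leads for a reviewer; latency is deliberately not one of them."""
    leads = []
    if not trace.sources:
        leads.append("no sources retrieved: answer was ungrounded")
    for name, passed in trace.eval_results.items():
        if not passed:
            leads.append(f"eval failed: {name}")
    for call in trace.tool_calls:
        if call.get("error"):
            leads.append(f"tool {call.get('name', '?')} errored: {call['error']}")
    if trace.human_decision == "corrected":
        leads.append(f"human corrected output from prompt {trace.prompt_version}")
    return leads


trace = AnswerTrace(
    request_id="req-42",
    prompt_version="support-v7",
    latency_ms=850.0,
    tool_calls=[{"name": "search_orders", "error": "timeout"}],
    eval_results={"grounded": False, "format": True},
    human_decision="corrected",
)
for lead in triage(trace):
    print(lead)
```

A fast answer with no sources, a failed groundedness check, and a tool timeout looks healthy on a latency chart; the trace is what shows why it was wrong.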

A Claude Code review packet showing objective, permission boundary, tool trace, tests, cost, and rollback path before human approval

The Claude Code review packet I want before approving agent work

A Claude Code diff is not enough evidence for production review. Ask for the objective, permission boundary, tool trace, tests, failures, cost, and rollback path before approving agent work.
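That checklist can be enforced mechanically before a human spends time on the diff. This sketch uses hypothetical packet keys mirroring the list above (`objective`, `permission_boundary`, `tool_trace`, `tests`, `failures`, `cost_usd`, `rollback_path`); it is not a Claude Code output format. One design choice worth flagging: `failures` and `cost_usd` only need to be present, because an empty failure list is still an explicit answer, while the other fields must be non-empty.

```python
from typing import Any

# Hypothetical packet keys mirroring the review checklist; not a Claude Code format.
MUST_BE_NONEMPTY = ("objective", "permission_boundary", "tool_trace", "tests", "rollback_path")
MUST_BE_PRESENT = ("failures", "cost_usd")  # [] or 0.0 is still a stated answer


def missing_evidence(packet: dict[str, Any]) -> list[str]:
    # Absent or empty required fields, then absent presence-only fields.
    missing = [k for k in MUST_BE_NONEMPTY if not packet.get(k)]
    missing += [k for k in MUST_BE_PRESENT if k not in packet]
    return missing


def ready_for_review(packet: dict[str, Any]) -> bool:
    # A clean diff alone never passes: every field must be accounted for.
    return not missing_evidence(packet)


packet = {
    "objective": "Add idempotency keys to POST /payments",
    "permission_boundary": "Edits limited to src/payments/; no network calls or migrations",
    "tool_trace": ["Read src/payments/client.py", "Edit src/payments/client.py", "Bash: pytest -q"],
    "tests": {"passed": 41, "failed": 0, "skipped": 2},
    "failures": [],
    "cost_usd": 1.37,
    # "rollback_path" deliberately omitted
}
print(missing_evidence(packet))  # ['rollback_path']
```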

Agent flight recorder diagram showing prompt, files changed, commands, tests, approvals, and rollback notes between a coding agent and deployment gate

Claude Code Agents Need a Flight Recorder

If a Claude Code agent changes production code, the useful artifact is not the chat transcript. It is a flight recorder: intent, boundaries, commands, diffs, tests, approvals, and rollback notes.
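Boundaries only matter if something checks the diff against them before deployment. A minimal gate sketch, assuming the flight recorder is a plain dict with hypothetical keys (`boundaries`, `files_changed`, `tests`, `approvals`, `rollback_notes`) and that boundaries are written as glob patterns. It uses `fnmatch.fnmatchcase` from the Python standard library, where `*` also matches `/`, so `src/payments/*` covers nested files too.

```python
from fnmatch import fnmatchcase
from typing import Any


def out_of_bounds(changed_files: list[str], allowed_globs: list[str]) -> list[str]:
    # A file is in bounds if it matches at least one declared glob.
    return [f for f in changed_files if not any(fnmatchcase(f, g) for g in allowed_globs)]


def gate(run: dict[str, Any]) -> list[str]:
    """Return reasons to block deployment; an empty list means the gate passes."""
    problems = [f"outside boundary: {f}" for f in out_of_bounds(run["files_changed"], run["boundaries"])]
    problems += [f"test failed: {t['name']}" for t in run["tests"] if t["status"] == "failed"]
    if not run["approvals"]:
        problems.append("no human approval")
    if not run.get("rollback_notes"):
        problems.append("no rollback notes")
    return problems


# Hypothetical flight recorder for one run.
run = {
    "intent": "Add retries to the payment client",
    "boundaries": ["src/payments/*", "tests/payments/*"],
    "files_changed": ["src/payments/client.py", "config/prod.yaml"],
    "tests": [{"name": "tests/payments/test_client.py", "status": "passed"}],
    "approvals": [],
    "rollback_notes": "Revert the merge commit",
}
print(gate(run))  # ['outside boundary: config/prod.yaml', 'no human approval']
```

Returning every reason instead of stopping at the first one gives the reviewer the full picture in a single pass.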