Diagram showing metric-only LLM observability versus a replayable production AI trace

LLM observability is not a dashboard. It is a replayable trail.

A latency chart will not explain why an AI answer was wrong. Production LLM systems need traces, sources, tool calls, prompt versions, eval results, and human decisions.

May 10, 2026 · 4 min · 812 words · Thomas De Vos
A Claude Code review packet showing objective, permission boundary, tool trace, tests, cost, and rollback path before human approval

The Claude Code review packet I want before approving agent work

A Claude Code diff is not enough evidence for production review. Ask for the objective, permission boundary, tool trace, tests, failures, cost, and rollback path before approving agent work.

May 4, 2026 · 7 min · 1299 words · Thomas De Vos
Cover of Claude Code: Building Production Agents That Actually Scale by Thomas De Vos

Claude Code book is live on Amazon Kindle

Claude Code: Building Production Agents That Actually Scale is now live on Amazon Kindle. Here is who it is for and why I wrote it.

May 3, 2026 · 2 min · 369 words · Thomas De Vos
Claude Code rollback envelope diagram showing scope, agent change, and rollback path

Claude Code Rollback Plans Belong in the Prompt

If a Claude Code agent can change production-shaped code, the prompt should say how to undo the work. Rollback is not paperwork after the diff. It is part of the task boundary.

May 1, 2026 · 6 min · 1094 words · Thomas De Vos
Claude Code agent cost loop diagram showing vague tasks, broad tools, repeated exploration, and no stop rule

Claude Code Agent Cost Loops Start as Workflow Bugs

Claude Code cost problems usually start before the model call: vague tasks, wide-open tools, repeated repo exploration, and no stop rule. Treat spend as a workflow bug, not just a pricing problem.

April 30, 2026 · 6 min · 1153 words · Thomas De Vos
Claude Code evaluation loop showing capture, reduce, test, and change steps for failed agent runs

Claude Code Evals Should Start With Bad Runs

Production Claude Code evals should not begin with abstract benchmarks. Start with the agent runs that scared you, reduce them into replayable cases, and use them to tune permissions, prompts, tools, and review gates.

April 29, 2026 · 5 min · 1020 words · Thomas De Vos
Claude Code permissions blast-radius diagram showing agent workspace, repo files, CI deploy, secrets, and production data

Claude Code Permissions: The Production Mistake That Bites Later

Claude Code permission modes can look safer than they are. The real production risk lives in tool scope: paths, network access, secrets, deploy files, and what reviewers actually approve.

April 28, 2026 · 5 min · 884 words · Thomas De Vos