Evals on Thomas Talks AI: Claude Code, AI Agents & Production Engineering

https://thomasdevos.com/tags/evals/
Recent content in Evals on Thomas Talks AI: Claude Code, AI Agents & Production Engineering
© Thomas De Vos. Last built Wed, 29 Apr 2026 10:44:00 +0100

Claude Code Evals Should Start With Bad Runs
https://thomasdevos.com/posts/2026-04-29.10.44.00-claude-code-evals-start-with-bad-runs/
Published Wed, 29 Apr 2026 10:44:00 +0100
A practical way to build Claude Code evals from failed or risky agent runs: capture evidence, reduce cases, test behavior, and change permissions before expanding autonomy.