Thomas Talks AI

I write about AI after the demo starts looking convincing: production engineering, secure AI agents, Claude Code, LLM observability, evals, governance, financial-services controls, and the uncomfortable gap between a model release and a system people can trust.

The common thread is controlled autonomy. What can the system see? What can it change? Who reviews it? What evidence remains when something breaks?

Start here

What is an AI agent? A practical definition for production teams.
The model release is not your AI strategy: how to stay calm when the AI news cycle moves faster than your roadmap.
From AI POC to production: the operating model that makes AI survive beyond the demo.
LLM observability: why production AI needs replayable evidence, not only dashboards.
Java vs Python for production AI applications: where experiments end and enterprise systems begin.
AI agents in financial services: controls before autonomy.
Agentic coding in production: the operating model around AI coding agents.
AI coding agents: Claude Code, software agents, permissions, evals, and review loops.

Wider AI notes

Not every useful AI article needs to point straight at a book. Some pieces are here because the topic matters: model releases, vendor churn, AI interfaces, adoption habits, organisational risk, and the way the industry keeps mistaking demos for direction.

I will keep a mix: practical production AI, security and governance, Claude Code field notes, and broader AI commentary when the news cycle exposes something worth saying.

Books

LeanPub

Securing Enterprise AI Agents

AI agent security, bounded autonomy, AgentSecOps, MCP security, RAG governance, and audit evidence.

Book page LeanPub

Kindle + LeanPub

Claude Code: Building Production Agents That Actually Scale

Claude Code in real repos: MCP, permissions, hooks, evals, observability, cost controls, review, and rollback.

Book page Amazon

Build and secure enterprise AI agents in production

What I am trying to answer

Claude Code and other AI coding agents are already useful. The harder question is what happens when they meet real repositories, review habits, permissions, tests, and production risk.

The strongest posts so far:

Latest writing

A Claude Code production workflow where the rollback note is written before the patch

Claude Code needs a rollback note before code

Before Claude Code edits production-adjacent code, ask for the rollback note. If the agent cannot explain how to undo the change, the task contract is not ready yet.

A Claude Code permission workflow where request, grant, and evidence lead to automatic expiry

Claude Code permissions need expiry dates

Claude Code permissions are safest when they are temporary. Treat every extra file, command, MCP tool, and network path as a task-scoped grant that must expire unless a human renews it with evidence.
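
As a rough illustration of a task-scoped grant that expires unless renewed, here is a minimal Python sketch. The `Grant` shape, the `renew` helper, and the field names are assumptions made for this example; they are not part of Claude Code or any real permission API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A task-scoped permission that lapses on its own (hypothetical shape)."""
    task_id: str
    resource: str          # e.g. a file path, command, MCP tool, or network path
    expires_at: datetime
    evidence: str          # why a human approved or renewed the grant

    def is_active(self, now: datetime) -> bool:
        # Expired grants are simply inactive; nothing has to revoke them.
        return now < self.expires_at


def renew(grant: Grant, now: datetime, ttl: timedelta, evidence: str) -> Grant:
    """Return a renewed grant; refuse renewal without fresh evidence."""
    if not evidence.strip():
        raise ValueError("renewal requires evidence from a human reviewer")
    return Grant(grant.task_id, grant.resource, now + ttl, evidence)


# Example: a 30-minute grant for one hypothetical MCP tool.
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = Grant("task-1", "mcp:billing.read", issued + timedelta(minutes=30), "approved in review")
```

The design choice worth noticing is that expiry is the default path: doing nothing removes access, and only an explicit renewal with evidence keeps it.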

A Claude Code human review control loop with task contract, agent run, review packet, and human gate

Claude Code human review is a control, not a vibe check

Claude Code can make a change feel review-ready before the risk is understood. Production teams need human review that can reject the run, narrow the scope, or demand better evidence before merge.

A Claude Code flight recorder diagram with task contract, tool calls, evidence, review, and rollback

Claude Code needs a flight recorder

Claude Code can produce a clean patch from a messy run. Production teams need a flight recorder: the task contract, tool calls, permission pressure, tests, assumptions, and rollback notes that explain how the patch was made.
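
One way to picture a flight recorder is an append-only log kept next to the patch. The sketch below is an assumption about what such a record could look like; `FlightRecorder`, its event kinds, and the JSON Lines output are illustrative names, not a Claude Code feature.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FlightRecorder:
    """Append-only record of one agent run (hypothetical structure)."""
    task_contract: str
    events: list[dict[str, Any]] = field(default_factory=list)

    def record(self, kind: str, **detail: Any) -> None:
        # Sequence numbers keep the order of the run reconstructable.
        # seq and kind are written last so detail keys cannot overwrite them.
        self.events.append({**detail, "seq": len(self.events) + 1, "kind": kind})

    def to_jsonl(self) -> str:
        # One JSON object per line so the run can be replayed or diffed later.
        header = {"seq": 0, "kind": "task_contract", "text": self.task_contract}
        return "\n".join(json.dumps(e, sort_keys=True) for e in [header, *self.events])


# Example run: a tool call, a refused permission request, and a rollback note.
rec = FlightRecorder("Fix flaky retry test in the payments module")
rec.record("tool_call", tool="pytest", args=["tests/test_retry.py"])
rec.record("permission_request", resource="network", granted=False)
rec.record("rollback_note", text="revert the single commit")
```

Keeping the refused permission request in the log is the point: it is exactly the kind of pressure a clean patch would otherwise hide.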

A Claude Code permission boundary diagram showing allowed tools, a closed gate for risky tools, review, and rollback

Claude Code permissions should fail closed

Claude Code permissions are where agent safety becomes concrete. If a run needs production data, billing config, deploy access, or a wider MCP tool, the default should be stop, explain, and wait for a human decision.
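
Failing closed can be sketched as a default-deny check: anything not explicitly allowed for the task stops and waits for a human. The tool names and the `decide` function below are hypothetical and only illustrate the default; they are not how Claude Code itself evaluates permissions.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    STOP_AND_ASK = "stop_and_ask"


# Hypothetical allowlist for one task; anything absent is treated as risky.
ALLOWED_TOOLS = frozenset({"read_file", "run_unit_tests", "git_diff"})


def decide(tool: str, allowed: frozenset[str] = ALLOWED_TOOLS) -> Decision:
    """Fail closed: only explicitly allowed tools run without a human."""
    if tool in allowed:
        return Decision.ALLOW
    # Unknown, misspelled, or newly added tools all land here by default.
    return Decision.STOP_AND_ASK
```

The alternative, a blocklist of known-dangerous tools, fails open: a tool nobody thought to list would run without review.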

A Claude Code review packet diagram with task contract, evidence, boundary pressure, and rollback note

Before you merge Claude Code's work, ask for the receipt

Passing tests are a useful signal, but they are not enough for production Claude Code work. Ask for a review packet that shows scope, evidence, boundary pressure, remaining risk, and rollback before merge.
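
A review packet can be treated as a merge gate alongside the test result. This is a minimal sketch under the assumption that each of the five sections is a short free-text field; the `ReviewPacket` class and the helper names are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewPacket:
    """The five sections named above, as plain text (hypothetical shape)."""
    scope: str
    evidence: str
    boundary_pressure: str
    remaining_risk: str
    rollback: str


def missing_sections(packet: ReviewPacket) -> list[str]:
    # A section that is empty or only whitespace counts as missing.
    return [f.name for f in fields(packet) if not getattr(packet, f.name).strip()]


def ready_for_merge(packet: ReviewPacket, tests_passed: bool) -> bool:
    # Passing tests are necessary but not sufficient; the packet must be complete.
    return tests_passed and not missing_sections(packet)
```

A check like this only proves the sections exist. Whether the evidence is convincing is still the human reviewer's call.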

The model release is not your AI strategy

New models matter. They change what is possible. But a serious AI strategy cannot be rebuilt around every launch. The hard work is deciding what should change in your products, teams, controls, and habits.

Diagram showing a bad Claude Code run becoming a replay case, an eval, a control change, and a safer next run

Claude Code evals should start with the run that scared you

The best Claude Code eval is not a tidy benchmark. It is the uncomfortable run your team does not want to repeat, captured as a replayable production control.

Diagram showing a Claude Code permission budget across scope, tools, spend, and approval

Claude Code needs a permission budget

Before giving Claude Code wider access, define what each run may read, edit, call, spend, and merge. A permission budget keeps agent speed inside a reviewable boundary.
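
To make the idea of a budget concrete, here is a small Python sketch that tracks three of those dimensions (edits, tool calls, and spend) for a single run. The class, its limits, and the `charge` method are assumptions for illustration; read and merge limits would need their own checks.

```python
from dataclasses import dataclass


@dataclass
class PermissionBudget:
    """Per-run limits and running totals (hypothetical structure)."""
    max_files_edited: int
    max_tool_calls: int
    max_spend_usd: float
    files_edited: int = 0
    tool_calls: int = 0
    spend_usd: float = 0.0

    def charge(self, *, files: int = 0, calls: int = 0, usd: float = 0.0) -> bool:
        """Apply the usage if it fits; otherwise change nothing and return False."""
        over = (
            self.files_edited + files > self.max_files_edited
            or self.tool_calls + calls > self.max_tool_calls
            or self.spend_usd + usd > self.max_spend_usd
        )
        if over:
            # The caller should stop the run and ask a human for a wider budget.
            return False
        self.files_edited += files
        self.tool_calls += calls
        self.spend_usd += usd
        return True
```

Rejecting the whole charge when any one limit would be exceeded keeps the totals honest, so the reviewer sees what the run actually used.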

Diagram showing Claude Code MCP blast radius controls with allowed tools, write scope, audit trail, and approval gate

Claude Code MCP tools need a blast radius

MCP tools make Claude Code far more useful, but broad access turns a weak prompt into a production risk. Treat every tool as blast radius, not convenience.
