The phrase “AI agent” has become almost useless unless you ask one extra question:

What is this system allowed to do?

That is where the definition starts to matter.

A chatbot can answer a question. An AI agent can pursue a goal through steps. A production AI agent can read context, choose actions, call tools, use memory, change state, and leave evidence behind. The difference is not only intelligence. The difference is delegated authority.

That is the part most definitions miss.

A practical definition of an AI agent

Here is the definition I use with engineering teams:

An AI agent is a software system that uses a model, instructions, tools, context, and runtime state to pursue a goal with some degree of autonomy.

That sounds dry, but each term carries weight.

  • Model: the reasoning engine, usually an LLM or multimodal model.
  • Instructions: the task, policy, role, boundaries, and expected output.
  • Tools: APIs, shell commands, MCP servers, databases, browsers, code editors, ticketing systems, or business systems.
  • Context: files, prompts, retrieved documents, conversation history, memory, user input, logs, and current environment.
  • Runtime state: the loop, and the working state it keeps between steps, that lets the system plan, act, observe, retry, stop, or ask for approval.
  • Autonomy: the degree to which the system can decide the next step without a human choosing every action.
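
One way to make the definition concrete is to write the six parts down as a data structure. This is a minimal sketch, not a standard schema: `AgentSpec`, its field names, the `"ask_before_write"` autonomy label, and the example tool are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentSpec:
    """Hypothetical container mirroring the six parts of the definition."""
    model: str                                   # reasoning engine (placeholder name)
    instructions: str                            # task, policy, role, boundaries
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)
    max_steps: int = 5                           # runtime state: bound on the loop
    autonomy: str = "ask_before_write"           # illustrative autonomy level

    def authority(self) -> list[str]:
        """The tool names are the visible surface of delegated authority."""
        return sorted(self.tools)


spec = AgentSpec(
    model="example-model",
    instructions="Summarize open tickets; never close them.",
    tools={"read_ticket": lambda ticket_id: f"ticket {ticket_id}"},
)
```

Listing `spec.authority()` is a quick way to see what the system is allowed to do, which is the question the rest of this piece keeps returning to.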

The more authority the system has, the more serious the engineering problem becomes.

A demo agent that drafts a paragraph is one thing. An agent that opens a pull request, queries customer data, sends an email, changes a ticket, calls a payment API, or edits infrastructure is something else entirely.

Why people define agents differently

When I started asking engineers what an AI agent was, the answers were all over the place.

Some people said:

  • “LLM plus memory, planning, tools, and a loop.”
  • “A system that can complete multi-step tasks.”
  • “Something that can make decisions for me.”
  • “A model that controls the application’s flow.”
  • “A software worker with access to tools.”

None of those are wrong. They are just incomplete.

A machine-learning engineer may think in terms of perception, decision, and action. A software engineer may think in terms of loops, APIs, orchestration, and state. A product person may think in terms of task completion. A security architect will ask the better question: what can it touch?

That is why the same word creates confusion. We are not arguing about the model. We are arguing about authority.

The useful spectrum: assistant, workflow, agent

I find this split more useful than trying to draw one perfect boundary.

Assistant

An assistant helps a human think or draft. It may summarize, answer, explain, or suggest. The human still decides what happens next.

Example: a chat tool that explains a policy document.

Workflow

A workflow follows a defined path. It may use AI inside the flow, but the steps are mostly predetermined.

Example: classify an email, extract fields, route it to the right queue, ask a human to approve the response.

Agent

An agent has room to choose steps. It can decide which tool to call, what to inspect, what to try next, and when the task is finished.

Example: a coding agent that reads a repository, proposes a patch, runs tests, reacts to failures, and prepares a review packet.

The hard cases sit between workflow and agent. That is where most real enterprise systems will live for a while: bounded autonomy, not total freedom.
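
The difference between a workflow and an agent shows up clearly in control flow. The sketch below is a toy under stated assumptions: `stub_chooser` stands in for a model, the step names are invented, and `max_steps` is one simple way to express bounded autonomy.

```python
from typing import Callable


def run_workflow() -> list[str]:
    """Workflow: the path is fixed in code; nothing chooses the order."""
    return ["classify", "extract_fields", "route", "await_human_approval"]


def run_agent(choose_next: Callable[[list[str]], str], max_steps: int = 10) -> list[str]:
    """Agent: a chooser picks each step based on what has happened so far."""
    history: list[str] = []
    for _ in range(max_steps):       # bounded autonomy: the loop cannot run forever
        step = choose_next(history)
        if step == "finish":         # the chooser also decides when the task is done
            break
        history.append(step)
    return history


def stub_chooser(history: list[str]) -> str:
    """Stand-in for a model: follows a canned plan, then finishes."""
    plan = ["read_repo", "propose_patch", "run_tests", "fix_failure", "run_tests"]
    return plan[len(history)] if len(history) < len(plan) else "finish"
```

With a real model in place of `stub_chooser`, the step list is no longer known in advance. That is the property that makes the agent end of the spectrum harder to review.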

The part most definitions miss: delegated authority

The moment an agent can act, the question changes from “is the answer good?” to “what damage can the action cause?”

A production team has to ask:

  • What files can it read?
  • What files can it edit?
  • What tools can it call?
  • Can it access secrets?
  • Can it use customer data?
  • Can it trigger CI, deploy, send messages, update tickets, or change records?
  • When does it need human approval?
  • What evidence remains after the run?

That is the real definition in practice.

An AI agent is not dangerous because it has a fancy name. It becomes risky when it has access, incentives, and unclear boundaries.
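
The questions above can be answered in a reviewable artifact rather than in someone's head. Here is a minimal sketch, assuming a default-deny model; `AuthorityPolicy`, its fields, and the tool names are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityPolicy:
    """Hypothetical manifest answering the questions above for one agent."""
    read_paths: tuple[str, ...] = ()
    write_paths: tuple[str, ...] = ()
    allowed_tools: frozenset[str] = frozenset()
    may_access_secrets: bool = False
    may_use_customer_data: bool = False
    needs_approval: frozenset[str] = frozenset()

    def check(self, tool: str) -> str:
        """Return 'deny', 'ask', or 'allow' for a requested tool call."""
        if tool not in self.allowed_tools:
            return "deny"            # default deny: unlisted tools are out of scope
        if tool in self.needs_approval:
            return "ask"             # allowed, but only with a human in the loop
        return "allow"


policy = AuthorityPolicy(
    read_paths=("repo/",),
    write_paths=("repo/src/",),
    allowed_tools=frozenset({"run_tests", "open_pull_request"}),
    needs_approval=frozenset({"open_pull_request"}),
)
```

The point of the frozen dataclass is that the agent cannot widen its own authority at runtime; any change to the policy has to go through whatever review process the team already uses for code.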

Examples of AI agents in the real world

A few examples make the definition less abstract.

Coding agent

A coding agent can inspect a repository, edit files, run commands, and prepare a pull request. Claude Code is the obvious example in my own writing, but the wider pattern matters more than one tool.

The production questions are about permissions, review, tests, rollback, and evidence. If the agent can change the repo, you need more than a clever prompt.

RAG agent

A RAG agent retrieves documents, reasons over them, and may call follow-up tools. In a simple case, it answers questions. In a serious case, it may influence decisions, generate reports, or route work.

The production questions are about source quality, data permissions, retrieval boundaries, citations, and audit trails.
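
Data permissions and citations can be sketched in a few lines. This example is deliberately naive and rests on assumptions: `Doc`, `retrieve_for`, and the group names are invented, and the keyword match stands in for whatever retrieval method a real system uses. The one idea it shows is filtering by permission before anything reaches the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def retrieve_for(user_groups: set[str], query: str, corpus: list[Doc]) -> list[Doc]:
    """Filter by permission first, then match; the model never sees hidden docs."""
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    return [d for d in visible if query.lower() in d.text.lower()]


def citations(docs: list[Doc]) -> list[str]:
    """Source identifiers to attach to the answer for the audit trail."""
    return [d.doc_id for d in docs]


corpus = [
    Doc("policy-001", "Refund policy for standard plans", frozenset({"support"})),
    Doc("fin-007", "Refund reserve forecast", frozenset({"finance"})),
]
hits = retrieve_for({"support"}, "refund", corpus)
```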

Customer-service agent

A customer-service agent may read account data, summarize history, recommend next actions, or draft replies. If it can issue refunds, change plans, or update records, the risk jumps.

The production questions are about identity, policy, approvals, customer harm, and monitoring.

Financial-services agent

A financial-services agent may help with onboarding, fraud investigation, document review, compliance checks, or operational workflows.

The production questions are not optional here. You need segregation of duties, evidence, review gates, model-risk controls, and a clear line between suggestion and action.

What makes an AI agent production-ready?

A production-ready agent is not simply more capable. It is more constrained, more observable, and easier to challenge.

I would look for six things before trusting an agent with serious work.

1. Clear task boundaries

The agent needs a defined job. If the task is vague, the agent will invent scope. That is where small mistakes become expensive.

2. Tool boundaries

Tools need a blast radius. A tool that can read one project folder is different from a tool that can read the whole machine. A tool that can draft a ticket is different from one that can close it.
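
As an illustration of a blast radius, here is a sketch of a read tool confined to one project folder. The function names are hypothetical. It relies on `pathlib.Path.resolve()` to collapse `..` segments and follow symlinks before the containment check, and it is a starting point rather than a complete sandbox.

```python
from pathlib import Path


def is_within(root: Path, candidate: str) -> bool:
    """True only if candidate, once resolved, stays inside root."""
    root = root.resolve()
    target = (root / candidate).resolve()    # an absolute candidate replaces root here
    return target == root or root in target.parents


def read_project_file(root: Path, relative_path: str) -> str:
    """Read a file, refusing anything that escapes the project folder."""
    if not is_within(root, relative_path):
        raise PermissionError(f"{relative_path} is outside the project folder")
    return (root / relative_path).read_text(encoding="utf-8")
```

A tool built this way can still read everything inside the folder, so the choice of `root` is itself an authority decision.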

3. Human approval points

Not every step needs approval. The important steps do. Anything irreversible, expensive, customer-facing, security-sensitive, or production-facing should have a gate.
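
A gate can be as simple as a wrapper that checks an action's tags before running it. In this sketch the `GATED` tag set, the `execute` function, and the `approve` callback are assumptions; in practice the callback might post to a chat channel or a ticket queue and wait for a human.

```python
from typing import Callable

# Hypothetical tags for the kinds of action the team has decided always need a human.
GATED = {"irreversible", "expensive", "customer_facing", "security_sensitive", "production"}


def execute(
    action: Callable[[], str],
    tags: set[str],
    approve: Callable[[set[str]], bool],
) -> str:
    """Run the action directly unless it carries a gated tag; then ask first."""
    gated = tags & GATED
    if gated and not approve(gated):
        return "blocked: approval denied"
    return action()
```

Ungated actions never call `approve`, which keeps the human's attention for the steps that matter.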

4. A run record

The agent should leave a trail: prompt, context, tool calls, files touched, commands run, outputs, skipped checks, errors, and decisions.

If you cannot reconstruct the run, you cannot review it properly.
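
A run record does not need special tooling to start with. The sketch below assumes a single in-memory record serialized to JSON at the end of the run; `RunRecord` and its field names simply follow the list above and are not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical trail for one agent run."""
    prompt: str
    context: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)
    skipped_checks: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)

    def log_tool_call(self, tool: str, args: dict, output: str) -> None:
        """Append one tool call with its arguments and output."""
        self.tool_calls.append({"tool": tool, "args": args, "output": output})

    def to_json(self) -> str:
        """Serialize the whole record so a reviewer can reconstruct the run."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = RunRecord(prompt="Fix the failing test in tests/")
record.log_tool_call("run_tests", {"path": "tests/"}, "1 failed")
record.skipped_checks.append("lint")
```

Recording skipped checks alongside completed ones matters: a reviewer needs to know what did not happen as much as what did.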

5. Stop rules

Agents need a way to stop before they drift. Repeated failures, unexpected files, rising cost, uncertain requirements, or permission conflicts should stop the run and pull a human back in.
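
Stop rules are easiest to review when they live in one function that the loop calls before every step. This is a sketch with illustrative thresholds; `RunState`, `stop_reason`, and the default limits are assumptions, and softer signals such as uncertain requirements would need their own detection, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunState:
    """Hypothetical counters the runtime updates as the agent works."""
    consecutive_failures: int = 0
    cost_usd: float = 0.0
    unexpected_files: int = 0
    permission_conflict: bool = False


def stop_reason(
    state: RunState, max_failures: int = 3, budget_usd: float = 5.0
) -> Optional[str]:
    """Return why the run should stop, or None to continue."""
    if state.permission_conflict:
        return "permission conflict"
    if state.consecutive_failures >= max_failures:
        return "repeated failures"
    if state.unexpected_files > 0:
        return "unexpected files touched"
    if state.cost_usd > budget_usd:
        return "budget exceeded"
    return None
```

Returning a reason instead of a bare boolean means the run record can say why the human was pulled back in.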

6. Review that still means something

Human review cannot become theatre. A reviewer needs the evidence to judge what happened, not just the final answer.

Why this matters now

The industry keeps compressing the distance between idea and action.

A few years ago, AI mostly generated text. Now it can use tools, write code, browse systems, call APIs, generate plans, execute commands, and coordinate workflows. That is useful. It is also a governance problem.

The question is not whether agents will be used. They already are.

The question is whether teams will treat them as production systems before or after the first ugly incident.

A simple test

When someone tells you they are building an AI agent, ask five questions:

  1. What goal is it pursuing?
  2. What can it see?
  3. What can it change?
  4. When must it stop or ask?
  5. What evidence does it leave behind?

If those questions have clear answers, you may have the start of a production system.

If they do not, you probably have a demo with access.

That is where the real work begins.

Where to go next

The common thread is controlled autonomy: useful systems that can act, but only inside boundaries the team can explain, review, and defend.