What Is an AI Agent? — Agentic · Engineering Playbook

Summary

An AI agent is an LLM-powered program that perceives its environment, reasons about a goal, and acts in the world through tools — in a loop, with enough autonomy to handle multi-step work without per-step human direction. The “agentic” part is not any single capability; it’s the loop. A plain LLM call answers a question and stops. An agent reads its environment, decides on an action, takes it, observes the result, and decides again.

What a system built around that loop typically consists of in 2026: a model (the reasoning engine), a tool surface (the verbs it can invoke), memory (short-term context plus an optional long-term store), and an instruction set (the role, rules, and safety constraints). Drop any one of these and the system fails in a different way: a model without tools is a chat; tools without memory cannot compose multi-step work; instructions without memory or tools are just a prompt.

Why it matters

The shift from “LLM as oracle” to “LLM as worker” is what makes agentic systems a different engineering surface than chat. Three consequences follow:

Cost moves from per-call to per-task. A chat session bills per turn; an agent is still billed per model call, but a single multi-step task can take dozens of calls before it is done, so the unit you budget for is the task. Cost engineering becomes a first-class concern.
Failure modes change shape. A chat hallucination produces a wrong answer; an agent hallucination can send the wrong email, edit the wrong file, or call the wrong API. The blast radius is real.
Evaluation gets harder. A chat response can be graded against a reference answer. An agent’s trajectory has many valid paths; success is end-state, not intermediate-state. The eval harness has to grade outcomes, not transcripts.
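To make the per-task cost shape concrete, here is a rough sketch. It assumes each step re-sends the accumulated history as input, which is common but not universal; the function name `task_cost`, the token counts, and the prices are all made up for illustration.

```python
def task_cost(steps, base_prompt_tokens, tokens_added_per_step,
              output_tokens_per_step, usd_per_1k_input, usd_per_1k_output):
    """Estimate the model cost of one agent task.

    Assumption: every step re-sends the full history, so the input grows
    linearly per step and total input tokens grow roughly quadratically.
    """
    total = 0.0
    for step in range(steps):
        input_tokens = base_prompt_tokens + step * tokens_added_per_step
        total += input_tokens / 1000 * usd_per_1k_input
        total += output_tokens_per_step / 1000 * usd_per_1k_output
    return total


# Hypothetical numbers, for illustration only.
single_call = task_cost(1, 2000, 1500, 500, 0.003, 0.015)
twenty_steps = task_cost(20, 2000, 1500, 500, 0.003, 0.015)
print(f"1 step: ${single_call:.4f}, 20 steps: ${twenty_steps:.4f}")
```

Under these assumptions a 20-step task costs far more than 20 single calls, because each later step pays again for the history the earlier steps produced.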

The practical implication is that “build a chatbot” and “build an agent” are different engineering tasks, even if both call the same underlying model. The agent is a system: a loop with state, observability, evaluation, and safety boundaries. The chat is one function call.

How it works

The loop

The canonical loop has four steps, repeated until done:

Perceive. Read the environment. For a coding agent: read files, run a command, search the codebase. For a web agent: take a screenshot, parse the DOM, query an API.
Reason. Decide what to do next. The model consumes the current state, the goal, and the history, and emits a plan or a tool call.
Act. Execute the chosen action — call a tool, run code, send a message, modify a file.
Observe. Read the result of the action. The observation becomes part of the input for the next iteration.

This is the ReAct (Reason+Act) loop in its simplest form. Variants exist — plan-then-execute (decompose first, then act), reflect-and-revise (act, critique, retry), multi-agent (loops within loops with handoffs) — but the underlying shape is the same.
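The four steps can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `scripted_model` is a stand-in for a real LLM call, `read_file` is a fake tool that does no I/O, and every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    action: str      # a tool name, or "finish"
    argument: str


def scripted_model(goal, history):
    """Stand-in for an LLM call: choose the next step from goal and history."""
    if not history:
        return Step("I need the file contents first.", "read_file", "notes.txt")
    return Step("I have what I need.", "finish", f"Summary of: {history[-1][1]}")


TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",  # fake tool, no real I/O
}


def run_agent(goal, model, tools, max_steps=5):
    history = []  # (step, observation) pairs; this is what the model perceives
    for _ in range(max_steps):
        step = model(goal, history)                       # Perceive + Reason
        if step.action == "finish":
            return step.argument, history
        observation = tools[step.action](step.argument)  # Act
        history.append((step, observation))               # Observe
    raise RuntimeError("step budget exhausted before the goal was reached")


answer, trace = run_agent("Summarise notes.txt", scripted_model, TOOLS)
print(answer)
```

The variants named above change what sits inside `model` or how many loops run, but the outer shape stays close to this.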

The four components

An agent’s behaviour is almost entirely determined by four pieces:

Model. The LLM that does the reasoning. Affects capability, latency, cost, and the maximum complexity of tasks the agent can handle.
Tools. The verbs the agent has access to. A tool is a function with a typed input, for example send_email, query_database, or take_screenshot. The tool surface defines what the agent can do in the world.
Memory. The state the agent carries forward. Short-term: the current conversation/loop context. Long-term: persistent stores the agent can read and write — vector indexes, key-value stores, file systems.
Instructions. The role, rules, output format, and safety constraints. The system prompt, the in-context examples, the persona. Instructions shape how the model reasons inside the loop.

Two agents with the same model and different tools, memory, or instructions are functionally different systems. Most novel agent work in the field is reconfiguring these four — not pretraining new models.
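One way to picture the four components is as a plain configuration object. The sketch below is an assumption about how one might structure it, not a specific framework's API; `Tool`, `AgentConfig`, `make_tool_registry`, and the model identifier `example-model` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    run: Callable[[dict], str]  # input is simplified to a dict of arguments


@dataclass
class AgentConfig:
    model: str                  # identifier for the reasoning model
    instructions: str           # system prompt: role, rules, safety constraints
    tools: dict[str, Tool] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)  # short-term context only


def make_tool_registry(*tools):
    return {tool.name: tool for tool in tools}


search = Tool("search_docs", "Search internal docs",
              lambda args: f"results for {args['query']}")
send = Tool("send_email", "Send an email",
            lambda args: f"sent to {args['to']}")

# Same model, different tools and instructions: two different systems.
researcher = AgentConfig("example-model", "You answer questions from docs.",
                         make_tool_registry(search))
assistant = AgentConfig("example-model", "You handle the inbox.",
                        make_tool_registry(search, send))

print(sorted(researcher.tools), sorted(assistant.tools))
```

Here `researcher` cannot send email no matter what the model decides, because the verb is simply absent from its tool surface.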

Autonomy is a dial, not a binary

“Autonomous” covers a spectrum. At one end is human-in-the-loop: every action is confirmed by a human before it executes. At the other is fully autonomous: the agent runs end-to-end, possibly for hours, with no human intervention. Production systems sit somewhere in between, usually closer to the human-in-the-loop end for anything consequential.

Choosing where on the dial to sit is a design decision, not a capability ceiling. The decision shapes the system’s failure cost: more autonomy means faster outcomes and bigger blast radius when something goes wrong.
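The dial can be expressed as an approval policy in front of the Act step. The sketch below assumes three illustrative settings and a hand-maintained set of irreversible actions; the level names and tool names are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    CONFIRM_ALL = 0    # a human approves every action
    CONFIRM_RISKY = 1  # a human approves only irreversible actions
    FULL = 2           # no approval gate


IRREVERSIBLE = {"send_email", "delete_file"}  # hypothetical tool names


def needs_approval(action, level):
    if level == Autonomy.CONFIRM_ALL:
        return True
    if level == Autonomy.CONFIRM_RISKY:
        return action in IRREVERSIBLE
    return False


def execute(action, level, approve, run):
    """Run `action` unless a required approval is refused."""
    if needs_approval(action, level) and not approve(action):
        return f"blocked: {action}"
    return run(action)


deny_all = lambda action: False       # a reviewer who refuses everything
run = lambda action: f"ran: {action}"  # stand-in for real tool execution

print(execute("read_file", Autonomy.CONFIRM_RISKY, deny_all, run))
print(execute("send_email", Autonomy.CONFIRM_RISKY, deny_all, run))
```

With the middle setting, the read goes through and the email is blocked, which is the trade the section describes: less friction on safe actions, a gate where the blast radius is large.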

Variants and trade-offs

Plain LLM call — one input, one output, done. No state, no tools, no loop. Cheap, fast, predictable. Good for: classification, generation, Q&A, transformation. Bad for: anything requiring action, multi-step planning, or environment interaction.

Agent — looped LLM calls with tools, memory, and a goal. State, tools, multi-step. Expensive, variable, harder to evaluate. Good for: tasks that require action, decomposition, recovery, or environment exploration. Bad for: anything where a single call would do.

Within agents, the main axes:

Single-agent vs multi-agent. One model orchestrating tools, vs multiple agents collaborating (sequential, router, swarm, supervisor-worker). Multi-agent helps when the tasks have distinct expertise zones; it hurts when the coordination overhead outweighs the specialisation gain.
Tool-using vs code-executing. Tools are typed function calls; code execution lets the model write and run arbitrary code in a sandbox. Tool-using is safer and more debuggable; code-executing is more flexible and often more capable.
Short-horizon vs long-horizon. Minutes vs hours/days. Long-horizon agents need persistent memory, robust exit conditions, and far better observability than short-horizon ones — the cost of a long-running mistake compounds.
Synchronous vs asynchronous. Run-while-you-watch vs run-in-the-background-and-tell-me-when-done. Async agents change the UX entirely — closer to delegation than to interaction.

A short note on what's *not* an agent

Three things commonly called agents that mostly aren’t: (1) RAG systems — they retrieve and answer, no loop, no action; (2) function-calling LLM endpoints — one call, one tool use, no follow-up; (3) workflow engines with an LLM node — the LLM is one step in a pre-defined pipeline, not the planner. Each of these uses LLMs and tools, but they’re missing the loop-with-state-and-goal that makes an agent an agent. They’re useful — they’re just a different category.

When this is asked in interviews

Increasingly often, as agentic tooling becomes mainstream. The question takes a different shape depending on the interview loop:

AI-product loops. “Walk me through an agent you’ve built.” The interviewer wants the loop, the tool surface, the memory, the eval — not just “it uses GPT-4”. Show that you understand the system.
Senior backend / platform loops where AI is on the roadmap. “What changes when you move from chat to agent?” The expected answer covers cost shape, failure mode, evaluation, observability, blast-radius management.
ML engineering loops. “How would you evaluate an agent?” — end-state grading, trajectory tracing, success rate on a held-out task set, cost per successful task. Anyone who answers “user thumbs-up rate” alone is failing.
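A minimal sketch of what end-state grading with cost per successful task could look like. It assumes each run leaves behind a final environment state and a recorded spend; `RunResult`, `grade`, and the ticket example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    final_state: dict  # what the environment looks like after the run
    cost_usd: float    # total model spend for the run


def grade(runs, checks):
    """Grade each run by its end state and ignore the path it took."""
    passed = [check(run.final_state) for run, check in zip(runs, checks)]
    successes = sum(passed)
    total_cost = sum(run.cost_usd for run in runs)
    return {
        "success_rate": successes / len(runs),
        # Failed runs still cost money, so they count toward the numerator.
        "cost_per_success": total_cost / successes if successes else float("inf"),
    }


# Made-up results for a held-out task set of four tickets.
runs = [
    RunResult({"ticket_status": "closed"}, 0.25),
    RunResult({"ticket_status": "open"}, 0.5),
    RunResult({"ticket_status": "closed"}, 0.25),
    RunResult({"ticket_status": "closed"}, 0.5),
]
ticket_closed = lambda state: state.get("ticket_status") == "closed"
report = grade(runs, [ticket_closed] * len(runs))
print(report)
```

The check looks only at the final state, so two runs that reach a closed ticket by different routes score the same.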

Common follow-ups:

“When wouldn’t you use an agent?” — Anything a single call can answer, anything with a latency budget under ~500 ms, anything where the failure cost is too high to accept.
“How do you keep an agent from running away?” — Step budgets, success predicates, stagnation detectors, human-in-the-loop gates on irreversible actions.
“What’s hardest about making agents production-ready?” — Evaluation, cost predictability, and failure observability. Capability is usually fine; reliability is the constraint.
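Three of the runaway guards mentioned above (step budget, success predicate, stagnation detector) fit in one small loop. This is a sketch over a toy counter environment; `run_guarded` and its parameters are hypothetical, and a human approval gate on irreversible actions would sit alongside it rather than inside it.

```python
def run_guarded(next_action, apply, is_done, state, max_steps=10, stall_limit=3):
    """Run a loop with a step budget, a success predicate, and a stagnation check.

    Returns (reason, steps_taken, final_state).
    """
    unchanged = 0
    for step in range(1, max_steps + 1):
        if is_done(state):                       # success predicate
            return "success", step - 1, state
        new_state = apply(state, next_action(state))
        unchanged = unchanged + 1 if new_state == state else 0
        if unchanged >= stall_limit:             # stagnation detector
            return "stagnated", step, new_state
        state = new_state
    # Step budget reached: report success only if the goal was met on the last step.
    return ("success" if is_done(state) else "budget_exhausted"), max_steps, state


# Toy environment: the state is a counter and the goal is to reach 3.
increment = lambda state: "inc"
noop = lambda state: "noop"
apply = lambda state, action: state + 1 if action == "inc" else state
reached_three = lambda state: state >= 3

print(run_guarded(increment, apply, reached_three, 0))
print(run_guarded(noop, apply, reached_three, 0))
print(run_guarded(increment, apply, lambda s: s >= 100, 0, max_steps=5))
```

Returning the stop reason alongside the state matters for observability: a run that stagnated and a run that ran out of budget usually call for different fixes.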