What Are AI Agents? Concepts, Architecture, and How They Work
An AI agent is a system that perceives its environment, reasons about it, and takes actions to achieve a goal — repeatedly, in a loop.
What Makes Something an “Agent”?
The minimal definition: a model + a loop + tools.
```
while not done: observation → LLM → action → execute → observation
```

Most LLM chat interfaces are not agents — they’re single-turn request/response. An agent persists across turns, maintains state, and uses tools to affect the world.
This distinction matters more than it might seem. A chatbot that answers questions about your database schema is useful. An agent that connects to your database, runs queries, finds anomalies, and sends a Slack notification is doing something fundamentally different — it’s operating autonomously in the world rather than just producing text in response to a question.
The key properties that distinguish an agent from a chatbot are:
Autonomy: The agent decides what actions to take next, not the user. The user sets a goal; the agent figures out how to reach it.
Tool use: Agents can call functions that affect the external world — run code, search the web, read files, post to APIs. Without tools, a model can only produce text.
Persistence: Agents maintain state across multiple steps. Each action produces an observation that becomes context for the next decision.
Loops: Agents run in a cycle until a task is complete or they hit a stopping condition. A single model call is not an agent.
The Agent Loop
Every agent runs some variation of the ReAct loop:
- Observe — receive input (user message, tool result, environment state)
- Think — reason about what to do next (the LLM’s job)
- Act — call a tool, execute code, or produce output
- Observe — receive the result and loop
```python
# Simplified agent loop
while not agent.is_done():
    thought = llm.think(agent.context)
    action = thought.next_action
    result = tools.execute(action)
    agent.context.append(result)
```

The “Reason” in ReAct refers to the model generating a thought before deciding what action to take. This explicit reasoning step — which might look like “I need to find the current price of AAPL. I should search for it.” — is what separates ReAct from blind tool invocation. The model explains its reasoning before acting, which makes the behavior more interpretable and generally more accurate.
Core Architecture
```
┌─────────────────────────────────────────┐
│              Agent System               │
│                                         │
│  ┌─────────┐      ┌──────────────────┐  │
│  │  Input  │─────▶│   LLM (Brain)    │  │
│  └─────────┘      └────────┬─────────┘  │
│                            │            │
│              ┌─────────────▼────────┐   │
│              │   Tool Dispatcher    │   │
│              └──┬───────┬───────┬───┘   │
│                 │       │       │       │
│             ┌───▼──┐ ┌──▼──┐ ┌──▼───┐   │
│             │ Web  │ │Code │ │ API  │   │
│             │ Srch │ │Exec │ │Calls │   │
│             └──────┘ └─────┘ └──────┘   │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │          Memory / State           │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
```

The LLM is the brain — it reads the current context (conversation history, tool results, system instructions) and decides what to do next. It either calls a tool or produces a final answer.
The tool dispatcher receives the model’s tool call (a structured function call with a name and arguments) and routes it to the appropriate handler. The handler executes the function and returns a result.
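A dispatcher can be as simple as a dictionary from tool names to handler functions. The sketch below is illustrative, not tied to any specific SDK; the handler name and error strings are assumptions. Note that errors are returned as strings rather than raised, so they flow back to the model as observations it can recover from.

```python
# Minimal tool dispatcher sketch: routes a structured tool call
# (name + arguments) to a registered handler. Names are illustrative.

def web_search(query: str) -> str:
    # Stub handler; a real one would call a search API.
    return f"results for: {query}"

HANDLERS = {"web_search": web_search}

def dispatch(tool_name: str, arguments: dict) -> str:
    handler = HANDLERS.get(tool_name)
    if handler is None:
        # Returning an error string (instead of raising) lets the
        # model see the failure and try a different action.
        return f"error: unknown tool '{tool_name}'"
    try:
        return handler(**arguments)
    except TypeError as exc:  # wrong or missing arguments
        return f"error: bad arguments for '{tool_name}': {exc}"
```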
Memory and state store everything the agent has observed across its execution: the initial user request, the sequence of tool calls, every result received. This accumulated context is what allows the agent to maintain coherent behavior across many steps.
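At its simplest, this memory is an append-only event list that gets serialized into the model's context on each turn. A minimal sketch, with the class and method names as assumptions:

```python
# Agent memory as an append-only list of observations. The serialized
# history becomes the context for the next model call.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    events: list = field(default_factory=list)

    def record(self, role: str, content: str) -> None:
        self.events.append({"role": role, "content": content})

    def as_context(self) -> str:
        # A real agent would pass structured messages to the model;
        # flattening to text here just keeps the idea visible.
        return "\n".join(f"[{e['role']}] {e['content']}" for e in self.events)

memory = AgentMemory()
memory.record("user", "Find anomalies in yesterday's orders")
memory.record("tool", "query returned 3 outlier rows")
```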
A Minimal Working Agent
Here’s a complete, working agent in about 30 lines:
```python
import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "web_search",
    "description": "Search the web for current information.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"],
    },
}]

def web_search(query: str) -> str:
    # In a real agent, this would call a search API
    return f"Search results for: {query} — [results would appear here]"

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]

    for _ in range(10):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            return next(b.text for b in response.content if hasattr(b, "text"))

        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                result = web_search(**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        if results:
            messages.append({"role": "user", "content": results})

    return "Max turns reached"
```

This demonstrates every essential component of an agent:
- The tools list tells the model what it can do
- The agent loop runs until `stop_reason == "end_turn"`
- Tool calls are detected in `response.content`
- Results feed back into the conversation as `tool_result` messages
- A turn limit prevents infinite loops
What Agents Can and Can’t Do
Agents are powerful for tasks that require multiple steps, need access to external information or systems, or whose approach isn’t fully determined in advance. Research tasks, coding tasks, data analysis, and complex workflows are natural fits.
Agents are not magic. They’re limited by the quality of the underlying model, the tools they have access to, and the quality of their instructions. Common failure modes:
- Tool misuse: Calling the wrong tool, or calling a tool with incorrect arguments
- Context degradation: In very long runs, the model may lose track of earlier context or contradict earlier decisions
- Looping: Getting stuck in a loop if the task is unclear or a tool keeps returning unhelpful results
- Hallucination: Fabricating tool results rather than actually calling the tool (rare with well-designed tool schemas)
Designing agents well means anticipating these failure modes: clear tool descriptions, explicit stopping conditions, turn limits, and human-in-the-loop checkpoints for high-stakes actions.
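Two of those safeguards, a hard turn limit and loop detection, can be checked cheaply before each model call. A minimal sketch, with the function name, signature, and the "three identical calls" heuristic as assumptions:

```python
# Stopping-condition sketch: a hard turn limit plus detection of a
# repeated identical tool call, which often signals a stuck loop.

def should_stop(turn, call_history, max_turns=10):
    """Return a stop reason string, or None to keep going.

    call_history is a list of (tool_name, args) tuples."""
    if turn >= max_turns:
        return "turn limit reached"
    # The same tool called with the same arguments three times in a
    # row is a strong sign the agent is looping.
    if len(call_history) >= 3 and len(set(call_history[-3:])) == 1:
        return "repeated identical tool call"
    return None
```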
Agents in Production
Building an agent that works in a demo is straightforward. Building one that works reliably in production requires thinking carefully about what can go wrong and building safeguards.
Production agents need logging. When an agent makes 30 tool calls and produces an unexpected result, you need to be able to trace exactly what happened: what was in the context at each step, which tool was called with which arguments, and what each tool returned. Without logs, debugging production failures is nearly impossible.
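One step in that direction is emitting a structured record per tool call, which the standard library supports directly. The field names and truncation limit below are assumptions, not a standard:

```python
# Per-step structured logging sketch for an agent run, using only
# the standard library. Field names are illustrative.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def log_step(run_id: str, turn: int, tool: str, args: dict, result: str) -> str:
    record = {
        "run_id": run_id,
        "turn": turn,
        "tool": tool,
        "args": args,
        "result_preview": result[:200],  # truncate large payloads
        "ts": time.time(),
    }
    line = json.dumps(record)
    log.info(line)
    return line
```

Logging one JSON line per step means a failed run can be reconstructed with standard log tooling, grouping by `run_id` and sorting by `turn`.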
Production agents need monitoring. Track success rates, latency, and cost per run. Set up alerts for unusual failure rates or runaway costs. A single agent run that loops indefinitely can become very expensive very quickly.
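A runaway-cost guard can be as simple as accumulating token costs per run and stopping past a budget. A sketch; the class name is illustrative and the default per-million-token prices are placeholders, not real model pricing:

```python
# Per-run cost tracking sketch with a budget check the agent loop
# can consult each turn. Prices are placeholders.
class CostTracker:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               in_price_per_mtok: float = 3.0,
               out_price_per_mtok: float = 15.0) -> None:
        self.spent_usd += (input_tokens * in_price_per_mtok
                           + output_tokens * out_price_per_mtok) / 1_000_000

    def over_budget(self) -> bool:
        return self.spent_usd > self.budget_usd
```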
Production agents need rate limiting. Agents that call external APIs (search, databases, third-party services) should respect rate limits and handle rate limit errors gracefully. An agent that hammers an API when it gets rate-limited will worsen the situation.
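The standard remedy is exponential backoff with jitter around rate-limited calls. A sketch, where `RateLimitError` stands in for whatever exception the actual client raises:

```python
# Exponential backoff sketch for rate-limited tool calls.
import random
import time

class RateLimitError(Exception):
    """Stand-in for a client library's rate limit exception."""

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Doubling the delay each attempt, plus random jitter,
            # avoids synchronized retries hammering the API.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```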
Production agents need human review for high-stakes actions. Sending emails, modifying production data, making purchases, posting to social media — any irreversible action should have a human confirmation step. The cost of pausing for confirmation is low; the cost of an unintended irreversible action can be very high.
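Structurally, this is a gate in front of the tool executor. Which tools count as high-stakes, and how confirmation is obtained (a UI prompt, a chat message), are application decisions; the names below are illustrative:

```python
# Confirmation gate sketch: irreversible actions run only after a
# reviewer approves. Tool names here are illustrative.
HIGH_STAKES_TOOLS = {"send_email", "delete_records", "make_purchase"}

def execute_with_gate(tool_name: str, args: dict, execute, confirm) -> str:
    if tool_name in HIGH_STAKES_TOOLS:
        # confirm() might prompt a human in a UI or chat channel.
        if not confirm(tool_name, args):
            return f"action '{tool_name}' rejected by reviewer"
    return execute(tool_name, args)
```

Returning the rejection as a tool result, rather than raising, lets the agent report back to the user instead of crashing mid-run.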
These concerns don’t apply to experimental or internal-use agents where failures are easy to recover from. But they matter greatly for customer-facing agents or agents that interact with critical systems.
See Also
- Agent Patterns — Common design patterns for agent systems
- Tokens & Context — Managing context windows effectively
- Code Examples — Complete runnable agent code