An AI agent is more than a language model given more autonomy. It is a system where the model acts as a brain coordinating three capabilities that are individually simple but together produce complex behaviour: access to external tools, the ability to maintain context across multiple steps, and chained reasoning to break down problems.
Tools: the agent’s arms
A language model on its own can only process text and produce text. Tools are what give it the ability to act in the world.
Technically, a tool is a function the model can invoke. The model decides when to use it, with what arguments, and receives the result to incorporate into its reasoning. The implementation varies between frameworks, but the pattern is always the same: the model produces structured text indicating which tool to use and with what parameters, the system executes that tool, and returns the result to the model.
Tool categories:
Information tools:
- Web search (Google, Bing, specialised searches)
- Database queries
- Reading files and documents
- Access to data APIs (weather, finance, news)
Computation tools:
- Python/JavaScript code execution
- Calculators and mathematical processing
- Image processing
- Statistical analysis
Action tools:
- Sending emails or messages
- Creating or modifying files
- Web interface interaction (browser automation)
- Calls to third-party APIs (CRM, calendars, Slack)
The design of the tool set is one of the most important architectural decisions when building an agent. More tools is not necessarily better: each additional tool increases the complexity of the agent’s reasoning and the risk of choosing the wrong tool.
How the model uses tools
The mechanism of “tool calling” or “function calling” is the standard interface between the model and the tools. The process is as follows:
- The developer defines the available tools with a schema describing what each one does and what parameters it accepts.
- That description is included in the system prompt.
- When the model decides it needs a tool, it generates a structured response (usually JSON) indicating which tool to invoke and with what arguments.
- The system intercepts that response, executes the tool, and returns the result to the model as part of the conversation.
- The model continues with the result available.
Simplified example of a cycle with tools:
Model: "I need to find the current price of Apple."
→ tool_call: {"name": "web_search", "query": "AAPL stock price today"}
System: Executes the search → result: "AAPL: $189.45 (15:30 EST)"
Model: "The current price of Apple is $189.45.
Now I compare with the price from a month ago..."
→ tool_call: {"name": "web_search", "query": "AAPL stock price May 2026"}
The key is that the model chooses when and how to use each tool based on the objective and context. It is not a predefined script: it is reasoning at each step.
Memory: the three types
Agents need memory to function well on tasks that extend beyond a single exchange. There are three types that work in very different ways:
Working memory (in-context). This is simply the model’s context window: everything that has happened in the current session — the objective, actions taken, results obtained, the model’s reflections. It is the most immediate and powerful, but is limited by the size of the context window.
Episodic memory (scratchpad). For long tasks that may exceed the context window, the agent can write to a “notebook” — an external file or variable — summaries of what it has done and learned, and consult it when it needs earlier context.
Long-term memory. Information that persists across different sessions. Implemented with vector databases (RAG) or traditional databases. The agent can remember that the user prefers a certain report format, or that a specific API has a rate limit per minute.
Orchestrating these three types of memory is one of the most complex technical problems in agent design. Too much information in the context fills it and degrades performance. Too little and the agent loses the thread.
Chained reasoning
The chained reasoning (chain-of-thought) we saw as a prompting technique takes on a different dimension in the context of agents: the model writes its reasoning steps explicitly as part of its working process.
Frameworks like ReAct (Reasoning and Acting) structure this process:
Thought: What do I need to do? The user wants to know if
their company should invest in TikTok advertising.
First I should understand who their audience is.
Action: web_search("TikTok user demographics UK 2025")
Observation: [search results]
Thought: 60% of users are between 18–34 years old.
What is the user's target audience?
I need more information.
Action: ask_user("What age group are your target customers?")
Observation: "25–45 years, mid-to-senior professionals"
Thought: The client's audience has some overlap
with TikTok but is not the core. I will search
for cost and conversion data for that segment.
...
This explicit reasoning has two advantages: it produces better decisions (the model does not jump straight to conclusions) and makes the process auditable (you can review why the agent made each decision).
The reliability problem
Agents are more powerful than chatbots, but also more fragile. Reliability problems are the biggest obstacle to their adoption at scale.
Errors that propagate. If the agent makes an error at step 3 of a 10-step process, steps 4 through 10 are built on that incorrect foundation. The error is amplified.
Infinite loops. An agent can get stuck in a reasoning cycle where each action leads to a situation requiring a similar action, with no real progress.
Wrong tool choice. The agent may use a tool appropriate to the wrong problem, or use the right tool with the wrong parameters.
Ambiguous objective interpretation. A poorly defined objective can lead the agent to complete something different from what the user wanted, in a way that looks correct on the surface.
Solutions to these problems include: well-defined objectives, well-documented tools, human checkpoints at critical steps, step limits to avoid loops, and real-time process monitoring. In the next chapter, we will see how to apply this in real cases.