Beyond the Hype: Defining Agents vs Models
Behavior (orchestration), actions (tools), and decision making (model)
There is a lot of hype surrounding the concept of AI agents, so I was excited when I ran into Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic's whitepaper on Agents. This is the best document I have seen so far that clearly explains the key building blocks of agentic architecture. This post summarizes the key takeaways.
What is an Agent?
An agent is an autonomous system designed to achieve specific objectives by interacting with its environment. Unlike traditional AI models that rely solely on pre-trained data and require a trigger, agents can:
Act independently of human intervention.
Use tools to access real-time information or perform actions.
Plan and execute tasks iteratively to achieve their goals.
The combination of behavior (orchestration), actions (tools), and decision making (model), also described as the cognitive architecture, is the foundational building block of an agent.
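To make the three building blocks concrete, here is a minimal sketch of how they might fit together. The class and the `call_model` / tool callables are hypothetical placeholders, not any particular framework's API.

```python
# Minimal agent skeleton: decision making (model), actions (tools),
# and behavior (orchestration loop with state). Purely illustrative.

class Agent:
    def __init__(self, call_model, tools):
        self.call_model = call_model   # decision making: a language model wrapper
        self.tools = tools             # actions: callables keyed by tool name
        self.history = []              # behavior: state kept by the orchestration layer

    def run(self, goal, max_steps=5):
        # Orchestration: reason, act, observe, and repeat until an answer emerges.
        for _ in range(max_steps):
            decision = self.call_model(goal, self.history)
            if decision["type"] == "answer":
                return decision["text"]
            observation = self.tools[decision["tool"]](**decision["args"])
            self.history.append((decision, observation))
        return "Stopped after reaching the step limit."
```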
Exploring the Key Components of an Agent
Decision Making with Language Models
The Model is the central decision-making engine of the agent, typically a language model (LM). The agent might rely on one or more language models of any size, small or large, capable of following instruction-based reasoning and logic frameworks such as ReAct (Reasoning and Acting), Chain-of-Thought (CoT), or Tree-of-Thoughts (ToT).
The model can be general-purpose or fine-tuned for specific tasks. The more the model is trained for a specific task, the better the outcome.
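As a rough illustration of the model's role as decision maker, the orchestration layer might prompt it to choose between calling a tool and answering directly. The JSON protocol and the `client.generate` call below are assumptions standing in for your model provider's actual API.

```python
import json

def call_model(goal, history, client):
    # Ask the model to pick the next step and reply in a structured format.
    prompt = (
        "You are the reasoning engine of an agent. Reply with JSON only:\n"
        '{"type": "tool", "tool": "<name>", "args": {...}} to call a tool, or\n'
        '{"type": "answer", "text": "<final answer>"} to finish.\n'
        f"Goal: {goal}\n"
        f"Previous steps: {history}\n"
    )
    raw = client.generate(prompt)  # hypothetical text-in/text-out model call
    return json.loads(raw)
```

With a real SDK you would bind `client` and pass something like `lambda goal, history: call_model(goal, history, client)` into the skeleton above.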
Actions Leveraging the Agent Tools
Tools are what enable agents to interact with external data and services. They unlock a wide range of actions beyond those of the underlying model alone.
Key types of tools include:
Extensions: bridges between agents and APIs that allow seamless integration and give the developer control over how the agent interacts with API endpoints. For instance, if you build an agent to book a flight, an extension teaches that agent which API to call and how to call it.
Functions: client-side modules that let developers control API execution and the flow of data in the application as a whole. In other words, they allow the agent to leverage the model to refine the user interaction without directly invoking an external system. For instance, the model might produce a list of cities to travel to, and before booking a flight, the developer wants to present information about each city to the user (a minimal sketch follows this list).
Data Stores: additional data (structured, such as a database, or unstructured, such as files) made available to the model to augment its knowledge, i.e., Retrieval Augmented Generation (RAG). A data store usually relies on a vector database: the additional data is converted into embeddings that the model can readily interpret and use without time-consuming data transformations, model retraining, or fine-tuning.
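Here is a hedged sketch of the Functions pattern from the travel example: the model proposes a call against a declared schema, but the client-side code decides how to execute it. The `list_cities` function and its declaration format are illustrative, not any specific SDK.

```python
# Function declaration the model can reference when proposing a call.
list_cities_declaration = {
    "name": "list_cities",
    "description": "Suggest cities to visit for a given trip theme.",
    "parameters": {
        "type": "object",
        "properties": {
            "theme": {"type": "string", "description": "e.g. 'ski' or 'beach'"},
            "count": {"type": "integer"},
        },
        "required": ["theme"],
    },
}

def list_cities(theme, count=3):
    # Client-side execution: the application controls the real API call,
    # so it can enrich each city with extra information before booking.
    catalog = {"ski": ["Aspen", "Zermatt", "Niseko"],
               "beach": ["Nice", "Maui", "Phuket"]}
    return catalog.get(theme, [])[:count]
```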
One obvious question that arises is the relationship between a data store and an extension. Take the need to leverage HR documents that live in Google Drive. A typical high-level pattern for this type of integration is to leverage an approach similar to the one used in search architectures (sketched after the two steps below):
Leverage a data store to maintain an "index" of the Google Drive content, providing the embeddings your model needs to find the right document.
Use an extension to retrieve the actual file content as required.
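A rough sketch of this pattern, assuming a generic embedding function, a vector store with `add`/`search` methods, and a hypothetical `drive_download` extension for fetching files:

```python
def build_index(documents, embed, vector_store):
    # Data store side: embed a summary of each document and keep a pointer
    # (the Drive file id) alongside the vector.
    for doc in documents:
        vector_store.add(vector=embed(doc["summary"]),
                         metadata={"file_id": doc["file_id"]})

def answer_hr_question(question, embed, vector_store, drive_download, model):
    # 1. Use the vector index to find the most relevant document.
    best_match = vector_store.search(embed(question), top_k=1)[0]
    # 2. Use the extension to fetch the actual file content on demand.
    content = drive_download(best_match["file_id"])
    # 3. Ground the model's answer in the retrieved content (RAG).
    return model(f"Answer using this document:\n{content}\n\nQuestion: {question}")
```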
Managing Behaviors with the Orchestration Layer
The orchestration layer defines how the agent operates and is at the core of the agent's cognitive architecture. Similar to humans, an agent reaches a goal by taking iterative steps: processing information, making informed decisions, and refining the next action based on the previous output. The orchestration layer is responsible for maintaining memory, state, reasoning, and planning, drawing on the rapidly evolving field of prompt engineering and its associated frameworks.
Key frameworks include:
ReAct, a prompt engineering framework that provides a thought process strategy for language models to Reason and take Action on a user query, with or without in-context examples. A simple example:
Question: What is the current temperature in New York City on Tuesday, January 7, 2025, at 4 PM EST?
ReAct will iterate through the following thoughts:
Thought 1: I need to look up the current weather data for New York City.
Action 1: Use the weather API to fetch the temperature for New York City at this time.
Observation 1: The API returns that the temperature is 35°F.
Thought 2: Based on this observation, I now know the current temperature.
Answer: The current temperature in New York City is 35°F.
ReAct prompting shows 10-34% performance gains across diverse tasks. You can learn more about ReAct in its original research publication.
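A hedged sketch of what the loop behind this transcript could look like. Here `model` is assumed to return a parsed step (a thought plus either a tool call or a final answer), and `tools` maps action names, such as a hypothetical weather lookup, to Python callables.

```python
def react_loop(question, model, tools, max_steps=3):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model emits a Thought and either an Action or a final Answer.
        step = model(transcript)
        transcript += f"Thought: {step['thought']}\n"
        if "answer" in step:
            return step["answer"]
        # Execute the chosen tool and feed the Observation back into the prompt.
        observation = tools[step["action"]](**step["args"])
        transcript += (f"Action: {step['action']}({step['args']})\n"
                       f"Observation: {observation}\n")
    return "No answer within the step limit."
```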
Chain-of-Thought (CoT), a prompt engineering framework that focuses on reasoning through intermediate logical steps without interacting with external tools. The model relies solely on its internal knowledge. The question above results in the following steps with CoT:
Question: What is the current temperature in New York City on Tuesday, January 7, 2025, at 4 PM EST?
CoT will guide the agent through these steps:
The temperature in New York City varies depending on the season.
January is winter in New York City, so it is likely cold.
Typical winter temperatures in New York range from 20°F to 40°F.
Answer: Based on this reasoning, the current temperature in New York City might be around 30°F.
You can learn more about CoT in its original research publication.
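For contrast, a minimal CoT sketch needs no tools at all: it only asks the model to reason through intermediate steps before answering. `model` is again a placeholder for any text-in/text-out LM call.

```python
COT_PROMPT = (
    "Answer the question by reasoning step by step, then give a final answer.\n\n"
    "Question: What is the current temperature in New York City on "
    "Tuesday, January 7, 2025, at 4 PM EST?\nReasoning:"
)

def chain_of_thought(model, prompt=COT_PROMPT):
    # No tool calls: the model can only estimate from internal knowledge,
    # which is why its answer is an approximation rather than a live reading.
    return model(prompt)
```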
There are other reasoning techniques, and each is best suited to specific tasks. Comparing ReAct and CoT: ReAct grounds its answer in fresh observations from external tools, while CoT relies solely on the model's internal knowledge, which is why its answer above is an estimate rather than a measured reading.
Agents vs. Models
For technical audiences, it is important to understand the difference between agents and models. While language models are limited to their training data and single-turn predictions, agents extend these capabilities by:
Integrating external tools for real-time knowledge.
Managing session history for multi-turn interactions (a small sketch follows this list).
Iteratively reasoning and planning actions based on evolving contexts.
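A small sketch of the session-history piece, since that is often what distinguishes an agent from a single-turn model call; `call_model` is again a hypothetical LM wrapper.

```python
class Session:
    def __init__(self, call_model):
        self.call_model = call_model
        self.turns = []   # (user, agent) pairs carried across the conversation

    def ask(self, user_message):
        # Replay prior turns so the model sees the evolving context.
        context = "\n".join(f"User: {u}\nAgent: {a}" for u, a in self.turns)
        reply = self.call_model(f"{context}\nUser: {user_message}\nAgent:")
        self.turns.append((user_message, reply))
        return reply
```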
The Future of Agents
As reasoning frameworks evolve and tools become more sophisticated, agents will tackle increasingly complex problems across industries such as healthcare, finance, logistics, and customer service. The concept of “agent chaining”—combining specialized agents into a cohesive system—will further enhance their capabilities.
Building effective agents requires iterative development tailored to specific business needs. By leveraging cognitive architectures, reasoning frameworks, and advanced tools, developers can create impactful solutions that extend the boundaries of what AI can achieve.