Demystifying AI Agents: Frameworks and Comparative Analysis
What Are AI Agents and Tools?
In modern AI systems, an AI agent refers to an autonomous reasoning engine powered by LLMs that can break down problems, make decisions, and perform actions to achieve a goal. Unlike a static chatbot or assistant that only responds with text, an agent actively plans its steps and can use external functions or APIs (often called "tools") to extend its capabilities beyond text generation. These tools might include web search, database queries, code execution, or any custom function – enabling the agent to observe, act upon, and modify its environment.
AI agents typically operate in a loop: they assess a user query, decide on a plan (possibly decomposing complex tasks), invoke tools or other agents as needed, and iterate until they produce a final answer. This autonomy and tool-use give agents a form of "agency" – they don’t just answer questions; they figure out how to answer or accomplish tasks by themselves, within the bounds set by their design and available tools.
Throughout this article, we’ll explore several prominent AI agent frameworks. For each, we’ll examine how they define "agents" and "tools", their internal architecture (state management, planning loop, etc.), how they integrate and call tools, code examples of usage, any available architecture diagrams, and the unique strengths or use cases they support. We’ll then compare these frameworks side-by-side to help you choose the right one for different scenarios.
1. LangGraph (by LangChain)
Agent & Tool Definitions: LangGraph is an advanced agent framework built on LangChain that models agents as nodes in a stateful computation graph. In LangGraph, an agent is essentially a node (or a set of nodes) representing an LLM-powered component that can act in the workflow. A tool in LangGraph is typically a function or action that an agent can execute; LangGraph uses LangChain’s tool-calling interface, meaning you can bind any Python function as a tool for the agent to call. For instance, an external API or a database query can be wrapped as a ToolNode in the graph, allowing the agent to invoke it when needed.
Architecture & State: LangGraph introduces a graph-based architecture for agent orchestration. Workflows are represented as cyclic graphs (not just linear DAGs), enabling loops and iterative decision-making. The key components include:
- Nodes: Each node is a discrete step or sub-agent (an LLM, a function, or another graph). Nodes can be of various types (LLM calls, tool calls, etc.), and developers can create custom node types.
- Edges: Edges define the flow of execution and can include conditional logic. They determine which node runs next based on the current state.
- State: LangGraph emphasizes persistent state and memory. The state is a shared context (structured as a Python object) that accumulates information across the agent’s steps. This could be the conversation history, intermediate results, or any variables the graph needs to remember. The framework provides a Checkpointer to store state at each step and a Store for long-term data across sessions.
- Cyclical Execution: Unlike one-shot agents, LangGraph allows agents to loop back to earlier nodes (forming cycles) if needed. This is useful for refining answers or iterative tasks.
Planning & Execution Loop: In LangGraph, planning can be explicitly designed via the graph structure. A typical agent loop (for a tool-using agent) involves the LLM node deciding on an action (potentially calling a tool), then tool nodes executing and updating the state, and the loop continuing until a certain end condition is met. You could implement classic agent strategies (like ReAct) as graph patterns: e.g., an LLM node that either produces a final answer or a tool invocation, edges that route to the appropriate tool node, and then back to the LLM node with the observation. LangGraph’s advantage is that you’re not confined to a single loop – you can branch, merge, or cycle as needed, giving fine-grained control over the agent’s cognitive architecture.
Tool Integration: Tools in LangGraph are usually LangChain tools bound into the graph. Using LangChain’s API, you can attach tools to an LLM by providing function schemas. For example, LangChain’s ChatModel.bind_tools() can wrap Python functions so that the LLM can call them with structured arguments. LangGraph inherits this capability, but places it in a graph context. A ToolNode can be added to your graph to represent a specific tool call. During execution, if the LLM’s output indicates a tool should be used, the graph’s control flow will direct to that ToolNode. The tool’s output is then captured into state and fed back into the LLM node on the next cycle.
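For instance, here is a minimal sketch of binding a Python function as a tool on a LangChain chat model (assuming the langchain-openai package and a recent langchain-core release that exposes the @tool decorator and the tool_calls attribute used below):

```python
# A minimal sketch, assuming langchain-core and langchain-openai are installed
# and OPENAI_API_KEY is set in the environment.
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([add])

# The model can now emit a structured call to `add`, visible on the reply's tool_calls.
reply = llm_with_tools.invoke("What is 2 + 3?")
print(reply.tool_calls)
```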
Defining a Simple Agent: Below is a simplified example of using LangGraph to set up a basic agent that can use a math tool. This illustrates how you might define nodes, wire up edges, and run the graph:
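(A minimal sketch, assuming recent langgraph, langchain-core, and langchain-openai releases; exact imports and helper names vary between versions.)

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# The LLM node: a chat model with the add tool bound, so it can emit structured tool calls.
llm = ChatOpenAI(model="gpt-4o").bind_tools([add])

def call_model(state: MessagesState):
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("llm", call_model)
builder.add_node("tools", ToolNode([add]))              # executes any tool calls the LLM emits
builder.add_edge(START, "llm")
builder.add_conditional_edges("llm", tools_condition)   # route to the tool node or finish
builder.add_edge("tools", "llm")                        # loop the observation back to the LLM

graph = builder.compile()
result = graph.invoke({"messages": [("user", "What is 21 + 21?")]})
print(result["messages"][-1].content)
```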
The LLM node uses a chat model (like GPT-4 via LangChain) with the add tool bound to it, so it can emit structured tool calls; the prebuilt ToolNode executes those calls and the results flow back into the shared message state on the next cycle. (LangGraph’s actual API may differ slightly between versions, but conceptually this is how an agent can be constructed.)
Workflow Diagram: LangGraph’s execution can be visualized as a directed graph of nodes and edges. Each node (agent step) processes input and produces output that flows along edges to the next step. It even supports multi-agent graphs – you can have multiple LLM agents (nodes) within one graph coordinating via the shared state. For example, you might design a graph with a "Researcher" agent node and a "Writer" agent node that pass tasks back and forth (forming a cycle) until a report is complete. This flexibility is a core strength: LangGraph is ideal for complex, custom workflows where you need full control over the agent’s decision process and state.
Use Cases & Strengths: LangGraph shines in scenarios requiring reliable, production-grade agents with custom logic. It was created to overcome limitations of simpler agent loops. Unique strengths include:
- Resilience and Debuggability: The explicit graph structure makes it easier to inspect and test each part of the agent’s reasoning. LangGraph integrates with LangSmith (LangChain’s observability tool) for tracing and debugging agent runs.
- Stateful Iteration: The persistent state and cyclical loops enable agents that can reflect on past actions (supporting patterns like self-reflection and iterative refinement).
- Multi-Agent Coordination: LangGraph natively supports multi-agent workflows, letting multiple agents operate in one graph (with possible parallelism or sequential coordination).
- Human-in-the-Loop: It allows insertion of human check-points or approvals within the graph (e.g., a node that halts for human decision). This is crucial for sensitive applications where full autonomy isn’t desirable.
In summary, LangGraph provides a highly customizable orchestration layer atop LangChain. It’s open-source (MIT licensed). If you need to build a bespoke agentic system – say a workflow with conditionals, memory, multiple LLMs, and rigorous control – LangGraph offers the building blocks to do so.
2. crewAI (Open-Source Multi-Agent Orchestration)
Agent & Tool Definitions: crewAI is an open-source framework focused on orchestrating multiple AI agents collaborating as a team. In crewAI’s terminology, an agent represents an autonomous entity with a specific role, goal, and (optionally) backstory. You can think of each agent as a specialized expert (e.g., a "Researcher", "Data Scientist", etc.), and together they form a crew to tackle tasks collaboratively. A tool in crewAI is defined as a skill or function an agent can use to perform tasks beyond just reasoning. crewAI supports integration with LangChain’s tools as well as its own toolkit of custom tools. For example, crewAI provides ready-made search tools (like JSONSearchTool, GithubSearchTool, YouTubeChannelSearchTool) to enable information retrieval using RAG techniques. Agents can also use LangChain’s tools (shell, Python REPL, etc.) or any custom tool you define by providing a function with a description.
Architecture & Components: crewAI introduces a modular architecture with five main components:
- Agents: The fundamental actors, each with distinct roles/goals as mentioned. They are configured with an LLM (OpenAI’s GPT-4 by default, or any other via LangChain integration), a persona (role/backstory), and optional tool access. Agents can communicate with each other by exchanging messages (delegating work or asking questions) during the workflow.
- Tools: Extend agent capabilities as described. Tools in crewAI come with built-in error handling and caching, to ensure robustness in agent actions.
- Tasks: A Task represents a unit of work or an assignment that one or more agents need to complete. Tasks have attributes like a description, an expected output, and an assigned agent. They can also specify tool usage or run asynchronously for parallelism. Tasks essentially break down a project into manageable pieces.
- Processes: A Process defines how tasks are executed by the agents – effectively the collaboration strategy or workflow pattern. crewAI provides two implemented process types. Sequential Process: agents execute tasks in a predefined order, one after another (like an assembly line), with the output of one task feeding into the next. Hierarchical Process: crewAI automatically generates a manager agent that oversees the others in a manager-worker hierarchy; the manager assigns tasks to appropriate agents, checks their outputs, and coordinates completion – mimicking a project manager coordinating a team. (A Consensual Process, in which agents decide democratically, is planned for the future but not yet implemented.)
- Crews: A Crew is the group of agents plus the assigned tasks and the chosen process – essentially the overall multi-agent orchestration unit. When you define a Crew, you specify which agents are in the team, what tasks they have, and what process will govern them. Starting the crew will then trigger the process (sequential or hierarchical) and run through the tasks with the agents collaborating.
Planning & Execution: In crewAI, the planning largely depends on the chosen process. In a sequential setup, "planning" is just following the fixed task list order (though each agent still independently decides how to accomplish its task, possibly using tools). In the hierarchical setup, the manager agent does dynamic planning – it might decide task order, reassign tasks if needed, or spawn new subtasks based on results. Agents communicate via crewAI’s built-in mechanisms: for instance, an agent can delegate subtasks to others by messaging (if allowed by the process). This inter-agent communication and delegation is a core feature: it’s designed to emulate a human team brainstorming and cooperating to solve a complex problem.
The execution loop in crewAI thus can be more complex than a single-agent loop. For example, in a hierarchical process:
- The manager agent (automatically created) receives the overall goal.
- It plans sub-tasks and assigns them to specialist agents in the crew.
- Each agent works on its task (potentially using tools), and returns the result.
- The manager reviews results, maybe asks another agent to verify or improve something (e.g., it could delegate a "Critic" agent to evaluate outputs).
- This continues until all tasks are completed to the manager’s satisfaction, then a final output is compiled.
Throughout, crewAI maintains a shared conversation context so that all agents remain aware of relevant information from each other’s progress. It also supports connecting to external monitoring (e.g., logging, evaluation tools) for observability of multi-agent runs.
Tool Invocation: Agents in crewAI use tools by calling functions defined in the toolkit or added by the developer. For example, an agent might call JSONSearchTool or a PythonREPLTool if its prompt decides to (crewAI likely wraps these so that the LLM knows the tool names and usage). Thanks to integration with LangChain, adding a LangChain tool is straightforward – you simply include it in the agent’s tool list and the agent’s LLM prompt will have that capability. The developers are responsible for describing what each tool does (tool description), so the agent can choose appropriately based on its instructions. All tool calls in crewAI are executed with error handling (to catch exceptions) and optional caching (to reuse results of expensive operations), enhancing reliability.
Creating Agents and a Crew: Below is an example of using crewAI’s Python API:
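(A minimal sketch, assuming the crewai package and an OpenAI API key in the environment; defaults such as the underlying model can be overridden per agent.)

```python
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Gather key facts about open-source AI agent frameworks",
    backstory="A meticulous analyst who always relies on reputable sources.",
)

writer = Agent(
    role="Writer",
    goal="Turn research notes into a short, readable summary",
    backstory="A concise technical writer.",
)

research_task = Task(
    description="Research the main open-source AI agent frameworks and their features.",
    expected_output="A bullet list of frameworks with one-line descriptions.",
    agent=researcher,
)

writing_task = Task(
    description="Write a ~200-word summary based on the research notes.",
    expected_output="A polished summary paragraph.",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,   # run the tasks one after another
)

print(crew.kickoff())
```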
In this snippet, we set up two agents and assign them tasks, then run them in sequence. In a hierarchical process, we would omit direct agent assignment on tasks and let the manager decide which agent handles each task at runtime.
Use Cases & Strengths: crewAI is purpose-built for autonomous multi-agent teams. Its strengths include:
- Role-Playing Agents: It encourages designing agents with distinct personalities and specialties (e.g., a "Critic", "Planner", "Coder" working together). This can enhance reasoning via inter-agent debate or complementary skills.
- Structured Collaboration: By introducing tasks and processes, crewAI gives more structure to multi-agent collaboration compared to free-form agent chats. This makes complex workflows (like a team of agents building a product or handling a customer query pipeline) easier to manage.
- Modularity: Each component (agent, tool, task, process) can be customized or extended. You can plug in different LLMs per agent, add new tools easily, and choose a collaboration style (seq/hierarchical) that suits the problem.
- Real-World Examples: The framework comes with community examples (in the crewAI-examples repo) like email drafting automation (using LangGraph + crewAI) or stock analysis by multiple agents. This shows its applicability from content creation to data analysis.
- Comparison to Others: crewAI combines ideas from existing frameworks – the conversational flexibility of Microsoft’s AutoGen and the structured processes of frameworks like ChatDev. It tries to give the best of both worlds.
In summary, if you need a team of LLMs working in concert – where each has a clear role – crewAI provides a ready-to-use structure. It’s well-suited for complex tasks like research assistants with multiple expert personas, automated business processes with different AI roles, or educational scenarios where agents "teach" or challenge each other.
3. OpenAI Function Calling & Agents SDK (Assistants API)
Agent & Tool Concepts: OpenAI’s API itself can facilitate agent-like behavior through function calling. In this paradigm, the LLM (ChatGPT model) effectively acts as the agent, and functions defined by the developer serve as the tools. When using OpenAI’s function calling, you provide the model with a list of functions it can call (with JSON-schema specifications of their arguments). The model’s responses can then include a function_call entry indicating which function to use and with what arguments, instead of a final answer. In other words, the LLM internally decides if/when to invoke a tool based on the conversation and tool descriptions, making it a self-directed agent that uses tools. For example, you might define a get_weather function; the user asks, "Do I need an umbrella today in NYC?", and the model chooses to call get_weather with {"location": "New York City"}. You execute that function (say, fetch weather data) and feed the result back to the model, which then produces a final answer.
OpenAI also offers the Agents SDK (alongside the beta "Assistants API"), a higher-level framework for building these function-calling agents more easily. In the Agents SDK, an Agent is explicitly defined as an LLM with a given persona (system instructions) and a set of tools. The SDK introduces a few core abstractions:
- Agent: encapsulates an LLM plus optional tools and instructions. For example, you can create an Agent(name="Assistant", tools=[WebSearchTool(), FileSearchTool()]) to get an agent that can do web and vector DB searches.
- Tools: The SDK recognizes three types of tools. Hosted Tools: pre-built tools that run on OpenAI’s server side (like a web search or code execution tool that OpenAI provides out-of-the-box). For instance, WebSearchTool and FileSearchTool are provided when using OpenAI’s ResponsesModel. Function Tools: any Python function you provide can be a tool. The SDK will automatically generate the JSON schema from the function signature and docstring (using Python’s introspection and Pydantic). This greatly simplifies adding custom tools. Agents as Tools: even an entire agent can be used as a tool by another agent, allowing hierarchical or multi-agent setups (one agent can delegate to another).
- Runner (Agent Loop): The Agents SDK provides a Runner utility that handles the message-passing loop with the LLM. Calling Runner.run(agent, user_input) (or its synchronous variant) will internally call the LLM, detect function calls, execute them, feed results back, and loop until completion. This built-in agent loop saves you from manually writing the while-loop logic.
Architecture & Execution: With just OpenAI’s raw API, the "architecture" lives partly in the model’s prompt (the model decides when to call a function) and partly in your code (you must catch the function_call response and handle it). The OpenAI Agents SDK formalizes this. It essentially implements an agent loop where:
1. The user message (and conversation history) plus available tools are given to the LLM.
2. The LLM either responds with an answer or a function call.
3. If a function call is returned, the SDK executes the function (the "tool") and captures its output.
4. The output is sent back to the LLM as context (as a function/tool result message in the conversation).
5. Steps 1–4 repeat until the LLM returns a final answer, which is then output as the agent’s response.
The planning is implicit – the LLM’s own reasoning (influenced by its training and the system prompt you set) decides which tool to use and in what sequence. There’s no explicit planner code; instead, you can encourage good planning by writing clear system instructions or using OpenAI’s function definitions effectively. The Agents SDK, however, gives you hooks to customize behavior if needed (like adding guardrails/validation on tool inputs, or using Handoffs which let an agent defer to another agent for specific queries).
Basic Usage: Using the raw API vs. the SDK looks a bit different. Here’s a quick example with the raw OpenAI Python API:
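(A minimal sketch using the openai Python package's v1.x tools API; older API versions use the functions/function_call fields instead.)

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(location: str) -> str:
    # Placeholder implementation – call a real weather API here.
    return f"It is raining in {location}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "Do I need an umbrella today in NYC?"}]

while True:
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = response.choices[0].message
    if not msg.tool_calls:          # no tool requested -> this is the final answer
        print(msg.content)
        break
    messages.append(msg)            # keep the assistant's tool-call turn in context
    for call in msg.tool_calls:     # execute each requested tool and return its result
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```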
The above demonstrates the low-level loop. Now, using the OpenAI Agents SDK in Python (which abstracts much of this):
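(A minimal sketch, assuming the openai-agents package; class and attribute names may shift between releases.)

```python
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(location: str) -> str:
    """Get the current weather for a city."""
    # Placeholder implementation – call a real weather API here.
    return f"It is raining in {location}."

agent = Agent(
    name="Assistant",
    instructions="Answer questions, using tools when helpful.",
    tools=[get_weather],
)

# The Runner handles the full agent loop: call the model, execute tool calls, repeat.
result = Runner.run_sync(agent, "Do I need an umbrella today in NYC?")
print(result.final_output)
```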
This high-level approach takes care of parsing the function call and looping until the final output (accessible via the run result’s final_output). The OpenAI SDK also supports async operation, streaming responses, etc., with minimal configuration.
The SDK’s Guardrails allow parallel validation of inputs (to stop the loop if something is off), and Handoffs allow delegating to sub-agents. In effect, you can construct multi-agent systems by making one agent’s tool be another agent. For example, an "Orchestrator Agent" could have a tool that triggers a "Database Agent" to handle a sub-query. This is analogous to the SupervisorAgent concept in other frameworks.
Use Cases & Strengths: The OpenAI function calling approach (with or without the SDK) excels at simplicity and reliability:
- Minimal Setup: If you’re already using OpenAI’s models, adding function calling requires just defining your functions’ schema and letting the model handle the rest. The cognitive burden is mostly on the model.
- Wide Support: Function calling is supported by GPT models on OpenAI (and similarly by some Anthropic models and others). It’s a standardized way to do tool use, so many ecosystem libraries (LangChain, LlamaIndex, AutoGen, etc.) support it or build atop it.
- Production-Readiness: OpenAI’s Agents SDK is production-focused. It’s designed to be lightweight with few abstractions, making it easier to debug and trust. The built-in tracing and upcoming integration with evaluation and fine-tuning tools in the OpenAI ecosystem are bonuses.
- Limitations: The flipside is less flexibility – the planning is entirely implicit in the LLM’s response. If the model decides not to call a tool when it should, or calls them in wrong order, you have to handle that via prompt engineering or additional checks. There is no explicit planner module you can tweak (beyond writing good system messages or adding Guardrails to prevent certain actions). However, for many applications (e.g., answering questions with a couple of API calls, building a code assistant that uses a Python tool, etc.), this approach is robust and straightforward.
In conclusion, OpenAI’s function calling and Agents SDK are a powerful way to get agent behaviors quickly – especially if you need an AI to call APIs or perform actions as part of a conversation. It’s a great fit for use cases like ChatGPT Plugins-style integrations, personal assistants that manage your calendar/email via functions, or any scenario where you want an LLM to decide when to use a tool. For developers in the OpenAI ecosystem, the Agents SDK also provides guardrails and multi-agent capabilities out-of-the-box, making it a compelling option for building complex assistants with minimal overhead.
4. AutoGen (Microsoft’s Multi-Agent Framework)
Agent & Tool Definitions: AutoGen is an open-source framework for building applications where multiple agents converse and collaborate to solve tasks. In AutoGen, an agent is a conversable entity that can send and receive messages (often powered by an LLM, but could also represent a human or a code executor). AutoGen provides built-in agent classes:
- ConversableAgent: a base class that implements the messaging interface – this is what allows agents to chat with each other asynchronously.
- AssistantAgent: a subclass of ConversableAgent designed to be an AI assistant (backed by an LLM) that can autonomously generate responses, including writing code if needed.
- UserProxyAgent: another subclass that acts as a proxy for a human user – it waits for human input by default, but can also execute code or call tools on behalf of the user. Essentially, this agent can either relay a human message or, if no human input is needed, it can itself use an LLM or code to reply (making it a semi-autonomous agent representing the user’s side).
- GroupChatManager / RoundRobinChat: utilities to manage multi-agent conversations (who speaks when, termination conditions, etc.) so that agents take orderly turns in complex interactions.
A tool in AutoGen is typically implemented via code execution or function calls within an agent’s message. AutoGen emphasizes that agents can either call tools by generating code (e.g., the AssistantAgent can output a Python code block which the framework will execute and return the result), or by using OpenAI’s function calling through a special GPTAssistantAgent (more on this below). AutoGen’s philosophy is to allow agents to be "conversable, customizable, and integrative" – meaning they can integrate LLMs, tools, and even human input in one conversation loop.
In newer versions, AutoGen introduced GPTAssistantAgent which leverages OpenAI’s Assistant API (function calling, Code Interpreter, etc.) inside AutoGen. This agent can use multiple OpenAI tools like the Code Interpreter and file search in a single conversation, effectively combining various capabilities in one agent. So AutoGen supports both agents that use tools by writing code and agents that use tools via function calling APIs.
Architecture & Workflow: AutoGen is structured around multi-agent conversation as the core abstraction. Key architectural points:
- Message Passing: Agents communicate by sending messages to each other. The framework handles delivering a message from one agent to the others in a defined order or pattern (like round-robin or a directed sequence). Each agent, upon receiving a message, decides how to act on it – e.g., an AssistantAgent might generate a reply using an LLM, or a UserProxyAgent might pause for human input.
- Conversations & Teams: You typically instantiate multiple agents and then start a conversation session among them. For example, a common pattern is a two-agent loop: an AssistantAgent and a UserProxyAgent exchange messages until a task is done. But you can also have group chats with more agents. AutoGen provides classes like RoundRobinGroupChat to facilitate a team of N agents working together, taking turns speaking.
- Task Completion: The conversation can be set to terminate when a certain condition is met (like one agent says "DONE" or a specific message is produced). AutoGen allows for termination conditions to be specified (e.g., the TextMentionTermination in the web browsing example triggers when an agent sees "exit" in input).
- Layers: Under the hood, AutoGen has a layered design: The Core handles generic message passing and event loops (supporting both local and distributed settings, and even cross-language interactions like .NET integration). The AgentChat API is a simpler interface on top for common multi-agent patterns – this is what most users interact with (e.g., creating AssistantAgent, etc.). The Extensions API covers things like integrating specific model backends (OpenAI, Azure OpenAI, HuggingFace, etc.) and adding capabilities like code execution or retrieval augmentation.
- Memory: Each agent can have its own memory of the conversation (scratchpad). In collaborative modes, they might share a global scratchpad or at least see each other’s messages. In some AutoGen modes, agents share all messages (collaboration mode), whereas in others they might only see summarized results from each other (supervisor mode). The memory and message visibility can be configured depending on whether you want agents to have full context or operate more independently.
Agents communicate via an event-driven chat system. The ConversableAgent base defines how agents exchange messages. Specialized agents like the AssistantAgent and UserProxyAgent inherit this, each with different behaviors (e.g., the AssistantAgent uses an LLM to auto-reply, while the UserProxyAgent may wait for human input or execute code). A GroupChatManager or similar orchestrator coordinates the turn-taking among agents in a multi-agent conversation.
Tool Use and Function Execution: AutoGen provides two main ways for agents to perform actions:
- Code Executors: Agents (especially AssistantAgent) can output a code block (in a specified syntax) as part of their message. AutoGen can catch that and execute it. For instance, if an agent replies with a fenced Python code block, the framework’s code executor will run that code and feed the output back into the conversation. This effectively allows the agent to write its own tool on the fly (limited by what the environment provides). It’s powerful for tasks like math calculations, calling external APIs (with appropriate libraries), etc. AutoGen’s tutorial on Code Executors shows how giving the agent the ability to run Python can significantly enhance its problem-solving scope.
- Pre-Defined Tools (Function Calls): If you want more control than arbitrary code, AutoGen supports defining tools as functions similar to OpenAI’s function calling. The "Tool Use" tutorial in AutoGen demonstrates how to create tools that an agent can call instead of writing code. Tools in AutoGen are essentially "safe" functions the agent can invoke when needed. The framework likely integrates with OpenAI’s function calling API under the hood for LLMs that support it. For example, you could provide a search_web(query) tool to the agent, and the agent can decide to call search_web instead of trying to scrape via code. The GPTAssistantAgent mentioned earlier goes further, bringing OpenAI’s native toolset (file uploads, sandboxed code execution, etc.) into AutoGen’s multi-agent world, which can be very powerful (e.g., one can imagine such an agent using the Code Interpreter to analyze data and then messaging a UserProxyAgent with results).
Multi-Agent Conversation: A canonical AutoGen example is the "Assistant & UserProxy" loop, where the assistant solves a task and the user agent can provide feedback or additional prompts.
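A minimal sketch of that loop, assuming the pyautogen package (AutoGen 0.2-style API); the newer AgentChat API uses different class names:

```python
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_OPENAI_API_KEY"}]}

# The assistant generates replies (and code) with an LLM.
assistant = AssistantAgent("assistant", llm_config=llm_config)

# The user proxy stands in for the human: here it runs fully automated and
# executes any code blocks the assistant writes, feeding results back.
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Compute the first 10 Fibonacci numbers and print them.",
)
```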
In a more complex scenario, you might have multiple AssistantAgents each with their own specialty (one could be a "CoderAgent" that only writes code, another a "ReviewerAgent" that checks the code). They could message each other or work in a round-robin facilitated by a GroupChat or by sending direct messages via agent.send(message, to=other_agent) in AutoGen’s API.
Use Cases & Strengths: AutoGen is particularly well-suited for:
- Complex Problem Solving via Debate/Collaboration: It allows multiple agents to bounce ideas off each other. Microsoft’s research has used AutoGen for scenarios like coding assistants that verify each other’s output, or agents that role-play interviewer/interviewee to refine answers.
- Automating Multi-Step Workflows: Because agents can call tools and write code, AutoGen can automate things like browsing the web for information, extracting data, running analysis, and compiling a report – all through an agent conversation. It’s like orchestrating an entire pipeline with AI where each step is done by a specialist agent.
- Human-in-the-Loop Hybrid: The UserProxyAgent makes it easy to insert human feedback at any point. You can have a human oversee a multi-agent discussion and step in when needed (for example, to approve a final action or provide additional context the AI lacks).
- Open-Ended Research: The framework’s emphasis on conversation means it’s good for exploratory tasks. Agents can ask each other questions. E.g., a "Researcher" agent could ask a "Calculator" agent for a computation, or a "Planner" agent could query a "KnowledgeBase" agent for facts.
AutoGen’s design aims to maximize LLM performance and overcome single-agent limitations by ensembling. By having agents reflect or critique each other, you mitigate errors. And by integrating tools and code execution, you cover the weaknesses of plain LLMs (like math and real-time data). It’s a fairly mature project with an active ecosystem:
- AutoGen Studio: A no-code GUI to design and run multi-agent workflows.
- AutoGen Bench: A benchmarking suite for evaluating multi-agent performance.
- Cross-language Support: Notably, there’s a .NET version, so it’s not just Python-only.
- Use with Various Models: It’s not tied to OpenAI. You can use local models (via HuggingFace or Ollama), Azure OpenAI, etc.
In essence, Microsoft’s AutoGen is a general platform for multi-agent conversations – think of it as creating a team of AI agents that chat with each other (and optionally with humans and tools) to solve problems. If your use case requires sophisticated reasoning that might benefit from multiple perspectives or a chain-of-thought that’s too complex for one agent, AutoGen is a strong candidate. It does require careful design of prompts for each agent and possibly more compute (multiple LLM calls), but it can yield impressive results on hard tasks by leveraging the power of collaboration.
5. LlamaIndex Agents & Workflows
Agent & Tool Definitions: LlamaIndex, known for its document indexing and retrieval capabilities, also offers an agent framework to create data-centric AI agents. In LlamaIndex terms, an agent is an automated decision engine that can use an LLM, memory, and a suite of tools to handle an input query. The design is similar to LangChain-style agents: the agent parses the user’s request, plans a series of actions (like breaking down sub-questions or choosing which tool to call), executes those actions, and finally produces a result. A tool in LlamaIndex is typically a function that can either interact with data (e.g., query a vector index or database) or perform some operation (like a calculator, API call, etc.). LlamaIndex provides many built-in tools, especially for data retrieval:
- QueryEngineTool: Wraps any LlamaIndex Query Engine (which could query a corpus or vector store) as a tool.
- FunctionTool: Wraps arbitrary Python functions into tools (similar to OpenAI or LangChain function tools).
- ToolSpecs: LlamaIndex has the concept of ToolSpecs which are collections of related tools (often corresponding to an external service integration). For example, a GmailToolSpec might provide send_email, read_email, etc., and you can install it via pip install llama-index-tools-google and load those tools easily.
- Utility Tools: It also includes "utility tools" for common patterns like caching results from an API call into a temporary index (to avoid context overload) – e.g., an OnDemandLoaderTool that can fetch data (like from Wikipedia) and automatically index + query it on the fly.
Architecture & Planning: LlamaIndex offers two levels of working with agents:
- Prebuilt Agents: These are higher-level classes (like FunctionAgent) that already implement a certain agent loop using LLM with function calling. If you use a prebuilt agent, you mainly configure it with an LLM, a set of tools, and perhaps memory, and then just call it with queries. The agent will handle planning – using either a ReAct style or a planner-executor style depending on the agent type.
- Custom Workflows: For maximum control, LlamaIndex introduced a Workflow abstraction. You can define custom agent loops by specifying events and steps. Essentially, you can recreate the agent’s loop logic step by step, which is useful if the built-in strategies don’t fit your needs. The workflow ensures type-checked and well-structured interactions between those steps.
In a typical FunctionAgent (a generic agent with tools) scenario, the loop looks like:
- Receive user query (and perhaps use memory to append conversation history).
- Call the LLM with the query plus available tools. LlamaIndex’s FunctionCallingLLM wrapper will let the LLM output a function call if it decides to.
- Parse the LLM’s output. If it included tool usage (ToolSelection), trigger a ToolCallEvent which executes the tool and captures the result.
- Feed the tool result back to the LLM (this might be done by another LLM call or by continuing the same call if streaming).
- Repeat if the LLM decides multiple tools are needed. Once no more tool calls are requested, the LLM’s final answer is returned.
LlamaIndex’s agents often employ a plan-and-execute approach when dealing with complex queries. For example, an "Agentic RAG" (Retrieval-Augmented Generation) use case might have the agent break a user’s complex question into sub-questions, query a knowledge base for each, then synthesize the answers. The agent’s internal reasoning chain (the "scratchpad") and the tools it picks are all influenced by the LLM’s prompts, which LlamaIndex helps manage.
Memory and State: Being integrated with LlamaIndex, agents can use vector stores or knowledge graphs as long-term memory. Also, a ChatMemoryBuffer is available for short-term memory (chat history) which can be plugged into an agent or workflow. The memory ensures the agent’s decisions consider prior conversation or prior tool outcomes, crucial for multi-turn interactions.
Tool Invocation: Using tools in LlamaIndex is straightforward:
- If you use FunctionAgent, you pass a list of tool instances (like tools=[tool1, tool2]). The agent automatically knows how to call them because LlamaIndex sets up the function schemas (similar to OpenAI’s function calling).
- If you go low-level with workflows, you’d manually detect ToolSelection objects from the LLM’s output and then call the corresponding tool (the docs snippet we saw defines ToolCallEvent and FunctionOutputEvent for that purpose).
- The framework has a lot of pre-integrations. The LlamaHub library offers hundreds of ready tools (for Google Suite, Slack, databases, etc.). This means you can quickly empower an agent to use, say, Google Calendar or a SQL database by installing a plugin and adding a ToolSpec.
Creating an Agent: Here’s how you might create an agent that can use a Wikipedia search tool and a calculator:
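(A minimal sketch, assuming a recent llama-index release plus the optional llama-index-tools-wikipedia package; module paths and agent classes have moved between versions.)

```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.tools.wikipedia import WikipediaToolSpec

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

calculator_tool = FunctionTool.from_defaults(fn=multiply)
wiki_tools = WikipediaToolSpec().to_tool_list()   # search/load Wikipedia pages

agent = FunctionAgent(
    tools=wiki_tools + [calculator_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="Answer questions, using Wikipedia for facts and the calculator for math.",
)

async def main():
    response = await agent.run(
        "Who became US President in 1850, and what is 123 * 456?"
    )
    print(response)

asyncio.run(main())
```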
The agent might use the Wikipedia tool to look up who became US President in 1850 (Millard Fillmore), use the calculator tool for the arithmetic, and then compose a single answer.
Unique Strengths & Use Cases: LlamaIndex agents are particularly strong for data-heavy or knowledge-centric tasks:
- Agentic RAG (Retrieval-Augmented Generation): This is a scenario where an agent not only answers from data but figures out how to gather the data. The agent can decide which database or index to query, how to chunk questions, etc., rather than just doing a single retrieval. If your problem is complex (like "Research the impact of climate change on crop yields and draft a report"), a LlamaIndex agent could break it down: find relevant documents (using its tools), extract info, and then generate a report.
- Tool-Rich Ecosystem: Thanks to LlamaHub, if you need an agent that can operate on structured data or external systems (e.g., read from a CSV, query a SQL database, call an API like weather or maps), chances are LlamaIndex already has a reader or tool integration for it. This reduces the friction in equipping your agent with many capabilities.
- Workflow Customization: Advanced users can design bespoke agent loops that incorporate things like streaming output (maybe you want partial results to stream to the user while the agent is still working), or concurrent tool calls. This flexibility means you can optimize for performance or reliability as needed (for instance, validating the agent’s final answer using a second pass).
- Integration with Indexes: Obviously, if you’re already using LlamaIndex for knowledge bases, using its agent framework keeps everything in one system. The agent can treat indexes and query engines as first-class tools, leading to very tight integration between retrieval and reasoning.
In summary, LlamaIndex extends the concept of agents to be robust data assistants. It merges retrieval and tool use seamlessly. If your primary goal is an AI that can intelligently navigate and query large data sources (documents, databases, APIs) and compose results, LlamaIndex’s agents are a natural choice. They may require some careful setup (ensuring your indexes are built and tools are configured), but the payoff is an agent that truly knows how to use tools to find and synthesize information rather than relying on an LLM’s parametric knowledge alone.
6. Amazon Agent Squad (Multi-Agent Orchestrator by AWS)
Agent & Tool Definitions: Agent Squad is AWS’s open-source framework for managing multiple specialized agents in complex conversations. In Agent Squad, an agent typically corresponds to a particular AI backend or skill – for example, an agent could be an LLM hosted on Amazon Bedrock (like Jurassic-2 or Anthropic Claude), an Amazon Lex bot, a custom Lambda function, etc. They provide several built-in agent types like:
- Bedrock LLM Agent: connects to an LLM on Amazon Bedrock with a given prompt (useful for general conversation or knowledge tasks).
- Lex Bot Agent: connects to a pre-built Lex conversational bot (which might have its own intents/slots defined).
- Lambda Agent: uses an AWS Lambda function as an agent – essentially letting you execute code as a response (for custom logic or tool-like behavior).
- OpenAI/Anthropic Agent: connects to OpenAI or Anthropic models similarly.
- Chain Agent: can chain multiple calls internally (representing an agent that itself might use a sequence of prompts/tools).
- Supervisor Agent: a special agent whose "skill" is orchestrating other agents (more on this shortly).
A tool in Agent Squad is modeled via an AgentTool abstraction. Unlike some frameworks where tools are separate from agents, Agent Squad tends to encapsulate tools as part of an agent’s capabilities. For example, you can have a Bedrock LLM Agent with tools – meaning an LLM that also has function calling abilities to use external tools. The framework’s AgentTool system allows you to define a function with input schema (they parse Python type hints to JSON Schema, similar to OpenAI) and attach it to an agent. The agent’s underlying model (like an OpenAI function-calling model or Anthropic with some tool plugin) will then be able to call that function. In practice:
- If using an OpenAI agent, tools would be implemented via OpenAI function calling (since they mention OpenAI support is coming soon for the tools component).
- If using a Bedrock LLM (like AWS Titan or Claude on Bedrock), the framework might translate the AgentTool into the format that those models require for tool usage (Bedrock’s API now supports function calls for some models as well).
- They also mention agent-as-tools architecture, i.e., using one agent as a tool for another. That’s exactly what the SupervisorAgent leverages.
Orchestration Architecture: Agent Squad introduces a two-layer orchestration:
- Classifier (Router): When a new user query comes in, a classifier first decides which agent (or agents) should handle it. For example, if the user asks a math question vs. a travel question, the classifier routes to the Math Agent vs. Travel Agent. They provide built-in classifiers (BedrockClassifier, OpenAIClassifier, etc.) which likely use an LLM to pick the appropriate agent based on query content. This is essentially an intent recognition step.
- Orchestrator (Supervisor): The orchestrator manages the conversation state and delegation. The new SupervisorAgent is essentially an agent that can take a complex query and split it among multiple agents in parallel, then aggregate the responses. It treats other agents as its "tools" – delegating sub-queries to them and maintaining the overall context (conversation memory) so that the final answer is coherent.
Agent Squad’s approach allows for both sequential and parallel agent interactions:
- In simple cases, the classifier might just choose one agent to respond, and that agent handles it (potentially using its own tools). For example, a WeatherAgent might answer a weather query by calling a weather API tool itself.
- In complex cases, the SupervisorAgent can break a user request into parts handled by different agents concurrently. For instance, consider a user asks: "Plan a weekend trip to Paris and tell me if I’ll need an umbrella." The SupervisorAgent could delegate the travel planning part to a TravelAgent and the weather part to a WeatherAgent at the same time, then combine the results.
Memory and Context: The framework emphasizes context management across multiple agents. They maintain conversation history that is shared appropriately:
- User–Lead Agent memory: memory of interaction between the user and the orchestrator (lead agent).
- Lead Agent–Team memory: shared context that the lead agent (supervisor) maintains to pass relevant info to and from the specialized agents.
The conversation storage is pluggable (in-memory, DynamoDB, SQL, etc.), so that you can scale this system or keep persistent context over sessions.
Agent Squad’s multi-agent orchestration: A Lead Agent (Supervisor) receives user input and delegates parts of the query to a Team of specialized agents (Agent A, B, C). The lead agent and team agents communicate to exchange sub-task results. A shared memory is maintained: one between the user and lead for global context, and one between lead and team for coordination. This ensures the final response to the user is coherent and incorporates contributions from all relevant agents.
Tool Integration: Defining tools (AgentTool) in Agent Squad is straightforward and quite similar to OpenAI’s approach:
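(A minimal sketch, assuming the agent-squad Python package; the exact import path and AgentTool constructor arguments may differ between releases.)

```python
from agent_squad.utils import AgentTool, AgentTools

def get_weather(location: str) -> str:
    """Return the current weather for the given city."""
    # Placeholder implementation – call a real weather API here.
    return f"It is raining in {location}."

# The type hints and docstring are used to derive the tool's JSON schema
# (behavior assumed here; check the framework docs for your version).
weather_tool = AgentTool(name="get_weather", func=get_weather)
tools = AgentTools(tools=[weather_tool])
```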
This would automatically convert get_weather into a tool with a JSON schema (under the hood, using the type hints and docstring). You can then add this weather_tool to a Bedrock LLM Agent or OpenAI Agent. When that agent runs, its model knows about the tool (likely via a prompt injection that the Agent Squad orchestrator does), and it can decide to call weather_tool as needed. Similarly, you could add a calculate tool, etc. They also support format conversion utilities and grouping multiple tools into an AgentTools collection for convenience.
Quick Orchestrator Setup: The Quickstart for Agent Squad indicates something like:
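(A minimal sketch, assuming the agent-squad package with Amazon Bedrock access configured; class names, options, and response fields may differ between releases.)

```python
import asyncio
import uuid

from agent_squad.orchestrator import AgentSquad
from agent_squad.agents import BedrockLLMAgent, BedrockLLMAgentOptions

orchestrator = AgentSquad()

orchestrator.add_agent(BedrockLLMAgent(BedrockLLMAgentOptions(
    name="Tech Agent",
    description="Handles questions about software, cloud, and programming.",
)))
orchestrator.add_agent(BedrockLLMAgent(BedrockLLMAgentOptions(
    name="Travel Agent",
    description="Handles travel planning and booking questions.",
)))

async def main():
    # The classifier picks the best-suited agent for this query and routes it.
    response = await orchestrator.route_request(
        "What is the best way to deploy a Python API on AWS?",
        user_id="user-123",
        session_id=str(uuid.uuid4()),
    )
    print(response.output)

asyncio.run(main())
```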
If using the SupervisorAgent specifically, one might include it as an agent (e.g., define SupervisorAgent with team members). However, the framework might also handle it implicitly if multiple agents need to act (the documentation suggests you can integrate the SupervisorAgent into the classifier for hierarchical routing).
Use Cases & Strengths: Agent Squad (and AWS’s multi-agent approach) is tailored for enterprise and production environments:
- Specialized Agents per Domain: It encourages breaking problems into domains – e.g., a company might have a FinanceBot, a TechSupportBot, and a HRBot, and Agent Squad can route customer inquiries to the right one. The team orchestration means if a query spans domains, they can still coordinate.
- Scalability & Deployment: It’s built by AWS, so naturally it’s meant to deploy on AWS infrastructure (Lambda, etc.) and scale. Dual language (Python/TypeScript) support means it can be integrated into various platforms easily. The design allows running on serverless or persistent servers without much change.
- Memory & Continuity: With conversation storage backends (like DynamoDB) you can maintain long-running chat sessions with multiple agents reliably, which is important for real applications (e.g., a long customer service session that involves different expert bots).
- Extensibility: You can add new agent types or integrate new AI providers. For example, they have a "Bedrock Flows Agent" to use Bedrock’s new Agents feature, an "OpenAI Agent", etc. You could conceivably integrate other tools or even non-LLM agents (like a symbolic reasoner) as long as it conforms to the interface.
- Parallel and Hierarchical Reasoning: The SupervisorAgent introduction is a standout feature – it’s particularly useful when multiple agents need to work at the same time. This can reduce latency (compared to an agent calling another sequentially) and also structure the problem (the lead agent can break a task into independent parts). Use cases like a complex planning (travel + weather + booking), or a multi-turn dialog where some agents handle sub-conversations (imagine a meeting scheduling assistant that consults separate calendar agents for each attendee in parallel) could benefit from this.
- Bedrock Integration: If you use Amazon Bedrock (AWS’s managed LLM service), Agent Squad is a natural choice as it provides native connectors for Bedrock models and features (like the Knowledge Base retriever, etc.). Also, AWS is positioning Bedrock Agents (its own multi-agent managed service) for enterprises; using Agent Squad might be a way to prototype those solutions in open source.
In summary, Amazon’s Agent Squad is about coordinating multiple AI agents with enterprise-grade support. It may have a bit more setup overhead compared to a simpler single-agent system, but it’s powerful for building AI assistants that need to handle a wide array of tasks, possibly concurrently, with reliability and maintainable structure. It aligns well with use cases in customer support automation, virtual assistants with many skills, or any scenario where dividing the problem among different AI "experts" leads to better results.
Conclusion: Choosing the Right Agent Framework
AI agents are rapidly transforming from simple chatbots into complex autonomous collaborators. Each framework we reviewed approaches the challenge from a slightly different angle:
- LangGraph: Best when you need full control over agent reasoning and state. It’s like the "power steering" for LangChain – ideal for custom workflows (e.g., complex conditional logic, multi-step processes with memory) that must be reliable and debuggable. Use LangGraph for production scenarios where the agent’s decision path needs to be transparent and fine-tuned (such as financial report generation with multiple validation steps, or any mission-critical application where you want to avoid the "black-box" feeling of an end-to-end LLM agent). It may require more upfront design, but yields a very robust agent solution.
- crewAI: Great for coordinating multiple LLM "experts" in a structured way. If your use case naturally breaks down into roles (think of a virtual team solving a project), crewAI provides a blueprint. It shines in multi-step workflows like product development simulations, multi-perspective analysis (e.g., an agent lawyer + agent doctor discussing a case), or educational tools where agents with different expertise teach or challenge a student. crewAI’s built-in processes (sequential, hierarchical) reduce the complexity of building such systems from scratch. Choose crewAI when the interplay between agents is as important as their individual reasoning, and you want a clear scaffold (tasks, processes) to manage that interplay.
- OpenAI Function Calling / SDK: The go-to choice for quickly empowering an assistant with tools. If you are building a chatbot that needs to occasionally fetch info, run calculations, or execute transactions, using OpenAI’s native function calling is often the fastest path to a solution. The OpenAI Agents SDK adds conveniences for development, but even without it, function calling is powerful yet simple. It’s especially suitable for scenarios like: a customer support bot that uses a database lookup function, a personal assistant that integrates with your calendar and email through functions, or a coding helper that can run code. Basically, for single-agent scenarios where an LLM needs some extra "hands", this approach is hard to beat for ease of use and reliability – provided you don’t need highly complex multi-step plans beyond what the model can handle in one or two function calls.
- AutoGen (Microsoft): A top choice for advanced multi-agent reasoning or creative problem solving. AutoGen enables agents to engage in dialogues with each other – which can lead to emergent solutions (two agents can uncover ideas neither might alone). Consider AutoGen for research assistants that need to argue pros and cons, code generation where one agent writes code and another tests it, or any use case that benefits from a "self-checking" mechanism (one agent generates, another validates). AutoGen’s ability to integrate tool use via code execution means it’s incredibly flexible, but that also means you need to carefully manage prompts to avoid chaos. It’s a good fit when you’re experimenting at the cutting edge of what AI agents can do (like autonomous research pipelines) and are comfortable guiding a multi-LLM system. It may be overkill for straightforward tasks and requires more compute resources (multiple LLM calls), but it can tackle problems no single agent might solve alone.
- LlamaIndex Agents: Ideal for data-intensive applications and retrieval augmented generation. If your agent’s primary job is to fetch and synthesize information from various sources (documents, databases, APIs), LlamaIndex provides the infrastructure to do that effectively. Use it for building intelligent search assistants, enterprise Q&A bots that draw from multiple data silos, or report generators that need to pull in live data and citations. The framework’s focus on tools for data (like the OnDemandLoader for injecting external info mid-conversation) is a strong advantage. Also, if you want to start simple with a Q&A bot that can do a couple of actions (like look up in an index and answer math), a LlamaIndex FunctionAgent is straightforward. As your needs grow (multiple indices, complex flows), you can incrementally adopt custom workflows. In short, pick LlamaIndex when knowledge integration is your main challenge and you want your agent to be very competent at retrieving and using external information.
- Amazon Agent Squad: Tailored for scalable multi-agent deployments in production, especially in the AWS ecosystem. If you need an agent solution that spans multiple AI services (e.g., an assistant that uses both an LLM and a voice bot, or different model brands for different tasks) and you want to run it reliably on cloud infrastructure, Agent Squad is a strong candidate. It’s well-suited for enterprise virtual assistants – for example, a customer support system where one agent handles billing queries (with a connection to billing database), another handles technical issues (with access to a knowledge base), and a supervisor coordinates seamlessly. The ability to do parallel agent execution is a differentiator – for complex queries that touch multiple domains, Agent Squad can yield answers faster by not running things strictly serially. It’s also a good match if you plan to use AWS Bedrock’s managed agents or need a solution that integrates with AWS services for monitoring, logging, etc. Keep in mind that it’s newer and you’ll want a team with cloud devops expertise to maximize it. But for enterprises already on AWS, it can reduce the friction to implement sophisticated AI agent workflows with the compliance and scalability they require.
In practice, these frameworks are not mutually exclusive. We’re likely to see hybrid approaches – for instance, using OpenAI’s function calling within an AutoGen multi-agent setup, or employing LlamaIndex for retrieval inside a LangGraph workflow. The right choice often depends on the specific problem and constraints:
- Need quick results with minimal coding? OpenAI’s SDK or LlamaIndex’s prebuilt agents are great.
- Need fine-grained control and custom logic? LangGraph or a custom LlamaIndex workflow.
- Need multiple agents to collaborate? AutoGen or crewAI (or Agent Squad if in AWS and needing production scale).
As the field evolves, documentation and best practices are improving. It’s wise to start from the official guides and example repositories to get a feel for each framework’s paradigm. From there, consider the complexity of your task, the reliability needed, and the ecosystem you’re operating in.
In conclusion, AI agents are becoming powerful tools in the developer’s arsenal, and these frameworks are accelerating the journey from LLMs-as-chatbots to LLMs-as-autonomous task solvers. Whether you’re building an AI to book vacations, debug code, answer customer queries, or write research papers, there’s likely an agent framework suited to your needs. By understanding their differences – in how they define agents and tools, how they plan actions, and how much they let you customize – you can pick the one that will act as the smartest, most reliable “AI teammate” for your particular project.