Agent frameworks enable LLMs to take actions, use tools, plan multi-step workflows, and collaborate with other agents. The seven frameworks compared here cluster around three dominant architecture patterns: graph-based (LangGraph), role-based multi-agent (CrewAI, AutoGen), and code-first (OpenAI Agents SDK, smolagents, PydanticAI), with Semantic Kernel occupying a plugin-and-planner middle ground aimed at enterprise ecosystems. Each pattern optimizes for a different trade-off between control, simplicity, and multi-agent coordination.
1. Architecture Patterns
Agent frameworks differ fundamentally in how they model agent behavior and control flow. Understanding the three dominant patterns helps you narrow the field before evaluating individual frameworks.
1.1 Graph-Based Architecture (LangGraph)
In a graph-based architecture, you define agent behavior as a state machine where nodes represent actions (LLM calls, tool use, human review) and edges represent transitions. The agent traverses the graph based on the current state and the LLM's decisions. This pattern provides maximum control over execution flow, supports cycles (the agent can loop back to previous steps), and makes complex workflows explicit and debuggable.
The following example illustrates a LangGraph agent with a tool-calling loop; a human-in-the-loop approval step could be added via LangGraph's interrupt mechanism.
```python
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    # operator.add tells LangGraph to append new messages rather than overwrite
    messages: Annotated[list, operator.add]

# search_tool and calculator_tool are assumed to be @tool-decorated
# functions defined elsewhere
llm = ChatOpenAI(model="gpt-4o").bind_tools([search_tool, calculator_tool])

def call_model(state: AgentState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if last.tool_calls:
        return "tools"  # the LLM requested a tool call; route to the tool node
    return END

graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode([search_tool, calculator_tool]))
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")  # after tool execution, loop back to the model

app = graph.compile()
result = app.invoke({"messages": [HumanMessage("What is 15% of the GDP of France?")]})
```
1.2 Role-Based Multi-Agent Architecture (CrewAI, AutoGen)
Role-based frameworks model agents as team members with defined roles, goals, and capabilities. Instead of programming control flow explicitly, you define agents and tasks, and the framework orchestrates their collaboration. This pattern excels at multi-agent scenarios where agents with different specializations need to cooperate.
The following CrewAI example shows a research team with two specialized agents collaborating on a report.
```python
from crewai import Agent, Task, Crew, LLM

llm = LLM(model="gpt-4o")

# search_tool and web_scraper are assumed to be CrewAI tool instances
# defined elsewhere (e.g. from the crewai_tools package)
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive data on the given topic",
    backstory="You are an expert researcher with 20 years of experience.",
    tools=[search_tool, web_scraper],
    llm=llm,
)

writer = Agent(
    role="Technical Writer",
    goal="Create a clear, well-structured report from research findings",
    backstory="You specialize in making complex topics accessible.",
    llm=llm,
)

research_task = Task(
    description="Research the current state of LLM inference optimization.",
    expected_output="A detailed list of findings with sources.",
    agent=researcher,
)

writing_task = Task(
    description="Write a 1000-word report based on the research findings.",
    expected_output="A polished report in markdown format.",
    agent=writer,
)

# Tasks run sequentially by default: the writer receives the researcher's output
crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task], verbose=True)
result = crew.kickoff()
```
1.3 Code-First Architecture (OpenAI Agents SDK, smolagents, PydanticAI)
Code-first frameworks provide minimal abstractions, giving you an agent loop with tool calling built on standard Python patterns. These frameworks prioritize simplicity and transparency over sophisticated orchestration features. They are ideal when you want full control over agent behavior without learning a framework-specific DSL.
The following example uses the OpenAI Agents SDK's straightforward agent definition.
```python
from agents import Agent, Runner, function_tool

@function_tool
def search(query: str) -> str:
    """Search the web for information."""
    # web_search is assumed to be a search helper defined elsewhere
    return web_search(query)

@function_tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    # eval is acceptable for a demo; use a safe expression parser in production
    return eval(expression)

agent = Agent(
    name="Research Assistant",
    instructions="You help users find information and perform calculations.",
    tools=[search, calculate],
    model="gpt-4o",
)

result = Runner.run_sync(agent, "What is 15% of the GDP of France?")
```
2. Comprehensive Feature Comparison
The following table compares all seven agent frameworks across key dimensions relevant to production agent development.
| Feature | LangGraph | CrewAI | AutoGen | OpenAI Agents SDK | Semantic Kernel | smolagents | PydanticAI |
|---|---|---|---|---|---|---|---|
| Architecture | Graph/state machine | Role-based crew | Conversation-based | Code-first loop | Plugin/planner | Code-first minimal | Type-safe agents |
| Maintainer | LangChain Inc. | CrewAI Inc. | Microsoft | OpenAI | Microsoft | Hugging Face | Pydantic team |
| Language | Python, JS/TS | Python | Python, .NET | Python | Python, C#, Java | Python | Python |
| Multi-agent support | Via sub-graphs | Native (crews) | Native (groups) | Via handoffs | Via planners | Basic | Manual composition |
| Tool calling | Full (any provider) | Full (decorator) | Full (function map) | Full (decorator) | Full (plugins) | Full (decorator) | Full (Pydantic) |
| Human-in-the-loop | Native (interrupt) | Built-in | Built-in | Manual | Approval hooks | Manual | Manual |
| State persistence | Checkpointers | Memory system | Conversation store | None built-in | Memory stores | None built-in | None built-in |
| Streaming | Full | Event-based | Full | Full | Full | Basic | Full |
| LLM provider lock-in | None (any provider) | None (litellm) | None (configurable) | OpenAI-first (others via LiteLLM) | None (connectors) | None (any provider) | None (any provider) |
| Observability | LangSmith native | CrewAI+ dashboard | AutoGen Studio | OpenAI dashboard | OpenTelemetry | Basic logging | Logfire native |
| License | MIT | MIT | MIT (CC-BY-4.0 docs) | MIT | MIT | Apache 2.0 | MIT |
| GitHub stars (approx.) | 15k+ | 25k+ | 38k+ | 15k+ | 24k+ | 15k+ | 8k+ |
3. Multi-Agent Patterns
Multi-agent systems represent one of the fastest-growing areas in LLM development. The frameworks differ significantly in how they support agent-to-agent communication and coordination.
3.1 Supervisor Pattern
In the supervisor pattern, one agent coordinates others by deciding which sub-agent to invoke for each step. LangGraph implements this naturally through conditional edges, where a supervisor node routes to specialized agent sub-graphs. The OpenAI Agents SDK supports this through the handoff mechanism, where one agent explicitly transfers control to another.
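The control flow above can be sketched framework-agnostically. In this minimal sketch, `route()` stands in for the supervisor's LLM call and the workers are plain functions; all names are illustrative, not from any specific framework.

```python
# Stub workers standing in for LLM-backed specialist agents.
def research_agent(task: str) -> str:
    return f"research notes on {task}"

def math_agent(task: str) -> str:
    return f"computed answer for {task}"

WORKERS = {"research": research_agent, "math": math_agent}

def route(task: str) -> str:
    """Stub for the supervisor's LLM decision about which worker to invoke."""
    return "math" if any(c.isdigit() for c in task) else "research"

def supervise(task: str) -> str:
    worker = WORKERS[route(task)]  # supervisor picks the sub-agent for this step
    return worker(task)

print(supervise("What is 15% of 2800?"))
```

In LangGraph, `route()` would be the conditional-edge function and each worker a sub-graph; in the OpenAI Agents SDK, the routing decision becomes a handoff.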
3.2 Collaborative Pattern
In the collaborative pattern, agents communicate as peers without a central coordinator. CrewAI implements this through its crew abstraction, where agents pass task outputs to the next agent in a sequence or collaborate in parallel. AutoGen uses group chat, where agents take turns responding to a shared conversation thread.
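The turn-taking mechanics of a group chat can be sketched with stubbed agents. Here each "agent" is a plain function that reads the shared thread and appends a message; the names and round count are illustrative.

```python
# Stub agents standing in for LLM-backed peers in a shared conversation.
def critic(thread: list[str]) -> str:
    return "critic: the draft needs sources"

def writer(thread: list[str]) -> str:
    return "writer: revised draft with sources"

def group_chat(agents, opening: str, rounds: int = 1) -> list[str]:
    """AutoGen-style round robin: agents take turns appending to one thread."""
    thread = [opening]
    for _ in range(rounds):
        for agent in agents:
            thread.append(agent(thread))  # each agent sees the full history
    return thread

thread = group_chat([critic, writer], "user: draft a report on LLM inference")
print(thread[-1])
```

Real group-chat managers add a speaker-selection step (often itself an LLM call) instead of a fixed round robin, but the shared-thread structure is the same.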
3.3 Hierarchical Pattern
The hierarchical pattern organizes agents in a tree structure where higher-level agents delegate to lower-level specialists. CrewAI supports this with its hierarchical process mode. AutoGen supports it through nested group chats. LangGraph supports it through nested sub-graphs with parent-child state passing.
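The tree-shaped delegation can be sketched the same way: a manager decomposes the task, hands each subtask one level down, and aggregates the results. All functions are stubs standing in for LLM-backed agents.

```python
# Stub specialists one level below the manager.
def data_collector(subtask: str) -> str:
    return f"data for {subtask}"

def summarizer(text: str) -> str:
    return f"summary of {text}"

def manager(task: str) -> str:
    subtasks = [f"{task} / part {i}" for i in (1, 2)]  # decompose the task
    collected = [data_collector(s) for s in subtasks]  # delegate down the tree
    return summarizer("; ".join(collected))            # aggregate at the top

print(manager("LLM inference report"))
```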
Multi-agent systems add complexity that is rarely justified for simple tasks. A single agent with multiple tools often outperforms a multi-agent system on straightforward workflows, with lower latency and easier debugging. Reserve multi-agent patterns for genuinely complex workflows where different steps require different expertise, different LLM configurations, or different trust boundaries. Start with a single agent and add agents only when you hit the limits of the single-agent approach.
4. Production Readiness
Agent frameworks vary widely in their production readiness. The following assessment focuses on features that matter when running agents in production environments with real users.
| Production Feature | LangGraph | CrewAI | AutoGen | OpenAI Agents SDK | Semantic Kernel |
|---|---|---|---|---|---|
| State recovery after failure | Checkpointers (Redis, SQL) | Memory persistence | Conversation replay | Manual | Memory stores |
| Timeout and retry handling | Built-in | Built-in | Configurable | Built-in | Built-in |
| Cost control (token budgets) | Via callbacks | Built-in budgets | Token counting | Via API settings | Via filters |
| Guardrails integration | Custom nodes | Guardrails config | Custom agents | Native guardrails | Filters |
| Deployment platform | LangGraph Cloud | CrewAI Enterprise | AutoGen Studio | OpenAI platform | Azure AI |
| Long-running task support | Native (async nodes) | Background tasks | Async groups | Async runner | Step-based |
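Whatever the framework calls it (callbacks, budgets, filters), cost control reduces to the same guard: accumulate usage per run and abort when a limit is hit. A minimal sketch, with token counts stubbed as word counts; production code would read the provider's usage metadata instead.

```python
class BudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    """Tracks cumulative token usage for one agent run and enforces a cap."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, prompt: str, completion: str) -> None:
        # Stub token count: real systems use response.usage from the provider.
        self.used += len(prompt.split()) + len(completion.split())
        if self.used > self.limit:
            raise BudgetExceeded(f"budget of {self.limit} tokens exhausted")

budget = TokenBudget(limit=10)
budget.charge("what is the GDP of France", "about 3 trillion USD")
try:
    budget.charge("now break it down by sector please and thanks", "a long answer")
except BudgetExceeded as exc:
    print(exc)  # the agent loop would stop or escalate here
```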
5. When to Use Each Framework
The following decision table provides concrete recommendations based on common project requirements and team characteristics.
| If you need... | Best Fit | Runner-Up | Rationale |
|---|---|---|---|
| Maximum control over agent flow | LangGraph | Semantic Kernel | Graph-based architecture makes every state transition explicit |
| Quick multi-agent prototype | CrewAI | AutoGen | Role-based definition is intuitive; minimal boilerplate |
| Enterprise .NET/Java ecosystem | Semantic Kernel | AutoGen (.NET) | Microsoft backing; native C# and Java SDKs; Azure integration |
| OpenAI-only deployment | OpenAI Agents SDK | LangGraph | Tightest integration with OpenAI models and platform |
| Minimal dependencies | smolagents | PydanticAI | Lightest footprint; no heavy framework overhead |
| Type-safe structured outputs | PydanticAI | Semantic Kernel | Built on Pydantic; native structured output validation |
| Research agent with code execution | AutoGen | CrewAI | Built-in code executor; designed for code-writing agents |
| Production multi-agent with state | LangGraph | CrewAI | Checkpointing and state recovery for long-running agents |
| Open-source model flexibility | smolagents | LangGraph | Hugging Face ecosystem; works with any model on the Hub |
The agent framework landscape is evolving faster than any other LLM tooling category. New frameworks appear monthly, and existing frameworks add features rapidly. The architectural patterns (graph, role-based, code-first) are more stable than specific framework features. Choose based on architecture fit first, then evaluate features within your preferred architecture pattern.
6. Integration and Interoperability
Agent frameworks do not exist in isolation. They connect to orchestration layers (Section V.2), evaluation tools (Section V.4), and serving infrastructure (Section V.5). Key integration points to evaluate include:
- Tool protocol: LangGraph and CrewAI use different tool-calling interfaces. Ensure your tools can be shared across frameworks if you are evaluating multiple options.
- Observability hooks: LangGraph integrates natively with LangSmith. CrewAI has its own telemetry. Semantic Kernel uses OpenTelemetry. Verify that your chosen observability tool (Section V.4) can capture traces from your agent framework.
- Model Context Protocol (MCP): MCP is emerging as a standard protocol for connecting agents to external tools and data sources. LangGraph, OpenAI Agents SDK, and smolagents all support MCP clients, enabling agents to connect to any MCP-compliant tool server.
- Deployment model: Some frameworks (LangGraph Cloud, CrewAI Enterprise) offer managed deployment. Others require you to build your own deployment infrastructure. Match the deployment model to your operational capabilities.
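One practical way to keep tools shareable across frameworks, as the tool-protocol point above suggests, is to write each tool as a plain, typed Python function and derive the framework-specific wrapper from it. Most tool decorators consume exactly the information below (name, docstring, typed parameters); the `tool_spec` helper here is an illustrative sketch, not any framework's API.

```python
import inspect

def search(query: str, max_results: int = 5) -> str:
    """Search the web and return a text summary."""
    # Stub body; a real tool would call a search API here.
    return f"results for {query!r} (top {max_results})"

def tool_spec(fn):
    """Extract the framework-neutral metadata most tool decorators rely on."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: param.annotation.__name__
            for name, param in sig.parameters.items()
        },
    }

print(tool_spec(search))
```

Keeping the plain function as the source of truth means switching frameworks costs one thin wrapper per tool rather than a rewrite.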
Summary
Agent frameworks divide into three architecture patterns: graph-based (LangGraph) for maximum control, role-based (CrewAI, AutoGen) for intuitive multi-agent collaboration, and code-first (OpenAI Agents SDK, smolagents, PydanticAI) for simplicity and transparency. Semantic Kernel bridges the enterprise world with multi-language support and Azure integration. For production systems requiring state persistence, failure recovery, and human-in-the-loop approval, LangGraph and Semantic Kernel offer the most mature feature sets. For rapid prototyping of multi-agent systems, CrewAI provides the fastest path. For minimal-dependency single-agent use cases, smolagents or PydanticAI keep your stack lean.