Framework Landscape

Section 28.1

"Choosing a framework is easy. Living with that choice for two years is the hard part."

Census, Framework-Fatigued AI Agent
Big Picture

The agent framework you choose shapes every architectural decision that follows. By 2026 the landscape includes production-grade options from major cloud providers, well-funded startups, and active open-source communities, each making different tradeoffs between simplicity and control. LangGraph offers graph-based orchestration with fine-grained state management. The OpenAI Agents SDK provides a minimal, opinionated interface tied to OpenAI models. CrewAI and AutoGen prioritize multi-agent collaboration patterns. Understanding these tradeoffs is essential for selecting the right foundation, and this section compares them head-to-head with runnable examples of the same agent built in three frameworks. The agent foundations from Chapter 26 and the tool use protocols from Chapter 27 are prerequisites for all frameworks covered here.

Prerequisites

This section builds on tool use and protocols from Chapter 27 and agent foundations from Chapter 26.

Figure 28.1.1: Multi-agent orchestration resembles a robot orchestra: specialized agents each play their part, while a conductor coordinates the ensemble into a coherent performance.
See Also

For a hands-on LangChain and LangGraph tutorial with runnable examples, see Appendix J: LangChain.

28.1.1 The Framework Landscape in 2026

Fun Fact

The agent-framework landscape in 2026 is unusual in that the dominant production framework, LangGraph, started life as an opinionated criticism of LangChain (whose creators also wrote LangGraph). The successor inherited the brand recognition and shed the criticism, an act of organizational self-correction that almost never happens in open-source.

Real-World Scenario: 2026 Snapshot: Claude Agent SDK

Anthropic's Claude Agent SDK (anthropic-ai/claude-agent-sdk) is the code-first counterpart to the OpenAI Agents SDK. It defaults to Claude models and provides built-in tool calling, multi-agent handoffs, and integrated tracing. Architecturally it is similar to the OpenAI Agents SDK; the main difference is provider binding. Teams already on PydanticAI can reach both providers with a single string swap; the platform SDKs matter most when you want tighter integration with provider-specific features (extended thinking on Claude, the Realtime API on OpenAI). For a vendor-neutral starting point, prefer PydanticAI or LangGraph; reach for the platform SDKs once you commit to a provider for production.

See Also

The deep treatment of the single-agent loop lives in Section 26.1. The discussion below focuses on how that loop scales across agents.

By early 2026 the multi-agent framework landscape has consolidated into a handful of production-grade choices, each with a different sweet spot of control vs. abstraction, single-agent vs. orchestrated, and vendor-neutral vs. provider-native.

LangGraph (LangChain) models agents as directed graphs where nodes are functions and edges define control flow. State is passed between nodes as a typed dictionary, and conditional edges enable branching logic. LangGraph's strength is fine-grained control: you define exactly how the agent loop works, where checkpoints are saved, and how errors are handled. Its weakness is verbosity. Simple agents require more boilerplate than higher-level frameworks.
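To make the node, edge, and conditional-edge vocabulary concrete, here is a minimal framework-free sketch of the same idea. It is not LangGraph's API: the names `NODES`, `EDGES`, `run`, and `route_after_search` are hypothetical, and the search step is a stub. It only illustrates how typed state flows through nodes and how a router function picks the next node.

```python
from typing import Callable, TypedDict

class State(TypedDict):
    query: str
    results: list[str]
    report: str

END = "__end__"

def search(state: State) -> dict:
    # Stand-in for a real search tool: an empty query finds nothing.
    hits = [f"fact about {state['query']}"] if state["query"] else []
    return {"results": hits}

def write(state: State) -> dict:
    return {"report": "Report: " + "; ".join(state["results"])}

def give_up(state: State) -> dict:
    return {"report": "No results found."}

def route_after_search(state: State) -> str:
    # Conditional edge: branch on what the previous node wrote into the state.
    return "write" if state["results"] else "give_up"

# Nodes are plain functions that return a partial state update.
NODES: dict[str, Callable[[State], dict]] = {
    "search": search, "write": write, "give_up": give_up,
}
# An edge is either a fixed next-node name or a router function.
EDGES = {"search": route_after_search, "write": END, "give_up": END}

def run(state: State, entry: str = "search") -> State:
    current = entry
    while current != END:
        state = {**state, **NODES[current](state)}  # merge the partial update
        edge = EDGES[current]
        current = edge(state) if callable(edge) else edge
    return state

final = run({"query": "GRPO", "results": [], "report": ""})
print(final["report"])
```

The verbosity the paragraph mentions is visible even here: every node, edge, and state field is spelled out, which is the price of being able to inspect or checkpoint the state between any two nodes.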

CrewAI takes a role-based approach inspired by human teams. You define agents with roles, goals, and backstories, then assemble them into crews that execute tasks. CrewAI abstracts away the graph structure, making it fast to prototype collaborative multi-agent systems. The trade-off is less control over execution flow: you cannot easily implement custom conditional logic or fine-grained state management. CrewAI is excellent for content generation, research, and analysis workflows where the flow is relatively linear.

AutoGen/AG2 (Microsoft) focuses on multi-agent conversation. Agents communicate through structured messages, with patterns like GroupChat managing turn-taking and topic management. AutoGen excels at debate-style interactions and code review workflows where agents need to build on each other's outputs.
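The turn-taking idea can be sketched without any framework. The code below is an illustrative assumption, not AutoGen's GroupChat implementation: `RoundRobinChat`, `ChatAgent`, and the `APPROVED` stop word are hypothetical names, and the two reply functions are stubs standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    sender: str
    content: str

@dataclass
class ChatAgent:
    name: str
    # reply_fn stands in for an LLM call: it reads the transcript, returns text.
    reply_fn: Callable[[list[Message]], str]

@dataclass
class RoundRobinChat:
    agents: list[ChatAgent]
    max_turns: int = 6
    transcript: list[Message] = field(default_factory=list)

    def run(self, task: str) -> list[Message]:
        self.transcript.append(Message("user", task))
        for turn in range(self.max_turns):
            speaker = self.agents[turn % len(self.agents)]  # fixed speaking order
            reply = speaker.reply_fn(self.transcript)
            self.transcript.append(Message(speaker.name, reply))
            if "APPROVED" in reply:  # simple termination signal
                break
        return self.transcript

def coder(history: list[Message]) -> str:
    revision = sum(m.sender == "coder" for m in history) + 1
    return f"draft v{revision}"

def reviewer(history: list[Message]) -> str:
    last = history[-1].content
    return "APPROVED" if last.endswith("v2") else f"please revise {last}"

chat = RoundRobinChat([ChatAgent("coder", coder), ChatAgent("reviewer", reviewer)])
log = chat.run("Write a sort function")
for m in log:
    print(f"{m.sender}: {m.content}")
```

Each agent sees the full transcript, so the reviewer builds on the coder's latest draft and the coder builds on the review. A real group chat manager would typically let a model choose the next speaker rather than rotating in a fixed order.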

The rest of the landscape clusters into provider-native SDKs and cross-ecosystem libraries:

See Also

LLM multi-agent systems are one slice of a much older field, multi-agent reinforcement learning (MARL), which classifies agent collectives by the structure of their reward functions. Cooperative systems share a single reward (a fleet of warehouse robots that all win or lose together); the design problem is decomposing the joint task and coordinating without explicit communication. Competitive systems are zero-sum (board games, multiplayer shooters); rewards for one agent come at the cost of others, and the equilibrium concepts (minimax, Nash) become the optimization target. Mixed-motive systems sit in between (autonomous driving has a shared component, avoid collisions, plus an individual component, drive efficiently; automated trading lets agents both collaborate on prices and compete on personal returns). Almost every LLM multi-agent topology in this chapter is implicitly cooperative (the supervisor and its workers share the user-task reward), but the debate pattern from Section 28.2 is explicitly mixed-motive (two advocates compete; a judge integrates), which is why debate-trained ensembles converge to genuinely different perspectives instead of agreeing too readily. For the full design-dimension framework (Size, Knowledge, Observability, Rewards, Objective, Centralization) see the MARL introduction by Stone and Veloso (2000) and the recent textbook by Albrecht, Christianos, and Schäfer (MIT Press, 2024).
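The three reward structures can be told apart mechanically for a two-agent game. The sketch below is illustrative only: the `classify` function and the three toy payoff tables are assumptions made up for this example, with integer rewards chosen so that equality checks are exact.

```python
Payoffs = dict[tuple[str, str], tuple[int, int]]

def classify(payoffs: Payoffs) -> str:
    """Label a two-agent game by its reward structure.

    payoffs maps a joint action to (reward_for_agent_a, reward_for_agent_b).
    """
    rewards = list(payoffs.values())
    if all(a == b for a, b in rewards):
        return "cooperative"   # both agents always receive the same reward
    if all(a + b == 0 for a, b in rewards):
        return "competitive"   # zero-sum: one agent's gain is the other's loss
    return "mixed-motive"      # partly shared, partly individual

# Toy warehouse: the robots succeed only when they split the work.
warehouse = {("pick", "pack"): (1, 1), ("pack", "pick"): (1, 1),
             ("pick", "pick"): (0, 0), ("pack", "pack"): (0, 0)}
# Matching pennies: a classic zero-sum game.
pennies = {("H", "H"): (1, -1), ("T", "T"): (1, -1),
           ("H", "T"): (-1, 1), ("T", "H"): (-1, 1)}
# Toy intersection: a shared collision penalty plus individual travel time.
driving = {("yield", "yield"): (1, 1), ("yield", "go"): (1, 3),
           ("go", "yield"): (3, 1), ("go", "go"): (-10, -10)}

for name, game in [("warehouse", warehouse), ("pennies", pennies), ("driving", driving)]:
    print(f"{name}: {classify(game)}")
```

In a supervisor-and-workers LLM system every agent is scored on the same user task, which corresponds to the first branch; a debate setup with two advocates corresponds to the last.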

Key Insight

Framework choice should be driven by your control requirements, not by popularity. Match your needs to one of four common situations:

The best framework is the one that matches your team's skill set and your application's complexity, not whichever has the most GitHub stars this month.

Production Pattern
Production Example: Which Products Run on Which Framework

The framework choice maps to visible products. LangGraph powers Klarna's AI assistant and Norwegian Cruise Line's customer-service agent (both case studies on LangChain's blog) and underpins LangChain's hosted LangSmith agents. CrewAI raised $18M in 2024 and lists Deloitte, Accenture, and Stripe among its named adopters; it is popular for content-generation pipelines and research workflows. Microsoft AutoGen drives Microsoft 365 Copilot's specialist-agent extensions and is the substrate of Magentic-One, Microsoft Research's generalist multi-agent system. Anthropic's Claude Code, Cursor's "Composer," and Replit Agent run on bespoke loops rather than these frameworks because they predate the framework consolidation, but they expose MCP servers that any of the frameworks above can call as tools.

Library Shortcut: crewai (role-based multi-agent)

crewai models a team as Agent objects with role, goal, and backstory, plus Task objects that point at the agent that should perform them. A Crew wires the tasks together with a Process (sequential or hierarchical). The framework hides the orchestration loop, so an analyst-plus-writer pipeline lives in roughly 15 lines and reads like a team brief.

# Install first: pip install crewai
from crewai import Agent, Task, Crew, Process
analyst = Agent(role="Research Analyst", goal="Find key facts",
                backstory="Curious analyst", llm="gpt-4o")
writer = Agent(role="Writer", goal="Draft a concise brief",
               backstory="Crisp prose", llm="gpt-4o")
t1 = Task(description="Research {topic}", agent=analyst, expected_output="bullets")
t2 = Task(description="Write a 200-word brief", agent=writer, expected_output="markdown")
crew = Crew(agents=[analyst, writer], tasks=[t1, t2], process=Process.sequential)
result = crew.kickoff(inputs={"topic": "GRPO"})
Code Fragment 28.1.1a: CrewAI builds an analyst-then-writer pipeline with role-based agent specs.

The Same Agent in Three Frameworks

The three snippets below implement the same research agent in LangGraph, CrewAI, and the OpenAI Agents SDK to compare framework APIs.

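# LangGraph: Research Agent
# Assumes `web_search` (a search helper) and `llm` (a LangChain chat model)
# are defined elsewhere.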
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    search_results: list
    report: str
def search_node(state):
    query = state["messages"][-1].content
    results = web_search(query)
    return {"search_results": results}
def write_node(state):
    report = llm.invoke(
        f"Write a research report based on:\n{state['search_results']}"
    )
    return {"report": report.content}
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("write", write_node)
graph.set_entry_point("search")
graph.add_edge("search", "write")
graph.add_edge("write", END)
research_agent = graph.compile()
Code Fragment 28.1.2: This snippet builds a research agent as a LangGraph StateGraph with a typed ResearchState and two nodes, search and write, connected by fixed edges (search, then write, then END). The flow here is strictly linear; branching would use add_conditional_edges with a routing function in place of add_edge.
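# CrewAI: Research Agent
# Assumes `web_search_tool` is a CrewAI-compatible tool defined elsewhere.
from crewai import Agent, Task, Crew
researcher = Agent(
    role="Research Analyst",
    goal="Find comprehensive information on the given topic",
    backstory="A meticulous analyst who cites sources",  # Agent expects a backstory
    tools=[web_search_tool],
    llm="gpt-4o",
)
writer = Agent(
    role="Report Writer",
    goal="Write a clear, well-structured research report",
    backstory="A writer who favors clear structure",
    llm="gpt-4o",
)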
search_task = Task(
    description="Research the topic: {topic}",
    agent=researcher,
    expected_output="A list of key findings with sources",
)
write_task = Task(
    description="Write a research report from the findings",
    agent=writer,
    expected_output="A well-structured research report",
    context=[search_task],
)
crew = Crew(agents=[researcher, writer], tasks=[search_task, write_task])
result = crew.kickoff(inputs={"topic": "quantum computing advances in 2025"})
Code Fragment 28.1.3: The same research workflow in CrewAI, using the Agent/Task/Crew abstractions with persona-driven behavior. The context=[search_task] argument feeds the researcher's findings into the writer's task.
# OpenAI Agents SDK: Research Agent
# Install with: pip install openai-agents
# Assumes `web_search_tool` is a tool defined elsewhere.
from agents import Agent, Runner
search_agent = Agent(
    name="researcher",
    instructions="Search the web for information on the given topic.",
    tools=[web_search_tool],
    model="gpt-4o",
)
writer_agent = Agent(
    name="writer",
    instructions="Write a research report from the provided findings.",
    model="gpt-4o",
)
# Use handoff to chain agents
orchestrator = Agent(
    name="orchestrator",
    instructions="Coordinate research: first search, then write a report.",
    handoffs=[search_agent, writer_agent],
    model="gpt-4o",
)
result = Runner.run_sync(orchestrator, "Research quantum computing advances")
Code Fragment 28.1.4: The OpenAI Agents SDK version of the same research pipeline. handoffs=[search_agent, writer_agent] on the orchestrator tells the SDK which sub-agents are available; Runner.run_sync dispatches the orchestrator's structured handoff calls automatically, which is the API-level difference from LangGraph's explicit edges and CrewAI's task list.
Key Insight

The framework matters less than the architecture. The most common mistake teams make is spending weeks evaluating frameworks when the real question is: what architecture does your agent need? Determine first whether you need a single-agent loop, a pipeline, a supervisor with specialists, or a peer-to-peer mesh (see Section 28.2 for architecture patterns). Then select the framework that best supports that architecture. A supervisor pattern is straightforward in any framework. A complex mesh with conditional routing and checkpointing demands LangGraph or a custom solution. The architecture decision constrains the framework choice, not the other way around.

28.1.2 Framework Selection Guide

Choosing a framework is a consequential decision that affects development speed, maintenance burden, and scaling potential. The decision matrix should consider: team expertise (Python familiarity, async programming comfort), application complexity (simple tool loop vs. complex multi-agent workflow), deployment requirements (cloud provider preferences, compliance constraints), and long-term flexibility (will you outgrow the framework's abstractions?).

For startups and prototypes, start with a higher-level framework (CrewAI, PydanticAI) or the native provider SDK. These minimize boilerplate and let you focus on the agent's logic rather than infrastructure. For production systems with complex state management, conditional workflows, and compliance requirements, LangGraph or a custom framework built on the raw API provides the control you need. For organizations committed to a specific cloud provider, the provider's SDK (OpenAI Agents SDK, Google ADK, Semantic Kernel for Azure) integrates most smoothly with the surrounding infrastructure.
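The guidance above can be written down as a small rule table. This is a sketch, not a definitive selector: the `Project` fields, the `shortlist` function, and the precedence (provider commitment first, then production complexity, then the prototype default) are assumptions made for illustration. Only the framework names and the three situations come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Project:
    stage: str                                # "prototype" or "production"
    needs_complex_state: bool = False         # conditional workflows, checkpointing
    committed_provider: Optional[str] = None  # "openai", "google", or "azure"

# Provider SDKs named in the guidance above.
PROVIDER_SDKS = {
    "openai": "OpenAI Agents SDK",
    "google": "Google ADK",
    "azure": "Semantic Kernel",
}

def shortlist(project: Project) -> list[str]:
    # Assumed precedence: an explicit provider commitment wins.
    if project.committed_provider in PROVIDER_SDKS:
        return [PROVIDER_SDKS[project.committed_provider]]
    if project.stage == "production" and project.needs_complex_state:
        return ["LangGraph", "custom framework on the raw API"]
    # Default for startups and prototypes: minimize boilerplate.
    return ["CrewAI", "PydanticAI", "native provider SDK"]

print(shortlist(Project(stage="prototype")))
print(shortlist(Project(stage="production", needs_complex_state=True)))
print(shortlist(Project(stage="production", committed_provider="azure")))
```

Writing the rules out this way forces the team to state its precedence explicitly, which is often where the real disagreement lies.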

# Lab starter: framework selection skeleton. Students fill in the TODOs.
# Goal: given a list of candidate frameworks and a task description, score them.
from typing import Iterable

FRAMEWORKS = ["LangGraph", "AutoGen", "CrewAI", "Swarm"]

def load_task_description() -> str:
    """TODO: load the task description from prompt.txt or your dataset."""
    raise NotImplementedError

def score_framework(name: str, description: str) -> float:
    """TODO: call your LLM with a scoring prompt; return a 0.0-1.0 fit score.
    Hints:
      - Build a small prompt template that lists the framework's strengths
      - Ask the model to rate task fit on a 0-10 scale, then divide by 10
    """
    raise NotImplementedError

def rank(frameworks: Iterable[str], description: str) -> list[tuple[str, float]]:
    """Score each candidate and return them sorted high-to-low."""
    scored = [(name, score_framework(name, description)) for name in frameworks]
    return sorted(scored, key=lambda x: -x[1])

if __name__ == "__main__":
    task = load_task_description()
    for name, fit in rank(FRAMEWORKS, task):
        print(f"  {name:>10s}: {fit:.2f}")
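Example output once both TODOs are implemented (scores are illustrative; the starter as written raises NotImplementedError):
   LangGraph: 0.85
     AutoGen: 0.72
      CrewAI: 0.65
       Swarm: 0.40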
Code Fragment 28.1.5: A starter that fixes the structure but leaves the load and scoring functions as NotImplementedError. The rank() function shows the expected shape: score each candidate against the task description, then sort high-to-low. Students fill load_task_description and score_framework using whichever LLM client they prefer; a complete, runnable reference implementation appears immediately below in Code Fragment 28.1.6.
# Full solution for the framework-selection lab.
# Uses an LLM judge to score each candidate on the task description.
from openai import OpenAI
from pathlib import Path

client = OpenAI()

FRAMEWORKS = {
    "LangGraph": "Strong for explicit graph-based control flow; good with cycles and conditional edges; native LangChain integration.",
    "AutoGen":   "Optimized for conversational multi-agent systems; group chat patterns; human-in-the-loop friendly.",
    "CrewAI":    "Role-based agents (manager, researcher, writer); sequential or hierarchical processes; simple API.",
    "Swarm":     "Minimal, stateless agent routing primitives; good for tool-handoff patterns; easy to inspect.",
}

JUDGE_PROMPT = """Rate how well the framework below fits the task on a 0-10 scale.
Return ONLY a number between 0 and 10.

Framework: {name}
Description: {desc}

Task: {task}

Score:"""

def load_task_description() -> str:
    p = Path("prompt.txt")
    if p.exists():
        return p.read_text(encoding="utf-8")
    return "Build a 3-agent pipeline for researching, drafting, and editing news articles."

def score_framework(name: str, task: str) -> float:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(name=name, desc=FRAMEWORKS[name], task=task)}],
        temperature=0.0,
        max_tokens=4,
    )
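    # message.content can be None; fall back to "" so float() raises ValueError.
    try:
        return float((resp.choices[0].message.content or "").strip()) / 10.0
    except ValueError:
        return 0.5  # neutral score when the reply is not a bare number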

if __name__ == "__main__":
    task = load_task_description()
    scored = sorted(((n, score_framework(n, task)) for n in FRAMEWORKS), key=lambda x: -x[1])
    print(f"Task: {task!r}\n")
    for name, fit in scored:
        print(f"  {name:>10s}: {fit:.2f}")
Output:
Task: 'Build a 3-agent pipeline for researching, drafting, and editing news articles.'

   LangGraph: 0.90
      CrewAI: 0.80
     AutoGen: 0.70
       Swarm: 0.50
Code Fragment 28.1.6: The reference solution swaps FRAMEWORKS from a list to a dict that maps each candidate to its capability description, then asks gpt-4o-mini to score the task-fit on a 0-10 scale per candidate. The dict-of-descriptions structure is what makes the scoring prompt informative; without it the LLM has no signal beyond the framework name.
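LLM judges do not always return a bare number, even when asked to. The helper below is one possible hardening step for the parsing inside score_framework; `parse_score` is a hypothetical name, not part of the reference solution, and it makes the simplifying assumption that the first number in the reply is the score.

```python
import re
from typing import Optional

def parse_score(text: Optional[str], default: float = 0.5) -> float:
    """Extract the first number from a judge reply and map it to 0.0-1.0.

    Returns `default` when the reply is empty or contains no number.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", text or "")
    if match is None:
        return default
    value = float(match.group())
    # Clamp to the 0-10 scale before normalizing, so "15" cannot exceed 1.0.
    return min(max(value, 0.0), 10.0) / 10.0

for reply in ["8", "Score: 7.5", "9/10", "no idea", None, "15"]:
    print(repr(reply), "->", parse_score(reply))
```

Because it takes the first number it finds, a reply that restates the scale before the score ("on a 0-10 scale, 7") would be misread; a stricter prompt or structured output is the more reliable fix.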
Real-World Scenario
Framework Migration at a Series B Startup

Who: The founding engineer at a Series B market intelligence startup with a team of four developers.

Situation: The team built their initial product (3 agents producing market research reports) with CrewAI in two weeks. The prototype impressed investors and landed the first 50 customers.

Problem: At 500 customers, the product needed conditional workflows (different report types for different industries), persistent checkpointing (resume interrupted reports after timeouts), and custom error handling (graceful recovery when a data source was unavailable). CrewAI's high-level abstractions made each of these features difficult to implement without fighting the framework.

Decision: After two weeks of failed workarounds, the team committed to migrating from CrewAI to LangGraph. The graph-based approach handled conditional workflows through explicit state transitions, and built-in checkpointing enabled the resume feature natively.

Result: Migration took four weeks and doubled the total codebase, but the team gained full control over execution flow. Time to ship new report types dropped from two weeks to two days. No further framework limitations were encountered through the next 2,000 customers.

Lesson: Start with a high-level framework when speed to market matters, but plan for migration if you anticipate complex workflows; migrating early, while the codebase and customer base are small, usually costs less than migrating late.

Tip: Start with Two Agents, Not Five

Multi-agent systems add communication overhead and failure modes. Start with two agents (for example, a planner and an executor) and verify they coordinate reliably before adding more. Each additional agent multiplies debugging complexity.
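A two-agent starting point can be as small as the sketch below. Everything here is hypothetical scaffolding: `run_two_agents`, the stub planner, and the stub executors stand in for LLM-backed agents, and the trace format is just one way to make coordination failures visible before adding more agents.

```python
from typing import Callable

def run_two_agents(
    goal: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str], str],
    max_steps: int = 5,
) -> list[tuple[str, str]]:
    """Run planner then executor, returning a (step, outcome) trace."""
    trace: list[tuple[str, str]] = []
    for step in planner(goal)[:max_steps]:  # cap the plan length
        try:
            outcome = executor(step)
        except Exception as exc:
            # Record the failure and stop so the trace shows where it broke.
            trace.append((step, f"FAILED: {exc}"))
            break
        trace.append((step, outcome))
    return trace

def stub_planner(goal: str) -> list[str]:
    return [f"search {goal}", f"summarize {goal}"]

def stub_executor(step: str) -> str:
    return f"done: {step}"

def flaky_executor(step: str) -> str:
    if step.startswith("summarize"):
        raise RuntimeError("source unavailable")
    return f"done: {step}"

ok_trace = run_two_agents("GRPO", stub_planner, stub_executor)
bad_trace = run_two_agents("GRPO", stub_planner, flaky_executor)
print(ok_trace)
print(bad_trace)
```

With only two roles, every entry in the trace has exactly one possible author on each side, which is what keeps debugging tractable at this stage.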

Lab: Build the Same Agent in Three Frameworks

Objective

In this lab, you will implement an identical research agent in LangGraph, CrewAI, and the OpenAI Agents SDK. You will compare the developer experience, code complexity, execution traces, and output quality across frameworks.

Setup

Install LangGraph, CrewAI, and the OpenAI Agents SDK in three isolated environments to avoid dependency conflicts. You will also need an OpenAI or Anthropic API key, a web-search tool (Tavily or SerpAPI work well), and a budget cap to bound API spend across the 15 runs (3 frameworks x 5 queries). Pick a fixed model version (such as gpt-4o-2024-08-06) so the comparison is apples-to-apples.

Steps

  1. Implement a research agent that searches the web, evaluates sources, and writes a report.
  2. Run all three implementations on the same 5 research queries.
  3. Compare: lines of code, number of LLM calls, total tokens used, output quality (human rating 1 to 5).
  4. Document which framework you would choose for different use cases and why.

Expected Output

A side-by-side comparison table covering lines of code, LLM-call count, token usage, and 1-5 quality rating across the three frameworks. Expect LangGraph to require the most code with the cleanest observability, CrewAI to be terse but harder to debug, and the OpenAI Agents SDK to land in the middle on both axes. The deliverable is the table plus a one-paragraph framework-selection recommendation tied to your team's constraints.
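One way to assemble the comparison table is to log each run as a record and aggregate per framework. The sketch below assumes a hypothetical `Run` record and `summarize` helper; the numbers are placeholders chosen to make the arithmetic easy to follow, not measurements of any real framework.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Run:
    framework: str
    llm_calls: int
    tokens: int
    quality: int  # human rating, 1 to 5

def summarize(runs: list[Run], lines_of_code: dict[str, int]) -> list[dict]:
    """Average the per-run metrics for each framework."""
    rows = []
    for name in sorted({r.framework for r in runs}):
        mine = [r for r in runs if r.framework == name]
        rows.append({
            "framework": name,
            "loc": lines_of_code[name],
            "avg_calls": mean([r.llm_calls for r in mine]),
            "avg_tokens": mean([r.tokens for r in mine]),
            "avg_quality": mean([r.quality for r in mine]),
        })
    return rows

# Placeholder data: two runs each for two anonymous frameworks.
runs = [
    Run("framework_a", llm_calls=4, tokens=3000, quality=4),
    Run("framework_a", llm_calls=6, tokens=5000, quality=5),
    Run("framework_b", llm_calls=8, tokens=6000, quality=4),
    Run("framework_b", llm_calls=10, tokens=8000, quality=3),
]
table = summarize(runs, lines_of_code={"framework_a": 60, "framework_b": 25})

print(f"{'framework':<12}{'loc':>5}{'calls':>7}{'tokens':>9}{'quality':>9}")
for row in table:
    print(f"{row['framework']:<12}{row['loc']:>5}{row['avg_calls']:>7.1f}"
          f"{row['avg_tokens']:>9.0f}{row['avg_quality']:>9.1f}")
```

Keeping the raw per-run records, rather than only the averages, lets you check later whether one outlier query is driving a framework's token count.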

Key Takeaways
Self-Check
Q1: Name three major multi-agent frameworks available in 2026 and describe what distinguishes each.
Answer

Examples: (1) LangGraph uses a state-graph abstraction for fine-grained control over agent workflows. (2) CrewAI uses a role-based 'crew' metaphor for rapid prototyping of collaborative agents. (3) AutoGen focuses on conversation-based multi-agent interaction with human-in-the-loop support.

Q2: Why is it valuable to build the same agent in multiple frameworks before committing to one?
Answer

Different frameworks impose different abstractions and constraints. Building the same agent in multiple frameworks reveals which abstractions match your use case, which frameworks have better documentation and ecosystem support, and which ones create friction for your specific requirements.

Exercises

Exercise 28.1.1: Framework Comparison (Conceptual)

Compare LangGraph, CrewAI, and AutoGen on three dimensions: ease of setup, flexibility of agent topologies, and production readiness. Which framework would you choose for a quick prototype vs. a production system?

Answer Sketch

LangGraph: most flexible (arbitrary graph topologies), production-ready, steeper learning curve. CrewAI: easiest setup (role-based agents), good for team simulations, less flexible topology. AutoGen: strong multi-agent conversations, good for research, evolving production story. Quick prototype: CrewAI for its simplicity. Production: LangGraph for its explicit state management and observability hooks.

Exercise 28.1.2: Framework Selection Criteria (Conceptual)

A startup needs to build a customer support agent that escalates complex issues to specialists. List five criteria they should use to select an agent framework, and rank them by importance.

Answer Sketch

1. Production reliability (error handling, retries, observability). 2. Human-in-the-loop support (escalation, approval workflows). 3. State management (conversation history, customer context). 4. Integration ecosystem (CRM, ticketing, knowledge base connectors). 5. Team expertise (learning curve, documentation quality). Reliability and HITL support are most critical for customer-facing applications.

Exercise 28.1.3: LangGraph State Machine (Coding)

Build a simple two-agent LangGraph workflow where a 'researcher' agent gathers information and a 'writer' agent produces a summary. Use typed state to pass information between agents.

Answer Sketch

Define a TypedDict state with fields for query, research_results, and final_summary. Create two nodes (researcher, writer). The researcher populates research_results; the writer reads them and produces final_summary. Connect with graph.add_edge('researcher', 'writer'). Compile and invoke with the initial query.

Exercise 28.1.4: CrewAI Role Design (Coding)

Using CrewAI, define a crew of three agents (Researcher, Analyst, Reporter) that work together to produce a market analysis report. Specify each agent's role, goal, and backstory.

Answer Sketch

Each agent gets a role string, a goal describing its objective, and a backstory providing context. The Researcher searches for data, the Analyst identifies trends and insights, and the Reporter writes the final document. Tasks are defined with expected outputs and assigned to specific agents. The crew is configured with a sequential process.

Exercise 28.1.5: Framework Lock-in Risks (Discussion)

What are the risks of building a production agent system on a specific framework? How can you architect your system to minimize framework lock-in?

Answer Sketch

Risks: framework may become unmaintained, its API may change breaking your code, or it may not scale to your needs. Mitigation: separate business logic from framework-specific code using an adapter pattern. Define your own tool interface and agent interface; implement framework-specific adapters. Store state in your own database rather than relying on framework state management. This lets you swap frameworks without rewriting core logic.
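The adapter idea in this answer can be sketched with a structural interface. The names below (`AgentBackend`, `EchoBackend`, `ShoutBackend`, `ReportService`) are hypothetical; in a real system each backend class would wrap one framework's entry point, such as a compiled graph or a crew, behind the same `run` method.

```python
from typing import Protocol

class AgentBackend(Protocol):
    """The only surface the business logic is allowed to depend on."""
    def run(self, prompt: str) -> str: ...

class EchoBackend:
    # Stand-in for an adapter around one framework.
    def run(self, prompt: str) -> str:
        return f"[echo] {prompt}"

class ShoutBackend:
    # Stand-in for an adapter around a different framework.
    def run(self, prompt: str) -> str:
        return prompt.upper()

class ReportService:
    """Business logic: knows about reports, not about frameworks."""
    def __init__(self, backend: AgentBackend, store: dict[str, str]) -> None:
        self.backend = backend
        self.store = store  # state lives in our own store, not the framework's

    def generate(self, report_id: str, topic: str) -> str:
        text = self.backend.run(f"Write a brief on {topic}")
        self.store[report_id] = text
        return text

store: dict[str, str] = {}
first = ReportService(EchoBackend(), store).generate("r1", "GRPO")
# Swapping frameworks means swapping the adapter; ReportService is unchanged.
second = ReportService(ShoutBackend(), store).generate("r2", "GRPO")
print(first)
print(second)
```

The cost of this pattern is that framework-specific features (streaming, checkpoints, tracing) must either be added to the interface or given up, so the interface tends to grow as the product matures.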

What Comes Next

In the next section, Architecture Patterns, we examine the core topologies for organizing multi-agent systems: hierarchical, flat, and hybrid patterns with their respective tradeoffs.

Further Reading
Li, G., Hammoud, H., Itani, H., et al. (2023). "CAMEL: Communicative Agents for 'Mind' Exploration of Large Language Model Society." NeurIPS 2023. Introduces role-playing communication between agents, one of the earliest multi-agent frameworks and a precursor to modern multi-agent systems.
Hong, S., Zhuge, M., Chen, J., et al. (2024). "MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework." ICLR 2024. Introduces structured SOPs (Standard Operating Procedures) for multi-agent software development, demonstrating how role specialization and communication protocols improve collaboration.
Wu, Q., Bansal, G., Zhang, J., et al. (2023). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv preprint. Microsoft's framework for building multi-agent conversation systems with customizable agent types, conversation patterns, and human-in-the-loop capabilities.
Guo, T., Chen, X., Wang, Y., et al. (2024). "Large Language Model based Multi-Agents: A Survey of Progress and Challenges." arXiv preprint. Comprehensive survey of multi-agent systems covering framework architectures, communication mechanisms, and application domains, useful for framework comparison.
LangChain (2024). "LangGraph: Build Stateful Multi-Actor Applications." LangGraph Documentation. Official documentation for LangGraph, a graph-based framework for building stateful, multi-step agent workflows with persistence and human-in-the-loop support.
CrewAI (2024). "CrewAI Documentation." docs.crewai.com. Documentation for CrewAI, a role-based multi-agent framework that organizes agents into crews with defined tasks, tools, and delegation patterns.