
"An agent is a loop with permission to act. Most of the work is defining the loop. The rest is defining the permission."
Pip, Stack-Building AI Agent
Chapters 26 through 29 designed agents. This chapter surveys the frameworks that host them: LangGraph, OpenAI Agents SDK, Claude Code, Mastra, smolagents, and the open question of which abstraction layer is worth its complexity.
Part VI built agents: planning, tool use, multi-agent coordination, memory, and the protocols (MCP, A2A, AG-UI) that let agents talk to tools, to each other, and to users. This chapter is the toolbox: the orchestration frameworks (LangGraph, CrewAI, AutoGen, smolagents, PydanticAI), the new agent-native protocols, and the eval harnesses (SWE-bench, WebArena, SWE-Lancer) that measure whether an agent actually finishes the task.
Chapter Overview
Part VI introduced agents: the LLM-powered programs that can use tools, plan, and act with side effects. This chapter consolidates the agent toolchain: the agent-native platforms (Model Context Protocol, Agent Protocol, runtimes) that did not exist as standardized infrastructure before 2024; the libraries (LangChain, LangGraph, OpenAI Assistants, AutoGen, CrewAI, AG2); the benchmarks (AgentBench, SWE-bench, GAIA, tau-bench, WebArena) that test end-to-end task completion; the models that hold up under agentic load; and the rapidly evolving literature.
Agent tooling is the youngest stack in this book. By 2026 it has converged enough to ship products but is still moving fast enough to obsolete a year-old runbook. This chapter is the index of what stuck.
- Compare agent runtimes (LangGraph, AutoGen, CrewAI, AG2) on routing, memory, and human-in-the-loop support.
- Configure an agent stack with Model Context Protocol and Agent Protocol for tool interoperability.
- Choose an agent benchmark (AgentBench, SWE-bench, GAIA, tau-bench, WebArena) for a target capability.
- Identify the models (Claude, GPT, Gemini, open-weight) that hold up under multi-step tool-use load.
- Track the agent literature through the venues, blogs, and communities that maintain it.
If you want one stack that scales from a single-agent toy to a multi-agent system:
pip install langgraph anthropic
LangGraph is the graph-based agent runtime from the LangChain team; combined with a frontier LLM it covers ~80 percent of the patterns in Part VI. For function-calling-heavy structured agents, swap in PydanticAI.
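Before installing anything, it can help to see the shape that every runtime in this chapter wraps. The sketch below is plain Python with no dependencies; it is not LangGraph's API. The tool names, the `plan` stub (standing in for an LLM call), and the `allowed` set are all hypothetical, chosen only to show a loop that plans, checks permission, and acts.

```python
from typing import Callable, Dict, List, Optional, Set, TypedDict


class State(TypedDict):
    pending: List[str]  # tool names the stubbed planner still wants to call
    log: List[str]      # record of what the runtime did


# Hypothetical tools; a real agent would call APIs or a shell here.
TOOLS: Dict[str, Callable[[], str]] = {
    "read_file": lambda: "contents",
    "delete_file": lambda: "deleted",
}


def plan(state: State) -> Optional[str]:
    """Stand-in for the LLM call: pick the next tool, or None to stop."""
    return state["pending"][0] if state["pending"] else None


def run_agent(state: State, allowed: Set[str], max_steps: int = 10) -> State:
    """The loop: plan, check permission, act, until done or out of budget."""
    for _ in range(max_steps):
        tool = plan(state)
        if tool is None:
            break
        state["pending"] = state["pending"][1:]
        if tool not in allowed:
            # The permission half: refuse and record, never act.
            state["log"].append(f"denied:{tool}")
            continue
        state["log"].append(f"{tool}:{TOOLS[tool]()}")
    return state


final = run_agent(
    {"pending": ["read_file", "delete_file"], "log": []},
    allowed={"read_file"},
)
print(final["log"])  # ['read_file:contents', 'denied:delete_file']
```

A graph-based runtime such as LangGraph expresses the same idea as nodes and edges over a shared state, and adds the parts this sketch leaves out, such as persistence and human-in-the-loop pauses.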
Prerequisites
- Agent foundations from Chapter 26
- LLM tooling stack from Chapter 14
- Python and shell comfort for hands-on framework comparisons
Sections in This Chapter
- 30.1 Platforms. Agentic systems run on top of LLM API platforms (Chapter 16) plus a new layer: agent-native protocols and runtimes that did not exist as standardized infrastructure before 2024.
- 30.2 Agent Libraries: LangChain & Framework Deep Dive. Agent library landscape, LangChain Agents (Legacy), and a deep dive into modern agent frameworks (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Semantic Kernel, smolagents, PydanticAI).
- 30.3 Multi-Agent Patterns & Topologies. The four multi-agent topologies in production (hierarchical, peer / debate, pipeline, competitive) with failure modes and canonical frameworks for each.
- 30.4 Datasets & Benchmarks. Agent benchmarks have to test something that traditional NLP benchmarks do not: whether the system can complete an actual multi-step task with side effects.
- 30.5 Models. Agentic workloads put unusual pressure on models.
- 30.6 External Reading & Communities. Agentic literature is the youngest in this book.
What Comes Next
Next: Chapter 31: Embeddings, Vector Databases & Semantic Search, opening Part VII. The agents you just learned to build are limited by what is in their training data and what is in the prompt. Part VII fixes both problems: embedding-based retrieval gives the agent a queryable memory across millions of documents, and structured extraction lets it pull facts out of unstructured text. The shift is from "the model knows everything at training time" to "the system learns at runtime".