Section 28.2: Architecture Patterns

"One agent solves a problem. Two agents solve it differently. Three agents schedule a meeting to discuss the problem."
Census, Hierarchically Orchestrated AI Agent

Big Picture

The topology of your multi-agent system determines its reliability, latency, and failure modes. Six architecture patterns have emerged as practical for production: supervisor, pipeline, mesh, swarm, hierarchical, and debate. Each pattern defines how agents are organized, how work flows between them, and how decisions are made. Choosing the wrong pattern leads to cascading failures, runaway costs, or agents that talk past each other. This section covers each pattern with concrete examples, tradeoff analysis, and guidance on when to use (and avoid) each one. The agent loop from Chapter 26 is the building block; this section shows how to compose multiple loops into a coherent system.

Prerequisites

This section builds on tool use and protocols from Chapter 27 and agent foundations from Chapter 26.


Figure 28.2.1: Three foundational multi-agent topologies. Hub-and-Spoke gives a single supervisor full control. Pipeline gives clean sequential handoff with no back-talk. Fully Connected gives maximum flexibility at the cost of message volume that scales as O(N²). Most production systems are hybrids: a supervisor handing structured subgoals to a pipeline of specialists.

28.2.1 Foundational Patterns

Fun Fact

Conway's Law ("organizations design systems that mirror their communication structure") comes from a paper that Mel Conway submitted to the Harvard Business Review, which rejected it. The paper found a home in Datamation magazine in 1968; Fred Brooks cited it in The Mythical Man-Month (1975) and made it famous. Multi-agent teams in 2026 routinely re-derive Conway's Law every time they split an agent that should have been kept whole.

Key Insight

Why: When multi-agent decomposition actually beats a single agent

Multi-agent decomposition only beats a single agent when the bandwidth between sub-problems is lower than the capacity cost of holding both in context. If sub-agents need to share most of their reasoning, the inter-agent message-passing serializes what a single agent would do in parallel attention, and you lose. This is the Conway's-law cousin of microservices: distributing a system pays off only when interfaces are narrower than internals. Multi-agent debate, planner/executor splits, and tool-specialized agents are all instances of this; LLM-routing-to-LLM (where each agent solves a similar but slightly different problem) usually is not.

Topology decides how agents are organized, how work flows between them, and how decisions land. The pick drives reliability, latency, cost, and the class of tasks the system can even attempt. Six patterns dominate production work: supervisor, pipeline, mesh, swarm, hierarchical, and debate.

The supervisor pattern places a single orchestrating agent in charge. The supervisor receives tasks, decides which specialist agent should handle each subtask, routes work accordingly, and synthesizes results. This is the most common pattern in production because it provides a clear control point for monitoring, cost management, and error handling. The supervisor's routing logic can be as simple as a classification prompt or as complex as a planning agent with its own tool set.

The pipeline pattern arranges agents in a linear sequence where each agent transforms the output of the previous one. A content generation pipeline might flow through Research Agent, Outline Agent, Draft Agent, Edit Agent, and Fact-Check Agent. Pipelines are simple to understand and debug because the data flow is predictable. They work well when the task naturally decomposes into sequential stages with well-defined inputs and outputs.
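The sequential handoff described above can be sketched as ordered function composition. In this minimal sketch each stage is a plain Python callable standing in for an LLM-backed agent; the stage functions and the `run_pipeline` helper are illustrative assumptions, not part of any framework.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(stages: List[Stage], payload: str) -> Tuple[str, List[str]]:
    """Pass the payload through each stage in order and record a trace."""
    trace = []
    for name, stage in stages:
        payload = stage(payload)  # output of one stage is input to the next
        trace.append(name)
    return payload, trace


# Hypothetical stand-ins for LLM-backed agents.
def research(topic: str) -> str:
    return f"notes({topic})"


def outline(notes: str) -> str:
    return f"outline({notes})"


def draft(plan: str) -> str:
    return f"draft({plan})"


stages = [("research", research), ("outline", outline), ("draft", draft)]
output, trace = run_pipeline(stages, "agent topologies")
print(output)
print(trace)
```

Because the trace records which stage ran and in what order, a failure can be localized to a single stage, which is the debuggability benefit the paragraph describes.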

Key Insight

Most production multi-agent systems use the supervisor pattern as the top level, with pipelines or meshes within specific subtask flows. A customer service system might have a supervisor that routes tickets to specialist agents, where each specialist runs a pipeline (understand, retrieve, draft, review) internally. This layered approach combines the routing intelligence of a supervisor with the structured execution of a pipeline.

Key Insight

Multi-agent coordination is fundamentally harder than single-agent design, and the difficulty grows non-linearly. A single agent has one context window, one reasoning chain, and one execution path. Adding a second agent introduces communication (what information to share and in what format), synchronization (when to wait for results vs. proceed independently), and conflict resolution (what happens when agents disagree). With N agents, potential interaction patterns grow combinatorially. This is not just an engineering inconvenience; it mirrors the fundamental coordination costs in distributed systems and human organizations. Brooks's Law ("adding manpower to a late software project makes it later") applies to agents too. Start with the fewest agents possible and add more only when a single agent demonstrably cannot handle the task's diversity or throughput requirements. The rest of this section catalogues the patterns that manage these coordination challenges.

Key Insight

The coordination challenges of multi-agent systems are a computational manifestation of problems studied in organizational theory and microeconomics. Ronald Coase's theory of the firm (1937) asks why organizations exist at all rather than relying on market transactions between independent actors. His answer, transaction costs, applies directly to multi-agent systems: the overhead of coordinating separate agents (communication, context sharing, conflict resolution) is the "transaction cost" of the multi-agent approach. When these costs exceed the benefits of specialization, a single monolithic agent is more efficient, just as a firm integrates operations rather than outsourcing them. Conversely, when tasks are highly modular with clean interfaces, the multi-agent approach wins for the same reason that markets outperform central planning in economics: distributed agents can specialize and operate in parallel. The supervisor pattern is essentially a management hierarchy; the mesh pattern is a market; and the debate pattern is an adversarial proceeding. Each topology trades off coordination costs against the benefits of specialization differently.

Library Shortcut: autogen v0.4 (event-driven multi-agent)

Microsoft AutoGen v0.4 (the autogen-agentchat package, previewed in late 2024 and released as stable in January 2025) rewrites the framework around an asynchronous, event-driven core. Agents are AssistantAgent objects that exchange messages, and teams like RoundRobinGroupChat or SelectorGroupChat orchestrate turn-taking with a termination condition. The result is a cleaner mental model than v0.2 plus first-class streaming and human-in-the-loop support.


# Install first: pip install -U autogen-agentchat "autogen-ext[openai]"
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    # The client reads OPENAI_API_KEY from the environment.
    client = OpenAIChatCompletionClient(model="gpt-4o")
    planner = AssistantAgent("planner", model_client=client, system_message="Plan steps.")
    solver = AssistantAgent("solver", model_client=client, system_message="Execute steps.")
    # The two agents alternate turns until six messages have been produced.
    team = RoundRobinGroupChat([planner, solver], termination_condition=MaxMessageTermination(6))
    result = await team.run(task="Estimate the energy cost of training Llama-3 70B.")
    print(result.messages[-1].content)  # last message in the transcript


asyncio.run(main())

Code Fragment 28.2.1a: AutoGen v0.4 round-robin team driven by an asyncio event loop.

28.2.2 Advanced Topologies

The mesh pattern connects agents in a peer-to-peer network where any agent can communicate with any other. This is the most flexible topology but also the hardest to debug and monitor. Mesh patterns emerge naturally in systems where agents need to negotiate or share information bidirectionally. The A2A protocol discussed in Section 27.3 provides a standard for mesh communication.
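One way to see why a mesh is the hardest topology to monitor is to count the channels that have to be traced. The sketch below counts directed links for three topologies; the `directed_links` helper is hypothetical, and it assumes that N includes the supervisor in the hub-and-spoke case and that every mesh pair can talk in both directions.

```python
def directed_links(topology: str, n: int) -> int:
    """Count directed communication links among n agents."""
    if n < 1:
        raise ValueError("need at least one agent")
    if topology == "pipeline":
        return n - 1          # each stage feeds only the next one
    if topology == "supervisor":
        return 2 * (n - 1)    # hub sends to and hears from each worker
    if topology == "mesh":
        return n * (n - 1)    # every ordered pair of distinct agents
    raise ValueError(f"unknown topology: {topology}")


for n in (3, 5, 10):
    counts = {t: directed_links(t, n) for t in ("pipeline", "supervisor", "mesh")}
    print(n, counts)
```

Pipeline and supervisor link counts grow linearly with the number of agents, while the mesh count grows quadratically, which matches the O(N²) message volume noted for the fully connected topology.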

The swarm pattern (popularized by OpenAI's Swarm framework) uses lightweight agents that hand off tasks to each other through a simple transfer mechanism. Each agent in the swarm has a focused role and a set of handoff targets. When an agent determines that the task needs a different capability, it transfers the conversation to the appropriate agent. This pattern is particularly effective for customer service scenarios where different query types require different specialist knowledge.
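The transfer mechanism can be illustrated in plain Python without any framework. In this minimal sketch (it does not use the Swarm library's API), each agent returns either a reply or the name of a handoff target; the `Turn` type, the agent functions, and the keyword-based triage are hypothetical stand-ins for LLM decisions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Turn:
    reply: Optional[str] = None    # final answer, if this agent handles the query
    handoff: Optional[str] = None  # name of the agent to transfer to


def triage(msg: str) -> Turn:
    text = msg.lower()
    if "refund" in text:
        return Turn(handoff="billing")
    if "error" in text:
        return Turn(handoff="tech_support")
    return Turn(reply="Could you tell me more about the issue?")


def billing(msg: str) -> Turn:
    return Turn(reply="Billing: refund request logged.")


def tech_support(msg: str) -> Turn:
    return Turn(reply="Tech support: please send the error log.")


AGENTS: Dict[str, Callable[[str], Turn]] = {
    "triage": triage,
    "billing": billing,
    "tech_support": tech_support,
}


def run_swarm(msg: str, start: str = "triage", max_hops: int = 5) -> Tuple[str, List[str]]:
    """Follow handoffs until some agent replies, with a hop limit as a guard."""
    current, path = start, [start]
    for _ in range(max_hops):
        turn = AGENTS[current](msg)
        if turn.handoff is None:
            return turn.reply, path
        current = turn.handoff
        path.append(current)
    raise RuntimeError("handoff limit reached without a reply")


print(run_swarm("I want a refund"))
```

The hop limit is a design choice of this sketch: without it, two agents that keep transferring to each other would loop indefinitely.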

The debate pattern assigns agents opposing roles and has them argue toward a conclusion. One agent advocates for a position, another challenges it, and a judge synthesizes the best arguments into a final answer. Research shows that debate produces higher-quality outputs for tasks involving judgment, evaluation, and analysis. The key risk is sycophantic convergence, where agents agree with each other too readily rather than maintaining genuinely opposing perspectives.
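The advocate, challenger, and judge roles can be sketched as a short loop over a shared transcript. The callables below are hypothetical stubs standing in for LLM calls, and the round count is an arbitrary assumption; the sketch shows only the control flow and does nothing to prevent sycophantic convergence.

```python
from typing import Callable, List, Tuple

Debater = Callable[[str, List[str]], str]


def run_debate(
    question: str,
    advocate: Debater,
    challenger: Debater,
    judge: Debater,
    rounds: int = 2,
) -> Tuple[str, List[str]]:
    """Alternate advocate and challenger turns, then let the judge decide."""
    transcript: List[str] = []
    for r in range(1, rounds + 1):
        # Each debater sees the question and everything said so far.
        transcript.append(f"advocate[{r}]: " + advocate(question, transcript))
        transcript.append(f"challenger[{r}]: " + challenger(question, transcript))
    verdict = judge(question, transcript)
    return verdict, transcript


# Hypothetical stubs; a real system would prompt a model in each role.
def advocate(question: str, history: List[str]) -> str:
    return f"point {len(history) // 2 + 1} in favor"


def challenger(question: str, history: List[str]) -> str:
    return f"point {len(history) // 2 + 1} against"


def judge(question: str, history: List[str]) -> str:
    return f"verdict after {len(history)} arguments"


verdict, transcript = run_debate("Is the liability cap enforceable?", advocate, challenger, judge)
print(verdict)
```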

Figure 28.2.2: Three advanced multi-agent topologies that extend the foundational set. Swarm uses directed handoffs (good when categories are known in advance). Debate forces structured disagreement followed by judgement (good for high-stakes evaluation). Hierarchical stacks supervisors recursively so each layer reasons at its own abstraction level (good when one supervisor would overflow its context budget).

Production Pattern

Production Example: Multi-Agent Topologies in Real Products

Each topology has named production deployments or reference implementations. Swarm: OpenAI released the Swarm framework in October 2024 as an experimental, educational reference for handoff-based customer-service routing; several Zendesk and Intercom partners use it for triage. Hierarchical (orchestrator-worker): Anthropic's "Claude as a research lead" pattern, published in their multi-agent research system post (June 2025), uses a lead Claude that spawns subagents for parallel literature searches, file reads, and writeup; it now powers Claude Research and the Computer Use research-mode pipeline. Debate: Constitutional AI training at Anthropic and the multiagent debate ("society of minds") paper by Du et al. (2023), from MIT and Google Brain, both use it; in production it shows up in LMSYS Chatbot Arena's pairwise grader and in adversarial red-team pipelines.

Algorithm 28.2.1

The supervisor (hub-and-spoke) pattern as a multi-round dispatch loop. At each round the LLM-router consumes the task and the running result list, decides which specialist agent to call (or signals DONE), executes that agent, and appends its output. The loop is bounded by R rounds to prevent runaway costs.

Input: task T, specialist agents {A1, ..., An} with descriptions, LLM M, max rounds R
Output: synthesized result
1. Initialize results = []
2. for round = 1 to R:
  a. route = M("Given task T and results so far, select next agent or DONE")
  b. if route == DONE: break
  c. agent = lookup(route, {A1, ..., An})
  d. subtask = M("Extract the subtask for agent from T and context")
  e. result = agent.execute(subtask)
  f. results.append((agent, result))
3. final = M("Synthesize results into final answer for T")
4. return final
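Algorithm 28.2.1 translates almost line for line into Python. The sketch below is framework-free: the three LLM calls (route, extract, synthesize) are passed in as callables, and the stubs at the bottom are hypothetical stand-ins so the loop can run without a model. Skipping an unknown route rather than failing is an assumption of this sketch, not part of the algorithm.

```python
from typing import Callable, Dict, List, Tuple

Results = List[Tuple[str, str]]


def supervise(
    task: str,
    agents: Dict[str, Callable[[str], str]],
    route: Callable[[str, Results], str],
    extract: Callable[[str, str, Results], str],
    synthesize: Callable[[str, Results], str],
    max_rounds: int = 5,
) -> str:
    """Multi-round dispatch loop bounded by max_rounds (R in the algorithm)."""
    results: Results = []
    for _ in range(max_rounds):
        choice = route(task, results)              # step 2a
        if choice == "DONE":                       # step 2b
            break
        if choice not in agents:                   # guard: unknown route
            results.append(("supervisor", f"unknown agent {choice!r}"))
            continue
        subtask = extract(task, choice, results)   # step 2d
        results.append((choice, agents[choice](subtask)))  # steps 2e-2f
    return synthesize(task, results)               # step 3


# Hypothetical stubs standing in for specialists and for the LLM M.
agents = {
    "research": lambda sub: f"facts for '{sub}'",
    "writing": lambda sub: f"summary of '{sub}'",
}


def route(task: str, results: Results) -> str:
    done = {name for name, _ in results}
    for name in ("research", "writing"):
        if name not in done:
            return name
    return "DONE"


def extract(task: str, agent_name: str, results: Results) -> str:
    return f"{agent_name} part of {task}"


def synthesize(task: str, results: Results) -> str:
    return " | ".join(output for _, output in results)


final = supervise("report on topologies", agents, route, extract, synthesize)
print(final)
```

Even if the router never returns DONE, the loop stops after `max_rounds` dispatches, which is the cost bound the algorithm description calls for.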

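# Supervisor pattern with LangGraph
from typing import TypedDict

from langgraph.graph import StateGraph, END

# Assumed to be defined elsewhere: `llm` (a chat model with .invoke) and the
# node functions research_agent, coding_agent, writing_agent, analysis_agent,
# and synthesize_results, each taking the state and returning a partial update.
SPECIALISTS = ["research", "coding", "writing", "analysis"]


class AgentState(TypedDict, total=False):
    # Minimal schema assumed for this example; extend it with whatever
    # fields the specialist nodes write.
    task: str
    next_agent: str
    result: str


def supervisor(state):
    """Route the task to the appropriate specialist."""
    response = llm.invoke(
        f"Classify this task and route to the best specialist:\n"
        f"Task: {state['task']}\n"
        f"Available specialists: {', '.join(SPECIALISTS)}\n"
        f"Respond with just the specialist name."
    )
    choice = response.content.strip().lower()
    # Fall back to a default (an assumption of this sketch) so that an
    # unexpected label does not break the conditional edge lookup.
    return {"next_agent": choice if choice in SPECIALISTS else "research"}


def route(state):
    return state["next_agent"]


graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor)
graph.add_node("research", research_agent)
graph.add_node("coding", coding_agent)
graph.add_node("writing", writing_agent)
graph.add_node("analysis", analysis_agent)
graph.add_node("synthesize", synthesize_results)
graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route, {name: name for name in SPECIALISTS})
for agent in SPECIALISTS:
    graph.add_edge(agent, "synthesize")
graph.add_edge("synthesize", END)  # added once, outside the loop
app = graph.compile()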

Code Fragment 28.2.2a: Supervisor pattern with LangGraph

Library Shortcut: CrewAI in Practice

The same supervisor pattern in about 20 lines with CrewAI (pip install crewai):


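from crewai import Agent, Task, Crew, Process

# Agent expects role, goal, and backstory; the backstories below are placeholders.
supervisor = Agent(role="Supervisor", goal="Route tasks to specialists",
                   backstory="Coordinates a team of specialists.", allow_delegation=True)
researcher = Agent(role="Research Analyst", goal="Find information",
                   backstory="Digs up and verifies sources.")
coder = Agent(role="Software Engineer", goal="Write code solutions",
              backstory="Writes and tests small programs.")
writer = Agent(role="Technical Writer", goal="Produce clear reports",
               backstory="Turns findings into readable prose.")
# In the hierarchical process the manager assigns the task, so no agent is set here.
task = Task(
    description="Research quantum computing advances and write a summary",
    expected_output="A researched summary report",
)
crew = Crew(
    agents=[researcher, coder, writer],  # the manager is passed separately
    tasks=[task],
    process=Process.hierarchical,
    manager_agent=supervisor,
)
result = crew.kickoff()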

Code Fragment 28.2.3: Minimal working example using CrewAI.

28.2.3 Pattern Selection Criteria

Selecting the right architecture pattern depends on several factors. Task decomposability: can the task be broken into independent subtasks (favors supervisor or mesh) or sequential stages (favors pipeline)? Quality requirements: does the output need adversarial review (favors debate) or is a single pass sufficient? Latency budget: sequential patterns add latency proportional to the number of stages; parallel patterns trade latency for cost. Debuggability: simpler topologies (pipeline, supervisor) are easier to trace and debug than meshes or swarms.

Key Insight

Why decentralized topologies win: the joint-action-space argument

The multi-agent reinforcement learning literature has a clean quantitative argument for why LLM multi-agent systems almost always use decentralized topologies rather than a single super-supervisor that picks every agent's action. The MARL loop is: each of $N$ agents emits an action from its local action space $A$, a joint-action operator combines them into a single environment input, and the environment returns per-agent rewards and observations. A centralized controller that picks the joint action directly must search the product space $|A|^N$. With three agents and six actions each, that is $6^3 = 216$ joint actions per step; with 10 agents it is $6^{10} \approx 6 \times 10^7$. Decentralizing into $N$ independent agents that each search their own $|A|$-sized action space replaces the exponential with a linear cost, at the price of having to learn (or prompt) coordination without a global controller. The same combinatorial-explosion argument explains why supervisor + worker LLM topologies dominate flat "council of agents" designs: a supervisor that has to enumerate every possible $(\text{worker}_1\text{ action}, \ldots, \text{worker}_N\text{ action})$ tuple in its context cannot scale past three or four workers, while a supervisor that just routes one subtask at a time per worker scales linearly. References: Stone, P. and Veloso, M. (2000). "Multiagent Systems: A Survey from a Machine Learning Perspective." Autonomous Robots; Albrecht, S., Christianos, F., and Schäfer, L. (2024). Multi-Agent Reinforcement Learning: Foundations and Modern Approaches, MIT Press.
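The arithmetic in this argument is easy to check directly. The short sketch below computes the size of the centralized joint-action space and the total number of per-agent choices under decentralization; the function names are illustrative.

```python
def centralized_joint_actions(actions_per_agent: int, n_agents: int) -> int:
    """Size of the product space |A|^N a single controller must search."""
    return actions_per_agent ** n_agents


def decentralized_choices(actions_per_agent: int, n_agents: int) -> int:
    """Total options considered when each agent searches only its own |A|."""
    return actions_per_agent * n_agents


for n in (3, 10):
    print(
        f"N={n}: centralized={centralized_joint_actions(6, n):,} "
        f"decentralized={decentralized_choices(6, n)}"
    )
```

With six actions per agent, three agents give 216 joint actions against 18 decentralized choices, and ten agents give 60,466,176 against 60.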

Table 28.2.2b: MARL design dimensions mapped to LLM multi-agent design choices. The MARL community's six classification axes are useful even for cooperative LLM systems because they expose decisions that are often left implicit.

MARL dimension	Question for an LLM team	Typical LLM choice	Concrete example
Size	How many agents and how big is each action space?	Small (2-7); per-agent action = one tool call	Supervisor + 4 specialist workers (research, draft, edit, fact-check)
Knowledge	What does each agent know about the others' goals and tools?	Full common knowledge (shared system prompt)	Each agent sees the team objective and the full tool catalogue
Observability	Does every agent see the full conversation history?	Partial (workers see only their handoff message)	Pipeline stages each receive the prior stage's output, not the full thread
Rewards	Cooperative (shared reward), competitive (zero-sum), or mixed?	Cooperative for supervisor/pipeline; mixed for debate	Autonomous-driving-style mixed motive: shared "avoid collisions" + per-agent "drive efficiently"
Objective	Optimize task success, time, cost, or all three?	Weighted sum, with cost as a hard cap	"Best answer within $0.50 of API spend per query"
Centralization	One coordinator or fully decentralized handoffs?	Centralized for simple flows; hierarchical when one supervisor overflows	Anthropic's research-lead pattern: one Claude spawns 4 subagents in parallel

Table 28.2.1b: Pattern selection cheat sheet. Rank-ordering shows the patterns most often defaulted to per criterion; treat as a starting heuristic, not a hard rule.

Criterion	Supervisor	Pipeline	Mesh	Swarm	Debate	Hierarchical
Task structure	Branching	Linear	Negotiation	Categorical	Judgement	Nested
Latency	Low (parallel)	Medium (sum of stages)	High	Low	High (multi-round)	Medium
Cost	Low	Low	High (O(N²) msgs)	Low	High (2-3x)	Medium
Debug effort	Easy	Easiest	Hard	Easy	Medium	Hard
Quality ceiling	Medium	Medium	High	Medium	High	High
Default first choice?	Yes	Yes if linear	No	If categories fixed	Only for eval/audit	Only when supervisor overflows

Real-World Scenario

Choosing Between Patterns for a Legal Document Review System

Who: A lead architect at a legal technology company building an AI-powered contract review product for mid-market law firms.

Situation: The product needed to review commercial contracts (NDAs, MSAs, SOWs) and flag problematic clauses. The team prototyped three architecture patterns and had budget to build only one for the v1 launch.

Problem: A pipeline approach (extract, classify, summarize, report) was simple but missed cross-clause interactions (e.g., an indemnification clause that contradicted a liability cap elsewhere in the same document). A pure debate approach caught more issues but doubled the LLM cost per contract.

Decision: The team chose a supervisor pattern with a selective debate sub-pattern. The supervisor routed each contract section to specialized agents (clause extraction, risk assessment, precedent search, summary generation). Clauses flagged as high-risk by the risk agent went through a secondary debate process where two agents reviewed independently and a judge reconciled discrepancies. Routine clauses were reviewed only once.

Result: The hybrid approach caught 91% of issues identified by the full-debate system at 60% of the cost. High-risk clause accuracy (the metric law firms cared about most) matched the debate pattern at 94%.

Lesson: Applying expensive patterns selectively to high-risk components, rather than uniformly across all inputs, captures most of the quality benefit at a fraction of the cost.
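The selective-debate decision can be sketched as a risk gate in front of two review paths. Everything below is hypothetical: the keyword-based risk score, the 0.7 threshold, and the reviewer stubs stand in for the team's agents, which the scenario does not specify in detail.

```python
from typing import Callable, Dict, List


def review_clauses(
    clauses: List[str],
    risk_score: Callable[[str], float],
    single_pass: Callable[[str], str],
    debate: Callable[[str], str],
    threshold: float = 0.7,
) -> List[Dict[str, str]]:
    """Send high-risk clauses through debate and the rest through one review."""
    report = []
    for clause in clauses:
        risky = risk_score(clause) >= threshold
        reviewer = debate if risky else single_pass
        report.append({
            "clause": clause,
            "path": "debate" if risky else "single",
            "finding": reviewer(clause),
        })
    return report


# Hypothetical stubs for the risk agent and the two review paths.
def risk_score(clause: str) -> float:
    return 0.9 if "indemnif" in clause.lower() else 0.1


def single_pass(clause: str) -> str:
    return "no issues found"


def debate(clause: str) -> str:
    return "flag: check against liability cap"


clauses = ["Governing law: Delaware.", "Indemnification is unlimited."]
report = review_clauses(clauses, risk_score, single_pass, debate)
for row in report:
    print(row["path"], "-", row["finding"])
```

Only the clauses above the threshold pay for the extra debate calls, which is how the hybrid keeps cost below that of the full-debate system.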

Warning

Common Misconception: More Agents Means Better Results

The multi-agent literature often showcases impressive results on complex benchmarks, which can create the impression that multi-agent systems are inherently superior to single-agent approaches. In practice, a single well-prompted agent with good tools often outperforms a multi-agent system for straightforward tasks, and at a fraction of the cost and latency. Each additional agent adds an LLM call (latency and cost), a communication overhead (context that must be passed between agents), and a new failure point (what if one agent misinterprets another's output?). Add agents only when you can clearly articulate what each agent contributes that the others cannot. The simplest architecture that meets your requirements is the best architecture. Start with one agent and scale to multiple only when you hit clear limitations.

Tip: Define Clear Agent Handoff Protocols

Specify exactly what information passes between agents at each handoff: structured messages with required fields, not free-text. Ambiguous handoffs cause the receiving agent to misunderstand context, which cascades into downstream failures.
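A structured handoff can be as simple as a validated dataclass serialized to JSON. The field names below (`sender`, `recipient`, `task_id`, `instruction`, `context`) are assumptions chosen for illustration; the point is that a missing or empty required field fails at the handoff boundary instead of confusing the receiving agent.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Handoff:
    sender: str
    recipient: str
    task_id: str
    instruction: str
    context: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject empty or non-string required fields before the message is sent.
        for name in ("sender", "recipient", "task_id", "instruction"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"handoff field {name!r} must be a non-empty string")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Handoff":
        # A missing required key raises TypeError here, at the boundary.
        return cls(**json.loads(raw))


msg = Handoff(
    sender="research",
    recipient="draft",
    task_id="t-001",
    instruction="Draft a summary from the attached notes.",
    context={"notes": ["point A", "point B"]},
)
wire = msg.to_json()
received = Handoff.from_json(wire)
print(received.recipient, received.context["notes"])
```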

Key Takeaways

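The three foundational patterns are supervisor (hub-and-spoke), pipeline, and mesh (fully connected), each with distinct tradeoffs; swarm, debate, and hierarchical extend them.
Supervisor pattern offers centralized control but creates a single point of failure.
Pipeline offers predictable sequential handoff that is easy to debug, but latency is the sum of its stages.
Mesh (peer-to-peer) enables flexible collaboration, but message volume scales as O(N²) and it is the hardest topology to debug and monitor.
Hierarchical patterns scale to complex organizations but add communication overhead between levels.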

Self-Check

Q1: What are the three foundational multi-agent architecture patterns, and when would you use each?

Show Answer

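Supervisor, or hub-and-spoke (one agent delegates to specialized workers and synthesizes their results; use when you need a central control point for routing, monitoring, and cost management), pipeline (agents run in a fixed sequence, each transforming the previous output; use when the task decomposes into sequential stages with well-defined inputs and outputs), and mesh, or fully connected (any agent can message any other; use when agents must negotiate or share information bidirectionally and you can accept O(N²) message volume).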

Q2: What is the main risk of the supervisor pattern, and how can it be mitigated?

Show Answer

The supervisor is a single point of failure and a bottleneck. If it misroutes tasks or fails, the entire system stops. Mitigation includes adding health checks, fallback supervisors, and allowing workers to escalate directly to a backup coordinator.

Exercises

Exercise 28.2.1: Pattern Identification (Conceptual)

Match each scenario to the best multi-agent architecture pattern (supervisor, pipeline, debate, or swarm): (a) content moderation with multiple criteria, (b) sequential document processing, (c) open-ended research, (d) fact-checking claims.

Answer Sketch

(a) Supervisor: a central agent coordinates specialized checkers for different criteria. (b) Pipeline: each agent handles one stage (extract, validate, format). (c) Swarm: agents dynamically self-organize to explore different research directions. (d) Debate: two agents argue for and against a claim, producing a balanced assessment.

Exercise 28.2.2: Supervisor Pattern Implementation (Coding)

Implement a supervisor agent that receives a task, decides which of three specialist agents (coder, researcher, writer) to delegate to, collects the result, and decides whether the task is complete or needs further delegation.

Answer Sketch

The supervisor is a function that calls the LLM with the task and available agents. The LLM returns a JSON decision: {"agent": "coder", "subtask": "..."}. Execute the chosen agent, return the result to the supervisor, and loop until the supervisor decides the task is complete. Use a max_delegation_rounds limit.

Exercise 28.2.3: Pipeline vs. Supervisor (Conceptual)

A document processing system needs to extract entities, classify them, and generate a summary. Compare implementing this as a pipeline pattern versus a supervisor pattern. What are the trade-offs?

Answer Sketch

Pipeline: deterministic, predictable latency, easy to debug and monitor. Each stage runs once in sequence. Supervisor: can dynamically re-route (e.g., skip classification if no entities found), but adds overhead of routing decisions and is harder to predict latency. Choose pipeline when the steps are always the same; choose supervisor when some steps may be skipped or repeated based on intermediate results.

Exercise 28.2.4: Debate Pattern Design (Coding)

Implement a debate pattern where two agents argue for and against a proposition, moderated by a judge agent that scores arguments and declares a winner after three rounds.

Answer Sketch

Create three functions: proposer (argues for), opposer (argues against), and judge (scores each round). In each round, the proposer and opposer receive the conversation history and produce arguments. The judge scores each argument on strength of evidence, logical coherence, and relevance. After three rounds, the judge produces a final verdict with reasoning.

Exercise 28.2.5: Topology Selection Criteria (Conceptual)

List five criteria for selecting a multi-agent topology and explain how each criterion favors a different pattern.

Answer Sketch

1. Task decomposability (clear subtasks favor pipeline or map-reduce). 2. Interdependency between steps (high interdependency favors supervisor or swarm). 3. Need for diverse perspectives (favors debate or ensemble). 4. Latency requirements (pipeline is predictable; supervisor adds routing overhead; parallel patterns reduce wall-clock time). 5. Observability needs (pipeline is easiest to trace).

What Comes Next

In the next section, Section 28.3: Human-in-the-Loop Agent Systems, we continue.

In the next section, Communication, Consensus and Conflict Resolution, we explore how agents coordinate their actions, resolve disagreements, and reach consensus in distributed settings.

Further Reading

Park, J.S., O'Brien, J.C., Cai, C.J., et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." UIST 2023. Demonstrates a flat peer-to-peer multi-agent architecture where 25 agents interact autonomously, establishing the decentralized communication pattern for agent societies.

Du, Y., Li, S., Torralba, A., et al. (2023). "Improving Factuality and Reasoning in Language Models through Multiagent Debate." ICML 2024. Shows that multi-agent debate improves reasoning accuracy, establishing the debate pattern as a key multi-agent topology for quality improvement.

Chen, W., Su, Y., Zuo, J., et al. (2024). "AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors." ICLR 2024. Proposes a general framework for multi-agent collaboration with configurable group dynamics, demonstrating how architecture patterns affect emergent collaborative behavior.

Qian, C., Cong, X., Yang, C., et al. (2024). "Communicative Agents for Software Development." ACL 2024. ChatDev implements a waterfall pipeline topology where specialized agents hand off through phases (design, coding, testing), demonstrating the sequential multi-agent architecture pattern.

Guo, T., Chen, X., Wang, Y., et al. (2024). "Large Language Model based Multi-Agents: A Survey of Progress and Challenges." arXiv preprint. Surveys multi-agent topologies including centralized, decentralized, and hierarchical patterns, providing a taxonomy for architecture selection.

Anthropic (2024). "Building Effective Agents." Anthropic Documentation. Practical guide covering orchestrator-worker, routing, and parallelization patterns for production multi-agent systems.