"One LLM, ten robots, and an open question about whose latency budget is whose."
Sched, Coordinator-In-Chief AI Agent
The single-robot, single-LLM-planner stack from Sections 39.1-39.3 scales surprisingly poorly to multiple robots. The naive extension (run one LLM planner per robot) produces uncoordinated chaos: two robots simultaneously decide to grasp the same cup. The dispatch pattern dominant in 2026 is different: one shared LLM observes the entire fleet, assigns tasks, and arbitrates conflicts via natural-language task tokens. This section walks through the dispatch architecture, the failure modes it introduces, and the points at which pure-LLM dispatch loses to classical multi-agent coordination algorithms.
Prerequisites
This section assumes the single-robot VLA patterns from Section 24.1 through Section 24.5. Multi-agent coordination vocabulary is covered in detail later in the book.
24.10.1 The Shared-LLM Dispatcher
The minimal multi-robot dispatch architecture has three components: a fleet of robots with individual VLA executors (Chapter 24), a shared LLM dispatcher that observes the global state, and a message bus over which the dispatcher streams natural-language task assignments to each robot. The dispatcher's job is to receive the human's high-level instruction ("set the table for four people"), the current state of each robot (position, what it is carrying, current sub-goal), and emit per-robot task assignments at each planning cycle.
```python
from dataclasses import dataclass, field


@dataclass
class RobotState:
    robot_id: str
    position_xy: tuple[float, float]
    held_object: str | None = None
    current_task: str | None = None
    battery_pct: float = 100.0
    capabilities: list[str] = field(default_factory=list)


class FleetDispatcher:
    def __init__(self, llm):
        self.llm = llm

    def assign(self, goal: str, fleet: list[RobotState]) -> dict[str, str]:
        prompt = self._build_prompt(goal, fleet)
        response = self.llm.complete(prompt, response_format="json")
        # response is a dict mapping robot_id to a sub-instruction string
        return response

    def _build_prompt(self, goal, fleet):
        # One line of state per robot, so prompt length grows linearly with fleet size.
        fleet_str = "\n".join(
            f"  {r.robot_id} at {r.position_xy}, holding={r.held_object}, can={r.capabilities}"
            for r in fleet
        )
        return f"""You are dispatching a fleet of {len(fleet)} robots.
Goal: {goal}
Current fleet state:
{fleet_str}
Assign each robot a one-sentence sub-task. Two robots must not target the same object.
If a robot has no useful task, assign "idle".
Output strictly as JSON mapping robot_id to sub-task string.
"""
```
The dispatcher pattern treats multi-robot coordination as a constraint-satisfaction problem solved by a chat model. The constraints are spelled out in the prompt ("two robots must not target the same object"). The LLM's natural-language reasoning capabilities turn out to be a surprisingly competent constraint solver for small fleets (under ~10 robots, under ~5 simultaneously contested resources). It is not optimal in the operations-research sense, but it gives plans that humans agree with and that handle ambiguity gracefully, which is the actual deployment-friendly property.
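Because the constraints live only in the prompt, nothing guarantees that the model's output respects them. A deployment would plausibly check the returned mapping before forwarding it to any robot. The sketch below is one way to do that, under stated assumptions: `validate_assignments` is a hypothetical helper, the LLM client is assumed to return a plain dict, and treating two identical sub-task strings as a conflict is only a crude stand-in for a real "same object" check.

```python
def validate_assignments(raw, fleet_ids):
    """Return a robot_id -> sub-task dict that is safe to forward to the robots."""
    if not isinstance(raw, dict):
        raise TypeError("dispatcher output must be a JSON object")
    known = set(fleet_ids)
    unknown = sorted(str(k) for k in raw if k not in known)
    if unknown:
        # The model invented a robot; refuse the whole plan rather than guess.
        raise ValueError(f"unknown robot ids: {unknown}")

    cleaned = {}
    claimed = set()  # normalized sub-tasks already handed to some robot
    for robot_id in fleet_ids:
        task = raw.get(robot_id)
        task = task.strip() if isinstance(task, str) else ""
        key = task.lower()
        # Missing, empty, explicitly idle, or duplicated sub-tasks all fall back to "idle".
        if not task or key == "idle" or key in claimed:
            cleaned[robot_id] = "idle"
            continue
        claimed.add(key)
        cleaned[robot_id] = task
    return cleaned


raw = {"r1": "fetch the plates", "r2": "Fetch the plates ", "r3": ""}
print(validate_assignments(raw, ["r1", "r2", "r3", "r4"]))
```

Here `r1` keeps its sub-task, while `r2` (duplicate), `r3` (empty), and `r4` (missing from the output) are all set to "idle". Falling back to "idle" rather than re-prompting is a design choice that keeps the cycle time bounded; a stricter system might reject the plan and ask the LLM again.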
24.10.2 When the LLM Dispatcher Breaks
The pure-LLM dispatch architecture has three structural failure modes that any real deployment must handle.
Fleet size. Above ~10 robots the LLM's context window starts to struggle. Each robot's state takes 50-100 tokens; ten robots is 500-1000 tokens; the working state plus the dispatcher's reasoning takes a non-trivial fraction of the context budget. At 50 robots the dispatch latency climbs to 5+ seconds per cycle, which is incompatible with reactive control.
Conflicting objectives. Multi-robot dispatch in practice has trade-offs: balance load across robots, minimize travel distance, respect battery levels, prefer specialist robots for specialized tasks. A pure LLM can reason about these qualitatively but does not produce Pareto-optimal assignments; classical assignment algorithms (Hungarian algorithm, Auction methods) do better when the trade-offs are well-defined.
Real-time constraints. If two robots are converging on the same cup and one will arrive 200 ms before the other, neither classical nor LLM dispatch can re-coordinate fast enough at the dispatcher's cadence. One dispatch cycle every 1-5 seconds (0.2-1 Hz) is fine for high-level task assignment but does not handle the reactive collision-avoidance problem.
| Coordination problem | LLM dispatch | Classical (Hungarian / Auction) | Best practice |
|---|---|---|---|
| Task assignment (5 robots, 3 tasks) | Excellent | Excellent | LLM (more flexible) |
| Task assignment (50 robots, 30 tasks) | Poor (context blow-up) | Excellent | Classical |
| Reactive collision avoidance | Poor (latency) | OK | Dedicated reactive layer |
| Goal interpretation ("set the table") | Excellent | Impossible | LLM |
| Battery-aware scheduling | OK (with explicit instruction) | Excellent | Classical, LLM as fallback |
| Novel-object assignment | Excellent (perception-driven) | Poor (needs ID mapping) | LLM |
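The rows where classical methods win share one property: the trade-off can be written down as a number. The sketch below illustrates that idea with a cost matrix combining travel distance and a low-battery penalty. Everything in it is an assumption made for illustration: the dict-based robot and task records, the `battery_weight` value, and the helper names. The solver is a brute-force search over permutations, not the Hungarian algorithm; it returns the same optimum on tiny instances and keeps the example dependency-free, but for anything beyond a handful of robots a polynomial-time solver such as SciPy's `linear_sum_assignment` would take its place.

```python
import itertools
import math


def build_cost_matrix(robots, tasks, battery_weight=0.05):
    """cost[i][j] = distance from robot i to task j, plus a per-robot low-battery penalty."""
    cost = []
    for r in robots:
        penalty = battery_weight * (100.0 - r["battery_pct"])
        cost.append([math.dist(r["position_xy"], t["position_xy"]) + penalty for t in tasks])
    return cost


def brute_force_assign(cost):
    """Give every task a distinct robot at minimum total cost.

    Assumes len(tasks) <= len(robots). Returns ({robot_index: task_index}, total_cost).
    """
    n_robots = len(cost)
    n_tasks = len(cost[0]) if cost else 0
    best_total, best = math.inf, ()
    for robots_for_tasks in itertools.permutations(range(n_robots), n_tasks):
        total = sum(cost[i][j] for j, i in enumerate(robots_for_tasks))
        if total < best_total:
            best_total, best = total, robots_for_tasks
    return {i: j for j, i in enumerate(best)}, best_total


robots = [
    {"robot_id": "A", "position_xy": (0.0, 0.0), "battery_pct": 100.0},
    {"robot_id": "B", "position_xy": (10.0, 0.0), "battery_pct": 100.0},
    {"robot_id": "C", "position_xy": (1.0, 0.0), "battery_pct": 20.0},
]
tasks = [
    {"task_id": "t0", "position_xy": (1.0, 0.0)},
    {"task_id": "t1", "position_xy": (9.0, 0.0)},
]
assignment, total = brute_force_assign(build_cost_matrix(robots, tasks))
print(assignment, total)
```

In this example robot C is standing on task t0 but has only 20% battery, so its penalty outweighs the one metre that robot A has to travel, and C is left unassigned. Changing `battery_weight` shifts that trade-off explicitly, which is the kind of control a prompt-only dispatcher does not offer.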
24.10.3 The Hybrid Pattern: LLM at the Top, Classical Below
2026 production fleets settle on a layered architecture. The LLM dispatcher handles two responsibilities: parsing the human's natural-language goal into a structured task list, and assigning each task to a robot capability class ("this needs a mobile manipulator", "this needs a quadruped"). A classical optimizer then handles the within-class assignment, using Hungarian or Auction methods to minimize travel distance subject to per-robot capacity and battery constraints. The two layers communicate via a structured task description format (JSON, typically).
```python
# Hybrid two-layer dispatch.
from scipy.optimize import linear_sum_assignment


class HybridDispatcher:
    def __init__(self, llm):
        self.llm = llm

    def dispatch(self, goal, fleet):
        # Layer 1: LLM parses goal into typed tasks.
        tasks = self.llm.complete(
            f"Decompose '{goal}' into atomic tasks. "
            f"Tag each with required_capability and target_object. JSON output.",
            response_format="json",
        )
        # tasks = [{"description": ..., "required_capability": ..., "target_object": ...}, ...]

        # Layer 2: classical Hungarian assignment within capability classes.
        # _distinct_capabilities and _build_cost_matrix are helpers not shown here.
        assignments = {}
        for cap in _distinct_capabilities(tasks):
            cap_tasks = [t for t in tasks if t["required_capability"] == cap]
            # Exclude robots already assigned under another capability class,
            # so a multi-capability robot is not silently given two tasks.
            cap_robots = [
                r for r in fleet
                if cap in r.capabilities and r.robot_id not in assignments
            ]
            if not cap_robots:
                continue  # no free robot offers this capability in this cycle
            cost = _build_cost_matrix(cap_robots, cap_tasks)  # rows: robots, columns: tasks
            row, col = linear_sum_assignment(cost)
            for i, j in zip(row, col):
                assignments[cap_robots[i].robot_id] = cap_tasks[j]["description"]
        return assignments
```
linear_sum_assignment handles the cost-minimizing assignment. This is the architecture deployed by Amazon Robotics for warehouse dispatch and by AutoRT for office-environment task assignment in 2025.
A dispatched task usually takes 5-60 seconds to execute on the robot. The dispatcher itself runs at 0.2-1 Hz (one LLM call every 1-5 seconds). Conflicts that arise faster than that (two robots simultaneously decide to drive through the same doorway, both moving at 1 m/s) cannot be resolved by the LLM dispatcher; they need a dedicated reactive layer running at 10-50 Hz. Failing to add that layer is the most common mistake in academic multi-robot LLM systems, and it is why such systems often look great in demos but collide in real deployments.
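The text does not specify what the reactive layer looks like. One minimal sketch, assuming each robot broadcasts its planar position and velocity and that constant-velocity extrapolation is good enough over a short horizon, is a closest-approach check that tells the lower-priority robot to stop. The `Track` record, the 0.5 m safety radius, the 2 s horizon, and the priority rule are all hypothetical choices, not values from any named system.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    robot_id: str
    position: tuple[float, float]   # metres
    velocity: tuple[float, float]   # metres per second
    priority: int                   # higher value keeps right of way


def closest_approach(p1, v1, p2, v2, horizon_s=2.0):
    """Time and distance of closest approach within the horizon, assuming constant velocities."""
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    dv2 = dvx * dvx + dvy * dvy
    # With zero relative velocity the separation never changes, so evaluate it now.
    t = 0.0 if dv2 == 0.0 else -(dpx * dvx + dpy * dvy) / dv2
    t = min(max(t, 0.0), horizon_s)
    return t, math.hypot(dpx + dvx * t, dpy + dvy * t)


def robots_to_stop(tracks, safety_radius_m=0.5, horizon_s=2.0):
    """Return the ids of robots that should brake in this control tick."""
    stop = set()
    for a in range(len(tracks)):
        for b in range(a + 1, len(tracks)):
            ra, rb = tracks[a], tracks[b]
            _, dist = closest_approach(ra.position, ra.velocity, rb.position, rb.velocity, horizon_s)
            if dist < safety_radius_m:
                # Lower priority yields; ties are broken by robot_id for determinism.
                loser = min((ra, rb), key=lambda r: (r.priority, r.robot_id))
                stop.add(loser.robot_id)
    return stop


# Two robots heading for the same doorway at 1 m/s, plus one parked far away.
tracks = [
    Track("r1", (-1.0, 0.0), (1.0, 0.0), priority=2),
    Track("r2", (0.0, -1.0), (0.0, 1.0), priority=1),
    Track("r3", (10.0, 10.0), (0.0, 0.0), priority=0),
]
print(robots_to_stop(tracks))
```

The pairwise loop is quadratic in fleet size, which is acceptable for a sketch but would likely be replaced by a spatial index in a large fleet. The key design point is that this check involves no LLM call, so it can run inside a 10-50 Hz loop.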
24.10.4 The Coordination Protocol: Tokens on a Shared Channel
Beyond task assignment, robots in a fleet need to communicate runtime events: "I have picked up cup A", "I have failed to grasp", "I need to recharge". The 2026 pattern is to expose this as a structured event stream that all robots read and write, with the dispatcher LLM consuming the event log as part of its prompt context. Events are typed (for example TASK_COMPLETED, TASK_FAILED, OBJECT_CLAIMED, OBJECT_RELEASED) and addressed (each event carries the ID of the robot that emitted it).
```python
# The event protocol for inter-robot coordination.
import enum
from dataclasses import dataclass


class EventType(enum.Enum):
    TASK_STARTED = "task_started"
    TASK_COMPLETED = "task_completed"
    TASK_FAILED = "task_failed"
    OBJECT_CLAIMED = "object_claimed"
    OBJECT_RELEASED = "object_released"
    HELP_REQUESTED = "help_requested"


@dataclass
class FleetEvent:
    event_type: EventType
    robot_id: str
    timestamp: float
    payload: dict  # task description, object id, etc.


# The dispatcher polls the event bus each cycle and re-prompts the LLM with new events.
# Assumes assign() has been extended with a recent_events parameter that is folded into
# the prompt, and that send_to_robot is the transport-specific publish call (not shown).
def dispatch_cycle(dispatcher, event_bus, fleet, goal):
    events = event_bus.drain()
    new_assignments = dispatcher.assign(goal, fleet, recent_events=events)
    for robot_id, subtask in new_assignments.items():
        send_to_robot(robot_id, subtask)
```
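The event bus and the way events reach the prompt are left open above. As an illustration only, the sketch below shows an in-memory bus with the `drain()` semantics the dispatch cycle relies on, plus a formatter that turns events into prompt lines. `InMemoryEventBus` and `format_events` are hypothetical names; a real fleet would more likely sit on a network message broker. The formatter expects objects shaped like `FleetEvent` and uses `SimpleNamespace` as a stand-in so the block runs on its own.

```python
import json
import threading
from collections import deque
from types import SimpleNamespace


class InMemoryEventBus:
    """Thread-safe queue: robots publish, the dispatcher drains once per cycle."""

    def __init__(self):
        self._events = deque()
        self._lock = threading.Lock()

    def publish(self, event):
        with self._lock:
            self._events.append(event)

    def drain(self):
        # Return everything published since the last drain, oldest first, and clear the queue.
        with self._lock:
            events = list(self._events)
            self._events.clear()
        return events


def format_events(events, max_events=20):
    """Render the most recent events as one prompt line each, capping prompt growth."""
    lines = []
    for e in events[-max_events:]:
        kind = getattr(e.event_type, "value", e.event_type)  # accept an Enum or a plain string
        payload = json.dumps(e.payload, sort_keys=True)
        lines.append(f"[{e.timestamp:.1f}s] {e.robot_id} {kind} {payload}")
    return "\n".join(lines)


bus = InMemoryEventBus()
bus.publish(SimpleNamespace(event_type="object_claimed", robot_id="r1",
                            timestamp=12.0, payload={"object": "cup_A"}))
bus.publish(SimpleNamespace(event_type="task_failed", robot_id="r2",
                            timestamp=12.4, payload={"reason": "grasp slipped"}))
print(format_events(bus.drain()))
```

Capping the number of events per prompt is one way to keep the context cost bounded; the cap of 20 is arbitrary, and events dropped this way would need to be summarized or persisted elsewhere if the dispatcher must not miss them.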
24.10.5 Emergent Behaviors
An interesting empirical observation from 2024-2025 multi-robot LLM deployments: the dispatcher LLM exhibits emergent role-assignment behavior without being prompted for it. Given a fleet of three identical robots and the goal "set the table for dinner", the dispatcher consistently assigns one robot to "fetch plates", one to "fetch utensils", and one to "place napkins", even when the prompt does not mention role specialization. This emerges from the LLM's prior over how multi-agent tasks are normally divided in human descriptions of teamwork. It is empirically useful but somewhat unpredictable, and is one reason the "LLM at the top, classical below" hybrid is preferred over pure LLM dispatch for safety-critical fleets.
The aha: emergent role specialization is not the LLM "discovering" division of labor; it is the LLM reading aloud a distribution it absorbed from pretraining. Cookbooks, wedding-planning blogs, project-management docs, and operations manuals already describe multi-agent tasks as "one person fetches X, another sets Y." When you tell the dispatcher "set the table with three robots," it samples from that prior. The behavior looks like reasoning about cooperation; mechanistically it is conditional text completion. That is exactly why it is brittle in safety-critical fleets: change one word in the goal and the prior may pattern-match to a different document type where roles are not split.
When the LLM dispatcher must assign tasks to N robots but only M < N tasks exist, it sometimes outputs "robot_3: idle, wait for further instructions". This is fine in research demos but produces an awkward UX in actual deployments where a customer sees a robot just standing in their kitchen doing nothing. The fix is to add explicit prompting ("if no task is available, return to charging dock") or to bake in a default behavior at the robot side. In one 2025 office-robot deployment, the lack of this fix caused a sequence of internal memes about robots that "looked existentially uncertain" while the dispatcher decided what to do with them.
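The robot-side version of that fix can be very small. A sketch, assuming the robot receives its sub-task as a plain string and that "return to charging dock" is the desired default; `resolve_subtask` is a hypothetical helper, and matching on the literal prefix "idle" is a simplification:

```python
DEFAULT_BEHAVIOR = "return to charging dock"


def resolve_subtask(subtask, default=DEFAULT_BEHAVIOR):
    """Map an empty or idle assignment to a concrete default behavior."""
    text = (subtask or "").strip()
    if not text or text.lower().startswith("idle"):
        return default
    return text


print(resolve_subtask("idle, wait for further instructions"))
print(resolve_subtask("fetch plates"))
```

Handling this on the robot rather than in the prompt means the behavior still holds when the dispatcher is slow, offline, or ignores the instruction.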
24.10.6 The Decentralized Alternative
The dispatcher architecture is centralized: a single LLM observes the whole fleet. The decentralized alternative is to run a small LLM on each robot, with peer-to-peer communication via natural-language messages. Each robot decides its own next action based on its local view plus messages from neighbors. This scales better to large fleets but loses the global-optimality property of centralized dispatch.
The decentralized pattern was pioneered by the SMART-LLM work (Kannan et al., 2024) and the RoCo system (Mandi et al., 2024). It works well for fleets above ~20 robots where centralized dispatch hits context-window limits, and for fleets that operate in poorly connected environments where a central network is unavailable (search-and-rescue, agriculture). In 2026 the trade-off is well understood: centralized for fewer than 20 robots, decentralized above 50, hybrid in between.
A misbehaving robot in a decentralized fleet can propagate bad information to its neighbors, causing cascading failures. The robotics community is increasingly drawing on distributed systems literature (Paxos, Raft, Byzantine fault tolerance) to handle this. The 2025 SMART-LLM-BFT paper introduces a quorum-vote protocol where each task assignment requires three independent LLM agents to agree, which mitigates the impact of a single compromised or malfunctioning robot. This is structurally a port of consensus algorithms from databases to embodied agents, and is the kind of cross-pollination that makes the field interesting in 2026.
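The general shape of a quorum vote can be sketched in a few lines. This is a generic illustration, not the protocol from the paper mentioned above, and it is not Byzantine fault tolerant in the formal sense (classical BFT protocols need 3f+1 replicas to tolerate f faulty ones, along with authenticated messaging). The assumptions are that each agent independently proposes a robot_id to task mapping and that agreement means an exact string match; `quorum_vote` is a hypothetical name.

```python
from collections import Counter


def quorum_vote(proposals, quorum=3):
    """Accept a robot's task only if at least `quorum` agents proposed the same task.

    proposals: list of dicts mapping robot_id -> task string, one dict per agent.
    Returns robot_id -> agreed task, or None where no task reached the quorum.
    """
    robot_ids = set().union(*proposals) if proposals else set()
    accepted = {}
    for robot_id in sorted(robot_ids):
        votes = Counter(p[robot_id] for p in proposals if robot_id in p)
        task, count = votes.most_common(1)[0]
        accepted[robot_id] = task if count >= quorum else None
    return accepted


proposals = [
    {"r1": "fetch plates", "r2": "fetch cups"},
    {"r1": "fetch plates", "r2": "fetch cups"},
    {"r1": "fetch plates", "r2": "fetch forks"},  # one agent disagrees about r2
]
print(quorum_vote(proposals, quorum=3))
print(quorum_vote(proposals, quorum=2))
```

With a quorum of three, `r1` gets its task and `r2` is held (None) because one agent dissented; lowering the quorum to two lets the majority proposal for `r2` through. Exact string matching is brittle for free-form natural language, so a practical system would more plausibly vote over structured task identifiers.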
Key Takeaway
A shared LLM dispatcher coordinates a fleet by receiving the global goal plus per-robot state, then emitting natural-language task assignments. The pattern works well below 10 robots and breaks above 50; the production solution is a hybrid where the LLM does natural-language decomposition and a classical assignment algorithm (Hungarian, Auction) handles the within-class optimization. Real-time conflicts need a separate reactive layer; the LLM is not a real-time component.
{task_id, capability_required, position, priority} objects on one side and a {robot_id, task_id} assignment list on the return; the LLM never sees per-robot positions in detail and the assigner never sees the natural-language goal.
Continue to Section 24.11: ROS 2 Integration.
Section 24.11 drops into the implementation details: how does an LLM agent actually talk to a ROS 2 robot? The integration patterns are surprisingly close to the function-calling patterns from Part VI, but with the wire format and timing semantics that ROS 2 brings.