Section 24.10: Multi-Robot Dispatch via Shared LLM

"One LLM, ten robots, and an open question about whose latency budget is whose."
Sched, Coordinator-In-Chief AI Agent

Big Picture

The single-robot, single-LLM-planner stack from Sections 39.1-39.3 scales surprisingly poorly to multiple robots. The naive extension, one LLM planner per robot, produces uncoordinated chaos: two robots simultaneously decide to grasp the same cup. The dispatch pattern dominant in 2026 is different: one shared LLM observes the entire fleet, assigns tasks, and arbitrates conflicts via natural-language task tokens. This section walks through the dispatch architecture, the failure modes it introduces, and the points at which pure-LLM dispatch should give way to classical multi-agent coordination algorithms.

Prerequisites

This section assumes the single-robot VLA patterns from Section 24.1 through Section 24.5. Multi-agent coordination vocabulary is covered in detail later in the book.

24.10.1 The Shared-LLM Dispatcher

The minimal multi-robot dispatch architecture has three components: a fleet of robots with individual VLA executors (Sections 24.1-24.5), a shared LLM dispatcher that observes the global state, and a message bus over which the dispatcher streams natural-language task assignments to each robot. At each planning cycle, the dispatcher takes the human's high-level instruction ("set the table for four people") and the current state of each robot (position, what it is carrying, current sub-goal), and emits a task assignment for every robot.

from dataclasses import dataclass, field

@dataclass
class RobotState:
    robot_id: str
    position_xy: tuple[float, float]
    held_object: str | None = None
    current_task: str | None = None
    battery_pct: float = 100.0
    capabilities: list[str] = field(default_factory=list)

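class FleetDispatcher:
    def __init__(self, llm):
        # `llm` is any client whose complete(prompt, response_format="json")
        # returns the parsed JSON object (an assumed interface, not a specific SDK).
        self.llm = llm

    def assign(self, goal: str, fleet: list[RobotState],
               recent_events: list | None = None) -> dict[str, str]:
        prompt = self._build_prompt(goal, fleet, recent_events or [])
        response = self.llm.complete(prompt, response_format="json")
        # Expected: a dict mapping robot_id -> one-sentence sub-task.
        # Drop robot ids the LLM invented; robots it forgot default to "idle".
        known = {r.robot_id for r in fleet}
        assignments = {rid: task for rid, task in response.items() if rid in known}
        for r in fleet:
            assignments.setdefault(r.robot_id, "idle")
        return assignments

    def _build_prompt(self, goal, fleet, recent_events):
        fleet_str = "\n".join(
            f"  {r.robot_id} at {r.position_xy}, holding={r.held_object}, "
            f"task={r.current_task}, battery={r.battery_pct:.0f}%, can={r.capabilities}"
            for r in fleet
        )
        events_str = "\n".join(f"  {e}" for e in recent_events) or "  (none)"
        return f"""You are dispatching a fleet of {len(fleet)} robots.
Goal: {goal}
Current fleet state:
{fleet_str}
Recent fleet events:
{events_str}

Assign each robot a one-sentence sub-task. Two robots must not target the same object.
If a robot has no useful task, assign "idle".
Output strictly as JSON mapping robot_id to sub-task string.
"""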

Code Fragment 24.10.1: The minimal shared-LLM dispatcher. The LLM sees the entire fleet, the global goal, and per-robot capabilities, then emits a JSON task assignment. The "two robots must not target the same object" constraint is the prompt-level lever against the most common multi-robot pathology; because it is only an instruction to the model, it is not a guarantee.

Key Insight

Coordination as natural-language constraint satisfaction

The dispatcher pattern treats multi-robot coordination as a constraint-satisfaction problem solved by a chat model. The constraints are spelled out in the prompt ("two robots must not target the same object"). The LLM's natural-language reasoning turns out to be a surprisingly competent constraint solver for small fleets (under ~10 robots, under ~5 simultaneously contested resources). It is not optimal in the operations-research sense, but it produces plans that humans agree with and that handle ambiguous goals gracefully, which is the property that matters most in deployment.
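Because the "two robots must not target the same object" rule lives only in the prompt, nothing stops the model from occasionally violating it. A cheap safeguard is to check the constraint in ordinary code before forwarding the plan. The minimal sketch below assumes the LLM is also asked to name each robot's `target_object` (a field the minimal prompt in Code Fragment 24.10.1 does not request); the plan shape and the function name are illustrative assumptions.

```python
from collections import defaultdict


def find_target_conflicts(assignments: dict[str, dict]) -> dict[str, list[str]]:
    """Return {object_id: [robot_ids, ...]} for every object targeted by 2+ robots.

    `assignments` is assumed to look like
    {"r1": {"task": "...", "target_object": "cup_3"}, ...}; robots with
    target_object=None (e.g. idle robots) are ignored.
    """
    claimants = defaultdict(list)
    for robot_id, a in assignments.items():
        obj = a.get("target_object")
        if obj is not None:
            claimants[obj].append(robot_id)
    return {obj: sorted(rids) for obj, rids in claimants.items() if len(rids) > 1}


plan = {
    "r1": {"task": "fetch the red cup", "target_object": "cup_red"},
    "r2": {"task": "carry the red cup to the sink", "target_object": "cup_red"},
    "r3": {"task": "idle", "target_object": None},
}
print(find_target_conflicts(plan))  # {'cup_red': ['r1', 'r2']}
```

A non-empty result can be fed back into the next prompt ("r1 and r2 both target cup_red; reassign one of them") instead of being sent to the robots.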

24.10.2 When the LLM Dispatcher Breaks

The pure-LLM dispatch architecture has three structural failure modes that any real deployment must handle.

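Fleet size. Above ~10 robots the dispatcher starts to struggle, and not because the raw state overflows the context window: at 50-100 tokens per robot, ten robots is only 500-1,000 tokens and fifty robots is 2,500-5,000. The problems are that prompt length and output length both grow linearly with fleet size (every robot needs its own assignment decoded), while the number of robot pairs whose plans could conflict grows quadratically, and the model must keep all of them consistent in a single response. At 50 robots the dispatch latency climbs to 5+ seconds per cycle, which is incompatible with reactive control.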
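The scaling argument can be made concrete with a few lines of arithmetic. The sketch below uses the 50-100 tokens-per-robot figure from the text; treating the number of robot pairs as a proxy for how many potential interactions the LLM must keep consistent is our simplifying assumption, not a measured quantity.

```python
def dispatch_load(n_robots: int, tokens_per_robot=(50, 100)) -> dict:
    """Rough per-cycle load of a pure-LLM dispatcher for a fleet of n_robots."""
    lo, hi = tokens_per_robot
    return {
        "state_tokens": (n_robots * lo, n_robots * hi),  # grows linearly
        "assignments_to_decode": n_robots,               # one sub-task per robot
        "robot_pairs": n_robots * (n_robots - 1) // 2,   # potential conflicts: quadratic
    }


for n in (5, 10, 50):
    print(n, dispatch_load(n))
```

Going from 10 to 50 robots multiplies the state and the decoded output by five, but the number of robot pairs by about 27 (45 versus 1,225), which is one reason plan quality can degrade long before the context window is anywhere near full.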

Conflicting objectives. Multi-robot dispatch in practice involves trade-offs: balancing load across robots, minimizing travel distance, respecting battery levels, preferring specialist robots for specialized tasks. A pure LLM can reason about these qualitatively but gives no guarantee that its assignment is optimal, or even Pareto-optimal. Once the trade-offs are well-defined enough to be folded into a single cost per robot-task pair, classical assignment algorithms (the Hungarian algorithm, auction methods) return an optimal assignment for that cost.
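To hand such trade-offs to a classical solver, they must first be scalarized into one cost per (robot, task) pair. The sketch below uses a weighted sum of travel distance, a low-battery penalty, and a specialist bonus; the weights, the 30% battery threshold, and the dictionary layout are illustrative assumptions, not recommended values. SciPy's `linear_sum_assignment` then returns a minimum-cost matching, and because it accepts rectangular matrices, surplus robots are simply left unassigned.

```python
import math

import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative weights (assumptions; tune per deployment).
W_DIST, W_BATTERY, W_SPECIALIST = 1.0, 5.0, 2.0


def assignment_cost(robot: dict, task: dict) -> float:
    """Scalarized cost: travel distance + low-battery penalty - specialist bonus."""
    dist = math.dist(robot["xy"], task["xy"])
    low_battery = 1.0 if robot["battery_pct"] < 30 else 0.0
    specialist = 1.0 if task.get("specialty") in robot["skills"] else 0.0
    return W_DIST * dist + W_BATTERY * low_battery - W_SPECIALIST * specialist


robots = [
    {"id": "r1", "xy": (0.0, 0.0), "battery_pct": 90, "skills": {"wipe"}},
    {"id": "r2", "xy": (4.0, 0.0), "battery_pct": 20, "skills": set()},
    {"id": "r3", "xy": (1.0, 1.0), "battery_pct": 80, "skills": set()},
]
tasks = [
    {"id": "wipe_table", "xy": (0.0, 1.0), "specialty": "wipe"},
    {"id": "fetch_cup", "xy": (3.0, 0.0)},
]

cost = np.array([[assignment_cost(r, t) for t in tasks] for r in robots])
rows, cols = linear_sum_assignment(cost)  # 3 robots x 2 tasks: one robot stays free
plan = {robots[i]["id"]: tasks[j]["id"] for i, j in zip(rows, cols)}
print(plan)  # {'r1': 'wipe_table', 'r3': 'fetch_cup'}
```

Here r2 is the closest robot to the cup, but its low-battery penalty pushes the task to r3; with `W_BATTERY = 0` the solver would pick r2 instead. The trade-off is explicit and auditable in a way a prompt instruction is not.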

Real-time constraints. If two robots are converging on the same cup and one will arrive 200 ms before the other, neither a classical nor an LLM dispatch loop can re-coordinate fast enough. The dispatcher's 0.2-1 Hz cadence is fine for high-level task assignment but cannot handle the reactive collision-avoidance problem.

Coordination problem	LLM dispatch	Classical (Hungarian / Auction)	Best practice
Task assignment (5 robots, 3 tasks)	Excellent	Excellent	LLM (more flexible)
Task assignment (50 robots, 30 tasks)	Poor (context blow-up)	Excellent	Classical
Reactive collision avoidance	Poor (latency)	OK	Dedicated reactive layer
Goal interpretation ("set the table")	Excellent	Impossible	LLM
Battery-aware scheduling	OK (with explicit instruction)	Excellent	Classical, LLM as fallback
Novel-object assignment	Excellent (perception-driven)	Poor (needs ID mapping)	LLM

Figure 24.10.1a: LLM dispatch versus classical multi-robot task assignment, by problem type. The hybrid pattern (LLM at the top for goal interpretation, classical at the bottom for optimization) is what production fleets actually use in 2026.
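The table names auction methods alongside the Hungarian algorithm. For reference, here is a minimal single-process sketch of Bertsekas's forward auction for a square assignment problem. Robots bid for tasks, and each bid raises a task's price by the bidder's margin over its second-best option plus a small epsilon. With integer costs and epsilon below 1/n, the final assignment is optimal. This is a textbook version rather than a distributed implementation. In a real fleet, each robot could compute its own bids, which is what makes auctions attractive for multi-robot allocation.

```python
def auction_assign(cost: list[list[int]]) -> list[int]:
    """Forward auction (Bertsekas) for an n x n minimum-cost assignment.

    Returns task_of[robot]. Costs are assumed to be integers, so that
    eps < 1/n guarantees an optimal result.
    """
    n = len(cost)
    eps = 1.0 / (n + 1)
    prices = [0.0] * n
    owner = [None] * n       # owner[task] = robot currently holding the task
    task_of = [None] * n     # task_of[robot] = task the robot currently holds
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each task to robot i: benefit (-cost) minus current price.
        values = [-cost[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        second = max((values[j] for j in range(n) if j != best), default=values[best])
        # Bid: raise the best task's price by the margin over the runner-up, plus eps.
        prices[best] += values[best] - second + eps
        if owner[best] is not None:  # outbid the previous holder
            task_of[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        task_of[i] = best
    return task_of


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(auction_assign(cost))  # [1, 0, 2]
```

For this matrix, the auction returns the same matching that SciPy's `linear_sum_assignment` finds, with a total cost of 5.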

24.10.3 The Hybrid Pattern: LLM at the Top, Classical Below

Production fleets in 2026 settle on a layered architecture. The LLM dispatcher has two responsibilities: parsing the human's natural-language goal into a structured task list, and tagging each task with the robot capability class it requires ("this needs a mobile manipulator", "this needs a quadruped"). A classical optimizer then handles the assignment within each class, using the Hungarian algorithm or auction methods to minimize travel distance subject to per-robot capacity and battery constraints. The two layers communicate through a structured task description format, typically JSON.

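# Hybrid two-layer dispatch.
import math

import numpy as np
from scipy.optimize import linear_sum_assignment

class HybridDispatcher:
    def __init__(self, llm, object_positions):
        self.llm = llm  # assumed: complete(..., response_format="json") returns parsed JSON
        # Hypothetical world-model lookup: target_object -> (x, y).
        self.object_positions = object_positions

    def dispatch(self, goal, fleet):
        # Layer 1: LLM parses the goal into typed tasks.
        tasks = self.llm.complete(
            f"Decompose '{goal}' into atomic tasks. "
            "Output a JSON list of objects with keys description, "
            "required_capability, and target_object.",
            response_format="json",
        )

        # Layer 2: Hungarian assignment within each capability class.
        assignments, busy = {}, set()
        for cap in sorted({t["required_capability"] for t in tasks}):
            cap_tasks = [t for t in tasks if t["required_capability"] == cap]
            # Exclude robots already assigned in an earlier class (multi-skill robots).
            cap_robots = [r for r in fleet
                          if cap in r.capabilities and r.robot_id not in busy]
            if not cap_robots:
                continue  # no free robot with this capability; escalate upstream
            cost = self._cost_matrix(cap_robots, cap_tasks)
            # Rectangular matrices are fine: surplus robots or tasks stay unmatched.
            rows, cols = linear_sum_assignment(cost)
            for i, j in zip(rows, cols):
                assignments[cap_robots[i].robot_id] = cap_tasks[j]["description"]
                busy.add(cap_robots[i].robot_id)
        return assignments

    def _cost_matrix(self, robots, tasks):
        # Euclidean travel distance from each robot to each task's target object.
        cost = np.zeros((len(robots), len(tasks)))
        for i, r in enumerate(robots):
            for j, t in enumerate(tasks):
                cost[i, j] = math.dist(r.position_xy, self.object_positions[t["target_object"]])
        return cost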

Code Fragment 24.10.2: The hybrid two-layer dispatcher. The LLM handles natural-language decomposition; SciPy's linear_sum_assignment handles the cost-minimizing assignment within each capability class. Layered designs in this spirit, with the LLM proposing tasks and conventional components handling allocation and execution, appear in fleet-orchestration systems such as AutoRT (Ahn et al., 2024).
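The JSON contract between the two layers is worth validating before the optimizer sees it. An LLM can return malformed JSON, omit a field, or invent a capability class that no robot has. Below is a minimal standard-library sketch. The field names match the task format used in this section, while the `Task` type, the `parse_tasks` helper, and the set of known capabilities are illustrative assumptions.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = ("description", "required_capability", "target_object")


@dataclass(frozen=True)
class Task:
    description: str
    required_capability: str
    target_object: str


def parse_tasks(raw: str, known_capabilities: set[str]) -> tuple[list[Task], list[str]]:
    """Parse the LLM's JSON task list; return (valid tasks, human-readable errors)."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc}"]
    if not isinstance(items, list):
        return [], ["expected a JSON list of task objects"]
    tasks, errors = [], []
    for k, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"task {k}: not an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            errors.append(f"task {k}: missing {missing}")
        elif item["required_capability"] not in known_capabilities:
            errors.append(f"task {k}: unknown capability {item['required_capability']!r}")
        else:
            tasks.append(Task(*(str(item[key]) for key in REQUIRED_KEYS)))
    return tasks, errors


raw = ('[{"description": "bring plates", "required_capability": "mobile_manipulator", '
       '"target_object": "plate_stack"}, {"description": "fly to shelf", '
       '"required_capability": "drone", "target_object": "shelf_2"}]')
tasks, errors = parse_tasks(raw, {"mobile_manipulator", "quadruped"})
print(tasks)   # one valid Task (bring plates)
print(errors)  # one error, for the unknown 'drone' capability
```

The error strings can go back to the LLM in a repair prompt. The valid tasks can proceed to assignment immediately.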

Warning: The LLM is not a real-time component

A dispatched task usually takes 5-60 seconds to execute on the robot. The dispatcher itself runs at 0.2-1 Hz (one LLM call every 1-5 seconds). Conflicts that arise faster than that (two robots simultaneously decide to drive through the same doorway, both moving at 1 m/s) cannot be resolved by the LLM dispatcher; they need a dedicated reactive layer running at 10-50 Hz. Failing to add that layer is the most common mistake in academic multi-robot LLM systems, and it is why such systems often look great in demos but collide in real deployments.
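The reactive layer does not need an LLM at all. As a minimal illustration, and not ORCA or a full velocity-obstacle planner, the sketch below assumes each robot reports its position and velocity and extrapolates both at constant velocity. It flags any pair whose closest approach within a short horizon falls under a safety radius and tells one robot of the pair to stop. The 0.6 m radius, the 2 s horizon, and the priority-by-ID rule are illustrative assumptions. Each pair costs only a few floating-point operations, which is why a check like this can run comfortably at 10-50 Hz.

```python
import math


def closest_approach(p1, v1, p2, v2):
    """Time (>= 0) and distance of closest approach for two constant-velocity robots."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]        # relative position
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    dv2 = dvx * dvx + dvy * dvy
    t = 0.0 if dv2 == 0 else max(0.0, -(dx * dvx + dy * dvy) / dv2)
    return t, math.hypot(dx + dvx * t, dy + dvy * t)


def reactive_step(robots, safety_radius=0.6, horizon_s=2.0):
    """Return the set of robot ids that must stop on this tick.

    `robots` maps robot_id -> (position_xy, velocity_xy). For each conflicting
    pair, the robot with the lexicographically larger id yields (an arbitrary
    but deterministic priority rule chosen for this sketch).
    """
    ids = sorted(robots)
    stop = set()
    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            t, d = closest_approach(*robots[a], *robots[b])
            if t <= horizon_s and d < safety_radius:
                stop.add(b)
    return stop


# r1 and r2 approach each other head-on at 1 m/s; r3 moves away from both.
fleet = {
    "r1": ((0.0, 0.0), (1.0, 0.0)),
    "r2": ((3.0, 0.0), (-1.0, 0.0)),
    "r3": ((0.0, 5.0), (0.0, 1.0)),
}
print(reactive_step(fleet))  # {'r2'}
```

Stopping the yielding robot is the crudest possible response. Reactive planners such as ORCA instead compute a new collision-free velocity for every robot involved.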

24.10.4 The Coordination Protocol: Tokens on a Shared Channel

Beyond task assignment, robots in a fleet need to communicate runtime events: "I have picked up cup A", "I have failed to grasp", "I need to recharge". The 2026 pattern is to expose these as a structured event stream that all robots read and write, with the dispatcher LLM consuming the event log as part of its prompt context. Events are typed (TASK_STARTED, TASK_COMPLETED, TASK_FAILED, OBJECT_CLAIMED, OBJECT_RELEASED, HELP_REQUESTED) and addressed (each event records which robot emitted it).

# The event protocol for inter-robot coordination.
import enum
from dataclasses import dataclass, field

class EventType(enum.Enum):
    TASK_STARTED = "task_started"
    TASK_COMPLETED = "task_completed"
    TASK_FAILED = "task_failed"
    OBJECT_CLAIMED = "object_claimed"
    OBJECT_RELEASED = "object_released"
    HELP_REQUESTED = "help_requested"

@dataclass
class FleetEvent:
    event_type: EventType
    robot_id: str                                  # which robot emitted the event
    timestamp: float                               # emission time (e.g. Unix seconds)
    payload: dict = field(default_factory=dict)    # task description, object id, etc.

# The dispatcher drains the event bus each cycle and re-prompts the LLM with the new events.
# `event_bus` (with a drain() method) and `send_to_robot` are placeholders for the transport
# layer; `dispatcher.assign` is assumed to accept a `recent_events` argument.
def dispatch_cycle(dispatcher, event_bus, fleet, goal, send_to_robot):
    events = event_bus.drain()
    new_assignments = dispatcher.assign(goal, fleet, recent_events=events)
    for robot_id, subtask in new_assignments.items():
        send_to_robot(robot_id, subtask)

Code Fragment 24.10.3: The event-driven coordination protocol. Each robot publishes typed events to a shared bus (typically Redis Streams or NATS in production); the dispatcher drains the bus each cycle and feeds the events into the LLM's prompt. Object claiming is the most safety-critical event: once claims are arbitrated so that only one robot can hold a given object at a time, it prevents two robots from racing for the same target.
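Publishing an OBJECT_CLAIMED event does not, by itself, stop a race, because two robots can publish claims for the same cup in the same cycle. Something has to arbitrate. The minimal sketch below assumes a single in-process registry guarded by a lock. It grants the first claim on an object and rejects later ones until the holder releases it. In a shared store, the same role would fall to an atomic set-if-absent write, such as Redis `SET` with the `NX` option, which this sketch does not model.

```python
import threading


class ClaimRegistry:
    """First-come-first-served object claims: at most one robot holds an object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._holder: dict[str, str] = {}  # object_id -> robot_id

    def claim(self, robot_id: str, object_id: str) -> bool:
        with self._lock:
            holder = self._holder.setdefault(object_id, robot_id)
            return holder == robot_id  # True if granted (or already held by this robot)

    def release(self, robot_id: str, object_id: str) -> bool:
        with self._lock:
            if self._holder.get(object_id) != robot_id:
                return False  # only the current holder may release
            del self._holder[object_id]
            return True


registry = ClaimRegistry()
print(registry.claim("r1", "cup_A"))  # True
print(registry.claim("r2", "cup_A"))  # False: r2 must pick another target
registry.release("r1", "cup_A")
print(registry.claim("r2", "cup_A"))  # True
```

A rejected claim is itself a useful event to publish, since it tells the dispatcher that the losing robot needs a new assignment.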

24.10.5 Emergent Behaviors

An interesting empirical observation from 2024-2025 multi-robot LLM deployments: the dispatcher LLM exhibits emergent role-assignment behavior without being prompted for it. Given a fleet of three identical robots and the goal "set the table for dinner", the dispatcher consistently assigns one robot to "fetch plates", one to "fetch utensils", and one to "place napkins", even when the prompt does not mention role specialization. This emerges from the LLM's prior over how multi-agent tasks are normally divided in human descriptions of teamwork. It is empirically useful but somewhat unpredictable, and is one reason the "LLM at the top, classical below" hybrid is preferred over pure LLM dispatch for safety-critical fleets.

Key Insight: Why the LLM Knows To Split the Roles

The aha: emergent role specialization is not the LLM "discovering" division of labor; it is the LLM reading aloud a distribution it absorbed from pretraining. Cookbooks, wedding-planning blogs, project-management docs, and operations manuals already describe multi-agent tasks as "one person fetches X, another sets Y." When you tell the dispatcher "set the table with three robots," it samples from that prior. The behavior looks like reasoning about cooperation; mechanistically it is conditional text completion. That is exactly why it is brittle in safety-critical fleets: change one word in the goal and the prior may pattern-match to a different document type where roles are not split.

Fun Fact: The "lazy robot" problem

When the LLM dispatcher must assign tasks to N robots but only M < N tasks exist, it sometimes outputs "robot_3: idle, wait for further instructions". This is fine in research demos but produces an awkward UX in actual deployments where a customer sees a robot just standing in their kitchen doing nothing. The fix is to add explicit prompting ("if no task is available, return to charging dock") or to bake in a default behavior at the robot side. In one 2025 office-robot deployment, the lack of this fix caused a sequence of internal memes about robots that "looked existentially uncertain" while the dispatcher decided what to do with them.
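The robot-side default can be as simple as rewriting any idle or missing assignment before it is sent. The sketch below uses the "return to charging dock" fallback suggested above. The set of phrases treated as idle and the function name are illustrative assumptions.

```python
IDLE_PHRASES = {"idle", "wait", "wait for further instructions", ""}
DEFAULT_TASK = "return to charging dock"


def apply_idle_default(assignments: dict[str, str], fleet_ids: list[str]) -> dict[str, str]:
    """Give every robot a concrete task: idle or unassigned robots go to their dock."""
    result = {}
    for rid in fleet_ids:
        task = assignments.get(rid, "").strip()
        normalized = task.lower().rstrip(".")
        is_idle = normalized in IDLE_PHRASES or normalized.startswith("idle")
        result[rid] = DEFAULT_TASK if is_idle else task
    return result


plan = {"robot_1": "fetch plates", "robot_3": "idle, wait for further instructions"}
print(apply_idle_default(plan, ["robot_1", "robot_2", "robot_3"]))
# {'robot_1': 'fetch plates', 'robot_2': 'return to charging dock',
#  'robot_3': 'return to charging dock'}
```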

24.10.6 The Decentralized Alternative

The dispatcher architecture is centralized: a single LLM observes the whole fleet. The decentralized alternative is to run a small LLM on each robot, with peer-to-peer communication via natural-language messages. Each robot decides its own next action based on its local view plus messages from neighbors. This scales better to large fleets but gives up the single global view that lets a centralized dispatcher catch conflicts across the whole fleet.

Multi-robot LLM systems such as SMART-LLM (Kannan et al., 2024) and RoCo (Mandi et al., 2024) helped establish this line of work; RoCo in particular gives each robot its own LLM agent, and the agents negotiate plans with one another through dialogue. Decentralization works well for fleets above ~20 robots, where centralized dispatch runs into latency and context limits, and for fleets that operate in poorly connected environments where a central network is unavailable (search-and-rescue, agriculture). In 2026 the trade-off is well understood: centralized for fewer than 20 robots, decentralized above 50, hybrid in between.

Research Frontier: The Byzantine problem

A misbehaving robot in a decentralized fleet can propagate bad information to its neighbors, causing cascading failures. The robotics community is increasingly drawing on the distributed-systems literature (Paxos and Raft for crash faults, Byzantine fault tolerance for arbitrary misbehavior) to handle this. The 2025 SMART-LLM-BFT paper introduces a quorum-vote protocol where each task assignment requires three independent LLM agents to agree, which mitigates the impact of a single compromised or malfunctioning robot. This is structurally a port of consensus algorithms from databases to embodied agents, and it is the kind of cross-pollination that makes the field interesting in 2026.

Key Takeaway

Key Insight

A shared LLM dispatcher coordinates a fleet by receiving the global goal plus per-robot state, then emitting natural-language task assignments. The pattern works well below 10 robots and breaks above 50; the production solution is a hybrid where the LLM does natural-language decomposition and a classical assignment algorithm (Hungarian, auction) handles the within-class optimization. Real-time conflicts need a separate reactive layer; the LLM is not a real-time component.

Self-Check

Q1: Why does the pure-LLM dispatch architecture break above ~10 robots? Tie your answer to a specific resource constraint of modern LLM APIs.

Show Answer

A pure-LLM dispatcher writes every robot's pose, held object, task state, and any sensor digest into the prompt at every dispatch cycle, and it must decode a separate assignment for every robot. At 50-100 tokens per robot, fifty robots already means 2,500-5,000 tokens of raw state before the event log and the model's reasoning are added. That still fits in a modern context window, but prefill latency, output decoding time, per-token pricing, per-minute token rate limits, and attention quality over many interacting constraints all degrade as the fleet grows. The end-to-end cycle stretches to 5+ seconds, which is incompatible with reactive fleet coordination. The harder limit is throughput: a dispatcher running at 0.2-1 Hz produces at most one plan every 1-5 seconds, while the number of events needing a decision grows linearly with robot count and the number of potential pairwise conflicts grows quadratically. Above ten robots you must either shard the dispatcher (which gives up its global view) or offload the per-robot decisions to a different layer.

Q2: Sketch the hybrid two-layer dispatcher: which decisions does the LLM make, which does the Hungarian assignment make, and what is the interface format between them?

Show Answer

The LLM does language understanding and high-level task decomposition: it consumes the human-language goal ("clean up the cafeteria after lunch") plus a coarse fleet summary, and emits a typed list of tasks with hard constraints (capability, location, deadline) and soft scores (preferred robot type). The Hungarian or auction-based assignment algorithm consumes that list plus the precise per-robot cost matrix (current pose, battery, current task) and returns a cost-minimal one-to-one matching; the Hungarian algorithm does so deterministically in O(n^3) time. The interface format is a JSON array of {task_id, required_capability, position, priority} objects on one side and a {robot_id, task_id} assignment list on the return; the LLM never sees per-robot positions in detail, and the assigner never sees the natural-language goal.

Q3: Two robots in your fleet are converging on the same cup. One arrives in 2 seconds, the other in 2.5 seconds. Which layer of the multi-robot stack should resolve this conflict, and why does the LLM dispatcher fail to do so?

Show Answer

The reactive collision-avoidance layer must resolve this. It typically runs ORCA, velocity obstacles, or buffered-Voronoi planners on each robot at tens of hertz (10-50 Hz or faster). The LLM dispatcher fails because its loop runs at 0.2 to 1 Hz, far too slow to react to a half-second contention window; by the time the dispatcher notices the conflict and emits a reassignment, both robots have already arrived at the cup. The standard pattern reserves the LLM for "who does what" decisions on the second-to-minute timescale and delegates "how do you avoid each other right now" to a deterministic real-time layer with hard safety guarantees.

What's Next

Continue to Section 24.11: ROS 2 Integration.

Section 24.11 drops into the implementation details: how does an LLM agent actually talk to a ROS 2 robot? The integration patterns are surprisingly close to the function-calling patterns from Part VI, but with the wire format and timing semantics that ROS 2 brings.

Further Reading

Kannan, S. S., et al. (2024). SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models. IROS 2024. arXiv:2309.10062.

Mandi, Z., et al. (2024). RoCo: Dialectic Multi-Robot Collaboration with Large Language Models. ICRA 2024. arXiv:2307.04738.

Ahn, M., et al. (2024). AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents. arXiv:2401.12963.

Brohan, A., et al. (2023). RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. arXiv:2307.15818.

Kuhn, H. W. (1955). The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly.

Lamport, L. (1998). The Part-Time Parliament (Paxos). ACM Transactions on Computer Systems (TOCS).