ROS 2 Integration

Section 24.11

"ROS 2 is the cable that connects 'think' to 'move'. The cable is held together by topic names and good intentions."

PipPip, Topic-Subscriber AI Agent
Big Picture

Every concept in the previous four sections hits the road through ROS 2 (Robot Operating System 2), the de facto middleware for serious robotics in 2026. This section shows how an LLM agent (whether a planner, dispatcher, or VLA wrapper) actually talks to a real robot: which ROS 2 primitives matter (topics, services, actions), which Python patterns work, and where the timing model of ROS 2 collides with the request-response model of LLM APIs.

Prerequisites

This section assumes Python familiarity and the single-robot VLA architecture from Section 24.1. Familiarity with publish/subscribe messaging is useful but not strictly required. The LLM-tool-use pattern is covered in detail later in the book.

24.11.1 The Three ROS 2 Primitives

Fun Fact

ROS (the original) was created at Willow Garage in 2007 as the operating-system substrate for the PR2 robot. ROS 2 was a complete rewrite started in 2014 specifically because ROS 1's pub-sub middleware did not handle multi-robot, multi-machine deployments well; ROS 2 switched to DDS (the Data Distribution Service standard, originally from air-traffic control) for that exact reason. Every LLM-to-ROS-2 bridge in 2026 inherits an aviation-grade messaging substrate.

ROS 2 exposes three interaction patterns that an LLM agent must learn. Topics are publish-subscribe streams (camera frames, joint states, odometry); the agent subscribes and reads at the rate the publisher pushes. Services are synchronous request-reply (compute IK, query semantic map, get battery state); the agent calls and blocks for the response. Actions are long-running goal-oriented operations with feedback and cancellation (navigate to pose, follow trajectory, dock); the agent sends a goal and receives periodic feedback until completion or cancellation.

PatternDirectionDurationCancellableTypical use from LLM
Topic (pub/sub)One-to-many, fire-and-forgetContinuousn/aObserve camera, joint state
ServiceRequest/reply, synchronousMillisecondsNoQuery, compute, get state
ActionGoal with feedback streamSeconds to minutesYesNavigate, manipulate, dock
Table 24.11.1: The three ROS 2 interaction primitives and how an LLM agent uses each. Topics are read; services are called; actions are dispatched and monitored. Most LLM-emitted "tool calls" map to either a service or an action; topics are read in the agent's perception loop, not driven by tool calls.

24.11.2 The Canonical LLM-to-ROS 2 Bridge

The 2026 reference pattern is an "ROS 2 agent" Python class that exposes high-level methods to the LLM (typically as function-calling tools) and translates each method into the appropriate ROS 2 primitive underneath. Below is a minimal but complete example using rclpy:

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from geometry_msgs.msg import PoseStamped
from nav2_msgs.action import NavigateToPose
from sensor_msgs.msg import Image

class LlmRobotAgent(Node):
    """Bridges LLM tool calls into ROS 2 primitives on a Nav2-enabled robot."""

    def __init__(self):
        super().__init__("llm_agent")
        self._latest_image = None
        self._image_sub = self.create_subscription(
            Image, "/camera/image_raw", self._image_cb, 10,
        )
        self._nav_client = ActionClient(self, NavigateToPose, "/navigate_to_pose")

    def _image_cb(self, msg):
        self._latest_image = msg

    # --- Tools exposed to the LLM ---
    def get_camera_image(self) -> bytes:
        """Return the latest camera frame as raw bytes."""
        return bytes(self._latest_image.data) if self._latest_image else b""

    def go_to_pose(self, x: float, y: float, yaw: float) -> bool:
        """Navigate to (x, y, yaw) in the map frame. Returns success boolean."""
        goal = NavigateToPose.Goal()
        goal.pose.header.frame_id = "map"
        goal.pose.pose.position.x = x
        goal.pose.pose.position.y = y
        goal.pose.pose.orientation.w = 1.0  # simplification; real code converts yaw
        self._nav_client.wait_for_server()
        future = self._nav_client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self, future)
        result_future = future.result().get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        return result_future.result().status == 4   # STATUS_SUCCEEDED
Code Fragment 24.11.1a: A minimal LLM-to-ROS-2 bridge. The image topic is read passively (subscriber callback); the navigation action is dispatched and waited on. The two tools get_camera_image() and go_to_pose() are what the LLM sees; the ROS 2 plumbing is hidden underneath.

24.11.3 The Timing Impedance Mismatch

ROS 2 runs at fixed rates (camera at 30 Hz, joint state at 100 Hz, navigation actions over many seconds). LLM APIs respond in 1-5 seconds per call. The two timing models do not align. A naive bridge that blocks the entire ROS 2 executor while waiting for an LLM response stops publishing topics, which causes downstream nodes (Nav2, perception) to time out. The right pattern is to keep the LLM call off the ROS executor thread, typically via Python's asyncio or a background thread that posts the result back into the executor via a queue.

import asyncio
import threading
from rclpy.executors import MultiThreadedExecutor

# Two threads: ROS spin thread, LLM call thread.
def main():
    rclpy.init()
    agent = LlmRobotAgent()
    executor = MultiThreadedExecutor(num_threads=2)
    executor.add_node(agent)

    spin_thread = threading.Thread(target=executor.spin, daemon=True)
    spin_thread.start()

    while rclpy.ok():
        scene_description = summarize(agent.get_camera_image())
        plan = call_llm(                # this blocks for ~3 seconds
            prompt=f"Goal: tidy the table. Scene: {scene_description}",
            tools=[agent.go_to_pose, agent.pick_up, agent.place],
        )
        for call in plan.tool_calls:
            tool = getattr(agent, call.name)
            tool(**call.args)
Code Fragment 24.11.2: Threading the LLM-ROS bridge correctly. The ROS executor spins in its own thread so topic callbacks keep firing; the main thread issues LLM calls and dispatches the resulting tool calls. Forgetting to separate these threads is the most common bug in academic LLM-on-ROS-2 demos.
Warning: QoS profiles matter

ROS 2 subscribers and publishers specify a "Quality of Service" (QoS) profile that controls reliability, durability, and history depth. The default profile (reliable, volatile, depth 10) works for most general topics but is wrong for camera images (which want best-effort, sensor-data profile) and for one-shot configuration topics (which want transient-local durability). An LLM agent that subscribes with the wrong QoS profile silently misses messages, which surfaces as "the model never sees the camera frames" without any error. The fix is to use the QoS profile that matches the publisher; rclpy.qos.QoSPresetProfiles.SENSOR_DATA.value for camera/IMU/LiDAR, SYSTEM_DEFAULT for most everything else.

24.11.4 The Tool-Calling Glue

Modern LLM tool calling (OpenAI function calling, Anthropic tool use, Gemini function calling) maps cleanly onto ROS 2 services and actions. Each LLM tool corresponds to one ROS 2 endpoint; the tool's JSON schema mirrors the request type's structure. The mapping is mechanical enough that several open-source libraries (rosa, rosbridge-llm, ros2_llm_agent) auto-generate the tool definitions from message .msg and .action files.

# Auto-generating tool definitions from a ROS 2 action.
from nav2_msgs.action import NavigateToPose
from rclpy_llm import action_to_tool

nav_tool = action_to_tool(NavigateToPose, action_name="navigate_to_pose")
print(nav_tool.openai_schema())
# {
#     "name": "navigate_to_pose",
#     "description": "Navigate the robot to a target pose in a given frame.",
#     "parameters": {
#         "type": "object",
#         "properties": {
#             "x": {"type": "number", "description": "X position in frame_id"},
#             "y": {"type": "number", "description": "Y position in frame_id"},
#             "yaw": {"type": "number", "description": "Yaw angle in radians"},
#             "frame_id": {"type": "string", "default": "map"},
#         },
#         "required": ["x", "y", "yaw"],
#     },
# }
Code Fragment 24.11.3: Auto-generating an OpenAI function-calling schema from a ROS 2 action definition. The mechanical mapping (message field to JSON property, comments to descriptions) means a robot with N actions exposes ~N LLM tools without per-tool boilerplate. This is the integration pattern that lets a hobbyist plug an LLM into an existing ROS 2 stack in an afternoon.

24.11.5 Handling Long-Running Actions

Navigation actions take seconds; manipulation actions take longer. The LLM must not block on a 30-second action without producing intermediate feedback to the user. The pattern is to expose three tools per long-running action: start, check status, and cancel. The LLM starts the action, polls status periodically while emitting status updates to the user, and cancels if a new instruction arrives.

# Three-tool wrapper around a long-running ROS 2 action.
class LongActionWrapper:
    def __init__(self, action_client):
        self.client = action_client
        self._active_goal_handle = None

    def start(self, goal) -> str:
        """Start the action. Returns a goal_id the LLM can poll on."""
        future = self.client.send_goal_async(goal)
        future.add_done_callback(self._store_handle)
        return "goal_{future.request.stamp.sec}"

    def status(self, goal_id: str) -> str:
        """Returns one of: pending, executing, succeeded, failed, canceled."""
        if self._active_goal_handle is None:
            return "pending"
        return STATUS_MAP[self._active_goal_handle.status]

    def cancel(self, goal_id: str) -> None:
        """Cancel the in-progress action."""
        if self._active_goal_handle:
            self._active_goal_handle.cancel_goal_async()
Code Fragment 24.11.4: The start/status/cancel triplet for long-running actions. The LLM uses these three tools in a loop: start the action, poll status until succeeded or failed, optionally cancel if the user issues a new instruction. This is the same pattern as polling a long-running cloud job; the only difference is that the "job" is a physical action with real-world consequences.
Real-World Scenario: "Cancel and re-plan"

In a 2025 office-robot demo, the operator told the robot "go to the kitchen, then bring me coffee", but after 5 seconds added "actually, go to my desk first". The LLM agent's correct behavior is: detect the new instruction, call cancel(current_goal_id), then start a new navigation goal to the desk. Getting this right requires the LLM to know which goal_ids are active and to cancel them before issuing a new conflicting action. Forgetting to cancel produced the famous "robot stuck oscillating between two locations" demo failure that circulated on Twitter that summer.

24.11.6 Debugging the Bridge

Three diagnostic tools should be permanently mounted in any LLM-on-ROS-2 deployment. ros2 topic echo on the topics the LLM agent reads, to verify the data is flowing. ros2 action info on the actions the agent dispatches, to verify the action server is registered. A logger that records every LLM tool call with its arguments and the ROS 2 response, so post-mortems on incidents have a complete trace.

Research Frontier: Beyond ROS 2: DDS, Zenoh, and the future

ROS 2 itself is built on DDS (Data Distribution Service), which has its own performance characteristics that LLM agents inherit. Two emerging alternatives matter in 2026. (a) Zenoh (zenoh.io) replaces the DDS layer with a more efficient pub/sub protocol better suited to high-latency wireless links and large fleets. (b) NVIDIA Isaac Sim's reference architecture uses gRPC streaming instead of DDS for the simulation-to-policy interface. The ROS 2 community's response is the Tier-1 Zenoh integration in Humble Hawksbill and successor releases, which lets you swap the DDS layer without changing application code. Expect the underlying middleware to look different by 2028, but the topic/service/action abstraction is stable.

Key Takeaway

Key Insight

ROS 2 exposes three primitives (topics, services, actions) that map cleanly to LLM tool calling. The bridge is mechanical enough to auto-generate; the timing mismatch (LLM at 0.2 Hz, ROS at 30 Hz) requires multi-threading the executor so topic callbacks never block on LLM calls. Long-running actions need a start/status/cancel triplet so the LLM can monitor and abort without freezing. QoS profiles, often overlooked, are the silent cause of "the model never sees the camera frame" bugs.

Self-Check
Q1: An LLM agent subscribes to /camera/image_raw with the default QoS profile and observes that only ~1 in 30 frames arrives. What is the likely cause, and which QoS profile should you use instead?
Show Answer
The default subscriber QoS in ROS 2 is RELIABLE with KEEP_LAST(10), which means the middleware buffers ten frames and refuses to drop. A 30 Hz publisher overflows that buffer in a third of a second, and DDS then backpressures or skips frames depending on configuration; the LLM's slow consumption rate (one frame per few seconds) means it sees stale buffered frames or none at all. The correct profile for high-rate sensor data is SensorDataQoS, which is BEST_EFFORT with KEEP_LAST(5). The trade is that you drop frames the LLM cannot keep up with rather than queue them, so the agent always sees the freshest available frame.
Q2: Sketch a deadlock scenario where the ROS 2 executor and the LLM call thread share the same Python interpreter and both try to acquire self._lock. How does the multi-threaded executor pattern from Code Fragment 24.11.2a avoid this?
Show Answer
A single-threaded executor runs all topic callbacks on one thread. The LLM-call thread holds self._lock while waiting for the synchronous API response (tens of seconds). A topic callback fires in the executor thread, tries to update shared state, blocks on self._lock, and the executor stalls; meanwhile the LLM response writes a tool call that publishes on the now-blocked publisher, which never executes, and the original LLM call appears to hang. The multi-threaded executor allocates a thread pool so the callback runs in a different worker while the LLM thread waits; self._lock still serializes shared-state updates, but the executor never blocks. The deadlock disappears because the two waiting paths no longer share a single OS thread.
Q3: Your LLM agent calls go_to_pose(0, 0, 0) and the action server replies with status STATUS_ABORTED. List three ROS 2 diagnostic commands you would run to identify the cause.
Show Answer
First, ros2 action info /go_to_pose -t to confirm the action server is alive and lists the expected goal/feedback/result types; abort with no server is silent and looks identical to a real abort. Second, ros2 topic echo /tf and ros2 topic echo /odom to confirm the robot has a valid pose estimate; a missing or jumping TF tree is the most common root cause of pose-action aborts. Third, ros2 node info /go_to_pose_server to inspect the action server's parameters and subscriptions, then check the server's log via ros2 run rqt_console rqt_console for the abort reason string, which servers typically publish but agents do not propagate. These three together resolve the majority of STATUS_ABORTED responses without touching application code.
What's Next

Continue to Section 24.12: Comparing the Planners.

With ROS 2 integration in hand you have a complete LLM-to-robot stack. Section 24.12 revisits SayCan, Code-as-Policies, VoxPoser, and the dispatcher pattern side by side, with a comparison matrix that maps each onto the deployment scenarios where it dominates.

Further Reading
Macenski, S., et al. (2022). Robot Operating System 2: Design, Architecture, and Uses in the Wild. "Science Robotics".
Macenski, S., et al. (2020). The Marathon 2: A Navigation System (Nav2). "IROS 2020, arXiv:2003.00368".
Open Source Robotics Foundation. (2024). ROS 2 Humble Hawksbill Documentation. "docs.ros.org".
Object Management Group. (2015). Data Distribution Service for Real-Time Systems (DDS) Specification. "OMG".
Corradi, A., et al. (2024). Zenoh: A New Pub/Sub Protocol for the IoT. "Eclipse Foundation".
Ahn, M., et al. (2024). AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents. "arXiv:2401.12963".