"When the planner is the programmer, every robot is a chat-completion away from a new behavior."
Tensor, Compile-And-Run AI Agent
Code-as-Policies (Liang et al., 2023, arXiv:2209.07753) generalized SayCan by replacing "rank a skill from a fixed list" with "write Python code that uses skills as function calls". The LLM emits an executable program; the robot runtime executes it line by line, with the LLM's program calling into a library of perception and motor primitives. This unlocks loops, conditionals, recursion, and arbitrary composition that SayCan's flat skill ranking cannot express. As of 2026 it is the dominant paradigm for high-level planning in research robotics and an increasingly common pattern in production.
Prerequisites
This section assumes the LLM-as-planner pattern from Section 24.7 and basic Python control-flow fluency. Tool-use and code-generation patterns are covered in detail later in the book.
24.8.1 The Paradigm Shift: Plan as Program
Code-as-Policies (Liang et al., 2023) was developed at Google Brain Robotics and treats robot policies as ordinary Python programs. The team reportedly debated whether to use a domain-specific language and settled on plain Python because "LLMs already know Python"; the choice let the approach reuse the models' existing code-generation ability instead of teaching them a new syntax. The same team later folded into Google DeepMind's broader robotics agent work, which is why Gemini Robotics in 2026 still feels structurally like a Code-as-Policies descendant.
SayCan's plan is a list of skill names: ["go to kitchen", "pick up Coke", "go to user", "place Coke"]. Code-as-Policies' plan is a Python program:
# LLM-generated plan for "bring me a Coke and clean up the spill"
go_to("kitchen")
coke = find_object("Coke can")
pick_up(coke)
sponge = find_object("sponge")
pick_up(sponge)
go_to("user")
place(coke, "table")
spill = find_region("spill on table")
wipe(sponge, spill)
The shift looks small but is structurally profound. The LLM is no longer producing natural-language steps that a separate module must interpret. It is producing source code that runs against an API. The runtime executes the code; the LLM's job is the same as a programmer's: pick the right function calls, in the right order, with the right arguments.
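Because the plan is ordinary source code, it can be dry-run against simulated skills before it touches hardware. The following is a minimal sketch under that assumption: the recording stubs, the `SKILLS` dictionary, and the shortened `PLAN` string are illustrative and not part of the original system.

```python
trace = []  # every skill call, in the order the program made it

def make_stub(name):
    """Build a recording stand-in for the named skill (no hardware involved)."""
    def stub(*args):
        trace.append((name, *args))
        return args[0]  # stand-in handle: echo the first argument back
    return stub

# Hypothetical simulated skill library keyed by function name.
SKILLS = {name: make_stub(name)
          for name in ("go_to", "find_object", "pick_up", "place")}

# A shortened version of the plan above, held as program text.
PLAN = '''
go_to("kitchen")
coke = find_object("Coke can")
pick_up(coke)
go_to("user")
place(coke, "table")
'''

# Dry run: the same program text, executed with the stubs in scope.
exec(PLAN, dict(SKILLS))
```

After the run, `trace` holds the five calls in order, which is enough to check ordering and arguments without moving a motor.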
From the LLM's side, Code-as-Policies makes the robot look like any other API-based system. The LLM's job is to write a Python script against a fixed library; the library happens to actuate motors and read cameras instead of querying a database. Every tool-use pattern from Part VI (function calling, MCP servers, ReAct) applies unchanged. This is why the team that built Code-as-Policies (Google Brain Robotics) later folded into the broader function-calling agent community: the technical problem is the same. The robot is an agent whose tools are physical actions.
24.8.2 The Skill Library as an API Surface
The skill library in Code-as-Policies is a Python API. Each skill is a function the LLM can call. The API is documented in the LLM's prompt as docstrings, so the LLM knows what functions exist and what arguments they take. Designing this API is the load-bearing engineering work: the API is the bridge between language and motor commands.
class ObjectHandle: ...   # opaque handle to a detected object (placeholder definition)
class RegionHandle: ...   # opaque handle to a surface region (placeholder definition)

def go_to(location: str) -> None:
    """Navigate the robot to a named location.

    Valid locations include: kitchen, living_room, bedroom, hallway, charging_dock.
    Raises LocationNotFoundError if the location is unknown.
    """

def find_object(description: str, timeout: float = 5.0) -> ObjectHandle:
    """Visually locate the object matching the natural-language description.

    Returns a handle that can be passed to pick_up(), place(), etc.
    Raises ObjectNotFoundError if no matching object is visible within timeout.
    """

def pick_up(obj: ObjectHandle) -> None:
    """Grasp the object and lift it off its current support surface.

    Raises GraspFailedError on failure.
    """

def place(obj: ObjectHandle, target: str) -> None:
    """Place the currently-grasped object at a named target (table, shelf, etc.)."""

def wipe(tool: ObjectHandle, region: RegionHandle) -> None:
    """Wipe a 2D region on a flat surface using the held tool (e.g. a sponge)."""
24.8.3 Runtime: LLM as a Just-in-Time Programmer
The Code-as-Policies runtime gives the LLM the instruction, the skill API (as a stub-only module), a small number of in-context examples, and asks it to emit a Python program. The runtime then executes the program in a sandboxed environment that has the real skill implementations available. Crucially, when a skill raises an exception (object not found, grasp failed), control returns to the LLM, which can examine the exception and emit a corrective program. This is the same retry-with-error-feedback pattern from Chapter 27 on tool use, applied to robotics.
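import inspect

class CodeAsPolicyRuntime:
    def __init__(self, llm, skill_module, max_retries=3):
        self.llm = llm
        self.skills = skill_module
        self.max_retries = max_retries

    def run(self, instruction: str, scene_description: str) -> bool:
        prompt = self._build_prompt(instruction, scene_description)
        for attempt in range(self.max_retries):
            program = self.llm.complete(prompt)
            try:
                # Illustration only: in-process exec with the skill module's
                # names in scope. Each attempt gets a fresh copy of the namespace.
                exec(program, dict(vars(self.skills)))
                return True
            except Exception as e:
                # Feed the error back so the next completion can correct it.
                prompt += f"\n# Attempt {attempt + 1} raised: {type(e).__name__}: {e}\n# Please write a corrected program:\n"
        return False

    def _skill_docs(self) -> str:
        # Render each public skill as a signature-plus-docstring stub.
        stubs = []
        for name, fn in inspect.getmembers(self.skills, inspect.isfunction):
            if not name.startswith("_"):
                stubs.append(f'def {name}{inspect.signature(fn)}:\n    """{inspect.getdoc(fn)}"""')
        return "\n\n".join(stubs)

    def _build_prompt(self, instruction: str, scene: str) -> str:
        # Built line by line: textwrap.dedent strips nothing once multi-line
        # skill docs with their own indentation are interpolated into a template.
        return "\n".join([
            "You are programming a household robot. Available functions:",
            self._skill_docs(),
            f"Current scene: {scene}",
            f"Instruction: {instruction}",
            "Write a Python program that completes the instruction. Use only the functions listed above.",
            "Output the program only, no explanation.",
        ])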
Executing LLM-generated code in the same Python interpreter that controls a 30 kg robot is a security and safety nightmare. Production Code-as-Policies deployments run the LLM-generated program in a tightly restricted sandbox: no network, no filesystem access, only the skill API in scope, and no nested exec, eval, or __import__ calls. The sandbox is typically a separate process with the skill API exposed over gRPC. Mistakes here have already caused real incidents in academic-lab deployments; do not skip this layer.
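As one layer of that defense, the program can be screened before it runs and then executed with a cut-down set of builtins. The sketch below is an assumption-laden illustration, not the production design: `validate`, `run_restricted`, `FORBIDDEN_NAMES`, and `SAFE_BUILTINS` are hypothetical names, and in-process filtering of this kind is not a security boundary on its own. It complements, and does not replace, the separate-process isolation described above.

```python
import ast

# Hypothetical deny-list of names the generated program may not reference.
FORBIDDEN_NAMES = {"exec", "eval", "compile", "open", "__import__", "globals", "getattr"}

# Hypothetical allow-list of builtins exposed to the generated program.
SAFE_BUILTINS = {"len": len, "range": range, "sorted": sorted, "min": min, "max": max}

def validate(program: str) -> None:
    """Reject imports, forbidden names, and dunder attribute access."""
    for node in ast.walk(ast.parse(program)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise PermissionError("imports are not allowed")
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            raise PermissionError(f"use of {node.id!r} is not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise PermissionError(f"dunder attribute {node.attr!r} is not allowed")

def run_restricted(program: str, skills: dict) -> None:
    """Screen the program, then run it with only safe builtins and skills in scope."""
    validate(program)
    exec(program, {"__builtins__": SAFE_BUILTINS, **skills})

# A well-behaved program runs; three escape attempts are rejected before exec.
calls = []
run_restricted('for _ in range(2):\n    go_to("kitchen")', {"go_to": calls.append})

rejected = []
for bad in ("import os", "().__class__", 'open("/etc/passwd")'):
    try:
        run_restricted(bad, {})
    except PermissionError:
        rejected.append(bad)
```

The deny-list is deliberately short and will miss creative escapes, which is why the check is best treated as an early, cheap filter in front of the real sandbox.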
24.8.4 What Control Flow Buys You
The leverage Code-as-Policies has over SayCan comes from Python's control structures. Three patterns matter:
Loops over object collections. "Put all the red blocks in the bin" becomes:
blocks = find_all_objects("red block")
for b in blocks:
    pick_up(b)
    place(b, "bin")
which a SayCan ranker would have to expand into ad-hoc replanning at each step.
Conditional branches. "If the door is open, walk through; otherwise, open it first" becomes:
if not is_door_open("front door"):
    open_door("front door")
go_to("outside")
Helper functions. Complex tasks can be decomposed into helper functions the LLM defines on the fly:
def stack_on(block, target):
    pick_up(block)
    place(block, target)

blocks = find_all_objects("colored block")
tower_base = blocks[0]
for b in blocks[1:]:
    stack_on(b, tower_base)
    tower_base = b
The canonical Code-as-Policies demo. Given a scene with five dishes of varying sizes, the LLM emits a program that calls find_all_objects("dish"), sorts the list by estimated bounding-box size, and iterates through the sorted list to place each dish on a vertical stack. A flat SayCan ranker cannot express "sort by size and iterate"; it would have to enumerate every permutation, scoring each. Code-as-Policies expresses the algorithm in about three lines, and the LLM emits the program a competent intern would write.
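The dish-stacking program can be sketched as follows. Everything outside the last six lines is scaffolding assumed for illustration: the `Dish` handle with a width and depth, the five-item `SCENE`, and the recording stubs stand in for real perception and motor skills.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dish:
    """Hypothetical object handle carrying a bounding box in metres."""
    name: str
    width: float
    depth: float

# Assumed scene: five dishes of varying footprint.
SCENE = [
    Dish("saucer", 0.12, 0.12),
    Dish("platter", 0.35, 0.25),
    Dish("dinner plate", 0.27, 0.27),
    Dish("bowl", 0.16, 0.16),
    Dish("side plate", 0.20, 0.20),
]
trace = []  # records the motor calls instead of moving a robot

def find_all_objects(description):
    return list(SCENE)

def pick_up(obj):
    trace.append(("pick_up", obj.name))

def place(obj, target):
    trace.append(("place", obj.name, target))

# The kind of program the LLM might emit: sort largest-first, then stack.
dishes = sorted(find_all_objects("dish"), key=lambda d: d.width * d.depth, reverse=True)
support = "table"
for dish in dishes:
    pick_up(dish)
    place(dish, support)
    support = dish.name  # the next dish goes on top of this one
```

The sort key and the running `support` variable are the two things a flat skill list has no place to put.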
24.8.5 The Failure Modes
Code-as-Policies has predictable failure modes that SayCan does not. The LLM can call functions that do not exist, pass arguments in the wrong order, or write programs that loop infinitely. The runtime catches these by wrapping the executor in exception handlers, a timeout, and a hard call budget. Five failure categories dominate in 2026 deployments:
| Failure category | Example | Mitigation |
|---|---|---|
| Hallucinated function | Calls vacuum_floor() when no such skill exists | Static check before exec; error feedback retry |
| Wrong argument type | Passes a str where ObjectHandle is expected | Python typing + runtime isinstance check |
| Infinite loop | "While not done: do_something()" without progress | Wall-clock timeout, max-iteration budget |
| Unsafe skill composition | Drops object while still moving, no place() call | Skill-level preconditions; safety wrapper from Section 39.6 |
| Misinterpretation | "Clean the kitchen" emits code that throws away clean dishes | Confirmation prompt; intent verification by a second LLM |
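Two of the mitigations in the table lend themselves to a short sketch: a static check for hallucinated functions and a call and time budget enforced at the skill boundary. The helper names `unknown_calls`, `with_budget`, and `BudgetExceeded` are hypothetical, and the budget shown only fires when the program calls a skill.

```python
import ast
import time

def unknown_calls(program: str, allowed: set) -> set:
    """Return plain-function names the program calls that are neither skills nor defined in it."""
    tree = ast.parse(program)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {n.func.id for n in ast.walk(tree)
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    return called - allowed - defined

class BudgetExceeded(RuntimeError):
    """Raised when the program spends its call or wall-clock budget."""

def with_budget(skills: dict, max_calls: int, max_seconds: float) -> dict:
    """Wrap every skill so the program aborts once either budget is spent."""
    state = {"calls": 0, "deadline": time.monotonic() + max_seconds}

    def wrap(fn):
        def guarded(*args, **kwargs):
            state["calls"] += 1
            if state["calls"] > max_calls or time.monotonic() > state["deadline"]:
                raise BudgetExceeded("call or time budget exhausted")
            return fn(*args, **kwargs)
        return guarded

    return {name: wrap(fn) for name, fn in skills.items()}

skills = {"go_to": lambda location: None}

# Static check: vacuum_floor is not in the skill library.
missing = unknown_calls('go_to("kitchen")\nvacuum_floor()', set(skills))

# Budget: an infinite loop is cut off on the sixth skill call.
try:
    exec('while True:\n    go_to("kitchen")', with_budget(skills, max_calls=5, max_seconds=1.0))
    stopped = False
except BudgetExceeded:
    stopped = True
```

Any builtin the program may use, such as `len` or `range`, has to be added to `allowed` or it will be reported as unknown. A loop that never calls a skill (`while True: pass`) slips past this wrapper entirely, so a wall-clock kill from outside the interpreter is still needed.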
24.8.6 The 2026 Evolution: Structured Output and Tool Calling
The pure "LLM emits Python" pattern has matured into a structured-output pattern in 2026. Instead of free-form code, the LLM emits a JSON tool-call sequence that is interpreted by the runtime. The JSON form is easier to validate (every call has a schema), easier to log (the trace is structured), and easier to checkpoint (you can resume after a crash by replaying the JSON log). Control flow survives the change: the JSON sequence supports conditionals and loops via nested structure or via an explicit "control-flow" command.
# The 2026 structured-output successor to Code-as-Policies.
plan = [
    {"call": "go_to", "args": {"location": "kitchen"}},
    {"call": "find_all_objects", "args": {"description": "red block"}, "bind": "blocks"},
    {
        "for_each": "blocks", "as": "b",
        "body": [
            {"call": "pick_up", "args": {"obj": "$b"}},
            {"call": "place", "args": {"obj": "$b", "target": "bin"}},
        ],
    },
]
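A runtime for this format can be quite small. The interpreter below is a minimal sketch that assumes the semantics suggested by the example: `bind` stores a call's result under a name, `$name` reads it back, and `for_each` runs a nested body once per item. The function `run_plan` and the stub skills are illustrative, not a published specification.

```python
def run_plan(steps, skills, env=None):
    """Interpret a list of tool-call steps; supports 'call', 'bind', and 'for_each'."""
    env = {} if env is None else env

    def resolve(value):
        # "$name" refers to a previously bound variable; anything else is a literal.
        if isinstance(value, str) and value.startswith("$"):
            return env[value[1:]]
        return value

    for step in steps:
        if "call" in step:
            if step["call"] not in skills:
                raise KeyError(f"unknown skill {step['call']!r}")
            kwargs = {k: resolve(v) for k, v in step.get("args", {}).items()}
            result = skills[step["call"]](**kwargs)
            if "bind" in step:
                env[step["bind"]] = result
        elif "for_each" in step:
            for item in env[step["for_each"]]:
                env[step["as"]] = item
                run_plan(step["body"], skills, env)
        else:
            raise ValueError(f"unrecognized step: {step!r}")
    return env

# Stub skills that record calls; the scene is assumed to hold two red blocks.
trace = []
skills = {
    "go_to": lambda location: trace.append(("go_to", location)),
    "find_all_objects": lambda description: ["block_1", "block_2"],
    "pick_up": lambda obj: trace.append(("pick_up", obj)),
    "place": lambda obj, target: trace.append(("place", obj, target)),
}
plan = [
    {"call": "go_to", "args": {"location": "kitchen"}},
    {"call": "find_all_objects", "args": {"description": "red block"}, "bind": "blocks"},
    {"for_each": "blocks", "as": "b", "body": [
        {"call": "pick_up", "args": {"obj": "$b"}},
        {"call": "place", "args": {"obj": "$b", "target": "bin"}},
    ]},
]
run_plan(plan, skills)
```

Because each step is a plain dictionary, the unknown-skill check happens before any motor call for that step, and the executed steps can be appended to a log for later replay.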
Even in 2026, free-form Python still beats structured JSON on three task categories: (a) tasks needing rich arithmetic on intermediate quantities ("place the block 5 cm to the left of the rightmost cup"); (b) tasks needing list comprehensions or other functional patterns; and (c) tasks needing user-defined helper functions that get reused later. The dominant production pattern is "JSON for routine calls, Python escape hatch for the genuinely programmatic tasks", with the LLM choosing between the two modes based on complexity. OpenAI's Code Interpreter and Anthropic's code execution tool for Claude are concrete examples of this hybrid in chat-LLM products; the same pattern is reaching robotics in 2026.
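Category (a) is easy to see in code. The sketch below works through the "5 cm to the left of the rightmost cup" example under stated assumptions: `get_position` and `place_at` are hypothetical skills, positions are (x, y) in metres, and +x points to the robot's right.

```python
# Assumed perception result: object name -> (x, y) in metres, +x to the right.
POSITIONS = {"cup_a": (0.10, 0.40), "cup_b": (0.45, 0.35), "cup_c": (0.30, 0.50)}
placed = {}  # records where each object was put down

def find_all_objects(description):
    return sorted(POSITIONS)

def get_position(obj):
    return POSITIONS[obj]

def place_at(obj, x, y):
    placed[obj] = (round(x, 3), round(y, 3))

# Program for "place the block 5 cm to the left of the rightmost cup".
cups = find_all_objects("cup")
rightmost = max(cups, key=lambda c: get_position(c)[0])
x, y = get_position(rightmost)
place_at("block", x - 0.05, y)
```

Expressing the `max` over a key and the offset subtraction in a JSON tool-call sequence would need an expression language of its own, which is the point at which falling back to Python is simpler.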
Key Takeaway
Code-as-Policies has the LLM emit Python (or structured JSON) that calls into a robot skill API. The result is a planner with loops, conditionals, and helper functions, which expresses tasks SayCan's flat skill ranking cannot. The runtime is a sandboxed Python interpreter with the skill module in scope plus an exception-feedback retry loop. The 2026 successor pattern uses structured JSON tool-call sequences instead of free-form code, with a Python escape hatch for the genuinely-programmatic minority of tasks.
for red in get_red_blocks()[:3]: pick(red); place_on(get_stack_top(blue_block)). The code form is shorter because it expresses the repetition as a loop rather than unrolling it. Debugging favors Code-as-Policies because exceptions carry stack traces, line numbers, and variable values that you can replay; SayCan's flat skill list only tells you which step failed, not why the planner chose it.
set_velocity(v)); in software you see an infinite loop, in robotics you see a hardware crash. Failure mode two: file-system or shell-escape that lets the model bypass the action allowlist (e.g., calling os.system or a raw ROS topic publish), bypassing the joint-limit and workspace-bound guards that the skill API enforces. Sandboxing for robotics therefore restricts the call graph to a typed skill module, blocks all OS and network access, and enforces wall-clock and command-rate limits at the interpreter boundary.
Section 24.9 covers VoxPoser, which keeps the LLM-as-planner pattern but changes the output: instead of a skill list or Python program, the LLM emits a 3D cost field that a classical optimizer turns into a trajectory. The shift gives a fundamentally different way to ground language in space.