
"Orchestration is the difference between an LLM script and an LLM platform."
Deploy, Workflow-Watching AI Agent
Chapter 63 routed requests; this chapter routes workflows. It covers Temporal, Inngest, Airflow with LLM operators, Prefect, durable execution, retries with backoff, and the workflow patterns that turn long-running LLM jobs into reliable async pipelines.
LLM-powered applications often span hours of work: chained tool calls, human review, retries, and long-running document processing. This chapter covers durable workflow engines (Temporal, AWS Step Functions, Airflow) and the patterns that make stateful agent workflows reliable.
Chapter Overview
Agent workflows that span minutes or hours will eventually fail partway through, and losing progress on a 20-step research pipeline is unacceptable. This chapter teaches durable execution. It covers the workflow orchestration patterns that survive retries, partial failures, and provider outages; the canonical engines (Temporal, Restate, Inngest); the LLM-specific patterns (long-running agent loops, checkpointed retrievals, human-in-the-loop steps); and the trade-offs between durability, latency, and cost.
Workflow orchestration is what makes multi-step agents production-grade. This chapter is the syllabus for the runtimes and patterns that survive contact with reality.
Learning Objectives
- Explain why long-running LLM workflows need durable execution rather than ad-hoc retry logic.
- Compare Temporal, Restate, and Inngest as workflow engines for LLM applications.
- Architect a checkpointed agent loop with retries and partial-failure recovery.
- Apply human-in-the-loop checkpoints inside a durable workflow.
- Trade off durability, latency, and cost for a target workflow.
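To make the idea of a checkpointed loop with retries concrete before the sections introduce real engines, here is a minimal sketch in Python. It uses only the standard library and is not how Temporal, Restate, or Inngest implement durability; the names `run_durable`, `load_checkpoint`, `save_checkpoint`, the step functions, and the `checkpoint.json` file are all hypothetical and chosen for illustration. The assumption is that each step returns a JSON-serializable result and is safe to retry.

```python
import json
import os
import tempfile
import time


def load_checkpoint(path):
    """Return previously saved step results, or an empty dict on a first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_checkpoint(path, state):
    """Write state via a temp file and rename, so a crash mid-write leaves the old file intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_durable(steps, path, max_attempts=3, base_delay=0.01):
    """Run (name, fn) steps in order, skipping any step already in the checkpoint."""
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state:
            continue  # completed in an earlier run; do not redo the work
        for attempt in range(max_attempts):
            try:
                state[name] = fn(state)
                break
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries; earlier steps remain on disk
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        save_checkpoint(path, state)  # persist progress after every step
    return state


# Demo (hypothetical steps): the second step fails once, simulating a provider outage.
calls = {"search": 0, "summarize": 0}


def search(state):
    calls["search"] += 1
    return ["doc1", "doc2"]


def summarize(state):
    calls["summarize"] += 1
    if calls["summarize"] == 1:
        raise RuntimeError("simulated provider outage")
    return f"summary of {len(state['search'])} docs"


steps = [("search", search), ("summarize", summarize)]
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_durable(steps, ckpt, max_attempts=1)  # first run crashes at step 2
except RuntimeError:
    pass
result = run_durable(steps, ckpt, max_attempts=1)  # rerun resumes after "search"
```

In the demo, the rerun does not call `search` again because its result was already checkpointed, which is the property that matters when a step is an expensive LLM call. A real engine adds much more than this sketch (timers, worker crashes, versioning, and concurrency), which is the ground the sections below cover.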
Prerequisites
- Production engineering from Chapter 62
- Agent foundations from Chapter 26
- Familiarity with at least one workflow engine (Airflow, Temporal, Step Functions)
Sections in This Chapter
- 64.1 The Case for Durable Execution (Intermediate): What durable execution means, the failure modes that motivate it, and when cheap retries are still the right call.
- 64.2 Durable Execution Frameworks (Intermediate): A side-by-side catalog of Temporal, Inngest, LangGraph persistence, Restate, and Hatchet for LLM agent workflows.
- 64.3 Operating Durable Workflows (Advanced): Retry strategies, idempotency keys, compensation (saga) logic, and observability for workflows that span minutes to hours.
- 64.4 Framework Selection (Advanced): A decision matrix across the five frameworks, the workflow-as-code vs DAG-as-config debate, the OpenAI Agents SDK durability story, and migration paths.
What's Next?
This chapter begins with Section 64.1: The Case for Durable Execution. Each section builds on the previous one, so we recommend reading them in order.