The right workflow engine is the one your on-call engineer can debug at 3 a.m. without paging the founder.
Deploy, Workflow-Watching AI Agent
The five frameworks surveyed in Section 64.2 all solve the same problem with different trade-offs. This section provides a decision matrix that maps workload characteristics to the right engine, discusses the workflow-as-code versus DAG-as-config split that separates Temporal-family tools from Airflow-family tools, traces the OpenAI Agents SDK's emerging durability story, and finishes with migration paths for teams that picked one engine and need to move to another. The goal is to make a choice you will not regret six months from now.
Prerequisites
This section assumes you have read Section 64.2 (frameworks) and Section 64.3 (operational patterns). The decision matrix below references concepts from both.
64.4.1 Decision Matrix
The three frameworks covered in Section 64.2 (Temporal, Inngest, LangGraph) plus the two newer additions (Restate, Hatchet) serve different needs. The quick rule is to match the framework to the cost of failure, with cross-framework composition reserved for the highest-stakes pipelines.
64.4.1.1 When Temporal Fits
Temporal is the right choice when you need infrastructure-level durability for complex, long-running workflows with strict exactly-once guarantees for workflow state. Activities themselves are retried and may run more than once, so side-effecting activities still need idempotency keys. Temporal excels in environments that already run Kubernetes and have platform engineering teams comfortable with operating stateful infrastructure. The OpenAI Agents SDK integration makes it particularly appealing for teams building tool-heavy agents that interact with external systems.
64.4.1.2 When Inngest Fits
Inngest is the right choice when you want durable execution without operating additional infrastructure. Its event-driven model and step-level checkpointing provide strong durability guarantees with minimal setup. It fits teams that prefer managed services and want to add durability to existing web applications rather than building a separate worker infrastructure.
64.4.1.3 When LangGraph Persistence Fits
LangGraph persistence is the right choice when you are already using LangGraph for agent orchestration and want to add checkpointing, time-travel debugging, and human-in-the-loop capabilities with minimal additional complexity. Its durability guarantees are scoped to the LangGraph execution engine, making it simpler but less general than Temporal.
64.4.1.4 When Restate Fits
Restate is the right choice for greenfield agent stacks where the team finds Temporal's separate cluster operationally heavy and LangGraph's framework lock-in narrow. A single sidecar binary, RPC-shaped programming model, and journaling at every await mean less code reorganization to get durability. The risk is the ecosystem maturity gap: fewer SDKs, fewer connectors, fewer postmortems published.
64.4.1.5 When Hatchet Fits
Hatchet is the right choice for Python-first teams whose workload looks like "background jobs at scale with durability." LLM batch processing (encode 100,000 documents, classify them, store results) hits Hatchet's sweet spot. Concurrency limits, retries, and progress tracking are first-class; the trade-off is that exactly-once-across-side-effects guarantees are weaker than Temporal's.
64.4.1.6 A Quick Side-by-Side
| Framework | State location | Operates as | Best for | Watch out for |
|---|---|---|---|---|
| Temporal | Server event history | Separate cluster | High-stakes multi-step agents with side effects (payments, bookings) | Operational footprint; non-deterministic-workflow gotchas |
| Inngest | Managed platform | SDK in your web app | Event-driven LLM pipelines without ops overhead | Vendor managed-service tier; lower visibility than self-hosted |
| LangGraph | Checkpointer DB | Library inside your app | Single-agent loops with HITL pauses | Durability scoped to LangGraph; non-LangGraph code is on its own |
| Restate | Sidecar journal | Sidecar binary next to service | Greenfield agent services that want simpler model than Temporal | Newer; smaller ecosystem; fewer SDK languages |
| Hatchet | PostgreSQL | Engine + Python workers | LLM batch jobs; "Celery, but durable" | Weaker exactly-once-for-side-effects than Temporal |
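One way to read the table is as a small decision function. The sketch below is an illustration only: the `Workload` fields, the `suggest_engine` name, and especially the order of the checks are assumptions made for this example, not a rule published by any of the frameworks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload traits, loosely mirroring the table columns."""
    costly_side_effects: bool = False    # payments, bookings, refunds
    python_batch_jobs: bool = False      # "Celery, but durable"
    already_on_langgraph: bool = False   # single-agent loops with HITL pauses
    wants_managed_service: bool = False  # no extra infrastructure to operate
    greenfield: bool = False             # free to adopt a newer model


def suggest_engine(w: Workload) -> str:
    """Return a first-guess engine; the check order is an assumption."""
    if w.costly_side_effects:
        return "Temporal"
    if w.python_batch_jobs:
        return "Hatchet"
    if w.already_on_langgraph:
        return "LangGraph"
    if w.wants_managed_service:
        return "Inngest"
    if w.greenfield:
        return "Restate"
    # Fall back to the most general option when nothing else matches.
    return "Temporal"


booking_agent = Workload(costly_side_effects=True, already_on_langgraph=True)
doc_classifier = Workload(python_batch_jobs=True)
print(suggest_engine(booking_agent), suggest_engine(doc_classifier))
```

Putting the cost-of-failure check first reflects the quick rule above: a workload that charges cards is routed to Temporal even if the team already uses LangGraph for the agent loop.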
Among Temporal practitioners there is an old saying: "the third workflow engine is always the one you keep." The first is whatever the founder picked in week one (usually Celery or a homegrown thing). The second is the panic move at Series A scale (usually Airflow or Step Functions). The third is the one chosen with the memory of actual operational pain, which is why so many production stacks end up on Temporal. The faster you can get to the third engine, the more runway you save; this section's decision matrix is meant to compress that journey from eighteen months to an afternoon.
64.4.1.7 Workflow-as-Code vs DAG-as-Config
The choice above happens inside one family: workflow-as-code, where the workflow is ordinary code that the runtime instruments for durability. The other family is DAG-as-config, where the workflow is a declarative graph (Python DSL, YAML, or visual editor) consumed by a separate engine. Airflow, Prefect, Dagster, and AWS Step Functions in its Standard form are the canonical examples.
Workflow-as-code wins for LLM agents for one reason: each step is dynamic. The LLM's tool selection at step 7 depends on the result at step 6; you can't draw that graph at deploy time, because the graph is generated at runtime. Temporal, Inngest, LangGraph, Restate, and Hatchet all let you write if statements and loops that the LLM steers, and the runtime records the choices into the event history. A DAG-as-config tool can do the same thing only by emitting a static DAG that delegates the actual branching to an external service, which collapses the value of the declarative graph in the first place.
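The idea of ordinary control flow whose choices are recorded into an event history can be sketched without any framework. The toy `Journal` below is a hypothetical in-memory stand-in for a durable history, and `fake_llm` and `fake_tool` are placeholders for a model call and a tool call. It is not how any of the engines above is implemented, only an illustration of record-then-replay.

```python
from typing import Any, Callable


class Journal:
    """In-memory stand-in for a durable event history (assumption)."""

    def __init__(self) -> None:
        self.events: list[Any] = []
        self._cursor = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        # During replay, return the recorded result instead of re-running fn.
        if self._cursor < len(self.events):
            result = self.events[self._cursor]
        else:
            result = fn()
            self.events.append(result)
        self._cursor += 1
        return result

    def rewind(self) -> None:
        """Simulate a worker restart: replay from the first event."""
        self._cursor = 0


def agent_workflow(journal: Journal, choose_tool, run_tool, max_steps: int = 5) -> list[str]:
    trace: list[str] = []
    observation = "start"
    for _ in range(max_steps):
        # The branch taken here is decided at runtime by the model.
        tool = journal.step(lambda: choose_tool(observation))
        if tool == "finish":
            break
        observation = journal.step(lambda: run_tool(tool, observation))
        trace.append(observation)
    return trace


calls = {"llm": 0}


def fake_llm(obs: str) -> str:
    calls["llm"] += 1
    return "search" if obs == "start" else "finish"


def fake_tool(tool: str, obs: str) -> str:
    return f"{tool}({obs})"


journal = Journal()
first = agent_workflow(journal, fake_llm, fake_tool)
journal.rewind()
replayed = agent_workflow(journal, fake_llm, fake_tool)
print(first, replayed, calls["llm"])
```

The second run produces the same trace without calling `fake_llm` again, because every decision is read back from the journal. A static DAG has no equivalent place to record which branch the model chose.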
DAG-as-config still wins for daily-batch ETL: nightly data pipelines, model training schedules, periodic eval runs, scheduled report generation. The graph is the same every night, observability tools (Airflow UI, Dagster Asset graph) are tuned for it, and the operator does not want every Python developer rewriting the workflow on the fly. The 2026 rule of thumb: if the graph is static and runs on a cron, reach for Airflow/Prefect/Dagster; if the graph is generated at runtime by an LLM or a user, reach for Temporal/Inngest/Restate/Hatchet/LangGraph. Many production teams run both: Airflow for the data layer, Temporal for the agent layer, with Airflow tasks that start Temporal workflows when an LLM is needed.
64.4.1.8 The OpenAI Agents SDK Durability Story
OpenAI's Agents SDK (announced March 2025) folded durability into the agent runtime itself via persistent threads. A thread is a server-side conversation handle: messages, tool calls, and intermediate outputs all live on OpenAI's infrastructure, indexed by thread ID. Resuming a crashed agent is as simple as passing the same thread ID; the SDK reattaches to the running run and continues from where it left off. For teams whose agents already live entirely inside the OpenAI stack, this is the lowest-friction path to "durable enough."
The trade-off is scope. Thread-based durability covers the model loop, tool argument generation, and the SDK's built-in tools. Anything you do outside the Agents SDK (database writes, third-party API calls, message queue publishes, multi-LLM-vendor orchestration) gets no durability guarantees: a crash mid-Stripe-call still loses the call. The OpenAI Agents SDK plus Temporal integration covered in Section 64.2.1.1 exists precisely to bridge that gap: thread state for the agent loop, Temporal activities for everything that touches the outside world. Expect this hybrid pattern to be the dominant production shape through 2026.
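Once an outside-world call is wrapped in a retried activity, the call itself has to be safe to repeat. A minimal sketch of that idea follows, assuming a provider that deduplicates on an idempotency key. `FakePaymentProvider`, `idempotency_key`, and `charge_activity` are hypothetical names for illustration and do not reproduce any real payment API.

```python
import hashlib
import json


class FakePaymentProvider:
    """Stand-in for an external API that honors idempotency keys."""

    def __init__(self) -> None:
        self._seen: dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, amount_cents: int, idempotency_key: str) -> dict:
        # A repeated key returns the original receipt instead of charging again.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.charges_made += 1
        receipt = {"id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._seen[idempotency_key] = receipt
        return receipt


def idempotency_key(workflow_id: str, step_name: str, payload: dict) -> str:
    """Derive a stable key so every retry of the same step sends the same key."""
    raw = json.dumps([workflow_id, step_name, payload], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()


def charge_activity(provider: FakePaymentProvider, workflow_id: str, amount_cents: int) -> dict:
    key = idempotency_key(workflow_id, "charge", {"amount_cents": amount_cents})
    return provider.charge(amount_cents, idempotency_key=key)


provider = FakePaymentProvider()
first_try = charge_activity(provider, "order-42", 1999)
# Simulate the engine retrying the activity after a crash mid-call.
retry = charge_activity(provider, "order-42", 1999)
print(first_try == retry, provider.charges_made)
```

Deriving the key from the workflow ID and step name, rather than generating a random value per attempt, is what lets a retry after a crash collapse into the original charge.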
64.4.2 Migration Paths Between Frameworks
Most teams pick one framework, then discover six to twelve months later that their workload has outgrown it. The good news is that the workflow-as-code family shares enough structural DNA that migrations are feasible. The common moves:
- LangGraph → LangGraph + Temporal. The graph stays as the agent loop, but each side-effecting node moves to a Temporal activity. The LangGraph checkpointer can coexist with Temporal's event history; the LangGraph node becomes a single Temporal activity that "runs the next graph step." Migration is mostly additive: you keep your LangGraph code and add a thin Temporal wrapper.
- Inngest → Temporal. Both use step-level checkpointing. Inngest step.run("name", lambda: f()) maps almost one-to-one to Temporal workflow.execute_activity(f). The hard part is the event-driven trigger surface: Inngest functions react to events, Temporal workflows are started explicitly. The migration usually adds a thin event bus (Kafka, NATS, or AWS EventBridge) in front of Temporal to recover the event-driven shape.
- Celery → Hatchet. For teams whose pre-LLM stack was Celery, Hatchet is the smallest delta to gain durability. Task signatures stay similar, but the engine moves from Celery's broker model to Hatchet's PostgreSQL-backed log. The win is durability and observability; the cost is reworking concurrency and rate-limit configuration.
- Workflow-as-code → Restate. The Restate model is enough of a re-think that this is closer to a rewrite than a migration. The good news is the rewrite usually shrinks the code (no explicit workflow/activity split), so the diff is negative in lines.
The escape hatch in every direction is the durable-execution interoperability surface: events. A team can keep their old framework running while new workflows are written in the new one, with events flowing between them. This is how almost every successful migration actually happens.
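The run-both-engines-side-by-side approach can be pictured as a small router that sits on the event stream and moves one event type at a time to the new engine. The sketch below is a framework-free illustration: `MigrationRouter` and the two handler functions are hypothetical, and a real deployment would replace the handlers with calls into the old and new engines.

```python
from typing import Callable

Event = dict
Handler = Callable[[Event], str]


class MigrationRouter:
    """Route each event to the legacy or the new engine by event name."""

    def __init__(self, legacy: Handler, new: Handler) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated: set[str] = set()

    def migrate(self, event_name: str) -> None:
        """Mark one event type as owned by the new engine."""
        self._migrated.add(event_name)

    def dispatch(self, event: Event) -> str:
        handler = self._new if event["name"] in self._migrated else self._legacy
        return handler(event)


def legacy_engine(event: Event) -> str:
    return f"legacy handled {event['name']}"


def new_engine(event: Event) -> str:
    return f"new handled {event['name']}"


router = MigrationRouter(legacy_engine, new_engine)
router.migrate("order.refund_requested")  # only this workflow has moved so far

print(router.dispatch({"name": "order.refund_requested"}))
print(router.dispatch({"name": "order.created"}))
```

Because the cutover is per event type, a workflow that misbehaves on the new engine can be handed back to the old one by removing a single entry from the migrated set.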
64.4.2.1 Combining Frameworks
These frameworks are not mutually exclusive. A common production architecture uses LangGraph for agent logic, wrapped inside a Temporal workflow that provides infrastructure-level durability, with an Inngest event bus connecting the workflow to other services. The key principle is to match the durability guarantee to the cost of failure: lightweight summarization tasks may need only LangGraph checkpointing, while a booking agent that charges credit cards needs Temporal's exactly-once semantics.
The choice between orchestration frameworks is fundamentally a question about where the state lives. In Temporal, state lives in the Temporal server's event history, and your workers are stateless. In LangGraph, state lives in the checkpointer database, and your application server manages the graph execution. In Inngest, state lives in Inngest's managed platform, and your function code is stateless. In Restate, state lives in the sidecar journal next to your service. In Hatchet, state lives in PostgreSQL and the engine consults it. Each approach has different failure modes: Temporal survives worker crashes but requires a healthy Temporal cluster; LangGraph survives application restarts but depends on the checkpointer database; Inngest survives application failures but depends on the Inngest platform. Understanding where your state lives is the first step toward understanding what can go wrong.
An e-commerce company builds an agent that processes customer returns: (1) validate the return request, (2) check inventory for the returned item, (3) generate a return shipping label, (4) issue a refund, (5) send a confirmation email. Identify which steps need idempotency keys, which need compensation logic, and which framework (Temporal, Inngest, LangGraph, Restate, or Hatchet) best fits this use case. Justify your choice.
Answer Sketch
Steps 3 (shipping label), 4 (refund), and 5 (email) need idempotency keys because they produce external side effects. Steps 3 and 4 need compensation: if the refund fails, the shipping label should be voided. Temporal is the best fit because the workflow has strict ordering, involves financial transactions requiring exactly-once guarantees, and benefits from the saga pattern for compensation. LangGraph would be insufficient because it does not natively manage external side-effect durability. Inngest could work for the event-driven notification (step 5) but would be harder to use for the transactional label-and-refund sequence. Restate is a credible second choice for greenfield teams; Hatchet is a fit only if the rest of the system is already Python-task-queue-shaped.
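The compensation logic in this answer can be sketched as a tiny saga runner. This is a framework-free illustration, not Temporal's API: `run_saga`, the step names, and the simulated refund failure are all assumptions made for the example.

```python
from typing import Callable, Optional

# (step name, action, optional compensation)
Step = tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    log: list[str] = []
    compensations: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, undo in reversed(compensations):
                undo()
                log.append(f"{done_name}: compensated")
            return log
        log.append(f"{name}: ok")
        if compensate is not None:
            compensations.append((name, compensate))
    return log


voided_labels: list[str] = []


def fail_refund() -> None:
    raise RuntimeError("payment gateway timeout")


steps: list[Step] = [
    ("validate", lambda: None, None),
    ("check_inventory", lambda: None, None),
    ("shipping_label", lambda: None, lambda: voided_labels.append("label-123")),
    ("refund", fail_refund, None),
    ("email", lambda: None, None),
]

log = run_saga(steps)
print(log)
```

In this run the refund fails, the label is voided, and the confirmation email is never sent. A durable engine adds what the sketch lacks: the list of completed steps survives a process crash, so compensation still runs after a restart.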
- LLM agent workflows need durable execution because multi-step processes that take minutes or hours will inevitably encounter failures, timeouts, and infrastructure interruptions.
- Temporal provides infrastructure-level durability with automatic retry, state persistence, and exactly-once execution semantics for long-running workflows.
- Inngest offers event-driven durable functions with a simpler developer experience, ideal for serverless LLM pipelines.
- LangGraph persistence provides application-level checkpointing within the LangGraph framework, enabling conversation and agent state recovery.
- Restate and Hatchet round out the 2026 landscape: Restate for simpler greenfield durability, Hatchet for Python-first batch-style LLM jobs.
- Workflow-as-code wins for LLM agents because the graph is generated at runtime; DAG-as-config (Airflow, Prefect) still wins for static daily-batch ETL.
- The OpenAI Agents SDK's thread-based durability handles the agent loop but punts external side effects to wrappers like Temporal.
Classical durable execution (Temporal, the Saga pattern from Garcia-Molina and Salem 1987) assumes a deterministic workflow with well-defined retryable steps. LLM agent workflows break that assumption: steps are non-deterministic, errors are sometimes semantic rather than infrastructural, and the right "retry" can be a different prompt rather than the same one. A new research line is reshaping the orchestration layer around that reality.
Reflexion (Shinn et al., NeurIPS 2023, arXiv:2303.11366) and Self-Refine (Madaan et al., NeurIPS 2023) introduced agent-level retry loops that learn from failure by generating a reflection on what went wrong and editing the next attempt's prompt. AutoGen Studio (Microsoft, 2024) and Voyager (Wang et al., 2023) push this into long-horizon settings with skill libraries that grow over time. LangGraph's time-travel debugging (2024) and OpenAI's Agents SDK plus Temporal integration (2025) bring this thinking into production orchestrators by recording the full agent trajectory and allowing replay from any node.
Open research directions include: formal guarantees of progress for non-deterministic retry policies (when does a self-correcting agent halt?), cost-aware retry budgets that trade dollars for success probability, and orchestrators that explicitly model "tool returned wrong answer" as a distinct failure class with its own recovery policy. The 2026 production stack is converging on Temporal-style durable execution wrapped around an agent layer that learns from its own failures, with observability surfaces good enough to make the loop transparent.
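A reflection-style retry combined with a cost budget can be sketched in a few lines. The code below is an illustrative toy, not the Reflexion or Self-Refine algorithm: `retry_with_reflection`, the flat per-attempt cost in cents, and the `fake_attempt` and `fake_reflect` stand-ins for model calls are all assumptions.

```python
from typing import Callable, Optional


def retry_with_reflection(
    attempt: Callable[[str], tuple[bool, str]],
    reflect: Callable[[str, str], str],
    prompt: str,
    cost_per_attempt_cents: int,
    budget_cents: int,
) -> tuple[Optional[str], int, int]:
    """Retry with an edited prompt until success or until the budget runs out.

    Returns (output or None, cents spent, attempts made).
    """
    spent = 0
    attempts = 0
    while spent + cost_per_attempt_cents <= budget_cents:
        ok, output = attempt(prompt)
        spent += cost_per_attempt_cents
        attempts += 1
        if ok:
            return output, spent, attempts
        # The retry is a different prompt, not the same call repeated.
        prompt = reflect(prompt, output)
    return None, spent, attempts


def fake_attempt(prompt: str) -> tuple[bool, str]:
    if "cite sources" in prompt:
        return True, "answer with citations"
    return False, "answer missing citations"


def fake_reflect(prompt: str, failed_output: str) -> str:
    return prompt + " Remember to cite sources."


result = retry_with_reflection(fake_attempt, fake_reflect, "Summarize the report.", 2, 10)
starved = retry_with_reflection(fake_attempt, fake_reflect, "Summarize the report.", 2, 2)
print(result, starved)
```

The budget check gives the loop a guaranteed stopping point, which sidesteps the open halting question rather than answering it: the agent stops because the money runs out, not because it has proven that further retries are futile.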
With production engineering patterns established, the next chapter, Chapter 65: Containers, Kubernetes & Deployment, covers the infrastructure substrate that runs the durable workflows you have just designed. The retry, idempotency, and observability machinery from this chapter all assume a healthy container platform underneath; Chapter 65 builds it.
For the agent-loop frameworks (LangGraph, AutoGen, CrewAI) that live on top of durable execution, see Section 26.1: AI Agents. For the agent-safety constraints (sandboxing, human-in-the-loop checkpoints) that durable workflows are well suited to enforce, see Section 49.1: Agent Safety. For the observability and tracing patterns (OpenTelemetry spans, retry-and-resume telemetry) that durable execution depends on, see Section 42.9: OpenTelemetry for LLM Applications.