Framework Selection

Section 64.4

The right workflow engine is the one your on-call engineer can debug at 3 a.m. without paging the founder.

Deploy, Workflow-Watching AI Agent
Big Picture

The five frameworks surveyed in Section 64.2 all solve the same problem with different trade-offs. This section provides a decision matrix that maps workload characteristics to the right engine, discusses the workflow-as-code versus DAG-as-config split that separates Temporal-family tools from Airflow-family tools, traces the OpenAI Agents SDK's emerging durability story, and finishes with migration paths for teams that picked one engine and need to move to another. The goal is to make a choice you will not regret six months from now.

Prerequisites

This section assumes you have read Section 64.2 (frameworks) and Section 64.3 (operational patterns). The decision matrix below references concepts from both.

64.4.1 Decision Matrix

The three frameworks covered in Section 64.2 (Temporal, Inngest, LangGraph) plus the two newer additions (Restate, Hatchet) serve different needs. The quick rule is to match the framework to the cost of failure, with cross-framework composition reserved for the highest-stakes pipelines.

64.4.1.1 When Temporal Fits

Temporal is the right choice when you need infrastructure-level durability for complex, long-running workflows with strict exactly-once guarantees. It excels in environments that already run Kubernetes and have platform engineering teams comfortable with operating stateful infrastructure. The OpenAI Agents SDK integration makes it particularly appealing for teams building tool-heavy agents that interact with external systems.

64.4.1.2 When Inngest Fits

Inngest is the right choice when you want durable execution without operating additional infrastructure. Its event-driven model and step-level checkpointing provide strong durability guarantees with minimal setup. It fits teams that prefer managed services and want to add durability to existing web applications rather than building a separate worker infrastructure.

64.4.1.3 When LangGraph Persistence Fits

LangGraph persistence is the right choice when you are already using LangGraph for agent orchestration and want to add checkpointing, time-travel debugging, and human-in-the-loop capabilities with minimal additional complexity. Its durability guarantees are scoped to the LangGraph execution engine, making it simpler but less general than Temporal.

64.4.1.4 When Restate Fits

Restate is the right choice for greenfield agent stacks where the team finds Temporal's separate cluster operationally heavy and LangGraph's framework lock-in narrow. A single sidecar binary, RPC-shaped programming model, and journaling at every await mean less code reorganization to get durability. The risk is the ecosystem maturity gap: fewer SDKs, fewer connectors, fewer postmortems published.

64.4.1.5 When Hatchet Fits

Hatchet is the right choice for Python-first teams whose workload looks like "background jobs at scale with durability." LLM batch processing (encode 100,000 documents, classify them, store results) hits Hatchet's sweet spot. Concurrency limits, retries, and progress tracking are first-class; the trade-off is that exactly-once-across-side-effects guarantees are weaker than Temporal's.

64.4.1.6 A Quick Side-by-Side

| Framework | State location | Operates as | Best for | Watch out for |
|---|---|---|---|---|
| Temporal | Server event history | Separate cluster | High-stakes multi-step agents with side effects (payments, bookings) | Operational footprint; non-deterministic-workflow gotchas |
| Inngest | Managed platform | SDK in your web app | Event-driven LLM pipelines without ops overhead | Vendor managed-service tier; lower visibility than self-hosted |
| LangGraph | Checkpointer DB | Library inside your app | Single-agent loops with HITL pauses | Durability scoped to LangGraph; non-LangGraph code is on its own |
| Restate | Sidecar journal | Sidecar binary next to service | Greenfield agent services that want a simpler model than Temporal | Newer; smaller ecosystem; fewer SDK languages |
| Hatchet | PostgreSQL | Engine + Python workers | LLM batch jobs; "Celery, but durable" | Weaker exactly-once-for-side-effects than Temporal |
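
The matrix can also be read as a short chain of questions. The sketch below is one possible reading, not a rule published by any of these projects: the `Workload` fields, their names, and the order of the checks are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; every field name is illustrative."""
    irreversible_side_effects: bool = False  # payments, bookings
    already_on_langgraph: bool = False       # agent logic already in LangGraph
    python_batch_jobs: bool = False          # "Celery, but durable"
    wants_managed_service: bool = False      # no extra infrastructure to operate
    greenfield: bool = False                 # new service, no legacy engine


def recommend(w: Workload) -> str:
    """Return a first-guess engine; the highest cost of failure is checked first."""
    if w.irreversible_side_effects:
        return "Temporal"
    if w.already_on_langgraph:
        return "LangGraph persistence"
    if w.python_batch_jobs:
        return "Hatchet"
    if w.wants_managed_service:
        return "Inngest"
    if w.greenfield:
        return "Restate"
    return "undecided"  # none of the profiles matched; revisit requirements


print(recommend(Workload(python_batch_jobs=True)))
```

Putting the side-effect check first reflects the section's quick rule of matching the framework to the cost of failure; a team with different priorities would reorder the checks.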

Figure 64.4.1: Framework selection decision tree. The first cut is team size; the second is language preference and operational appetite. Inngest sits horizontally as the SaaS event-driven alternative.

Fun Fact

Among Temporal practitioners there is an old saying: "the third workflow engine is always the one you keep." The first is whatever the founder picked in week one (usually Celery or a homegrown thing). The second is the panic move at Series A scale (usually Airflow or Step Functions). The third is the one chosen with the memory of real operational pain, which is why so many production stacks end up on Temporal. The faster you can get to the third engine, the more runway you save; this section's decision matrix is meant to compress that journey from eighteen months to an afternoon.

64.4.1.7 Workflow-as-Code vs DAG-as-Config

The choice above happens inside one family: workflow-as-code, where the workflow is ordinary code that the runtime instruments for durability. The other family is DAG-as-config, where the workflow is a declarative graph (Python DSL, YAML, or visual editor) consumed by a separate engine. Airflow, Prefect, Dagster, and AWS Step Functions in its Standard form are the canonical examples.

Workflow-as-code wins for LLM agents for one reason: each step is dynamic. The LLM's tool selection at step 7 depends on the result at step 6; you can't draw that graph at deploy time, because the graph is generated at runtime. Temporal, Inngest, LangGraph, Restate, and Hatchet all let you write if statements and loops that the LLM steers, and the runtime records the choices into the event history. A DAG-as-config tool can do the same thing only by emitting a static DAG that delegates the actual branching to an external service, which collapses the value of the declarative graph in the first place.
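
A toy model makes the "graph generated at runtime" point concrete. The sketch below is not any framework's API: `Journal`, `fake_llm_choose_tool`, and `TOOLS` are hypothetical stand-ins, and the journal is a plain list playing the role of an event history. The workflow is an ordinary loop whose branches are picked by the (fake) model; every decision and tool result is recorded, so a second execution replays the same path without calling anything again.

```python
from typing import Callable, Dict, List


class Journal:
    """Toy event history: step results are stored by position."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.live_calls = 0  # steps that really executed (not replayed)

    def step(self, index: int, fn: Callable[[], str]) -> str:
        if index < len(self.entries):
            return self.entries[index]  # replay: return the recorded result
        result = fn()
        self.live_calls += 1
        self.entries.append(result)
        return result


def fake_llm_choose_tool(last_result: str) -> str:
    """Hypothetical stand-in for an LLM choosing the next tool."""
    if last_result == "":
        return "search"
    if last_result.startswith("search:"):
        return "summarize"
    return "done"


TOOLS: Dict[str, Callable[[], str]] = {
    "search": lambda: "search:3 hits",
    "summarize": lambda: "summary:ok",
}


def agent_workflow(journal: Journal) -> List[str]:
    """Plain loop and branch; the path is only known once the model has chosen."""
    trace: List[str] = []
    last = ""
    i = 0
    while True:
        choice = journal.step(i, lambda: fake_llm_choose_tool(last))
        i += 1
        if choice == "done":
            break
        last = journal.step(i, TOOLS[choice])
        i += 1
        trace.append(last)
    return trace


journal = Journal()
first = agent_workflow(journal)   # executes five steps live
second = agent_workflow(journal)  # replays all five from the journal
print(first == second, journal.live_calls)
```

The replay only works because the loop makes the same sequence of `step` calls given the same recorded results, which is the determinism requirement that real workflow-as-code runtimes impose on workflow functions.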

DAG-as-config still wins for daily-batch ETL: nightly data pipelines, model training schedules, periodic eval runs, scheduled report generation. The graph is the same every night, observability tools (Airflow UI, Dagster Asset graph) are tuned for it, and the operator does not want every Python developer rewriting the workflow on the fly. The 2026 rule of thumb: if the graph is static and runs on a cron, reach for Airflow/Prefect/Dagster; if the graph is generated at runtime by an LLM or a user, reach for Temporal/Inngest/Restate/Hatchet/LangGraph. Many production teams run both: Airflow for the data layer, Temporal for the agent layer, with Airflow tasks that start Temporal workflows when an LLM is needed.

64.4.1.8 The OpenAI Agents SDK Durability Story

OpenAI's Agents SDK (released March 2025) folds a degree of durability into the agent runtime through server-side conversation state, which this section calls persistent threads. (The word "thread" is borrowed from OpenAI's earlier Assistants API; the Agents SDK itself exposes the idea through sessions and server-managed conversations.) A thread is a server-side conversation handle: messages, tool calls, and intermediate outputs all live on OpenAI's infrastructure, indexed by an ID. Resuming after a crash amounts to passing the same ID, so the next run continues from the stored history instead of starting over. For teams whose agents already live entirely inside the OpenAI stack, this is the lowest-friction path to "durable enough."

The trade-off is scope. Thread-based durability covers the model loop, tool argument generation, and the SDK's built-in tools. Anything you do outside the Agents SDK (database writes, third-party API calls, message queue publishes, multi-LLM-vendor orchestration) gets no durability guarantees: a crash mid-Stripe-call still loses the call. The OpenAI Agents SDK plus Temporal integration covered in Section 64.2.1.1 exists precisely to bridge that gap: thread state for the agent loop, Temporal activities for everything that touches the outside world. Expect this hybrid pattern to be the dominant production shape through 2026.

64.4.2 Migration Paths Between Frameworks

Most teams pick one framework, then discover six to twelve months later that their workload has outgrown it. The good news is that the workflow-as-code family shares enough structural DNA that migrations are feasible.

The escape hatch in every direction is the durable-execution interoperability surface: events. A team can keep their old framework running while new workflows are written in the new one, with events flowing between them. This is how almost every successful migration actually happens.
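
One way to picture the event-based escape hatch is a thin router that sits in front of both engines. The sketch below is a minimal illustration under stated assumptions: `EventBridge`, its methods, and the two handler functions are hypothetical, and each handler stands in for "start a workflow in that engine."

```python
from typing import Callable, Set

Handler = Callable[[dict], str]


class EventBridge:
    """Routes each event to whichever engine currently owns its workflow type."""

    def __init__(self, legacy: Handler, new: Handler) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated: Set[str] = set()

    def migrate(self, workflow_type: str) -> None:
        """Mark a workflow type as owned by the new engine from now on."""
        self._migrated.add(workflow_type)

    def publish(self, event: dict) -> str:
        handler = self._new if event["type"] in self._migrated else self._legacy
        return handler(event)


def legacy_engine(event: dict) -> str:
    return f"legacy handled {event['type']}"


def new_engine(event: dict) -> str:
    return f"new handled {event['type']}"


bridge = EventBridge(legacy_engine, new_engine)
bridge.migrate("refund")  # move one workflow type at a time
print(bridge.publish({"type": "refund"}))
print(bridge.publish({"type": "invoice"}))
```

Because ownership is decided per workflow type, a migration can proceed one workflow at a time and be rolled back by removing a type from the migrated set.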

64.4.2.1 Combining Frameworks

These frameworks are not mutually exclusive. A common production architecture uses LangGraph for agent logic, wrapped inside a Temporal workflow that provides infrastructure-level durability, with an Inngest event bus connecting the workflow to other services. The key principle is to match the durability guarantee to the cost of failure: lightweight summarization tasks may need only LangGraph checkpointing, while a booking agent that charges credit cards needs Temporal's exactly-once semantics.

Key Insight

The choice between orchestration frameworks is fundamentally a question about where the state lives. In Temporal, state lives in the Temporal server's event history, and your workers are stateless. In LangGraph, state lives in the checkpointer database, and your application server manages the graph execution. In Inngest, state lives in Inngest's managed platform, and your function code is stateless. In Restate, state lives in the sidecar journal next to your service. In Hatchet, state lives in PostgreSQL and the engine consults it. Each approach has different failure modes: Temporal survives worker crashes but requires a healthy Temporal cluster; LangGraph survives application restarts but depends on the checkpointer database; Inngest survives application failures but depends on the Inngest platform. Understanding where your state lives is the first step toward understanding what can go wrong.

Exercise 64.4.1: Durable Agent Design (Conceptual)

An e-commerce company builds an agent that processes customer returns: (1) validate the return request, (2) check inventory for the returned item, (3) generate a return shipping label, (4) issue a refund, (5) send a confirmation email. Identify which steps need idempotency keys, which need compensation logic, and which framework (Temporal, Inngest, LangGraph, Restate, or Hatchet) best fits this use case. Justify your choice.

Answer Sketch

Steps 3 (shipping label), 4 (refund), and 5 (email) need idempotency keys because they produce external side effects. Steps 3 and 4 need compensation: if the refund fails, the shipping label should be voided. Temporal is the best fit because the workflow has strict ordering, involves financial transactions requiring exactly-once guarantees, and benefits from the saga pattern for compensation. LangGraph would be insufficient because it does not natively manage external side-effect durability. Inngest could work for the event-driven notification (step 5) but would be harder to use for the transactional label-and-refund sequence. Restate is a credible second choice for greenfield teams; Hatchet is a fit only if the rest of the system is already Python-task-queue-shaped.
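
The idempotency and compensation parts of this answer can be sketched in a few lines of plain Python. Nothing here is a real provider API: `FakeProvider`, the key format, and the `refund_fails` switch are assumptions made up for illustration, and a real saga would run inside a durable engine rather than a bare function.

```python
from typing import Callable, Dict, List


class FakeProvider:
    """Hypothetical external service that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.effects: Dict[str, str] = {}
        self.calls = 0

    def call(self, key: str, action: str) -> str:
        self.calls += 1
        if key not in self.effects:  # only the first call with a key takes effect
            self.effects[key] = f"{action}-done"
        return self.effects[key]


def process_return(return_id: str, provider: FakeProvider,
                   refund_fails: bool = False) -> List[str]:
    """Steps 3 to 5 of the exercise, with keys and a compensation stack."""
    log: List[str] = []
    compensations: List[Callable[[], None]] = []

    def void_label() -> None:
        provider.call(f"{return_id}:label:void", "void_label")
        log.append("label voided")

    try:
        provider.call(f"{return_id}:label", "create_label")
        log.append("label created")
        compensations.append(void_label)  # register the undo once the step succeeds
        if refund_fails:
            raise RuntimeError("refund declined")  # simulated failure at step 4
        provider.call(f"{return_id}:refund", "refund")
        log.append("refund issued")
        provider.call(f"{return_id}:email", "send_email")
        log.append("email sent")
    except RuntimeError:
        for undo in reversed(compensations):  # undo in reverse order
            undo()
        log.append("return aborted")
    return log


provider = FakeProvider()
print(process_return("r-1", provider))
print(process_return("r-2", provider, refund_fails=True))
```

Deriving each key from the return ID plus the step name means a retried step reuses the same key, so the provider can recognize the duplicate.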

Key Takeaways
Self-Check
Q1: An agent pipeline runs 20 sequential LLM calls over 15 minutes. The server crashes while step 14 is running. What happens at restart without durable execution, and what happens with Temporal?
Answer:
Without durability: all 13 completed steps are lost along with the in-flight step 14. The next invocation restarts from step 1, incurring the full LLM cost again. With Temporal: the result of each activity is persisted to the event history when it completes. On restart, the workflow function re-executes from the beginning, but Temporal intercepts each activity whose result is already in the event history and returns the recorded result. Only step 14 (the step that was interrupted) and the six steps after it actually run against the external LLM.
Q2: What is exactly-once semantics, and why is it critical for agent tool calls that send emails or charge credit cards?
Answer:
Exactly-once semantics means each step takes effect precisely one time, even across retries and crashes. Without it, a retry after a crash can re-execute a completed step, sending the email twice or charging the customer twice. Temporal approaches this by replaying workflow code deterministically and returning recorded activity results from the event history for already-completed steps, rather than re-invoking the external service. A step that crashed mid-flight, before its result was recorded, can still be retried, so side-effecting tools should carry idempotency keys; what the framework removes is the need for every tool to track which steps have already finished.
Q3: When would you choose application-level checkpointing over a durable execution framework like Temporal?
Answer:
Application-level checkpointing is appropriate for simple workflows (under 5 steps), for teams already using a database-backed state machine pattern, and for teams that want to avoid the operational overhead of a Temporal cluster. Temporal or Inngest becomes necessary when workflows run for minutes or hours, when workflows have many parallel branches that must be joined, when workflows include human-in-the-loop approval with arbitrary wait times, or when cross-service fan-outs must be coordinated reliably.
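
For the simple end of that spectrum, application-level checkpointing can be as small as the sketch below. It is a minimal illustration, assuming a linear list of steps and a local JSON file as the checkpoint store; `run_with_checkpoint` and the file layout are hypothetical, and a production version would more likely use a database row.

```python
import json
import os
from typing import Callable, List

Step = Callable[[dict], None]


def run_with_checkpoint(steps: List[Step], path: str) -> dict:
    """Run steps in order, saving progress after each so a restart can resume."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last saved position
    else:
        state = {"next_step": 0, "data": {}}

    for i in range(state["next_step"], len(steps)):
        steps[i](state["data"])  # a step reads and writes the shared data dict
        state["next_step"] = i + 1
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    return state["data"]
```

If a step raises, the file still records the last completed step, so calling the function again with the same path skips the finished steps and retries the failed one. The step being retried may already have produced part of its side effect, which is the gap that idempotency keys or a full durable execution engine are meant to close.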
Q4: Your team runs nightly batch jobs that train a model and generate reports, plus on-demand agent workflows that respond to user requests. Which framework family fits each?
Answer:
The nightly batch jobs run on a static DAG: same graph every night, schedule-driven, with a clear sequence of data tasks. DAG-as-config tools (Airflow, Prefect, Dagster) are the natural fit; their UIs and lineage tracking are tuned for this. The on-demand agent workflows have runtime-determined graphs (each user's task triggers different tool sequences). Workflow-as-code tools (Temporal, Inngest, Restate, Hatchet, or LangGraph) are the natural fit. Many teams run both, with Airflow tasks starting Temporal workflows when an LLM is in the loop.
Research Frontier: Self-Healing Agent Workflows

Classical durable execution (Temporal, the Saga pattern from Garcia-Molina and Salem 1987) assumes a deterministic workflow with well-defined retryable steps. LLM agent workflows break that assumption: steps are non-deterministic, errors are sometimes semantic rather than infrastructural, and the right "retry" can be a different prompt rather than the same one. A new research line is reshaping the orchestration layer around that reality.

Reflexion (Shinn et al., NeurIPS 2023, arXiv:2303.11366) and Self-Refine (Madaan et al., NeurIPS 2023) introduced agent-level retry loops that learn from failure by generating a reflection on what went wrong and editing the next attempt's prompt. AutoGen Studio (Microsoft, 2024) and Voyager (Wang et al., 2023) push this into long-horizon settings with skill libraries that grow over time. LangGraph's time-travel debugging (2024) and OpenAI's Agents SDK plus Temporal integration (2025) bring this thinking into production orchestrators by recording the full agent trajectory and allowing replay from any node.
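
The shape of such a reflection-driven retry loop can be sketched without any model at all. The code below is an illustration of the general idea only, not the algorithm from any of the cited papers: `attempt` and `reflect` are hypothetical callables standing in for an LLM call and a self-critique step, and the spending cap in cents is an added assumption.

```python
from typing import Callable, List, Optional, Tuple


def retry_with_reflection(
    attempt: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str], str],
    prompt: str,
    budget_cents: int,
    cost_per_attempt_cents: int,
) -> Tuple[Optional[str], List[str], int]:
    """Retry with a revised prompt until success or until the budget runs out."""
    prompts_tried: List[str] = []
    spent = 0
    while spent + cost_per_attempt_cents <= budget_cents:
        prompts_tried.append(prompt)
        spent += cost_per_attempt_cents
        ok, output = attempt(prompt)
        if ok:
            return output, prompts_tried, spent
        # The retry is a different prompt, informed by what went wrong.
        prompt = reflect(prompt, output)
    return None, prompts_tried, spent


def fake_attempt(prompt: str) -> Tuple[bool, str]:
    """Hypothetical task that succeeds only once the prompt asks for sources."""
    if "cite sources" in prompt:
        return True, "answer with sources"
    return False, "missing sources"


def fake_reflect(prompt: str, failure: str) -> str:
    """Hypothetical reflection step that folds the failure into the prompt."""
    return f"{prompt} (fix: {failure}; cite sources)"


print(retry_with_reflection(fake_attempt, fake_reflect, "Summarize the report", 10, 3))
```

In a durable orchestrator each attempt and each reflection would be recorded as its own step, so a crash between attempts neither repeats a paid call nor loses the revised prompt.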

Open research directions include: formal guarantees of progress for non-deterministic retry policies (when does a self-correcting agent halt?), cost-aware retry budgets that trade dollars for success probability, and orchestrators that explicitly model "tool returned wrong answer" as a distinct failure class with its own recovery policy. The 2026 production stack is converging on Temporal-style durable execution wrapped around an agent layer that learns from its own failures, with observability surfaces good enough to make the loop transparent.

What Comes Next

With production engineering patterns established, the next chapter, Chapter 65: Containers, Kubernetes & Deployment, covers the infrastructure substrate that runs the durable workflows you have just designed. The retry, idempotency, and observability machinery from this chapter all assume a healthy container platform underneath; Chapter 65 builds it.

See Also

For the agent-loop frameworks (LangGraph, AutoGen, CrewAI) that live on top of durable execution, see Section 26.1: AI Agents. For the agent-safety constraints (sandboxing, human-in-the-loop checkpoints) that durable workflows are well suited to enforce, see Section 49.1: Agent Safety. For the observability and tracing patterns (OpenTelemetry spans, retry-and-resume telemetry) that durable execution depends on, see Section 42.9: OpenTelemetry for LLM Applications.

Further Reading
Temporal Technologies (2024). "Temporal Core Concepts: Workflows, Activities, and Durable Execution." Temporal Documentation. The authoritative reference for Temporal's durable execution model, explaining how workflows survive process failures through event sourcing and deterministic replay.
Inngest (2024). "Inngest: Durable Functions for AI and Background Jobs." Inngest Documentation. Documentation for the Inngest serverless durable execution platform, which provides step-level checkpointing for LLM agent pipelines without managing infrastructure.
LangChain (2024). "LangGraph Persistence: Checkpointing and Recovery." LangGraph Documentation. Describes LangGraph's built-in persistence layer for agent state checkpointing, enabling automatic recovery of multi-step agent workflows from failures.
Restate (2024). "Restate: Durable Execution for Modern Services." Restate Documentation. The reference for the Restate runtime: virtual objects, journaling, and the sidecar-based deployment model that distinguishes it from Temporal-style clusters.
Hatchet (2024). "Hatchet: Distributed Task Queues with Durable Execution." Hatchet Documentation. Python-first task queue with workflow durability, concurrency control, and PostgreSQL-backed event logging, optimized for high-throughput LLM batch jobs.
Liu, X., et al. (2023). "AgentBench: Evaluating LLMs as Agents." arXiv:2308.03688. Benchmarks LLM agent performance across complex multi-step tasks, revealing the failure modes and reliability challenges that motivate durable execution patterns.
Qian, C., et al. (2024). "Experiential Co-Learning of Software-Developing Agents." arXiv:2312.17025. Demonstrates multi-agent workflows with recovery mechanisms that parallel durable execution concepts, showing how agents can learn from and recover from failures.
Wang, L., et al. (2024). "A Survey on Large Language Model based Autonomous Agents." arXiv:2308.11432. Comprehensive survey covering agent architectures and their operational challenges, including the reliability and state management problems that durable execution addresses.
OpenAI (2025). "Agents SDK: Temporal Integration." OpenAI Agents Python documentation. Reference for wiring Temporal workflows into the OpenAI Agents SDK so agent runs survive process restarts and step-level failures.
Google Cloud (2025). "Building Agentic AI Workflows with Temporal and Vertex AI." Google Cloud Blog. Production pattern guide for composing Temporal's durable execution with Vertex AI agent endpoints.
Garcia-Molina, H., and K. Salem (1987). "Sagas." ACM SIGMOD Record 16, no. 3: 249-259. The foundational paper that introduced the Saga pattern for long-running transactions with compensating actions, the academic ancestor of every modern durable-execution system.
Nygard, M. (2018). Release It! Design and Deploy Production-Ready Software. 2nd ed. Pragmatic Bookshelf. The canonical reference for production-system failure patterns (circuit breakers, bulkheads, timeouts) that motivate durable-execution architectures for agent pipelines.