Section 64.4: Framework Selection | Building Language AI

The right workflow engine is the one your on-call engineer can debug at 3 a.m. without paging the founder.
Deploy, Workflow-Watching AI Agent

Big Picture

The five frameworks surveyed in Section 64.2 all solve the same problem with different trade-offs. This section provides a decision matrix that maps workload characteristics to the right engine, discusses the workflow-as-code versus DAG-as-config split that separates Temporal-family tools from Airflow-family tools, traces the OpenAI Agents SDK's emerging durability story, and finishes with migration paths for teams that picked one engine and need to move to another. The goal is to make a choice you will not regret six months from now.

Prerequisites

This section assumes you have read Section 64.2 (frameworks) and Section 64.3 (operational patterns). The decision matrix below references concepts from both.

64.4.1 Decision Matrix

The three frameworks covered in Section 64.2 (Temporal, Inngest, LangGraph) plus the two newer additions (Restate, Hatchet) serve different needs. The quick rule is to match the framework to the cost of failure, with cross-framework composition reserved for the highest-stakes pipelines.

64.4.1.1 When Temporal Fits

Temporal is the right choice when you need infrastructure-level durability for complex, long-running workflows with strict exactly-once guarantees. It excels in environments that already run Kubernetes and have platform engineering teams comfortable with operating stateful infrastructure. The OpenAI Agents SDK integration makes it particularly appealing for teams building tool-heavy agents that interact with external systems.

64.4.1.2 When Inngest Fits

Inngest is the right choice when you want durable execution without operating additional infrastructure. Its event-driven model and step-level checkpointing provide strong durability guarantees with minimal setup. It fits teams that prefer managed services and want to add durability to existing web applications rather than building a separate worker infrastructure.

64.4.1.3 When LangGraph Persistence Fits

LangGraph persistence is the right choice when you are already using LangGraph for agent orchestration and want to add checkpointing, time-travel debugging, and human-in-the-loop capabilities with minimal additional complexity. Its durability guarantees are scoped to the LangGraph execution engine, making it simpler but less general than Temporal.

64.4.1.4 When Restate Fits

Restate is the right choice for greenfield agent stacks where the team finds Temporal's separate cluster operationally heavy and LangGraph's framework lock-in narrow. A single sidecar binary, RPC-shaped programming model, and journaling at every await mean less code reorganization to get durability. The risk is the ecosystem maturity gap: fewer SDKs, fewer connectors, fewer postmortems published.

64.4.1.5 When Hatchet Fits

Hatchet is the right choice for Python-first teams whose workload looks like "background jobs at scale with durability." LLM batch processing (encode 100,000 documents, classify them, store results) hits Hatchet's sweet spot. Concurrency limits, retries, and progress tracking are first-class; the trade-off is that exactly-once-across-side-effects guarantees are weaker than Temporal's.

64.4.1.6 A Quick Side-by-Side

Framework	State location	Operates as	Best for	Watch out for
Temporal	Server event history	Separate cluster	High-stakes multi-step agents with side effects (payments, bookings)	Operational footprint; non-deterministic-workflow gotchas
Inngest	Managed platform	SDK in your web app	Event-driven LLM pipelines without ops overhead	Vendor managed-service tier; lower visibility than self-hosted
LangGraph	Checkpointer DB	Library inside your app	Single-agent loops with HITL pauses	Durability scoped to LangGraph; non-LangGraph code is on its own
Restate	Sidecar journal	Sidecar binary next to service	Greenfield agent services that want simpler model than Temporal	Newer; smaller ecosystem; fewer SDK languages
Hatchet	PostgreSQL	Engine + Python workers	LLM batch jobs; "Celery, but durable"	Weaker exactly-once-for-side-effects than Temporal
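
The table can be read as a small rule list. The sketch below is one hypothetical encoding of it, not a definitive policy: the `Workload` fields, the function name `suggest_framework`, and the order in which the rules fire are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; field names are illustrative."""
    high_stakes_side_effects: bool = False  # payments, bookings
    already_on_langgraph: bool = False      # agent loop already built in LangGraph
    wants_managed_service: bool = False     # no extra infrastructure to operate
    python_batch_jobs: bool = False         # "Celery, but durable"
    greenfield: bool = False                # no existing engine to preserve


def suggest_framework(w: Workload) -> Optional[str]:
    """Return a first-cut suggestion mirroring the "Best for" column.

    Rules are checked from highest cost of failure downward (an assumption);
    None means no row of the table matched.
    """
    if w.high_stakes_side_effects:
        return "Temporal"
    if w.already_on_langgraph:
        return "LangGraph"
    if w.wants_managed_service:
        return "Inngest"
    if w.python_batch_jobs:
        return "Hatchet"
    if w.greenfield:
        return "Restate"
    return None
```

Putting the high-stakes rule first reflects the quick rule stated earlier: match the framework to the cost of failure before considering convenience.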

**Figure 64.4.1**: Framework selection decision tree for picking a durable execution framework based on language preference, scale, and operational model. The first cut is team size; the second is language preference and operational appetite. Inngest sits horizontally as the SaaS event-driven alternative.

Fun Fact

Among Temporal practitioners there is an old saying: "the third workflow engine is always the one you keep." The first is whatever the founder picked in week one (usually Celery or something homegrown). The second is the panic move at Series A scale (usually Airflow or Step Functions). The third is the one chosen with the memory of real operational pain, which is why so many production stacks end up on Temporal. The faster you can get to the third engine, the more runway you save; this section's decision matrix is meant to compress that journey from eighteen months to an afternoon.

64.4.1.7 Workflow-as-Code vs DAG-as-Config

The choice above happens inside one family: workflow-as-code, where the workflow is ordinary code that the runtime instruments for durability. The other family is DAG-as-config, where the workflow is a declarative graph (Python DSL, YAML, or visual editor) consumed by a separate engine. Airflow, Prefect, Dagster, and AWS Step Functions in its Standard form are the canonical examples.

Workflow-as-code wins for LLM agents for one reason: each step is dynamic. The LLM's tool selection at step 7 depends on the result at step 6; you can't draw that graph at deploy time, because the graph is generated at runtime. Temporal, Inngest, LangGraph, Restate, and Hatchet all let you write if statements and loops that the LLM steers, and the runtime records the choices into the event history. A DAG-as-config tool can do the same thing only by emitting a static DAG that delegates the actual branching to an external service, which collapses the value of the declarative graph in the first place.
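
To make the "graph generated at runtime" point concrete, here is a minimal library-free sketch. The `Journal` class is a toy stand-in for a durable event history, and `fake_llm_choose_tool` is a hypothetical policy standing in for an LLM; neither is the API of any framework named above.

```python
from typing import Any, Callable, Dict, List


class Journal:
    """Toy stand-in for a durable event history: results keyed by step name."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}
        self.executed: List[str] = []  # steps that actually ran (not replayed)

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.results:   # replay: return the recorded result
            return self.results[name]
        value = fn()               # first execution: run, then record
        self.results[name] = value
        self.executed.append(name)
        return value


def fake_llm_choose_tool(last_result: str) -> str:
    """Hypothetical policy standing in for an LLM's tool selection."""
    if last_result == "start":
        return "search"
    if last_result == "search:ok":
        return "summarize"
    return "finish"


def agent(journal: Journal) -> List[str]:
    """Ordinary loops and branches; the path is decided one step at a time."""
    path: List[str] = []
    last = "start"
    for i in range(10):  # safety bound on the agent loop
        tool = journal.step(f"choose-{i}", lambda: fake_llm_choose_tool(last))
        if tool == "finish":
            break
        last = journal.step(f"run-{i}", lambda: f"{tool}:ok")
        path.append(tool)
    return path
```

Calling `agent` a second time with the same `Journal` follows the same path without executing any step again, because every choice was recorded the first time. No static graph exists anywhere in the program; the sequence of steps only becomes known as the loop runs.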

DAG-as-config still wins for daily-batch ETL: nightly data pipelines, model training schedules, periodic eval runs, scheduled report generation. The graph is the same every night, observability tools (Airflow UI, Dagster Asset graph) are tuned for it, and the operator does not want every Python developer rewriting the workflow on the fly. The 2026 rule of thumb: if the graph is static and runs on a cron, reach for Airflow/Prefect/Dagster; if the graph is generated at runtime by an LLM or a user, reach for Temporal/Inngest/Restate/Hatchet/LangGraph. Many production teams run both: Airflow for the data layer, Temporal for the agent layer, with Airflow tasks that start Temporal workflows when an LLM is needed.

64.4.1.8 The OpenAI Agents SDK Durability Story

OpenAI's Agents SDK (announced March 2025) folded durability into the agent runtime itself via persistent threads. A thread is a server-side conversation handle: messages, tool calls, and intermediate outputs all live on OpenAI's infrastructure, indexed by thread ID. Resuming a crashed agent is as simple as passing the same thread ID; the SDK reattaches to the in-progress run and continues from where it left off. For teams whose agents already live entirely inside the OpenAI stack, this is the lowest-friction path to "durable enough."

The trade-off is scope. Thread-based durability covers the model loop, tool argument generation, and the SDK's built-in tools. Anything you do outside the Agents SDK (database writes, third-party API calls, message queue publishes, multi-LLM-vendor orchestration) gets no durability guarantees: a crash mid-Stripe-call still loses the call. The OpenAI Agents SDK plus Temporal integration covered in Section 64.2.1.1 exists precisely to bridge that gap: thread state for the agent loop, Temporal activities for everything that touches the outside world. Expect this hybrid pattern to be the dominant production shape through 2026.
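
One common safeguard for the crash-mid-call case is an idempotency key that stays stable across retries, so that a repeated request is recognized by the external service rather than applied twice. The sketch below assumes a provider that deduplicates by key; `FakePaymentProvider`, `charge_activity`, and the `workflow_id:step` key format are hypothetical names chosen for illustration, not any vendor's API.

```python
from typing import Dict


class FakePaymentProvider:
    """Hypothetical provider that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self.charges: Dict[str, int] = {}  # key -> amount in cents

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self.charges:
            return "duplicate-ignored"     # same key seen before: no new charge
        self.charges[idempotency_key] = amount_cents
        return "charged"


def charge_activity(
    provider: FakePaymentProvider, workflow_id: str, step: str, amount_cents: int
) -> str:
    """Side-effecting step wrapped so that a retry reuses the same key."""
    key = f"{workflow_id}:{step}"  # assumption: derived only from stable identifiers
    return provider.charge(key, amount_cents)
```

If the process crashes after the provider has recorded the charge but before the result is stored locally, the retry sends the same key and the customer is charged once. The key must not include anything that changes between attempts, such as a timestamp or attempt counter.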

64.4.2 Migration Paths Between Frameworks

Most teams pick one framework, then discover six to twelve months later that their workload has outgrown it. The good news is that the workflow-as-code family shares enough structural DNA that migrations are feasible. The common moves:

LangGraph → LangGraph + Temporal. The graph stays as the agent loop, but each side-effecting node moves to a Temporal activity. The LangGraph checkpointer can coexist with Temporal's event history; the LangGraph node becomes a single Temporal activity that "runs the next graph step." Migration is mostly additive: you keep your LangGraph code and add a thin Temporal wrapper.
Inngest → Temporal. Both use step-level checkpointing. Inngest step.run("name", lambda: f()) maps almost one-to-one to Temporal workflow.execute_activity(f). The hard part is the event-driven trigger surface: Inngest functions react to events, Temporal workflows are started explicitly. The migration usually adds a thin event bus (Kafka, NATS, or AWS EventBridge) in front of Temporal to recover the event-driven shape.
Celery → Hatchet. For teams whose pre-LLM stack was Celery, Hatchet is the smallest delta to gain durability. Task signatures stay similar, but the engine moves from Celery's broker model to Hatchet's PostgreSQL-backed log. The win is durability and observability; the cost is reworking concurrency and rate-limit configuration.
Workflow-as-code → Restate. The Restate model is enough of a re-think that this is closer to a rewrite than a migration. The good news is the rewrite usually shrinks the code (no explicit workflow/activity split), so the diff is negative in lines.

The escape hatch in every direction is the durable-execution interoperability surface: events. A team can keep their old framework running while new workflows are written in the new one, with events flowing between them. This is how almost every successful migration actually happens.
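
A minimal sketch of that event surface follows. `EventRouter` is a toy in-process bus standing in for Kafka, NATS, or EventBridge, and the two handlers are hypothetical placeholders for a function on the old framework and an explicit workflow start on the new one.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], str]


class EventRouter:
    """Toy in-process event bus; a real deployment would use a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: Event) -> List[str]:
        # Fan the event out to every subscriber, in registration order.
        return [handler(payload) for handler in self._handlers[event_name]]


def legacy_handler(event: Event) -> str:
    """Stands in for a function still running on the old framework."""
    return f"legacy handled order {event['order_id']}"


def start_new_workflow(event: Event) -> str:
    """Stands in for an explicit workflow start on the new framework.

    Deriving the workflow id from the order id is an assumption; it makes
    duplicate deliveries of the same event map to the same workflow.
    """
    return f"started workflow order-{event['order_id']}"
```

Registering both handlers on the same event name lets the old and new implementations run side by side during a migration; removing the legacy subscription is then the final cut-over step.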

64.4.2.1 Combining Frameworks

These frameworks are not mutually exclusive. A common production architecture uses LangGraph for agent logic, wrapped inside a Temporal workflow that provides infrastructure-level durability, with an Inngest event bus connecting the workflow to other services. The key principle is to match the durability guarantee to the cost of failure: lightweight summarization tasks may need only LangGraph checkpointing, while a booking agent that charges credit cards needs Temporal's exactly-once semantics.

Key Insight

The choice between orchestration frameworks is fundamentally a question about where the state lives. In Temporal, state lives in the Temporal server's event history, and your workers are stateless. In LangGraph, state lives in the checkpointer database, and your application server manages the graph execution. In Inngest, state lives in Inngest's managed platform, and your function code is stateless. In Restate, state lives in the sidecar journal next to your service. In Hatchet, state lives in PostgreSQL and the engine consults it. Each approach has different failure modes: Temporal survives worker crashes but requires a healthy Temporal cluster; LangGraph survives application restarts but depends on the checkpointer database; Inngest survives application failures but depends on the Inngest platform. Understanding where your state lives is the first step toward understanding what can go wrong.

Exercise 64.4.1: Durable Agent Design (Conceptual)

An e-commerce company builds an agent that processes customer returns: (1) validate the return request, (2) check inventory for the returned item, (3) generate a return shipping label, (4) issue a refund, (5) send a confirmation email. Identify which steps need idempotency keys, which need compensation logic, and which framework (Temporal, Inngest, LangGraph, Restate, or Hatchet) best fits this use case. Justify your choice.

Answer Sketch

Steps 3 (shipping label), 4 (refund), and 5 (email) need idempotency keys because they produce external side effects. Steps 3 and 4 need compensation: if the refund fails, the shipping label should be voided. Temporal is the best fit because the workflow has strict ordering, involves financial transactions requiring exactly-once guarantees, and benefits from the saga pattern for compensation. LangGraph would be insufficient because it does not natively manage external side-effect durability. Inngest could work for the event-driven notification (step 5) but would be harder to use for the transactional label/refund sequence. Restate is a credible second choice for greenfield teams; Hatchet is a fit only if the rest of the system is already Python-task-queue-shaped.
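
The compensation ordering in this sketch can be illustrated without any framework. The code below is a toy saga over steps 3 to 5 only; the function name, the log format, and the `return_id:step` key scheme are assumptions, and a real implementation would call external services instead of appending to a list.

```python
from typing import Callable, List


class RefundFailed(Exception):
    """Raised when the simulated refund step fails."""


def process_return(return_id: str, refund_ok: bool, log: List[str]) -> str:
    """Run steps 3-5, undoing completed steps in reverse order on failure."""
    compensations: List[Callable[[], None]] = []
    try:
        # Step 3: create the shipping label and register how to undo it.
        log.append(f"label:create key={return_id}:label")
        compensations.append(
            lambda: log.append(f"label:void key={return_id}:label")
        )

        # Step 4: issue the refund (simulated failure when refund_ok is False).
        if not refund_ok:
            raise RefundFailed(return_id)
        log.append(f"refund:issue key={return_id}:refund")

        # Step 5: send the confirmation email; nothing to compensate.
        log.append(f"email:send key={return_id}:email")
        return "completed"
    except RefundFailed:
        # Saga rollback: run registered compensations newest-first.
        for undo in reversed(compensations):
            undo()
        return "compensated"
```

With `refund_ok=False` the log shows the label being created and then voided, and no email is sent. Each logged operation carries a key derived from the return ID, which is what lets a retried step be recognized as a duplicate.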

Key Takeaways

LLM agent workflows need durable execution because multi-step processes that take minutes or hours will inevitably encounter failures, timeouts, and infrastructure interruptions.
Temporal provides infrastructure-level durability with automatic retry, state persistence, and exactly-once execution semantics for long-running workflows.
Inngest offers event-driven durable functions with a simpler developer experience, ideal for serverless LLM pipelines.
LangGraph persistence provides application-level checkpointing within the LangGraph framework, enabling conversation and agent state recovery.
Restate and Hatchet round out the 2026 landscape: Restate for simpler greenfield durability, Hatchet for Python-first batch-style LLM jobs.
Workflow-as-code wins for LLM agents because the graph is generated at runtime; DAG-as-config (Airflow, Prefect) still wins for static daily-batch ETL.
The OpenAI Agents SDK's thread-based durability handles the agent loop but punts external side effects to wrappers like Temporal.

Self-Check

Q1: An agent pipeline runs 20 sequential LLM calls over 15 minutes. The server crashes at step 14. What happens at restart without durable execution, and what happens with Temporal?

Answer

Without durability: the results of the 13 completed steps are lost. The next invocation restarts from step 1 and pays the full LLM cost again. With Temporal: each activity's result is persisted to the event history as it completes. On restart, the workflow function re-executes from the beginning, but Temporal intercepts each activity whose result is already in the event history and returns the recorded result. Only step 14 (the step in flight when the server crashed) re-runs against the external LLM, after which steps 15 through 20 proceed normally.
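
The call counts can be checked with a small simulation. This is a toy model rather than Temporal itself: `Pipeline` and its `history` dictionary are hypothetical, and the crash is assumed to occur before step 14's LLM call is made.

```python
from typing import Dict, Optional


class Crash(Exception):
    """Simulated server crash."""


class Pipeline:
    """Toy 20-step pipeline with a persisted per-step result history."""

    def __init__(self) -> None:
        self.history: Dict[int, str] = {}  # survives the simulated crash
        self.llm_calls = 0                 # counts real (non-replayed) calls

    def _call_llm(self, step: int) -> str:
        self.llm_calls += 1
        return f"output-{step}"

    def run(self, crash_at: Optional[int] = None) -> None:
        for step in range(1, 21):
            if step in self.history:
                continue            # replay: reuse the recorded result
            if step == crash_at:
                raise Crash(step)   # crash before this step's result exists
            self.history[step] = self._call_llm(step)
```

Running with `crash_at=14` makes 13 calls before the crash; running again on the same object makes 7 more (steps 14 through 20), for 20 in total. A non-durable restart would discard the history and make 13 + 20 = 33 calls for the same work.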

Q2: What is exactly-once semantics, and why is it critical for agent tool calls that send emails or charge credit cards?

Answer

Exactly-once semantics guarantees that the effect of each step is applied precisely one time, even across retries and crashes. Without it, a retry after a crash can re-execute a completed step, sending the email twice or charging the customer twice. Temporal approaches this by replaying workflow code deterministically and returning recorded activity results from the event history for already-completed steps, rather than re-invoking the external service. An activity that was still in flight when the crash occurred can be retried, so side-effecting tools should still carry idempotency keys: the framework prevents re-execution of completed steps, but it cannot make an external call idempotent on its own.

Q3: When would you choose application-level checkpointing over a durable execution framework like Temporal?

Answer

Application-level checkpointing is appropriate for simple workflows (under 5 steps), for teams already using a database-backed state machine pattern, and for teams that want to avoid the operational overhead of a Temporal cluster. Temporal or Inngest becomes necessary when workflows run for minutes or hours, have many parallel branches that must be joined, include human-in-the-loop approval with arbitrary wait times, or coordinate cross-service fan-outs that must complete reliably.

Q4: Your team runs nightly batch jobs that train a model and generate reports, plus on-demand agent workflows that respond to user requests. Which framework family fits each?

Answer

The nightly batch jobs run on a static DAG: same graph every night, schedule-driven, with a clear sequence of data tasks. DAG-as-config tools (Airflow, Prefect, Dagster) are the natural fit; their UIs and lineage tracking are tuned for this. The on-demand agent workflows have runtime-determined graphs (each user's task triggers different tool sequences). Workflow-as-code tools (Temporal, Inngest, Restate, Hatchet, or LangGraph) are the natural fit. Many teams run both, with Airflow tasks starting Temporal workflows when an LLM is in the loop.

Research Frontier: Self-Healing Agent Workflows

Classical durable execution (Temporal, the Saga pattern from Garcia-Molina and Salem 1987) assumes a deterministic workflow with well-defined retryable steps. LLM agent workflows break that assumption: steps are non-deterministic, errors are sometimes semantic rather than infrastructural, and the right "retry" can be a different prompt rather than the same one. A new research line is reshaping the orchestration layer around that reality.

Reflexion (Shinn et al., NeurIPS 2023, arXiv:2303.11366) and Self-Refine (Madaan et al., NeurIPS 2023) introduced agent-level retry loops that learn from failure by generating a reflection on what went wrong and editing the next attempt's prompt. AutoGen Studio (Microsoft, 2024) and Voyager (Wang et al., 2023) push this into long-horizon settings with skill libraries that grow over time. LangGraph's time-travel debugging (2024) and OpenAI's Agents SDK plus Temporal integration (2025) bring this thinking into production orchestrators by recording the full agent trajectory and allowing replay from any node.

Open research directions include: formal guarantees of progress for non-deterministic retry policies (when does a self-correcting agent halt?), cost-aware retry budgets that trade dollars for success probability, and orchestrators that explicitly model "tool returned wrong answer" as a distinct failure class with its own recovery policy. The 2026 production stack is converging on Temporal-style durable execution wrapped around an agent layer that learns from its own failures, with observability surfaces good enough to make the loop transparent.
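
As a worked illustration of a cost-aware retry budget, suppose each attempt costs a fixed amount and succeeds independently with probability p. Independence is a strong assumption (a reflective retry is designed to change the odds), and both function names are hypothetical. Under that assumption, n attempts succeed with probability 1 - (1 - p)^n.

```python
def max_attempts(budget_usd: float, cost_per_attempt_usd: float) -> int:
    """Largest number of attempts whose worst-case total cost fits the budget."""
    if cost_per_attempt_usd <= 0:
        raise ValueError("cost per attempt must be positive")
    return int(budget_usd // cost_per_attempt_usd)


def success_probability(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** attempts
```

For example, a budget of 1.00 USD at 0.25 USD per attempt allows 4 attempts, and with p = 0.5 the chance of at least one success is 1 - 0.5^4 = 0.9375. Each extra attempt buys less additional probability than the one before, which is the trade-off a budget policy has to price.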

What Comes Next

With production engineering patterns established, the next chapter, Chapter 65: Containers, Kubernetes & Deployment, covers the infrastructure substrate that runs the durable workflows you have just designed. The retry, idempotency, and observability machinery from this chapter all assume a healthy container platform underneath; Chapter 65 builds it.