Building Conversational AI with LLMs and Agents
Appendix V: LLM Tooling Ecosystem

Orchestration Frameworks: LangChain, LlamaIndex, Haystack, and DSPy

Big Picture

Orchestration frameworks sit between your application logic and the LLM API, providing abstractions for prompt management, chaining, retrieval, and tool integration. The four dominant frameworks each take a fundamentally different design approach: LangChain emphasizes composable components, LlamaIndex focuses on data and retrieval, Haystack uses a pipeline architecture, and DSPy replaces prompts with optimizable programs. Understanding these differences is essential for making the right choice.

1. Framework Overview and Design Philosophies

Each orchestration framework reflects a distinct philosophy about how developers should interact with LLMs. These philosophical differences manifest in API design, abstraction levels, and the types of applications each framework makes easy or difficult to build.

LangChain follows a component-based philosophy. It provides hundreds of integrations (LLM providers, vector stores, document loaders, tools) that snap together through a common interface called the Runnable protocol. This breadth makes LangChain the Swiss Army knife of LLM frameworks, but it also means the framework has a large surface area and a steep learning curve for advanced use cases.
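The composition idea behind the Runnable protocol can be sketched in a few lines of plain Python. This is illustrative only; LangChain's actual Runnable is far richer, with streaming, batching, and async variants, but the core pattern is the same: anything with an invoke method can be chained with the pipe operator.

```python
# Minimal sketch of pipe-composable components, in the spirit of
# LangChain's Runnable protocol (not LangChain's implementation).
class Runnable:
    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        # `a | b` builds a two-step sequence
        return Sequence(self, other)


class Sequence(Runnable):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def invoke(self, value):
        # Run the first step, feed its output into the second
        return self.second.invoke(self.first.invoke(value))


class Lambda(Runnable):
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)


chain = Lambda(str.strip) | Lambda(str.upper) | Lambda(lambda s: s + "!")
print(chain.invoke("  hello  "))  # HELLO!
```

Because every step shares one interface, arbitrary components compose without glue code, which is exactly what makes LangChain's hundreds of integrations snap together.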

LlamaIndex was built with a data-first philosophy. It excels at ingesting, indexing, and retrieving documents for RAG applications. While it has expanded to support general orchestration, its core strength remains the retrieval pipeline: loading documents from dozens of sources, chunking them with configurable strategies, embedding them, and querying them with sophisticated retrieval methods.

Haystack uses a directed acyclic graph (DAG) pipeline architecture inherited from its origins as a search framework built by deepset. Each component in a Haystack pipeline has typed inputs and outputs, and the framework validates the pipeline graph at construction time. This strictness catches errors early and makes pipelines easier to reason about, at the cost of some flexibility.
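The benefit of typed connections can be seen in a toy sketch. This is not Haystack's API (Haystack infers types from each component's input and output sockets), but it shows the key property: connecting an output to an input of an incompatible type fails at build time, before any data flows.

```python
# Toy sketch of build-time type validation in a DAG pipeline
# (illustrative only; Haystack infers types from component sockets).
class Component:
    def __init__(self, name, in_type, out_type):
        self.name, self.in_type, self.out_type = name, in_type, out_type


class Pipeline:
    def __init__(self):
        self.edges = []

    def connect(self, src, dst):
        # Reject type-incompatible connections when the graph is built
        if src.out_type is not dst.in_type:
            raise TypeError(
                f"{src.name} outputs {src.out_type.__name__}, "
                f"but {dst.name} expects {dst.in_type.__name__}"
            )
        self.edges.append((src, dst))


splitter = Component("splitter", str, list)
embedder = Component("embedder", list, list)
generator = Component("generator", str, str)

pipe = Pipeline()
pipe.connect(splitter, embedder)     # OK: list -> list
# pipe.connect(splitter, generator)  # would raise TypeError at build time
```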

DSPy takes the most radical approach: it treats LLM interactions as optimizable programs rather than prompt templates. Instead of hand-crafting prompts, you define input/output signatures and let DSPy's optimizers (historically called teleprompters) find effective prompting strategies automatically. This approach tends to produce more robust results but requires a fundamentally different mental model.

2. Feature Comparison

The following table compares the four frameworks across key dimensions. Entries reflect the state of each framework as of early 2026.

Orchestration Frameworks: Feature Comparison
Feature                   | LangChain                 | LlamaIndex               | Haystack            | DSPy
--------------------------|---------------------------|--------------------------|---------------------|----------------------
Primary focus             | General orchestration     | RAG and data retrieval   | Pipeline-based NLP  | Programmatic prompting
GitHub stars (approx.)    | 100k+                     | 38k+                     | 18k+                | 20k+
Language support          | Python, TypeScript        | Python, TypeScript       | Python              | Python
LLM provider integrations | 80+                       | 40+                      | 20+                 | 15+
Vector store integrations | 50+                       | 40+                      | 15+                 | 5+
Document loaders          | 100+                      | 160+                     | 30+                 | Minimal
Streaming support         | Full                      | Full                     | Full                | Limited
Async support             | Full                      | Full                     | Full                | Partial
Type safety               | Moderate (Pydantic)       | Moderate (Pydantic)      | Strong (typed I/O)  | Strong (signatures)
Learning curve            | Moderate to steep         | Moderate                 | Moderate            | Steep
Commercial offering       | LangSmith (observability) | LlamaCloud (managed RAG) | deepset Cloud       | None
License                   | MIT                       | MIT                      | Apache 2.0          | MIT

3. Code Complexity Comparison

The best way to understand the practical differences between frameworks is to implement the same task in each. The following examples all implement a basic RAG pipeline that loads a document, creates embeddings, stores them in a vector database, and answers questions using retrieved context.

3.1 LangChain

LangChain's approach uses composable components chained together with the pipe operator. The LCEL (LangChain Expression Language) syntax is concise but requires familiarity with the Runnable protocol.

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Load and chunk documents
loader = TextLoader("knowledge_base.txt")
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(loader.load())

# Create vector store
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Build RAG chain with LCEL
prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join retrieved Document objects into a single context string;
    # without this, the raw Document reprs land in the prompt
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

answer = chain.invoke("What is PagedAttention?")
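The chunk_size and chunk_overlap parameters above define a sliding window over the text. A minimal pure-Python sketch of that windowing follows; the real RecursiveCharacterTextSplitter is smarter, preferring to split on paragraph and sentence boundaries before falling back to raw character positions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Each chunk starts (chunk_size - overlap) characters after the previous
    # one, so consecutive chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character document yields chunks starting at 0, 450, and 900
print(len(chunk_text("x" * 1200)))  # 3
```

The overlap exists so that a fact straddling a chunk boundary still appears intact in at least one chunk.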

3.2 LlamaIndex

LlamaIndex's approach is more declarative and data-centric. The framework handles chunking, embedding, and retrieval with sensible defaults, requiring less boilerplate for standard RAG.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Load documents (auto-detects file types)
documents = SimpleDirectoryReader("./data").load_data()

# Create index (handles chunking, embedding, and storage)
index = VectorStoreIndex.from_documents(documents)

# Query with built-in retrieval
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o"),
    similarity_top_k=3,
)

response = query_engine.query("What is PagedAttention?")
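Under the hood, similarity_top_k=3 means: embed the query, score every stored chunk by cosine similarity, and keep the three best. A pure-Python sketch of that ranking step is below; production vector stores replace the full scan with approximate nearest-neighbor indexes, but the semantics are the same.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, chunk_embs, k=3):
    # Rank chunk indices by similarity to the query, highest first
    ranked = sorted(range(len(chunk_embs)),
                    key=lambda i: cosine(query_emb, chunk_embs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D embeddings: chunks 0 and 2 point roughly the same way as the query
print(top_k([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]], k=2))
# [0, 2]
```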

3.3 Haystack

Haystack's pipeline approach makes the data flow explicit. Each component declares its inputs and outputs, and the framework validates that connections are type-compatible.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()

# Indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["knowledge_base.txt"]}})

# Query pipeline
template = "Answer based on context:\n{% for d in documents %}{{ d.content }}\n{% endfor %}\nQuestion: {{ question }}"
query_pipe = Pipeline()
query_pipe.add_component("embedder", OpenAITextEmbedder())
query_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store, top_k=3))
query_pipe.add_component("prompt", PromptBuilder(template=template))
query_pipe.add_component("llm", OpenAIGenerator(model="gpt-4o"))
query_pipe.connect("embedder.embedding", "retriever.query_embedding")
query_pipe.connect("retriever", "prompt.documents")
query_pipe.connect("prompt", "llm")

result = query_pipe.run({
    "embedder": {"text": "What is PagedAttention?"},
    "prompt": {"question": "What is PagedAttention?"},
})

3.4 DSPy

DSPy replaces prompt engineering with programming. You define what the LLM should do (via signatures), and DSPy optimizes how it does it. This code is the shortest, but understanding the framework requires grasping the signature and module abstractions.

import dspy

# Configure the LLM; dspy.Retrieve also needs a retrieval model
# configured, e.g. dspy.configure(lm=lm, rm=your_retriever)
lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)

# Define the RAG module
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()  # standard in DSPy module definitions
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

rag = RAG()
result = rag(question="What is PagedAttention?")
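The string "context, question -> answer" is a DSPy signature: field names on the left of the arrow are inputs, those on the right are outputs. A rough sketch of what such a string encodes is below; this is illustrative only, as DSPy's own signature machinery also attaches types, instructions, and per-field metadata.

```python
def parse_signature(sig: str) -> tuple[list[str], list[str]]:
    # Split "in1, in2 -> out1" into input and output field names
    lhs, rhs = sig.split("->")
    inputs = [field.strip() for field in lhs.split(",")]
    outputs = [field.strip() for field in rhs.split(",")]
    return inputs, outputs

print(parse_signature("context, question -> answer"))
# (['context', 'question'], ['answer'])
```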

Key Insight

Lines of code is not the right metric for comparison. LlamaIndex and DSPy achieve brevity by hiding complexity behind high-level abstractions. LangChain and Haystack expose more of the pipeline, giving you finer control. The right choice depends on whether you need that control. If your RAG pipeline uses standard settings, LlamaIndex's defaults save time. If you need custom chunking, reranking, and hybrid search, LangChain or Haystack give you the knobs to turn.

4. Strengths and Weaknesses

Beyond feature lists and code comparisons, each framework has characteristic strengths and weaknesses that emerge during real-world usage. The following assessment is based on production experience and community feedback.

4.1 LangChain

Strengths: Unmatched integration breadth; large community means abundant examples; LCEL provides a powerful composition model; TypeScript support for full-stack teams; LangSmith provides excellent observability.

Weaknesses: Large API surface creates confusion (multiple ways to do the same thing); frequent breaking changes in earlier versions eroded trust (stabilized in v0.2+); abstraction overhead adds latency; debugging LCEL chains can be opaque without LangSmith.

4.2 LlamaIndex

Strengths: Best-in-class document ingestion and retrieval; excellent defaults for RAG; strong TypeScript support; LlamaCloud provides managed infrastructure; property graph index for knowledge graph use cases.

Weaknesses: Less suited for non-RAG workflows; agent capabilities are less mature than LangGraph; some advanced features require LlamaCloud (paid); can be "magic" in ways that make debugging difficult.

4.3 Haystack

Strengths: Pipeline validation catches errors at build time; clean typed component model; strong enterprise backing from deepset; excellent documentation; modular design avoids bloated dependencies; mature pipeline serialization (YAML export/import).

Weaknesses: Smaller community means fewer third-party examples; fewer integrations than LangChain; pipeline syntax is verbose for simple use cases; no TypeScript SDK.

4.4 DSPy

Strengths: Eliminates prompt engineering through optimization; produces more robust prompts that generalize better; signature-based approach is concise and testable; strong academic foundation from Stanford NLP.

Weaknesses: Steep learning curve; requires a different mental model than traditional prompt engineering; smaller ecosystem and fewer integrations; optimization requires labeled examples; limited production tooling and observability.

5. Production Readiness Comparison

Moving from a prototype to production introduces requirements around reliability, observability, security, and operational management. The following table assesses each framework's production readiness.

Orchestration Frameworks: Production Readiness
Production Feature        | LangChain            | LlamaIndex            | Haystack             | DSPy
--------------------------|----------------------|-----------------------|----------------------|------------------
Error handling / retries  | Built-in (Runnable)  | Built-in              | Pipeline-level       | Basic
Streaming responses       | Full support         | Full support          | Full support         | Limited
Observability integration | LangSmith, callbacks | LlamaTrace, callbacks | OpenTelemetry native | Basic logging
Caching                   | Multiple backends    | Built-in              | Component-level      | None built-in
Rate limiting             | Provider-level       | Provider-level        | Custom component     | Manual
Pipeline serialization    | JSON export          | JSON/dict export      | YAML native          | Module save/load
Deployment guides         | Extensive            | Good                  | Extensive            | Minimal
Security features         | Input sanitization   | Input validation      | Input validation     | Minimal
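Where a framework's retry support is thin (the "Basic" and "Manual" cells above), a hand-rolled wrapper is straightforward. A hedged sketch with exponential backoff follows; the attempt count, delays, and caught exception types should be tuned to your provider's rate limits and error classes rather than copied as-is.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Retry fn() with exponential backoff: base_delay, 2x, 4x, ...
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```

Usage is simply `with_retries(lambda: chain.invoke(question))` around whatever call can fail transiently.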

6. Decision Table: Which Framework Should You Choose?

The following decision table maps common project scenarios to framework recommendations. Each row describes a situation and identifies the best-fit framework along with an alternative.

If your project needs...         | Best Fit              | Runner-Up       | Rationale
---------------------------------|-----------------------|-----------------|----------
Maximum integration options      | LangChain             | LlamaIndex      | Largest integration ecosystem by a wide margin
RAG with complex retrieval       | LlamaIndex            | Haystack        | Purpose-built for data ingestion and retrieval
Strict pipeline validation       | Haystack              | LangChain       | Typed I/O and build-time validation prevent runtime errors
Automated prompt optimization    | DSPy                  | None comparable | Only framework with built-in prompt optimization
TypeScript full-stack app        | LangChain             | LlamaIndex      | Most mature TypeScript SDK
Enterprise with compliance needs | Haystack              | LangChain       | deepset Cloud offers enterprise support; Apache 2.0 license
Quick prototype (few days)       | LlamaIndex            | LangChain       | Sensible defaults minimize boilerplate for RAG use cases
Research with novel prompting    | DSPy                  | LangChain       | Programmatic approach enables systematic prompt exploration
Agent-heavy workflows            | LangChain + LangGraph | LlamaIndex      | LangGraph (covered in V.3) provides best agent orchestration
Figure V.2.3: Decision table for orchestration framework selection. "Best Fit" indicates the recommended primary choice; "Runner-Up" is a viable alternative if the best fit does not meet a constraint.

Note

These frameworks are not mutually exclusive. Many production systems combine LlamaIndex for retrieval with LangChain for orchestration, or use DSPy for prompt optimization during development and then export the optimized prompts into a LangChain or Haystack pipeline for production. Choose a primary framework for your core workflow, then integrate specialized tools where they add value.

Summary

LangChain offers the broadest ecosystem and is the default choice when you need maximum flexibility and integrations. LlamaIndex is the strongest choice for RAG-centric applications where data ingestion and retrieval quality are paramount. Haystack provides the most disciplined engineering experience with its validated pipeline architecture, making it well suited for enterprise deployments. DSPy offers a fundamentally different (and often superior) approach to prompt engineering through optimization, but it has the steepest learning curve and the least production tooling. Apply the decision framework from Section V.1 with your project-specific weights to make the final call.
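The weighted-scoring idea behind such a decision framework can be sketched directly. The criteria, scores, and weights below are illustrative placeholders, not the actual Section V.1 values: score each framework per criterion, multiply by your project's weights, and pick the highest total.

```python
def weighted_choice(scores, weights):
    # Sum weight * score per framework; the highest total wins
    totals = {
        fw: sum(weights[crit] * val for crit, val in crits.items())
        for fw, crits in scores.items()
    }
    best = max(totals, key=totals.get)
    return best, totals

# Illustrative 0-5 scores on two made-up criteria
scores = {
    "LangChain": {"integrations": 5, "type_safety": 3},
    "Haystack": {"integrations": 3, "type_safety": 5},
}
weights = {"integrations": 0.3, "type_safety": 0.7}  # this project values type safety

best, totals = weighted_choice(scores, weights)
print(best)  # Haystack
```

Swapping the weights flips the answer, which is the point: the "right" framework is a function of your project's priorities, not an absolute ranking.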