Building Conversational AI with LLMs and Agents
Appendix V: LLM Tooling Ecosystem

Orchestration Frameworks: LangChain, LlamaIndex, Haystack, and DSPy

Big Picture

Orchestration frameworks sit between your application logic and the LLM API, providing abstractions for prompt management, chaining, retrieval, and tool integration. The four dominant frameworks each take a fundamentally different design approach: LangChain emphasizes composable components, LlamaIndex focuses on data and retrieval, Haystack uses a pipeline architecture, and DSPy replaces prompts with optimizable programs. Understanding these differences is essential for making the right choice.

1. Framework Overview and Design Philosophies

Each orchestration framework reflects a distinct philosophy about how developers should interact with LLMs. These philosophical differences manifest in API design, abstraction levels, and the types of applications each framework makes easy or difficult to build.

LangChain follows a component-based philosophy. It provides hundreds of integrations (LLM providers, vector stores, document loaders, tools) that snap together through a common interface called the Runnable protocol. This breadth makes LangChain the Swiss Army knife of LLM frameworks, but it also means the framework has a large surface area and a steep learning curve for advanced use cases.
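The composition idea behind the Runnable protocol can be sketched in a few lines of plain Python. This is illustrative only; LangChain's actual Runnable is far richer, with streaming, batching, and async variants, but the core pattern is the same: anything with an invoke method can be chained with the pipe operator.

```python
# Minimal sketch of pipe-composable components, in the spirit of
# LangChain's Runnable protocol (not LangChain's implementation).
class Runnable:
    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        # `a | b` builds a two-step sequence
        return Sequence(self, other)


class Sequence(Runnable):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def invoke(self, value):
        # Run the first step, feed its output into the second
        return self.second.invoke(self.first.invoke(value))


class Lambda(Runnable):
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)


chain = Lambda(str.strip) | Lambda(str.upper) | Lambda(lambda s: s + "!")
print(chain.invoke("  hello  "))  # HELLO!
```

Because every step shares one interface, arbitrary components compose without glue code, which is exactly what makes LangChain's hundreds of integrations snap together.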

LlamaIndex was built with a data-first philosophy. It excels at ingesting, indexing, and retrieving documents for RAG applications. While it has expanded to support general orchestration, its core strength remains the retrieval pipeline: loading documents from dozens of sources, chunking them with configurable strategies, embedding them, and querying them with sophisticated retrieval methods.

Haystack uses a directed acyclic graph (DAG) pipeline architecture inherited from its origins as a search framework built by deepset. Each component in a Haystack pipeline has typed inputs and outputs, and the framework validates the pipeline graph at construction time. This strictness catches errors early and makes pipelines easier to reason about, at the cost of some flexibility.
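The benefit of typed connections can be seen in a toy sketch. This is not Haystack's API (Haystack infers types from each component's input and output sockets), but it shows the key property: connecting an output to an input of an incompatible type fails at build time, before any data flows.

```python
# Toy sketch of build-time type validation in a DAG pipeline
# (illustrative only; Haystack infers types from component sockets).
class Component:
    def __init__(self, name, in_type, out_type):
        self.name, self.in_type, self.out_type = name, in_type, out_type


class Pipeline:
    def __init__(self):
        self.edges = []

    def connect(self, src, dst):
        # Reject type-incompatible connections when the graph is built
        if src.out_type is not dst.in_type:
            raise TypeError(
                f"{src.name} outputs {src.out_type.__name__}, "
                f"but {dst.name} expects {dst.in_type.__name__}"
            )
        self.edges.append((src, dst))


splitter = Component("splitter", str, list)
embedder = Component("embedder", list, list)
generator = Component("generator", str, str)

pipe = Pipeline()
pipe.connect(splitter, embedder)     # OK: list -> list
# pipe.connect(splitter, generator)  # would raise TypeError at build time
```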

DSPy takes the most radical approach: it treats LLM interactions as optimizable programs rather than prompt templates. Instead of hand-crafting prompts, you define input/output signatures and let DSPy's optimizers (historically called teleprompters) find effective prompting strategies automatically. This approach tends to produce more robust results but requires a fundamentally different mental model.

2. Feature Comparison

The following table compares the four frameworks across key dimensions. Entries reflect the state of each framework as of early 2026.

Orchestration Frameworks: Feature Comparison
Feature                   | LangChain                 | LlamaIndex               | Haystack            | DSPy
--------------------------|---------------------------|--------------------------|---------------------|----------------------
Primary focus             | General orchestration     | RAG and data retrieval   | Pipeline-based NLP  | Programmatic prompting
GitHub stars (approx.)    | 100k+                     | 38k+                     | 18k+                | 20k+
Language support          | Python, TypeScript        | Python, TypeScript       | Python              | Python
LLM provider integrations | 80+                       | 40+                      | 20+                 | 15+
Vector store integrations | 50+                       | 40+                      | 15+                 | 5+
Document loaders          | 100+                      | 160+                     | 30+                 | Minimal
Streaming support         | Full                      | Full                     | Full                | Limited
Async support             | Full                      | Full                     | Full                | Partial
Type safety               | Moderate (Pydantic)       | Moderate (Pydantic)      | Strong (typed I/O)  | Strong (signatures)
Learning curve            | Moderate to steep         | Moderate                 | Moderate            | Steep
Commercial offering       | LangSmith (observability) | LlamaCloud (managed RAG) | deepset Cloud       | None
License                   | MIT                       | MIT                      | Apache 2.0          | MIT

3. Code Complexity Comparison

The best way to understand the practical differences between frameworks is to implement the same task in each. The following examples all implement a basic RAG pipeline that loads a document, creates embeddings, stores them in a vector database, and answers questions using retrieved context.

3.1 LangChain

LangChain's approach uses composable components chained together with the pipe operator. The LCEL (LangChain Expression Language) syntax is concise but requires familiarity with the Runnable protocol.

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Load and chunk documents
loader = TextLoader("knowledge_base.txt")
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(loader.load())

# Create vector store
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Build RAG chain with LCEL
prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join retrieved Document objects into a single context string;
    # without this, the raw Document reprs land in the prompt
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

answer = chain.invoke("What is PagedAttention?")
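The chunk_size and chunk_overlap parameters above define a sliding window over the text. A minimal pure-Python sketch of that windowing follows; the real RecursiveCharacterTextSplitter is smarter, preferring to split on paragraph and sentence boundaries before falling back to raw character positions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Each chunk starts (chunk_size - overlap) characters after the previous
    # one, so consecutive chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character document yields chunks starting at 0, 450, and 900
print(len(chunk_text("x" * 1200)))  # 3
```

The overlap exists so that a fact straddling a chunk boundary still appears intact in at least one chunk.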

3.2 LlamaIndex

LlamaIndex's approach is more declarative and data-centric. The framework handles chunking, embedding, and retrieval with sensible defaults, requiring less boilerplate for standard RAG.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Load documents (auto-detects file types)
documents = SimpleDirectoryReader("./data").load_data()

# Create index (handles chunking, embedding, and storage)
index = VectorStoreIndex.from_documents(documents)

# Query with built-in retrieval
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o"),
    similarity_top_k=3,
)

response = query_engine.query("What is PagedAttention?")
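Under the hood, similarity_top_k=3 means: embed the query, score every stored chunk by cosine similarity, and keep the three best. A pure-Python sketch of that ranking step is below; production vector stores replace the full scan with approximate nearest-neighbor indexes, but the semantics are the same.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, chunk_embs, k=3):
    # Rank chunk indices by similarity to the query, highest first
    ranked = sorted(range(len(chunk_embs)),
                    key=lambda i: cosine(query_emb, chunk_embs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D embeddings: chunks 0 and 2 point roughly the same way as the query
print(top_k([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]], k=2))
# [0, 2]
```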

3.3 Haystack

Haystack's pipeline approach makes the data flow explicit. Each component declares its inputs and outputs, and the framework validates that connections are type-compatible.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()

# Indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
indexing.add_component("embedder", OpenAIDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=doc_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["knowledge_base.txt"]}})

# Query pipeline
template = "Answer based on context:\n{% for d in documents %}{{ d.content }}\n{% endfor %}\nQuestion: {{ question }}"
query_pipe = Pipeline()
query_pipe.add_component("embedder", OpenAITextEmbedder())
query_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=doc_store, top_k=3))
query_pipe.add_component("prompt", PromptBuilder(template=template))
query_pipe.add_component("llm", OpenAIGenerator(model="gpt-4o"))
query_pipe.connect("embedder.embedding", "retriever.query_embedding")
query_pipe.connect("retriever", "prompt.documents")
query_pipe.connect("prompt", "llm")

result = query_pipe.run({
    "embedder": {"text": "What is PagedAttention?"},
    "prompt": {"question": "What is PagedAttention?"},
})

3.4 DSPy

DSPy replaces prompt engineering with programming. You define what the LLM should do (via signatures), and DSPy optimizes how it does it. This code is the shortest, but understanding the framework requires grasping the signature and module abstractions.

import dspy

# Configure the LLM; dspy.Retrieve also needs a retrieval model
# configured, e.g. dspy.configure(lm=lm, rm=your_retriever)
lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)

# Define the RAG module
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()  # standard in DSPy module definitions
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

rag = RAG()
result = rag(question="What is PagedAttention?")
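The string "context, question -> answer" is a DSPy signature: field names on the left of the arrow are inputs, those on the right are outputs. A rough sketch of what such a string encodes is below; this is illustrative only, as DSPy's own signature machinery also attaches types, instructions, and per-field metadata.

```python
def parse_signature(sig: str) -> tuple[list[str], list[str]]:
    # Split "in1, in2 -> out1" into input and output field names
    lhs, rhs = sig.split("->")
    inputs = [field.strip() for field in lhs.split(",")]
    outputs = [field.strip() for field in rhs.split(",")]
    return inputs, outputs

print(parse_signature("context, question -> answer"))
# (['context', 'question'], ['answer'])
```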

Key Insight

Lines of code is not the right metric for comparison. LlamaIndex and DSPy achieve brevity by hiding complexity behind high-level abstractions. LangChain and Haystack expose more of the pipeline, giving you finer control. The right choice depends on whether you need that control. If your RAG pipeline uses standard settings, LlamaIndex's defaults save time. If you need custom chunking, reranking, and hybrid search, LangChain or Haystack give you the knobs to turn.

4. Strengths and Weaknesses

Beyond feature lists and code comparisons, each framework has characteristic strengths and weaknesses that emerge during real-world usage. The following assessment is based on production experience and community feedback.

4.1 LangChain

Strengths: Unmatched integration breadth; large community means abundant examples; LCEL provides a powerful composition model; TypeScript support for full-stack teams; LangSmith provides excellent observability.

Weaknesses: Large API surface creates confusion (multiple ways to do the same thing); frequent breaking changes in earlier versions eroded trust (stabilized in v0.2+); abstraction overhead adds latency; debugging LCEL chains can be opaque without LangSmith.

4.2 LlamaIndex

Strengths: Best-in-class document ingestion and retrieval; excellent defaults for RAG; strong TypeScript support; LlamaCloud provides managed infrastructure; property graph index for knowledge graph use cases.

Weaknesses: Less suited for non-RAG workflows; agent capabilities are less mature than LangGraph; some advanced features require LlamaCloud (paid); can be "magic" in ways that make debugging difficult.

4.3 Haystack

Strengths: Pipeline validation catches errors at build time; clean typed component model; strong enterprise backing from deepset; excellent documentation; modular design avoids bloated dependencies; mature pipeline serialization (YAML export/import).

Weaknesses: Smaller community means fewer third-party examples; fewer integrations than LangChain; pipeline syntax is verbose for simple use cases; no TypeScript SDK.

4.4 DSPy

Strengths: Eliminates prompt engineering through optimization; produces more robust prompts that generalize better; signature-based approach is concise and testable; strong academic foundation from Stanford NLP.

Weaknesses: Steep learning curve; requires a different mental model than traditional prompt engineering; smaller ecosystem and fewer integrations; optimization requires labeled examples; limited production tooling and observability.

5. Production Readiness Comparison

Moving from a prototype to production introduces requirements around reliability, observability, security, and operational management. The following table assesses each framework's production readiness.

Orchestration Frameworks: Production Readiness
Production Feature        | LangChain            | LlamaIndex            | Haystack             | DSPy
--------------------------|----------------------|-----------------------|----------------------|------------------
Error handling / retries  | Built-in (Runnable)  | Built-in              | Pipeline-level       | Basic
Streaming responses       | Full support         | Full support          | Full support         | Limited
Observability integration | LangSmith, callbacks | LlamaTrace, callbacks | OpenTelemetry native | Basic logging
Caching                   | Multiple backends    | Built-in              | Component-level      | None built-in
Rate limiting             | Provider-level       | Provider-level        | Custom component     | Manual
Pipeline serialization    | JSON export          | JSON/dict export      | YAML native          | Module save/load
Deployment guides         | Extensive            | Good                  | Extensive            | Minimal
Security features         | Input sanitization   | Input validation      | Input validation     | Minimal
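Where a framework's retry support is thin (the "Basic" and "Manual" cells above), a hand-rolled wrapper is straightforward. A hedged sketch with exponential backoff follows; the attempt count, delays, and caught exception types should be tuned to your provider's rate limits and error classes rather than copied as-is.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Retry fn() with exponential backoff: base_delay, 2x, 4x, ...
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```

Usage is simply `with_retries(lambda: chain.invoke(question))` around whatever call can fail transiently.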

6. Decision Table: Which Framework Should You Choose?

The following decision table maps common project scenarios to framework recommendations. Each row describes a situation and identifies the best-fit framework along with an alternative.

If your project needs...         | Best Fit              | Runner-Up       | Rationale
---------------------------------|-----------------------|-----------------|----------
Maximum integration options      | LangChain             | LlamaIndex      | Largest integration ecosystem by a wide margin
RAG with complex retrieval       | LlamaIndex            | Haystack        | Purpose-built for data ingestion and retrieval
Strict pipeline validation       | Haystack              | LangChain       | Typed I/O and build-time validation prevent runtime errors
Automated prompt optimization    | DSPy                  | None comparable | Only framework with built-in prompt optimization
TypeScript full-stack app        | LangChain             | LlamaIndex      | Most mature TypeScript SDK
Enterprise with compliance needs | Haystack              | LangChain       | deepset Cloud offers enterprise support; Apache 2.0 license
Quick prototype (few days)       | LlamaIndex            | LangChain       | Sensible defaults minimize boilerplate for RAG use cases
Research with novel prompting    | DSPy                  | LangChain       | Programmatic approach enables systematic prompt exploration
Agent-heavy workflows            | LangChain + LangGraph | LlamaIndex      | LangGraph (covered in V.3) provides best agent orchestration
Figure V.2.3: Decision table for orchestration framework selection. "Best Fit" indicates the recommended primary choice; "Runner-Up" is a viable alternative if the best fit does not meet a constraint.

Note

These frameworks are not mutually exclusive. Many production systems combine LlamaIndex for retrieval with LangChain for orchestration, or use DSPy for prompt optimization during development and then export the optimized prompts into a LangChain or Haystack pipeline for production. Choose a primary framework for your core workflow, then integrate specialized tools where they add value.

Summary

LangChain offers the broadest ecosystem and is the default choice when you need maximum flexibility and integrations. LlamaIndex is the strongest choice for RAG-centric applications where data ingestion and retrieval quality are paramount. Haystack provides the most disciplined engineering experience with its validated pipeline architecture, making it well suited for enterprise deployments. DSPy offers a fundamentally different (and often superior) approach to prompt engineering through optimization, but it has the steepest learning curve and the least production tooling. Apply the decision framework from Section V.1 with your project-specific weights to make the final call.
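The weighted-scoring idea behind such a decision framework can be sketched directly. The criteria, scores, and weights below are illustrative placeholders, not the actual Section V.1 values: score each framework per criterion, multiply by your project's weights, and pick the highest total.

```python
def weighted_choice(scores, weights):
    # Sum weight * score per framework; the highest total wins
    totals = {
        fw: sum(weights[crit] * val for crit, val in crits.items())
        for fw, crits in scores.items()
    }
    best = max(totals, key=totals.get)
    return best, totals

# Illustrative 0-5 scores on two made-up criteria
scores = {
    "LangChain": {"integrations": 5, "type_safety": 3},
    "Haystack": {"integrations": 3, "type_safety": 5},
}
weights = {"integrations": 0.3, "type_safety": 0.7}  # this project values type safety

best, totals = weighted_choice(scores, weights)
print(best)  # Haystack
```

Swapping the weights flips the answer, which is the point: the "right" framework is a function of your project's priorities, not an absolute ranking.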