Section L.1: Core Abstractions: Models, Prompts, and Chains | Building Conversational AI with LLMs and Agents

Big Picture

LangChain provides a unified interface for working with large language models from any provider. Its core abstractions (models, prompts, and chains) let you write provider-agnostic code, compose complex workflows from simple building blocks, and switch between OpenAI, Anthropic, Google, and open-source models with minimal changes. This section covers the foundational primitives you will use in every LangChain application.

1. Chat Models

At the heart of LangChain is the chat model abstraction. Rather than calling each provider's SDK directly, you instantiate a chat model class that normalizes the interface. All chat models accept a list of messages and return an AIMessage. This means you can swap providers without rewriting your application logic.

The following example shows how to instantiate chat models for OpenAI and Anthropic, then invoke them with identical message lists.

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

# Instantiate models from different providers
openai_model = ChatOpenAI(model="gpt-4o", temperature=0.7)
anthropic_model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0.7)

# Both accept the same message format
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Explain Python list comprehensions in two sentences.")
]

# invoke() returns an AIMessage
response = openai_model.invoke(messages)
print(response.content)

# Swap to Anthropic with no other code changes
response = anthropic_model.invoke(messages)
print(response.content)

A list comprehension is a concise way to create a new list by applying an expression to each element of an iterable, optionally filtering elements with a condition. The syntax `[expr for x in iterable if condition]` replaces multi-line for-loops with a single readable line.

A list comprehension provides a compact syntax for generating lists from iterables by combining iteration, transformation, and filtering into one expression. It is both more readable and typically faster than building the equivalent list with an explicit for-loop and append calls.

The invoke() method is synchronous and returns a single response. LangChain also provides stream() for token-by-token output, batch() for processing multiple inputs in parallel, and their async counterparts ainvoke(), astream(), and abatch().

This example demonstrates streaming and batch processing.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Streaming: tokens arrive incrementally
for chunk in model.stream([HumanMessage(content="Write a haiku about Python.")]):
    print(chunk.content, end="", flush=True)
print()

# Batch: process multiple inputs concurrently
questions = [
    [HumanMessage(content="What is a decorator?")],
    [HumanMessage(content="What is a generator?")],
    [HumanMessage(content="What is a context manager?")],
]
responses = model.batch(questions, config={"max_concurrency": 3})
for resp in responses:
    print(resp.content[:80], "...")

Lines of code,
Logic intertwined,
Python speaks truth.
A decorator is a function that takes another function as input and extends its ...
A generator is a function that yields values lazily using the `yield` keyword, ...
A context manager is an object that defines `__enter__` and `__exit__` methods ...

Tip

Use batch() with max_concurrency to respect provider rate limits while still processing inputs faster than sequential invoke() calls. For real-time user-facing applications, prefer stream() so users see output as it is generated.
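To make the effect of a concurrency cap concrete, here is a minimal plain-Python sketch. FakeModel is a hypothetical stand-in that sleeps instead of calling a provider, and the thread-pool strategy is an assumption chosen for illustration rather than a description of LangChain's internals. It shows that a cap of 2 never lets more than two calls run at once while results still come back in input order.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Hypothetical stand-in for a chat model; it never calls a real API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of simultaneous calls observed

    def invoke(self, question: str) -> str:
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        time.sleep(0.05)  # simulate network latency
        with self._lock:
            self._active -= 1
        return f"answer to: {question}"

    def batch(self, questions, max_concurrency=None):
        # A worker pool of size max_concurrency caps the in-flight calls;
        # map() returns results in the same order as the inputs.
        with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
            return list(pool.map(self.invoke, questions))


model = FakeModel()
questions = [f"question {i}" for i in range(6)]
answers = model.batch(questions, max_concurrency=2)

print(answers[0])
print("peak concurrent calls:", model.peak_active)
```

With six inputs and a cap of two, the work finishes in roughly three latency periods instead of six, which is the trade-off the tip describes: faster than sequential calls, but bounded so a rate limit is less likely to be hit.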

2. Prompt Templates

Hard-coding prompt strings into application code leads to maintenance headaches. LangChain's prompt templates separate the prompt structure from the runtime variables that fill it. There are two main types: PromptTemplate for plain string prompts and ChatPromptTemplate for multi-message chat prompts.

The following example builds a chat prompt template with a system message and a user message that includes a variable placeholder.

from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

# Simple string template (useful for completion-style models)
string_template = PromptTemplate.from_template(
    "Translate the following text to {language}: {text}"
)
print(string_template.format(language="French", text="Hello, world!"))

# Chat prompt template (recommended for chat models)
chat_template = ChatPromptTemplate.from_messages([
    ("system", "You are a {role} who explains concepts at a {level} level."),
    ("human", "{question}")
])

# format_messages returns a list of Message objects
messages = chat_template.format_messages(
    role="computer science professor",
    level="beginner",
    question="What is recursion?"
)
for msg in messages:
    print(f"{msg.__class__.__name__}: {msg.content}")

Translate the following text to French: Hello, world!
SystemMessage: You are a computer science professor who explains concepts at a beginner level.
HumanMessage: What is recursion?

Templates support partial application via partial(), letting you fill some variables now and the rest later. This is useful when certain values (like the current date or a system configuration) are known at initialization time but others arrive at runtime.
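The idea behind partial application can be sketched in plain Python. MiniTemplate below is a hypothetical toy class, not LangChain's PromptTemplate, and the fixed date and the role value are illustrative assumptions. It only mirrors the behavior described above: partial() returns a new template with some variables already bound, and format() supplies the rest later.

```python
from datetime import date


class MiniTemplate:
    """Hypothetical toy template; not LangChain's PromptTemplate."""

    def __init__(self, template: str, **preset):
        self.template = template
        self.preset = preset  # variables that are already filled in

    def partial(self, **values) -> "MiniTemplate":
        # Return a new template so the original stays reusable.
        return MiniTemplate(self.template, **{**self.preset, **values})

    def format(self, **values) -> str:
        # Values passed at call time are merged with the preset ones.
        return self.template.format(**{**self.preset, **values})


base = MiniTemplate("Today is {today}. Answer as a {role}: {question}")

# Known at initialization time (a fixed date keeps the example reproducible)
with_date = base.partial(today=date(2025, 1, 15).isoformat(), role="historian")

# Supplied later, once per request
prompt_text = with_date.format(question="When was Python first released?")
print(prompt_text)
# Today is 2025-01-15. Answer as a historian: When was Python first released?
```

Because partial() returns a new object, the same base template can be specialized several ways (for example, one variant per role) without the variants interfering with each other.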

3. Chains: The Legacy LLMChain

Early versions of LangChain used the LLMChain class to connect a prompt template to a model. While it still runs in releases that include the legacy chain classes, it has been superseded by the LangChain Expression Language (LCEL). Understanding LLMChain is helpful for reading older code and tutorials.

This example shows the legacy chain approach for comparison with the modern LCEL approach that follows.

from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Legacy approach (still functional, but not recommended for new code)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "Explain {topic} in {num_sentences} sentences.")
])

model = ChatOpenAI(model="gpt-4o-mini")
chain = LLMChain(llm=model, prompt=prompt)

result = chain.invoke({"topic": "gradient descent", "num_sentences": "3"})
print(result["text"])

Gradient descent is an optimization algorithm that iteratively adjusts model parameters by moving in the direction of steepest decrease of the loss function. The step size is controlled by the learning rate, which determines how far to move at each iteration. It is the foundation of training neural networks.

Deprecation Notice

LLMChain and other legacy chain classes are deprecated (LLMChain has been marked deprecated since LangChain 0.1.17). New projects should use LCEL (covered next). The legacy classes remain available for backward compatibility but will not receive new features.

4. LangChain Expression Language (LCEL)

LCEL is LangChain's modern composition framework. It uses the pipe operator (|) to chain components together, similar to Unix pipes. Every LCEL component implements the Runnable interface, which means it automatically supports invoke(), stream(), batch(), and their async variants. This composability is the key design principle: you build complex workflows by snapping simple pieces together.

The simplest LCEL chain connects a prompt template to a model and (optionally) an output parser.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "Explain {concept} in exactly {sentences} sentences.")
])
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = StrOutputParser()

# Compose with the pipe operator
chain = prompt | model | parser

# invoke() flows data through: prompt -> model -> parser
result = chain.invoke({"concept": "MapReduce", "sentences": "3"})
print(result)  # A plain string (parser extracts .content from AIMessage)

# Streaming works automatically through the entire chain
for token in chain.stream({"concept": "Docker containers", "sentences": "2"}):
    print(token, end="", flush=True)

MapReduce is a programming model for processing large datasets in parallel by splitting work into a "map" phase that transforms individual records and a "reduce" phase that aggregates the results. It was popularized by Google and forms the foundation of distributed data processing frameworks like Hadoop.

Docker containers package an application with all its dependencies into a lightweight, portable unit that runs consistently across environments. They use OS-level virtualization to isolate processes without the overhead of full virtual machines.

Diagram — **Figure L.1.1**: Data flow through an LCEL chain. The pipe operator connects components sequentially: input variables fill the prompt, the prompt is sent to the model, and the output parser extracts a clean string from the model's response.
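To make the pipe operator less magical, the following plain-Python sketch shows how operator overloading can produce this kind of composition. MiniRunnable and the three lambda steps are hypothetical and deliberately simplified; this is not LangChain's Runnable implementation, and it omits streaming and async support. It shows only that `a | b` can return a new object whose invoke() feeds the output of a into b.

```python
from typing import Any, Callable, Iterable, List


class MiniRunnable:
    """Hypothetical toy class; not LangChain's Runnable."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: Iterable[Any]) -> List[Any]:
        # Sequential for simplicity; a real implementation may run concurrently.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "MiniRunnable") -> "MiniRunnable":
        # a | b yields a new runnable that feeds a's output into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for prompt, model, and parser
fill_prompt = MiniRunnable(lambda d: f"Explain {d['concept']} briefly.")
fake_model = MiniRunnable(lambda prompt: {"content": f"[reply to: {prompt}]"})
parse = MiniRunnable(lambda message: message["content"])

chain = fill_prompt | fake_model | parse

print(chain.invoke({"concept": "MapReduce"}))
# [reply to: Explain MapReduce briefly.]
print(chain.batch([{"concept": "Docker"}, {"concept": "LCEL"}]))
# ['[reply to: Explain Docker briefly.]', '[reply to: Explain LCEL briefly.]']
```

Because the composed chain is itself a MiniRunnable, it inherits invoke() and batch() without extra code, which is the same reason every LCEL chain exposes the full Runnable interface.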

5. RunnablePassthrough and RunnableParallel

Real-world chains often need to pass original input alongside computed values, or run multiple steps in parallel. LangChain provides two utility classes for these patterns: RunnablePassthrough passes its input through unchanged, and RunnableParallel runs multiple runnables simultaneously, collecting their outputs into a dictionary.

This example uses RunnableParallel to run two independent LLM calls concurrently, then merges the results.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = StrOutputParser()

# Two independent chains
pros_chain = (
    ChatPromptTemplate.from_template("List 3 pros of {technology}.")
    | model | parser
)
cons_chain = (
    ChatPromptTemplate.from_template("List 3 cons of {technology}.")
    | model | parser
)

# RunnableParallel runs both chains concurrently
parallel = RunnableParallel(pros=pros_chain, cons=cons_chain)
result = parallel.invoke({"technology": "microservices"})
print("PROS:", result["pros"])
print("CONS:", result["cons"])

PROS: 1. Independent deployment: each service can be updated, scaled, and deployed independently.
2. Technology flexibility: teams can choose the best language and framework for each service.
3. Fault isolation: a failure in one service does not bring down the entire system.
CONS: 1. Operational complexity: managing dozens of services requires robust orchestration, monitoring, and logging.
2. Network overhead: inter-service communication adds latency compared to in-process calls.
3. Data consistency: maintaining transactions across services is significantly harder than in a monolith.

RunnablePassthrough is especially useful in retrieval-augmented generation (RAG) pipelines where you need to forward the user's original question alongside retrieved context.

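from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Reuses `model` and `parser` from the previous example.
# Pattern: pass the question through while also running a retrieval step
# (retriever would be a real vectorstore retriever in practice)
def mock_retriever(query: str) -> str:
    # The chain is invoked with a plain string, so the input is a str
    return "Python was created by Guido van Rossum in 1991."

setup = RunnableParallel(
    context=mock_retriever,          # plain callables are wrapped automatically
    question=RunnablePassthrough()   # forwards the original question unchanged
)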

rag_prompt = ChatPromptTemplate.from_template(
    "Context: {context}\n\nAnswer this question: {question}"
)

rag_chain = setup | rag_prompt | model | parser
answer = rag_chain.invoke("Who created Python?")
print(answer)

Python was created by Guido van Rossum in 1991.
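It can help to see the intermediate dictionary that the setup step hands to the prompt. The sketch below imitates that step in plain Python: run_parallel and passthrough are hypothetical helpers written for illustration (not LangChain APIs), and the thread pool is an assumed way of running the steps concurrently. Each step receives the same input, and the results are collected under the step's key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def passthrough(value: Any) -> Any:
    # Returns the input unchanged, like a passthrough step.
    return value


def run_parallel(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    # Submit every step with the same input, then collect the results by key.
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {key: pool.submit(step, value) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}


def mock_retriever(query: str) -> str:
    return "Python was created by Guido van Rossum in 1991."


prompt_inputs = run_parallel(
    {"context": mock_retriever, "question": passthrough},
    "Who created Python?",
)
print(prompt_inputs)
# {'context': 'Python was created by Guido van Rossum in 1991.', 'question': 'Who created Python?'}
```

The keys of this dictionary line up with the {context} and {question} placeholders in the prompt template, which is why the parallel step can be piped straight into it.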

Tip

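You can inspect any LCEL chain's structure by calling chain.get_graph().print_ascii(). This renders an ASCII diagram showing how components are connected, which is invaluable for debugging complex chains. The ASCII renderer depends on the optional grandalf package, so install it first if the call raises an import error.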

6. Configuring Model Parameters at Runtime

LCEL chains accept a config dictionary at invocation time for runtime customization. You can also use .configurable_fields() to expose model parameters (such as temperature or model name) as runtime-configurable options without rebuilding the chain.

This example shows how to make the model name configurable so that callers can switch between models per request.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import ConfigurableField

model = ChatOpenAI(model="gpt-4o-mini", temperature=0).configurable_fields(
    model_name=ConfigurableField(
        id="model_name",
        name="Model Name",
        description="The OpenAI model to use"
    )
)

chain = (
    ChatPromptTemplate.from_template("Summarize: {text}")
    | model
    | StrOutputParser()
)

# Use default model
result1 = chain.invoke({"text": "LangChain is a framework for LLM apps."})

# Override model at runtime
result2 = chain.with_config(
    configurable={"model_name": "gpt-4o"}
).invoke({"text": "LangChain is a framework for LLM apps."})

Key Insight

LCEL replaces the legacy chain classes with a composable, pipe-based syntax. Every component in an LCEL chain automatically supports invoke, stream, batch, and async variants. Use RunnableParallel for concurrent execution and RunnablePassthrough to forward data alongside computed values.