Chapter 20: Retrieval-Augmented Generation (RAG) | Building Conversational AI with LLMs and Agents

"The best answer is not always inside the model. Sometimes the smartest thing an AI can do is look it up."
RAG, Bookishly Wise AI Agent

**Figure 20.0.1**: RAG is the open-book exam of AI: instead of memorizing everything, the model looks up what it needs and weaves the answer on the fly.

Chapter Overview

Large language models are powerful generators but inherently limited by their training data cutoff, their tendency to hallucinate, and the impossibility of encoding all world knowledge in model parameters. Retrieval-Augmented Generation (RAG) addresses these limitations by connecting LLMs to external knowledge sources at inference time, grounding responses in retrieved evidence rather than relying solely on parametric memory. Building on the embedding and vector database foundations from Chapter 19, RAG closes the gap between static model knowledge and dynamic, real-world information.
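To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It assumes a tiny in-memory document list and uses bag-of-words cosine similarity as a stand-in for the embedding search from Chapter 19. The documents, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would hold chunks
# produced by an ingestion pipeline and stored in a vector database.
DOCS = [
    "The warranty period for the X200 laptop is 24 months.",
    "Returns are accepted within 30 days of delivery.",
    "The X200 laptop ships with 16 GB of memory.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a prompt that asks the model to answer from retrieved evidence."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "How long is the warranty on the X200 laptop?"
passages = retrieve(query, DOCS)
prompt = build_prompt(query, passages)
print(prompt)
```

The resulting `prompt` string would then be sent to an LLM of your choice. The point of the sketch is the shape of the pipeline: the knowledge lives outside the model and is selected per query at inference time.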

This chapter covers the complete RAG landscape, from fundamental architectures through advanced retrieval techniques. You will learn how to build ingestion pipelines, implement query transformations, combine dense and sparse retrieval, and leverage knowledge graphs for structured reasoning. The chapter also explores agentic RAG systems that can decompose complex queries, perform iterative research, and synthesize information from multiple sources.
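As a small preview of combining dense and sparse retrieval, the sketch below merges two ranked result lists with reciprocal rank fusion (RRF), one common fusion method. It is an illustration under assumptions, not necessarily the approach the later sections use: the document IDs and both input rankings are invented, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears,
    and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results: one list from a dense (embedding) retriever,
# one from a sparse (keyword) retriever.
dense = ["d3", "d1", "d2"]
sparse = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

Documents that rank well in both lists (`d1`, `d3`) rise to the top, while documents found by only one retriever still appear lower in the fused list. RRF uses only ranks, so the two retrievers' raw scores never need to be put on a common scale.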

On the structured data side, you will learn how LLMs can query databases through text-to-SQL, process tabular data, and combine structured and unstructured retrieval. Finally, the chapter surveys the major RAG frameworks (LangChain, LlamaIndex, Haystack) that provide production-ready tooling for building retrieval-augmented applications.

Big Picture

Retrieval-augmented generation is one of the most widely deployed LLM patterns in production. By combining retrieval with generation, you can reduce hallucinations, keep responses current, and ground outputs in authoritative sources. This chapter is central to building the knowledge-intensive applications covered in Part VI and Part VIII.

Learning Objectives

Design and implement end-to-end RAG pipelines including document ingestion, chunking, embedding, and retrieval
Apply advanced retrieval techniques such as HyDE, multi-query expansion, cross-encoder re-ranking, and fusion retrieval (building on prompt engineering principles)
Construct and query knowledge graphs for structured reasoning, including GraphRAG with community detection
Build agentic RAG systems capable of query decomposition, iterative research, and multi-source synthesis
Implement text-to-SQL pipelines for structured data retrieval with schema linking and error correction
Evaluate RAG system quality using faithfulness, relevance, and answer correctness metrics
Compare and use RAG orchestration frameworks (LangChain, LlamaIndex, Haystack) for production applications
Diagnose and fix common RAG failure modes including lost-in-the-middle effects, retrieval drift, and context window overflow

Prerequisites

Chapter 19: Embeddings & Vector Databases (embedding models, similarity search, vector stores)
Chapter 10: LLM APIs (calling OpenAI, Anthropic, and other providers programmatically)
Chapter 11: Prompt Engineering (system prompts, few-shot examples, structured outputs)
Familiarity with Python, including working with APIs and JSON data
Basic understanding of SQL and relational databases (for Section 20.5)

What's Next?

In the next chapter, Chapter 21: Conversational AI, we explore dialogue management, memory, and the patterns that make conversational AI systems effective.