Chapter 36: Retrieval Tools of the Trade

Chapter opener illustration: Retrieval Tools of the Trade.

"Choose the retrieval stack you can maintain at 2 a.m., not the one that wins a benchmark."
Pip, Retrieval-Stack-Building AI Agent

Looking Back

Chapters 31 through 35 walked through retrieval theory. This chapter covers the tooling: LlamaIndex, LangChain retrievers, Qdrant, Weaviate, Chroma, pgvector, Cohere Rerank, and the small operational decisions that hold a RAG pipeline together.

Big Picture

The retrieval and information extraction (IE) ecosystem has its own ladder of tools: vector databases (Pinecone, Weaviate, Qdrant, Milvus), embedding model libraries, RAG frameworks, knowledge-graph tooling, and evaluation suites for retrieval quality. This chapter is the practical reference for them.

Chapter Overview

Part VII's toolchain is the substrate that every retrieval workflow assumes. This chapter consolidates it in five groups. First come the vector databases and hybrid-search platforms (Pinecone, Weaviate, Qdrant, Milvus, pgvector, Elasticsearch, Vespa), along with a decision tree for choosing among them. Next are the libraries: embedding clients, rerankers, document parsers, and orchestrators such as LangChain, LlamaIndex, Haystack, and DSPy. Then come the benchmarks (MS MARCO, BEIR, MTEB, MIRACL, HotpotQA, FRAMES), followed by the open and closed embedders and rerankers (text-embedding-3, Cohere Embed-4, Voyage, BGE-M3, NV-Embed, Stella, ColPali). The chapter closes with the textbooks, conferences (SIGIR, ECIR, ACL, EMNLP), and communities that keep retrieval engineers current.

Retrieval tooling stabilized in 2024 and 2025. This chapter is the index of what stuck: the database, library, benchmark, and model choices that survived contact with production.

Note: Learning Objectives

Choose a vector database (Pinecone, Weaviate, Qdrant, Milvus, pgvector) for a given scale, latency, and cost target.
Wire embedding clients, rerankers, and orchestrators (LangChain, LlamaIndex, Haystack, DSPy) into a production stack.
Evaluate retrieval quality on MTEB, BEIR, MIRACL, FRAMES, or the live leaderboards.
Compare closed-API embedders (text-embedding-3, Cohere Embed-4, Voyage) with open-weight options (BGE-M3, NV-Embed, Stella, ColPali).
Identify the textbooks, conferences, and communities that maintain the retrieval canon.
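
To make the evaluation objective above concrete, here is a minimal sketch of two metrics that retrieval benchmarks commonly report: recall@k and nDCG@k. It is an illustration under stated assumptions, not the scoring code of any particular benchmark: the function names, the toy document ids, and the relevance labels are all hypothetical, and the nDCG variant shown uses linear gain (some tools use an exponential gain instead).

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain.

    `relevance` maps doc id -> graded relevance; ids not in the map count as 0.
    """
    # Discounted cumulative gain of the system ranking (rank is 0-based here).
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal DCG: the best achievable ordering of the labelled documents.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: one query, four retrieved ids, two relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 1, "d2": 1}

print(f"recall@3 = {recall_at_k(ranked, qrels, k=3):.3f}")
print(f"nDCG@3   = {ndcg_at_k(ranked, qrels, k=3):.3f}")
```

On real data these numbers are averaged over all queries in the benchmark. In practice an established evaluation library is usually the safer choice than hand-rolled metrics, because tie handling and gain conventions differ between tools.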

Sections in This Chapter

Prerequisites

Vector-DB and embedding basics from Chapter 31
RAG fundamentals from Chapter 32
Python and Docker comfort for hands-on tool comparisons

What's Next?

Next is Chapter 37, Building Conversational AI Systems, which opens Part VIII. Retrieval is single-turn: I ask, the index returns. Conversation is many-turn: state, memory, identity, and repair when something goes wrong. Part VIII covers the dialogue stack, from prompt-to-response loops up through voice and real-time multimodal assistants. The shift is from one-shot Q&A to coherent multi-turn experiences with a face and a name.