
"Choose the retrieval stack you can maintain at 2 a.m., not the one that wins a benchmark."
Pip, Retrieval-Stack-Building AI Agent
Chapters 31 through 35 walked through retrieval theory. This chapter is the tooling: LlamaIndex, LangChain retrievers, Qdrant, Weaviate, Chroma, pgvector, Cohere Rerank, and the small operational decisions that hold a RAG pipeline together.
The retrieval and information-extraction (IE) ecosystem has its own ladder of tools: vector databases (Pinecone, Weaviate, Qdrant, Milvus), embedding-model libraries, RAG frameworks, knowledge-graph tooling, and evaluation suites for retrieval quality. This chapter is the practical reference.
Chapter Overview
Part VII's toolchain is the substrate that every retrieval workflow assumes. This chapter consolidates it: vector databases and hybrid-search platforms (Pinecone, Weaviate, Qdrant, Milvus, pgvector, Elasticsearch, Vespa) with a decision tree; the libraries (embedding clients, rerankers, orchestrators such as LangChain, LlamaIndex, Haystack, and DSPy, and document parsers); the benchmarks (MS MARCO, BEIR, MTEB, MIRACL, HotpotQA, FRAMES); the open and closed embedders plus rerankers (text-embedding-3, Cohere Embed-4, Voyage, BGE-M3, NV-Embed, Stella, ColPali); and the textbooks, conferences (SIGIR, ECIR, ACL, EMNLP), and communities that keep retrieval engineers current.
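Several of the platforms listed here offer hybrid search, which has to merge a lexical ranking (for example BM25) with a dense-vector ranking. One widely used way to do that merge is Reciprocal Rank Fusion (RRF). The sketch below is a minimal pure-Python illustration, not any platform's built-in implementation: it assumes each retriever has already returned an ordered list of document IDs, the IDs are made up, and k = 60 is the smoothing constant proposed in the original RRF paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Documents missing from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["d3", "d1", "d7"]    # hypothetical lexical (BM25) results
    vector_hits = ["d1", "d9", "d3"]  # hypothetical dense-vector results
    for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
        print(doc_id, round(score, 5))
```

Because RRF looks only at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales; the trade-off is that it discards how confident each retriever was.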
Retrieval tooling stabilized in 2024 and 2025. This chapter is the index of what stuck: the database, library, benchmark, and model choices that survive contact with production.
- Choose a vector database (Pinecone, Weaviate, Qdrant, Milvus, pgvector) for a given scale, latency, and cost target.
- Wire embedding clients, rerankers, and orchestrators (LangChain, LlamaIndex, Haystack, DSPy) into a production stack.
- Evaluate retrieval quality on MTEB, BEIR, MIRACL, FRAMES, or the live leaderboards.
- Compare closed-API embedders (text-embedding-3, Cohere Embed-4, Voyage) with open-weight options (BGE-M3, NV-Embed, Stella, ColPali).
- Identify the textbooks, conferences, and communities that maintain the retrieval canon.
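Evaluating on BEIR or MTEB retrieval tasks ultimately comes down to ranking metrics computed against relevance judgments (qrels); BEIR reports nDCG@10 as its headline number. The sketch below is a minimal pure-Python version of recall@k and nDCG@k, assuming linear gains and a log2(rank + 1) discount, a common convention; the document IDs and judgments are made up, and real benchmark runs would use the benchmark's own evaluation tooling rather than this code.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, gains, k=10):
    """nDCG@k with graded relevance `gains` (doc_id -> gain) and a log2 discount."""
    # Discounted cumulative gain of the system's actual ranking.
    dcg = sum(gains.get(doc_id, 0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1))
    # Ideal DCG: the best possible ordering of the judged documents.
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["d2", "d5", "d1", "d9"]    # hypothetical system output
    qrels = {"d1": 1, "d2": 1, "d4": 1}  # hypothetical relevance judgments
    print(recall_at_k(ranked, set(qrels), k=3))
    print(ndcg_at_k(ranked, qrels, k=10))
```

Recall@k says whether the right passages made it into the candidate set at all; nDCG@k additionally rewards putting them near the top, which is what a reranker is meant to improve.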
Sections in This Chapter
- 36.1 Platforms: Vector databases and hybrid search platforms (Pinecone, Weaviate, Qdrant, Milvus, pgvector, Elasticsearch, Vespa) with selection criteria and a decision tree.
- 36.2 Libraries and Frameworks: Embedding clients, rerankers, orchestrators (LangChain, LlamaIndex, Haystack, DSPy), document parsers, and hybrid-retrieval helpers for production stacks.
- 36.3 Datasets and Benchmarks: MS MARCO, BEIR, MTEB, MIRACL, HotpotQA, FRAMES, and the RAG-specific benchmarks, plus the live leaderboards retrieval engineers should track.
- 36.4 Models: Closed-API and open-weight embedders (OpenAI text-embedding-3, Cohere Embed-4, Voyage, BGE-M3, NV-Embed, Stella, ColPali) plus rerankers, with dimensions, context length, and license.
- 36.5 External Reading and Communities: Textbooks (Manning et al., Lin et al.), essential papers, blogs, conferences (SIGIR, ECIR, ACL, EMNLP), Discord and Reddit communities, and a weekly reading cadence.
Prerequisites
- Vector-DB and embedding basics from Chapter 31
- RAG fundamentals from Chapter 32
- Python and Docker comfort for hands-on tool comparisons
What's Next?
Next: Chapter 37: Building Conversational AI Systems, opening Part VIII. Retrieval is single-turn: I ask, the index returns. Conversation is many-turn: state, memory, identity, repair when something goes wrong. Part VIII covers the dialogue stack from prompt-to-response loops up through voice and realtime multimodal assistants. The shift is from one-shot Q&A to coherent multi-turn experiences with a face and a name.