Section 36.5

External Reading and Communities

"BM25 is thirty years old, HNSW is from 2018, ColBERT is from 2020; somebody on Discord is shipping all three by Tuesday. Read both literatures or be surprised by both."

Big Picture

Retrieval has two literatures that mostly do not talk to each other: the classical IR tradition (SIGIR, ECIR, Manning et al.'s textbook, TREC tracks, BM25, ColBERT, learning-to-rank) and the LLM-RAG tradition (NeurIPS, ICLR, ACL, EMNLP, arXiv, Anthropic and OpenAI engineering blogs, LangChain and LlamaIndex communities). Both are essential for any serious retrieval engineer building LLM agent systems. The 2026 best practice is to read the IR fundamentals for the algorithms that still drive modern systems (BM25 in nearly every hybrid stack, HNSW under most vector DBs, the metrics every leaderboard reports) and the LLM-RAG sources for what changed in the last 18 months (contextual retrieval, agentic search, ColPali for document images, FRAMES-style multi-hop benchmarks). This section maps the venues and prioritizes the ones with the best signal-to-noise ratio for the practitioner shipping retrieval into LLM agents.

Prerequisites

This is an end-of-chapter reading list and assumes familiarity with the retrieval modules in Part VII. No new technical prerequisites.

The LLM-era retrieval literature moves slower than the agent literature but faster than classical IR ever did. The cadence to expect: a foundational algorithm paper every few years (HNSW 2018, DPR 2020, ColBERT 2020, RAG 2020, BEIR 2021, SPLADE 2021, BGE 2023, BGE-M3 2024); incremental embedder releases monthly; practitioner blog posts weekly; and active Discord and Reddit threads daily. Allocate reading time across the whole hierarchy rather than only the top or only the bottom.

Looking Back: What Sections 36.1-36.4 covered

Section 36.1 mapped the four-bucket vector-platform landscape (serverless, hosted-search-with-vector, self-hosted, SQL-extension) and anchored the recall-latency-memory trade-off with public ANN-Benchmarks numbers: HNSW at ~0.95 recall@10 and ~5K QPS, IVF-Flat at ~0.85 and ~12K QPS, and IVF-PQ at ~0.80 and ~30K QPS with roughly $8\times$ memory savings. Complexity bounds: HNSW $O(M \log N)$ insert and $O(\log N)$ query; IVF-PQ $O(N/n_{\text{list}} \cdot D' + k \cdot D)$. Section 36.2 surveyed the library stack (embedders, rerankers, orchestrators, parsers, hybrid helpers, eval), crystallized the "thinnest viable" four-library recipe (sentence-transformers + Qdrant OSS + RAGAS + Phoenix), and gave the reciprocal rank fusion (RRF) formula $\text{score}(d) = \sum_q 1/(60 + \text{rank}_q(d))$, where the sum runs over the ranked lists being fused (typically one lexical and one dense). Section 36.3 laid out the four-tier benchmark hierarchy (TREC-lineage, BEIR, MTEB, RAG-specific), the canonical metrics (NDCG, MRR, MAP, Recall@k), and BM25 as the standard lexical baseline. Section 36.4 catalogued the 2026 embedder field (closed-API, open-weight, late-interaction, multimodal), the Matryoshka loss $\mathcal{L}_{\text{MRL}} = \sum_k c_k \mathcal{L}(z_{1:k}, y)$, ColBERT's MaxSim score $s(q,d) = \sum_i \max_j q_i \cdot d_j$, and the InfoNCE objective $\mathcal{L} = -\log \frac{\exp(s(q,d^+)/\tau)}{\sum_i \exp(s(q,d_i)/\tau)}$, whose denominator sums over the positive and all negatives, that trains most modern bi-encoders. The four-section arc moves from infrastructure to libraries to evaluation to models: the same order practitioners encounter when standing up a retrieval system.
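Three of the scoring rules recapped above (RRF, MaxSim, InfoNCE) are compact enough to sketch directly. The following is a minimal pure-Python illustration under stated assumptions: RRF takes ranked lists of document IDs with 1-based ranks and $k = 60$; MaxSim takes per-token vectors as plain lists; InfoNCE takes precomputed similarity scores for one query. The function names and the default temperature `tau=0.05` are illustrative choices, not the API of any library from Section 36.2.

```python
import math
from collections import defaultdict


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), ranks 1-based."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: each query token keeps its best dot product
    over all document tokens, and the per-token maxima are summed."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


def info_nce(pos_score, all_scores, tau=0.05):
    """InfoNCE loss for one query. all_scores holds s(q, d_i) for the positive
    and every negative; tau=0.05 is an illustrative temperature, not a standard."""
    logits = [s / tau for s in all_scores]
    m = max(logits)  # log-sum-exp trick: subtract the max before exponentiating
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(pos_score / tau - log_denominator)
```

Fusing a lexical list `["d3", "d1", "d2"]` with a dense list `["d1", "d4", "d3"]` puts `d1` first because it ranks high in both; RRF looks only at ranks, which is why it needs no score normalization between a BM25 retriever and a cosine-similarity retriever.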

36.5.1 Foundational textbooks

36.5.2 Essential papers and essays

The papers that anyone working on retrieval in 2026 should have read. Sorted by canonicity rather than recency.

36.5.3 Active blogs and newsletters

36.5.4 Academic conferences and venues

36.5.5 Leaderboards and live rankings

The live numbers worth watching. Re-checking these monthly is the cheapest way to track field movement.

36.5.6 Communities: Discord, Slack, Reddit, forums

The retrieval communities are smaller and more technical than the general LLM communities. The active venues:

36.5.7 Comparing the venues

Table 36.5.1: Where to go for what (retrieval, 2026).
Venue | Use case | Latency
arXiv cs.IR / cs.CL | Primary research | Days
SIGIR / ECIR proceedings | Peer-reviewed IR research | Yearly
EMNLP / ACL / NeurIPS / ICLR | LLM-RAG research | Yearly
Eugene Yan, Anthropic Engineering | Production-quality essays | Monthly
Pinecone / Vespa / Qdrant blogs | Vendor-flavored deep dives | Weekly
LangChain / LlamaIndex blogs | Framework updates | Weekly
Latent Space | Practitioner interviews | Weekly
Simon Willison | Daily commentary | Daily
r/Rag, r/LocalLLaMA | Real-world failure modes | Hours
Discords (LangChain, LlamaIndex, Pinecone, etc.) | Tooling Q&A, debugging | Minutes
MTEB / MMTEB leaderboards | Current best embedder | Continuous

36.5.8 Courses and tutorials

Library Shortcut
pagefind for static-site search without a backend

If the takeaway from this chapter is "add search to my docs site", you do not need a vector DB at all. pagefind (CloudCannon, 2023+) indexes a built static site (Hugo, Jekyll, Astro, Docusaurus, MkDocs, or hand-written HTML) into a sharded index that the browser fetches lazily: a small WebAssembly module loads only the shards a query needs, so search runs client-side with no server, no API key, and no monthly cost. It is the search engine this very book uses (the search box at the top of every section). Run it after your static build; deployment is a folder copy.

# Pagefind is a Rust binary: `npx pagefind` downloads and runs it (needs Node/npm),
# or install a standalone binary from the Pagefind releases instead.
# 1. Build your static site into ./public (any SSG works).
# 2. Index it; the index is written to ./public/pagefind by default:
#    npx pagefind --site public
# 3. Drop the UI snippet into your page template:

# <link rel="stylesheet" href="/pagefind/pagefind-ui.css">
# <script src="/pagefind/pagefind-ui.js"></script>
# <div id="search"></div>
# <script>new PagefindUI({ element: "#search" });</script>
Code Fragment 36.5.1a: Add data-pagefind-filter="chapter" attributes to elements you want to surface as filterable facets (data-pagefind-meta is for result metadata such as titles); the index grows by roughly 5-10% of the indexed HTML size.

36.5.9 Staying current: a weekly cadence

A defensible weekly reading plan for a retrieval practitioner:

This cadence keeps you within two weeks of the field's frontier without burning weekends. The discipline matters more than the volume: a retrieval engineer who reads 30 minutes a week consistently is better calibrated than one who binges papers irregularly.

Key Insight
The two literatures complement rather than replace each other

The classical IR literature (Manning et al., SIGIR, BM25, learning-to-rank, ColBERT) and the LLM-RAG literature (Anthropic, OpenAI, LangChain, LlamaIndex, NeurIPS, ACL) are written by mostly disjoint communities in mostly disjoint vocabularies. A practitioner who reads only one half misjudges what is new: the LLM-RAG-only reader rediscovers TF-IDF decades late and ColBERT years late; the IR-only reader misses contextual retrieval, agentic search, and the practical lessons of the 2023-2025 RAG era. The right calibration is to read both, treat each as authoritative on its own questions, and route a given problem to the literature that owns it. Algorithm? IR side. Prompt design and pipeline composition? LLM-RAG side. Evaluation? Both, with the IR side owning the metrics and the LLM-RAG side owning the reference-free synthetic-eval techniques.
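To make "the IR side owning the metrics" concrete, here is a minimal pure-Python sketch of three of those metrics for a single query, assuming results arrive as a ranked list of document IDs and judgments as a set (binary relevance) or a dict of graded gains. The function names are illustrative. It uses the linear-gain form of DCG, $\text{rel}/\log_2(\text{rank}+1)$; some toolkits use $2^{\text{rel}} - 1$ instead, so match the convention of whatever leaderboard you compare against.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result (0 if none); MRR is its mean over queries."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """NDCG@k with graded gains (doc ID -> relevance grade), linear-gain variant."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
    )
    # Ideal DCG: the same discount applied to the best possible ordering of gains.
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For `ranked = ["d2", "d7", "d1", "d5"]` and relevant set `{"d1", "d5", "d9"}`, Recall@3 and the reciprocal rank are both 1/3: only `d1` makes the top three, and it sits at rank 3. NDCG additionally rewards placing higher-grade documents earlier, which is why leaderboards such as BEIR and MTEB report it as their headline retrieval number.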

36.5.10 Podcasts, YouTube channels, and video content

36.5.11 Meetups, summits, and workshops

Beyond the academic conferences, the practitioner gatherings worth attending:

36.5.12 The canonical resources by question

A practitioner's quick-reference of "where do I go when I have question X":

What's Next: Chapter 37 — Conversational AI Foundations

Continue to Section 37.1: Dialogue System Architecture.

Chapter 36 closes Part VII (Retrieval and Information Extraction). Part VIII turns from "find the right context" to "use the context in a conversation", and Chapter 37 opens that arc with the foundations of dialogue systems: turn-taking, dialog state, mixed-initiative interaction, and the architectural split between the LLM that generates utterances and the policy layer that decides what to do next. The retrieval stack we just built becomes one tool in a larger conversational agent: the embedder we picked in Section 36.4 will encode user turns for memory retrieval; the hybrid BM25-plus-dense recipe from Section 36.2 will index the assistant's tool-output history; the RAGAS faithfulness metric from Section 36.3 will measure whether a multi-turn answer remains grounded across follow-ups. Section 37.1 starts with the canonical dialogue-system architecture (ASR → NLU → DM → NLG → TTS in the classical era, collapsed to a single LLM-plus-policy in 2026) and traces the lineage from POMDP-based dialogue managers to the structured-output command generation that drives modern assistants like Claude, ChatGPT, and Gemini Live.

Lexica, Cross-Citation-Tracking AI Agent
A whimsical top-down aerial view of two cartoon villages joined by a small bridge: the left village has stone walls, a library, and a banner reading Classical IR with TREC and SIGIR signs; the right village has glass towers and a banner reading LLM-RAG with Anthropic and LangChain signs; tiny travelers cross the bridge in both directions.
Figure 36.5.1b: The two villages of retrieval, classical IR (TREC, SIGIR) and LLM-RAG (Anthropic, LangChain), are connected by a well-traveled bridge. The reading list in this section spans both: the strongest practitioners borrow from each community rather than camping in one.
Further Reading
Manning, C. D., Raghavan, P., & Schütze, H. (2008). "Introduction to Information Retrieval." Cambridge University Press. nlp.stanford.edu/IR-book. The canonical IR textbook. Free online and still the right starting point for anyone new to the field; cited throughout the retrieval literature since 2008.
Lin, J., Nogueira, R., & Yates, A. (2021). "Pretrained Transformers for Text Ranking: BERT and Beyond." Morgan & Claypool. arxiv.org/abs/2010.06467. The bridge book from classical IR to the BERT-era transformer-retrieval literature. The right reading order is Manning et al. (2008) first, then this.
Anthropic (2024). "Introducing Contextual Retrieval." Anthropic News, September 2024. anthropic.com/news/contextual-retrieval. The opinionated essay that introduced contextual retrieval: an LLM writes a short context for each chunk, which is prepended before both embedding (Contextual Embeddings) and BM25 indexing (Contextual BM25). The canonical practitioner reference for production RAG quality improvements in 2024-2026.
Yan, E. (2023-2024). "RAG / LLM Patterns" and "Building Reliable LLM Applications." eugeneyan.com. eugeneyan.com/writing/llm-patterns. The most comprehensive single-author treatment of production RAG patterns in the practitioner literature; required reading for retrieval engineers shipping to real users.
Hugging Face (2024-2026). "MTEB Leaderboard." Hugging Face Spaces. huggingface.co/spaces/mteb/leaderboard. The live embedder ranking. The canonical place to check the current state of the embedder field; cross-reference against in-domain evaluation before adopting any new model.
OpenAI (2024). "OpenAI Cookbook: Retrieval and Embeddings." cookbook.openai.com. Vendor-flavored but technically substantial notebooks covering chunking, hybrid retrieval, and evaluation in production-style code.

Foundational Information Retrieval

Robertson, S. E., & Spärck Jones, K. (1976). "Relevance Weighting of Search Terms" (origin of the BM25 family). JASIS. onlinelibrary.wiley.com/doi/10.1002/asi.4630270302
Robertson, S. E., & Zaragoza, H. (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in IR. staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf
Manning, C. D., Raghavan, P., & Schütze, H. (2008). "Introduction to Information Retrieval." Cambridge University Press. nlp.stanford.edu/IR-book
Salton, G., & McGill, M. J. (1983). "Introduction to Modern Information Retrieval." McGraw-Hill. The canonical reference for tf-idf weighting and the vector-space model.

Dense Retrieval and Embeddings

Karpukhin, V., Oğuz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering" (DPR). EMNLP 2020. arXiv:2004.04906
Khattab, O., & Zaharia, M. (2020). "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT." SIGIR 2020. arXiv:2004.12832
Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). "BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation." arXiv:2402.03216
Reimers, N., & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019. arXiv:1908.10084

Retrieval Libraries and Vector Indexes

Johnson, J., Douze, M., & Jégou, H. (2017). "Billion-scale similarity search with GPUs" (FAISS). arXiv:1702.08734
Malkov, Y. A., & Yashunin, D. A. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" (HNSW). IEEE TPAMI. arXiv:1603.09320
Pinecone (2023-2026). "Vector database fundamentals" and Pinecone documentation. pinecone.io/learn

Retrieval Benchmarks and Retrieval-Augmented Generation

Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., & Gurevych, I. (2021). "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models." NeurIPS 2021. arXiv:2104.08663
Muennighoff, N., Tazi, N., Magne, L., & Reimers, N. (2023). "MTEB: Massive Text Embedding Benchmark." EACL 2023. arXiv:2210.07316
Craswell, N., Mitra, B., Yilmaz, E., et al. (2020-2023). "Overview of the TREC Deep Learning Tracks." NIST. trec.nist.gov/pubs/trec29/papers/OVERVIEW.DL.pdf
Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (RAG). NeurIPS 2020. arXiv:2005.11401