External Reading & Communities

Section 5.6

A textbook gets you through fundamentals; communities and ongoing reading keep you current. Part I covers the most stable layer of the LLM stack: the calculus, the optimizers, and the attention mechanism are unlikely to be rewritten in the next decade. But the surrounding ecosystem (libraries, datasets, hardware) shifts every few months, and the best way to stay aligned with it is to read where practitioners actually publish. Skip "AI news" sites; they lag the primary sources by days and add confusion. Subscribe to the originals.

Figure 5.6.1: A three-tier reading diet. Stable foundations (Goodfellow, Karpathy, CS336) you read once and return to; monthly currents (Raschka, Lilian Weng) keep you aligned with what shipped this quarter; community forums catch what nobody bothered to write up yet. The four-paper canon at the bottom (Transformer → BERT → GPT-2 → InstructGPT) is the minimum literature for reading any 2026 paper.

5.6.1 Books and foundational courses

5.6.2 Blogs and newsletters worth subscribing to

5.6.3 Communities

Tip: read papers in pairs

Whenever a frontier paper drops, pair it with one of the survey blogs above to get the "is this actually new?" calibration. The pattern that almost always works: read the abstract, read the paper's figures, read Sebastian Raschka's or Lilian Weng's take, then decide whether to spend an hour on the full text. Most papers do not survive that filter, and the ones that do are the ones you remember.

Key Insight: Reading discipline beats reading volume

The single highest-leverage habit a Part I reader can develop is "one paper a week, fully understood". Skim ten papers a week and you remember nothing; read one a week and rebuild the math on paper, and by month three you can hold a conversation with the author.

If you finish Part I and want to go deeper before continuing the book: read the original Transformer paper ("Attention Is All You Need", Vaswani et al. 2017), then BERT (Devlin et al. 2018), then GPT-2 (Radford et al. 2019), then the InstructGPT paper (Ouyang et al. 2022). Those four papers cover, in order, the architecture, the pretraining recipe, scaling, and the alignment step that produced the modern API stack you will start using in Part III.

What's Next?

This chapter completes the current part. The next part, Part II: Understanding LLMs, opens a new arc; see the part index for chapter ordering.

Further Reading

Communities and Courses

Karpathy, A. (2024). "Neural Networks: Zero to Hero." karpathy.ai/zero-to-hero. The most-recommended LLM-from-scratch video course.
Hugging Face (2024). "NLP Course." huggingface.co/learn/nlp-course. Reference open course on transformer-based NLP.
Stanford (2024). "CS224N: Natural Language Processing with Deep Learning." web.stanford.edu/class/cs224n. The standard academic NLP course.

Reference Blogs

Alammar, J. (2018+). "The Illustrated Transformer." jalammar.github.io/illustrated-transformer. The most-cited visual explainer for transformer mechanics.