A textbook gets you through fundamentals; communities and ongoing reading keep you current. Part I covers the most stable layer of the LLM stack: the calculus, the optimizers, and the attention mechanism are unlikely to be rewritten in the next decade. But the surrounding ecosystem (libraries, datasets, hardware) shifts every few months, and the best way to stay aligned with it is to read where practitioners actually publish. Skip "AI news" sites; they lag the primary sources by days and add confusion. Subscribe to the originals.
5.6.1 Books and foundational courses
- Deep Learning (Goodfellow, Bengio, Courville, 2016): the canonical math foundation. Dated on architectures but unbeaten on the basics.
- Dive into Deep Learning (Zhang et al.): free and code-first, with implementations in PyTorch, MXNet, JAX, and TensorFlow. The most up-to-date free deep-learning textbook in 2026.
- fast.ai's Practical Deep Learning: the strongest "learn by training" course, and one of the few fundamentals courses that consistently produce practitioners.
- Andrej Karpathy: Neural Networks: Zero to Hero: build a transformer from scratch in Python. The clearest from-scratch implementation of a small GPT publicly available.
- Stanford CS224N (NLP with Deep Learning): lecture notes and assignments freely available; the academic counterpart to fast.ai's practical course.
- Stanford CS336: Language Modeling from Scratch (Tatsunori Hashimoto and Percy Liang, 2024+): the newer Stanford course that takes you from pretraining to alignment, with all materials online.
- Maxime Labonne's LLM Course: one of the most-starred open community curricula in 2025-26; complements the Karpathy series with a fine-tuning focus.
- Hugging Face NLP Course and Smol-Course: free 2025 curricula targeting this Part I / II audience directly.
- 3Blue1Brown's transformer and attention videos (2024), part of its deep learning series: among the most-watched visual explainers of attention and embeddings.
5.6.2 Blogs and newsletters worth subscribing to
- Ahead of AI (Sebastian Raschka): the single best monthly newsletter for "what actually shipped in 2026, with code and benchmarks".
- Lilian Weng: deep technical surveys of attention, agents, prompt engineering, alignment. Best for "I want to understand a topic in 30 minutes".
- Cameron R. Wolfe: long-form essays on scaling laws, alignment, and frontier architectures. Cited throughout Parts XI and XII of this book.
- Distill: the original "interactive ML explainer" journal; on hiatus since 2021, but every article remains worth reading. Its spiritual successor is Anthropic's Transformer Circuits Thread, which inherited the interactive-essay tradition for mechanistic interpretability work.
- Horace He: Making Deep Learning Go Brrrr From First Principles: the clearest explainer of why GPU code is slow, written by a PyTorch core developer.
5.6.3 Communities
- PyTorch Forums: the canonical place to get help with PyTorch. Search before you post.
- Hugging Face Forums: model cards, dataset issues, fine-tuning recipes.
- r/MachineLearning: still useful for paper discussion threads, especially the [D] flair.
- r/LocalLLaMA: the hub for open-weight model fine-tuning, GPU benchmarking, and tooling around llama.cpp / Ollama / vLLM.
- Twitter/X: still where commercial-lab announcements break first. Cultivate a tight list (under 50 accounts) of paper authors and lab directors; ignore the rest.
- Bluesky: by 2025-26 the larger venue for academic ML researchers and university-affiliated accounts; the right place to follow CS / linguistics academia. Many researchers cross-post to both platforms, but Bluesky often gets the first technical thread.
Whenever a frontier paper drops, pair it with one of the survey blogs above to get the "is this actually new?" calibration. The pattern that almost always works: read the abstract, read the paper's figures, read Sebastian Raschka's or Lilian Weng's take, then decide whether to spend an hour on the full text. Most papers do not survive that filter, and the ones that do are the ones you remember.
The single highest-leverage habit a Part I reader can develop is "one paper a week, fully understood". Skim ten papers a week and you remember nothing; read one a week and rebuild the math on paper, and by month three you can hold a conversation with the author.
5.6.4 What to read next, in order
If you finish Part I and want to go deeper before continuing the book, read the original Transformer paper ("Attention Is All You Need", Vaswani et al. 2017), then BERT (Devlin et al. 2018), then GPT-2 (Radford et al. 2019), then the InstructGPT paper (Ouyang et al. 2022). Those four papers cover, in order, the architecture, the pretraining recipe, the scaling, and the alignment step that produced the modern API stack you will start using in Part III.
What's Next?
This chapter completes the current part. The next part, Part II: Understanding LLMs, opens a new arc; see the part index for chapter ordering.