
Part Overview
Part II takes you inside the black box. You will learn how LLMs are pretrained on massive corpora, what scaling laws predict about model performance, and how modern frontier and open-weight families differ in their design choices. The part also covers reasoning models that spend extra inference-time compute to improve answer quality, inference optimization techniques that make models practical in production, and the mechanistic interpretability work that has begun to reveal what a trained transformer actually computes. It closes with a Tools of the Trade chapter on the 2026 model zoo and the tokenizer/interpretability stack.
Chapters: 5 (Chapters 6 through 10). Part II builds directly on the Transformer foundations from Part I and prepares you for hands-on LLM work in Part III.
Before you can effectively use or customize an LLM, you need to understand how it was built and how it runs. Part II reveals the training recipes, architectural trade-offs, reasoning-model designs, serving strategies, and interpretability methods that determine what a model can do and how much it costs to run.
- 6.1 BERT, GPT, T5: Three Bets That Shaped Today's LLMs
- 6.2 Pretraining Objectives & Paradigms
- 6.3 Scaling Laws & Compute-Optimal Training
- 6.4 Data Curation at Scale
- 6.5 Optimizers & Training Dynamics
- 6.6 Distributed Training at Scale
- 6.6a Mixed Precision, Checkpointing, 3D Parallelism & Ring Attention
- 6.7 In-Context Learning Theory
- 6.8 Production LLM Training Systems: Megatron, Elastic Training, and Fault Tolerance
- 6.9 Lab: Pretrain a Tiny Language Model
- 8.1 Trading FLOPs for IQ: The Test-Time Compute Bet
- 8.1a KV Cache Growth, PRMs vs ORMs & Exercises
- 8.2 Reasoning Model Architectures: o1, o3, R1, QwQ
- 8.3 Training Reasoning Models: RLVR, GRPO, PRM
- 8.4 Prompting and Using Reasoning Models
- 8.5 Compute-Optimal Inference and Evaluation
- 8.6 Formal and Verifiable Reasoning with Proof Assistants
- 8.6a AlphaProof, Self-Play RL, and Evaluation for Formal Proving
- 10.1 Attention Analysis & Probing
- 10.2 Mechanistic Interpretability
- 10.3 Practical Interpretability for Applications
- 10.3a Model Editing, Concept Erasure & Debugging
- 10.4 Explaining Transformers
- 10.4b Interpretability Tooling, Evaluation, and LLM-Assisted Explanation
- 10.5 Platforms
- 10.6 Libraries & Frameworks
- 10.7 Datasets & Benchmarks
- 10.8 Models
- 10.9 External Reading & Communities
What's Next?
This part begins with Chapter 6: Pretraining, Scaling Laws & Data Curation. Each chapter builds on the previous one, so we recommend reading Part II in order.