Part Overview
Part II takes you inside the black box. You will learn how LLMs are pretrained on massive corpora, what scaling laws predict about model performance, and how modern architectures (GPT, LLaMA, Mistral, Gemma) differ in their design choices. The part concludes with inference optimization: quantization, KV-cache management, batching strategies, and efficient serving frameworks that make LLMs practical in production.
Chapters: five (Chapters 6 through 9, plus Chapter 18: Interpretability). The part builds directly on the Transformer foundations from Part I and prepares you for hands-on LLM work in Part III.
Before you can effectively use or customize an LLM, you need to understand how it was built. Part II reveals the training recipes, architectural trade-offs, and serving strategies that determine what a model can do and how much it costs to run.
Chapter 6 covers how LLMs learn from raw text: pretraining objectives, dataset construction, scaling laws (Chinchilla, Kaplan), data mixing strategies, deduplication, and the economics of large-scale training.
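As a taste of the scaling-law arithmetic, here is a minimal sketch of the Chinchilla rule of thumb. It assumes the common C ≈ 6ND cost model and the roughly 20-tokens-per-parameter ratio from Hoffmann et al.; the exact fitted exponents vary by dataset and are discussed in the chapter.

```python
def chinchilla_optimal(flops_budget: float) -> tuple[float, float]:
    """Compute-optimal model size and token count for a training FLOP budget.

    Assumptions (rules of thumb, not exact fits):
      - training cost C ~= 6 * N * D FLOPs (N params, D tokens)
      - compute-optimal ratio D ~= 20 * N  (Chinchilla)
    Substituting: C = 6 * N * (20 * N) = 120 * N^2, so N = sqrt(C / 120).
    """
    n_params = (flops_budget / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Example: a 1e23-FLOP budget, in the general vicinity of Chinchilla-scale runs.
n, d = chinchilla_optimal(1e23)
print(f"{n / 1e9:.1f}B params, {d / 1e9:.0f}B tokens")  # → 28.9B params, 577B tokens
```

Plugging in Chinchilla's own budget (about 5.9e23 FLOPs) recovers its reported 70B parameters and 1.4T tokens, which is a useful sanity check on the arithmetic.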
Chapter 7 surveys the major model families (GPT, LLaMA, Mistral, Gemma, Claude, Gemini), their architectural innovations, and how to read model cards. It covers open vs. closed models and the rapidly evolving landscape.
Chapter 8 examines how reasoning models like o1, o3, and DeepSeek-R1 improve outputs by allocating more compute at inference time. It covers the test-time compute paradigm, training with reinforcement learning, and compute-optimal inference strategies.
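The core idea of spending extra compute at inference time can be sketched as a best-of-N loop. Here `generate` and `score` are hypothetical stand-ins for a stochastic sampler and a verifier or reward model, not the API of any particular framework; raising `n` trades more inference compute for a better-scoring answer.

```python
import random

def best_of_n(generate, score, n=8):
    """Best-of-N sampling: draw n candidate answers, keep the highest-scoring.

    `generate` is any zero-argument sampler; `score` is any scoring function
    (in practice, a verifier or reward model). Both are placeholders here.
    """
    return max((generate() for _ in range(n)), key=score)

# Toy illustration: "answers" are random floats and the "verifier"
# prefers values close to 0.5. Real systems score full text candidates.
random.seed(0)
answer = best_of_n(lambda: random.random(), lambda x: -abs(x - 0.5), n=16)
```

Compute-optimal inference asks when this kind of repeated sampling beats simply using a larger model for a single pass; the chapter works through that trade-off.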
Chapter 9 focuses on making LLMs fast and affordable: quantization (GPTQ, AWQ, GGUF), KV-cache optimization, continuous batching, speculative decoding, and serving frameworks (vLLM, TGI, SGLang).
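To see why KV-cache optimization matters, a back-of-the-envelope memory estimate helps. The shapes below are assumptions (loosely LLaMA-7B-like: 32 layers, 32 KV heads, head dimension 128), not figures from the chapter:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Memory for the KV cache: two tensors (K and V) per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Assumed LLaMA-7B-like config in fp16 (2 bytes/element), 4K context, batch 8:
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8) / 2**30
print(f"{gib:.1f} GiB")  # → 16.0 GiB
```

Sixteen gibibytes of cache for a batch of eight is more than the weights of many small models, which is why techniques like paged KV memory (vLLM), grouped-query attention, and cache quantization pay off so quickly in serving.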
Chapter 18 explains what LLMs have learned and why they produce specific outputs. It covers attention analysis, probing classifiers, mechanistic interpretability (circuits, superposition), and practical tools for explaining model behavior.
What Comes Next
Continue to Part III: Working with LLMs.