Covered in Detail
For a comprehensive treatment of supervised learning, classification, and regression, see Section 0.1: ML Basics: Features, Optimization & Generalization. For a full introduction to reinforcement learning, see Section 0.4: Reinforcement Learning Foundations.
This page gives an at-a-glance summary of the three learning paradigms. Use it as a quick reminder; refer to the main chapters for full explanations, examples, and code.
Learning Paradigms at a Glance
| Paradigm | Training Signal | LLM Stage | Main Text Reference |
|---|---|---|---|
| Supervised | Labeled input-output pairs | Supervised fine-tuning (SFT) | Section 0.1 |
| Self-supervised | Labels derived from data structure (next-token prediction) | Pretraining | Section 6.1 |
| Reinforcement | Reward signal from environment or human feedback | RLHF / DPO alignment | Section 0.4, Section 16.1 |
Key Insight: LLMs Span All Three Paradigms
A modern LLM's lifecycle touches all three paradigms. Pretraining is self-supervised (predict the next token). Supervised fine-tuning (SFT) uses labeled instruction-response pairs. RLHF/DPO applies reinforcement learning with human preference signals.
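The contrast between the three training signals can be made concrete with a minimal sketch. The data below is illustrative (hypothetical examples, plain Python lists in place of real tensenized batches), but the structure of each signal matches the paradigms above: self-supervised labels are derived by shifting the sequence, supervised labels are written by annotators, and preference data ranks candidate responses without a gold label.

```python
# Illustrative sketch of the training signal in each paradigm.
# All example data here is hypothetical.

tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Self-supervised (pretraining): labels come from the data itself.
# Shifting the sequence by one position makes each token the
# prediction target for the tokens before it.
pretrain_inputs = tokens[:-1]   # ["The", "cat", "sat", "on", "the"]
pretrain_labels = tokens[1:]    # ["cat", "sat", "on", "the", "mat"]

# Supervised (SFT): a human-written target response is paired
# with each input prompt.
sft_example = {
    "prompt": "Summarize: The cat sat on the mat.",
    "response": "A cat rested on a mat.",  # annotator-provided label
}

# Reinforcement (RLHF-style preference data): no single correct
# label — a human rater ranks two candidate responses instead.
preference_example = {
    "prompt": "Explain gravity simply.",
    "chosen": "Gravity pulls objects toward each other.",
    "rejected": "Gravity is when things are heavy.",
}
```

Note that only the self-supervised signal is free: it requires no annotation at all, which is why pretraining can scale to web-sized corpora while SFT and preference datasets remain comparatively small.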