Covered in Detail
For a comprehensive treatment of supervised learning, classification, and regression, see Section 0.1: ML Basics: Features, Optimization & Generalization. For a full introduction to reinforcement learning, see Section 0.4: Reinforcement Learning Foundations.
This page gives an at-a-glance summary of the three learning paradigms. Use it as a quick reminder; refer to the main chapters for full explanations, examples, and code.
Learning Paradigms at a Glance
| Paradigm | Training Signal | LLM Stage | Main Text Reference |
|---|---|---|---|
| Supervised | Labeled input-output pairs | Supervised fine-tuning (SFT) | Section 0.1 |
| Self-supervised | Labels derived from data structure (next-token prediction) | Pretraining | Section 6.1 |
| Reinforcement | Reward signal from environment or human feedback | RLHF / DPO alignment | Section 0.4, Section 16.1 |
Key Insight: LLMs Span All Three Paradigms
A modern LLM's lifecycle touches all three paradigms. Pretraining is self-supervised (predict the next token). Supervised fine-tuning (SFT) uses labeled instruction-response pairs. RLHF/DPO applies reinforcement learning with human preference signals.
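The contrast between the three training signals can be made concrete with a minimal sketch. The data below is illustrative (hypothetical examples, plain Python lists in place of real tensenized batches), but the structure of each signal matches the paradigms above: self-supervised labels are derived by shifting the sequence, supervised labels are written by annotators, and preference data ranks candidate responses without a gold label.

```python
# Illustrative sketch of the training signal in each paradigm.
# All example data here is hypothetical.

tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Self-supervised (pretraining): labels come from the data itself.
# Shifting the sequence by one position makes each token the
# prediction target for the tokens before it.
pretrain_inputs = tokens[:-1]   # ["The", "cat", "sat", "on", "the"]
pretrain_labels = tokens[1:]    # ["cat", "sat", "on", "the", "mat"]

# Supervised (SFT): a human-written target response is paired
# with each input prompt.
sft_example = {
    "prompt": "Summarize: The cat sat on the mat.",
    "response": "A cat rested on a mat.",  # annotator-provided label
}

# Reinforcement (RLHF-style preference data): no single correct
# label — a human rater ranks two candidate responses instead.
preference_example = {
    "prompt": "Explain gravity simply.",
    "chosen": "Gravity pulls objects toward each other.",
    "rejected": "Gravity is when things are heavy.",
}
```

Note that only the self-supervised signal is free: it requires no annotation at all, which is why pretraining can scale to web-sized corpora while SFT and preference datasets remain comparatively small.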