Building Conversational AI with LLMs and Agents
Appendix B: Machine Learning Essentials

B.1 Learning Paradigms

Covered in Detail

For a comprehensive treatment of supervised learning, classification, and regression, see Section 0.1: ML Basics: Features, Optimization & Generalization. For a full introduction to reinforcement learning, see Section 0.4: Reinforcement Learning Foundations.

This page provides a concise at-a-glance summary of the three learning paradigms. Use it as a quick reminder; refer to the main chapters for explanations, examples, and code.

Learning Paradigms at a Glance
| Paradigm        | Training Signal                                             | LLM Stage                    | Main Text Reference      |
|-----------------|-------------------------------------------------------------|------------------------------|--------------------------|
| Supervised      | Labeled input-output pairs                                  | Supervised fine-tuning (SFT) | Section 0.1              |
| Self-supervised | Labels derived from data structure (next-token prediction)  | Pretraining                  | Section 6.1              |
| Reinforcement   | Reward signal from environment or human feedback            | RLHF / DPO alignment         | Section 0.4, Section 16.1 |
Key Insight: LLMs Span All Three Paradigms

A modern LLM's lifecycle touches all three paradigms. Pretraining is self-supervised (predict the next token). Supervised fine-tuning (SFT) uses labeled instruction-response pairs. RLHF/DPO applies reinforcement learning with human preference signals.
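The difference between the three training signals can be made concrete with a minimal sketch. The examples below use hypothetical data (the token IDs, prompts, and field names are illustrative, not from any real dataset or API); the point is only how each paradigm's label is obtained.

```python
# Self-supervised (pretraining): labels come from the data itself.
# The target at each position is simply the next token in the sequence,
# so no human annotation is required.
tokens = [5, 12, 7, 9, 3]           # a tokenized text sequence (hypothetical IDs)
inputs, targets = tokens[:-1], tokens[1:]
assert inputs == [5, 12, 7, 9]      # model sees these positions
assert targets == [12, 7, 9, 3]     # and predicts each next token

# Supervised (SFT): labels are human-written responses paired with prompts.
sft_example = {
    "prompt": "Translate 'hello' to French.",
    "response": "bonjour",          # the gold label, written by an annotator
}

# Reinforcement (RLHF/DPO): the signal is a human preference between two
# candidate responses, not a gold answer.
preference_example = {
    "prompt": "Summarize this article.",
    "chosen": "concise, accurate summary",    # preferred by the annotator
    "rejected": "rambling, off-topic reply",  # dispreferred
}
```

Note how the data gets progressively more expensive to collect: self-supervised labels are free, SFT responses require a human to write them, and preference pairs require a human to compare candidates.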