Building Conversational AI with LLMs and Agents
Appendix B: Machine Learning Essentials

B.3 Overfitting, Regularization, and Validation

Covered in Detail

For a comprehensive treatment of overfitting, the bias-variance tradeoff, and regularization techniques (L1, L2, dropout), see Section 0.1: ML Basics: Features, Optimization & Generalization. For dropout and batch normalization in neural networks, see Section 0.2: Deep Learning Essentials.

This page provides a quick-reference lookup for regularization techniques and data-splitting conventions. For worked examples, visualizations, and the bias-variance tradeoff derivation, see the main text references above.

Regularization Techniques Quick Reference

Regularization Techniques for LLM Practitioners
| Technique | How It Works | LLM Usage |
|---|---|---|
| Dropout | Randomly zeroes a fraction of activations during training | Used in BERT; less common in modern autoregressive LLMs |
| Weight Decay (L2) | Adds a penalty proportional to weight magnitude to the loss | Standard in all LLM training (via AdamW) |
| L1 Regularization | Adds a penalty proportional to the absolute value of weights; drives some weights to exactly zero | Feature selection in classical ML; rarely used in LLMs |
| Early Stopping | Stops training when validation performance stops improving | Common in fine-tuning; pretraining usually runs to a compute budget |
| Data Augmentation | Creates synthetic training examples by transforming existing ones | Paraphrasing, back-translation, synthetic data (Chapter 12) |
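To make the first two rows concrete, here is a minimal pure-Python sketch of a single SGD step with decoupled weight decay (the idea behind AdamW) and of inverted dropout. The function names and hyperparameter values are illustrative, not from any particular library.

```python
import random

def sgd_step_weight_decay(w, grad, lr=0.1, weight_decay=0.01):
    """One SGD step with decoupled weight decay (the AdamW idea):
    the decay term shrinks the weight directly, independent of the
    loss gradient, pulling weights toward zero each step."""
    return w - lr * grad - lr * weight_decay * w

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p) so the expected value of
    each unit is unchanged (no rescaling needed at inference time)."""
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]
```

Note the design point in `sgd_step_weight_decay`: the decay is applied directly to the weight rather than added to the loss, which is what distinguishes AdamW's decoupled weight decay from classic L2 regularization when adaptive optimizers are involved.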

Data Splitting Conventions

Standard Data Splits
| Split | Typical Size | Purpose |
|---|---|---|
| Training | ~80% | Model learns from this data |
| Validation | ~10% | Tune hyperparameters, detect overfitting |
| Test | ~10% | Final unbiased performance estimate (evaluate once) |
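The 80/10/10 convention above can be implemented with a single shuffle; this is a minimal stdlib sketch (the function name and fraction defaults are illustrative).

```python
import random

def train_val_test_split(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle indices once with a fixed seed, then carve off the
    validation and test portions; everything left over is training data."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(examples) * val_frac)
    n_test = int(len(examples) * test_frac)
    val = [examples[i] for i in idx[:n_val]]
    test = [examples[i] for i in idx[n_val:n_val + n_test]]
    train = [examples[i] for i in idx[n_val + n_test:]]
    return train, val, test
```

Fixing the seed keeps the split reproducible across runs, which matters because the test split must stay untouched until the final evaluation.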
Warning: Data Contamination in LLMs

Because LLMs are pretrained on massive internet corpora, there is a risk that test set examples appeared in the pretraining data. This is called data contamination, and it can artificially inflate benchmark scores. Always check for contamination when evaluating, and prefer held-out or recently created benchmarks that could not have appeared in the training data.
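One common way to check for contamination is n-gram overlap between test examples and the pretraining corpus. The sketch below is a simplified illustration of that idea (the function names, the 8-gram default, and the any-overlap criterion are assumptions; real contamination audits use larger corpora and more careful matching).

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in a lowercased text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(test_examples, corpus_text, n=8):
    """Fraction of test examples that share at least one n-gram
    with the (pretraining) corpus text."""
    corpus = ngrams(corpus_text, n)
    hits = sum(1 for ex in test_examples if ngrams(ex, n) & corpus)
    return hits / len(test_examples)
```

A nonzero rate does not prove the model memorized those examples, but it flags benchmark items whose scores should be treated with suspicion.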