Prerequisites
- Python programming
- Calculus (derivatives, chain rule, gradients)
- Linear algebra (eigenvalues, matrix decomposition)
- Probability and statistics (Bayes' theorem, distributions)
- Recommended: one introductory ML course
Focus: Architecture internals, training methods, interpretability. Students leave with a deep understanding of how LLMs work and how to study them. This pathway trades breadth for depth: it covers the same foundations as Course A but then dives into pre-training, scaling laws, PEFT, and alignment. The reasoning is that future researchers need to understand the training pipeline end to end, since that is where novel contributions happen.
14-Week Syllabus
| Week | Topics | Lab / Assignment |
|---|---|---|
| 1 | ML and PyTorch Foundations | Build and train an image classifier in PyTorch |
| 2 | NLP, Text Representation, Tokenization (Ch 01 through 02) | Compare tokenizer vocabulary coverage across languages |
| 3 | Sequence Models and Attention | Implement attention from scratch, visualize attention weights |
| 4 | The Transformer Architecture | Build a minimal transformer (encoder + decoder) |
| 5 | Decoding Strategies | Implement nucleus sampling, measure diversity vs. quality |
| 6 | Pre-training and Scaling Laws | Reproduce a scaling law curve on a small model |
| 7 | Modern LLM Landscape and Reasoning Models (Ch 07 through 08) | Compare model architectures (paper reading assignment) |
| 8 | Inference Optimization | Benchmark KV-cache and quantization effects |
| 9 | Synthetic Data Generation | Generate and validate a synthetic training dataset |
| 10 | Fine-Tuning and PEFT (Ch 14 through 15) | Compare full fine-tuning vs. LoRA on the same task |
| 11 | Alignment (RLHF, DPO) | Implement DPO training on a preference dataset |
| 12 | Interpretability | Probe internal representations with logit lens |
| 13 | Emerging Architectures; AI and Society (Ch 34 through 35) | Write a research proposal on an open problem |
| 14 | Final project presentations | Research paper replication or extension (individual project) |
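
The Week 3 lab asks you to implement attention from scratch. As a warm-up, here is a minimal sketch of scaled dot-product attention in plain Python (no framework, toy inputs only); the `attention` helper and the example Q/K/V values are illustrative, not taken from the book's lab code.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over plain lists of vectors.

    Q, K, V: lists of equal-length float vectors (lists of lists).
    Returns one output vector per query: a softmax-weighted
    combination of the value vectors.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: 2 queries attending over 3 key/value pairs, dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(Q, K, V))
```

In the lab you would replace the list arithmetic with batched tensor operations and visualize the `weights` matrix as an attention heatmap.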
- Appendix D: Environment Setup – set up your development environment before Week 1
- Appendix K: HuggingFace: Transformers, Datasets, and Hub – access pretrained models and datasets for labs
- Appendix A: Math Foundations – review the linear algebra and probability behind attention
- Appendix R: Experiment Tracking – log experiments systematically for your research project
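
The Week 5 lab asks you to implement nucleus sampling. A minimal sketch in plain Python, assuming a toy token distribution (the `nucleus_sample` helper and the example probabilities are illustrative, not from the book):

```python
import random

def nucleus_sample(probs, p=0.9, rng=random):
    """Top-p (nucleus) sampling over a toy distribution.

    probs: dict mapping token -> probability (assumed to sum to 1).
    Keeps the smallest set of highest-probability tokens whose
    cumulative mass reaches p, then samples within that set.
    """
    # Rank tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Accumulate until the nucleus covers at least p of the mass.
    nucleus, cum = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        cum += pr
        if cum >= p:
            break
    # Renormalize within the nucleus and draw a sample.
    total = sum(pr for _, pr in nucleus)
    r = rng.random() * total
    for tok, pr in nucleus:
        r -= pr
        if r <= 0:
            return tok
    return nucleus[-1][0]

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
# With p=0.8 the nucleus is {"the", "a"}; low-probability tail is cut.
print(nucleus_sample(probs, p=0.8))
```

Lowering `p` truncates more of the tail, trading diversity for quality, which is exactly the tradeoff the Week 5 lab asks you to measure.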
What Comes Next
Return to the Course Syllabi overview to explore other courses and reading tracks, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types.