Part I: Foundations

Building the mathematical, linguistic, and engineering foundations that underpin every modern Large Language Model.

Part Overview

Part I establishes the core knowledge you will draw on throughout the rest of the book. We begin with machine learning and PyTorch fundamentals, then move into natural language processing, tokenization, sequence modeling, the Transformer architecture, and text generation. By the end of these six chapters, you will have a solid understanding of how text becomes numbers, how models learn patterns, and how the Transformer produces coherent language.

Part I comprises six chapters (Chapters 0 through 5) and approximately 50,000 words of content, with hands-on labs, worked examples, and exercises.

Big Picture

Every concept in this book rests on the foundations built here. Part I gives you the mathematical intuition, NLP building blocks, and Transformer fluency needed to understand, use, and customize large language models with confidence.

Chapter 0 is a prerequisite refresher covering core machine learning concepts (supervised learning, loss functions, gradient descent, regularization) and hands-on PyTorch programming. It also introduces the reinforcement learning foundations needed for the RLHF material later in the book.
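As a taste of the optimization ideas this refresher revisits, here is a minimal sketch of gradient descent on a one-dimensional loss. The loss function, learning rate, and step count are illustrative choices, not values from the book:

```python
# Minimal sketch: gradient descent on f(w) = (w - 3)^2.
# The learning rate and starting point are illustrative, not from the book.

def gradient_descent(start: float, lr: float = 0.1, steps: int = 100) -> float:
    """Minimize f(w) = (w - 3)^2 using its analytic gradient f'(w) = 2(w - 3)."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # gradient of the loss at the current w
        w -= lr * grad          # step against the gradient
    return w

w_star = gradient_descent(start=0.0)
print(round(w_star, 4))  # converges toward the minimum at w = 3
```

Real training replaces the analytic gradient with PyTorch's autograd, but the update rule is the same.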

Chapter 1 shows how machines represent and understand text: from bag-of-words and TF-IDF through Word2Vec, GloVe, and contextual embeddings such as ELMo and BERT. It builds intuition for the dense vector spaces that power all modern NLP.
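To make the vector-space intuition concrete, here is a hedged sketch of bag-of-words vectors compared with cosine similarity. The three-document "corpus" is invented for illustration:

```python
import math
from collections import Counter

# Sketch: bag-of-words vectors and cosine similarity on an invented toy corpus.

def bow(text: str) -> Counter:
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

d1 = bow("the cat sat on the mat")
d2 = bow("the cat lay on the rug")
d3 = bow("stock markets fell sharply today")
print(cosine(d1, d2) > cosine(d1, d3))  # True: topically closer docs score higher
```

Dense embeddings such as Word2Vec replace these sparse counts with learned vectors, but the same similarity geometry applies.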

Chapter 2 covers tokenization, the critical bridge between raw text and model input. It explains the BPE, WordPiece, Unigram, and SentencePiece tokenizers, with practical guidance on choosing and training a tokenizer for your domain.
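One merge step of byte-pair encoding can be sketched as follows. The toy word-frequency table and the whitespace-separated-symbol convention are assumptions for illustration, not the book's implementation:

```python
from collections import Counter

# Sketch of one BPE merge step: count adjacent symbol pairs across a toy
# word-frequency table, then merge the most frequent pair everywhere.

def most_frequent_pair(vocab: dict) -> tuple:
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(vocab: dict, pair: tuple) -> dict:
    """Replace every occurrence of the pair with its concatenation."""
    spaced = " ".join(pair)
    return {w.replace(spaced, "".join(pair)): f for w, f in vocab.items()}

# Invented frequencies; symbols within a word are space-separated.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
best = most_frequent_pair(vocab)
print(best)  # ('e', 's'), occurring 9 times
vocab = merge_pair(vocab, best)
```

Repeating this loop until a target vocabulary size is reached is the essence of BPE training; production tokenizers add byte-level fallback and many optimizations on top.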

Chapter 3 traces sequence modeling from RNNs and LSTMs to the attention mechanism that revolutionized NLP. It examines the limitations of recurrent models and explains why attention became the foundation for Transformers.

Chapter 4 is a deep dive into the Transformer: multi-head self-attention, positional encoding, feed-forward networks, layer normalization, and the encoder-decoder design. This is the architecture that powers every modern LLM.
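The heart of that architecture, scaled dot-product attention, computes softmax(QK^T / sqrt(d_k))V. A minimal single-head sketch in plain Python, with invented 2x2 matrices for illustration:

```python
import math

# Sketch of scaled dot-product attention for one head, using plain lists.
# Q, K, V below are invented toy matrices, not values from the book.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output = attention-weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key/value pair
```

Multi-head attention runs several such computations in parallel over learned projections of Q, K, and V and concatenates the results.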

Chapter 5 explains how language models produce text: greedy decoding, beam search, temperature sampling, top-k and top-p (nucleus) sampling, and advanced strategies such as speculative decoding and structured generation.
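Temperature and top-k sampling can be sketched over a toy next-token distribution. The vocabulary and logits below are invented, and this is a minimal illustration rather than a production decoder:

```python
import math
import random

# Sketch of temperature + top-k sampling. Tokens and logits are invented.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature=1.0, top_k=None, rng=random):
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep only the top_k highest-scoring tokens.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        kept = [(t, s) for t, s in zip(tokens, scaled) if s >= cutoff]
        tokens, scaled = zip(*kept)
    probs = softmax(list(scaled))
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["cat", "dog", "mat", "the"]
logits = [2.0, 1.5, 0.5, -1.0]
print(sample(tokens, logits, temperature=0.7, top_k=2))  # "cat" or "dog" only
```

Greedy decoding is the limiting case of very low temperature; top-p differs only in that it keeps the smallest set of tokens whose cumulative probability exceeds p rather than a fixed count.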

What Comes Next

Continue to Part II: Understanding LLMs.