Part I: Foundations

Building the mathematical, linguistic, and engineering foundations that underpin every modern Large Language Model.

Part Overview

Part I establishes the core knowledge you will draw on throughout the rest of the book. We begin with machine learning and PyTorch fundamentals, then move into natural language processing, tokenization, sequence modeling, the Transformer architecture, and text generation. By the end of these six chapters, you will have a solid understanding of how text becomes numbers, how models learn patterns, and how the Transformer produces coherent language.

Part I comprises six chapters (Chapters 0 through 5) and approximately 50,000 words of content, with hands-on labs, worked examples, and exercises.

Big Picture

Every concept in this book rests on the foundations built here. Part I gives you the mathematical intuition, NLP building blocks, and Transformer fluency needed to understand, use, and customize large language models with confidence.

Chapter 0 is a prerequisite refresher covering core machine learning concepts (supervised learning, loss functions, gradient descent, regularization) and hands-on PyTorch programming. It also introduces the reinforcement learning foundations needed for the RLHF material later in the book.
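As a taste of the optimization ideas this refresher revisits, here is a minimal sketch of gradient descent on a one-dimensional loss. The loss function, learning rate, and step count are illustrative choices, not values from the book:

```python
# Minimal sketch: gradient descent on f(w) = (w - 3)^2.
# The learning rate and starting point are illustrative, not from the book.

def gradient_descent(start: float, lr: float = 0.1, steps: int = 100) -> float:
    """Minimize f(w) = (w - 3)^2 using its analytic gradient f'(w) = 2(w - 3)."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # gradient of the loss at the current w
        w -= lr * grad          # step against the gradient
    return w

w_star = gradient_descent(start=0.0)
print(round(w_star, 4))  # converges toward the minimum at w = 3
```

Real training replaces the analytic gradient with PyTorch's autograd, but the update rule is the same.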

Chapter 1 shows how machines represent and understand text: from bag-of-words and TF-IDF through Word2Vec, GloVe, and contextual embeddings such as ELMo and BERT. It builds intuition for the dense vector spaces that power all modern NLP.
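To make the vector-space intuition concrete, here is a hedged sketch of bag-of-words vectors compared with cosine similarity. The three-document "corpus" is invented for illustration:

```python
import math
from collections import Counter

# Sketch: bag-of-words vectors and cosine similarity on an invented toy corpus.

def bow(text: str) -> Counter:
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

d1 = bow("the cat sat on the mat")
d2 = bow("the cat lay on the rug")
d3 = bow("stock markets fell sharply today")
print(cosine(d1, d2) > cosine(d1, d3))  # True: topically closer docs score higher
```

Dense embeddings such as Word2Vec replace these sparse counts with learned vectors, but the same similarity geometry applies.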

Chapter 2 covers tokenization, the critical bridge between raw text and model input. It explains the BPE, WordPiece, Unigram, and SentencePiece tokenizers, with practical guidance on choosing and training a tokenizer for your domain.
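One merge step of byte-pair encoding can be sketched as follows. The toy word-frequency table and the whitespace-separated-symbol convention are assumptions for illustration, not the book's implementation:

```python
from collections import Counter

# Sketch of one BPE merge step: count adjacent symbol pairs across a toy
# word-frequency table, then merge the most frequent pair everywhere.

def most_frequent_pair(vocab: dict) -> tuple:
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(vocab: dict, pair: tuple) -> dict:
    """Replace every occurrence of the pair with its concatenation."""
    spaced = " ".join(pair)
    return {w.replace(spaced, "".join(pair)): f for w, f in vocab.items()}

# Invented frequencies; symbols within a word are space-separated.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
best = most_frequent_pair(vocab)
print(best)  # ('e', 's'), occurring 9 times
vocab = merge_pair(vocab, best)
```

Repeating this loop until a target vocabulary size is reached is the essence of BPE training; production tokenizers add byte-level fallback and many optimizations on top.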

Chapter 3 traces sequence modeling from RNNs and LSTMs to the attention mechanism that revolutionized NLP. It examines the limitations of recurrent models and explains why attention became the foundation for Transformers.

Chapter 4 is a deep dive into the Transformer: multi-head self-attention, positional encoding, feed-forward networks, layer normalization, and the encoder-decoder design. This is the architecture that powers every modern LLM.
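The heart of that architecture, scaled dot-product attention, computes softmax(QK^T / sqrt(d_k))V. A minimal single-head sketch in plain Python, with invented 2x2 matrices for illustration:

```python
import math

# Sketch of scaled dot-product attention for one head, using plain lists.
# Q, K, V below are invented toy matrices, not values from the book.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output = attention-weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key/value pair
```

Multi-head attention runs several such computations in parallel over learned projections of Q, K, and V and concatenates the results.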

Chapter 5 explains how language models produce text: greedy decoding, beam search, temperature sampling, top-k and top-p (nucleus) sampling, and advanced strategies such as speculative decoding and structured generation.
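Temperature and top-k sampling can be sketched over a toy next-token distribution. The vocabulary and logits below are invented, and this is a minimal illustration rather than a production decoder:

```python
import math
import random

# Sketch of temperature + top-k sampling. Tokens and logits are invented.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature=1.0, top_k=None, rng=random):
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep only the top_k highest-scoring tokens.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        kept = [(t, s) for t, s in zip(tokens, scaled) if s >= cutoff]
        tokens, scaled = zip(*kept)
    probs = softmax(list(scaled))
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["cat", "dog", "mat", "the"]
logits = [2.0, 1.5, 0.5, -1.0]
print(sample(tokens, logits, temperature=0.7, top_k=2))  # "cat" or "dog" only
```

Greedy decoding is the limiting case of very low temperature; top-p differs only in that it keeps the smallest set of tokens whose cumulative probability exceeds p rather than a fixed count.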

What Comes Next

Continue to Part II: Understanding LLMs.