Chapter 5: Tools of the Trade: Foundations Stack

"You can read every paper on backprop and still not be able to debug an exploding gradient. The library docs are where craftsmanship begins."
Pip, Toolbox-Curating AI Agent

Looking Back

Chapters 0 through 4 built the conceptual core: ML, NLP, attention, the Transformer, and decoding. This chapter is the practical toolkit that ties them together: PyTorch, Hugging Face, tokenizers, and the small habits that distinguish a working LLM engineer from someone who has only read about the field.

Big Picture

Part I built the language of foundations: tensors, gradients, sequence models, the attention head, the transformer block. This chapter consolidates the toolbox you reach for when those ideas become code. The engine is PyTorch 2.x (with JAX as the second-most-common research alternative), the numerical substrate is NumPy and SciPy, the classical-ML baselines come from scikit-learn, the tokenizer layer is Hugging Face tokenizers, and the canonical teaching datasets (MNIST, CIFAR-10, SQuAD, GLUE) anchor the exercises that follow. The reference models are BERT-base and GPT-2, two checkpoints small enough to fit on a 6 GB GPU yet large enough to teach every habit Part I depends on.
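A back-of-the-envelope calculation shows why a 6 GB GPU is enough for these two checkpoints. The sketch below is an estimate under stated assumptions, not a measurement: it uses the commonly cited approximate parameter counts (about 110 million for BERT-base, about 124 million for the smallest GPT-2 checkpoint), and the helper names are hypothetical. It counts weights only, then adds gradients and Adam optimizer state for a rough training figure. Activations and the CUDA context come on top and depend on batch size and sequence length.

```python
# Rough memory estimate for the two reference checkpoints.
# ASSUMPTION: parameter counts are approximate, commonly cited figures,
# and "GPT-2" means the smallest (124M) checkpoint.
APPROX_PARAMS = {
    "bert-base": 110_000_000,
    "gpt2": 124_000_000,
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params, dtype="fp32"):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def training_memory_gib(num_params):
    """Weights + gradients + Adam's two moment buffers, all in fp32.

    That is four fp32 copies per parameter; activations are NOT included.
    """
    return 4 * weight_memory_gib(num_params, "fp32")


if __name__ == "__main__":
    for name, n in APPROX_PARAMS.items():
        print(
            f"{name:<10} weights fp32: {weight_memory_gib(n):.2f} GiB, "
            f"fp16: {weight_memory_gib(n, 'fp16'):.2f} GiB, "
            f"training (no activations): {training_memory_gib(n):.2f} GiB"
        )
```

Both models stay under 2 GiB even with fp32 Adam state, which leaves the rest of a 6 GB card for activations at modest batch sizes.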

Chapter Overview

Part I taught the fundamentals: tensors and autograd, sequence models, attention, the transformer block, and decoding. This chapter consolidates the toolbox that puts those fundamentals into practice. We walk through the four editors and notebook platforms (VS Code, Cursor, JupyterLab, Colab) that handle most LLM engineering, the libraries (PyTorch, NumPy, SciPy, scikit-learn, Hugging Face tokenizers) that ship the abstractions, the datasets (MNIST, CIFAR-10, SQuAD, GLUE) that anchor the exercises, the two reference models (BERT-base, GPT-2) sized for a 6 GB GPU, and the external reading and communities that keep your toolbox current.

Bookmark this chapter. Every later Part assumes the vocabulary locked in here, and every Tools of the Trade chapter that follows refers back to one of these primitives by name.

Note: Learning Objectives

Evaluate IDE and notebook platforms (VS Code, Cursor, JupyterLab, Colab) for LLM engineering workflows.
Install and validate the canonical Part I library set (torch, numpy, scipy, scikit-learn, tokenizers, datasets, matplotlib).
Choose the right teaching dataset (MNIST, CIFAR-10, SQuAD, GLUE) for a given foundations exercise.
Load and inspect the BERT-base and GPT-2 reference checkpoints on a 6 GB GPU.
Identify the external venues, blogs, and communities that maintain the modern LLM toolchain.

Library Shortcut

For the entirety of Part I you only need one install line:

uv pip install torch numpy scipy scikit-learn tokenizers datasets matplotlib

That covers every exercise from tensor manipulation through a small transformer trained on text. Run it inside an active virtual environment, which uv pip install expects by default. The uv installer (Astral, 2024) is advertised by its authors as 10-100x faster than plain pip, and this book treats it as the modern default. Add transformers when you reach Chapter 4 and want to load BERT or GPT-2 weights instead of training from scratch.
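After installing, it is worth confirming that every package actually landed in the active environment. The following is a minimal sketch using only the standard library's importlib.metadata. The list and function names are hypothetical, and the package list simply mirrors the install line above. It reads installed metadata rather than importing each library, so it runs quickly and does not touch the GPU.

```python
from importlib import metadata

# Distribution names exactly as they appear in the install line.
# Note these are not always import names: scikit-learn is imported as `sklearn`.
PART_ONE_PACKAGES = [
    "torch", "numpy", "scipy", "scikit-learn",
    "tokenizers", "datasets", "matplotlib",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in installed_versions(PART_ONE_PACKAGES).items():
        print(f"{name:<14}{version or 'MISSING'}")
```

Any row marked MISSING means the install went to a different environment or failed. Because this check does not import the libraries, a follow-up such as import torch and torch.cuda.is_available() is still needed to confirm that PyTorch can see the GPU.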

Prerequisites

All concepts from Chapters 0 through 4
A working Python development setup (virtualenv, pip, git) that you are comfortable using
Familiarity with the Hugging Face ecosystem helps but is not required

What Comes Next

Part II moves from "how a transformer works" to "what a 70-billion-parameter language model actually does": tokenization at scale, the modern open-weight zoo, mechanistic interpretability, and inference optimization. Part II closes with Chapter 12, its own Tools of the Trade chapter, focused on tokenizer libraries, pretraining corpora, and the model-loading ecosystem you will live inside for the rest of the book. Continue to Chapter 6: Pretraining, Scaling Laws & Data Curation.