Tools of the Trade: Training & Adaptation Stack

Consolidated reference: platforms, libraries, datasets, models, and external resources for this part.

Chapter opener illustration: Tools of the Trade: Training & Adaptation Stack.

"Pretraining is geology. Fine-tuning is gardening."

Pip, Workflow-Optimizing AI Agent

Looking Back

Chapters 15 through 18 built models. This chapter is the framework grid: TRL, Unsloth, Axolotl, PEFT, DeepSpeed, accelerate, plus the small command-line habits that turn a one-off experiment into a reproducible pipeline.
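One such habit can be sketched in a few lines: write a small manifest next to every run so it can be reproduced later. This is a minimal illustration using only the Python standard library, not a feature of any of the frameworks named above; the function name `run_manifest`, the default package list, and the config keys are hypothetical choices made for the example.

```python
import hashlib
import json
import platform
from importlib import metadata


def run_manifest(config: dict, packages=("transformers", "trl", "peft")) -> dict:
    """Collect the facts needed to rerun an experiment (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than failing.
            versions[name] = None

    # Sorting keys makes the hash independent of dict insertion order.
    config_json = json.dumps(config, sort_keys=True)
    return {
        "python": platform.python_version(),
        "packages": versions,
        "config": config,
        "config_sha256": hashlib.sha256(config_json.encode("utf-8")).hexdigest(),
    }


if __name__ == "__main__":
    # Example hyperparameters are placeholders, not recommendations.
    manifest = run_manifest({"seed": 42, "learning_rate": 2e-4})
    print(json.dumps(manifest, indent=2))
```

Saving this output alongside the checkpoints gives each run a stable fingerprint: two runs with the same `config_sha256` and package versions started from the same settings.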

Big Picture

Part IV moved from "calling models" to "shaping models": SFT, instruction tuning, RLHF/RLAIF, DPO, and the parameter-efficient methods (LoRA, QLoRA). This chapter consolidates the toolbox: transformers as the training engine, TRL for SFT/DPO/RLHF/GRPO, PEFT for adapters, Axolotl and lit-gpt as the high-level recipe layers, the datasets (the Alpaca and ShareGPT instruction sets, plus the FineWeb-Edu pretraining corpus), and the experiment trackers (Weights & Biases, MLflow) that keep multi-day runs reproducible.

Chapter Overview

Part IV moved you from "call a hosted model" to "adapt a model to your task": continued pretraining, supervised fine-tuning, LoRA and QLoRA, preference optimization (DPO, GRPO), and the reasoning-recipe lineage that DeepSeek-R1 made canonical in 2025. This chapter consolidates the toolchain: Databricks and the managed-cluster platforms, Weights & Biases and MLflow for tracking, Delta Lake and the Hugging Face datasets and tokenizers libraries for data, the PEFT and TRL stacks for parameter-efficient training, and the Ray ecosystem for distributed work.

These tools are the working substrate for every training run in the rest of the book. The deeper sections (Hugging Face Trainer, MLflow, Ray Train, distributed strategies) double as standalone tutorials you can return to whenever a recipe assumes them.

Note: Learning Objectives

Library Shortcut

If you want one stack that covers SFT, LoRA, DPO, and GRPO:

pip install transformers trl peft accelerate bitsandbytes

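With QLoRA, that set can fine-tune models up to roughly 70B parameters on a single 80 GB GPU (4-bit base weights, with sequence length and batch size kept modest), run DPO out of the box through TRL, and log to W&B for tracking once the wandb package is also installed. Section 19.4 covers the distributed-training side.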
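The 70B-on-80 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement: the adapter fraction (0.5% of base parameters) and the 12 bytes per trainable parameter (16-bit weight, 16-bit gradient, two 32-bit Adam moments) are assumptions chosen for illustration, and it ignores activations, quantization constants, and CUDA overhead.

```python
def qlora_memory_gb(
    n_params: float,
    bits_per_weight: float = 4.0,          # 4-bit quantized base weights
    lora_fraction: float = 0.005,          # assumed: adapters ~0.5% of base params
    bytes_per_trainable_param: float = 12.0,  # assumed: 2 weight + 2 grad + 8 Adam
) -> dict:
    """Rough static memory estimate in GB (1 GB = 1e9 bytes)."""
    base_gb = n_params * bits_per_weight / 8 / 1e9
    adapter_gb = n_params * lora_fraction * bytes_per_trainable_param / 1e9
    return {
        "base_weights_gb": base_gb,
        "adapter_training_gb": adapter_gb,
        "static_total_gb": base_gb + adapter_gb,
    }


if __name__ == "__main__":
    estimate = qlora_memory_gb(70e9)
    gpu_gb = 80.0
    for key, value in estimate.items():
        print(f"{key}: {value:.1f}")
    # Whatever remains must cover activations and framework overhead.
    print(f"headroom_gb: {gpu_gb - estimate['static_total_gb']:.1f}")
```

Under these assumptions the frozen 4-bit weights take about 35 GB and the trainable adapter state about 4 GB, leaving roughly half of an 80 GB card for activations. That headroom shrinks quickly with longer sequences or larger batches, which is why the single-GPU claim depends on keeping both modest.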

Sections in This Chapter

Prerequisites

What Comes Next

Next: Chapter 20: Audio and Music Generation, opening Part V. Parts I through IV stayed inside text: tokens in, tokens out. Part V breaks that frame. We start with audio (TTS, voice cloning, music generation) and proceed through document understanding, vision-language models, 3D scenes, and vision-language-action models that drive robots. By the end you will see why "multimodal" is no longer a separate branch but the new default architecture.