Part IV: Training and Adapting

Chapter 15: Parameter-Efficient Fine-Tuning (PEFT)

"The best parameter is the one you don't have to train."

Figure 15.0.1: Why retrain billions of parameters when a few clever sticky notes on the right layers can do the job? PEFT adapters are the ultimate lightweight upgrade.

Chapter Overview

Full fine-tuning of a 7B-parameter model requires about 14 GB just for the weights in FP16; gradients and Adam optimizer states push the total past 56 GB. For most practitioners, this puts full fine-tuning out of reach without expensive multi-GPU setups. Parameter-efficient fine-tuning (PEFT) methods solve this problem by training only a tiny fraction of parameters (often less than 1%) while achieving quality that rivals full fine-tuning.
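The arithmetic behind those figures can be sketched in a few lines. This is a rough accounting that assumes FP16 weights, gradients, and Adam moments; in practice, FP32 master weights, activations, and framework overhead push the total even higher.

```python
# Rough memory accounting for full fine-tuning a 7B model with Adam.
# Illustrative only: ignores activations, FP32 master weights, and overhead.
params = 7e9
bytes_per_fp16 = 2

weights_gb = params * bytes_per_fp16 / 1e9  # model weights in FP16
grads_gb   = params * bytes_per_fp16 / 1e9  # one gradient per weight
adam_m_gb  = params * bytes_per_fp16 / 1e9  # Adam first moment
adam_v_gb  = params * bytes_per_fp16 / 1e9  # Adam second moment

total_gb = weights_gb + grads_gb + adam_m_gb + adam_v_gb
print(f"weights: {weights_gb:.0f} GB, total: {total_gb:.0f} GB")
# → weights: 14 GB, total: 56 GB
```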

This chapter covers the most important PEFT techniques in depth, starting with LoRA and QLoRA (the dominant methods in practice) and extending to newer approaches like DoRA, LoRA+, and adapter-based methods. You will learn not just the theory behind each method, but also how to configure hyperparameters, select target modules, and merge adapters for efficient deployment.

The final section surveys the rapidly evolving ecosystem of training platforms and tools, from Unsloth (which delivers roughly 2x speedups with about half the memory) to training frameworks like Axolotl and LLaMA-Factory. By the end of this chapter, you will be able to fine-tune any open-weight model on a single consumer GPU.

Big Picture

Full fine-tuning is expensive and often unnecessary. Parameter-efficient methods like LoRA and QLoRA let you adapt large models by training only a small fraction of their parameters, dramatically reducing compute costs. These techniques make fine-tuning accessible even on consumer hardware, a practical skill used throughout Parts V and VI.
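As a back-of-the-envelope illustration (the layer size and rank here are hypothetical, not tied to any specific model), a rank-16 LoRA adapter on a single 4096x4096 projection trains well under 1% of that layer's parameters:

```python
# Minimal sketch of why LoRA trains so few parameters.
# Hypothetical sizes: one 4096x4096 projection adapted with rank r = 16.
d_in, d_out, r = 4096, 4096, 16

full_params = d_in * d_out            # updating W directly
lora_params = r * (d_in + d_out)      # factors B (d_out x r) and A (r x d_in)

fraction = lora_params / full_params
print(f"LoRA trains {lora_params:,} of {full_params:,} params ({fraction:.2%})")
# → LoRA trains 131,072 of 16,777,216 params (0.78%)
# At deployment the update can be merged back into the base weight:
#   W' = W + (alpha / r) * B @ A
```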

Learning Objectives

Prerequisites

Sections

What's Next?

In the next chapter, Chapter 16: Distillation and Model Merging, you will learn to create efficient, specialized models through knowledge distillation and model merging.