PyTorch Reference

A standalone mini-book that takes the reader from a fresh `torch.tensor` to a profiled, sharded, deployed model

"I learned PyTorch in an afternoon. I have been chasing shape mismatches ever since. Somewhere between dim 0 and dim 1 lies enlightenment, and a missing unsqueeze."

Tensor, Slightly Humbled nn.Module
Big Picture

PyTorch is the dominant research framework for deep learning and the default toolkit used throughout this book. Every chapter that touches model code, from the transformer architecture in Chapter 4 to the inference optimizations in Chapter 9 and the alignment loops in Chapter 18, ultimately resolves to PyTorch tensors, modules, and optimizers. This appendix gathers the PyTorch knowledge a practitioner needs into a single self-contained reference.

The ten sections move from primitives toward production. Sections E.1 through E.3 cover the static building blocks: tensors, autograd, and modules. Sections E.4 and E.5 cover the moving parts of training: data pipelines and the canonical training loop. Sections E.6 through E.8 add the performance dimensions every serious user encounters: mixed precision, distributed training, and compilation. Sections E.9 and E.10 close with debugging recipes and deployment patterns.

This appendix is designed as a lookup reference. A reader new to PyTorch can read it front to back as a mini-book. A reader who already knows the basics can jump directly to whichever section answers the current question. Either way, the goal is to make explicit and accessible the implicit knowledge that people who write PyTorch code accumulate over time.

The narrative-friendly introduction to PyTorch lives in Chapter 0 (ML and PyTorch Foundations), which weaves PyTorch into a story about training a small classifier. This appendix complements that chapter: where Chapter 0 explains why, this appendix documents how, with the depth a practitioner needs when training runs misbehave at 2 a.m.

Note: Prerequisites

Comfort with Python (classes, decorators, context managers) is assumed. NumPy familiarity helps because PyTorch tensors share most of NumPy's slicing and broadcasting semantics. No prior PyTorch experience is required; the appendix introduces every concept from scratch. Readers who want a refresher on the underlying mathematics should consult Appendix A (Mathematical Foundations) first.
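To make the NumPy parallel concrete, here is a minimal sketch comparing the same slice and the same broadcast in both libraries. The array contents and variable names are illustrative assumptions, not taken from later sections.

```python
import numpy as np
import torch

# The same 3x4 grid of values in both libraries (illustrative data).
a_np = np.arange(12, dtype=np.float32).reshape(3, 4)
a_pt = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Slicing: rows 1 onward, every second column. Identical syntax.
slice_np = a_np[1:, ::2]
slice_pt = a_pt[1:, ::2]

# Broadcasting: a length-4 vector is stretched across the 3 rows.
row = [10.0, 20.0, 30.0, 40.0]
sum_np = a_np + np.array(row, dtype=np.float32)
sum_pt = a_pt + torch.tensor(row)

# Compare the results elementwise by viewing the tensors as NumPy arrays.
slices_match = np.array_equal(slice_np, slice_pt.numpy())
sums_match = np.array_equal(sum_np, sum_pt.numpy())
print(slice_pt.shape, sum_pt.shape, slices_match, sums_match)
```

The indexing expressions are character-for-character the same, which is why existing NumPy habits usually transfer directly.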

Real-World Scenario: When to Use This Appendix

Open Section E.1 the first time a tensor shape error appears. Open Section E.5 when designing a training loop from scratch. Open Section E.6 when a model OOMs on a 6 GB GPU and switching to bfloat16 is on the table. Open Section E.7 when one GPU is no longer enough. Open Section E.8 when training is suspiciously slow and the profiler is needed. Open Section E.9 when gradients turn into NaN. Open Section E.10 when the model is ready to ship.
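As a taste of the first scenario, the sketch below reproduces a typical shape error and one possible fix with `unsqueeze`. The tensor sizes (8 samples, 3 features) and the variable names are hypothetical, chosen only for illustration.

```python
import torch

batch = torch.ones(8, 3)        # hypothetical batch: 8 samples, 3 features
per_sample = torch.arange(8.0)  # one scale factor per sample, shape (8,)

try:
    # Broadcasting aligns trailing dimensions: 3 vs 8 cannot be matched.
    batch * per_sample
    failed = False
except RuntimeError as err:
    failed = True
    print("shape error:", err)

# unsqueeze(1) turns (8,) into (8, 1), which broadcasts across the 3 features.
scaled = batch * per_sample.unsqueeze(1)
print(scaled.shape)
```

Printing the shapes of both operands is often the quickest way to see which dimension needs the extra axis.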

Sections