Machine learning projects have unique version control challenges that standard software development workflows do not address well. Model weights are gigabytes in size. Datasets may be too large for Git. Results depend on random seeds, hardware, and library versions. This appendix covers the tools and practices that bring order to this complexity: Git workflows for ML codebases, Data Version Control (DVC) for large artifacts, experiment tracking with MLflow or Weights & Biases, and reproducibility practices that make results trustworthy and shareable.
Reproducibility is increasingly a professional and scientific requirement. A fine-tuning run whose configuration was not tracked cannot be reproduced, reported, or improved upon systematically. As LLM projects grow in scale and team size, the difference between ad hoc experimentation and a disciplined workflow becomes the difference between rapid iteration and recurring confusion about which checkpoint produced which result.
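The minimum unit of tracking is a record of what was run: the seed, the environment, and the hyperparameters. A minimal stdlib-only sketch of such a snapshot is shown below; the function name and field layout are illustrative choices, not a standard format.

```python
import json
import platform
import random
import sys

def snapshot_run_config(seed: int, hyperparams: dict) -> dict:
    """Capture the information needed to rerun an experiment.

    Illustrative sketch: field names are our own choice. A real
    project would also record library versions, the Git commit
    hash, and the dataset version.
    """
    random.seed(seed)  # seed before any stochastic work begins
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "hyperparams": hyperparams,
    }

# Write the snapshot next to the run's outputs so the checkpoint
# and its configuration never get separated.
config = snapshot_run_config(42, {"learning_rate": 3e-4, "epochs": 3})
print(json.dumps(config, indent=2))
```

Saving this JSON alongside every checkpoint is what makes a run reportable and improvable later; Section E.3 replaces this hand-rolled snapshot with a dedicated tracker.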
This appendix serves engineers and researchers who are comfortable with Git for software projects but have not applied version control discipline to ML artifacts. It is equally relevant for solo practitioners and teams working on shared LLM projects.
The reproducibility practices here directly support fine-tuning work in Chapter 14 (Fine-Tuning Fundamentals) and Chapter 15 (PEFT), where tracking configurations and checkpoints is essential. Experiment tracking concepts connect to evaluation workflows in Chapter 29. The production engineering practices of Chapter 31 assume the kind of disciplined artifact management this appendix establishes.
Basic Git knowledge is assumed: committing, branching, and pushing to a remote repository. If you have never used Git before, work through an introductory tutorial first. The DVC and experiment tracking sections require a Python environment (see Appendix D) and familiarity with running CLI tools. No ML-specific knowledge is required for this appendix.
Read Section E.1 before starting any collaborative ML project. Add DVC (Section E.2) when your datasets or model checkpoints exceed what Git can handle. Set up experiment tracking (Section E.3) before your first fine-tuning run; retrofitting tracking to an existing project is significantly more painful. Return to Section E.4 whenever a result cannot be reproduced or when preparing work for publication. If your project is a solo API-calling script without training, this appendix can be deferred until you begin working with model artifacts.
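Even before adopting MLflow or Weights & Biases, an append-only log of each run's parameters and metrics costs almost nothing and avoids the retrofitting pain described above. A hedged stdlib-only sketch, with a file name and record schema of our own invention:

```python
import json
import time
from pathlib import Path

def log_run(log_path: Path, params: dict, metrics: dict) -> dict:
    """Append one experiment record to a JSON Lines log.

    Illustrative sketch: the schema and file layout are assumptions,
    not a standard. A dedicated tracker (Section E.3) supersedes this.
    """
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,
        "metrics": metrics,
    }
    # Append-only: past runs are never overwritten, so the log is a
    # complete history of which settings produced which results.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

log = Path("runs.jsonl")
rec = log_run(
    log,
    {"learning_rate": 2e-5, "adapter": "lora"},
    {"eval_loss": 1.23},
)
```

Because each line is independent JSON, the log can later be imported into a real tracking tool without losing any history.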