Section 19.10: Linking Experiment Runs to Git Commits

Pattern 1: Capture the Commit Hash at Run Start

The cheapest, most portable recipe: shell out to git rev-parse HEAD and write the result into the tracker's config dict. It works the same in W&B, MLflow, CometML, or a plain JSON log.

import subprocess

def git_sha(short=False) -> str:
    """Return the current HEAD commit hash, or 'unknown' outside a repo."""
    fmt = ["--short"] if short else []
    try:
        return subprocess.check_output(
            ["git", "rev-parse", *fmt, "HEAD"], stderr=subprocess.DEVNULL,
        ).strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"

# W&B: inject into config so it appears as a filterable column.
import wandb
wandb.init(project="llm-finetuning", config={"git_sha": git_sha()})

# MLflow: log_param works the same way.
import mlflow
with mlflow.start_run():
    mlflow.log_param("git_sha", git_sha())
    mlflow.log_param("git_sha_short", git_sha(short=True))

Code Fragment 19.10.1: A reusable git_sha() helper that injects the current HEAD commit into both W&B's config and MLflow's run parameters. The same helper works for any tracker that accepts a key/value dict.

Stored this way, the commit hash becomes a sortable, filterable column in the tracker's UI. To rebuild any past result you run git checkout <sha> in a clean clone, reinstall the pinned dependencies (Section I.4), and rerun the same command with the same config file.
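For the plain JSON log case mentioned above, a minimal sketch might look like the following. The function names append_run_record and runs_for_commit, the one-record-per-line file layout, and the record fields are all illustrative assumptions, not part of any tracker's API. The SHA is passed in as an argument, so the helpers do not depend on how it was obtained.

```python
import json
import time
from pathlib import Path

def append_run_record(log_path, git_sha, config):
    """Append one JSON line describing a run (hypothetical record layout)."""
    record = {"started_at": time.time(), "git_sha": git_sha, "config": config}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record

def runs_for_commit(log_path, git_sha):
    """Return every logged record whose git_sha matches exactly."""
    with Path(log_path).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if r["git_sha"] == git_sha]
```

Filtering by commit then plays the same role as the tracker's filterable column: runs_for_commit("runs.jsonl", sha) returns the runs produced by one exact code state.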

Pattern 2: Detect a Dirty Working Tree

A commit hash only reproduces the run if the working tree matched the commit at run start. If you trained while uncommitted edits were sitting in your editor, the SHA is a lie. Use git describe --dirty (or test git status --porcelain for output) to make the lie visible.

import subprocess
import warnings

import wandb

def git_describe_dirty() -> str:
    """Return e.g. 'v1.0.2-3-gabc123' or 'abc123-dirty' if uncommitted edits exist."""
    return subprocess.check_output(
        ["git", "describe", "--always", "--dirty", "--tags"],
    ).strip().decode()

def ensure_clean_or_warn(strict=False):
    """Raise (strict) or warn if `git status --porcelain` reports any changes."""
    # rstrip() only: a leading space is part of the first line's status code.
    dirty = subprocess.check_output(["git", "status", "--porcelain"]).decode().rstrip()
    if dirty:
        msg = f"Working tree has uncommitted changes:\n{dirty}"
        if strict:
            raise RuntimeError(msg)
        warnings.warn(msg)

# Production training: fail fast if anything is uncommitted.
ensure_clean_or_warn(strict=True)
wandb.init(config={"git_describe": git_describe_dirty()})

Code Fragment 19.10.2: Detecting uncommitted edits at run start. strict=True turns a dirty tree into a hard failure (use for production runs); strict=False only warns (use for exploratory work). The git_describe output also embeds the nearest tag, which is convenient for human-readable run names.
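Because --dirty appends a literal -dirty suffix, the describe string can also be split into a boolean flag, which makes dirty runs easy to filter out later. The sketch below is one possible way to do that; the helper name describe_to_config and the git_ref and git_dirty keys are hypothetical and do not come from any tracker's API.

```python
def describe_to_config(describe: str) -> dict:
    """Split a `git describe --dirty` string into tracker config entries."""
    suffix = "-dirty"
    dirty = describe.endswith(suffix)
    # Drop the suffix so git_ref is something `git checkout` can resolve.
    clean = describe[: -len(suffix)] if dirty else describe
    return {"git_describe": describe, "git_ref": clean, "git_dirty": dirty}
```

Note that the two checks are not identical: git describe --dirty reacts to modified tracked files, while git status --porcelain also lists untracked files, so a run can pass the first check and still trip the second.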

Pattern 3: Git Tag per Milestone Run

For baseline runs, paper-result runs, or production-promoted checkpoints, lift the linkage from "buried in a config dict" to a first-class Git object: tag the commit with the run ID. This makes the inverse lookup trivial (git tag --list 'run/*' lists every run that was tagged, and git log -1 run/<id> shows the commit behind it). DVC's dvc exp save command provides a similar workflow built on Git refs.

import subprocess
import wandb

run = wandb.init(project="llm-finetuning", name="lora-r16-baseline")

# ... training loop ...

# On completion: tag the commit with the run ID for permanent linkage.
tag = f"run/{run.id}"
msg = f"W&B run {run.name} ({run.url})"
subprocess.run(["git", "tag", "-a", tag, "-m", msg], check=True)
subprocess.run(["git", "push", "origin", tag], check=True)

wandb.log({"git_tag": tag})

Code Fragment 19.10.3: Tagging the commit with the W&B run ID at training completion. Once the tag is pushed, git checkout run/<id> recovers the code state for that run, provided HEAD did not move during training and the working tree was clean. Use sparingly: one tag per milestone run, not one per exploratory sweep.
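The inverse lookup, from tags back to run IDs, can be sketched as follows. This assumes the run/<id> naming scheme from Code Fragment 19.10.3; the names TAG_PREFIX, run_id_from_tag, and tagged_run_ids are illustrative, and the only Git command used is git tag --list with a glob pattern.

```python
import subprocess

TAG_PREFIX = "run/"  # naming scheme assumed from the fragment above

def run_id_from_tag(tag: str) -> str:
    """Strip the 'run/' prefix; reject tags that are not run tags."""
    if not tag.startswith(TAG_PREFIX):
        raise ValueError(f"not a run tag: {tag!r}")
    return tag[len(TAG_PREFIX):]

def tagged_run_ids(repo_dir: str = ".") -> list[str]:
    """List run IDs for every run/* tag in the repository at repo_dir."""
    out = subprocess.check_output(
        ["git", "tag", "--list", f"{TAG_PREFIX}*"], cwd=repo_dir, text=True
    )
    return [run_id_from_tag(line) for line in out.splitlines() if line]
```

Each returned ID can then be looked up in the tracker, so the repository itself serves as an index of milestone runs.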

Putting the Three Patterns Together

A complete experiment-launch helper combines all three: it refuses to start a strict run on a dirty tree, stamps every run with the SHA, and adds a Git tag once training completes successfully. A single ~30-line module then becomes the only entrypoint your training scripts call into.
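One possible shape for such a module is sketched below as a tracker-agnostic context manager. It is an illustration under stated assumptions, not a definitive implementation: the names linked_run and _git, the metadata keys, and the injectable git parameter (there only so the helper can be exercised without a real repository) are all hypothetical. Unlike Code Fragment 19.10.3, this sketch tags the SHA captured at run start instead of whatever HEAD is at completion, skips tagging when the tree was dirty, and leaves pushing the tag to the caller.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator

def _git(*args: str) -> str:
    """Run a git command and return its stripped stdout."""
    return subprocess.check_output(["git", *args], text=True).strip()

@contextmanager
def linked_run(run_id: str, strict: bool = False, tag: bool = False,
               git: Callable[..., str] = _git) -> Iterator[dict]:
    """Yield git metadata for a run; optionally tag the commit on success."""
    # Pattern 2: detect a dirty tree before anything else happens.
    dirty = bool(git("status", "--porcelain"))
    if dirty and strict:
        raise RuntimeError("Working tree has uncommitted changes")

    # Pattern 1: capture the commit hash at run start.
    sha = git("rev-parse", "HEAD")
    meta = {"git_sha": sha, "git_dirty": dirty}

    # The caller passes `meta` to its tracker and trains inside the with-block.
    yield meta

    # Pattern 3: reached only if the with-block finished without raising.
    if tag and not dirty:
        name = f"run/{run_id}"
        git("tag", "-a", "-m", f"run {run_id}", name, sha)
        meta["git_tag"] = name
```

A training script would wrap its loop in with linked_run(run_id, strict=True, tag=True) as meta: and pass meta to whichever tracker it uses, for example as the config argument of wandb.init.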

Key Insight

Trackers store the results; Git stores the code. Linking the two through git_sha in run config is the cheapest, most durable reproducibility primitive in ML. Every other reproducibility practice (pinned deps, seeds, hardware capture, dataset checksums) layers on top of this base.

Note: For the Tracker APIs Themselves

Once the git-linkage layer is in place, you still need to learn how to run sweeps, query runs programmatically, register models, and build evaluation dashboards. All of that is in Experiment Tracking: K.1 covers W&B (runs, logging, sweeps, artifacts), K.2 covers MLflow (tracking, model registry), K.3 covers comparison and HPO, K.4 covers the model registry and deployment workflows, and K.5 covers LLM-specific observability.

What's Next?

This chapter completes the current part. The next part, Part V: Multimodal LLMs, opens a new arc; see the part index for chapter ordering.

Further Reading

Reproducibility and MLOps

Weights & Biases (2024). "W&B Git Integration." docs.wandb.ai/guides/runs/git-integration. Reference for linking W&B runs to git commits.

DVC (2024). "Data Version Control Documentation." dvc.org/doc. Reference for git-based ML data and pipeline versioning.