Section 19.12: MLflow Deep Dive

Big Picture

MLflow is an open-source platform for managing the complete ML lifecycle: experiment tracking, reproducible packaging, model versioning, and deployment. Unlike W&B, which is primarily offered as a hosted service, MLflow can run entirely on your own infrastructure, making it popular in regulated industries and organizations with strict data governance requirements. This section covers MLflow's tracking API, project packaging, and the model registry that manages model versions from development through production.

1. Installation and Setup

MLflow installs as a Python package and includes a local tracking server with a web UI. For team collaboration, you can deploy a remote tracking server backed by a database and artifact store.

# Install MLflow (run in a terminal):
#   pip install mlflow
# Start the local tracking UI on port 5000 (run in a terminal):
#   mlflow ui --host 0.0.0.0 --port 5000

# In Python, point the client at the tracking server
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
# Or use a remote server instead:
# mlflow.set_tracking_uri("https://mlflow.mycompany.com")

Code Fragment 19.12.1: Installing MLflow, starting the local tracking UI, and pointing the Python client at a local or remote tracking server.

The tracking URI tells the MLflow client where to send data. For local development, the default file-based store (mlruns/ directory) works well. For team use, deploy a tracking server backed by PostgreSQL or MySQL for metadata and S3, Azure Blob, or GCS for artifacts.

2. Experiments and Runs

MLflow organizes work into experiments (groups of related runs) and runs (individual executions). Create an experiment for each project or research question.

import mlflow

# Create or get an experiment
mlflow.set_experiment("llm-fine-tuning")

# Start a run
with mlflow.start_run(run_name="gpt2-lora-baseline") as run:
    # Log parameters (hyperparameters, configuration)
    mlflow.log_param("model", "gpt2")
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_param("batch_size", 16)
    mlflow.log_param("lora_rank", 8)
    mlflow.log_param("dataset", "alpaca-52k")

    # Log metrics (can be called multiple times for time series).
    # model, dataloader, val_dataloader, train_one_epoch, and evaluate
    # are assumed to be defined by your training code.
    val_losses = []
    for epoch in range(3):
        train_loss = train_one_epoch(model, dataloader)
        val_loss = evaluate(model, val_dataloader)
        val_losses.append(val_loss)

        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)

    # Log final metrics
    mlflow.log_metric("best_val_loss", min(val_losses))

    print(f"Run ID: {run.info.run_id}")

Code Fragment 19.12.2: Creating an experiment, starting a named run, and logging hyperparameters once and loss metrics per epoch.

The with block ensures the run is closed properly even if an exception occurs. You can also use mlflow.start_run() and mlflow.end_run() explicitly, but the context manager pattern is safer.

Key Insight

MLflow distinguishes between parameters (set once per run, immutable) and metrics (logged repeatedly with a step counter). Log hyperparameters as parameters and training/evaluation scores as metrics. This distinction enables proper filtering and comparison in the UI.

3. Logging Artifacts

Artifacts are files associated with a run: model checkpoints, datasets, plots, configuration files, or any other output you want to preserve.

import mlflow

with mlflow.start_run():
    # Log a single file
    mlflow.log_artifact("config.yaml")

    # Log all files in a directory
    mlflow.log_artifacts("./outputs/plots/", artifact_path="plots")

    # Log a model checkpoint under the "model" artifact directory
    mlflow.log_artifact(
        "checkpoints/best_model.pt",
        artifact_path="model",
    )

    # Write predictions to a text file, then log it once the file is closed
    # (predictions is assumed to be a list of (prompt, output) pairs)
    with open("predictions.txt", "w") as f:
        for prompt, output in predictions:
            f.write(f"Prompt: {prompt}\nOutput: {output}\n\n")
    mlflow.log_artifact("predictions.txt")

Code Fragment 19.12.3: Logging a single file, a directory of plots, a model checkpoint, and a generated predictions file as run artifacts.

Artifacts are stored in the configured artifact store (local filesystem, S3, Azure Blob, or GCS). Each run gets its own artifact directory, so there is no risk of overwriting artifacts from other runs.

4. Autologging

MLflow provides autologging integrations for popular frameworks. Enable autologging and MLflow captures parameters, metrics, and models automatically without manual log_param calls.

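import mlflow

# Autolog for scikit-learn: captures estimator parameters, training metrics,
# and the fitted model whenever fit() is called
mlflow.sklearn.autolog()

# Hugging Face Trainer reports to MLflow through its own callback, enabled
# with report_to="mlflow". (mlflow.transformers.autolog() exists, but it is
# documented as mainly suppressing spurious logging of sub-models rather
# than instrumenting training.)
# model, train_dataset, and eval_dataset are assumed to be defined elsewhere.
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./results",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        report_to="mlflow",
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Training arguments and metrics are logged to MLflow automatically
trainer.train()

Code Fragment 19.12.4: Enabling autologging for scikit-learn and reporting a Hugging Face Trainer run to MLflow without manual logging calls.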

Tip

Use autologging for exploratory work and manual logging for production pipelines. Autologging captures everything, which is convenient but can produce noisy experiment records. For production, explicitly log only the parameters and metrics you need for decision-making.

5. MLflow Projects: Reproducible Packaging (Configuration View)

An MLflow Project is a self-contained package that declares dependencies, entry points, and parameters in an MLproject YAML file. The Project format is a configuration-management abstraction: it captures what to run and with what environment, independent of where the code lives. Inside an MLflow Project, the experiment-tracking code (params, metrics, artifacts) is the same as in any other run.

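# MLproject file (lives at repo root, alongside conda.yaml)
name: llm-fine-tuning
conda_env: conda.yaml

entry_points:
  train:
    parameters:
      # Documented parameter types are string, float, path, and uri,
      # so integer-valued parameters are declared as float here
      learning_rate: {type: float,  default: 2e-5}
      batch_size:    {type: float,  default: 16}
      epochs:        {type: float,  default: 3}
      model_name:    {type: string, default: "gpt2"}
    # Every declared parameter is passed through to the script
    command: "python train.py --lr {learning_rate} --bs {batch_size} --epochs {epochs} --model {model_name}"

  evaluate:
    parameters:
      model_path: {type: string}
    command: "python evaluate.py --model {model_path}"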

Code Fragment 19.12.5: An MLproject file declaring two entry points (train and evaluate), each with typed parameters and a templated command. The conda_env key points at a sibling file that pins the runtime environment.

The Project format ensures that every run gets the same dependencies regardless of who launches it. Use it when you want to formalize "this is the canonical way to train our model" and stop relying on tribal knowledge or undocumented launcher scripts.
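The way an entry point turns declared parameters into a concrete command can be sketched in a few lines of plain Python. This is an illustration of the idea only, not MLflow's implementation; the ENTRY_POINT dictionary and the render_command helper are hypothetical names introduced for this example.

```python
# Illustrative only: mimics how an entry point's templated command is
# filled in from defaults and user overrides. Not MLflow's own code.

ENTRY_POINT = {
    "parameters": {
        "learning_rate": {"default": "2e-5"},
        "batch_size": {"default": "16"},
    },
    "command": "python train.py --lr {learning_rate} --bs {batch_size}",
}


def render_command(entry_point: dict, overrides: dict) -> str:
    """Merge overrides with defaults and substitute them into the command."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in overrides:
            values[name] = overrides[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            # A parameter without a default must be supplied by the caller
            raise ValueError(f"Missing required parameter: {name}")
    return entry_point["command"].format(**values)


command = render_command(ENTRY_POINT, {"learning_rate": "1e-4"})
print(command)  # python train.py --lr 1e-4 --bs 16
```

On the command line, the equivalent override is passed with the -P flag, for example mlflow run . -e train -P learning_rate=1e-4.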

Key Insight: Running an MLproject from a Git Repository

MLflow Projects can also be invoked directly from a Git URL (mlflow run https://github.com/...), but that is a version-control invocation pattern, not an experiment-tracking concern. See Section 5.2 (Libraries & Frameworks) for the Git-driven invocation recipe and the CI/CD wiring that goes with it.

6. The Model Registry

The model registry provides a centralized store for model versions, with stage transitions (Staging, Production, Archived) and access controls.

import mlflow
from mlflow import MlflowClient

# Log a model and register it in one step
# (model and tokenizer are assumed to come from your training code)
with mlflow.start_run():
    # Train your model...
    mlflow.log_metric("val_accuracy", 0.92)
    # Register the model; each call creates a new version under this name
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        registered_model_name="gpt2-lora-qa",
    )

# Transition a model version to production
# (version 3 is an example; use the version number you want to promote)
client = MlflowClient()
client.transition_model_version_stage(
    name="gpt2-lora-qa",
    version=3,
    stage="Production",
)

# Load the production model
model_uri = "models:/gpt2-lora-qa/Production"
loaded_model = mlflow.transformers.load_model(model_uri)

Code Fragment 19.12.6: Registering a logged model, promoting a specific version to the Production stage, and loading the production model by URI.

Warning

The model registry's stage transitions (Staging, Production, Archived) are deprecated in newer MLflow versions (as of MLflow 2.9) in favor of model version aliases. Aliases are more flexible: you can define custom names like "champion" and "challenger" rather than being limited to fixed stages. Check your MLflow version and use the recommended API.

7. Querying Runs Programmatically

The MLflow client API lets you search and compare runs programmatically, which is useful for automated model selection and reporting.

from mlflow import MlflowClient

client = MlflowClient()

# Search for runs matching criteria
# (experiment ID "1" is an example; look yours up in the UI or via the client)
runs = client.search_runs(
    experiment_ids=["1"],
    filter_string="metrics.val_accuracy > 0.85 and params.model = 'gpt2'",
    order_by=["metrics.val_accuracy DESC"],
    max_results=10,
)

# Print results
for run in runs:
    print(
        f"Run {run.info.run_id[:8]}: "
        f"accuracy={run.data.metrics['val_accuracy']:.3f}, "
        f"lr={run.data.params['learning_rate']}"
    )

# Find the best run (results are sorted, so it is the first one, if any)
if runs:
    best_run = runs[0]
    print(f"Best run: {best_run.info.run_id}")
else:
    print("No runs matched the filter")

Code Fragment 19.12.7: Searching an experiment for runs that match a filter, sorting by validation accuracy, and selecting the best run.

The filter string syntax supports comparisons on parameters, metrics, and tags. This programmatic access is essential for building automated ML pipelines where model selection is a code-driven decision, not a manual one.

What's Next?

In the next section, Section 19.13: Experiment Comparison and Hyperparameter Optimization, we build on the material covered here.
