When you run dozens of training experiments with different hyperparameters, keeping track of what you tried, what worked, and what failed becomes critical. Experiment tracking tools log metrics, hyperparameters, and artifacts automatically.
Weights & Biases (W&B)
W&B is one of the most widely used experiment trackers in the LLM community, and it integrates directly with the Hugging Face Trainer. Code Fragment E.3.1 below puts this into practice.
# Experiment tracking setup for W&B
# Key operations: run initialization, automatic metric logging via the Trainer
import wandb
from transformers import TrainingArguments, Trainer

# Initialize a W&B run; the project groups related experiments
wandb.init(project="llm-finetuning", name="lora-r16-lr2e4")

training_args = TrainingArguments(
    output_dir="./output",
    report_to="wandb",  # enables automatic logging to W&B
    logging_steps=10,   # log metrics every 10 optimizer steps
    num_train_epochs=3,
    learning_rate=2e-4,
)

# The Trainer automatically logs loss, learning rate, etc. to W&B
trainer = Trainer(
    model=model,               # model and datasets are defined earlier
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
)
trainer.train()
wandb.finish()  # mark the run as complete
MLflow
MLflow is an open-source alternative that can be self-hosted, which makes it popular in enterprise settings where data must stay on-premises. Code Fragment E.3.2 below puts this into practice.
# Experiment tracking setup for MLflow
# Key operations: parameter logging, metric logging, artifact logging
import mlflow

mlflow.set_experiment("llm-finetuning")

with mlflow.start_run(run_name="lora-r16-lr2e4"):
    mlflow.log_params({
        "model": "llama-3.1-8b",
        "lora_r": 16,
        "learning_rate": 2e-4,
        "epochs": 3,
    })

    # ... training code ...

    mlflow.log_metrics({
        "eval_loss": 0.42,
        "eval_accuracy": 0.87,
    })

    # Log the trained adapter weights as an artifact
    mlflow.log_artifact("./output/adapter_model.safetensors")
| Feature | W&B | MLflow |
|---|---|---|
| Hosting | Cloud (free tier) or self-hosted | Self-hosted or Databricks |
| HF Trainer integration | Built-in | Via callback |
| Visualization | Excellent web dashboard | Good local UI |
| Team collaboration | Strong (shared workspaces) | Basic (model registry) |
| Cost | Free for individuals, paid for teams | Free and open source |
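The "via callback" entry in the table refers to the MLflow callback that ships with transformers: setting report_to="mlflow" in TrainingArguments activates it, so the Trainer from Code Fragment E.3.1 can report to MLflow with no other changes. A minimal configuration sketch (the run name here is an assumption matching the earlier examples):

```python
# Minimal sketch: the same Trainer setup, reporting to MLflow instead of W&B.
# transformers activates its built-in MLflow callback when report_to="mlflow".
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    report_to="mlflow",          # enables the MLflow callback
    run_name="lora-r16-lr2e4",   # used as the MLflow run name
    logging_steps=10,
)
```

By default the callback logs to a local `mlruns/` directory; point it at a self-hosted server by setting the MLFLOW_TRACKING_URI environment variable before training.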
Even if your team already uses MLflow in production, adding W&B for personal experiments can still be worthwhile. Most trackers can export runs as CSV or JSON, so migrating between platforms is straightforward.
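Migration is straightforward because a run, whatever the tracker, flattens to a dictionary of parameters and metrics. A minimal sketch of such an export using only the standard library; the run dictionaries and file names are hypothetical stand-ins for whatever your tracker's API returns:

```python
import csv
import json

# Hypothetical runs, as flattened from a tracker's export API:
# one dict per run, mixing hyperparameters and final metrics.
runs = [
    {"run": "lora-r16-lr2e4", "learning_rate": 2e-4, "eval_loss": 0.42},
    {"run": "lora-r8-lr1e4", "learning_rate": 1e-4, "eval_loss": 0.47},
]

# JSON preserves types and nests cleanly; good for re-import elsewhere.
with open("runs.json", "w") as f:
    json.dump(runs, f, indent=2)

# CSV gives one row per run; good for spreadsheets and quick diffs.
with open("runs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(runs[0].keys()))
    writer.writeheader()
    writer.writerows(runs)
```

The same shape works in reverse: read the CSV or JSON back and replay each row through the target tracker's parameter- and metric-logging calls.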