Section 5.2: Library Catalog

The library layer is where ideas become runnable code. Part I sits on four kinds of library: a deep-learning engine (PyTorch, with JAX as the secondary option), a numerical substrate (NumPy and SciPy), a classical-ML reference (scikit-learn), and a tokenizer layer that bridges raw text to integer IDs. Every model in the rest of the book is built from these primitives, so it pays to know which library owns which concern.

5.2.1 The deep-learning engine

PyTorch 2.x is the default. Versions 2.0 and later add torch.compile, which JIT-compiles a model graph into a sequence of faster fused kernels; a one-line change often yields a 30 to 60% speedup, although the gain depends on the model and the hardware. The 2025-26 default for inference is torch.compile(model, mode='reduce-overhead'), which reuses CUDA graphs across calls. PyTorch 2.5+ ships native FlashAttention through scaled_dot_product_attention, so attention kernels are no longer a separate install; the only reason to install flash-attn separately in 2026 is the FA3 backward kernel for training. For edge deployment, PyTorch 2.6+ torch.export produces a portable graph artifact that runs on mobile and browser runtimes.
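To make the scaled_dot_product_attention point concrete, the sketch below compares the built-in call against a hand-written causal attention on small random CPU tensors. The tensor sizes and the helper name naive_causal_attention are illustrative assumptions; which kernel PyTorch actually dispatches to depends on your hardware, dtype, and build.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)


def naive_causal_attention(q, k, v):
    """Reference attention written out by hand (hypothetical helper, for comparison only)."""
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    # Mask out future positions so token i only attends to tokens <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


# (batch, heads, sequence, head_dim): sizes chosen arbitrarily for the demo
q, k, v = (torch.randn(2, 4, 16, 32) for _ in range(3))

fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)
reference = naive_causal_attention(q, k, v)

print(fused.shape)
print(torch.allclose(fused, reference, atol=1e-4))
```

The two results should agree up to floating-point tolerance, which is why the built-in call can usually replace a hand-rolled attention block without changing model behaviour.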

JAX is the second engine you should know about, even if you do not use it in Part I. Research papers from Google DeepMind, Anthropic, and EleutherAI increasingly publish JAX implementations alongside PyTorch ones. The combination of jax.jit, jax.vmap, and jax.shard_map (the JAX 0.5+ replacement for the older pmap) makes parallelization more explicit than PyTorch and the resulting code is often easier to reason about for novel architectures. The trade-off is a smaller ecosystem and a steeper learning curve.

On NVIDIA H100 / H200 / B200 hardware, serious post-2024 pretraining stacks add transformer-engine (NVIDIA, 2023) for FP8 training, which roughly halves memory and compute relative to bf16 with negligible quality loss.

5.2.2 The numerical substrate

NumPy: the n-dimensional array type that the rest of the Python scientific stack agrees on. Every PyTorch tensor has a one-line conversion to NumPy and back. NumPy 2.x dropped legacy APIs and gained typing improvements, but the surface area you use daily is unchanged from 1.x.
SciPy: optimization, linear algebra, signal processing, statistics. You mostly reach for scipy.stats (sampling, distributions) and scipy.sparse (sparse matrices used for classical retrieval) in Part I.
pandas: tabular dataframes for files under roughly 1 GB. The bridge between raw CSV-style data and tensor batches. For larger-than-RAM tabular files, the 2024-26 right answer is polars (Vink, 2020+; Rust-backed dataframe with lazy evaluation that often runs 5-30x faster than pandas on the same workload).
matplotlib and seaborn: plotting. Use matplotlib for training curves and confusion matrices; seaborn for anything that benefits from grouped statistics.
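The one-line conversion between NumPy arrays and PyTorch tensors has one subtlety worth seeing once: on CPU, the conversion shares memory rather than copying. The following sketch, with arbitrary example values, shows both directions and the copying alternative.

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# NumPy -> PyTorch: from_numpy shares memory with the source array (CPU only).
t = torch.from_numpy(arr)
arr[0, 0] = 100.0  # the change is visible through the tensor
print(t[0, 0].item())

# PyTorch -> NumPy: .numpy() also returns a view for CPU tensors.
back = t.numpy()
print(np.shares_memory(arr, back))

# torch.tensor(...) copies instead, which is safer if the array is mutated later.
copy = torch.tensor(arr)
arr[0, 0] = -1.0
print(copy[0, 0].item())
```

Sharing memory makes the round trip cheap, but it also means an in-place edit on one side silently changes the other; copy explicitly when that would be a bug.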

5.2.3 Classical ML and tokenizers

scikit-learn: the canonical classical-ML library. Logistic regression, random forests, SVMs, k-means, PCA. Use it for every baseline; if a transformer cannot beat a TF-IDF + logistic regression on your task, the model is not the problem.
Hugging Face tokenizers: the Rust-backed tokenizer library. Trains a 50k-vocab BPE in seconds on a million documents; the pure-Python alternatives take hours. Covered in depth in Chapter 1.
Hugging Face datasets: streaming, memory-mapped dataset loader. Handles the awkward step of "this corpus is 200 GB and does not fit in RAM" without you writing custom loaders.
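The TF-IDF plus logistic regression baseline mentioned above takes only a few lines. The following is a minimal sketch on a made-up eight-sentence dataset (the texts, labels, and the variable name baseline are illustrative assumptions, not data from this book); on a real task you would substitute your own corpus and evaluate on a held-out split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment data invented for this sketch (1 = positive, 0 = negative).
train_texts = [
    "a great film with a wonderful cast",
    "loved every minute of this movie",
    "great story and great acting",
    "wonderful, moving, and beautifully shot",
    "a terrible film and a waste of time",
    "hated the plot and the acting",
    "boring, terrible, and far too long",
    "an awful script and a dull cast",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier chained into one estimator.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

test_texts = ["what a great movie", "a terrible waste of an evening"]
preds = baseline.predict(test_texts)
print(preds)
```

Because the whole baseline is a single pipeline object, the same fit and predict calls work unchanged when you later compare it against a transformer on identical splits.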

Key Insight

Mental model: layers of abstraction, layers of cost

NumPy is fast on CPU, slow off it. PyTorch is fast on GPU but slow on CPU for small tensors. scikit-learn is fast on any CPU dataset that fits in RAM; above that size, its behaviour is undefined. Tokenizers are fast in Rust, so never roll your own in Python. Each library is fastest in its native context; performance bugs are almost always the result of putting computation in the wrong layer (a NumPy operation inside a training loop, a Python tokenization step inside a DataLoader worker).

Real-World Scenario: The standard Part I import block

Open every notebook in Part I with the same import header (Code Fragment 5.2.1 below), so the patterns become muscle memory. By the time you reach Chapter 12, this same header has grown a from transformers import AutoModel, AutoTokenizer line, but nothing in it has been replaced.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

Code Fragment 5.2.1: The standard Part I import header, shared by every notebook so the patterns become muscle memory.

5.2.4 Versions and compatibility

Pin your PyTorch version per-project: torch==2.5.* in a requirements file is enough to avoid the worst version-mismatch surprises (especially with flash-attn and bitsandbytes in later parts). NumPy 2.x is the current line but some research code still expects NumPy 1.26.

Warning: Don't use NumPy 2 with older HF datasets

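If you see AttributeError: `np.float_` was removed in the NumPy 2.0 release (or a similar message for np.NaN), the cause is NumPy 2 against pre-2024 HF code (roughly 30% of older datasets and torchvision transform paths still expect numpy<2). The fix is uv pip install 'numpy<2' for that project. Pin it once and move on; chasing every shim wastes hours. The similar-looking errors module 'numpy' has no attribute 'float' or 'int' have a different cause: those aliases were removed in NumPy 1.24, so 'numpy<2' alone does not bring them back.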
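A quick environment check at the top of a notebook can surface a pin mismatch before it shows up as a confusing traceback. The sketch below uses only the standard library; the helper names (installed_version, matches_pin, is_numpy_2_or_later) are hypothetical, and the wildcard match is a simplification of real version-specifier semantics.

```python
from fnmatch import fnmatch
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pattern):
    """True if a version string matches a wildcard pin such as '2.5.*'."""
    return fnmatch(version, pattern)


def is_numpy_2_or_later(version):
    """True if the major component of a NumPy version string is >= 2."""
    return int(version.split(".")[0]) >= 2


np_version = installed_version("numpy")
if np_version is None:
    print("numpy is not installed in this environment")
elif is_numpy_2_or_later(np_version):
    print(f"numpy {np_version}: pin 'numpy<2' if legacy code fails on removed aliases")
else:
    print(f"numpy {np_version}: compatible with code that expects numpy<2")
```

For anything beyond a sanity check, a proper specifier parser such as the one in the packaging library is the more robust choice.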

Hugging Face Hub: Sharing, Versioning, and Spaces

Big Picture

The Hugging Face Hub is a platform for sharing, discovering, and collaborating on machine learning models, datasets, and applications. It hosts over 800,000 models and 200,000 datasets as of early 2026, making it the largest open ML repository in the world. Beyond storage, the Hub provides Git-based versioning, model cards for documentation, gated access for sensitive models, and Spaces for deploying interactive demos. This section covers how to use the Hub programmatically and through the web interface, from pushing your first model to deploying a production Gradio application.

1. The Hub API: Programmatic Access

The huggingface_hub Python library provides a complete API for interacting with the Hub. You can search for models, download files, upload artifacts, manage repositories, and authenticate, all from Python code or the command line.

The following example demonstrates authentication, searching for models, and downloading specific files.

from huggingface_hub import (
    login,
    HfApi,
    hf_hub_download,
)

# Authenticate (stores token in ~/.cache/huggingface/token)
login(token="hf_YOUR_TOKEN_HERE")
# Or use the CLI: huggingface-cli login

# Initialize the API client
api = HfApi()

# Search for popular text-generation models
models = api.list_models(
    task="text-generation",
    sort="downloads",
    direction=-1,
    limit=5,
)
for m in models:
    print(f"  {m.id:<40} downloads: {m.downloads:,}")

# Download a specific file from a model repo
filepath = hf_hub_download(
    repo_id="mistralai/Mistral-7B-v0.3",
    filename="config.json",
)
print(f"Downloaded to: {filepath}")

Output:
  meta-llama/Llama-3.1-8B-Instruct         downloads: 12,453,201
  mistralai/Mistral-7B-Instruct-v0.3       downloads: 8,921,044
  google/gemma-2-9b-it                     downloads: 6,104,558
  Qwen/Qwen2.5-7B-Instruct                 downloads: 5,832,117
  microsoft/Phi-3-mini-4k-instruct         downloads: 4,219,803
Downloaded to: /home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.3/snapshots/abc123/config.json

Code Fragment 5.2.2: Searching and downloading from the Hub with huggingface_hub. The list_models() API filters by task, library, and sort order, while hf_hub_download() fetches individual files to the local cache without downloading the full repository.

2. Pushing Models and Datasets to the Hub

Every Hugging Face model or dataset object has a push_to_hub() method that creates a repository (if needed) and uploads the artifacts. Under the hood, the Hub uses Git and Git LFS (Large File Storage) for versioning, so every push creates a commit with a full history.

The following example trains a small model and pushes both the model and tokenizer to the Hub.

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Train a model (abbreviated; see Section C.3 for full example)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Push model and tokenizer to the Hub
repo_name = "my-username/distilbert-sst2"
model.push_to_hub(repo_name, commit_message="Upload fine-tuned model")
tokenizer.push_to_hub(repo_name, commit_message="Upload tokenizer")

# Push a dataset
dataset = load_dataset("imdb", split="train[:1000]")
dataset.push_to_hub("my-username/imdb-subset")

Code Fragment 5.2.3: Pushing models, tokenizers, and datasets to the Hub with push_to_hub(). Each call creates a Git commit in the Hub repository, providing version history and rollback capability for all artifacts.

You can also use the Trainer directly with Hub integration by setting push_to_hub=True in TrainingArguments. With hub_strategy="end", the upload happens when Trainer.save_model() is called (or when you call trainer.push_to_hub() yourself) rather than at every checkpoint.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    push_to_hub=True,
    hub_model_id="my-username/distilbert-sst2",
    hub_strategy="end",  # Push at the end of training
    # Other options: "every_save", "checkpoint", "all_checkpoints"
)

Code Fragment 5.2.4: Enabling automatic Hub uploads via TrainingArguments. The hub_strategy="end" pushes only the final checkpoint; alternatives include pushing at every save or pushing all checkpoints for full experiment reproducibility.

3. Model Cards: Documentation and Transparency

Every Hub repository includes a README.md file rendered as the model card. Model cards document a model's intended use, training data, evaluation results, limitations, and biases. Well-written model cards are essential for responsible AI deployment and are increasingly required by organizational policies and regulations.

The huggingface_hub library provides utilities for creating model cards programmatically.

from huggingface_hub import EvalResult, ModelCard, ModelCardData

card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    library_name="transformers",
    tags=["text-classification", "sentiment-analysis"],
    datasets=["glue"],
    metrics=["accuracy", "f1"],
    model_name="DistilBERT SST-2",
    # eval_results takes EvalResult objects, one per reported metric
    eval_results=[
        EvalResult(
            task_type="text-classification",
            task_name="Sentiment Analysis",
            dataset_type="glue",
            dataset_name="SST-2",
            metric_type="accuracy",
            metric_value=0.912,
        ),
        EvalResult(
            task_type="text-classification",
            task_name="Sentiment Analysis",
            dataset_type="glue",
            dataset_name="SST-2",
            metric_type="f1",
            metric_value=0.910,
        ),
    ],
)

card = ModelCard.from_template(
    card_data,
    model_id="my-username/distilbert-sst2",
    model_description=(
        "A DistilBERT model fine-tuned on SST-2 for binary sentiment classification. "
        "Trained for 3 epochs with a learning rate of 2e-5."
    ),
    training_procedure="Fine-tuned using HuggingFace Trainer with AdamW optimizer.",
    limitations=(
        "This model was trained on English movie reviews and may not generalize "
        "well to other domains or languages."
    ),
)

# Save locally or push to Hub
card.save("README.md")
card.push_to_hub("my-username/distilbert-sst2")

Code Fragment 5.2.5: Generating a model card programmatically with ModelCard.from_template(). The ModelCardData metadata (language, license, tags, eval results) is indexed by the Hub for search and leaderboard display. The template fills in standard sections for description, training, and limitations.

Note: Model Card Metadata and Discoverability

The YAML metadata at the top of a model card (the ModelCardData) is indexed by the Hub for search and filtering. Including accurate tags, datasets, metrics, and evaluation results makes your model discoverable and allows the Hub to display evaluation leaderboards, task badges, and compatibility information automatically. Models with complete metadata receive significantly more downloads.
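To see what "the YAML metadata at the top of a model card" looks like on disk, the sketch below renders a front-matter block by hand from a plain dictionary. The helper render_front_matter is hypothetical and handles only scalars and flat lists; in practice ModelCardData generates this block for you, so treat this purely as an illustration of the file layout.

```python
def render_front_matter(metadata):
    """Render a flat dict of strings and string lists as a YAML front-matter block.

    Hypothetical helper for illustration; it handles only scalars and flat lists.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


metadata = {
    "language": "en",
    "license": "apache-2.0",
    "library_name": "transformers",
    "tags": ["text-classification", "sentiment-analysis"],
    "datasets": ["glue"],
    "metrics": ["accuracy", "f1"],
}

# The front matter comes first; the human-readable Markdown card follows it.
readme = render_front_matter(metadata) + "\n\n# DistilBERT SST-2\n\nModel description goes here.\n"
print(readme)
```

Everything between the two --- lines is what the Hub indexes; everything after it is rendered as the visible model card.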

4. Repository Management and Versioning

Hub repositories are Git repositories with LFS support. You can create, clone, branch, and manage them using either the Python API or the Git CLI. This makes it straightforward to version models, roll back to previous checkpoints, and collaborate with teams.

The following example demonstrates repository management operations.

from huggingface_hub import HfApi, create_repo, upload_folder

api = HfApi()

# Create a new model repository
create_repo(
    repo_id="my-username/new-model",
    repo_type="model",       # "model", "dataset", or "space"
    private=False,
    exist_ok=True,           # Don't error if it already exists
)

# Upload an entire directory
upload_folder(
    repo_id="my-username/new-model",
    folder_path="./my-model-files",
    commit_message="Upload model v2 with improved accuracy",
)

# List all commits (version history)
commits = api.list_repo_commits("my-username/new-model")
for commit in commits[:5]:
    print(f"  {commit.commit_id[:8]}  {commit.created_at}  {commit.title}")

Output:
  a1b2c3d4  2026-01-15 10:32:00  Upload model v2 with improved accuracy
  e5f6a7b8  2026-01-10 14:18:00  Initial model upload
  c9d0e1f2  2026-01-10 14:15:00  Initial commit

Code Fragment 5.2.6: Hub repository management: create_repo() initializes a versioned repo, upload_folder() pushes a directory as a single commit, and list_repo_commits() retrieves the full version history for auditing and rollback.

# Download a specific revision (commit hash or branch)
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="my-username/new-model",
    revision="abc1234",      # Specific commit hash
    local_dir="./model-v1",
)

Code Fragment 5.2.7: Downloading a specific model revision with snapshot_download(). The revision parameter accepts commit hashes or branch names, enabling reproducible model loading from any point in the repository history.

5. Gated Models and Access Control

Some models on the Hub require users to accept license terms or provide information before downloading. These "gated" models use the Hub's access request system. Model authors can configure gating through the repository settings, requiring users to agree to terms, provide their intended use case, or be individually approved.

To use gated models programmatically, you must be authenticated and have been granted access.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated models require authentication and accepted terms
# First: visit the model page and accept the license
# Then: authenticate with your token

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",          # Gated model
    token="hf_YOUR_TOKEN_HERE",          # Or set HF_TOKEN env var
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    token="hf_YOUR_TOKEN_HERE",
)

Code Fragment 5.2.8: Accessing gated models like Llama-3.1 requires authentication via a Hub token and prior acceptance of the model's license terms on its Hub page. The token parameter can be replaced by the HF_TOKEN environment variable for cleaner code.

Tip: Setting Tokens via Environment Variables

Instead of passing token= to every function call, set the HF_TOKEN environment variable or run huggingface-cli login once. The library checks for tokens in this order: explicit token parameter, HF_TOKEN environment variable, cached token from huggingface-cli login. For CI/CD pipelines, use the environment variable approach.
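The lookup order described above can be pictured as a short fall-through function. This is an illustrative sketch only, not the huggingface_hub implementation: the function name resolve_token, its parameters, and the fake token values are all assumptions made for the demo, and a temporary file stands in for the cached login token.

```python
import os
import tempfile
from pathlib import Path


def resolve_token(explicit=None, env=None, token_file=None):
    """Return the first available token: explicit > HF_TOKEN > cached file.

    Illustrative sketch only, not the huggingface_hub implementation.
    """
    if explicit:
        return explicit
    env = os.environ if env is None else env
    if env.get("HF_TOKEN"):
        return env["HF_TOKEN"]
    if token_file is not None and Path(token_file).is_file():
        cached = Path(token_file).read_text().strip()
        return cached or None
    return None


# Demo with fake values; a temporary file stands in for the cached login token.
with tempfile.TemporaryDirectory() as tmp:
    cached_path = Path(tmp) / "token"
    cached_path.write_text("hf_cached\n")

    from_explicit = resolve_token("hf_explicit", {"HF_TOKEN": "hf_env"}, cached_path)
    from_env = resolve_token(None, {"HF_TOKEN": "hf_env"}, cached_path)
    from_cache = resolve_token(None, {}, cached_path)

print(from_explicit, from_env, from_cache)
```

The practical consequence is that a stale token= argument left in a script overrides whatever the environment provides, which is a common source of confusing 401 errors in CI.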

6. Hugging Face Spaces: Deploying Interactive Demos

Spaces are the Hub's hosting platform for interactive ML applications. You can deploy applications built with Gradio, Streamlit, or static HTML directly from a Git repository. Spaces run on free CPU instances by default, with optional GPU upgrades for inference-heavy applications. They provide an excellent way to share models with non-technical stakeholders or create public demos.

The following example creates a Gradio-based Space for text generation.

# File: app.py (this file goes in your Space repository)
import gradio as gr
from transformers import pipeline

# Load model (cached after first run)
generator = pipeline(
    "text-generation",
    model="gpt2",
    device=-1,   # CPU (use 0 for GPU Spaces)
)

def generate_text(prompt, max_tokens, temperature):
    result = generator(
        prompt,
        max_new_tokens=int(max_tokens),
        temperature=float(temperature),
        do_sample=True,
        num_return_sequences=1,
    )
    return result[0]["generated_text"]

# Build the Gradio interface
demo = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Enter your prompt here..."),
        gr.Slider(10, 200, value=50, step=10, label="Max Tokens"),
        gr.Slider(0.1, 2.0, value=0.7, step=0.1, label="Temperature"),
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="GPT-2 Text Generator",
    description="Generate text using GPT-2. Adjust temperature for creativity.",
    examples=[
        ["The future of artificial intelligence is", 100, 0.7],
        ["Once upon a time in a land far away,", 150, 1.0],
    ],
)

demo.launch()

Code Fragment 5.2.9: A complete Gradio app for text generation. The gr.Interface connects input widgets (textbox, sliders) to the generation function, and examples provides clickable presets. Deploying this as a Space requires only pushing this file to a Hub Space repository.

To deploy this as a Space, create a new Space on the Hub and push the app.py file along with a requirements.txt.

# Create and deploy a Space from Python with huggingface_hub
from huggingface_hub import create_repo, upload_file

# Create a Gradio Space
create_repo(
    repo_id="my-username/gpt2-demo",
    repo_type="space",
    space_sdk="gradio",        # "gradio", "streamlit", or "static"
    space_hardware="cpu-basic", # Free tier; options include "t4-small", "a10g-small"
)

# Upload the app file
upload_file(
    path_or_fileobj="app.py",
    path_in_repo="app.py",
    repo_id="my-username/gpt2-demo",
    repo_type="space",
)

# Upload requirements
upload_file(
    path_or_fileobj="requirements.txt",
    path_in_repo="requirements.txt",
    repo_id="my-username/gpt2-demo",
    repo_type="space",
)
# The Space will build and deploy automatically

Code Fragment 5.2.10: Creating and deploying a Gradio Space programmatically. The space_sdk parameter selects the framework, space_hardware sets the compute tier, and uploaded files trigger an automatic build and deployment on the Hub.

Note: Organizations and Collaboration

Hub organizations let teams share models, datasets, and Spaces under a common namespace (e.g., my-org/model-name). Organization members can have different roles: read, write, or admin. This is useful for corporate teams, research labs, and open-source projects. Organizations can also set default licenses, require model cards, and enforce gating policies across all their repositories.

Feature	Free Tier	Pro / Enterprise
Public repositories	Unlimited	Unlimited
Private repositories	Unlimited	Unlimited
Spaces (CPU)	Free	Free
Spaces (GPU)	Not included	T4, A10G, A100 (paid per hour)
Inference API	Rate-limited	Dedicated endpoints
Storage (Git LFS)	Generous	Higher limits, persistent storage
Organization features	Basic	SSO, audit logs, resource groups

Figure 5.2.1: Hugging Face Hub feature comparison between free and paid tiers (as of early 2026).

Essential Python Libraries for LLM Work

The LLM ecosystem sits atop a small number of foundational libraries. Knowing what each one does (and does not do) will save you from confusion when you encounter them in code examples throughout this textbook.

Figure 5.2.2: The Python LLM library ecosystem, drawn as a three-layer stack with the GPU runtime on the side. Application orchestration sits atop the LLM glue layer (Hugging Face), which itself depends on numerical compute frameworks (PyTorch + NumPy/Pandas), all running on the CUDA/GPU runtime.

Hugging Face Transformers

The transformers library from Hugging Face is the Swiss Army knife of LLM work. It provides a unified API for loading, running, and fine-tuning thousands of pretrained models. Code Fragment 5.2.11 below puts this into practice.

# PyTorch implementation
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load a model and tokenizer with two lines
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Or use the high-level pipeline API, reusing the model and tokenizer loaded above
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator("Explain transformers in one sentence:", max_new_tokens=50)
print(result[0]["generated_text"])

Output: Explain transformers in one sentence: Transformers are a neural network architecture that uses self-attention to process sequential data in parallel, enabling state-of-the-art performance across NLP tasks.

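Code Fragment 5.2.11: Loading a causal language model with Hugging Face Transformers and running inference via the high-level pipeline API. Two lines load the model and tokenizer; the pipeline call generates text.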

# PyTorch implementation
import torch

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Basic tensor operations
x = torch.randn(3, 768)  # 3 vectors of dimension 768
y = torch.randn(768, 512) # weight matrix
z = x @ y                    # matrix multiplication, shape (3, 512)

# Move tensors to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)

Output:
CUDA available: True
Device count: 1
GPU name: NVIDIA A100-SXM4-80GB
GPU memory: 79.6 GB

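Code Fragment 5.2.12: Checking GPU availability with PyTorch, performing basic tensor operations, and moving tensors to the GPU. These are the building blocks that underlie every model forward pass.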

# NumPy computation
import numpy as np
import pandas as pd

# Preparing a fine-tuning dataset from a CSV
df = pd.read_csv("training_data.csv")
df = df.dropna(subset=["instruction", "response"])
df = df[df["response"].str.len() > 10]  # filter short responses

# Convert to the format expected by Hugging Face
dataset = df[["instruction", "response"]].to_dict(orient="records")
print(f"Training examples: {len(dataset)}")

Output: Training examples: 8923

Code Fragment 5.2.13: Preparing a fine-tuning dataset from a CSV file with pandas. Rows with missing fields or very short responses are dropped, and the remaining rows are converted to a list of records.

Key classes you will encounter:

AutoTokenizer: Loads the correct tokenizer for any model.
AutoModelForCausalLM: Loads decoder-only models (GPT, LLaMA, Mistral).
AutoModelForSeq2SeqLM: Loads encoder-decoder models (T5, BART).
Trainer and TrainingArguments: High-level fine-tuning API.
pipeline: Quick inference for common tasks (text generation, classification, summarization).

PyTorch

PyTorch is the tensor computation framework that powers nearly all modern LLM training and inference. The transformers library is built on top of PyTorch (or optionally JAX/TensorFlow, though PyTorch dominates in practice). Code Fragment 5.2.12 above puts this into practice.

NumPy and Pandas

numpy handles numerical arrays on CPU and is used for data preprocessing, metric computation, and quick prototyping. pandas manages tabular data and is indispensable for preparing fine-tuning datasets, analyzing evaluation results, and handling metadata. Code Fragment 5.2.13 above puts this into practice.

# pip install datasets
from datasets import load_dataset

ds = load_dataset("imdb", split="train[:100]")
print(ds[0]["text"][:200])
print(f"Features: {ds.features}")

Output:
I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. cu
Features: {'text': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=2, names=['neg', 'pos'], id=None)}

Code Fragment 5.2.14: Loading the first 100 training examples of the IMDB dataset with datasets, then inspecting a text sample and the feature schema.

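# pip install peft transformers
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model to adapt; any causal LM with q_proj and v_proj layers works
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()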

Code Fragment 5.2.15: Wrapping a model with a LoRA adapter using peft. LoraConfig sets the rank, scaling factor, and target projection layers; print_trainable_parameters() reports how many parameters remain trainable.

Additional Libraries

Table 5.2.1: Additional Libraries Comparison (as of 2026).

Library	Purpose	Chapters
`datasets`	Efficient dataset loading and processing (Hugging Face)	12, 13, 14
`peft`	Parameter-efficient fine-tuning (LoRA, QLoRA)	14
`trl`	Transformer Reinforcement Learning (SFT, DPO, Section 20.1)	16
`vllm`	High-throughput inference serving	8
`langchain`	LLM application framework (chains, agents, RAG)	19, 21
`sentence-transformers`	Embedding models for semantic search	18
`bitsandbytes`	Quantized model loading (4-bit, 8-bit)	8, 14
`wandb`	Experiment tracking and visualization	25

Real-World Scenario: datasets in Practice

Load and inspect a Hugging Face dataset with streaming for large corpora.

Real-World Scenario: peft in Practice

Apply a LoRA adapter to a pretrained model for parameter-efficient fine-tuning.
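To show what a LoRA adapter does to a single layer, here is a from-scratch sketch in plain PyTorch. It is not the peft implementation: the class name LoRALinear, the initialization scale, and the 768-wide layer are assumptions chosen for illustration, with r=16 and alpha=32 mirroring the LoraConfig values used in this section.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update (hypothetical helper)."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so the wrapped layer
        # initially computes exactly what the base layer computes.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + self.scale * update


torch.manual_seed(0)
base = nn.Linear(768, 768)
layer = LoRALinear(base, r=16, alpha=32)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")

x = torch.randn(4, 768)
same_at_init = torch.allclose(layer(x), base(x))
print(same_at_init)
```

Only the two small matrices receive gradients, which is where the parameter savings come from; because B starts at zero, fine-tuning begins from the unmodified pretrained behaviour.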

# Supervised fine-tuning with TRL's SFTTrainer.
# Handles chat-template formatting, packing, and the entire training loop.
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:5000]")

config = SFTConfig(
    output_dir="./sft-llama-3.2-1b",
    max_seq_length=2048,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # recent TRL releases name this argument processing_class
    args=config,
    train_dataset=dataset,  # the "messages" column is detected as chat-formatted data
)
trainer.train()

Code Fragment 5.2.16: Supervised fine-tuning with SFTTrainer: the SFTConfig controls sequence length and output directory, while the trainer handles tokenization, loss masking, and checkpointing internally.

Library Shortcut: trl in Practice

Set up supervised fine-tuning with the TRL library's SFTTrainer.

# pip install trl
from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(output_dir="./sft_output", max_seq_length=512)
trainer = SFTTrainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()

Code Fragment 5.2.17: A minimal SFTTrainer setup with TRL, assuming model and dataset are already defined as in Code Fragment 5.2.16.

Library Shortcut: sentence-transformers in Practice

Generate embeddings for semantic search using a sentence transformer model.

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(["How do transformers work?", "Attention is all you need."])
print(f"Embedding shape: {embeddings.shape}")

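Code Fragment 5.2.18: Generating dense vector embeddings with sentence-transformers. The encode() method returns a NumPy array where each row is a fixed-size embedding suitable for cosine similarity search.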
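Once embeddings are available as rows of a NumPy array, semantic search reduces to ranking rows by cosine similarity against a query vector. The sketch below uses made-up 4-dimensional vectors in place of real sentence embeddings, and the helper name cosine_top_k is hypothetical.

```python
import numpy as np


def cosine_top_k(query, corpus, k=2):
    """Return indices and scores of the k corpus rows most similar to the query."""
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = corpus @ query           # cosine similarity after normalization
    order = np.argsort(-scores)[:k]   # highest score first
    return order, scores[order]


# Made-up 4-dimensional vectors standing in for real sentence embeddings.
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])

indices, scores = cosine_top_k(query, corpus, k=2)
print(indices, np.round(scores, 3))
```

For a few thousand documents this brute-force matrix product is usually fast enough; dedicated vector indexes only become necessary at much larger scales.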

Key Insight: wandb in Practice

Experiment-tracking libraries (W&B, MLflow) get full treatment in Section 19.11 (Libraries & Frameworks). The wandb row in the table above is included only so you recognize the package when you see it imported; the canonical wandb.init() / wandb.log() / wandb.finish() recipe also lives there.

Library Shortcut: langchain in Practice

Build a simple LLM chain with LangChain for prompt templating and invocation.

# pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm
result = chain.invoke({"text": "LLMs use transformers to process text."})
print(result.content)

Code Fragment 5.2.19: Build a simple LLM chain with LangChain for prompt templating and invocation.

What's Next?

In the next section, Section 5.3: Scripting Patterns & Environment Setup, we move from cataloging libraries to using them: common LLM scripting patterns (device mapping, generation, batching), linking CUDA to PyTorch, installing key libraries, and verifying your setup.

Further Reading

Deep Learning Libraries

Paszke, A., Gross, S., Massa, F., et al. (2019). "PyTorch: An Imperative Style, High-Performance Deep Learning Library." NeurIPS 2019. arXiv:1912.01703. The reference PyTorch paper.

PyTorch (2024). "PyTorch Documentation." pytorch.org/docs/stable. Authoritative reference for tensor operations, autograd, and distributed primitives.

Bradbury, J., Frostig, R., Hawkins, P., et al. (2018). "JAX: composable transformations of Python+NumPy programs." github.com/jax-ml/jax. Reference for the JAX deep-learning engine.

Scientific Python

Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). "Array programming with NumPy." Nature 585. nature.com/articles/s41586-020-2649-2. The reference NumPy paper.

Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). "Scikit-learn: Machine Learning in Python." JMLR 12. jmlr.org/papers/v12/pedregosa11a. The reference scikit-learn paper.