Building Conversational AI with LLMs and Agents
Appendix K: HuggingFace: Transformers, Datasets, and Hub

The HuggingFace Hub: Sharing, Versioning, and Spaces

Big Picture

The HuggingFace Hub is a platform for sharing, discovering, and collaborating on machine learning models, datasets, and applications. It hosts over 800,000 models and 200,000 datasets as of early 2026, making it the largest open ML repository in the world. Beyond storage, the Hub provides Git-based versioning, model cards for documentation, gated access for sensitive models, and Spaces for deploying interactive demos. This section covers how to use the Hub programmatically and through the web interface, from pushing your first model to deploying a production Gradio application.

1. The Hub API: Programmatic Access

The huggingface_hub Python library provides a complete API for interacting with the Hub. You can search for models, download files, upload artifacts, manage repositories, and authenticate, all from Python code or the command line.

The following example demonstrates authentication, searching for models, and downloading specific files.

from huggingface_hub import (
    login,
    HfApi,
    hf_hub_download,
    list_models,
)

# Authenticate (stores token in ~/.cache/huggingface/token)
login(token="hf_YOUR_TOKEN_HERE")
# Or use the CLI: huggingface-cli login

# Initialize the API client
api = HfApi()

# Search for popular text-generation models
models = api.list_models(
    task="text-generation",
    sort="downloads",
    direction=-1,
    limit=5,
)
for m in models:
    print(f"  {m.id:<40} downloads: {m.downloads:,}")

# Download a specific file from a model repo
filepath = hf_hub_download(
    repo_id="mistralai/Mistral-7B-v0.3",
    filename="config.json",
)
print(f"Downloaded to: {filepath}")
  meta-llama/Llama-3.1-8B-Instruct         downloads: 12,453,201
  mistralai/Mistral-7B-Instruct-v0.3       downloads: 8,921,044
  google/gemma-2-9b-it                     downloads: 6,104,558
  Qwen/Qwen2.5-7B-Instruct                 downloads: 5,832,117
  microsoft/Phi-3-mini-4k-instruct         downloads: 4,219,803
Downloaded to: /home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.3/snapshots/abc123/config.json
Code Fragment 1: Searching and downloading from the Hub with huggingface_hub. The list_models() API filters by task, library, and sort order, while hf_hub_download() fetches individual files to the local cache without downloading the full repository.
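Under the hood, hf_hub_download fetches files from the Hub's "resolve" URLs. The hf_hub_url helper constructs such a URL without any network traffic, which can be handy for debugging cache behavior or building direct download links. A minimal sketch:

```python
from huggingface_hub import hf_hub_url

# Build the URL that hf_hub_download would fetch (no network call is made)
url = hf_hub_url(
    repo_id="mistralai/Mistral-7B-v0.3",
    filename="config.json",
)
print(url)
# https://huggingface.co/mistralai/Mistral-7B-v0.3/resolve/main/config.json
```

Passing revision= (a branch name or commit hash) changes the /resolve/main/ segment accordingly, mirroring the revision pinning discussed later in this section.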

2. Pushing Models and Datasets to the Hub

Every HuggingFace model or dataset object has a push_to_hub() method that creates a repository (if needed) and uploads the artifacts. Under the hood, the Hub uses Git and Git LFS (Large File Storage) for versioning, so every push creates a commit with a full history.

The following example trains a small model and pushes both the model and tokenizer to the Hub.

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Train a model (abbreviated; see Section K.3 for full example)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Push model and tokenizer to the Hub
repo_name = "my-username/distilbert-sst2"
model.push_to_hub(repo_name, commit_message="Upload fine-tuned model")
tokenizer.push_to_hub(repo_name, commit_message="Upload tokenizer")

# Push a dataset
dataset = load_dataset("imdb", split="train[:1000]")
dataset.push_to_hub("my-username/imdb-subset")
Code Fragment 2: Pushing models, tokenizers, and datasets to the Hub with push_to_hub(). Each call creates a Git commit in the Hub repository, providing version history and rollback capability for all artifacts.

You can also use the Trainer directly with Hub integration by setting push_to_hub=True in TrainingArguments. With hub_strategy="end", the Trainer pushes the final model and tokenizer when training completes.

training_args = TrainingArguments(
    output_dir="./results",
    push_to_hub=True,
    hub_model_id="my-username/distilbert-sst2",
    hub_strategy="end",       # Push at the end of training
    # Other options: "every_save", "checkpoint", "all_checkpoints"
)
Code Fragment 3: Enabling automatic Hub uploads via TrainingArguments. The hub_strategy="end" pushes only the final checkpoint; alternatives include pushing at every save or pushing all checkpoints for full experiment reproducibility.

3. Model Cards: Documentation and Transparency

Every Hub repository includes a README.md file rendered as the model card. Model cards document a model's intended use, training data, evaluation results, limitations, and biases. Well-written model cards are essential for responsible AI deployment and are increasingly required by organizational policies and regulations.

The huggingface_hub library provides utilities for creating model cards programmatically.

from huggingface_hub import ModelCard, ModelCardData, EvalResult

card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    library_name="transformers",
    tags=["text-classification", "sentiment-analysis"],
    datasets=["glue"],
    metrics=["accuracy", "f1"],
    model_name="DistilBERT SST-2",
    # eval_results expects EvalResult objects, one per metric
    eval_results=[
        EvalResult(
            task_type="text-classification",
            task_name="Sentiment Analysis",
            dataset_type="glue",
            dataset_name="SST-2",
            metric_type="accuracy",
            metric_value=0.912,
        ),
        EvalResult(
            task_type="text-classification",
            task_name="Sentiment Analysis",
            dataset_type="glue",
            dataset_name="SST-2",
            metric_type="f1",
            metric_value=0.910,
        ),
    ],
)

card = ModelCard.from_template(
    card_data,
    model_id="my-username/distilbert-sst2",
    model_description=(
        "A DistilBERT model fine-tuned on SST-2 for binary sentiment classification. "
        "Trained for 3 epochs with a learning rate of 2e-5."
    ),
    training_procedure="Fine-tuned using HuggingFace Trainer with AdamW optimizer.",
    limitations=(
        "This model was trained on English movie reviews and may not generalize "
        "well to other domains or languages."
    ),
)

# Save locally or push to Hub
card.save("README.md")
card.push_to_hub("my-username/distilbert-sst2")
Code Fragment 4: Generating a model card programmatically with ModelCard.from_template(). The ModelCardData metadata (language, license, tags, eval results) is indexed by the Hub for search and leaderboard display. The template fills in standard sections for description, training, and limitations.
Model Card Metadata and Discoverability

The YAML metadata at the top of a model card (the ModelCardData) is indexed by the Hub for search and filtering. Including accurate tags, datasets, metrics, and evaluation results makes your model discoverable and allows the Hub to display evaluation leaderboards, task badges, and compatibility information automatically. Models with complete metadata are easier to filter and compare, and in practice tend to attract more downloads.
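To see exactly what the Hub indexes, you can render the YAML front matter from a ModelCardData object locally. This sketch runs entirely offline:

```python
from huggingface_hub import ModelCardData

card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    library_name="transformers",
    tags=["text-classification", "sentiment-analysis"],
    datasets=["glue"],
)

# to_yaml() produces the metadata block that sits between the
# `---` delimiters at the top of the repository's README.md
print(card_data.to_yaml())
```

Inspecting this output before pushing is a quick way to verify that license, tags, and dataset links will be picked up by the Hub's search filters.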

4. Repository Management and Versioning

Hub repositories are Git repositories with LFS support. You can create, clone, branch, and manage them using either the Python API or the Git CLI. This makes it straightforward to version models, roll back to previous checkpoints, and collaborate with teams.

The following example demonstrates repository management operations.

from huggingface_hub import HfApi, create_repo, upload_folder

api = HfApi()

# Create a new model repository
create_repo(
    repo_id="my-username/new-model",
    repo_type="model",       # "model", "dataset", or "space"
    private=False,
    exist_ok=True,           # Don't error if it already exists
)

# Upload an entire directory
upload_folder(
    repo_id="my-username/new-model",
    folder_path="./my-model-files",
    commit_message="Upload model v2 with improved accuracy",
)

# List all commits (version history)
commits = api.list_repo_commits("my-username/new-model")
for commit in commits[:5]:
    print(f"  {commit.commit_id[:8]}  {commit.created_at}  {commit.title}")
Code Fragment 5: Hub repository management: create_repo() initializes a versioned repo, upload_folder() pushes a directory as a single commit, and list_repo_commits() retrieves the full version history for auditing and rollback.
  a1b2c3d4  2026-01-15 10:32:00  Upload model v2 with improved accuracy
  e5f6a7b8  2026-01-10 14:18:00  Initial model upload
  c9d0e1f2  2026-01-10 14:15:00  Initial commit
# Download a specific revision (commit hash or branch)
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="my-username/new-model",
    revision="abc1234",      # Specific commit hash
    local_dir="./model-v1",
)
Code Fragment 6: Downloading a specific model revision with snapshot_download(). The revision parameter accepts commit hashes or branch names, enabling reproducible model loading from any point in the repository history.

5. Gated Models and Access Control

Some models on the Hub require users to accept license terms or provide information before downloading. These "gated" models use the Hub's access request system. Model authors can configure gating through the repository settings, requiring users to agree to terms, provide their intended use case, or be individually approved.

To use gated models programmatically, you must be authenticated and have been granted access.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated models require authentication and accepted terms
# First: visit the model page and accept the license
# Then: authenticate with your token

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",          # Gated model
    token="hf_YOUR_TOKEN_HERE",          # Or set HF_TOKEN env var
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    token="hf_YOUR_TOKEN_HERE",
)
Code Fragment 7: Accessing gated models like Llama 3.1 requires authentication via a Hub token and prior acceptance of the model's license terms on its Hub page. The token parameter can be replaced by the HF_TOKEN environment variable for cleaner code.
Setting Tokens via Environment Variables

Instead of passing token= to every function call, set the HF_TOKEN environment variable or run huggingface-cli login once. The library checks for tokens in this order: explicit token parameter, HF_TOKEN environment variable, cached token from huggingface-cli login. For CI/CD pipelines, use the environment variable approach.

6. HuggingFace Spaces: Deploying Interactive Demos

Spaces are the Hub's hosting platform for interactive ML applications. You can deploy applications built with Gradio, Streamlit, or static HTML directly from a Git repository. Spaces run on free CPU instances by default, with optional GPU upgrades for inference-heavy applications. They provide an excellent way to share models with non-technical stakeholders or create public demos.

The following example creates a Gradio-based Space for text generation.

# File: app.py (this file goes in your Space repository)
import gradio as gr
from transformers import pipeline

# Load model (cached after first run)
generator = pipeline(
    "text-generation",
    model="gpt2",
    device=-1,   # CPU (use 0 for GPU Spaces)
)

def generate_text(prompt, max_tokens, temperature):
    result = generator(
        prompt,
        max_new_tokens=int(max_tokens),
        temperature=float(temperature),
        do_sample=True,
        num_return_sequences=1,
    )
    return result[0]["generated_text"]

# Build the Gradio interface
demo = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(label="Prompt", placeholder="Enter your prompt here..."),
        gr.Slider(10, 200, value=50, step=10, label="Max Tokens"),
        gr.Slider(0.1, 2.0, value=0.7, step=0.1, label="Temperature"),
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="GPT-2 Text Generator",
    description="Generate text using GPT-2. Adjust temperature for creativity.",
    examples=[
        ["The future of artificial intelligence is", 100, 0.7],
        ["Once upon a time in a land far away,", 150, 1.0],
    ],
)

demo.launch()
Code Fragment 8: A complete Gradio app for text generation. The gr.Interface connects input widgets (textbox, sliders) to the generation function, and examples provides clickable presets. Deploying this as a Space requires only pushing this file to a Hub Space repository.

To deploy this as a Space, create a new Space on the Hub and push the app.py file along with a requirements.txt.

# Create and deploy a Space from the command line
from huggingface_hub import create_repo, upload_file

# Create a Gradio Space
create_repo(
    repo_id="my-username/gpt2-demo",
    repo_type="space",
    space_sdk="gradio",        # "gradio", "streamlit", or "static"
    space_hardware="cpu-basic", # Free tier; options include "t4-small", "a10g-small"
)

# Upload the app file
upload_file(
    path_or_fileobj="app.py",
    path_in_repo="app.py",
    repo_id="my-username/gpt2-demo",
    repo_type="space",
)

# Upload requirements
upload_file(
    path_or_fileobj="requirements.txt",
    path_in_repo="requirements.txt",
    repo_id="my-username/gpt2-demo",
    repo_type="space",
)
# The Space will build and deploy automatically
Code Fragment 9: Creating and deploying a Gradio Space programmatically. The space_sdk parameter selects the framework, space_hardware sets the compute tier, and uploaded files trigger an automatic build and deployment on the Hub.
Organizations and Collaboration

Hub organizations let teams share models, datasets, and Spaces under a common namespace (e.g., my-org/model-name). Organization members can have different roles: read, write, or admin. This is useful for corporate teams, research labs, and open-source projects. Organizations can also set default licenses, require model cards, and enforce gating policies across all their repositories.

Feature                  Free Tier        Pro / Enterprise
Public repositories      Unlimited        Unlimited
Private repositories     Unlimited        Unlimited
Spaces (CPU)             Free             Free
Spaces (GPU)             Not included     T4, A10G, A100 (paid per hour)
Inference API            Rate-limited     Dedicated endpoints
Storage (Git LFS)        Generous         Higher limits, persistent storage
Organization features    Basic            SSO, audit logs, resource groups
Figure K.5.1: HuggingFace Hub feature comparison between free and paid tiers (as of early 2026).