The LLM ecosystem sits atop a small number of foundational libraries. Knowing what each one does (and does not do) will save you from confusion when you encounter them in code examples throughout this textbook.
Hugging Face Transformers
The transformers library from Hugging Face is the Swiss Army knife of LLM work. It provides a unified API for loading, running, and fine-tuning thousands of pretrained models. Code Fragment C.1.1 below puts this into practice.
```python
# pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load a model and tokenizer with two lines
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Or use the high-level pipeline API
generator = pipeline("text-generation", model=model_name)
result = generator("Explain transformers in one sentence:", max_new_tokens=50)
print(result[0]["generated_text"])
```

Code Fragment C.1.1 shows both the explicit Auto classes and the high-level pipeline API. Two lines load the model; one line generates text.

Key classes you will encounter:

- AutoTokenizer: Loads the correct tokenizer for any model.
- AutoModelForCausalLM: Loads decoder-only models (GPT, LLaMA, Mistral).
- AutoModelForSeq2SeqLM: Loads encoder-decoder models (T5, BART).
- Trainer and TrainingArguments: High-level fine-tuning API.
- pipeline: Quick inference for common tasks (text generation, classification, summarization).

PyTorch

PyTorch is the tensor computation framework that powers nearly all modern LLM training and inference. The transformers library is built on top of PyTorch (or optionally JAX/TensorFlow, though PyTorch dominates in practice). Code Fragment C.1.2 below puts this into practice.

```python
import torch

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Basic tensor operations
x = torch.randn(3, 768)    # 3 vectors of dimension 768
y = torch.randn(768, 512)  # weight matrix
z = x @ y                  # matrix multiplication, shape (3, 512)

# Move tensors to GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
```

NumPy and Pandas

numpy handles numerical arrays on CPU and is used for data preprocessing, metric computation, and quick prototyping. pandas manages tabular data and is indispensable for preparing fine-tuning datasets, analyzing evaluation results, and handling metadata. Code Fragment C.1.3 below puts this into practice.

```python
import numpy as np
import pandas as pd

# Preparing a fine-tuning dataset from a CSV
df = pd.read_csv("training_data.csv")
df = df.dropna(subset=["instruction", "response"])
df = df[df["response"].str.len() > 10]  # filter short responses

# Convert to the format expected by Hugging Face
dataset = df[["instruction", "response"]].to_dict(orient="records")
print(f"Training examples: {len(dataset)}")
```
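The instruction/response records produced above are typically flattened into single training strings before tokenization. A minimal sketch of that step, using a made-up instruction template (real formats vary by model and are usually supplied by the tokenizer's chat template):

```python
# Illustrative records, standing in for the output of to_dict(orient="records")
records = [
    {"instruction": "Explain attention.", "response": "Attention weighs token relevance."},
    {"instruction": "Define tokenization.", "response": "Splitting text into model units."},
]

def format_example(rec):
    # Concatenate instruction and response into one training string
    return f"### Instruction:\n{rec['instruction']}\n\n### Response:\n{rec['response']}"

texts = [format_example(r) for r in records]
print(texts[0])
```

The `### Instruction:` / `### Response:` markers here are purely illustrative; the point is that each record becomes one contiguous string the tokenizer can process.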
Additional Libraries

| Library | Purpose | Chapters |
|---|---|---|
| `datasets` | Efficient dataset loading and processing (Hugging Face) | 12, 13, 14 |
| `peft` | Parameter-efficient fine-tuning (LoRA, QLoRA) | 14 |
| `trl` | Transformer Reinforcement Learning (SFT, DPO, RLHF) | 16 |
| `vllm` | High-throughput inference serving | 8 |
| `langchain` | LLM application framework (chains, agents, RAG) | 19, 21 |
| `sentence-transformers` | Embedding models for semantic search | 18 |
| `bitsandbytes` | Quantized model loading (4-bit, 8-bit) | 8, 14 |
| `wandb` | Experiment tracking and visualization | 25 |

Load and inspect a Hugging Face dataset, with streaming as an option for large corpora.

```python
# pip install datasets
from datasets import load_dataset

ds = load_dataset("imdb", split="train[:100]")
print(ds[0]["text"][:200])
print(f"Features: {ds.features}")

# For corpora too large to download in full, stream instead:
# ds = load_dataset("imdb", split="train", streaming=True)
# first = next(iter(ds))
```

Apply a LoRA adapter to a pretrained model for parameter-efficient fine-tuning.

```python
# pip install peft
from peft import LoraConfig, get_peft_model

config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```

Set up supervised fine-tuning with the TRL library's SFTTrainer.

```python
# pip install trl
from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(output_dir="./sft_output", max_seq_length=512)
trainer = SFTTrainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()
```
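One piece SFTTrainer handles for you is loss masking: only response tokens contribute to the training loss. The idea can be sketched in plain Python (the token ids below are made up; -100 is the ignore index that PyTorch's cross-entropy loss skips by convention):

```python
# Mask prompt tokens so the loss is computed only on the response
prompt_ids = [101, 2054, 2003]  # made-up token ids for the prompt
response_ids = [3437, 102]      # made-up token ids for the response

IGNORE_INDEX = -100  # labels with this value are excluded from the loss

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

print(input_ids)  # [101, 2054, 2003, 3437, 102]
print(labels)     # [-100, -100, -100, 3437, 102]
```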
With SFTTrainer, the SFTConfig controls sequence length and the output directory, while the trainer handles tokenization, loss masking, and checkpointing internally.

Generate embeddings for semantic search using a sentence transformer model.
```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(["How do transformers work?", "Attention is all you need."])
print(f"Embedding shape: {embeddings.shape}")
```
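Embeddings like these are typically ranked by cosine similarity. A minimal NumPy sketch, with toy 3-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize each row to unit length, then take dot products
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

query = np.array([[1.0, 0.0, 1.0]])                     # one query vector
corpus = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])   # two corpus vectors
scores = cosine_sim(query, corpus)  # shape (1, 2)
best = int(np.argmax(scores))
print(best)  # 0: the corpus vector pointing in the same direction wins
```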
The encode() method of sentence-transformers returns a NumPy array in which each row is a fixed-size embedding suitable for cosine-similarity search.

Log training metrics to Weights and Biases for experiment tracking.
```python
# pip install wandb
import wandb

wandb.init(project="my-llm-project", name="experiment-1")
wandb.log({"loss": 0.42, "learning_rate": 2e-5, "epoch": 1})
wandb.finish()
```
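The logging pattern is easy to picture with a toy stand-in (this Run class is illustrative only, not part of wandb): each log() call appends one step's metric dictionary to the run's history.

```python
# Toy stand-in for the run/log/finish pattern (not wandb itself)
class Run:
    def __init__(self, project, name):
        self.project, self.name = project, name
        self.history = []  # one dict per logged step

    def log(self, metrics):
        self.history.append(dict(metrics))

    def finish(self):
        return self.history  # a real tracker would sync to a server here

run = Run("my-llm-project", "experiment-1")
for epoch in range(3):
    run.log({"loss": 1.0 / (epoch + 1), "epoch": epoch})
print(len(run.finish()))  # 3
```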
wandb.init() creates a run, wandb.log() records metric dictionaries, and wandb.finish() closes the run and syncs data to the dashboard.

Build a simple LLM chain with LangChain for prompt templating and invocation.
```python
# pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm
result = chain.invoke({"text": "LLMs use transformers to process text."})
print(result.content)
```
ChatPromptTemplate formats the input, the ChatOpenAI model processes it, and the chain is invoked with a dictionary of template variables.
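The `chain = prompt | llm` line works because LangChain's runnables overload Python's `|` operator to compose invoke calls. The pattern can be sketched in plain Python (these classes are a toy illustration, not LangChain's actual implementation):

```python
class Runnable:
    """Toy runnable: wraps a function and supports | composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # Piping two runnables chains their invoke calls left to right
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda d: f"Summarize: {d['text']}")  # template step
fake_llm = Runnable(lambda s: s.upper())                # stand-in for a model call

chain = prompt | fake_llm
print(chain.invoke({"text": "hello"}))  # SUMMARIZE: HELLO
```

The real library adds batching, streaming, and async support on top of this idea, but the composition semantics are the same: the output of each stage becomes the input of the next.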