The LLM ecosystem sits atop a small number of foundational libraries. Knowing what each one does (and does not do) will save you from confusion when you encounter them in code examples throughout this textbook.
Hugging Face Transformers
The transformers library from Hugging Face is the Swiss Army knife of LLM work. It provides a unified API for loading, running, and fine-tuning thousands of pretrained models. Code Fragment C.1.1 below puts this into practice.
```python
# pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load a model and tokenizer with two lines
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Or use the high-level pipeline API
generator = pipeline("text-generation", model=model_name)
result = generator("Explain transformers in one sentence:", max_new_tokens=50)
print(result[0]["generated_text"])
```

Code Fragment C.1.1 shows both the explicit Auto classes and the high-level pipeline API. Two lines load the model; one line generates text.

Key classes you will encounter:

- AutoTokenizer: Loads the correct tokenizer for any model.
- AutoModelForCausalLM: Loads decoder-only models (GPT, LLaMA, Mistral).
- AutoModelForSeq2SeqLM: Loads encoder-decoder models (T5, BART).
- Trainer and TrainingArguments: High-level fine-tuning API.
- pipeline: Quick inference for common tasks (text generation, classification, summarization).

PyTorch

PyTorch is the tensor computation framework that powers nearly all modern LLM training and inference. The transformers library is built on top of PyTorch (or optionally JAX/TensorFlow, though PyTorch dominates in practice). Code Fragment C.1.2 below puts this into practice.

```python
import torch

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Basic tensor operations
x = torch.randn(3, 768)    # 3 vectors of dimension 768
y = torch.randn(768, 512)  # weight matrix
z = x @ y                  # matrix multiplication, shape (3, 512)

# Move tensors to GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
```

NumPy and Pandas

numpy handles numerical arrays on CPU and is used for data preprocessing, metric computation, and quick prototyping. pandas manages tabular data and is indispensable for preparing fine-tuning datasets, analyzing evaluation results, and handling metadata. Code Fragment C.1.3 below puts this into practice.

```python
import numpy as np
import pandas as pd

# Preparing a fine-tuning dataset from a CSV
df = pd.read_csv("training_data.csv")
df = df.dropna(subset=["instruction", "response"])
df = df[df["response"].str.len() > 10]  # filter short responses

# Convert to the format expected by Hugging Face
dataset = df[["instruction", "response"]].to_dict(orient="records")
print(f"Training examples: {len(dataset)}")
```
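The instruction/response records produced above are typically flattened into single training strings before tokenization. A minimal sketch of that step, using a made-up instruction template (real formats vary by model and are usually supplied by the tokenizer's chat template):

```python
# Illustrative records, standing in for the output of to_dict(orient="records")
records = [
    {"instruction": "Explain attention.", "response": "Attention weighs token relevance."},
    {"instruction": "Define tokenization.", "response": "Splitting text into model units."},
]

def format_example(rec):
    # Concatenate instruction and response into one training string
    return f"### Instruction:\n{rec['instruction']}\n\n### Response:\n{rec['response']}"

texts = [format_example(r) for r in records]
print(texts[0])
```

The `### Instruction:` / `### Response:` markers here are purely illustrative; the point is that each record becomes one contiguous string the tokenizer can process.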
Additional Libraries

| Library | Purpose | Chapters |
|---|---|---|
| `datasets` | Efficient dataset loading and processing (Hugging Face) | 12, 13, 14 |
| `peft` | Parameter-efficient fine-tuning (LoRA, QLoRA) | 14 |
| `trl` | Transformer Reinforcement Learning (SFT, DPO, RLHF) | 16 |
| `vllm` | High-throughput inference serving | 8 |
| `langchain` | LLM application framework (chains, agents, RAG) | 19, 21 |
| `sentence-transformers` | Embedding models for semantic search | 18 |
| `bitsandbytes` | Quantized model loading (4-bit, 8-bit) | 8, 14 |
| `wandb` | Experiment tracking and visualization | 25 |

Load and inspect a Hugging Face dataset, with streaming as an option for large corpora.

```python
# pip install datasets
from datasets import load_dataset

ds = load_dataset("imdb", split="train[:100]")
print(ds[0]["text"][:200])
print(f"Features: {ds.features}")

# For corpora too large to download in full, stream instead:
# ds = load_dataset("imdb", split="train", streaming=True)
# first = next(iter(ds))
```

Apply a LoRA adapter to a pretrained model for parameter-efficient fine-tuning.

```python
# pip install peft
from peft import LoraConfig, get_peft_model

config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```

Set up supervised fine-tuning with the TRL library's SFTTrainer.

```python
# pip install trl
from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(output_dir="./sft_output", max_seq_length=512)
trainer = SFTTrainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()
```
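One piece SFTTrainer handles for you is loss masking: only response tokens contribute to the training loss. The idea can be sketched in plain Python (the token ids below are made up; -100 is the ignore index that PyTorch's cross-entropy loss skips by convention):

```python
# Mask prompt tokens so the loss is computed only on the response
prompt_ids = [101, 2054, 2003]  # made-up token ids for the prompt
response_ids = [3437, 102]      # made-up token ids for the response

IGNORE_INDEX = -100  # labels with this value are excluded from the loss

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

print(input_ids)  # [101, 2054, 2003, 3437, 102]
print(labels)     # [-100, -100, -100, 3437, 102]
```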
With SFTTrainer, the SFTConfig controls sequence length and the output directory, while the trainer handles tokenization, loss masking, and checkpointing internally.

Generate embeddings for semantic search using a sentence transformer model.
```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(["How do transformers work?", "Attention is all you need."])
print(f"Embedding shape: {embeddings.shape}")
```
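Embeddings like these are typically ranked by cosine similarity. A minimal NumPy sketch, with toy 3-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize each row to unit length, then take dot products
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

query = np.array([[1.0, 0.0, 1.0]])                     # one query vector
corpus = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])   # two corpus vectors
scores = cosine_sim(query, corpus)  # shape (1, 2)
best = int(np.argmax(scores))
print(best)  # 0: the corpus vector pointing in the same direction wins
```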
The encode() method of sentence-transformers returns a NumPy array in which each row is a fixed-size embedding suitable for cosine-similarity search.

Log training metrics to Weights and Biases for experiment tracking.
```python
# pip install wandb
import wandb

wandb.init(project="my-llm-project", name="experiment-1")
wandb.log({"loss": 0.42, "learning_rate": 2e-5, "epoch": 1})
wandb.finish()
```
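The logging pattern is easy to picture with a toy stand-in (this Run class is illustrative only, not part of wandb): each log() call appends one step's metric dictionary to the run's history.

```python
# Toy stand-in for the run/log/finish pattern (not wandb itself)
class Run:
    def __init__(self, project, name):
        self.project, self.name = project, name
        self.history = []  # one dict per logged step

    def log(self, metrics):
        self.history.append(dict(metrics))

    def finish(self):
        return self.history  # a real tracker would sync to a server here

run = Run("my-llm-project", "experiment-1")
for epoch in range(3):
    run.log({"loss": 1.0 / (epoch + 1), "epoch": epoch})
print(len(run.finish()))  # 3
```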
wandb.init() creates a run, wandb.log() records metric dictionaries, and wandb.finish() closes the run and syncs data to the dashboard.

Build a simple LLM chain with LangChain for prompt templating and invocation.
```python
# pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm
result = chain.invoke({"text": "LLMs use transformers to process text."})
print(result.content)
```
ChatPromptTemplate formats the input, the ChatOpenAI model processes it, and the chain is invoked with a dictionary of template variables.
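The `chain = prompt | llm` line works because LangChain's runnables overload Python's `|` operator to compose invoke calls. The pattern can be sketched in plain Python (these classes are a toy illustration, not LangChain's actual implementation):

```python
class Runnable:
    """Toy runnable: wraps a function and supports | composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # Piping two runnables chains their invoke calls left to right
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda d: f"Summarize: {d['text']}")  # template step
fake_llm = Runnable(lambda s: s.upper())                # stand-in for a model call

chain = prompt | fake_llm
print(chain.invoke({"text": "hello"}))  # SUMMARIZE: HELLO
```

The real library adds batching, streaming, and async support on top of this idea, but the composition semantics are the same: the output of each stage becomes the input of the next.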