Docker packages applications and their dependencies into lightweight, portable units called containers. For ML engineers, Docker solves the perennial environment reproducibility problem: CUDA versions, Python dependencies, system libraries, and model weights can all be captured in a single image that runs identically on a laptop, a cloud VM, or a Kubernetes cluster. This section covers the core concepts, installation, and essential commands you need to containerize ML workloads.
1. Why ML Engineers Need Docker
Machine learning projects depend on a complex stack of software: Python interpreters, numerical libraries
(NumPy, PyTorch, TensorFlow), CUDA toolkits, cuDNN, system-level packages, and often specific versions of
each. A model that trains successfully on one machine may fail on another because of a minor version mismatch
in any of these layers. Virtual environments like venv or conda manage Python
packages but cannot control system libraries or GPU drivers.
Docker addresses this gap by packaging the entire runtime environment, from the operating system up through application code, into an image. When you run that image, Docker creates an isolated container that behaves identically regardless of the host machine's configuration. This guarantee is essential for three ML workflows: reproducible training, consistent evaluation, and reliable deployment.
Containers are not virtual machines. A VM runs a full guest operating system with its own kernel, consuming gigabytes of memory. A container shares the host kernel and isolates only the user-space processes, making it lightweight (often under 100 MB for the container layer itself) and fast to start (seconds, not minutes).
2. Core Concepts: Images, Containers, and Layers
Docker's architecture revolves around three fundamental concepts. An image is a read-only template that contains the filesystem, installed packages, environment variables, and a default command. A container is a running instance of an image, with its own writable layer on top of the image's read-only layers. You can run multiple containers from the same image, and each one is isolated from the others.
Images are built in layers. Each instruction in a Dockerfile (the recipe for building an image) creates a new layer. Docker caches these layers, so if you change only your application code, Docker reuses the cached layers for the base OS and installed packages, making rebuilds fast.
┌─────────────────────────────────────────┐
│ Container (writable) │
│ your_script.py output, logs, etc. │
├─────────────────────────────────────────┤
│ Layer 4: COPY app code │
├─────────────────────────────────────────┤
│ Layer 3: pip install requirements │
├─────────────────────────────────────────┤
│ Layer 2: apt-get install system libs │
├─────────────────────────────────────────┤
│ Layer 1: Base image (python:3.11) │
└─────────────────────────────────────────┘
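To make the layer model concrete, here is a minimal Dockerfile whose instructions correspond one-to-one with the layers in the diagram above. The package names and file paths are illustrative; substitute your own:

```dockerfile
# Layer 1: base image with Python pre-installed
FROM python:3.11

# Layer 2: system libraries (example package; adjust for your project)
RUN apt-get update && apt-get install -y --no-install-recommends \
      libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Layer 3: Python dependencies -- copied separately so this layer
# stays cached as long as requirements.txt does not change
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 4: application code -- editing your code only rebuilds this layer
COPY . /app
WORKDIR /app

CMD ["python", "your_script.py"]
```

Ordering instructions from least to most frequently changed is what makes Docker's layer cache effective: a code-only change skips the slow apt-get and pip steps entirely.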
3. Installing Docker
Docker Desktop is available for Windows, macOS, and Linux. On Linux servers (the most common environment for ML workloads), you can install Docker Engine directly. The following commands install Docker on Ubuntu 22.04 or later.
# Update package index and install prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
# Add Docker's official GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
# Allow your user to run Docker without sudo
sudo usermod -aG docker $USER
# Log out and back in (or run: newgrp docker) for the group change to take effect
After installation, verify that Docker is working by running the hello-world container.
# Verify the installation
docker run hello-world
# Check Docker version
docker version
On cloud VMs (AWS EC2, GCP Compute Engine, Azure VMs), Docker is often pre-installed on ML-optimized images. Check with docker --version before installing. If you need GPU support, ensure the NVIDIA Container Toolkit is also installed (covered in Section U.2).
4. Essential Docker Commands
The Docker CLI provides commands for building images, running containers, managing storage, and inspecting state. The following table summarizes the commands you will use most frequently in ML workflows.
| Command | Purpose | Example |
|---|---|---|
| docker build | Build an image from a Dockerfile | docker build -t mymodel:v1 . |
| docker run | Create and start a container | docker run -it mymodel:v1 bash |
| docker ps | List running containers | docker ps -a (include stopped) |
| docker images | List local images | docker images |
| docker stop | Stop a running container | docker stop my_container |
| docker rm | Remove a stopped container | docker rm my_container |
| docker rmi | Remove an image | docker rmi mymodel:v1 |
| docker logs | View container output | docker logs -f my_container |
| docker exec | Run a command inside a running container | docker exec -it my_container bash |
| docker pull | Download an image from a registry | docker pull python:3.11-slim |
5. Running Your First ML Container
Let us walk through running a PyTorch container interactively. The official PyTorch images from NVIDIA's NGC catalog come pre-configured with CUDA, cuDNN, and PyTorch. This is the fastest way to get a working GPU-enabled environment.
# Pull the official PyTorch container from NVIDIA NGC
docker pull nvcr.io/nvidia/pytorch:24.01-py3
# Run interactively with GPU access
docker run --gpus all -it \
--name pytorch-dev \
-v $(pwd)/data:/workspace/data \
-v $(pwd)/models:/workspace/models \
-p 8888:8888 \
nvcr.io/nvidia/pytorch:24.01-py3 bash
This command does several things. The --gpus all flag grants the container access to all
host GPUs (requires the NVIDIA Container Toolkit). The -it flags allocate an interactive
terminal. The -v flags mount host directories into the container, allowing data and models
to persist after the container stops. The -p flag maps port 8888 from the container to the
host, useful for Jupyter notebooks.
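Once inside the container, it is worth confirming that PyTorch can actually see the GPUs before starting a long training run. A quick sanity check (using the Python interpreter bundled in the NGC image):

```shell
# Inside the container: check CUDA visibility from PyTorch
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

# Or query the driver directly -- nvidia-smi works inside the container
# when --gpus all was passed and the NVIDIA Container Toolkit is installed
nvidia-smi
```

If torch.cuda.is_available() prints False, the most common causes are a missing NVIDIA Container Toolkit on the host or running docker run without the --gpus flag.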
6. Volumes: Persistent Data for ML Workloads
By default, all data written inside a container is lost when the container is removed. For ML projects, you need persistent storage for datasets, model checkpoints, logs, and experiment outputs. Docker provides two mechanisms for persistent data: bind mounts and named volumes.
A bind mount maps a specific host directory to a container path. This is ideal when you want to edit code on the host and have changes reflected immediately inside the container. A named volume is managed by Docker and stored in Docker's internal directory structure. Named volumes are better for databases, caches, and other data that the container manages exclusively.
# Bind mount: map host directory to container directory
docker run -v /home/user/datasets:/data mymodel:v1
# Named volume: Docker manages the storage location
docker volume create model-cache
docker run -v model-cache:/root/.cache/huggingface mymodel:v1
# List all volumes
docker volume ls
# Inspect a volume to find its host path
docker volume inspect model-cache
A common pattern for LLM projects is to create a named volume for the HuggingFace cache directory (~/.cache/huggingface). This way, model weights downloaded in one container are available to all future containers, avoiding repeated multi-gigabyte downloads. Mount it with -v hf-cache:/root/.cache/huggingface.
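Putting this pattern together, a run that shares the HuggingFace cache across containers might look like the following sketch (the volume name hf-cache is arbitrary; the image is the NGC image from earlier):

```shell
# Create the shared cache volume once
docker volume create hf-cache

# Every container that mounts the volume reuses previously
# downloaded model weights instead of re-fetching them
docker run --gpus all -it \
    -v hf-cache:/root/.cache/huggingface \
    nvcr.io/nvidia/pytorch:24.01-py3 bash
```

Note that the container path must match where the HuggingFace libraries look for their cache; if your image runs as a non-root user, adjust /root accordingly (or set the HF_HOME environment variable).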
7. Networking Basics
Containers attach to an isolated bridge network by default: they can make outbound connections, but services running inside them are not reachable from the host, and containers cannot reach each other by name, unless you configure this explicitly. Docker provides port mapping and user-defined bridge networks to enable both kinds of communication.
Port mapping with -p exposes a container port on the host. For example,
-p 8000:8000 maps port 8000 inside the container to port 8000 on the host, making an API
server accessible from outside. When you need multiple containers to communicate (for example, an inference
server and a database), you create a Docker network and attach both containers to it.
# Create a custom bridge network
docker network create ml-network
# Run a vector database on the network
docker run -d --name chromadb \
--network ml-network \
-p 8000:8000 \
chromadb/chroma:latest
# Run your application on the same network
docker run -d --name app \
--network ml-network \
-e CHROMA_HOST=chromadb \
myapp:v1
Inside the ml-network, the application container can reach ChromaDB using the hostname
chromadb (Docker's built-in DNS resolves container names to their IP addresses). Port 8000
is also mapped to the host, so you can access ChromaDB from your browser at
http://localhost:8000.
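You can verify the name resolution from inside the application container. This sketch assumes curl is available in the app image; the heartbeat endpoint path may vary between Chroma versions:

```shell
# Reach the database by container name from inside "app"
docker exec app curl -s http://chromadb:8000/api/v1/heartbeat

# Inspect the network to see which containers are attached
# and what IP addresses Docker assigned them
docker network inspect ml-network
```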
8. Cleaning Up: Managing Disk Space
Docker images for ML workloads are large, often 5 to 15 GB each. Over time, unused images, stopped containers, and dangling volumes can consume hundreds of gigabytes. Regular cleanup is essential.
# Remove all stopped containers
docker container prune -f
# Remove unused images (not referenced by any container)
docker image prune -f
# Remove all unused volumes (WARNING: deletes data)
docker volume prune -f
# Nuclear option: remove everything unused
docker system prune -a --volumes -f
# Check Docker disk usage
docker system df
The docker system prune -a --volumes command removes all unused images, containers, and volumes. If you have model weights stored in named volumes that are not currently mounted, they will be deleted. Always check with docker volume ls before running a volume prune.
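A safer routine is to inspect what would be reclaimed before pruning, and to leave volumes out of automated cleanup entirely. For example:

```shell
# See what is consuming space, broken down per image, container, and volume
docker system df -v

# Review volumes manually and confirm nothing important is unmounted
docker volume ls

# Prune containers and images, but leave all volumes untouched
docker container prune -f
docker image prune -a -f
```

With this approach, cached model weights in named volumes survive routine cleanups, and volume deletion remains a deliberate, manual step.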
Summary
Docker provides the foundation for reproducible ML environments by packaging code, dependencies, and system libraries into portable images. Images are built from layered filesystems that enable efficient caching and sharing. Containers are lightweight, isolated runtime instances of these images. Volumes provide persistent storage for datasets, model weights, and experiment outputs. Networks enable communication between containers. In the next section, we explore how to write Dockerfiles specifically optimized for ML and LLM projects, including GPU passthrough and multi-stage builds.