Section 65.1: Docker Fundamentals: Images, Containers, and Volumes

"It works on my machine, but my machine is now a Docker image, and your machine is also that Docker image, and so the bug is reproducible by definition."
Deploy, Container-Native AI Agent

Big Picture

Docker packages applications and their dependencies into lightweight, portable units called containers. For ML engineers, Docker solves the perennial environment reproducibility problem. CUDA toolkit versions, Python dependencies, system libraries, and model weights can all be captured in a single image that runs the same way on a laptop, a cloud VM, or a Kubernetes cluster. This section covers the core concepts, installation, and essential commands you need to containerize ML workloads. For LLM and agent deployment specifically, the same image abstraction lets you ship a vLLM inference server, a RAG service, or an agent runtime to production with the same CUDA, tokenizer, and model-weight versions that you pinned during evaluation. Without containers, LLM serving stacks drift between environments, and the same prompt starts returning different outputs.

Prerequisites

This section assumes basic Linux command-line fluency, awareness of Python virtual environments, and a working mental model for what an LLM inference server is (covered in Section 9.1).

65.1.1 Why ML Engineers Need Docker

Fun Fact

Docker began as an internal tool at dotCloud, a platform-as-a-service (PaaS) company. dotCloud open-sourced Docker in March 2013 and renamed itself Docker, Inc. later that year. Solomon Hykes gave the first public demo of Docker in a five-minute lightning talk at PyCon US 2013.

Machine learning projects depend on a complex stack of software: Python interpreters, numerical libraries (NumPy, PyTorch, TensorFlow), CUDA toolkits, cuDNN, system-level packages, and often specific versions of each. A model that trains successfully on one machine may fail on another because of a minor version mismatch in any of these layers. Virtual environments such as venv or conda pin Python packages, and conda can also install some native libraries. Neither captures the operating system libraries and tools underneath the interpreter.

Docker addresses this gap by packaging the entire user-space runtime environment into an image, from operating system libraries and tools up through application code. When you run that image, Docker creates an isolated container that behaves the same way on any host with a compatible kernel. The kernel itself always comes from the host, and so does the NVIDIA driver for GPU workloads. This consistency is essential for three ML workflows: reproducible training, consistent evaluation, and reliable deployment.

Key Insight

Containers are not virtual machines. A VM runs a full guest operating system with its own kernel, typically consuming gigabytes of memory. A container shares the host kernel and uses kernel features to isolate its processes: namespaces provide the isolation and cgroups enforce resource limits. This makes a container lightweight, since its own writable layer is often under 100 MB. It is also fast to start, typically in seconds or less rather than minutes.
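You can observe the shared kernel directly. A container reports the host's kernel release even though its user space comes from a different distribution. The following is a minimal sketch that assumes the public `alpine:3.19` image can be pulled. The function name `compare_kernels` is hypothetical. Commands go through `${DOCKER:-docker}`, so setting `DOCKER=echo` prints them instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host's kernel release, then the kernel release
# reported inside a throwaway alpine container. On a Linux host the two match
# because the container has no kernel of its own. (Docker Desktop on macOS or
# Windows runs containers in a Linux VM, so there you see the VM's kernel.)
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.
compare_kernels() {
  echo "host:      $(uname -r)"
  # --rm removes the container as soon as `uname -r` exits.
  echo "container: $("${DOCKER:-docker}" run --rm alpine:3.19 uname -r)"
}
```

On a Linux host with Docker installed, running `compare_kernels` prints the same release string on both lines.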

65.1.2 Core Concepts: Images, Containers, and Layers

Docker's architecture revolves around three fundamental concepts. An image is a read-only template that contains the filesystem, installed packages, environment variables, and a default command. A container is a running instance of an image, with its own writable layer on top of the image's read-only layers. You can run multiple containers from the same image, and each one is isolated from the others.

Images are built in layers. Each instruction in a Dockerfile (the recipe for building an image) creates a new layer. Docker caches these layers, so if you change only your application code, Docker reuses the cached layers for the base OS and installed packages, making rebuilds fast.
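The chained nature of the cache can be sketched with a toy model. The function `layer_keys` below is hypothetical and deliberately simplified. Real Docker also folds file contents into the cache key for `COPY` and `ADD`; here, a version label in the instruction text stands in for that.

```shell
#!/usr/bin/env bash
# Toy model of Docker's layer cache, a simplification rather than Docker's
# actual algorithm: each layer's cache key hashes the parent layer's key
# together with the layer's own instruction text. Editing one instruction
# therefore changes its key and every key after it, never the keys before it.
layer_keys() {
  local parent="scratch" instruction
  for instruction in "$@"; do
    # First 12 hex digits of SHA-256 over "parent key + instruction".
    parent=$(printf '%s\n%s\n' "$parent" "$instruction" | sha256sum | cut -c1-12)
    printf '%s  %s\n' "$parent" "$instruction"
  done
}
```

Suppose you call `layer_keys "FROM python:3.11-slim" "RUN pip install numpy" "COPY train.py v1"` and then change only the last argument. The first two keys stay identical. If you change the `RUN` line instead, both the second and third keys change. This is why Dockerfiles put rarely changing steps, such as the base image and dependency installs, first and frequently edited code last.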

Under the Hood: Copy-on-write union filesystem

A container's filesystem is not copied from the image; it is a union mount. A storage driver such as OverlayFS stacks the image's read-only layers as a lowerdir and gives the container a single empty upperdir on top, presenting one merged view. Reads fall through to whichever lower layer holds the file, so launching a container copies nothing and is near-instant. Only when a process writes a file does the driver copy that file up into the upperdir (copy-on-write) and edit the copy, leaving the shared layers untouched. Because every container from an image shares the same read-only layers and adds only its own diffs, disk use grows with what changes, not with the number of containers.
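The copy-up behavior can be modeled with ordinary directories. The following is a minimal sketch, not a real OverlayFS mount (which requires root). The helpers `cow_read` and `cow_write` are hypothetical: one directory plays the shared read-only layer and another plays a single container's writable layer.

```shell
#!/usr/bin/env bash
# Toy copy-on-write model built from plain directories (not a real OverlayFS
# mount): LOWER stands in for a shared read-only image layer and UPPER for one
# container's writable layer. Files are addressed by name relative to each dir.

# cow_read LOWER UPPER FILE: the upper copy wins; otherwise fall through to lower.
cow_read() {
  if [ -e "$2/$3" ]; then cat "$2/$3"; else cat "$1/$3"; fi
}

# cow_write LOWER UPPER FILE LINE: on the first write, copy the file up from
# LOWER into UPPER, then append LINE to the copy. LOWER is never modified.
cow_write() {
  if [ ! -e "$2/$3" ] && [ -e "$1/$3" ]; then
    cp "$1/$3" "$2/$3"
  fi
  printf '%s\n' "$4" >> "$2/$3"
}
```

Two "containers" sharing one lower directory each see the original file until they write to it, and only the writer's upper directory gains a copy. With the overlay2 storage driver, `docker inspect --format '{{json .GraphDriver.Data}}' <container>` typically shows the real `LowerDir`, `UpperDir`, and `MergedDir` paths.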

Figure 65.1.1: Docker image layered architecture. Docker images are composed of read-only layers stacked on top of a base image. The container adds a thin writable layer at the top. Layers are cached and shared across images, reducing disk usage and build times.

65.1.3 Installing Docker

Docker Desktop is available for Windows, macOS, and Linux. On Linux servers (the most common environment for ML workloads), you can install Docker Engine directly. The following commands install Docker on Ubuntu 22.04 or later.

# Update package index and install prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker's official GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
    | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
    signed-by=/etc/apt/keyrings/docker.gpg] \
    https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" \
    | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine, the CLI, containerd, and the BuildKit (buildx) plugin
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin

# Allow your user to run Docker without sudo. Membership in the docker group
# is root-equivalent on this host, and it takes effect only after you log out
# and back in (or start a new shell with `newgrp docker`).
sudo usermod -aG docker "$USER"

Code Fragment 65.1.1a: Installing Docker Engine on Ubuntu from Docker's apt repository

After installation, verify that Docker is working by running the hello-world container.

# Verify the installation
docker run hello-world

# Check Docker version
docker version

Code Fragment 65.1.2: Verify the installation

Tip

On cloud VMs (AWS EC2, GCP Compute Engine, Azure VMs), Docker is often pre-installed on ML-optimized images. Check with docker --version before installing. If you need GPU support, ensure the NVIDIA Container Toolkit is also installed (covered in Section 65.2).
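A quick preflight check covers both steps of the tip. This sketch is adapted from the sample workload in NVIDIA's Container Toolkit documentation. The function name `gpu_preflight` is hypothetical. Commands go through `${DOCKER:-docker}`, so `DOCKER=echo` prints them instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: print the Docker client version, then run
# nvidia-smi in a throwaway container. With the NVIDIA Container Toolkit
# configured, --gpus all exposes the GPUs and mounts the host driver's
# utilities (including nvidia-smi), so a stock ubuntu image is enough.
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.
gpu_preflight() {
  "${DOCKER:-docker}" --version || return 1
  "${DOCKER:-docker}" run --rm --gpus all ubuntu nvidia-smi
}
```

If `nvidia-smi` prints the same GPU table you see on the host, GPU passthrough works. An error such as `could not select device driver` usually means the toolkit is missing or Docker has not been configured to use it.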

65.1.4 Essential Docker Commands

The Docker CLI provides commands for building images, running containers, managing storage, and inspecting state. The following table summarizes the commands you will use most frequently in ML workflows.

Command	Purpose	Example
`docker build`	Build an image from a Dockerfile	`docker build -t mymodel:v1 .`
`docker run`	Create and start a container	`docker run -it mymodel:v1 bash`
`docker ps`	List running containers	`docker ps -a` (include stopped)
`docker images`	List local images	`docker images`
`docker stop`	Stop a running container	`docker stop my_container`
`docker rm`	Remove a stopped container	`docker rm my_container`
`docker rmi`	Remove an image	`docker rmi mymodel:v1`
`docker logs`	View container output	`docker logs -f my_container`
`docker exec`	Run a command inside a running container	`docker exec -it my_container bash`
`docker pull`	Download an image from a registry	`docker pull python:3.11-slim`

Figure 65.1.2a: Essential Docker commands for ML development workflows.

65.1.5 Running Your First ML Container

Let us walk through running a PyTorch container interactively. NVIDIA publishes PyTorch images in its NGC catalog (NVIDIA GPU Cloud, NVIDIA's public registry of pre-built GPU-ready containers). These images come pre-configured with CUDA, cuDNN, and PyTorch, which makes them one of the fastest ways to get a working GPU-enabled environment.

# Pull NVIDIA's PyTorch container from the NGC catalog
docker pull nvcr.io/nvidia/pytorch:24.01-py3

# Run interactively with GPU access (quote $(pwd) so paths with spaces work)
docker run --gpus all -it \
    --name pytorch-dev \
    -v "$(pwd)/data:/workspace/data" \
    -v "$(pwd)/models:/workspace/models" \
    -p 8888:8888 \
    nvcr.io/nvidia/pytorch:24.01-py3 bash

Code Fragment 65.1.3: Pulling NVIDIA's PyTorch container from NGC and running it with GPU access, bind mounts, and a published port

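This command does several things. The --gpus all flag grants the container access to all host GPUs; it requires the NVIDIA Container Toolkit. The -i flag keeps standard input open and -t allocates a pseudo-terminal, which together give you an interactive shell. The --name flag gives the container a fixed name that later commands such as docker exec and docker stop can refer to. The -v flags mount host directories into the container, so data and models written there persist after the container is removed. The -p flag maps port 8888 in the container to port 8888 on the host, which is useful for Jupyter notebooks.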
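To serve JupyterLab through the published port, two extra details matter. Jupyter must listen on all of the container's interfaces, and PyTorch data loading needs more shared memory than Docker grants by default. The sketch below assumes the image ships JupyterLab. The helper name `start_pytorch_jupyter` is hypothetical, and the `--ipc`/`--ulimit` flags follow NVIDIA's published guidance for its PyTorch containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: serve JupyterLab from an NGC PyTorch container.
# Assumes the image ships JupyterLab; pass another image as $1 to override.
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.
start_pytorch_jupyter() {
  local image="${1:-nvcr.io/nvidia/pytorch:24.01-py3}"
  # --ipc=host plus the two --ulimit flags follow NVIDIA's guidance for its
  # PyTorch images: DataLoader workers pass batches through shared memory,
  # and Docker's default /dev/shm is only 64 MB.
  # -p 127.0.0.1:8888:8888 publishes the port on the host's loopback only.
  # --ip=0.0.0.0 makes Jupyter listen on the container's network interface;
  # its default (localhost) would be unreachable through the published port.
  "${DOCKER:-docker}" run --gpus all --rm -it \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v "$(pwd)/data:/workspace/data" \
    -v "$(pwd)/models:/workspace/models" \
    -p 127.0.0.1:8888:8888 \
    "$image" \
    jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
}
```

Jupyter prints a URL containing an access token. Open the 127.0.0.1 variant in a browser on the host.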

65.1.6 Volumes: Persistent Data for ML Workloads

By default, all data written inside a container is lost when the container is removed. For ML projects, you need persistent storage for datasets, model checkpoints, logs, and experiment outputs. Docker provides two mechanisms for persistent data: bind mounts and named volumes.

A bind mount maps a specific host directory to a container path. This is ideal when you want to edit code on the host and have changes reflected immediately inside the container. A named volume is managed by Docker and stored in Docker's internal directory structure. Named volumes are better for databases, caches, and other data that the container manages exclusively.

# Bind mount: map host directory to container directory
docker run -v /home/user/datasets:/data mymodel:v1

# Named volume: Docker manages the storage location
docker volume create model-cache
docker run -v model-cache:/root/.cache/huggingface mymodel:v1

# List all volumes
docker volume ls

# Inspect a volume to find its host path
docker volume inspect model-cache

Code Fragment 65.1.4: Bind mounts and named volumes
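A training job should never modify its datasets, so it is worth mounting them read-only. The sketch below uses --mount, the longer equivalent of -v. For bind mounts, --mount also refuses to create a missing host path, whereas -v silently creates an empty directory. The helper `run_training`, the image `mymodel:v1`, and the container paths are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run training with a read-only dataset mount, a writable
# checkpoint directory, and a named volume for the Hugging Face cache.
# Usage: run_training HOST_DATA_DIR HOST_CHECKPOINT_DIR
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.
run_training() {
  local data_dir="$1" ckpt_dir="$2"
  # readonly: the container can read the dataset but never modify it.
  # --mount fails if a bind source is missing, so a mistyped host path
  # surfaces as an error instead of an empty directory.
  "${DOCKER:-docker}" run --rm \
    --mount "type=bind,source=${data_dir},target=/data,readonly" \
    --mount "type=bind,source=${ckpt_dir},target=/checkpoints" \
    --mount "type=volume,source=model-cache,target=/root/.cache/huggingface" \
    mymodel:v1
}
```

The short form of the read-only dataset mount is -v /home/user/datasets:/data:ro.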

Real-World Scenario

Sharing the Hugging Face Cache Across Containers

A common pattern for LLM projects is to create a named volume for the Hugging Face cache directory (~/.cache/huggingface). This way, model weights downloaded in one container are available to all future containers, avoiding repeated multi-gigabyte downloads. Mount it with -v hf-cache:/root/.cache/huggingface.
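In practice, this pattern becomes a two-step workflow. First, download the weights into the volume once; then start servers that only read from it. The sketch below assumes `mymodel:v1` has the `huggingface_hub` package installed, which provides `huggingface-cli` (newer releases also offer the command as `hf download`). It also assumes `serve.py` is your own entrypoint. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around a shared hf-cache volume. Assumes mymodel:v1 has
# huggingface_hub installed (which provides huggingface-cli) and that serve.py
# is your own entrypoint. HF_HOME points the cache at the volume's mount
# point, so the location no longer depends on the container user's home.
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.

# Download a model once into the volume: warm_hf_cache MODEL_ID
warm_hf_cache() {
  "${DOCKER:-docker}" run --rm \
    -e HF_HOME=/hf-cache -v hf-cache:/hf-cache \
    mymodel:v1 huggingface-cli download "$1"
}

# Serve from the warm cache. HF_HUB_OFFLINE=1 tells huggingface_hub to use
# cached files only, so a missing model fails fast instead of re-downloading.
serve_from_cache() {
  "${DOCKER:-docker}" run -d \
    -e HF_HOME=/hf-cache -e HF_HUB_OFFLINE=1 -v hf-cache:/hf-cache \
    mymodel:v1 python serve.py
}
```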

65.1.7 Networking Basics

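By default, a container runs on Docker's default bridge network. It can open outbound connections, but other machines cannot reach its ports unless you publish them. Containers on the default bridge also cannot find each other by name. Docker provides port mapping and user-defined bridge networks to control this communication.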

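Port mapping with -p publishes a container port on the host. For example, -p 8000:8000 maps port 8000 inside the container to port 8000 on the host, making an API server reachable from outside. By default the port is published on all of the host's network interfaces, so use -p 127.0.0.1:8000:8000 when only local clients should reach it. When multiple containers need to talk to each other (for example, an inference server and a vector database), create a user-defined network and attach both containers to it. Containers on the same user-defined network can then reach each other by name.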

# Create a custom bridge network
docker network create ml-network

# Run a vector database on the network
docker run -d --name chromadb \
    --network ml-network \
    -p 8000:8000 \
    chromadb/chroma:latest

# Run your application on the same network
docker run -d --name app \
    --network ml-network \
    -e CHROMA_HOST=chromadb \
    myapp:v1

Code Fragment 65.1.5: Connecting two containers through a user-defined bridge network

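Inside ml-network, the application container can reach ChromaDB using the hostname chromadb, because Docker's built-in DNS server resolves container names to their IP addresses on user-defined networks. Container-to-container traffic uses the container port (8000) directly; the -p mapping matters only for clients on the host. Because port 8000 is also published, you can reach ChromaDB's HTTP API from the host at http://localhost:8000.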
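Two quick checks confirm this setup. The first lists which containers are attached to the network. The second tests whether a container on that network can reach chromadb by name. The helper names are hypothetical, and `curlimages/curl:8.5.0` is an assumed choice; any image that runs curl would do.

```shell
#!/usr/bin/env bash
# Hypothetical checks for the ml-network example above.
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.

# List every container attached to network $1 with its address on it.
network_members() {
  "${DOCKER:-docker}" network inspect "$1" \
    --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'
}

# From a throwaway container on network $1, request URL $2 and print only
# the HTTP status code; a curl error instead points at DNS or networking.
probe_from_network() {
  "${DOCKER:-docker}" run --rm --network "$1" curlimages/curl:8.5.0 \
    -s -o /dev/null -w '%{http_code}\n' "$2"
}
```

For example, probe_from_network ml-network http://chromadb:8000/ prints a status code if the name resolves and the server answers.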

65.1.8 Cleaning Up: Managing Disk Space

Docker images for ML workloads are large, often 5 to 15 GB each. Over time, unused images, stopped containers, and dangling volumes can consume hundreds of gigabytes. Regular cleanup is essential.

# Remove all stopped containers
docker container prune -f

# Remove dangling (untagged) images; add -a to remove every image
# that no container, running or stopped, uses
docker image prune -f

# Remove unused volumes (WARNING: deletes data). On Docker Engine 23.0
# and later this removes only anonymous volumes unless you add --all
docker volume prune -f

# Nuclear option: remove everything unused
docker system prune -a --volumes -f

# Check Docker disk usage
docker system df

Code Fragment 65.1.6: Pruning unused Docker objects and checking disk usage

Warning

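The docker system prune -a --volumes command removes all stopped containers, unused networks, every image not used by a container, the build cache, and unused volumes. A volume counts as unused when no container, running or stopped, references it, so model weights in a volume you mounted last week are at risk. Docker Engine 23.0 and later limit docker volume prune to anonymous volumes unless you pass --all, but older engines also delete unused named volumes. Always check with docker volume ls before pruning volumes.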
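Before pruning, it helps to know how close the disk actually is to full. The following is a minimal sketch that assumes a Linux host with POSIX df and awk. The helper names and the 50 GiB default threshold are arbitrary, illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check free space where Docker stores images, containers,
# and volumes before deciding what to prune.
# Commands go through ${DOCKER:-docker}; DOCKER=echo prints them instead.

# Docker's data directory (by default /var/lib/docker on Linux).
docker_root() {
  "${DOCKER:-docker}" info --format '{{.DockerRootDir}}'
}

# Free space, in whole GiB, on the filesystem that holds path $1.
# -P keeps each filesystem on one line; -k reports 1024-byte blocks.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Warn (exit status 1) when free space under Docker's root is below $1 GiB.
check_docker_disk() {
  local threshold="${1:-50}" root free
  root="$(docker_root)" || return 1
  free="$(free_gib "$root")"
  echo "free space under ${root}: ${free} GiB"
  if [ "$free" -lt "$threshold" ]; then
    echo "below ${threshold} GiB: review 'docker system df -v' before pruning" >&2
    return 1
  fi
}
```

Running check_docker_disk 100 before a large pull, for example, tells you whether to clean up first. The verbose docker system df -v output shows which images and volumes are actually taking the space.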

Summary

Docker provides the foundation for reproducible ML environments by packaging code, dependencies, and system libraries into portable images. Images are built from layered filesystems that enable efficient caching and sharing. Containers are lightweight, isolated runtime instances of these images. Volumes provide persistent storage for datasets, model weights, and experiment outputs. Networks enable communication between containers. In the next section, we explore how to write Dockerfiles specifically optimized for ML and LLM projects, including GPU passthrough and multi-stage builds.

What's Next?

In the next section, Section 65.2: Writing Dockerfiles for ML and LLM Projects, we build on the material covered here.

Further Reading

Foundational Sources

Docker Inc. (2024). "Docker Documentation." docs.docker.com. The official reference for image/container/volume semantics; the source of truth when behavior is ambiguous.

Merkel, D. (2014). "Docker: Lightweight Linux Containers for Consistent Development and Deployment." Linux Journal 239. dl.acm.org/doi/10.5555/2600239.2600241. The original Docker paper; useful historical context for why container layering looks the way it does.

Container Internals

Open Containers Initiative (2024). "OCI Runtime Specification." github.com/opencontainers/runtime-spec. The standard that Docker, containerd, and CRI-O all implement; defines what a "container" formally is.

Burns, B., Beda, J., Hightower, K., & Evenson, L. (2022). Kubernetes: Up and Running (3rd ed.). O'Reilly. The opening chapters give a clear, high-level treatment of why containers are useful and how images are built and run; the discussion is general rather than ML-specific.

ML Container Patterns

NVIDIA (2024). "NGC Container Catalog." catalog.ngc.nvidia.com. Reference catalog of GPU-ready ML containers; the default base images for PyTorch, TensorFlow, and Triton.

NVIDIA (2024). "NVIDIA Container Toolkit." docs.nvidia.com/datacenter/cloud-native/container-toolkit. The runtime hook that exposes GPUs to containers; required reading before running GPU workloads, including LLM serving, in containers.