Real-world AI applications rarely consist of a single container. A typical RAG system might include an LLM inference server, a vector database, a REST API gateway, a Redis cache, and a PostgreSQL database for user sessions. Docker Compose lets you define, configure, and launch all of these services with a single docker compose up command, using a declarative YAML file that describes the entire application stack.
1. Why Docker Compose?
In Section U.1, we launched individual containers with long docker run commands that
included port mappings, volume mounts, network assignments, and environment variables. Managing five
or six such commands manually is error-prone and tedious. Docker Compose replaces these ad-hoc commands
with a single configuration file (docker-compose.yml or compose.yml) that
defines all services, their relationships, and their configurations in one place.
Compose provides several capabilities beyond simple container launching. It creates an isolated network for the application stack automatically, manages service startup order with dependency declarations, supports health checks to ensure services are ready before dependents start, and enables scaling individual services with a single flag.
2. Compose File Structure
A Compose file uses YAML syntax and organizes configuration into top-level keys:
services (the containers), volumes (persistent storage), and
networks (communication channels). The following example shows the structure with a
minimal two-service stack.
# compose.yml (or docker-compose.yml)
# Defines a simple API + database stack
services:
  api:
    build: ./api                       # Build from local Dockerfile
    ports:
      - "8000:8000"                    # Map host port to container port
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/appdb
    depends_on:
      db:
        condition: service_healthy     # Wait for DB health check
    volumes:
      - ./api/src:/app/src             # Mount source code for development

  db:
    image: postgres:16-alpine          # Use pre-built image
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: appdb
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      timeout: 3s
      retries: 5

volumes:
  pgdata:                              # Named volume for database persistence
Docker Compose automatically creates a bridge network for the stack. Services can reach each other using their service name as the hostname. In the example above, the API service connects to PostgreSQL at db:5432, where db is resolved by Docker's internal DNS. No manual network configuration is required.
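As a quick sanity check that db in the connection string is an ordinary hostname (resolved by Docker's embedded DNS, nothing Compose-specific on the client side), Python's standard urlparse splits it out cleanly. This snippet is purely illustrative and assumes nothing beyond the URL shown above:

```python
from urllib.parse import urlparse

# The connection string the api service receives via DATABASE_URL
url = urlparse("postgresql://user:pass@db:5432/appdb")

# "db" is simply the Compose service name; Docker's internal DNS
# resolves it to the container's IP on the stack's bridge network.
print(url.hostname)  # db
print(url.port)      # 5432
print(url.path)      # /appdb
```

Any PostgreSQL client library that accepts a connection URL will perform exactly this split before handing the hostname to the resolver.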
3. A Complete RAG Application Stack
Let us build a realistic Compose file for a RAG (Retrieval-Augmented Generation) application. This stack includes an LLM inference server, a vector database for document embeddings, a REST API that orchestrates queries, a Redis cache for session management, and a PostgreSQL database for user data.
# compose.yml: RAG Application Stack
services:
  # LLM inference server (GPU-enabled)
  llm:
    image: vllm/vllm-openai:latest
    command: >
      --model meta-llama/Llama-3.1-8B-Instruct
      --max-model-len 4096
      --gpu-memory-utilization 0.90
    ports:
      - "8001:8000"
    volumes:
      - hf-cache:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s             # LLM loading takes time

  # Vector database for document retrieval
  chromadb:
    image: chromadb/chroma:0.5.0
    ports:
      - "8002:8000"
    volumes:
      - chroma-data:/chroma/chroma
    environment:
      - ANONYMIZED_TELEMETRY=false

  # Application API server
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - LLM_BASE_URL=http://llm:8000/v1
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
      - REDIS_URL=redis://redis:6379/0
      - DATABASE_URL=postgresql://raguser:ragpass@postgres:5432/ragdb
    depends_on:
      llm:
        condition: service_healthy
      chromadb:
        condition: service_started   # image defines no health check
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy

  # Session cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  # User and conversation database
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: raguser
      POSTGRES_PASSWORD: ragpass
      POSTGRES_DB: ragdb
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U raguser"]
      interval: 5s
      timeout: 3s
      retries: 5

volumes:
  hf-cache:
  chroma-data:
  redis-data:
  pgdata:
┌─────────────────────────────────────────────────────────────┐
│ Docker Compose Network │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐│
│ │ API │──>│ LLM │ │ Redis │ │Postgres ││
│ │ :8000 │ │ (vLLM) │ │ :6379 │ │ :5432 ││
│ │ │──>│ :8000 │ │ │ │ ││
│ └────┬─────┘ └──────────┘ └──────────┘ └─────────┘│
│ │ │
│ │ ┌──────────┐ │
│ └────────>│ ChromaDB │ │
│ │ :8000 │ │
│ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
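Inside the api container, the sibling services from the diagram appear only as the environment variables defined in the Compose file. A minimal sketch of how the orchestrator might assemble its endpoints at startup (the variable names match the compose.yml above; the defaults are illustrative fallbacks, not part of the stack definition):

```python
import os

# Each sibling service is addressed by its Compose service name.
# Defaults mirror the environment block of the api service in compose.yml.
LLM_BASE_URL = os.environ.get("LLM_BASE_URL", "http://llm:8000/v1")
CHROMA_HOST = os.environ.get("CHROMA_HOST", "chromadb")
CHROMA_PORT = int(os.environ.get("CHROMA_PORT", "8000"))
REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379/0")

def chroma_endpoint() -> str:
    """HTTP endpoint for the vector database, built from service name + port."""
    return f"http://{CHROMA_HOST}:{CHROMA_PORT}"

print(chroma_endpoint())  # http://chromadb:8000
```

Note that the api service talks to the LLM on the container-internal port 8000, not the host-published 8001; published ports only matter for clients outside the Compose network.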
4. Essential Compose Commands
Docker Compose provides a set of subcommands for managing the application lifecycle. The following commands cover the most common operations during development and deployment.
# Start all services in the background
docker compose up -d
# Start and rebuild images if Dockerfiles changed
docker compose up -d --build
# View logs from all services (follow mode)
docker compose logs -f
# View logs from a specific service
docker compose logs -f api
# Stop all services (preserves volumes)
docker compose down
# Stop and remove volumes (WARNING: deletes all data)
docker compose down -v
# List running services and their status
docker compose ps
# Execute a command inside a running service
docker compose exec api bash
# Scale a specific service to multiple instances
docker compose up -d --scale api=3
During development, use docker compose up without -d to see all service logs interleaved in your terminal. Press Ctrl+C to stop everything. For production, always use -d (detached mode) and monitor with docker compose logs -f in a separate terminal.
5. Health Checks and Dependency Management
The depends_on directive in Compose controls startup order, but by default it only waits
for the container to start, not for the service inside it to be ready. This is a critical distinction
for ML applications where an LLM server may take two minutes to load model weights. The
condition: service_healthy option makes Compose wait for a service's health check to pass
before starting its dependents.
Health checks are defined per service and specify a command that returns exit code 0 when the service
is ready. The start_period parameter is especially important for LLM servers, as it tells
Docker to ignore health check failures during the initial loading phase.
llm:
  image: vllm/vllm-openai:latest
  healthcheck:
    # Probe the health endpoint
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s      # Check every 30 seconds
    timeout: 10s       # Fail if no response in 10 seconds
    retries: 3         # Mark unhealthy after 3 consecutive failures
    start_period: 180s # Allow 3 minutes for model loading
Without condition: service_healthy, your API container may start and immediately crash because the LLM server is still loading model weights. Always pair GPU-intensive services with generous start_period values and use health check conditions on dependent services.
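The same readiness semantics can be modeled client-side. This is a simplified sketch of Docker's health check logic (interval, retries, start_period) as a polling loop, not the actual implementation; probe stands in for any callable that returns True once the service answers, and the fake_probe below simulates a server that becomes ready on its fifth check:

```python
import time

def wait_until_healthy(probe, interval=1.0, retries=3, start_period=0.0):
    """Poll `probe` like a Docker health check: failures during
    start_period are ignored; afterwards, `retries` consecutive
    failures mark the service unhealthy."""
    start = time.monotonic()
    failures = 0
    while True:
        if probe():
            return True
        if time.monotonic() - start >= start_period:
            failures += 1
            if failures >= retries:
                return False
        time.sleep(interval)

# Simulate an LLM server that becomes ready on the 5th probe.
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 5

print(wait_until_healthy(fake_probe, interval=0.01, retries=10, start_period=0.0))
# True
```

Seen this way, start_period is simply a grace window during which failed probes do not count toward the retry budget, which is exactly why it matters for slow-loading model servers.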
6. Environment Files and Configuration Management
Hardcoding configuration values in compose.yml makes the file difficult to reuse across
environments (development, staging, production). Docker Compose supports .env files that
supply variable values referenced in the Compose file with ${VARIABLE} syntax.
# .env file (not committed to version control)
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_MAX_LEN=4096
LLM_GPU_UTIL=0.90
POSTGRES_USER=raguser
POSTGRES_PASSWORD=supersecretpassword
HF_TOKEN=hf_abc123xyz
# compose.yml referencing environment variables
services:
  llm:
    image: vllm/vllm-openai:latest
    command: >
      --model ${LLM_MODEL}
      --max-model-len ${LLM_MAX_LEN}
      --gpu-memory-utilization ${LLM_GPU_UTIL}
    environment:
      - HF_TOKEN=${HF_TOKEN}

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
For multi-environment setups, maintain separate files like .env.dev, .env.staging, and .env.prod. Launch with a specific environment by passing the file explicitly: docker compose --env-file .env.prod up -d. Add all .env* files to your .gitignore to prevent accidental credential commits.
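The ${VARIABLE} interpolation is easy to reason about with a small model. The sketch below parses a .env-style string and substitutes variables the way Compose does in the simplest case; real Compose also supports forms like ${VAR:-default} and ${VAR:?error}, which this toy version deliberately omits:

```python
import re

def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def interpolate(template: str, env: dict) -> str:
    """Replace ${VAR} occurrences with values from env (empty if unset)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), template)

env = parse_env("""
# .env
POSTGRES_USER=raguser
POSTGRES_PASSWORD=supersecretpassword
""")

print(interpolate("postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/ragdb", env))
# postgresql://raguser:supersecretpassword@postgres:5432/ragdb
```

The substitution happens when Compose reads the file, before any container starts, so an unset variable silently becomes an empty string; this is a common source of confusing startup failures and a good reason to use the ${VAR:?error} form for required secrets.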
7. GPU Configuration in Compose
GPU access in Docker Compose requires the deploy.resources.reservations.devices
configuration block. This is the Compose equivalent of the --gpus flag used with
docker run.
services:
  llm:
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1                # Number of GPUs to allocate
              capabilities: [gpu]     # Required capability

  training:
    build: ./training
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]  # Specific GPU IDs
              capabilities: [gpu]
You can allocate GPUs by count (letting Docker choose which devices) or by specific device IDs (for deterministic allocation). When running multiple GPU services on the same machine, use explicit device IDs so that each service gets its own GPU and no two services compete for the same device's memory.
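The contention point can be made concrete with a tiny allocator: given services that each reserve GPUs by count, handing out disjoint device IDs (as you would with explicit device_ids) guarantees no two services share a device. The service names and GPU pool here are illustrative:

```python
def allocate_gpus(requests: dict, available: list) -> dict:
    """Assign disjoint GPU IDs to each service, mimicking explicit
    device_ids reservations. Raises if the machine is oversubscribed."""
    pool = list(available)
    assignment = {}
    for service, count in requests.items():
        if count > len(pool):
            raise RuntimeError(f"not enough GPUs for {service}")
        assignment[service] = pool[:count]
        pool = pool[count:]
    return assignment

# Two services on a 4-GPU machine: llm gets 1 GPU, training gets 2.
print(allocate_gpus({"llm": 1, "training": 2}, ["0", "1", "2", "3"]))
# {'llm': ['0'], 'training': ['1', '2']}
```

Allocating by count delegates this bookkeeping to Docker, which is convenient on a single-service host but opaque when several GPU services coexist; explicit IDs make the assignment visible in the Compose file itself.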
8. Overrides and Profiles for Development
Docker Compose supports override files that layer additional configuration on top of the base
compose.yml. This is useful for development-specific settings like source code mounts,
debug ports, and relaxed security.
# compose.override.yml (automatically loaded in development)
services:
  api:
    volumes:
      - ./api/src:/app/src   # Hot-reload source code
    environment:
      - LOG_LEVEL=debug
      - RELOAD=true          # Enable auto-reload (uvicorn)
    command: ["uvicorn", "src.main:app", "--reload", "--host", "0.0.0.0"]

  llm:
    # In development, use a smaller model for faster iteration
    command: >
      --model microsoft/Phi-3-mini-4k-instruct
      --max-model-len 2048
Compose automatically merges compose.yml with compose.override.yml if both
exist. For production, use docker compose -f compose.yml -f compose.prod.yml up -d to
load a production-specific override instead of the development defaults.
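The merge behavior can be approximated with a short function: mappings are merged recursively and values from the override file win, which is roughly what Compose does for keys like command and image. Real Compose has additional per-key rules (for example, ports and volumes entries are combined rather than replaced), so treat this as a simplified model:

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge two Compose-style mappings; override wins
    for scalars, nested mappings are merged key by key."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

base = {"services": {"llm": {
    "image": "vllm/vllm-openai:latest",
    "command": "--model meta-llama/Llama-3.1-8B-Instruct",
}}}
override = {"services": {"llm": {
    "command": "--model microsoft/Phi-3-mini-4k-instruct",
}}}

merged = merge(base, override)
print(merged["services"]["llm"]["command"])  # --model microsoft/Phi-3-mini-4k-instruct
print(merged["services"]["llm"]["image"])    # vllm/vllm-openai:latest
```

This is why the development override can swap only the llm command while inheriting the image, volumes, and GPU reservation from the base file untouched.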
Summary
Docker Compose transforms multi-container management from a series of manual commands into a declarative configuration file. Health checks with dependency conditions ensure services start in the correct order, which is critical when LLM servers need minutes to load model weights. Environment files and override files enable clean separation between development and production configurations. In the next section, we focus specifically on containerizing LLM inference servers like vLLM, TGI, and Ollama.