FM.2.4: Pathway 4: "I Want to Deploy LLMs in Production" (Platform / DevOps Engineer)

Time estimate: 4 to 5 weeks

Difficulty: Intermediate

Target audience: Platform engineers, DevOps engineers, and SREs responsible for LLM infrastructure

Goal: Understand how to serve, monitor, scale, and secure LLM-powered systems in production environments.

Chapter Guide

Skim Ch 06: Pre-training and Scaling Laws – context on model capabilities and costs, including compute and memory requirements
Skim Ch 07: The Modern LLM Landscape – context on model capabilities and costs, including model sizes and hardware needs
Focus Ch 09: Inference Optimization – quantization, KV-cache, batching, and serving
Focus Ch 10: Working with LLM APIs – rate limiting, failover, and cost management
Skim Ch 14: Fine-Tuning Fundamentals (Sections 14.1 through 14.3) – understand what training jobs look like
Skim Ch 15: Parameter-Efficient Fine-Tuning (LoRA, QLoRA, adapter merging) – serving multiple LoRA adapters efficiently
Skim Ch 20: RAG (infrastructure sections) – the infrastructure side of retrieval pipelines
Skim Ch 26: Agent Safety and Production Infrastructure – production guardrails for agent systems
Focus Ch 29: Evaluation and Experiment Design – build evaluation into your CI/CD pipeline
Focus Ch 30: Observability and Monitoring – traces, dashboards, alerting, and debugging
Focus Ch 31: Production Engineering and LLMOps – CI/CD, deployment strategies, and cost control
Focus Ch 32: Safety, Ethics and Security – infrastructure-level security and compliance
Skim Ch 33: Strategy and ROI – cost modeling to justify infrastructure spend
Skim Ch 34: Emerging Architectures – MoE and other new architectures that affect deployment
Optional Ch 35: AI and Society – governance context for platform decisions

Recommended Appendices

Appendix S: Inference Serving – deploy and serve models at scale
Appendix U: Docker and Containers – containerize ML services for reproducible deployments
Appendix T: Distributed ML – scale training and inference across clusters

Return to the Reading Pathways overview to explore other pathways, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types, then start reading.