Pathway 4: "I Want to Deploy LLMs in Production" (Platform / DevOps Engineer)
Target audience: Platform engineers, DevOps engineers, and SREs responsible for LLM infrastructure
Goal: Understand how to serve, monitor, scale, and secure LLM-powered systems in production environments.
Chapter Guide
- Skim Ch 06: Pre-training and Scaling Laws – context on compute and memory requirements, and what they cost
- Skim Ch 07: The Modern LLM Landscape – context on model sizes, capabilities, costs, and hardware needs
- Focus Ch 09: Inference Optimization – quantization, KV-cache, batching, and serving
- Focus Ch 10: Working with LLM APIs – rate limiting, failover, and cost management
- Skim Ch 14: Fine-Tuning Fundamentals (Sections 14.1 through 14.3) – understand what training jobs look like
- Skim Ch 15: Parameter-Efficient Fine-Tuning (LoRA, QLoRA, adapter merging) – serving multiple LoRA adapters efficiently
- Skim Ch 20: RAG (infrastructure sections) – infrastructure side of retrieval pipelines
- Skim Ch 26: Agent Safety and Production Infrastructure – production guardrails for agent systems
- Focus Ch 29: Evaluation and Experiment Design – build evaluation into your CI/CD pipeline
- Focus Ch 30: Observability and Monitoring – traces, dashboards, alerting, and debugging
- Focus Ch 31: Production Engineering and LLMOps – CI/CD, deployment strategies, and cost control
- Focus Ch 32: Safety, Ethics and Security – infrastructure-level security and compliance
- Skim Ch 33: Strategy and ROI – cost modeling to justify infrastructure spend
- Skim Ch 34: Emerging Architectures – MoE and new architectures affecting deployment
- Optional Ch 35: AI and Society – governance context for platform decisions
Recommended Appendices
- Appendix S: Inference Serving – deploy and serve models at scale
- Appendix U: Docker and Containers – containerize ML services for reproducible deployments
- Appendix T: Distributed ML – scale training and inference across clusters
What Comes Next
Return to the Reading Pathways overview to explore other pathways, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types, then start reading.