MLOps / Production Track
Deploying, operating, and maintaining LLM systems in production environments.
Prerequisites
- DevOps experience (CI/CD, Docker, Kubernetes)
- Python packaging and testing
- Chapters 10 and 11 for API basics
- Cloud provider experience (AWS/GCP/Azure)
Learning Sequence
Follow the numbered steps in order. Each step builds on the previous one to give you a coherent understanding of this topic area.
1. Chapter 09: Inference Optimization and Efficient Serving (quantization, KV-cache, speculative decoding)
2. Sections 14.1 through 14.3: Fine-Tuning Basics (when and how to fine-tune for your use case)
3. Chapter 15: Parameter-Efficient Fine-Tuning (PEFT) (LoRA, QLoRA, adapter merging)
4. Chapter 26: Agent Safety and Production Infrastructure (guardrails, sandboxing, agent monitoring)
5. Chapter 29: Evaluation and Experiment Design (LLM-as-judge, A/B testing, benchmarks)
6. Chapter 30: Observability and Monitoring (drift monitoring, trace analysis, dashboards)
7. Chapter 31: Production Engineering and Operations (CI/CD for LLMs, guardrails, cost management)
8. Chapter 32: Safety, Ethics and Regulation (red-teaming, compliance, responsible deployment)
9. Chapter 34: Emerging Architectures and Scaling Frontiers (mixture of experts, state-space models, infrastructure implications)
10. Chapter 35: AI, Society and Open Problems (AI safety alignment, governance, operational responsibilities)
Recommended Appendices
- Appendix S: Inference Serving – deploy and serve models with low latency
- Appendix U: Docker and Containers – containerize ML services for reproducible deployments
- Appendix T: Distributed ML – scale training and inference across multiple GPUs
- Appendix R: Experiment Tracking – track experiments and model versions in production
What Comes Next
Return to the Course Syllabi overview to explore other tracks and courses, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types.