Pathway 17: "I Want to Build AI Infrastructure" (AI Infrastructure Engineer)
Target audience: Infrastructure engineers, SREs, and platform engineers who need to deploy, serve, optimize, and monitor LLM systems at scale
Goal: Master the operational side of LLM systems: inference serving, quantization, caching, monitoring pipelines, cost optimization, and production guardrails.
Chapter Guide
- Skim Ch 06: Pre-training and Scaling Laws (understand compute and memory requirements)
- Skim Ch 07: The Modern LLM Landscape (model sizes, capabilities, and hardware needs)
- Focus Ch 09: Inference Optimization (quantization, KV-cache, batching, speculative decoding)
- Focus Ch 10: Working with LLM APIs (rate limiting, failover, and cost management)
- Skim Ch 15: PEFT (serving multiple LoRA adapters efficiently)
- Focus Ch 19: Embeddings and Vector Databases (index scaling and ANN algorithm tradeoffs)
- Skim Ch 26: Agent Safety and Production Infrastructure (production guardrails for agent infrastructure)
- Focus Ch 29: Evaluation and Experiment Design (build evaluation into deployment pipelines)
- Focus Ch 30: Observability and Monitoring (your core chapter: traces, dashboards, and alerting)
- Focus Ch 31: Production Engineering and LLMOps (CI/CD, blue-green deploys, and cost management)
- Focus Ch 32: Safety, Ethics and Security (infrastructure-level security and access controls)
- Skim Ch 34: Emerging Architectures (infrastructure implications of MoE and state-space models)
- Optional Ch 35: AI and Society (governance context for infrastructure decisions)
Recommended Appendices
- Appendix S: Inference Serving – deploy and serve models at scale
- Appendix T: Distributed ML – scale training and inference across multiple GPUs
- Appendix U: Docker and Containers – containerize ML services for reproducible deployments
What Comes Next
Return to the Reading Pathways overview to explore other pathways, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types, then start reading.