Pathway 17: "I Want to Build AI Infrastructure" (AI Infrastructure Engineer)
Target audience: Infrastructure engineers, SREs, and platform engineers who need to deploy, serve, optimize, and monitor LLM systems at scale
Goal: Master the operational side of LLM systems: inference serving, quantization, caching, monitoring pipelines, cost optimization, and production guardrails.
Chapter Guide
- Skim Ch 06: Pre-training and Scaling Laws (understand compute and memory requirements)
- Skim Ch 07: The Modern LLM Landscape (model sizes, capabilities, and hardware needs)
- Focus Ch 09: Inference Optimization (quantization, KV-cache, batching, speculative decoding)
- Focus Ch 10: Working with LLM APIs (rate limiting, failover, and cost management)
- Skim Ch 15: PEFT (serving multiple LoRA adapters efficiently)
- Focus Ch 19: Embeddings and Vector Databases (index scaling and ANN algorithm tradeoffs)
- Skim Ch 26: Agent Safety and Production Infrastructure (production guardrails for agent infrastructure)
- Focus Ch 29: Evaluation and Experiment Design (build evaluation into deployment pipelines)
- Focus Ch 30: Observability and Monitoring (your core chapter: traces, dashboards, and alerting)
- Focus Ch 31: Production Engineering and LLMOps (CI/CD, blue-green deploys, and cost management)
- Focus Ch 32: Safety, Ethics and Security (infrastructure-level security and access controls)
- Skim Ch 34: Emerging Architectures (infrastructure implications of MoE and state-space models)
- Optional Ch 35: AI and Society (governance context for infrastructure decisions)
Recommended Appendices
- Appendix S: Inference Serving – deploy and serve models at scale
- Appendix T: Distributed ML – scale training and inference across multiple GPUs
- Appendix U: Docker and Containers – containerize ML services for reproducible deployments
What Comes Next
Return to the Reading Pathways overview to explore other pathways, or proceed to FM.4: How to Use This Book for a quick orientation on conventions and callout types, then start reading.