"In God we trust; all others must bring data."
W. Edwards Deming
Part Overview
Part VIII covers the three pillars that separate prototypes from production systems: evaluation, observability, and operations. You will design rigorous evaluation frameworks, build observability infrastructure, set up continuous monitoring, and learn the production engineering practices needed to deploy, scale, and maintain LLM applications reliably.
Chapters: 3 (Chapters 29, 30, and 31). These chapters bridge the gap between "it works in a notebook" and "it works in production for millions of users."
Building an LLM application is only half the battle; measuring its quality and keeping it running reliably is the other half. Part VIII gives you the evaluation frameworks, observability tools, and production engineering patterns needed to deploy LLM systems with confidence and maintain them over time.
Chapter 29 covers measuring what matters: evaluation frameworks, benchmark design, A/B testing, statistical rigor, RAG and agent evaluation, and testing LLM applications.
Chapter 30 covers production observability: tracing tools, monitoring for drift, experiment reproducibility, and arena-style evaluation at scale.
Chapter 31 takes LLM applications from notebook to production: deployment architectures, frontend frameworks, scaling, guardrails, and LLMOps practices.
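As a taste of the statistical rigor Chapter 29 applies to A/B testing, here is a minimal sketch of a two-proportion z-test comparing the win rates of two model variants in pairwise comparisons. The counts are invented for illustration, and a real evaluation would also account for ties and multiple comparisons:

```python
from math import sqrt, erf

def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided two-proportion z-test: did variant A's win rate
    differ from variant B's by more than chance would explain?"""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    p_pool = (wins_a + wins_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value via the standard normal CDF (expressed with erf).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers: A preferred in 620 of 1000 trials, B in 552 of 1000.
z, p = two_proportion_z_test(620, 1000, 552, 1000)
```

With these made-up counts the difference is significant at the usual 0.05 level; with smaller samples the same observed gap often is not, which is why chapter-level treatment of sample sizes matters.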
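For the drift monitoring mentioned under Chapter 30, one common technique (not specific to this book) is the Population Stability Index, which compares a live binned distribution, of response lengths, token usage, or scores, against a baseline window. A minimal sketch with illustrative bin counts:

```python
import math

def psi(baseline_counts, live_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 major drift worth investigating."""
    total_b = sum(baseline_counts)
    total_l = sum(live_counts)
    score = 0.0
    for b, l in zip(baseline_counts, live_counts):
        pb = max(b / total_b, eps)   # clamp to avoid log(0) on empty bins
        pl = max(l / total_l, eps)
        score += (pl - pb) * math.log(pl / pb)
    return score

# Hypothetical binned response-length counts: baseline week vs. current week.
stable = psi([100, 300, 400, 200], [110, 290, 390, 210])
drifted = psi([100, 300, 400, 200], [400, 300, 200, 100])
```

The first comparison stays well under the 0.1 threshold while the second exceeds 0.25, the kind of signal a continuous-monitoring pipeline would turn into an alert.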
What Comes Next
Continue to Part IX: Safety and Strategy, where we address the safety, ethics, regulatory, and strategic considerations that govern responsible AI deployment.