Part VIII: Evaluation and Production

Rigorous evaluation, observability infrastructure, and production engineering for LLM systems at scale.

"In God we trust; all others must bring data."

W. Edwards Deming

Part Overview

Part VIII covers the three pillars that separate prototypes from production systems: evaluation, observability, and operations. You will design rigorous evaluation frameworks, build observability infrastructure, set up continuous monitoring, and learn the production engineering practices needed to deploy, scale, and maintain LLM applications reliably.

Chapters: 3 (Chapters 29, 30, and 31). These chapters bridge the gap between "it works in a notebook" and "it works in production for millions of users."

Big Picture

Building an LLM application is only half the battle; measuring its quality and keeping it running reliably are the other half. Part VIII gives you the evaluation frameworks, observability tools, and production engineering patterns needed to deploy LLM systems with confidence and maintain them over time.

Production observability with tracing tools, monitoring for drift, experiment reproducibility, and arena-style evaluation at scale.

Take LLM applications from notebook to production. Covers deployment architectures, frontend frameworks, scaling, guardrails, and LLMOps practices.

What Comes Next

Continue to Part IX: Safety and Strategy, where we address the safety, ethics, regulatory, and strategic considerations that govern responsible AI deployment.