Tools of the Trade: Safety & Guardrails Stack

Consolidated reference: platforms, libraries, datasets, models, and external resources for this part.

Chapter opener illustration: Tools of the Trade: Safety & Guardrails Stack.

"Guardrails are the part of the system you only thank when they fail loudly."

GuardGuard, Defense-In-Depth AI Agent

Looking Back

Chapters 47 through 50 covered the threat model. This chapter covers the operational stack: NeMo Guardrails, Llama Guard, Granite Guardian, OpenAI Moderation, Lakera, Garak, and the day-to-day tooling that keeps an LLM product defensible.

Big Picture

Part X covers security and runtime safety; Part XI extends to ethics, trust, and governance. This chapter's toolbox comprises moderation models (Llama Guard, OpenAI Moderation), guardrail frameworks (NVIDIA NeMo Guardrails, Guardrails AI), red-team toolkits (Garak, PyRIT), and privacy libraries (Opacus, TensorFlow Privacy).

Chapter Overview

Part X covered adversarial security, guardrails, agent safety, and privacy. This chapter consolidates the safety and guardrails toolchain: the platforms (moderation APIs, red-team platforms, policy and compliance services), the libraries (guardrails frameworks, red-team toolkits, privacy-preserving training), the datasets and benchmarks (harmful-output, jailbreak, bias and fairness), the safety models (classifiers and judges), and the external literature that keeps the safety stack current.

Safety tooling is contested and fast-moving, but the platforms, libraries, and benchmarks listed here are the ones that have stabilized enough to ship products against in 2026.

Note: Learning Objectives

Library Shortcut

To start adding input and output safety filtering to an LLM call, install Guardrails AI (about 30 seconds):

pip install guardrails-ai

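Guardrails AI wraps validators around LLM calls, checking the prompt before it reaches the model and the completion before it reaches the user. For self-hosted moderation, run Llama Guard 3 behind vLLM. For red-teaming, Garak is a widely used open-source scanner.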
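
The wrapping pattern itself is small enough to sketch in plain Python. The example below is an illustration under stated assumptions, not the Guardrails AI API: the names `guarded`, `block_terms`, `max_length`, `GuardrailViolation`, and the stand-in `echo_llm` client are all hypothetical, and the two validators are deliberately naive. It shows only the shape of the idea: run validators on the prompt, call the model, then run validators on the completion.

```python
from typing import Callable, List, Optional

# A validator returns None when the text passes, or a short reason when it fails.
Validator = Callable[[str], Optional[str]]


class GuardrailViolation(Exception):
    """Raised when a validator rejects a prompt ("input") or a completion ("output")."""

    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage} blocked: {reason}")
        self.stage = stage
        self.reason = reason


def block_terms(terms: List[str]) -> Validator:
    """Hypothetical validator: reject text containing any listed term (case-insensitive)."""
    lowered = [t.lower() for t in terms]

    def check(text: str) -> Optional[str]:
        haystack = text.lower()
        for term in lowered:
            if term in haystack:
                return f"contains blocked term {term!r}"
        return None

    return check


def max_length(limit: int) -> Validator:
    """Hypothetical validator: reject text longer than `limit` characters."""

    def check(text: str) -> Optional[str]:
        if len(text) > limit:
            return f"length {len(text)} exceeds {limit}"
        return None

    return check


def guarded(
    llm: Callable[[str], str],
    input_validators: List[Validator],
    output_validators: List[Validator],
) -> Callable[[str], str]:
    """Wrap an LLM callable so validators run before and after every call."""

    def call(prompt: str) -> str:
        for validate in input_validators:
            reason = validate(prompt)
            if reason is not None:
                raise GuardrailViolation("input", reason)
        completion = llm(prompt)
        for validate in output_validators:
            reason = validate(completion)
            if reason is not None:
                raise GuardrailViolation("output", reason)
        return completion

    return call


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model client (assumption: any str -> str callable)."""
    return f"You said: {prompt}"


safe_llm = guarded(
    echo_llm,
    input_validators=[block_terms(["ignore previous instructions"])],
    output_validators=[max_length(200)],
)
```

Calling `safe_llm("hello")` returns the model's reply unchanged, while a prompt containing the blocked phrase raises `GuardrailViolation` with `stage == "input"` before the model is ever called. A production stack would replace the substring check with a moderation model or a maintained validator, and would usually log or rewrite rather than only raise.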

Sections in This Chapter

Prerequisites

What Comes Next

Next: Chapter 52: Bias, Fairness & Hallucinations, opening Part XI. Part X hardened the system against attackers; Part XI confronts the harms a system can cause when nobody is attacking it: representational bias, allocational fairness, hallucinated facts, regulatory compliance (EU AI Act, GDPR, US frameworks), enterprise governance (NIST AI RMF, ISO 42001), licensing and IP, and machine unlearning. The shift is from "stop bad actors" to "do the right thing".