Safety libraries split into guardrails frameworks (which wrap LLM calls with validators), red-team toolkits, and privacy-preserving training libraries. Install with uv (Astral, 10-100x faster than pip and the modern default).
51.2.1 Guardrails frameworks
- Guardrails AI (Guardrails AI, 2023) is the Python framework for input and output validation, with a "RAIL spec" DSL describing assertions like "no PII", "matches schema", "no profanity". Its objective is to make LLM output validation declarative and reusable, which matters when you want validation to be code-reviewable rather than buried in prompt engineering. The core concept is the Validator chain: each output goes through validators that pass, repair, or reject the response. Pick Guardrails AI for Python apps with clear validation policies; for complex multi-turn dialog policies, NeMo Guardrails is more powerful.
- NVIDIA NeMo Guardrails (NVIDIA, 2023) is the programmable dialog-policy framework using the Colang DSL. Its objective is to express complex dialog policies declaratively ("if user asks for medical advice, route to disclaimer flow"), which matters when policies are too complex for prompt-based instructions. The core concept is Colang, a Python-adjacent DSL where you write "flows" (multi-turn policy graphs) that the framework executes alongside the LLM. Pick NeMo Guardrails for multi-turn dialog policy enforcement; expect a learning curve for Colang.
- LLM Guard (Protect AI, 2023) is a scanner-based guardrails library bundling input scanners (jailbreak detection, prompt injection, PII, secrets) and output scanners (toxicity, bias, hallucination signals). Its objective is to provide a battery of pre-built scanners you can chain into a guardrail without writing custom validators, which matters when you want broad security coverage out of the box. The core concept is independent scanners that produce risk scores; aggregation is up to you. Pick LLM Guard for broad scanner coverage; for policy-specific validation, Guardrails AI's DSL is cleaner.
- Aporia Guardrails (Aporia, 2023) is the commercial alternative with a Python client and managed policy enforcement. Its objective is to provide enterprise-managed guardrails (centralized policy, audit logging, no-code config), which matters when guardrail policies must be controlled by a compliance team rather than developers. Pick Aporia when centralized policy management is a procurement requirement.
51.2.2 Red-team libraries
- Garak (NVIDIA, 2023) is the probe-based LLM vulnerability scanner covering 50+ failure modes (jailbreaks, prompt injection, data leakage, toxic generation, hallucination). Its objective is to be the "nmap for LLMs", which matters as an automated baseline-security check. The core concept is probes (attack generators) and detectors (success classifiers) plus a harness that runs them against any LLM. Pick Garak for CI vulnerability scanning; for structured multi-turn campaigns, PyRIT is more powerful.
- PyRIT (Microsoft, 2023) is the orchestration framework for automated adversarial testing with multi-turn attack scenarios. Its objective is to automate complex red-team campaigns (gradual escalation, multi-step injection, conversational extraction), which matters when single-turn probes miss compound vulnerabilities. The core concept is the orchestrator that runs attack strategies against target LLMs with classifiers analyzing whole sessions. Pick PyRIT for structured red-team campaigns; for one-shot scans, Garak is simpler.
- Garak source mirror: the upstream maintainer's mirror, useful for filing issues against the original author Leon Derczynski.
- prompt-injection-defenses (tldrsec, 2023) is a curated reference of prompt-injection attack patterns and defense strategies. Its objective is to be the canonical "what does prompt injection look like" reference for security teams, which matters when threat modeling LLM applications. Pick this as your literature reference; for runtime detection, LLM Guard's scanners are the practical tool.
51.2.3 Privacy-preserving training
- Opacus (Meta, 2020) is the differential-privacy library for PyTorch, implementing DP-SGD (differentially-private stochastic gradient descent). Its objective is to make training with formal privacy guarantees as easy as adding a wrapper around the optimizer, which matters when training data privacy is a legal requirement (HIPAA, GDPR). The core concept is per-sample gradient clipping plus calibrated noise injection at each step, producing (epsilon, delta)-DP guarantees. Pick Opacus when training on sensitive data requires formal privacy; expect accuracy loss proportional to the privacy budget.
- TensorFlow Privacy (Google, 2019) is the TensorFlow-equivalent DP-SGD library. Its objective is to provide the same DP-SGD primitives for TF-based training pipelines, which matters when you cannot switch frameworks. Pick when TensorFlow is your training framework; for PyTorch (the modern default), Opacus is the equivalent.
- PySyft (OpenMined, 2017) is the federated-learning and secure-computation library, supporting both DP and secure multi-party computation. Its objective is to enable training across distributed sensitive data without centralizing it, which matters for cross-institution training (e.g., multi-hospital medical data). The core concept is "syft" tensors that abstract location and access policy. Pick PySyft for federated learning with privacy guarantees; for simpler federated workflows, Flower is lighter weight.
- Flower (Adap, 2020) is the federated-learning framework focused on simplicity and framework-agnostic operation. Its objective is to make federated learning easy across PyTorch, TensorFlow, JAX, and even XGBoost, which matters when you want federated training without committing to one ML stack. Pick Flower as the production-default federated framework; PySyft is more research-oriented and feature-rich but heavier.
51.2.4 Comparing the libraries
Table 51.2.1a: 39.2.1 Safety libraries (2026).
| Library | Role | Best for | Tradeoff |
|---|---|---|---|
| Guardrails AI | Validator chain | Python-first apps | Latency overhead |
| NeMo Guardrails | Dialog policy | Multi-turn agents | Colang learning curve |
| Garak | Vulnerability scanner | Pre-deploy testing | Coverage limited to known probes |
| PyRIT | Red-team orchestration | Structured campaigns | Heavier setup |
| Opacus | DP-SGD | Privacy-preserving fine-tunes | Accuracy hit |
What's Next?
In the next section, Section 51.3: Datasets & Benchmarks, we build on the material covered here.
Further Reading
Security Libraries
NVIDIA (2024). "NeMo Guardrails." github.com/NVIDIA/NeMo-Guardrails. Reference open-source guardrails framework.
Guardrails AI (2024). "Guardrails Documentation." docs.guardrailsai.com. Reference output-validation library.
Microsoft (2024). "Presidio." microsoft.github.io/presidio. Reference PII-detection library.