
"Your model can leak data it never wrote down, by paraphrasing what it learned."
Sentinel, Privacy-Preserving AI Agent
Chapters 47 through 49 protected the system. This chapter protects the user: PII handling, differential privacy, memorization risks, GDPR/CCPA obligations, on-device inference, and the data-handling discipline every LLM-enabled product owes its customers.
Memorization, extraction attacks, differential privacy, federated learning, machine unlearning, and confidential inference.
Chapter Overview
Privacy engineering is the technical counterpart to the legal framework that the regulation chapters cover. This chapter walks through the technical primitives: the canonical attacks (membership inference, model inversion, training-data extraction) and the differential privacy mechanisms that defend against them; federated learning for cross-organization training without sharing raw data (FedAvg, federated LoRA, secure aggregation, DP-FL); and machine unlearning for removing specific knowledge from a trained model without full retraining.
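As a preview of the differential privacy primitive named above, the core of a DP-SGD update can be sketched as "clip each example's gradient, sum, add Gaussian noise, average". This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, gradients are plain Python lists, and no privacy accounting (tracking the spent epsilon) is performed.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient from a batch of per-example gradients.

    per_example_grads: list of equal-length lists of floats, one per example.
    clip_norm: maximum L2 norm allowed for any single example's gradient.
    noise_multiplier: Gaussian noise std as a multiple of clip_norm.
    rng: a random.Random instance (passed in so runs are reproducible).
    """
    batch_size = len(per_example_grads)
    num_params = len(per_example_grads[0])
    summed = [0.0] * num_params

    for grad in per_example_grads:
        # Clip: rescale the gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            summed[i] += g * scale

    # Noise: the std is tied to clip_norm, the most one example can contribute.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]

    # Average over the batch to obtain the update direction.
    return [v / batch_size for v in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1,
                          rng=random.Random(0)))
```

Clipping bounds how much any single training example can move the model, and the noise is scaled to that bound; together they are what make the update's dependence on one record statistically deniable. Libraries such as Opacus package this step along with the privacy accounting omitted here.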
Privacy work is the part of safety engineering where math and operations meet. By the end of this chapter you will know which privacy primitive answers which threat and how to combine them with the safety, security, and governance layers from the rest of Part X and Part XI.
- Explain the canonical privacy attacks: membership inference, model inversion, training-data extraction.
- Apply differential privacy (DP-SGD, DP-FL) to LLM training and fine-tuning.
- Architect a federated learning system with FedAvg, federated LoRA, and secure aggregation.
- Implement machine unlearning to remove specific knowledge from a trained model.
- Combine differential privacy, federated learning, and unlearning into a layered privacy posture.
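To make the federated learning objective above concrete, the aggregation step of FedAvg can be sketched as a weighted average of client parameters, with each client weighted by its number of local training examples. This is a simplified sketch under assumptions: the name `fedavg` and the flat-list representation of model parameters are illustrative, and secure aggregation and differential privacy noise are deliberately left out.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes: number of local training examples held by each client.
    """
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    averaged = [0.0] * num_params

    for weights, size in zip(client_weights, client_sizes):
        # A client with more data pulls the global model further toward itself.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total

    return averaged


if __name__ == "__main__":
    # Two clients: the second holds three times as much data as the first.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.5]
```

Only the parameter vectors leave each client; the raw training data stays local. In this sketch the server still sees every client's individual update, which is the exposure that secure aggregation is meant to close.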
Prerequisites
- Adversarial security from Chapter 47
- Pretraining and memorization from Chapter 6
- Basic familiarity with privacy regulation (GDPR/CCPA at a high level)
Sections
- 50.1 Privacy Attacks and Differential Privacy: Membership inference, model inversion, and training-data extraction attacks, and the differential privacy defenses (such as DP-SGD) that counter them. (Advanced)
- 50.2 Machine Unlearning: Machine unlearning is the ability to remove specific knowledge from a trained model without retraining from scratch. (Advanced)
- 50.3 Federated Learning for Privacy-Preserving Training: Train and fine-tune LLMs across organizations without sharing raw data, using FedAvg, federated LoRA, secure aggregation, and DP-FL. (Advanced)
What's Next?
Next: Chapter 51: Tools of the Trade, Safety & Guardrails Stack, which closes Part X. It consolidates the Part X toolbox: NeMo Guardrails, Llama Guard, Granite Guardian, OpenAI Moderation, the OWASP and MITRE ATLAS taxonomies, agentic security benchmarks (AgentDojo, InjecAgent), red-teaming kits (PyRIT, Garak, HarmBench), and the off-the-shelf privacy libraries (Opacus, PySyft).