
"If we cannot prove what a model wrote, we cannot prove what a human did not."
Sentinel, Provenance-Tracking AI Agent
Chapter 53 covered the rules; this chapter turns to the technical primitives that satisfy them: watermarking, provenance metadata, C2PA, SynthID, DeepMind's text watermarking, and the broader problem of tracking AI-generated content as it moves through the internet.
This chapter covers the provenance layer of responsible AI: text watermarking (Kirchenbauer green-list, SynthID-Text), image and video provenance (C2PA, SynthID-Image, Content Credentials), deepfake detection, and the adversarial cat-and-mouse game. The complementary chapter, Transparency and Disclosure, covers the documentation side: model cards, datasheets, system cards, audit trails, and explainability.
Chapter Overview
Provenance is the discipline of telling humans where a piece of content came from: which model produced it, with which prompt, under whose authority, and whether it was edited afterward. This chapter covers the business and regulatory case for provenance, text watermarking (Kirchenbauer green-list, SynthID-Text), image and video provenance (C2PA, SynthID-Image, Adobe Content Credentials), deepfake detection and its limits, and the adversarial robustness story (paraphrase attacks, copy-paste removal, unforgeability trade-offs).
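The four questions in that definition (which model, which prompt, whose authority, edited or not) can be pictured as fields on a record attached to each output. The sketch below is illustrative only: `ProvenanceRecord`, its field names, and the `for_output` and `matches` helpers are hypothetical and do not follow the C2PA manifest format or any other standard. It stores SHA-256 digests rather than raw text, and it omits the cryptographic signature that real content-credential systems add so that the record itself cannot be forged.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record answering the four provenance questions."""
    model_id: str        # which model produced the content
    prompt_sha256: str   # which prompt (stored as a digest, not raw text)
    issuer: str          # under whose authority it was generated
    content_sha256: str  # digest of the content at generation time

    @classmethod
    def for_output(cls, content: str, prompt: str,
                   model_id: str, issuer: str) -> "ProvenanceRecord":
        """Build a record for freshly generated content."""
        return cls(
            model_id=model_id,
            prompt_sha256=sha256_hex(prompt),
            issuer=issuer,
            content_sha256=sha256_hex(content),
        )

    def matches(self, content: str) -> bool:
        """True if the content is byte-identical to what was recorded."""
        return sha256_hex(content) == self.content_sha256


# Example with made-up identifiers.
record = ProvenanceRecord.for_output(
    content="Hello, world.",
    prompt="Say hello",
    model_id="example-model-v1",
    issuer="example-org",
)
print(record.matches("Hello, world."))  # unchanged content
print(record.matches("Hello, world!"))  # edited after generation
```

An exact-hash check like this flags any edit, however small, but it says nothing once the record is stripped from the content. That gap is one reason the chapter treats metadata-based provenance and in-content watermarking as complementary.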
Provenance is the part of the trust stack that legislatures and platforms started enforcing in 2024 and 2025. This chapter is the practitioner's picture of what works, what does not, and where the genuine cat-and-mouse game still lives.
Learning Objectives
- Explain why provenance matters for content platforms, regulators, and downstream users.
- Apply text watermarking (Kirchenbauer green-list, SynthID-Text) and evaluate paraphrase robustness.
- Architect image and video provenance with C2PA, SynthID-Image, or Adobe Content Credentials.
- Compare classifier-based deepfake detection with provenance-based approaches.
- Diagnose adversarial watermark removal attacks and reason about unforgeability trade-offs.
Prerequisites
- Regulation and compliance from Chapter 53
- Decoding and sampling from Chapter 4
- Inference optimization from Chapter 9
Sections
- 54.1 Why Provenance Matters (Intermediate): The business and regulatory case for provenance, plus the threats provenance does and does not address.
- 54.2 Text Watermarking: Kirchenbauer Green-List and SynthID-Text (Advanced): Algorithmic text watermarking techniques, their detection accuracy, and their robustness to paraphrasing.
- 54.3 Image and Video Provenance: C2PA, SynthID-Image, Adobe Content Credentials (Advanced): Cryptographic content credentials, signed media, and the adoption story for provenance metadata.
- 54.4 Deepfake and Synthetic-Media Detection (Advanced): Classifier-based detection, biometric anti-spoofing, and the limits of detection-only approaches.
- 54.5 Limitations: Adversarial Watermark Removal and the Cat-and-Mouse Game (Advanced): Paraphrase attacks, copy-paste attacks, the inherent unforgeability trade-offs, and the hardness of universal watermarks.
What's Next?
This chapter begins with Section 54.1: Why Provenance Matters. Each section builds on the previous one, so we recommend reading them in order.