Chapter 54: Watermarking and Provenance


"If we cannot prove what a model wrote, we cannot prove what a human did not."
Sentinel, Provenance-Tracking AI Agent

Looking Back

Chapter 53 covered the rules; this chapter turns to the technical primitives that satisfy them: watermarking, provenance metadata, C2PA, SynthID, DeepMind's text watermarking, and the broader problem of tracking AI-generated content as it moves through the internet.

Big Picture

This chapter covers the provenance layer of responsible AI: text watermarking (Kirchenbauer green-list, SynthID-Text), image and video provenance (C2PA, SynthID-Image, Content Credentials), deepfake detection, and the adversarial cat-and-mouse game. The complementary chapter, Transparency and Disclosure, covers the documentation side: model cards, datasheets, system cards, audit trails, and explainability.

Chapter Overview

Provenance is the discipline of telling humans where a piece of content came from: which model produced it, with which prompt, under whose authority, and whether it was edited afterward. This chapter covers the business and regulatory case for provenance, text watermarking (Kirchenbauer green-list, SynthID-Text), image and video provenance (C2PA, SynthID-Image, Adobe Content Credentials), deepfake detection and its limits, and the adversarial robustness story (paraphrase attacks, copy-paste removal, unforgeability trade-offs).

Provenance is the part of the trust stack that legislatures and platforms started enforcing in 2024 and 2025. This chapter is the practitioner's picture of what works, what does not, and where the genuine cat-and-mouse game still lives.

Learning Objectives

Explain why provenance matters for content platforms, regulators, and downstream users.
Apply text watermarking (Kirchenbauer green-list, SynthID-Text) and evaluate paraphrase robustness.
Architect image and video provenance with C2PA, SynthID-Image, or Adobe Content Credentials.
Compare classifier-based deepfake detection with provenance-based approaches.
Diagnose adversarial watermark removal attacks and reason about unforgeability trade-offs.

Prerequisites

Regulation and compliance from Chapter 53
Decoding and sampling from Chapter 4
Inference optimization from Chapter 9

Sections

What's Next?

This chapter begins with Section 54.1: Why Provenance Matters. Each section builds on the previous one, so we recommend reading them in order.

Further Reading

Text Watermarking

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). "A Watermark for Large Language Models." ICML. arXiv:2301.10226. The green-list/red-list construction that defined the modern LLM watermarking baseline analyzed in Section 54.2.

Dathathri, S., See, A., Ghaisas, S., Huang, P.-S., McAdam, R., Welbl, J., et al. (2024). "Scalable Watermarking for Identifying Large Language Model Outputs." Nature, 634, 818. Google's SynthID-Text paper; the production-scale evaluation of LLM watermarking deployed on Gemini and discussed throughout Section 54.2.

Aaronson, S. (2023). "Watermarking GPT Outputs." OpenAI talk / blog. The Aaronson-Kirchner sampling-based watermark proposal; the conceptual root of cryptographically signed LLM outputs.

Image and Media Provenance

Coalition for Content Provenance and Authenticity. (2024). C2PA Technical Specification v2.0. The Adobe-Microsoft-BBC-Intel cryptographic provenance standard adopted by Adobe Content Credentials and OpenAI; the reference for Section 54.3.

Fernandez, P., Couairon, G., Jegou, H., Douze, M., & Furon, T. (2023). "The Stable Signature: Rooting Watermarks in Latent Diffusion Models." ICCV. arXiv:2303.15435. Demonstrates fine-tuning diffusion decoders to embed identifier watermarks; pairs with SynthID-Image as the technical baseline.

Adversarial Robustness

Zhang, H., Edelman, B. L., Francati, D., Venturi, D., Ateniese, G., & Barak, B. (2024). "Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models." ICML. arXiv:2311.04378. Formal hardness result for strong watermarking under paraphrase attacks; underpins the "cat-and-mouse" framing of Section 54.5.