Governance without engineering is policy theater. Engineering without governance is an audit waiting to happen.
A Steadfast Guard, Governance-Weary AI Agent
Enterprise AI governance requires structured frameworks that map every LLM deployment to a risk classification, assign ownership, and maintain auditable records. Established frameworks like SR 11-7 (banking model risk), NIST AI RMF, and ISO 42001 provide the scaffolding. Building on the regulatory landscape from Section 32.4 and the enterprise application patterns from Section 28.1, this section covers how to build a practical AI governance program that satisfies regulators while remaining lightweight enough for engineering teams to follow.
Prerequisites
Before starting, make sure you are familiar with the regulatory landscape from Section 32.4, the observability practices from Section 29.1 that underpin audit processes, and the interpretability techniques from Section 18.1 that support model explainability requirements.
1. Governance Frameworks Comparison
| Framework | Origin | Scope | Key Contribution |
|---|---|---|---|
| SR 11-7 | US Federal Reserve | Banking / Financial | Three lines of defense, independent validation |
| NIST AI RMF | US NIST | Cross-sector | Govern, Map, Measure, Manage lifecycle |
| ISO 42001 | ISO | International | AI management system certification |
| EU AI Act | European Parliament | EU market | Risk-based obligations, conformity assessment |
The NIST AI Risk Management Framework organizes governance into four core functions. As shown in Figure 32.5.1, Govern provides the overarching structure while Map, Measure, and Manage form a continuous cycle.
Mental Model: The Flight Safety System. LLM risk governance resembles the aviation safety framework. Govern sets the flight rules and certifications. Map identifies the routes and weather conditions (risks). Measure tracks altitude, speed, and fuel (metrics and monitoring). Manage handles turbulence and course corrections (mitigation and response). Just as airlines maintain exhaustive flight logs and conduct regular safety audits, AI governance requires a model inventory, risk classifications, and audit trails. The analogy breaks down in one important way: aviation safety has decades of standardized practice, while AI governance frameworks are still maturing rapidly.
Content provenance standards like C2PA (Coalition for Content Provenance and Authenticity) embed cryptographically signed metadata into AI-generated images, audio, and video, creating a verifiable, tamper-evident record of origin. Adobe, Microsoft, and major camera manufacturers have adopted C2PA, making it the leading candidate for combating deepfakes at scale.
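The real C2PA specification uses X.509 certificate chains and a standardized manifest format. The toy sketch below illustrates only the core idea, binding a content hash to a signed claim of origin, and substitutes HMAC for certificate-based signing to stay short. All names, the key, and the manifest fields are illustrative, not C2PA's actual schema.

```python
import hashlib
import hmac
import json

# Hypothetical signing key; real C2PA uses X.509 certificates, not HMAC.
SIGNING_KEY = b"demo-secret-key"

def attach_provenance(content: bytes, generator: str) -> dict:
    """Build a toy provenance manifest binding content to its origin."""
    manifest = {
        "generator": generator,
        "content_hash": hashlib.sha256(content).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_provenance(content: bytes, manifest: dict) -> bool:
    """Recompute the hash and signature; editing content or claims breaks verification."""
    claims = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (claims["content_hash"] == hashlib.sha256(content).hexdigest()
            and hmac.compare_digest(expected, manifest["signature"]))

image = b"...generated image bytes..."
manifest = attach_provenance(image, "example-image-model")
print(verify_provenance(image, manifest))         # True
print(verify_provenance(image + b"x", manifest))  # False
```

Note the asymmetry the standard exploits: a valid signature proves origin, while a stripped or broken signature is itself a signal that provenance cannot be trusted.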
Governance becomes especially important in organizations with multiple LLM deployments. Without a centralized inventory, teams often deploy models with overlapping capabilities but inconsistent safety standards. One team's customer-facing chatbot might have rigorous guardrails and monitoring from Section 31.3, while another team's internal assistant operates with no safety controls. The model inventory described below provides the visibility needed to enforce consistent governance across the organization.
Start your model inventory today, even if it is just a spreadsheet. Record the model name, provider, version, deployment date, owning team, and risk tier for every LLM in production. Teams that wait until they have 20 or more deployments before creating an inventory often discover that many models are undocumented, some are running deprecated versions, and nobody knows who owns the one handling customer data. A simple inventory prevents governance surprises.
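A spreadsheet-grade starting point can be a few lines of CSV. The columns follow the six fields in the tip above; the two example rows (team names, model versions) are illustrative.

```python
import csv
import io

# One row per production LLM; extend the columns as governance matures.
rows = [
    ["model_name", "provider", "version", "deployment_date", "owning_team", "risk_tier"],
    ["Support Bot", "OpenAI", "gpt-4o-2024-08-06", "2025-01-15", "ML Platform", "medium"],
    ["Contract Summarizer", "Anthropic", "claude-3-5-sonnet", "2025-02-01", "Legal Eng", "high"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Once the sheet exists, the dataclass-based inventory in Code Fragment 32.5.1 is a natural next step: same fields, plus programmatic review flagging.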
2. Model Inventory and Risk Classification
A model inventory tracks every LLM deployment in the organization, its risk tier, ownership, and review status. Code Fragment 32.5.1 below shows how to implement a risk-classified model inventory with automated review flagging.
```python
# Define RiskTier, ModelInventoryEntry; implement needs_review
# Key operations: results display, deployment configuration
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class ModelInventoryEntry:
    """Enterprise model inventory record for governance tracking."""
    model_id: str
    model_name: str
    use_case: str
    owner: str
    risk_tier: RiskTier
    deployment_date: str
    last_validation: str
    next_review: str
    data_sources: list[str]
    regulations: list[str]

    def needs_review(self) -> bool:
        """True once the scheduled review date has passed."""
        return datetime.fromisoformat(self.next_review) <= datetime.now()

    def to_dict(self):
        return {
            "model_id": self.model_id,
            "model_name": self.model_name,
            "use_case": self.use_case,
            "owner": self.owner,
            "risk_tier": self.risk_tier.value,
            "overdue": self.needs_review(),
        }


entry = ModelInventoryEntry(
    model_id="LLM-CS-001", model_name="Customer Support Bot v2",
    use_case="Customer service automation", owner="ML Platform Team",
    risk_tier=RiskTier.MEDIUM, deployment_date="2025-01-15",
    last_validation="2025-01-10", next_review="2025-07-10",
    data_sources=["support_tickets", "knowledge_base"],
    regulations=["GDPR", "EU AI Act (limited risk)"],
)
print(entry.to_dict())
```
Audit Trail Implementation
An immutable audit trail records every LLM interaction with hash-chaining so that any tampering with historical records is detectable. Code Fragment 32.5.2 below implements this pattern.
```python
# Define AuditTrail; implement __init__, log, verify_chain
# Key operations: structured logging, data protection
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Immutable audit log for LLM interactions."""

    def __init__(self):
        self.entries = []

    def log(self, request_id: str, model: str, input_text: str,
            output_text: str, user_id: str, metadata: dict | None = None):
        entry = {
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "user_id": user_id,
            # Log content hashes, not raw text, to protect privacy
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "output_hash": hashlib.sha256(output_text.encode()).hexdigest()[:16],
            "metadata": metadata or {},
        }
        # Chain entries for tamper detection
        if self.entries:
            entry["prev_hash"] = hashlib.sha256(
                json.dumps(self.entries[-1]).encode()
            ).hexdigest()[:16]
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute each link; any edited entry breaks every later link."""
        for i in range(1, len(self.entries)):
            expected = hashlib.sha256(
                json.dumps(self.entries[i - 1]).encode()
            ).hexdigest()[:16]
            if self.entries[i].get("prev_hash") != expected:
                return False
        return True
```
The hash-chained audit trail creates an immutable record of every LLM interaction. Figure 32.5.2 illustrates how each entry links to the previous one, making any tampering immediately detectable.
In financial services, SR 11-7 provides a well-tested governance model. Figure 32.5.3 shows how its three lines of defense separate model development, independent validation, and audit oversight.
Many organizations track traditional ML models but forget to inventory their LLM deployments. Every use of an LLM API, whether it is a direct OpenAI call, a LangChain chain, or an embedded copilot feature, should be registered in the enterprise model inventory with a risk classification and assigned owner.
ISO 42001 is the first international standard for AI management systems. It provides a certifiable framework for organizations to demonstrate responsible AI practices, similar to how ISO 27001 certifies information security management. Certification may become a market differentiator as AI regulation increases.
Audit trails for LLM systems should use hash chaining (similar to blockchain) to ensure tamper resistance. Each log entry includes a hash of the previous entry, creating an immutable chain. If any entry is modified after the fact, the chain verification fails, alerting auditors to potential tampering.
1. What are the four core functions of the NIST AI RMF?
2. What is SR 11-7 and why does it matter for LLM deployments in banking?
3. Why should audit trail entries use hash chaining?
4. What should an enterprise model inventory capture for each LLM deployment?
5. How does ISO 42001 differ from the NIST AI RMF?
Who: A chief data officer and a risk management team at a mid-size bank
Situation: Regulators asked the bank to provide a complete inventory of all AI models in production, including any LLM usage. The CDO discovered that 14 different teams were using LLM APIs across customer service, compliance, and marketing, with no central tracking.
Problem: Without an inventory, the bank could not demonstrate SR 11-7 compliance (model risk management). Some LLM deployments had no assigned owner, no documented risk tier, and no review schedule.
Dilemma: Requiring every team to stop and complete full documentation would halt ongoing projects. Ignoring the gap risked regulatory sanctions.
Decision: They mandated a lightweight registration form (10 fields) for every existing LLM deployment within two weeks, followed by full documentation within 90 days for high-risk models only.
How: The inventory captured model ID, use case, owner, risk tier, data sources, and applicable regulations. An automated alerting system flagged overdue reviews. The second and third lines of defense (validation and audit teams) were assigned to review all high-risk entries.
Result: Within two weeks, all 14 LLM deployments were registered. Three were reclassified from "low risk" to "medium risk" based on their actual data access patterns. The bank passed its regulatory review with commendation for the governance framework.
Lesson: Start model inventories with a lightweight, mandatory registration process. Perfection is the enemy of visibility; a simple inventory today is more valuable than a comprehensive one six months from now.
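The bank's actual 10-field form is not specified in the case study, so the sketch below uses an assumed field list to show the mechanic that made the two-week deadline workable: registration is a simple completeness check, not a documentation review.

```python
# Hypothetical 10-field registration form; the field names are illustrative.
REQUIRED_FIELDS = [
    "model_id", "model_name", "provider", "use_case", "owner",
    "risk_tier", "data_sources", "deployment_date", "regulations",
    "review_cadence_days",
]

def validate_registration(form: dict) -> list[str]:
    """Return the names of any missing or empty required fields."""
    return [f for f in REQUIRED_FIELDS if not form.get(f)]

form = {
    "model_id": "LLM-MKT-007", "model_name": "Campaign Copy Assistant",
    "provider": "OpenAI API", "use_case": "Marketing copy drafts",
    "owner": "", "risk_tier": "low", "data_sources": ["product_catalog"],
    "deployment_date": "2025-03-01", "regulations": ["GDPR"],
    "review_cadence_days": 180,
}
print(validate_registration(form))  # ['owner']
```

A registration that fails this check is simply bounced back to the submitting team, which keeps the central inventory complete without anyone reviewing free-text documentation up front.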
Create a user-facing limitations page that honestly describes what your system cannot do, known failure modes, and when users should not rely on its output. This builds trust and reduces liability when edge cases inevitably occur.
- The NIST AI RMF provides a four-function framework (Govern, Map, Measure, Manage) applicable across industries.
- SR 11-7 requires three lines of defense for model risk in banking: development, independent validation, and audit.
- Every LLM deployment should be registered in an enterprise model inventory with risk classification and assigned ownership.
- Audit trails should use hash chaining for tamper resistance, logging request hashes (not raw content) to protect privacy. The observability tools from Chapter 29 provide the tracing infrastructure for building these audit trails.
- ISO 42001 provides a certifiable AI management system standard that may become a market differentiator.
- Risk classification should consider data sensitivity, decision impact, user population, and regulatory applicability.
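The factors in the last takeaway can be combined into a simple scoring rubric. The weights and thresholds below are illustrative assumptions, not from any framework; real programs calibrate them against their own risk appetite.

```python
def classify_risk(data_sensitivity: int, decision_impact: int,
                  user_population: int, regulated: bool) -> str:
    """Map 1-5 factor ratings plus a regulatory flag to a risk tier.

    The weights and thresholds are illustrative, not from any standard.
    """
    score = data_sensitivity + decision_impact + user_population
    if regulated:
        score += 3  # regulatory applicability bumps the tier
    if score >= 13:
        return "critical"
    if score >= 10:
        return "high"
    if score >= 7:
        return "medium"
    return "low"

# A customer-facing bot reading support tickets under GDPR:
print(classify_risk(data_sensitivity=3, decision_impact=2,
                    user_population=4, regulated=True))  # high
```

Even a crude rubric like this beats ad hoc judgment, because two teams rating the same deployment will land on the same tier and disagreements surface as arguments about factor ratings rather than about the final label.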
Open Questions:
- What should a comprehensive LLM risk register look like, and how should it differ from traditional software risk management? LLMs introduce novel risk categories (hallucination, prompt injection, emergent behavior) that existing frameworks do not cover.
- How can organizations audit LLM systems when the models are black boxes served via third-party APIs?
Recent Developments (2024-2025):
- The NIST AI Risk Management Framework (AI RMF) gained broader adoption in 2024-2025, with practical implementation guides specifically addressing foundation model risks and governance structures.
Explore Further: Create a risk register for an LLM application you have access to. Identify at least 10 risks, categorize them (technical, ethical, legal, operational), and propose a mitigation strategy for each.
Exercises
Compare the NIST AI RMF's four functions (Govern, Map, Measure, Manage) with the three lines of defense model from SR 11-7. What does each framework emphasize that the other does not?
Answer Sketch
NIST AI RMF emphasizes a lifecycle approach: Govern (establish policies), Map (identify risks), Measure (assess risks quantitatively), Manage (mitigate and monitor). SR 11-7 emphasizes organizational accountability: 1st line (model developers/users), 2nd line (risk management), 3rd line (internal audit). NIST focuses on what to do; SR 11-7 focuses on who does it. NIST is broader and technology-agnostic; SR 11-7 is specific to regulated industries. Best practice: use NIST for the process and SR 11-7 for the organizational structure.
Design a risk register template for an LLM application. Include columns for: risk ID, description, likelihood (1-5), impact (1-5), risk score, mitigation strategy, owner, and review date. Populate it with 5 example risks for a healthcare chatbot.
Answer Sketch
Example risks: (1) Hallucinated medical advice (likelihood: 4, impact: 5, score: 20, mitigation: RAG grounding + disclaimer). (2) PII exposure in responses (3, 5, 15, mitigation: PII filtering). (3) Unauthorized diagnosis (3, 5, 15, mitigation: output classifier + human escalation). (4) Provider model degradation (2, 4, 8, mitigation: canary testing). (5) Regulatory non-compliance (2, 4, 8, mitigation: compliance checklist + audit). Sort by risk score descending. Review quarterly or after any system change.
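The register in the answer sketch above can be scored and sorted in a few lines. The rows come from the sketch; the column layout is an illustrative choice.

```python
# Risk register rows from the answer sketch: (id, description, likelihood, impact).
risks = [
    ("R1", "Hallucinated medical advice", 4, 5),
    ("R2", "PII exposure in responses", 3, 5),
    ("R3", "Unauthorized diagnosis", 3, 5),
    ("R4", "Provider model degradation", 2, 4),
    ("R5", "Regulatory non-compliance", 2, 4),
]

# Score each risk as likelihood * impact and sort descending for triage.
register = sorted(
    ({"id": rid, "risk": desc, "score": lik * imp}
     for rid, desc, lik, imp in risks),
    key=lambda row: row["score"], reverse=True,
)
for row in register:
    print(f"{row['id']}  score={row['score']:2d}  {row['risk']}")
```

Sorting by score puts hallucinated medical advice (score 20) at the top of the review agenda, which is the whole point of the template: the register drives meeting order, not just documentation.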
Explain why enterprises need an AI model inventory (registry of all deployed models). What metadata should the inventory capture for each LLM deployment? How does this support audit requirements?
Answer Sketch
The inventory provides visibility into all AI usage across the organization. Metadata per deployment: model name and version, provider, use case description, risk classification, data sources, evaluation results, deployment date, owner, compliance status, incident history. Audit support: auditors can quickly identify all high-risk deployments, verify that each has proper documentation and testing, check that reviews are current, and trace any incident to the responsible team. Without an inventory, shadow AI deployments create unmanaged risk.
Describe what information an LLM audit trail should capture for every interaction. Balance the need for comprehensive logging with privacy and storage constraints.
Answer Sketch
Capture: (1) Timestamp and request ID. (2) User identifier (hashed for privacy). (3) Input prompt (with PII redacted). (4) Model name and version. (5) Generation parameters. (6) Output response (with PII redacted). (7) Any guardrail triggers. (8) Latency and cost. Privacy balance: redact PII before logging, use role-based access to audit logs, implement retention policies (e.g., 90 days for full logs, 1 year for aggregated metrics), and encrypt logs at rest. Storage optimization: compress older logs, move to cold storage after the retention window.
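The redact-before-logging step from the answer sketch can be prototyped with a couple of patterns. This is a minimal sketch: production systems use dedicated PII detection (for example, NER-based scanners), not a handful of regexes, and the patterns below cover only simple email and US-style phone formats.

```python
import re

# Illustrative redaction patterns; real deployments need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII with typed placeholders before logging."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact jane.doe@example.com or call 555-123-4567 about my claim."
print(redact_pii(prompt))
# Contact [EMAIL] or call [PHONE] about my claim.
```

Typed placeholders (rather than blanket deletion) preserve analytic value: auditors can still count how often users submit contact details without the log ever storing them.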
Design a lightweight AI governance program for a 200-person startup that uses LLMs in 3 products. Include: organizational roles, review processes, documentation requirements, and incident handling. How does this differ from governance at a large bank?
Answer Sketch
Startup: designate a part-time AI ethics lead, create a simple risk classification (high/medium/low) with lightweight review requirements, require model cards for all deployments, maintain a shared risk register, and establish an incident response channel. Review high-risk deployments quarterly. Large bank: dedicated AI governance team, formal three-lines-of-defense structure, mandatory model validation by independent teams, detailed documentation per SR 11-7, quarterly board reporting, and regulatory examination readiness. The key difference is formality and staffing: startups need pragmatic governance that does not slow shipping, while banks face regulatory mandates that require extensive documentation.
What Comes Next
In the next section, Section 32.6: LLM Licensing, IP & Privacy, we address licensing, intellectual property, and privacy considerations for LLM-generated content and training data.
NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0).
The US national standard for AI risk management, organized around four core functions: Govern, Map, Measure, and Manage. Provides a flexible, non-prescriptive framework adaptable to any organization size. Essential starting point for teams building an AI governance program.
Board of Governors, Federal Reserve. (2011). SR 11-7: Guidance on Model Risk Management.
The banking industry's foundational model risk management guidance, requiring independent validation and three lines of defense. Its principles have been widely adopted beyond financial services for AI governance. Required reading for regulated industries deploying LLMs.
ISO. (2023). ISO/IEC 42001:2023 Artificial Intelligence Management System.
International standard for establishing, implementing, and maintaining an AI management system, modeled on ISO 27001 for information security. Provides a certifiable framework for demonstrating AI governance maturity. Relevant for enterprises seeking formal AI governance certification.
Proposes a practical internal audit framework for AI systems inspired by financial auditing practices. Covers scoping, testing, documentation, and remediation stages. Directly applicable to building internal audit processes for LLM deployments.
Legal analysis of how the EU AI Act's risk management requirements translate into practical compliance obligations. Bridges the gap between legal text and engineering implementation. Useful for teams interpreting the AI Act's technical requirements.
Multi-stakeholder report proposing mechanisms for verifying safety and fairness claims about AI systems, including third-party audits and bug bounties. Outlines a vision for accountable AI development practices. Recommended for organizations designing their external accountability structures.
