Risk Governance and Model Inventory

Section 53.3

Governance without engineering is policy theater. Engineering without governance is an audit waiting to happen.

A Steadfast Guard, Governance-Weary AI Agent
Big Picture

Enterprise AI governance requires structured frameworks that map every LLM deployment to a risk classification, assign ownership, and maintain auditable records. Established frameworks provide the scaffolding: SR 11-7 (the US Federal Reserve's 2011 guidance on model risk management for banks, commonly implemented through the three-lines-of-defense pattern this section returns to), NIST AI RMF, and ISO 42001. Building on the regulatory landscape from Section 47.1 and the enterprise application patterns from Section 27.1, this section covers how to build a practical AI governance program that satisfies regulators while remaining lightweight enough for engineering teams to follow.

Prerequisites

Before starting, make sure you are familiar with the regulatory landscape from Section 47.1, the evaluation metrics that underpin audit processes, and the interpretability techniques from Section 10.1 that support model explainability requirements.

53.3.1 Governance Frameworks Comparison

Table 53.3.1: Governance Frameworks Comparison (as of 2026).
Framework | Origin | Scope | Key Contribution
SR 11-7 | US Federal Reserve | Banking / Financial | Three lines of defense, independent validation
NIST AI RMF | US NIST | Cross-sector | Govern, Map, Measure, Manage lifecycle
ISO 42001 | ISO | International | AI management system certification
EU AI Act | European Parliament | EU market | Risk-based obligations, conformity assessment

The NIST AI Risk Management Framework organizes governance into four core functions. As shown in Figure 53.3.1b, Govern provides the overarching structure while Map, Measure, and Manage form a continuous cycle.

NIST AI Risk Management Framework: a top-level GOVERN function oversees three continuously cycling sub-functions, MAP (context, stakeholders, risks), MEASURE (metrics, testing, monitoring), and MANAGE (mitigate, respond, improve).
Figure 53.3.1b: The NIST AI RMF defines four core functions; Govern is the overarching function while Map, Measure, and Manage form a continuous cycle.
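One way to make the four functions operational is to tag each governance activity with the function it supports and look for functions that have nothing recorded against them. The sketch below assumes a hypothetical activity log for a single deployment; the activity names and the RmfFunction enum are illustrative and do not come from the NIST text.

```python
from enum import Enum


class RmfFunction(str, Enum):
    GOVERN = "govern"
    MAP = "map"
    MEASURE = "measure"
    MANAGE = "manage"


# Hypothetical activity log for one deployment: each activity is tagged with
# the AI RMF function it supports. The activity names are illustrative only.
activities = {
    "assign model owner": RmfFunction.GOVERN,
    "document intended use and stakeholders": RmfFunction.MAP,
    "run red-team evaluation": RmfFunction.MEASURE,
}


def missing_functions(tagged: dict[str, RmfFunction]) -> list[RmfFunction]:
    """Return the RMF functions that have no activity recorded against them."""
    covered = set(tagged.values())
    return [f for f in RmfFunction if f not in covered]


gaps = missing_functions(activities)
print([f.value for f in gaps])  # ['manage']
```

Here the deployment has no Manage activity, which would be the prompt to define an incident response or mitigation step before the next review.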
Key Insight

Mental Model: The Flight Safety System. LLM risk governance resembles the aviation safety framework. Govern sets the flight rules and certifications. Map identifies the routes and weather conditions (risks). Measure tracks altitude, speed, and fuel (metrics and monitoring). Manage handles turbulence and course corrections (mitigation and response). Just as airlines maintain exhaustive flight logs and conduct regular safety audits, AI governance requires a model inventory, risk classifications, and audit trails. The analogy breaks down in one important way: aviation safety has decades of standardized practice, while AI governance frameworks are still maturing rapidly.

Fun Fact

Content provenance standards like C2PA (Coalition for Content Provenance and Authenticity) attach cryptographically signed provenance metadata to images, audio, and video, creating a tamper-evident record of origin. Adobe, Microsoft, and major camera manufacturers have adopted C2PA, making it the leading candidate for combating deepfakes at scale.

Governance gets urgent in organizations with multiple LLM deployments. Without a centralized inventory, teams deploy models with overlapping capabilities and incompatible safety standards. One team's customer-facing chatbot runs rigorous guardrails and monitoring from Section 62.1; another team's internal assistant ships with none. The model inventory below is the minimum visibility you need to enforce consistent governance.

Tip

Start your model inventory today, even if it is just a spreadsheet. Record the model name, provider, version, deployment date, owning team, and risk tier for every LLM in production. Teams that wait until they have 20 or more deployments before creating an inventory find that half their models are undocumented, some are running deprecated versions, and nobody knows who owns the one handling customer data. A simple inventory prevents governance surprises.
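Even a spreadsheet inventory can be checked automatically. The sketch below assumes the inventory is exported as CSV with one column per field named in the tip; the column names and every row value are made up for illustration. It reports which models leave a required field blank.

```python
import csv
import io

# Columns follow the fields listed in the tip; all row values are invented.
INVENTORY_CSV = """model_name,provider,version,deployment_date,owning_team,risk_tier
support-bot,ExampleAI,2.1,2026-01-15,ml-platform,limited
hr-assistant,ExampleAI,1.0,2025-11-03,,high
search-helper,OtherVendor,,2026-02-20,search,low
"""

REQUIRED = ["model_name", "provider", "version",
            "deployment_date", "owning_team", "risk_tier"]


def find_gaps(csv_text: str) -> dict[str, list[str]]:
    """Map each model name to the required columns it leaves blank."""
    gaps = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        blank = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if blank:
            gaps[row["model_name"]] = blank
    return gaps


for name, blank in find_gaps(INVENTORY_CSV).items():
    print(f"{name}: missing {', '.join(blank)}")
```

In this made-up data, hr-assistant has no owning team and search-helper has no recorded version, which are exactly the gaps the tip warns about.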

53.3.2 Model Inventory and Risk Classification

A model inventory tracks every LLM deployment in the organization, its risk tier, ownership, and review status. Code Fragment 53.3.1d below shows how to implement a risk-classified model inventory with automated review flagging.

"""LLM governance entry that validates with Pydantic and emits tags for MLflow.

Production model registries (MLflow Model Registry, Weights & Biases, SageMaker
Model Cards) are where regulated teams put their model inventory. This fragment
shows the data layer: a Pydantic schema that validates each entry, computes the
EU AI Act risk tier from use case + data sources, and emits a dict ready to
POST to MLflow's set-model-version-tag endpoint.
"""
from datetime import datetime, timedelta
from enum import Enum
from typing import Literal

from pydantic import BaseModel, Field, computed_field, field_validator

class RiskTier(str, Enum):
    LOW          = "low"
    LIMITED      = "limited"
    HIGH         = "high"
    UNACCEPTABLE = "unacceptable"

# EU AI Act risk taxonomy: a use case maps to its baseline tier. In a real
# deployment this dict is curated by the legal / compliance team.
USE_CASE_RISK = {
    "customer_service_chatbot": RiskTier.LIMITED,
    "credit_scoring":           RiskTier.HIGH,
    "biometric_categorization": RiskTier.HIGH,
    "medical_diagnosis":        RiskTier.HIGH,
    "social_scoring":           RiskTier.UNACCEPTABLE,
    "content_recommendation":   RiskTier.LIMITED,
    "internal_search":          RiskTier.LOW,
    "code_completion":          RiskTier.LOW,
}

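# Data-source modifiers can bump the tier UP by one level (Low -> Limited,
# Limited -> High). This PII escalation is an internal policy choice layered
# on top of the Act's use-case tiers, not a rule stated in the Act itself.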
PII_DATA_SOURCES = {"customer_records", "medical_records", "biometric_data",
                    "financial_transactions", "minor_users"}

def compute_risk_tier(use_case: str, data_sources: list[str]) -> RiskTier:
    """Map a (use_case, data_sources) pair to an EU AI Act tier."""
    base = USE_CASE_RISK.get(use_case, RiskTier.LIMITED)
    if base == RiskTier.UNACCEPTABLE:
        return base
    has_pii = bool(set(data_sources) & PII_DATA_SOURCES)
    if base == RiskTier.LOW     and has_pii: return RiskTier.LIMITED
    if base == RiskTier.LIMITED and has_pii: return RiskTier.HIGH
    return base

# Review cadence per tier (regulator expectation, not law).
REVIEW_INTERVAL = {
    RiskTier.LOW:          timedelta(days=365),
    RiskTier.LIMITED:      timedelta(days=180),
    RiskTier.HIGH:         timedelta(days=90),
    RiskTier.UNACCEPTABLE: timedelta(days=0),     # do not deploy
}

class ModelInventoryEntry(BaseModel):
    """Production model-registry record, validated by Pydantic."""
    model_id:        str = Field(pattern=r"^LLM-[A-Z]{2,4}-\d{3}$")
    model_name:      str
    use_case:        str
    owner_email:     str = Field(pattern=r"^[\w.+-]+@[\w-]+\.[\w.-]+$")
    deployment_date: datetime
    last_validation: datetime
    data_sources:    list[str] = Field(min_length=1)
    regulations:     list[Literal["GDPR", "EU AI Act", "HIPAA", "SOX", "PCI-DSS"]]

    @field_validator("last_validation")
    @classmethod
    def validation_not_before_deployment(cls, v, info):
        dep = info.data.get("deployment_date")
        if dep and v < dep:
            raise ValueError("last_validation cannot precede deployment_date")
        return v

    @computed_field
    @property
    def risk_tier(self) -> RiskTier:
        return compute_risk_tier(self.use_case, self.data_sources)

    @computed_field
    @property
    def next_review(self) -> datetime:
        return self.last_validation + REVIEW_INTERVAL[self.risk_tier]

    @computed_field
    @property
    def overdue(self) -> bool:
        return self.next_review <= datetime.utcnow()

def to_mlflow_tags(entry: ModelInventoryEntry) -> dict[str, str]:
    """Format an entry as MLflow Model Registry version tags.

    Each tag can be set via MlflowClient.set_model_version_tag(...) or
    POSTed to /api/2.0/mlflow/model-versions/set-tag from any language.
    """
    return {
        "governance.risk_tier":   entry.risk_tier.value,
        "governance.owner":       entry.owner_email,
        "governance.use_case":    entry.use_case,
        "governance.regulations": ",".join(entry.regulations),
        "governance.next_review": entry.next_review.date().isoformat(),
        "governance.overdue":     str(entry.overdue).lower(),
    }

# Demo: register two models that differ ONLY in their data sources. The
# PII bump escalates the tier from LIMITED to HIGH and shrinks the review
# window from 180 to 90 days.
entry_no_pii = ModelInventoryEntry(
    model_id="LLM-CS-001",
    model_name="Customer Support Bot v2",
    use_case="customer_service_chatbot",
    owner_email="ml-platform@example.com",
    deployment_date="2026-01-15T00:00:00",
    last_validation="2026-02-01T00:00:00",
    data_sources=["public_faq", "knowledge_base"],
    regulations=["GDPR", "EU AI Act"],
)
entry_with_pii = entry_no_pii.model_copy(update={
    "model_id":     "LLM-CS-002",
    "data_sources": ["customer_records", "knowledge_base"],
})

for e in (entry_no_pii, entry_with_pii):
    print(f"{e.model_id}: tier={e.risk_tier.value:8s} "
          f"next_review={e.next_review.date()} overdue={e.overdue}")

print()
print("MLflow tags for", entry_with_pii.model_id + ":")
for k, v in to_mlflow_tags(entry_with_pii).items():
    print(f"  {k}: {v}")
Output (the overdue flags depend on the run date; the values shown assume a run between 2026-05-02 and 2026-07-31):
LLM-CS-001: tier=limited  next_review=2026-07-31 overdue=False
LLM-CS-002: tier=high     next_review=2026-05-02 overdue=True

MLflow tags for LLM-CS-002:
  governance.risk_tier: high
  governance.owner: ml-platform@example.com
  governance.use_case: customer_service_chatbot
  governance.regulations: GDPR,EU AI Act
  governance.next_review: 2026-05-02
  governance.overdue: true
Code Fragment 53.3.1d: A model inventory entry built on Pydantic for runtime validation, with the EU AI Act risk tier computed from use_case + data_sources rather than hand-set. The two demo entries differ only in their data sources; adding PII data bumps the tier from limited to high and the review cadence from 180 days to 90 days. to_mlflow_tags() produces the dict you would POST to MLflow's model-version-tag endpoint to wire the governance metadata into your existing model registry.
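To make the last step concrete, the sketch below turns a tag dict into request bodies for the model-versions/set-tag endpoint, one body per tag. It assumes the endpoint accepts the fields name, version, key, and value, which is worth confirming against the MLflow version you run. The registered model name and version number are hypothetical, and nothing is sent over the network.

```python
import json

# Assumption: these tags mirror the shape of a governance tag dict such as
# the one emitted for LLM-CS-002; the values are copied from that example.
tags = {
    "governance.risk_tier": "high",
    "governance.owner": "ml-platform@example.com",
    "governance.next_review": "2026-05-02",
}


def set_tag_payloads(model_name: str, version: str,
                     tags: dict[str, str]) -> list[dict]:
    """Build one request body per tag (tags are set one at a time)."""
    return [
        {"name": model_name, "version": version, "key": key, "value": value}
        for key, value in tags.items()
    ]


# "customer-support-bot" and version "3" are hypothetical registry values.
payloads = set_tag_payloads("customer-support-bot", "3", tags)
for body in payloads:
    print(json.dumps(body))
```

Keeping payload construction separate from the HTTP call makes the governance metadata easy to unit test and lets the same dicts feed a different registry later.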

Audit Trail Implementation

A tamper-evident audit trail records every LLM interaction and hash-chains the entries so that changes to historical records are detectable. Code Fragment 53.3.2a below implements this pattern.

import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Hash-chained, tamper-evident audit log for LLM interactions."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Truncated to 16 hex characters to keep the example readable;
        # a production log would normally keep the full digest.
        return hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()[:16]

    def log(self, request_id: str, model: str, input_text: str,
            output_text: str, user_id: str, metadata: dict | None = None) -> dict:
        entry = {
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "user_id": user_id,
            # Store hashes, not raw text, so the log holds no prompt content.
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "output_hash": hashlib.sha256(output_text.encode()).hexdigest()[:16],
            "metadata": metadata or {},
        }
        # Chain entries for tamper detection: every entry after the first
        # records the hash of its predecessor.
        if self.entries:
            entry["prev_hash"] = self._digest(self.entries[-1])
        # Append unconditionally so the first entry is recorded too.
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Return True only if every prev_hash matches its predecessor."""
        for i in range(1, len(self.entries)):
            expected = self._digest(self.entries[i - 1])
            if self.entries[i].get("prev_hash") != expected:
                return False
        return True
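Code Fragment 53.3.2a: A hash-chained AuditTrail. log() records hashed inputs and outputs together with the hash of the previous entry; verify_chain() recomputes each link and returns False on the first mismatch.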

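Editing a logged record changes its hash, which then no longer matches the prev_hash stored in the next entry, and verify_chain reports the mismatch. The chain is therefore tamper-evident rather than truly immutable: it detects changes but does not prevent them, so it is usually paired with append-only or write-once storage.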
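A chain that only checks prev_hash links has one gap worth knowing about: the newest entry has no successor, so a change to it goes unnoticed. A common remedy is to keep the hash of the latest entry (the head) somewhere outside the log and compare against it during verification. The sketch below is a standalone illustration of that idea under simplified assumptions; the function names digest, append, and verify are hypothetical and the entries carry only two demo fields.

```python
import hashlib
import json


def digest(entry: dict) -> str:
    """Deterministic SHA-256 of an entry (sorted keys, full digest)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append(chain: list[dict], payload: dict) -> str:
    """Append payload linked to its predecessor; return the new head hash."""
    entry = dict(payload, prev_hash=digest(chain[-1]) if chain else None)
    chain.append(entry)
    return digest(entry)


def verify(chain: list[dict], head: str) -> bool:
    """Check every link, then check the last entry against the stored head."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev_hash"] != digest(prev):
            return False
    return bool(chain) and digest(chain[-1]) == head


chain: list[dict] = []
head = ""
for i in range(3):
    # In practice the head would be stored outside the log itself.
    head = append(chain, {"request_id": f"req-{i}", "model": "demo-model"})

print(verify(chain, head))          # True
chain[-1]["model"] = "other-model"  # tamper with the newest entry
print(verify(chain, head))          # False: links pass, head check fails
chain[-1]["model"] = "demo-model"   # undo
chain[0]["model"] = "other-model"   # tamper with the oldest entry
print(verify(chain, head))          # False: entry 1's prev_hash mismatches
```

The head check also catches truncation of the tail, since dropping the last entries leaves a final record whose hash differs from the stored head.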

SR 11-7 three lines of defense
Figure 53.3.2b: SR 11-7's three lines of defense separate concerns to prevent any single team from grading its own work. The first line (model owners and developers) builds and runs the model. The second line (an independent validation function) reviews the first line's work against the firm's model risk policy. The third line (internal audit, reporting to the board) verifies that the first two lines are doing their jobs. Banks regulated under SR 11-7 must demonstrate all three lines exist and function; the framework now applies in practice to LLM-based decision systems even though SR 11-7 predates LLMs.
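The independence requirement lends itself to an automated check on the inventory. The sketch below assumes each model record names the team filling each line; the LineAssignment type, its field names, and the team names are hypothetical. It flags any assignment in which a team would review its own work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineAssignment:
    """Who fills each line of defense for one model (all names hypothetical)."""
    model_id: str
    owner_team: str       # first line: builds and runs the model
    validator_team: str   # second line: independent validation
    auditor_team: str     # third line: internal audit


def independence_violations(a: LineAssignment) -> list[str]:
    """List the ways an assignment lets a team review its own work."""
    problems = []
    if a.validator_team == a.owner_team:
        problems.append("second line is not independent of the first")
    if a.auditor_team in (a.owner_team, a.validator_team):
        problems.append("third line overlaps with a line it audits")
    return problems


ok = LineAssignment("LLM-CS-001", "ml-platform", "model-risk", "internal-audit")
bad = LineAssignment("LLM-CS-002", "ml-platform", "ml-platform", "internal-audit")

print(independence_violations(ok))   # []
print(independence_violations(bad))  # ['second line is not independent of the first']
```

Comparing team names is a deliberately crude proxy for independence; reporting lines and incentives matter too, and those need human review.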

What's Next?

In the next section, Section 53.4: LLM Licensing, IP, and Privacy, we build on the material covered here.

Further Reading

Core References

NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). The US national standard for AI risk management, organized around four core functions: Govern, Map, Measure, and Manage. Provides a flexible, non-prescriptive framework adaptable to any organization size. Essential starting point for teams building an AI governance program.
Board of Governors, Federal Reserve. (2011). SR 11-7: Guidance on Model Risk Management. The banking industry's foundational model risk management guidance, requiring independent validation and three lines of defense. Its principles have been widely adopted beyond financial services for AI governance. Required reading for regulated industries deploying LLMs.
ISO. (2023). ISO/IEC 42001:2023 Artificial Intelligence Management System. International standard for establishing, implementing, and maintaining an AI management system, modeled on ISO 27001 for information security. Provides a certifiable framework for demonstrating AI governance maturity. Relevant for enterprises seeking formal AI governance certification.
Raji, I. D. et al. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. FAT* 2020. Proposes a practical internal audit framework for AI systems inspired by financial auditing practices. Covers scoping, testing, documentation, and remediation stages. Directly applicable to building internal audit processes for LLM deployments.
Schuett, J. (2023). Risk Management in the Artificial Intelligence Act. European Journal of Risk Regulation. Legal analysis of how the EU AI Act's risk management requirements translate into practical compliance obligations. Bridges the gap between legal text and engineering implementation. Useful for teams interpreting the AI Act's technical requirements.
Brundage, M. et al. (2020). Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. Multi-stakeholder report proposing mechanisms for verifying safety and fairness claims about AI systems, including third-party audits and bug bounties. Outlines a vision for accountable AI development practices. Recommended for organizations designing their external accountability structures.