Section 51.1: Platforms

This section catalogs the third-party platforms that production LLM teams use to enforce safety, security, and governance policies. The landscape divides cleanly into three roles: moderation APIs (real-time classifiers that pre-screen user inputs and model outputs for harmful content), red-team platforms (tools that systematically probe a deployed LLM for jailbreaks and vulnerabilities), and compliance / governance services (SaaS that produces audit-ready evidence for the EU AI Act, NIST AI Risk Management Framework, and ISO/IEC 42001). For each role we name the dominant 2026 vendors, their pricing tier, and the deployment context where each one wins.

**Figure 51.1.1**: The Part X safety-platform landscape divides cleanly into three roles. Moderation APIs (OpenAI Moderation, Azure Content Safety) sit on the request path. Red-team platforms (Garak, PyRIT, Haize, Giskard) run offline probes. Compliance services (Credo AI, Fairly AI, ModelBench) translate eval evidence into audit reports for the EU AI Act and NIST AI RMF.

51.1.1 Moderation APIs

OpenAI Moderation API (OpenAI, 2022; omni-moderation-latest in 2024) is OpenAI's free safety classifier for text and image, scoring inputs across 13 categories (hate, self-harm, sexual, violence, etc.). Its objective is to provide a zero-cost first-line filter so apps can pre-screen user input and model output before showing it to downstream systems, which matters because even a basic classifier catches most obvious abuse. The core concept is per-category probability scores with configurable thresholds. Pick OpenAI Moderation as your default input/output filter; supplement with domain-specific classifiers for nuanced harm categories.
Anthropic safety classifiers (Anthropic, integrated) are the on-by-default safety classifiers built into the Claude API; they are not exposed as a separate endpoint but operate transparently on every call. Their objective is to provide platform-level safety without requiring developers to remember to call a moderation endpoint, which matters because moderation calls that exist as opt-in often get skipped. Pick this implicitly by using Claude; the API returns refusals or warnings when classifiers fire.
Azure AI Content Safety (Microsoft, 2023) is Azure's multi-modal classification service covering text, images, prompts (injection detection), and jailbreaks. Its objective is to provide enterprise-grade safety classification with Azure-native compliance (SOC 2, HIPAA, FedRAMP), which matters when third-party SaaS classifiers fail your enterprise security review. The core concept is multi-category multi-modal classifiers with explicit prompt-shield and jailbreak-detection endpoints. Pick Azure Content Safety for enterprise multi-modal moderation inside the Azure ecosystem.
Google Cloud Natural Language safety (Google, 2017+ for sentiment, safety-classification 2023) is Google's classification service for text safety inside the GCP ecosystem. Its objective is to provide Vertex-AI-adjacent moderation that integrates with the Gemini API, which matters for GCP-resident workflows. Pick when GCP is your platform; otherwise OpenAI Moderation is simpler and free.

51.1.2 Red-team and adversarial-testing platforms

Garak (NVIDIA, 2023) is the open-source LLM vulnerability scanner, designed to probe models for weaknesses like jailbreaks, prompt injection, data leakage, and toxic generation. Its objective is to be the "nmap for LLMs": run a battery of attack probes against any model and report which ones succeed, which matters when you need an automated security-test suite. The core concept is "probes" (attack generators) plus "detectors" (success classifiers) plus "harnesses" (LLM connectors); the result is a vulnerability report. Pick Garak as your CI vulnerability scanner.
PyRIT (Microsoft, 2023) is Microsoft's Python Risk Identification Toolkit, an open red-teaming framework focused on structured multi-turn attack campaigns. Its objective is to automate complex multi-turn red-team scenarios (the adversary gradually escalates, models reset, classifiers analyze whole sessions), which matters when single-turn attack scanners miss compound vulnerabilities. The core concept is an orchestrator that runs scenarios with attack strategies, target models, and converters. Pick PyRIT for structured red-team campaigns; for one-shot vulnerability scanning, Garak is simpler.
Haize Labs (Haize Labs, 2024) is the commercial automated red-team platform offering managed red-team as a service. Its objective is to provide expert-curated red-team campaigns plus continuous monitoring, which matters for enterprises that lack internal red-team expertise. Pick Haize when you want a managed red-team service rather than running PyRIT or Garak yourself.
Giskard (Giskard, 2021) is the ML model and LLM testing platform with an open-source core and a managed hub. Its objective is to be the "test framework for ML" (assertions about model behavior, regression tests, vulnerability scans) which matters when you want LLM tests in the same shape as software unit tests. Pick Giskard for unit-test-style LLM behavior assertions.

51.1.3 Compliance and governance

Credo AI (2020) and Fairly AI (2022) are AI governance SaaS platforms producing compliance reports mapped to frameworks like NIST AI RMF, EU AI Act, ISO/IEC 42001. Their objective is to make AI compliance audit-ready by collecting evidence, mapping it to regulations, and generating reports auditors will accept, which matters for enterprises selling into regulated industries or operating under the EU AI Act. Pick one of these when EU AI Act or NIST AI RMF compliance is a procurement requirement; for smaller deployments, manual evidence collection is still workable.
ModelBench (ModelBench, 2024) is the enterprise model evaluation platform with policy-mapping features (which eval covers which policy requirement). Its objective is to bridge eval-results to compliance-policy claims, which matters for compliance teams that need to show "this benchmark satisfies this control". Pick ModelBench when eval-to-policy traceability is required.

51.1.4 Comparing the platforms

Table 51.1.1a: Safety platforms (2026).

Platform	Role	Cost	Best for
OpenAI Moderation	Pre-classify input/output	Free	Quick safety net
Azure Content Safety	Multi-modal classification	Per-call	Enterprise multi-modal
Garak	Open red-team scanner	Free	CI vulnerability scans
PyRIT	Open red-team automation	Free	Structured red-team campaigns
Credo AI	Governance	Enterprise SaaS	EU AI Act compliance

What's Next?

In the next section, Section 51.2: Libraries & Frameworks, we build on the material covered here.

Further Reading

Security Platforms

Microsoft (2024). "Microsoft Security Copilot." microsoft.com/en-us/security/business/ai-machine-learning/microsoft-security-copilot. Reference enterprise LLM security platform.

Cloudflare (2024). "Cloudflare Workers AI." developers.cloudflare.com/workers-ai. Reference platform for LLM-edge security including prompt-injection scanning.