"Security Copilot, Charlotte AI, Wiz, Tines. The 2026 security-LLM vendor map is short; the threat landscape is not."
Sage, Security-Vendor-Reader AI Agent
The cybersecurity LLM vendor landscape has consolidated around three categories: endpoint and SIEM-integrated copilots from the dominant security platforms, SOAR-meets-LLM workflow automation vendors, and specialty products (vulnerability management, cloud security, threat intelligence). This closing section consolidates the vendor list, the in-book cross-references, and the canonical external sources.
Prerequisites
This is a vendors-and-further-reading section and assumes familiarity with the earlier sections in Chapter 71.
The 2026 Cybersecurity LLM Vendor Landscape
Wiz was founded in early 2020 by four Israeli former Microsoft Cloud Security executives who had previously sold Adallom to Microsoft in 2015. Wiz reached $100 million ARR in 18 months, the fastest pace of any cybersecurity company on record. In July 2024, Wiz famously walked away from a $23 billion all-cash acquisition offer from Google, the largest declined acquisition in cybersecurity history.
- Microsoft Security Copilot. The most-cited reference for a SIEM-and-XDR-integrated LLM copilot. Tight integration with Defender XDR, Sentinel, and the broader Microsoft Security stack. Available in a per-user-per-month SKU plus optional consumption-based units.
- CrowdStrike Charlotte AI. The Falcon-integrated copilot, the parallel reference for an EDR-and-XDR-integrated LLM. Strong on documented productivity claims (40 analyst-hours saved per week per team).
- Wiz. Cloud-security platform with LLM-augmented vulnerability prioritization, attack-path analysis, and natural-language query over the cloud configuration graph.
- SOAR platforms with LLM augmentation. Tines, Torq, Tracecat, Palo Alto Cortex XSOAR, Splunk SOAR. All ship LLM-augmented workflow building, detection authoring, and case management. The differentiation is workflow-language ergonomics and integration breadth.
- Threat-intelligence platforms. Recorded Future AI, Mandiant Intelligence Console, Anomali ThreatStream. LLM-augmented digest, threat-actor profiling, and natural-language search over the threat-intel corpus.
- Vulnerability management. Snyk, Tenable, Qualys, Rapid7. All have integrated LLM features for prioritization narratives, CVE-to-asset mapping, and remediation guidance.
- Email security. Abnormal Security, Proofpoint, Mimecast. LLM-augmented intent classification and BEC detection.
- Phishing simulation and training. Hoxhunt, KnowBe4, Cofense. LLM-generated training content and personalized simulations.
The structural feature of the 2026 security-LLM market is that capability has consolidated to the platform incumbents. The dominant deployments come from Microsoft Security (Copilot + Defender + Sentinel + Entra) and CrowdStrike (Charlotte + Falcon) for SOC and endpoint, plus the specialty vendors above. Stand-alone security-LLM products that do not integrate with an existing security stack have struggled to achieve adoption; the integration with the data pipes is more valuable than the LLM itself. Procurement teams asking "which security-LLM should we buy?" are usually better served by asking "what does our existing security platform's LLM offering cover, and what gaps remain?"
Cross-References Inside This Book
- Section 72.1 (Cybersecurity & LLMs), the focused production-pattern section in this chapter.
- Chapter 47 (Safety, Ethics & Regulation), broader safety framework including red-teaming and privacy attacks.
- Chapter 49 (Agent Safety & Security), the agent-side of cyber-sensitive LLM deployment.
- Chapter 49 (Agent Safety & Production), the foundational architecture this chapter builds on.
Canonical External References
- OWASP Top 10 for Large Language Model Applications. Community-maintained ranking of the dominant LLM vulnerability classes (prompt injection, insecure output handling, training-data poisoning, etc.). The 1.1 list and successor materials are the de-facto starting point for any LLM threat model.
- MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), the ATT&CK-style knowledge base of AI-targeted adversarial tactics, techniques, and case studies.
- Microsoft Security. Microsoft Security Copilot documentation, the canonical reference for the Copilot-style SOC agent architecture, including data-handling and audit-log guarantees.
- CrowdStrike. Charlotte AI, the agentic-SOC product page with public performance claims widely cited in 2026 SOC modernization discussions.
- Tines. Workflow-automation platform documentation, representative of the broader SOAR-meets-LLM category.
- NIST. SP 800-53 Rev. 5 (security and privacy controls catalog) and the AI Risk Management Framework (and its Generative AI Profile), the U.S. federal control baseline plus the AI-specific risk overlay every regulated SOC LLM deployment is benchmarked against.
Who. A global manufacturer with a 60-person SOC, ~$15M annual security-operations budget, mature SIEM (Splunk) and EDR (CrowdStrike Falcon) deployments. Situation. Through 2024-2025, the firm ran a structured bake-off between Microsoft Security Copilot (paired with Defender XDR and Sentinel) and CrowdStrike Charlotte AI (paired with the existing Falcon deployment). Problem. The two platforms have similar productivity claims but different integration depth: Security Copilot is strongest when the SOC is already on Defender/Sentinel; Charlotte AI is strongest when the EDR backbone is Falcon. The firm's existing Falcon investment created asymmetric switching costs. Decision. The bake-off ran for 90 days with parallel pilot teams: 10 analysts on Security Copilot integrated with a Sentinel ingestion pilot, 10 analysts on Charlotte AI integrated with the existing Falcon deployment. Measured metrics: investigation time per alert, false-positive rate on auto-triage recommendations, analyst satisfaction, integration friction. How. Both platforms were configured with the five-layer trust-boundary pattern (Section 71.4); auto-execute was disabled on both. Results were reviewed by the CISO, the SOC director, and the analyst pilots at the 30, 60, and 90-day marks. Result. Charlotte AI won on integration ergonomics (no migration cost, deeper Falcon visibility) and produced a ~52 percent reduction in investigation time. Security Copilot would have required a Sentinel migration estimated at $3-5M and 12-18 months; the productivity gains alone did not justify the migration. The firm selected Charlotte AI for SOC augmentation and retained Security Copilot for the Microsoft 365 / Entra-adjacent identity-security workflows. Lesson. The structural feature of the 2026 security-LLM market is that integration depth dominates raw model capability; procurement that ignores the existing security stack typically produces worse outcomes than procurement that respects it.
The global security-LLM market reached roughly $2.5-4B in 2025 ARR across the named platforms, growing 75-110 percent year-over-year. The sub-vertical breakdown reflects the platform-consolidation thesis. SIEM/XDR copilots: Microsoft Security Copilot, CrowdStrike Charlotte AI, Palo Alto XSIAM-AI, and SentinelOne Purple AI together represent the largest sub-segment at $1-1.5B ARR; pricing typically $4-8/security-compute-unit at Microsoft's tier, $50K-$500K/year per-customer at the SaaS vendors. SOAR-with-LLM: Tines, Torq, Tracecat, Hunters, and the established XSOAR/Splunk SOAR products together represent $400-700M ARR. Vulnerability management with LLM: Wiz, Snyk, Tenable, Qualys collectively integrated LLM features add roughly $200-400M of attributable ARR. Email security with LLM: Abnormal Security, Proofpoint, Mimecast represent $300-500M of LLM-augmented revenue.
Workforce economics. ISC2's 2024 Workforce Study placed the global cybersecurity workforce shortage at 4 million unfilled roles. The economic case for security-LLM augmentation is sized against this gap: even a 30-40 percent productivity gain across the existing 6 million-strong global workforce represents roughly 2 million FTE-equivalents of analyst capacity, well above the unfilled-role count. The structural argument is that security-LLM augmentation is not optional in 2026; the unaugmented SOC cannot keep pace with modern alert volumes and adversary velocity.
- Section 69.5 (Healthcare LLM Vendors) for the parallel platform-incumbent consolidation pattern (Abridge, Dragon Copilot) in healthcare.
- Section 72.5 (Government LLM Vendors and Postmortems) for the FedRAMP-authorized platform structure that constrains public-sector LLM procurement.
- Chapter 66 (Procurement and Vendor Selection) for the cross-cutting vendor-bake-off methodology.
- Chapter 47 (Adversarial Security and Red Team) for the broader security framework underlying the SOC-LLM platforms.
- Chapter 49 (Agent Safety) for the agent-architecture safety framework that constrains SOC-LLM deployments.
Objective
Triage 100 real CVE descriptions with GPT-4o-mini and produce a four-bucket severity classification (Critical, High, Medium, Low), then compare against the published CVSS v3.1 base-score buckets from the National Vulnerability Database. The point is to feel how the cheapest available model performs on a real SOC-triage workload: where it agrees with CVSS, where it systematically over- or under-rates, and where the disagreement is actually the model being right.
Setup
You need an OpenAI API key and the NVD JSON feeds (free, at nvd.nist.gov/vuln/data-feeds). Pick a recent year (2024) and randomly sample 100 CVEs that have both a description and an analyst-assigned CVSS v3.1 base score.
pip install openai requests pandas scikit-learnSteps
- Download and sample. Pull the 2024 NVD JSON feed, filter to entries with a CVSS v3.1 base score, sample 100 with a fixed seed. Bucket the gold scores: 0.1 to 3.9 Low, 4.0 to 6.9 Medium, 7.0 to 8.9 High, 9.0 to 10.0 Critical.
- Write a triage prompt that gives GPT-4o-mini the CVE description, the affected product, and a CWE if available, and asks for a single severity label plus a one-sentence rationale. Constrain to JSON; temperature 0.
- Run the 100 CVEs and store predictions with the rationale. Track latency and total token cost; CVE triage at the SOC scale is a cost-sensitive workload, and the cheapest-model-that-works choice is the central engineering decision.
- Score against the CVSS buckets using sklearn's
classification_reportand a 4x4 confusion matrix. Calculate Cohen's kappa between LLM and CVSS as the inter-rater agreement metric. - Read the disagreements. Sample 20 entries where the LLM disagreed with CVSS. The interesting outcome is that some are LLM hallucinations (the CVE is actually about an obscure plugin) and others are CVSS under-rating because the NVD analyst missed the network exploitability context; both failure modes inform whether you would deploy this triage step to a real SOC.
Expected Output
A classification report and a confusion matrix, plus a Cohen's kappa value. Published baselines using frontier LLMs against CVSS report agreement of roughly 0.55 to 0.70 kappa with the strongest disagreements concentrated in the Medium-vs-High boundary, which is also the band where human analysts most often disagree with each other.
Extension
Add a retrieval step that pulls CISA's Known Exploited Vulnerabilities (KEV) catalog (cisa.gov/known-exploited-vulnerabilities-catalog) as additional context. Active-exploitation evidence shifts CVSS-equivalent priority sharply upward; measuring how the LLM uses that signal is the closest analogue to a real SOC-priority workflow.
Research Frontier: Where Cybersecurity LLMs Are Heading
The 2024 to 2026 frontier for cybersecurity LLMs is dominated by two opposing arcs: agents that hunt and respond on the defender's side, and the parallel risk of LLM-augmented offensive automation. Both are moving fast and the research questions are sharply asymmetric.
On the offensive side, Fang et al. (Cornell, 2024, arXiv:2404.08144) demonstrated that GPT-4 can autonomously exploit roughly 87 percent of one-day CVE vulnerabilities given only the CVE description, raising the practical baseline for what an AI-augmented attacker can do. Mulpuri et al. (2024) and the HackBench benchmark provide standardized evaluations for end-to-end exploitation chains. On the defender side, SOC-CoPilot (Schwartz et al., 2024), Microsoft's Project AI Security Copilot research, and the academic CyberSecEval 2 benchmark (Meta, 2024, arXiv:2404.13161) measure both attack-capability and defensive-utility properties.
Underlying both arcs are the OWASP Top 10 for LLM Applications taxonomy, MITRE ATLAS adversary-tactics catalog, and the literature on indirect prompt injection (Greshake et al., 2023, arXiv:2302.12173), which is now the canonical class of attack against LLM-augmented SOC tools themselves.
Where this is going: agentic SOC operations that autonomously investigate and contain incidents end-to-end with human supervision only on action escalation, deeper integration between LLMs and formal-verification tools for vulnerability triage, and a regulatory backlog as legislators (EU AI Act, NIST AI RMF) figure out how to govern dual-use offensive cyber capabilities. The interesting open question is whether the asymmetric advantage of LLMs accrues more to defenders (with their telemetry advantage) or to attackers (with their initiative advantage), and the empirical answer over the next 24 months will shape security architecture for the decade.
Show Answer
Show Answer
Show Answer
What Comes Next
Chapter 71 ends here. Chapter 72 on government and public-sector turns to the vertical where the same compliance and auditability requirements are amplified by administrative-law constraints and the unique procurement reality that federal RFP cycles outlast frontier-model generations.
What's Next?
In the next chapter, Chapter 72: Government Use Cases That Actually Work, we continue building on the material from this chapter.