Offensive (Red Team) Use Cases

Section 71.2

"Red-team LLM use cases: phishing copy, vulnerability research, malware adaptation. The asymmetry of LLM offense and defense is the whole chapter in one sentence."

GuardGuard, Red-Team-Realist AI Agent
Big Picture

Attackers use the same generation, summarization, and code-completion capabilities defenders use, with one critical difference: there is no compliance team slowing them down. This section catalogs what attackers do with LLMs because defenders must understand the threat to defend against it. It is calibrated to publicly known capabilities and does not extend what is already accessible to motivated adversaries. Three capability classes have stabilized as the dominant offensive uses: phishing content generation, vulnerability research acceleration, and malware adaptation. A fourth, persona generation for influence operations, sits at the edge of the traditional security taxonomy. Each comes with a defender-facing response pattern.

Figure 71.2.1: Offensive LLM use cases vs. defender mitigations. The four offensive LLM categories from Section 71.2, with the attacker cost reduction (red) paired with the defender response that has emerged (green). The 100-400x cost reduction on phishing content is the largest single shift. The defender response is technical (passkeys, conditional access, behavior-based EDR) rather than based on user training, because LLM-generated phishing defeats grammar-based heuristics.

Prerequisites

This section assumes the defensive LLM use cases from Section 71.1, the LLM-safety framing from Section 49.1, and the jailbreaking vocabulary from Section 48.1.

The capabilities below are the offensive analogs of the defender use cases in Section 71.1.

Phishing Content Generation

Fun Fact

CyberSecEval, Meta's open-source benchmark for measuring the cybersecurity risks of LLMs, was released in December 2023 as part of the Purple Llama project. Its first version measured two things: how often a model suggests insecure code, and how often it complies with requests to help carry out cyberattacks. Exploitation-oriented tests, in which models try to identify and exploit vulnerabilities in test programs, arrived only with CyberSecEval 2 in 2024.

LLMs generate plausible spear-phishing emails at scale, in any target language, with realistic context drawn from public sources. Before LLMs, phishing quality was constrained by attacker writing skill, particularly in languages the attacker did not speak natively. LLMs removed that constraint: well-written, contextually aware spear-phishing now comes at near-zero marginal cost.

The blue-team mitigation is to assume your users will see well-crafted phishing and to invest in MFA, conditional access, and detection rather than relying on user training as the only defense. The defenses that scale are technical: passkeys and hardware-bound FIDO2 credentials, conditional-access policies that detect impossible travel and anomalous device fingerprints, and email security that scores message intent rather than just keywords. Pure user training does not scale. Users will be tricked some fraction of the time, and the architecture must remain safe through that fraction.
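Impossible-travel detection, one of the conditional-access signals named above, reduces to a speed check between consecutive sign-ins. The sketch below is a minimal illustration. It assumes each sign-in carries a UTC timestamp and a geolocated latitude/longitude. The `SignIn` record and the 1,000 km/h threshold are hypothetical choices, not any vendor's policy. The sketch also ignores VPN egress points and IP-geolocation error, which real systems must handle.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0
# Hypothetical policy threshold: anything faster than a commercial airliner is "impossible".
MAX_PLAUSIBLE_KMH = 1000.0


@dataclass
class SignIn:
    """Hypothetical sign-in record: user, UTC timestamp, geolocated position."""
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against float error pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(prev: SignIn, curr: SignIn, max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """Flag consecutive sign-ins whose implied travel speed exceeds max_kmh."""
    hours = (curr.when - prev.when).total_seconds() / 3600.0
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if hours <= 0:
        # No elapsed time: any distance at all is physically implausible.
        return distance > 0
    return distance / hours > max_kmh


if __name__ == "__main__":
    t0 = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
    london = SignIn("alice", t0, 51.5074, -0.1278)
    sydney = SignIn("alice", t0.replace(hour=10), -33.8688, 151.2093)
    paris = SignIn("alice", t0.replace(hour=13), 48.8566, 2.3522)
    print(is_impossible_travel(london, sydney))  # roughly 17,000 km in 1 hour: True
    print(is_impossible_travel(london, paris))   # roughly 340 km in 4 hours: False
```

On its own a single speed check is noisy. Conditional-access engines typically combine it with device and network signals before blocking or stepping up authentication.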

Vulnerability Research Acceleration

LLMs help attackers with fuzz-input generation, exploit-development scaffolding, and understanding unfamiliar codebases. Red teams use these capabilities openly; criminal actors do too. Public benchmarks (CyberSecEval, CYBER-bench) measure this capability. The consensus is that frontier LLMs assist with competent vulnerability research but do not perform it autonomously. Evaluations of frontier models on standardized exploitation benchmarks from 2024-2025 consistently show that LLMs speed up experienced researchers (by perhaps 2x). They do not enable novice attackers to perform research they could not otherwise perform.
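To make "fuzz-input generation" concrete: in LLM-assisted fuzzing, the model usually contributes a format-aware seed corpus, and a conventional mutation loop then perturbs those seeds. The sketch below is a toy built on several assumptions. The target, `parse_record`, is a hypothetical parser with a planted length-prefix bug. The seeds are hard-coded stand-ins for what a model might propose, and no model is called. Real campaigns use coverage-guided fuzzers such as AFL++ or libFuzzer rather than a blind loop like this one.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical target: a length-prefixed record parser with a planted bug.

    It trusts the first byte as the payload length and never checks it against
    the real input size, so an oversized prefix raises IndexError.
    """
    length = data[0]
    payload = bytearray()
    for i in range(length):
        payload.append(data[1 + i])  # bug: no bounds check against len(data)
    return bytes(payload)


# Stand-ins for format-aware seeds an LLM might suggest; all of them parse cleanly.
SEEDS = [b"\x03abc", b"\x01z", b"\x00"]


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value (length is preserved)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seeds, iterations: int = 2000, seed: int = 0) -> list[tuple[bytes, str]]:
    """Run a blind mutation loop and collect (input, exception name) crash pairs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(seeds), rng)
        try:
            target(candidate)
        except Exception as exc:  # any uncaught exception counts as a finding
            crashes.append((candidate, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, SEEDS)
    print(f"{len(found)} crashing inputs; first: {found[0] if found else None}")
```

The seeds matter because they already satisfy the input format. That lets single-byte mutations reach the length-handling code instead of failing at the first byte, and it is the part of the workflow the text describes LLMs accelerating.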

Anthropic, OpenAI, and Google have all published reports on misuse of their frontier models that they detect and disrupt. The OpenAI disrupting-deceptive-uses reports and Anthropic's policy papers are the standard references. The defender implication is that frontier model providers cooperate with the security community on disclosure. Open-weight models, by contrast, have no provider in a position to observe or disrupt misuse, and they are increasingly the substrate for adversarial use.

Malware Adaptation

LLMs can rewrite malware code to evade signature-based detection. Defensive implication: signature-based detection has been declining in value for years; this accelerates the trend toward behavior-based and ML-based detection. The architectural response: invest in EDR products that detect behavior (process tree anomalies, syscall patterns, unusual network connections) rather than file-hash matches, and accept that AV signatures provide a useful baseline rather than a complete defense.
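A few lines are enough to contrast file-hash matching with behavioral detection. This minimal sketch rests on stated assumptions. The "sample" is inert placeholder bytes, and the signature set is a hypothetical hash blocklist. The behavioral rule (a document editor spawning a script interpreter) is one illustrative process-tree heuristic, not any EDR product's logic.

```python
import hashlib

# Inert placeholder bytes standing in for a known-bad file.
ORIGINAL_SAMPLE = b"placeholder payload v1"
# Hypothetical hash blocklist built from the original sample.
HASH_BLOCKLIST = {hashlib.sha256(ORIGINAL_SAMPLE).hexdigest()}

# Illustrative process-tree rule: document editors should not spawn script interpreters.
SUSPICIOUS_PARENT_CHILD = {
    ("winword.exe", "powershell.exe"),
    ("excel.exe", "cmd.exe"),
}


def signature_match(file_bytes: bytes) -> bool:
    """File-hash matching: fires only on byte-identical files."""
    return hashlib.sha256(file_bytes).hexdigest() in HASH_BLOCKLIST


def behavior_match(process_events: list[tuple[str, str]]) -> bool:
    """Behavioral rule: fires on any suspicious (parent, child) process pair."""
    return any(
        (parent.lower(), child.lower()) in SUSPICIOUS_PARENT_CHILD
        for parent, child in process_events
    )


if __name__ == "__main__":
    # A rewritten variant that differs by even one byte no longer matches the hash...
    variant = ORIGINAL_SAMPLE + b" "
    print(signature_match(ORIGINAL_SAMPLE), signature_match(variant))  # True False
    # ...but if it still launches the same way, the behavioral rule still fires.
    events = [("explorer.exe", "WINWORD.EXE"), ("WINWORD.EXE", "powershell.exe")]
    print(behavior_match(events))  # True
```

An LLM rewrite changes the bytes, which defeats the hash. It does not change what the code has to do on the host, which is why behavioral rules degrade more slowly.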

Persona Generation and Influence Operations

A fourth category that does not fit cleanly into the traditional pen-test taxonomy: LLM-generated personas, fake reviews, fabricated social-media content, and the broader category of influence operations. Frontier model providers report disrupting state-affiliated and commercial actors using their APIs for these purposes; the OpenAI and Anthropic threat-disruption reports are the canonical sources. The cybersecurity overlap is the social-engineering vector: LLM-generated personas have been used in pretext-based attacks on targeted individuals, and the SOC must be able to investigate "is this person who they say they are?" questions when they arise.

What Defenders Can Do

The pattern that has emerged across mature security organizations in 2026 has three elements. First, raise the cost of the highest-volume attack categories (phishing, credential stuffing, account takeover) through technical controls that do not depend on user vigilance. Second, instrument the environment heavily enough that the LLM-assisted defender can investigate quickly when an attack does land. Third, maintain a relationship with the frontier-model providers' trust-and-safety teams; the cooperative disclosure of attacker patterns has been one of the unexpectedly valuable side effects of the LLM era.
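The first element can be illustrated with the property that makes passkeys phishing-resistant. WebAuthn credentials are scoped to the relying party, and the browser records the page's real origin in client data covered by the signature. As a result, a response produced on a look-alike domain fails verification however convincing the lure was. The sketch below shows only the origin, challenge, and RP-ID-hash checks a relying party performs on an authentication response, following the field layout in the W3C WebAuthn specification. It skips signature, flag, and counter validation. The domains and the `simulated_response` helper are hypothetical. In practice the browser also refuses to offer an example.com credential to a different domain at all, and production code should use a maintained WebAuthn server library.

```python
import base64
import hashlib
import json
import secrets

# Hypothetical relying party: the legitimate sign-in origin and its RP ID.
EXPECTED_ORIGIN = "https://login.example.com"
RP_ID = "example.com"


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding WebAuthn uses for the challenge."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def check_origin_binding(client_data_json: bytes, authenticator_data: bytes,
                         expected_challenge: bytes) -> bool:
    """Origin, challenge, and RP-ID-hash checks only.

    A real verifier must also validate the signature over
    authenticator_data || SHA-256(client_data_json), the flags, and the sign counter.
    """
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.get":
        return False
    if client_data.get("challenge") != b64url(expected_challenge):
        return False
    if client_data.get("origin") != EXPECTED_ORIGIN:
        return False
    # The first 32 bytes of authenticatorData are SHA-256 of the RP ID.
    return authenticator_data[:32] == hashlib.sha256(RP_ID.encode("utf-8")).digest()


def simulated_response(origin: str, rp_id: str, challenge: bytes) -> tuple[bytes, bytes]:
    """Hypothetical helper: build clientDataJSON and authenticatorData (no signature)."""
    client_data = json.dumps(
        {"type": "webauthn.get", "challenge": b64url(challenge), "origin": origin}
    ).encode("utf-8")
    flags_user_present = b"\x01"
    sign_count = (0).to_bytes(4, "big")
    auth_data = hashlib.sha256(rp_id.encode("utf-8")).digest() + flags_user_present + sign_count
    return client_data, auth_data


if __name__ == "__main__":
    challenge = secrets.token_bytes(32)
    legit = simulated_response(EXPECTED_ORIGIN, RP_ID, challenge)
    # A look-alike domain: the lure can be perfect, but the origin cannot be.
    phish = simulated_response("https://login.examp1e.com", "examp1e.com", challenge)
    print(check_origin_binding(*legit, challenge))  # True
    print(check_origin_binding(*phish, challenge))  # False
```

The check does not depend on the user noticing anything. That is what "technical controls that do not depend on user vigilance" means in practice.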

Key Insight

The capability asymmetry between attackers and defenders has not changed much under LLMs. Attackers got faster at the volume-heavy parts of their work (phishing content, malware variants); defenders got faster at the volume-heavy parts of their work (alert triage, postmortems, detection authoring). The ratio of attacker capability to defender capability is broadly similar to the pre-LLM era. What has changed is that both sides operate at higher velocity; the marginal cost of marginally better attacks and the marginal cost of marginally better defenses have both collapsed. Organizations that fail to adopt the defender LLM tools fall behind faster, not because attackers gained capability but because the defender capacity required to operate at the new velocity exceeds the unaugmented human equivalent.

Real-World Scenario
OpenAI's Disrupting-Deceptive-Uses Reports

Who. OpenAI's Trust and Safety team, with cooperation from Microsoft Threat Intelligence and the broader security community.

Situation. Through 2024-2025, OpenAI published a recurring series of disrupting-deceptive-uses reports. They document state-affiliated and commercial actors who used its API for covert influence operations, social engineering, and other adversarial purposes.

Problem. Frontier-model providers face a structural dilemma. The same generative capabilities that produce productivity gains for defenders also produce phishing content, fake-persona scaffolding, and influence-operation copy for attackers. Banning users one at a time is reactive; the question is whether systematic disruption is possible.

Decision. OpenAI built an internal threat-detection capability (combining behavioral signals, content classifiers, and human review) and committed to publishing periodic threat-disruption reports.

How. Identified accounts are banned, and observed tradecraft is documented and shared with the broader security community. The reports name threat actors when possible, for example Forest Blizzard, Crimson Sandstorm, and Emerald Sleet in the February 2024 disclosure co-published with Microsoft. They describe patterns of misuse without providing operational details that would help adversaries.

Result. By late 2025, the OpenAI, Anthropic, and Google threat-disruption reports collectively documented over 100 disrupted operations. Together they formed a quasi-public threat-intelligence feed of LLM-enabled adversary tradecraft.

Lesson. Frontier-model providers can disrupt adversarial use at the API layer. That disruption is structurally weaker for open-weight models, where no provider has visibility into the inference pipeline. The defender implication: the security community now has a partial map of LLM-enabled adversary behavior, but the map is biased toward frontier-API usage.

Numeric Example
CyberSecEval and the attacker-augmentation ceiling

The most-cited benchmarks on offensive LLM capability are Meta's CyberSecEval and its successors, plus the academic CYBER-bench. Two numbers anchor the discussion. First, frontier models show roughly a 2x speedup for experienced security researchers on standardized exploitation tasks, with no evidence that they enable novice attackers to perform research they could not otherwise perform. Second, the capture-the-flag (CTF) baseline: top frontier models in 2025 solved roughly 30-50 percent of intermediate-difficulty CTF challenges (CyberSecEval-v2 numbers), against 80-95 percent for skilled human teams. The gap is closing, but the cap on novice amplification has held through 2026.

Phishing economics. Before LLMs, large-scale phishing required either a substantial translation budget (~$0.05-0.20 per email for human translation into target languages) or acceptance of lower yield from poorly written content. LLM generation drops the marginal cost to ~$0.0005 per email at frontier-API pricing, a 100-400x reduction. The defensive implication is that phishing volume now scales without proportional cost, while the gain in per-email effectiveness is smaller. Users still click on roughly the same fraction of well-crafted phishing emails, and the technical defenses (passkeys, MFA, anomaly detection) remain the dominant mitigation regardless of email quality. The architecture-not-training defense holds.
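The cost ratio in the paragraph above is straightforward division, and the sketch below reproduces it from the stated per-email figures. The 10,000-email campaign size is a hypothetical illustration, not a number from the benchmarks.

```python
# Per-email costs from the text, in US dollars.
HUMAN_TRANSLATION_LOW = 0.05
HUMAN_TRANSLATION_HIGH = 0.20
LLM_GENERATION = 0.0005

CAMPAIGN_SIZE = 10_000  # hypothetical campaign size for illustration


def reduction_factor(before: float, after: float) -> float:
    """How many times cheaper the 'after' cost is than the 'before' cost."""
    return before / after


for label, cost in [("low", HUMAN_TRANSLATION_LOW), ("high", HUMAN_TRANSLATION_HIGH)]:
    factor = reduction_factor(cost, LLM_GENERATION)
    print(
        f"{label}: ${cost * CAMPAIGN_SIZE:,.0f} -> ${LLM_GENERATION * CAMPAIGN_SIZE:,.0f} "
        f"per {CAMPAIGN_SIZE:,} emails ({factor:.0f}x cheaper)"
    )
```

The two printed lines correspond to the 100x and 400x ends of the range. At that scale the generation cost is no longer a meaningful constraint on campaign volume.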

Self-Check
1. The defender mitigation for LLM-generated phishing emphasizes technical controls (MFA, passkeys, conditional access) rather than user training. Why?
The pre-LLM phishing landscape was partially constrained by attacker writing quality, especially in non-native languages. The LLM era removed that constraint, so users now see well-crafted, contextually aware phishing in any language at near-zero marginal cost. User training that depends on spotting poor grammar or formulaic language fails against LLM-generated content. The defenses that scale are technical:
passkeys and hardware-bound FIDO2 credentials, which defeat credential theft regardless of email quality;
conditional-access policies that detect anomalous device or location patterns;
email security that scores message intent rather than relying on signature-based matching.
The architecture must remain safe through user mistakes.
2. The published capability evaluations (CyberSecEval, CYBER-bench) consistently show that frontier LLMs accelerate experienced security researchers without enabling novice attackers. What architectural feature of current LLMs explains this asymmetry?
Current LLMs accelerate the volume-heavy parts of the work (writing exploit scaffolding, generating fuzz inputs, understanding unfamiliar codebases). They cannot autonomously produce novel exploits or perform competent vulnerability research without guidance. Experienced researchers know what to ask, recognize when output is plausible-looking nonsense, and integrate LLM output into their own judgment; novices lack the priors to do any of these. The asymmetry is a property of current frontier capability rather than a permanent one. The gap to novice amplification is narrowing as models improve, and that threshold is what frontier-model providers track in their capability evaluations.
3. Why is the open-weight-model threat model structurally different from the frontier-API threat model in cybersecurity contexts?
Frontier-API providers (OpenAI, Anthropic, Google) have visibility into the inference pipeline and can detect and disrupt adversarial use: they ban accounts, publish threat-disruption reports, and cooperate with the security community. Open-weight models (Llama, Mistral, Qwen, etc.) can run on adversary-controlled infrastructure with no provider visibility, so there is no equivalent disruption capability. This asymmetry shapes the threat landscape. State-affiliated and commercial adversaries increasingly use open-weight models to avoid disruption, while frontier APIs catch the less-sophisticated misuse. Defender strategy must therefore assume that high-volume, well-resourced adversaries are operating outside frontier-provider visibility, and that the cooperative-disclosure benefit applies primarily to lower-tier threats.

What Comes Next

Section 71.3 turns to the LLM-specific attack surface: prompt injection, training-data poisoning, membership inference, model extraction. The defender's LLM stack is itself a target, and the OWASP Top 10 for LLM Applications and MITRE ATLAS are the canonical references.

Further Reading

Offensive LLM Capabilities

Fang, R., Bindu, R., Gupta, A., Zhan, Q., & Kang, D. (2024). "LLM Agents can Autonomously Exploit One-day Vulnerabilities." arXiv:2404.08144. Empirical demonstration that GPT-4 agents can exploit known vulnerabilities; reference for red-team LLM capabilities.
Fang, R., Bindu, R., Gupta, A., & Kang, D. (2024). "LLM Agents can Autonomously Hack Websites." arXiv:2402.06664. Reference paper on autonomous LLM-driven website attacks; the canonical 2024 offensive-LLM result.
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., & Fredrikson, M. (2023). "Universal and Transferable Adversarial Attacks on Aligned Language Models." arXiv:2307.15043. The GCG attack paper showing transferable adversarial suffixes that jailbreak aligned models; the canonical reference for automated red-teaming against safety-tuned LLMs.
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." AISec '23. arXiv:2302.12173. The foundational paper on indirect prompt injection; essential reading for understanding how attackers weaponize retrieved content against LLM-driven agents.

Red Team Methodology

MITRE (2024). "ATT&CK Framework." attack.mitre.org. The standard adversarial-TTPs reference; informs LLM red-team scenario design.
MITRE (2024). "ATLAS: Adversarial Threat Landscape for AI Systems." atlas.mitre.org. MITRE's AI-specific ATT&CK companion; the structured taxonomy for cataloguing LLM-targeted attacks in red-team plans.