Use Cases That Actually Work in Legal Practice

Section 67.1

"Contract review by LLM: cheaper than an associate, more reliable than a paralegal, less reliable than the LLM thinks it is."

GuardGuard, Legal-Pragmatist AI Agent

Big Picture: Why Legal Was an Early LLM Vertical

Legal practice is the prototype enterprise LLM market: the work is text-on-text, the per-hour billable rate is high, and the cost of a missed clause or a missed citation is itself measurable in further billable hours. By mid-2026 the vertical has consolidated around a recognizable shortlist of production deployments: Casetext CoCounsel (acquired by Thomson Reuters in 2023 and merged into the Thomson Reuters AI Assistant stack), Harvey at large law and professional-services firms (PwC, A&O Shearman, Latham), and Lexis+ AI on the incumbent-publisher side. Regulators have moved in parallel: the EU AI Act classified several legal-tech use cases as high-risk under Annex III, while ABA Formal Opinion 512 (2024) set the U.S. baseline for an attorney's competence and supervision duties when deploying generative AI. Five categories of work have demonstrated reliable LLM augmentation in production: contract review, e-discovery, citation generation, regulatory research, and document summarization. The same pattern unites them: the LLM does the volume-heavy first pass, a licensed attorney does the verification, and an automated check sits between them when stakes are high. The takeaway for this chapter: a legally defensible deployment is built around verification infrastructure, not around model choice.

Prerequisites

This section builds on the RAG architecture from Chapter 32 (retrieval pipelines and grounding) and the agentic-coding pattern from Section 29.4 for verification loops. Familiarity with the regulation framework in Chapter 47 is useful when reading the citation-verification subsection.

Five categories of legal work have demonstrated reliable LLM augmentation in production by mid-2026.

Contract Review: Assistive, Not Autonomous

Fun Fact

Harvey's founders chose the name as a joke reference to the movie "Suits", where the lead attorney Harvey Specter is famous for never losing a case. The pitch deck reportedly had a Suits screenshot on slide 1, which the founders kept in the deck through the Series A even after their counsel suggested removing it. The Series E in 2025 valued the company at $5 billion, and the joke is now a $5 billion joke.

[Illustration: a cartoon hospital ER triage desk where a receptionist robot stamps clipboards and routes patients to specialists, a visual analogy for first-pass document triage in legal e-discovery.]
Figure 67.1.1: Legal LLM workflows mirror hospital triage. The model is the receptionist that classifies, stamps, and routes (responsive, privileged, not responsive; standard clause, deviation, novel). The licensed attorney is the specialist who still treats the case; the LLM accelerates the queue, not the diagnosis.

LLMs reliably flag standard-clause deviations (limitation of liability, indemnification, governing law, change-of-control) in commercial contracts. The pattern: the firm maintains a base playbook of "what we expect to see," and the LLM compares each incoming document to that playbook and surfaces redlines for human review. Quality is high enough that Big Law associates routinely use this as a first pass; it is not high enough to skip the human review. Vendors include Harvey, Hebbia, Robin AI, and Spellbook, plus increasingly capable in-house deployments using fine-tuned open-weight models on private corpora. See Chapter 32 for the RAG patterns these tools build on.
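The playbook comparison described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `PLAYBOOK` contents, the `review` function, and the 0.9 threshold are all hypothetical, and `difflib` string similarity stands in for the LLM comparison a production tool would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical playbook: clause type -> the language the firm expects to see.
PLAYBOOK = {
    "governing_law": "This Agreement is governed by the laws of the State of New York.",
    "limitation_of_liability": (
        "Neither party's liability shall exceed the fees paid in the prior twelve months."
    ),
}


@dataclass
class Flag:
    clause_type: str
    status: str  # "standard", "deviation", or "missing"
    similarity: float


def review(contract_clauses: dict[str, str], threshold: float = 0.9) -> list[Flag]:
    """Compare each playbook clause to the incoming contract and flag differences.

    String similarity is a stand-in (an assumption of this sketch) for the
    LLM-based comparison; the output is a review queue, not a decision.
    """
    flags = []
    for clause_type, expected in PLAYBOOK.items():
        actual = contract_clauses.get(clause_type)
        if actual is None:
            flags.append(Flag(clause_type, "missing", 0.0))
            continue
        score = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
        status = "standard" if score >= threshold else "deviation"
        flags.append(Flag(clause_type, status, round(score, 2)))
    return flags


incoming = {
    "governing_law": "This Agreement is governed by the laws of England and Wales.",
}
for flag in review(incoming):
    print(flag)  # every non-"standard" flag goes to an attorney for review
```

The design point is that the tool only routes: a "deviation" or "missing" flag opens a human review item, and nothing in the sketch accepts or rejects a clause on its own.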

Harvey deserves a closer look because it has become the canonical reference for what a 2026-era legal LLM product looks like. Founded in 2022 by a former DeepMind research scientist and a former O'Melveny & Myers associate, Harvey raised over $300 million across 2023 to 2025 and signed enterprise deals with PwC, A&O Shearman (formerly Allen & Overy), and most of the AmLaw 50. The product is structured as a tenant-isolated workspace, with the firm's own matter documents indexed in a private retrieval store; the LLM (a frontier model accessed via the Azure OpenAI Service or via Anthropic's enterprise API) never sees another firm's data. The differentiation Harvey leans on is not raw model quality but workflow integration: drafting templates, redline-comparison UIs, citation-checking against Westlaw, and audit-log defaults that satisfy the bar's competence and supervision rules.

E-Discovery and Document Triage

For discovery in litigation, vendor-reported throughput gains for LLM-assisted first-pass relevance review typically fall in the 3-10x range over manual associate review, with accuracy described as "comparable on routine matters." Most of these numbers come from vendor case studies; independent published evaluations remain scarce. Accuracy on novel or factually unusual matters is lower, which is why recall-validation protocols are mandatory. The pattern: classify each document as responsive, privileged, or not responsive; surface the top-K most likely privileged or responsive documents for human review; and audit-log every classification for defensibility. Critically, courts have increasingly accepted technology-assisted review (TAR) protocols in which the LLM is properly disclosed and validated. The doctrinal basis traces to Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), the early TAR-approval case; the LLM-era extensions of that doctrine are being litigated through the mid-2020s.
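One common way to validate recall in TAR workflows is an elusion sample: attorneys review a random sample of the documents the model marked not responsive, and the miss rate in that sample is extrapolated to the whole discard pile. The sketch below assumes that approach; the text above does not specify a protocol, and the function name, parameters, and synthetic data are hypothetical.

```python
import random
from typing import Callable


def estimate_recall(
    confirmed_responsive: int,
    null_set_ids: list[str],
    human_label: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> dict[str, float]:
    """Estimate recall from an elusion sample of the 'not responsive' pile.

    confirmed_responsive: responsive documents already confirmed by reviewers.
    null_set_ids: ids of documents the model classified as not responsive.
    human_label: attorney judgment for one document (True = responsive).
    """
    if not null_set_ids:
        raise ValueError("null set is empty; nothing to sample")
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced in an audit
    sample = rng.sample(null_set_ids, min(sample_size, len(null_set_ids)))
    missed = sum(1 for doc_id in sample if human_label(doc_id))
    elusion_rate = missed / len(sample)
    estimated_missed = elusion_rate * len(null_set_ids)
    total = confirmed_responsive + estimated_missed
    recall = confirmed_responsive / total if total else 1.0
    return {
        "sampled": len(sample),
        "elusion_rate": elusion_rate,
        "estimated_missed": estimated_missed,
        "recall": recall,
    }


# Synthetic stand-in for attorney review: 1 in 100 discarded documents is responsive.
null_set = [f"DOC-{i:05d}" for i in range(5000)]
truly_responsive = {doc_id for i, doc_id in enumerate(null_set) if i % 100 == 0}
report = estimate_recall(950, null_set, lambda d: d in truly_responsive, sample_size=500)
print(report)
```

A point estimate like this is only a starting point; a defensible protocol would also report a confidence interval and record the seed, sample ids, and reviewer decisions in the audit log.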

Citation Generation, With Verification

LLMs draft Bluebook citations from raw case captions or article metadata with high accuracy. The failure mode is the one everyone warns about: the LLM invents cases that do not exist when asked to find supporting precedent. The fix is now standard practice: every citation gets verified against an authoritative source (Westlaw, Lexis, CourtListener API) before it leaves the firm. Tools that do not do this verification step are professional malpractice waiting to happen.

Key Insight

The citation-verification step is the load-bearing engineering decision in a legal LLM product. Everything else (model choice, prompt design, UI) can be debated and tuned. The verification step is what separates a tool that ships from a tool that puts an attorney in front of a disciplinary committee. Build it as a hard requirement that fails closed: every cited case, statute, or regulation must resolve to a record in an authoritative database (Westlaw, Lexis, CourtListener, the Federal Register API, the relevant state's primary-law repository). If verification fails, the citation is stripped from the output and flagged for human research, not silently allowed through.
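A fail-closed verifier along these lines might look like the following sketch. The regular expression covers only a few U.S. reporter formats and is an illustrative assumption, not a Bluebook parser; `resolve` is a hypothetical callable that would wrap a lookup against an authoritative database, and the stub used here is a hard-coded set. The second citation in the demo is deliberately invented to show the failure path.

```python
import re
from typing import Callable

# Simplified pattern for citations such as "573 U.S. 208" or "999 F.4th 1234".
# Real citation parsing is far more involved; this is an assumption of the sketch.
CITATION_RE = re.compile(
    r"\b\d{1,4} (?:U\.S\.|F\.(?:2d|3d|4th)|F\. Supp\.(?: 2d| 3d)?) \d{1,5}\b"
)
PLACEHOLDER = "[CITATION REMOVED: NEEDS RESEARCH]"


def verify_citations(draft: str, resolve: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Strip every citation that does not resolve; return cleaned text and flags."""
    flagged: list[str] = []

    def check(match: re.Match) -> str:
        citation = match.group(0)
        try:
            verified = resolve(citation)
        except Exception:
            verified = False  # fail closed: a lookup error counts as unverified
        if verified:
            return citation
        flagged.append(citation)
        return PLACEHOLDER

    return CITATION_RE.sub(check, draft), flagged


# Hypothetical stand-in for an authoritative database lookup.
KNOWN_CITATIONS = {"573 U.S. 208"}

draft = (
    "See Alice Corp. v. CLS Bank Int'l, 573 U.S. 208 (2014); "
    "see also Smith v. Example Corp., 999 F.4th 1234 (invented for this demo)."
)
cleaned, flagged = verify_citations(draft, lambda c: c in KNOWN_CITATIONS)
print(cleaned)
print("Needs human research:", flagged)
```

Note that the `except` branch treats a database outage the same as a missing record. That is the fail-closed property: an unavailable verifier must never let an unchecked citation through.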

Regulatory and Compliance Research

LLMs over a RAG index of regulatory text (CFR, statutes, agency guidance) answer compliance questions with citations to the source paragraph. Banking, healthcare, securities, and energy have all deployed these internally. The key engineering decision: RAG with the regulatory corpus as the only retrieval source, with the LLM explicitly instructed to refuse if the retrieved chunks do not contain the answer. (See Section 35.3 on grounding strategies.)
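The refusal requirement can be enforced in code as well as in the prompt. The sketch below is one possible shape under stated assumptions: `llm` is a hypothetical callable that takes a prompt and returns text, the relevance score scale and the 0.5 cutoff are invented, and the chunk id and memo text are made up for illustration. It refuses before calling the model when retrieval found nothing usable, and refuses afterwards when the reply cites anything outside the retrieved set.

```python
import re
from dataclasses import dataclass
from typing import Callable

REFUSAL = "NOT_IN_CORPUS"


@dataclass
class Chunk:
    chunk_id: str  # identifier of the source paragraph
    text: str
    score: float   # retriever relevance score, assumed to lie in [0, 1]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    sources = "\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return (
        "Answer using ONLY the sources below and cite source ids in square brackets.\n"
        f"If the sources do not contain the answer, reply exactly {REFUSAL}.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    chunks: list[Chunk],
    llm: Callable[[str], str],
    min_score: float = 0.5,
) -> str:
    usable = [c for c in chunks if c.score >= min_score]
    if not usable:
        return REFUSAL  # nothing relevant retrieved: refuse without calling the model
    reply = llm(build_prompt(question, usable)).strip()
    if reply == REFUSAL:
        return REFUSAL
    cited = set(re.findall(r"\[([^\[\]]+)\]", reply))
    allowed = {c.chunk_id for c in usable}
    if not cited or not cited <= allowed:
        return REFUSAL  # uncited answers, or citations outside the retrieved set
    return reply


# Invented internal-policy chunk and a fake model, for demonstration only.
chunks = [Chunk("memo-7#p3", "Complaints must be escalated within 30 days.", 0.82)]
question = "How quickly must complaints be escalated?"
print(answer(question, chunks, lambda prompt: "Within 30 days [memo-7#p3]."))
print(answer(question, chunks, lambda prompt: "Within 10 days [memo-99#p1]."))
```

The post-check only confirms that the cited paragraph was actually retrieved; whether the paragraph supports the answer still needs a human reviewer or a separate entailment check.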

The compliance-research use case has been one of the few where firms have successfully built an internal product that competes with the commercial vendors. The reason is corpus specificity: a regional bank's regulatory exposure (CFPB rules, state banking law, internal policy memos, examination findings) is narrow enough that a small, well-curated retrieval index plus a frontier model outperforms a generalist legal LLM on the bank's actual questions. The investment that pays off is the corpus curation, not the model.

Document Summarization

Summarization of depositions, expert reports, regulatory filings, and case law is a mature use case. The main risk is omission rather than fabrication, so the pattern is "summary + key-quote highlights + source page links" rather than free-form prose. Litigation-support teams report meaningful review-time reductions on routine matters (vendor case studies cluster in the 50-80 percent range, with the higher end reserved for high-volume, structurally similar depositions like personal-injury or product-liability series; complex matters with novel fact patterns see considerably smaller gains). The pattern shifts attorney attention from "where is the relevant testimony?" to "is the summary accurate and complete?"
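The "summary + key-quote highlights + source page links" structure lends itself to a simple automated check: every highlighted quote should appear verbatim on the page it links to. The sketch below assumes a hypothetical `Summary` data shape and invented transcript text. It catches misquoted or mislinked highlights only; omissions still require the attorney's completeness review.

```python
from dataclasses import dataclass


@dataclass
class KeyQuote:
    text: str
    page: int  # page of the source transcript the quote links to


@dataclass
class Summary:
    overview: str
    quotes: list[KeyQuote]


def normalize(s: str) -> str:
    """Collapse whitespace and case so line breaks in the transcript do not matter."""
    return " ".join(s.split()).lower()


def unsupported_quotes(summary: Summary, pages: dict[int, str]) -> list[KeyQuote]:
    """Return quotes that do not appear verbatim on their cited page."""
    bad = []
    for quote in summary.quotes:
        page_text = pages.get(quote.page, "")
        if normalize(quote.text) not in normalize(page_text):
            bad.append(quote)
    return bad


# Invented transcript pages for demonstration.
pages = {
    12: "Q. Did you inspect the valve?\nA. I did not inspect the valve\nthat morning.",
    13: "Q. Who signed the maintenance log?\nA. My supervisor signed it.",
}
summary = Summary(
    overview="Witness concedes no inspection occurred on the morning of the incident.",
    quotes=[
        KeyQuote("I did not inspect the valve that morning.", page=12),
        KeyQuote("My supervisor signed it.", page=12),  # wrong page link
    ],
)
for quote in unsupported_quotes(summary, pages):
    print(f"Check page link or wording: {quote.text!r} (p. {quote.page})")
```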

Real-World Scenario: Harvey at Allen & Overy

Allen & Overy (now A&O Shearman after its 2024 merger with Shearman & Sterling) was Harvey's first announced major-firm customer in February 2023. The deployment rolled out to roughly 3,500 lawyers across more than 40 offices. The adoption figures the firm shared in its launch announcement and in subsequent industry interviews are instructive: roughly half of the firm's lawyers used Harvey in their day-to-day work within the first six months, and the highest-frequency use cases were research-summary drafting, due-diligence document triage, and first-draft client memos. The firm did not report the tool replacing associates; it reported associates handling more matters per week. The pattern has held across most major Big Law deployments: Harvey (or Hebbia, or Spellbook, or a fine-tuned in-house equivalent) does not change billable-hour structures, but it does shift where those hours are spent, from undifferentiated reading toward higher-leverage analytical work.

Production Pattern: Legal LLM Vendor Landscape, 2026

The market that has consolidated around assistive contract review and litigation drafting is mapped in the table below. All five vendors operate verified-RAG architectures of the kind described in Section 67.4; their differentiation is corpus focus and deployment posture rather than core retrieval approach.

Table 67.1.1a: Legal LLM vendors as of 2026. All five vendors operate verified-RAG architectures; differentiation is corpus focus and deployment posture rather than core retrieval approach.

| Vendor | Focus | Deployment | Pricing tier |
| Harvey | Assistive contract review and litigation drafting | Cloud (multi-tenant with tenant isolation) | Enterprise |
| Hebbia | Search and structured extraction over large document sets | Cloud | Enterprise |
| Casetext CoCounsel | Legal research and memo drafting (Thomson Reuters) | Cloud | Mid-market and enterprise |
| Spellbook | Transactional drafting and Word-integrated redlining | Cloud (Word add-in) | Mid-market |
| Robin AI | Transactional contract review and negotiation support | Cloud | Enterprise |

Warning

None of the five use cases above is safe to deploy without the verification or human-review step described alongside it. Legal practice operates under a duty of competence (ABA Model Rule 1.1, Comment 8) and a duty of supervision over non-attorney assistants (Rule 5.3) that both extend to LLM-augmented work. Skipping the human check is not a productivity optimization; it is a bar-discipline event waiting to happen. Section 67.2 catalogs the most common failure modes in detail, and Section 67.4 specifies the verified-RAG architecture that is now the de-facto standard for compliant deployment.

What Comes Next

Section 67.2 turns to the failure modes specific to legal LLMs, starting with the hallucinated-precedent problem that produced the Mata v. Avianca sanctions order. The use cases above all work; the question of how they fail and how to defend against those failures is what defines a deployable legal-LLM stack.

See Also

For advanced RAG patterns used in legal retrieval, see Section 35.3. For RAG fundamentals these legal pipelines build on, see Chapter 32. For legal-specific evaluation and deployment patterns, see Section 67.4.

What's Next?

In the next section, Section 67.2: Failure Modes Specific to Legal Practice, we build on the material covered here.

Further Reading

Foundational Papers

Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D., & Ho, D. E. (2024). "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." Stanford HAI. arXiv:2405.20362. Empirical audit of Lexis+ AI and Thomson Reuters' Westlaw AI-Assisted Research and Ask Practical Law AI, with GPT-4 as a baseline; the canonical reference for legal-LLM reliability claims.
Dahl, M., Magesh, V., Suzgun, M., & Ho, D. E. (2024). "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models." Journal of Legal Analysis. arXiv:2401.01301. Reference taxonomy for legal hallucinations; defines the failure modes that production legal LLMs must guard against.

Legal Benchmarks

Guha, N., Nyarko, J., Ho, D. E., et al. (2023). "LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models." NeurIPS 2023. arXiv:2308.11462. The standard benchmark for legal LLM evaluation; covers 162 tasks across legal reasoning categories.
Chalkidis, I., Jana, A., Hartung, D., et al. (2022). "LexGLUE: A Benchmark Dataset for Legal Language Understanding in English." ACL 2022. arXiv:2110.00976. Earlier legal-NLP benchmark; useful for tasks like case classification and contract-provision classification.