Use Cases That Actually Ship in Finance

Section 68.1

"Equity research synthesis, KYC, customer ops, code generation. The boring use cases that finance actually deployed, while everyone talked about robo-advisors."

QuantQuant, Banking-Reality-Reader AI Agent
Big Picture: Finance Under Model Risk Management

Finance was an early enterprise adopter of LLMs because the workflows are text-on-text, the per-employee cost is high, and the back-office processes are well-defined; but unlike legal, every deployment lands inside a model-risk-management framework that predates generative AI by fifteen years. The reference deployments by mid-2026 trace a recognizable arc: BloombergGPT (2023) established the domain-pretraining pattern with 50B parameters on 363B tokens of Bloomberg's proprietary corpus, JPMorgan's IndexGPT turned that pattern into a thematic-basket product, and the open FinLLM and FinGPT benchmarks now anchor academic and vendor evaluation. Regulators have not stood still: the Federal Reserve's SR 11-7 framework on model-risk management is the binding constraint on every U.S. bank deployment, the OCC and FRB have issued joint guidance on third-party AI providers, and FINRA has clarified that LLM-generated client communications fall under existing recordkeeping rules. Five categories of finance LLM work now ship reliably: equity research synthesis, sentiment and event extraction, code generation for quant and middle-office workflows, KYC and compliance summarization, and customer operations. The takeaway: in finance the productivity win is real, but the load-bearing engineering decision is making the deployment auditable enough to survive an SR 11-7 review.

Finance LLM use cases by ship-readiness, mid-2026
Figure 68.1.1: The five shipping finance-LLM patterns (left, green) vs the three regulator-blocked categories (right, red). Note the common shape across all five green cases: LLM produces a first draft, a licensed professional verifies, an audit log captures every decision, the SR 11-7 survival pattern Section 68.3 unpacks.

Prerequisites

This section assumes familiarity with the RAG patterns from Chapter 32, conversational AI patterns from Chapter 37, and the agentic-coding patterns from Section 29.4.

Glossary for non-finance readers: KYC (Know Your Customer) and AML (Anti-Money-Laundering) are mandatory customer-identity and transaction-monitoring processes banks must perform on every account; 10-K and 10-Q are annual and quarterly company filings required by the U.S. SEC; SR 11-7 is the Federal Reserve guidance on model-risk management that governs every model a U.S. bank deploys (each model must have documented assumptions, independent validation, and an audit log); the OCC (Office of the Comptroller of the Currency) and FRB (Federal Reserve Board) are the U.S. banking regulators; FINRA is the U.S. broker-dealer regulator.

Equity Research Synthesis

Fun Fact

BloombergGPT (Wu et al., 2023) was trained on 363 billion tokens of proprietary Bloomberg data; the cost was estimated at $2.7 million, modest by frontier standards. The model has never been released publicly, and Bloomberg Terminal users access it only through embedded features. The published paper became famous less for the model and more for the appendix listing all 50+ financial benchmarks Bloomberg had quietly built over a decade.

LLMs over a private corpus of 10-K, 10-Q, transcripts, and analyst reports produce first-draft research notes. Sell-side analysts and buy-side researchers use these tools (BloombergGPT lineage, Hebbia for buy-side, FactSet's Mercury AI assistant) to compress the read-this-and-summarize part of the workflow. The output is always reviewed by a human analyst before it leaves the firm; the productivity win is in the synthesis time, not in the publication step.

Real-World Scenario
BloombergGPT and the Domain-Pretraining Pattern

Bloomberg's BloombergGPT (Wu et al., 2023) is the historical anchor for a finance-domain pretrained LLM: a 50-billion-parameter decoder trained from scratch on 363 billion tokens of Bloomberg's proprietary financial corpus (news wires, filings, transcripts, press releases) blended with a 345-billion-token general-domain corpus. It outperformed comparably-sized general LLMs on Bloomberg's internal financial-task benchmarks (NER over earnings calls, headline sentiment, FiQA QA) while remaining competitive on general benchmarks. By 2026 the pattern has shifted; FinanceBench (Islam et al., 2023) became the public-benchmark reference point and most production stacks now combine a frontier general LLM (Claude Sonnet, GPT-4o) with finance-specific retrieval and verification layers rather than training a domain LLM from scratch. The open-source community equivalent is FinBERT, a much smaller encoder fine-tuned for financial sentiment; major banks have launched their own narrower systems, e.g. JPMorgan's IndexGPT for thematic-basket construction. The takeaway for builders: BloombergGPT's advantage was measured in 2023 against same-era general models; by 2026, frontier models combined with retrieval have closed or reversed this gap for most finance tasks, so from-scratch domain pretraining is now justified mainly for narrow, high-volume workloads where the vocabulary, document genre, and reasoning shape are unusual, you can afford the corpus, and the frontier-plus-RAG alternative cannot reach the same numerical-traceability bar.

Sentiment and Event Extraction

Stream of news, earnings calls, central-bank communications, and social media classified for sentiment, named entities, and event types (M&A, guidance change, regulatory action). Quant funds have used variants of this for over a decade; LLMs improved accuracy on the harder cases (sarcasm, hedged language, multi-entity stories) and reduced annotation cost. The trade-able edge is small and has been arbitraged away on standard signals; the value is in custom signals over private data. BlackRock's Aladdin platform, the dominant institutional risk-and-portfolio system, integrated LLM-based sentiment scoring across its news feeds in 2024 and 2025 as a standard signal augmentation rather than as a standalone alpha source.

Code Generation for Finance Workflows

Excel formulas, SQL queries against trade databases, Python for backtest scripts, KDB/q snippets. Heavy adoption in middle-office and quant teams. The risk is the standard one (hallucinated APIs, subtle off-by-one errors); the mitigation is the standard one (tests, code review, sandboxed execution). See Section 29.4 for the agentic-coding pattern. Goldman Sachs and Morgan Stanley both deployed GitHub Copilot Enterprise and equivalent assistants firm-wide by 2024; reported productivity gains cluster around 15 to 30 percent on routine code, though these are largely company-reported figures rather than independent measurements, and the senior-engineer review remains in place.

Compliance and KYC Summarization

Adverse-media screening for KYC/AML, summarization of customer communications for compliance review, regulatory filing first-drafts. The sensitivity is privilege-equivalent: false positives create regulatory friction, false negatives create fines. Human-in-the-loop is non-negotiable. The dominant pattern at the big four banks is a tiered review: the LLM produces a structured summary plus a risk indicator, a junior compliance analyst confirms low-risk cases, senior compliance reviews high-risk cases, and the audit log captures every decision.

Customer Operations

Account-servicing chatbots, advisor-assistant tools, bank-app voice interfaces. Conservative rollouts: most large banks use LLMs as drafting assistants for human agents, not as direct customer-facing autonomous systems. The handful of customer-facing deployments use heavily-constrained prompt designs with explicit out-of-scope refusals. (See Chapter 37.) Bank of America's "Erica" virtual assistant, deployed before the LLM era and incrementally upgraded with LLM components, handled over two billion interactions by 2024 and is the most-cited reference for a heavily-scoped banking chatbot that ships safely at scale.

Key Insight

The five use cases above share a common shape: LLM does first-draft work, a licensed professional verifies, an audit log captures every decision. Where finance differs from legal is in the verification target. In legal, verification is "does this citation exist and say what is claimed?" In finance, verification is "is this number traceable to a structured filing or transaction record?" The legal pattern (citation resolution against authoritative APIs) translates almost directly into the finance pattern (numerical traceability to XBRL, trade-blotter records, or audit-grade source data). The verification step is what makes the deployment defensible under SR 11-7 model risk management, which Section 68.3 covers in detail.

Use Cases That Do Not Ship (Yet)

It is worth noting what is missing from the list above. Autonomous trading on LLM signals remains uncommon outside hedge-fund quant pods, where the systems are tightly-instrumented and the LLM is a feature generator rather than a decision-maker. Direct LLM-driven credit decisions are blocked by ECOA and Fair Housing Act fair-lending review; the LLM may inform the application packet but the decision flows through a documented model-risk-management framework. Fully-autonomous customer-facing chatbots for advisory services are blocked by SEC and FINRA suitability rules; the LLM helps the human advisor but does not give advice on its own. Pilots in each of these categories exist; production deployments are rare and tightly-scoped.

What Comes Next

Section 68.2 covers the failure modes specific to finance: hallucinated numbers, fair-lending exposure, and the market-manipulation adjacency that constrains client-facing LLM commentary.

What's Next?

In the next section, Section 68.2: Failure Modes Specific to Finance, we build on the material covered here.

Further Reading

Foundational Papers

Wu, S., Irsoy, O., Lu, S., et al. (2023). "BloombergGPT: A Large Language Model for Finance." arXiv:2303.17564. The reference financial-domain LLM; defines what production finance-LLM looks like.
Yang, H., Liu, X.-Y., & Wang, C. D. (2023). "FinGPT: Open-Source Financial Large Language Models." arXiv:2306.06031. The open-source counterpart to BloombergGPT; reference for self-hosted finance LLMs.

Finance Benchmarks

Shah, A., Paturi, S., & Chava, S. (2023). "Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis." ACL 2023. aclanthology.org/2023.acl-long.368. Reference benchmark for FOMC-statement analysis with LLMs.
Xie, Q., Han, W., Zhang, X., et al. (2023). "PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance." NeurIPS 2023. arXiv:2306.05443. Multi-task financial NLP benchmark; the standard evaluation suite for finance LLMs.