
"In finance, every LLM answer is also a risk number."
Sage, Risk-Aware AI Agent
Chapter 67 covered legal; this chapter covers finance: research, trading, risk management, KYC, fraud, compliance, structured-data interrogation, and the high-stakes evaluation and governance discipline that finance demands.
Finance was an early adopter of LLMs because the workflows are text-on-text (analyst reports, regulatory filings, news, internal memos), the per-employee cost is high enough to justify automation investment, and the back-office processes are well-defined. But finance is also one of the most regulated industries in the world, and the failure modes (model risk, market manipulation, fair-lending violations, fiduciary breach) have legal teeth.
Four 2023 deployment anchors defined the industry's playbook: Bloomberg's BloombergGPT (Mar 2023, 50B parameters) as the first domain-specialized pretraining bet; Morgan Stanley's GPT-4 deployment for wealth advisors (Mar 2023) as the first major Wall Street rollout; JPMorgan's IndexGPT trademark filing (May 2023) as a public signal of in-house build intent; and Goldman Sachs' "300M jobs" report (Mar 2023) as the macro framing that pushed boards to act.
This chapter is a practitioner snapshot of where the industry had settled by 2026. Section 68.1 covers the use cases that ship. Section 68.2 covers the failure modes. Section 68.3 covers the regulatory framework. Section 68.4 covers the tiered LLM trust architecture. Section 68.5 closes with the vendor landscape, and Section 68.6 is the longer production-pattern companion.
Chapter Overview
Finance LLM deployment sits inside one of the most regulated production environments in the book. This chapter walks the use cases that actually ship (equity research synthesis, sentiment extraction, code generation, KYC, customer operations, the BloombergGPT pattern), the failure modes specific to finance (hallucinated numbers, fair-lending disparate impact, market-manipulation adjacency), the regulatory framework (SR 11-7 model risk, EU AI Act high-risk, FINRA recordkeeping, DORA, consumer disclosure), the tiered trust architecture (Tier 0 through Tier 3) that major banks have settled on, and the vendor landscape plus canonical sources.
Finance is the industry where model risk has a forty-year regulatory tradition and LLMs are still new enough to be treated as exceptions. This chapter teaches what ships, what fails, and what the regulators expect.
- Map the finance use cases (research, sentiment, code, KYC, customer ops) that actually ship.
- Diagnose hallucinated numbers, fair-lending disparate impact, and market-manipulation adjacency in finance LLMs.
- Apply SR 11-7, EU AI Act, FINRA recordkeeping, and DORA to a finance LLM deployment.
- Architect a tiered LLM trust stack (Tier 0 through Tier 3) for bank governance.
- Evaluate finance-specific LLMs (BloombergGPT, FactSet Mercury, Hebbia) against use-case fit.
Prerequisites
- RAG fundamentals from Chapter 32
- Evaluation foundations from Chapter 42
- Bias and fairness from Chapter 52
Sections in This Chapter
- 68.1 Use Cases That Actually Ship in Finance: Equity research synthesis, sentiment extraction, code generation, KYC, customer operations, and the BloombergGPT pattern. (Intermediate)
- 68.2 Failure Modes Specific to Finance: Hallucinated numbers, fair-lending disparate impact, market-manipulation adjacency, and the structured-extraction-then-LLM mitigation pattern. (Advanced)
- 68.3 Regulatory Framework for Finance LLMs: SR 11-7 model risk, EU AI Act high-risk, FINRA recordkeeping, DORA operational resilience, and consumer disclosure rules. (Advanced)
- 68.4 Tiered LLM Trust Architecture: The Tier 0 through Tier 3 framework that has consolidated at major U.S. and EU banks for governing LLM deployments under SR 11-7. (Advanced)
- 68.5 Finance LLM Vendors and Further Reading: BloombergGPT, FactSet Mercury, Hebbia, JPMorgan IndexGPT, BlackRock Aladdin, and the regulatory canon. (Intermediate)
Pick one financial-services use case from this chapter (research summarization, earnings-call extraction, KYC narrative drafting, or trade-document review). Assign each step of the workflow to one of three trust tiers: T1 (fully autonomous), T2 (analyst-reviewed), or T3 (human-only, LLM forbidden). Justify each assignment in one sentence based on the failure mode (numerical hallucination, fair-lending risk, market-manipulation adjacency) and the regulatory framework (SEC, MiFID II, fair-lending laws) that constrains the choice.
Answer Sketch
Example for earnings-call extraction: structured-data extraction (numbers from a transcript) is T2 with required reconciliation against the official filing (the 10-Q, or the 10-K for fiscal-year results); sentiment scoring on prepared remarks is T1; sentiment scoring on Q&A is T2 because subtle phrasing matters; any extracted forward-looking statement that is republished to clients is T3 until compliance signs off. The point of the exercise is to commit to a tier per step rather than treating the whole workflow as "LLM-assisted" without granularity.
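One way to make the per-step commitment concrete is to write the assignments down as data. The following is a minimal sketch, not a reference implementation: the names `Tier`, `Step`, `EARNINGS_CALL_WORKFLOW`, `needs_human`, and `reconcile` are hypothetical, the metric names are illustrative, and the tolerance is an assumed default. It encodes the tier assignments from the answer sketch and adds a simple reconciliation check for extracted numbers.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The three exercise tiers (hypothetical encoding)."""
    T1 = "fully autonomous"
    T2 = "analyst-reviewed"
    T3 = "human-only, LLM forbidden"


@dataclass(frozen=True)
class Step:
    name: str
    tier: Tier
    reason: str


# Tier assignments taken from the answer sketch; step names are illustrative.
EARNINGS_CALL_WORKFLOW = [
    Step("extract numbers from transcript", Tier.T2,
         "numerical hallucination; reconcile against the official filing"),
    Step("sentiment on prepared remarks", Tier.T1,
         "assigned T1 in the sketch (rationale not stated there)"),
    Step("sentiment on Q&A", Tier.T2,
         "subtle phrasing matters"),
    Step("republish forward-looking statements", Tier.T3,
         "held until compliance signs off"),
]


def needs_human(step: Step) -> bool:
    """Every tier except T1 requires a person in the loop."""
    return step.tier is not Tier.T1


def reconcile(extracted: dict[str, float],
              filed: dict[str, float],
              rel_tol: float = 1e-6) -> list[str]:
    """Return extracted metrics that are missing from, or disagree with, the filing.

    rel_tol is an assumed default; a real deployment would set it per metric.
    """
    mismatches = []
    for metric, value in extracted.items():
        official = filed.get(metric)
        if official is None or not math.isclose(value, official, rel_tol=rel_tol):
            mismatches.append(metric)
    return mismatches
```

With this layout, filtering `EARNINGS_CALL_WORKFLOW` by `needs_human` gives the review queue, and a non-empty result from `reconcile` is the signal that keeps an extracted number in analyst review rather than letting it flow downstream.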
What Comes Next
Finance produced the tiered-trust framework that generalizes well across regulated industries. Chapter 69 turns to healthcare, where the regulatory friction is at least as intense (FDA SaMD, HIPAA, malpractice exposure) and the highest-leverage use case (ambient clinical documentation) is unlike anything in finance.