"The market speaks in numbers, but its true language is narrative. I read both fluently."
Deploy, Bullishly Literate AI Agent
Finance is one of the most text-intensive industries, making it a natural fit for LLMs. Earnings calls, SEC filings, analyst reports, news feeds, and social media create an enormous volume of unstructured text that drives investment decisions. LLMs can process this text at scale, extracting sentiment, generating reports, identifying risks, and even producing trading signals. However, financial applications demand exceptional accuracy, explainability, and regulatory compliance, creating unique challenges beyond what general-purpose models handle out of the box. The RAG techniques from Chapter 20 and the hybrid ML/LLM patterns from Section 12.3 are essential for building reliable financial AI systems.
Prerequisites
This section builds on the application patterns from Section 28.1 and the agent foundations from Section 22.1. Understanding RAG from Section 20.1 and hybrid ML/LLM patterns from Section 12.3 is important for building reliable financial AI systems.
1. Financial NLP and Sentiment Analysis
Financial sentiment analysis differs from general sentiment analysis in important ways.
In finance, the phrase "in line with expectations" is positive, "slightly below" is catastrophic, and "exploring strategic alternatives" means someone is about to have a very bad quarter. LLMs trained on general text get this spectacularly wrong.
The word "liability" is negative in general text but neutral in finance. Phrases like "above expectations" or "revised guidance" carry specific quantitative implications. Financial NLP models must understand these domain-specific nuances to produce reliable signals. The code fragment below puts this into practice.
# FinBERT: a finance-specific sentiment classifier
from transformers import pipeline

fin_sentiment = pipeline(
    "sentiment-analysis",
    model="ProsusAI/finbert",
)

headlines = [
    "Company reports Q3 earnings above analyst expectations",
    "Fed signals potential rate cuts amid cooling inflation",
    "Tech giant announces major layoffs, restructuring plan",
    "Supply chain disruptions continue to pressure margins",
]

for headline in headlines:
    result = fin_sentiment(headline)[0]
    print(f"{result['label']:>10} ({result['score']:.3f}): {headline}")
Domain-Specific Financial Models
| Model | Base | Training Data | Strength |
|---|---|---|---|
| FinBERT | BERT | Financial news, reports | Sentiment classification |
| BloombergGPT | Custom 50B | Bloomberg terminal data | Broad financial NLP |
| FinGPT | LLaMA / Mistral | Open financial data | Open-source, customizable |
| FinMA | LLaMA | Financial instructions | Financial QA, reasoning |
In finance, the decision is rarely "LLM vs. traditional ML" but rather "which layer should the LLM handle?" Time-series forecasting, anomaly detection, and quantitative risk models are better served by specialized ML models (XGBoost, LSTM, statistical methods) that are fast, interpretable, and auditable. LLMs excel at the natural language layer: reading earnings transcripts, summarizing SEC filings, generating analyst reports, and translating quantitative findings into human-readable insights. The most effective financial AI systems combine both: ML models produce the numbers, and LLMs interpret and communicate them. This mirrors the hybrid ML/LLM pattern from Chapter 12, which is the dominant architecture in production financial systems.
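As a concrete illustration of this division of labor, the sketch below packages quantitative model output into a prompt for an LLM to narrate. The function and field names are illustrative, not a fixed API, and the forecast dict is a stub standing in for a real model's output.

```python
# Hybrid ML/LLM sketch: the ML layer produces the numbers, the LLM layer narrates.
# Function and field names here are illustrative, not a fixed API.

def build_narration_prompt(ticker: str, forecast: dict) -> str:
    """Turn quantitative model output into an LLM prompt for narration."""
    lines = [f"Summarize the model output for {ticker} in two sentences."]
    lines.append("Do not add numbers that are not listed below.")
    for metric, value in forecast.items():
        lines.append(f"- {metric}: {value}")
    return "\n".join(lines)

# In production, this dict would come from a specialized model
# (XGBoost, LSTM, statistical risk model); here it is a stub.
forecast = {"expected_return_30d": "+2.1%", "volatility": "18%", "var_95": "-3.4%"}
prompt = build_narration_prompt("TECH", forecast)
print(prompt)
```

The explicit "do not add numbers" instruction is the key design choice: the LLM is constrained to communicate the ML layer's output, not to invent its own figures.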
For financial sentiment analysis, always include a "neutral" or "mixed" category in your label set. Earnings calls and SEC filings frequently contain sentences that are simultaneously positive about one metric and negative about another ("Revenue exceeded expectations, but margins contracted due to rising input costs"). Forcing a binary positive/negative classification on these sentences injects noise into your signal and can flip trading indicators.
2. Automated Report Generation
LLMs can generate financial reports by combining structured data (financial statements, KPIs) with natural language analysis. Investment banks, asset managers, and corporate finance teams use these systems to produce first drafts of earnings summaries, market commentaries, and client reports, reducing the time from data availability to published analysis from hours to minutes. The hallucination mitigation strategies from Section 32.2 are particularly important here, where factual errors can have financial consequences. The code fragment below puts this into practice.
# Generate an earnings summary from structured financial data
from openai import OpenAI

client = OpenAI()

# Financial data as context
financial_data = """
Q3 2025 Results for TechCorp Inc:
Revenue: $4.2B (vs $3.8B est.), +18% YoY
EPS: $2.15 (vs $1.90 est.)
Cloud segment: $1.8B (+32% YoY)
Operating margin: 28.5% (vs 26.1% prior year)
Guidance: Q4 revenue $4.4B-$4.6B (est. $4.3B)
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You are a financial analyst. Write concise, factual earnings "
            "summaries. Include key beats/misses, segment highlights, and "
            "forward guidance. Use professional financial language. "
            "No speculation."
        )},
        {"role": "user", "content": f"Write an earnings summary:\n{financial_data}"},
    ],
)
print(response.choices[0].message.content)
3. Trading Signals and Risk Analysis
LLMs can extract trading signals from news, social media, and regulatory filings. The pipeline typically involves ingesting text streams in real time, extracting entities and events (earnings surprises, M&A activity, regulatory actions), scoring sentiment and magnitude, and generating structured signals that quantitative systems can act on. The challenge is latency: in financial markets, information decays rapidly and milliseconds matter. Figure 28.2.1 traces the financial NLP signal generation pipeline. The code fragment below puts this into practice.
# Extract a structured trading signal from a news item
import json
from openai import OpenAI

client = OpenAI()

def extract_trading_signal(news_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Extract structured trading signals from financial news. "
                "Return JSON with: ticker, event_type, sentiment (-1 to 1), "
                "magnitude (low/medium/high), time_horizon (immediate/short/long), "
                "confidence (0 to 1), and reasoning."
            )},
            {"role": "user", "content": news_text},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

signal = extract_trading_signal(
    "Apple announces $100B share buyback program, largest in history"
)
print(json.dumps(signal, indent=2))
4. Fraud Detection and KYC/AML
LLMs assist in fraud detection by analyzing transaction narratives, customer communications, and account patterns. For Know Your Customer (KYC) and Anti-Money Laundering (AML), LLMs process adverse media screening, analyze complex corporate structures, and generate investigation summaries. They excel at reducing false positive rates in traditional rule-based systems by understanding the contextual nuances that distinguish legitimate transactions from suspicious activity.
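This false-positive reduction can be sketched as a second-pass triage layer: a rule-based system raises alerts, and an LLM verdict is used to decide whether to escalate or dismiss. The verdict schema below (risk_level, confidence, rationale) is an assumed convention, not a standard; in production it would come from a structured-output LLM call, and every decision would be written to an audit log.

```python
# Hypothetical second-pass triage for rule-based AML alerts.
# The LLM verdict schema (risk_level, confidence, rationale) is an assumed
# convention; in production it would come from a structured-output call.

def triage_alert(alert: dict, llm_verdict: dict, threshold: float = 0.7) -> dict:
    """Combine a rule-based alert with an LLM risk assessment."""
    escalate = (
        llm_verdict["risk_level"] == "high"
        or llm_verdict["confidence"] < threshold  # low confidence -> human review
    )
    return {
        "alert_id": alert["id"],
        "action": "escalate" if escalate else "dismiss",
        "rationale": llm_verdict["rationale"],
    }

alert = {"id": "A-1042", "rule": "structuring_pattern"}
verdict = {"risk_level": "low", "confidence": 0.92,
           "rationale": "Payments match documented payroll schedule."}
print(triage_alert(alert, verdict))
```

Note that low model confidence routes to escalation rather than dismissal: in compliance settings, the safe default is always human review.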
5. Aspect-Based Sentiment Analysis (ABSA)
Standard sentiment analysis assigns a single polarity (positive, negative, neutral) to a document or sentence. In practice, a single review or earnings call transcript often expresses mixed sentiment across multiple topics. A customer might praise a product's battery life while criticizing its screen quality. An earnings call might report strong revenue growth alongside margin compression. Aspect-Based Sentiment Analysis (ABSA) addresses this limitation by extracting individual aspects from text, categorizing them, and assigning a separate sentiment polarity to each one.
5.1 The ABSA Pipeline
A complete ABSA pipeline consists of three stages. First, aspect extraction identifies the specific entities or features mentioned in the text (for example, "battery life," "screen quality," "customer service"). Second, aspect categorization maps extracted terms to a predefined taxonomy (for example, mapping "runs hot" to the "Thermal Performance" category). Third, sentiment classification determines the polarity and intensity of opinion for each extracted aspect. Traditional ABSA systems required separate models or rule sets for each stage. LLMs collapse these three stages into a single prompt, producing structured output that covers all three steps simultaneously.
| ABSA Approach | Aspect Extraction | Domain Adaptation | Setup Cost | Structured Output |
|---|---|---|---|---|
| Rule-based (patterns) | Predefined lists | Per-domain rules | High | Rigid templates |
| Fine-tuned BERT/RoBERTa | Sequence labeling | Labeled data needed | Medium | Requires post-processing |
| LLM zero-shot | Prompt-based | Instructions only | Low | Native JSON output |
| LLM few-shot | In-context examples | 3 to 5 examples | Low | Native JSON output |
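To make the table's first row concrete, a minimal rule-based extractor might look like the sketch below. The keyword lists are illustrative and would need per-domain maintenance, which is exactly the setup cost the table flags.

```python
# Minimal rule-based ABSA: predefined aspect keywords plus opinion lexicons.
# Keyword lists are illustrative; real systems maintain these per domain.

ASPECT_KEYWORDS = {
    "Battery": ["battery", "charge"],
    "Display": ["screen", "display"],
}
POSITIVE = {"great", "excellent", "sharp"}
NEGATIVE = {"poor", "dim", "short"}

def rule_based_absa(sentence: str) -> list[dict]:
    tokens = sentence.lower().replace(",", " ").split()
    results = []
    for category, keywords in ASPECT_KEYWORDS.items():
        if any(k in tokens for k in keywords):
            if any(t in POSITIVE for t in tokens):
                polarity = "positive"
            elif any(t in NEGATIVE for t in tokens):
                polarity = "negative"
            else:
                polarity = "neutral"
            results.append({"category": category, "sentiment": polarity})
    return results

print(rule_based_absa("The screen is sharp but the battery is short"))
```

The example also exposes the approach's rigidity: both aspects inherit the same sentence-level polarity ("sharp" wins for the battery too), which is precisely the failure mode the LLM-based approaches in the following rows avoid.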
5.2 LLM-Based ABSA with Structured Output
LLMs excel at zero-shot ABSA because they can follow detailed extraction instructions without any task-specific training data. By requesting structured JSON output, the model returns aspect, sentiment, and supporting evidence in a format that downstream systems can consume directly. The code fragment below demonstrates this approach for product review analysis.
import json
from openai import OpenAI

client = OpenAI()

def extract_aspect_sentiments(review: str, domain: str = "product") -> dict:
    """Extract aspect-level sentiment from a review using an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""You are an aspect-based sentiment analysis system for {domain} reviews.
Extract every distinct aspect mentioned in the review.
For each aspect, return a JSON object with:
- "aspect": the specific feature or attribute discussed
- "category": a normalized category (e.g., Performance, Design, Price, Service)
- "sentiment": one of "positive", "negative", or "neutral"
- "intensity": a float from 0.0 (weak) to 1.0 (strong)
- "evidence": the exact quote from the review supporting this judgment
Return a JSON object with key "aspects" containing an array of these objects."""},
            {"role": "user", "content": review},
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)

# Example: analyze a product review with mixed sentiment
review = """The laptop's performance is outstanding for data science workloads,
and the keyboard feels premium. However, the fan noise is distractingly
loud under load, and at $2,400 it is overpriced compared to competitors
with similar specs. Battery life is acceptable at around 6 hours."""

result = extract_aspect_sentiments(review, domain="laptop")
for aspect in result["aspects"]:
    print(f" {aspect['category']:>15} | {aspect['sentiment']:>8} "
          f"({aspect['intensity']:.1f}) | {aspect['aspect']}")
5.3 ABSA Applications
ABSA serves several high-value use cases across industries. In product review analysis, e-commerce platforms aggregate aspect-level sentiment across thousands of reviews to surface strengths and weaknesses per product feature, enabling both product teams and shoppers to make informed decisions. In customer feedback mining, support teams track sentiment trends by aspect over time, detecting emerging issues (such as a sudden spike in negative sentiment for "shipping speed") before they escalate. In brand monitoring, marketing teams compare aspect-level sentiment across competitors: "our battery sentiment is 78% positive versus the competitor's 52%, but their display sentiment beats ours by 20 points." In financial earnings analysis, the technique extends naturally to the financial domain covered earlier in this section; an analyst can decompose an earnings call into aspects like revenue growth, margin outlook, capital expenditure plans, and management confidence, with separate sentiment for each.
The power of LLM-based ABSA over traditional pipeline approaches lies in its ability to handle implicit aspects and domain transfer without retraining. A fine-tuned BERT model trained on restaurant reviews ("The pasta was bland") will not generalize to electronics reviews ("The speakers sound tinny") without new labeled data. An LLM handles both domains from the same prompt by adjusting the domain parameter and category taxonomy. This makes LLM-based ABSA especially valuable for organizations that need sentiment analysis across multiple product lines or business units.
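Once per-review aspect records are extracted, aggregating them into per-category statistics is plain Python. The sketch below assumes a simplified record shape with "category" and "sentiment" keys; field names beyond those are illustrative.

```python
# Aggregate per-review aspect records into per-category sentiment shares.
# The record shape ({"category": ..., "sentiment": ...}) is a simplified
# version of the structured output produced by an LLM-based ABSA pass.
from collections import defaultdict

def aggregate_aspects(records: list[dict]) -> dict:
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for rec in records:
        counts[rec["category"]][rec["sentiment"]] += 1
    summary = {}
    for category, c in counts.items():
        total = sum(c.values())
        summary[category] = {
            "mentions": total,
            "positive_share": round(c["positive"] / total, 2),
        }
    return summary

records = [
    {"category": "Performance", "sentiment": "positive"},
    {"category": "Performance", "sentiment": "positive"},
    {"category": "Price", "sentiment": "negative"},
    {"category": "Performance", "sentiment": "negative"},
]
print(aggregate_aspects(records))
```

This aggregation step is what turns individual extractions into the product-level dashboards described above: tracked over time, a drop in a category's positive share is the signal that triggers investigation.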
Who: Product analytics team at a consumer electronics marketplace with 2 million monthly reviews
Situation: The team needed to understand which specific product attributes drove customer satisfaction and returns across 15,000 SKUs, but their existing sentiment system only produced a single score per review.
Problem: A product with 4.2 stars might have excellent performance ratings but terrible build quality. Aggregate sentiment hid these actionable details, making it impossible to give suppliers targeted improvement feedback.
Decision: The team deployed an LLM-based ABSA pipeline that extracted aspects and per-aspect sentiment from every review, aggregated results by product and category, and surfaced a dashboard showing the top three strengths and top three weaknesses per SKU.
How: Reviews were batched and processed nightly through GPT-4o-mini with structured output. A post-processing layer normalized aspect categories using a 50-category taxonomy. Results were stored in a data warehouse and served through an internal dashboard with time-series views.
Result: Supplier feedback became specific ("your battery sentiment dropped 15 points this quarter") instead of vague. Return rates fell 8% for products where suppliers acted on ABSA insights. The product team identified that "packaging quality" was the single most predictive aspect for negative reviews across all categories.
Lesson: Aspect-level sentiment transforms reviews from opaque ratings into actionable product intelligence; aggregate scores hide the details that drive purchasing decisions and return behavior.
Emotion vs. sentiment in financial contexts: In finance, emotion recognition adds value beyond sentiment. An earnings call where executives express confidence carries different implications than one expressing relief, even if both are classified as "positive" by a sentiment model. Detecting anxiety or evasiveness in management language can provide early warning signals that simple polarity scores miss entirely.
Financial applications of LLMs face stringent regulatory requirements. Models must be explainable (regulators need to understand why a decision was made), auditable (every prediction must be traceable), and free from protected-class bias. The EU AI Act classifies many financial AI systems as "high-risk," requiring conformity assessments and human oversight. SEC and FINRA (Financial Industry Regulatory Authority) regulations govern automated trading and investment advice. Always involve legal and compliance teams early when deploying LLMs in financial workflows. Figure 28.2.2 maps the regulatory landscape for financial LLM applications.
The most successful financial LLM deployments augment human analysts rather than replacing them. An LLM can process 500 earnings calls overnight and flag the 20 most significant changes for an analyst to review in the morning. This "AI as triage" pattern satisfies regulatory requirements for human oversight while dramatically improving analyst productivity. Pure automation of trading decisions remains limited by explainability requirements and the catastrophic risk profile of financial errors.
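The triage pattern itself reduces to a ranking step: score every item overnight, then surface only the top handful for human review. In the minimal sketch below, the significance scores are mock values standing in for LLM-derived scores.

```python
# "AI as triage": rank items by significance, flag only the top_k for
# human review. Scores here are mock values standing in for LLM output.

def triage(items: list[dict], top_k: int = 20) -> list[dict]:
    """Flag the top_k most significant items for human analyst review."""
    ranked = sorted(items, key=lambda x: x["significance"], reverse=True)
    flagged = ranked[:top_k]
    for item in flagged:
        item["status"] = "needs_human_review"
    return flagged

# 500 earnings calls processed overnight, 20 flagged for the morning
calls = [{"ticker": f"T{i}", "significance": i % 7} for i in range(500)]
flagged = triage(calls, top_k=20)
print(len(flagged), flagged[0]["significance"])
```

The human remains the decision-maker; the model only changes which 20 of the 500 items reach the analyst's desk, which is what keeps the pattern compatible with human-oversight requirements.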
Who: Quantitative research team at a mid-size systematic hedge fund
Situation: The fund tracked SEC EDGAR filings (10-K, 10-Q, 8-K) for 3,000 US equities. Manually reading each filing took 2 to 4 hours, and important disclosures in 8-K filings (material events) needed same-day analysis to inform trading decisions.
Problem: The team could only cover the top 200 holdings with human analysts, missing material disclosures in the remaining 2,800 names until they appeared in news (typically 6 to 24 hours later).
Decision: They built a pipeline using FinGPT (fine-tuned on SEC filings) for initial extraction and GPT-4o for nuanced interpretation. The system polled EDGAR every 60 seconds for new filings.
How: New 8-K filings were parsed with sec-edgar-downloader, split into sections, and processed by FinGPT for entity and event extraction. A structured output schema captured: filing type, material changes, risk factors, forward guidance changes, and insider transactions. Filings flagged as "high impact" (guidance changes, M&A, restatements) were sent to GPT-4o for a detailed narrative summary with trading implications. The final output was a scored alert (1 to 10 urgency) delivered to the trading desk via Slack.
Result: Average detection-to-analysis time dropped from 8 hours to 4 minutes for high-impact filings. The fund identified 12 material 8-K filings in smaller-cap names during the first quarter that human analysts would have missed entirely. The hybrid FinGPT/GPT-4o approach kept API costs under $800/month for the full 3,000-name universe.
Lesson: In finance, the combination of a domain-specific model (FinGPT) for high-volume triage and a general-purpose model (GPT-4o) for nuanced interpretation mirrors the hybrid ML/LLM pattern and delivers the best cost-to-insight ratio.
Guardrails for financial LLM outputs. Financial regulators (SEC, FINRA, FCA) require auditability and explainability for any automated system that influences trading or advisory decisions. Production financial LLM systems should: (1) log every prompt and response with timestamps for audit trails; (2) include source citations for every factual claim (link to the specific SEC filing paragraph or data source); (3) add explicit confidence scores and flag low-confidence outputs for human review; (4) never generate forward-looking predictions without a disclaimer; (5) implement a "compliance filter" that checks outputs against a list of prohibited statements before delivery. Tools like guardrails-ai and NeMo Guardrails (NVIDIA) can enforce these constraints programmatically.
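Item (5), the compliance filter, can be as simple as a final gate that blocks delivery when an output contains prohibited phrasing or makes forward-looking claims without a disclaimer. The phrase lists below are illustrative; libraries like guardrails-ai generalize the same idea with validators.

```python
# Illustrative compliance filter: gate LLM output before delivery.
# Phrase lists are examples only; a real deployment would maintain these
# with the compliance team and log every check for the audit trail.

PROHIBITED = ["guaranteed return", "cannot lose", "insider"]
DISCLAIMER = "not investment advice"

def compliance_check(output: str) -> dict:
    text = output.lower()
    violations = [p for p in PROHIBITED if p in text]
    forward_looking = any(w in text for w in ["will rise", "expect", "forecast"])
    missing_disclaimer = forward_looking and DISCLAIMER not in text
    return {
        "approved": not violations and not missing_disclaimer,
        "violations": violations,
        "missing_disclaimer": missing_disclaimer,
    }

print(compliance_check("We expect revenue to grow next quarter."))
```

A blocked output is returned to the generation layer or routed to a human, never silently edited: the audit trail must show what the model actually produced.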
Who: Quantitative research team at a mid-size investment management firm
Situation: The team needed to analyze 3,000+ quarterly earnings call transcripts per season to extract sentiment signals, forward guidance changes, and management tone shifts across their coverage universe.
Problem: Human analysts could only cover 50 calls in depth per quarter. Rule-based keyword matching missed nuanced language like hedged optimism or confident understatement that carried significant signal.
Dilemma: Using a general-purpose LLM for financial sentiment produced frequent misclassifications (e.g., "aggressive growth" flagged as negative). Fine-tuning FinBERT was accurate but only produced sentiment labels without the explanations analysts needed.
Decision: The team deployed a two-stage pipeline: FinBERT for fast sentiment scoring across all transcripts, followed by FinGPT for detailed analysis and explanation of the top 200 highest-signal calls.
How: FinBERT processed all transcripts in under an hour, scoring each paragraph. Transcripts with sentiment shifts exceeding two standard deviations were routed to FinGPT, which generated structured reports highlighting specific statements, tone changes, and comparison to prior quarters.
Result: Coverage expanded from 50 to 3,000 companies per quarter. The sentiment signals showed a statistically significant 60-day predictive relationship with stock returns. Analyst productivity increased fourfold because they received pre-analyzed reports instead of raw transcripts.
Lesson: Domain-specific financial models are essential for accurate sentiment extraction; combining fast classifiers for triage with detailed LLM analysis for high-signal items balances coverage and depth.
Before writing a single line of application code, create 20 to 50 input/output pairs that define correct behavior. This test set guides development, catches regressions, and prevents the common trap of optimizing for vibes instead of measurable quality.
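A minimal harness over such a golden set might look like the sketch below; the `classify` stub stands in for the real LLM pipeline under test.

```python
# Minimal golden-set harness: score any classifier against hand-built
# (input, expected) pairs. The classify stub stands in for the real system.

def evaluate(classify, golden: list[tuple[str, str]]) -> dict:
    failures = []
    for text, expected in golden:
        got = classify(text)
        if got != expected:
            failures.append({"input": text, "expected": expected, "got": got})
    return {"accuracy": 1 - len(failures) / len(golden), "failures": failures}

def classify(text: str) -> str:  # stub for the system under test
    return "positive" if "beats" in text or "above" in text else "negative"

golden = [
    ("Company beats Q3 estimates", "positive"),
    ("Revenue above expectations", "positive"),
    ("Guidance cut amid weak demand", "negative"),
    ("Margins in line with expectations", "positive"),  # stub misses this
]
report = evaluate(classify, golden)
print(report["accuracy"], len(report["failures"]))
```

Run on every prompt change, the failure list shows exactly which behaviors regressed; the deliberately missed "in line with expectations" case is the kind of domain nuance a golden set exists to pin down.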
- Financial NLP requires domain-specific models (FinBERT, FinGPT) because general sentiment analysis misinterprets financial terminology.
- Automated report generation reduces the time from data availability to published analysis from hours to minutes, though human review remains essential.
- Trading signal extraction with LLMs can process vast text volumes but faces challenges in latency, reliability, and backtesting validation.
- KYC/AML applications benefit from LLM contextual understanding that reduces false positive rates in rule-based screening systems.
- Regulatory compliance (EU AI Act, SEC, FINRA) demands explainability, auditability, and human oversight for financial AI systems.
- "AI as triage" is the dominant deployment pattern: LLMs process at scale and flag items for human expert review.
- Aspect-Based Sentiment Analysis (ABSA) with LLMs extracts per-feature sentiment from reviews and earnings calls, turning aggregate scores into actionable product and market intelligence.
Multimodal ABSA extends aspect-based sentiment beyond text to incorporate images and video. Researchers are exploring models that can identify visual aspects (product color fading, physical damage) alongside textual complaints, producing unified aspect-sentiment maps from mixed-media reviews. Work on comparative ABSA goes further by detecting comparative opinions ("better screen than Brand X, but worse speakers") and structuring them into competitive intelligence.
Meanwhile, temporal ABSA tracks how sentiment for individual aspects evolves over time, enabling brands to measure the impact of product updates on specific features.
6. Emotion Recognition in Text
Sentiment analysis tells you whether text is positive or negative, but emotion recognition goes deeper, identifying the specific emotional state behind a piece of text. While sentiment operates on a simple polarity axis, emotions form a richer taxonomy: joy, anger, sadness, fear, surprise, disgust, and many finer-grained categories. This distinction matters for applications where understanding the "why" behind a reaction is as important as knowing whether the reaction is positive or negative.
6.1 Emotion Taxonomies and Datasets
The foundational emotion taxonomy comes from Paul Ekman's six basic emotions (joy, anger, sadness, fear, surprise, disgust), but modern NLP research has expanded well beyond this set. Google's GoEmotions dataset, built from 58,000 Reddit comments, defines 27 fine-grained emotion labels plus a neutral category, covering states like admiration, amusement, curiosity, confusion, disappointment, gratitude, and relief. The SemEval shared tasks on affect in tweets have similarly pushed the field toward nuanced emotion detection across multiple languages. These datasets provide the benchmarks against which both fine-tuned models and LLM-based approaches are evaluated.
Emotion detection is inherently more challenging than binary sentiment classification for several reasons. A single sentence can express multiple emotions simultaneously ("I'm thrilled about the promotion but terrified of the responsibility"). Cultural context shapes how emotions are expressed in text. Sarcasm and irony can mask the true underlying emotion. And the boundary between related emotions (annoyance versus anger, sadness versus disappointment) is often subjective.
6.2 LLM-Based Emotion Detection
LLMs excel at emotion recognition because they can leverage world knowledge, contextual understanding, and nuanced reasoning that fine-tuned classifiers lack. By requesting structured output, you can obtain not just the predicted emotion but also a confidence score and the specific textual evidence that supports the classification. This explainability is critical for applications in mental health monitoring and content moderation, where decisions must be justifiable. The code fragment below demonstrates this approach.
# LLM-based emotion detection with structured output
# Returns emotion label, confidence, and supporting evidence
import json
from openai import OpenAI

client = OpenAI()

def detect_emotions(text: str, top_k: int = 3) -> list[dict]:
    """Detect emotions in text with confidence scores and evidence."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are an emotion detection system.
Analyze the input text and identify the top emotions present.
For each emotion, provide:
- "emotion": one of [joy, sadness, anger, fear, surprise, disgust,
  admiration, amusement, confusion, curiosity, disappointment,
  gratitude, relief, anxiety, annoyance, neutral]
- "confidence": float between 0.0 and 1.0
- "evidence": the specific phrase or context supporting this label
Return a JSON object with key "emotions" containing a list of objects.
Order by confidence descending."""},
            {"role": "user", "content": text},
        ],
        temperature=0.1,
        response_format={"type": "json_object"},
    )
    result = json.loads(response.choices[0].message.content)
    return result["emotions"][:top_k]

# Example: customer feedback with mixed emotions
feedback = ("I waited three weeks for the delivery, which was incredibly "
            "frustrating. But when I finally opened the package, the quality "
            "blew me away. Honestly, I'm torn between recommending this "
            "company and warning people about the shipping delays.")

emotions = detect_emotions(feedback)
for e in emotions:
    print(f" {e['emotion']:>15} ({e['confidence']:.2f}): {e['evidence']}")
6.3 Applications of Emotion Recognition
Emotion recognition powers several high-impact applications. In mental health monitoring, platforms can track emotional patterns in user-generated text (with appropriate consent and privacy safeguards) to identify signs of depression, anxiety, or crisis situations that warrant intervention. In customer experience analytics, emotion detection goes beyond "positive/negative" to reveal whether customers feel confused (indicating UX problems), frustrated (indicating process friction), or grateful (indicating successful resolution), each of which demands a different organizational response. In content moderation, detecting anger, disgust, or fear helps platforms identify toxic or harmful content more accurately than keyword-based filters, particularly when harmful intent is expressed through indirect language. These applications connect naturally to the safety and ethics considerations in Chapter 32.
Real-time financial LLMs are a fast-moving research area. Researchers are exploring on-device financial models that can process market data with sub-100ms latency, enabling LLM-based trading strategies that were previously too slow.
Work on financial reasoning models (building on the chain-of-thought advances in Section 11.5) aims to produce explainable investment theses that satisfy both regulatory requirements and portfolio managers. New multimodal financial models can analyze charts, tables, and text simultaneously from earnings presentations, closing the gap between how analysts and AI systems process financial information.
Exercises
Write a Python function that uses an LLM to classify financial news headlines as positive, negative, or neutral for a given stock. Include confidence scores and test with 5 example headlines.
Answer Sketch
Send each headline with the prompt: 'Classify this financial headline for [stock] as positive, negative, or neutral. Return JSON with sentiment and confidence (0 to 1).' Use structured output. Test with headlines like 'Apple beats Q4 earnings expectations' (positive), 'FDA delays drug approval for Pfizer' (negative). Compare results with a domain-specific model like FinBERT for validation.
Design a prompt that takes a JSON object of financial metrics (revenue, expenses, profit margin, YoY growth) and generates a quarterly earnings summary paragraph suitable for investor communications.
Answer Sketch
The prompt should include the metrics and instructions to: write in formal financial reporting style, highlight key trends, compare to previous quarters if data is provided, note any concerning metrics, and keep the summary to one paragraph. Include a constraint: do not make claims not supported by the data. Test with sample Q3 vs Q4 data.
Discuss the limitations and risks of using LLMs to generate trading signals. Why should LLM-based signals be treated as one input among many rather than as standalone trading advice?
Answer Sketch
Limitations: LLMs have knowledge cutoffs and may not reflect the latest market conditions. They can hallucinate correlations. They cannot process real-time market data. They may reflect biases from training data (survivorship bias in financial narratives). Risks: over-reliance on a single model, regulatory exposure (SEC has rules about automated trading advice). LLM signals should complement quantitative models, fundamental analysis, and human judgment.
Implement aspect-based sentiment analysis for an earnings call transcript. Extract sentiments for specific aspects: revenue, margins, guidance, and competition. Return a structured report.
Answer Sketch
Split the transcript into paragraphs. For each paragraph, identify which aspects are discussed (use keyword matching or an LLM classifier). For each aspect mention, extract the sentiment and a supporting quote. Aggregate sentiments per aspect across the full transcript. Output: {aspect: {sentiment: pos/neg/neutral, confidence: float, supporting_quotes: [str]}}.
How can LLMs assist in fraud detection and KYC/AML processes? What are the risks of using LLMs for compliance-critical tasks, and what safeguards are needed?
Answer Sketch
LLMs can: analyze transaction narratives for suspicious patterns, extract entities from documents for KYC verification, summarize suspicious activity reports, and translate compliance rules into monitoring queries. Risks: hallucinated findings (false positives or missed fraud), lack of auditability (regulators need explainable decisions), and liability for missed fraud. Safeguards: human review of all flagged cases, audit logging, regular model validation against known fraud cases.
What Comes Next
The next section, Section 28.3: Healthcare & Biomedical AI, examines how LLMs assist with clinical decisions, drug discovery, and medical documentation.
Bibliography
Yang, H., Liu, X.Y., & Wang, C.D. (2023). "FinGPT: Open-Source Financial Large Language Models." arXiv:2306.06031
Wu, S., Irsoy, O., Lu, S., et al. (2023). "BloombergGPT: A Large Language Model for Finance." arXiv:2303.17564
Araci, D. (2019). "FinBERT: Financial Sentiment Analysis with Pre-trained Language Models." arXiv:1908.10063
Lopez-Lira, A. & Tang, Y. (2023). "Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models." arXiv:2304.07619
Pontiki, M., Galanis, D., Pavlopoulos, J., et al. (2016). "SemEval-2016 Task 5: Aspect Based Sentiment Analysis." ACL Anthology: S16-1002
Zhang, W., Deng, Y., Li, B., et al. (2023). "A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges." arXiv:2203.01054
Scaria, K., Gupta, H., Goyal, S., et al. (2024). "InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis." arXiv:2302.08624
Xie, Q., Han, W., Zhang, X., et al. (2023). "PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance." arXiv:2306.05443
