Part VII: AI Applications
Chapter 28: LLM Applications

LLM-Powered Recommendation & Search

"You do not need to know what you want. Just tell me how you feel, and I will find it for you."

Deploy, Emotionally Perceptive AI Agent
Big Picture

LLMs are transforming both search and recommendation from retrieval problems into reasoning problems. Traditional search returns ranked documents matching keywords. LLM-powered search (Perplexity, Google AI Overviews) understands intent, synthesizes information across sources, and generates direct answers with citations. Similarly, traditional recommendation relies on collaborative filtering and content-based features, while LLM-powered recommendation understands nuanced preferences expressed in natural language and can explain its reasoning. This shift from pattern matching to comprehension represents a fundamental change in how users discover information and products. The embedding and retrieval infrastructure from Chapter 19 powers the semantic search capabilities that underpin these systems.

Prerequisites

This section builds on the application foundations from Section 28.1 through Section 28.3. Understanding conversational AI patterns from Section 21.1 and agent architectures from Section 22.1 provides essential context.

1. LLMs as Recommendation Engines

LLMs can serve as recommendation engines by leveraging their world knowledge and reasoning abilities.

Fun Fact

Traditional recommendation systems need thousands of user interactions to learn your preferences. An LLM can infer that someone who likes "Dune" probably does not want "The Notebook" from a single sentence. Collaborative filtering took 20 years to achieve worse cold-start performance.

Given a description of user preferences, past interactions, and a catalog of items, an LLM can generate personalized recommendations with natural language explanations. This approach excels for cold-start scenarios (new users with no history) and for nuanced preferences that are difficult to capture with traditional feature vectors. Code Fragment 28.4.1 below puts this into practice.


# Use an LLM as a recommendation engine with structured JSON output
from openai import OpenAI
import json

client = OpenAI()

def recommend_items(user_profile: str, catalog: list, n: int = 5) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"""You are a recommendation engine.
Given a user profile and catalog, recommend {n} items.
Return JSON with 'recommendations' array, each having:
'item_id', 'score' (0-1), 'reasoning' (brief explanation)."""},
            {"role": "user", "content": f"""User Profile: {user_profile}
Catalog: {json.dumps(catalog)}"""},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

recs = recommend_items(
    user_profile="Enjoys sci-fi with strong worldbuilding, dislikes romance subplots",
    catalog=[
        {"id": "b1", "title": "Dune", "genre": "sci-fi"},
        {"id": "b2", "title": "The Notebook", "genre": "romance"},
        {"id": "b3", "title": "Neuromancer", "genre": "sci-fi"},
    ],
)
Code Fragment 28.4.1: Generating personalized recommendations with structured JSON output and per-item reasoning.
Tip

For recommendation systems, log the LLM's reasoning alongside its suggestions, not just the final ranked list. When a recommendation fails ("Why did it suggest a horror movie to someone who only watches comedies?"), the reasoning trace lets you diagnose whether the problem was in the user preference model, the candidate retrieval, or the LLM's ranking logic. Without traces, debugging recommendation quality becomes guesswork.
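The tip above can be made concrete with a small sketch. The `build_trace` helper is hypothetical; it packages the model's per-item reasoning together with the inputs so a failed recommendation can be diagnosed later.

```python
# Hypothetical sketch: persist the LLM's reasoning alongside each
# recommendation so failures can be traced back to profile, retrieval,
# or ranking. The trace schema here is illustrative.
import json
import time

def build_trace(user_profile: str, recs: dict, model: str) -> dict:
    """Assemble a loggable trace: inputs, model, and per-item reasoning."""
    return {
        "timestamp": time.time(),
        "model": model,
        "user_profile": user_profile,
        "recommendations": [
            {"item_id": r["item_id"], "score": r["score"], "reasoning": r["reasoning"]}
            for r in recs.get("recommendations", [])
        ],
    }

trace = build_trace(
    "Enjoys sci-fi, dislikes romance",
    {"recommendations": [{"item_id": "b1", "score": 0.9,
                          "reasoning": "epic worldbuilding, no romance subplot"}]},
    model="gpt-4o-mini",
)
print(json.dumps(trace["recommendations"]))
```

Appending one such record per request to a log store is usually enough to answer "why did it suggest that?" after the fact.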

2. LLM-Powered Search

LLM-powered search systems like Perplexity represent a paradigm shift from "ten blue links" to direct answers with cited sources. The architecture combines a search engine (retrieving relevant web pages), a reader model (extracting key information from each source), and a generator model (synthesizing a coherent answer with inline citations). This is essentially RAG (Chapter 20) applied at web scale. Figure 28.4.1 shows the LLM-powered search architecture. Code Fragment 28.4.2 below puts this into practice.

Figure 28.4.1: LLM-powered search architecture. A user query is processed, expanded into sub-queries, and used to retrieve from multiple sources (web, knowledge base, academic papers). A reader extracts and ranks relevant passages, then an LLM synthesizes a cited answer.
# Building a simple LLM-powered search with RAG
from openai import OpenAI

client = OpenAI()

def llm_search(query: str, search_results: list) -> str:
    # Format search results as numbered context for inline citation
    context = "\n\n".join([
        f"[Source {i+1}] {r['title']}\nURL: {r['url']}\n{r['snippet']}"
        for i, r in enumerate(search_results)
    ])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Answer the user's query using the provided sources.
Cite sources inline using [Source N] notation. Be concise and factual.
If sources conflict, note the disagreement."""},
            {"role": "user", "content": f"Query: {query}\n\nSources:\n{context}"},
        ],
    )
    return response.choices[0].message.content
Code Fragment 28.4.2: Synthesizing a cited answer from retrieved sources, the generation step of an LLM-powered (RAG) search pipeline.

3. Conversational Recommendation

Conversational recommendation combines dialogue management with recommendation logic. Instead of a one-shot recommendation, the system engages in a multi-turn conversation to elicit preferences, clarify constraints, and refine suggestions. This is particularly valuable for high-consideration purchases (electronics, travel, real estate) where user needs are complex and evolving. Code Fragment 28.4.3 below puts this into practice.


# A conversational recommender that keeps dialogue state across turns
from openai import OpenAI

client = OpenAI()

class ConversationalRecommender:
    def __init__(self, catalog_context: str):
        self.messages = [{
            "role": "system",
            "content": f"""You are a helpful product recommendation assistant.
Ask clarifying questions to understand user needs before recommending.
Available products:\n{catalog_context}
Always explain why each recommendation fits the user's stated needs."""
        }]

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

recommender = ConversationalRecommender(catalog_context="...")
print(recommender.chat("I need a laptop for data science work"))
Great choice! To help narrow down the best laptop for your data science work, I have a few questions: 1. What's your budget range? 2. Will you be training large models locally, or mostly using cloud compute? 3. Do you have a preference for screen size (portability vs. display area)? 4. Any specific software requirements (CUDA for GPU computing, Docker, etc.)?
Code Fragment 28.4.3: A multi-turn conversational recommender that elicits preferences before suggesting products.
Key Insight

The fundamental advantage of LLM-powered recommendation over traditional collaborative filtering is explainability and preference elicitation. An LLM can explain "I recommended this because you mentioned you prefer quiet keyboards, and this laptop has a low-profile mechanical keyboard" while collaborative filtering can only say "users like you also bought this." This explainability builds user trust and enables the system to correct misunderstandings through dialogue, creating a more effective recommendation loop.

4. User Preference Modeling

LLMs can build rich user preference models from natural language interactions, product reviews, and browsing histories. Rather than reducing preferences to sparse feature vectors, LLMs maintain a natural language summary of what the user likes, dislikes, and values. This "preference narrative" can be updated through conversation and used to condition future recommendations.
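A minimal sketch of the "preference narrative" idea follows. The LLM call is injected as `call_llm` so the update logic is independent of any provider; `UPDATE_PROMPT`, the 150-word limit, and the helper name are illustrative choices, not an established API.

```python
# Maintain a running natural-language preference summary and ask the model
# to revise it after each interaction. `call_llm` is any prompt->text function.
from typing import Callable

UPDATE_PROMPT = (
    "Current preference narrative:\n{narrative}\n\n"
    "New interaction:\n{event}\n\n"
    "Rewrite the narrative to fold in the new information. Keep it under "
    "150 words. Preserve stable preferences; revise any that are contradicted."
)

def update_preference_narrative(narrative: str, event: str,
                                call_llm: Callable[[str], str]) -> str:
    """Return a revised natural-language preference summary."""
    return call_llm(UPDATE_PROMPT.format(narrative=narrative, event=event))

# With a stub in place of the model, the plumbing can be exercised directly
revised = update_preference_narrative(
    "Likes sci-fi with strong worldbuilding.",
    "User abandoned a romance-heavy title after one chapter.",
    call_llm=lambda prompt: "Likes sci-fi; avoids romance-heavy plots.",
)
print(revised)
```

The narrative string then conditions future recommendation prompts in place of a sparse feature vector.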

Approach Comparison

| Approach | Cold Start | Explainability | Scale | Latency |
| Collaborative Filtering | Poor | Low | Excellent | Very fast |
| Content-Based | Good | Medium | Good | Fast |
| LLM Recommendation | Excellent | High | Limited | Slow |
| Hybrid (CF + LLM) | Good | High | Good | Moderate |
When to Use What: Search and Recommendation Approaches

Keyword search (BM25) remains the right choice when users know exactly what they want and can express it precisely, such as searching for a specific product name or error code. Semantic search with embeddings (see Chapter 19) excels when users describe what they need in natural language but the exact terminology differs from the documents. LLM-powered search (Perplexity-style) is best for complex, multi-faceted questions requiring synthesis across sources. Collaborative filtering is unbeatable at scale when you have rich interaction data but poor for cold-start. LLM recommendation shines for cold-start, niche preferences, and explainability. In production, the hybrid approach (traditional retrieval for candidate generation, LLM for re-ranking and explanation) gives you the best of both worlds.
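The hybrid pattern at the end of the paragraph can be sketched in a few lines: cheap vector similarity narrows the catalog to k candidates, then an LLM (injected here as `rerank_llm`) re-ranks and explains. The toy two-dimensional embeddings are stand-ins for real embedding-model vectors (Chapter 19).

```python
# Hybrid recommendation: fast candidate generation, LLM re-ranking.
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def candidate_generation(query_vec, catalog, k: int = 2) -> list[dict]:
    """Stage 1: vector similarity over the full catalog (cheap, scalable)."""
    ranked = sorted(catalog, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return ranked[:k]

def hybrid_recommend(query_vec, catalog,
                     rerank_llm: Callable[[list], list], k: int = 2) -> list:
    """Stage 2: hand only the top-k candidates to the LLM for re-ranking."""
    return rerank_llm(candidate_generation(query_vec, catalog, k))

catalog = [
    {"id": "b1", "vec": [1.0, 0.1]},
    {"id": "b2", "vec": [0.0, 1.0]},
    {"id": "b3", "vec": [0.9, 0.2]},
]
top = candidate_generation([1.0, 0.0], catalog, k=2)
print([item["id"] for item in top])  # ['b1', 'b3'] — closest to the query
```

The expensive LLM call only ever sees k items, so cost stays flat as the catalog grows.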

5. Automated Analytics and NL-to-Dashboard

One of the most transformative applications of LLMs in information access is the ability to translate natural language questions into data queries, execute those queries, generate visualizations, and narrate the results. This NL-to-Analytics pipeline democratizes data analysis by letting business users ask questions like "What were our top 5 products by revenue last quarter, broken down by region?" and receive a chart with an accompanying narrative, all without writing SQL or Python.

5.1 The NL-to-Analytics Pipeline

The full pipeline has four stages. First, the user poses a question in natural language. Second, the LLM translates that question into executable code, typically SQL for database queries or Python for statistical analysis. Third, the system executes the generated code against the data source and captures the results. Fourth, the LLM generates a visualization specification and a written narrative that explains the key insights. Each stage introduces potential errors, so production systems include validation checks between stages to catch hallucinated table names, syntactically invalid queries, and misleading chart configurations.
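The four stages and their inter-stage guards can be sketched as a pipeline skeleton. Each stage is injected as a function so the LLM call, database driver, and chart layer can be swapped; the names and the specific sanity checks are illustrative.

```python
# NL-to-Analytics pipeline skeleton with validation between stages.
def run_nl_analytics(question, gen_code, validate, execute, visualize):
    code = gen_code(question)          # stage 2: LLM translates NL to SQL/Python
    ok, reason = validate(code)        # guard: schema and syntax checks
    if not ok:
        raise ValueError(f"generated code rejected: {reason}")
    rows = execute(code)               # stage 3: run against the data source
    if not rows:                       # guard: result sanity check
        raise ValueError("empty result set; the filter is likely wrong")
    return visualize(question, rows)   # stage 4: chart spec plus narrative

# Exercised with stubs in place of the real stages
out = run_nl_analytics(
    "revenue by region",
    gen_code=lambda q: "SELECT region, SUM(total_amount) FROM orders GROUP BY region",
    validate=lambda sql: (True, "ok"),
    execute=lambda sql: [{"region": "SE", "revenue": 100}],
    visualize=lambda q, rows: {"mark": "bar", "rows": len(rows)},
)
print(out)  # {'mark': 'bar', 'rows': 1}
```

Because each guard raises rather than silently continuing, a hallucinated table name or an empty result surfaces immediately instead of becoming a misleading chart.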

Figure 28.4.2: NL-to-Analytics pipeline. A natural language question is converted to SQL or code, executed against a data source, and the results are visualized with a narrative explanation. Validation checks at each stage guard against hallucinated queries and misleading charts.

5.2 Text-to-SQL with LLMs

The most critical step in the pipeline is translating natural language to SQL. This requires schema linking: the LLM must identify which database tables and columns correspond to the entities and attributes in the user's question. For instance, "revenue by region" must be mapped to the correct table (perhaps orders) and columns (total_amount, shipping_region). Modern approaches use few-shot prompting with example question-SQL pairs specific to the database schema. The DIN-SQL method (Pourreza and Rafiei, 2023) decomposes complex queries into sub-problems, solving schema linking, query classification, and SQL generation in separate stages. The DAIL-SQL approach (Gao et al., 2024) uses efficient example selection to choose the most informative few-shot examples based on query similarity, achieving state-of-the-art results on the Spider benchmark. These techniques connect directly to the Text-to-SQL coverage in Section 20.5.


import json
from openai import OpenAI

client = OpenAI()

# Database schema provided as context for schema linking
DB_SCHEMA = """Tables:
  orders(order_id, customer_id, product_id, total_amount, order_date, region)
  products(product_id, product_name, category, unit_price)
  customers(customer_id, name, segment, signup_date)"""

def nl_to_sql(question: str, schema: str = DB_SCHEMA) -> dict:
    """Translate a natural language question to SQL with explanation."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""You are a SQL generation assistant. Given a database schema
and a natural language question, generate a valid SQL query.

Database schema:
{schema}

Return JSON with:
  "sql": the SQL query (read-only SELECT statements only)
  "explanation": brief description of what the query does
  "tables_used": list of tables referenced
  "assumptions": any assumptions made about the question"""},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)

result = nl_to_sql("What are the top 5 product categories by total revenue this year?")
print("SQL:", result["sql"])
print("Explanation:", result["explanation"])
SQL: SELECT p.category, SUM(o.total_amount) AS total_revenue FROM orders o JOIN products p ON o.product_id = p.product_id WHERE o.order_date >= '2025-01-01' GROUP BY p.category ORDER BY total_revenue DESC LIMIT 5
Explanation: Joins orders with products, filters to current year, groups by category, and returns the top 5 by total revenue.
Code Fragment 28.4.4: Implementation of nl_to_sql

5.3 NL-to-Visualization

Once query results are available, the LLM can generate visualization specifications. The most common approach uses declarative formats like Vega-Lite (a JSON-based grammar for interactive graphics) or matplotlib code. The Chat2Vis system (Maddigan and Susnjak, 2023) demonstrated that LLMs can select appropriate chart types, map data fields to visual channels, and configure axes and legends from a natural language description alone. The NL4DV framework (Narechania et al., 2021) takes a more structured approach, decomposing the visualization generation into attribute inference, task inference, and visualization design. In production, the LLM examines the shape of the query results (number of dimensions, data types, cardinality) and selects a chart type accordingly: bar charts for categorical comparisons, line charts for time series, scatter plots for correlations, and tables for detailed breakdowns.


import json
from openai import OpenAI

client = OpenAI()

def generate_chart_spec(question: str, query_results: list[dict]) -> dict:
    """Generate a Vega-Lite chart spec from query results."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Generate a Vega-Lite JSON specification for the data.
Choose the most appropriate chart type for the question and data shape.
Include: title, axis labels, color encoding if useful, and a tooltip.
Also include a key "narrative" field with 2 to 3 sentences summarizing
the main insight visible in the data."""},
            {"role": "user", "content": f"""Question: {question}
Data (first 5 rows): {json.dumps(query_results[:5])}
Total rows: {len(query_results)}"""},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example: visualize revenue by category
sample_data = [
    {"category": "Electronics", "total_revenue": 1250000},
    {"category": "Clothing", "total_revenue": 890000},
    {"category": "Home", "total_revenue": 720000},
]
spec = generate_chart_spec("Top product categories by revenue", sample_data)
print("Chart type:", spec.get("mark", "unknown"))
print("Narrative:", spec.get("narrative", ""))
Chart type: bar
Narrative: Electronics leads revenue at $1.25M, followed by Clothing ($890K) and Home ($720K). Electronics generates 40% more revenue than the next category.
Code Fragment 28.4.5: Implementation of generate_chart_spec

5.4 Augmented Analytics Platforms

Several commercial products have integrated LLMs into their analytics workflows, creating what the industry calls "augmented analytics." Tableau AI (Salesforce) lets users type questions about their data and receive auto-generated visualizations with narrative explanations. ThoughtSpot Sage uses an LLM to translate natural language into its proprietary search query language, enabling business users to explore data conversationally. Microsoft Copilot for Power BI allows users to describe the report they want, and the system generates DAX queries, visualizations, and written summaries. These platforms share a common design pattern: the analyst-in-the-loop approach, where the LLM generates a first draft of the analysis and the human analyst reviews, refines, and validates before sharing. This mirrors the "AI as triage" pattern from Section 28.2 on financial applications.

Platform Comparison

| Platform | NL Interface | Query Language | Visualization | Narrative Gen |
| Tableau AI | Ask Data (chat) | VizQL | Native Tableau charts | Yes (Explain Data) |
| ThoughtSpot Sage | Search bar + chat | TQL (proprietary) | Auto-selected charts | Yes |
| Power BI Copilot | Chat panel | DAX / M | Power BI visuals | Yes (summaries) |
| Custom (LLM + SQL) | Any chat UI | SQL / Python | Vega-Lite / matplotlib | LLM-generated |

5.5 Challenges in NL-to-Analytics

The NL-to-Analytics pipeline faces several significant challenges. Hallucinated SQL is the most dangerous: the LLM may reference tables or columns that do not exist, join tables on incorrect keys, or apply aggregate functions incorrectly. Schema validation (checking that all referenced objects exist) and result sanity checks (verifying row counts and value ranges) are essential guardrails. Schema misinterpretation occurs when column names are ambiguous; a column named date could refer to order date, ship date, or creation date, and the LLM must disambiguate from context or ask clarifying questions. Data confidentiality is a concern when using cloud LLM APIs: sending database schemas and query results to external services may violate data governance policies, pushing some organizations toward self-hosted models. Finally, trust in automated insights remains a barrier, because users may accept a generated chart at face value without verifying that the underlying query correctly represents their question. The analyst-in-the-loop pattern mitigates this risk by making review a required step.
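The schema-validation and read-only guards described above might look like the following sketch. `KNOWN_TABLES` and the regex-based table extraction are illustrative stand-ins; a production system would use a real SQL parser for this check.

```python
# Guardrails for generated SQL: enforce read-only queries and verify that
# every referenced table exists in the schema before execution.
import re

KNOWN_TABLES = {"orders", "products", "customers"}

def validate_generated_sql(sql: str, known: set = KNOWN_TABLES) -> tuple[bool, str]:
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    # Crude table extraction: identifiers following FROM or JOIN
    referenced = {t.lower() for t in
                  re.findall(r"\b(?:from|join)\s+([A-Za-z_]+)", stripped, re.IGNORECASE)}
    unknown = referenced - known
    if unknown:
        return False, f"unknown tables: {sorted(unknown)}"
    return True, "ok"

print(validate_generated_sql("SELECT region FROM orders JOIN products ON 1=1"))
print(validate_generated_sql("SELECT * FROM revenue_facts"))
print(validate_generated_sql("DROP TABLE orders"))
```

Pairing this with result sanity checks (row counts, value ranges) catches most hallucinated queries before a chart is ever rendered.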

Key Insight

The largest source of errors in NL-to-Analytics pipelines is not the SQL generation itself but the schema linking step. When an LLM misidentifies which table or column maps to the user's concept ("revenue" could be total_amount, net_revenue, or gross_sales), the generated SQL may be syntactically valid but semantically wrong, producing a convincing chart that answers the wrong question. Production systems address this by maintaining a semantic layer: a curated mapping of business terms to database objects that the LLM consults before generating queries. This is the same principle behind the metadata-enriched retrieval discussed in Section 20.3.
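A semantic layer can be as simple as a curated dictionary consulted before SQL generation. The mappings below are hypothetical examples; real layers hold hundreds of terms, often with units and ownership metadata.

```python
# Toy semantic layer: business terms mapped to database objects, injected
# into the SQL-generation prompt so the model does not guess column names.
SEMANTIC_LAYER = {
    "revenue": "orders.total_amount (SUM for totals)",
    "region": "orders.region",
    "customer segment": "customers.segment",
}

def semantic_context(question: str, layer: dict = SEMANTIC_LAYER) -> str:
    """Glossary lines for the business terms that appear in the question."""
    q = question.lower()
    hits = [f"- '{term}' maps to {obj}" for term, obj in layer.items() if term in q]
    return "Business term glossary:\n" + "\n".join(hits) if hits else ""

# Prepend this to the system prompt so "revenue" resolves unambiguously
print(semantic_context("What was revenue by region last quarter?"))
```

Only the matched terms are injected, which keeps the prompt short while still pinning down the ambiguous vocabulary.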

Real-World Scenario: NL-to-Dashboard for a Retail Analytics Team

Who: Business intelligence team at a national retail chain with 500+ stores

Situation: Regional managers needed weekly performance reports but lacked SQL skills. The BI team of four analysts was the bottleneck, with a 2-week backlog of ad-hoc report requests.

Problem: Each new question ("How did promotions perform in the Southeast last month?") required an analyst to write SQL, build a chart, and write commentary. By the time reports were delivered, the business had moved on to new questions.

Decision: The team built an internal NL-to-Dashboard tool using GPT-4o for SQL generation, a read-only database connection for safe execution, and Vega-Lite for visualization. A semantic layer mapped 200 common business terms to their database representations.

How: Regional managers typed questions into a chat interface. The system generated SQL, executed it against a read-replica, displayed a chart, and provided a 2-sentence narrative. All generated SQL was logged and the semantic layer was updated weekly based on correction patterns.

Result: Ad-hoc report backlog dropped from 2 weeks to zero. Regional managers answered 85% of their data questions without BI team involvement. SQL accuracy reached 91% after three months of semantic layer refinement. The BI team shifted from report production to strategic analysis projects.

Lesson: The semantic layer (mapping business terms to database objects) is the critical success factor for NL-to-Analytics; without it, even the best LLM will generate plausible but incorrect queries.

Key Insight

The map-reduce pattern is not limited to text analytics. It applies to any LLM task where the input corpus exceeds the context window: document summarization, competitive intelligence gathering, legal discovery, and research literature review. The critical design decision is what structure to impose in the map phase. Overly rigid schemas miss unexpected insights; overly open prompts produce summaries that are difficult to aggregate. A practical middle ground is to define required fields (sentiment, themes) while including an open-ended "other notable observations" field that captures surprises.

Warning

LLM-based recommendation faces significant scalability challenges. Generating a personalized recommendation for each user request requires an LLM inference call, which is orders of magnitude slower and more expensive than a collaborative filtering lookup. Production systems address this through caching (pre-compute recommendations for popular queries), hybrid architectures (use CF for candidate generation, LLM for re-ranking and explanation), and batching (generate recommendations in bulk during off-peak hours).
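The caching mitigation can be sketched with stdlib memoization: recommendations are keyed on a normalized profile so near-identical requests skip the LLM call. `expensive_llm_recs` stands in for real inference; a production cache would add TTLs, persistence, and eviction policy.

```python
# Memoize recommendations for repeated (normalized) profiles.
from functools import lru_cache

calls = {"n": 0}

def expensive_llm_recs(profile_key: str) -> str:
    calls["n"] += 1                      # counts how often "the LLM" actually runs
    return f"recs for: {profile_key}"

@lru_cache(maxsize=1024)
def cached_recs(profile_key: str) -> str:
    return expensive_llm_recs(profile_key)

def normalize(profile: str) -> str:
    """Collapse case and whitespace so near-identical profiles share a key."""
    return " ".join(profile.lower().split())

cached_recs(normalize("Sci-fi fan,  no romance"))
cached_recs(normalize("sci-fi fan, no romance"))  # cache hit
print(calls["n"])  # 1: the expensive call ran once
```

The same keying idea extends to semantic caches, where an embedding distance rather than exact string equality decides whether two profiles share an entry.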

Production Tip

Building a Perplexity-style search pipeline with open-source tools (2025). You can build an LLM-powered search system using entirely open-source components. For retrieval: use Serper or Tavily APIs for web search, combined with a local vector store (Qdrant, Milvus) for domain-specific documents. For reading and extraction: use Jina Reader or Firecrawl to convert web pages to clean markdown that LLMs can process. For synthesis: use any frontier model with a structured prompt that requires inline citations. Key implementation details: (1) rewrite the user query into 2 to 3 search sub-queries to improve recall; (2) deduplicate retrieved passages before sending to the LLM to avoid redundancy; (3) include source URLs in the context so the model can cite them; (4) use streaming responses so the user sees results as they generate. The LangGraph framework (from LangChain) provides a built-in "research assistant" template that implements this pattern with configurable retriever and LLM components. For latency-sensitive applications, pre-fetch search results during query processing and use speculative decoding to accelerate the synthesis step.
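Implementation detail (2), passage deduplication, can be sketched with token-set Jaccard overlap. The 0.8 threshold is an arbitrary illustration; production pipelines more often use embedding similarity for near-duplicate detection.

```python
# Deduplicate retrieved passages before sending them to the synthesis LLM.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def dedupe_passages(passages: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a passage only if it is not too similar to one already kept."""
    kept: list[str] = []
    for p in passages:
        if all(jaccard(p, k) < threshold for k in kept):
            kept.append(p)
    return kept

passages = [
    "RLHF fine-tunes a model with a reward model trained on human preferences",
    "RLHF fine-tunes a model with a reward model trained on human preferences.",
    "PPO is the policy-gradient algorithm most often used in RLHF",
]
print(len(dedupe_passages(passages)))  # 2: the near-duplicate is dropped
```

Dropping near-duplicates both saves context tokens and prevents the model from over-weighting a fact just because three sources phrase it identically.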

Real-World Scenario: Conversational Product Search at an Online Retailer

Who: Search and discovery team at a specialty outdoor gear retailer

Situation: Customers often searched with complex, intent-rich queries like "waterproof hiking boots for wide feet, good ankle support, under $200" that traditional faceted search handled poorly.

Problem: Keyword search returned too many irrelevant results. Customers had to manually apply 5+ filters, and 40% abandoned the search before finding a product.

Decision: The team built a conversational search layer using GPT-4o mini for query understanding and a hybrid retrieval system combining BM25 keyword search with embedding-based semantic search over product descriptions.

How: The LLM parsed natural language queries into structured filters (category, price range, features), generated embedding queries for semantic matching, and produced a ranked list with natural language explanations ("This boot has a wide toe box and waterproof Gore-Tex membrane, matching your requirements"). Follow-up questions like "do they come in green?" maintained conversation context.

Result: Search-to-purchase conversion improved by 28%. Average session time decreased by 35% (customers found products faster). The conversational interface handled 73% of queries without requiring manual filter adjustments.

Lesson: LLM-powered search excels when user intent is complex and multi-dimensional; the key is combining semantic understanding with structured product metadata rather than relying on the LLM alone.

Real-World Scenario: Conversational Product Recommendation for Home Furnishing

Who: Product engineering team at an online furniture retailer with 50,000+ SKUs

Situation: Customers frequently abandoned the site because filtering by attributes (color, size, material, price) was insufficient for subjective needs like "a cozy reading nook for a small apartment."

Problem: Traditional collaborative filtering recommended based on purchase history but could not understand nuanced preferences expressed in natural language. New customers had no history at all (cold-start problem).

Dilemma: Running an LLM over the full 50,000-item catalog for every query was cost-prohibitive. But limiting recommendations to pre-filtered categories missed creative cross-category suggestions.

Decision: The team built a hybrid system: embedding-based retrieval narrowed candidates to 50 items, then an LLM re-ranked and explained recommendations through multi-turn conversation.

How: Product descriptions were embedded and indexed in a vector database. Customer queries retrieved the top 50 candidates by semantic similarity. An LLM then re-ranked these candidates based on the full conversational context, asked clarifying questions about space constraints and style preferences, and generated natural language explanations for each recommendation.

Result: Conversion rate increased 34% for customers who engaged with the conversational recommender. Average order value rose 22% because the LLM suggested complementary items with persuasive explanations. Cold-start customer satisfaction scores matched those of returning customers.

Lesson: Hybrid architectures that combine fast vector retrieval for candidate generation with LLM re-ranking and explanation deliver both scalability and the natural language understanding that customers expect.

Key Takeaways

LLMs shift search and recommendation from retrieval problems to reasoning problems: they understand intent, synthesize across sources, and explain their suggestions, which makes them strong on cold-start and nuanced preferences. Production systems are hybrids, with fast traditional retrieval for candidate generation and the LLM for re-ranking, explanation, and conversation. NL-to-Analytics pipelines live or die by schema linking and a curated semantic layer, with validation at every stage. Large-scale text analytics follows the map-reduce pattern, using cheap models for the map phase and capable models for synthesis.
Research Frontier

Conversational data analysis extends NL-to-Analytics beyond single questions to multi-turn exploration sessions. Research on systems like Data-Copilot (Zhang et al., 2024) chains multiple queries together, with each question building on the context of previous results. Self-correcting SQL generation uses the LLM to analyze query execution errors (syntax errors, empty results, unexpected row counts) and automatically revise the query, reducing the need for human intervention.

Meanwhile, proactive analytics inverts the paradigm entirely: instead of waiting for questions, the LLM continuously monitors data streams and surfaces anomalies, trend changes, and noteworthy patterns before anyone asks.
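The self-correcting loop mentioned above can be sketched as a bounded retry: on execution failure, the error text is fed back to the model for a revision. `revise` stands in for an LLM call; the stub executor and the column-name fix below are illustrative.

```python
# Self-correcting SQL generation: revise on execution error, with a budget.
def self_correcting_query(sql, execute, revise, max_attempts: int = 3):
    last_err = None
    for _ in range(max_attempts):
        try:
            return execute(sql)
        except Exception as err:
            last_err = str(err)
            sql = revise(sql, last_err)   # the model sees the failing SQL + error
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_err}")

# Stub executor that fails until the column name is corrected
def fake_execute(sql):
    if "total_amount" not in sql:
        raise ValueError("column 'revenue' does not exist")
    return [{"region": "SE", "total": 100}]

rows = self_correcting_query(
    "SELECT region, SUM(revenue) FROM orders GROUP BY region",
    execute=fake_execute,
    revise=lambda sql, err: sql.replace("revenue", "total_amount"),
)
print(rows)  # [{'region': 'SE', 'total': 100}]
```

The attempt budget matters: without it, a model that keeps producing the same broken query would loop, burning tokens on every retry.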

6. Text Analytics at Scale

Beyond search, recommendation, and dashboards, LLMs unlock a category of application that was previously impractical: large-scale qualitative analysis. Surveys with thousands of open-ended responses, product review corpora with millions of entries, and support ticket archives spanning years all contain rich insights that traditional NLP (keyword extraction, topic modeling) captured only partially. LLMs can read, interpret, and synthesize unstructured text at a depth that approaches human analysis, at a fraction of the cost and time.

6.1 Survey Analysis and Review Mining

Consider a company that collects 10,000 open-ended survey responses after a product launch. A human analyst might read 200 and extrapolate. An LLM can process all 10,000, extracting themes, sentiment, specific feature requests, and emotional tone from each response. Similarly, review mining across e-commerce platforms extracts structured insights from unstructured text: which product features do customers love, which cause frustration, and how do opinions shift over time? The hybrid ML/LLM patterns from Section 12.3 apply directly here: use a small classifier to categorize responses into broad buckets (positive, negative, feature request, bug report), then use a larger LLM to synthesize narratives within each bucket.

6.2 The Map-Reduce Pattern for LLM Analytics

Processing thousands of documents through an LLM requires a systematic approach. The map-reduce pattern divides the work into two phases. In the map phase, each document (or chunk) is independently analyzed by the LLM, producing a structured summary: extracted themes, sentiment scores, key quotes, and entity mentions. In the reduce phase, the per-document summaries are aggregated into a final synthesis: ranked themes by frequency, sentiment trends, representative quotes for each theme, and actionable recommendations. This pattern parallelizes naturally (all map calls are independent) and keeps each LLM call within reasonable context limits.

# Map-reduce text analytics over a corpus
from openai import OpenAI
import json
from concurrent.futures import ThreadPoolExecutor

client = OpenAI()

def map_analyze(text: str, doc_id: str) -> dict:
    """Map phase: analyze a single document."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # small model for classification
        messages=[{
            "role": "user",
            "content": (
                "Analyze this customer review. Return JSON with:\n"
                "- sentiment: positive/negative/mixed\n"
                "- themes: list of 1-3 topic tags\n"
                "- key_quote: the most informative sentence\n"
                "- feature_requests: list (empty if none)\n\n"
                f"Review: {text}"
            )
        }],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = json.loads(response.choices[0].message.content)
    result["doc_id"] = doc_id
    return result

def reduce_synthesize(summaries: list[dict]) -> str:
    """Reduce phase: synthesize all map results into a report."""
    summary_text = json.dumps(summaries, indent=1)
    response = client.chat.completions.create(
        model="gpt-4o",  # large model for synthesis
        messages=[{
            "role": "user",
            "content": (
                "You are a product analytics expert. Given these per-review "
                "summaries, produce a report with:\n"
                "1. Top 5 themes ranked by frequency\n"
                "2. Overall sentiment breakdown (% positive/negative/mixed)\n"
                "3. Top feature requests with supporting quotes\n"
                "4. Three actionable recommendations\n\n"
                f"Review summaries:\n{summary_text}"
            )
        }],
        temperature=0.3,
    )
    return response.choices[0].message.content

# all_reviews: list of raw review strings loaded from your data source
reviews = [{"id": f"r{i}", "text": t} for i, t in enumerate(all_reviews)]

# Execute map phase in parallel (all calls are independent)
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(map_analyze, r["text"], r["id"]) for r in reviews]
    mapped = [f.result() for f in futures]

# Execute reduce phase
report = reduce_synthesize(mapped)
print(report)
## Customer Review Analysis Report

**Sentiment Breakdown:** 62% positive, 24% negative, 14% mixed

**Top 5 Themes:**
1. Battery Life (mentioned in 43% of reviews)
2. Performance/Speed (38%)
3. Customer Support (31%)
4. Build Quality (28%)
5. Price/Value (25%)

**Top Feature Requests:**
- "I wish there was a dark mode option" (12 mentions)
- "Offline mode would be a game-changer" (9 mentions)
...
Code Fragment 28.4.6: Map-reduce text analytics over a corpus

6.3 Cost Optimization for Large-Scale Analytics

Processing 100,000 reviews through a frontier model like GPT-4o, at a few dollars per million input tokens, can cost hundreds of dollars for the map phase alone. Cost-conscious pipelines use a tiered approach. The cheapest tier (GPT-4o-mini, Claude Haiku, or even a fine-tuned classifier) handles the high-volume map phase where each call is simple and structured. The mid-tier model handles the reduce phase, which requires more nuanced synthesis but processes far fewer tokens. The most expensive tier is reserved for edge cases: reviews flagged as ambiguous by the map phase, or topics where the initial synthesis was uncertain. This three-tier strategy can reduce costs by 80% compared to running the most capable model on every document, while maintaining comparable quality on the final report.
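The tiering above can be sketched as a simple router plus a back-of-the-envelope cost model. The model names, per-million-token prices, and the `ambiguous` flag emitted by the map phase are illustrative assumptions, not fixed parts of the pattern:

```python
# Tiered model routing -- a minimal sketch, not a production router.
# Model names, prices, and the "ambiguous" flag are illustrative.

def route_model(doc_summary: dict) -> str:
    """Pick the cheapest adequate tier for a mapped document."""
    if doc_summary.get("ambiguous"):   # flagged during the cheap map pass
        return "gpt-4o"                # escalate edge cases only
    return "gpt-4o-mini"               # default: cheap tier

def estimate_cost(n_docs: int, tokens_per_doc: int,
                  cheap_price_per_1m: float, escalation_rate: float,
                  expensive_price_per_1m: float) -> float:
    """Back-of-the-envelope cost: most documents on the cheap tier,
    a small escalated fraction on the expensive one."""
    total_tokens = n_docs * tokens_per_doc
    cheap = total_tokens * (1 - escalation_rate) * cheap_price_per_1m / 1e6
    expensive = total_tokens * escalation_rate * expensive_price_per_1m / 1e6
    return cheap + expensive

# 100k reviews, ~500 input tokens each, 5% escalated
tiered = estimate_cost(100_000, 500, 0.15, 0.05, 2.50)
all_frontier = estimate_cost(100_000, 500, 2.50, 0.0, 2.50)
print(f"tiered: ${tiered:.2f} vs all-frontier: ${all_frontier:.2f}")
```

At these assumed prices the tiered run costs roughly a tenth of sending every document to the frontier model, consistent with the savings described above.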

Self-Check
Q1: How does LLM-powered search differ from traditional keyword search?
Show Answer
Traditional search matches keywords to documents and returns ranked results. LLM-powered search understands query intent, expands or rewrites the query, retrieves relevant sources, and synthesizes a direct answer with citations. It transforms search from a retrieval problem into a comprehension and generation problem, providing answers rather than links.
Q2: Why are LLMs effective for cold-start recommendation scenarios?
Show Answer
Cold-start is when a new user has no interaction history, making collaborative filtering impossible. LLMs can leverage their world knowledge to make recommendations from a natural language description of preferences alone. A user saying "I enjoy complex strategy games with resource management" gives an LLM enough information to recommend relevant games, while collaborative filtering would have nothing to work with.
Q3: What is the architecture of an LLM-powered search system like Perplexity?
Show Answer
The architecture has three stages: (1) query understanding, where the LLM rewrites and expands the user's query into search-optimized forms; (2) retrieval, where a web search engine and/or knowledge base finds relevant sources; and (3) synthesis, where the LLM reads the retrieved content and generates a coherent answer with inline citations to specific sources. This is essentially RAG applied at web scale.
Q4: What advantage does conversational recommendation have over one-shot recommendation?
Show Answer
Conversational recommendation engages in multi-turn dialogue to elicit preferences, clarify constraints, and refine suggestions. This is valuable for complex decisions where user needs are nuanced and evolving. It can ask "Do you prioritize portability or screen size?" to disambiguate preferences, correct misunderstandings, and progressively narrow recommendations to an ideal match.
Q5: How do hybrid systems address the scalability limitations of LLM-based recommendation?
Show Answer
Hybrid systems use traditional methods (collaborative filtering, content-based) for fast candidate generation from large catalogs, then use LLMs for re-ranking the top candidates and generating natural language explanations. This provides the scalability of traditional methods with the explainability and preference understanding of LLMs, keeping the expensive LLM inference limited to a small number of pre-filtered candidates.
Research Frontier

Generative recommendation is emerging as a paradigm where LLMs generate item descriptions or even entire product concepts tailored to individual users, rather than selecting from existing catalogs. Research into preference alignment for recommendation (adapting the RLHF techniques from Section 17.1) trains recommendation LLMs to better match human preferences. Multimodal search systems that understand images, text, and voice queries simultaneously are blurring the line between search and conversation, with products like Google Lens + Gemini enabling "point and ask" discovery experiences.

Exercises

Exercise 28.4.1: LLM as Recommendation Engine Conceptual

How can an LLM be used as a recommendation engine? Compare the LLM-based approach with traditional collaborative filtering. What are the strengths and limitations of each?

Answer Sketch

LLMs can generate recommendations by reasoning about user preferences described in natural language, handling the cold-start problem well, and explaining their recommendations. Collaborative filtering excels with large interaction datasets and produces more statistically reliable recommendations. LLM limitations: no access to behavioral data (clicks, purchases), higher latency, and potential to hallucinate non-existent items. Best approach: combine both.

Exercise 28.4.2: Semantic Search Implementation Coding

Implement a simple LLM-powered search system that rewrites a user's natural language query into multiple search queries, retrieves results from each, and re-ranks the combined results.

Answer Sketch

Step 1: use an LLM to generate 3 query variations from the user's input (synonym expansion, specificity adjustment, related concepts). Step 2: execute each query against a search index. Step 3: deduplicate results. Step 4: re-rank using an LLM or a cross-encoder that scores relevance of each result to the original query. Return the top-k results with relevance scores.
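One way to flesh out this sketch is to inject the LLM and index calls as plain callables so the four-step control flow stays visible and testable. The callable signatures and the `{"id", "text"}` document shape are assumptions:

```python
# Query expansion + re-ranking pipeline -- a sketch with injected
# dependencies. In production, `expand` would prompt a small LLM for
# query variations and `score` would be a cross-encoder or LLM rating.
from typing import Callable

def expanded_search(
    query: str,
    expand: Callable[[str], list[str]],   # LLM: query -> variations
    search: Callable[[str], list[dict]],  # index: query -> [{"id", "text"}]
    score: Callable[[str, str], float],   # relevance of text to a query
    top_k: int = 5,
) -> list[dict]:
    # Steps 1-2: run the original query plus each generated variation
    candidates: dict[str, dict] = {}
    for q in [query] + expand(query):
        for doc in search(q):
            candidates[doc["id"]] = doc   # step 3: dedupe by document id
    # Step 4: re-rank against the ORIGINAL query, not the variations
    for doc in candidates.values():
        doc["score"] = score(query, doc["text"])
    ranked = sorted(candidates.values(), key=lambda d: d["score"], reverse=True)
    return ranked[:top_k]
```

Scoring against the original query rather than the variations matters: expansion improves recall, but relevance should still be judged against what the user actually asked.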

Exercise 28.4.3: Conversational Recommendation Conceptual

Design a conversational recommendation system that progressively refines suggestions based on user feedback. How should the system balance exploration (showing diverse options) and exploitation (showing likely matches)?

Answer Sketch

The system asks clarifying questions to narrow preferences, presents diverse initial recommendations, and adapts based on feedback ('I like this one but want something cheaper'). Track user preferences as a running summary. For exploration/exploitation: start with diverse recommendations (explore), then narrow based on feedback (exploit), but periodically introduce a surprise option to discover new preferences the user has not expressed.

Exercise 28.4.4: NL-to-SQL for Analytics Coding

Write a function that converts a natural language analytics question into a SQL query, executes it, and returns both the data and a natural language summary of the results.

Answer Sketch

Provide the LLM with the database schema (table names, column names, types). Send the user's question and ask for a SQL query. Validate the query (no mutations, reasonable LIMIT). Execute against the database. Pass the results back to the LLM with the original question and ask for a plain-language summary. Include the SQL query in the response for transparency.
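A minimal sketch of this pipeline against SQLite, with the LLM call injected as an `ask_llm` callable (the prompts and that callable are assumptions); the validation step is spelled out in full because it is what keeps generated SQL safe to execute:

```python
# NL-to-SQL pipeline sketch: generate, validate, execute, summarize.
import re
import sqlite3

def validate_sql(sql: str, max_limit: int = 1000) -> str:
    """Reject mutating statements and enforce a row LIMIT."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^(select|with)\b", stripped):
        raise ValueError("only SELECT queries are allowed")
    if re.search(r"(?i)\b(insert|update|delete|drop|alter|create|attach|pragma)\b",
                 stripped):
        raise ValueError("query contains a forbidden keyword")
    if not re.search(r"(?i)\blimit\s+\d+\b", stripped):
        stripped += f" LIMIT {max_limit}"  # cap result size
    return stripped

def answer_question(question: str, schema: str,
                    conn: sqlite3.Connection, ask_llm) -> dict:
    """ask_llm(prompt) -> str is any chat-completion wrapper."""
    sql = validate_sql(ask_llm(
        f"Schema:\n{schema}\n\nWrite one SQLite SELECT query that answers:\n"
        f"{question}\nReturn only the SQL."
    ))
    rows = conn.execute(sql).fetchall()
    summary = ask_llm(
        f"Question: {question}\nSQL: {sql}\nRows: {rows[:50]}\n"
        "Summarize the result in plain language."
    )
    return {"sql": sql, "rows": rows, "summary": summary}
```

Returning the validated SQL alongside the summary gives users the transparency the answer sketch calls for: they can audit exactly what was run.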

Exercise 28.4.5: User Preference Modeling Conceptual

Compare explicit preference modeling (user states preferences) with implicit preference modeling (inferred from behavior) in LLM-powered recommendation systems. What are the trade-offs?

Answer Sketch

Explicit: more accurate (the user tells you what they want) but requires user effort and may not capture unconscious preferences. Implicit: requires no user effort, captures actual behavior, but is noisier and harder to interpret. LLMs enable a hybrid approach: track implicit signals (click patterns, time spent) and use the LLM to synthesize them into a natural language preference profile that can be refined through conversation.
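The hybrid idea can be sketched by first collapsing implicit signals into a short text summary that seeds the LLM's preference profile. The event schema and weights below are illustrative assumptions:

```python
# Collapse implicit engagement events into a text seed for the
# LLM's preference profile -- weights and actions are illustrative.
from collections import Counter

def summarize_signals(events: list[dict], top_n: int = 3) -> str:
    """Weight engagement events and surface the strongest categories."""
    weights = {"purchase": 5, "save": 3, "click": 1, "skip": -1}
    scores: Counter = Counter()
    for event in events:
        scores[event["category"]] += weights.get(event["action"], 0)
    top = [cat for cat, s in scores.most_common() if s > 0][:top_n]
    return ("Implicit signals: the user engages most with "
            + ", ".join(top) + ".")
```

The returned sentence is prepended to the recommendation prompt, where the user's explicit statements can then confirm or override it in conversation.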

What Comes Next

In the next section, Section 28.5: Cybersecurity & LLMs, we explore cybersecurity applications of LLMs, from threat detection and analysis to defensive automation.

Bibliography

LLM Recommendation

Hou, Y., Zhang, J., Lin, Z., et al. (2024). "Large Language Models are Zero-Shot Rankers for Recommender Systems." arXiv:2305.08845

Demonstrates that LLMs can perform recommendation by ranking items without any task-specific training, using only natural language descriptions. Compares zero-shot, few-shot, and fine-tuned approaches across multiple recommendation benchmarks. Essential for understanding the paradigm shift from collaborative filtering to language-based recommendation.

Bao, K., Zhang, J., Zhang, Y., et al. (2023). "TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation." arXiv:2305.00447

Introduces an efficient tuning framework that aligns LLMs with recommendation tasks using lightweight adapters. Shows how to bridge the gap between language modeling and item ranking objectives. Recommended for practitioners building production recommendation systems with LLMs.
Surveys

Shi, Z., Wang, H., & Yin, H. (2024). "Large Language Models for Generative Recommendation: A Survey and Visionary Discussions." arXiv:2309.01157

Surveys the emerging field of generative recommendation where LLMs produce recommendations through text generation rather than scoring. Covers conversational recommendation, explanation generation, and hybrid architectures. Best starting point for researchers entering this rapidly evolving area.

Wu, L., Zheng, Z., Qiu, Z., et al. (2024). "A Survey on Large Language Models for Recommendation." arXiv:2305.19860

Organizes the landscape of LLM-based recommendation into pre-training, fine-tuning, and prompting paradigms, with detailed comparison of approaches. Covers both academic research and industrial deployments. Useful for teams deciding which LLM recommendation architecture fits their use case.
NL-to-Analytics & Text-to-SQL

Pourreza, M. & Rafiei, D. (2023). "DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction." arXiv:2304.11015

Introduces the DIN-SQL method that decomposes complex text-to-SQL into sub-problems (schema linking, query classification, SQL generation), achieving state-of-the-art results on the Spider benchmark. Covers the schema linking challenges that are central to reliable NL-to-SQL. Essential for practitioners building text-to-SQL systems.

Gao, D., Wang, H., Li, Y., et al. (2024). "DAIL-SQL: Efficient Few-Shot Text-to-SQL with Question Representation." arXiv:2308.15363

Demonstrates that careful selection of few-shot examples based on query similarity dramatically improves text-to-SQL accuracy. Covers the example selection strategies and prompt organization that practitioners can apply directly. Recommended for teams optimizing SQL generation accuracy in production.

Maddigan, P. & Susnjak, T. (2023). "Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex, and GPT-4." arXiv:2302.02094

Evaluates LLMs' ability to generate data visualizations from natural language, comparing chart type selection, axis configuration, and visual encoding accuracy across models. Provides practical insights into prompt design for visualization generation. Useful for teams building NL-to-dashboard interfaces.

Narechania, A., Srinivasan, A., & Stasko, J. (2021). "NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries." IEEE TVCG 2021

Presents a structured approach to NL-to-visualization that decomposes the task into attribute inference, task inference, and visualization design. Covers the formal pipeline that preceded LLM-based approaches. Valuable background for understanding the design decisions in modern augmented analytics systems.
Search & Retrieval

Zhu, Y., Yuan, H., Wang, S., et al. (2023). "Large Language Models for Information Retrieval: A Survey." arXiv:2308.07107

Comprehensive survey covering LLMs across the full search pipeline: query understanding, document indexing, retrieval, re-ranking, and answer generation. Maps the architectural landscape of LLM-powered search systems. Valuable for search engineers evaluating LLM integration strategies.