"You do not need to know what you want. Just tell me how you feel, and I will find it for you."
Deploy, Emotionally Perceptive AI Agent
LLMs are transforming both search and recommendation from retrieval problems into reasoning problems. Traditional search returns ranked documents matching keywords. LLM-powered search (Perplexity, Google AI Overviews) understands intent, synthesizes information across sources, and generates direct answers with citations. Similarly, traditional recommendation relies on collaborative filtering and content-based features, while LLM-powered recommendation understands nuanced preferences expressed in natural language and can explain its reasoning. This shift from pattern matching to comprehension represents a fundamental change in how users discover information and products. The embedding and retrieval infrastructure from Chapter 19 powers the semantic search capabilities that underpin these systems.
Prerequisites
This section builds on the application foundations from Section 28.1 through Section 28.3. Understanding conversational AI patterns from Section 21.1 and agent architectures from Section 22.1 provides essential context.
1. LLMs as Recommendation Engines
LLMs can serve as recommendation engines by leveraging their world knowledge and reasoning abilities.
Traditional recommendation systems need thousands of user interactions to learn your preferences. An LLM can infer that someone who likes "Dune" probably does not want "The Notebook" from a single sentence. Collaborative filtering took 20 years to achieve worse cold-start performance.
Given a description of user preferences, past interactions, and a catalog of items, an LLM can generate personalized recommendations with natural language explanations. This approach excels for cold-start scenarios (new users with no history) and for nuanced preferences that are difficult to capture with traditional feature vectors. Code Fragment 28.4.2 below puts this into practice.
```python
# Recommend items from a catalog for a user profile, with per-item reasoning
from openai import OpenAI
import json

client = OpenAI()

def recommend_items(user_profile: str, catalog: list, n: int = 5) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"""You are a recommendation engine.
Given a user profile and catalog, recommend {n} items.
Return JSON with 'recommendations' array, each having:
'item_id', 'score' (0-1), 'reasoning' (brief explanation)."""},
            {"role": "user", "content": f"""User Profile: {user_profile}
Catalog: {json.dumps(catalog)}"""},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

recs = recommend_items(
    user_profile="Enjoys sci-fi with strong worldbuilding, dislikes romance subplots",
    catalog=[
        {"id": "b1", "title": "Dune", "genre": "sci-fi"},
        {"id": "b2", "title": "The Notebook", "genre": "romance"},
        {"id": "b3", "title": "Neuromancer", "genre": "sci-fi"},
    ],
)
```
For recommendation systems, log the LLM's reasoning alongside its suggestions, not just the final ranked list. When a recommendation fails ("Why did it suggest a horror movie to someone who only watches comedies?"), the reasoning trace lets you diagnose whether the problem was in the user preference model, the candidate retrieval, or the LLM's ranking logic. Without traces, debugging recommendation quality becomes guesswork.
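A minimal sketch of such trace logging, assuming the output shape of the recommendation call above; the `log_recommendation` helper and the in-memory sink are illustrative, not part of any library:

```python
import json
import time

def log_recommendation(user_id: str, query: str, recs: dict, sink: list) -> None:
    """Append a structured trace record, keeping each item's reasoning."""
    sink.append({
        "ts": time.time(),
        "user_id": user_id,
        "query": query,
        "recommendations": [
            {"item_id": r["item_id"], "score": r["score"], "reasoning": r["reasoning"]}
            for r in recs.get("recommendations", [])
        ],
    })

# Example: one trace record (the sink could be a file or a log pipeline)
traces: list = []
log_recommendation(
    user_id="u42",
    query="sci-fi, no romance",
    recs={"recommendations": [
        {"item_id": "b1", "score": 0.91, "reasoning": "Epic worldbuilding, minimal romance."},
    ]},
    sink=traces,
)
print(json.dumps(traces[0]["recommendations"][0]))
```

Because the reasoning travels with the score, a failed recommendation can later be attributed to the preference model, the candidates, or the ranking logic.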
2. LLM-Powered Search
LLM-powered search systems like Perplexity represent a paradigm shift from "ten blue links" to direct answers with cited sources. The architecture combines a search engine (retrieving relevant web pages), a reader model (extracting key information from each source), and a generator model (synthesizing a coherent answer with inline citations). This is essentially RAG (Chapter 20) applied at web scale. Figure 28.4.1 shows the LLM-powered search architecture. Code Fragment 28.4.3 below puts this into practice.
```python
# Building a simple LLM-powered search with RAG
from openai import OpenAI

client = OpenAI()

def llm_search(query: str, search_results: list) -> str:
    # Format search results as numbered sources for inline citation
    context = "\n\n".join(
        f"[Source {i+1}] {r['title']}\nURL: {r['url']}\n{r['snippet']}"
        for i, r in enumerate(search_results)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Answer the user's query using the provided sources.
Cite sources inline using [Source N] notation. Be concise and factual.
If sources conflict, note the disagreement."""},
            {"role": "user", "content": f"Query: {query}\n\nSources:\n{context}"},
        ],
    )
    return response.choices[0].message.content
```
3. Conversational Recommendation
Conversational recommendation combines dialogue management with recommendation logic. Instead of a one-shot recommendation, the system engages in a multi-turn conversation to elicit preferences, clarify constraints, and refine suggestions. This is particularly valuable for high-consideration purchases (electronics, travel, real estate) where user needs are complex and evolving. Code Fragment 28.4.4 below puts this into practice.
```python
# A multi-turn recommendation assistant that keeps conversation state
from openai import OpenAI

client = OpenAI()

class ConversationalRecommender:
    def __init__(self, catalog_context: str):
        self.messages = [{
            "role": "system",
            "content": f"""You are a helpful product recommendation assistant.
Ask clarifying questions to understand user needs before recommending.
Available products:\n{catalog_context}
Always explain why each recommendation fits the user's stated needs."""
        }]

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

recommender = ConversationalRecommender(catalog_context="...")
print(recommender.chat("I need a laptop for data science work"))
```
The fundamental advantage of LLM-powered recommendation over traditional collaborative filtering is explainability and preference elicitation. An LLM can explain "I recommended this because you mentioned you prefer quiet keyboards, and this laptop has a low-profile mechanical keyboard" while collaborative filtering can only say "users like you also bought this." This explainability builds user trust and enables the system to correct misunderstandings through dialogue, creating a more effective recommendation loop.
4. User Preference Modeling
LLMs can build rich user preference models from natural language interactions, product reviews, and browsing histories. Rather than reducing preferences to sparse feature vectors, LLMs maintain a natural language summary of what the user likes, dislikes, and values. This "preference narrative" can be updated through conversation and used to condition future recommendations.
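The update loop for such a preference narrative can be sketched as follows. This is an illustrative design, not a library API: the LLM call is injected as a plain callable so the bookkeeping can be shown (and tested) without a network call; in production `llm` would wrap a chat-completions request.

```python
from typing import Callable

class PreferenceNarrative:
    """Maintain a natural language summary of what a user likes and dislikes."""

    def __init__(self, llm: Callable[[str], str], initial: str = "No preferences known yet."):
        self.llm = llm          # callable: prompt string -> revised narrative
        self.narrative = initial

    def update(self, signal: str) -> str:
        """Fold a new explicit or implicit signal into the narrative."""
        prompt = (
            "Current preference narrative:\n"
            f"{self.narrative}\n\n"
            f"New signal from the user: {signal}\n\n"
            "Rewrite the narrative to incorporate the new signal. "
            "Keep it under 100 words; drop preferences the user has retracted."
        )
        self.narrative = self.llm(prompt)
        return self.narrative

# Example with a stand-in "LLM" that just echoes the signal portion of the prompt
fake_llm = lambda prompt: prompt.split("New signal from the user: ")[1].split("\n")[0]
profile = PreferenceNarrative(llm=fake_llm)
updated = profile.update("Prefers hard sci-fi; dislikes romance subplots")
```

The narrative then conditions future recommendation prompts in place of a sparse feature vector.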
| Approach | Cold Start | Explainability | Scale | Latency |
|---|---|---|---|---|
| Collaborative Filtering | Poor | Low | Excellent | Very fast |
| Content-Based | Good | Medium | Good | Fast |
| LLM Recommendation | Excellent | High | Limited | Slow |
| Hybrid (CF + LLM) | Good | High | Good | Moderate |
Keyword search (BM25) remains the right choice when users know exactly what they want and can express it precisely, such as searching for a specific product name or error code. Semantic search with embeddings (see Chapter 19) excels when users describe what they need in natural language but the exact terminology differs from the documents. LLM-powered search (Perplexity-style) is best for complex, multi-faceted questions requiring synthesis across sources. Collaborative filtering is unbeatable at scale when you have rich interaction data but poor for cold-start. LLM recommendation shines for cold-start, niche preferences, and explainability. In production, the hybrid approach (traditional retrieval for candidate generation, LLM for re-ranking and explanation) gives you the best of both worlds.
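The hybrid candidate-generation stage can be sketched with toy vectors; the embeddings and item names here are made up for illustration, and a production system would use an embedding model plus an ANN index rather than brute-force cosine scoring:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def candidate_generation(query_vec: list[float],
                         item_vecs: dict[str, list[float]],
                         k: int = 2) -> list[str]:
    """Stage 1: fast vector scoring narrows the catalog to k candidates."""
    scored = sorted(item_vecs, key=lambda i: cosine(query_vec, item_vecs[i]), reverse=True)
    return scored[:k]

# Toy 3-d embeddings standing in for real embedding-model output
items = {
    "dune":        [0.9, 0.1, 0.0],
    "notebook":    [0.0, 0.9, 0.1],
    "neuromancer": [0.8, 0.0, 0.2],
}
candidates = candidate_generation([1.0, 0.0, 0.1], items, k=2)
# Stage 2 (not shown): send only `candidates` to the LLM for re-ranking + explanations
```

Only the shortlisted candidates ever reach the LLM, which is what keeps the hybrid architecture affordable at catalog scale.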
5. Automated Analytics and NL-to-Dashboard
One of the most transformative applications of LLMs in information access is the ability to translate natural language questions into data queries, execute those queries, generate visualizations, and narrate the results. This NL-to-Analytics pipeline democratizes data analysis by letting business users ask questions like "What were our top 5 products by revenue last quarter, broken down by region?" and receive a chart with an accompanying narrative, all without writing SQL or Python.
5.1 The NL-to-Analytics Pipeline
The full pipeline has four stages. First, the user poses a question in natural language. Second, the LLM translates that question into executable code, typically SQL for database queries or Python for statistical analysis. Third, the system executes the generated code against the data source and captures the results. Fourth, the LLM generates a visualization specification and a written narrative that explains the key insights. Each stage introduces potential errors, so production systems include validation checks between stages to catch hallucinated table names, syntactically invalid queries, and misleading chart configurations.
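One such inter-stage check, sketched under assumptions: the table names mirror the schema used later in this section, and the regex-based table extraction is deliberately naive (a real system would parse the SQL with a proper parser such as sqlglot).

```python
import re

SCHEMA_TABLES = {"orders", "products", "customers"}

def validate_sql(sql: str, known_tables: set[str] = SCHEMA_TABLES) -> list[str]:
    """Return a list of problems found; an empty list means the query passes."""
    problems = []
    if not sql.strip().lower().startswith("select"):
        problems.append("only read-only SELECT statements are allowed")
    # Naive table extraction: identifiers following FROM or JOIN
    referenced = set(re.findall(r"\b(?:from|join)\s+([a-zA-Z_][a-zA-Z0-9_]*)", sql, re.I))
    for table in referenced - known_tables:
        problems.append(f"unknown table: {table}")
    return problems

print(validate_sql("SELECT region, SUM(total_amount) FROM orders GROUP BY region"))  # []
print(validate_sql("SELECT * FROM revenue_facts"))  # ['unknown table: revenue_facts']
```

Running this between the generation and execution stages catches hallucinated table names before they ever reach the database.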
5.2 Text-to-SQL with LLMs
The most critical step in the pipeline is translating natural language to SQL. This requires schema linking: the LLM must identify which database tables and columns correspond to the entities and attributes in the user's question. For instance, "revenue by region" must be mapped to the correct table (perhaps orders) and columns (total_amount, shipping_region). Modern approaches use few-shot prompting with example question-SQL pairs specific to the database schema. The DIN-SQL method (Pourreza and Rafiei, 2023) decomposes complex queries into sub-problems, solving schema linking, query classification, and SQL generation in separate stages. The DAIL-SQL approach (Gao et al., 2024) uses efficient example selection to choose the most informative few-shot examples based on query similarity, achieving state-of-the-art results on the Spider benchmark. These techniques connect directly to the Text-to-SQL coverage in Section 20.5.
```python
import json
from openai import OpenAI

client = OpenAI()

# Database schema provided as context for schema linking
DB_SCHEMA = """Tables:
orders(order_id, customer_id, product_id, total_amount, order_date, region)
products(product_id, product_name, category, unit_price)
customers(customer_id, name, segment, signup_date)"""

def nl_to_sql(question: str, schema: str = DB_SCHEMA) -> dict:
    """Translate a natural language question to SQL with explanation."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""You are a SQL generation assistant. Given a database schema
and a natural language question, generate a valid SQL query.
Database schema:
{schema}
Return JSON with:
"sql": the SQL query (read-only SELECT statements only)
"explanation": brief description of what the query does
"tables_used": list of tables referenced
"assumptions": any assumptions made about the question"""},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    return json.loads(response.choices[0].message.content)

result = nl_to_sql("What are the top 5 product categories by total revenue this year?")
print("SQL:", result["sql"])
print("Explanation:", result["explanation"])
```
5.3 NL-to-Visualization
Once query results are available, the LLM can generate visualization specifications. The most common approach uses declarative formats like Vega-Lite (a JSON-based grammar for interactive graphics) or matplotlib code. The Chat2VIS system (Maddigan and Susnjak, 2023) demonstrated that LLMs can select appropriate chart types, map data fields to visual channels, and configure axes and legends from a natural language description alone. The NL4DV framework (Narechania et al., 2021) takes a more structured approach, decomposing the visualization generation into attribute inference, task inference, and visualization design. In production, the LLM examines the shape of the query results (number of dimensions, data types, cardinality) and selects a chart type accordingly: bar charts for categorical comparisons, line charts for time series, scatter plots for correlations, and tables for detailed breakdowns.
```python
def generate_chart_spec(question: str, query_results: list[dict]) -> dict:
    """Generate a Vega-Lite chart spec from query results."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Generate a Vega-Lite JSON specification for the data.
Choose the most appropriate chart type for the question and data shape.
Include: title, axis labels, color encoding if useful, and a tooltip.
Also include a key "narrative" field with 2 to 3 sentences summarizing
the main insight visible in the data."""},
            {"role": "user", "content": f"""Question: {question}
Data (first 5 rows): {json.dumps(query_results[:5])}
Total rows: {len(query_results)}"""},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example: visualize revenue by category
sample_data = [
    {"category": "Electronics", "total_revenue": 1250000},
    {"category": "Clothing", "total_revenue": 890000},
    {"category": "Home", "total_revenue": 720000},
]
spec = generate_chart_spec("Top product categories by revenue", sample_data)
print("Chart type:", spec.get("mark", "unknown"))
print("Narrative:", spec.get("narrative", ""))
```
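The shape-based chart selection described above can also be made deterministic, as a fallback when the LLM's choice needs validating. This heuristic is an illustrative sketch, not taken from Chat2VIS or NL4DV:

```python
def infer_mark(rows: list[dict]) -> str:
    """Pick a Vega-Lite mark type from the shape of the query results."""
    if not rows:
        return "table"
    cols = list(rows[0].keys())
    numeric = [c for c in cols if isinstance(rows[0][c], (int, float))]
    temporal = [c for c in cols if "date" in c.lower() or "time" in c.lower()]
    if temporal and numeric:
        return "line"   # time series
    if len(numeric) >= 2:
        return "point"  # correlation between two measures
    if len(numeric) == 1 and len(cols) - len(numeric) == 1:
        return "bar"    # one category vs one measure
    return "table"      # high-dimensional detail

print(infer_mark([{"category": "Electronics", "total_revenue": 1250000}]))  # bar
print(infer_mark([{"order_date": "2024-01-01", "revenue": 100}]))           # line
```

Comparing the heuristic's answer against the LLM-generated `mark` is a cheap sanity check on the generated specification.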
5.4 Augmented Analytics Platforms
Several commercial products have integrated LLMs into their analytics workflows, creating what the industry calls "augmented analytics." Tableau AI (Salesforce) lets users type questions about their data and receive auto-generated visualizations with narrative explanations. ThoughtSpot Sage uses an LLM to translate natural language into its proprietary search query language, enabling business users to explore data conversationally. Microsoft Copilot for Power BI allows users to describe the report they want, and the system generates DAX queries, visualizations, and written summaries. These platforms share a common design pattern: the analyst-in-the-loop approach, where the LLM generates a first draft of the analysis and the human analyst reviews, refines, and validates before sharing. This mirrors the "AI as triage" pattern from Section 28.2 on financial applications.
| Platform | NL Interface | Query Language | Visualization | Narrative Gen |
|---|---|---|---|---|
| Tableau AI | Ask Data (chat) | VizQL | Native Tableau charts | Yes (Explain Data) |
| ThoughtSpot Sage | Search bar + chat | TQL (proprietary) | Auto-selected charts | Yes |
| Power BI Copilot | Chat panel | DAX / M | Power BI visuals | Yes (summaries) |
| Custom (LLM + SQL) | Any chat UI | SQL / Python | Vega-Lite / matplotlib | LLM-generated |
5.5 Challenges in NL-to-Analytics
The NL-to-Analytics pipeline faces several significant challenges. Hallucinated SQL is the most dangerous: the LLM may reference tables or columns that do not exist, join tables on incorrect keys, or apply aggregate functions incorrectly. Schema validation (checking that all referenced objects exist) and result sanity checks (verifying row counts and value ranges) are essential guardrails. Schema misinterpretation occurs when column names are ambiguous; a column named date could refer to order date, ship date, or creation date, and the LLM must disambiguate from context or ask clarifying questions. Data confidentiality is a concern when using cloud LLM APIs: sending database schemas and query results to external services may violate data governance policies, pushing some organizations toward self-hosted models. Finally, trust in automated insights remains a barrier, because users may accept a generated chart at face value without verifying that the underlying query correctly represents their question. The analyst-in-the-loop pattern mitigates this risk by making review a required step.
The largest source of errors in NL-to-Analytics pipelines is not the SQL generation itself but the schema linking step. When an LLM misidentifies which table or column maps to the user's concept ("revenue" could be total_amount, net_revenue, or gross_sales), the generated SQL may be syntactically valid but semantically wrong, producing a convincing chart that answers the wrong question. Production systems address this by maintaining a semantic layer: a curated mapping of business terms to database objects that the LLM consults before generating queries. This is the same principle behind the metadata-enriched retrieval discussed in Section 20.3.
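At its simplest, a semantic layer is a lookup from business vocabulary to database objects; the mappings below are illustrative (they reuse the schema from earlier in this section) and the matching is deliberately naive:

```python
SEMANTIC_LAYER = {
    "revenue": "orders.total_amount",
    "region": "orders.region",
    "customer segment": "customers.segment",
}

def resolve_terms(question: str, layer: dict = SEMANTIC_LAYER) -> dict:
    """Return the business-term mappings relevant to this question.

    The result is prepended to the SQL-generation prompt so the LLM
    links 'revenue' to orders.total_amount rather than guessing.
    """
    q = question.lower()
    return {term: col for term, col in layer.items() if term in q}

mappings = resolve_terms("Show revenue by region for last quarter")
# -> {'revenue': 'orders.total_amount', 'region': 'orders.region'}
```

A production layer would also handle synonyms and multi-word matching, and would be curated by the data team rather than hard-coded.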
Who: Business intelligence team at a national retail chain with 500+ stores
Situation: Regional managers needed weekly performance reports but lacked SQL skills. The BI team of four analysts was the bottleneck, with a 2-week backlog of ad-hoc report requests.
Problem: Each new question ("How did promotions perform in the Southeast last month?") required an analyst to write SQL, build a chart, and write commentary. By the time reports were delivered, the business had moved on to new questions.
Decision: The team built an internal NL-to-Dashboard tool using GPT-4o for SQL generation, a read-only database connection for safe execution, and Vega-Lite for visualization. A semantic layer mapped 200 common business terms to their database representations.
How: Regional managers typed questions into a chat interface. The system generated SQL, executed it against a read-replica, displayed a chart, and provided a 2-sentence narrative. All generated SQL was logged and the semantic layer was updated weekly based on correction patterns.
Result: Ad-hoc report backlog dropped from 2 weeks to zero. Regional managers answered 85% of their data questions without BI team involvement. SQL accuracy reached 91% after three months of semantic layer refinement. The BI team shifted from report production to strategic analysis projects.
Lesson: The semantic layer (mapping business terms to database objects) is the critical success factor for NL-to-Analytics; without it, even the best LLM will generate plausible but incorrect queries.
The map-reduce pattern (detailed in Section 6.2 below) is not limited to text analytics. It applies to any LLM task where the input corpus exceeds the context window: document summarization, competitive intelligence gathering, legal discovery, and research literature review. The critical design decision is what structure to impose in the map phase. Overly rigid schemas miss unexpected insights; overly open prompts produce summaries that are difficult to aggregate. A practical middle ground is to define required fields (sentiment, themes) while including an open-ended "other notable observations" field that captures surprises.
LLM-based recommendation faces significant scalability challenges. Generating a personalized recommendation for each user request requires an LLM inference call, which is orders of magnitude slower and more expensive than a collaborative filtering lookup. Production systems address this through caching (pre-compute recommendations for popular queries), hybrid architectures (use CF for candidate generation, LLM for re-ranking and explanation), and batching (generate recommendations in bulk during off-peak hours).
Building a Perplexity-style search pipeline with off-the-shelf tools (2025). You can build an LLM-powered search system from readily available components, many of them open source. For retrieval: use the Serper or Tavily APIs for web search, combined with a local vector store (Qdrant, Milvus) for domain-specific documents. For reading and extraction: use Jina Reader or Firecrawl to convert web pages to clean markdown that LLMs can process. For synthesis: use any frontier model with a structured prompt that requires inline citations. Key implementation details:
- Rewrite the user query into 2 to 3 search sub-queries to improve recall.
- Deduplicate retrieved passages before sending them to the LLM to avoid redundancy.
- Include source URLs in the context so the model can cite them.
- Use streaming responses so the user sees results as they generate.
The LangGraph framework (from LangChain) provides a built-in "research assistant" template that implements this pattern with configurable retriever and LLM components. For latency-sensitive applications, pre-fetch search results during query processing and use speculative decoding to accelerate the synthesis step.
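A minimal sketch of the passage-deduplication step; the helper name and keying strategy (normalized URL plus a hash of the snippet text) are illustrative choices, not from any of the tools above:

```python
import hashlib
from urllib.parse import urlsplit

def dedupe_results(results: list[dict]) -> list[dict]:
    """Drop passages that repeat a URL (ignoring scheme) or snippet text."""
    seen: set = set()
    kept = []
    for r in results:
        parts = urlsplit(r["url"])
        url_key = (parts.netloc, parts.path)
        text_key = hashlib.sha1(r["snippet"].strip().lower().encode()).hexdigest()
        if url_key in seen or text_key in seen:
            continue
        seen.update({url_key, text_key})
        kept.append(r)
    return kept

merged = dedupe_results([
    {"url": "https://a.com/x", "snippet": "Boots guide."},
    {"url": "http://a.com/x",  "snippet": "Boots guide (mirror)."},  # same page over http
    {"url": "https://b.com/y", "snippet": "boots guide."},           # same text, case differs
])
# Only the first result survives
```

Deduplicating before synthesis both shortens the context (lower cost) and prevents the model from over-weighting a source that appears twice.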
Who: Search and discovery team at a specialty outdoor gear retailer
Situation: Customers often searched with complex, intent-rich queries like "waterproof hiking boots for wide feet, good ankle support, under $200" that traditional faceted search handled poorly.
Problem: Keyword search returned too many irrelevant results. Customers had to manually apply 5+ filters, and 40% abandoned the search before finding a product.
Decision: The team built a conversational search layer using GPT-4o mini for query understanding and a hybrid retrieval system combining BM25 keyword search with embedding-based semantic search over product descriptions.
How: The LLM parsed natural language queries into structured filters (category, price range, features), generated embedding queries for semantic matching, and produced a ranked list with natural language explanations ("This boot has a wide toe box and waterproof Gore-Tex membrane, matching your requirements"). Follow-up questions like "do they come in green?" maintained conversation context.
Result: Search-to-purchase conversion improved by 28%. Average session time decreased by 35% (customers found products faster). The conversational interface handled 73% of queries without requiring manual filter adjustments.
Lesson: LLM-powered search excels when user intent is complex and multi-dimensional; the key is combining semantic understanding with structured product metadata rather than relying on the LLM alone.
Who: Product engineering team at an online furniture retailer with 50,000+ SKUs
Situation: Customers frequently abandoned the site because filtering by attributes (color, size, material, price) was insufficient for subjective needs like "a cozy reading nook for a small apartment."
Problem: Traditional collaborative filtering recommended based on purchase history but could not understand nuanced preferences expressed in natural language. New customers had no history at all (cold-start problem).
Dilemma: Running an LLM over the full 50,000-item catalog for every query was cost-prohibitive. But limiting recommendations to pre-filtered categories missed creative cross-category suggestions.
Decision: The team built a hybrid system: embedding-based retrieval narrowed candidates to 50 items, then an LLM re-ranked and explained recommendations through multi-turn conversation.
How: Product descriptions were embedded and indexed in a vector database. Customer queries retrieved the top 50 candidates by semantic similarity. An LLM then re-ranked these candidates based on the full conversational context, asked clarifying questions about space constraints and style preferences, and generated natural language explanations for each recommendation.
Result: Conversion rate increased 34% for customers who engaged with the conversational recommender. Average order value rose 22% because the LLM suggested complementary items with persuasive explanations. Cold-start customer satisfaction scores matched those of returning customers.
Lesson: Hybrid architectures that combine fast vector retrieval for candidate generation with LLM re-ranking and explanation deliver both scalability and the natural language understanding that customers expect.
- LLM-powered search transforms retrieval into comprehension, synthesizing direct answers with citations rather than returning ranked links.
- LLM recommendation excels at cold-start scenarios and nuanced preferences expressed in natural language, but faces scalability challenges.
- Conversational recommendation uses multi-turn dialogue to elicit, clarify, and refine user preferences for complex decisions.
- User preference modeling with LLMs maintains natural language preference narratives rather than sparse feature vectors.
- Hybrid architectures combine fast traditional methods for candidate generation with LLMs for re-ranking and explanation.
- Explainability is the fundamental advantage of LLM-based recommendation: users understand why items are recommended and can correct misunderstandings.
- NL-to-Analytics pipelines translate natural language questions into SQL, execute queries, generate visualizations, and narrate insights, democratizing data access for non-technical users.
- Schema linking (mapping business terms to database objects) is the critical success factor; a semantic layer prevents plausible but semantically wrong queries.
Conversational data analysis extends NL-to-Analytics beyond single questions to multi-turn exploration sessions. Research on systems like Data-Copilot (Zhang et al., 2024) chains multiple queries together, with each question building on the context of previous results. Self-correcting SQL generation uses the LLM to analyze query execution errors (syntax errors, empty results, unexpected row counts) and automatically revise the query, reducing the need for human intervention.
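The self-correction loop can be sketched as follows. This is an illustrative design, not the Data-Copilot implementation: the database call and the LLM revision step are injected as callables so the retry logic itself is visible, and the stand-ins below exist only to demonstrate the control flow.

```python
from typing import Callable

def run_with_self_correction(
    sql: str,
    execute: Callable[[str], list],
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple[str, list]:
    """Execute SQL; on failure, ask the LLM (`revise`) for a corrected query."""
    for _ in range(max_attempts):
        try:
            rows = execute(sql)
            if rows:  # empty results also trigger a revision pass
                return sql, rows
            sql = revise(sql, "query returned zero rows")
        except Exception as err:
            sql = revise(sql, str(err))
    return sql, []

# Stand-ins: the "database" only knows the orders table, and the
# "LLM" fixes the table name when told it is unknown
def fake_execute(sql: str) -> list:
    if "revenue_facts" in sql:
        raise ValueError("unknown table: revenue_facts")
    return [("East", 100)]

fake_revise = lambda sql, error: sql.replace("revenue_facts", "orders")
final_sql, rows = run_with_self_correction(
    "SELECT * FROM revenue_facts", fake_execute, fake_revise
)
```

Capping the attempts matters: without `max_attempts`, a model that keeps producing broken SQL would loop (and bill) indefinitely.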
Meanwhile, proactive analytics inverts the paradigm entirely: instead of waiting for questions, the LLM continuously monitors data streams and surfaces anomalies, trend changes, and noteworthy patterns before anyone asks.
6. Text Analytics at Scale
Beyond search, recommendation, and dashboards, LLMs unlock a category of application that was previously impractical: large-scale qualitative analysis. Surveys with thousands of open-ended responses, product review corpora with millions of entries, and support ticket archives spanning years all contain rich insights that traditional NLP (keyword extraction, topic modeling) captured only partially. LLMs can read, interpret, and synthesize unstructured text at a depth that approaches human analysis, at a fraction of the cost and time.
6.1 Survey Analysis and Review Mining
Consider a company that collects 10,000 open-ended survey responses after a product launch. A human analyst might read 200 and extrapolate. An LLM can process all 10,000, extracting themes, sentiment, specific feature requests, and emotional tone from each response. Similarly, review mining across e-commerce platforms extracts structured insights from unstructured text: which product features do customers love, which cause frustration, and how do opinions shift over time? The hybrid ML/LLM patterns from Section 12.3 apply directly here: use a small classifier to categorize responses into broad buckets (positive, negative, feature request, bug report), then use a larger LLM to synthesize narratives within each bucket.
6.2 The Map-Reduce Pattern for LLM Analytics
Processing thousands of documents through an LLM requires a systematic approach. The map-reduce pattern divides the work into two phases. In the map phase, each document (or chunk) is independently analyzed by the LLM, producing a structured summary: extracted themes, sentiment scores, key quotes, and entity mentions. In the reduce phase, the per-document summaries are aggregated into a final synthesis: ranked themes by frequency, sentiment trends, representative quotes for each theme, and actionable recommendations. This pattern parallelizes naturally (all map calls are independent) and keeps each LLM call within reasonable context limits.
```python
# Map-reduce text analytics over a corpus
from openai import OpenAI
import json
from concurrent.futures import ThreadPoolExecutor

client = OpenAI()

def map_analyze(text: str, doc_id: str) -> dict:
    """Map phase: analyze a single document."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # small model for classification
        messages=[{
            "role": "user",
            "content": (
                "Analyze this customer review. Return JSON with:\n"
                "- sentiment: positive/negative/mixed\n"
                "- themes: list of 1-3 topic tags\n"
                "- key_quote: the most informative sentence\n"
                "- feature_requests: list (empty if none)\n\n"
                f"Review: {text}"
            )
        }],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = json.loads(response.choices[0].message.content)
    result["doc_id"] = doc_id
    return result

def reduce_synthesize(summaries: list[dict]) -> str:
    """Reduce phase: synthesize all map results into a report."""
    summary_text = json.dumps(summaries, indent=1)
    response = client.chat.completions.create(
        model="gpt-4o",  # large model for synthesis
        messages=[{
            "role": "user",
            "content": (
                "You are a product analytics expert. Given these per-review "
                "summaries, produce a report with:\n"
                "1. Top 5 themes ranked by frequency\n"
                "2. Overall sentiment breakdown (% positive/negative/mixed)\n"
                "3. Top feature requests with supporting quotes\n"
                "4. Three actionable recommendations\n\n"
                f"Review summaries:\n{summary_text}"
            )
        }],
        temperature=0.3,
    )
    return response.choices[0].message.content

# Execute map phase in parallel
# (all_reviews: a list of review strings, loaded elsewhere)
reviews = [{"id": f"r{i}", "text": t} for i, t in enumerate(all_reviews)]
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(map_analyze, r["text"], r["id"]) for r in reviews]
    mapped = [f.result() for f in futures]

# Execute reduce phase
report = reduce_synthesize(mapped)
print(report)
```
6.3 Cost Optimization for Large-Scale Analytics
Processing 100,000 reviews at $0.15 per 1K input tokens (GPT-4o) would cost hundreds of dollars for the map phase alone. Cost-conscious pipelines use a tiered approach. The cheapest tier (GPT-4o-mini, Claude Haiku, or even a fine-tuned classifier) handles the high-volume map phase where each call is simple and structured. The mid-tier model handles the reduce phase, which requires more nuanced synthesis but processes far fewer tokens. The most expensive tier is reserved for edge cases: reviews flagged as ambiguous by the map phase, or topics where the initial synthesis was uncertain. This three-tier strategy can reduce costs by 80% compared to running the most capable model on every document, while maintaining comparable quality on the final report.
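The routing decision behind the tiered strategy can be sketched as a small pure function; the tier names and the fields it inspects (`sentiment`, `ambiguous`, `feature_requests`, matching the map-phase schema above) are illustrative placeholders:

```python
def route_model(doc_summary: dict) -> str:
    """Pick a model tier for follow-up analysis of one mapped document."""
    if doc_summary.get("sentiment") == "mixed" or doc_summary.get("ambiguous"):
        return "frontier"  # most expensive tier, reserved for hard cases
    if doc_summary.get("feature_requests"):
        return "mid"       # synthesis-quality tier
    return "cheap"         # bulk tier handles the clear-cut majority

batch = [
    {"sentiment": "positive", "feature_requests": []},
    {"sentiment": "mixed", "feature_requests": []},
    {"sentiment": "negative", "feature_requests": ["dark mode"]},
]
tiers = [route_model(d) for d in batch]
# -> ['cheap', 'frontier', 'mid']
```

In practice the thresholds would be tuned by comparing tier assignments against a sample re-analyzed with the frontier model.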
Generative recommendation is emerging as a paradigm where LLMs generate item descriptions or even entire product concepts tailored to individual users, rather than selecting from existing catalogs. Research into preference alignment for recommendation (adapting the RLHF techniques from Section 17.1) trains recommendation LLMs to better match human preferences. Multimodal search systems that understand images, text, and voice queries simultaneously are blurring the line between search and conversation, with products like Google Lens + Gemini enabling "point and ask" discovery experiences.
Exercises
How can an LLM be used as a recommendation engine? Compare the LLM-based approach with traditional collaborative filtering. What are the strengths and limitations of each?
Answer Sketch
LLMs can generate recommendations by reasoning about user preferences described in natural language, handling the cold-start problem well, and explaining their recommendations. Collaborative filtering excels with large interaction datasets and produces more statistically reliable recommendations. LLM limitations: no access to behavioral data (clicks, purchases), higher latency, and potential to hallucinate non-existent items. Best approach: combine both.
Implement a simple LLM-powered search system that rewrites a user's natural language query into multiple search queries, retrieves results from each, and re-ranks the combined results.
Answer Sketch
Step 1: use an LLM to generate 3 query variations from the user's input (synonym expansion, specificity adjustment, related concepts). Step 2: execute each query against a search index. Step 3: deduplicate results. Step 4: re-rank using an LLM or a cross-encoder that scores relevance of each result to the original query. Return the top-k results with relevance scores.
Design a conversational recommendation system that progressively refines suggestions based on user feedback. How should the system balance exploration (showing diverse options) and exploitation (showing likely matches)?
Answer Sketch
The system asks clarifying questions to narrow preferences, presents diverse initial recommendations, and adapts based on feedback ('I like this one but want something cheaper'). Track user preferences as a running summary. For exploration/exploitation: start with diverse recommendations (explore), then narrow based on feedback (exploit), but periodically introduce a surprise option to discover new preferences the user has not expressed.
Write a function that converts a natural language analytics question into a SQL query, executes it, and returns both the data and a natural language summary of the results.
Answer Sketch
Provide the LLM with the database schema (table names, column names, types). Send the user's question and ask for a SQL query. Validate the query (no mutations, reasonable LIMIT). Execute against the database. Pass the results back to the LLM with the original question and ask for a plain-language summary. Include the SQL query in the response for transparency.
Compare explicit preference modeling (user states preferences) with implicit preference modeling (inferred from behavior) in LLM-powered recommendation systems. What are the trade-offs?
Answer Sketch
Explicit: more accurate (the user tells you what they want) but requires user effort and may not capture unconscious preferences. Implicit: requires no user effort, captures actual behavior, but is noisier and harder to interpret. LLMs enable a hybrid approach: track implicit signals (click patterns, time spent) and use the LLM to synthesize them into a natural language preference profile that can be refined through conversation.
What Comes Next
In the next section, Section 28.5: Cybersecurity & LLMs, we explore cybersecurity applications of LLMs, from threat detection and analysis to defensive automation.
Bibliography
Hou, Y., Zhang, J., Lin, Z., et al. (2024). "Large Language Models are Zero-Shot Rankers for Recommender Systems." arXiv:2305.08845
Bao, K., Zhang, J., Zhang, Y., et al. (2023). "TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation." arXiv:2305.00447
Shi, Z., Wang, H., & Yin, H. (2024). "Large Language Models for Generative Recommendation: A Survey and Visionary Discussions." arXiv:2309.01157
Wu, L., Zheng, Z., Qiu, Z., et al. (2024). "A Survey on Large Language Models for Recommendation." arXiv:2305.19860
Pourreza, M. & Rafiei, D. (2023). "DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction." arXiv:2304.11015
Gao, D., Wang, H., Li, Y., et al. (2024). "DAIL-SQL: Efficient Few-Shot Text-to-SQL with Question Representation." arXiv:2308.15363
Maddigan, P. & Susnjak, T. (2023). "Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex, and GPT-4." arXiv:2302.02094
Narechania, A., Srinivasan, A., & Stasko, J. (2021). "NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries." IEEE TVCG 2021
Zhu, Y., Yuan, H., Wang, S., et al. (2023). "Large Language Models for Information Retrieval: A Survey." arXiv:2308.07107
