Part V: Retrieval and Conversation
Chapter 20: Retrieval-Augmented Generation

Source Attribution and Citation in RAG

"Citing your sources is not pedantry. It is the difference between a trustworthy assistant and a confident confabulator."

RAG, a Scrupulously Footnoted AI Agent
Big Picture

A RAG system that generates correct answers but cannot tell you where those answers came from is only half-built. Source attribution transforms RAG from a "trust me" black box into a verifiable information system. Users, auditors, and downstream processes need to trace every claim back to a specific document, paragraph, or data record. This section covers the system design problem of building citation into RAG pipelines: from prompt-level strategies through post-generation verification to end-to-end attribution architectures.

Prerequisites

This section builds on the RAG architecture from Section 20.1, the advanced retrieval techniques in Section 20.2, and the hallucination detection methods from Section 32.2. Familiarity with prompt engineering and structured output parsing is assumed.

1. Why Attribution Matters

Source attribution serves multiple purposes beyond user trust: auditors need a verifiable chain from each claim back to its source, downstream processes need machine-readable provenance, and engineers need citations to debug retrieval and generation failures.

Warning: Citation Does Not Guarantee Faithfulness

A model can generate a citation to a real source while stating something the source does not actually say. This is citation hallucination, one of the most dangerous failure modes because it creates false confidence. Every attribution system needs a verification layer, not just a generation layer.

2. Prompt-Level Attribution Strategies

The simplest approach to attribution is instructing the LLM to cite its sources within the generation prompt. This works surprisingly well with capable models but requires careful prompt design.

Inline Citation with Source IDs

def build_attributed_prompt(query, retrieved_chunks):
    """Build a prompt that instructs the LLM to cite sources inline."""
    context_block = ""
    for i, chunk in enumerate(retrieved_chunks):
        source_id = f"[{i+1}]"
        context_block += (
            f"Source {source_id}: {chunk['title']}\n"
            f"  URL: {chunk['url']}\n"
            f"  Content: {chunk['text']}\n\n"
        )

    system_prompt = """You are a research assistant. Answer the user's question
using ONLY the provided sources. Follow these citation rules strictly:

1. Every factual claim must have an inline citation like [1], [2], etc.
2. If multiple sources support a claim, cite all of them: [1][3].
3. If no source supports a claim, do not make it. Say "I could not find
   information about this in the provided sources."
4. At the end, list all cited sources with their titles and URLs.
5. Never cite a source for a claim it does not actually support."""

    return system_prompt, f"Sources:\n{context_block}\nQuestion: {query}"

# Example usage
chunks = [
    {"title": "Returns Policy", "url": "/docs/returns", "text": "Items may be returned within 30 days..."},
    {"title": "Warranty Guide", "url": "/docs/warranty", "text": "Electronics carry a 2-year warranty..."},
]
system, user_msg = build_attributed_prompt("What is the return window?", chunks)
# LLM output: "Items can be returned within 30 days of purchase [1].
# Electronics also carry a 2-year warranty [2].
#
# Sources:
# [1] Returns Policy - /docs/returns
# [2] Warranty Guide - /docs/warranty"
Code Fragment 20.9.1: Prompt template for inline citation. Source IDs are assigned before generation so the model can reference them unambiguously.
Fun Fact

In a 2024 study by Vectara, roughly 15% of RAG citations pointed to real sources but misrepresented what those sources actually said. The system was not lying outright; it was doing the AI equivalent of citing a reference in a term paper without reading past the abstract.

Structured Output with Citation Objects

For programmatic consumption, structured output formats are more reliable than parsing inline citations from free text:

from pydantic import BaseModel
from openai import OpenAI

class Citation(BaseModel):
    source_id: int
    quote: str  # Exact quote from the source that supports the claim

class AnswerStatement(BaseModel):
    claim: str
    citations: list[Citation]

class AttributedAnswer(BaseModel):
    statements: list[AnswerStatement]
    unsupported_aspects: list[str]  # Parts of the question with no source support

client = OpenAI()

def generate_attributed_answer(query, chunks):
    """Generate a structured answer with per-claim citations."""
    context = "\n".join(
        f"[Source {i+1}]: {c['text']}" for i, c in enumerate(chunks)
    )

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer the question using only the provided sources. "
                "For each statement, include the source ID and an exact "
                "quote from that source as evidence. List any aspects of "
                "the question that the sources do not address."
            )},
            {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {query}"},
        ],
        response_format=AttributedAnswer,
    )
    return response.choices[0].message.parsed
Code Fragment 20.9.2: Structured attribution using Pydantic models and OpenAI's structured output. Each claim carries a source ID and an exact supporting quote, enabling automated verification.

3. Post-Generation Citation Verification

Prompt-level attribution tells the model to cite sources, but does not guarantee accuracy. Verification checks whether each citation actually supports its associated claim.

NLI-Based Verification

Key Insight

Citation verification is a classification problem, not a generation problem. Rather than asking another LLM "is this citation correct?" (which introduces more hallucination risk), use a specialized NLI model trained specifically to detect logical relationships between text pairs. These models are smaller, faster, and more reliable for this task than general-purpose LLMs.

Natural Language Inference (NLI) models classify the relationship between a premise (the source text) and a hypothesis (the generated claim) as entailment, contradiction, or neutral. A valid citation should produce an entailment score above a threshold.

from transformers import pipeline

nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-large")

def verify_citations(statements, source_texts):
    """Verify that each citation's source actually supports its claim."""
    results = []
    for stmt in statements:
        claim = stmt.claim
        for cit in stmt.citations:
            source = source_texts[cit.source_id - 1]
            # NLI: does the source (premise) entail the claim (hypothesis)?
            # Pass premise/hypothesis as a text/text_pair input.
            result = nli([{"text": source, "text_pair": claim}])[0]
            label = result["label"]
            score = result["score"]
            results.append({
                "claim": claim,
                "source_id": cit.source_id,
                "nli_label": label,
                "nli_score": score,
                # Compare case-insensitively; label casing varies by model
                "verified": label.lower() == "entailment" and score > 0.7,
            })
    return results

# Flag unverified citations for human review or removal
Code Fragment 20.9.3: NLI-based citation verification. Claims with contradicted or neutral citations are flagged for removal or human review.

Quote Matching Verification

When citations include exact quotes (as in the structured output approach), a simpler verification checks whether the quote actually appears in the source document. Fuzzy string matching handles minor formatting differences:

from rapidfuzz import fuzz

def verify_quote(quote, source_text, threshold=85):
    """Check if a citation quote actually appears in the source."""
    # Try exact substring match first
    if quote.lower() in source_text.lower():
        return {"match": "exact", "score": 100}

    # Fall back to fuzzy matching for minor variations:
    # slide a quote-length window across the source
    best_score = 0
    quote_len = len(quote)
    for i in range(len(source_text) - quote_len + 1):
        window = source_text[i:i + quote_len]
        score = fuzz.ratio(quote.lower(), window.lower())
        best_score = max(best_score, score)
        if best_score >= threshold:
            return {"match": "fuzzy", "score": best_score}

    return {"match": "none", "score": best_score}
Code Fragment 20.9.4: Quote verification using fuzzy string matching. Catches cases where the LLM slightly paraphrases or reformats source text in its citations.

4. End-to-End Attribution Architectures

Production attribution systems combine multiple strategies into a pipeline:

  1. Retrieval with provenance metadata: Every chunk carries its source document ID, URL, page number, paragraph index, and ingestion timestamp. This metadata propagates through the entire pipeline.
  2. Attributed generation: The LLM generates answers with inline citations using structured output (Section 2 above).
  3. Citation verification: An NLI model or quote-matching pipeline verifies each citation. Unverified citations are either removed or flagged.
  4. Citation enrichment: Verified citations are enriched with display metadata (document title, section heading, page number, deep link URL) for the frontend.
  5. Feedback collection: Users can flag incorrect citations, creating a feedback loop for improving retrieval and generation quality.
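The five stages can be sketched as a single orchestration function. The sketch below is illustrative, not production code: `generate` and `verify` stand in for the attributed-generation and citation-verification components built earlier in this section, and the chunk keys (`title`, `url`, `text`, `source_id`) follow the format used in Code Fragment 20.9.1.

```python
def run_attribution_pipeline(query, chunks, generate, verify):
    """Orchestrate attributed generation, verification, and enrichment."""
    # Stage 2: attributed generation (chunks carry provenance from stage 1)
    statements = generate(query, chunks)

    enriched = []
    for stmt in statements:
        kept = []
        for cit in stmt["citations"]:
            source = chunks[cit["source_id"] - 1]
            # Stage 3: drop any citation the verifier rejects
            if not verify(stmt["claim"], source["text"]):
                continue
            # Stage 4: enrich verified citations with display metadata
            kept.append({**cit, "title": source["title"], "url": source["url"]})
        # Stage 5: statements left without a verified citation are flagged
        enriched.append({"claim": stmt["claim"], "citations": kept,
                         "needs_review": not kept})
    return enriched
```

With stub components, a claim backed by a verified citation comes back enriched with display metadata, while a claim whose citations all fail verification is flagged for review.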

Granularity Levels

Citation granularity is a critical design decision:

| Granularity | Example Citation | Pros | Cons |
|---|---|---|---|
| Document-level | "Source: Annual Report 2024" | Simple to implement; always available | User must search a large document to verify |
| Page/section-level | "Annual Report 2024, Section 3.2, p. 47" | Reasonable precision; easy to navigate | Requires page/section metadata in chunks |
| Paragraph-level | "Annual Report 2024, p. 47, para. 3" | High precision; fast to verify | Requires fine-grained chunking and indexing |
| Sentence-level + quote | "The revenue grew 15% YoY" (AR 2024, p. 47) | Maximally verifiable; builds strong trust | Highest implementation complexity; quote matching needed |
Table 20.9.1: Citation granularity levels. Finer granularity increases user trust and verifiability at the cost of implementation complexity.
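A small helper can render the granularity levels in Table 20.9.1 from the provenance metadata attached at ingestion. This is a hypothetical sketch; the metadata keys (`document`, `section`, `page`, `paragraph`, `quote`) are illustrative assumptions, not a fixed schema.

```python
def format_citation(meta, granularity="section"):
    """Render a display citation at the requested granularity level."""
    doc = meta["document"]
    if granularity == "document":
        return f"Source: {doc}"
    if granularity == "section":
        return f"{doc}, Section {meta['section']}, p. {meta['page']}"
    if granularity == "paragraph":
        return f"{doc}, p. {meta['page']}, para. {meta['paragraph']}"
    if granularity == "quote":
        return f'"{meta["quote"]}" ({doc}, p. {meta["page"]})'
    raise ValueError(f"unknown granularity: {granularity}")

meta = {"document": "Annual Report 2024", "section": "3.2",
        "page": 47, "paragraph": 3, "quote": "revenue grew 15% YoY"}
format_citation(meta, "paragraph")  # "Annual Report 2024, p. 47, para. 3"
```

Because every level reads from the same metadata, a system can store fine-grained provenance once and let the frontend choose how much detail to display.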

ALCE: The Attribution Benchmark

The Automatic LLM Citation Evaluation (ALCE) benchmark (Gao et al., 2023) provides standardized evaluation for attribution quality. It measures three dimensions: fluency of the generated answer, correctness of its content, and citation quality, decomposed into citation precision and citation recall.

ALCE uses NLI models as automated judges. A citation is considered correct if an NLI model classifies the source passage as entailing the associated claim with high confidence. This automated evaluation enables rapid iteration on attribution prompts and architectures without requiring expensive human annotation.
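Once per-citation entailment verdicts are available (for example from an NLI verifier like the one in Code Fragment 20.9.3), ALCE-style citation metrics reduce to simple counting. The sketch below is a simplified version: full ALCE also checks whether the concatenation of all cited passages jointly entails a statement, which this per-citation view omits.

```python
def citation_scores(statements):
    """Compute simplified ALCE-style citation metrics.

    Each statement is {"citations": [{"entails": bool}, ...]}.
    Citation recall: fraction of statements backed by at least one
    entailing citation. Citation precision: fraction of all emitted
    citations that entail their statement.
    """
    supported = sum(
        1 for s in statements if any(c["entails"] for c in s["citations"])
    )
    all_cits = [c for s in statements for c in s["citations"]]
    correct = sum(1 for c in all_cits if c["entails"])
    return {
        "citation_recall": supported / len(statements) if statements else 0.0,
        "citation_precision": correct / len(all_cits) if all_cits else 0.0,
    }
```

High recall with low precision indicates over-citation (spraying citations at every claim); the reverse indicates cautious but incomplete attribution.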

5. Common Failure Modes

Several recurring failure modes show up in attributed RAG systems:

- Citation hallucination: a citation points to a real source that does not actually support the claim (Section 1).
- Source misrepresentation: the claim loosely paraphrases the source but distorts its meaning, as in the Vectara finding above.
- Paraphrase drift: the "exact quote" supplied as evidence is a reworded version of the source text, so strict quote matching fails.
- Uncited claims: factual statements appear with no citation at all and silently bypass verification.

6. Integration with Hallucination Detection

Attribution and hallucination detection are complementary. A claim with no valid citation is a candidate hallucination. A claim with a verified citation but low semantic similarity to the source may be a subtle hallucination. Production systems typically combine both:

  1. Generate answer with citations (this section)
  2. Verify citations via NLI (this section)
  3. Run hallucination detection on uncited claims (Section 32.2)
  4. Score overall answer faithfulness using evaluation frameworks (RAGAS, DeepEval)
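A minimal triage function, assuming each claim has already been annotated with a `verified` flag by the citation-verification step, might route claims as follows:

```python
def triage_claims(claims):
    """Route each claim to the appropriate downstream check.

    Each claim is {"text": str, "citations": list, "verified": bool}.
    """
    accepted, flagged, uncited = [], [], []
    for claim in claims:
        if not claim["citations"]:
            # No citation at all: candidate for hallucination detection
            uncited.append(claim)
        elif claim["verified"]:
            # Citation passed verification: accept as-is
            accepted.append(claim)
        else:
            # Citation failed verification: candidate citation hallucination
            flagged.append(claim)
    return {"accepted": accepted, "flagged": flagged, "uncited": uncited}
```

The `uncited` bucket feeds the hallucination detectors from Section 32.2, while the `flagged` bucket is the citation-hallucination case discussed in Section 1.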
Self-Check
What is citation hallucination, and why is it particularly dangerous?

Citation hallucination occurs when the model generates a citation to a real source document but the claim is not actually supported by that document. It is dangerous because it creates false confidence: users see a citation and assume the claim is verified, when in fact the model fabricated the association. This is worse than no citation at all, because it actively misleads the user.

Why use structured output for citations rather than parsing inline markers from free text?

Structured output (e.g., Pydantic models with JSON schema) guarantees a parseable format with explicit source IDs and supporting quotes. Free-text inline citations like [1] can be inconsistently formatted, hard to parse reliably, and do not enforce that the model provides supporting evidence. Structured output also makes automated verification straightforward since each citation object can be independently checked.

How do NLI models help verify citations?

NLI models classify the relationship between a premise (source passage) and hypothesis (generated claim) as entailment, contradiction, or neutral. If the source entails the claim, the citation is valid. If the relationship is contradiction or neutral, the citation does not actually support the claim. This provides automated, scalable verification without human review.

Key Takeaways

- Attribution turns RAG from a "trust me" black box into a verifiable system: every claim should trace to a specific source.
- Prompt-level citation instructions are necessary but not sufficient; every attribution system needs a verification layer (NLI entailment or quote matching), not just a generation layer.
- Structured citation objects with source IDs and exact quotes are far easier to verify programmatically than inline markers parsed from free text.
- Citation granularity is a design tradeoff: finer granularity builds trust and verifiability at the cost of implementation complexity.
Self-Check
Q1: What is the difference between inline citation and structured citation objects in RAG output?

Inline citations embed source references directly in the generated text (e.g., [1], [2]), while structured citation objects return a separate data structure mapping each claim to its source document, passage, and confidence score, enabling programmatic verification.

Q2: Why is NLI-based verification more robust than simple quote matching for attribution checking?

NLI (Natural Language Inference) models can detect semantic entailment even when the generated text paraphrases the source rather than quoting it verbatim. Quote matching fails when the model rephrases information, which is the common case.

Q3: What does the ALCE benchmark measure and why does it matter for RAG systems?

ALCE (Automatic LLM Citation Evaluation) measures how well language models attribute their outputs to source documents. It matters because it provides a standardized way to compare attribution quality across different RAG implementations.

Research Frontier

Fine-grained attribution research is exploring token-level and span-level source linking, where each phrase in a generated answer traces back to a specific passage and character offset in the source. Multi-document attribution extends citation to claims that synthesize information from multiple sources, requiring the system to cite all contributing documents. Self-attributed generation trains models to produce citations as part of their generation process rather than as a post-hoc verification step, improving both accuracy and efficiency. Research into attribution for chain-of-thought reasoning aims to verify not just the final answer but each intermediate reasoning step.

What Comes Next

This concludes the RAG chapter. In Chapter 21: Building Conversational AI Systems, we apply the retrieval and generation techniques covered throughout this chapter to build complete conversational systems with memory, context management, and multi-turn dialogue.

References & Further Reading

Gao, T. et al. (2023). "Enabling Large Language Models to Generate Text with Citations." EMNLP.

Introduces the ALCE benchmark for evaluating LLM citation quality. Proposes automated evaluation using NLI models and establishes citation precision, recall, and fluency metrics. The standard reference for attribution evaluation.


Bohnet, B. et al. (2023). "Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models."

Defines the Attributable to Identified Sources (AIS) framework for evaluating whether LLM statements are supported by cited sources. Provides a rigorous formalization of attribution quality.


Rashkin, H. et al. (2023). "Measuring Attribution in Natural Language Generation Models." Computational Linguistics.

Comprehensive study of attribution measurement methods, comparing NLI-based, question-generation, and human evaluation approaches. Essential for understanding the tradeoffs between automated and human attribution assessment.


Liu, N. et al. (2023). "HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution."

A dataset of information-seeking queries with human-annotated attributions, useful for training and evaluating attribution systems.
