Part V: Retrieval and Conversation
Chapter 20: Retrieval-Augmented Generation

RAG with Knowledge Graphs

The universe is not made of atoms. It is made of stories. And the best stories are the ones where you can trace every connection.

RAG RAG, Graph-Obsessed AI Agent
Big Picture

Vector search finds documents that are semantically similar, but it cannot reason about relationships between entities. Knowledge graphs encode structured relationships (entities, relations, triples) that enable multi-hop reasoning, entity disambiguation, and relationship-aware retrieval. GraphRAG combines the structured reasoning of knowledge graphs with the generative power of LLMs, unlocking capabilities like "find all companies that partner with competitors of our top customer" that are impossible with vector search alone. This section covers knowledge graph fundamentals, graph embedding methods, and the GraphRAG paradigm. The vector search limitations that motivate GraphRAG were introduced in Section 19.2.

Prerequisites

Knowledge graph RAG builds on the vector retrieval foundations from Section 20.1 and the advanced retrieval strategies in Section 20.2. Understanding entity relationships and graph traversal will help, though the section introduces these concepts as needed. The structured knowledge representation here complements the unstructured document processing covered in Section 19.4.

Islands connected by bridges representing entities linked by relationships in a knowledge graph
Figure 20.3.1: Knowledge graphs turn your documents into connected islands of facts. Each bridge is a relationship, and RAG can traverse them to find answers that span multiple hops.
A subway map where stations are entities and lines are relationships, representing knowledge graph traversal
Figure 20.3.2: Navigating a knowledge graph is like riding the subway: hop between entity stations along relationship lines to reach your destination answer.

1. Knowledge Graph Fundamentals

A knowledge graph (KG) is a structured representation of real-world entities and the relationships between them. Unlike documents stored as unstructured text, knowledge graphs encode information as a network of nodes (entities) connected by typed edges (relations), complementing the dense vector representations used in standard semantic search. This structure enables precise queries about relationships, paths, and patterns that would require complex reasoning over unstructured text.

Key Insight

Knowledge graphs formalize a representational strategy that philosophers and logicians have explored since Aristotle's categorical syllogisms: encoding knowledge as structured relationships between entities. The modern triple format (subject, predicate, object) is a direct descendant of the Resource Description Framework (RDF), which itself draws on the predicate logic of Frege and Russell. The power of this representation is that it supports compositional reasoning: if you know (Einstein, bornIn, Ulm) and (Ulm, locatedIn, Germany), you can infer (Einstein, bornIn, Germany) by traversing the graph. This is the same principle of transitivity that underlies deductive logic. GraphRAG's advantage over vector-based retrieval for multi-hop questions stems from this compositionality: vector similarity can find documents about Einstein or about Germany, but it cannot perform the relational reasoning needed to connect them through intermediate entities. The knowledge graph provides the relational "backbone" that transforms associative retrieval into something closer to logical inference.

1.1 Entities, Relations, and Triples

The fundamental unit of a knowledge graph is the triple, expressed as (subject, predicate, object) or equivalently (head, relation, tail). For example, (Albert Einstein, bornIn, Ulm) encodes a single fact. A knowledge graph is simply a collection of such triples, forming a directed labeled graph where entities are nodes and relations are edge labels.
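To make the triple representation concrete, here is a minimal in-memory sketch. The entity and relation names follow the running example; the tuple-based store and the match helper are illustrative stand-ins for a real graph database.

```python
# Toy triple store built from the running example.
triples = [
    ("Albert Einstein", "bornIn", "Ulm"),
    ("Ulm", "locatedIn", "Germany"),
    ("Albert Einstein", "developed", "General Relativity"),
]

def match(pattern, store):
    """Return triples matching a (head, relation, tail) pattern; None is a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Single-hop lookup: where was Einstein born?
born = match(("Albert Einstein", "bornIn", None), triples)

# Two-hop composition: bornIn followed by locatedIn.
hops = [(t1[0], t2[2])
        for t1 in match((None, "bornIn", None), triples)
        for t2 in match((t1[2], "locatedIn", None), triples)]
```

Matching patterns with wildcards is the essence of triple-pattern querying; SPARQL and Cypher generalize the same idea with variables and path expressions.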

Fun Fact

Google's Knowledge Graph, launched in 2012, contained 570 million entities and 18 billion facts. Wikidata now has over 100 million items. If you ever feel overwhelmed organizing your Notion workspace, just imagine maintaining a knowledge graph where "Douglas Adams" needs edges to "author," "42," and "the answer to life, the universe, and everything."

A detective's evidence board with photos, strings, and pins connecting clues, representing GraphRAG's entity linking
Figure 20.3.3: GraphRAG turns your retrieval system into a detective. Pin the entities to the board, connect them with strings, and follow the relationships to crack the case.
Figure 20.3.4 shows how a knowledge graph encodes entities and relationships.

A small knowledge graph linking Albert Einstein to Ulm (bornIn), General Relativity (developed), Physics (field), the Nobel Prize 1921 (wonAward), and ETH Zurich (studiedAt)
Figure 20.3.4: A knowledge graph encodes entities as nodes and relationships as labeled directed edges, enabling structured queries and multi-hop reasoning.

1.2 RDF vs. Property Graphs

Tip

If you are starting a new knowledge graph project for RAG, choose Neo4j with property graphs unless you have a specific reason to use RDF (such as integration with Wikidata or compliance with Semantic Web standards). Property graphs are more intuitive for developers, Cypher is easier to learn than SPARQL, and the Neo4j ecosystem has the best LLM integration tooling available today.

Two main formalisms exist for representing knowledge graphs. RDF (Resource Description Framework) is a W3C standard that represents all knowledge as triples and uses URIs to identify entities and relations. RDF is queried using SPARQL and is the foundation of the Semantic Web (Wikidata, DBpedia). Property graphs allow nodes and edges to carry arbitrary key-value properties, offering greater flexibility. Property graphs are queried using Cypher (Neo4j) or Gremlin and are more commonly used in industry applications.

Feature Comparison
Feature | RDF | Property Graphs
Data model | Subject-Predicate-Object triples | Nodes and edges with properties
Query language | SPARQL | Cypher, Gremlin
Edge properties | Requires reification (complex) | Native support
Standards | W3C standard, widely interoperable | No universal standard
Typical databases | Blazegraph, Virtuoso, Stardog | Neo4j, Amazon Neptune, TigerGraph
Best for | Linked data, semantic web, ontologies | Application data, recommendations
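The data-model difference can be sketched in a few lines of Python. The example.org namespace and the property values are made up for illustration:

```python
fact = {"head": "Albert Einstein", "relation": "bornIn", "tail": "Ulm"}

# RDF: everything is a triple of URIs (shown in N-Triples-style syntax).
def to_ntriple(f, ns="http://example.org/"):
    def enc(s):
        return "<" + ns + s.replace(" ", "_") + ">"
    return f'{enc(f["head"])} {enc(f["relation"])} {enc(f["tail"])} .'

rdf_line = to_ntriple(fact)

# Property graph: nodes and edges carry arbitrary key-value properties,
# so edge metadata (provenance, confidence) needs no reification.
node = {"labels": ["Entity"], "properties": {"name": "Albert Einstein", "born": 1879}}
edge = {"type": "BORN_IN", "properties": {"source": "Wikipedia", "confidence": 0.98}}
```

Attaching a confidence score to an edge is a one-liner in the property-graph model; expressing the same thing in RDF requires reifying the triple into several additional triples.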

2. Building Knowledge Graphs with LLMs

Traditionally, knowledge graph construction required extensive manual curation or specialized NLP pipelines for entity extraction and relation classification. LLMs have dramatically simplified this process by extracting structured triples directly from unstructured text through carefully designed prompts. Code Fragment 20.3.1 below puts this into practice.

# Extract knowledge graph triples from unstructured text with an LLM.

from openai import OpenAI
import json

client = OpenAI()

def extract_triples(text, entity_types=None):
    """Extract knowledge graph triples from text using an LLM."""
    type_hint = ""
    if entity_types:
        type_hint = f"\nFocus on these entity types: {', '.join(entity_types)}"

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""Extract knowledge graph triples from the text.
Return a JSON object with a "triples" key holding an array of objects
with keys: head, relation, tail.
Use concise, canonical entity names.
Use lowercase relation names with underscores.{type_hint}

Example output:
{{"triples": [
  {{"head": "OpenAI", "relation": "developed", "tail": "GPT-4"}},
  {{"head": "GPT-4", "relation": "is_a", "tail": "language model"}}
]}}"""
        }, {
            "role": "user",
            "content": text
        }],
        response_format={"type": "json_object"},
        temperature=0.0
    )

    result = json.loads(response.choices[0].message.content)
    return result.get("triples", result)
Code Fragment 20.3.1: The extract_triples function uses an LLM to parse unstructured text into subject-predicate-object triples, prompting for JSON output and returning the parsed list of head/relation/tail dictionaries.

2.1 Storing Triples in Neo4j

This snippet inserts knowledge-graph triples into a Neo4j database using Cypher queries.

# Store and query knowledge graph triples in Neo4j via Cypher.

from neo4j import GraphDatabase

class KnowledgeGraphStore:
    """Store and query knowledge graph triples in Neo4j."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def add_triples(self, triples):
        """Insert triples as nodes and relationships."""
        with self.driver.session() as session:
            for triple in triples:
                session.run("""
                    MERGE (h:Entity {name: $head})
                    MERGE (t:Entity {name: $tail})
                    MERGE (h)-[r:RELATES {type: $relation}]->(t)
                """, head=triple["head"],
                     tail=triple["tail"],
                     relation=triple["relation"])

    def find_neighbors(self, entity, max_hops=2):
        """Find all entities within N hops of the given entity."""
        with self.driver.session() as session:
            # Variable-length patterns cannot be parameterized in Cypher,
            # so max_hops is interpolated into the query string.
            result = session.run(f"""
                MATCH path = (e:Entity {{name: $entity}})
                             -[*1..{max_hops}]-(neighbor)
                RETURN path
                LIMIT 50
            """, entity=entity)
            return [record["path"] for record in result]

    def query_relationship(self, head, tail):
        """Find the shortest relationship path between two entities."""
        with self.driver.session() as session:
            result = session.run("""
                MATCH path = shortestPath(
                    (h:Entity {name: $head})-[*]-(t:Entity {name: $tail})
                )
                RETURN path
            """, head=head, tail=tail)
            return [record["path"] for record in result]
Code Fragment 20.3.2: The KnowledgeGraphStore class wraps Neo4j operations for storing triples and querying neighborhoods. The find_neighbors method retrieves entities within a bounded number of hops, while query_relationship discovers the shortest path between two entities.
Note

One of the hardest challenges in LLM-powered KG construction is entity resolution: determining that "Einstein," "Albert Einstein," and "A. Einstein" all refer to the same entity. Without careful entity resolution, the graph becomes fragmented with duplicate nodes. Common approaches include embedding-based deduplication (comparing entity name embeddings), LLM-based coreference resolution, and linking to external KGs like Wikidata for canonical entity identifiers.
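As a rough illustration of the string-similarity approach, the sketch below greedily merges names whose similarity exceeds a threshold. The 0.65 threshold and the dot-stripping normalization are arbitrary choices for this example; production systems use the embedding-based or Wikidata-linking approaches described above.

```python
from difflib import SequenceMatcher

def resolve_entities(names, threshold=0.65):
    """Greedy dedup: map each name to the first sufficiently similar canonical name."""
    def norm(s):
        return s.lower().replace(".", "")
    canonical, mapping = [], {}
    for name in names:
        best = None
        for c in canonical:
            if SequenceMatcher(None, norm(name), norm(c)).ratio() >= threshold:
                best = c
                break
        if best is None:
            canonical.append(name)
            mapping[name] = name
        else:
            mapping[name] = best
    return mapping

m = resolve_entities(["Albert Einstein", "Einstein", "A. Einstein", "Niels Bohr"])
```

The three Einstein variants collapse onto one canonical node while "Niels Bohr" stays separate; a greedy first-match policy like this is order-dependent, which is one reason real pipelines prefer clustering over pairwise matching.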

3. Graph Embeddings

Graph embeddings map entities and relations from a knowledge graph into continuous vector spaces, enabling similarity-based retrieval, link prediction (predicting missing triples), and integration with dense retrieval systems. The key challenge is preserving the structural properties of the graph in the embedding space.

3.1 TransE

TransE (Bordes et al., 2013) is the foundational graph embedding model. It represents relations as translations in embedding space: if (h, r, t) is a true triple, then the embedding of the head entity plus the embedding of the relation should approximately equal the embedding of the tail entity: h + r ≈ t. The model is trained by minimizing a margin-based loss that pushes correct triples closer than corrupted triples.

3.2 DistMult

DistMult (Yang et al., 2015) models relations as diagonal matrices (represented as vectors) and scores triples using a bilinear product: score(h, r, t) = h · diag(r) · t. This is equivalent to an element-wise product followed by a sum. DistMult is simple and effective but cannot model asymmetric relations (it gives the same score to (h, r, t) and (t, r, h)), which limits its applicability. Code Fragment 20.3.3 below puts this into practice.

# TransE knowledge graph embedding model in PyTorch.

import torch
import torch.nn as nn

class TransE(nn.Module):
    """TransE knowledge graph embedding model."""

    def __init__(self, num_entities, num_relations, dim=128):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)

        # Initialize with a uniform distribution
        nn.init.uniform_(self.entity_emb.weight, -6/dim**0.5, 6/dim**0.5)
        nn.init.uniform_(self.relation_emb.weight, -6/dim**0.5, 6/dim**0.5)

    def forward(self, heads, relations, tails):
        """Score triples: lower distance = more plausible."""
        h = self.entity_emb(heads)
        r = self.relation_emb(relations)
        t = self.entity_emb(tails)

        # TransE scoring: ||h + r - t||
        return torch.norm(h + r - t, p=2, dim=-1)

    def margin_loss(self, pos_score, neg_score, margin=1.0):
        """Margin-based ranking loss: push positives below negatives."""
        return torch.relu(margin + pos_score - neg_score).mean()
Code Fragment 20.3.3: The TransE model learns entity and relation embeddings such that h + r ≈ t for true triples, scoring plausibility by L2 distance and training with a margin-based ranking loss against corrupted triples.
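The symmetry limitation of DistMult discussed in Section 3.2 can be checked numerically. This sketch uses random numpy vectors in place of trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 4))  # toy 4-dimensional embeddings

def transe_dist(a, b):
    """TransE distance ||a + r - b||: direction-sensitive."""
    return np.linalg.norm(a + r - b)

def distmult_score(a, b):
    """DistMult bilinear score sum(a * r * b): symmetric in a and b."""
    return float(np.sum(a * r * b))

# DistMult assigns (h, r, t) and (t, r, h) the same score; TransE does not.
sym_gap_transe = abs(transe_dist(h, t) - transe_dist(t, h))
sym_gap_distmult = abs(distmult_score(h, t) - distmult_score(t, h))
```

The DistMult gap is zero up to floating-point rounding for any vectors, while the TransE gap is nonzero unless r happens to be orthogonal to h - t.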

4. GraphRAG

GraphRAG (Microsoft Research, 2024) represents a paradigm shift in how knowledge graphs are used for RAG. Instead of querying a pre-existing knowledge graph, GraphRAG builds a knowledge graph from the document corpus itself, then uses community detection to create hierarchical summaries of entity clusters. This enables both local retrieval (finding specific facts) and global retrieval (answering questions that require synthesizing information across the entire corpus). Figure 20.3.5 depicts the full GraphRAG pipeline from document ingestion to query routing.

4.1 The GraphRAG Pipeline

  1. Entity and relation extraction: An LLM extracts entities and relationships from each document chunk, building a corpus-wide knowledge graph.
  2. Community detection: The Leiden algorithm (a graph clustering method that groups densely connected nodes into communities, similar to finding friend groups in a social network) partitions the graph into communities of closely connected entities, creating a hierarchy from fine-grained clusters to broad themes.
  3. Community summarization: An LLM generates a summary for each community, capturing the key entities, relationships, and themes within that cluster.
  4. Query routing: At query time, the system determines whether to use local search (entity-based retrieval for specific questions) or global search (community summary-based retrieval for broad questions).
A GraphRAG pipeline diagram: (1) an LLM extracts entities and relations from text, (2) triples are merged into a unified knowledge graph, (3) the Leiden algorithm finds entity clusters, (4) an LLM summarizes each community cluster; a query router then sends specific questions such as "What did Einstein discover?" to local search over entity neighbors and related chunks, and broad questions such as "What are the main themes?" to global, map-reduce search over community summaries
Figure 20.3.5: GraphRAG builds a knowledge graph from documents, detects communities, generates summaries, then routes queries to local (entity) or global (community) search.
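The routing step can be approximated with a simple heuristic. The cue-word list below is an assumption for illustration; production GraphRAG uses an LLM or a trained classifier to make this decision.

```python
# Naive keyword heuristic for local-vs-global query routing.
GLOBAL_CUES = {"themes", "overall", "main", "summarize", "across", "trends"}

def route_query(query):
    """Route to 'global' (community summaries) or 'local' (entity search)."""
    words = set(query.lower().replace("?", "").split())
    return "global" if words & GLOBAL_CUES else "local"

route_query("What are the main themes in this dataset?")  # global
route_query("What did Einstein discover?")                # local
```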
Key Insight

GraphRAG's most powerful capability is answering global questions that require synthesizing information across an entire corpus. Questions like "What are the major themes in this dataset?" or "How do the research areas in this collection relate to each other?" cannot be answered by retrieving a few chunks. GraphRAG's community summaries pre-compute these corpus-level insights at indexing time, making global queries feasible at serving time through a map-reduce approach over community summaries.
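The community machinery can be sketched in pure Python. One caveat on the stand-in: Leiden finds dense sub-communities inside a connected graph, while the connected-components pass below only separates disjoint pieces, so this is purely illustrative (the toy edge list is made up).

```python
from collections import defaultdict

edges = [("Einstein", "Ulm"), ("Einstein", "Physics"), ("Bohr", "Physics"),
         ("Neo4j", "Cypher"), ("Neo4j", "GraphDB")]

def communities(edge_list):
    """Connected components as a minimal stand-in for Leiden clustering."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = communities(edges)  # each cluster would then get an LLM summary
```

In full GraphRAG, each detected community (here, a physics cluster and a graph-database cluster) is handed to an LLM for summarization, and those summaries become the units of global search.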

5. Hybrid KG + Vector Retrieval

The most effective production systems combine knowledge graph traversal with vector search. The knowledge graph provides structured multi-hop reasoning and entity-aware retrieval, while vector search provides semantic similarity for unstructured content that the KG does not cover. Code Fragment 20.3.4 below puts this into practice.

# Combine KG traversal with vector search for hybrid retrieval.
# Assumes extract_entities_from_query and format_paths helpers are
# defined elsewhere in the pipeline.

def hybrid_kg_vector_retrieve(query, kg_store, vector_collection, k=5):
    """Combine KG traversal with vector search."""

    # Step 1: Extract entities from the query
    entities = extract_entities_from_query(query)

    # Step 2: KG retrieval (structured paths)
    kg_context = []
    for entity in entities:
        neighbors = kg_store.find_neighbors(entity, max_hops=2)
        kg_context.extend(format_paths(neighbors))

    # Step 3: Vector retrieval (semantic similarity)
    vector_results = vector_collection.query(
        query_texts=[query],
        n_results=k
    )
    vector_context = vector_results["documents"][0]

    # Step 4: Combine both context types
    kg_text = "\n".join(kg_context)
    doc_text = "\n".join(vector_context)
    return (
        "Structured Knowledge (from Knowledge Graph):\n"
        f"{kg_text}\n\n"
        "Relevant Documents (from semantic search):\n"
        f"{doc_text}"
    )
Code Fragment 20.3.4: Combining knowledge graph lookups with vector retrieval: entities are extracted from the query, matched to graph nodes, and their triples are merged with vector search results.
Warning

GraphRAG's indexing phase is significantly more expensive than standard RAG because it requires LLM calls for every chunk (entity extraction), every entity pair (relation validation), and every community (summary generation). For a corpus of 10,000 documents, indexing can cost $50 to $500 in LLM API calls depending on the model and chunk count. However, this is a one-time cost, and the resulting graph with community summaries enables query capabilities that are impossible with vector-only RAG.
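A back-of-envelope cost model makes the warning concrete. Every number below (chunks per document, communities per thousand chunks, tokens per call, price per thousand tokens) is an assumption to be replaced with your corpus statistics and model pricing.

```python
def graphrag_index_cost(num_docs, chunks_per_doc=5, communities_per_1k_chunks=50,
                        tokens_per_call=1500, usd_per_1k_tokens=0.003):
    """Estimate one-time GraphRAG indexing cost in USD."""
    chunks = num_docs * chunks_per_doc
    extraction_calls = chunks                                   # one LLM call per chunk
    summary_calls = chunks * communities_per_1k_chunks // 1000  # one call per community
    total_tokens = (extraction_calls + summary_calls) * tokens_per_call
    return total_tokens / 1000 * usd_per_1k_tokens

cost = graphrag_index_cost(10_000)  # about $236 under these assumptions
```

At these assumed rates a 10,000-document corpus lands inside the $50 to $500 range quoted above; a cheaper extraction model pushes the estimate toward the low end.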

6. When to Use Knowledge Graph RAG

Use Case | Vector RAG | KG RAG | GraphRAG
Simple factual Q&A | Excellent | Good | Overkill
Multi-hop reasoning | Poor | Excellent | Excellent
Corpus-level themes | Poor | Moderate | Excellent
Entity disambiguation | Poor | Excellent | Good
Setup complexity | Low | High | Very High
Indexing cost | Low (embedding only) | Medium (KG construction) | High (LLM-intensive)
Self-Check
Q1: What is a knowledge graph triple, and how does it differ from a vector embedding?
Show Answer
A triple is a structured fact expressed as (subject, predicate, object), for example (Einstein, bornIn, Ulm). It encodes an explicit, typed relationship between two entities. A vector embedding is a continuous numerical representation of text in a high-dimensional space that captures semantic similarity but does not encode explicit relationships. Triples are discrete and interpretable; embeddings are continuous and opaque.
Q2: How does TransE represent relations in embedding space?
Show Answer
TransE represents relations as translation vectors in embedding space. For a true triple (h, r, t), the model learns embeddings such that h + r is approximately equal to t. The training objective minimizes the distance ||h + r - t|| for correct triples while maximizing it for corrupted (false) triples using a margin-based ranking loss.
Q3: What is the key innovation of GraphRAG compared to traditional KG-based RAG?
Show Answer
GraphRAG builds a knowledge graph from the document corpus itself (rather than relying on a pre-existing KG), then applies community detection to find clusters of related entities, and generates LLM summaries for each community. This enables "global search" over community summaries to answer corpus-level questions (e.g., "What are the main themes?") that are impossible with entity-level or chunk-level retrieval alone.
Q4: What is the main limitation of DistMult compared to TransE?
Show Answer
DistMult cannot model asymmetric relations. Because it scores triples using a symmetric bilinear product (element-wise multiplication followed by sum), it assigns the same score to (h, r, t) and (t, r, h). This means it cannot distinguish "A manages B" from "B manages A." TransE handles asymmetry naturally because h + r = t does not imply t + r = h.
Q5: Why is entity resolution particularly challenging in LLM-based KG construction?
Show Answer
LLMs extract entity mentions as surface strings, so the same entity may appear with different names, abbreviations, or spellings across documents (e.g., "Einstein," "Albert Einstein," "A. Einstein"). Without entity resolution, the graph becomes fragmented with duplicate nodes that should be merged. This is hard because it requires disambiguating entities with similar names (e.g., "Jordan" the country vs. "Jordan" the person) and linking variant spellings to canonical forms.
Tip: Always Show Retrieved Sources to Users

Display the retrieved chunks or source documents alongside the generated answer. This builds user trust, enables fact-checking, and provides an immediate feedback signal for when retrieval goes wrong.

Real-World Scenario: Adding a Knowledge Graph to a Pharmaceutical RAG System

Who: A data scientist at a pharmaceutical company supporting drug interaction research

Situation: Researchers needed to answer multi-hop questions like "Which drugs that inhibit CYP3A4 are also contraindicated with warfarin?" across 80,000 clinical documents and FDA drug labels.

Problem: Vector-only RAG retrieved documents about CYP3A4 inhibitors or warfarin contraindications individually, but could not reliably find the intersection. Multi-hop reasoning across disconnected chunks produced hallucinated drug names.

Dilemma: Building a comprehensive drug interaction knowledge graph from scratch would take months of expert curation. Using an LLM to auto-extract triples was faster but introduced extraction errors (15% false positive rate on relation extraction).

Decision: They combined a curated seed graph from DrugBank (200K triples covering known interactions) with LLM-extracted triples from internal documents, using a confidence threshold of 0.85 and pharmacist review for borderline extractions.

How: Queries were classified as single-hop (vector search) or multi-hop (graph traversal plus vector search). For multi-hop queries, the system traversed the KG to find entity paths, then retrieved supporting text chunks for each hop to ground the answer.

Result: Multi-hop question accuracy improved from 41% to 79%. Researchers could trace the reasoning chain through explicit graph edges, which was critical for regulatory compliance.

Lesson: Knowledge graphs excel at structured, multi-hop reasoning that vector search alone cannot reliably perform. Combining curated seed data with LLM extraction (plus human review) is a practical path to building domain KGs.

Research Frontier

LLM-constructed knowledge graphs are enabling automatic ontology discovery and entity linking from unstructured text at scale, reducing the manual effort traditionally required for graph construction. GraphRAG (Microsoft, 2024) introduces community-based summarization of graph clusters, enabling global queries that span the entire knowledge base rather than just local neighborhoods. Temporal knowledge graphs track how relationships evolve over time, addressing the staleness problem in static knowledge representations. Research into hybrid graph-vector retrieval is developing unified index structures that support both structured graph traversals and dense vector similarity search in a single query.

Exercises

These exercises cover knowledge graphs and GraphRAG. Coding exercises use networkx and LLM APIs.

Exercise 20.3.1: KG vs. vector search Conceptual

Give an example of a query that a knowledge graph can answer but vector search cannot, and vice versa.

Show Answer

KG wins: "Which companies are competitors of our largest supplier's parent company?" (requires relationship traversal). Vector search wins: "Find documents about sustainable manufacturing practices" (semantic similarity, no specific entities).

Exercise 20.3.2: Triple extraction Conceptual

Given the sentence "Apple acquired Beats Electronics in 2014 for $3 billion," list all the knowledge graph triples you would extract.

Show Answer

(Apple, acquired, Beats Electronics), (Apple, acquisition_date, 2014), (Apple, acquisition_price, $3 billion), (Beats Electronics, acquired_by, Apple).

Exercise 20.3.3: Multi-hop reasoning Conceptual

Explain what multi-hop reasoning is and why knowledge graphs enable it better than vector search. Provide an example query that requires 3 hops.

Show Answer

Multi-hop reasoning traverses multiple relationship edges. Example: "What university did the CEO of the company that acquired Instagram attend?" Hop 1: Instagram acquired_by Facebook. Hop 2: Facebook CEO is Mark Zuckerberg. Hop 3: Zuckerberg attended Harvard. Vector search cannot chain these relationships.

Exercise 20.3.4: GraphRAG communities Conceptual

In Microsoft's GraphRAG, what are "communities" and how do community summaries enable global queries that vector search struggles with?

Show Answer

Communities are clusters of densely connected entities detected by graph algorithms (e.g., Leiden). Each community gets an LLM-generated summary. Global queries like "What are the main themes across all documents?" are answered by aggregating community summaries, rather than retrieving individual chunks.

Exercise 20.3.5: Hybrid retrieval Conceptual

Design a hybrid pipeline that uses both vector search and knowledge graph traversal. How do you decide which retriever to invoke for a given query?

Show Answer

Use a query classifier: entity-centric queries (containing named entities and relationship words) go to the KG. Semantic/topical queries go to vector search. Complex queries can invoke both and merge results with source attribution.

Exercise 20.3.6: KG construction Conceptual

Use an LLM to extract entities and relationships from 10 paragraphs of text. Store them as a NetworkX graph and visualize the resulting knowledge graph.

Exercise 20.3.7: Graph querying Coding

Given a knowledge graph built from a company's Wikipedia page, implement a function that answers multi-hop queries like "Who founded the company that acquired X?"

Exercise 20.3.8: GraphRAG pipeline Coding

Implement a simplified GraphRAG pipeline: extract entities, build a graph, detect communities using the Leiden algorithm, generate community summaries with an LLM, and answer a global query using those summaries.

What Comes Next

In the next section, Section 20.4: Deep Research & Agentic RAG, we explore patterns where models actively plan and execute multi-step retrieval strategies.

References & Further Reading

Pan, S. et al. (2024). "Unifying Large Language Models and Knowledge Graphs: A Roadmap." IEEE TKDE.

A comprehensive roadmap for integrating LLMs with knowledge graphs. Covers synergies, challenges, and future directions. Essential reading for anyone combining structured and unstructured knowledge.

Paper

Edge, D. et al. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint.

Introduces Microsoft GraphRAG, which builds community-level summaries from knowledge graphs. Demonstrates superior performance on global questions over traditional RAG. Key paper for graph-enhanced retrieval.

Paper

Baek, J. et al. (2023). "Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering." arXiv preprint.

Shows how to prompt LLMs with knowledge graph information without fine-tuning. A practical approach for leveraging structured knowledge. Useful for practitioners working with existing KGs.

Paper

Neo4j Graph Database Documentation.

The leading graph database platform with native support for Cypher queries and graph algorithms. Widely used for knowledge graph storage in RAG systems. Recommended for teams building graph-backed retrieval.

Tool

Microsoft GraphRAG.

Open-source implementation of the GraphRAG approach for building knowledge graphs from documents. Includes community detection and summary generation. A practical starting point for graph-based RAG projects.

Tool

Wishart, D. et al. (2018). "DrugBank 5.0: A Major Update to the DrugBank Database." Nucleic Acids Research.

A widely-used biomedical knowledge graph demonstrating real-world KG applications. Provides a concrete example of domain-specific structured data. Relevant for healthcare and life sciences RAG use cases.

Paper