The typical RAG pipeline is simple: embed documents, find similar chunks, feed them to an LLM. It works surprisingly well. But it has a fundamental blind spot.
Ask it "Who worked with Marie Curie?" and it might retrieve chunks about radioactivity, Nobel prizes, Paris. Semantically related, sure. But the explicit fact — that Pierre Curie was her husband and research partner — might not surface because the embedding doesn't capture that relationship directly.
Vector search answers: "What text sounds like this query?"
It doesn't answer: "What facts connect to entities in this query?"
Graphs as Structured Memory
A knowledge graph is just a collection of facts in the form (subject, predicate, object):
(marie curie) --[discovered]--> (radium)
(marie curie) --[married to]--> (pierre curie)
(pierre curie) --[won]--> (nobel prize in physics)
Each subject and object becomes a node. Each predicate becomes a labeled edge. That's it.
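In code this maps directly onto a directed graph. Here is a minimal sketch using NetworkX (which the implementation described later also uses); the choice of a multigraph, so two entities can share several predicates, is my assumption, not something the post prescribes:

import networkx as nx

# A directed multigraph: multiple labeled edges can exist between the same pair of nodes
graph = nx.MultiDiGraph()

triples = [
    ("marie curie", "discovered", "radium"),
    ("marie curie", "married to", "pierre curie"),
    ("pierre curie", "won", "nobel prize in physics"),
]

for subject, predicate, obj in triples:
    # Nodes are created implicitly; the predicate is stored as edge data
    graph.add_edge(subject, obj, predicate=predicate)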
The insight: some information is better stored as explicit relationships than as embeddings. "Marie Curie was married to Pierre Curie" is a discrete fact. You either have it or you don't. Fuzzy similarity search is the wrong tool here.
Hybrid Retrieval
The solution is to use both:
Query → [Vector Search] → semantically similar chunks
→ [Graph Search] → explicitly connected facts
→ Merge → LLM → Answer
For the query "Who worked with Marie Curie?":
- Extract entities: "marie curie"
- Vector search finds chunks mentioning her work
- Graph search traverses from the "marie curie" node, finds direct connections to "pierre curie", "henri becquerel"
- LLM gets both: fuzzy context + hard facts
The graph acts as structured memory. The vector store acts as fuzzy memory. Together they're more complete than either alone.
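The graph half of that lookup is nothing exotic: it's a one-hop neighborhood query. A hedged sketch against the NetworkX graph built above, with graph_facts as a hypothetical helper:

def graph_facts(graph, entity: str) -> list[str]:
    """Collect facts one hop away from an entity, in both directions."""
    facts = []
    if entity not in graph:
        return facts
    for subject, obj, data in graph.out_edges(entity, data=True):
        facts.append(f"({subject}) --[{data['predicate']}]--> ({obj})")
    for subject, obj, data in graph.in_edges(entity, data=True):
        facts.append(f"({subject}) --[{data['predicate']}]--> ({obj})")
    return facts

graph_facts(graph, "marie curie")
# ['(marie curie) --[discovered]--> (radium)',
#  '(marie curie) --[married to]--> (pierre curie)']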
The Pipeline
Two phases.
Indexing:
Document → Chunk → Embed → Vector Store
→ Extract Triples (LLM) → Graph Store
Querying:
Query → Extract Entities (LLM)
→ Vector Search → chunks
→ Graph Traversal → facts
→ Combine → Generate Answer (LLM)
The interesting part is triple extraction. You feed text to an LLM and ask it to output structured facts. With modern structured output APIs, you define a schema and the LLM is constrained to return valid instances:
from pydantic import BaseModel

class Triple(BaseModel):
    subject: str
    predicate: str
    object: str

class ExtractionResult(BaseModel):
    triples: list[Triple]

# `llm` is any LangChain chat model that supports structured output
structured_llm = llm.with_structured_output(ExtractionResult)
result = structured_llm.invoke("Extract facts from: Marie Curie discovered radium...")
# Returns: ExtractionResult(triples=[Triple(subject="marie curie", ...)])
No JSON parsing. No regex. No validation code. The output is guaranteed to match your schema.
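The same structured-output trick covers the entity-extraction step at query time. A sketch, with EntityList as an illustrative schema rather than the post's actual code:

class EntityList(BaseModel):
    entities: list[str]

entity_llm = llm.with_structured_output(EntityList)
extracted = entity_llm.invoke("List the entities mentioned in: Who worked with Marie Curie?")
# Typically returns something like: EntityList(entities=["marie curie"])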
When This Matters
Not every RAG system needs graphs. If you're searching documentation or answering questions about a single domain, vector search is often enough.
Graphs shine when:
- Relationships matter: Who reports to whom? What depends on what? Who transacted with whom?
- Multi-hop reasoning: Questions requiring connections across multiple documents
- Entity-centric queries: "Tell me everything about X" is a graph traversal, not a similarity search
- Sparse but critical facts: Low-frequency, high-importance information that embeddings might not prioritize
The Trade-off
Graphs require extraction. Extraction requires LLM calls. LLM calls cost money and time.
For 10,000 documents, you're running 10,000+ extraction calls at indexing time. Not free. And extraction quality varies — the LLM might miss facts, hallucinate relationships, or produce inconsistent entity names ("Marie Curie" vs "M. Curie" vs "Curie").
Entity resolution — merging different names for the same entity — is a whole separate problem. So is schema design. What predicates do you allow? How granular should entities be?
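For illustration only, the crudest possible resolution pass looks something like this; the alias table is hand-maintained and hypothetical, and real entity resolution is far messier:

ALIASES = {"m. curie": "marie curie", "curie": "marie curie"}  # illustrative, hand-maintained

def normalize_entity(name: str) -> str:
    # Lowercase, trim, and map known aliases to a canonical name
    key = name.strip().lower()
    return ALIASES.get(key, key)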
Vector search has none of these problems. You embed and store. Done.
So the question isn't "graphs or vectors?" It's "what does my data look like and what questions will users ask?"
If the answer involves relationships, connections, and explicit facts — consider graphs.
Implementation
I built this with LangChain and LangGraph. About 200 lines total:
- Pydantic models for state and LLM output schemas
- A LangGraph pipeline for parallel triple extraction across chunks
- FAISS for vector storage, NetworkX for graph storage
- A second LangGraph pipeline for hybrid retrieval
The key insight is that both pipelines are just state machines. Text flows in, gets transformed by nodes, results flow out. LangGraph handles the orchestration — including parallel execution when the graph fans out.
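A rough sketch of the retrieval pipeline under those assumptions: QueryState and the node functions are illustrative names, vector_store, graph, and llm are assumed to exist from the indexing phase, and entity_llm and graph_facts refer to the sketches earlier in the post.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class QueryState(TypedDict):
    query: str
    chunks: list[str]
    facts: list[str]
    answer: str

def vector_search(state: QueryState) -> dict:
    # `vector_store` is the FAISS index built during indexing
    docs = vector_store.similarity_search(state["query"], k=4)
    return {"chunks": [d.page_content for d in docs]}

def graph_search(state: QueryState) -> dict:
    # `entity_llm` and `graph_facts` follow the earlier sketches
    extracted = entity_llm.invoke(f"List the entities mentioned in: {state['query']}")
    facts = []
    for entity in extracted.entities:
        facts.extend(graph_facts(graph, entity))
    return {"facts": facts}

def generate(state: QueryState) -> dict:
    # Merge fuzzy context and hard facts into one prompt
    context = "\n".join(state["chunks"] + state["facts"])
    response = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {state['query']}")
    return {"answer": response.content}

builder = StateGraph(QueryState)
builder.add_node("vector_search", vector_search)
builder.add_node("graph_search", graph_search)
builder.add_node("generate", generate)
builder.add_edge(START, "vector_search")  # fan-out: both searches
builder.add_edge(START, "graph_search")   # run in the same step
builder.add_edge("vector_search", "generate")
builder.add_edge("graph_search", "generate")
builder.add_edge("generate", END)
pipeline = builder.compile()

result = pipeline.invoke({"query": "Who worked with Marie Curie?"})

The two edges out of START are the fan-out: vector and graph search run in the same step, and generate waits for both before combining their results.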
The lesson here isn't really about knowledge graphs. It's about choosing the right data structure.
Vectors are good at fuzzy similarity. Graphs are good at explicit relationships. SQL is good at structured queries. Each has its place.
The mistake is reaching for the trendy tool instead of the right one.