Building a HybridRAG System: A Detailed Guide to Integrating Knowledge Graphs and Vector Retrieval with LLMs
Introduction
The research paper “HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction” introduces an innovative way to enhance information retrieval and generation tasks by combining the strengths of VectorRAG (vector-based retrieval) and GraphRAG (knowledge graph-based retrieval). This blog explains how you can build a similar system using tools like Chroma, Neo4j, and OpenAI GPT-3.5 Turbo. We’ll break down the core concepts, system design, and implementation steps to help you recreate and extend this powerful framework.
Core Concepts
1. Retrieval-Augmented Generation (RAG)
- Traditional RAG integrates a retrieval step with language generation. It searches external databases for relevant context and feeds this into a generative model to enhance the quality and accuracy of responses.
- VectorRAG retrieves textual information from a vector database using embeddings.
- GraphRAG uses structured Knowledge Graphs (KGs) to extract contextual subgraphs for queries, providing relationship-aware context.
2. HybridRAG
HybridRAG combines the two RAG methods:
- VectorRAG offers broad similarity-based retrieval for unstructured text.
- GraphRAG contributes entity-relationship knowledge for contextually rich answers.
- By integrating both, HybridRAG achieves better performance across both extractive and abstractive Q&A tasks.
Tools and Architecture
Tools Used
- Chroma: For managing vector databases and handling similarity-based retrieval tasks.
- Neo4j: For creating and querying knowledge graphs.
- OpenAI GPT-3.5 Turbo: As the LLM for both content generation and intermediate prompt-based tasks.
- Text Embedding Model: A 1536-dimensional embedding model (e.g., text-embedding-ada-002) to encode documents and queries for vector search.
- LangChain: A popular Python framework (also available in JavaScript) for chaining together the steps of the RAG process, including retrieval and generation.
System Architecture
1. Data Ingestion and Preprocessing
- Extract text from PDFs using tools like PyPDFLoader.
- Chunk large documents into manageable parts with overlap (e.g., 1024 tokens with a 200-token overlap) to preserve context across chunk boundaries.
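The chunking step above can be sketched in plain Python. This is a minimal illustration that counts characters as a stand-in for tokens; a real pipeline would load the PDF with something like PyPDFLoader and split on token counts using a tokenizer.

```python
def chunk_text(text, chunk_size=1024, overlap=200):
    """Split text into overlapping chunks to preserve context continuity.

    chunk_size and overlap are in characters here for simplicity;
    the article's pipeline uses token counts.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by this much each iteration
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Each chunk repeats the last 200 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in at least one chunk.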
2. VectorRAG Pipeline
- Encode document chunks into embeddings using a text embedding model.
- Store embeddings in Chroma (a vector database).
- Perform similarity searches to retrieve context for queries.
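To make the retrieval step concrete, here is a toy sketch of similarity search. In the real pipeline Chroma stores the vectors and performs the search, and the `embed()` stub below is a hypothetical placeholder for text-embedding-ada-002, not a real model.

```python
import math

def embed(text, dim=16):
    """Toy bag-of-characters embedding (placeholder for a real model)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the top-k chunks ranked by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The ranking logic is the same one a vector database applies at scale, just without the approximate-nearest-neighbor index that makes it fast over millions of chunks.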
3. Knowledge Graph Construction
- Extract entities and relationships using a two-stage prompt engineering process.
- Store structured Nodes and Relationships in Neo4j.
- Query Neo4j for subgraphs relevant to the query.
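The triplet storage and subgraph lookup can be illustrated with a tiny in-memory triple store. The entity names below are invented for illustration; in the real system the triplets live in Neo4j, and the lookup would be a Cypher query along the lines of `MATCH (e {name: $entity})-[r]-(n) RETURN e, r, n`.

```python
# Hypothetical triplets of the kind the two-stage extraction produces.
triples = [
    ("AcmeCorp", "ACQUIRED", "WidgetInc"),
    ("AcmeCorp", "REPORTED_REVENUE", "$10M"),
    ("WidgetInc", "LOCATED_IN", "Berlin"),
]

def subgraph(entity):
    """Return every triplet that touches the given entity."""
    return [t for t in triples if entity in (t[0], t[2])]

def graph_context(entity):
    """Render the subgraph as text that can be placed in an LLM prompt."""
    return "\n".join(f"{s} -{r}-> {o}" for s, r, o in subgraph(entity))
```

The rendered `graph_context` string is what gets merged with the vector-retrieved chunks in the hybrid step.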
4. Hybrid Context Combination
- Merge contexts retrieved from Chroma (VectorRAG) and Neo4j (GraphRAG).
- Pass combined context to GPT-3.5 Turbo for response generation.
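The hybrid combination step is simple enough to sketch directly: merge the two context lists, dropping duplicates while preserving order, then assemble the generation prompt. The prompt template here is an assumption for illustration, not the paper's exact wording.

```python
def merge_contexts(vector_ctx, graph_ctx):
    """Concatenate vector and graph contexts, de-duplicating in order."""
    seen, merged = set(), []
    for passage in vector_ctx + graph_ctx:
        if passage not in seen:
            seen.add(passage)
            merged.append(passage)
    return merged

def build_prompt(question, vector_ctx, graph_ctx):
    """Assemble the unified context into a single prompt for the LLM."""
    context = "\n\n".join(merge_contexts(vector_ctx, graph_ctx))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting string is what would be sent to GPT-3.5 Turbo as the user message.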
5. Evaluation and Fine-tuning
- Use metrics like faithfulness, answer relevance, context precision, and context recall to measure performance.
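Context precision and recall have simple set-based definitions that are worth seeing once. The sketch below treats relevance as exact set membership against a known ground-truth set; production evaluations (e.g., the RAGAS library) typically use an LLM judge instead.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved passages that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for p in retrieved if p in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant passages that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for p in relevant if p in retrieved) / len(relevant)
```

Precision penalizes a retriever that pads the context with noise; recall penalizes one that misses passages the answer depends on.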
Implementation
- To implement the HybridRAG system, the process begins with data preprocessing: unstructured text from financial documents is chunked into manageable sizes with overlaps to ensure context continuity.
- VectorRAG is powered by Chroma, which stores 1536-dimensional embeddings of these chunks, generated with OpenAI’s text-embedding-ada-002 model. Queries are matched against these embeddings to retrieve the most relevant chunks by cosine similarity.
- In parallel, the Knowledge Graph (KG) is constructed in Neo4j by extracting entities and relationships from the documents through a two-stage LLM-based prompt-engineering process. The resulting triplets are stored in Neo4j, enabling structured querying for relevant subgraphs.
- The HybridRAG system then merges the results from Chroma and Neo4j into a unified context, which is fed into GPT-3.5 Turbo for response generation.
- This approach ensures that the system leverages both the breadth of vector-based similarity and the depth of relationship-rich knowledge graphs.
- Because of the article’s length, I have uploaded the full code to GitHub: https://github.com/abhishekbiswas772/Hybrid_rag_simulation.git
Performance Evaluation
Use evaluation metrics to compare VectorRAG, GraphRAG, and HybridRAG:
- Faithfulness: Verify that answers are grounded in the retrieved context.
- Answer Relevance: Measure cosine similarity between embeddings of the question and the generated answer.
- Context Precision and Recall: Measure how much of the retrieved context is relevant, and how much of the relevant context was actually retrieved.
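The answer-relevance metric above can be sketched directly. The `embed()` stub is a toy stand-in for the real 1536-dimensional embedding model, so the absolute scores are meaningless here; only the computation pattern matches the metric's definition.

```python
import math

def embed(text, dim=16):
    """Toy bag-of-characters embedding (placeholder for a real model)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def answer_relevance(question, answer):
    """Cosine similarity between question and answer embeddings."""
    q, a = embed(question), embed(answer)
    dot = sum(x * y for x, y in zip(q, a))
    nq = math.sqrt(sum(x * x for x in q))
    na = math.sqrt(sum(x * x for x in a))
    return dot / (nq * na) if nq and na else 0.0
```

With non-negative embedding stubs like this one, the score lands in [0, 1]; real embedding models can produce negative components, so scores there range over [-1, 1].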
Future Enhancements
- Real-Time Updates: Incorporate real-time financial data streams into the KG.
- Multi-Modal Inputs: Extend HybridRAG to handle tabular, numerical, and visual data.
- Enhanced Evaluation Metrics: Focus on financial reasoning and numerical accuracy.
Conclusion
By leveraging Chroma for vector-based retrieval, Neo4j for knowledge graph queries, and OpenAI GPT-3.5 Turbo for generation, you can build a powerful HybridRAG system tailored for financial document analysis. This hybrid approach balances the strengths of unstructured text similarity and structured knowledge, offering a robust solution for complex Q&A tasks across domains.
Resources:
These are the resources I used while writing this blog.
Thanks for reading! My name is Abhishek, and I have a passion for building apps and learning new technologies.