Unmasking RAG Hallucinations: GraphRAG to the Rescue for Pinpoint Precision! 🎯

Hello, fellow tech enthusiasts! Ever felt like your brilliant AI agent, powered by Retrieval Augmented Generation (RAG), was incredibly smart until you asked it to count something? You’re not alone! Today, we’re diving deep into a common pitfall of RAG systems and discovering a powerful solution that brings verifiable precision to your AI applications: GraphRAG.

Elizabeth Fuentes Leone, a Developer Advocate on the AWS Agent Team, recently shed light on this crucial topic in her session, “When RAG Hallucinates Numbers: GraphRAG for Precise Answers.” She shared insights from her series of six blocks, all designed to help us avoid agent hallucination. Let’s explore how we can conquer those pesky fabricated numbers!

The Sneaky Problem: RAG’s Achilles’ Heel with Numbers 📉

Imagine you ask your RAG agent, “How many hotels have swimming pools?” Your agent diligently searches its documents, finds a few, reads some numbers, and confidently declares, “Approximately 45.” The catch? The real answer is a staggering 133. That’s not just a miss; that’s a fabrication delivered with full confidence!

Why does this happen? The core issue lies in how traditional RAG works:

  • Similarity Search: Vector search returns only the most similar documents, not all relevant data. Your agent might be answering from just 1% of the available information.
  • Incomplete Retrieval: It simply never has all the data to provide an accurate count or aggregation.
  • LLM Aggregation: The Large Language Model (LLM) then estimates from these limited chunks, often producing incorrect yet confidently presented answers. Because the raw chunks are passed to the model for counting, it also processes more tokens, which can increase costs.
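To make the undercounting concrete, here is a minimal sketch in plain Python. The 300-hotel corpus size and the 133 figure come from the talk; the record layout, the `top_k_retrieve` helper, and `k=5` are assumptions made up for illustration, and no real vector store is involved.

```python
# Hypothetical corpus: 300 hotel records, 133 of which list a swimming pool.
hotels = [{"id": i, "has_pool": i < 133} for i in range(300)]


def top_k_retrieve(docs, k):
    """Stand-in for vector search: returns at most k documents."""
    # A real retriever ranks by embedding similarity. Here we hand back the
    # first k pool hotels, which is the best case for the retriever.
    matching = [d for d in docs if d["has_pool"]]
    return matching[:k]


def full_scan_count(docs):
    """What a database aggregation does: visit every record."""
    return sum(1 for d in docs if d["has_pool"])


retrieved = top_k_retrieve(hotels, k=5)
visible_to_llm = len(retrieved)
exact = full_scan_count(hotels)
print(f"LLM sees {visible_to_llm} pool hotels; the corpus has {exact}")
```

Even in this best case the model only ever sees `k` matching documents, so any total it reports beyond that has to be a guess.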

This leads to various types of RAG hallucination: fabricated statistics, incomplete retrievals, and out-of-domain fabrication.

Enter GraphRAG: Your GPS for Precise Answers 🗺️

GraphRAG is a game-changer that fixes these numerical woes. Instead of relying on similarity search, GraphRAG transforms your unstructured documents into a knowledge graph. Think of it: every hotel, room, amenity, policy, and service becomes a node, and the relationships between them become edges.

How GraphRAG Works Its Magic ✨

  1. Knowledge Graph Creation: GraphRAG converts documents into a structured knowledge graph. A powerful LLM reads each document, discovers entity types (like hotels, rooms, amenities), extracts relationships between them, and performs entity resolution to merge duplicates (e.g., “outdoor pools” and “swimming pool” become one). This deduplication is key to accurate counts. To build the graph, you can use a tool like Neo4j’s SimpleKGPipeline, which leverages an LLM to auto-discover the schema, so you don’t have to hardcode it.
  2. Structured Query Generation: When your agent needs a count or a precise answer, it doesn’t search by similarity. Instead, it writes a structured query in a language like Cypher (Neo4j’s query language, similar to SQL for graphs). The LLM reads the schema from the tool’s docstring, which steers it toward valid Cypher that references only node types and relationships that actually exist in the graph.
  3. Database Execution, Not LLM Aggregation: The database executes this Cypher query across every node in the graph. The LLM’s role is only to generate the query; it doesn’t perform the aggregation. This means you get the exact result directly from the structured data.
  4. No Fabrication: If the data doesn’t exist, GraphRAG simply returns empty, rather than fabricating an answer. This honesty is invaluable!
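The steps above can be sketched as a small agent tool. This is a minimal illustration, not the code from the talk: the schema (`Hotel`, `Amenity`, `HAS_AMENITY`), the function names, and the `fake_runner` stand-in are all hypothetical, and the sketch uses a fixed, parameterized Cypher query instead of LLM-generated Cypher so that it runs without a database.

```python
from typing import Any, Callable, Dict, List

# Hypothetical schema, for illustration only:
#   (:Hotel {name, city, rating})-[:HAS_AMENITY]->(:Amenity {name})
COUNT_BY_AMENITY = (
    "MATCH (h:Hotel)-[:HAS_AMENITY]->(a:Amenity) "
    "WHERE toLower(a.name) = toLower($amenity) "
    "RETURN count(DISTINCT h) AS hotel_count"
)

QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def count_hotels_with_amenity(run_query: QueryRunner, amenity: str) -> str:
    """Count hotels offering an amenity.

    Graph schema (the kind of text an agent's LLM would read from the docstring):
      Nodes: Hotel(name, city, rating), Amenity(name)
      Relationships: (Hotel)-[:HAS_AMENITY]->(Amenity)
    """
    # The database does the counting; this code only formats the result.
    rows = run_query(COUNT_BY_AMENITY, {"amenity": amenity})
    count = rows[0]["hotel_count"] if rows else 0
    if count == 0:
        return f"No hotels with amenity '{amenity}' found in the graph."
    return f"{count} hotels have the amenity '{amenity}'."


def fake_runner(query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for a database session; returns canned aggregation results."""
    canned = {"swimming pool": 133}
    return [{"hotel_count": canned.get(params["amenity"].lower(), 0)}]


print(count_hotels_with_amenity(fake_runner, "Swimming Pool"))
print(count_hotels_with_amenity(fake_runner, "helipad"))
```

Against a real Neo4j instance, `run_query` could wrap something like `session.run(query, params).data()` from the Neo4j Python driver. Injecting the runner keeps the aggregation in the database and makes the empty-result path easy to test.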

Beyond Numbers: GraphRAG’s Superpowers in Action 🚀

Elizabeth demonstrated GraphRAG’s capabilities with several compelling examples, comparing a standard RAG agent with a GraphRAG agent:

  • Precise Aggregations:
    • Question: “What is the average guest rating across all hotels in Paris?”
    • RAG’s Struggle: Found only two hotels, so the LLM computed an average over that partial sample rather than over every hotel in Paris.
    • GraphRAG’s Precision: Queried all matching hotels in the graph database, providing the exact average rating.
  • Accurate Service Counts:
    • Question: “How many hotels have swimming pools as an amenity?”
    • RAG’s Struggle: Provided a lengthy, often incomplete, and potentially fabricated list based on limited retrievals.
    • GraphRAG’s Precision: Returned a concise, accurate count (or indicated if none existed), using minimal tokens.
  • Multi-Hop Reasoning:
    • Question: “What are the room types and prices for the highest-rated hotel?”
    • RAG’s Struggle: Found the hotel but often couldn’t traverse to related room or price data, leading to recommendations to “contact the hotel.”
    • GraphRAG’s Precision: Traversed the knowledge graph from the hotel node to room data (if available) via Cypher, providing a more complete and direct answer. If prices weren’t in the database, it explicitly stated, “price information is not available.”
  • Out-of-Domain Detection:
    • Question: “Tell me about hotels in Antarctica.” (Spoiler: there are none in the database).
    • RAG’s Hallucination: Couldn’t find specific info but still offered a “general overview” by fabricating information.
    • GraphRAG’s Honesty: Directly stated, “There are no hotels in Antarctica in the database.” No fabrication, just facts.
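To show the mechanics behind these four behaviors without a database, here is a toy in-memory graph in plain Python. It is only a stand-in for what a Cypher query does in Neo4j: the hotel names, ratings, and helper functions are invented for illustration and are not the demo's data or code.

```python
from statistics import mean

# Hypothetical toy graph: nodes keyed by id, edges as (source, relation, target).
nodes = {
    "h1": {"label": "Hotel", "name": "Hotel A", "city": "Paris", "rating": 4.5},
    "h2": {"label": "Hotel", "name": "Hotel B", "city": "Paris", "rating": 4.0},
    "h3": {"label": "Hotel", "name": "Hotel C", "city": "Lisbon", "rating": 4.8},
    "r1": {"label": "Room", "type": "Suite"},
    "r2": {"label": "Room", "type": "Double"},
}
edges = [("h3", "HAS_ROOM", "r1"), ("h3", "HAS_ROOM", "r2")]


def hotels_in(city):
    """Full scan over every Hotel node, roughly MATCH (h:Hotel {city: $city})."""
    return [n for n in nodes.values() if n["label"] == "Hotel" and n.get("city") == city]


def average_rating(city):
    """Exact aggregation over all matching hotels; None when there are none."""
    ratings = [h["rating"] for h in hotels_in(city)]
    return mean(ratings) if ratings else None


def rooms_of_top_hotel():
    """Multi-hop: find the highest-rated hotel, then follow HAS_ROOM edges."""
    hotel_ids = [i for i, n in nodes.items() if n["label"] == "Hotel"]
    top = max(hotel_ids, key=lambda i: nodes[i]["rating"])
    rooms = [nodes[t] for s, rel, t in edges if s == top and rel == "HAS_ROOM"]
    return nodes[top]["name"], [
        (r["type"], r.get("price", "price information is not available")) for r in rooms
    ]


def describe_hotels(city):
    """Out-of-domain check: an empty match is reported as such, not filled in."""
    found = hotels_in(city)
    if not found:
        return f"There are no hotels in {city} in the database."
    return ", ".join(h["name"] for h in found)


print(average_rating("Paris"))
print(rooms_of_top_hotel())
print(describe_hotels("Antarctica"))
```

The point of the sketch is that every answer comes from walking the whole structure, and a missing property or an empty match surfaces as "not available" instead of a guess.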

Under the Hood: The Tech Stack & Tools 🛠️

Elizabeth’s demo showcased a robust set of tools and technologies that make GraphRAG possible:

  • Knowledge Graph Database: Neo4j served as the powerhouse for storing and querying the knowledge graph.
  • LLMs for Graph Building and Query Generation: An LLM (from a provider such as OpenAI or Amazon Bedrock) was instrumental in:
    • Reading documents and discovering entity types and relationships for graph construction.
    • Performing entity resolution to merge duplicate concepts.
    • Generating valid Cypher queries based on the question and the graph’s schema.
  • Agent Framework: The Strands Agents SDK (available in Python and TypeScript) was used to build the agents. This open-source framework, maintained by AWS, facilitates the agentic loop, allowing you to build and modify agents dynamically without losing context.
  • Vector Store (for comparison): FAISS was used for the traditional vector store, with Sentence Transformers (an open-source embedding model) providing the embeddings.
  • Deployment: Amazon Bedrock AgentCore, an AWS service, helps deploy AI applications into production by containerizing them, offering short-term and long-term memory, and enabling microservices architectures. This allows you to deploy agents in less than 10 minutes!
  • Model Providers: The demo used OpenAI, but Strands Agents supports various providers, including Amazon Bedrock, Anthropic, and Google.

For those eager to build their own GraphRAG applications, Elizabeth recommends starting with a “light version” of the graph (e.g., using 10% of the data) for faster testing, before scaling up to a full dataset of 300 hotels.

When to Wield Each Tool: RAG vs. GraphRAG 💡

It’s not about choosing one over the other; it’s about using the right tool for the right job.

  • When to use GraphRAG: For precise queries, aggregations, multi-hop reasoning across entities, and structured data where you need verifiable answers.
  • When to stick with RAG: For semantic search, unstructured text, fuzzy matching, and simple lookups where you don’t need to count or aggregate.

The best production systems often combine both: GraphRAG for structured queries and RAG for open-ended searches. Your specific use case will always dictate the optimal approach.
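One way to picture a combined system is a small router that sends counting and aggregation questions to the graph and everything else to vector search. The sketch below is a deliberately crude assumption: the keyword list and the `route` function are hypothetical, and in an agent framework the LLM would typically pick between a graph tool and a vector tool itself.

```python
import re

# Hypothetical hint words suggesting the question needs an exact aggregation.
AGGREGATION_HINTS = re.compile(
    r"\b(how many|count|average|total|sum|highest|lowest|most|least)\b",
    re.IGNORECASE,
)


def route(question: str) -> str:
    """Return 'graph' for aggregation-style questions, 'vector' otherwise."""
    return "graph" if AGGREGATION_HINTS.search(question) else "vector"


print(route("How many hotels have swimming pools?"))
print(route("Describe the atmosphere of boutique hotels in Paris"))
```

A keyword router like this will misclassify plenty of real questions, so treat it as a picture of the split rather than a production design.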

A Glimpse into the Future: Avoiding Hallucinations Holistically 🌐

GraphRAG is just one powerful technique in Elizabeth’s comprehensive strategy to combat AI hallucination. Her other blocks outline:

  • Semantic Tool Selection: Building a vector store for your tools, ensuring the agent only uses the most relevant tools for a query.
  • Multi-Agent Validation: Employing multiple agents in a critic pipeline to validate answers before they reach the customer.
  • Neuro-Symbolic Guardrails: Implementing hard rules that LLMs cannot bypass, and using AgentCore for self-correction when rules are less rigid.
  • Production Deployment: Leveraging all these techniques together to build robust, production-ready agents.

The journey to building truly reliable and precise AI agents is ongoing, but with tools like GraphRAG, we’re taking giant leaps forward. By understanding the strengths of different approaches and combining them intelligently, we empower our AI to be not just smart, but accurate.

Stay curious, keep building, and let’s craft an AI future where hallucinations are a thing of the past! ✨
