TNS

Building Graph-Based RAG Applications Just Got Easier

Explore a new generation of tools like Unstructured and the Graph Retriever library that simplify GraphRAG by walking through an example application.
Apr 4th, 2025 9:00am by and
Image from Dominika Meger on Shutterstock.

Retrieval-augmented generation (RAG) has emerged as a powerful method for enhancing the accuracy and contextual relevance of generative AI outputs. Traditional approaches have primarily relied on semantic similarity search within vector stores. While effective, this approach has inherent limitations. It can miss nuanced contextual relationships or structured associations between documents.

Graph-based RAG methods, which integrate RAG techniques with knowledge graphs in various ways, promise greater precision but have been notoriously challenging to implement. Previously, constructing, navigating and maintaining these “knowledge graphs” was difficult. It involved manual extraction of structured relationships from documents, inflexible static graph databases and dedicated graph database infrastructure.

Fortunately, recent advancements — particularly tools like Unstructured and the newly released Graph Retriever library — have greatly simplified these workflows. Unstructured provides push-button transformation of unstructured documents into structured, graph-ready data, using custom prompting of advanced large language models (LLMs) for automated entity extraction and vector databases for storage. The Graph Retriever library then dynamically constructs graph-based queries over these metadata-rich vector stores, eliminating the need for dedicated graph databases.

Here, we’ll explore a new generation of tools that simplify GraphRAG by walking through an example application.

How GraphRAG Works

GraphRAG uses many of the same tools and techniques as traditional semantic similarity-based RAG, but also adds some important features:

  • Documents are enriched with structured metadata, such as entities (people, locations, organizations).
  • A graph is dynamically built based on this structured metadata, capturing explicit relationships between documents.
  • Retrieval occurs by traversing these structured connections, enabling more contextually relevant document retrieval.

This structured approach improves context navigation. Applications can fetch documents that are related not merely semantically but explicitly, through the relationships and entities present in the metadata.
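The key idea in the second bullet is that edges are not stored anywhere; they are derived from metadata. Here is a minimal, dependency-free sketch of that idea. It assumes each chunk is a plain dict whose metadata holds list-valued keys such as `places` and `people`; these names are illustrative assumptions. The sketch builds an inverted index from each metadata value to the chunks carrying it, so any two chunks that share a value become neighbors.

```python
from collections import defaultdict

# Toy chunks with LLM-extracted metadata; the "places"/"people" keys are
# illustrative assumptions, not a schema required by any library.
DOCS = [
    {"id": "d1", "metadata": {"places": ["Athens"], "people": ["Plato"]}},
    {"id": "d2", "metadata": {"places": ["Athens"], "people": []}},
    {"id": "d3", "metadata": {"places": [], "people": ["Socrates", "Plato"]}},
    {"id": "d4", "metadata": {"places": ["Woolsthorpe"], "people": ["Newton"]}},
]


def build_index(docs, key):
    """Map each metadata value under `key` to the ids of the docs carrying it."""
    index = defaultdict(set)
    for doc in docs:
        for value in doc["metadata"].get(key, []):
            index[value].add(doc["id"])
    return index


def neighbors(doc, index, key):
    """Ids of other docs that share at least one `key` value with `doc`."""
    linked = set()
    for value in doc["metadata"].get(key, []):
        linked |= index[value]
    linked.discard(doc["id"])  # a doc is not its own neighbor
    return sorted(linked)


places = build_index(DOCS, "places")
people = build_index(DOCS, "people")
print(neighbors(DOCS[0], places, "places"))  # ['d2']
print(neighbors(DOCS[0], people, "people"))  # ['d3']
```

Because the edges are derived on demand, indexing a different key produces a different graph over the same stored documents.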

Role of Unstructured and the Graph Retriever Library

Unstructured

As with most AI/ML problems, high-quality data is essential. In GraphRAG, having accurate metadata for each document or document chunk is crucial, because metadata is the basis for building and using the knowledge graph. Traditionally, assigning metadata has been similar to manually labeling a data set, but Unstructured provides extensive and extensible metadata out of the box. With enrichments to standard ETL (extract, transform, load) such as custom prompting, Unstructured uses LLM-based named entity recognition (NER) to automatically generate metadata key-value pairs. This, in turn, enhances the accuracy of retrieval.

Unstructured’s ETL+ for GenAI continuously harvests newly generated unstructured data from systems of record, transforming it into LLM-ready formats using optimized, pre-built pipelines and writing it to DataStax Astra DB. You can deploy complete ingestion and preprocessing pipelines in seconds, with configuration options and third-party integrations for the partitioning, enrichment, chunking and embedding steps. This enables knowledge graph building without needing to write any code or create any custom steps. The critical NER enrichment step can be easily configured within the full ETL+ pipeline that is available in Unstructured’s UI or API:

And your prompt can be customized and tested to ensure your metadata captures the entities you want extracted as well as the response format required for your knowledge graph:

 

Unstructured’s declarative approach enables non-developers to build workflows by simply connecting components. This not only accelerates development but also ensures efficient execution at scale, as workflows run seamlessly on Unstructured.

Graph Retriever Library

The open source Graph Retriever library builds on LangChain vector stores and constructs graphs dynamically at runtime from structured metadata (in this case, the metadata generated by Unstructured). This improves retrieval flexibility, context awareness and precision without requiring additional complex infrastructure.

Step by Step: Using Unstructured for Document Enrichment

All these steps can be accomplished with no code in the Unstructured UI by following the setup steps in the platform. Alternatively, we have provided a notebook that runs the full ETL + enrichment pipeline via the Unstructured API, using the Workflow Endpoint. This notebook is a precursor to the full GraphRAG flow, which assumes a workflow has already been set up. Below, we highlight a few key steps from the linked workflow creation notebook.

After setting up your credentials (as described in the notebook), create your Astra DB destination connector:


Then, you can create all the nodes for your workflow:


Note that the named_entity_recognizer_node creates the metadata for the graph's nodes and edges. It must be placed after the chunking node, whether you create your workflow in the UI or via the API.
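The ordering constraint on the NER node is easy to check before a workflow is submitted. Below is a minimal sketch that represents a workflow as an ordered list of node-type strings. The labels `partition`, `chunk`, `ner` and `embed` are hypothetical and are not Unstructured's actual identifiers. The check rejects any pipeline in which named entity recognition runs before chunking.

```python
# Hypothetical node-type labels; the real workflow API uses its own identifiers.
PARTITION, CHUNK, NER, EMBED = "partition", "chunk", "ner", "embed"


def check_ner_after_chunking(nodes):
    """Raise ValueError unless a chunking node precedes the NER node.

    The article requires NER to run after chunking; presumably this is so the
    extracted metadata ends up on the chunks that are stored and traversed.
    """
    if NER not in nodes:
        return  # no NER node, nothing to check
    if CHUNK not in nodes:
        raise ValueError("NER node present but no chunking node")
    if nodes.index(NER) < nodes.index(CHUNK):
        raise ValueError("NER node must come after the chunking node")


check_ner_after_chunking([PARTITION, CHUNK, NER, EMBED])  # valid order, no error
try:
    check_ner_after_chunking([PARTITION, NER, CHUNK, EMBED])
except ValueError as err:
    print(err)  # NER node must come after the chunking node
```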

The workflow above incorporates a prompt that specifies both the types of entities to extract and the response format used to construct the knowledge graph.
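Here is a minimal sketch of a prompt of this kind, together with a parser that enforces its response format. The entity types and the JSON shape (`items` holding `entity`/`type` pairs) are our own illustrative assumptions, not the prompt used in the notebook. Pinning down the format in the prompt is what lets the parser discard anything outside the requested types.

```python
import json

# Entity types and JSON shape are illustrative assumptions, not the
# prompt used in the workflow notebook.
ENTITY_TYPES = ["PERSON", "LOCATION", "ORGANIZATION"]


def build_ner_prompt(entity_types):
    """Build an NER prompt that fixes both the entity types and the output format."""
    types = ", ".join(entity_types)
    return (
        f"Extract all named entities of these types: {types}.\n"
        "Respond with JSON only, in this format:\n"
        '{"items": [{"entity": "<text>", "type": "<one of the types above>"}]}'
    )


def parse_ner_response(raw, entity_types):
    """Parse the model's JSON reply, keeping only entities of the requested types."""
    data = json.loads(raw)
    allowed = set(entity_types)
    return [item for item in data.get("items", []) if item.get("type") in allowed]


reply = ('{"items": [{"entity": "Newton", "type": "PERSON"}, '
         '{"entity": "calculus", "type": "CONCEPT"}]}')
print(parse_ner_response(reply, ENTITY_TYPES))
# [{'entity': 'Newton', 'type': 'PERSON'}]
```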


Next, after we set up the workflow (see the workflow creation notebook for full code), we’ll run it and view the responses:


This automatically captures entities such as people (for example, “Newton”) and locations (“Woolsthorpe”) and embeds them directly into structured metadata.


This structured metadata forms the basis for dynamic graph construction. And all the above can alternatively be constructed without a single line of code in the Unstructured UI.
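Turning extracted entities like “Newton” and “Woolsthorpe” into chunk metadata is a small transformation. Below is a minimal sketch. It assumes each entity arrives as a dict with `entity` and `type` fields, and it uses a hypothetical mapping from entity types to the list-valued metadata keys (`people`, `places`) that a metadata-based graph can link on.

```python
from collections import defaultdict

# Hypothetical mapping from entity type to the metadata key used for graph edges.
TYPE_TO_KEY = {"PERSON": "people", "LOCATION": "places"}


def entities_to_metadata(entities):
    """Group extracted entities into deduplicated, sorted lists per metadata key."""
    grouped = defaultdict(set)
    for item in entities:
        key = TYPE_TO_KEY.get(item["type"])
        if key is not None:  # ignore entity types we don't link on
            grouped[key].add(item["entity"])
    return {key: sorted(values) for key, values in grouped.items()}


chunk = {
    "text": "Isaac Newton was born at Woolsthorpe Manor in Lincolnshire.",
    "metadata": entities_to_metadata([
        {"entity": "Newton", "type": "PERSON"},
        {"entity": "Woolsthorpe", "type": "LOCATION"},
        {"entity": "Lincolnshire", "type": "LOCATION"},
    ]),
}
print(chunk["metadata"])
# {'people': ['Newton'], 'places': ['Lincolnshire', 'Woolsthorpe']}
```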

Step by Step: Leveraging Graph Retriever for Dynamic Retrieval

The Graph Retriever library enables you to combine vector similarity search with structured graph traversal. Unlike a dedicated graph database, the Graph Retriever library builds on vector stores, using metadata to dynamically build connections between documents.

In the previous section, we saw how Unstructured can be used to populate Astra DB with document chunks and associated metadata. With enriched metadata ready, the Graph Retriever can dynamically create graph-based retrievers. Before we build such a retriever, let’s look at how this data set can be used to build traditional RAG queries.

First, specify the embedding model and initialize the vector store. Then, construct a basic retriever and query it. For example, we can query for information about Plato.
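Under the hood, a basic retriever ranks stored chunks by vector similarity to the query. Below is a minimal, dependency-free sketch of that ranking. It uses cosine similarity, one common choice. Tiny hand-written vectors stand in for a real embedding model's output, and a plain list stands in for Astra DB; both are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similarity_search(query_vec, store, k):
    """Return the ids of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["id"] for doc in ranked[:k]]


# Toy 3-d "embeddings"; a real store would hold model-generated vectors.
STORE = [
    {"id": "plato-bio", "vec": [0.9, 0.1, 0.0]},
    {"id": "plato-works", "vec": [0.8, 0.2, 0.1]},
    {"id": "athens", "vec": [0.2, 0.9, 0.1]},
    {"id": "newton", "vec": [0.0, 0.1, 0.9]},
]
print(similarity_search([1.0, 0.0, 0.0], STORE, k=2))  # ['plato-bio', 'plato-works']
```

Only vector closeness matters here, which is why such results stay narrowly focused. The `athens` chunk is never returned for a Plato-like query, even though it may be useful context.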


As you can see, this query correctly retrieved document chunks describing Plato, but notice that the results are narrowly focused on Plato. Graph Retriever enables us to retrieve a more diverse set of results, providing richer contextual information for our query.

To use the Graph Retriever, you need to specify the vector store, the edges of the graph and the search strategy. The “edges” configuration describes the graph’s schema; in this case, we’ve configured the retriever to connect documents that share a location. Similarly, the “strategy” configuration describes how the graph should be traversed. In this case, we start by searching for the three most relevant documents by vector similarity, then retrieve up to 10 documents that are related to those initial three in the graph.
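The strategy described above starts with a few vector-search seeds and then expands through the graph up to a fixed budget. It can be sketched without any library. This is not the Graph Retriever implementation; it is a minimal breadth-first stand-in. It assumes similarity scores are already computed and that documents carry list-valued `places` metadata. The parameter names `start_k`, `max_related` and `max_depth` are our own, with defaults mirroring "three" and "up to 10" in the text.

```python
from collections import deque


def graph_retrieve(docs, scores, key, start_k=3, max_related=10, max_depth=1):
    """Seed with the start_k best vector-search hits, then add up to
    max_related docs reachable through shared `key` metadata, breadth-first.

    docs:   {doc_id: {"metadata": {key: [values, ...]}}}
    scores: {doc_id: similarity to the query}, precomputed by vector search
    """
    seeds = sorted(scores, key=scores.get, reverse=True)[:start_k]

    # Edges are derived at query time from an inverted metadata index.
    index = {}
    for doc_id, doc in docs.items():
        for value in doc["metadata"].get(key, []):
            index.setdefault(value, []).append(doc_id)

    seen, related = set(seeds), []
    queue = deque((doc_id, 0) for doc_id in seeds)
    while queue and len(related) < max_related:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand past the depth limit
        for value in docs[current]["metadata"].get(key, []):
            for other in index[value]:
                if other in seen:
                    continue
                seen.add(other)
                related.append(other)
                queue.append((other, depth + 1))
                if len(related) == max_related:
                    return seeds + related
    return seeds + related


DOCS = {
    "plato-bio": {"metadata": {"places": ["Athens"]}},
    "plato-works": {"metadata": {"places": []}},
    "academy": {"metadata": {"places": ["Athens"]}},
    "athens-history": {"metadata": {"places": ["Athens", "Attica"]}},
    "newton-bio": {"metadata": {"places": ["Woolsthorpe"]}},
}
SCORES = {"plato-bio": 0.95, "plato-works": 0.90, "academy": 0.40,
          "newton-bio": 0.20, "athens-history": 0.10}

print(graph_retrieve(DOCS, SCORES, "places", start_k=2, max_related=3))
# ['plato-bio', 'plato-works', 'academy', 'athens-history']
```

`athens-history` has the lowest similarity score, yet it is returned because it shares Athens with a seed. That is the kind of supporting context a pure similarity search would miss.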


As you can see, the results now include supporting information about where Plato was born and lived. These additional documents are connected to the initial results through shared locations. We can change the structure of the graph simply by creating a new retriever configured to use different metadata. For example, we could follow connections based on shared people rather than places.
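Reconfiguring the graph this way amounts to changing which metadata fields define an edge; the stored documents stay the same. Below is a minimal sketch that declares edges as `(source_field, target_field)` pairs, our own illustrative representation. Swapping a place-based configuration for a person-based one changes each document's neighbors without any reloading.

```python
def linked(docs, doc_id, edges):
    """Return ids of docs linked to doc_id under the given edge definitions.

    Each edge is a (source_field, target_field) pair: doc A links to doc B
    when a value in A's source_field also appears in B's target_field.
    """
    source = docs[doc_id]
    out = set()
    for src_field, dst_field in edges:
        values = set(source.get(src_field, []))
        for other_id, other in docs.items():
            if other_id != doc_id and values & set(other.get(dst_field, [])):
                out.add(other_id)
    return sorted(out)


# Toy metadata only; field names are illustrative assumptions.
DOCS = {
    "plato-bio": {"places": ["Athens"], "people": ["Plato", "Socrates"]},
    "academy": {"places": ["Athens"], "people": ["Plato"]},
    "apology": {"places": [], "people": ["Socrates"]},
}

BY_PLACE = [("places", "places")]
BY_PERSON = [("people", "people")]
print(linked(DOCS, "plato-bio", BY_PLACE))   # ['academy']
print(linked(DOCS, "plato-bio", BY_PERSON))  # ['academy', 'apology']
```

The pair form also allows asymmetric edges, for example linking a hypothetical `mentions` field in one document to an `id` field in another.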


The Graph Retriever library enables the creation of many different types of graph structures and provides a rich set of graph traversal operations. Graph edges and schemas are also easy to redefine. As long as the relevant metadata is available, we can simply reconfigure the GraphRetriever with a new schema or strategy and proceed without any additional data loading or extra computation, because edge definitions and traversal strategies are applied at query time. And, if GraphRAG is part of an agentic workflow, this reconfiguration is simple enough that an AI agent could even select the appropriate graph schema on a per-query basis.
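Per-query schema selection can be prototyped before an agent is involved. In the sketch below, a keyword rule stands in for the AI agent, an assumption made to keep the example deterministic. The rule chooses between two hypothetical edge configurations; in an agentic workflow, an LLM would make this choice instead.

```python
# Hypothetical edge configurations, expressed as (source_field, target_field) pairs.
SCHEMAS = {
    "places": [("places", "places")],
    "people": [("people", "people")],
}

# Keyword cues are an illustrative stand-in for an agent's judgment.
PLACE_CUES = ("where", "born", "lived", "city", "located")
PEOPLE_CUES = ("who", "taught", "student", "influenced", "mentor")


def choose_schema(query):
    """Pick an edge configuration for a query; fall back to places on a tie."""
    text = query.lower()
    place_hits = sum(cue in text for cue in PLACE_CUES)
    people_hits = sum(cue in text for cue in PEOPLE_CUES)
    name = "people" if people_hits > place_hits else "places"
    return name, SCHEMAS[name]


print(choose_schema("Where was Plato born?"))
# ('places', [('places', 'places')])
print(choose_schema("Who taught Plato?"))
# ('people', [('people', 'people')])
```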

Conclusion

Modern tools like Unstructured and the Graph Retriever library have lowered barriers to implementing graph-based RAG applications. These technologies automate and streamline complex graph construction tasks, eliminate traditional infrastructure demands and offer dynamic retrieval capabilities that are flexible and powerful.

With these advancements, graph-based retrieval is accessible, intuitive and ready to significantly enrich your retrieval-based AI applications.
