Fairness-Aware GraphRAG for Trustworthy and Equitable Document Retrieval

Authors

Guadalupe Gonzalez, Chirag Agarwal

Guadalupe Gonzalez

Guadalupe Gonzalez joined Genentech/Roche in February 2023 after completing a PhD on graph deep learning for drug discovery, advised by Michael Bronstein and Kirill Veselkov at Imperial College London. She is part of the Frontier Research team at Prescient Design in Genentech Research and Early Development (gRED). Her expertise lies at the intersection of graph deep learning and causal inference, with a focus on (causal) graph deep learning for drug discovery across scales, from small-scale systems (e.g., proteins) to large-scale systems (e.g., patient data). Guadalupe is particularly passionate about applying this work to women’s health to catalyze breakthroughs in women-specific conditions such as endometriosis.

Chirag Agarwal

Chirag Agarwal is an Assistant Professor at the University of Virginia with appointments in the School of Data Science and the Department of Computer Science. Dr. Agarwal’s research focuses on developing scalable, trustworthy machine learning frameworks that go beyond training models for specific downstream tasks and also satisfy trustworthiness properties such as explainability, fairness, and robustness. He has published in top-tier machine learning and computer vision conferences and in leading scientific journals. His work has been selected for spotlight and oral presentations at NeurIPS, ICML, CVPR, and ICIP, and he has received industry grants from Adobe, Microsoft, and Google to support his research on trustworthy machine learning.

Project

GraphRAG (Graph-based Retrieval-Augmented Generation) is a framework that enhances retrieval-augmented generation (RAG) by using graph structures to improve document retrieval and knowledge integration in large language models (LLMs) [1]. Unlike traditional RAG, which retrieves documents based on embedding similarity, GraphRAG organizes and retrieves information using a structured graph representation, allowing for more context-aware, interpretable, and interconnected document retrieval.
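To make the contrast concrete, the following minimal sketch compares plain embedding retrieval with a simple graph-based variant that first selects the most similar seed documents and then adds documents linked to them within a fixed number of hops. It is an illustrative assumption rather than how any particular GraphRAG system works: practical GraphRAG pipelines usually build richer graphs (for example, over extracted entities and relations), whereas here the graph is a hand-written adjacency dictionary, and the function names, toy vectors, and seed-then-expand strategy are all hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_retrieve(query_vec, doc_vecs, k):
    """Traditional RAG (simplified): rank documents by embedding similarity only."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


def graph_retrieve(query_vec, doc_vecs, graph, k_seeds, hops):
    """Hypothetical graph-based variant: take the k_seeds most similar documents,
    then add documents reachable from them within `hops` edges (breadth-first)."""
    seeds = embedding_retrieve(query_vec, doc_vecs, k_seeds)
    depth = {s: 0 for s in seeds}  # document -> hop distance from the nearest seed
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, []):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return list(depth)  # seeds first, then neighbours in BFS order


# Toy corpus: 2-d "embeddings" and an undirected link graph (both made up).
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.9]}
graph = {"d1": ["d3"], "d2": [], "d3": ["d1", "d4"], "d4": ["d3"]}
query = [1.0, 0.0]

print(embedding_retrieve(query, doc_vecs, k=2))               # ['d1', 'd2']
print(graph_retrieve(query, doc_vecs, graph, k_seeds=1, hops=1))  # ['d1', 'd3']
```

Graph expansion surfaces `d3`, which embedding similarity ranks last for this query. Which documents get pulled in this way depends entirely on the link structure, which is why connectivity-based retrieval can inherit whatever imbalances the graph itself encodes.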

Existing retrieval techniques in GraphRAG typically rank documents by embedding similarity or graph connectivity. These criteria can amplify biases, underrepresent marginalized perspectives, or reinforce existing disparities; for instance, connectivity-based scores favor documents that are already densely linked, regardless of whose perspective they represent. This project aims to develop a fairness-aware ranking algorithm for GraphRAG so that document retrieval across domains such as healthcare, the social sciences, and public policy is trustworthy, representative, and unbiased. To do so, we will modify graph-based ranking algorithms to incorporate fairness constraints into document selection. The fairness of the resulting GraphRAG system will be evaluated at two levels: the documents returned by the retrieval algorithm [2] and the responses generated by the LLM [3].
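One generic way to combine a graph-based ranking score with a fairness constraint is sketched below, under several stated assumptions: documents form a directed link graph, each document carries a known group label (for example, the community or source type it represents), PageRank with a restart distribution (personalized PageRank) serves as the graph-based ranking, and fairness is enforced as a post-processing step that reserves a minimum number of top-k slots per group. Retrieval fairness is then summarized by position-discounted group exposure, with weight 1/log2(rank + 1). The function names, quotas, metric choice, and toy graph are hypothetical; this illustrates the problem setting, not the algorithm this project will develop.

```python
import math


def personalized_pagerank(graph, personalization, alpha=0.85, iters=50):
    """Power iteration for PageRank with restart distribution `personalization`.
    graph: dict node -> list of out-neighbours (every node must be a key)."""
    nodes = list(graph)
    total = sum(personalization.values())
    p = {n: personalization.get(n, 0.0) / total for n in nodes}  # restart vector
    rank = dict(p)
    for _ in range(iters):
        new = {n: (1 - alpha) * p[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:  # spread this node's rank evenly over its out-links
                share = alpha * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:  # dangling node: redistribute its mass via the restart vector
                for m in nodes:
                    new[m] += alpha * rank[n] * p[m]
        rank = new
    return rank


def fair_top_k(scores, group, k, min_per_group):
    """Greedy re-ranking: fill top-k by score, but once the unmet quotas use up
    all remaining slots, only pick documents from groups still below quota."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected, counts = [], {g: 0 for g in min_per_group}
    for pos in range(k):
        remaining = k - pos
        deficit = {g: min_per_group[g] - counts[g]
                   for g in min_per_group if counts[g] < min_per_group[g]}
        candidates = [d for d in ranked if d not in selected]
        if sum(deficit.values()) >= remaining:
            # fall back to all candidates if no under-quota documents are left
            candidates = [d for d in candidates if group[d] in deficit] or candidates
        if not candidates:
            break
        best = candidates[0]
        selected.append(best)
        counts[group[best]] = counts.get(group[best], 0) + 1
    return selected


def group_exposure(ranking, group):
    """Sum of position-discounted attention 1/log2(rank + 1) received per group."""
    exposure = {}
    for i, d in enumerate(ranking, start=1):
        exposure[group[d]] = exposure.get(group[d], 0.0) + 1 / math.log2(i + 1)
    return exposure


# Toy graph: a densely linked group "A" and a small group "B" that links into A.
graph = {
    "a1": ["a2", "a3", "a4"], "a2": ["a1", "a3", "a4"],
    "a3": ["a1", "a2", "a4"], "a4": ["a1", "a2", "a3"],
    "b1": ["b2", "a1"], "b2": ["b1", "a2"],
}
group = {d: d[0].upper() for d in graph}  # "A" or "B"
scores = personalized_pagerank(graph, {d: 1.0 for d in graph})
plain = sorted(scores, key=scores.get, reverse=True)[:4]
fair = fair_top_k(scores, group, k=4, min_per_group={"A": 2, "B": 2})
print(plain, group_exposure(plain, group))  # only group "A" receives exposure
print(fair, group_exposure(fair, group))    # both groups appear in the top 4
```

Because group B's documents pass part of their score to group A, the unconstrained top-4 contains only group A; the quota trades some total score for both groups receiving exposure. A post-processing quota is only one option, and building the constraint into the ranking computation itself would be closer to modifying the graph-based ranking algorithm.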

[1] Zhang, Q. et al. A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv (2025) doi:10.48550/arxiv.2501.13958.

[2] Dong, Y., Ma, J., Wang, S., Chen, C. & Li, J. Fairness in Graph Mining: A Survey. arXiv (2022) doi:10.48550/arxiv.2204.09888.

[3] Gallegos, I. O. et al. Bias and Fairness in Large Language Models: A Survey. arXiv (2023) doi:10.48550/arxiv.2309.00770.