Powerful Graph Neural Networks for Relational Databases

Author

Joshua Robinson

Joshua Robinson

Joshua is a postdoctoral scholar at Stanford working with Jure Leskovec. He obtained his PhD from MIT CSAIL in 2023, where we was advised by Stefanie Jegelka and Suvrit Sra. His interests include 1) designing algorithms for learning over structured domains, such as graphs, eigenvectors, and relational databases, and 2) self-supervised representation learning. He was also a co-organizer of the first LoG conference.

Project

Much of the world’s data is stored in relational databases, which contain multiple tables whose rows are connected via primary-foreign key relations [1]. Consequently, many interesting forecasting problems can be thought of as predictions on relational data (Who will win next weekend’s F1? Will patient A respond to treatment X?). Crucially, relational databases can be viewed as “relational graphs”, with one node per table row, and edges given by primary-foreign key relations. However, relational graphs are not arbitrary graphs. They are k-partite, and some of the node partitions have a fixed number of in- and out-edges per node. This suggests a need for specialized GNN architectures suited to this graph data. The aim of this project will be an initial scoping of the (in)suitability of existing GNNs for processing relational graphs. Depending on student interest, this project’s scope is both a) empirical: evaluating the performance of existing GNN models and components to determine best modeling practices (for testing we have recently released a benchmark suite: https://relbench.stanford.edu/), or b) theoretical: finding examples of relational data structures that existing GNNs cannot distinguish. In both cases, this lays the groundwork for designing more powerful graph networks for relational data.

[1] Relational Deep Learning: Graph Representation Learning on Relational Databases, Fey et al. 2023 arXiv:2312.04615.