Correcting a Nonparametric Two-sample Graph Hypothesis Test for Graphs with Different Numbers of Vertices
Random graphs are statistical models that have many applications, ranging
from neuroscience to social network analysis. Of particular interest in some
applications is the problem of testing two random graphs for equality of
generating distributions. Tang et al. (2017) propose a test for this setting.
This test consists of embedding the graph into a low-dimensional space via the
adjacency spectral embedding (ASE) and subsequently using a kernel two-sample
test based on the maximum mean discrepancy. However, if the two graphs being
compared have an unequal number of vertices, the test of Tang et al. (2017) may
not be valid. We demonstrate the intuition behind this invalidity and propose a
correction that makes any subsequent kernel- or distance-based test valid. Our
method relies on sampling based on the asymptotic distribution for the ASE. We
call these altered embeddings the corrected adjacency spectral embeddings
(CASE). We show that CASE remedies the exchangeability problem of the original
test and demonstrate the validity and consistency of the test that uses CASE
via a simulation study. We also apply our proposed test to the problem of
determining equivalence of generating distributions of subgraphs in Drosophila
connectome.
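
The uncorrected pipeline that the abstract describes (embed each graph with the ASE, then compare the two sets of embedded points with a maximum mean discrepancy statistic) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation and not the CASE correction. The function names, the Gaussian kernel with a fixed bandwidth, the permutation-based p-value, and the sign fix used to resolve the one-dimensional embedding ambiguity are all assumptions made for the example. The permutation step presumes the pooled embeddings are exchangeable, which is the property the abstract says can fail when the graphs have different numbers of vertices.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed a symmetric adjacency matrix into R^d using its d largest-magnitude eigenpairs."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    X = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    # Assumption: resolve the sign ambiguity of each column by making its sum
    # nonnegative. This suffices for d = 1; for d > 1 a full orthogonal
    # alignment between the two embeddings would be needed.
    signs = np.sign(X.sum(axis=0))
    signs[signs == 0] = 1.0
    return X * signs

def mmd2_biased(X, Y, bandwidth=0.5):
    """Biased (V-statistic) estimate of the squared MMD with a Gaussian kernel."""
    def gram(P, Q):
        sq = (P**2).sum(axis=1)[:, None] + (Q**2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

def permutation_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation p-value for the MMD statistic; assumes pooled points are exchangeable."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if mmd2_biased(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def sample_er_graph(n, p, rng):
    """Sample a simple undirected Erdos-Renyi graph as a dense 0/1 adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

# Usage: two graphs with the same number of vertices but different edge probabilities.
rng = np.random.default_rng(1)
X_hat = adjacency_spectral_embedding(sample_er_graph(80, 0.2, rng), d=1)
Y_hat = adjacency_spectral_embedding(sample_er_graph(80, 0.6, rng), d=1)
p_value = permutation_pvalue(X_hat, Y_hat)
print(p_value)
```

Both graphs in the usage example have the same number of vertices, so the setting the abstract identifies as problematic does not arise here. Applying the same sketch to graphs of unequal size would reproduce the uncorrected test, not the proposed one.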
Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
Dataset shift refers to the problem where the input data distribution changes over time (e.g., between the training and test stages). Because this can be a critical bottleneck in safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research issue in machine learning. Although several existing efforts have focused on image and video data, applications with graph-structured data have not received sufficient attention. In this paper, we therefore investigate the problem of detecting shifts in graph-structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical approach to shift detection in large-scale graph-structured data that is based on a two-sample test. Our approach is flexible: it is suitable for both undirected and directed graphs, and it eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes.