We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph
Neural Networks (GNNs) that exploits the node features and connectivity
structure of a graph while adapting to both homophily and heterophily. (In
homophilic graphs, vertices of the same class are more likely to be connected,
whereas in heterophilic graphs vertices of different classes tend to be
linked.) While GNNs have been successfully applied to homophilic graphs, their
application to heterophilic graphs remains challenging. The best-performing
GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high
computational costs, and are not inductive. We employ samplers based on
feature-similarity and feature-diversity to select subsets of neighbors for a
node, and adaptively capture information from homophilic and heterophilic
neighborhoods using dual channels. To our knowledge, AGS-GNN is the only
algorithm that explicitly controls homophily in the sampled subgraph through
similar and diverse neighborhood samples. For diverse neighborhood sampling,
we employ submodularity, which had not been used in this context prior to our
work. The sampling distribution is pre-computed and highly parallel,
achieving the desired scalability. Using an extensive collection of 35
small (≤ 100K nodes) and large (>100K nodes) homophilic and heterophilic
graphs, we demonstrate the superiority of AGS-GNN compared to current
approaches in the literature. AGS-GNN achieves comparable test accuracy to the
best-performing heterophilic GNNs, even outperforming methods using the entire
graph for node classification. AGS-GNN also converges faster than methods
that sample neighborhoods randomly, and can be incorporated into existing GNN
models that employ node or graph sampling.

Comment: The paper has been accepted to KDD'24 in the research track
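
To make the two samplers described above concrete, the following is a minimal sketch and not the authors' implementation. It assumes cosine similarity between node feature vectors as the similarity measure and a facility-location objective as the submodular function for diverse selection; the abstract specifies neither choice. The function names `cosine`, `similar_neighbors`, and `diverse_neighbors` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similar_neighbors(x, node, neighbors, k, rng=random):
    """Sample up to k distinct neighbors, weighted by feature similarity to `node`.

    Assumption: weights are cosine similarity shifted to be strictly positive.
    """
    pool = list(neighbors)
    weights = [cosine(x[node], x[u]) + 1.0 + 1e-9 for u in pool]
    chosen = []
    while pool and len(chosen) < k:
        i = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen


def diverse_neighbors(x, neighbors, k):
    """Greedily pick up to k neighbors that maximize a facility-location objective.

    Assumption: f(S) = sum over neighbors u of max_{s in S} sim(u, s), a monotone
    submodular function when sim is nonnegative. Greedy selection favors
    neighbors that cover parts of the neighborhood not yet represented.
    """
    pool = list(neighbors)
    sim = {(u, v): cosine(x[u], x[v]) + 1.0 for u in pool for v in pool}
    best_cover = {u: 0.0 for u in pool}  # current max similarity to the chosen set
    chosen = []
    while len(chosen) < min(k, len(pool)):
        def gain(c):
            # Marginal increase in f if candidate c were added to the chosen set.
            return sum(max(sim[(u, c)] - best_cover[u], 0.0) for u in pool)

        candidates = [c for c in pool if c not in chosen]
        pick = max(candidates, key=gain)
        chosen.append(pick)
        for u in pool:
            best_cover[u] = max(best_cover[u], sim[(u, pick)])
    return chosen


# Toy features: nodes 1, 2, 4 resemble node 0; node 3 is dissimilar.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [1.0, 0.05]}
similar = similar_neighbors(features, 0, [1, 2, 3, 4], k=2, rng=random.Random(0))
diverse = diverse_neighbors(features, [1, 2, 3, 4], k=2)
```

In this toy neighborhood, the diverse selection includes the outlying node 3 alongside one representative of the cluster around node 0, whereas the similarity-weighted sampler tends to favor the cluster. A dual-channel model in the spirit of the abstract would aggregate over both sampled neighborhoods separately.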