Graph Neural Networks (GNNs) achieve state-of-the-art performance on
graph-structured data across numerous domains. Their underlying ability to
represent nodes as summaries of their vicinities has proven effective for
homophilous graphs in particular, in which same-type nodes tend to connect. On
heterophilous graphs, in which nodes of different types are likely to be
connected, GNNs perform less consistently, as neighborhood information might be
less representative or even misleading. However, GNN performance is not
inferior on all heterophilous graphs, and there is a lack of understanding of
what other graph properties affect GNN performance.
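
To make the homophilous/heterophilous distinction concrete, the following is a minimal sketch of one common formulation of the homophily ratio: the fraction of edges whose endpoints share a class label. The function name `edge_homophily_ratio` and the toy graph are illustrative assumptions, and the text here does not state which exact variant of the ratio is used.

```python
from typing import Hashable, Sequence, Tuple


def edge_homophily_ratio(
    edges: Sequence[Tuple[int, int]],
    labels: Sequence[Hashable],
) -> float:
    """Fraction of edges that connect two nodes with the same label.

    Assumption: this is the edge-based variant of the homophily ratio;
    node-based variants average a per-node fraction instead.
    """
    if not edges:
        raise ValueError("the graph must contain at least one edge")
    # Count edges whose two endpoints carry the same class label.
    same_label = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_label / len(edges)


# Hypothetical toy graph: a 4-cycle over nodes 0..3.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Two same-label edges out of four.
mixed = edge_homophily_ratio(cycle_edges, [0, 0, 1, 1])

# Labels alternate around the cycle, so no edge joins same-label nodes.
heterophilous = edge_homophily_ratio(cycle_edges, [0, 1, 0, 1])

print(mixed, heterophilous)
```

A value near 1 indicates a homophilous graph and a value near 0 a heterophilous one. A single scalar of this kind says nothing about how the neighborhoods of different classes compare with each other.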
In this work, we highlight the limitations of the widely used homophily ratio
and the recent Cross-Class Neighborhood Similarity (CCNS) metric in estimating
GNN performance. To overcome these limitations, we introduce 2-hop Neighbor
Class Similarity (2NCS), a new quantitative graph structural property that
correlates with GNN performance more strongly and consistently than alternative
metrics. 2NCS considers two-hop neighborhoods, a choice that follows
theoretically from the two-step label propagation process governing GCN's
training and inference. Experiments on one synthetic and eight real-world
graph datasets confirm consistent improvements over existing metrics in
estimating the accuracy of GCN- and GAT-based architectures on the node
classification task.

Comment: Accepted at the 3rd Workshop on Graphs and more Complex structures
for Learning and Reasoning (GCLR) at AAAI 202