Graph neural networks (GNNs), as the de-facto model class for representation
learning on graphs, are built upon the multi-layer perceptrons (MLP)
architecture with additional message passing layers to allow features to flow
across nodes. While conventional wisdom commonly attributes the success of GNNs
to their advanced expressivity, we conjecture that this is not the main cause
of GNNs' superiority in node-level prediction tasks. This paper pinpoints the
major source of GNNs' performance gain to their intrinsic generalization
capability, by introducing an intermediate model class dubbed as
P(ropagational)MLP, which is identical to standard MLP in training, but then
adopts GNN's architecture in testing. Intriguingly, we observe that PMLPs
consistently perform on par with (or even exceed) their GNN counterparts, while
being much more efficient in training. This finding sheds new insights into
understanding the learning behavior of GNNs, and can be used as an analytic
tool for dissecting various GNN-related research problems. As an initial step
to analyze the inherent generalizability of GNNs, we show the essential
difference between MLP and PMLP at infinite-width limit lies in the NTK feature
map in the post-training stage. Moreover, by examining their extrapolation
behavior, we find that though many GNNs and their PMLP counterparts cannot
extrapolate non-linear functions for extremely out-of-distribution samples,
they have greater potential to generalize to testing samples near the training
data range as natural advantages of GNN architectures.Comment: Accepted to ICLR 2023. Codes in https://github.com/chr26195/PML