Graph neural networks (GNNs) have demonstrated excellent performance in a
wide range of applications. However, the enormous size of large-scale graphs
hinders their application in real-time inference scenarios. Although existing
scalable GNNs leverage linear propagation to preprocess the features and
accelerate training and inference, these methods still suffer from scalability
issues when making inferences on unseen nodes, because the feature
preprocessing requires the graph to be known and fixed. To speed up inference
in the inductive setting, we propose a novel adaptive propagation order
approach that generates a personalized propagation order for each node based
on its topological information, thereby avoiding redundant feature
propagation computation. Moreover, the trade-off between accuracy
and inference latency can be flexibly controlled by simple hyper-parameters to
match the latency constraints of different application scenarios. To compensate for
the potential inference accuracy loss, we further propose Inception
Distillation, which exploits multi-scale receptive field information to improve
inference performance. Extensive experiments are conducted on four public
datasets with different scales and characteristics, and the experimental
results show that our proposed inference acceleration framework outperforms the
state-of-the-art graph inference acceleration baselines in terms of both accuracy and
efficiency. In particular, the advantage of our proposed method is more
significant on larger-scale datasets, and our framework achieves a 75×
inference speedup on the largest dataset, Ogbn-products.
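
To make the key ideas concrete, the sketch below illustrates node-adaptive propagation under stated assumptions: each node receives a personalized propagation order derived from its topology (here, a hypothetical degree-based heuristic; the actual ordering criterion may differ), and a node's features are frozen once its order is reached, so later hops skip it. Function names such as `personalized_orders` and `adaptive_propagate` are illustrative, not taken from the paper.

```python
# A minimal sketch (not the authors' exact algorithm) of node-adaptive
# feature propagation. Assumption: high-degree nodes gather sufficient
# neighborhood information in fewer hops, so they get a smaller order.
import numpy as np
import scipy.sparse as sp

def normalized_adjacency(adj: sp.csr_matrix) -> sp.csr_matrix:
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    adj = adj + sp.eye(adj.shape[0], format="csr")
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = sp.diags(np.power(deg, -0.5))
    return d_inv_sqrt @ adj @ d_inv_sqrt

def personalized_orders(adj: sp.csr_matrix, k_max: int) -> np.ndarray:
    """Hypothetical heuristic: map degree rank to an order in [1, k_max],
    with larger degree yielding a smaller propagation order."""
    deg = np.asarray(adj.sum(axis=1)).flatten()
    ranks = deg.argsort().argsort() / max(len(deg) - 1, 1)  # in [0, 1]
    return np.clip(np.ceil((1.0 - ranks) * k_max), 1, k_max).astype(int)

def adaptive_propagate(adj: sp.csr_matrix, x: np.ndarray, k_max: int) -> np.ndarray:
    """Propagate features hop by hop, freezing each node once its
    personal order is reached, so frozen nodes cost nothing later."""
    a_hat = normalized_adjacency(adj)
    orders = personalized_orders(adj, k_max)
    out = x.astype(float).copy()
    for step in range(1, k_max + 1):
        active = orders >= step          # nodes that still need this hop
        if not active.any():
            break                        # everyone frozen: skip remaining hops
        out[active] = a_hat[active] @ out  # propagate only the active rows
    return out
```

In deployment, `k_max` caps the worst-case latency, while the ordering heuristic controls how many sparse multiplications each node incurs, which is the accuracy-latency trade-off the abstract refers to. Similarly, the following is a generic multi-scale soft-label distillation objective in the spirit of Inception Distillation; the idea of one teacher per propagation depth, the temperature, and the weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# A minimal sketch of a multi-scale distillation loss, assuming standard
# soft-label knowledge distillation with one teacher per receptive-field
# scale. Hyper-parameters here are placeholders, not the paper's values.
import torch
import torch.nn.functional as F

def inception_distillation_loss(student_logits: torch.Tensor,
                                teacher_logits_per_scale: list,
                                labels: torch.Tensor,
                                temperature: float = 2.0,
                                alpha: float = 0.5) -> torch.Tensor:
    """Cross-entropy on true labels plus mean KL divergence to teachers
    computed at multiple receptive-field scales."""
    ce = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = torch.stack([
        F.kl_div(log_p_student,
                 F.softmax(t / temperature, dim=-1),
                 reduction="batchmean") * temperature ** 2
        for t in teacher_logits_per_scale
    ]).mean()
    return alpha * ce + (1.0 - alpha) * kd
```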