Recent advances in graph neural networks (GNNs) have enabled more
comprehensive modeling of molecules and molecular systems, thereby enhancing
the precision of molecular property prediction and molecular simulations.
Nonetheless, as the field progresses towards bigger and more complex
architectures, state-of-the-art GNNs have become prohibitively expensive for many
large-scale applications. In this paper, we explore the utility of knowledge
distillation (KD) for accelerating molecular GNNs. To this end, we devise KD
strategies that facilitate the distillation of hidden representations in
directional and equivariant GNNs, and evaluate their performance on the
regression task of energy and force prediction. We validate our protocols
across different teacher-student configurations and datasets, and demonstrate
that they can consistently boost the predictive accuracy of student models
without any modifications to their architecture. Moreover, we conduct
comprehensive optimization of various components of our framework, and
investigate the potential of data augmentation to further enhance performance.
All in all, we manage to close the gap in predictive accuracy between teacher
and student models by as much as 96.7% and 62.5% for energy and force
prediction, respectively, while fully preserving the inference throughput of the
more lightweight models.
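Although the abstract does not spell out the exact distillation objective, a minimal sketch of feature-based KD for energy and force regression might look as follows, assuming PyTorch. The model interfaces (models returning energy, forces, and hidden node features), the batch attributes (batch.energy, batch.forces), and the loss weights are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of feature-based knowledge distillation for molecular GNNs.
# Assumes PyTorch; TeacherGNN/StudentGNN interfaces and loss weights are
# hypothetical placeholders, not the paper's actual protocol.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKD(nn.Module):
    """Distills a teacher's hidden node representations into a student
    whose hidden dimension may differ, via a learned linear projection."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features into the teacher's feature space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, h_student: torch.Tensor, h_teacher: torch.Tensor) -> torch.Tensor:
        # MSE between projected student features and frozen teacher features.
        return F.mse_loss(self.proj(h_student), h_teacher.detach())


def training_step(student, teacher, kd_head, batch,
                  w_energy=1.0, w_force=10.0, w_kd=0.1):
    """One optimization step combining the supervised energy/force losses
    with an auxiliary hidden-representation distillation term.
    Both models are assumed to return (energy, forces, hidden_features)."""
    with torch.no_grad():
        e_t, f_t, h_t = teacher(batch)   # frozen teacher forward pass
    e_s, f_s, h_s = student(batch)       # student forward pass

    loss_energy = F.l1_loss(e_s, batch.energy)
    loss_force = F.l1_loss(f_s, batch.forces)
    loss_kd = kd_head(h_s, h_t)          # distill hidden representations

    return w_energy * loss_energy + w_force * loss_force + w_kd * loss_kd
```

Because the KD term acts only as an auxiliary training loss, the student's architecture and therefore its inference throughput are unchanged at test time, consistent with the claim above.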