Extracting informative representations of molecules using graph neural
networks (GNNs) is crucial in AI-driven drug discovery. Recently, the graph
research community has been trying to replicate the success of self-supervised
pretraining in natural language processing, with several successes claimed.
However, we find that the benefit brought by self-supervised pretraining on small
molecular data can be negligible in many cases. We conduct thorough ablation
studies on the key components of GNN pretraining, including pretraining
objectives, data splitting methods, input features, pretraining dataset scales,
and GNN architectures, to see how they affect the accuracy of the downstream
tasks. Our first important finding is that self-supervised graph pretraining does
not always have statistically significant advantages over non-pretraining methods
in many settings. Secondly, although noticeable improvement can be observed
with additional supervised pretraining, the improvement may diminish with
richer features or more balanced data splits. Thirdly, hyper-parameters could
have a larger impact on the accuracy of downstream tasks than the choice of
pretraining tasks, especially when the downstream datasets are small.
Finally, we offer our conjecture that the complexity of some pretraining
methods on small molecules might be insufficient, followed by empirical
evidence on different pretraining datasets.