The quantitative relationship between chemical reaction rates and
nucleophilicity parameters plays a crucial role in organic chemistry.
In this regard, the equation proposed by Mayr et al. and the associated
reactivity database are prominent examples. However, determining
Mayr’s nucleophilicity parameter N often
requires time-consuming experiments with reference electrophiles in
the solvent of interest. In recent years, several machine learning (ML)-based
models have been proposed for the data-driven prediction of N. However, in
addition to DFT-calculated electronic descriptors, most of them also take a
set of hand-crafted, predefined structural descriptors as input, which may
bias the representation of the nucleophile’s structural information toward
the way those descriptors were defined. Compared with traditional ML algorithms, graph
neural networks (GNNs) can naturally take the molecule’s structural
information into account by applying the message passing technique.
Herein, we propose a SchNet-based GNN model that takes only the molecular
conformation and solvent type as input. In 10-fold cross-validation
on 894 data points, the model performs comparably to the previous
benchmark study (R² = 0.91, RMSE =
2.25). To enhance the model’s ability to capture the molecule’s
electronic information, some DFT-calculated parameters are then incorporated
into the model as graph-level global features, which substantially improves
the prediction accuracy (R² = 0.95, RMSE = 1.63). These results demonstrate
that both structural and electronic information are important for the
prediction of N, and that GNNs can integrate these two kinds of information
more effectively.
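
For context, the Mayr relationship referred to above is commonly written as log k (20 °C) = s_N (N + E), where E is the electrophilicity parameter of the reaction partner and s_N is a nucleophile-specific sensitivity parameter; the abstract itself does not spell the equation out. The following minimal sketch illustrates why the experimental determination of N needs several reference electrophiles: N and s_N are obtained from a linear fit of log k against E. The function names are hypothetical and the numbers are synthetic, chosen only for illustration; they are not taken from the reactivity database or from this work.

```python
from statistics import mean


def log_rate_constant(s_n, n, e):
    """Mayr relationship: log k = s_N * (N + E)."""
    return s_n * (n + e)


def fit_nucleophilicity(e_values, log_k_values):
    """Estimate (N, s_N) by ordinary least squares of log k versus E.

    Since log k = s_N * E + s_N * N, the slope is s_N and
    N equals the intercept divided by the slope.
    """
    e_bar = mean(e_values)
    y_bar = mean(log_k_values)
    sxx = sum((e - e_bar) ** 2 for e in e_values)
    sxy = sum((e - e_bar) * (y - y_bar)
              for e, y in zip(e_values, log_k_values))
    s_n = sxy / sxx
    intercept = y_bar - s_n * e_bar
    return intercept / s_n, s_n


# Synthetic example (assumed values, not experimental data):
# a nucleophile with N = 15 and s_N = 0.7 reacting with four
# hypothetical reference electrophiles.
reference_e = [-10.0, -9.0, -8.0, -7.0]
measured_log_k = [log_rate_constant(0.7, 15.0, e) for e in reference_e]

n_fit, s_n_fit = fit_nucleophilicity(reference_e, measured_log_k)
print(f"N = {n_fit:.2f}, s_N = {s_n_fit:.2f}")
```

Each point in such a fit corresponds to a separate kinetic measurement, which is the experimental cost that a data-driven prediction of N aims to avoid.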