Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve <i>in Silico</i> p<i>K</i><sub>a</sub> Prediction
In a unique collaboration between a software company and a pharmaceutical
company, we were able to develop a new <i>in silico</i> p<i>K</i><sub>a</sub> prediction tool with outstanding prediction
quality. An existing p<i>K</i><sub>a</sub> prediction method
from Simulations Plus based on artificial neural network ensembles
(ANNE), microstates analysis, and literature data was retrained with
a large homogeneous data set of drug-like molecules from Bayer. The
new model was thus built with curated sets of ∼14,000 literature
p<i>K</i><sub>a</sub> values (∼11,000 compounds,
representing literature chemical space) and ∼19,500 p<i>K</i><sub>a</sub> values experimentally determined at Bayer
Pharma (∼16,000 compounds, representing industry chemical space).
Model validation was performed with several test sets consisting of
a total of ∼31,000 new p<i>K</i><sub>a</sub> values
measured at Bayer. For the largest and most difficult test set with
>16,000 p<i>K</i><sub>a</sub> values that were not used
for training, the original model achieved a mean absolute error (MAE)
of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation
coefficient (<i>R</i><sup>2</sup>) of 0.87. The new model
achieves significantly improved prediction statistics, with MAE =
0.50, RMSE = 0.67, and <i>R</i><sup>2</sup> = 0.93. It is
commercially available as part of the Simulations Plus ADMET Predictor
release 7.0. Good predictions are only of value when delivered effectively
to those who can use them. The new p<i>K</i><sub>a</sub> prediction model has been integrated into Pipeline Pilot and the
PharmacophorInformatics (PIx) platform used by scientists at Bayer
Pharma. Different output formats allow customized application by medicinal
chemists, physical chemists, and computational chemists.
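
The three validation statistics quoted above (MAE, RMSE, and <i>R</i><sup>2</sup>) can be illustrated with a minimal sketch. It assumes that <i>R</i><sup>2</sup> is the squared Pearson correlation between measured and predicted values, as the phrase "squared correlation coefficient" suggests. The function name `prediction_stats` and the sample values are hypothetical, chosen for illustration only; they are not taken from ADMET Predictor or from the Bayer data.

```python
import math


def prediction_stats(observed, predicted):
    """Return MAE, RMSE, and squared Pearson correlation for paired pKa values.

    Hypothetical helper for illustration; not part of any vendor API.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    # Mean absolute error: average magnitude of the prediction error.
    mae = sum(abs(e) for e in errors) / n

    # Root-mean-square error: penalizes large errors more strongly than MAE.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r2 = (cov * cov) / (var_o * var_p)

    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Made-up pKa values, for illustration only.
    measured = [4.2, 7.1, 9.3, 3.8, 10.4]
    predicted = [4.6, 6.8, 9.9, 3.5, 10.1]
    print(prediction_stats(measured, predicted))
```

Because the squared correlation is insensitive to a constant offset between prediction and measurement, it is usually read alongside MAE and RMSE rather than on its own, which is consistent with the abstract reporting all three.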