Binding Free Energy Calculations for Lead Optimization: Assessment of Their Accuracy in an Industrial Drug Design Context
Correctly ranking compounds according to their computed relative
binding affinities will be of great value for decision making in the
lead optimization phase of industrial drug discovery. However, the
performance of existing computationally demanding binding free energy
calculation methods in this context is largely unknown. We analyzed
the performance of the molecular mechanics continuum solvent, the
linear interaction energy (LIE), and the thermodynamic integration
(TI) approaches for three sets of compounds from industrial lead optimization
projects. The data sets pose challenges typical for this early stage
of drug discovery. None of the methods was sufficiently predictive
when applied out of the box without considering these challenges.
Detailed investigation of the failures revealed critical points that
are essential for good binding free energy predictions. When these
data-set-specific features were taken into account, predictions valuable
for lead optimization could be obtained with all approaches except LIE.
Our findings lead to clear recommendations for when to use which of
the above approaches. They also stress the important role
of expert knowledge in this process, not least for estimating the
accuracy of TI predictions using indicators such as the
size and chemical structure of the exchanged groups and the statistical
error of the predictions. Such knowledge will be invaluable in deciding
which TI results can be trusted for decision making.
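The abstract names TI but does not spell out its protocol. As a minimal sketch of the general technique, assuming the ensemble average ⟨∂U/∂λ⟩ and its standard error have already been computed for each λ window, the free energy of each alchemical leg is ΔG = ∫₀¹ ⟨∂U/∂λ⟩ dλ, evaluated here with the trapezoidal rule, and the relative binding free energy follows from the thermodynamic cycle ΔΔG_bind = ΔG_complex − ΔG_solvent. The function names and all numbers are hypothetical, the error propagation assumes statistically independent windows, and this is not the authors' implementation.

```python
import math


def trapezoid_ti(lambdas, mean_dudl, sem_dudl):
    """Integrate <dU/dlambda> over lambda with the trapezoidal rule.

    Returns (dG, sigma_dG). sigma_dG propagates the per-window standard
    errors, assuming the windows are statistically independent.
    """
    n = len(lambdas)
    if n < 2 or len(mean_dudl) != n or len(sem_dudl) != n:
        raise ValueError("need matching arrays with at least two lambda windows")
    # Trapezoidal weight of window i: half the spacing to each neighbour.
    weights = []
    for i in range(n):
        left = lambdas[i] - lambdas[i - 1] if i > 0 else 0.0
        right = lambdas[i + 1] - lambdas[i] if i < n - 1 else 0.0
        weights.append(0.5 * (left + right))
    dg = sum(w * m for w, m in zip(weights, mean_dudl))
    err = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sem_dudl)))
    return dg, err


def relative_binding(dg_complex, dg_solvent):
    """ddG_bind = dG_complex - dG_solvent; errors combined in quadrature."""
    value = dg_complex[0] - dg_solvent[0]
    error = math.hypot(dg_complex[1], dg_solvent[1])
    return value, error


if __name__ == "__main__":
    lam = [0.0, 0.25, 0.5, 0.75, 1.0]
    # Hypothetical per-window averages and standard errors in kcal/mol.
    complex_leg = trapezoid_ti(lam, [10.0, 6.0, 2.0, -1.0, -3.0], [0.4] * 5)
    solvent_leg = trapezoid_ti(lam, [9.0, 5.0, 1.0, -2.0, -4.0], [0.3] * 5)
    ddg, err = relative_binding(complex_leg, solvent_leg)
    print(f"ddG_bind = {ddg:.3f} +/- {err:.3f} kcal/mol")
```

With these made-up inputs the two legs integrate to 2.625 and 1.625 kcal/mol, so the script prints `ddG_bind = 1.000 +/- 0.234 kcal/mol`. The propagated error is the kind of statistical indicator the abstract suggests weighing, together with the size and chemistry of the exchanged group, before trusting a TI result.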
Reliable and Performant Identification of Low-Energy Conformers in the Gas Phase and Water
Prediction of compound properties from structure via quantitative
structure–activity relationship and machine-learning approaches
is an important computational chemistry task in small-molecule drug
research. Although many such properties depend on three-dimensional
structures or even conformer ensembles, the majority of models are
based on descriptors derived from two-dimensional structures. Here
we present results from a thorough benchmark study of force field,
semiempirical, and density functional methods for the calculation
of conformer energies in the gas phase and in water as a foundation
for the correct identification of relevant low-energy conformers.
We find that, among the computationally fast methods, the tight-binding
ansatz GFN-xTB shows the lowest error metrics and the highest
correlation with the benchmark PBE0-D3(BJ)/def2-TZVP in the gas phase,
and that in water OPLS3 achieves comparable performance. MMFF94, AM1,
and DFTB+ perform worse, whereas the performance-optimized but far more
expensive functional PBEh-3c yields energies almost perfectly correlated
with the benchmark and should be used whenever affordable. On the basis
of our findings, we have implemented a reliable and fast protocol
for the identification of low-energy conformers of drug-like molecules
in water that can be used to quantify strain energy and entropy
contributions to target binding and to derive
conformer-ensemble-dependent molecular descriptors.
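The abstract does not give the benchmark statistics or protocol in detail. A minimal sketch of the kind of analysis it describes, assuming conformer energies in kcal/mol from a fast method (for example GFN-xTB) and from the reference method have already been computed elsewhere: each set is shifted to its own minimum, because absolute energies from different methods are not comparable, then compared by MAE and Pearson correlation. Conformers within an energy window are kept, and Boltzmann populations give a conformational entropy S_conf = −R Σ pᵢ ln pᵢ. The 3 kcal/mol window, the 298.15 K temperature, the function names, and the energies are illustrative assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def relative_energies(energies):
    """Shift conformer energies so the lowest conformer is at 0."""
    e_min = min(energies)
    return [e - e_min for e in energies]


def mae(pred, ref):
    """Mean absolute error between paired relative energies."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either set is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def low_energy_conformers(energies, window=3.0):
    """Indices of conformers within `window` kcal/mol of the minimum (assumed cutoff)."""
    return [i for i, e in enumerate(relative_energies(energies)) if e <= window]


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann populations from conformer energies."""
    factors = [math.exp(-e / (R_KCAL * temperature)) for e in relative_energies(energies)]
    z = sum(factors)
    return [f / z for f in factors]


def conformational_entropy(weights):
    """S_conf = -R * sum(p_i ln p_i), in kcal/(mol*K)."""
    return -R_KCAL * sum(p * math.log(p) for p in weights if p > 0)


if __name__ == "__main__":
    fast = [12.0, 12.5, 14.0, 18.0]  # hypothetical fast-method energies
    ref = [5.0, 5.8, 7.1, 11.5]      # hypothetical reference energies
    fast_rel, ref_rel = relative_energies(fast), relative_energies(ref)
    print("MAE:", round(mae(fast_rel, ref_rel), 3))
    print("r:", round(pearson_r(fast_rel, ref_rel), 4))
    print("kept:", low_energy_conformers(fast))
    print("S_conf:", conformational_entropy(boltzmann_weights(ref)))
```

Ranking by relative rather than absolute energies is the design choice that makes methods as different as a force field and a DFT functional directly comparable.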
Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction
In a unique collaboration between a software company and a pharmaceutical
company, we were able to develop a new in silico pKa prediction tool with
outstanding prediction quality. An existing pKa prediction method
from Simulations Plus based on artificial neural network ensembles
(ANNE), microstates analysis, and literature data was retrained with
a large homogeneous data set of drug-like molecules from Bayer. The
new model was built with curated sets of ∼14,000 literature
pKa values (∼11,000 compounds, representing literature chemical space)
and ∼19,500 pKa values experimentally determined at Bayer
Pharma (∼16,000 compounds, representing industry chemical space).
Model validation was performed with several test sets comprising
a total of ∼31,000 new pKa values measured at Bayer. For the largest
and most difficult test set, with >16,000 pKa values that were not used
for training, the original model achieved a mean absolute error (MAE)
of 0.72 pKa units, a root-mean-square error (RMSE) of 0.94 pKa units,
and a squared correlation coefficient (R²) of 0.87. The new model
achieves significantly improved prediction statistics, with MAE =
0.50, RMSE = 0.67, and R² = 0.93. It is
commercially available as part of the Simulations Plus ADMET Predictor
release 7.0. Good predictions are only of value when delivered effectively
to those who can use them. The new pKa prediction model has been
integrated into Pipeline Pilot and the PharmacophorInformatics (PIx)
platform used by scientists at Bayer Pharma. Different output formats
allow customized application by medicinal chemists, physical chemists,
and computational chemists.
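The validation statistics quoted in this abstract are standard regression metrics. A minimal sketch of how they can be computed, assuming paired lists of predicted and measured pKa values (the data below are invented, the function name is hypothetical, and this is not the ADMET Predictor implementation). Following the abstract's wording, R² is computed as the squared Pearson correlation coefficient.

```python
import math


def regression_stats(predicted, measured):
    """Return (MAE, RMSE, R2) for paired predictions and measurements.

    R2 is the squared Pearson correlation coefficient, as named in the
    abstract, not the coefficient of determination 1 - SS_res/SS_tot.
    """
    n = len(measured)
    if n < 2 or len(predicted) != n:
        raise ValueError("need at least two paired values")
    residuals = [p - m for p, m in zip(predicted, measured)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    r2 = cov * cov / (var_p * var_m)
    return mae, rmse, r2


if __name__ == "__main__":
    predicted = [4.0, 7.5, 9.0, 3.0]  # invented pKa predictions
    measured = [4.5, 7.0, 9.5, 2.5]   # invented measured pKa values
    mae, rmse, r2 = regression_stats(predicted, measured)
    print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, R2 = {r2:.2f}")
```

Because the squared correlation is unchanged by a constant shift, a model that is systematically off by one pKa unit still scores R² = 1, while its MAE and RMSE expose the bias. This is why R² is usually reported alongside the error metrics.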