What the Heck? Automated Regioselectivity Calculations of Palladium-Catalyzed Heck Reactions Using Quantum Chemistry
We present a quantum chemistry (QM)-based method that computes the relative energies of intermediates in the Heck reaction that relate to the regioselective reaction outcome: branched (α), linear (β), or a mixture of the two. The calculations are done for two different reaction pathways (neutral and cationic) and consist of r2SCAN-3c single-point calculations on GFN2-xTB geometries, which in turn derive from a GFNFF-xTB conformational search. The method is fully automated and efficient enough to compute thousands of reaction outcomes. It largely reproduces systematic experimental studies in which the ratios of regioisomers were carefully determined. For a larger dataset extracted from Reaxys, the results are somewhat worse, with accuracies of 63% for β-selectivity using the neutral pathway and 29% for α-selectivity using the cationic pathway. Our analysis of the dataset suggests that in many cases only the major or desired regioisomer is reported in the literature, which makes accurate comparisons difficult. The code is freely available on GitHub under the MIT open-source license: https://github.com/jensengroup/HeckQM
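The abstract predicts regioselectivity from the relative energies of the intermediates leading to the branched (α) and linear (β) products. The sketch below shows one simple way to turn such an energy gap into a prediction. It assumes that the lowest energies (kcal/mol) of the competing α and β intermediates for one pathway are already known, that a Boltzmann distribution at 298.15 K is a reasonable proxy for the product ratio, and that a hypothetical 1.0 kcal/mol cutoff separates a "mix" from a selective outcome. The function names and the cutoff are illustrative and are not taken from HeckQM.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def boltzmann_fraction_alpha(e_alpha: float, e_beta: float,
                             temperature: float = 298.15) -> float:
    """Fraction of alpha product, assuming a Boltzmann population of the
    two competing intermediates (energies in kcal/mol)."""
    # x > 0 means the alpha intermediate lies higher in energy
    x = (e_alpha - e_beta) / (R_KCAL * temperature)
    # Numerically stable logistic form of exp(-x) / (1 + exp(-x))
    if x > 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def classify_regioselectivity(e_alpha: float, e_beta: float,
                              cutoff: float = 1.0) -> str:
    """Label the outcome 'alpha', 'beta', or 'mix'.

    The cutoff (kcal/mol) is a hypothetical choice for this sketch."""
    delta = e_alpha - e_beta
    if abs(delta) < cutoff:
        return "mix"
    return "alpha" if delta < 0 else "beta"


if __name__ == "__main__":
    # Toy energies: the alpha intermediate is 2 kcal/mol more stable
    print(classify_regioselectivity(0.0, 2.0))
    print(round(boltzmann_fraction_alpha(0.0, 2.0), 3))
```

A 2 kcal/mol preference corresponds to roughly 97% of one regioisomer at room temperature under these assumptions, which is why small energy errors can flip a prediction between "mix" and a selective label.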
Reliable and Performant Identification of Low-Energy Conformers in the Gas Phase and Water
Prediction of compound properties from structure via quantitative structure–activity relationship (QSAR) and machine-learning approaches is an important computational chemistry task in small-molecule drug research. Although many such properties depend on three-dimensional structures or even conformer ensembles, most models are based on descriptors derived from two-dimensional structures. Here we present results from a thorough benchmark study of force field, semiempirical, and density functional methods for calculating conformer energies in the gas phase and in water, as a foundation for correctly identifying relevant low-energy conformers. Among the computationally fast methods, the tight-binding ansatz GFN-xTB shows the lowest error metrics and the highest correlation with the PBE0-D3(BJ)/def2-TZVP benchmark in the gas phase, while in water OPLS3 becomes comparable in performance. MMFF94, AM1, and DFTB+ perform worse, whereas the performance-optimized but far more expensive functional PBEh-3c yields energies almost perfectly correlated with the benchmark and should be used whenever affordable. Based on these findings, we have implemented a reliable and fast protocol for identifying low-energy conformers of drug-like molecules in water that can be used to quantify strain energy and entropy contributions to target binding and to derive conformer-ensemble-dependent molecular descriptors.
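The benchmark compares conformer energies from fast methods against a reference method and uses them to pick out low-energy conformers. A minimal sketch of both steps, assuming energies in kcal/mol for the same set of conformers from a fast method and a reference method: relative energies are taken against each method's own minimum, conformers within a hypothetical 3 kcal/mol window are kept, and agreement is summarized with the mean absolute error and the Pearson correlation coefficient. The window size and the toy numbers are assumptions, not values from the study.

```python
import math
from typing import List, Sequence


def relative_energies(energies: Sequence[float]) -> List[float]:
    """Energies relative to the lowest conformer of the same method."""
    e_min = min(energies)
    return [e - e_min for e in energies]


def low_energy_conformers(energies: Sequence[float],
                          window: float = 3.0) -> List[int]:
    """Indices of conformers within `window` kcal/mol of the minimum.

    The default window is a hypothetical choice for this sketch."""
    return [i for i, e in enumerate(relative_energies(energies)) if e <= window]


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy absolute energies for four conformers (kcal/mol), made up for illustration
    fast = [10.0, 11.2, 12.9, 15.0]
    reference = [-5.0, -4.0, -1.5, -0.4]
    fast_rel, ref_rel = relative_energies(fast), relative_energies(reference)
    print(low_energy_conformers(fast))
    print(mean_absolute_error(fast_rel, ref_rel), pearson_r(fast_rel, ref_rel))
```

Comparing relative rather than absolute energies matters because different methods have unrelated energy zero points; only the conformer ranking and spacing are meaningful.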
Predictive Modeling of PROTAC Cell Permeability with Machine Learning
Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce the resource-demanding synthesis and testing of poorly permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel–Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (Cohen's κ ≥ 0.57). Models retrained on the combined original training set and blinded test set performed equally well on a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly because the CRBN datasets were imbalanced. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
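The abstract reports k-nearest neighbor classification of permeability and an agreement score (κ) alongside accuracy. The sketch below illustrates both ideas with the standard library only. It assumes descriptors are given as numeric lists, that labels are 0 (low permeability) and 1 (high permeability), and that descriptors are z-scored with training-set statistics before Euclidean distances are computed. The toy descriptor values are made up and do not correspond to the 17 descriptors used in the study, and the random forest models are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def standardize(train: List[List[float]],
                test: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Z-score each descriptor using training-set mean and std only."""
    n_feat = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_feat)]
    stds = []
    for j in range(n_feat):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero for constant columns

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_feat)] for row in rows]

    return scale(train), scale(test)


def knn_predict(train_x: List[List[float]], train_y: Sequence[int],
                query: List[float], k: int = 3) -> int:
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in neighbours[:k])
    return votes.most_common(1)[0][0]


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:  # both label sequences contain one identical class
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy two-descriptor data (e.g. a size and a lipophilicity proxy), made up
    train_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    train_y = [0, 0, 0, 1, 1, 1]
    tr, te = standardize(train_x, [[0.5, 0.5], [10.5, 10.5]])
    preds = [knn_predict(tr, train_y, q) for q in te]
    print(preds, cohen_kappa([0, 1], preds))
```

Kappa is useful here because, on an imbalanced set like the CRBN data described above, a model that always predicts the majority class can reach high accuracy while its κ stays at 0.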
Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction
In a unique collaboration between a software company and a pharmaceutical company, we developed a new in silico pKa prediction tool with substantially improved prediction quality. An existing pKa prediction method from Simulations Plus, based on artificial neural network ensembles (ANNE), microstate analysis, and literature data, was retrained with a large, homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets comprising a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set, with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72 pKa units, a root-mean-square error (RMSE) of 0.94, and a squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
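The validation statistics quoted above (MAE, RMSE, and the squared correlation coefficient R²) can be computed from paired predicted and measured pKa values as in the sketch below. It assumes plain Python lists of equal length and follows the abstract's definition of R² as the square of the Pearson correlation coefficient, which in general differs from the coefficient of determination 1 − SS_res/SS_tot. The toy values are made up.

```python
import math
from typing import Dict, Sequence


def regression_stats(pred: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, and squared Pearson correlation between predicted and observed values."""
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation coefficient (not 1 - SS_res / SS_tot)
    mp, mo = sum(pred) / n, sum(obs) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    r2 = (sxy * sxy) / (sxx * syy)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Toy predicted vs measured pKa values, made up for illustration
    print(regression_stats([4.1, 7.3, 9.8, 2.5], [4.5, 7.0, 10.4, 2.2]))
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and weights large errors more heavily, which is consistent with both models above showing RMSE > MAE.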