
    What the Heck? Automated Regioselectivity Calculations of Palladium-Catalyzed Heck Reactions Using Quantum Chemistry

    We present a quantum chemistry (QM)-based method that computes the relative energies of intermediates in the Heck reaction that relate to the regioselective reaction outcome: branched (α), linear (β), or a mix of the two. The calculations are done for two different reaction pathways (neutral and cationic) and are based on r2SCAN-3c single-point calculations on GFN2-xTB geometries that, in turn, derive from a GFNFF-xTB conformational search. The method is completely automated and is sufficiently efficient to allow for the calculation of thousands of reaction outcomes. The method can mostly reproduce systematic experimental studies where the ratios of regioisomers are carefully determined. For a larger dataset extracted from Reaxys, the results are somewhat worse with accuracies of 63% for β-selectivity using the neutral pathway and 29% for α-selectivity using the cationic pathway. Our analysis of the dataset suggests that only the major or desired regioisomer is reported in the literature in many cases, which makes accurate comparisons difficult. The code is freely available on GitHub under the MIT open-source license: https://github.com/jensengroup/HeckQM

    Reliable and Performant Identification of Low-Energy Conformers in the Gas Phase and Water

    Prediction of compound properties from structure via quantitative structure–activity relationship and machine-learning approaches is an important computational chemistry task in small-molecule drug research. Though many such properties are dependent on three-dimensional structures or even conformer ensembles, the majority of models are based on descriptors derived from two-dimensional structures. Here we present results from a thorough benchmark study of force field, semiempirical, and density functional methods for the calculation of conformer energies in the gas phase and water solvation as a foundation for the correct identification of relevant low-energy conformers. We find that the tight-binding ansatz GFN-xTB shows the lowest error metrics and highest correlation to the benchmark PBE0-D3(BJ)/def2-TZVP in the gas phase for the computationally fast methods and that in solvent OPLS3 becomes comparable in performance. MMFF94, AM1, and DFTB+ perform worse, whereas the performance-optimized but far more expensive functional PBEh-3c yields energies almost perfectly correlated to the benchmark and should be used whenever affordable. On the basis of our findings, we have implemented a reliable and fast protocol for the identification of low-energy conformers of drug-like molecules in water that can be used for the quantification of strain energy and entropy contributions to target binding as well as for the derivation of conformer-ensemble-dependent molecular descriptors.

    Predictive Modeling of PROTAC Cell Permeability with Machine Learning

    Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce resource-demanding synthesis and testing of low-permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel–Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (κ ≥ 0.57). Models retrained by combining the original training and the blinded test set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
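
    The abstract above reports accuracy together with a kappa statistic, which matters because plain accuracy can look good on imbalanced permeability classes. The sketch below assumes the reported value is Cohen's kappa (agreement corrected for chance); the function name and the toy labels are illustrative and are not taken from the paper.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equally long label sequences (illustrative helper)."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement: fraction of samples where prediction matches truth.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of marginal label frequencies, summed over labels.
    labels = set(y_true) | set(y_pred)
    p_chance = sum((y_true.count(lab) / n) * (y_pred.count(lab) / n) for lab in labels)

    if p_chance == 1.0:
        # Kappa is undefined when only one label occurs; 0.0 is a choice made here.
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data (made up): 1 = permeable, 0 = low-permeable.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
kappa = cohen_kappa(truth, preds)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

    In this toy case 80% accuracy corresponds to a kappa of roughly 0.58, since about half of the agreement would be expected by chance alone.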

    Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction

    In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
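
    The three validation statistics quoted in the abstract above (MAE, RMSE, and the squared correlation coefficient) can be computed as in the following minimal sketch. It assumes R² means the squared Pearson correlation between measured and predicted values, as the abstract's wording suggests; the function name and the pKa numbers are made up for illustration.

```python
import math


def regression_stats(measured, predicted):
    """Return (MAE, RMSE, squared Pearson correlation) for paired values."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")

    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation from centered sums of products.
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r_squared = (cov * cov) / (var_m * var_p)
    return mae, rmse, r_squared


# Made-up pKa values: every prediction is offset by +0.5 units.
measured_pka = [1.0, 2.0, 3.0, 4.0]
predicted_pka = [1.5, 2.5, 3.5, 4.5]
mae, rmse, r2 = regression_stats(measured_pka, predicted_pka)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, R^2 = {r2:.2f}")
```

    The toy data show why the abstract reports all three numbers: a constant offset leaves the squared correlation at 1 while MAE and RMSE are both 0.5, so correlation alone would hide a systematic error.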