    Binding Free Energy Calculations for Lead Optimization: Assessment of Their Accuracy in an Industrial Drug Design Context

    Correctly ranking compounds according to their computed relative binding affinities will be of great value for decision making in the lead optimization phase of industrial drug discovery. However, the performance of existing computationally demanding binding free energy calculation methods in this context is largely unknown. We analyzed the performance of the molecular mechanics continuum solvent, the linear interaction energy (LIE), and the thermodynamic integration (TI) approaches for three sets of compounds from industrial lead optimization projects. The data sets pose challenges typical for this early stage of drug discovery. None of the methods was sufficiently predictive when applied out of the box without considering these challenges. Detailed investigations of failures revealed critical points that are essential for good binding free energy predictions. When data set-specific features were considered accordingly, predictions valuable for lead optimization could be obtained for all approaches but LIE. Our findings lead to clear recommendations for when to use which of the above approaches. Our findings also stress the important role of expert knowledge in this process, not least for estimating the accuracy of prediction results by TI, using indicators such as the size and chemical structure of exchanged groups and the statistical error in the predictions. Such knowledge will be invaluable when it comes to the question of which TI results can be trusted for decision making.

    Reliable and Performant Identification of Low-Energy Conformers in the Gas Phase and Water

    Prediction of compound properties from structure via quantitative structure–activity relationship and machine-learning approaches is an important computational chemistry task in small-molecule drug research. Though many such properties are dependent on three-dimensional structures or even conformer ensembles, the majority of models are based on descriptors derived from two-dimensional structures. Here we present results from a thorough benchmark study of force field, semiempirical, and density functional methods for the calculation of conformer energies in the gas phase and water solvation as a foundation for the correct identification of relevant low-energy conformers. We find that the tight-binding ansatz GFN-xTB shows the lowest error metrics and highest correlation to the benchmark PBE0-D3(BJ)/def2-TZVP in the gas phase for the computationally fast methods and that in solvent OPLS3 becomes comparable in performance. MMFF94, AM1, and DFTB+ perform worse, whereas the performance-optimized but far more expensive functional PBEh-3c yields energies almost perfectly correlated to the benchmark and should be used whenever affordable. On the basis of our findings, we have implemented a reliable and fast protocol for the identification of low-energy conformers of drug-like molecules in water that can be used for the quantification of strain energy and entropy contributions to target binding as well as for the derivation of conformer-ensemble-dependent molecular descriptors.

    Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction

    In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
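
    The three validation statistics quoted in this abstract (MAE, RMSE, and the squared correlation coefficient R²) can be reproduced for any set of measured and predicted pKa values. The sketch below is illustrative only: the function name `regression_metrics` and the example numbers are assumptions, not part of the published work, and R² is taken here to mean the squared Pearson correlation, as the abstract's wording suggests.

```python
import math


def regression_metrics(measured, predicted):
    """Return (MAE, RMSE, squared Pearson correlation) for paired values.

    Hypothetical helper for illustration; not the authors' code.
    """
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed prediction errors, one per compound.
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation between measured and predicted values.
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r_squared = cov * cov / (var_m * var_p)

    return mae, rmse, r_squared


if __name__ == "__main__":
    # Made-up pKa values, for demonstration only.
    measured = [4.2, 7.9, 9.4, 3.1, 10.6]
    predicted = [4.6, 7.3, 9.9, 3.4, 10.1]
    mae, rmse, r2 = regression_metrics(measured, predicted)
    print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, R^2 = {r2:.2f}")
```

    Note that a constant offset between prediction and measurement leaves R² at 1 while MAE and RMSE grow, which is one reason to report all three together.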