14 research outputs found

    Solvation thermodynamics of organic molecules by the molecular integral equation theory : approaching chemical accuracy

    The integral equation theory (IET) of molecular liquids has been an active area of academic research in theoretical and computational physical chemistry for over 40 years because it provides a consistent theoretical framework to describe the structural and thermodynamic properties of liquid-phase solutions. The theory can describe pure and mixed solvent systems (including anisotropic and nonequilibrium systems) and has already been used for theoretical studies of a vast range of problems in chemical physics / physical chemistry, molecular biology, colloids, soft matter, and electrochemistry. A considerable advantage of IET is that it can be used to study specific solute-solvent interactions, unlike continuum solvent models, yet it requires considerably less computational expense than explicit solvent simulations

    Water envelope has a critical impact on the design of protein-protein interaction inhibitors

    We show that a water envelope network plays a critical role in protein-protein interactions (PPI). The potency of a PPI inhibitor is modulated by orders of magnitude on manipulation of the solvent envelope alone. The structure-activity relationship of PEX14 inhibitors was analyzed as an example using in silico and X-ray data

    Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models

    Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets

    Blinded predictions and post-hoc analysis of the second solubility challenge data : exploring training data and feature set selection for machine and deep learning models

    Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state-of-the-art, the American Chemical Society organised a “Second Solubility Challenge” in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019, but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms, and were trained on a relatively small dataset of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility datasets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge datasets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training datasets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy, but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modelling complex chemical spaces from sparse training datasets

    On a relationship between molecular polarizability and partial molar volume in water

    No full text
    We reveal a universal relationship between molecular polarizability (a single-molecule property) and partial molar volume in water, an ensemble property characterizing solute-solvent systems. Since both of these quantities are of key importance for describing the solvation behavior of dissolved molecular species in aqueous solutions, the obtained relationship should have a high impact in chemistry, pharmaceutical, and life sciences as well as in environments. We demonstrated that the obtained relationship between the partial molar volume in water and the molecular polarizability has in general a non-homogeneous character. We performed a detailed analysis of this relationship on a set of ~200 organic molecules from various chemical classes and revealed its fine well-organized structure. We found that this structure strongly depends on the chemical nature of the solutes and can be rationalized in terms of specific solute-solvent interactions. The efficiency and universality of the proposed approach were demonstrated on an external test set containing several dozen polyfunctional and druglike molecules. © 2011 American Institute of Physics. [doi:10.1063/1.3672094]

    In silico screening of bioactive and biomimetic solutes using Integral Equation Theory

    No full text
    The Integral Equation Theory (IET) of Molecular Liquids is a theoretical framework for modelling solution phase behaviour that has recently found new applications in computational drug design. IET allows calculation of solvation thermodynamic parameters at significantly lower computational expense than explicit solvent simulations, but also provides information about the microscopic solvent structure that is not accessible by implicit continuum models. In this review we focus on recent advances in two fields of research using these methods: (i) calculation of the hydration free energies of bioactive molecules; (ii) modelling the aggregation of biomimetic molecules. In addition, we discuss sources of experimental solvation data for druglike molecules

    Hydration thermodynamics using the reference interaction site model : speed or accuracy?

    No full text
    We report a method to dramatically improve the accuracy of hydration free energies (HFE) calculated by the 1D and 3D reference interaction site models (RISM) of molecular integral equation theory. It is shown that the errors in HFEs calculated by RISM approaches using the Gaussian fluctuations (GF) free energy functional are not random, but can be decomposed into a linear combination of contributions from different structural elements of molecules (number of double bonds, number of OH groups, etc.). Therefore, by combining RISM/GF with cheminformatics, one can develop an accurate method for HFE prediction. We call this approach the structural description correction model (SDC) (Ratkova et al. J. Phys. Chem. B 2010, 114, 12068). In this work, we investigated the prediction quality of the SDC model combined with 1D and 3D RISM approaches. In parallel, we analyzed the computational performance of these two methods. The SDC model parameters were obtained by fitting against a training set of 53 simple organic molecules. To demonstrate that the values of these parameters were transferable between different classes of molecules, the models were tested against 98 more complex molecules (including 38 polyfragment compounds). The results show that the 3D RISM/SDC model predicts the HFEs with very good accuracy (RMSE of 0.47 kcal/mol), while the 1D RISM approach provides only moderate accuracy (RMSE of 1.96 kcal/mol). However, a single 1D RISM/SDC calculation takes only a few seconds on a PC, whereas a single 3D RISM/SDC HFE calculation is approximately 100 times more computationally expensive. Therefore, we suggest that one should use the 1D RISM/SDC model for large-scale high-throughput screening of molecular hydration properties, while further refinement of these properties for selected compounds should be carried out with the more computationally expensive but more accurate 3D RISM/SDC model
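
    The idea that RISM errors decompose into a linear combination of structural contributions can be illustrated with an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: the function names (fit_sdc, predict_sdc, rmse), the three descriptor columns, the coefficient values, and all data are hypothetical and synthetic, not taken from the paper, and the authors' actual fitting procedure may differ.

```python
import numpy as np

def fit_sdc(dg_rism, descriptors, dg_exp):
    """Least-squares fit of (dg_exp - dg_rism) ~ a0 + descriptors @ a."""
    residual = np.asarray(dg_exp) - np.asarray(dg_rism)
    X = np.column_stack([np.ones(len(residual)), descriptors])
    coeffs, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return coeffs  # [a0, a1, ..., ak]

def predict_sdc(dg_rism, descriptors, coeffs):
    """Add the fitted linear correction to the uncorrected values."""
    X = np.column_stack([np.ones(len(dg_rism)), descriptors])
    return np.asarray(dg_rism) + X @ coeffs

def rmse(pred, ref):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2)))

# Synthetic stand-in data (NOT from the paper): 53 "molecules", each with
# three hypothetical descriptor counts (e.g. double bonds, OH groups, rings).
rng = np.random.default_rng(0)
n = 53
descriptors = rng.integers(0, 4, size=(n, 3)).astype(float)
true_coeffs = np.array([-1.5, 0.8, -2.0, 0.3])  # hypothetical a0, a1..a3
dg_exp = rng.normal(-4.0, 2.0, size=n)          # "experimental" HFE, kcal/mol
noise = rng.normal(0.0, 0.1, size=n)
# Uncorrected values carry a systematic, descriptor-dependent error.
dg_rism = dg_exp - (true_coeffs[0] + descriptors @ true_coeffs[1:]) + noise

coeffs = fit_sdc(dg_rism, descriptors, dg_exp)
corrected = predict_sdc(dg_rism, descriptors, coeffs)
print("RMSE before correction:", rmse(dg_rism, dg_exp))
print("RMSE after correction: ", rmse(corrected, dg_exp))
```

    Because the synthetic error is linear in the descriptors by construction, the fit removes almost all of it; on real data the improvement depends on how well the linear assumption holds.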

    Reference interaction site model with structural descriptors correction as an efficient tool for hydration free energy predictions

    No full text
    This abstract discusses reference interaction site models with structural descriptors correction as an efficient tool for hydration free energy predictions

    In silico screening of bioactive molecules using molecular integral equation theory

    No full text
    This meeting abstract focuses on in silico screening of bioactive molecules using molecular integral equation theory

    An accurate prediction of hydration free energies by combination of molecular integral equations theory with structural descriptors

    No full text
    In this work, we report a novel method for the estimation of the hydration free energy of organic molecules, the structural descriptors correction (SDC) model. The method is based on a combination of the reference interaction site model (RISM) with several empirical corrections. The model requires only a small number of chemical descriptors associated with the main features of the chemical structure of solutes: excluded volume, branch, double bond, benzene ring, hydroxyl group, halogen atom, aldehyde group, ketone group, ether group, and phenol fragment. The optimum model was selected after testing different RISM free energy expressions on a training set of 65 molecules. We show that the correction parameters of the SDC model are transferable between different chemical classes, which allows one to cover a wide range of organic solutes. The new model substantially increases the accuracy of HFEs calculated by RISM, giving a standard deviation of the error of around 1.2 kcal/mol for a test set of 120 organic molecules
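
    A combination of a RISM free energy with empirical descriptor corrections can be written compactly as below. This form is an assumption inferred from the abstract's wording, not quoted from the paper; the symbols a_0, a_i, and D_i are illustrative names for a constant offset, the fitted correction coefficients, and the descriptor values (excluded volume, number of branches, number of double bonds, and so on).

```latex
\Delta G_{\mathrm{hyd}}^{\mathrm{SDC}}
  = \Delta G_{\mathrm{hyd}}^{\mathrm{RISM}}
  + a_0 + \sum_{i} a_i D_i
```

    Under this reading, the coefficients would be fitted once on the training set and then reused unchanged for other chemical classes, which is the sense in which the abstract calls them transferable.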