22,393 research outputs found

    Predicting the solvation of organic compounds in aqueous environments: from alkanes and alcohols to pharmaceuticals

    The development of accurate models to predict the solvation, solubility, and partitioning of nonpolar and amphiphilic compounds in aqueous environments remains an important challenge. We develop state-of-the-art group-interaction models that deliver an accurate description of the thermodynamic properties of alkanes and alcohols in aqueous solution. The group-contribution formulation of the statistical associating fluid theory based on potentials with a variable Mie form (SAFT-γ Mie) is shown to provide accurate predictions of the phase equilibria, including liquid–liquid equilibria, solubility, free energies of solvation, and other infinite-dilution properties. The transferability of the model is further exemplified with predictions of octanol–water partitioning and solubility for a range of organic and pharmaceutically relevant compounds. Our SAFT-γ Mie platform is reliable for the prediction of challenging properties such as mutual solubilities of water and organic compounds which can span over 10 orders of magnitude, while remaining generic in its applicability to a wide range of compounds and thermodynamic conditions. Our work sheds light on contradictory findings related to alkane–water solubility data and the suitability of models that do not account explicitly for polarity
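
The "variable Mie form" named in the abstract above refers to the Mie pair potential, a generalisation of the Lennard-Jones potential with adjustable repulsive and attractive exponents. The following is a minimal sketch of that functional form only. The function names and the parameter values in the usage note are illustrative assumptions, not SAFT-γ Mie group parameters from the paper.

```python
def mie_potential(r, epsilon, sigma, lambda_r=12.0, lambda_a=6.0):
    """Mie pair potential u(r) for well depth epsilon and diameter sigma.

    With the default exponents (12, 6) this reduces to Lennard-Jones.
    """
    if lambda_r <= lambda_a:
        raise ValueError("repulsive exponent must exceed attractive exponent")
    # Prefactor chosen so that the minimum of u(r) equals -epsilon.
    c = (lambda_r / (lambda_r - lambda_a)) * (
        (lambda_r / lambda_a) ** (lambda_a / (lambda_r - lambda_a))
    )
    x = sigma / r
    return c * epsilon * (x ** lambda_r - x ** lambda_a)


def mie_minimum(sigma, lambda_r=12.0, lambda_a=6.0):
    """Separation at which the Mie potential reaches its minimum."""
    return sigma * (lambda_r / lambda_a) ** (1.0 / (lambda_r - lambda_a))
```

For example, `mie_potential(mie_minimum(1.0), 1.0, 1.0)` evaluates the default (12, 6) potential at its minimum in reduced units, where the value is the negative of the well depth.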

    Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks

    Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at https://github.com/ur-whitelab/mol.dev, and the model is usable at https://mol.dev
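
The abstract above does not say how the ensemble's predictive uncertainty is computed. One common recipe for deep ensembles, sketched here as an assumption rather than as the authors' implementation, treats each member's output as a Gaussian (mean and variance) and combines the members as a uniformly weighted mixture. The function name and the input values in the usage note are hypothetical.

```python
from statistics import fmean, pvariance


def combine_ensemble(member_means, member_variances):
    """Combine per-member Gaussian predictions for a single molecule.

    Returns the mixture mean and total variance, where
    total variance = mean of member variances (data noise)
                   + variance of member means (model disagreement).
    """
    mean = fmean(member_means)
    aleatoric = fmean(member_variances)   # average predicted noise
    epistemic = pvariance(member_means)   # spread between members
    return mean, aleatoric + epistemic
```

Calling `combine_ensemble([-3.0, -3.4, -3.2], [0.25, 0.25, 0.25])` would combine three hypothetical log-solubility predictions into one mean and one total variance. Splitting the variance this way makes it possible to see whether the uncertainty comes from noisy data or from disagreement between ensemble members.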

    Uncertainty Quantification Using Neural Networks for Molecular Property Prediction

    Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets. While we believe these results show that existing UQ methods are not sufficient for all common use-cases and demonstrate the benefits of further research, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others
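
The abstract above mentions how well UQ methods rank prediction errors. One way to quantify this, sketched here as an illustration, is a rank correlation between each prediction's estimated uncertainty and its absolute error. The abstract does not name the exact metrics used, so the choice of Spearman correlation and all function names below are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_error_ranking(predicted, observed, uncertainty):
    """Spearman correlation between uncertainty and absolute error."""
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    return pearson(average_ranks(uncertainty), average_ranks(abs_err))
```

A value near 1 means the method assigns its largest uncertainties to its largest errors. A value near 0 means the uncertainties carry little information about which predictions to trust.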

    Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models

    Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets

    Review of risk from potential emerging contaminants in UK groundwater

    This paper provides a review of the types of emerging organic groundwater contaminants (EGCs) which are beginning to be found in the UK. EGCs are compounds being found in groundwater that were previously not detectable or known to be significant and can come from agricultural, urban and rural point sources. EGCs include nanomaterials, pesticides, pharmaceuticals, industrial compounds, personal care products, fragrances, water treatment by-products, flame retardants and surfactants, as well as caffeine and nicotine. Many are relatively small polar molecules which may not be effectively removed by drinking water treatment. Data from the UK Environment Agency’s groundwater screening programme for organic pollutants show that the 30 most frequently detected compounds include a number of EGCs, such as pesticide metabolites, caffeine and DEET. Specific determinands frequently detected include pesticide metabolites, pharmaceuticals including carbamazepine and triclosan, nicotine, food additives and alkyl phosphates. This paper discusses the routes by which these compounds enter groundwater, their toxicity and potential risks to drinking water and the environment. It identifies challenges that need to be met to minimise risk to drinking water and ecosystems

    Colloidal particles at a range of fluid-fluid interfaces

    The study of solid particles residing at fluid-fluid interfaces has become an established area in surface and colloid science, experiencing a renaissance since around 2000. Particles at interfaces arise in many industrial products and processes like anti-foam formulations, crude oil emulsions, aerated foodstuffs and flotation. Although they act in many ways like traditional surfactant molecules, they also offer distinct advantages, and the area is now multi-disciplinary, involving research in both the fundamental science and potential applications. In this Feature Article, a flavour of some of this interest is given based on recent work from our own group and includes the behaviour of particles at oil-water, air-water, oil-oil, air-oil and water-water interfaces. The materials capable of being prepared by assembling various kinds of particles at fluid interfaces include particle-stabilised emulsions, particle-stabilised aqueous and oil foams, dry liquids, liquid marbles and powdered emulsions

    Multi-lab intrinsic solubility measurement reproducibility in CheqSol and shake-flask methods

    This commentary compares 233 CheqSol intrinsic solubility values (log S0) reported in the Wiki-pS0 database for 145 different druglike molecules to the 838 log S0 values determined mostly by the saturation shake-flask (SSF) method for 124 of the molecules from the CheqSol set. The range of log S0 spans from -1.0 to -10.6 (log molar units), averaging at -3.8. The correlation plot between the two methods indicates r² = 0.96, RMSE = 0.34 log unit, and a slight bias of -0.07 log unit. The average interlaboratory standard deviation (SDi) is slightly better for the CheqSol set than that of the SSF set: SDi(CS) = 0.15 and SDi(SSF) = 0.24. The intralaboratory errors reported in the CheqSol method (0.05 log) need to be multiplied by a factor of 3 to match the expected interlaboratory errors for the method. The scale factor, in part, relates to the hidden systematic errors in the single-lab values. It is expected that improved standardizations in the ‘gold standard’ SSF method, as suggested in the recent ‘white paper’ on solubility measurement methodology, should bring the SDi of both methods to about 0.15 log unit. The multi-lab averaged log S0 (and the corresponding SDi) values could be helpful additions to existing training-set molecules used to predict the intrinsic solubility of drugs and druglike molecules
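
The agreement statistics quoted in the abstract above (squared correlation, RMSE and bias between paired log S0 values) can be computed as in the following minimal sketch. The function name, the sign convention for the bias (CheqSol minus shake-flask) and the input values in the usage note are assumptions, since the abstract does not specify them.

```python
from math import sqrt


def agreement_stats(cheqsol, shake_flask):
    """Return (r_squared, rmse, bias) for paired log S0 values.

    bias is the mean of (cheqsol - shake_flask); this direction is an
    assumption, not taken from the source.
    """
    n = len(cheqsol)
    diffs = [a - b for a, b in zip(cheqsol, shake_flask)]
    bias = sum(diffs) / n
    rmse = sqrt(sum(d * d for d in diffs) / n)
    mx, my = sum(cheqsol) / n, sum(shake_flask) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(cheqsol, shake_flask))
    sxx = sum((a - mx) ** 2 for a in cheqsol)
    syy = sum((b - my) ** 2 for b in shake_flask)
    return sxy * sxy / (sxx * syy), rmse, bias


# Scaling stated in the abstract: intralaboratory error times a factor of 3.
scaled_interlab_error = 3 * 0.05  # about 0.15 log unit
```

For example, `agreement_stats([-2.0, -3.0, -4.0], [-2.1, -3.1, -4.1])` compares three hypothetical molecules whose two measurements differ by a constant offset, so the correlation is perfect while the RMSE and bias both equal the offset.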