24 research outputs found

    Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?

    As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, Go champions, there is interest, and hope, that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited. In particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.

    Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules

    We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ~1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.

    Deep Molecular Representation in Cheminformatics

    No full text
    It is clear that the molecular representations are clustered by the corresponding E_LUMO values. [...] 7 Conclusion: In this work the applications of machine learning in cheminformatics are outlined together with the background of quantum-chemical ...