
    Prediction of Aqueous Solubility Based on Large Datasets Using Several QSPR Models Utilizing Topological Structure Representation

    Several QSPR models were developed for predicting aqueous solubility, So. A dataset of 5,964 compounds was subdivided into two classes: compounds containing an aromatic ring and non-aromatic compounds. Three models were created with different methods on both datasets: two regression models (multiple linear regression and partial least squares) and an artificial neural network (ANN) model. These models were based on 3,343 aromatic and 1,674 non-aromatic compounds, with 938 compounds used in external validation testing. The range in -log So was -2 to 10. Topological structure descriptors were used with all models. A genetic algorithm was used for descriptor selection for the regression models. For the ANN model, descriptor selection was done with a standard backward elimination process. All models performed well, with r² values ranging from 0.72 to 0.84 in external validation testing. The mean absolute errors in validation ranged from 0.44 to 0.80 for both classes of compounds for the models. These statistical results indicate a sound model. Furthermore, in a comparison with eight other available models, based on predictions on a validation test set (442 compounds), the artificial neural network model presented in this work (CSLogWS) was clearly superior based on both the mean absolute error and the percentage of residuals less than one log unit. In the ANN model, both E-State and hydrogen E-State descriptors were found to be important. Introduction and Backgroun