5 research outputs found

    Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String

    A new method for predicting Hansen solubility parameters (HSPs) was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. To adapt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by predicting the solubility classification of polymers and drug-like solid solutes in collections of solvents. Using only information derived from SMILES strings, the obtained models allow all three Hansen solubility parameters to be computed: dispersion, polarization, and hydrogen bonding. Although several descriptors are required for proper parameter estimation, the proposed procedure is simple and straightforward and does not require molecular geometry optimization. The obtained HSP values are highly correlated with experimental data, and applying them to solubility problems yields essentially the same quality as the original parameters. Based on the provided models, it is possible to characterize any solvent or liquid solute for which HSP data are unavailable.
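
For readers unfamiliar with MARSplines, the following is a minimal sketch of how a MARS-style model is evaluated: the prediction is an intercept plus a weighted sum of products of hinge functions max(0, ±(x − knot)). The function names, descriptor names (`d1`, `d2`), knots, and coefficients are hypothetical illustrations and are not taken from the paper's fitted models.

```python
def hinge(x, knot, sign=+1):
    """MARS hinge basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))


def mars_predict(descriptors, intercept, terms):
    """Evaluate a MARS-style model.

    descriptors: dict mapping descriptor name -> numeric value
    terms: list of (coefficient, factors), where factors is a list of
           (descriptor_name, knot, sign) tuples multiplied together.
    """
    total = intercept
    for coef, factors in terms:
        product = 1.0
        for name, knot, sign in factors:
            product *= hinge(descriptors[name], knot, sign)
        total += coef * product
    return total


# Hypothetical toy model with two made-up descriptors (not real PaDEL names).
toy_terms = [
    (2.0, [("d1", 1.0, +1)]),                     # 2.0 * max(0, d1 - 1)
    (-0.5, [("d2", 3.0, -1)]),                    # -0.5 * max(0, 3 - d2)
    (0.25, [("d1", 0.0, +1), ("d2", 1.0, +1)]),   # interaction term
]
toy_prediction = mars_predict({"d1": 2.0, "d2": 2.0}, 10.0, toy_terms)
```

In a real workflow the knots and coefficients would come from fitting to experimental HSP data; here they only show the functional form.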

    A confidence predictor for logD using conformal regression and a support-vector machine

    Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models were built with a linear-kernel support-vector machine and evaluated using conformal prediction methodology, which outputs prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text]; with the best-performing nonconformity measure, the median prediction interval is [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor. We also publish predicted values at the 90% confidence level for 91 M PubChem structures in RDF format, both for download and through a URI resolver service.
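
As an illustration of how conformal regression turns point predictions into intervals, here is a minimal sketch of split (inductive) conformal prediction using the absolute residual as the nonconformity score. This is an assumption made for illustration: the abstract does not say which nonconformity measures were used, and the SVM itself is not reproduced here. The function names and toy calibration data are hypothetical.

```python
import math


def conformal_half_width(calib_true, calib_pred, confidence):
    """Interval half-width from calibration residuals (split conformal).

    Uses |y - y_hat| as the nonconformity score and returns the
    ceil((n + 1) * confidence)-th smallest score, or infinity when the
    calibration set is too small for the requested confidence.
    """
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf
    return scores[k - 1]


def predict_interval(point_prediction, half_width):
    """Symmetric prediction interval around a point prediction."""
    return (point_prediction - half_width, point_prediction + half_width)


# Toy calibration set (made-up values): residuals are 0.1, 0.2, ..., 1.0.
calib_true = [i / 10 for i in range(1, 11)]
calib_pred = [0.0] * 10

w80 = conformal_half_width(calib_true, calib_pred, 0.80)
w90 = conformal_half_width(calib_true, calib_pred, 0.90)
interval_80 = predict_interval(2.0, w80)  # interval for a new logD prediction of 2.0
```

Higher confidence levels give wider intervals, which matches the abstract's report of a larger median interval at 90% than at 80% confidence.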

    RDF Dataset for article: A confidence predictor for logD using conformal regression and a support-vector machine

    RDF dataset described in the article "A confidence predictor for logD using conformal regression and a support-vector machine" (manuscript in preparation). The dataset contains conformal logD values at the 90% confidence level, computed for 91M compounds from PubChem, in RDF format. The .hdt.gz version contains the dataset in RDF HDT format (http://www.rdfhdt.org/), compressed with tar and gzip. The archive contains both the .hdt file and an index file generated by the hdtSearch C++ tool. The .ttl.gz file is a gzipped file in RDF Turtle format (https://www.w3.org/TR/turtle/).

    Uncertainty estimation for QSAR models using machine learning methods
