A confidence predictor for logD using conformal regression and a support-vector machine
Lipophilicity is a major determinant of ADMET properties and of the overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) of chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, we built and evaluated models with a linear-kernel support-vector machine combined with conformal prediction methodology, which outputs prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text]; with the best-performing nonconformity measure, the median prediction interval is [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor. We also publish predictive values at the 90% confidence level for 91 M PubChem structures in RDF format, both for download and through a URI resolver service.
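To make the idea of a prediction interval at a chosen confidence level concrete, the following is a minimal sketch of inductive (split) conformal regression. It is not the authors' implementation: an ordinary least-squares line stands in for the linear-kernel support-vector machine, the nonconformity measure is assumed to be the plain absolute residual, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for the SVM regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_half_width(model, cal_x, cal_y, confidence):
    """Half-width of the interval from calibration-set nonconformity scores.

    Assumed nonconformity measure: absolute residual |y - prediction|.
    """
    a, b = model
    scores = sorted(abs(y - (a * x + b)) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    # Rank of the score that gives the requested coverage on new data.
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf  # too few calibration points for this confidence
    return scores[k - 1]


def predict_interval(model, half_width, x):
    """Symmetric prediction interval around the point prediction."""
    a, b = model
    centre = a * x + b
    return centre - half_width, centre + half_width


# Synthetic data (an assumption for illustration, not logD measurements).
rng = random.Random(0)


def make_data(n):
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [1.5 * x + 0.5 + rng.gauss(0.0, 0.4) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)  # used only to fit the model
cal_x, cal_y = make_data(200)      # held out to calibrate the intervals

model = fit_line(train_x, train_y)
hw80 = conformal_half_width(model, cal_x, cal_y, 0.80)
hw90 = conformal_half_width(model, cal_x, cal_y, 0.90)
low, high = predict_interval(model, hw90, 1.0)
```

In this sketch, raising the confidence level can only widen the interval, which mirrors the two confidence levels (80% and 90%) reported in the abstract.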
Toward a unifying strategy for the structure-based prediction of toxicological endpoints
Most computational methods used for the prediction of toxicity endpoints are based on the assumption that similar compounds have similar biological properties. This principle can be exploited using computational methods such as read across or quantitative structure-activity relationships. However, there is no general agreement about which method is the most appropriate for quantifying compound similarity, or for exploiting the similarity principle to obtain reliable estimates of compound properties. Moreover, optimal similarity metrics and modeling methods might depend on the characteristics of the endpoints and training series used in each case. This study describes a comparative analysis of the predictive performance of diverse similarity metrics and modeling methods in toxicological applications. Two quantitative (n = 660, n = 1114) and three qualitative (n = 447, n = 905, n = 1220) datasets, representing very different endpoints of interest in drug safety evaluation, were analyzed, and rigorous methods were used to estimate the external predictive ability in each case. The results confirm that no single approach produces the best results in all instances, and the best predictions were obtained using different tools in different situations. The trends observed in this study were exploited to propose a unifying strategy allowing the use of the most suitable method for every compound. A comparison of the quality of the predictions obtained by the unifying strategy with those obtained by standard prediction methods confirmed the usefulness of the proposed approach.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking, under Grant Agreement No. 115002 (eTOX), resources of which are composed of a financial contribution from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies’ in-kind contribution.
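As an illustration of the similarity principle that the abstract above builds on, here is a minimal sketch of a read-across style estimate. It is not the method evaluated in the study: the Tanimoto coefficient on sets of fingerprint feature identifiers is only one possible similarity metric, the similarity-weighted k-nearest-neighbour average is an assumed estimator, and the fingerprints and property values are made up for the example. All names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint features."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def read_across(query, training, k=3):
    """Similarity-weighted mean of the k most similar training compounds.

    `training` is a list of (fingerprint, property_value) pairs.
    Returns None when no neighbour shares any feature with the query.
    """
    scored = [(tanimoto(query, fp), value) for fp, value in training]
    # Keep the k nearest neighbours by similarity, highest first.
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        return None
    return sum(sim * value for sim, value in nearest) / total


# Made-up fingerprints (sets of feature ids) and property values.
training = [
    ({1, 2, 3, 4}, 2.0),
    ({1, 2, 3}, 1.0),
    ({7, 8, 9}, 10.0),
]

estimate = read_across({1, 2, 3, 4}, training, k=2)
```

Here the query is identical to the first training compound (similarity 1.0) and shares three of four features with the second (similarity 0.75), so the estimate is pulled mostly toward the first compound's value. Swapping in a different similarity metric or a different value of `k` changes the result, which is the kind of sensitivity the study compares across datasets.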