Search CORE

5 research outputs found

ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches

Author: Christophe Molina
Gabriela Falcón-Cano
Miguel Angel Cabrera-Pérez
Publication venue: 'International Association of Physical Chemists (IAPC)'
Publication date: 01/01/2020

In-silico prediction of aqueous solubility plays an important role in the drug discovery and development process. For many years, the limited performance of in-silico solubility models has been attributed to the lack of high-quality solubility data for pharmaceutical molecules. However, some studies suggest that the poor accuracy of solubility prediction is not related to the quality of the experimental data, and that more precise methodologies (algorithms and/or sets of descriptors) are required to predict the aqueous solubility of pharmaceutical molecules. In this study, a large and diverse database was built from aqueous solubility values collected from two public sources, two new recursive machine-learning approaches were developed for data cleaning and variable selection, and a consensus model based on regression and classification algorithms was created. The modeling protocol, which includes the curation of chemical and experimental data, was implemented in KNIME with the aim of obtaining an automated workflow for predicting new databases. Finally, we compared several methods and models available in the literature with our consensus model, which showed results comparable to, or even better than, previously published models.
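The abstract does not spell out the recursive data-cleaning procedure. As a generic illustration only, and not the authors' KNIME workflow, the loop below refits a model, drops the compound with the largest residual if that residual exceeds a threshold, and repeats. A one-descriptor least-squares line stands in for the random forest; `fit_line`, `recursive_clean`, `threshold`, `min_size`, and the toy data are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in for a random forest)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recursive_clean(xs, ys, threshold, min_size=3):
    """Refit, drop the worst-fitting point if its residual exceeds threshold, repeat.

    Returns (kept_indices, removed_indices). Stops when every remaining
    residual is within threshold or only min_size points are left.
    """
    keep = list(range(len(xs)))
    removed = []
    while len(keep) > min_size:
        slope, intercept = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        # Absolute residual of every compound still in the training set.
        residuals = {i: abs(ys[i] - (slope * xs[i] + intercept)) for i in keep}
        worst = max(residuals, key=residuals.get)
        if residuals[worst] <= threshold:
            break  # everything left is consistent with the model
        keep.remove(worst)
        removed.append(worst)
    return keep, removed


# Toy data: one hypothetical descriptor vs. a toy target, with one inconsistent entry.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
target = [0.0, 1.0, 2.0, 8.0, 4.0, 5.0]
kept, dropped = recursive_clean(descriptor, target, threshold=1.0)
print("kept:", kept, "dropped:", dropped)
```

Removing one point per iteration and refitting matters because a single gross outlier distorts the first fit and inflates the residuals of otherwise consistent compounds.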

PubMed Central

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Three machine learning models for the 2019 Solubility Challenge

Author: John B. O. Mitchell
Publication venue: 'International Association of Physical Chemists (IAPC)'
Publication date: 01/01/2020

We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers: one model is based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum and discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set, comprising 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, Vox Machinarum and Random Forest both perform a little better than Extra Trees and almost equally to one another. We also compare the gold-standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.
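The abstract does not state how Vox Machinarum combines its three members. If each model outputs a discrete solubility class, a simple majority vote is one common way to pool them; the sketch below assumes that, and the class names, compound IDs, `vote` helper, and tie-breaking rule are hypothetical.

```python
from collections import Counter


def vote(labels, fallback=0):
    """Majority vote over one compound's per-model class labels.

    A label wins only with a strict majority (more than half of the models).
    Otherwise the label from the model at position `fallback` is returned.
    """
    label, count = Counter(labels).most_common(1)[0]
    if count * 2 > len(labels):
        return label
    return labels[fallback]


# Hypothetical labels in the order (random forest, extra trees, bagging).
predictions = {
    "cmpd_1": ["low", "low", "medium"],
    "cmpd_2": ["high", "medium", "high"],
    "cmpd_3": ["low", "medium", "high"],  # no majority: fall back to extra trees
}
consensus = {cid: vote(labels, fallback=1) for cid, labels in predictions.items()}
print(consensus)
```

With three voters, a strict majority means at least two models agree, so the fallback only applies when all three disagree. Deferring to Extra Trees there is an arbitrary illustrative choice.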

PubMed Central

University of St. Andrews - Pure

HRČAK - Portal of Croatian Scientific and Professional Journals

St Andrews Research Repository

Hrčak - Portal of scientific journals of Croatia

Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?

Author: Roman Kern
Mario Lovrić
Bono Lučić
Kristina Pavlović
Adrian Spataru
Ming Wah Wong
Petar Žuvela
Publication venue: 'Wiley'
Publication date: 01/01/2021

We present a collection of publicly available intrinsic aqueous solubility data for 829 drug-like compounds. Four different machine learning algorithms (random forests [RF], LightGBM, partial least squares, and least absolute shrinkage and selection operator [LASSO]), coupled with multistage permutation importance for feature selection and Bayesian hyperparameter optimization, were used to predict solubility from chemical structural information. Our results show that LASSO yielded the best predictive ability on an external test set, with a root mean square error (RMSE) (test) of 0.70 log points, an R²(test) of 0.80, and 105 features. Taking the number of descriptors into account as well, an RF model achieves the best balance between complexity and predictive ability, with an RMSE(test) of 0.72 log points, an R²(test) of 0.78, and only 17 features. On a more aggressive test set (a principal component analysis [PCA]-based split), the RF model generalized better. We propose a ranking score for choosing the best model, as test set performance is only one of the factors in creating an applicable model. The ranking score is a weighted combination of generalization, number of features, and test performance. From the two best learners, a consensus model was built that exhibited the best predictive ability and generalization, with an RMSE(test) of 0.67 log points and an R²(test) of 0.81.
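The test-set metrics quoted above can be reproduced for any set of predictions with a few lines of code. This is a minimal sketch, assuming R² is the coefficient of determination (1 − SS_res/SS_tot) and that the consensus simply averages the two learners' predictions; the abstract does not say how the two were combined. The helper names and the toy values are hypothetical and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y (here log units)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus_mean(*model_preds):
    """Average the predictions of several models compound by compound."""
    if len({len(p) for p in model_preds}) != 1:
        raise ValueError("all models must predict the same compounds")
    return [sum(vals) / len(vals) for vals in zip(*model_preds)]


# Toy log-solubility values; not data from the study.
y_test = [-2.0, -3.0, -4.0, -5.0]
pred_lasso = [-2.5, -3.0, -3.5, -5.0]
pred_rf = [-1.5, -3.5, -4.5, -4.5]
pred_cons = consensus_mean(pred_lasso, pred_rf)

for name, pred in [("LASSO", pred_lasso), ("RF", pred_rf), ("consensus", pred_cons)]:
    print(f"{name}: RMSE={rmse(y_test, pred):.3f}, R2={r2(y_test, pred):.3f}")
```

In this toy case the consensus beats both members only because the two models' errors were chosen to partly cancel; averaging helps in practice when the learners make different, weakly correlated errors.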

Crossref

Full-text Institutional Repository of the Ruđer Bošković Institute

ScholarBank@NUS

Can human experts predict solubility better than computers?

Author: Samuel Boobier
Anne Osbourn
John B. O. Mitchell
Publication venue: 'Springer Science and Business Media LLC'

Crossref