Beyond the Scope of Free-Wilson Analysis: Building
Interpretable QSAR Models with Machine Learning Algorithms
Abstract
A novel methodology was developed to build Free-Wilson-like local
QSAR models by combining R-group signatures and the SVM algorithm.
Unlike Free-Wilson analysis, this method can make predictions
for compounds with R-groups not present in the training set. Eleven
public data sets were chosen as test cases for comparing the performance
of our new method with several other traditional modeling strategies,
including Free-Wilson analysis. Our results show that the R-group
signature SVM models generally achieve better prediction accuracy than
Free-Wilson analysis. Moreover, the predictions of R-group
signature models are comparable to those of models using ECFP6 fingerprints
and whole-compound signatures. Most importantly, R-group contributions
to the SVM model can be obtained by calculating the gradient for R-group
signatures. For most of the studied data sets, these contributions
correlate significantly with those from the corresponding Free-Wilson analysis. These
results suggest that the R-group contribution can be used to interpret
bioactivity data and highlight that the R-group signature-based SVM
modeling method is as interpretable as Free-Wilson analysis. Hence,
the signature SVM model can be a useful modeling tool for any drug
discovery project.
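
To illustrate the idea of reading contributions from a gradient, the following is a minimal sketch and not the authors' implementation. It assumes an RBF kernel (the abstract does not name the kernel) and uses a hypothetical toy model: the support vectors, dual coefficients, intercept, `gamma`, and the three-feature compound are invented for illustration only. The decision function is f(x) = b + sum_i a_i exp(-gamma ||s_i - x||^2), so its partial derivative with respect to feature j is sum_i a_i K(s_i, x) 2 gamma (s_ij - x_j). How such per-feature gradients are aggregated into a contribution per R-group is not shown here.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coefs, intercept, gamma):
    """Predicted activity: intercept plus the weighted kernel expansion."""
    return intercept + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


def decision_gradient(x, support_vectors, dual_coefs, gamma):
    """Analytic gradient of the decision function with respect to x."""
    grad = [0.0] * len(x)
    for sv, coef in zip(support_vectors, dual_coefs):
        k = rbf_kernel(sv, x, gamma)
        for j in range(len(x)):
            # d/dx_j exp(-gamma * ||sv - x||^2) = k * 2 * gamma * (sv_j - x_j)
            grad[j] += coef * k * 2.0 * gamma * (sv[j] - x[j])
    return grad


# Hypothetical toy model (assumed values, not taken from the paper).
SUPPORT_VECTORS = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
DUAL_COEFS = [0.8, -0.5, 0.3]
INTERCEPT = 6.0
GAMMA = 0.1

# Hypothetical descriptor counts for one compound.
compound = [1.0, 1.0, 0.0]
gradient = decision_gradient(compound, SUPPORT_VECTORS, DUAL_COEFS, GAMMA)

for j, g in enumerate(gradient):
    print(f"feature {j}: gradient {g:+.4f}")
```

A positive gradient component means that increasing the corresponding descriptor would raise the predicted activity locally, which is the sense in which a gradient can be read as a contribution.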