Performance of machine-learning scoring functions in structure-based virtual screening
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina provides only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and -0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
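The two metrics quoted in this abstract, hit rate in the top-ranked fraction and Pearson correlation, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the toy scores, and the assumption that a higher score means "predicted more active" are all illustrative choices.

```python
import math

def hit_rate_at_top(scores, is_active, fraction):
    """Share of true actives among the top-scoring `fraction` of molecules.

    Assumption: a higher score means the molecule is predicted more active.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    # Always keep at least one molecule in the selected top slice.
    n_top = max(1, round(fraction * len(ranked)))
    return sum(1 for _, active in ranked[:n_top] if active) / n_top

def pearson(x, y):
    """Pearson correlation between predicted and measured values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): ten docked molecules, two of them active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [True, False, True, False, False, False, False, False, False, False]
top20 = hit_rate_at_top(scores, labels, 0.2)   # top 2 molecules, 1 active
overall = hit_rate_at_top(scores, labels, 1.0) # whole list, 2 of 10 active
```

Comparing `top20` with `overall` shows the enrichment idea behind the reported figures: a useful scoring function concentrates actives at the top of the ranked list.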
BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION
Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools to expedite drug discovery processes and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably predict drug activity with only a small dataset (size < 100) and to benchmark these small-dataset QSAR models in application-specific studies. To this end, here we present a systematic benchmarking study on small-dataset QSAR models built for prediction of effective Wnt signaling inhibitors, which are essential to therapeutics development in prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models using a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy using an external validation dataset (14 compounds). Our data show that the model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines in developing high-performance small-dataset QSAR models for drug activity prediction.
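The count of 72 models follows from crossing the three design choices (4 algorithms x 6 fingerprints x 3 lengths). The short Python sketch below enumerates such a grid; the abstract gives only the counts, so every label here is a placeholder rather than the name of an algorithm or fingerprint actually used in the study.

```python
from itertools import product

# Placeholder labels: the abstract states the counts, not the names.
algorithms = [f"algorithm_{i}" for i in range(1, 5)]      # 4 algorithms
fingerprints = [f"fingerprint_{i}" for i in range(1, 7)]  # 6 fingerprint types
lengths = [f"length_{i}" for i in range(1, 4)]            # 3 fingerprint lengths

# One configuration per (algorithm, fingerprint, length) combination.
model_grid = [
    {"algorithm": a, "fingerprint": f, "length": n}
    for a, f, n in product(algorithms, fingerprints, lengths)
]

# Share of the 70 compounds used for training (56 training, 14 validation).
train_fraction = 56 / (56 + 14)

print(len(model_grid))  # 72
```

With only 56 training compounds shared across all configurations, the external 14-compound set is what keeps the comparison between the 72 models honest.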
Ligand-Protein Binding Affinity Prediction Using Machine Learning Scoring Functions.
In recent years, artificial intelligence has appeared in widely different fields, with promising results that in some cases represent major steps forward. In cheminformatics in particular, machine learning techniques allow the scientific community to build apparently accurate scoring functions for computational docking. These scoring functions can outperform the classical ones used until now. However, as some studies have highlighted, comparisons between classical and machine learning scoring functions are based on particular tests that can favour the latter. By definition, a machine learning scoring function must be trained on data: the model is given the instances chosen to describe the complexes together with the corresponding ligand-protein affinities. Its scoring power can then be evaluated on different datasets, and the recorded performance can differ depending on the dataset used. In particular, a dataset very similar to the one used for training can make it easier to reach high scoring power.
The objective of the present study is to verify the real efficiency and effective performance of the newly developed machine learning scoring functions. Our aim is to answer the scientific community's doubts about whether machine learning scoring functions are the revolutionary road to follow in cheminformatics and drug discovery. To this end, many tests are conducted, and a definitive test protocol for exhaustively validating a new machine learning scoring function is proposed.
Here we investigate the circumstances in which a machine learning scoring function produces overestimated performance, and why this can happen. As a possible solution, we propose a test protocol to follow in order to guarantee a realistic description of the performance of machine learning scoring functions. Finally, an effective and innovative solution in the field of machine learning scoring functions is proposed. It consists of per-target scoring functions: machine learning scoring functions built using complexes from a single protein and able to predict the affinity of complexes involving that target. The data used to build the model are synthetic and therefore easy to generate. The performance on the chosen target is better than that obtained with basic scoring function models and with machine learning scoring functions trained on databases comprising more than one protein.
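The concern raised in this abstract, that a test set too similar to the training set inflates measured scoring power, is often addressed by keeping all complexes of a given protein on one side of the split. The Python sketch below shows one such target-grouped split. The abstract does not describe the thesis's actual protocol, so the function name, the tuple layout, and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def split_by_target(complexes, test_fraction=0.2, seed=0):
    """Split complexes so that no protein target appears in both sets.

    `complexes` is a list of (target_id, features, affinity) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_target = defaultdict(list)
    for entry in complexes:
        by_target[entry[0]].append(entry)
    # Shuffle the targets, not the individual complexes.
    targets = sorted(by_target)
    random.Random(seed).shuffle(targets)
    n_test = max(1, round(test_fraction * len(targets)))
    test_targets, train_targets = targets[:n_test], targets[n_test:]
    train = [e for t in train_targets for e in by_target[t]]
    test = [e for t in test_targets for e in by_target[t]]
    return train, test

# Toy data (hypothetical): five targets with two complexes each.
complexes = [
    (f"T{i}", [0.1 * i, 0.2 * j], 5.0 + 0.1 * i + j)
    for i in range(1, 6)
    for j in range(2)
]
train, test = split_by_target(complexes, test_fraction=0.2, seed=0)
```

A per-target scoring function, as proposed in the abstract, is the opposite design choice: training and evaluation both stay within a single protein, so it should be compared against multi-target models on that same protein only.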