Binding Activity Prediction of Cyclin-Dependent Inhibitors
Cyclin-Dependent Kinases (CDKs) are the core components coordinating
the eukaryotic cell division cycle. Generally, the crystal structures of
CDKs provide information on possible molecular mechanisms of ligand
binding. However, reliable and robust estimation of ligand binding
activity has been a challenging task in drug design. In this regard,
various machine learning techniques, such as Support Vector Machine,
Naive Bayesian classifier, Decision Tree, and K-Nearest
Neighbor classifier, have been used. The performance of these heterogeneous
classification techniques depends on the proper selection of features
from the data set. This fact motivated us to propose an integrated
classification technique using a Genetic Algorithm (GA), a Rotational
Feature Selection (RFS) scheme, and an Ensemble of Machine Learning methods,
named the Genetic Algorithm integrated Rotational Ensemble based
classification technique, for the prediction of ligand binding activity
of CDKs. This technique can automatically find the important features
and the ensemble size. For this purpose, the GA encodes the features and
the ensemble size in a chromosome as a binary string. The encoded features
are then used to create diverse sets of training points using RFS,
so that the machine learning method is trained multiple times. The
RFS scheme uses Principal Component Analysis (PCA) to preserve
the variability information of the rotational, nonoverlapping subsets
of the original data. Thereafter, the test points are fed to the different
trained instances of the machine learning method to produce the
ensemble result. The final result is the accuracy obtained by
10-fold cross-validation, which also serves as the objective function
that the GA maximizes. The effectiveness of the proposed classification technique
has been demonstrated quantitatively and visually in comparison with
different machine learning methods for 16 ligand binding CDK docking
and rescoring data sets. In addition, the best possible features have
been reported for CDK docking and rescoring data sets separately.
Finally, the Friedman test has been conducted to judge the statistical
significance of the results produced by the proposed technique. The results indicate that
the integrated classification technique is highly relevant for predicting
protein–ligand binding activity.
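The chromosome encoding and GA search described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bit layout (a feature mask followed by a fixed-width unsigned integer for the ensemble size), the operators (tournament selection, one-point crossover, bit-flip mutation, single-individual elitism), and every parameter value are hypothetical choices, since the abstract does not specify them. In the described technique the fitness would be the 10-fold cross-validated ensemble accuracy; here it is any callable.

```python
import random
from typing import Callable, List, Tuple


def decode(chromosome: List[int], n_features: int, min_size: int = 1) -> Tuple[List[int], int]:
    """Split a binary chromosome into selected feature indices and an ensemble size.

    Hypothetical layout: the first n_features bits are a feature mask and the
    remaining bits are the ensemble size as an unsigned integer (MSB first).
    """
    selected = [i for i, bit in enumerate(chromosome[:n_features]) if bit == 1]
    size = 0
    for bit in chromosome[n_features:]:
        size = (size << 1) | bit
    return selected, size + min_size  # offset so the ensemble is never empty


def run_ga(fitness: Callable[[List[int]], float], length: int,
           pop_size: int = 20, generations: int = 30,
           p_cross: float = 0.8, p_mut: float = 0.02, seed: int = 0) -> List[int]:
    """Generational GA that maximizes `fitness` over binary strings of `length` bits."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament() -> List[int]:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best chromosome forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best
```

For example, `decode([1, 0, 1, 1, 0, 0, 0, 1, 1], 6)` returns `([0, 2, 3], 4)`: features 0, 2 and 3 are selected, and the size bits `011` give 3, offset to an ensemble of 4. Because the elite copy is always in the next population, the best fitness never decreases from one generation to the next.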
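The rotational feature selection, ensemble vote, and cross-validated accuracy can be sketched in the style of Rotation Forest: the feature indices are split at random into nonoverlapping subsets, PCA is fit on a random sample of rows for each subset, and the principal axes are assembled into a block-diagonal rotation applied before each ensemble member is trained. This is a hedged sketch, not the paper's code. The base learner is a Gaussian Naive Bayes classifier (one of the method families named above, chosen here because its per-feature independence assumption makes it sensitive to rotation, unlike Euclidean nearest-neighbor rules), and the subset count, 75% row sampling, unstratified folds, and majority voting are assumptions. The returned accuracy is the kind of value that could serve as the GA fitness.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Block-diagonal rotation from PCA on random nonoverlapping feature subsets."""
    n, d = X.shape
    R = np.zeros((d, d))
    # Split a random permutation of the feature indices into disjoint subsets.
    for idx in np.array_split(rng.permutation(d), min(n_subsets, d)):
        rows = rng.choice(n, size=max(2, int(sample_frac * n)), replace=True)
        sub = X[np.ix_(rows, idx)]
        sub = sub - sub.mean(axis=0)  # center before PCA
        # Rows of vt are the principal axes; full_matrices keeps the block square.
        _, _, vt = np.linalg.svd(sub, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R


def fit_gnb(X, y):
    """Gaussian Naive Bayes: per-class feature means, variances and priors."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    variances = np.stack([X[y == c].var(axis=0) for c in classes]) + 1e-6
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors


def predict_gnb(model, X):
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]  # shape (n, classes, features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[(log_lik + np.log(priors)[None]).argmax(axis=1)]


def ensemble_predict(X_tr, y_tr, X_te, size, n_subsets, rng):
    """Train `size` members on differently rotated data; combine by majority vote."""
    votes = []
    for _ in range(size):
        R = rotation_matrix(X_tr, n_subsets, rng)
        model = fit_gnb(X_tr @ R, y_tr)
        votes.append(predict_gnb(model, X_te @ R))
    votes = np.stack(votes)  # shape (size, n_test)
    result = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        result.append(labels[counts.argmax()])  # ties go to the smallest label
    return np.array(result)


def cv_accuracy(X, y, size, n_subsets=3, k=10, seed=0):
    """Plain (unstratified) k-fold cross-validated accuracy of the ensemble."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = ensemble_predict(X[train], y[train], X[test], size, n_subsets, rng)
        correct += int((pred == y[test]).sum())
    return correct / len(y)
```

In a GA loop, `X` would be restricted to the columns selected by the chromosome and `size` taken from its decoded ensemble size. Because each block of `R` is orthogonal and the blocks cover every feature exactly once, the whole rotation is orthogonal, so no variance is discarded; only the axes change between ensemble members.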
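The Friedman test ranks the competing methods on each data set and checks whether their average ranks differ more than chance would allow. For N data sets and k methods with average ranks R_j, the statistic is chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), compared against a chi-square distribution with k - 1 degrees of freedom. The sketch below computes it without tie correction; the accuracy table is invented toy data, not results from the study.

```python
def rank_row(scores):
    """Rank methods on one data set: 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of methods tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[i][j] = score (e.g. accuracy) of method j on data set i."""
    n, k = len(table), len(table[0])
    ranks = [rank_row(row) for row in table]
    avg_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(a * a for a in avg_ranks) - k * (k + 1) ** 2 / 4)
    return stat, avg_ranks


# Toy accuracies for 3 hypothetical methods on 3 data sets (not real results).
table = [[0.90, 0.85, 0.80],
         [0.88, 0.84, 0.70],
         [0.95, 0.90, 0.85]]
stat, avg = friedman_statistic(table)
print(avg, stat)  # [1.0, 2.0, 3.0] 6.0
```

Here every method keeps the same rank on every data set, which is the most extreme outcome for k = 3 and N = 3. SciPy's `scipy.stats.friedmanchisquare` should report the same statistic when there are no ties.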