ATPbind: Accurate Protein–ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons
Abstract
Protein–ATP interactions are ubiquitous in a wide variety
of biological processes. Correctly locating ATP binding sites from
protein information is an important but challenging task for protein
function annotation and drug discovery. However, no single method
identifies ATP binding sites optimally across different proteins.
In this study, we report a new composite predictor, ATPbind, for ATP
binding sites by integrating the outputs of two template-based predictors
(i.e., S-SITE and TM-SITE) and three discriminative sequence-driven
features of proteins: position specific scoring matrix, predicted
secondary structure, and predicted solvent accessibility. In ATPbind,
we assembled multiple support vector machines (SVMs) using a random
undersampling technique to cope with the severe imbalance
between the numbers of ATP binding and non-ATP binding sites.
We also constructed a new gold-standard benchmark data set consisting
of 429 ATP binding proteins from the PDB database to evaluate and
compare the proposed ATPbind with other existing predictors. Starting
from a query sequence and predicted I-TASSER models, ATPbind can achieve
an average accuracy of 72%, covering 62% of all ATP binding sites
while achieving a Matthews correlation coefficient value that is significantly
higher than that of other state-of-the-art predictors.
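
The random undersampling ensemble and the Matthews correlation coefficient mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the ATPbind implementation: the function names (`undersampled_subsets`, `train_centroid_scorer`, `ensemble_score`, `mcc`) are hypothetical, binding residues are assumed to be the minority class, and a toy nearest-centroid scorer stands in for the SVMs so that the example needs only the standard library. The features are plain numeric vectors here, not the actual profile, secondary structure, or template-based inputs.

```python
import math
import random
from typing import Callable, List, Sequence

Features = Sequence[float]
Scorer = Callable[[Features], float]


def undersampled_subsets(labels: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Build index lists that keep every positive (binding) residue and add an
    equally sized random draw of negatives. Assumes positives are the minority."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    return [pos + rng.sample(neg, k) for _ in range(n_subsets)]


def train_centroid_scorer(X: Sequence[Features], y: Sequence[int]) -> Scorer:
    """Toy stand-in for one SVM (an assumption for illustration only):
    scores a residue by its relative distance to the two class centroids."""
    def centroid(rows: List[Features]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x: Features) -> float:
        d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if total == 0 else d_neg / total

    return score


def ensemble_score(scorers: Sequence[Scorer], x: Features) -> float:
    """Average the scores of all base models for one residue."""
    return sum(s(x) for s in scorers) / len(scorers)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

To use the sketch, train one scorer per subset returned by `undersampled_subsets`, average the scorers with `ensemble_score`, threshold the averaged score to obtain binary predictions, and pass those predictions to `mcc`. Averaging over several balanced subsets lets every negative residue have a chance of being seen while each base model still trains on balanced data.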