Search CORE

10 research outputs found

MetaDBSite: a meta approach to improve protein DNA-binding sites prediction

Authors: Huang Bingding, Lin Biaoyang, Schroeder Michael, Si Jingna, Zhang Zengming
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Prediction of DNA-binding residues from protein sequence information using random forests

Authors: Jack Y Yang, Liangjiang Wang, Mary Qu Yang, Wang Liangjiang
Publication venue: BioMed Central
Publication date: 01/01/2009

Springer

Springer - Publisher Connector

PubMed Central

BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features

Authors: AP Bradley, AR Panchenko, C Yan, Caiyan Huang, CH Wu, DE Draper, E Bechara, IB Kuznetsov, JA Swets, Jack Y Yang, JC Darnell, L Wang, Liangjiang Wang, M Terribilini, Mary Qu Yang, P Baldi, S Ahmad, S Hwang, S Jones, SF Altschul, T Joachims, WS Noble
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA- or RNA-binding residues in proteins. Protein sequence features, including the biochemical properties of amino acids and evolutionary information in the form of a position-specific scoring matrix (PSSM), have been used for DNA- or RNA-binding site prediction. However, PSSM was designed primarily for PSI-BLAST searches, and it may not contain all the evolutionary information needed to model DNA- or RNA-binding sites in protein sequences.

Results: In the present study, several new descriptors of evolutionary information were developed and evaluated for sequence-based prediction of DNA- and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors with PSSM, suggesting that they capture different aspects of evolutionary information for DNA- and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction.

Conclusions: Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have therefore developed a web-based tool, BindN+ (http://bioinfo.ggc.org/bindn+/), to make the SVM classifiers accessible to the research community.
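The sensitivity and specificity reported in this abstract follow the standard confusion-matrix definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal Python sketch of that arithmetic, assuming residues are labelled 1 (binding) and 0 (non-binding); the function name and the toy labels are illustrative and not part of BindN+:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for per-residue binary labels.

    Assumes 1 marks a binding residue and 0 a non-binding residue
    (an illustrative convention, not taken from BindN+).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, falsely predicted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one binding and one non-binding residue")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 binding residues (3 recovered), 5 non-binding (1 false positive).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(truth, pred))  # (0.75, 0.8)
```

Because binding residues are a small minority, reporting both rates rather than overall accuracy keeps a classifier that predicts "non-binding" everywhere from looking deceptively good.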

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Sequence feature-based prediction of protein stability changes upon amino acid substitutions

Authors: Srivastava Anand K, Teng Shaolei, Wang Liangjiang
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature

Authors: Ahmad, Altschul, Berman, Bhardwaj, Breiman, Bullock, Cohen, Coulocheri, Dimitriadou, Egan, Frishman, Ho, Hongde Liu, Hongtao Wu, Hwang, Jiansheng Wu, Jones, Kubat, Kuznetsov, Liaw, Luscombe, Matthews, Ofran, Scheffer, Shen, Siggers, Stawiski, Tjong, Tsuchiya, Vapnik, Wang, Xiao Sun, Xueye Duan, Yan, Yan Ding, Yunfei Bai
Publication venue: Oxford University Press

Motivation: In this work, we aim to develop a computational approach for predicting DNA-binding sites in proteins from amino acid sequences. To avoid overfitting, all available DNA-binding proteins from the Protein Data Bank (PDB) are used to construct the models. The random forest (RF) algorithm is used because it is fast and performs robustly across different parameter values. A novel hybrid feature is presented which incorporates evolutionary information of the amino acid sequence, secondary structure (SS) information and orthogonal binary vector (OBV) information, which reflects the characteristics of the 20 amino acids for two physicochemical properties (dipoles and volumes of the side chains). Because the numbers of binding and non-binding residues in proteins are highly unbalanced, a novel scheme is proposed to deal with the imbalanced datasets by downsizing the majority class.
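The abstract does not spell out its downsizing scheme, so the following is only a generic illustration of random undersampling of the majority class in Python, not the authors' method. The function name, the 1:1 default ratio and the fixed seed are assumptions chosen for illustration:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def downsample_majority(samples: Sequence[T], labels: Sequence[int],
                        ratio: float = 1.0, seed: int = 0) -> Tuple[List[T], List[int]]:
    """Randomly undersample the majority class of a binary (0/1) dataset.

    All minority-class samples are kept; about ratio * n_minority majority
    samples are drawn without replacement. The original order is preserved.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # The minority class is whichever label is rarer (typically the binding residues).
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = min(len(majority), round(ratio * len(minority)))
    kept = sorted(minority + rng.sample(majority, k))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 4 "binding" residues among 20 (real imbalance is usually far larger).
X = [f"res{i}" for i in range(20)]
y = [1] * 4 + [0] * 16
X_bal, y_bal = downsample_majority(X, y)
print(y_bal.count(1), y_bal.count(0))  # 4 4
```

A common refinement is to repeat the draw with different seeds and train one classifier per balanced subset, so that less of the majority-class data is discarded overall.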

Crossref

PubMed Central

PDNAsite: identification of DNA-binding site from protein sequence by incorporating spatial and sequence context

Authors: A Bochkarev, AN Bullock, AP Bradley, B Liu, C Yan, CA BDavey, CC Chang, CO Pabo, EW Stawiski, H Tjong, HM Berman, IB Kuznetsov, J Wu, JA Swets, KL Griffith, L Wang, M Ptashne, M Radlinska, M Terribilini, MY Gutfreund, N Bhardwaj, NM Luscombe, P Ozbek, QW Dong, R Liu, R Xu, RD Kornberg, S Ahmad, S Hwang, SY Ho, T Li, W Kabsch, X Ma, X Zhao, Y Ofran, YC Chen, Z Yuan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating the support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining the two yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between binding sites next to each other are important for protein-DNA recognition and binding ability. A comparison of PDNAsite with existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
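To make the distinction between the two kinds of context concrete, here is a minimal Python sketch. It assumes one representative 3D coordinate per residue (for example, the C-alpha atom), a window of three residues on each side and an 8 Å distance cutoff; these values and the function names are hypothetical, not PDNAsite's actual settings:

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def sequence_context(seq: str, i: int, half_window: int = 3, pad: str = "X") -> str:
    """Residues within half_window positions of target i along the chain, padded past the termini."""
    return "".join(seq[j] if 0 <= j < len(seq) else pad
                   for j in range(i - half_window, i + half_window + 1))


def spatial_context(coords: Sequence[Coord], i: int, cutoff: float = 8.0) -> List[int]:
    """Indices of residues (excluding i) whose representative atom lies within cutoff angstroms of residue i."""
    return [j for j, c in enumerate(coords)
            if j != i and math.dist(c, coords[i]) <= cutoff]


seq = "MKTAYIA"
print(sequence_context(seq, 0, half_window=2))  # XXMKT

# Residue 6 is far from residue 0 along the chain but folded back next to it in space.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
          (15.2, 0.0, 0.0), (11.4, 3.8, 0.0), (3.0, 3.0, 0.0)]
print(spatial_context(coords, 0))  # [1, 2, 6]
```

The toy structure shows why spatial context can add information: residue 6 falls outside any short sequence window around residue 0 yet sits within the distance cutoff. In a real pipeline each context residue would still need to be encoded (for example, with evolutionary profiles) before a redundancy-reduction step such as LSA; that encoding is omitted here.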

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

PolyU Institutional Repository

PubMed Central

Aston Publications Explorer

Predictive bioinformatic methods for analyzing genes and proteins

Author: Teng Shaolei
Publication venue: Clemson University Libraries
Publication date: 01/05/2011

Since large amounts of biological data are generated by various high-throughput technologies, efficient computational methods are important for understanding the biological meaning behind the complex data. Machine learning is particularly appealing for biological knowledge discovery. Tissue-specific gene expression and protein sumoylation play essential roles in the cell and are implicated in many human diseases. Protein destabilization is a common mechanism by which mutations cause human diseases. In this study, machine learning approaches were developed for predicting human tissue-specific genes, protein sumoylation sites and protein stability changes upon single amino acid substitutions. Relevant biological features were selected for input vector encoding, and machine learning algorithms, including Random Forests and Support Vector Machines, were used for classifier construction. The results suggest that these approaches give more accurate predictions than previous studies and can provide valuable information for further experimental studies. Moreover, the seeSUMO and MuStab web servers were developed to make the classifiers accessible to the biological research community.

Structure-based methods can be used to predict the effects of amino acid substitutions on protein function and stability. Nonsynonymous single nucleotide polymorphisms (nsSNPs) located at protein binding interfaces have dramatic effects on protein-protein interactions. To model these effects, the nsSNPs at the interfaces of 264 protein-protein complexes were mapped onto the protein structures using homology-based methods. The results suggest that disease-causing nsSNPs tend to destabilize the electrostatic component of the binding energy and that nsSNPs at conserved positions have significant effects on binding energy changes. A structure-based approach was developed to quantitatively assess the effects of amino acid substitutions on protein stability and protein-protein interactions. It was shown that structure-based analysis could help elucidate the mechanisms by which mutations cause human genetic disorders. These new bioinformatic methods can be used to analyze genes and proteins of interest in human genetic research and to improve our understanding of the molecular mechanisms underlying human diseases.

Clemson University: TigerPrints