Search CORE

7 research outputs found

Sequence based residue depth prediction using evolutionary information and predicted secondary structure

Author: Chen Ke
Kurgan Lukasz
Ruan Jishou
Shen Shiyi
Zhang Hua
Zhang Tuo
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. Compared with solvent accessibility, depth allows the study of deep-level structures, functional sites, and the formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design. Results A new method, RDPred, for real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross validation based tests performed on three independent, low-identity datasets show that the distance based depth (computed using MSMS) predicted by RDPred is characterized by 0.67/0.67, 0.66/0.67, and 0.64/0.65 correlation with the actual depth, by mean absolute errors equal to 0.56/0.56, 0.61/0.60, and 0.58/0.57, and by mean relative errors equal to 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and the mean relative errors are shown to be statistically significantly better when compared with a method recently proposed by Yuan and Wang [Proteins 2008; 70:509–516]. The results show that three-fold cross validation underestimates the variability of the prediction quality when compared with the results based on ten-fold cross validation. We also show that hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. Similarly, the charged residues (Lys, Glu, Asp, and Arg) are predicted most accurately.
Our analysis reveals that evolutionary information encoded using PSSM is characterized by a stronger correlation with the depth for hydrophilic amino acids (AAs) and aliphatic AAs when compared with hydrophobic AAs and aromatic AAs. We also show that the secondary structure of coils and strands is useful in depth prediction, in contrast to helices, which have a relatively uniform distribution over the protein depth. Application of the predicted residue depth to the prediction of buried/exposed residues shows consistent improvements in the detection rates of both buried and exposed residues when compared with the competing method. Finally, we contrasted the prediction performance among distance based (MSMS and DPX) and volume based (SADIC) depth definitions. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles. Conclusion The proposed method, RDPred, provides statistically significantly better predictions of residue depth when compared with the competing method. The predicted depth can be used to provide improved prediction of both buried and exposed residues. The prediction of exposed residues has implications for the characterization/prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations.
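The three evaluation measures reported for RDPred (correlation with the actual depth, mean absolute error, and mean relative error) can be sketched as below. This is a minimal illustration, not the authors' code; the mean relative error is assumed here to be the average of |predicted - actual| / actual over residues, which requires strictly positive actual depths.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between predicted and actual depth."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual (assumed definition).

    Assumes every actual depth is strictly positive.
    """
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)
```

In a cross-validation setting these would be computed on the pooled (or per-fold) held-out predictions; the choice between pooling and averaging folds is an assumption not fixed by the abstract.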

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Protein Depth Calculation and the Use for Improving Accuracy of Protein Fold Recognition

Author: Li Hua
Xu Dong
Zhang Yang
Publication venue: Mary Ann Liebert Inc
Publication date: 31/08/2013

Protein structure and function are largely specified by the distribution of different atoms and residues relative to the core and surface of the molecule. Relative depths of atoms are therefore key attributes that have been widely used in protein structure modeling and function annotation. However, accurate calculation of depth is time consuming. Here, we developed an algorithm that uses the Euclidean distance transform (EDT) to convert the target protein structure into a 3D gray-scale image, where the depths of atoms in the protein can be conveniently and precisely derived from the minimum distance of the pixels to the surface of the protein. We tested the proposed EDT-based method on a set of 261 non-redundant protein structures, which shows that the method is 2.6 times faster than the widely used method proposed by Chakravarty and Varadarajan. Depth values from the EDT method are highly accurate, with a Pearson's correlation coefficient ≈1 compared to the calculations from exhaustive search. To explore the usefulness of the method in protein structure prediction, we added the calculated residue depth to the scoring function of a state-of-the-art, profile-profile alignment based fold-recognition program, which yields an additional 3% improvement in the TM-score of the alignments. The data demonstrate that the EDT-based depth calculation program can be used as an efficient tool to assist protein structure analysis and structure-based function annotation.
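The idea of reading atom depths off a Euclidean distance transform can be illustrated on a toy voxel grid. The sketch below uses a brute-force exact EDT (every protein voxel is compared against every solvent voxel), not the fast EDT algorithm of the paper. The grid layout, the hypothetical helper `atom_depth`, and the simplification of depth as the distance to the nearest solvent voxel center (rather than to a computed molecular surface) are all illustrative assumptions.

```python
import math
from itertools import product


def euclidean_distance_transform(grid):
    """Brute-force exact EDT of a 3D binary grid (distances in voxel units).

    grid[x][y][z] is 1 for voxels inside the protein and 0 for solvent.
    Each protein voxel receives the distance to the nearest solvent voxel;
    solvent voxels receive 0.0.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    coords = list(product(range(nx), range(ny), range(nz)))
    solvent = [c for c in coords if grid[c[0]][c[1]][c[2]] == 0]
    if not solvent:
        raise ValueError("grid needs at least one solvent voxel (pad the box)")
    dist = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in coords:
        if grid[x][y][z]:
            dist[x][y][z] = min(math.dist((x, y, z), s) for s in solvent)
    return dist


def atom_depth(dist, atom_xyz, origin, spacing):
    """Hypothetical helper: snap an atom to its voxel and return depth in Angstroms.

    origin is the Cartesian position of voxel (0, 0, 0); spacing is the voxel edge length.
    """
    i, j, k = (round((c - o) / spacing) for c, o in zip(atom_xyz, origin))
    return dist[i][j][k] * spacing
```

The brute-force loop costs O(N_protein x N_solvent), which is exactly the expense a linear-time EDT avoids; it is used here only because it is short and obviously exact.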

Crossref

PubMed Central

Deep Blue Documents at the University of Michigan

DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

Author: Li Jinyan
Li Zhenhua
Wong Limsoon
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information for protein engineering and drug design, and can also deepen our understanding of protein-protein interactions. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as a key feature in hot spot prediction by many computational methods. However, SASA cannot distinguish slightly buried residues, most of which are non-hot spots, from deeply buried ones, which are usually inside a hot spot. Results We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures how deeply the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input to an SVM that classifies residues as hot spots or non-hot spots. We achieve an F measure of 0.6237 under leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than that of other computational methods. Conclusions Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that deeply buried atoms become increasingly important as their burial levels rise. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spots.
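The evaluation protocol above (leave-one-out cross-validation scored by the F measure) can be sketched generically. The classifier passed in is a stand-in: the paper trains an SVM on DBAC counts, whereas the hypothetical `nearest_neighbor_fit` below is a 1-nearest-neighbour rule used only to keep the example dependency-free.

```python
def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (hot spot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_out_predictions(X, y, fit):
    """Predict each example with a model trained on all the other examples.

    fit(X_train, y_train) must return a callable mapping one feature vector to a label.
    """
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(X[i]))
    return preds


def nearest_neighbor_fit(X_train, y_train):
    """Hypothetical stand-in classifier: label of the closest training vector."""
    def predict(x):
        best = min(range(len(X_train)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(X_train[j], x)))
        return y_train[best]
    return predict
```

Here each feature vector would hold, for one mutated residue, the counts of broken deeply buried contacts at each burial level; that layout is an assumption for illustration.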

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central

ScholarBank@NUS

Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only

Author: Ashley M. Buckle
Geoffrey I. Webb
Hao Tan
James C. Whisstock
Jiangning Song
Khalid Mahmood
Ruby H. P. Law
Sean David Mooney
Tatsuya Akutsu
Publication venue: Public Library of Science
Publication date: 01/01/2009

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes, including both local and global sequence characteristics, and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance, a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, could be achieved after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features influence the prediction performance only marginally. We highlight two comparative examples to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
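Since local sequence environments turn out to be the major determinants of RD, a window-based encoding is the natural starting point. The sketch below shows one common local encoding, one-hot vectors over a sliding window, together with the RMSE measure reported above. It is not claimed to be one of the paper's eight schemes; the window half-width of 3 and zero-padding past the termini are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_one_hot(sequence, half_window=3):
    """Encode each residue by one-hot vectors of the residues in its window.

    The window spans 2 * half_window + 1 positions centred on the residue.
    Positions beyond either terminus (or non-standard residues) are all zeros.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half_window, center + half_window + 1):
            vec = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in index:
                vec[index[sequence[pos]]] = 1
            row.extend(vec)
        features.append(row)
    return features


def rmse(actual, predicted):
    """Root mean square error between observed and predicted residue depth."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Each row has 20 x (2 x half_window + 1) entries and would be one input vector to a support vector regressor; in practice PSSM columns from the same window are often used in place of, or alongside, the one-hot bits.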

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

University of Melbourne Institutional Repository

Positive Selection Differs between Protein Secondary Structure Elements in Drosophila

Author: Christopher J. Dixon
Dmitry A. Filatov
Kate E. Ridout
Publication venue: Oxford University Press
Publication date: 01/01/2010

Different protein secondary structure elements have different physicochemical properties and roles in the protein, which may determine their evolutionary flexibility. However, it is not clear to what extent protein structure affects the way Darwinian selection acts at the amino acid level. Using phylogeny-based likelihood tests for positive selection, we have examined the relationship between protein secondary structure and selection across six species of Drosophila. We find that amino acids that form disordered regions, such as random coils, are far more likely to be under positive selection than expected from their proportion in the proteins, and that residues in helices and β-structures are subject to less positive selection than predicted. In addition, it appears that sites undergoing positive selection are more likely than expected to occur close to one another in the protein sequence. Finally, on a genome-wide scale, we have determined that positively selected sites are found more frequently toward the gene ends. Our results demonstrate that protein structures with a greater degree of organization and strong hydrophobicity, represented here as helices and β-structures, are less tolerant to molecular adaptation than disordered, hydrophilic regions, across a diverse set of proteins.
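A phylogeny-based likelihood test for positive selection compares a null model without positively selected sites against a nested alternative that allows them. The minimal sketch below assumes the two models differ by two free parameters, as in the widely used M7 versus M8 site-model comparison in PAML codeml; the abstract does not say which models were used. With 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df2(x):
    """P(X >= x) for a chi-square variable with 2 degrees of freedom.

    Negative statistics (e.g. from optimizer noise) are treated as 0, giving p = 1.
    """
    return math.exp(-max(x, 0.0) / 2.0)


def positive_selection_p_value(lnl_null, lnl_alt):
    """p-value of the LRT, assuming the nested models differ by 2 parameters."""
    return chi2_sf_df2(lrt_statistic(lnl_null, lnl_alt))
```

A gene (or site class) would then be flagged as under positive selection when this p-value falls below a chosen threshold, usually after a multiple-testing correction across genes; that threshold and correction are not specified in the abstract.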

Crossref

PubMed Central

Oxford University Research Archive

TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

Author: Christian Schönbach
Geoffrey I. Webb
Hao Tan
Jiangning Song
Mingjun Wang
Tatsuya Akutsu
Publication venue: Public Library of Science
Publication date: 01/01/2012

The protein backbone torsion angles Phi and Psi are the rotation angles around the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linking peptide bonds are planar and rigid, these two angles essentially determine the backbone geometry of proteins. Accordingly, accurate prediction of backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/
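Because torsion angles are periodic, an MAE like the ones reported for TANGLE has to measure the shorter way around the circle: a predicted Phi of 170° for a true Phi of -170° is 20° off, not 340°. A minimal sketch of that periodic MAE, assuming angles in degrees; the abstract does not spell out TANGLE's exact formula.

```python
def angular_error(pred_deg, true_deg):
    """Absolute angular difference in degrees, accounting for 360-degree periodicity.

    The result always lies in [0, 180].
    """
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def angular_mae(preds, trues):
    """Mean absolute error over paired predicted and observed torsion angles."""
    return sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)
```

Without the wrap-around step, predictions near the ±180° boundary (common for Psi in β-strands) would be penalized by up to 360°, inflating the reported error.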

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Monash University Research Portal

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

Author: Lukasz Kurgan
Marcin J Mizianty
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows the selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions The improved predictions stem from the novel features that express the collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that the conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
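MODAS features quantify the number and arrangement of helix/strand segments predicted along the chain. A minimal sketch of such segment counts, assuming a 3-state (H/E/C) secondary structure string of the kind PSIPRED outputs; the function name and the returned fields are hypothetical and are not the actual MODAS feature set.

```python
from itertools import groupby


def segment_summary(ss):
    """Summarize helix (H) and strand (E) segments in a 3-state string.

    Returns segment counts, segment lengths in chain order, and the order
    of H/E segments along the chain (coil runs are skipped).
    """
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    helix_lengths = [n for state, n in runs if state == "H"]
    strand_lengths = [n for state, n in runs if state == "E"]
    order = "".join(state for state, _ in runs if state in ("H", "E"))
    return {
        "n_helix": len(helix_lengths),
        "n_strand": len(strand_lengths),
        "helix_lengths": helix_lengths,
        "strand_lengths": strand_lengths,
        "order": order,
    }
```

The `order` string is one simple way to capture segment arrangement (for example, an alternating "HEHE" pattern hints at an α/β protein); counts and lengths could then be normalized by chain length before being fed to a classifier.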

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central