
Repository of the Academy's Library

D2P2: database of disordered protein predictions

Author: Dosztányi Zsuzsanna
Ishida Takashi
Oates Matt E.
Romero Pedro
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D2P2 will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life. © The Author(s) 2012. Published by Oxford University Press
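The consensus view mentioned in the abstract can be illustrated with a short sketch. This is not the D2P2 implementation: the function name `consensus_disorder`, the 1-based inclusive region format, and the `min_fraction` agreement threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end); an assumed format


def consensus_disorder(
    predictions: Dict[str, List[Region]],
    length: int,
    min_fraction: float = 0.5,
) -> List[Region]:
    """Return regions where at least `min_fraction` of predictors call disorder."""
    votes = [0] * length
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            # Clip each region to the sequence bounds before counting.
            covered.update(range(max(start, 1), min(end, length) + 1))
        for pos in covered:
            votes[pos - 1] += 1  # one vote per predictor per residue

    needed = min_fraction * len(predictions)
    consensus: List[Region] = []
    region_start = None
    for pos, count in enumerate(votes, start=1):
        if count > 0 and count >= needed:
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            consensus.append((region_start, pos - 1))
            region_start = None
    if region_start is not None:
        consensus.append((region_start, length))
    return consensus


# Hypothetical predictions for a 20-residue protein.
example = {
    "VSL2b": [(1, 10)],
    "IUPred": [(5, 12)],
    "PrDOS": [(8, 20)],
}
majority = consensus_disorder(example, length=20)
unanimous = consensus_disorder(example, length=20, min_fraction=1.0)
```

Raising `min_fraction` trades coverage for agreement: the unanimous call is always a subset of the majority call.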

New integrative tools for interactive protein structure modeling and function prediction

Author: Barbato Alessandro
Publication date: 27/02/2013

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza

Improving the accuracy of protein secondary structure prediction using structural alignment

Author: Gallin Warren J
Montgomerie Scott
Sundararaj Shan
Wishart David S
Publication venue: BioMed Central
Publication date: 01/06/2006

BACKGROUND: The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. RESULTS: We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. CONCLUSION: By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at . For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally
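For readers unfamiliar with the Q3 measure quoted above, the sketch below computes it as the percentage of residues whose three-state assignment (helix H, strand E, coil C) matches the observed one. The `jury_consensus` function is a plain per-residue majority vote, used here only as a simple stand-in for the idea of combining predictors; it is not the PROTEUS jury-of-experts system, and both function names are assumptions.

```python
from collections import Counter
from typing import List


def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H/E/C) matches the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def jury_consensus(predictions: List[str]) -> str:
    """Per-residue majority vote over several predictions of equal length.

    On a tie, the state that appears first in predictor order is kept,
    because Counter.most_common preserves first-seen order among equal counts.
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )


score = q3_score("HHHEEECCC", "HHHEEECCH")  # 8 of 9 residues agree
combined = jury_consensus(["HHEC", "HEEC", "CHEC"])
```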

Directory of Open Access Journals

In silico functional characterization of a double histone fold domain from the Heliothis zea virus 1

Author: De Gioia Luca
Fantucci Piercarlo
Greco Claudio
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Histones are short proteins involved in chromatin packaging; in eukaryotes, two H2a-H2b and H3-H4 histone dimers form the nucleosomal core, which acts as the fundamental DNA-packaging element. The double histone fold is a rare globular protein fold in which two consecutive regions characterized by the typical structure of histones assemble together, thus forming a histone pseudodimer. This fold is found in a few prokaryotic histones and in the regulatory region of guanine nucleotide exchange factors of the Sos family. For the prokaryotic histones, there is no direct structural counterpart in the nucleosomal core particle, while the pseudodimer from Sos proteins is very similar to the dimer formed by histones H2a and H2b. RESULTS: The absence of an H3-H4-like histone pseudodimer in the available structural databases prompted us to search for proteins that could assume such a fold. The application of several secondary structure prediction and fold recognition methods allowed us to show that the viral protein gi|22788712 is compatible with the structure of an H3-H4-like histone pseudodimer. Further in silico analyses revealed that this protein module could retain the ability to mediate protein-DNA interactions, and could consequently act as a DNA-binding domain. CONCLUSION: Our results suggest a possible functional role in viral pathogenicity for this novel double histone fold domain; thus, the computational analyses reported here will be helpful in directing future biochemical studies on the gi|22788712 protein.

Directory of Open Access Journals

SimShiftDB; local conformational restraints derived from chemical shift similarity searches on a large synthetic database

Author: B Seavey
FS Altschul
G Cornilescu
J Söding
J-M Chandonia
JND Battey
Murray Coles
S Karlin
S Neal
Simon W. Ginzinger
SW Ginzinger
Publication venue: Springer Netherlands
Publication date: 01/01/2009

We present SimShiftDB, a new program to extract conformational data from protein chemical shifts using structural alignments. The alignments are obtained in searches of a large database containing 13,000 structures and corresponding back-calculated chemical shifts. SimShiftDB makes use of chemical shift data to provide accurate results even in the case of low sequence similarity, and with even coverage of the conformational search space. We compare SimShiftDB to HHSearch, a state-of-the-art sequence-based search tool, and to TALOS, the current standard tool for the task. We show that for a significant fraction of the predicted similarities, SimShiftDB outperforms the other two methods. Particularly, the high coverage afforded by the larger database often allows predictions to be made for residues not involved in canonical secondary structure, where TALOS predictions are both less frequent and more error prone. Thus SimShiftDB can be seen as a complement to currently available methods

University of the South Pacific Electronic Research Repository

MPG.PuRe

Predict gram-positive and gram-negative subcellular localization via incorporating evolutionary information and physicochemical features into Chou’s general PseAAC

Author: Dehzangi A.
Lyons J.
Paliwal K.K.
Sharma Alokanand
Sharma Ronesh
Tsunoda T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

In this study, we used structural and evolutionary features to represent protein sequences for gram-positive and gram-negative subcellular localization. To do this, we proposed a normalization method that constructs a normalized Position Specific Scoring Matrix (PSSM) from the information in the original PSSM. To investigate the effectiveness of the proposed method, we computed feature vectors from the normalized PSSM, applied a Support Vector Machine (SVM) and a Naïve Bayes classifier to them, and compared the achieved results with previously reported results. We also computed features from both the original PSSM and the normalized PSSM and compared their results. The achieved results show an improvement for both gram-positive and gram-negative subcellular localization. Evaluating localization for each feature, our results indicate that employing SVM with concatenated features (amino acid composition feature, Dubchak feature (physicochemical-based features), normalized PSSM based auto-covariance feature and normalized PSSM based bigram feature) gives higher accuracy, while employing the Naïve Bayes classifier with the normalized PSSM based auto-covariance feature gives high sensitivity for both benchmarks. For the gram-positive dataset, our reported overall locative accuracy is 84.8% and overall absolute accuracy is 85.16%; for the gram-negative dataset, overall locative accuracy is 85.4% and overall absolute accuracy is 86.3%.
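The PSSM-derived features named in the abstract can be sketched as follows. The abstract does not state the normalization formula, so the logistic squash in `normalize_pssm` is purely an assumption. The bigram and auto-covariance functions follow the common textbook definitions (sum of products of adjacent rows, and lagged covariance per column); they may differ in detail from the authors' implementation. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid type


def normalize_pssm(pssm: Matrix) -> Matrix:
    """ASSUMED normalization: squash each score into (0, 1) with a logistic function."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def bigram_features(pssm: Matrix) -> List[float]:
    """Flattened matrix B with B[m][n] = sum over i of pssm[i][m] * pssm[i + 1][n]."""
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    n_cols = len(pssm[0])
    features = [0.0] * (n_cols * n_cols)
    for current, following in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                features[m * n_cols + n] += current[m] * following[n]
    return features


def auto_covariance(pssm: Matrix, lag: int) -> List[float]:
    """Per-column covariance between positions i and i + lag."""
    length = len(pssm)
    if not 0 < lag < length:
        raise ValueError("lag must satisfy 0 < lag < number of rows")
    n_cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    return [
        sum(
            (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
            for i in range(length - lag)
        )
        / (length - lag)
        for j in range(n_cols)
    ]


# Toy 3-residue, 2-column matrix; a real PSSM has 20 columns.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_bigram = bigram_features(toy)
toy_ac = auto_covariance(toy, lag=1)
```

Concatenating such vectors with composition features yields a fixed-length input for a classifier such as an SVM, regardless of sequence length.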

PURE: A webserver for the prediction of domains in unassigned regions in proteins

Author: Offmann Bernard O
Reddy Chilamakuri CS
Shameer Khader
Sowdhamini Ramanathan
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Protein domains are the structural and functional units of proteins. The ability to parse proteins into different domains is important for effective classification and for understanding protein structure, function, and evolution, and is hence biologically relevant. Several computational methods are available to identify domains in a sequence. Domain finding algorithms often employ stringent thresholds to recognize sequence domains. Identifying additional domains can be tedious, involving intense computation and manual intervention, but can lead to a better understanding of overall biological function. In this context, the problem of identifying new domains in the unassigned regions of a protein sequence assumes crucial importance. RESULTS: We had earlier demonstrated that accumulating domain information from sequence homologues can substantially aid prediction of new domains. In this paper, we propose a computationally intensive, multi-step bioinformatics protocol, implemented as a web server named PURE (Prediction of Unassigned REgions in proteins), for the detailed examination of stretches of unassigned regions in proteins. The query sequence is processed using different automated filtering steps based on length, presence of coiled-coil regions, transmembrane regions, homologous sequences and percentage of secondary structure content. The filtered sequence segments and their sequence homologues are then fed to PSI-BLAST, cd-hit and Hmmpfam. Data from the various programs are integrated, and information regarding the probable domains predicted from the sequence is reported. CONCLUSION: We have implemented the PURE protocol as a web server for rapid and comprehensive analysis of unassigned regions in proteins. This server integrates data from different programs and provides information about the domains encoded in the unassigned regions.
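The first step of such a protocol, locating unassigned stretches and filtering them by length, can be sketched as below. This is an illustration under stated assumptions, not the PURE server's code: the function name `unassigned_regions`, the 1-based inclusive coordinates, and the `min_length` default of 30 residues are hypothetical, since the abstract does not give the actual threshold.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end); an assumed format


def unassigned_regions(
    seq_length: int,
    domains: List[Region],
    min_length: int = 30,  # hypothetical threshold, not taken from the abstract
) -> List[Region]:
    """Return stretches not covered by any assigned domain and at least `min_length` long."""
    gaps: List[Region] = []
    next_free = 1
    for start, end in sorted(domains):
        if start > next_free:
            gaps.append((next_free, start - 1))
        # Overlapping or nested domains only push the boundary forward.
        next_free = max(next_free, end + 1)
    if next_free <= seq_length:
        gaps.append((next_free, seq_length))
    return [(s, e) for s, e in gaps if e - s + 1 >= min_length]


# Hypothetical 300-residue protein with three assigned domains, two overlapping.
assigned = [(50, 120), (100, 180), (260, 300)]
candidates = unassigned_regions(300, assigned)
long_only = unassigned_regions(300, assigned, min_length=50)
```

Segments that pass this filter would then go through the further filters the abstract lists (coiled-coil, transmembrane, secondary structure content) before the homology searches.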

Directory of Open Access Journals