963 research outputs found

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

Author: Bourne Philip E
Scheeff Eric D
Publication venue: eScholarship, University of California
Publication date: 01/09/2006

Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear. Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield improvements equivalent to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models. Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used

PubMed Central

eScholarship - University of California

FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded

Author: Beckmann Jacques S.
Felder Clifford E.
Man Orna
Prilusky Jaime
Rydberg Edwin H.
Silman Israel
Sussman Joel L.
Zeev-Ben-Mordehai Tzviya
Publication date: 02/08/2017

Summary: An easy-to-use, versatile and freely available graphic web server, FoldIndex© is described: it predicts if a given protein sequence is intrinsically unfolded implementing the algorithm of Uversky and co-workers, which is based on the average residue hydrophobicity and net charge of the sequence. FoldIndex© has an error rate comparable to that of more sophisticated fold prediction methods. Sliding windows permit identification of large regions within a protein that possess folding propensities different from those of the whole protein. Availability: FoldIndex© can be accessed at http://bioportal.weizmann.ac.il/fldbin/findex Contact: [email protected] Supplementary information: http://www.weizmann.ac.il/sb/faculty_pages/Sussman/papers/suppl/Prilusky_200
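The scoring idea summarized above (average residue hydrophobicity combined with net charge, evaluated over the whole sequence or over sliding windows) can be sketched roughly as follows. This is an illustrative sketch, not the server's code. It assumes the commonly cited form of the Uversky boundary, 2.785<H> - |<R>| - 1.151, with the Kyte-Doolittle scale rescaled to the range 0 to 1, and it counts K and R as +1 and D and E as -1. The function names and the default window of 51 residues are assumptions made here for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(seq):
    """Score a sequence; positive suggests folded, negative suggests unfolded.

    Assumed formula: 2.785 * <H> - |<R>| - 1.151, where <H> is the mean
    hydrophobicity rescaled to [0, 1] and <R> is the mean net charge.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Rescale each Kyte-Doolittle value from [-4.5, 4.5] to [0, 1].
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    mean_r = (positive - negative) / len(seq)
    return 2.785 * mean_h - abs(mean_r) - 1.151


def windowed_fold_index(seq, window=51):
    """Score every full-length window, one value per window start position."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [fold_index(seq[i:i + window])
            for i in range(len(seq) - window + 1)]
```

Runs of negative window scores would then mark candidate unfolded regions, while the whole-sequence score gives the overall call. The threshold for how long such a run must be is left open here.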

RERO DOC Digital Library

Prediction of β-barrel membrane proteins by searching for restricted domains

Author: Mirus Oliver
Schleiff Enrico
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

BACKGROUND: The identification of beta-barrel membrane proteins out of a genomic/proteomic background is one of the rapidly developing fields in bioinformatics. Our main goal is the prediction of such proteins in genome/proteome-wide analyses. RESULTS: For the prediction of beta-barrel membrane proteins within prokaryotic proteomes a set of parameters was developed. We focused on a procedure with a low false positive rate, alongside a procedure with the lowest false prediction rate, to obtain high certainty for the predicted sequences. We demonstrate that the discrimination between beta-barrel membrane proteins and other proteins is improved by analyzing a length-limited region. The developed set of parameters is applied to the proteome of E. coli and the results are compared to four other described procedures. CONCLUSION: Analyzing the beta-barrel membrane proteins revealed the presence of a defined membrane-inserted beta-barrel region. This information can now be used to refine other prediction programs as well. So far, all tested programs fail to predict outer membrane proteins in the proteome of the prokaryote E. coli with high reliability. However, the reliability of the prediction is improved significantly by a combinatory approach of several programs. The consequences and usability of the developed scores are discussed

Springer - Publisher Connector

PubMed Central

Open Access LMU

Hochschulschriftenserver - Universität Frankfurt am Main

firestar—prediction of functionally important residues using structural templates and alignment reliability

Author: López Gonzalo
Tress Michael L.
Valencia Alfonso
Publication venue: Oxford University Press
Publication date: 01/01/2007

Here we present firestar, an expert system for predicting ligand-binding residues in protein structures. The server provides a method for extrapolating from the large inventory of functionally important residues organized in the FireDB database and adds information about the local conservation of potential-binding residues. The interface allows users to make queries by protein sequence or structure. The user can access pairwise and multiple alignments with structures that have relevant functionally important binding sites. The results are presented in a series of easy to read displays that allow users to compare binding residue conservation across homologous proteins. The binding site residues can also be viewed with molecular visualization tools. One feature of firestar is that it can be used to evaluate the biological relevance of small molecule ligands present in PDB structures. With the server it is easy to discern whether small molecule binding is conserved in homologous structures. We found this facility particularly useful during the recent assessment of CASP7 function prediction. Availability: http://firedb.bioinfo.cnio.es/Php/FireStar.php

CiteSeerX

Crossref

PubMed Central

Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

Author: A Krogh
A Lupas
A Sali
AC McHardy
Adam Godzik
AJ Enright
B Rodriguez-Brito
BE Suzek
David Jones
DB Rusch
DH Huson
EF DeLong
FE Angly
G Yona
GW Tyson
J Park
JA Cuff
JC Venter
JD Bendtsen
JD Thompson
John C. Wooley
K Mavromatis
L Holm
L Krause
L Rychlewski
ML Tress
O Sasson
P Pipenbacher
PD Schloss
R Apweiler
RL Tatusov
S Mika
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
U Hobohm
W Li
Weizhong Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences

Author: Aguilera-Mendoza L.
Barigye S.J.
Liu J.
Llorente-Quesada M.T.
Marrero-Ponce Y.
Salgado J.
Tellez-Ibarra R.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new nonredundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP-Patent database are included in CAMP-Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs. © The Author 2015. Published by Oxford University Press. All rights reserved. Antimicrobial Cationic Peptide

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Tecnológica de Bolívar: Repositorio Digital

CAMPO, SCR_FIND and CHC_FIND: a suite of web tools for computational structural biology

Author: Bossa Francesco
Paiardini Alessandro
Pascarella Stefano
Publication venue: Oxford University Press
Publication date: 01/01/2005

The identification of evolutionarily conserved features of protein structures can provide insights into their functional and structural properties. Three methods have been developed and implemented as WWW tools, CAMPO, SCR_FIND and CHC_FIND, to analyze evolutionarily conserved residues (ECRs), structurally conserved regions (SCRs) and conserved hydrophobic contacts (CHCs) in protein families and superfamilies, on the basis of their 3D structures and the homologous sequences available. The programs identify protein segments that conserve a similar main-chain conformation, compute residue-to-residue hydrophobic contacts involving only apolar atoms common to all the 3D structures analyzed and allow the identification of conserved amino-acid sites among protein structures and their homologous sequences. The programs also allow the visualization of SCRs, CHCs and ECRs directly on the superposed structures and their multiple structural and sequence alignments. Tools and tutorials explaining their usage are available at , and

Crossref

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

Parallel Homologous Search With Hirschberg Algorithm: A Hybrid MPI-Pthreads Solution.

Author: Abdul Rashid Nur'Aini
Abdullah Rosni
Hj. Talib Abdullah Zawawi
Publication date: 01/07/2007

In this paper, we apply two different parallel programming models, the message passing model using Message Passing Interface (MPI) and the multithreaded model using Pthreads, to protein sequence homologous search. The protein sequence homologous search uses the Hirschberg algorithm for the pairwise sequence alignment
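For readers unfamiliar with the pairwise step, a sequential sketch of the Hirschberg algorithm may help: it finds an optimal global alignment in linear space by splitting one sequence in half, locating the best split point in the other from forward and reverse score rows, and recursing. The code below is not taken from the paper. The scoring values (match, mismatch, linear gap penalty) and all function names are assumptions chosen for illustration, and the MPI and Pthreads layers are not shown.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the last row of the global alignment score matrix, O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def _nw_full(a, b, match, mismatch, gap):
    """Full-matrix alignment with traceback; only called on tiny inputs."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score two gapped strings of equal length column by column."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of a and b using linear working space."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1 or len(b) == 1:
        return _nw_full(a, b, match, mismatch, gap)
    mid = len(a) // 2
    # Scores for the first half against every prefix of b, and for the
    # second half against every suffix of b (computed on reversed strings).
    left = nw_score(a[:mid], b, match, mismatch, gap)
    right = nw_score(a[mid:][::-1], b[::-1], match, mismatch, gap)
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    la, lb = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    ra, rb = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return la + ra, lb + rb
```

The two recursive calls work on disjoint halves of the problem and share no state, which is one natural place where work could be handed to separate threads or processes. Whether the paper parallelizes at this point or across database sequences is not stated in the abstract.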

Repository@USM

THoR: a tool for domain discovery and curation of multiple alignments

Author: Dickens Nicholas J
Ponting Chris P
Publication venue: BioMed Central
Publication date: 01/01/2003

We describe a tool, THoR, that automatically creates and curates multiple sequence alignments representing protein domains. This exploits both PSI-BLAST and HMMER algorithms and provides an accurate and comprehensive alignment for any domain family. The entire process is designed for use via a web-browser, with simple links and cross-references to relevant information, to assist the assessment of biological significance. THoR has been benchmarked for accuracy using the SMART and pufferfish genome databases

Springer

Springer - Publisher Connector

PubMed Central

Oxford University Research Archive

Enlighten

A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences

Author: A Lempel
A Puglisi
Andrew K Benson
CG Nevill-Manning
David J Russell
DR Bastola
E Ukkonen
EK Costello
EM McCreight
HH Otu
J Ziv
JD Parsons
JD Thompson
Khalid Sayood
L Holm
M Charikar
M Halkidi
P Weiner
RC Edgar
Samuel F Way
SF Altschul
W Li
WJ Wilbur
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: We propose a sequence clustering algorithm and compare the partition quality and execution time of the proposed algorithm with those of a popular existing algorithm. The proposed clustering algorithm uses a grammar-based distance metric to determine partitioning for a set of biological sequences. The algorithm performs clustering in which new sequences are compared with cluster-representative sequences to determine membership. If comparison fails to identify a suitable cluster, a new cluster is created. Results: The performance of the proposed algorithm is validated via comparison to the popular DNA/RNA sequence clustering approach, CD-HIT-EST, and to the recently developed algorithm, UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The proposed algorithm maintains a CPU execution time comparable to that of CD-HIT-EST, which is much slower than UCLUST, and has successfully generated clusters with higher statistical accuracy than both CD-HIT-EST and UCLUST. The validation results are especially striking for large datasets. Conclusions: We introduce a fast and accurate clustering algorithm that relies on a grammar-based sequence distance. Its statistical clustering quality is validated by clustering large datasets containing 16S rDNA sequences
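The representative-based procedure described in the abstract (compare each new sequence with the cluster representatives, and open a new cluster when none is close enough) can be sketched as below. This is only an illustration of the control flow. The grammar-based distance from the paper is not reproduced; a k-mer Jaccard distance is swapped in as a stand-in, and the threshold, the value of k, the nearest-representative rule, and all function names are assumptions.

```python
def kmer_set(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """Stand-in distance: 1 minus the Jaccard similarity of the k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 0.0
    return 1.0 - len(ka & kb) / len(ka | kb)


def greedy_cluster(seqs, threshold=0.5, distance=jaccard_distance):
    """Assign each sequence to the nearest representative within threshold.

    The first member of each cluster serves as its representative. A sequence
    that is farther than threshold from every representative starts a new
    cluster. Returns a list of clusters, each a list of sequences.
    """
    representatives = []
    clusters = []
    for seq in seqs:
        best_idx, best_dist = None, threshold
        for idx, rep in enumerate(representatives):
            d = distance(seq, rep)
            if d <= best_dist:
                best_idx, best_dist = idx, d
        if best_idx is None:
            # No suitable cluster found: this sequence becomes a representative.
            representatives.append(seq)
            clusters.append([seq])
        else:
            clusters[best_idx].append(seq)
    return clusters
```

Because each sequence is compared only with representatives rather than with every other sequence, the number of distance evaluations grows with the number of clusters instead of with the square of the dataset size. The result depends on input order, so tools of this kind often sort sequences (for example by length) before clustering.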

Crossref

DigitalCommons@University of Nebraska

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central