
Improving model construction of profile HMMs for remote homology detection through structural alignment

Author: A Andreeva
A Bateman
A Krogh
A Krogh
AC Camproux
Alberto MR Dávila
B Brejova
B Knudsen
B Qian
C Bystroff
C Do
C Notredame
D Feng
D Haft
F Altschul
F Goyon
Gerson Zaverucha
H Mamitsuka
I Letunic
J Espadaler
J Gough
J Park
J Shi
J Söding
J Thompson
JD Thompson
JR Beck
Juliana S Bernardes
K Bae
K Karplus
K Karplus
K Katoh
K Lin
K Mizuguchi
K Sjolander
L Holm
L Rabiner
M Gribskov
M Helen
M Madera
M Mendel
M Wistrand
M Wistrand
O Sullivan
P Bourne
P Nuin
R Edgar
R Hughey
R Hughey
R Karchin
S Altschul
S Eddy
S Jones
T Attwood
T Mitchell
V Alexandrov
Vítor S Costa
W Majoros
W Taylor
WR Pearson
Y Hou
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile hidden Markov models (pHMMs) are one of the most successful approaches to this important problem. pHMM packages have a relatively small computational cost and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could affect the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Here, we assess the impact of using structural alignments on pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher-quality pHMMs. On the other hand, the sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvement in this area.
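
The evaluation protocol named in this abstract (leave-one-family-out cross-validation, ROC analysis, and a paired t-test) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary layout of `families`, and the use of the full area under the ROC curve are all assumptions made for the example, and only the t statistic (not its p-value) is computed.

```python
import math
import statistics


def leave_one_family_out(families):
    """Yield (held_out_name, training_items, test_items) for one superfamily.

    `families` is a hypothetical mapping: family name -> list of sequences.
    Each family is held out once; the model is trained on all the others.
    """
    for held_out in families:
        train = [s for name, seqs in families.items() if name != held_out for s in seqs]
        yield held_out, train, families[held_out]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(a, b):
    """t statistic for paired samples, e.g. per-family AUCs of two methods.

    Assumes the paired differences are not all identical (non-zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

In such a setup, each held-out family would yield one AUC per alignment method, and the two lists of per-family AUCs would be passed to `paired_t_statistic`.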

Springer - Publisher Connector

arXiv.org e-Print Archive

The effectiveness of position- and composition-specific gap costs for protein similarity searches

Author: A. Stojmirovic
Barrett
Benner
Chandonia
Chang
E. M. Gertz
Eddy
Finn
Gotoh
Gough
Gribskov
Gribskov
Hajian-Tilaki
Hanley
Henikoff
Hughey
Krogh
Madera
Murzin
Pascarella
Qiu
Reese
S. F. Altschul
Schaffer
Smith
Vinga
Wistrand
Wrabl
Y.-K. Yu
Yu
Yu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/01/2008

The flexibility in gap cost enjoyed by Hidden Markov Models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments. We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs that were adjusted to have constant gap transition probabilities, albeit with much greater variance. We observed no effect of composition-specific gap costs on retrieval performance. Comment: 17 pages, 4 figures, 2 tables
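
To make the distinction between uniform and position-specific gap costs concrete, here is a small sketch of a global profile-to-sequence alignment in which the gap penalty depends on the profile position. It is a simplified illustration only: it uses linear rather than affine gap costs, and the data layout (`pssm` as a list of per-column score dictionaries, `gap` as one penalty per DP row, `mismatch` as a default score) is an assumption for the example, not the scoring scheme used in the paper.

```python
def align_score(pssm, seq, gap, mismatch=-1):
    """Best global alignment score of a PSSM against a sequence.

    pssm: list of K dicts, residue -> score for that profile column
    gap:  list of K + 1 penalties; gap[k] is charged for any gap made
          while k profile columns have been consumed (deleting column k,
          or inserting a residue after column k)
    """
    K, L = len(pssm), len(seq)
    D = [[0.0] * (L + 1) for _ in range(K + 1)]
    for i in range(1, L + 1):          # leading insertions before column 1
        D[0][i] = D[0][i - 1] - gap[0]
    for k in range(1, K + 1):
        D[k][0] = D[k - 1][0] - gap[k]  # leading deletions of profile columns
        for i in range(1, L + 1):
            match = D[k - 1][i - 1] + pssm[k - 1].get(seq[i - 1], mismatch)
            delete = D[k - 1][i] - gap[k]   # skip profile column k
            insert = D[k][i - 1] - gap[k]   # extra residue after column k
            D[k][i] = max(match, delete, insert)
    return D[K][L]


# Hypothetical three-column profile; the sequence lacks the middle column.
profile = [{"A": 2}, {"C": 2}, {"G": 2}]
uniform = align_score(profile, "AG", [2, 2, 2, 2])      # same cost everywhere
specific = align_score(profile, "AG", [2, 2, 0.5, 2])   # column 2 is cheap to skip
```

With a uniform cost the deletion of the middle column is charged the full penalty; lowering the penalty at that one position raises the score of the same alignment, which is the kind of flexibility the abstract attributes to HMMs.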

University of Tennessee, Knoxville: Trace

Automated Genome-Wide Protein Domain Exploration

Author: Rekepalli Bhanu Prasad
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2007

Exploiting the exponentially growing genomics and proteomics data requires high-quality, automated analysis. Protein domain modeling is a key area of molecular biology as it unravels the mysteries of evolution, protein structures, and protein functions. A plethora of sequences exist in protein databases with incomplete domain knowledge. Hence this research explores automated bioinformatics tools for faster protein domain analysis. Automated tool chains described in this dissertation generate new protein domain models, thus enabling more effective genome-wide protein domain analysis. To validate the new tool chains, the Shewanella oneidensis and Escherichia coli genomes were processed, resulting in a new peptide domain database, detection of poor domain models, and identification of likely new domains. The automated tool chains will require months or years to model a small genome when executing on a single workstation. Therefore the dissertation investigates approaches with grid computing and parallel processing to significantly accelerate these bioinformatics tool chains.

Riboswitch Detection Using Profile Hidden Markov Models

Author: Bandyopadhyay Pradipta
Bhattacharya Sudha
Krishnamachari A
Sengupta Supratim
Singh Payal
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMMs). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain. Results: Our method can detect riboswitches in genomic databases rapidly and accurately. Its sensitivity is comparable to the method based on the Covariance Model (CM). For six out of ten riboswitch classes, our method detects more than 99.5% of the candidates identified by the much slower CM method while being several hundred times faster. For three riboswitch classes, our method detects 97-99% of the candidates relative to the CM method. Our method works very well for those classes of riboswitches that are characterized by distinct and conserved sequence motifs. Conclusion: Riboswitches play a crucial role in controlling the expression of several prokaryotic genes involved in metabolism and transport processes. As more and more new classes of riboswitches are being discovered, it is important to understand the patterns of their intra- and inter-genomic distribution. Understanding such patterns will enable us to better understand the evolutionary history of these genetic regulatory elements. However, a complete picture of the distribution pattern of riboswitches will emerge only after accurate identification of riboswitches across genomes. We believe that the riboswitch detection method developed in this paper will aid in that process. The significant speed advantage of our pHMM-based approach over the CM-based method allows us to scan entire databases (rather than 5'UTRs only) in a relatively short period of time in order to accurately identify riboswitch candidates.

Public Library of Science (PLOS)

Accelerated Profile HMM Searches

Author: A Jacob
A Krogh
A Milosavljević
A Wozniak
AA Schäffer
B Rekapalli
C Camacho
DR Horn
EK Freyhult
EM Gertz
G Chukkapalli
GA Price
J Landman
JP Walters
JP Walters
K Karplus
LR Rabiner
LS Johnson
M Farrar
M Madera
R Durbin
RD Finn
RP Maddimsetty
S Derrien
S Hunter
S Johnson
Sean R. Eddy
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SJ Melnikoff
SR Eddy
T Oliver
T Rognes
T Rognes
TF Smith
V Chaudhary
V Sachdeva
William R. Pearson
WN Grundy
WR Pearson
Y Sun
Y Sun
YK Yu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
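
The idea of an "optimal sum of multiple ungapped local alignment segments" can be illustrated with a short scalar dynamic program. This is a deliberately simplified sketch, not HMMER3's implementation: it omits the striped vector parallelism, treats all flanking and inter-segment loop transitions as free, and replaces the model's transition scores with a single hypothetical `seg_penalty` charged per segment. The `score[k][i]` matrix (match score of model position k against residue i) is likewise an assumed input.

```python
def msv_like_score(score, seg_penalty):
    """Best total score of non-overlapping ungapped diagonal segments.

    score[k][i]: score for aligning model position k to sequence residue i
    seg_penalty: flat cost charged each time a new segment is opened
    Returns 0.0 if no segment is worth opening.
    """
    K, L = len(score), len(score[0])
    neg_inf = float("-inf")
    best_done = 0.0           # best sum of segments ending before residue i
    prev_m = [neg_inf] * K    # segment scores ending at residue i - 1
    for i in range(L):
        start = best_done - seg_penalty   # open a new segment at residue i
        cur_m = [0.0] * K
        for k in range(K):
            # Ungapped: a segment can only extend along the diagonal (k-1, i-1).
            extend = prev_m[k - 1] if k > 0 else neg_inf
            cur_m[k] = score[k][i] + max(extend, start)
        best_done = max(best_done, max(cur_m))  # a segment may end at residue i
        prev_m = cur_m
    return best_done


# Toy two-position model against four residues: two diagonal segments of score 5.
toy = [[2, -1, 2, -1],
       [-1, 3, -1, 3]]
two_segments = msv_like_score(toy, seg_penalty=1)   # (5 - 1) + (5 - 1)
no_segments = msv_like_score(toy, seg_penalty=10)   # opening a segment never pays
```

Because each cell depends only on the previous residue's diagonal neighbour and one running maximum, the inner loop has no gap-state dependencies, which is what makes this kind of recurrence amenable to the vector-parallel treatment the abstract mentions.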

CiteSeerX

arXiv.org e-Print Archive

Automated Protein Structure Classification: A Survey

Author: Hassanzadeh Oktie
Publication date: 01/01/2008

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions. Comment: 14 pages, Technical Report CSRG-589, University of Toronto

CiteSeerX

Prediction of prognostic biomarkers for Interferon-based therapy to Hepatitis C Virus patients: a metaanalysis of the NS5A protein in subtypes 1a, 1b, and 3a

Author: A El-Shamy
A Macdonald
A Wohnsland
B Korber
B Liu
C Kuiken
C Sarrazin
D Wang
E Baralis
ea El-Hefnawi Mahmoud
GR Reyes
Iman A El-Azab
J Cohen
J Felsenstein
J Nousbaum
J Pei
J Song
JM Pawlotsky
K Tamura
M Clamp
M Torres-Puente
M Wistrand
Mahmoud M ElHefnawi
MM El Hefnawi
N Pavio
P Farci
RD Finn
SR Eddy
Suher Zada
TA Hall
U Mihm
V Vacic
V Wagner
WLaP Jiawei
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

The SUPERFAMILY database in 2007: families and functions

Author: Chothia Cyrus
Gough Julian
Madera Martin
Vogel Christine
Wilson Derek
Publication venue: Oxford University Press
Publication date: 10/11/2006

The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment

CiteSeerX