Hidden Markov model speed heuristic and iterative HMM search procedure

Author: Eddy Sean R
Johnson L Steven
Portugaly Elon
Publication venue: Digital Commons@Becker
Publication date: 01/01/2010

BACKGROUND: Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases. RESULTS: We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K. CONCLUSIONS: Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

Hidden Markov model speed heuristic and iterative HMM search procedure

Author: AA Schaffer
ED Scheeff
Elon Portugaly
GA Price
JM Chandonia
K Karplus
L Holm
L Lo Conte
L Steven Johnson
M Madera
RD Finn
SE Brenner
Sean R Eddy
SF Altschul
WN Grundy
WR Pearson
Publication venue: Springer Science and Business Media LLC

Crossref

Accelerated Profile HMM Searches

Author: A Jacob
A Krogh
A Milosavljević
A Wozniak
AA Schäffer
B Rekapalli
C Camacho
DR Horn
EK Freyhult
EM Gertz
G Chukkapalli
GA Price
J Landman
JP Walters
K Karplus
LR Rabiner
LS Johnson
M Farrar
M Madera
R Durbin
RD Finn
RP Maddimsetty
S Derrien
S Hunter
S Johnson
Sean R. Eddy
SF Altschul
SJ Melnikoff
SR Eddy
T Oliver
T Rognes
TF Smith
V Chaudhary
V Sachdeva
William R. Pearson
WN Grundy
WR Pearson
Y Sun
YK Yu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Greedy Methods in Plume Detection, Localization and Tracking

Author: Huimin Chen
Publication venue: IntechOpen
Publication date: 01/01/2008

Greedy method, as an efficient computing tool, can be applied to various combinatorial or nonlinear optimization problems where finding the global optimum is difficult, if not computationally infeasible. A greedy algorithm has the nature of making the locally optimal choice at each stage and then solving the subproblems that arise later. It iteratively make

IntechOpen

CiteSeerX

Crossref

Clustering Time Series from Mixture Polynomial Models with Discretised Data

Author: Bagnall AJ
Janacek GJ
Zhang M
Publication venue: University of East Anglia
Publication date: 01/01/2003

Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly affect the quality of clusters formed. This paper evaluates a method of overcoming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation-based distance metric. For data derived from this class of model we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show that, first, discretisation does not significantly affect the accuracy of the clusters when there are no outliers and, second, it significantly increases the accuracy in the presence of outliers, even when the probability of an outlier is very low.
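
The discretisation step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `discretise_by_median` and the sample values are assumptions made for illustration, and the clustering step itself (k-means with a correlation-based distance) is not shown.

```python
from statistics import median


def discretise_by_median(series):
    """Map each value to 1 if it lies above the series median, else 0.

    Hypothetical helper: the abstract only says the data are turned into
    a binary series of above and below the median.
    """
    m = median(series)
    return [1 if x > m else 0 for x in series]


# Illustrative data (assumed): the same series with and without one outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
with_outlier = [1.0, 2.0, 3.0, 400.0, 5.0, 6.0]

print(discretise_by_median(clean))
print(discretise_by_median(with_outlier))
```

In this toy case both series map to the same binary sequence, because the median barely moves when a single value becomes extreme. That is the intuition behind using the binary series as input to clustering when outliers are expected.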

University of East Anglia digital repository

Fast search of sequences with complex symbol correlations using profile context-sensitive HMMS and pre-screening filters

Author: Vaidyanathan P. P.
Yoon Byung-Jun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Recently, profile context-sensitive HMMs (profile-csHMMs) have been proposed which are very effective in modeling the common patterns and motifs in related symbol sequences. Profile-csHMMs are capable of representing long-range correlations between distant symbols, even when these correlations are entangled in a complicated manner. This makes profile-csHMMs a useful tool in computational biology, especially in modeling noncoding RNAs (ncRNAs) and finding new ncRNA genes. However, a profile-csHMM-based search is quite slow, and hence not practical for searching a large database. In this paper, we propose a practical scheme for making the search speed significantly faster without any degradation in the prediction accuracy. The proposed method utilizes a pre-screening filter based on a profile-HMM, which filters out most sequences that will not be predicted as a match by the original profile-csHMM. Experimental results show that the proposed approach can make the search speed eighty times faster.
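
The general shape of a pre-screening filter can be sketched in Python as follows. This is only an illustration of the filter-then-score pattern, not the profile-HMM or profile-csHMM method of the paper: `filtered_search`, `count_symbol`, `longest_run`, the toy sequences, and the thresholds are all hypothetical stand-ins. The cheap score is chosen as an upper bound on the expensive score, so filtering at the reporting threshold cannot discard a true hit.

```python
def count_symbol(seq, symbol="A"):
    """Cheap stand-in score: how often the symbol occurs in the sequence."""
    return seq.count(symbol)


def longest_run(seq, symbol="A"):
    """Expensive stand-in score: length of the longest run of the symbol."""
    best = current = 0
    for ch in seq:
        current = current + 1 if ch == symbol else 0
        best = max(best, current)
    return best


def filtered_search(sequences, cheap_score, expensive_score, threshold):
    """Run the expensive scorer only on sequences that pass the cheap filter.

    Returns the reported hits and the number of expensive evaluations.
    """
    hits = []
    expensive_calls = 0
    for seq in sequences:
        # The cheap score bounds the expensive one from above here, so a
        # sequence rejected at this point could never reach the threshold.
        if cheap_score(seq) < threshold:
            continue
        expensive_calls += 1
        score = expensive_score(seq)
        if score >= threshold:
            hits.append((seq, score))
    return hits, expensive_calls


# Toy database (assumed data).
database = ["AAAA", "ACGT", "CAAAG", "CCCC", "ACACACA"]
hits, calls = filtered_search(database, count_symbol, longest_run, threshold=3)
print(hits, calls)
```

Here the expensive scorer runs on three of the five sequences and the reported hits are the same as an unfiltered search would give. In a real search the saving depends on how much cheaper the filter is and how many sequences it rejects.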

CiteSeerX

Crossref

Caltech Authors

The path inference filter: model-based low-latency map matching of probe vehicle data

Author: Cheng
Erik Jenelius
Fosgerau
Haris N. Koutsopoulos
Hellinga
Hepple
Hofleitner
LeSage
Liu
Miwa
Park
Rahmani
Ramezani
Uno
Yuan
Publication venue: Elsevier BV
Publication date: 20/06/2012

We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points, for which the sampling interval is between 10 seconds and 2 minutes. We introduce a new class of algorithms, collectively called the path inference filter (PIF), that map GPS data in real time, for a variety of trade-offs and scenarios, and with a high throughput. Numerous prior approaches in map-matching can be shown to be special cases of the path inference filter presented in this article. We present an efficient procedure for automatically training the filter on new data, with or without ground truth observations. The framework is evaluated on a large San Francisco taxi dataset and is shown to improve upon the current state of the art. This filter also provides insights about driving patterns of drivers. The path inference filter has been deployed at an industrial scale inside the Mobile Millennium traffic information system, and is used to map fleets of data in San Francisco, Sacramento, Stockholm and Porto. Comment: Preprint, 23 pages and 23 figures.

arXiv.org e-Print Archive

Crossref