A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

Author: A Krogh
A Marchler-Bauer
A Milosavljević
A Pertsemlidis
AA Schäffer
AY Mitrophanov
BJ Webb
Burkhard Rost
C Barrett
C Webber
D Drasdo
D Metzler
D Siegmund
DJC MacKay
EJ Gumbel
EP Nawrocki
ET Jaynes
I Letunic
J Park
JD Storey
JF Lawless
JS Liu
K Karplus
K Sjölander
M Madera
MG Kann
MQ Zhang
MS Waterman
N Chia
P Bucher
R Bundschuh
R Durbin
R Mott
R Olsen
RC Edgar
RD Finn
S Johnson
S Karlin
S Miyazawa
Sean R. Eddy
SF Altschul
SR Eddy
TF Smith
WR Pearson
Y-K Yu
Publication venue: Public Library of Science
Publication date: 01/05/2008

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
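The two conjectured distributions in this abstract translate directly into tail probabilities and E-values. The following is a minimal sketch, not the paper's implementation: the location parameters `mu` and `tau` and the search-space size `n_targets` are hypothetical inputs that would have to be fitted or supplied separately, and only the slope λ = log 2 comes from the abstract.

```python
import math

# Slope conjectured in the abstract for bit scores (natural log of 2).
LAMBDA = math.log(2.0)


def viterbi_pvalue(score, mu, lam=LAMBDA):
    """Gumbel tail P(S >= score) = 1 - exp(-exp(-lam * (score - mu))).

    `mu` is an assumed, separately fitted location parameter.
    """
    x = -lam * (score - mu)
    if x > 700.0:  # exp(x) would overflow; the tail probability is 1 here
        return 1.0
    return -math.expm1(-math.exp(x))


def forward_pvalue(score, tau, lam=LAMBDA):
    """Exponential tail P(S >= score) = exp(-lam * (score - tau)).

    Only meaningful in the high-scoring tail (score >= tau); below the
    assumed tail origin `tau` the value is capped at 1.
    """
    if score <= tau:
        return 1.0
    return math.exp(-lam * (score - tau))


def evalue(pvalue, n_targets):
    """Expected number of hits at least this good in `n_targets` comparisons."""
    return pvalue * n_targets
```

With λ = log 2, each additional bit of Forward score above `tau` halves the tail probability, so a score 10 bits above `tau` has a tail probability of 2^-10.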

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Error statistics of hidden Markov model and hidden Boltzmann model results

Author: Newberg Lee A
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for a variety of analyses of sequential data. Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known. What is the chance that random data would achieve that score or better? What is the chance that a real signal would achieve a given score threshold? Results: Here we present a novel general approach to estimating these false positive and true positive rates that is significantly more efficient than are existing general approaches. We validate the technique via an implementation within the HMMER 3.0 package, which scans DNA or protein sequence databases for patterns of interest, using a profile-HMM. Conclusion: The new approach is faster than general naïve sampling approaches, and more general than other current approaches. It provides an efficient mechanism by which to estimate error statistics for hidden Markov model and hidden Boltzmann model results.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Island method for estimating the statistical significance of profile-profile alignment scores

Author: A Dembo
A Gambin
A Poleksic
AG Murzin
Aleksandar Poleksic
D Fischer
D Przybylski
DA Debe
E Lindahl
EJ Gumbel
G Yona
H Pang
J Heringa
J Moult
J Söding
JF Collins
JF Lawless
K Ginalski
L Holm
L Rychlewski
M Frenkel-Morgenstern
MS Waterman
O Bastien
R Mott
R Olsen
RI Sadreyev
S Karlin
SF Altschul
SR Eddy
T Hulsen
TF Smith
WR Pearson
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating distribution parameters by random simulations turns out to be computationally very expensive. Results: We demonstrate that the background distribution of profile-profile alignment scores heavily depends on profiles' composition and thus the distribution parameters must be estimated independently, for each pair of profiles of interest. We also show that accurate estimates of statistical parameters can be obtained using the "island statistics" for profile-profile alignments. Conclusion: The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
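For context on the baseline this abstract compares against, here is a minimal sketch of fitting Gumbel parameters to a sample of background scores, as one would after a direct shuffling simulation. It is not the island method and is not taken from the paper: it uses a plain method-of-moments fit, and the function names and the synthetic sampler are hypothetical, included only so the example is self-contained.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of Gumbel (mu, lam) from background scores.

    Uses mean = mu + gamma / lam and variance = pi^2 / (6 * lam^2).
    """
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = statistics.fmean(scores) - EULER_GAMMA / lam
    return mu, lam


def sample_gumbel(mu, lam, n, seed=0):
    """Draw n synthetic Gumbel scores by inverse-CDF sampling.

    Stand-in for real shuffled-alignment scores (an assumption of this sketch).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random() or 1e-300  # guard against log(0)
        out.append(mu - math.log(-math.log(u)) / lam)
    return out
```

Each score in such a sample normally costs one full alignment of shuffled inputs, which is the expense the abstract refers to; the island method is described as extracting several scores from a single comparison instead.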

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Northern Iowa

Profile hidden Markov models for foreground object modelling

Author: Florez Revuelta Francisco
Kazantzidis Ioannis
Nebel Jean-Christophe
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2018

Accurate background/foreground segmentation is a preliminary process essential to most visual surveillance applications. With the increasing use of freely moving cameras, strategies have been proposed to refine the initial segmentation. This paper proposes exploiting the Vide-omics paradigm, and Profile Hidden Markov Models in particular, to create a new type of object descriptor relying on spatiotemporal information. The performance of the proposed methodology has been evaluated using a standard dataset of videos captured by moving cameras. Results show that the proposed object descriptors allow better foreground extraction than standard approaches.

Crossref

Kingston University Research Repository

A nonparametric Bayesian approach toward robot learning by demonstration

Author: Antoniak
Argall
Billard
Bishop
Blackwell
Blei
Celeux
Chandler
Chatzis
Demiris
Dimitrios Korkinof
Ferguson
Ghahramani
Jordan
Leroux
Lopes
Muller
Myersand
Neal
Pearlmutter
Qi
Rasmussen
Schwarz
Sethuraman
Skoglund
Sotirios P. Chatzis
Ude
Vapnik
Walker
Yiannis Demiris
Zegers
Publication venue: Elsevier BV
Publication date: 01/06/2012

In recent years, many authors have applied machine learning methodologies to robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, usually tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models have emerged within nonparametric Bayesian statistics as promising candidates for clustering applications where the number of clusters is unknown a priori. With this motivation, to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation for the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology, considering a number of demanding robot learning by demonstration scenarios.

Crossref

Ktisis

Spiral - Imperial College Digital Repository

PyCogent: a toolkit for making sense from sequence

Author: Birmingham Amanda
Caporaso J Gregory
Carnes Jason
Easton Brett C
Eaton Michael
Hamady Micah
Huttley Gavin A
Knight Rob
Lindsay Helen
Liu Zongzhi
Lozupone Catherine
Maxwell Peter
McDonald Daniel
Robeson Michael
Sammut Raymond
Smit Sandra
Wakefield Matthew J
Widmann Jeremy
Wikman Shandy
Wilson Stephanie
Ying Hua
Publication venue: BioMed Central
Publication date: 01/01/2007

The COmparative GENomic Toolkit, a framework for probabilistic analyses of biological sequences, devising workflows and generating publication quality graphics, has been implemented in Python.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

The Australian National University

University of Melbourne Institutional Repository

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Author
Publication venue: Sabancı University
Publication date: 01/01/2010

Sabanci University Research Database