Inference with Constrained Hidden Markov Models in PRISM
A Hidden Markov Model (HMM) is a common statistical model which is widely
used for analysis of biological sequence data and other sequential phenomena.
In the present paper we show how HMMs can be extended with side-constraints and
present constraint solving techniques for efficient inference. Defining HMMs
with side-constraints in Constraint Logic Programming has advantages in terms
of more compact expression and pruning opportunities during inference.
We present a PRISM-based framework for extending HMMs with side-constraints
and show how well-known constraints such as cardinality and all different are
integrated. We experimentally validate our approach on the biologically
motivated problem of global pairwise alignment.
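As a rough illustration of how a cardinality side-constraint can prune inference, the sketch below runs Viterbi decoding over (state, count) pairs and discards any partial path in which one chosen state occurs more than a fixed number of times. It is plain Python, not PRISM, and it is not the paper's implementation; the function name `constrained_viterbi`, its parameters, and the toy model are assumptions made for this example.

```python
import math

def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        limited_state, max_count):
    """Most probable state path in which `limited_state` occurs at most
    `max_count` times. Returns (path, log_probability) or None."""
    # best[(state, count)] = (log-prob, path) of the best partial path that
    # ends in `state` and has visited `limited_state` exactly `count` times.
    best = {}
    for s in states:
        count = 1 if s == limited_state else 0
        p = start_p[s] * emit_p[s][obs[0]]
        if count <= max_count and p > 0:
            best[(s, count)] = (math.log(p), [s])

    for o in obs[1:]:
        nxt = {}
        for (prev, count), (logp, path) in best.items():
            for s in states:
                new_count = count + (1 if s == limited_state else 0)
                p = trans_p[prev][s] * emit_p[s][o]
                if new_count > max_count or p <= 0:
                    continue  # pruned: constraint violated or impossible step
                cand = logp + math.log(p)
                key = (s, new_count)
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, path + [s])
        best = nxt

    if not best:
        return None  # no path satisfies the constraint
    logp, path = max(best.values(), key=lambda entry: entry[0])
    return path, logp

# Hypothetical two-state toy model (all numbers are illustrative).
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
obs = ["y", "y", "x"]

unconstrained = constrained_viterbi(obs, states, start_p, trans_p, emit_p, "B", len(obs))
at_most_one_b = constrained_viterbi(obs, states, start_p, trans_p, emit_p, "B", 1)
```

Tracking the count inside the dynamic-programming state keeps decoding exact while letting violating paths be dropped as soon as they exceed the bound, which is the kind of pruning opportunity the abstract refers to.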
The Genomic HyperBrowser: inferential genomics at the sequence level
The immense increase in the generation of genomic scale data poses an unmet
analytical challenge, due to a lack of established methodology with the
required flexibility and power. We propose a first principled approach to
statistical analysis of sequence-level genomic information. We provide a
growing collection of generic biological investigations that query pairwise
relations between tracks, represented as mathematical objects, along the
genome. The Genomic HyperBrowser implements the approach and is available at
http://hyperbrowser.uio.no
Modeling pre-invasive bronchial epithelial lesions
The growth of cancer cells involves many different processes which can only be captured by a complex model. However, simplified models provide a great deal of insight into the fundamental processes involved. In this workshop we proposed two simple models (one discrete stochastic model and one PDE model) to solve a 2-D simplification of the original problem.
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
Most Reading Comprehension methods limit themselves to queries which can be
answered using a single sentence, paragraph, or document. Enabling models to
combine disjoint pieces of textual evidence would extend the scope of machine
comprehension methods, but currently there exist no resources to train and test
this capability. We propose a novel task to encourage the development of models
for text understanding across multiple documents and to investigate the limits
of existing methods. In our task, a model learns to seek and combine evidence -
effectively performing multi-hop (alias multi-step) inference. We devise a
methodology to produce datasets for this task, given a collection of
query-answer pairs and thematically linked documents. Two datasets from
different domains are induced, and we identify potential pitfalls and devise
circumvention strategies. We evaluate two previously proposed competitive
models and find that one can integrate information across documents. However,
both models struggle to select relevant information, as providing documents
guaranteed to be relevant greatly improves their performance. While the models
outperform several strong baselines, their best accuracy reaches 42.9% compared
to human performance at 74.0% - leaving ample room for improvement.
Comment: This paper directly corresponds to the TACL version
(https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor
changes in wording, additional footnotes, and appendices.
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
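For readers unfamiliar with the EM algorithm named in this abstract, here is a minimal sketch on a textbook-style problem: estimating the biases of two coins when it is unknown which coin produced each batch of tosses. The example is not taken from the article; the function name `em_two_coins`, the equal mixing weights, the starting values, and the data are all assumptions chosen for illustration.

```python
def em_two_coins(heads, n_flips, theta=(0.6, 0.5), iters=50):
    """EM for an equal-weight mixture of two coins.

    heads[i] is the number of heads observed in batch i of `n_flips` tosses;
    the coin used for each batch is the hidden variable.
    Returns the estimated head probabilities (theta_a, theta_b).
    """
    a, b = theta
    for _ in range(iters):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = n_flips - h
            # E-step: responsibility of coin A for this batch. The binomial
            # coefficient is the same for both coins, so it cancels.
            like_a = a ** h * (1 - a) ** t
            like_b = b ** h * (1 - b) ** t
            resp_a = like_a / (like_a + like_b)
            heads_a += resp_a * h
            tails_a += resp_a * t
            heads_b += (1 - resp_a) * h
            tails_b += (1 - resp_a) * t
        # M-step: re-estimate each bias from its expected counts.
        a = heads_a / (heads_a + tails_a)
        b = heads_b / (heads_b + tails_b)
    return a, b

# Five hypothetical batches of 10 tosses each.
theta_a, theta_b = em_two_coins([5, 9, 8, 4, 7], n_flips=10)
```

The same alternation between computing expected assignments and re-estimating parameters underlies applications such as motif discovery, where the hidden variable is the motif position rather than the coin identity.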
Quickest Change Detection of a Markov Process Across a Sensor Array
Recent attention in quickest change detection in the multi-sensor setting has
been on the case where the densities of the observations change at the same
instant at all the sensors due to the disruption. In this work, a more general
scenario is considered where the change propagates across the sensors, and its
propagation can be modeled as a Markov process. A centralized, Bayesian version
of this problem, with a fusion center that has perfect information about the
observations and a priori knowledge of the statistics of the change process, is
considered. The problem of minimizing the average detection delay subject to
false alarm constraints is formulated as a partially observable Markov decision
process (POMDP). Insights into the structure of the optimal stopping rule are
presented. In the limiting case of rare disruptions, we show that the structure
of the optimal test reduces to thresholding the a posteriori probability of the
hypothesis that no change has happened. We establish the asymptotic optimality
(in the vanishing false alarm probability regime) of this threshold test under
a certain condition on the Kullback-Leibler (K-L) divergence between the post-
and the pre-change densities. In the special case of near-instantaneous change
propagation across the sensors, this condition reduces to the mild condition
that the K-L divergence be positive. Numerical studies show that this low
complexity threshold test results in a substantial improvement in performance
over naive tests such as a single-sensor test or a test that wrongly assumes
that the change propagates instantaneously.
Comment: 40 pages, 5 figures, submitted to IEEE Trans. Inform. Theory.
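The threshold test on the a posteriori probability of "no change" can be sketched in a much simpler setting than the one studied in the paper. The code below assumes a single sensor, a geometric prior on the change time with parameter `rho`, and unit-variance Gaussian observations whose mean shifts at the change; it does not model propagation across a sensor array. The function names and all numeric values are assumptions made for this example.

```python
import math

def gauss_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and std. dev."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_threshold_stop(xs, rho, pre_mean, post_mean, threshold):
    """Return the first 1-based time k at which
    P(no change yet | x_1..x_k) drops below `threshold`, or None."""
    p_change = 0.0  # posterior probability that the change has occurred
    for k, x in enumerate(xs, start=1):
        # Prediction step: the change may occur now with probability rho.
        prior = p_change + (1 - p_change) * rho
        # Bayes update with the new observation.
        num = prior * gauss_pdf(x, post_mean)
        den = num + (1 - prior) * gauss_pdf(x, pre_mean)
        p_change = num / den
        if 1 - p_change < threshold:
            return k  # declare that a change has happened
    return None

# Hypothetical data: mean 0 for five samples, then mean 3.
samples = [0.0] * 5 + [3.0] * 5
alarm_time = posterior_threshold_stop(samples, rho=0.05, pre_mean=0.0,
                                      post_mean=3.0, threshold=0.01)
```

Lowering `threshold` reduces the false alarm probability at the cost of a longer detection delay, which is the trade-off the abstract formulates.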