    Logic-Statistic Models with Constraints for Biological Sequence Analysis

    Inference with Constrained Hidden Markov Models in PRISM

    A Hidden Markov Model (HMM) is a common statistical model widely used for the analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all-different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
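To make the idea of a side-constraint concrete, the following is a minimal sketch in Python, not the PRISM framework the abstract describes. It assumes a single cardinality constraint ("visit one chosen state at most `max_visits` times") and enforces it by tracking a visit counter alongside the Viterbi state, pruning any partial path that exceeds the limit. The function name, parameter names, and toy model are all hypothetical and chosen only for illustration.

```python
import math


def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        limited_state, max_visits):
    """Most probable state path that visits `limited_state` at most
    `max_visits` times. Returns (log_probability, path).

    Illustrative only: a counter-augmented Viterbi, not the constraint
    solver used in the paper.
    """
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[(state, visits)] = (log-probability, path) of the best partial
    # path ending in `state` with `visits` uses of the limited state.
    best = {}
    for s in states:
        visits = 1 if s == limited_state else 0
        if visits <= max_visits:
            best[(s, visits)] = (lg(start_p[s]) + lg(emit_p[s][obs[0]]), [s])

    for symbol in obs[1:]:
        nxt = {}
        for (prev, visits), (score, path) in best.items():
            for s in states:
                new_visits = visits + (1 if s == limited_state else 0)
                if new_visits > max_visits:
                    continue  # constraint violated: prune this extension
                cand = score + lg(trans_p[prev][s]) + lg(emit_p[s][symbol])
                key = (s, new_visits)
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, path + [s])
        best = nxt

    if not best:
        return float("-inf"), []
    return max(best.values(), key=lambda entry: entry[0])


# Hypothetical two-state toy model (all numbers are made up).
STATES = ["A", "B"]
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

if __name__ == "__main__":
    observations = ["y", "y", "y"]
    print(constrained_viterbi(observations, STATES, START, TRANS, EMIT, "B", 3))
    print(constrained_viterbi(observations, STATES, START, TRANS, EMIT, "B", 1))
```

With a loose limit the search behaves like ordinary Viterbi; with a tight limit, extensions that would break the constraint are discarded as soon as they arise, which is the kind of pruning opportunity the abstract alludes to.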

    The Genomic HyperBrowser: inferential genomics at the sequence level

    The immense increase in the generation of genomic-scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.

    Modeling pre-invasive bronchial epithelial lesions

    The growth of cancer cells involves many different processes which can only be captured by a complex model. However, simplified models provide a great deal of insight into the fundamental processes involved. In this workshop we proposed two simple models (one discrete stochastic model and one PDE model) to solve a 2-D simplification of the original problem.

    Constructing Datasets for Multi-hop Reading Comprehension Across Documents

    Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0%, leaving ample room for improvement. Comment: This paper directly corresponds to the TACL version (https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor changes in wording, additional footnotes, and appendices.

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Quickest Change Detection of a Markov Process Across a Sensor Array

    Recent attention in quickest change detection in the multi-sensor setting has been on the case where the densities of the observations change at the same instant at all the sensors due to the disruption. In this work, a more general scenario is considered where the change propagates across the sensors, and its propagation can be modeled as a Markov process. A centralized, Bayesian version of this problem, with a fusion center that has perfect information about the observations and a priori knowledge of the statistics of the change process, is considered. The problem of minimizing the average detection delay subject to false alarm constraints is formulated as a partially observable Markov decision process (POMDP). Insights into the structure of the optimal stopping rule are presented. In the limiting case of rare disruptions, we show that the structure of the optimal test reduces to thresholding the a posteriori probability of the hypothesis that no change has happened. We establish the asymptotic optimality (in the vanishing false alarm probability regime) of this threshold test under a certain condition on the Kullback-Leibler (K-L) divergence between the post- and the pre-change densities. In the special case of near-instantaneous change propagation across the sensors, this condition reduces to the mild condition that the K-L divergence be positive. Numerical studies show that this low-complexity threshold test results in a substantial improvement in performance over naive tests such as a single-sensor test or a test that wrongly assumes that the change propagates instantaneously. Comment: 40 pages, 5 figures, submitted to IEEE Trans. Inform. Theory.
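The threshold test mentioned in the abstract can be illustrated with a small forward filter. This is a sketch under simplifying assumptions that are not taken from the paper: the change reaches the sensors in a fixed order, it advances by at most one sensor per time step with a constant probability `rho`, and the pre- and post-change densities are unit-variance Gaussians with different means. The function stops the first time the posterior probability that no sensor has changed falls below `threshold`. All names and numbers are hypothetical.

```python
import math


def gauss_pdf(x, mean, std=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def no_change_posterior_test(observations, num_sensors, rho, threshold,
                             pre_mean=0.0, post_mean=1.0):
    """Stop when P(no sensor has changed | data) drops below `threshold`.

    `observations` is a sequence of rows, one reading per sensor.
    Returns (stop_time, trace): stop_time is the 1-based step at which
    the test fires (None if it never fires) and trace holds the
    no-change posterior after each processed step.
    """
    # belief[k] = P(exactly the first k sensors have changed | data so far)
    belief = [1.0] + [0.0] * num_sensors
    trace = []
    for t, row in enumerate(observations, start=1):
        # Prediction: the change front advances one sensor w.p. rho.
        predicted = [0.0] * (num_sensors + 1)
        for k, p in enumerate(belief):
            if k < num_sensors:
                predicted[k] += p * (1.0 - rho)
                predicted[k + 1] += p * rho
            else:
                predicted[k] += p  # all sensors changed: absorbing
        # Update: sensors 0..k-1 follow the post-change density.
        updated = []
        for k, p in enumerate(predicted):
            likelihood = 1.0
            for i, x in enumerate(row):
                likelihood *= gauss_pdf(x, post_mean if i < k else pre_mean)
            updated.append(p * likelihood)
        total = sum(updated)
        belief = [u / total for u in updated]
        trace.append(belief[0])
        if belief[0] < threshold:
            return t, trace
    return None, trace


if __name__ == "__main__":
    # Made-up readings: five quiet steps, then both sensors shift to 3.0.
    data = [[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 5
    stop, trace = no_change_posterior_test(data, num_sensors=2, rho=0.1,
                                           threshold=0.05, post_mean=3.0)
    print(stop, trace)
```

The filter tracks a full posterior over how far the change has spread, but the stopping decision uses only its first entry, which is what makes the test low in complexity. The value of `threshold` here is arbitrary; in the setting of the abstract it would be set by the false alarm constraint.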