Logic-Statistic Models with Constraints for Biological Sequence Analysis

Author: Have Christian Theil
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Inference with Constrained Hidden Markov Models in PRISM

Author: Chang
CHRISTIAN THEIL HAVE
Christiansen
HENNING CHRISTIANSEN
MATTHIEU PETIT
OLE TORP LASSEN
Roth
Roweis
Sato
Van Hentenryck
Publication date: 01/01/2010

A Hidden Markov Model (HMM) is a common statistical model that is widely used for the analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all-different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.

arXiv.org e-Print Archive

Crossref

Roskilde Universitet
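
The idea in the abstract above of pruning HMM inference with a cardinality side-constraint can be illustrated with a minimal sketch. This is plain Python, not PRISM, and it is not the authors' framework: the function name `constrained_viterbi`, its parameters, and the toy model in the usage note are all assumptions made for illustration. The sketch runs Viterbi decoding over pairs of (state, visit count) and discards any partial path that visits one chosen state more than `max_visits` times.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        limited_state, max_visits):
    """Most probable state path visiting `limited_state` at most `max_visits` times.

    Hypothetical helper: returns (log-probability, path), or (-inf, []) if no
    path satisfies the cardinality constraint.
    """
    # best[(state, count)] = (log-probability, path) of the best partial path
    # ending in `state` after `count` visits to `limited_state`.
    best = {}
    for s in states:
        count = 1 if s == limited_state else 0
        if count <= max_visits:
            best[(s, count)] = (_log(start_p[s]) + _log(emit_p[s][obs[0]]), [s])

    for o in obs[1:]:
        nxt = {}
        for (prev, count), (lp, path) in best.items():
            for s in states:
                new_count = count + (1 if s == limited_state else 0)
                if new_count > max_visits:
                    continue  # pruned: the side-constraint is already violated
                cand = lp + _log(trans_p[prev][s]) + _log(emit_p[s][o])
                if cand > nxt.get((s, new_count), (NEG_INF, None))[0]:
                    nxt[(s, new_count)] = (cand, path + [s])
        best = nxt

    if not best:
        return NEG_INF, []
    return max(best.values(), key=lambda entry: entry[0])


# Toy two-state model (assumed values, not taken from the paper).
STATES = ["A", "B"]
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
```

Calling `constrained_viterbi("yyy", STATES, START, TRANS, EMIT, "B", 3)` leaves the constraint inactive, while lowering `max_visits` forces the decoder away from state `B`. Tracking the count in the dynamic-programming key keeps the search exact at the cost of a factor of `max_visits + 1` in table size.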

The Genomic HyperBrowser: inferential genomics at the sequence level

Author: Clancy Trevor
Ferkingstad Egil
Frigessi Arnoldo
Glad Ingrid K.
Gundersen Sveinung
Holden Lars
Holden Marit
Hovig Eivind
Johansen Morten
Liestøl Knut
Nygaard Vegard
Rydbeck Halfdan
Sandve Geir K.
Tøstesen Eivind
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no

arXiv.org e-Print Archive

Springer - Publisher Connector

PubMed Central

NORA - Norwegian Open Research Archives

Modeling pre-invasive bronchial epithelial lesions

Author: Barranco-Mendoza A.
Publication date: 01/01/1997

The growth of cancer cells involves many different processes which can only be captured by a complex model. However, simplified models provide a great deal of insight into the fundamental processes involved. In this workshop we proposed two simple models - one discrete stochastic model and one PDE model - to solve a 2-D simplification of the original problem.

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Author: Riedel Sebastian
Stenetorp Pontus
Welbl Johannes
Publication date: 28/05/2018

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence - effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0% - leaving ample room for improvement.

Comment: This paper directly corresponds to the TACL version (https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor changes in wording, additional footnotes, and appendices.

arXiv.org e-Print Archive

UCL Discovery

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.

Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Quickest Change Detection of a Markov Process Across a Sensor Array

Author: Raghavan Vasanthan
Veeravalli Venugopal V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Recent attention in quickest change detection in the multi-sensor setting has been on the case where the densities of the observations change at the same instant at all the sensors due to the disruption. In this work, a more general scenario is considered where the change propagates across the sensors, and its propagation can be modeled as a Markov process. A centralized, Bayesian version of this problem, with a fusion center that has perfect information about the observations and a priori knowledge of the statistics of the change process, is considered. The problem of minimizing the average detection delay subject to false alarm constraints is formulated as a partially observable Markov decision process (POMDP). Insights into the structure of the optimal stopping rule are presented. In the limiting case of rare disruptions, we show that the structure of the optimal test reduces to thresholding the a posteriori probability of the hypothesis that no change has happened. We establish the asymptotic optimality (in the vanishing false alarm probability regime) of this threshold test under a certain condition on the Kullback-Leibler (K-L) divergence between the post- and the pre-change densities. In the special case of near-instantaneous change propagation across the sensors, this condition reduces to the mild condition that the K-L divergence be positive. Numerical studies show that this low complexity threshold test results in a substantial improvement in performance over naive tests such as a single-sensor test or a test that wrongly assumes that the change propagates instantaneously.

Comment: 40 pages, 5 figures, Submitted to IEEE Trans. Inform. Theory

arXiv.org e-Print Archive

CiteSeerX

Crossref
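
The threshold test mentioned in the abstract above, which stops once the a posteriori probability that no change has happened falls below a threshold, can be illustrated with a minimal sketch. The code below is a deliberately simplified single-stream version with a geometric prior on the change time and unit-variance Gaussian densities. It does not model the Markov propagation across sensors studied in the paper, and the names `posterior_threshold_test`, `rho`, `alpha`, `pre_mean`, and `post_mean` are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_threshold_test(observations, rho, alpha,
                             pre_mean=0.0, post_mean=1.0):
    """Stop when P(no change yet | data) drops to `alpha` or below.

    Assumptions (not from the source): a single observation stream, a
    geometric prior with per-step change probability `rho`, and unit-variance
    Gaussian pre- and post-change densities.

    Returns (stopping time, trace), where the stopping time is 1-indexed
    (None if the test never stops) and trace[n-1] is the posterior
    probability of "no change" after n observations.
    """
    p_change = 0.0  # posterior probability that the change has occurred
    trace = []
    for n, x in enumerate(observations, start=1):
        # Prediction step: the change may occur just before this observation.
        prior = p_change + (1.0 - p_change) * rho
        # Bayes update with the pre- and post-change likelihoods.
        num = prior * gaussian_pdf(x, post_mean)
        den = num + (1.0 - prior) * gaussian_pdf(x, pre_mean)
        p_change = num / den
        p_no_change = 1.0 - p_change
        trace.append(p_no_change)
        if p_no_change <= alpha:
            return n, trace
    return None, trace
```

For example, `posterior_threshold_test([0.0] * 5 + [3.0] * 10, rho=0.05, alpha=0.01, post_mean=3.0)` should stop shortly after the sixth observation, where the mean shifts. A smaller `alpha` reduces false alarms at the cost of a longer detection delay.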