The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Reified Context Models
A classic tension exists between exact inference in a simple model and
approximate inference in a complex model. The latter offers expressivity and
thus accuracy, but the former provides coverage of the space, an important
property for confidence estimation and learning with indirect supervision. In
this work, we introduce a new approach, reified context models, to reconcile
this tension. Specifically, we let the amount of context (the arity of the
factors in a graphical model) be chosen "at run-time" by reifying it---that is,
letting this choice itself be a random variable inside the model. Empirically,
we show that our approach obtains expressivity and coverage on three natural
language tasks.
Parametric Inference for Biological Sequence Analysis
One of the major successes in computational biology has been the unification,
using the graphical model formalism, of a multitude of algorithms for
annotating and comparing biological sequences. Graphical models that have been
applied towards these problems include hidden Markov models for annotation,
tree models for phylogenetics, and pair hidden Markov models for alignment. A
single algorithm, the sum-product algorithm, solves many of the inference
problems associated with different statistical models. This paper introduces
the polytope propagation algorithm for computing the Newton polytope of
an observation from a graphical model. This algorithm is a geometric version of
the sum-product algorithm and is used to analyze the parametric behavior of
maximum a posteriori inference calculations for graphical models.
Comment: 15 pages, 4 figures. See also companion paper "Tropical Geometry of
Statistical Models" (q-bio.QM/0311009)
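The remark that one recursion serves many inference problems can be illustrated with a minimal sketch, which is not taken from the paper. It runs the forward (sum-product) recursion on a toy hidden Markov model with the arithmetic passed in as parameters: ordinary (+, ×) gives the likelihood of the observations, while (max, ×) gives the probability of the single best state path, the quantity behind maximum a posteriori inference. The function name `forward`, the two-state model, and all of its numbers are assumptions made up for illustration. The polytope propagation algorithm itself, which the abstract describes as a geometric version of this recursion, is not implemented here.

```python
import operator


def forward(obs, init, trans, emit, add, mul):
    """Forward recursion over a generic (add, mul) pair.

    With (+, *) this is the sum-product recursion and returns the
    likelihood of `obs`; with (max, *) it returns the probability of
    the most likely state path.
    """
    n = len(init)
    # alpha[s]: aggregated score of all state paths that end in state s
    alpha = [mul(init[s], emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        new_alpha = []
        for t in range(n):
            # Aggregate over every predecessor state s of state t.
            acc = mul(alpha[0], trans[0][t])
            for s in range(1, n):
                acc = add(acc, mul(alpha[s], trans[s][t]))
            new_alpha.append(mul(acc, emit[t][o]))
        alpha = new_alpha
    # Aggregate over the final state.
    total = alpha[0]
    for s in range(1, n):
        total = add(total, alpha[s])
    return total


# Hypothetical two-state, two-symbol HMM (illustrative numbers only).
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.1, 0.9]]
OBS = [0, 1, 1]

likelihood = forward(OBS, INIT, TRANS, EMIT, operator.add, operator.mul)
best_path_prob = forward(OBS, INIT, TRANS, EMIT, max, operator.mul)
print(likelihood, best_path_prob)
```

Only the `add` argument differs between the two calls; the traversal of the model is identical, which is the sense in which a single algorithm covers both problems.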
A two-phase approach for detecting recombination in nucleotide sequences
Genetic recombination can produce heterogeneous phylogenetic histories within
a set of homologous genes. Delineating recombination events is important in the
study of molecular evolution, as inference of such events provides a clearer
picture of the phylogenetic relationships among different gene sequences or
genomes. Nevertheless, detecting recombination events can be a daunting task,
as the performance of different recombination-detecting approaches can vary,
depending on evolutionary events that take place after recombination. We
recently evaluated the effects of postrecombination events on the prediction
accuracy of recombination-detecting approaches using simulated nucleotide
sequence data. The main conclusion, supported by other studies, is that one
should not depend on a single method when searching for recombination events.
In this paper, we introduce a two-phase strategy, applying three statistical
measures to detect the occurrence of recombination events, and a Bayesian
phylogenetic approach in delineating breakpoints of such events in nucleotide
sequences. We evaluate the performance of these approaches using simulated
data, and demonstrate the applicability of this strategy to empirical data. The
two-phase strategy proves to be time-efficient when applied to large datasets,
and yields high-confidence results.
Comment: 5 pages, 3 figures. Chan CX, Beiko RG and Ragan MA (2007). A
two-phase approach for detecting recombination in nucleotide sequences. In
Hazelhurst S and Ramsay M (Eds) Proceedings of the First Southern African
Bioinformatics Workshop, 28-30 January, Johannesburg, 9-1