Parametric Inference for Biological Sequence Analysis
One of the major successes in computational biology has been the unification,
using the graphical model formalism, of a multitude of algorithms for
annotating and comparing biological sequences. Graphical models that have been
applied towards these problems include hidden Markov models for annotation,
tree models for phylogenetics, and pair hidden Markov models for alignment. A
single algorithm, the sum-product algorithm, solves many of the inference
problems associated with different statistical models. This paper introduces
the polytope propagation algorithm for computing the Newton polytope of
an observation from a graphical model. This algorithm is a geometric version of
the sum-product algorithm and is used to analyze the parametric behavior of
maximum a posteriori inference calculations for graphical models.
Comment: 15 pages, 4 figures. See also companion paper "Tropical Geometry of
Statistical Models" (q-bio.QM/0311009)
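The link between the sum-product algorithm and maximum a posteriori inference can be illustrated with a minimal sketch, which is not taken from the paper. It runs one forward recursion over an HMM with the "add" and "multiply" operations passed in as arguments: with (+, ×) it returns the likelihood of the observation, and with (max, ×) it returns the probability of the single best hidden path. Taking negative logarithms turns the second case into the tropical (min, +) semiring. The two-state model and all of its parameters (STATES, INIT, TRANS, EMIT) are hypothetical toy values chosen only for illustration, and the sketch does not compute Newton polytopes.

```python
from functools import reduce
import operator

# Hypothetical two-state HMM over the DNA alphabet (toy values, not from the paper).
STATES = ["H", "L"]
INIT = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(obs, states, init, trans, emit, add, mul):
    """Forward recursion over the semiring defined by (add, mul)."""
    # Initialise with the first observed symbol.
    alpha = {s: mul(init[s], emit[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        # Combine every way of reaching state t, then emit the symbol.
        alpha = {
            t: mul(
                reduce(add, (mul(alpha[s], trans[s][t]) for s in states)),
                emit[t][symbol],
            )
            for t in states
        }
    # Combine over the final hidden state.
    return reduce(add, (alpha[s] for s in states))


OBS = "GGCA"
# Sum-product: total probability of the observation.
likelihood = forward(OBS, STATES, INIT, TRANS, EMIT, operator.add, operator.mul)
# Max-product: probability of the most likely hidden path (Viterbi score).
best_path = forward(OBS, STATES, INIT, TRANS, EMIT, max, operator.mul)
print(likelihood, best_path)
```

Only the two operations differ between the calls; the recursion itself is unchanged, which is the sense in which one algorithm serves several inference problems.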
The Mathematics of Phylogenomics
The grand challenges in biology today are being shaped by powerful
high-throughput technologies that have revealed the genomes of many organisms,
global expression patterns of genes and detailed information about variation
within populations. We are therefore able to ask, for the first time,
fundamental questions about the evolution of genomes, the structure of genes
and their regulation, and the connections between genotypes and phenotypes of
individuals. The answers to these questions are all predicated on progress in a
variety of computational, statistical, and mathematical fields.
The rapid growth in the characterization of genomes has led to the
advancement of a new discipline called Phylogenomics. This discipline results
from the combination of two major fields in the life sciences: Genomics, i.e.,
the study of the function and structure of genes and genomes; and Molecular
Phylogenetics, i.e., the study of the hierarchical evolutionary relationships
among organisms and their genomes. The objective of this article is to offer
mathematicians a first introduction to this emerging field, and to discuss
specific mathematical problems and developments arising from phylogenomics.
Comment: 41 pages, 4 figures
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
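For readers unfamiliar with the EM algorithm named in the abstract above, the following is a minimal sketch on a textbook-style toy problem, not one of the applications surveyed in the paper. Each trial records heads and tails from one of two coins of unknown bias, chosen with equal probability, and the coin identity is hidden. The function name em_two_coins, the data and the starting values are all hypothetical.

```python
import math


def em_two_coins(counts, theta_a, theta_b, iterations=20):
    """EM for a 50/50 mixture of two coins with unknown head probabilities.

    counts is a list of (heads, tails) pairs, one per trial.
    Returns the final estimates and the log-likelihood seen at each iteration
    (up to an additive constant, since binomial coefficients are omitted).
    """
    history = []
    for _ in range(iterations):
        heads_a = tails_a = heads_b = tails_b = 0.0
        log_lik = 0.0
        for h, t in counts:
            # E-step: posterior weight that this trial used coin A or coin B.
            like_a = theta_a ** h * (1.0 - theta_a) ** t
            like_b = theta_b ** h * (1.0 - theta_b) ** t
            total = like_a + like_b
            log_lik += math.log(0.5 * total)
            w_a = like_a / total
            w_b = 1.0 - w_a
            # Accumulate expected head and tail counts for each coin.
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += w_b * h
            tails_b += w_b * t
        history.append(log_lik)
        # M-step: re-estimate each bias from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b, history


# Hypothetical data: five trials of ten tosses each.
DATA = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
est_a, est_b, log_liks = em_two_coins(DATA, theta_a=0.6, theta_b=0.5)
print(est_a, est_b)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is the defining property of EM and a useful sanity check on any implementation.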
Parameter estimation in pair hidden Markov models
This paper deals with parameter estimation in pair hidden Markov models
(pair-HMMs). We first provide a rigorous formalism for these models and discuss
possible definitions of likelihoods. The model being biologically motivated,
some restrictions with respect to the full parameter space naturally occur.
The existence of two different information divergence rates is established, and
the divergence property (namely positivity at values different from the true one)
is shown under additional assumptions. This yields consistency for the
parameter in parametrization schemes for which the divergence property holds.
Simulations illustrate different cases which are not covered by our results.
Comment: corrected typo
Hidden Markov Models and their Applications in Biological Sequence Analysis
Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others.
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMMs to identify genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are
the causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy to evade the host's immune
system that consists in periodically changing their predominant cellular
surface protein (VSG). The motivation for using pattern recognition methods to
identify these genes, instead of traditional homology-based ones, is that the
levels of sequence identity (amino acid and DNA sequence) amongst these genes
are often below what is considered reliable for those methods. Among pattern
recognition approaches, HMMs are particularly suitable for this problem
because they handle the determination of gene edges more naturally. We
evaluate the performance of the model using different numbers of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes in T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMMs.
Comment: Accepted in July 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
references