Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
Although fully generative models have been successfully used to model the
contents of text documents, they are often awkward to apply to combinations of
text data and document metadata. In this paper we propose a
Dirichlet-multinomial regression (DMR) topic model that includes a log-linear
prior on document-topic distributions that is a function of observed features
of the document, such as author, publication venue, references, and dates. We
show that by selecting appropriate features, DMR topic models can meet or
exceed the performance of several previously published topic models designed
for specific data.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty
in Artificial Intelligence (UAI2008)
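As a rough illustration of the log-linear prior described in this abstract, the sketch below turns a document's feature vector into Dirichlet parameters over topics by exponentiating a linear score, then draws one document-topic distribution. The feature layout, the weight values, and the function names `dmr_alpha` and `sample_dirichlet` are hypothetical and chosen only for this example; the abstract does not specify them.

```python
import math
import random


def dmr_alpha(features, weights):
    """Log-linear Dirichlet parameters: alpha_k = exp(sum_j x_j * lambda_kj)."""
    return [
        math.exp(sum(x * w for x, w in zip(features, row)))
        for row in weights
    ]


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical document features: [bias, written_by_author_A, published_in_venue_B]
features = [1.0, 1.0, 0.0]

# Hypothetical weights, one row per topic (three topics here).
weights = [
    [0.0, 2.0, 0.0],   # topic 0 is favored by author A
    [0.0, -1.0, 1.0],  # topic 1 is disfavored by author A, favored by venue B
    [0.0, 0.0, 0.0],   # topic 2 ignores the metadata
]

alpha = dmr_alpha(features, weights)
theta = sample_dirichlet(alpha, random.Random(0))
print("alpha:", alpha)
print("theta:", theta)
```

Because the metadata enters only through the prior parameters, changing the features shifts which topics a document is expected to use without changing how words are generated from topics.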
A Nonparametric Bayesian Approach to Uncovering Rat Hippocampal Population Codes During Spatial Navigation
Rodent hippocampal population codes represent important spatial information
about the environment during navigation. Several computational methods have
been developed to uncover the neural representation of spatial topology
embedded in rodent hippocampal ensemble spike activity. Here we extend our
previous work and propose a nonparametric Bayesian approach to infer rat
hippocampal population codes during spatial navigation. To tackle the model
selection problem, we leverage a nonparametric Bayesian model. Specifically, to
analyze rat hippocampal ensemble spiking activity, we apply a hierarchical
Dirichlet process-hidden Markov model (HDP-HMM) using two Bayesian inference
methods, one based on Markov chain Monte Carlo (MCMC) and the other based on
variational Bayes (VB). We demonstrate the effectiveness of our Bayesian
approaches on recordings from a freely-behaving rat navigating in an open field
environment. We find that MCMC-based inference with Hamiltonian Monte Carlo
(HMC) hyperparameter sampling is flexible and efficient, and outperforms VB and
MCMC approaches with hyperparameters set by empirical Bayes.
Bayesian Portfolio Selection in a Markov Switching Gaussian Mixture Model
Departure from normality poses implementation barriers to the Markowitz mean-variance portfolio selection. When assets are affected by common and idiosyncratic shocks, the distribution of asset returns may exhibit Markov switching regimes and have a Gaussian mixture distribution conditional on each regime. The model is estimated in a Bayesian framework using the Gibbs sampler. An application to the global portfolio diversification is also discussed.
Keywords: Portfolio; Bayesian; Hidden Markov Model; Gaussian Mixture
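The generative structure this abstract describes, a hidden regime that follows a Markov chain with returns drawn from a regime-specific Gaussian mixture, can be simulated as in the minimal sketch below. It covers a single asset only and does not attempt the Gibbs-sampling estimation; the transition matrix, mixture parameters, and the name `simulate_returns` are illustrative assumptions, not values from the paper.

```python
import random


def simulate_returns(n_steps, transition, regimes, rng):
    """Simulate one asset's returns under Markov-switching Gaussian mixtures.

    transition[s] is the row of switching probabilities out of regime s.
    regimes[s] is a list of (weight, mean, std) mixture components.
    """
    state = 0
    states, returns = [], []
    for _ in range(n_steps):
        # Pick a mixture component within the current regime.
        component_weights = [w for w, _, _ in regimes[state]]
        _, mean, std = rng.choices(regimes[state], weights=component_weights, k=1)[0]
        returns.append(rng.gauss(mean, std))
        states.append(state)
        # Move to the next regime according to the Markov chain.
        state = rng.choices(range(len(transition)), weights=transition[state], k=1)[0]
    return states, returns


# Hypothetical two-regime setup: a calm regime and a turbulent regime.
transition = [
    [0.95, 0.05],
    [0.10, 0.90],
]
regimes = [
    [(0.8, 0.001, 0.01), (0.2, 0.000, 0.02)],    # calm: small, tight shocks
    [(0.5, -0.002, 0.03), (0.5, -0.010, 0.06)],  # turbulent: larger, skewed shocks
]

states, returns = simulate_returns(500, transition, regimes, random.Random(1))
print("share of time in turbulent regime:", sum(states) / len(states))
```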
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
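One common way to express the "control over the switching rate" mentioned in this abstract, under a truncated (finite) approximation, is to add extra mass to the self-transition entry of each row's Dirichlet prior. The sketch below assumes that formulation; the symbols `alpha`, `kappa`, and `beta` and all function names are labels chosen for this example and do not appear in the abstract.

```python
import random


def sticky_row_params(beta, alpha, kappa, j):
    """Dirichlet parameters for transition row j: alpha * beta_k, plus kappa at k == j."""
    return [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]


def expected_self_transition(beta, alpha, kappa, j):
    """Prior mean of the probability that state j transitions to itself."""
    params = sticky_row_params(beta, alpha, kappa, j)
    return params[j] / sum(params)


def sample_row(params, rng):
    """Draw one transition row from Dirichlet(params) via normalized gamma draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


# Assumed truncation to four states with uniform global weights.
beta = [0.25, 0.25, 0.25, 0.25]
alpha = 4.0

plain = expected_self_transition(beta, alpha, kappa=0.0, j=0)
sticky = expected_self_transition(beta, alpha, kappa=12.0, j=0)
print("prior mean self-transition without bias:", plain)   # 0.25
print("prior mean self-transition with bias:   ", sticky)  # 0.8125

row = sample_row(sticky_row_params(beta, alpha, 12.0, 0), random.Random(0))
print("one sampled sticky row:", row)
```

Raising `kappa` makes each state more likely to persist a priori, which discourages the rapid switching among redundant states that the abstract attributes to the basic model.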