Approximate Inference in Continuous Determinantal Point Processes
Determinantal point processes (DPPs) are random point processes well-suited
for modeling repulsion. In machine learning, the focus of DPP-based models has
been on diverse subset selection from a discrete and finite base set. This
discrete setting admits an efficient sampling algorithm based on the
eigendecomposition of the defining kernel matrix. Recently, there has been
growing interest in using DPPs defined on continuous spaces. While the
discrete-DPP sampler extends formally to the continuous case, computationally,
the steps required are not tractable in general. In this paper, we present two
efficient DPP sampling schemes that apply to a wide range of kernel functions:
one based on low-rank approximations via Nyström and random Fourier feature
techniques and another based on Gibbs sampling. We demonstrate the utility of
continuous DPPs in repulsive mixture modeling and synthesizing human poses
spanning activity spaces.
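
For readers unfamiliar with the discrete sampler the abstract refers to, the following is a minimal sketch of the standard eigendecomposition-based DPP sampler for a finite base set. It is not the paper's continuous scheme; the function name `sample_discrete_dpp`, the L-ensemble parameterization, and the Gaussian kernel in the usage lines are assumptions made for illustration.

```python
import numpy as np


def sample_discrete_dpp(L, rng):
    """Draw one subset from a discrete DPP given a symmetric PSD L-ensemble kernel.

    Illustrative sketch only (hypothetical helper, not code from the paper).
    """
    # Eigendecompose the kernel; clip tiny negative eigenvalues caused by round-off.
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)

    # Phase 1: keep eigenvector n independently with probability lambda_n / (1 + lambda_n).
    keep = rng.random(len(eigvals)) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]

    items = []
    # Phase 2: pick one item per retained eigenvector, shrinking the subspace each time.
    while V.shape[1] > 0:
        # Item i is chosen with probability proportional to the squared norm of row i of V.
        probs = np.sum(V ** 2, axis=1)
        probs = probs / probs.sum()
        i = int(rng.choice(len(probs), p=probs))
        items.append(i)

        # Remove the component along coordinate i: use column j (largest entry in row i)
        # to zero out row i in the remaining columns, then drop column j.
        j = int(np.argmax(np.abs(V[i, :])))
        Vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V = V - np.outer(Vj, V[i, :] / Vj[i])

        # Re-orthonormalize the remaining basis.
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)

    return sorted(items)


# Usage: a Gaussian kernel on random 2-D points (assumed example data).
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
kernel = np.exp(-0.5 * sq_dists)
subset = sample_discrete_dpp(kernel, rng)
```

The eigendecomposition step is what fails to carry over directly to continuous spaces, which is the gap the low-rank and Gibbs schemes in the abstract are meant to address.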
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
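
As a rough illustration of how an augmented HDP-HMM can control the switching rate, the sketch below draws transition rows under a finite truncation in which each row's Dirichlet prior receives extra mass on the self-transition. The abstract does not give these details: the truncation level `K`, the hyperparameter names `gamma`, `alpha`, and `kappa`, and the helper `sample_sticky_transitions` are assumptions for illustration.

```python
import numpy as np


def sample_sticky_transitions(beta, alpha, kappa, rng):
    """Sample a K x K transition matrix with a self-transition bias.

    Row j is drawn from Dirichlet(alpha * beta + kappa * e_j), so kappa = 0
    recovers the unbiased truncated prior. Hypothetical helper for illustration.
    """
    K = len(beta)
    # Shared base weights in every row, plus kappa added on the diagonal.
    concentration = alpha * np.tile(beta, (K, 1)) + kappa * np.eye(K)
    return np.vstack([rng.dirichlet(concentration[j]) for j in range(K)])


# Usage under assumed hyperparameters: K-state truncation of the global weights.
rng = np.random.default_rng(0)
K, gamma, alpha = 5, 5.0, 5.0
beta = rng.dirichlet(np.full(K, gamma / K))
plain = sample_sticky_transitions(beta, alpha, kappa=0.0, rng=rng)
sticky = sample_sticky_transitions(beta, alpha, kappa=50.0, rng=rng)
```

With a large `kappa`, the diagonal of `sticky` dominates each row, so simulated state sequences switch far less often than under `plain`.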
A Bayesian nonparametric approach to log-concave density estimation
The estimation of a log-concave density on the real line is a canonical
problem in the area of shape-constrained nonparametric inference. We present a
Bayesian nonparametric approach to this problem based on an exponentiated
Dirichlet process mixture prior and show that the posterior distribution
converges to the log-concave truth at the (near-) minimax rate in Hellinger
distance. Our proof proceeds by establishing a general contraction result based
on the log-concave maximum likelihood estimator that removes the need for
further metric entropy calculations. We also present two computationally more
feasible approximations and a more practical empirical Bayes approach, which
are illustrated numerically via simulations.
Comment: 39 pages, 17 figures. Simulation studies were significantly expanded
and one more theorem has been adde
Nonparametric Bayes Modeling of Populations of Networks
Replicated network data are increasingly available in many research fields.
In connectomic applications, inter-connections among brain regions are
collected for each patient under study, motivating statistical models which can
flexibly characterize the probabilistic generative mechanism underlying these
network-valued data. Available models for a single network are not designed
specifically for inference on the entire probability mass function of a
network-valued random variable and therefore lack flexibility in characterizing
the distribution of relevant topological structures. We propose a flexible
Bayesian nonparametric approach for modeling the population distribution of
network-valued data. The joint distribution of the edges is defined via a
mixture model which reduces dimensionality and efficiently incorporates network
information within each mixture component by leveraging latent space
representations. The formulation leads to an efficient Gibbs sampler and
provides simple and coherent strategies for inference and goodness-of-fit
assessments. We provide theoretical results on the flexibility of our model and
illustrate improved performance --- compared to state-of-the-art models --- in
simulations and an application to human brain networks.