Distance Dependent Chinese Restaurant Processes
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora. We show that relaxing the assumption of exchangeability
with distance dependent CRPs can provide a better fit to sequential data. We
also show that its alternative formulation of the traditional CRP leads to a
faster-mixing Gibbs sampling algorithm than the one based on the original
formulation.
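The partition model described in this abstract can be illustrated with a minimal sketch of drawing from a sequential distance dependent CRP prior. This is not the authors' code: the function names, the restriction to links pointing at earlier items, and the exponential decay used in the usage note are assumptions made for illustration. Each item links either to itself (weight `alpha`) or to an earlier item (weight `decay(distance)`), and clusters are the connected components of the resulting link graph.

```python
import math
import random


def sample_ddcrp_links(n, alpha, decay, rng):
    """Draw one link per item (hypothetical helper, sequential case only).

    Item i links to an earlier item j < i with weight decay(i - j),
    or to itself with weight alpha.
    """
    links = []
    for i in range(n):
        weights = [decay(i - j) for j in range(i)] + [alpha]
        links.append(rng.choices(range(i + 1), weights=weights)[0])
    return links


def links_to_clusters(links):
    """Label connected components of the link graph using union-find."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]
```

For example, `links_to_clusters(sample_ddcrp_links(20, 1.0, lambda d: math.exp(-d / 2.0), random.Random(0)))` returns one cluster label per item. Passing `decay=lambda d: 1.0` instead gives every earlier item equal weight, which corresponds to the link-based formulation of the traditional CRP that the abstract mentions.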
Distance dependent extensions of the Chinese restaurant process
In this paper we consider the clustering of text documents using the Chinese Restaurant Process (CRP) and extensions that take time-correlations into account. To this purpose, we implement and test the Distance Dependent Chinese Restaurant Process (DDCRP) for mixture models on both generated and real-world data. We also propose and implement a novel clustering algorithm, the Averaged Distance Dependent Chinese Restaurant Process (ADDCRP), to model time-correlations, that is faster per iteration and attains performance similar to that of the fully distance dependent CRP.
The Greedy Dirichlet Process Filter - An Online Clustering Multi-Target Tracker
Reliable collision avoidance is one of the main requirements for autonomous
driving. Hence, it is important to correctly estimate the states of an unknown
number of static and dynamic objects in real-time. Here, data association is a
major challenge for every multi-target tracker. We propose a novel multi-target
tracker called Greedy Dirichlet Process Filter (GDPF) based on the
non-parametric Bayesian model called Dirichlet Processes and the fast posterior
computation algorithm Sequential Updating and Greedy Search (SUGS). By adding a
temporal dependence we get a real-time capable tracking framework without the
need for a prior clustering or data association step. Real-world tests show
that GDPF outperforms other multi-target trackers in terms of accuracy and
stability.
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results. Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
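
The control over switching rate described in this abstract can be sketched with a finite (truncated) Dirichlet approximation in which each transition row receives extra mass on its own state. This is an illustrative sketch, not the paper's sampler: the names `beta`, `alpha`, and `kappa`, the helper functions, and the specific row parameterization `alpha * beta + kappa` on the diagonal are assumptions that the abstract does not state.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_matrix(beta, alpha, kappa, rng):
    """Sample one transition row per state (hypothetical helper).

    Row j has concentration alpha * beta[k] for each state k, plus an
    extra kappa on k == j, which favors self-transitions when kappa > 0.
    All entries of beta must be strictly positive.
    """
    rows = []
    for j in range(len(beta)):
        conc = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
        rows.append(sample_dirichlet(conc, rng))
    return rows
```

With `kappa = 0` this reduces to rows that share the same concentration vector; increasing `kappa` shifts probability mass onto the diagonal, which is one way to discourage rapid switching among redundant states.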