32,790 research outputs found

    Distance Dependent Chinese Restaurant Processes

    Full text link
    We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation

    Distance dependent extensions of the Chinese restaurant process

    Get PDF
    In this paper we consider the clustering of text documents using the Chinese Restau- rant Process (CRP) and extensions that take time-correlations into account. To this pur- pose, we implement and test the Distance Dependent Chinese Restaurant Process (DD- CRP) for mixture models on both generated and real-world data. We also propose and im- plement a novel clustering algorithm, the Av- eraged Distance Dependent Chinese Restau- rant Process (ADDCRP), to model time- correlations, that is faster per iteration and attains similar performance as the fully dis- tance dependent CRP

    The Greedy Dirichlet Process Filter - An Online Clustering Multi-Target Tracker

    Full text link
    Reliable collision avoidance is one of the main requirements for autonomous driving. Hence, it is important to correctly estimate the states of an unknown number of static and dynamic objects in real-time. Here, data association is a major challenge for every multi-target tracker. We propose a novel multi-target tracker called Greedy Dirichlet Process Filter (GDPF) based on the non-parametric Bayesian model called Dirichlet Processes and the fast posterior computation algorithm Sequential Updating and Greedy Search (SUGS). By adding a temporal dependence we get a real-time capable tracking framework without the need of a previous clustering or data association step. Real-world tests show that GDPF outperforms other multi-target tracker in terms of accuracy and stability

    A sticky HDP-HMM with application to speaker diarization

    Get PDF
    We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS395 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore