360 research outputs found
Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
This paper presents a novel algorithm, based upon the dependent Dirichlet
process mixture model (DDPMM), for clustering batch-sequential data containing
an unknown number of evolving clusters. The algorithm is derived via a
low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM,
and provides a hard clustering with convergence guarantees similar to those of
the k-means algorithm. Empirical results from a synthetic test with moving
Gaussian clusters and a test with real ADS-B aircraft trajectory data
demonstrate that the algorithm requires orders of magnitude less computational
time than contemporary probabilistic and hard clustering algorithms, while
providing higher accuracy on the examined datasets.Comment: This paper is from NIPS 2013. Please use the following BibTeX
citation: @inproceedings{Campbell13_NIPS, Author = {Trevor Campbell and Miao
Liu and Brian Kulis and Jonathan P. How and Lawrence Carin}, Title = {Dynamic
Clustering via Asymptotics of the Dependent Dirichlet Process}, Booktitle =
{Advances in Neural Information Processing Systems (NIPS)}, Year = {2013}
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows for the use of standard,
existing, software from Bayesian nonparametric density estimation and
Plackett-Luce ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates
- …