Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in
distributed environments, where data are distributed across multiple computing
nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that
they allow new components to be introduced on the fly as needed. This, however,
poses an important challenge to distributed estimation -- how to handle new
components efficiently and consistently. To tackle this problem, we propose a
new estimation method, which allows new components to be created locally in
individual computing nodes. Components corresponding to the same cluster will
be identified and merged via a probabilistic consolidation scheme. In this way,
we can maintain the consistency of estimation with very low communication cost.
Experiments on large real-world data sets show that the proposed method can
achieve high scalability in distributed and asynchronous environments without
compromising the mixing performance.
Comment: This paper is published at IJCAI 2017.
https://www.ijcai.org/proceedings/2017/64
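The abstract leaves the consolidation scheme unspecified, so the following is only a rough Python sketch of the two-stage idea on 1-D Gaussian data: workers create components locally with no communication, then a master identifies and merges components that correspond to the same cluster. The CRP-style assignment and the greedy distance-based consolidate() are simplified stand-ins, not the paper's probabilistic consolidation scheme; all names and constants are hypothetical.

import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def local_assign(data, comps, alpha=1.0, obs_var=1.0, prior_var=10.0):
    """CRP-style sequential assignment on one worker: each point joins an
    existing component or opens a new one, with no communication needed."""
    for x in data:
        # plug-in predictive weight of each existing component
        w = [c["n"] * normal_pdf(x, c["sum"] / c["n"], obs_var) for c in comps]
        # prior-predictive weight of opening a brand-new component
        w.append(alpha * normal_pdf(x, 0.0, obs_var + prior_var))
        w = np.array(w)
        k = np.random.choice(len(w), p=w / w.sum())
        if k == len(comps):               # a new component is born locally
            comps.append({"n": 0, "sum": 0.0})
        comps[k]["n"] += 1
        comps[k]["sum"] += x
    return comps

def consolidate(all_comps, merge_dist=1.0):
    """Greedy stand-in for the paper's probabilistic consolidation: local
    components with nearby means are merged, so the same cluster found
    independently on different workers is identified as one."""
    merged = []
    for c in sorted(all_comps, key=lambda c: -c["n"]):
        mean = c["sum"] / c["n"]
        for m in merged:
            if abs(m["sum"] / m["n"] - mean) < merge_dist:
                m["n"] += c["n"]
                m["sum"] += c["sum"]
                break
        else:
            merged.append(dict(c))
    return merged

# two "workers" run independently on their own shards; a master consolidates
rng = np.random.default_rng(0)
shard1 = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
shard2 = np.concatenate([rng.normal(5, 1, 200), rng.normal(0, 1, 200)])
global_comps = consolidate(local_assign(shard1, []) + local_assign(shard2, []))
print(sorted(round(c["sum"] / c["n"], 2) for c in global_comps))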
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows standard, existing
software for Bayesian nonparametric density estimation and Plackett-Luce
ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates.
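To make the decoupling concrete, below is a minimal Python sketch of the Plackett-Luce side of such a model: the log-likelihood of an observed ranking given per-observation utility scores. The linear score function and the weights beta are hypothetical stand-ins for the paper's regression function, and the marginal-density component of the model is omitted entirely.

import numpy as np

def plackett_luce_loglik(scores, ranking):
    """Log-likelihood of an observed ranking under the Plackett-Luce model:
    items are placed one at a time with probability proportional to
    exp(score) among the items not yet placed."""
    s = np.asarray(scores, dtype=float)[np.asarray(ranking)]  # rank order
    # log P = sum_i [ s_i - logsumexp(s_i, ..., s_n) ] over remaining items
    tail_lse = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(s - tail_lse))

# toy illustration: utilities come from a linear regression function
# (beta is a hypothetical stand-in, not the paper's estimator)
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
beta = np.array([1.0, -0.5, 0.2])
scores = X @ beta                      # one rank utility per observation
y = scores + rng.normal(scale=0.1, size=5)
ranking = np.argsort(-y)               # observed rank order of the responses
print(plackett_luce_loglik(scores, ranking))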
Hierarchically Clustered Representation Learning
The joint optimization of representation learning and clustering in the
embedding space has seen breakthroughs in recent years. Despite this advance,
clustering with representation learning has been limited to flat categories,
typically cohesive clustering with a focus on instance relations. To overcome
the limitations of flat clustering, we introduce
hierarchically-clustered representation learning (HCRL), which simultaneously
optimizes representation learning and hierarchical clustering in the embedding
space. In contrast to prior work, HCRL is the first to consider generating
deep embeddings from every component of the hierarchy, not just the leaf
components. In addition to obtaining hierarchically clustered embeddings,
we can reconstruct data at various abstraction levels, infer the intrinsic
hierarchical structure, and learn the level-proportion features. We conducted
evaluations on image and text domains, and our quantitative analyses showed
competitive likelihoods and the best accuracies compared with the baselines.
Comment: 10 pages, 7 figures. Under review as a conference paper.
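As a loose illustration of generation from every level of a hierarchy rather than from leaves only, the Python sketch below scores an embedding against all nodes of a toy two-level Gaussian hierarchy, mixing levels by explicit level proportions. This is not the paper's model or objective; every name and constant here is hypothetical.

import numpy as np

def node_responsibilities(z, level_means, level_props, var=1.0):
    """Posterior responsibility of every node in the hierarchy (root,
    internal, leaf) for an embedding z. Unlike flat clustering, internal
    nodes can also generate data, weighted by the level proportions."""
    logps = []
    for level, means in enumerate(level_means):      # one mean array per level
        for mu in means:
            ll = -0.5 * np.sum((z - mu) ** 2) / var  # isotropic Gaussian log-lik
            logps.append(np.log(level_props[level] / len(means)) + ll)
    logps = np.array(logps)
    p = np.exp(logps - logps.max())                  # stabilized softmax
    return p / p.sum()

# a toy two-level hierarchy in a 2-D embedding space: root plus two children
tree = [np.array([[0.0, 0.0]]),
        np.array([[-2.0, 0.0], [2.0, 0.0]])]
print(node_responsibilities(np.array([1.5, 0.1]), tree, level_props=[0.3, 0.7]))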