4,471 research outputs found
Dirichlet Process Parsimonious Mixtures for clustering
The parsimonious Gaussian mixture models, which exploit an eigenvalue
decomposition of the group covariance matrices of the Gaussian mixture, have
shown particular success in cluster analysis. Their parameters are generally
estimated by maximum likelihood, and estimation has also been considered from a
parametric Bayesian perspective. We propose new Dirichlet Process
Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric
formulation of these parsimonious Gaussian mixture models. The proposed DPPM
models are Bayesian nonparametric parsimonious mixture models that
simultaneously infer the model parameters, the optimal number of mixture
components, and the optimal parsimonious mixture structure from the data. We
develop a Gibbs sampling technique for maximum a posteriori (MAP) estimation of
the developed DPPM models and provide a Bayesian model selection framework
using Bayes factors. We apply them to cluster simulated data and real data
sets, and compare them to the standard parsimonious mixture models. The
obtained results highlight the effectiveness of the proposed nonparametric
parsimonious mixture models as a good nonparametric alternative to the
parametric parsimonious models.
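A minimal sketch of the idea, assuming scikit-learn as a stand-in (this is its variational truncated DP Gaussian mixture, not the authors' Gibbs sampler): the covariance_type options play the role of the parsimonious covariance structures, and comparing variational lower bounds is only a rough proxy for the Bayes factors used in the paper.

```python
# Hedged stand-in for the DPPM idea: scikit-learn's variational
# Dirichlet-process GMM, looping over covariance structures that
# mimic parsimonious constraints (full/tied/diag/spherical).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 0.5, (200, 2))])

best = None
for cov in ("full", "tied", "diag", "spherical"):
    m = BayesianGaussianMixture(
        n_components=10,  # truncation level; effective K is inferred
        weight_concentration_prior_type="dirichlet_process",
        covariance_type=cov,
        random_state=0,
    ).fit(X)
    if best is None or m.lower_bound_ > best[0]:  # ELBO as a crude selection score
        best = (m.lower_bound_, cov, m)

print("selected structure:", best[1])
print("effective components:", int(np.sum(best[2].weights_ > 1e-2)))
```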
Warped Mixtures for Nonparametric Cluster Shapes
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will
report that the data contains many clusters. To produce more appropriate
clusterings, we introduce a model which warps a latent mixture of Gaussians to
produce nonparametric cluster shapes. The possibly low-dimensional latent
mixture model allows us to summarize the properties of the high-dimensional
clusters (or density manifolds) describing the data. The number of manifolds,
and the shape and dimension of each manifold, are automatically inferred.
We derive a simple inference scheme for this model which analytically
integrates out both the mixture parameters and the warping function. We show
that our model is effective for density estimation, performs better than
infinite Gaussian mixture models at recovering the true number of clusters, and
produces interpretable summaries of high-dimensional datasets. Comment: 10 pages, 6 figures, submitted for review.
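A toy generative sketch of the warping idea (the warp below is a hand-picked placeholder; the paper puts a Gaussian-process prior on the warping function and integrates it out analytically):

```python
# Draw from a 1-D latent mixture of Gaussians, then push the draws
# through a smooth nonlinear map into 2-D, producing curved clusters
# that a plain GMM fit in the observed space would over-segment.
import numpy as np

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(-2, 0.3, 300), rng.normal(2, 0.3, 300)])

def warp(z):
    # placeholder smooth map standing in for a GP-distributed warping
    return np.column_stack([z + 0.1 * z**2, np.sin(z)])

X = warp(z) + rng.normal(0, 0.05, (z.size, 2))  # observed, curved data
print(X.shape)  # (600, 2)
```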
Model Selection for Topic Models via Spectral Decomposition
Topic models have achieved significant success in analyzing large-scale
text corpora. In practical applications, we are often confronted with the
challenge of model selection, i.e., how to appropriately set the number of
topics. Following recent advances in topic model inference via tensor
decomposition, we make a first attempt at a theoretical analysis of model
selection in latent Dirichlet allocation. Under mild conditions, we derive
upper and lower bounds on the number of topics given a text collection of
finite size. Experimental results demonstrate that our bounds are accurate and
tight. Furthermore, using the Gaussian mixture model as an example, we show
that our methodology can be easily generalized to model selection analysis for
other latent models. Comment: accepted in AISTATS 2015.
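A hedged illustration of the spectral intuition (not the paper's estimator or its bounds): the number of topics appears as the numerical rank of the expected document-word matrix, so counting its non-negligible singular values recovers K on clean synthetic data.

```python
# Synthetic LDA-style data: K topics over a V-word vocabulary. The expected
# document-word matrix has rank K, so its singular-value spectrum reveals
# the number of topics.
import numpy as np

rng = np.random.default_rng(2)
V, K, n_docs = 500, 8, 2000
topics = rng.dirichlet(np.ones(V) * 0.1, size=K)   # topic-word distributions
props = rng.dirichlet(np.ones(K), size=n_docs)     # per-document topic weights
E = props @ topics                                 # expected doc-word matrix, rank K

s = np.linalg.svd(E, compute_uv=False)
tau = 1e-8 * s[0]  # noiseless here, so a tiny relative threshold suffices
print("estimated number of topics:", int(np.sum(s > tau)))  # prints 8
```

With real finite-size corpora the small singular values are perturbed by sampling noise, which is exactly why finite-sample upper and lower bounds matter.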
Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models
Bayesian nonparametrics are a class of probabilistic models in which the
model size is inferred from data. A recently developed methodology in this
field is small-variance asymptotic analysis, a mathematical technique for
deriving learning algorithms that capture much of the flexibility of Bayesian
nonparametric inference algorithms, but are simpler to implement and less
computationally expensive. Past work on small-variance analysis of Bayesian
nonparametric inference algorithms has exclusively considered batch models
trained on a single, static dataset, which are incapable of capturing time
evolution in the latent structure of the data. This work presents a
small-variance analysis of the maximum a posteriori filtering problem for a
temporally varying mixture model with a Markov dependence structure, which
captures temporally evolving clusters within a dataset. Two clustering
algorithms result from the analysis: D-Means, an iterative clustering algorithm
for linearly separable, spherical clusters; and SD-Means, a spectral clustering
algorithm derived from a kernelized, relaxed version of the clustering problem.
Empirical results from experiments demonstrate the advantages of using D-Means
and SD-Means over contemporary clustering algorithms, in terms of both
computational cost and clustering accuracy. Comment: 27 pages.
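For orientation, a compact sketch of the static small-variance algorithm that D-Means extends (a batch variant of DP-means, Kulis and Jordan 2012; the dynamic version adds cluster birth/death and transition penalties across time steps):

```python
# Batch DP-means sketch: k-means that opens a new cluster whenever some
# point lies farther than lam (squared distance) from every existing center.
import numpy as np

def dp_means(X, lam, n_iter=20):
    centers = [X[0]]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.stack([np.sum((X - c) ** 2, axis=1) for c in centers], axis=1)
        labels = d.argmin(axis=1)
        mind = d.min(axis=1)
        if (mind > lam).any():
            centers.append(X[mind.argmax()])  # open cluster at worst-fit point
            continue
        centers = [X[labels == k].mean(axis=0) for k in range(len(centers))]
    return np.array(centers), labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(4, 0.3, (100, 2))])
centers, labels = dp_means(X, lam=9.0)
print("clusters found:", len(centers))  # 2 for these well-separated blobs
```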
Learning Subspaces of Different Dimension
We introduce a Bayesian model for inferring mixtures of subspaces of
different dimensions. The key challenge in such a mixture model is the
specification of prior distributions over subspaces of different dimensions. We
address this challenge by embedding subspaces, or Grassmann manifolds, into a
sphere of relatively low dimension and specifying priors on the sphere. We
provide an efficient sampling algorithm for the posterior distribution of the
model parameters. We illustrate that a simple extension of our mixture of
subspaces model can be applied to topic modeling. We also prove posterior
consistency for the mixture of subspaces model. The utility of our approach is
demonstrated with applications to real and simulated data.
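One standard construction in the spirit of the embedding the abstract mentions (a sketch; the authors' exact scaling may differ): send a k-dimensional subspace of R^n to its projection matrix, subtract (k/n)I, and normalize, so that subspaces of every dimension land on the same unit sphere in the space of symmetric matrices.

```python
# Embed subspaces of different dimensions onto one sphere: for an
# orthonormal basis U (n x k), use the centered, normalized projection
# matrix. ||UU^T - (k/n)I||_F^2 = k(n-k)/n, so the result has unit norm.
import numpy as np

def embed_subspace(U):
    n, k = U.shape
    P = U @ U.T                     # orthogonal projection onto span(U)
    A = P - (k / n) * np.eye(n)     # traceless part
    return A / np.linalg.norm(A)    # unit Frobenius norm for every k

rng = np.random.default_rng(4)
n = 5
for k in (1, 2, 3):
    U, _ = np.linalg.qr(rng.normal(size=(n, k)))  # random orthonormal basis
    print(k, round(float(np.linalg.norm(embed_subspace(U))), 6))  # always 1.0
```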
Directional Statistics in Machine Learning: a Brief Review
The modern data analyst must cope with data encoded in various forms:
vectors, matrices, strings, graphs, and more. Consequently, statistical and
machine learning models tailored to different data encodings are important. We
focus on data encoded as normalized vectors, so that their "direction" is more
important than their magnitude. Specifically, we consider high-dimensional
vectors that lie either on the surface of the unit hypersphere or on the real
projective plane. For such data, we briefly review common mathematical models
prevalent in machine learning, while also outlining some technical aspects,
software, applications, and open mathematical challenges. Comment: 12 pages, slightly modified version of a submitted book chapter.
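As a small worked example for the most common model in this family, the von Mises-Fisher distribution: the mean direction has a closed-form MLE, and the concentration admits the widely used Banerjee et al. (2005) approximation.

```python
# Fit a von Mises-Fisher distribution to unit vectors: mu is the normalized
# resultant, and kappa uses the approximation rbar(d - rbar^2)/(1 - rbar^2).
import numpy as np

def fit_vmf(X):
    n, d = X.shape
    s = X.sum(axis=0)
    rbar = np.linalg.norm(s) / n                   # mean resultant length
    mu = s / np.linalg.norm(s)                     # MLE of the mean direction
    kappa = rbar * (d - rbar**2) / (1 - rbar**2)   # approximate concentration MLE
    return mu, kappa

rng = np.random.default_rng(5)
X = rng.normal(loc=[3.0, 0.0, 0.0], size=(1000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # project onto the unit sphere
mu, kappa = fit_vmf(X)
print("mean direction:", np.round(mu, 2), "kappa:", round(float(kappa), 1))
```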
Optimal Bayesian clustering using non-negative matrix factorization
Bayesian model-based clustering is a widely applied procedure for discovering
groups of related observations in a dataset. These approaches use Bayesian
mixture models, estimated with MCMC, which provide posterior samples of the
model parameters and clustering partition. While inference on model parameters
is well established, inference on the clustering partition is less developed. A
new method is developed for estimating the optimal partition from the pairwise
posterior similarity matrix generated by a Bayesian cluster model. This
approach uses non-negative matrix factorization (NMF) to provide a low-rank
approximation to the similarity matrix. The factorization permits hard or soft
partitions and is shown to perform better than several popular alternatives
under a variety of penalty functions.
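A hedged sketch of the pipeline described (the MCMC draws below are simulated placeholders, not output of a real sampler): form the pairwise posterior similarity matrix from partition samples, factor it with NMF, and take each observation's largest factor loading as its hard cluster label.

```python
# Posterior similarity matrix S[i, j] = fraction of MCMC draws placing
# i and j in the same cluster; NMF gives a low-rank approximation whose
# factors induce a partition.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(6)
true = np.repeat([0, 1, 2], 30)
# placeholder "MCMC" partition draws: true labels with 10% random relabeling
draws = np.array([np.where(rng.random(true.size) < 0.1,
                           rng.integers(0, 3, true.size), true)
                  for _ in range(200)])

S = np.mean(draws[:, :, None] == draws[:, None, :], axis=0)
W = NMF(n_components=3, init="nndsvda", max_iter=500,
        random_state=0).fit_transform(S)
labels = W.argmax(axis=1)  # hard partition; soft version: normalize rows of W
print("recovered cluster sizes:", np.bincount(labels))
```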
Autodetection and Classification of Hidden Cultural City Districts from Yelp Reviews
Topic models are a way to discover underlying themes in an otherwise
unstructured collection of documents. In this study, we specifically used the
Latent Dirichlet Allocation (LDA) topic model on a dataset of Yelp reviews to
classify restaurants based on their reviews. Furthermore, we hypothesize
that within a city, restaurants can be grouped into "clusters" based on both
location and review similarity. We used several different clustering methods,
including K-means clustering and a probabilistic mixture model, in order to
uncover and classify districts, both well-known and hidden (e.g., cultural areas
like Chinatown or hearsay like "the best street for Italian restaurants")
within a city. We use these models to display and label different clusters on a
map. We also introduce a topic-similarity heatmap that displays how similar a
new restaurant is to different areas of a city.
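A hedged sketch of the described pipeline on toy data (the reviews, topic count, and cluster count here are placeholders; the study also uses restaurant location, which could be concatenated to the topic features):

```python
# Fit LDA on review text, then cluster restaurants by their topic mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

reviews = [
    "great dumplings and noodles, best dim sum in town",
    "handmade pasta and a lovely tiramisu",
    "amazing soup dumplings, authentic szechuan flavors",
    "wood-fired pizza with fresh basil and mozzarella",
]
counts = CountVectorizer(stop_words="english").fit_transform(reviews)
theta = LatentDirichletAllocation(n_components=2,
                                  random_state=0).fit_transform(counts)
districts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(districts)  # restaurants grouped by topic profile
```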
A Note on Bayesian Nonparametric Inference for Spherically Symmetric Distribution
In this paper, we describe a Bayesian nonparametric approach to inference for
a bivariate spherically symmetric distribution. We consider a
Dirichlet invariant process prior on the set of all bivariate spherically
symmetric distributions and we derive the Dirichlet invariant process
posterior. Indeed, our approach extends the Dirichlet invariant process for
symmetric distributions on the real line to a bivariate spherically symmetric
distribution whose underlying distribution is invariant under a finite group of
rotations. Moreover, we obtain the Dirichlet invariant process posterior for
the infinite transformation group and prove that it approaches the Dirichlet
process.
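A small numerical illustration of the invariance at the heart of the construction (a sketch only, not the paper's posterior): symmetrizing a bivariate sample over a finite rotation group makes the empirical measure invariant under that group, the property the Dirichlet invariant process builds into the prior.

```python
# Replace each point by its orbit under the cyclic rotation group of
# order m; the resulting empirical measure is invariant under the group.
import numpy as np

def symmetrize(X, m):
    angles = 2 * np.pi * np.arange(m) / m
    out = []
    for a in angles:
        R = np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
        out.append(X @ R.T)
    return np.concatenate(out)

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 2))
Xs = symmetrize(X, m=8)
print(Xs.shape)  # (400, 2): 8-fold rotationally invariant sample
```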
Detailed Derivations of Small-Variance Asymptotics for some Hierarchical Bayesian Nonparametric Models
In this note we provide detailed derivations of two versions of
small-variance asymptotics for hierarchical Dirichlet process (HDP) mixture
models and the HDP hidden Markov model (HDP-HMM, a.k.a. the infinite HMM). We
include derivations for the probabilities of certain CRP and CRF partitions,
which are of more general interest. Comment: 7 pages.
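For orientation, the flavor of limit such derivations produce in the simplest (non-hierarchical DP mixture) case, following Kulis and Jordan (2012): as the Gaussian likelihood variance σ² → 0 with a suitably scaled concentration parameter, MAP inference collapses to a k-means-style objective with a per-cluster penalty; the hierarchical derivations in this note yield analogous objectives with additional penalties for document-level structure.

```latex
\min_{K,\ \{\mu_k\},\ \{z_i\}} \ \sum_{i=1}^{n} \lVert x_i - \mu_{z_i} \rVert^2 \ +\ \lambda K
```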