Latent Gaussian processes for distribution estimation of multivariate categorical data
Multivariate categorical data occur in many applications of machine learning.
One of the main difficulties with these vectors of categorical variables is
sparsity. The number of possible observations grows exponentially with vector
length, but dataset diversity might be poor in comparison. Recent models have
achieved significant improvements in supervised tasks with this data. These models
embed observations in a continuous space to capture similarities between them.
Building on these ideas we propose a Bayesian model for the unsupervised task
of distribution estimation of multivariate categorical data. We model vectors
of categorical variables as generated from a non-linear transformation of a
continuous latent space. Non-linearity captures multi-modality in the
distribution. The continuous representation addresses sparsity. Our model ties
together many existing models, linking the linear categorical latent Gaussian
model, the Gaussian process latent variable model, and Gaussian process
classification. We derive inference for our model based on recent developments
in sampling based variational inference. We show empirically that the model
outperforms its linear and discrete counterparts in imputation tasks of sparse
data. YG is supported by the Google European fellowship in Machine Learning. This is the final version of the article. It first appeared from Microtome Publishing via http://jmlr.org/proceedings/papers/v37/gala15.htm
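As a rough illustration of the generative process this abstract describes, the Python sketch below draws a continuous latent point, passes it through a non-linear map to obtain per-variable logits, and samples each categorical variable from the resulting softmax probabilities. The random tanh network is only a stand-in for a Gaussian process draw. It and every name and size used here (`make_nonlinear_map`, `n_hidden`, and so on) are assumptions made for illustration, not the paper's implementation, and no inference is performed.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_nonlinear_map(rng, latent_dim, n_vars, n_cats, n_hidden=8):
    """Assumed stand-in for a GP draw: a random one-hidden-layer tanh network."""
    w1 = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_hidden)]
    w2 = [[[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_cats)]
          for _ in range(n_vars)]

    def f(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        # One list of n_cats logits per categorical variable.
        return [[sum(w * h for w, h in zip(row, hidden)) for row in var]
                for var in w2]

    return f


def sample_observation(rng, f, latent_dim):
    """Draw a latent point, then one category per variable."""
    x = [rng.gauss(0, 1) for _ in range(latent_dim)]  # continuous latent point
    probs = [softmax(logits) for logits in f(x)]      # non-linear map + softmax
    y = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
    return x, probs, y
```

Calling `sample_observation(random.Random(0), f, latent_dim)` repeatedly with a fixed `f` yields a synthetic dataset in which nearby latent points give similar category probabilities.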
Identifying Mixtures of Mixtures Using Bayesian Estimation
The use of a finite mixture of normal distributions in model-based clustering
makes it possible to capture non-Gaussian data clusters. However, identifying the clusters
from the normal components is challenging and is generally achieved either by
imposing constraints on the model or by using post-processing procedures.
Within the Bayesian framework we propose a different approach based on sparse
finite mixtures to achieve identifiability. We specify a hierarchical prior
where the hyperparameters are carefully selected so that they reflect
the intended cluster structure. In addition, this prior allows the model to be
estimated using standard MCMC sampling methods. In combination with a
post-processing approach which resolves the label switching issue and results
in an identified model, our approach makes it possible to simultaneously (1) determine the
number of clusters, (2) flexibly approximate the cluster distributions in a
semi-parametric way using finite mixtures of normals and (3) identify
cluster-specific parameters and classify observations. The proposed approach is
illustrated in two simulation studies and on benchmark data sets. Comment: 49 page
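The two-level structure described in this abstract, where each cluster distribution is itself a finite mixture of normals, can be sketched as follows. This is a minimal one-dimensional illustration with hand-picked, hypothetical parameters (`CLUSTERS` and all numbers in it are assumptions). It does not implement the hierarchical prior, the MCMC sampler, or the post-processing step.

```python
import math
import random


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical two-cluster model; each cluster is a mixture of normals.
# Sub-components are (sub-weight, mean, sd) triples.
CLUSTERS = [
    {"weight": 0.6, "sub": [(0.5, -3.0, 0.5), (0.5, -1.5, 1.0)]},
    {"weight": 0.4, "sub": [(0.7, 2.0, 0.6), (0.3, 4.0, 1.2)]},
]


def cluster_density(y, cluster):
    """Lower level: semi-parametric cluster density as a mixture of normals."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in cluster["sub"])


def density(y, clusters=CLUSTERS):
    """Upper level: mixture over clusters."""
    return sum(c["weight"] * cluster_density(y, c) for c in clusters)


def classify(y, clusters=CLUSTERS):
    """Assign y to the cluster with the largest weighted density."""
    scores = [c["weight"] * cluster_density(y, c) for c in clusters]
    return max(range(len(scores)), key=scores.__getitem__)


def sample(rng, clusters=CLUSTERS):
    """Draw a cluster, then a sub-component within it, then an observation."""
    k = rng.choices(range(len(clusters)),
                    weights=[c["weight"] for c in clusters])[0]
    subs = clusters[k]["sub"]
    _, mean, sd = rng.choices(subs, weights=[w for w, _, _ in subs])[0]
    return k, rng.gauss(mean, sd)
```

Classification operates on the cluster level only: the sub-components inside a cluster are summed out, which is why the normal components do not each count as a separate cluster.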
From here to infinity - sparse finite versus Dirichlet process mixtures in model-based clustering
In model-based clustering, mixture models are used to group data points into
clusters. A useful concept introduced for Gaussian mixtures by Malsiner Walli
et al. (2016) is that of sparse finite mixtures, where the prior distribution on the
weight distribution of a mixture with K components is chosen in such a way
that a priori the number of clusters in the data is random and is allowed to be
smaller than K with high probability. The number of clusters is then inferred
a posteriori from the data.
The present paper makes the following contributions in the context of sparse
finite mixture modelling. First, it is illustrated that the concept of sparse
finite mixtures is very generic and easily extended to cluster various types of
non-Gaussian data, in particular discrete data and continuous multivariate data
arising from non-Gaussian clusters. Second, sparse finite mixtures are compared
to Dirichlet process mixtures with respect to their ability to identify the
number of clusters. For both model classes, a random hyper prior is considered
for the parameters determining the weight distribution. By suitable matching of
these priors, it is shown that the choice of this hyper prior is far more
influential on the cluster solution than whether a sparse finite mixture or a
Dirichlet process mixture is taken into consideration. Comment: Accepted versio
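The effect of a sparsity-inducing prior on the mixture weights can be checked with a small prior simulation. The sketch below assumes a symmetric Dirichlet prior with concentration `e0` on the weights. The abstract does not state the prior's form, so this choice, the symbol `e0`, and all function names are assumptions. It counts how many of the `K` components receive at least one observation a priori.

```python
import random


def dirichlet(rng, K, e0):
    """Draw weights from a symmetric Dirichlet(e0, ..., e0) via gamma variates."""
    while True:
        g = [rng.gammavariate(e0, 1.0) for _ in range(K)]
        total = sum(g)
        if total > 0.0:  # guard against underflow when e0 is very small
            return [v / total for v in g]


def filled_components(rng, K, e0, n_obs):
    """Number of components with at least one of n_obs prior-sampled labels."""
    weights = dirichlet(rng, K, e0)
    labels = rng.choices(range(K), weights=weights, k=n_obs)
    return len(set(labels))


def mean_filled(rng, K, e0, n_obs, n_rep=200):
    """Monte Carlo average of the number of filled components under the prior."""
    draws = (filled_components(rng, K, e0, n_obs) for _ in range(n_rep))
    return sum(draws) / n_rep
```

Comparing `mean_filled(rng, 10, 0.05, 100)` with `mean_filled(rng, 10, 4.0, 100)` shows the intended behaviour: a small concentration leaves most components empty, so the number of clusters is random and typically well below `K`, while a large concentration fills nearly all of them.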
Compositional Model based Fisher Vector Coding for Image Classification
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break the convention which
assumes that a local feature is drawn from one of a few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as a
linear combination of multiple key components, and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism. Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI
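The compositional mechanism described in this abstract can be sketched as a sampling procedure: latent combination weights are drawn, the Gaussian mean is formed as a linear combination of key components, and the local feature is sampled around that mean. The standard normal prior on the weights, the isotropic noise, and all names (`bases`, `noise_sd`) are assumptions. The abstract does not specify the two generative models the authors designed, and no Fisher vector is computed here.

```python
import random


def compose_mean(bases, weights):
    """Mean vector as a linear combination of key components (rows of bases)."""
    dim = len(bases[0])
    return [sum(u * b[d] for u, b in zip(weights, bases)) for d in range(dim)]


def sample_feature(rng, bases, noise_sd=0.1):
    """Draw latent weights, compose the mean, then sample the local feature."""
    u = [rng.gauss(0.0, 1.0) for _ in bases]    # assumed prior on latent weights
    mean = compose_mean(bases, u)
    x = [rng.gauss(m, noise_sd) for m in mean]  # assumed isotropic Gaussian noise
    return u, x
```

Because the weights are continuous, the set of reachable means is not limited to a small fixed number of prototypes, which is the contrast with a standard GMM that the abstract draws.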