Model-based approach for household clustering with mixed scale variables
The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target these programmes, the Ministry aims to find clusters of households with the same needs, based on demographic characteristics as well as the poverty conditions of each household. The available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed scale data, and that accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.
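As a rough illustration of the underlying variable response idea mentioned in the abstract, the sketch below maps continuous latent values to observed continuous, ordinal, and nominal responses. The abstract does not give the model's actual link functions, so everything here is an assumption: the function names are hypothetical, the ordinal rule uses fixed cutpoints, and the nominal rule (largest latent value wins) is one common convention that the paper may or may not use.

```python
from bisect import bisect_right


def observe_continuous(z):
    """A continuous variable is taken to be observed directly (assumption)."""
    return z


def observe_ordinal(z, cutpoints):
    """Return the index of the interval of `cutpoints` that contains z.

    `cutpoints` must be sorted in increasing order; with c cutpoints the
    result is one of the ordered categories 0, 1, ..., c.
    """
    return bisect_right(cutpoints, z)


def observe_nominal(latent_values):
    """Return the category whose latent value is largest (assumed rule)."""
    return max(range(len(latent_values)), key=latent_values.__getitem__)


if __name__ == "__main__":
    # Hypothetical latent draws for one household.
    print(observe_continuous(1.7))
    print(observe_ordinal(0.5, cutpoints=[0.0, 1.0]))
    print(observe_nominal([0.1, 2.0, -1.0]))
```

Under this kind of construction, a mixture model only has to describe the joint distribution of the continuous latent values, which is what lets mixed scale data be clustered together.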
Masking Strategies for Image Manifolds
We consider the problem of selecting an optimal mask for an image manifold,
i.e., choosing a subset of the pixels of the image that preserves the
manifold's geometric structure present in the original data. Such masking
implements a form of compressive sensing through emerging imaging sensor
platforms for which the power expense grows with the number of pixels acquired.
Our goal is for the manifold learned from masked images to resemble its full
image counterpart as closely as possible. More precisely, we show that one can
indeed accurately learn an image manifold without having to consider a large
majority of the image pixels. In doing so, we consider two masking methods that
preserve the local and global geometric structure of the manifold,
respectively. In each case, the process of finding the optimal masking pattern
can be cast as a binary integer program, which is computationally expensive but
can be approximated by a fast greedy algorithm. Numerical experiments show that
the relevant manifold structure is preserved through the data-dependent masking
process, even for modest mask sizes.
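The greedy approximation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the objective used here (squared error between rescaled masked pairwise distances and full pairwise distances) is a simplified stand-in for the paper's local and global criteria, and the names `pixel_sq_diffs` and `greedy_mask` are hypothetical. Images are assumed to be flattened into equal-length lists of pixel values.

```python
from itertools import combinations


def pixel_sq_diffs(images):
    """For each pixel, the squared difference over every pair of images."""
    pairs = list(combinations(range(len(images)), 2))
    n_pixels = len(images[0])
    return [[(images[i][k] - images[j][k]) ** 2 for i, j in pairs]
            for k in range(n_pixels)]


def greedy_mask(images, mask_size):
    """Greedily pick `mask_size` pixel indices.

    Each step adds the pixel that makes the rescaled masked pairwise
    squared distances closest (in squared error) to the full ones.
    This objective is an assumption made for illustration.
    """
    diffs = pixel_sq_diffs(images)
    n_pixels, n_pairs = len(diffs), len(diffs[0])
    # Full-image squared distance for every image pair.
    full = [sum(diffs[k][q] for k in range(n_pixels)) for q in range(n_pairs)]

    mask = []
    partial = [0.0] * n_pairs  # masked squared distances accumulated so far
    for step in range(1, mask_size + 1):
        scale = n_pixels / step  # compensates for using only `step` pixels
        best_pixel, best_err = None, float("inf")
        for k in range(n_pixels):
            if k in mask:
                continue
            err = sum((scale * (partial[q] + diffs[k][q]) - full[q]) ** 2
                      for q in range(n_pairs))
            if err < best_err:
                best_pixel, best_err = k, err
        mask.append(best_pixel)
        partial = [partial[q] + diffs[best_pixel][q] for q in range(n_pairs)]
    return sorted(mask)


if __name__ == "__main__":
    # Toy "images": 6 flattened images with 8 pixels each.
    toy = [[float((i * (k + 1)) % 5) for k in range(8)] for i in range(6)]
    print(greedy_mask(toy, mask_size=3))
```

Each greedy step costs one pass over the remaining pixels and all image pairs, which is far cheaper than searching over all binary masks but carries no optimality guarantee in this simplified form.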
Bayesian Cluster Enumeration Criterion for Unsupervised Learning
We derive a new Bayesian Information Criterion (BIC) by formulating the
problem of estimating the number of clusters in an observed data set as
maximization of the posterior probability of the candidate models. Given that
some mild assumptions are satisfied, we provide a general BIC expression for a
broad class of data distributions. This serves as a starting point when
deriving the BIC for specific distributions. Along this line, we provide a
closed-form BIC expression for multivariate Gaussian distributed variables. We
show that incorporating the data structure of the clustering problem into the
derivation of the BIC results in an expression whose penalty term is different
from that of the original BIC. We propose a two-step cluster enumeration
algorithm. First, a model-based unsupervised learning algorithm partitions the
data according to a given set of candidate models. Subsequently, the number of
clusters is determined as the one associated with the model for which the
proposed BIC is maximal. The performance of the proposed two-step algorithm is
tested using synthetic and real data sets.
Comment: 14 pages, 7 figures
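The two-step procedure in the abstract (partition the data for each candidate model, then pick the candidate with the largest criterion value) can be sketched as below. Important assumptions: the score used here is the original Schwarz BIC written in "larger is better" form, not the new criterion derived in the paper, whose penalty term the abstract says is different; the partitioning step is a plain one-dimensional k-means rather than whatever model-based algorithm the authors use; and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def kmeans_1d(xs, k, iters=100):
    """Step 1: partition 1-D data into k clusters with Lloyd's algorithm."""
    s = sorted(xs)
    n = len(s)
    # Deterministic initialisation at evenly spaced sample quantiles.
    centers = [s[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2)
                      for x in xs]
        for c in range(k):
            members = [x for x, l in zip(xs, new_labels) if l == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def schwarz_bic(xs, labels, k, var_floor=1e-6):
    """Gaussian-mixture log-likelihood minus the original BIC penalty."""
    n = len(xs)
    comps = []
    for c in range(k):
        members = [x for x, l in zip(xs, labels) if l == c]
        if not members:
            continue
        mu = sum(members) / len(members)
        var = max(sum((x - mu) ** 2 for x in members) / len(members), var_floor)
        comps.append((len(members) / n, mu, var))
    loglik = 0.0
    for x in xs:
        dens = sum(w * math.exp(-(x - mu) ** 2 / (2 * var))
                   / math.sqrt(2 * math.pi * var) for w, mu, var in comps)
        loglik += math.log(dens)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return loglik - 0.5 * n_params * math.log(n)


def enumerate_clusters(xs, candidates):
    """Step 2: return the candidate k with the largest score, plus all scores."""
    scores = {k: schwarz_bic(xs, kmeans_1d(xs, k), k) for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: three well-separated groups built from normal quantiles.
    offsets = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
    data = [c + o for c in (0.0, 10.0, 20.0) for o in offsets]
    best_k, scores = enumerate_clusters(data, range(1, 7))
    print(best_k)
```

Swapping in a different criterion only requires replacing `schwarz_bic`; the surrounding two-step structure stays the same.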