Model-based approach for household clustering with mixed scale variables

Authors: Antonio Canale, Christian Carmona, Luis Nieto-Barajas
Publication venue: Springer Science and Business Media LLC
Publication date: 23/11/2017

The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target these programmes, the Ministry aims to find clusters of households with the same needs, based on demographic characteristics as well as the poverty conditions of the household. The available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed scale data and that accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.

arXiv.org e-Print Archive

Oxford University Research Archive

Archivio istituzionale della ricerca - Università di Padova

Estimating genetic covariance functions assuming a parametric correlation structure for environmental effects

Author: Karin Meyer
Publication venue: EDP Sciences
Publication date: 01/01/2003

Crossref

Masking Strategies for Image Manifolds

Authors: Hamid Dadkhahi, Marco F. Duarte
Publication date: 01/01/2016

We consider the problem of selecting an optimal mask for an image manifold, i.e., choosing a subset of the image pixels that preserves the geometric structure of the manifold present in the original data. Such masking implements a form of compressive sensing through emerging imaging sensor platforms for which the power expense grows with the number of pixels acquired. Our goal is for the manifold learned from masked images to resemble its full image counterpart as closely as possible. More precisely, we show that one can indeed accurately learn an image manifold without having to consider a large majority of the image pixels. In doing so, we consider two masking methods that preserve the local and global geometric structure of the manifold, respectively. In each case, the process of finding the optimal masking pattern can be cast as a binary integer program, which is computationally expensive but can be approximated by a fast greedy algorithm. Numerical experiments show that the relevant manifold structure is preserved through the data-dependent masking process, even for modest mask sizes.

arXiv.org e-Print Archive

Crossref

Bayesian Cluster Enumeration Criterion for Unsupervised Learning

Authors: Michael Muma, Freweyni K. Teklehaymanot, Abdelhak M. Zoubir
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/08/2018

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets. Comment: 14 pages, 7 figures.

arXiv.org e-Print Archive

TUbiblio

Cluster validity in clustering methods

Author: Qinpei Zhao
Publication venue: University of Eastern Finland

UEF Electronic Publications