39,512 research outputs found

Quantile-based clustering

Authors: Anderlucci Laura; Hennig Christian; Viroli Cinzia
Publication date: 01/01/2019

A new cluster analysis method, K-quantiles clustering, is introduced. K-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for K-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although K-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace distributions. The consistency of K-quantiles clustering is proved, and it is shown that K-quantiles clusters correspond to well-separated mixture components in a nonparametric mixture. In a simulation, K-quantiles clustering is compared with a number of popular clustering methods with good results. A high-dimensional microarray dataset is clustered by K-quantiles.
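The Lloyd-style iteration mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single quantile level `theta` that is fixed in advance and shared by all variables, no variable scaling, and caller-supplied starting centers. The names `check_loss`, `empirical_quantile`, `k_quantiles`, and `init_centers` are hypothetical and chosen only for this example.

```python
def check_loss(u, theta):
    """Quantile (check) loss: theta*u for u >= 0, (theta-1)*u otherwise."""
    return theta * u if u >= 0 else (theta - 1.0) * u


def empirical_quantile(values, theta):
    """An empirical theta-quantile, i.e. a minimiser of the summed check loss."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, int(theta * len(s))))
    return s[idx]


def k_quantiles(points, init_centers, theta=0.5, n_iter=100):
    """Lloyd-style alternation with quantiles in place of means.

    Assumption for this sketch: theta is fixed and common to all variables.
    """
    centers = [list(c) for c in init_centers]
    k, dim = len(centers), len(points[0])
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to the center with the smallest
        # check loss summed over variables.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum(
                    check_loss(p[j] - centers[c][j], theta) for j in range(dim)
                ),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # partition unchanged, so the iteration has converged
        labels = new_labels
        # Update step: each center becomes the per-variable theta-quantile
        # of its current members (empty clusters keep their old center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [
                    empirical_quantile([m[j] for m in members], theta)
                    for j in range(dim)
                ]
    return labels, centers
```

With `theta=0.5` the sketch reduces to a K-medians-type procedure; values of `theta` away from 0.5 weight positive and negative deviations differently, which is one way to accommodate within-cluster skewness.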

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

On clustering procedures and nonparametric mixture estimation

Authors: Auray Stéphane; Klutchnikoff Nicolas; Rouvière Laurent
Publication date: 01/01/2015

This paper deals with nonparametric estimation of conditional densities in mixture models in the case when additional covariates are available. The proposed approach consists of performing a preliminary clustering algorithm on the additional covariates to guess the mixture component of each observation. Conditional densities of the mixture model are then estimated using kernel density estimates applied separately to each cluster. We investigate the expected L1-error of the resulting estimates and derive optimal rates of convergence over classical nonparametric density classes provided the clustering method is accurate. Performances of clustering algorithms are measured by the maximal misclassification error. We obtain upper bounds of this quantity for a single linkage hierarchical clustering algorithm. Lastly, applications of the proposed method to mixture models involving electricity distribution data and simulated data are presented.
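The two-stage idea in the abstract above (cluster on the covariates, then estimate a density within each cluster) can be sketched as follows. This is an illustrative simplification, not the paper's procedure: it assumes a one-dimensional covariate, a single-linkage cut at a user-chosen distance `gap`, and a Gaussian kernel with a fixed `bandwidth`. All function and parameter names are hypothetical.

```python
import math


def single_linkage_1d(xs, gap):
    """Single-linkage clusters in 1-D: start a new cluster wherever two
    consecutive sorted values are more than `gap` apart."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    labels = [0] * len(xs)
    current = 0
    for prev, cur in zip(order, order[1:]):
        if xs[cur] - xs[prev] > gap:
            current += 1
        labels[cur] = current
    return labels


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(y):
        return sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample
        ) / norm

    return density


def conditional_densities(covariates, responses, gap, bandwidth):
    """Cluster on the covariates, then fit one KDE per cluster of responses."""
    labels = single_linkage_1d(covariates, gap)
    groups = {}
    for lab, y in zip(labels, responses):
        groups.setdefault(lab, []).append(y)
    densities = {lab: gaussian_kde(ys, bandwidth) for lab, ys in groups.items()}
    return labels, densities
```

The quality of the per-cluster estimates depends on the clustering step: a misassigned observation contaminates the density estimate of the wrong component, which is why the abstract measures clustering performance by the maximal misclassification error.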

arXiv.org e-Print Archive

Hal-Diderot

HAL-Rennes 1

Nonparametric Hierarchical Clustering of Functional Data

Authors: C. Abraham; D.M. Blei; F. Chamroukhi; G. Delaigle; G. Hébrail; J. Rissanen; M. Abramowitz; P. Hansen; R.M. Neal; T. Cover; T. Gasser; X. Nguyen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

In this paper, we deal with the problem of curve clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique is proposed that reduces the number of clusters in order to improve the interpretability of the clustering. It consists of optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly, this measure is none other than the sum of the Kullback-Leibler divergences between cluster distributions before and after the merges. The practical interest of the approach for exploratory analysis of functional data is presented and compared with an alternative approach on an artificial and a real-world data set.

arXiv.org e-Print Archive

Signal detection in high energy physics via a semisupervised nonparametric approach

Authors: Casa Alessandro; Menardi Giovanna
Publication date: 01/01/2017

Archivio istituzionale della ricerca - Università di Padova

Model-based approach for household clustering with mixed scale variables

Authors: Canale Antonio; Carmona Christian; Nieto-Barajas Luis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/11/2017

The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target the social programmes, the Ministry aims to find clusters of households with the same needs based on demographic characteristics as well as the poverty conditions of the household. Available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed scale data and accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.

arXiv.org e-Print Archive

Oxford University Research Archive

Archivio istituzionale della ricerca - Università di Padova