Dimensionality reduction by clustering of variables while setting aside atypical variables

Abstract

Clustering of variables is one approach to reducing the dimensionality of a dataset. However, every variable is usually assigned to one of the clusters, including scattered variables that carry atypical or noise information. Such variables can obscure the interpretation of the latent variables associated with the clusters, or even give rise to artificial clusters. We propose two strategies to address this problem. The first is a "K+1" strategy, which introduces an additional group of variables, called the "noise cluster" for simplicity. The second is based on the definition of sparse latent variables. Both strategies yield refined clusters from which more relevant latent variables can be identified.
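To make the "K+1" idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each cluster's latent variable is the first principal component of its standardized member variables, and that a variable is moved to the noise cluster (label -1) when its squared correlation with every latent variable falls below a threshold. The function name `cluster_variables_with_noise`, the threshold parameter `rho`, and the caller-supplied initial partition are all hypothetical choices made for illustration.

```python
import numpy as np

def cluster_variables_with_noise(X, init_labels, rho=0.5, n_iter=50):
    """Illustrative K+1 clustering of the columns (variables) of X.

    Assumptions (not taken from the abstract): latent variable = first
    principal component of a cluster; a variable goes to the noise
    cluster (-1) if its squared correlation with every latent variable
    is below rho**2.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Standardize columns: zero mean, unit variance (norm sqrt(n)).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    labels = np.asarray(init_labels).copy()
    k = int(labels.max()) + 1  # number of regular clusters
    for _ in range(n_iter):
        latents = np.zeros((n, k))
        for g in range(k):
            members = Z[:, labels == g]
            if members.shape[1] > 0:
                # First left singular vector = unit-norm first component.
                u, _, _ = np.linalg.svd(members, full_matrices=False)
                latents[:, g] = u[:, 0]
        # Squared correlation of every variable with every latent variable.
        r2 = ((Z.T @ latents) / np.sqrt(n)) ** 2
        new_labels = r2.argmax(axis=1)
        # Variables weakly related to all latent variables are set aside.
        new_labels[r2.max(axis=1) < rho ** 2] = -1
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With two groups of strongly correlated variables plus a few independent ones, this sketch would be expected to keep the two groups and send the independent variables to the noise cluster. The sparse-latent-variable strategy is not covered by this sketch.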
