74 research outputs found

ClustGeo: an R package for hierarchical clustering with spatial constraints

Author: Chavent Marie
Kuentz-Simonet Vanessa
Labenne Amaury
Saracco Jérôme
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/12/2017

In this paper, we propose a Ward-like hierarchical clustering algorithm including spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are inputted, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space" and the second matrix gives the dissimilarities in the "constraint space". The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ which increases the spatial contiguity without deteriorating too much the quality of the solution based on the variables of interest, i.e. those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
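
The mixed criterion in this abstract can be illustrated numerically. The Python sketch below is not the ClustGeo code: it assumes the homogeneity criterion is the Ward pseudo-inertia computed from pairwise dissimilarities, and that the mixing parameter weights the constraint-space matrix. The function names `pseudo_inertia`, `mixed_inertia` and `merge_cost` are hypothetical.

```python
import numpy as np


def pseudo_inertia(D, members, weights):
    """Ward-type pseudo-inertia of one cluster, from dissimilarities only.

    Assumed form: sum_{i,j in C} w_i * w_j * d_ij^2 / (2 * mu_C),
    where mu_C is the total weight of the cluster C.
    """
    idx = np.asarray(members)
    w = np.asarray(weights, dtype=float)[idx]
    squared = np.asarray(D, dtype=float)[np.ix_(idx, idx)] ** 2
    return float(w @ squared @ w) / (2.0 * w.sum())


def mixed_inertia(D0, D1, members, weights, alpha):
    """Convex combination of the feature-space and constraint-space criteria.

    Assumption: alpha = 0 uses only D0 (features), alpha = 1 only D1 (constraint).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return ((1.0 - alpha) * pseudo_inertia(D0, members, weights)
            + alpha * pseudo_inertia(D1, members, weights))


def merge_cost(D0, D1, cluster_a, cluster_b, weights, alpha):
    """Increase in mixed inertia caused by merging two clusters (Ward-like step)."""
    merged = list(cluster_a) + list(cluster_b)
    return (mixed_inertia(D0, D1, merged, weights, alpha)
            - mixed_inertia(D0, D1, cluster_a, weights, alpha)
            - mixed_inertia(D0, D1, cluster_b, weights, alpha))
```

Under these assumptions, a hierarchical procedure would merge at each stage the pair of clusters with the smallest `merge_cost`, and the user would compare partitions obtained for several values of `alpha`.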

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Oskar Bordeaux

Multivariate Analysis of Mixed Data: The R Package PCAmixdata

Author: Chavent Marie
Kuentz-Simonet Vanessa
Labenne Amaury
Saracco Jérôme
Publication date: 08/12/2017

Mixed data arise when observations are described by a mixture of numerical and categorical variables. The R package PCAmixdata extends standard multivariate analysis methods to incorporate this type of data. The key techniques/methods included in the package are principal component analysis for mixed data (PCAmix), varimax-like orthogonal rotation for PCAmix, and multiple factor analysis for mixed multi-table data. This paper gives a synthetic presentation of the three algorithms with details to help the user understand graphical and numerical outputs of the corresponding R functions. The three main methods are illustrated on a real dataset composed of four data tables characterizing living conditions in different municipalities in the Gironde region of southwest France
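
As a rough illustration of principal component analysis on mixed data, the following Python sketch standardizes the numerical columns, rescales centered indicator columns by the square root of each category's proportion, and takes a singular value decomposition. This is an assumed, simplified construction written for this listing, not the PCAmixdata implementation; the function name `mixed_pca` is hypothetical.

```python
import numpy as np


def mixed_pca(numeric, categorical):
    """Sketch of PCA on mixed data (assumed weighting, uniform row weights).

    numeric: array of shape (n, p1) with non-constant columns.
    categorical: list of p2 label sequences, each of length n.
    Returns the eigenvalues and the matrix of component scores.
    """
    numeric = np.asarray(numeric, dtype=float)
    n = numeric.shape[0]
    # Numerical block: center and scale to unit (population) variance.
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for column in categorical:
        column = np.asarray(column)
        levels = np.unique(column)
        indicator = (column[:, None] == levels[None, :]).astype(float)
        proportions = indicator.mean(axis=0)
        # Categorical block: centered indicators, rare levels weighted up.
        blocks.append((indicator - proportions) / np.sqrt(proportions))
    Z = np.hstack(blocks) / np.sqrt(n)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2
    scores = np.sqrt(n) * U * s  # each score column has variance = eigenvalue
    return eigenvalues, scores
```

With this weighting, the total inertia equals the number of numerical variables plus the number of categories minus the number of categorical variables, which gives a quick sanity check on the output.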

arXiv.org e-Print Archive

ESE - Salento University Publishing

INRIA a CCSD electronic archive server

HAL Descartes

Università del Salento: ESE - Salento University Publishing

ClustOfVar-based approach for unsupervised learning: Reading of synthetic variables with sociological data

Author: Candau Jacqueline
Deuffic Philippe
Kuentz Vanessa
Lyser Sandrine
Publication venue: Coordinamento SIBA - Università del Salento
Publication date: 21/10/2015

This paper proposes an original data mining method for unsupervised learning, replacing traditional factor analysis with a system of variable clustering. Clustering of variables aims to group together variables that are strongly related to each other, i.e. that contain the same information. We recently proposed the ClustOfVar method, specifically devoted to variable clustering, regardless of whether the variables are numeric or categorical in nature. It simultaneously provides homogeneous clusters of variables and their corresponding synthetic variables, which can be read as a kind of gradient. In this algorithm, the homogeneity criterion of a cluster is defined by the squared Pearson correlation for the numeric variables and by the correlation ratio for the categorical variables. This method was tested on categorical data relating to French farmers and their perception of the environment. The use of synthetic variables provided us with an original approach to identifying the way farmers reconfigured the questions put to them.
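
The homogeneity criterion mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes the synthetic variable of the cluster is already available as a numeric vector (how ClustOfVar derives it is not covered here), and the function names are hypothetical.

```python
import numpy as np


def squared_correlation(x, synthetic):
    """Squared Pearson correlation between a numeric variable and the synthetic variable."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(synthetic, dtype=float))[0, 1]
    return float(r * r)


def correlation_ratio(categories, synthetic):
    """Correlation ratio: between-group share of the variance of the synthetic variable."""
    y = np.asarray(synthetic, dtype=float)
    categories = np.asarray(categories)
    grand_mean = y.mean()
    total = ((y - grand_mean) ** 2).sum()
    between = sum(
        (categories == level).sum() * (y[categories == level].mean() - grand_mean) ** 2
        for level in np.unique(categories)
    )
    return float(between / total)


def cluster_homogeneity(numeric_vars, categorical_vars, synthetic):
    """Sum of squared correlations (numeric) and correlation ratios (categorical)."""
    return (sum(squared_correlation(x, synthetic) for x in numeric_vars)
            + sum(correlation_ratio(z, synthetic) for z in categorical_vars))
```

Both terms lie between 0 and 1, so under this reading a cluster of p variables has homogeneity at most p, reached when every variable is perfectly linked to the synthetic variable.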

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing

L'adoption en France des normes IFRS relatives aux incorporels : bouleversement des pratiques ou inertie ? (The adoption in France of the IFRS standards on intangibles: upheaval in practices or inertia?)

Author: Corinne Bessieux-Ollier
Elisabeth Walliser
Marie Chavent
Vanessa Kuentz

This article examines the mandatory adoption in France of the IFRS standards on intangibles. It seeks a typology of accounting practices for intangibles during the period of transition to IFRS. The results reveal three classes of companies affected differently by the move to international standards. The first class is characterized by a substantial change, with a sharp increase in goodwill linked to the restatement of intangible assets such as market shares. The second and third classes are characterized by stability. The inertia phenomenon (Nobes, 2006), whereby pre-IFRS accounting treatments may persist under IFRS, is confirmed.

Keywords: Intangible capital; Cluster analysis; DIV; Goodwill; Intangibles; IFRS; Transition

Research Papers in Economics

Clustering of categorical variables around latent variables

Author: Jérome SARACCO (GREThA UMR CNRS 5113)
Marie CHAVENT (IMB UMR CNRS 5251)
Vanessa KUENTZ (IMB UMR CNRS 5251)

In the framework of clustering, the usual aim is to cluster observations and not variables. However, the issue of variable clustering clearly arises for dimension reduction, selection of variables or in some case studies (sensory analysis, biochemistry, marketing, etc.). Clustering of variables is then studied as a way to arrange variables into homogeneous clusters, thereby organizing data into meaningful structures. Once the variables are clustered into groups such that each variable is similar to the other variables in its cluster, the selection of a subset of variables is possible. Several specific methods have been developed for the clustering of numerical variables. However, far fewer methods have been proposed for categorical variables. In this paper we extend the criterion used by Vigneau and Qannari (2003) in their Clustering around Latent Variables approach for numerical variables to the case of categorical data. The homogeneity criterion of a cluster of categorical variables is defined as the sum of the correlation ratios between the categorical variables and a latent variable, which is in this case a numerical variable. We show that the latent variable maximizing the homogeneity of a cluster can be obtained with Multiple Correspondence Analysis. Different algorithms for the clustering of categorical variables are proposed: an iterative relocation algorithm, and ascending and divisive hierarchical clustering. The proposed methodology is illustrated by a real data application to the satisfaction of pleasure craft operators.

Keywords: clustering of categorical variables, correlation ratio, iterative relocation algorithm, hierarchical clustering

Research Papers in Economics

Une solution analytique pour la rotation planaire en Analyse Factorielle des Correspondances Multiples (An analytic solution for planar rotation in Multiple Correspondence Analysis)

Author: Chavent Marie
Kuentz Vanessa
Saracco Jérôme
Publication venue: HAL CCSD
Publication date: 01/01/2009

Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA) are two multidimensional descriptive statistical methods, for quantitative and qualitative data respectively. A rotation can then be applied to the matrix of principal component scores. Defining a rotation criterion then yields a simple structure, which makes the results easier to interpret. An analytic solution in two dimensions has been proposed for the varimax criterion in PCA. Here we propose an analytic solution in two dimensions for rotation in MCA, using a criterion inspired by varimax and based on the notion of correlation ratio.

INRIA a CCSD electronic archive server

Oskar Bordeaux

Rotation in Multiple Correspondence Analysis: a planar rotation iterative procedure

Author: Jérome SARACCO (GREThA UMR CNRS 5113)
Marie CHAVENT (IMB UMR CNRS 5251)
Vanessa KUENTZ (IMB UMR CNRS 5251)

Multiple Correspondence Analysis (MCA) is a well-known multivariate method for the statistical description of categorical data (see for instance Greenacre and Blasius, 2006). Similarly to what is done in Principal Component Analysis (PCA) and Factor Analysis, the MCA solution can be rotated to increase the simplicity of the components. The idea behind a rotation is to find subsets of variables which coincide more clearly with the rotated components. This implies that maximizing component simplicity can help in factor interpretation and in variable clustering. In PCA, probably the most famous rotation criterion is the varimax criterion introduced by Kaiser (1958). In addition, Kiers (1991) proposed a rotation criterion in his method named PCAMIX, developed for the analysis of both numerical and categorical data and including PCA and MCA as special cases. When the data are purely categorical, this criterion is a varimax-based one relying on the correlation ratio between the categorical variables and the MCA numerical components. The optimization of this criterion is then reached by the algorithm of De Leeuw and Pruzansky (1978). In this paper, we give the analytic expression of the optimal angle of planar rotation for this criterion. If more than two principal components are to be retained, similarly to what is done by Kaiser (1958) for PCA, this planar solution is used in a practical algorithm that applies successive pairwise planar rotations to optimize the rotation criterion. A simulation study is used to illustrate the analytic expression of the angle for planar rotation. The proposed procedure is also applied to a real data set to show the possible benefits of using rotation in MCA.

Keywords: categorical data, multiple correspondence analysis, correlation ratio, rotation, varimax criterion
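
The idea of a planar rotation that maximizes a simplicity criterion can be illustrated with a small Python sketch. Note what is swapped in: the abstract does not state the analytic angle or the exact correlation-ratio criterion, so this sketch uses the classical raw varimax criterion on a two-column matrix of numeric loadings and a plain grid search over the angle. All function names are hypothetical.

```python
import numpy as np


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over components of the variance of squared loadings."""
    squared = np.asarray(loadings, dtype=float) ** 2
    return float(squared.var(axis=0).sum())


def rotate_plane(loadings, theta):
    """Rotate a two-column loading matrix by the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return np.asarray(loadings, dtype=float) @ rotation


def best_planar_angle(loadings, n_grid=901):
    """Grid search for the angle maximizing the criterion (not an analytic solution).

    A quarter turn only swaps the columns and flips a sign, which leaves the
    criterion unchanged, so searching [0, pi/2] is sufficient.
    """
    angles = np.linspace(0.0, np.pi / 2, n_grid)
    values = [varimax_criterion(rotate_plane(loadings, a)) for a in angles]
    best = int(np.argmax(values))
    return float(angles[best]), float(values[best])
```

For more than two components, the same planar step could be applied to successive pairs of columns until the criterion stops improving, in the spirit of the pairwise procedure the abstract describes.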

Research Papers in Economics

Multivariate Analysis of Mixed Data. The R Package PCAmixdata

Author: Chavent Marie
Kuentz Vanessa
Labenne Amaury
Saracco Jérôme
Publication venue: Coordinamento SIBA - Università del Salento
Publication date: 29/12/2022

Mixed data arise when observations are described by a mixture of numerical and categorical variables. The R package PCAmixdata extends to this type of data the standard multivariate analysis methods used for description, exploration and visualization. The key techniques/methods included in the package are principal component analysis for mixed data (PCAmix), varimax-like orthogonal rotation for PCAmix, and multiple factor analysis for mixed multi-table data. This paper proposes a unified mathematical presentation of the different methods with common notations, together with a summarised presentation of the three algorithms, with details to help the user understand the graphical and numerical outputs of the corresponding R functions. This in turn allows the user to interpret the results more easily. The three main methods are illustrated on a real dataset composed of four data tables characterizing living conditions in different municipalities in the Gironde region of southwest France.

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing