ClustGeo: an R package for hierarchical clustering with spatial constraints
In this paper, we propose a Ward-like hierarchical clustering algorithm
including spatial/geographical constraints. Two dissimilarity matrices
$D_0$ and $D_1$ are inputted, along with a mixing parameter $\alpha \in [0,1]$. The
dissimilarities can be non-Euclidean and the weights of the observations can be
non-uniform. The first matrix gives the dissimilarities in the "feature space"
and the second matrix gives the dissimilarities in the "constraint space". The
criterion minimized at each stage is a convex combination of the homogeneity
criterion calculated with $D_0$ and the homogeneity criterion calculated with
$D_1$. The idea is then to determine a value of $\alpha$ which increases the
spatial contiguity without degrading too much the quality of the solution
based on the variables of interest, i.e. those of the feature space. This
procedure is illustrated on a real dataset using the R package ClustGeo.
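The mixed criterion described in this abstract can be illustrated with a minimal Python sketch. It is not the ClustGeo implementation: the function names (pseudo_inertia, ward_like, explained), the naive pairwise search, and the toy data are all assumptions made for illustration. The two dissimilarity matrices are called D0 (feature space) and D1 (constraint space), and alpha is the mixing parameter; the homogeneity of a cluster is measured by a weighted pseudo-inertia computed from dissimilarities only, so they need not be Euclidean.

```python
import numpy as np

def pseudo_inertia(D, cluster, w):
    """Weighted pseudo-inertia of one cluster, from dissimilarities only."""
    idx = np.asarray(cluster)
    wc = w[idx]
    sq = D[np.ix_(idx, idx)] ** 2
    return float(wc @ sq @ wc) / (2.0 * wc.sum())

def ward_like(D0, D1, alpha, k, w=None):
    """Naive agglomeration: at each stage, merge the pair of clusters whose
    merge gives the smallest increase of the convex combination
    (1 - alpha) * inertia(D0) + alpha * inertia(D1). Hypothetical helper."""
    n = D0.shape[0]
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)

    def mixed(cluster):
        return ((1.0 - alpha) * pseudo_inertia(D0, cluster, w)
                + alpha * pseudo_inertia(D1, cluster, w))

    partition = [[i] for i in range(n)]
    while len(partition) > k:
        best_cost, best_pair = np.inf, None
        for a in range(len(partition)):
            for b in range(a + 1, len(partition)):
                cost = (mixed(partition[a] + partition[b])
                        - mixed(partition[a]) - mixed(partition[b]))
                if cost < best_cost:
                    best_cost, best_pair = cost, (a, b)
        a, b = best_pair
        merged = partition[a] + partition[b]
        partition = [c for i, c in enumerate(partition) if i not in (a, b)]
        partition.append(merged)
    return sorted(sorted(c) for c in partition)

def explained(D, partition, w=None):
    """Share of total pseudo-inertia explained by the partition (1 - W/T)."""
    n = D.shape[0]
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    total = pseudo_inertia(D, list(range(n)), w)
    within = sum(pseudo_inertia(D, c, w) for c in partition)
    return 1.0 - within / total

# Toy data (assumed): a feature value and a 1-D location for four units.
x = np.array([0.0, 0.1, 5.0, 5.1])
g = np.array([0.0, 10.0, 0.2, 10.1])
D0 = np.abs(x[:, None] - x[None, :])
D1 = np.abs(g[:, None] - g[None, :])

for alpha in (0.0, 0.5, 1.0):
    part = ward_like(D0, D1, alpha, k=2)
    print(alpha, part, explained(D0, part), explained(D1, part))
```

Comparing the two explained shares over a grid of alpha values is one way to look for a value that gains spatial cohesion while losing little quality in the feature space. The sketch does not rescale D0 and D1 to a common range, which matters in practice when the two matrices are in different units.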
Multivariate Analysis of Mixed Data: The R Package PCAmixdata
Mixed data arise when observations are described by a mixture of numerical
and categorical variables. The R package PCAmixdata extends standard
multivariate analysis methods to incorporate this type of data. The key
techniques/methods included in the package are principal component analysis for
mixed data (PCAmix), varimax-like orthogonal rotation for PCAmix, and multiple
factor analysis for mixed multi-table data. This paper gives a synthetic
presentation of the three algorithms with details to help the user understand
graphical and numerical outputs of the corresponding R functions. The three
main methods are illustrated on a real dataset composed of four data tables
characterizing living conditions in different municipalities in the Gironde
region of southwest France.
Variable selection for building quality-of-life indicators from data structured in groups (original title: Sélection de variables pour la construction d'indicateurs de qualité de vie pour des données structurées en groupes)
The analysis and measurement of quality of life can follow two complementary approaches. The first, based on surveys of individuals, analyses levels of life satisfaction. We focus here on the second, based on national data, which analyses people's living conditions. The aim is to create composite indices of living conditions. Depending on the author, the components of quality of life relate to different themes (groups of variables): "Family conditions", "Employment", "Housing", and so on. Dimension reduction methods are particularly suitable for this purpose. Multiple Factor Analysis (MFA) is a method designed to handle data structured into groups of quantitative variables. In our study, each theme is a group of quantitative and/or categorical variables. Since our data are naturally structured in groups of variables, we develop an extension of MFA for mixed data, called MFAmix. The principal components from MFAmix are then our composite indices for measuring quality of life. However, creating these indices raises two questions. How many principal components should be kept to create the indices? How can a limited number of variables be selected to obtain similar indices that are easier to interpret? We propose answers to these questions in this communication.
Orthogonal rotation in PCA of mixed data. The PCAmixdata package and an application in cultural sociology (original title: Rotation orthogonale en ACP de données mixtes. Le package PCAmixdata et une application en sociologie culturelle)
Diversity As A Key To Analyze French Organic Farms: Methodological Elements
Many typologies of organic farms exist, but they fail to take into account diversity, i.e. the combination of productions, which is a core principle in agroecology. Our aims were multifold: i) increase the knowledge of organic farms (OF), ii) better characterize organic systems in terms of diversity, iii) analyze the territorial distribution of diversity types, and iv) compare diversity between conventional farms (CF) and OF. The French Observatory of Organic Agriculture (ONAB) database from Agence Bio was used. It collects data from all French organic farms and provides details on surfaces and livestock (about 200 species). We explored complementary methods to build a classification able to reflect the type and level of diversity within the farms' systems and to take into account their location. This proved challenging, and further work is needed to improve methods for characterizing organic systems with this focus on diversity.
Orthogonal rotation in PCAMIX
Kiers (1991) considered the orthogonal rotation in PCAMIX, a principal
component method for a mixture of qualitative and quantitative variables.
PCAMIX includes the ordinary principal component analysis (PCA) and multiple
correspondence analysis (MCA) as special cases. In this paper, we give a new
presentation of PCAMIX where the principal components and the squared loadings
are obtained from a Singular Value Decomposition. The loadings of the
quantitative variables and the principal coordinates of the categories of the
qualitative variables are also obtained directly. In this context, we propose a
computationally efficient procedure for varimax rotation in PCAMIX and a direct
solution for the optimal angle of rotation. A simulation study shows the good
computational behavior of the proposed algorithm. An application to a real data
set illustrates the value of using rotation in MCA. All source code is
available in the R package "PCAmixdata".
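The idea of obtaining the principal components of a mixed data set from a single Singular Value Decomposition can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the PCAmixdata code: the function name pcamix_svd is hypothetical, rows are given uniform weights 1/n, numeric columns are standardised, and each indicator column of a categorical variable is centred and divided by the square root of its category proportion. Rotation is not covered by this sketch.

```python
import numpy as np

def pcamix_svd(X_num, X_cat):
    """Eigenvalues and component scores for mixed data via one SVD.

    X_num: (n, p1) array of numeric variables (non-constant columns).
    X_cat: (n, p2) array of category labels.
    Hypothetical helper written for illustration only.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    n = X_num.shape[0]

    # Numeric block: centre and scale by the population standard deviation.
    Z1 = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)

    # Categorical blocks: indicator matrix, centred, scaled by 1/sqrt(proportion).
    blocks = []
    for j in range(X_cat.shape[1]):
        levels, codes = np.unique(X_cat[:, j], return_inverse=True)
        G = np.eye(len(levels))[codes]
        prop = G.mean(axis=0)
        blocks.append((G - prop) / np.sqrt(prop))

    # Uniform row weights 1/n enter as a 1/sqrt(n) factor before the SVD.
    Z = np.hstack([Z1] + blocks) / np.sqrt(n)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)

    eig = s ** 2                    # variance carried by each component
    scores = np.sqrt(n) * U * s     # component scores, column variance = eig
    return eig, scores

# Toy data (assumed): two numeric variables and one three-level factor.
X_num = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
                  [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]])
X_cat = np.array([["a"], ["a"], ["b"], ["b"], ["c"], ["c"]])
eig, scores = pcamix_svd(X_num, X_cat)
print(eig.round(3), eig.sum())
```

With only numeric columns this preprocessing reduces to standardised PCA, and with only indicator blocks it corresponds to the MCA-style coding, which is the sense in which both appear as special cases. Under these assumptions the eigenvalues sum to the number of numeric variables plus the number of categories minus the number of categorical variables.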
Coastal development approached at the scale of a socio-ecosystem. Living conditions and demographic dynamics of the Gironde coast/estuary complex (original title: La littoralisation appréhendée à l'échelle d'un socio-écosystème. Conditions de vie et dynamiques démographiques du complexe littoral/estuaire de la Gironde)
In regional science and planning, the measure of quality of life can integrate the various components of the living conditions of a population. But this socioeconomic and environmental diagnosis would be incomplete without knowledge of the demographic processes that have shaped the dynamics of territories. Coastal development is not limited to migration to the sea front; it combines demographic and economic growth of coastal and inland areas. We propose a novel statistical approach to measure quality of life and understand the multidimensionality of the demographic determinants of the coastal zone of the Gironde estuary.
ClustGeo: Hierarchical Ascendant Clustering (HAC) with geographical proximity constraints (original title: ClustGeo : Classification Ascendante Hiérarchique (CAH) avec contraintes de proximité géographique)
Hierarchical Ascendant Clustering (HAC) is a well-known method for clustering individuals. It aims to group individuals that are similar with respect to the variables describing them. But when the individuals are geographical units, the user may want geographically close individuals to be placed in the same clusters, without degrading the quality of the partition too much. The proposed ClustGeo method allows geographical proximity constraints to be taken into account within HAC. For that purpose, a new Ward homogeneity criterion based on two different distance matrices is proposed.