From correspondence analysis to multiple and joint correspondence analysis
The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the case for the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.
Keywords: correspondence analysis, eigendecomposition, joint correspondence analysis, multivariate categorical data, questionnaire data, singular value decomposition
Multiple Correspondence Analysis & the Multilogit Bilinear Model
Multiple Correspondence Analysis (MCA) is a dimension reduction method which
plays a large role in the analysis of tables with categorical nominal variables
such as survey data. Though it is usually motivated and derived using geometric
considerations, we prove that it in fact amounts to a single proximal Newton
step of a natural bilinear exponential family model for categorical data, the
multinomial logit bilinear model. We compare and contrast the behavior of MCA
with that of the model on simulations and discuss new insights on the
properties of both exploratory multivariate methods and their cognate models.
One main conclusion is that MCA can be recommended for approximating the
multilogit model parameters. Indeed, estimating the parameters of the model is
not a trivial task, whereas MCA has the great advantage of being easily solved
by singular value decomposition and of scaling to large data.
Computation of multiple correspondence analysis, with code in R
The generalization of simple correspondence analysis, for two categorical variables, to multiple correspondence analysis, where there may be three or more variables, is not straightforward, from either a mathematical or a computational point of view. In this paper we detail the exact computational steps involved in performing a multiple correspondence analysis, including the special aspects of adjusting the principal inertias to correct the percentages of inertia, supplementary points and subset analysis. Furthermore, we give the algorithm for joint correspondence analysis, where the cross-tabulations of all unique pairs of variables are analysed jointly. The code in the R language for every step of the computations is given, as well as the results of each computation.
Keywords: adjustment of principal inertias, Burt matrix, correspondence analysis, multiple correspondence analysis, R language, singular value decomposition, subset analysis
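The adjustment of principal inertias mentioned in this abstract can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the paper's R code: the function names, the small example table and the tolerance `tol` are assumptions made here, and the adjustment formulas are the commonly cited ones for the Burt matrix of Q variables with J categories in total. With Q = 2 the adjusted inertias should coincide with the principal inertias of the simple correspondence analysis of the two-way table.

```python
import numpy as np

def indicator_matrix(data):
    """One-hot code an (n, Q) array of category labels, variable by variable."""
    blocks = [(data[:, [q]] == np.unique(data[:, q])[None, :]).astype(float)
              for q in range(data.shape[1])]
    return np.hstack(blocks)

def ca_singular_values(table):
    """Singular values of the standardized residuals of a contingency table."""
    table = np.asarray(table, dtype=float)
    P = table / table.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False)

def mca_adjusted_inertias(data, tol=1e-10):
    """Adjusted principal inertias and their percentages (hypothetical helper)."""
    Q = data.shape[1]
    Z = indicator_matrix(data)
    J = Z.shape[1]
    # Singular values from the Burt matrix Z'Z; their squares are the
    # principal inertias of the Burt matrix.
    sv = ca_singular_values(Z.T @ Z)
    burt_inertias = sv ** 2
    # Only dimensions whose singular value exceeds 1/Q are adjusted and kept.
    keep = sv > 1.0 / Q + tol
    adjusted = (Q / (Q - 1)) ** 2 * (sv[keep] - 1.0 / Q) ** 2
    # Adjusted total inertia (average off-diagonal inertia of the Burt matrix).
    adjusted_total = (Q / (Q - 1)) * (burt_inertias.sum() - (J - Q) / Q ** 2)
    return adjusted, adjusted / adjusted_total

# Illustrative two-way table (made-up counts), expanded to one row per respondent.
table = np.array([[20, 5, 3, 2], [4, 18, 6, 2], [3, 4, 10, 13]])
rows, cols = np.indices(table.shape)
data = np.column_stack([np.repeat(rows.ravel(), table.ravel()),
                        np.repeat(cols.ravel(), table.ravel())])

adjusted, percentages = mca_adjusted_inertias(data)
simple = ca_singular_values(table) ** 2          # simple CA principal inertias
```

Comparing `adjusted` with the leading entries of `simple` is a quick check that the two-variable case reduces to simple correspondence analysis.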
Canonical correspondence analysis in social science research
The use of simple and multiple correspondence analysis is well-established in social science research for understanding relationships between two or more categorical variables. By contrast, canonical correspondence analysis, which is a correspondence analysis with linear restrictions on the solution, has become one of the most popular multivariate techniques in ecological research. Multivariate ecological data typically consist of frequencies of observed species across a set of sampling locations, as well as a set of observed environmental variables at the same locations. In this context the principal dimensions of the biological variables are sought in a space that is constrained to be related to the environmental variables. This restricted form of correspondence analysis has many uses in social science research as well, as is demonstrated in this paper. We first illustrate the result that canonical correspondence analysis of an indicator matrix, restricted to be related to an external categorical variable, reduces to a simple correspondence analysis of a set of concatenated (or “stacked”) tables. Then we show how canonical correspondence analysis can be used to focus on, or partial out, a particular set of response categories in sample survey data. For example, the method can be used to partial out the influence of missing responses, which usually dominate the results of a multiple correspondence analysis.
Keywords: constraints, correspondence analysis, missing data, multiple correspondence
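To make the "stacked" tables concrete, here is a small sketch in Python with NumPy of how such a concatenated table can be built. It covers only the construction of the table, not the canonical correspondence analysis itself; the function names and the toy data are hypothetical. Each block of columns is the cross-tabulation of the external variable with one response variable.

```python
import numpy as np

def one_hot(labels):
    """Indicator (dummy) coding of one categorical variable."""
    labels = np.asarray(labels)
    categories = np.unique(labels)
    return (labels[:, None] == categories[None, :]).astype(int)

def stacked_table(external, responses):
    """Cross-tabulate an external variable with every response variable
    and concatenate the resulting tables side by side."""
    G = one_hot(external)                                   # n x (external categories)
    Z = np.hstack([one_hot(responses[:, q])                 # n x J indicator matrix
                   for q in range(responses.shape[1])])
    return G.T @ Z

# Toy data (made up): six respondents, one external variable, two questions.
external = np.array([0, 0, 1, 1, 1, 0])
responses = np.array([[0, 1], [1, 1], [0, 0], [2, 1], [2, 0], [1, 0]])
table = stacked_table(external, responses)
```

Under the result stated in the abstract, a simple correspondence analysis of `table` would stand in for the constrained analysis of the indicator matrix.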
Biplots of fuzzy coded data
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
Keywords: defuzzification, fuzzy coding, indicator matrix, measure of fit, multivariate data, multiple correspondence analysis, principal component analysis
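Fuzzy coding and defuzzification can be illustrated with a short sketch in Python with NumPy. It assumes triangular membership functions defined by a set of increasing hinge points, which is one common choice and not necessarily the coding used in the paper; the function names and hinge values are hypothetical. It shows only the coding step and its inverse, not the fuzzy multiple correspondence analysis.

```python
import numpy as np

def fuzzy_code(x, hinges):
    """Triangular fuzzy coding of a 1-D array against increasing hinge points.

    Each value is shared between its two neighbouring hinges, so every row
    of the result is non-negative and sums to 1.
    """
    hinges = np.asarray(hinges, dtype=float)
    x = np.clip(np.asarray(x, dtype=float), hinges[0], hinges[-1])
    memberships = np.zeros((x.size, hinges.size))
    # Index of the left hinge of the interval that contains each value.
    left = np.clip(np.searchsorted(hinges, x, side="right") - 1,
                   0, hinges.size - 2)
    weight_right = (x - hinges[left]) / (hinges[left + 1] - hinges[left])
    rows = np.arange(x.size)
    memberships[rows, left] = 1.0 - weight_right
    memberships[rows, left + 1] = weight_right
    return memberships

def defuzzify(memberships, hinges):
    """Map fuzzy values back to the original scale as a weighted average of hinges."""
    return memberships @ np.asarray(hinges, dtype=float)

# Made-up measurements and hinge points (for example minimum, median, maximum).
x = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
hinges = [0.0, 5.0, 10.0]
F = fuzzy_code(x, hinges)
x_back = defuzzify(F, hinges)
```

With triangular coding, defuzzifying the exact fuzzy values returns the original data; applied to values estimated from a low-dimensional solution, the same weighted average gives estimates on the original scale.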
Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package
We describe an implementation of simple, multiple and joint correspondence analysis in R. The resulting package comprises two parts, one for simple correspondence analysis and one for multiple and joint correspondence analysis. Within each part, functions for computation, summaries and visualization in two and three dimensions are provided, including options to display supplementary points and perform subset analyses. Special emphasis has been put on the visualization functions that offer features such as different scaling options for biplots and three-dimensional maps using the rgl package. Graphical options include shading and sizing plot symbols for the points according to their contributions to the map and masses respectively.
SOM-based algorithms for qualitative variables
It is well known that the SOM algorithm achieves a clustering of data which
can be interpreted as an extension of Principal Component Analysis, because of
its topology-preserving property. But the SOM algorithm can only process
real-valued data. In previous papers, we have proposed several methods based on
the SOM algorithm to analyze categorical data, as commonly found in survey
data. In this paper, we present these methods in a unified manner. The first
one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the
modalities, while the two others (Kohonen Multiple Correspondence Analysis with
individuals, KMCA_ind, and Kohonen algorithm on DISJonctive table, KDISJ) can
take into account the individuals and the modalities simultaneously.
Comment: Special issue following WSOM 03 in Kitakiush