301 research outputs found

From correspondence analysis to multiple and joint correspondence analysis

Author: Michael Greenacre

The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the case for the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.

Keywords: Correspondence analysis, eigendecomposition, joint correspondence analysis, multivariate categorical data, questionnaire data, singular value decomposition

Research Papers in Economics
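
The claim in the abstract above, that adjusted principal inertias recover simple correspondence analysis as an exact special case, can be checked numerically for two variables. The sketch below is illustrative only: the abstract does not state the adjustment, so the code assumes the commonly cited form (Q/(Q-1))^2 (sqrt(lambda_k) - 1/Q)^2, applied to the Burt-matrix principal inertias lambda_k whose square roots exceed 1/Q. The table `N` and the helper names are made up for the example.

```python
import numpy as np

def ca_singular_values(table):
    """Singular values of the standardized residuals of a nonnegative table."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False)

def adjusted_inertias(burt, Q):
    """Adjusted principal inertias (assumed formula) from a Burt matrix of Q variables."""
    sv = ca_singular_values(burt)          # square roots of the Burt principal inertias
    keep = sv[sv > 1.0 / Q + 1e-9]         # only axes above the 1/Q threshold are adjusted
    return (Q / (Q - 1.0)) ** 2 * (keep - 1.0 / Q) ** 2

# Hypothetical 3x3 two-way table (made-up counts, for illustration only).
N = np.array([[20.0, 10.0, 5.0],
              [8.0, 15.0, 12.0],
              [4.0, 9.0, 25.0]])

# Burt matrix for Q = 2: margins on the diagonal blocks, the table off the diagonal.
burt = np.block([[np.diag(N.sum(axis=1)), N],
                 [N.T, np.diag(N.sum(axis=0))]])

simple = ca_singular_values(N)[:2] ** 2    # principal inertias of simple CA
adjusted = adjusted_inertias(burt, Q=2)    # adjusted inertias from the Burt matrix

print(np.allclose(simple, adjusted))
```

For Q = 2 the two sets of inertias agree to numerical precision, which is the special-case property the abstract refers to.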

The contributions of rare objects in correspondence analysis

Author: Michael Greenacre

Correspondence analysis, when used to visualize relationships in a table of counts (for example, abundance data in ecology), has been frequently criticized as being too sensitive to objects (for example, species) that occur with very low frequency or in very few samples. In this statistical report we show that this criticism is generally unfounded. We demonstrate this in several data sets by calculating the actual contributions of rare objects to the results of correspondence analysis and canonical correspondence analysis, both to the determination of the principal axes and to the chi-square distance. It is a fact that rare objects are often positioned as outliers in correspondence analysis maps, which gives the impression that they are highly influential, but their low weight offsets their distant positions and reduces their effect on the results. An alternative scaling of the correspondence analysis solution, the contribution biplot, is proposed as a way of mapping the results in order to avoid the problem of outlying and low-contributing rare objects.

Keywords: Biplot, canonical correspondence analysis, contribution, correspondence analysis, influence, outlier, scaling

Research Papers in Economics
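
The contributions mentioned in the abstract above can be computed directly from the singular value decomposition. The following is a minimal sketch on a made-up abundance table, not the paper's data or code. It uses the standard definition of a column's contribution to an axis (mass times squared principal coordinate, divided by the axis inertia); all variable names are hypothetical.

```python
import numpy as np

# Hypothetical abundance table: 5 samples (rows) x 4 species (columns).
# The last species is rare: it occurs once, in a single sample.
N = np.array([[30.0, 12.0, 8.0, 0.0],
              [25.0, 18.0, 5.0, 0.0],
              [10.0, 22.0, 14.0, 1.0],
              [6.0, 15.0, 27.0, 0.0],
              [4.0, 9.0, 31.0, 0.0]])

P = N / N.sum()
r = P.sum(axis=1)                          # row masses
c = P.sum(axis=0)                          # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

k = 2                                      # number of axes kept for the map
V = Vt[:k].T                               # right singular vectors, one column per axis
std_cols = V / np.sqrt(c)[:, None]         # column standard coordinates
prin_cols = std_cols * sv[:k]              # column principal coordinates

# Contribution of each species to each axis: mass * coordinate^2 / axis inertia.
contrib = c[:, None] * prin_cols ** 2 / sv[:k] ** 2

# Distance of each species from the origin of the two-dimensional map.
dist_to_origin = np.sqrt((prin_cols ** 2).sum(axis=1))

print(np.round(contrib, 3))
print(np.round(dist_to_origin, 3))
```

Comparing `dist_to_origin` with `contrib` for the rare species shows how position in the map and contribution to the axes are separate quantities. Each column of `contrib` sums to 1, and `contrib` equals the squared singular vectors, which are the squared contribution coordinates (standard coordinates multiplied by the square roots of the masses).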

Canonical correspondence analysis in social science research

Author: Michael Greenacre

The use of simple and multiple correspondence analysis is well-established in social science research for understanding relationships between two or more categorical variables. By contrast, canonical correspondence analysis, which is a correspondence analysis with linear restrictions on the solution, has become one of the most popular multivariate techniques in ecological research. Multivariate ecological data typically consist of frequencies of observed species across a set of sampling locations, as well as a set of observed environmental variables at the same locations. In this context the principal dimensions of the biological variables are sought in a space that is constrained to be related to the environmental variables. This restricted form of correspondence analysis has many uses in social science research as well, as is demonstrated in this paper. We first illustrate the result that canonical correspondence analysis of an indicator matrix, restricted to be related to an external categorical variable, reduces to a simple correspondence analysis of a set of concatenated (or “stacked”) tables. Then we show how canonical correspondence analysis can be used to focus on, or partial out, a particular set of response categories in sample survey data. For example, the method can be used to partial out the influence of missing responses, which usually dominate the results of a multiple correspondence analysis.

Keywords: Constraints, correspondence analysis, missing data, multiple correspondence

Research Papers in Economics

Analysis of matched matrices

Author: Michael Greenacre

We consider the joint visualization of two matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, or correspondence analysis for frequency data or ratio-scaled variables on commensurate scales. A simple result in matrix algebra shows that by setting up the matrices in a particular block format, matrix sum and difference components can be visualized. The case when we have more than two matrices is also discussed and the methodology is applied to data from the International Social Survey Program.

Keywords: Correspondence analysis, International Social Survey Program (ISSP), matched matrices, principal component analysis, singular-value decomposition

Research Papers in Economics
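
The abstract above does not spell out the block format, so the sketch below assumes one natural choice: the two matched matrices A and B arranged as [[A, B], [B, A]]. For that arrangement the singular values of the block matrix are exactly those of A + B together with those of A - B, which is one way sum and difference components can be separated. The matrices here are random placeholders, not survey data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical matched matrices with the same rows and columns.
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))

# Assumed block format: A and B on the diagonal and off-diagonal positions.
M = np.block([[A, B],
              [B, A]])

sv_block = np.linalg.svd(M, compute_uv=False)
sv_sum = np.linalg.svd(A + B, compute_uv=False)
sv_diff = np.linalg.svd(A - B, compute_uv=False)

# Pool the sum and difference singular values and sort them in descending order.
sv_union = np.sort(np.concatenate([sv_sum, sv_diff]))[::-1]

print(np.allclose(sv_block, sv_union))
```

The result follows because an orthogonal change of basis on both sides turns the block matrix into a block-diagonal matrix with A + B and A - B on the diagonal.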

Power transformations in correspondence analysis

Author: Michael Greenacre

Power transformations of positive data tables, prior to applying the correspondence analysis algorithm, are shown to open up a family of methods with direct connections to the analysis of log-ratios. Two variations of this idea are illustrated. The first approach is simply to power the original data and perform a correspondence analysis – this method is shown to converge to unweighted log-ratio analysis as the power parameter tends to zero. The second approach is to apply the power transformation to the contingency ratios, that is the values in the table relative to expected values based on the marginals – this method converges to weighted log-ratio analysis, or the spectral map. Two applications are described: first, a matrix of population genetic data which is inherently two-dimensional, and second, a larger cross-tabulation with higher dimensionality, from a linguistic analysis of several books.

Keywords: Box-Cox transformation, chi-square distance, contingency ratio, correspondence analysis, log-ratio analysis, power transformation, ratio data, singular value decomposition, spectral map

Research Papers in Economics
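
The first approach in the abstract above can be illustrated numerically. The sketch below is an interpretation and not the paper's code: it assumes that "converges" refers to the correspondence analysis singular values divided by the power alpha, and that unweighted log-ratio analysis means the singular value decomposition of the double-centred log matrix with equal row and column weights. The table `X` and the function names are made up.

```python
import numpy as np

def ca_singular_values(table):
    """Singular values of the standardized residuals of a positive table."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False)

def unweighted_lra_singular_values(table):
    """Singular values of the double-centred log table with equal weights."""
    L = np.log(np.asarray(table, dtype=float))
    L = L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
    return np.linalg.svd(L / np.sqrt(L.size), compute_uv=False)

# Hypothetical strictly positive 4x3 table (made-up values).
X = np.array([[12.0, 3.0, 7.0],
              [5.0, 9.0, 2.0],
              [8.0, 4.0, 15.0],
              [1.0, 6.0, 10.0]])

alpha = 1e-4                               # small power parameter
scaled_ca = ca_singular_values(X ** alpha) / alpha
lra = unweighted_lra_singular_values(X)

print(np.round(scaled_ca[:2], 4))
print(np.round(lra[:2], 4))
```

With a small `alpha` the two sets of leading singular values agree closely. Shrinking `alpha` further tightens the agreement until floating-point cancellation starts to dominate.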

Correspondence analysis of raw data

Author: Michael Greenacre

Correspondence analysis has found extensive use in ecology, archeology, linguistics and the social sciences as a method for visualizing the patterns of association in a table of frequencies or nonnegative ratio-scale data. Inherent to the method is the expression of the data in each row or each column relative to their respective totals, and it is these sets of relative values (called profiles) that are visualized. This ‘relativization’ of the data makes perfect sense when the margins of the table represent samples from sub-populations of inherently different sizes. But in some ecological applications sampling is performed on equal areas or equal volumes so that the absolute levels of the observed occurrences may be of relevance, in which case relativization may not be required. In this paper we define the correspondence analysis of the raw ‘unrelativized’ data and discuss its properties, comparing this new method to regular correspondence analysis and to a related variant of non-symmetric correspondence analysis.

Keywords: Abundance data, biplot, Bray-Curtis dissimilarity, profile, size and shape, visualisation

Research Papers in Economics

Measuring subcompositional incoherence

Author: Michael Greenacre

Subcompositional coherence is a fundamental property of Aitchison’s approach to compositional data analysis, and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence, that is incoherence, can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods, which might be better suited to cope with problems such as data zeros and outliers, while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix which can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.

Keywords: correspondence analysis, compositional data, chi-square distance, log-ratio distance, multidimensional scaling, stress, subcompositional coherence

Research Papers in Economics

Contribution biplots

Author: Michael Greenacre

In order to interpret the biplot it is necessary to know which points – usually variables – are the ones that are important contributors to the solution, and this information is available separately as part of the biplot’s numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represent the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.

Keywords: biplot, contributions, correspondence analysis, discriminant analysis, log-ratio analysis, MANOVA, principal component analysis, scaling, singular value decomposition, weighting.

Research Papers in Economics

Weighted metric multidimensional scaling

Author: Michael Greenacre

This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired by the chi-square distance in correspondence analysis, which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.

Keywords: Biplot, correspondence analysis, distance, multidimensional scaling, singular-value decomposition

Research Papers in Economics

Dynamic perceptual mapping

Author: Michael Greenacre

Perceptual maps have been used for decades by market researchers to shed light on the similarity between brands in terms of a set of attributes, to position consumers relative to brands in terms of their preferences, or to study how demographic and psychometric variables relate to consumer choice. Invariably these maps are two-dimensional and static. As we enter the era of electronic publishing, the possibilities for dynamic graphics are opening up. We demonstrate the usefulness of introducing motion into perceptual maps through four examples. The first example shows how a perceptual map can be viewed in three dimensions, and the second one moves between two analyses of the data that were collected according to different protocols. In a third example we move from the best view of the data at the individual level to one which focuses on between-group differences in aggregated data. A final example considers the case when several demographic variables or market segments are available for each respondent, showing an animation with increasingly detailed demographic comparisons. These examples of dynamic maps use several data sets from marketing and social science research.

Keywords: Animation, brand-attribute maps, correspondence analysis, multidimensional scaling, perceptual map, visualization

Research Papers in Economics