255 research outputs found

    The contributions of rare objects in correspondence analysis

    Correspondence analysis, when used to visualize relationships in a table of counts (for example, abundance data in ecology), has been frequently criticized as being too sensitive to objects (for example, species) that occur with very low frequency or in very few samples. In this statistical report we show that this criticism is generally unfounded. We demonstrate this in several data sets by calculating the actual contributions of rare objects to the results of correspondence analysis and canonical correspondence analysis, both to the determination of the principal axes and to the chi-square distance. Rare objects are indeed often positioned as outliers in correspondence analysis maps, which gives the impression that they are highly influential, but their low weight offsets their distant positions and reduces their effect on the results. An alternative scaling of the correspondence analysis solution, the contribution biplot, is proposed as a way of mapping the results in order to avoid the problem of outlying and low-contributing rare objects.
    Keywords: Biplot, canonical correspondence analysis, contribution, correspondence analysis, influence, outlier, scaling
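
The offsetting of distant positions by low weight can be illustrated numerically. The sketch below is not taken from the report: it assumes the standard SVD formulation of correspondence analysis, in which a column's contribution to an axis equals its mass times its squared standard coordinate. The abundance table and the function name `ca_column_contributions` are hypothetical.

```python
import numpy as np

def ca_column_contributions(N):
    """Return column masses, singular values, standard coordinates and contributions."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals: the matrix whose SVD defines the CA solution
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    V = Vt.T
    std_coords = V / np.sqrt(c)[:, None]  # column standard coordinates
    contrib = V ** 2                      # share of each axis's inertia due to each column
    return c, sv, std_coords, contrib

# Hypothetical abundance table: 5 samples (rows) x 4 species (columns).
# The last species occurs once, so its mass is very small.
N = np.array([
    [30, 12,  8, 0],
    [25, 15, 10, 0],
    [10, 28, 14, 1],
    [ 8, 22, 30, 0],
    [12, 10, 35, 0],
])

c, sv, std_coords, contrib = ca_column_contributions(N)
for j in range(N.shape[1]):
    print(f"species {j}: mass={c[j]:.3f}, "
          f"axis-1 standard coord={std_coords[j, 0]:+.2f}, "
          f"axis-1 contribution={contrib[j, 0]:.3f}")
```

Comparing the printed standard coordinate with the contribution for each species shows how the two quantities can diverge: the contribution is the squared coordinate scaled down by the mass.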

    Analysis of matched matrices

    We consider the joint visualization of two matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, or correspondence analysis for frequency data or ratio-scaled variables on commensurate scales. A simple result in matrix algebra shows that by setting up the matrices in a particular block format, matrix sum and difference components can be visualized. The case when we have more than two matrices is also discussed and the methodology is applied to data from the International Social Survey Program.
    Keywords: Correspondence analysis, International Social Survey Program (ISSP), matched matrices, principal component analysis, singular-value decomposition
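
One block arrangement with the sum-and-difference property can be checked numerically. The abstract does not spell out the format, so the symmetric layout `[[A, B], [B, A]]` used here is an assumption; for this layout the singular values of the block matrix are exactly those of `A + B` together with those of `A - B`. The matrices are random placeholders, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical matched matrices with the same rows and columns
A = rng.normal(size=(6, 4))
B = A + 0.3 * rng.normal(size=(6, 4))

# Assumed block format: each matrix appears once in each block row and column
M = np.block([[A, B],
              [B, A]])

sv_block = np.linalg.svd(M, compute_uv=False)
sv_sum = np.linalg.svd(A + B, compute_uv=False)
sv_diff = np.linalg.svd(A - B, compute_uv=False)

# Pool the sum and difference singular values and sort them in descending order
sv_parts = np.sort(np.concatenate([sv_sum, sv_diff]))[::-1]

print("block matrix :", np.round(sv_block, 4))
print("sum + diff   :", np.round(sv_parts, 4))
```

The equality follows because an orthogonal change of basis turns the block matrix into a block-diagonal matrix with `A + B` and `A - B` on the diagonal, so each component of the joint analysis belongs either to the sum or to the difference.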

    Correspondence analysis of raw data

    Correspondence analysis has found extensive use in ecology, archeology, linguistics and the social sciences as a method for visualizing the patterns of association in a table of frequencies or nonnegative ratio-scale data. Inherent to the method is the expression of the data in each row or each column relative to their respective totals, and it is these sets of relative values (called profiles) that are visualized. This ‘relativization’ of the data makes perfect sense when the margins of the table represent samples from sub-populations of inherently different sizes. But in some ecological applications sampling is performed on equal areas or equal volumes so that the absolute levels of the observed occurrences may be of relevance, in which case relativization may not be required. In this paper we define the correspondence analysis of the raw ‘unrelativized’ data and discuss its properties, comparing this new method to regular correspondence analysis and to a related variant of non-symmetric correspondence analysis.
    Keywords: Abundance data, biplot, Bray-Curtis dissimilarity, profile, size and shape, visualisation

    Measuring subcompositional incoherence

    Subcompositional coherence is a fundamental property of Aitchison’s approach to compositional data analysis, and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence, that is incoherence, can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods, which might be better suited to cope with problems such as data zeros and outliers, while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix which can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
    Keywords: correspondence analysis, compositional data, chi-square distance, log-ratio distance, multidimensional scaling, stress, subcompositional coherence

    Contribution biplots

    In order to interpret the biplot it is necessary to know which points – usually variables – are the ones that are important contributors to the solution, and this information is available separately as part of the biplot’s numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represent the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.
    Keywords: biplot, contributions, correspondence analysis, discriminant analysis, log-ratio analysis, MANOVA, principal component analysis, scaling, singular value decomposition, weighting
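
A minimal sketch of the scaling described above, for the correspondence analysis case only. It assumes the usual SVD formulation: rows are placed in principal coordinates and columns are placed at the right singular vectors, whose squared values are the columns' contributions to each axis. The table and the function name `contribution_biplot_coords` are hypothetical, and plotting is left out.

```python
import numpy as np

def contribution_biplot_coords(N, ndim=2):
    """Row principal coordinates and column contribution coordinates for a CA."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows_principal = (U * sv) / np.sqrt(r)[:, None]     # rows: distances are chi-square distances
    cols_contribution = Vt.T               # columns: squared values are contributions
    return rows_principal[:, :ndim], cols_contribution[:, :ndim], S, r

# Hypothetical table of counts: 5 rows x 4 columns
N = np.array([
    [30, 12,  8, 0],
    [25, 15, 10, 0],
    [10, 28, 14, 1],
    [ 8, 22, 30, 0],
    [12, 10, 35, 0],
])

F, G, S, r = contribution_biplot_coords(N, ndim=2)
print("row principal coordinates:\n", np.round(F, 3))
print("column contribution coordinates:\n", np.round(G, 3))
print("column contributions to axis 1:", np.round(G[:, 0] ** 2, 3))
```

Because the contribution coordinates on each axis have squared values summing to 1, they are bounded and typically sit on a scale similar to that of the row points, which is consistent with the single common scale mentioned in the abstract.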

    Weighted metric multidimensional scaling

    This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired by the chi-square distance in correspondence analysis, which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.
    Keywords: Biplot, correspondence analysis, distance, multidimensional scaling, singular-value decomposition

    Dynamic perceptual mapping

    Perceptual maps have been used for decades by market researchers to reveal the similarity between brands in terms of a set of attributes, to position consumers relative to brands in terms of their preferences, or to study how demographic and psychometric variables relate to consumer choice. Invariably these maps are two-dimensional and static. As we enter the era of electronic publishing, the possibilities for dynamic graphics are opening up. We demonstrate the usefulness of introducing motion into perceptual maps through four examples. The first example shows how a perceptual map can be viewed in three dimensions, and the second one moves between two analyses of the data that were collected according to different protocols. In a third example we move from the best view of the data at the individual level to one which focuses on between-group differences in aggregated data. A final example considers the case when several demographic variables or market segments are available for each respondent, showing an animation with increasingly detailed demographic comparisons. These examples of dynamic maps use several data sets from marketing and social science research.
    Keywords: Animation, brand-attribute maps, correspondence analysis, multidimensional scaling, perceptual map, visualization

    Biplots of fuzzy coded data

    A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well known that regular (i.e., crisp) multiple correspondence analysis seriously underestimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
    Keywords: defuzzification, fuzzy coding, indicator matrix, measure of fit, multivariate data, multiple correspondence analysis, principal component analysis
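
The coding step and its inverse can be sketched in a few lines. This is only an illustration of fuzzy coding with triangular membership functions, which is an assumed choice here, and of defuzzification as the membership-weighted sum of hinge values; it does not reproduce the fuzzy multiple correspondence analysis itself. The data, the hinge points and the function names are hypothetical.

```python
import numpy as np

def fuzzy_code(x, hinges):
    """Code each value as memberships in len(hinges) categories (triangular functions)."""
    x = np.asarray(x, dtype=float)
    hinges = np.asarray(hinges, dtype=float)  # must be strictly increasing
    eye = np.eye(len(hinges))
    # Membership k is the piecewise-linear function equal to 1 at hinge k and 0 at the others
    return np.column_stack([np.interp(x, hinges, eye[k]) for k in range(len(hinges))])

def defuzzify(Z, hinges):
    """Recover values as the membership-weighted sum of the hinge points."""
    return Z @ np.asarray(hinges, dtype=float)

# Hypothetical continuous measurements and three hinges (minimum, median, maximum)
x = np.array([2.0, 3.5, 5.0, 7.25, 9.0])
hinges = [2.0, 5.0, 9.0]

Z = fuzzy_code(x, hinges)
x_back = defuzzify(Z, hinges)

print("fuzzy coding:\n", np.round(Z, 3))
print("defuzzified :", x_back)
```

Each row of the fuzzy-coded matrix is nonnegative and sums to 1, like a row of a crisp indicator matrix, but values inside the hinge range are recovered exactly by defuzzification rather than being reduced to a category label.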

    Multiple correspondence analysis of a subset of response categories

    In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
    Keywords: Categorical data, correspondence analysis, questionnaire survey
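
One way to read the modification described above is sketched here, as an assumption rather than the paper's exact algorithm: masses and centering are computed from the complete indicator matrix, and only then are the columns of the selected categories extracted and decomposed. Under this reading the inertia of a subset and that of its complement add up to the total inertia. The responses and the function name `subset_ca` are hypothetical.

```python
import numpy as np

def subset_ca(N, cols):
    """CA of selected columns, keeping the masses and centering of the full table."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)     # masses from the COMPLETE table
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    S_sub = S[:, cols]                       # extract the subset after standardizing
    U, sv, Vt = np.linalg.svd(S_sub, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c[cols])[:, None]
    total_inertia = (S ** 2).sum()
    return sv, row_coords, col_coords, total_inertia

# Hypothetical questionnaire: 6 respondents, 2 questions,
# categories 0 = agree, 1 = neutral, 2 = disagree
responses = np.array([[0, 0], [0, 1], [1, 1], [2, 2], [2, 1], [1, 0]])
Z = np.hstack([np.eye(3)[responses[:, q]] for q in range(2)])  # indicator matrix, 6 x 6

neutral = [1, 4]          # the "neutral" column of each question
others = [0, 2, 3, 5]     # all remaining categories

sv_neutral, rows_neutral, cols_neutral, total = subset_ca(Z, neutral)
sv_others, _, _, _ = subset_ca(Z, others)

print("inertia of neutral subset:", round((sv_neutral ** 2).sum(), 4))
print("inertia of other subset  :", round((sv_others ** 2).sum(), 4))
print("total inertia            :", round(total, 4))
```

Analysing the raw submatrix `Z[:, neutral]` directly would recompute the margins from the subset alone, which is the change of geometric structure the abstract warns against.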