154 research outputs found
Distortion in Correspondence Analysis and in Taxicab Correspondence Analysis: A Comparison
Distortion is a fundamental, well-studied topic in dimension reduction,
and is intimately related to the intrinsic dimension underlying a mapping of
a high-dimensional data set onto a lower dimension. In this paper, we study
embedding distortions produced by Correspondence Analysis and its robust l1
variant, Taxicab Correspondence Analysis, which are visualization methods for
contingency tables. For high-dimensional data, distortions in Correspondence
Analysis are contractions, while distortions in Taxicab Correspondence Analysis
can be contractions or stretchings. This shows that Euclidean geometry is
quite rigid, because of the orthogonality property, while Taxicab geometry is
quite flexible, because the orthogonality property is replaced by the conjugacy
property. Comment: 18 pages, 4 figures, 4 tables
Scale Invariant Correspondence Analysis
Correspondence analysis is a dimension reduction method for visualization of
nonnegative data sets, in particular contingency tables, but it depends on the
marginals of the data set. Two transformations of the data have been proposed
to render correspondence analysis row- and column-scale invariant: these two
kinds of transformations change the initial form of the data set into a
bistochastic form. The power transformation applied by Greenacre (2010) has one
positive parameter, while the transformation applied by Mosteller (1968) and
Goodman (1996) has (I+J) positive parameters, where the raw data are row and
column scaled by the Sinkhorn (RAS or IPF) algorithm to render them bistochastic.
Goodman (1996) named correspondence analysis of a bistochastic matrix
marginal-free correspondence analysis. We discuss these two transformations,
and further generalize the Mosteller-Goodman approach. Comment: 22 pages, 3 figures, 3 tables
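As an illustration of the row and column scaling mentioned in this abstract, the following is a minimal sketch of Sinkhorn (RAS/IPF) iteration in plain Python. It is not the authors' implementation. The function name `sinkhorn_uniform` is hypothetical, and the sketch assumes a strictly positive I x J table and takes "bistochastic" to mean uniform marginals (every row sums to 1/I and every column sums to 1/J), which is an assumption rather than a definition stated in the abstract.

```python
def sinkhorn_uniform(table, tol=1e-10, max_iter=10_000):
    """Alternately rescale rows and columns of a strictly positive table
    until all row sums are 1/I and all column sums are 1/J.

    Assumption: uniform target marginals; the name and signature are
    illustrative only.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    m = [[float(x) for x in row] for row in table]
    row_target = 1.0 / n_rows
    col_target = 1.0 / n_cols

    for _ in range(max_iter):
        # Row step: scale each row so that it sums to 1/I.
        for i in range(n_rows):
            s = sum(m[i])
            m[i] = [x * row_target / s for x in m[i]]

        # Column step: scale each column so that it sums to 1/J.
        col_sums = [sum(m[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            for j in range(n_cols):
                m[i][j] *= col_target / col_sums[j]

        # Columns are now exact; stop once the rows are close enough too.
        row_error = max(abs(sum(row) - row_target) for row in m)
        if row_error < tol:
            break
    return m


scaled = sinkhorn_uniform([[10, 20, 30], [5, 5, 40]])
```

The I row multipliers and J column multipliers accumulated over the iterations correspond to the (I+J) positive parameters mentioned in the abstract. Tables containing zeros may fail to converge and are outside the scope of this sketch.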
On the choice of weights in aggregate compositional data analysis
In this paper, we distinguish between two kinds of compositional data sets:
elementary and aggregate. This distinction helps us choose the
weights to use in log interaction analysis of aggregate compositional vectors.
We show that in the aggregate case, the underlying data form paired
data sets composed of responses and qualitative covariates; this fact helps us
to propose two approaches for analysis and visualization of data, named log
interaction of aggregates and aggregate of log interactions. Furthermore, we
also show the first-order approximation of the log interaction of a cell for
different choices of the row and column weights. Comment: 3 figures, 1 table, 17 pages
Direct transformations yielding the knight's move pattern in 3x3x3 arrays
Three-way arrays (or tensors) can be regarded as extensions of the traditional two-way data matrices that have a third dimension. Studying algebraic properties of arrays is relevant, for example, for the Tucker three-way PCA method, which generalizes principal component analysis to three-way data. One important algebraic property of arrays concerns the possibility of transformations to simplicity. An array is said to be transformed to a simple form when it can be manipulated by a sequence of invertible operations such that a vast majority of its entries become zero. This paper shows how 3 × 3 × 3 arrays, whether symmetric or nonsymmetric, can be transformed to a simple form with 18 out of 27 entries equal to zero. We call this simple form the “knight's move pattern” due to a loose resemblance to the moves of a knight in a game of chess. The pattern was examined by Kiers, Ten Berge, and Rocci. It will be shown how the knight's move pattern can be found by means of a numeric–algebraic procedure based on the Gröbner basis. This approach seems to work almost surely for randomly generated arrays, whether symmetric or nonsymmetric.