Probabilistic Auto-Associative Models and Semi-Linear PCA
Auto-Associative models cover a large class of methods used in data analysis.
In this paper, we describe the general properties of these models when the
projection component is linear, and we propose and test an easy-to-implement
Probabilistic Semi-Linear Auto-Associative model in a Gaussian setting. We
show that it is a generalization of the PCA model to the semi-linear case. Numerical
experiments on simulated datasets and a real astronomical application highlight
the interest of this approach.
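The abstract presents the semi-linear model as a generalization of PCA, so it may help to see the linear, Gaussian baseline that it extends. The Python sketch below fits probabilistic PCA with its standard closed-form maximum-likelihood solution, in which the noise variance is the mean of the discarded covariance eigenvalues. It then uses the fitted subspace as a linear auto-associative map x -> mu + Q Q^T (x - mu). This is not the authors' semi-linear estimator. The function names `ppca_fit` and `reconstruct` are hypothetical, and NumPy is assumed.

```python
import numpy as np


def ppca_fit(X, d):
    """Closed-form ML fit of linear probabilistic PCA (illustrative baseline only)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Sample covariance with 1/n normalisation, as used by the ML estimate
    S = Xc.T @ Xc / X.shape[0]
    evals, evecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Noise variance: average of the p - d discarded eigenvalues (requires d < p)
    sigma2 = evals[d:].mean()
    # Loadings W = U_d (Lambda_d - sigma^2 I)^{1/2}, with the rotation fixed to identity
    W = evecs[:, :d] * np.sqrt(np.maximum(evals[:d] - sigma2, 0.0))
    return mu, W, sigma2


def reconstruct(X, mu, W):
    """Linear auto-associative map: orthogonal projection onto span(W), then back."""
    Q, _ = np.linalg.qr(W)  # orthonormal basis of the column space of W
    return mu + (np.asarray(X, dtype=float) - mu) @ Q @ Q.T
```

With d = 1 and p = 3, the mean squared reconstruction error equals the sum of the two discarded eigenvalues, which is exactly 2 * sigma2. That identity makes a convenient sanity check.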
Informative Data Projections: A Framework and Two Examples
Methods for Projection Pursuit aim to facilitate the visual exploration of
high-dimensional data by identifying interesting low-dimensional projections. A
major challenge is the design of a suitable quality metric of projections,
commonly referred to as the projection index, to be maximized by the Projection
Pursuit algorithm. In this paper, we introduce a new information-theoretic
strategy for tackling this problem, based on quantifying the amount of
information the projection conveys to a user given their prior beliefs about
the data. The resulting projection index is a subjective quantity, explicitly
dependent on the intended user. As a useful illustration, we develop this
idea for two particular kinds of prior beliefs. The first kind leads to PCA
(Principal Component Analysis), shining new light on when PCA is (not)
appropriate. The second kind leads to a novel projection index, the
maximization of which can be regarded as a robust variant of PCA. We show how
this projection index, though non-convex, can be effectively maximized using a
modified power method as well as using a semidefinite programming relaxation.
The usefulness of this new projection index is demonstrated in comparative
empirical experiments against PCA and a popular Projection Pursuit method.
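The abstract notes that the first kind of prior beliefs leads back to PCA, and that the new index is maximized with a modified power method. For reference only, here is the unmodified power iteration for the leading principal direction. It maximizes the variance w^T C w over unit vectors w. The robust index, the paper's modification of the power method, and the semidefinite relaxation are not reproduced here. The name `leading_direction` and its parameters are hypothetical, and NumPy is assumed.

```python
import numpy as np


def leading_direction(X, n_iter=500, tol=1e-10, seed=0):
    """Plain power iteration for the top eigenvector of the sample covariance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (X.shape[0] - 1)  # unbiased sample covariance
    rng = np.random.default_rng(seed)
    w = rng.normal(size=C.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        w_new = C @ w
        w_new /= np.linalg.norm(w_new)
        # Stop once the direction no longer changes (up to sign)
        converged = 1.0 - abs(w_new @ w) < tol
        w = w_new
        if converged:
            break
    # Rayleigh quotient: the variance captured along w
    return w, float(w @ C @ w)
```

The convergence rate depends on the ratio between the second and first eigenvalues. This is one reason a modified scheme can be needed when the objective is not a plain quadratic form.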
MaxSkew and MultiSkew: Two R Packages for Detecting, Measuring and Removing Multivariate Skewness
Skewness plays an important role in several multivariate statistical
techniques. Sometimes it is used to recover data features, as in cluster
analysis. In other circumstances, skewness impairs the performance of
statistical methods, as in Hotelling's one-sample test. In both cases,
the symmetry of the underlying distribution needs to be checked, either
by visual inspection or by formal testing. The R packages MaxSkew and MultiSkew
address these issues by measuring, testing and removing skewness from
multivariate data. Skewness is assessed by the third multivariate cumulant and
its functions. The hypothesis of symmetry is tested either nonparametrically,
with the bootstrap, or parametrically, under the normality assumption. Skewness
is removed, or at least alleviated, by projecting the data onto appropriate
linear subspaces. The use of MaxSkew and MultiSkew is illustrated with the Iris
dataset.
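To make "the third multivariate cumulant and its functions" concrete, the sketch below standardizes the data and forms the p x p x p third-cumulant tensor. It then takes the tensor's squared Frobenius norm, which equals Mardia's multivariate skewness b_{1,p}. The sketch is written in Python with NumPy rather than R. Its function names are hypothetical, and it does not reproduce the interfaces or the specific skewness measures of MaxSkew or MultiSkew.

```python
import numpy as np


def third_cumulant(X):
    """Third-order cumulant tensor (p x p x p) of the standardized data."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # 1/n covariance, as in Mardia's definition
    # Symmetric inverse square root S^{-1/2} via the eigendecomposition
    evals, evecs = np.linalg.eigh(S)
    S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ S_inv_half
    # For centred data, third cumulants coincide with third central moments
    return np.einsum("ia,ib,ic->abc", Z, Z, Z) / n


def mardia_skewness(X):
    """Squared norm of the third cumulant, equal to Mardia's b_{1,p}."""
    K = third_cumulant(X)
    return float(np.sum(K ** 2))
```

Because the data are whitened first, the measure is invariant under invertible affine transformations. It is close to zero for samples from a symmetric distribution such as the multivariate normal.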