Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established that can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of an a priori given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization from the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
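A minimal sketch of the cost-function view, assuming a metric-MDS-style stress as the cost (the abstract does not fix a particular cost function) and a global linear map x -> Wx as the explicit mapping. Because the learned map is explicit, new points are embedded by a single matrix product rather than by re-running the optimization. The function names (stress, fit_linear_map, map_points) and the step-size safeguard are hypothetical choices for this illustration.

```python
import numpy as np


def pairwise_dist(Z):
    """Euclidean distance matrix of the rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)


def stress(W, X, D_high):
    """Assumed cost: mean squared mismatch between projected and original distances."""
    return np.mean((pairwise_dist(X @ W.T) - D_high) ** 2)


def stress_grad(W, X, D_high):
    """Gradient of stress() with respect to the linear map W (shape d x D)."""
    diffs = X[:, None, :] - X[None, :, :]            # x_i - x_j, shape (n, n, D)
    Y = diffs @ W.T                                   # W (x_i - x_j), shape (n, n, d)
    r = np.linalg.norm(Y, axis=-1)                    # projected distances
    safe_r = np.where(r > 0, r, 1.0)                  # avoid 0/0 on the diagonal
    coef = np.where(r > 0, 2.0 * (r - D_high) / safe_r, 0.0)
    n = X.shape[0]
    return np.einsum("ij,ijk,ijl->kl", coef, Y, diffs) / (n * n)


def fit_linear_map(X, d=2, steps=200, lr=0.1, seed=0):
    """Gradient descent on the stress with a simple step-size safeguard."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(d, X.shape[1]))
    D_high = pairwise_dist(X)
    cost = stress(W, X, D_high)
    for _ in range(steps):
        candidate = W - lr * stress_grad(W, X, D_high)
        candidate_cost = stress(candidate, X, D_high)
        if candidate_cost < cost:      # accept and try a slightly larger step
            W, cost = candidate, candidate_cost
            lr *= 1.1
        else:                          # reject and shrink the step
            lr *= 0.5
    return W, cost


def map_points(W, X_new):
    """Out-of-sample extension: the explicit map applies directly to new points."""
    return X_new @ W.T
```

For example, W, cost = fit_linear_map(X_train) followed by map_points(W, X_test) embeds unseen points without refitting anything.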
Asymptotic Generalization Bound of Fisher's Linear Discriminant Analysis
Fisher's linear discriminant analysis (FLDA) is an important dimension
reduction method in statistical pattern recognition. It has been shown that
FLDA is asymptotically Bayes optimal under the homoscedastic Gaussian
assumption. However, this classical result has the following two major
limitations: 1) it holds only for a fixed dimensionality $D$, and thus does not
apply when $D$ and the training sample size $N$ are proportionally large; 2) it
does not provide a quantitative description of how the generalization ability
of FLDA is affected by $D$ and $N$. In this paper, we present an asymptotic
generalization analysis of FLDA based on random matrix theory, in a setting
where both $D$ and $N$ increase and $D/N \to \gamma \in [0, 1)$. The
obtained lower bound of the generalization discrimination power overcomes both
limitations of the classical result, i.e., it is applicable when $D$ and $N$
are proportionally large and provides a quantitative description of the
generalization ability of FLDA in terms of the ratio $\gamma = D/N$ and the
population discrimination power. In addition, the discrimination power bound
leads to an upper bound on the generalization error of binary classification
with FLDA.
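To illustrate the regime the bound addresses, here is a minimal FLDA sketch under the homoscedastic Gaussian assumption (identity covariance, equal priors), with a small Monte Carlo experiment comparing a small and a large ratio D/N. The data model, the separation delta and all function names are hypothetical choices for this illustration; the simulation only shows the qualitative degradation and does not evaluate the paper's bound.

```python
import numpy as np


def fit_flda(X0, X1):
    """Fisher discriminant with pooled within-class covariance (homoscedastic model)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    S = scatter / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(S, mu1 - mu0)          # w = S^{-1} (mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0                 # threshold at the projected midpoint
    return w, b


def flda_test_error(D, n_per_class, delta=3.0, n_test=2000, seed=0):
    """Monte Carlo test error for N(0, I) vs N(m, I) with ||m|| = delta."""
    rng = np.random.default_rng(seed)
    m = np.full(D, delta / np.sqrt(D))

    def sample(n, label):
        return rng.normal(size=(n, D)) + label * m

    w, b = fit_flda(sample(n_per_class, 0), sample(n_per_class, 1))
    X_test = np.vstack([sample(n_test, 0), sample(n_test, 1)])
    y_test = np.repeat([0, 1], n_test)
    y_pred = (X_test @ w + b > 0).astype(int)
    return np.mean(y_pred != y_test)
```

With N = 500 in both cases, flda_test_error(5, 250) corresponds to D/N = 0.01 and flda_test_error(450, 250) to D/N = 0.9; the second should show a markedly higher error even though the population discrimination power (delta) is unchanged. For reference, the Bayes error of this model is Phi(-delta/2), about 0.067 for delta = 3.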
Uncertainty-Aware Principal Component Analysis
We present a technique to perform dimensionality reduction on data that is
subject to uncertainty. Our method is a generalization of traditional principal
component analysis (PCA) to multivariate probability distributions. In
comparison to non-linear methods, linear dimensionality reduction techniques
have the advantage that the characteristics of such probability distributions
remain intact after projection. We derive a representation of the PCA sample
covariance matrix that respects potential uncertainty in each of the inputs,
building the mathematical foundation of our new method: uncertainty-aware PCA.
In addition to the accuracy and performance gained by our approach over
sampling-based strategies, our formulation allows us to perform sensitivity
analysis with regard to the uncertainty in the data. For this, we propose
factor traces as a novel visualization that helps users better understand the
influence of uncertainty on the chosen principal components. We provide
multiple examples of our technique using real-world datasets. As a special
case, we show how to propagate multivariate normal distributions through PCA in
closed form. Furthermore, we discuss extensions and limitations of our
approach.
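The closed-form propagation of normal distributions through PCA can be sketched as follows, assuming each input is a normal distribution N(mu_i, Sigma_i) with equal weight and that the aggregate covariance is that of the resulting mixture, (1/n) sum_i (Sigma_i + mu_i mu_i^T) - mu_bar mu_bar^T. This is an assumption for illustration and not necessarily the paper's exact formulation (which may, for example, scale the uncertainty). Projecting onto the top-k eigenvectors P then maps each input to N(P(mu_i - mu_bar), P Sigma_i P^T). Function names are hypothetical.

```python
import numpy as np


def uncertain_covariance(mus, covs):
    """Covariance of the equal-weight mixture of N(mus[i], covs[i]); also returns its mean."""
    mean = mus.mean(axis=0)
    second_moment = np.mean(covs + np.einsum("ni,nj->nij", mus, mus), axis=0)
    return second_moment - np.outer(mean, mean), mean


def uncertain_pca(mus, covs, k=2):
    """Project each input normal distribution onto the top-k principal axes."""
    C, mean = uncertain_covariance(mus, covs)
    _, eigvecs = np.linalg.eigh(C)                        # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :k].T                         # top-k axes, shape (k, D)
    proj_means = (mus - mean) @ P.T                       # P (mu_i - mu_bar), shape (n, k)
    proj_covs = np.einsum("ki,nij,lj->nkl", P, covs, P)   # P Sigma_i P^T, shape (n, k, k)
    return P, proj_means, proj_covs
```

When all covariances are zero, uncertain_covariance reduces to the ordinary (1/n-normalized) sample covariance, so the sketch falls back to standard PCA on the means.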
Some steps towards a general principle for dimensionality reduction mappings
In the past years, many dimensionality reduction methods have been
established that allow one to visualize high-dimensional data sets. Recently,
formal evaluation schemes have also been proposed for data visualization,
which allow a quantitative evaluation along general principles. Most techniques
provide a mapping of an a priori given finite set of points only, requiring
additional steps for out-of-sample extensions. We propose a general
view on dimensionality reduction based on the concept of cost functions,
and, based on this general principle, extend dimensionality reduction to
explicit mappings of the data manifold. This offers the possibility of simple
out-of-sample extensions. Further, it opens a way towards a theory
of data visualization taking the perspective of its generalization ability
to new data points. We demonstrate the approach on a simple
example.
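Evaluation schemes of the kind mentioned above typically compare neighbourhoods before and after projection. One widely used measure, assumed here as an example, is the k-nearest-neighbour preservation rate Q_NX(k) from the co-ranking framework: the average fraction of each point's k nearest neighbours in the original space that remain among its k nearest neighbours in the embedding. With an explicit mapping, the same measure can also be computed on held-out points. Function names are hypothetical.

```python
import numpy as np


def knn_sets(Z, k):
    """Indices of the k nearest neighbours of every row of Z (excluding itself)."""
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def q_nx(X_high, Y_low, k):
    """Average fraction of each point's k-neighbourhood preserved by the embedding."""
    hi, lo = knn_sets(X_high, k), knn_sets(Y_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(hi, lo)]
    return np.mean(overlap) / k
```

A value of 1.0 means every k-neighbourhood is preserved. For a hypothetical explicit linear map W, evaluating q_nx(X_test, X_test @ W.T, k) on held-out points gives one reading of generalization to new data.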
Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings
The recovery of the intrinsic geometric structures of data collections is an
important problem in data analysis. Supervised extensions of several manifold
learning approaches have been proposed in recent years. However, existing
methods primarily focus on the embedding of the training data, and the
generalization of the embedding to initially unseen test data is largely
ignored. In this work, we build on recent theoretical results on the
generalization performance of supervised manifold learning algorithms.
Motivated by these performance bounds, we propose a supervised manifold
learning method that computes a nonlinear embedding while constructing a smooth
and regular interpolation function that extends the embedding to the whole data
space in order to achieve satisfactory generalization. The embedding and the
interpolator are jointly learnt such that the Lipschitz regularity of the
interpolator is imposed while ensuring the separation between different
classes. Experimental results on several image data sets show that the proposed
method outperforms traditional classifiers and the compared supervised
dimensionality reduction algorithms in terms of classification accuracy in most
settings.
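The interpolation step can be illustrated with a Gaussian radial basis function interpolator, fitted by ridge regression from training points to a given training embedding and then evaluated on unseen points. This is a sketch under stated assumptions: the training embedding is a hypothetical 1-D class-separating target (-1 or +1 per class) rather than one learnt jointly with the interpolator, and no Lipschitz constraint is imposed; the kernel width sigma and ridge parameter lam only control smoothness indirectly. Names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all row pairs."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def fit_rbf_interpolator(X_train, Y_train, sigma=1.5, lam=0.1):
    """Ridge-regularised RBF map f(x) = sum_i c_i k(x, x_i) fitted to Y_train."""
    K = gaussian_kernel(X_train, X_train, sigma)
    C = np.linalg.solve(K + lam * np.eye(len(X_train)), Y_train)
    return lambda X_new: gaussian_kernel(X_new, X_train, sigma) @ C


def demo_accuracy(seed=0, n=50):
    """Two Gaussian classes in 3-D; classify test points by the sign of f(x)."""
    rng = np.random.default_rng(seed)
    shift = np.array([4.0, 0.0, 0.0])

    def sample(m):
        X = np.vstack([rng.normal(size=(m, 3)), rng.normal(size=(m, 3)) + shift])
        return X, np.repeat([0, 1], m)

    X_tr, y_tr = sample(n)
    # Hypothetical class-separating 1-D training embedding: -1 or +1 per class.
    Y_tr = np.where(y_tr == 1, 1.0, -1.0)[:, None]
    f = fit_rbf_interpolator(X_tr, Y_tr)
    X_te, y_te = sample(200)
    y_pred = (f(X_te)[:, 0] > 0).astype(int)
    return np.mean(y_pred == y_te)
```

Larger sigma or lam gives a smoother interpolator at the cost of a looser fit to the training embedding; the method described in the abstract instead learns the embedding and the interpolator jointly.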