Informative Data Projections: A Framework and Two Examples
Methods for Projection Pursuit aim to facilitate the visual exploration of
high-dimensional data by identifying interesting low-dimensional projections. A
major challenge is the design of a suitable quality metric of projections,
commonly referred to as the projection index, to be maximized by the Projection
Pursuit algorithm. In this paper, we introduce a new information-theoretic
strategy for tackling this problem, based on quantifying the amount of
information the projection conveys to a user given their prior beliefs about
the data. The resulting projection index is a subjective quantity, explicitly
dependent on the intended user. As an illustration, we develop this
idea for two particular kinds of prior beliefs. The first kind leads to PCA
(Principal Component Analysis), shining new light on when PCA is (not)
appropriate. The second kind leads to a novel projection index, the
maximization of which can be regarded as a robust variant of PCA. We show how
this projection index, though non-convex, can be effectively maximized using a
modified power method as well as using a semidefinite programming relaxation.
The usefulness of this new projection index is demonstrated in comparative
empirical experiments against PCA and a popular Projection Pursuit method.
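The abstract says the index can be maximized with a modified power method, but the modification is not described here. The sketch below shows only the standard power iteration for the first kind of prior belief, where the index reduces to PCA: it finds the leading eigenvector of the sample covariance matrix. The function names `covariance` and `power_method` are hypothetical, and pure Python is used to keep the example dependency-free.

```python
import math
import random

def covariance(X):
    """Sample covariance matrix of the rows of X (a list of equal-length lists)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / (n - 1)
    return C

def power_method(C, iters=500, seed=0):
    """Leading eigenvector of a symmetric PSD matrix C by plain power iteration.

    This is the textbook method, not the paper's modified variant.
    """
    rng = random.Random(seed)
    d = len(C)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise to avoid overflow/underflow
    # Rayleigh quotient: the variance of the data along v
    Cv = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
    lam = sum(v[i] * Cv[i] for i in range(d))
    return v, lam
```

For example, with `X = [[2, 0], [-2, 0], [0, 1], [0, -1]]`, `power_method(covariance(X))` returns a direction equal to (1, 0) up to sign, with eigenvalue 8/3.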
Projection Pursuit for Exploratory Supervised Classification
In high-dimensional data, one often seeks a few interesting low-dimensional projections that reveal important features of the data. Projection pursuit is a procedure for searching high-dimensional data for interesting low-dimensional projections via the optimization of a criterion function called the projection pursuit index. Very few projection pursuit indices incorporate class or group information in the calculation. Hence, they cannot be adequately applied in supervised classification problems to provide low-dimensional projections revealing class differences in the data. We introduce new indices derived from linear discriminant analysis that can be used for exploratory supervised classification.
Keywords: Data mining, Exploratory multivariate data analysis, Gene expression data, Discriminant analysis
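The abstract does not spell out the new indices. As an assumption, the sketch below uses one LDA-style index for a one-dimensional projection a, I(a) = 1 - a'Wa / a'(W+B)a, where W and B are the within-group and between-group scatter matrices. Because W + B is the total scatter, for a 1-D projection this is 1 minus the ratio of within-group to total sum of squares of the projected scores. A brute-force scan over directions in the plane stands in for a real projection pursuit optimizer; `lda_index` and `best_direction_2d` are hypothetical names.

```python
import math

def lda_index(X, y, a):
    """Assumed LDA-style index for a 1-D projection: 1 - SS_within / SS_total.

    Equals 1 - a'Wa / a'(W+B)a. Values near 1 mean the groups are well
    separated along a; values near 0 mean they overlap. Undefined if all
    projected scores are equal (total sum of squares is zero).
    """
    z = [sum(xi * ai for xi, ai in zip(row, a)) for row in X]  # projected scores
    grand = sum(z) / len(z)
    total = sum((zi - grand) ** 2 for zi in z)
    within = 0.0
    for g in set(y):
        zg = [zi for zi, yi in zip(z, y) if yi == g]
        mg = sum(zg) / len(zg)
        within += sum((zi - mg) ** 2 for zi in zg)
    return 1.0 - within / total

def best_direction_2d(X, y, steps=180):
    """Crude projection pursuit in 2-D: scan unit directions, keep the best index."""
    best = None
    for k in range(steps):
        theta = math.pi * k / steps  # directions a and -a give the same index
        a = (math.cos(theta), math.sin(theta))
        score = lda_index(X, y, a)
        if best is None or score > best[0]:
            best = (score, a)
    return best
```

With two groups separated along the first axis and spread along the second, the scan returns the first axis with an index of 1, while the second axis scores 0.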
Minimum Density Hyperplanes
Associating distinct groups of objects (clusters) with contiguous regions of
high probability density (high-density clusters) is central to many
statistical and machine learning approaches to the classification of unlabelled
data. We propose a novel hyperplane classifier for clustering and
semi-supervised classification which is motivated by this objective. The
proposed minimum density hyperplane minimises the integral of the empirical
probability density function along it, thereby avoiding intersection with high
density clusters. We show that the minimum density and the maximum margin
hyperplanes are asymptotically equivalent, thus linking this approach to
maximum margin clustering and semi-supervised support vector classifiers. We
propose a projection pursuit formulation of the associated optimisation problem
which allows us to find minimum density hyperplanes efficiently in practice,
and evaluate its performance on a range of benchmark datasets. The proposed
approach is found to be very competitive with state-of-the-art methods for
clustering and semi-supervised classification.
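For a unit normal v and an isotropic Gaussian kernel density estimate with bandwidth h, the integral of the d-dimensional estimate over the hyperplane {x : v·x = b} equals the one-dimensional Gaussian kernel estimate of the projected points v·x_i evaluated at b, which is why a projection pursuit formulation is natural. The sketch below fixes v and grid-searches the offset b over the range of the projections. The bandwidth, the restriction of b to the data range, and the grid search are assumptions for illustration, not the paper's optimisation procedure (which also searches over v); the function names are hypothetical.

```python
import math

def projected_density(z, b, h):
    """1-D Gaussian KDE of the projected points z, evaluated at b.

    For an isotropic Gaussian KDE in R^d and a unit normal v, this equals
    the integral of the d-dimensional KDE over the hyperplane v.x = b.
    """
    c = 1.0 / (len(z) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((zi - b) / h) ** 2) for zi in z)

def min_density_offset(X, v, h=0.5, grid=200):
    """Offset b minimising the density on the hyperplane v.x = b.

    Assumption: b is restricted to [min z, max z] of the projections and
    found by grid search; v is normalised to unit length first.
    """
    norm = math.sqrt(sum(vi * vi for vi in v))
    v = [vi / norm for vi in v]
    z = [sum(xi * vi for xi, vi in zip(row, v)) for row in X]
    lo, hi = min(z), max(z)
    candidates = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    best_b = min(candidates, key=lambda b: projected_density(z, b, h))
    return best_b, projected_density(z, best_b, h)
```

With two well-separated clusters and v along the separating axis, the chosen offset falls in the gap between them, where the hyperplane avoids both high-density regions.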
Sparse Representation of Astronomical Images
Sparse representation of astronomical images is discussed. It is shown that a
significant gain in sparsity is achieved when particular mixed dictionaries are
used for approximating these types of images with greedy selection strategies.
Experiments are conducted to confirm: i) effectiveness at producing sparse
representations; ii) competitiveness with respect to the time required to
process large images. The latter is a consequence of the suitability of the
proposed dictionaries for approximating images in partitions of small
blocks. This feature makes it possible to apply the effective greedy selection
technique Orthogonal Matching Pursuit, up to some block size. For blocks
exceeding that size, a refinement of the original Matching Pursuit approach is
considered. The resulting method is termed Self Projected Matching Pursuit,
because it is shown to be effective for implementing, via Matching Pursuit itself,
the optional back-projection intermediate steps in that approach.
Comment: Software to implement the approach is available at
http://www.nonlinear-approx.info/examples/node1.htm
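The abstract relies on Orthogonal Matching Pursuit for small blocks. The sketch below is a plain, textbook OMP for a single signal and an explicit dictionary of unit-norm atoms; it does not implement the mixed dictionaries, the block partitioning, or Self Projected Matching Pursuit. At each step it selects the atom most correlated with the residual, then re-projects the signal onto the span of all selected atoms, maintained here by Gram-Schmidt. The name `omp` is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp(D, x, k, tol=1e-10):
    """Orthogonal Matching Pursuit (sketch).

    D: list of atoms, each a list of length n, assumed to have unit norm.
    x: signal of length n; k: maximum number of atoms to select.
    Returns (indices, coefficients, residual) with x ~ sum c_i * D[idx_i].
    """
    r = list(x)
    Q, R, idx = [], [], []  # orthonormal basis, columns of triangular R, chosen atoms
    for _ in range(k):
        candidates = [j for j in range(len(D)) if j not in idx]
        if not candidates or math.sqrt(dot(r, r)) <= tol:
            break
        # Greedy selection: atom most correlated with the current residual
        j = max(candidates, key=lambda j: abs(dot(D[j], r)))
        # Gram-Schmidt: orthogonalise the new atom against the chosen ones
        col = [dot(q, D[j]) for q in Q]
        w = list(D[j])
        for q, coef in zip(Q, col):
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        nw = math.sqrt(dot(w, w))
        if nw <= tol:  # atom already lies in the span; nothing new to add
            break
        q = [wi / nw for wi in w]
        Q.append(q)
        R.append(col + [nw])
        idx.append(j)
        # Residual = x minus its orthogonal projection onto span(Q)
        a = dot(q, r)
        r = [ri - a * qi for ri, qi in zip(r, q)]
    # Coefficients in terms of the original atoms: solve R c = Q^T x
    b = [dot(q, x) for q in Q]
    m = len(idx)
    c = [0.0] * m
    for i in reversed(range(m)):
        s = b[i] - sum(R[l][i] * c[l] for l in range(i + 1, m))
        c[i] = s / R[i][i]
    return idx, c, r
```

With the non-orthogonal atoms e1 = (1, 0) and d = (1, 1)/√2 and the signal x = (1, 2), OMP picks d first and then e1, and the orthogonal re-projection recovers x = 2√2·d − e1 exactly, with a zero residual.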