An Explicit Nonlinear Mapping for Manifold Learning
Manifold learning is an active research topic in computer science with many real-world applications. A main drawback of manifold learning methods, however, is that they provide no explicit mapping from the input data manifold to the output embedding, which prohibits their application to many practical problems such as classification and target detection. To provide explicit mappings, many methods have previously been proposed that approximate the mapping under the assumption that there exists a linear projection between the high-dimensional data samples and their low-dimensional embedding. However, this linearity assumption may be too restrictive. In this paper, an explicit nonlinear mapping is proposed for
manifold learning, based on the assumption that there exists a polynomial
mapping between the high-dimensional data samples and their low-dimensional
representations. To our knowledge, this is the first explicit nonlinear mapping given for manifold learning. In particular, we apply the proposed mapping to Locally Linear Embedding (LLE) and derive an explicit nonlinear manifold learning algorithm, named Neighborhood Preserving Polynomial Embedding (NPPE). Experimental results on both synthetic and real-world data show that the proposed mapping is much more effective than previous work at preserving the local neighborhood information and the nonlinear geometry of the high-dimensional data samples.
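As a rough illustration of the idea (not the NPPE algorithm itself, which fits the polynomial coefficients inside the LLE objective), one can learn an LLE embedding on training data and then fit an explicit polynomial regression from the inputs to the embedding coordinates; all parameters below are illustrative:

```python
# A rough illustration of the idea (not the NPPE algorithm itself, which
# fits the polynomial coefficients inside the LLE objective): learn an
# LLE embedding on training data, then fit an explicit polynomial
# regression from input space to the embedding coordinates.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
Y = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

# Explicit nonlinear map: degree-2 polynomial in the input coordinates.
poly = PolynomialFeatures(degree=2)
mapping = LinearRegression().fit(poly.fit_transform(X), Y)

# Unseen points are embedded without re-running LLE.
X_new, _ = make_swiss_roll(n_samples=5, random_state=1)
print(mapping.predict(poly.transform(X_new)))  # (5, 2) embedding coordinates
```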
Landmark Diffusion Maps (L-dMaps): Accelerated manifold learning out-of-sample extension
Diffusion maps are a nonlinear manifold learning technique based on harmonic
analysis of a diffusion process over the data. Out-of-sample extensions with computational complexity O(N), where N is the number of points comprising the manifold, frustrate their use in online learning applications requiring rapid embedding of high-dimensional data streams. We propose landmark diffusion maps (L-dMaps) to reduce this complexity to O(M), where M << N is the number of landmark points selected using pruned spanning trees or k-medoids. Offering up to (N/M)-fold speedups in out-of-sample extension, L-dMaps
enables the application of diffusion maps to high-volume and/or high-velocity
streaming data. We illustrate our approach on three datasets: the Swiss roll,
molecular simulations of a C24H50 polymer chain, and biomolecular
simulations of alanine dipeptide. We demonstrate up to 50-fold speedups in
out-of-sample extension for the molecular systems with less than 4% errors in
manifold reconstruction fidelity relative to calculations over the full
dataset.
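A minimal sketch of the landmark idea, assuming precomputed diffusion-map eigenpairs on the landmarks and a Gaussian kernel (landmark selection and the density normalization of diffusion maps are omitted):

```python
# Landmark-based out-of-sample extension in the spirit of L-dMaps: the
# Nystrom extension of a new point is evaluated against M landmarks
# instead of all N manifold points, cutting the per-point cost from O(N)
# to O(M). Landmark selection and the density normalization of diffusion
# maps are omitted for brevity.
import numpy as np

def nystrom_extend(x_new, landmarks, evecs, evals, eps):
    """Embed x_new given diffusion eigenpairs computed on the landmarks.

    landmarks : (M, D) landmark coordinates
    evecs     : (M, k) diffusion-map eigenvectors on the landmarks
    evals     : (k,)   corresponding eigenvalues
    eps       : Gaussian kernel bandwidth
    """
    k = np.exp(-np.sum((landmarks - x_new) ** 2, axis=1) / eps)
    k /= k.sum()                # normalized kernel row (Markov transition)
    return (k @ evecs) / evals  # psi_j(x_new) = <k, psi_j> / lambda_j
```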
A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning
We present a general framework of semi-supervised dimensionality reduction for manifold learning that naturally generalizes existing supervised and unsupervised learning frameworks based on spectral decomposition. Algorithms derived under our framework can employ both labeled and unlabeled examples and can handle complex problems where the data form separate clusters of manifolds. Our framework offers a simple unified view, explains the relationships among existing frameworks, and provides extensions that can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called the "KPCA trick" is proposed to handle nonlinear problems.
Comment: 22 pages, 9 figures
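A minimal sketch of the "KPCA trick" as described: run a linear spectral method on kernel-PCA coordinates instead of raw inputs. LDA stands in here for any linear method the framework covers; the parameters are illustrative:

```python
# Sketch of the "KPCA trick": make a linear spectral DR method nonlinear
# by running it on kernel-PCA coordinates rather than on raw inputs. LDA
# stands in for any linear method the framework covers; parameters are
# illustrative.
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
Z = KernelPCA(n_components=20, kernel="rbf", gamma=0.1).fit_transform(X)
Y_embed = LinearDiscriminantAnalysis(n_components=2).fit_transform(Z, y)
print(Y_embed.shape)  # (150, 2): a nonlinear supervised embedding
```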
Incomplete Pivoted QR-based Dimensionality Reduction
High-dimensional big data appears in many research fields such as image recognition, biology, and collaborative filtering. Often, exploring such data with classic algorithms runs into difficulties due to the 'curse of dimensionality'. Therefore, dimensionality reduction methods are applied to the data prior to its analysis. Many of these methods are based on principal component analysis, which is statistically driven: they map the data into a low-dimensional subspace that preserves significant statistical properties of the high-dimensional data. As a consequence, such methods do not directly address the geometry of the data, which is reflected by the mutual distances between data points. Thus, classification, anomaly detection, and other machine learning tasks may suffer.
This work provides a dictionary-based framework for geometrically driven data
analysis that includes dimensionality reduction, out-of-sample extension and
anomaly detection. It embeds high-dimensional data in a low-dimensional
subspace. This embedding preserves the original high-dimensional geometry of
the data up to a user-defined distortion rate. In addition, it identifies a
subset of landmark data points that constitute a dictionary for the analyzed
dataset. The dictionary enables a natural extension of the low-dimensional embedding to out-of-sample data points, which gives rise to a distortion-based criterion for anomaly detection. The suggested method is demonstrated on synthetic and real-world datasets and achieves good results in classification, anomaly detection, and out-of-sample tasks.
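The dictionary-selection ingredient can be sketched as follows, assuming SciPy's pivoted QR and a fixed dictionary size in place of the paper's user-defined distortion criterion:

```python
# Dictionary selection via incomplete pivoted QR: the first k pivot
# columns of a column-pivoted QR factorization of the data matrix act as
# landmark points (the dictionary). The paper stops pivoting at a
# user-defined distortion rate; a fixed k is used here instead.
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 1000))   # columns are data points in R^50

Q, R, piv = qr(X, pivoting=True, mode="economic")
k = 10                                # dictionary size (illustrative)
dictionary = X[:, piv[:k]]            # landmarks chosen by the pivoting
print(dictionary.shape)               # (50, 10)
```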
Locality preserving projection on SPD matrix Lie group: algorithm and analysis
Symmetric positive definite (SPD) matrices used as feature descriptors in
image recognition are usually high-dimensional. Traditional manifold learning applies only to vector-form data, so manifold learning algorithms cannot be used directly to reduce the dimension of high-dimensional SPD matrices in matrix form. The SPD
matrix must first be transformed into a long vector, and then the dimension of
this vector must be reduced. However, this approach breaks the spatial
structure of the SPD matrix space. To overcome this limitation, we propose a
new dimension reduction algorithm on SPD matrix space to transform
high-dimensional SPD matrices into low-dimensional SPD matrices. Our work is
based on the fact that the set of all SPD matrices with the same size has a Lie
group structure, and we aim to carry manifold learning over to the SPD matrix Lie group. We use the basic idea of the manifold learning algorithm
called locality preserving projection (LPP) to construct the corresponding
Laplacian matrix on the SPD matrix Lie group. Thus, we call our approach
Lie-LPP to emphasize its Lie group character. We present a detailed algorithm
analysis and show through experiments that Lie-LPP achieves effective results
on human action recognition and human face recognition.
Comment: 15 pages, 3 tables
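One ingredient of such an approach can be sketched as follows: distances between SPD matrices computed through the matrix logarithm (the log-Euclidean metric), which respect the Lie group structure and could drive the neighborhood graph behind an LPP-style Laplacian. This is an illustration, not the authors' exact construction:

```python
# One ingredient of an LPP-style method on the SPD Lie group: distances
# between SPD matrices taken in the Lie algebra via the matrix logarithm
# (log-Euclidean metric), rather than after naive vectorization. Such
# distances could drive the neighborhood graph behind the Laplacian; the
# paper's exact construction may differ.
import numpy as np
from scipy.linalg import logm

def log_euclidean_dist(A, B):
    """Log-Euclidean distance between two SPD matrices."""
    return np.linalg.norm(logm(A) - logm(B), ord="fro")

rng = np.random.default_rng(0)
M1, M2 = rng.standard_normal((2, 5, 5))
A = M1 @ M1.T + 5 * np.eye(5)   # random SPD matrices
B = M2 @ M2.T + 5 * np.eye(5)
print(log_euclidean_dist(A, B))
```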
Shamap: Shape-based Manifold Learning
Manifold learning assumes that high-dimensional sample points lie on a low-dimensional manifold. Usually, distances among samples are computed to capture the underlying data structure. Here we propose a metric based on angular changes along a geodesic line, thereby reflecting the underlying shape-oriented information, that is, a topological similarity between the high- and low-dimensional representations of a data cloud. Our results demonstrate the feasibility and merits of the proposed dimensionality reduction scheme.
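One possible reading of such an angle-based descriptor is the profile of turning angles along a path through neighboring samples, which can be compared between the high- and low-dimensional spaces; the construction below is an assumption for illustration, not necessarily the paper's exact metric:

```python
# One possible reading of an angle-based shape descriptor: turning angles
# between consecutive segments of a path through neighboring samples.
# Comparing the profiles computed in the high- and low-dimensional spaces
# probes whether the embedding preserved the shape of the data cloud.
import numpy as np

def turning_angles(path):
    """Angles (radians) between consecutive segments of an (n, d) path."""
    seg = np.diff(path, axis=0)
    seg /= np.linalg.norm(seg, axis=1, keepdims=True)
    cos = np.clip(np.sum(seg[:-1] * seg[1:], axis=1), -1.0, 1.0)
    return np.arccos(cos)

t = np.linspace(0, np.pi, 20)
arc = np.column_stack([np.cos(t), np.sin(t)])  # samples along a half circle
print(turning_angles(arc))                     # near-constant turning angle
```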
Principal Polynomial Analysis
This paper presents a new framework for manifold learning based on a sequence
of principal polynomials that capture the possibly nonlinear nature of the
data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by
modeling the directions of maximal variance by means of curves, instead of
straight lines. In contrast to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA has a number of interesting analytical properties.
First, PPA is a volume-preserving map, which in turn guarantees the existence
of the inverse. Second, such an inverse can be obtained in closed form.
Invertibility is an important advantage over other learning methods, because it permits interpreting the identified features in the input domain, where the data has physical meaning. Moreover, it allows evaluating the performance of dimensionality reduction in meaningful (input-domain) units. Volume preservation
also allows an easy computation of information theoretic quantities, such as
the reduction in multi-information after the transform. Third, the analytical
nature of PPA leads to a clear geometrical interpretation of the manifold: it
allows the computation of Frenet-Serret frames (local features) and of
generalized curvatures at any point of the space. And fourth, the analytical
Jacobian allows the computation of the metric induced by the data, thus
generalizing the Mahalanobis distance. These properties are demonstrated
theoretically and illustrated experimentally. The performance of PPA is
evaluated in dimensionality and redundancy reduction, in both synthetic and
real datasets from the UCI repository.
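A single PPA-style step can be sketched under simplifying assumptions: project the centered data onto its leading principal direction, then fit a polynomial that predicts the orthogonal residual from that projection, so the component becomes a curve rather than PCA's straight line:

```python
# One PPA-style step under simplifying assumptions: project the centered
# data onto its leading principal direction, then fit a polynomial that
# predicts the orthogonal residual from that projection. PPA iterates
# this on the residual; a single step with a degree-2 polynomial is shown.
import numpy as np

rng = np.random.default_rng(0)
t_true = rng.uniform(-2, 2, 500)
X = np.column_stack([t_true, 0.25 * t_true ** 2])
X += 0.05 * rng.standard_normal(X.shape)
X -= X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
alpha = Vt[0]                       # leading principal direction
t = X @ alpha                       # univariate projection

resid = X - np.outer(t, alpha)      # orthogonal residual
coeffs = np.polyfit(t, resid @ Vt[1], deg=2)  # the principal polynomial
print(coeffs)  # captures the quadratic bend a straight line would miss
```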
Curvature-aware Manifold Learning
Traditional manifold learning algorithms assume that the embedded manifold is globally or locally isometric to Euclidean space. Under this assumption, they divide the manifold into a set of overlapping local patches that are locally isometric to linear subsets of Euclidean space. Analyzing the global or local isometry assumptions shows that the learnt manifold is a flat manifold with zero Riemannian curvature tensor. In general, manifolds may not satisfy these hypotheses, and one major limitation of traditional manifold learning is that it does not consider the curvature information of the manifold. To remove this limitation, we present a curvature-aware manifold learning algorithm called CAML. Its purpose is to break the local isometry assumption and to reduce the dimension of general manifolds that are not isometric to Euclidean space. Thus, our method adds curvature information to the process of manifold learning. Experiments comparing neighborhood preserving ratios show that CAML is more stable than other manifold learning algorithms.
Comment: 24 pages, 4 figures
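The neighborhood preserving ratio used for this comparison can be computed roughly as follows (a common definition is assumed; the paper's exact variant may differ):

```python
# Neighborhood preserving ratio (a common definition is assumed): the
# average fraction of each point's k nearest neighbors in the original
# space that remain among its k nearest neighbors in the embedding.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_preserving_ratio(X, Y, k=10):
    idx_X = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(
        X, return_distance=False)[:, 1:]   # drop each point itself
    idx_Y = NearestNeighbors(n_neighbors=k + 1).fit(Y).kneighbors(
        Y, return_distance=False)[:, 1:]
    overlap = [len(set(a) & set(b)) for a, b in zip(idx_X, idx_Y)]
    return np.mean(overlap) / k
```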
Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds
In this paper, we investigate dimensionality reduction (DR) maps in an
information retrieval setting from a quantitative topology point of view. In
particular, we show that no DR maps can achieve perfect precision and perfect
recall simultaneously. Thus a continuous DR map must have imperfect precision.
We further prove an upper bound on the precision of Lipschitz continuous DR
maps. While precision is a natural measure in an information retrieval setting, it does not measure 'how' wrong the retrieved data is. We therefore propose a new measure based on the Wasserstein distance that comes with a similar theoretical guarantee. A key technical step in our proofs is a particular optimization problem of the Wasserstein distance over a constrained set of distributions. We provide a complete solution to this optimization problem, which can be of independent interest on the technical side.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montreal, Canada
Linearly-Recurrent Autoencoder Networks for Learning Dynamics
This paper describes a method for learning low-dimensional approximations of
nonlinear dynamical systems, based on neural-network approximations of the
underlying Koopman operator. Extended Dynamic Mode Decomposition (EDMD)
provides a useful data-driven approximation of the Koopman operator for
analyzing dynamical systems. This paper addresses a fundamental problem
associated with EDMD: a trade-off between representational capacity of the
dictionary and over-fitting due to insufficient data. A new neural network
architecture combining an autoencoder with linear recurrent dynamics in the
encoded state is used to learn a low-dimensional and highly informative
Koopman-invariant subspace of observables. A method is also presented for
balanced model reduction of over-specified EDMD systems in feature space.
Nonlinear reconstruction using partially linear multi-kernel regression aims to
improve reconstruction accuracy from the low-dimensional state when the data
has complex but intrinsically low-dimensional structure. The techniques
demonstrate the ability to identify Koopman eigenfunctions of the unforced
Duffing equation, create accurate low-dimensional models of an unstable
cylinder wake flow, and make short-time predictions of the chaotic
Kuramoto-Sivashinsky equation.
Comment: 37 pages, 16 figures
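A minimal PyTorch sketch of the described architecture, with illustrative layer sizes: an autoencoder whose latent state is advanced by a learned linear operator, trained with a reconstruction plus one-step linear-prediction loss (the paper's full training procedure is more elaborate):

```python
# Minimal PyTorch sketch of the described architecture: an autoencoder
# whose latent state advances by a learned *linear* operator K, trained
# with reconstruction plus one-step linear-prediction losses. Layer sizes
# are illustrative; the paper's training procedure is more elaborate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearlyRecurrentAE(nn.Module):
    def __init__(self, n_state=128, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_state, 64), nn.Tanh(),
                                     nn.Linear(64, n_latent))
        self.K = nn.Linear(n_latent, n_latent, bias=False)  # linear dynamics
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.Tanh(),
                                     nn.Linear(64, n_state))

    def loss(self, x_t, x_next):
        z_t = self.encoder(x_t)
        recon = F.mse_loss(self.decoder(z_t), x_t)             # autoencoding
        pred = F.mse_loss(self.decoder(self.K(z_t)), x_next)   # advance linearly
        return recon + pred
```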