ManifoldNet: A Deep Network Framework for Manifold-valued Data
Deep neural networks have become the main workhorse for many tasks involving
learning from data in a variety of applications in Science and Engineering.
Traditionally, the input to these networks lies in a vector space, and the
operations employed within the network are well defined on vector spaces. In
the recent past, due to technological advances in sensing, it has become
possible to acquire manifold-valued data sets either directly or indirectly.
Examples include but are not limited to data from omnidirectional cameras on
automobiles, drones etc., synthetic aperture radar imaging, diffusion magnetic
resonance imaging, elastography and conductance imaging in the Medical Imaging
domain, among others. Thus, there is a need to generalize deep neural networks
to cope with input data that reside on curved manifolds where vector space
operations are not naturally admissible. In this paper, we present a novel
theoretical framework to generalize the widely popular convolutional neural
networks (CNNs) to high dimensional manifold-valued data inputs. We call these
networks ManifoldNets.
In ManifoldNets, convolution operation on data residing on Riemannian
manifolds is achieved via a provably convergent recursive computation of the
weighted Fr\'{e}chet Mean (wFM) of the given data, where the weights make up
the convolution mask to be learned. Further, we prove that the proposed wFM layer
achieves a contraction mapping and hence ManifoldNet does not need the
non-linear ReLU unit used in standard CNNs. We present experiments, using the
ManifoldNet framework, to achieve dimensionality reduction by computing the
principal linear subspaces that naturally reside on a Grassmannian. The
experimental results demonstrate the efficacy of ManifoldNets in the context of
classification and reconstruction accuracy.
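The recursive wFM computation described above can be sketched concretely on the unit sphere, where the geodesic interpolation needed at each step has a closed form (slerp). This is an illustrative sketch under assumptions of ours (choice of manifold, function names, and fixed weights standing in for the learned convolution mask), not the paper's implementation:

```python
import numpy as np

def slerp(p, q, t):
    """Geodesic interpolation between unit vectors p and q on the sphere."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:  # points coincide; the geodesic is a single point
        return p
    return (np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)

def recursive_wfm(points, weights):
    """Incremental weighted Frechet mean estimator on the unit sphere.

    Each new point pulls the running estimate along the connecting geodesic
    by a fraction equal to its share of the cumulative weight, mirroring the
    recursive estimator used for the wFM convolution layer.
    """
    m = points[0]
    w_sum = weights[0]
    for x, w in zip(points[1:], weights[1:]):
        w_sum += w
        m = slerp(m, x, w / w_sum)
    return m
```

For two equally weighted points the estimator returns the geodesic midpoint, which on the sphere is the normalized chord midpoint.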
Towards Distortion-Predictable Embedding of Neural Networks
Current research in Computer Vision has shown that Convolutional Neural
Networks (CNNs) give state-of-the-art performance in many classification tasks
and Computer Vision problems. The embedding of a CNN, which is the internal
representation produced by the last layer, can indirectly learn topological and
relational properties. Moreover, by using a suitable loss function, CNN models
can learn invariance to a wide range of non-linear distortions such as
rotation, viewpoint angle or lighting condition. In this work, new insights are
discovered about CNN embeddings, and a new loss function derived from the
contrastive loss is proposed that creates models with more predictable mappings
and also quantifies distortions. In typical distortion-dependent methods, there
is no simple relation between the features of one image and the features of a
distorted version of that image. These methods therefore require feeding inputs
forward under every distortion in order to find the corresponding feature
representations. Our contribution makes a step towards embeddings where the
features of distorted inputs are related and can be derived from each other by
the intensity of the distortion.
Comment: 54 pages, 28 figures. Master project at EPFL (Switzerland) in 2015.
For source code on GitHub, see https://github.com/axel-angel/master-projec
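The loss proposed above is derived from the standard contrastive loss, which can be sketched as follows; this is the baseline formulation (Hadsell et al. style), not the paper's distortion-quantifying variant, and the function name and margin value are illustrative:

```python
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """Standard contrastive loss on a batch of embedding pairs: similar
    pairs (same=1) are pulled together, dissimilar pairs (same=0) are
    pushed apart up to `margin`."""
    d = np.linalg.norm(f1 - f2, axis=1)                   # pairwise distances
    pos = same * d ** 2                                   # attract similar pairs
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2   # repel dissimilar pairs
    return np.mean(pos + neg)
```

The paper's variant would add terms relating the distance between features to the intensity of the applied distortion, making the mapping predictable rather than merely invariant.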
Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Due to large variations in shape, appearance, and viewing conditions, object
recognition is a key precursory challenge in the fields of object manipulation
and robotic/AI visual reasoning in general. Recognizing object categories,
particular instances of objects and viewpoints/poses of objects are three
critical subproblems robots must solve in order to accurately grasp/manipulate
objects and reason about their environments. Multi-view images of the same
object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g.
visual/depth descriptor spaces). These object manifolds share the same topology
despite being geometrically different. Each object manifold can be represented
as a deformed version of a unified manifold. Each object manifold can thus be
parameterized by its homeomorphic mapping/reconstruction from the unified
manifold. In this work, we develop a novel framework to jointly solve the three
challenging recognition sub-problems by explicitly modeling the deformations
of object manifolds and factorizing them in a view-invariant space for
recognition. We perform extensive experiments on several challenging datasets
and achieve state-of-the-art results.
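The idea of parameterizing each object manifold by a mapping from a unified manifold can be sketched in a toy linear form: take the unified view manifold to be a unit circle and fit, per object, the least-squares map that deforms it into that object's observed view features. All names here are illustrative assumptions; the actual framework uses richer (e.g. nonlinear) mappings:

```python
import numpy as np

def unified_view_circle(n_views):
    """Unified representation of a 1-D view manifold: n_views points
    sampled on a unit circle (same topology for every object)."""
    t = np.linspace(0, 2 * np.pi, n_views, endpoint=False)
    return np.stack([np.cos(t), np.sin(t)], axis=1)   # shape (n_views, 2)

def fit_manifold_deformation(features, circle):
    """Least-squares linear map A deforming the unified circle into one
    object's observed view features: features ~= circle @ A.
    vec(A) can then serve as a view-invariant object descriptor, while the
    position on the circle encodes the pose."""
    A, *_ = np.linalg.lstsq(circle, features, rcond=None)
    return A
```

Factoring the data this way separates the object identity (the deformation A) from the viewpoint (the coordinate on the unified manifold), which is the joint recognition/pose-estimation structure the abstract describes.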
Dimensionality Reduction on SPD Manifolds: The Emergence of Geometry-Aware Methods
Representing images and videos with Symmetric Positive Definite (SPD)
matrices, and considering the Riemannian geometry of the resulting space, has
been shown to yield high discriminative power in many visual recognition tasks.
Unfortunately, computation on the Riemannian manifold of SPD matrices,
especially high-dimensional ones, comes at a high cost that limits the
applicability of existing techniques. In this paper, we introduce algorithms
able to handle high-dimensional SPD matrices by constructing a
lower-dimensional SPD manifold. To this end, we propose to model the mapping
from the high-dimensional SPD manifold to the low-dimensional one with an
orthonormal projection. This lets us formulate dimensionality reduction as the
problem of finding a projection that yields a low-dimensional manifold either
with maximum discriminative power in the supervised scenario, or with maximum
variance of the data in the unsupervised one. We show that learning can be
expressed as an optimization problem on a Grassmann manifold and discuss fast
solutions for special cases. Our evaluation on several classification tasks
evidences that our approach leads to a significant accuracy gain over
state-of-the-art methods.
Comment: arXiv admin note: text overlap with arXiv:1407.112
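The core mapping above is simple to state in code: an orthonormal projection W sends a high-dimensional SPD matrix X to the lower-dimensional SPD matrix W.T @ X @ W, and learning searches over such W (an optimization on the Grassmann manifold). The sketch below shows only the mapping and a random initialization; the discriminative/variance-maximizing objectives are not reproduced:

```python
import numpy as np

def project_spd(X, W):
    """Map a high-dimensional SPD matrix X to a lower-dimensional one via
    an orthonormal projection W (columns orthonormal): Y = W.T @ X @ W.
    Y remains SPD whenever X is SPD and W has full column rank."""
    return W.T @ X @ W

def random_orthonormal(n, m, seed=0):
    """A random n x m matrix with orthonormal columns (a point on the
    Stiefel manifold), usable as an initialization for the learned map."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
    return Q[:, :m]
```

Because congruence by a full-column-rank matrix preserves positive definiteness, the low-dimensional representation stays on an SPD manifold, so downstream Riemannian machinery still applies.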
Elastic Functional Coding of Riemannian Trajectories
Visual observations of dynamic phenomena, such as human actions, are often
represented as sequences of smoothly varying features. In cases where the
feature spaces can be structured as Riemannian manifolds, the corresponding
representations become trajectories on manifolds. Analysis of these
trajectories is challenging due to the non-linearity of the underlying spaces
and the high dimensionality of the trajectories. In vision problems, given the nature of
physical systems involved, these phenomena are better characterized on a
low-dimensional manifold compared to the space of Riemannian trajectories. For
instance, if one does not impose physical constraints of the human body, in
data involving human action analysis, the resulting representation space will
have highly redundant features. Learning an effective, low-dimensional
embedding for action representations will have a huge impact in the areas of
search and retrieval, visualization, learning, and recognition. The difficulty
lies in inherent non-linearity of the domain and temporal variability of
actions that can distort any traditional metric between trajectories. To
overcome these issues, we use the framework based on transported square-root
velocity fields (TSRVF); this framework has several desirable properties,
including a rate-invariant metric and vector space representations. We propose
to learn an embedding such that each action trajectory is mapped to a single
point in a low-dimensional Euclidean space, and the trajectories that differ
only in temporal rates map to the same point. We utilize the TSRVF
representation, and accompanying statistical summaries of Riemannian
trajectories, to extend existing coding methods such as PCA, KSVD and Label
Consistent KSVD to Riemannian trajectories or more generally to Riemannian
functions.
Comment: Under major revision at IEEE T-PAMI, 201
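The TSRVF representation mentioned above reduces, for curves in Euclidean space, to the square-root velocity function, which can be sketched directly; the transport step needed on a general Riemannian manifold is omitted here, and the function name and discretization are our assumptions:

```python
import numpy as np

def srvf(curve, dt=1.0):
    """Square-root velocity function of a sampled curve (the Euclidean
    special case of the TSRVF): q = v / sqrt(|v|), with v the velocity.
    Reparameterizing the curve acts on q by an isometry, which is what
    makes the induced metric rate-invariant."""
    v = np.gradient(curve, dt, axis=0)                 # finite-difference velocity
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-12))       # guard zero-speed samples
```

Once trajectories live in this vector-space representation, standard coding tools (PCA, KSVD, Label Consistent KSVD) can be applied, which is exactly the extension the abstract proposes.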
Compact Nonlinear Maps and Circulant Extensions
Kernel approximation via nonlinear random feature maps is widely used in
speeding up kernel machines. There are two main challenges for the conventional
kernel approximation methods. First, before performing kernel approximation, a
good kernel has to be chosen. Picking a good kernel is a very challenging
problem in itself. Second, high-dimensional maps are often required in order to
achieve good performance. This leads to high computational cost in both
generating the nonlinear maps, and in the subsequent learning and prediction
process. In this work, we propose to optimize the nonlinear maps directly with
respect to the classification objective in a data-dependent fashion. The
proposed approach achieves kernel approximation and kernel learning in a joint
framework. This leads to much more compact maps without hurting the
performance. As a by-product, the same framework can also be used to achieve
more compact kernel maps to approximate a known kernel. We also introduce
Circulant Nonlinear Maps, which use a circulant-structured projection matrix
to speed up the nonlinear maps for high-dimensional data.
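The speed-up from a circulant projection comes from the FFT: multiplying by a d x d circulant matrix is a circular convolution, costing O(d log d) instead of O(d^2). The sketch below shows the projection plus a cosine nonlinearity in random-Fourier-feature style; the exact nonlinearity and any sign-flipping used in the paper are not reproduced:

```python
import numpy as np

def circulant_project(c, x):
    """Multiply x by the circulant matrix whose first column is c, in
    O(d log d) time via the FFT (circular convolution theorem)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_feature_map(c, x):
    """A circulant nonlinear map sketch: circulant projection followed by
    a cosine nonlinearity (random-Fourier-feature style)."""
    return np.cos(circulant_project(c, x))
```

With c equal to the first standard basis vector the circulant matrix is the identity, and with c equal to the second basis vector it is a cyclic shift, which gives a quick correctness check.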
Locality preserving projection on SPD matrix Lie group: algorithm and analysis
Symmetric positive definite (SPD) matrices used as feature descriptors in
image recognition are usually high dimensional. Traditional manifold learning
is only applicable for reducing the dimension of high-dimensional vector-form
data. For high-dimensional SPD matrices, directly using manifold learning
algorithms to reduce the dimension of matrix-form data is impossible. The SPD
matrix must first be transformed into a long vector, and then the dimension of
this vector must be reduced. However, this approach breaks the spatial
structure of the SPD matrix space. To overcome this limitation, we propose a
new dimension reduction algorithm on SPD matrix space to transform
high-dimensional SPD matrices into low-dimensional SPD matrices. Our work is
based on the fact that the set of all SPD matrices with the same size has a Lie
group structure, and we aim to transform the manifold learning to the SPD
matrix Lie group. We use the basic idea of the manifold learning algorithm
called locality preserving projection (LPP) to construct the corresponding
Laplacian matrix on the SPD matrix Lie group. Thus, we call our approach
Lie-LPP to emphasize its Lie group character. We present a detailed algorithm
analysis and show through experiments that Lie-LPP achieves effective results
on human action recognition and human face recognition.
Comment: 15 pages, 3 table
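The LPP construction at the heart of the method builds a graph Laplacian from pairwise distances with heat-kernel weights. A minimal sketch, assuming a precomputed distance matrix (on the SPD Lie group this would be a geodesic distance) and illustrative choices of k and t:

```python
import numpy as np

def heat_kernel_laplacian(dists, k=3, t=1.0):
    """Graph Laplacian as used by locality preserving projection (LPP):
    connect each point to its k nearest neighbours with heat-kernel
    weights w_ij = exp(-d_ij**2 / t), symmetrize, then L = D - W with D
    the diagonal degree matrix."""
    n = dists.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:k + 1]      # skip self (distance 0)
        W[i, nbrs] = np.exp(-dists[i, nbrs] ** 2 / t)
    W = np.maximum(W, W.T)                        # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    return D - W
```

Lie-LPP would then minimize a locality-preserving objective built from this L over projections of the SPD matrices, preserving the neighbourhood structure of the Lie group.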
Image Representation Learning Using Graph Regularized Auto-Encoders
We consider the problem of image representation for the tasks of unsupervised
learning and semi-supervised learning. In those tasks, the raw image vectors
may not adequately represent their intrinsic structures because of their
highly dense feature space. To overcome this problem, the raw
image vectors should be mapped to a proper representation space which can
capture the latent structure of the original data and represent the data
explicitly for further learning tasks such as clustering.
Inspired by recent research on deep neural networks and representation
learning, in this paper we introduce the multi-layer auto-encoder into image
representation. We also apply the locally invariant idea to our image
representation with auto-encoders and propose a novel method, called Graph
regularized Auto-Encoder (GAE). GAE can provide a compact
representation which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure.
Extensive experiments on image clustering show encouraging results of the
proposed algorithm in comparison to the state-of-the-art algorithms on
real-world cases.
Comment: 9page
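The graph-regularized objective can be sketched as a reconstruction loss plus a Laplacian smoothness penalty on the hidden codes. The function name, argument layout, and the weighting `lam` below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gae_loss(X, X_rec, H, L, lam=0.1):
    """Sketch of a graph-regularized auto-encoder objective: squared
    reconstruction error plus a Laplacian penalty tr(H^T L H) that keeps
    hidden codes of neighbouring images close on the data graph.

    X: inputs, X_rec: reconstructions, H: hidden codes, L: graph Laplacian.
    """
    recon = np.sum((X - X_rec) ** 2)
    smooth = np.trace(H.T @ L @ H)     # penalizes codes that differ across graph edges
    return recon + lam * smooth
```

The Laplacian term is what makes the learned representation respect the intrinsic geometric structure: codes of images connected in the graph are pulled together, exactly the "locally invariant idea" the abstract invokes.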
A survey of dimensionality reduction techniques
Experimental life sciences like biology and chemistry have seen, in recent
decades, an explosion of the data available from experiments. Laboratory
instruments have become more and more complex and report hundreds or thousands
of measurements for a single experiment, and therefore statistical methods face
challenging tasks when dealing with such high-dimensional data. However, much
of the data is highly redundant and can be efficiently brought down to a much
smaller number of variables without a significant loss of information. The
mathematical procedures making possible this reduction are called
dimensionality reduction techniques; they have widely been developed by fields
like Statistics or Machine Learning, and are currently a hot research topic. In
this review we categorize the plethora of dimension reduction techniques
available and give the mathematical insight behind them.
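As a concrete instance of the techniques such a survey covers, principal component analysis, the canonical linear dimensionality reduction, fits in a few lines; this sketch is ours, not taken from the survey:

```python
import numpy as np

def pca(X, k):
    """Principal component analysis: project centred data onto the top-k
    principal axes, i.e. the leading right singular vectors of the
    centred data matrix."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

On highly redundant data, most of the variance concentrates in the first few components, which is precisely the property that lets high-dimensional measurements be brought down to a much smaller number of variables.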
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent
representations of speech by applying autoencoding neural networks to speech
waveforms. The goal is to learn a representation able to capture high level
semantic content from the signal, e.g.\ phoneme identities, while being
invariant to confounding low level details in the signal such as the underlying
pitch contour or background noise. Since the learned representation is tuned to
contain only phonetic content, we resort to using a high capacity WaveNet
decoder to infer information discarded by the encoder from previous samples.
Moreover, the behavior of autoencoder models depends on the kind of constraint
that is applied to the latent representation. We compare three variants: a
simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder
(VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of
learned representations in terms of speaker independence, the ability to
predict phonetic content, and the ability to accurately reconstruct individual
spectrogram frames. Moreover, for discrete encodings extracted using the
VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a
regularization scheme that forces the representations to focus on the phonetic
content of the utterance and report performance comparable with the top entries
in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Comment: Accepted to IEEE TASLP, final version available at
http://dx.doi.org/10.1109/TASLP.2019.293886
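Of the three bottlenecks compared above, the VQ-VAE one is the most distinctive: each encoder output is snapped to its nearest entry in a learned codebook, yielding discrete codes. The forward pass can be sketched as follows (the straight-through gradient trick used in training is omitted, and the names are illustrative):

```python
import numpy as np

def vector_quantize(z, codebook):
    """VQ-VAE bottleneck, forward pass only: replace each encoder output
    vector with its nearest codebook entry.

    z: (n, d) encoder outputs; codebook: (K, d) learned code vectors.
    Returns the quantized vectors and the discrete code indices."""
    # squared distances from every z to every codebook entry, shape (n, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # discrete code assignment per frame
    return codebook[idx], idx
```

The discrete indices are what the paper maps to phonemes when measuring how well the learned units align with phonetic content.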