Riemannian kernel based Nyström method for approximate infinite-dimensional covariance descriptors with application to image set classification
In the domain of pattern recognition, using the CovDs (Covariance
Descriptors) to represent data and taking the metrics of the resulting
Riemannian manifold into account have been widely adopted for the task of image
set classification. Recently, it has been proven that infinite-dimensional
CovDs are more discriminative than their low-dimensional counterparts. However,
the form of infinite-dimensional CovDs is implicit and the computational load
is high. We propose a novel framework for representing image sets by
approximating infinite-dimensional CovDs in the paradigm of the Nyström
method based on a Riemannian kernel. We start by modeling the images via CovDs,
which lie on the Riemannian manifold spanned by SPD (Symmetric Positive
Definite) matrices. We then extend the Nyström method to the SPD manifold and
obtain the approximations of CovDs in RKHS (Reproducing Kernel Hilbert Space).
Finally, we approximate infinite-dimensional CovDs via these approximations.
Empirically, we apply our framework to the task of image set classification.
The experimental results obtained on three benchmark datasets show that our
proposed approximate infinite-dimensional CovDs outperform the original CovDs.
Comment: 6 pages, 3 figures, International Conference on Pattern Recognition 201
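The Nyström pipeline described in this abstract can be sketched as follows. This is a minimal illustration only, assuming a log-Euclidean Gaussian kernel on SPD matrices (one standard Riemannian kernel choice); the function names, the jitter term, and the landmark-selection strategy are assumptions for the sketch, not the paper's exact design.

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def log_euclidean_kernel(X, Y, sigma=1.0):
    # Log-Euclidean Gaussian kernel between two SPD matrices:
    # k(X, Y) = exp(-||log(X) - log(Y)||_F^2 / (2 sigma^2))
    d = np.linalg.norm(logm(X) - logm(Y), 'fro')
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def nystrom_features(spd_mats, landmarks, sigma=1.0):
    # C[i, j] = k(X_i, Z_j); W[j, l] = k(Z_j, Z_l)
    C = np.array([[log_euclidean_kernel(X, Z, sigma) for Z in landmarks]
                  for X in spd_mats])
    W = np.array([[log_euclidean_kernel(Zi, Zj, sigma) for Zj in landmarks]
                  for Zi in landmarks])
    # Explicit finite-dimensional features whose inner products
    # approximate the kernel: phi = C @ W^{-1/2}
    m = len(landmarks)
    W_inv_sqrt = fractional_matrix_power(W + 1e-10 * np.eye(m), -0.5)
    return C @ W_inv_sqrt.real
```

The landmark set would typically be chosen by sampling or clustering the training CovDs; the resulting finite-dimensional features can then be fed to any Euclidean classifier.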
The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning
Many tasks in computer vision and machine learning
benefit from representations of data that are compact yet
discriminative, informative, and robust to adverse measurement conditions.
Two notable representations are offered by Region Covariance
Descriptors (RCovD) and linear subspaces which are naturally
analyzed through the manifold of Symmetric Positive Definite
(SPD) matrices and the Grassmann manifold, respectively, two
widely used types of Riemannian manifolds in computer vision.
As our first objective, we examine image and video-based
recognition applications where the local descriptors have the
aforementioned Riemannian structures, namely the SPD or linear
subspace structure. First, we provide a solution to compute a
Riemannian version of the conventional Vector of Locally
Aggregated Descriptors (VLAD), using the geodesic distance of the
underlying manifold as the nearness measure. Next, by taking a
closer look at the resulting codes, we formulate a new concept
which we name Local Difference Vectors (LDVs). LDVs enable us to
elegantly extend our Riemannian coding techniques to arbitrary
metrics, and to provide intrinsic solutions to Riemannian sparse
coding and its variants when structured local descriptors are
considered.
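The Riemannian VLAD idea above can be sketched concretely. This is a minimal sketch assuming the log-Euclidean metric on SPD local descriptors (the thesis treats geodesic distances more generally); the normalization steps and function names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import logm

def riemannian_vlad(descriptors, codebook):
    """Sketch of a log-Euclidean VLAD for SPD local descriptors.

    Each SPD descriptor is assigned to the nearest codeword under the
    log-Euclidean geodesic distance, and residuals are accumulated in
    the flat log-domain (tangent space)."""
    logs = [logm(X) for X in descriptors]
    clogs = [logm(C) for C in codebook]
    d = len(codebook[0])
    vlad = np.zeros((len(codebook), d * d))
    for lx in logs:
        # Nearest codeword w.r.t. the log-Euclidean geodesic distance
        dists = [np.linalg.norm(lx - lc, 'fro') for lc in clogs]
        k = int(np.argmin(dists))
        vlad[k] += (lx - clogs[k]).ravel()  # a local difference vector
    v = vlad.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))     # power normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The accumulated per-codeword residuals are exactly the kind of local difference vectors the text describes; swapping in another metric only changes the distance and residual computations.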
We then turn our attention to two special types of covariance
descriptors, namely infinite-dimensional RCovDs and rank-deficient
covariance matrices, for which the underlying Riemannian
structure, i.e., the manifold of SPD matrices, is largely out of
reach. Generally speaking, infinite-dimensional RCovDs offer
better discriminative power than their low-dimensional
counterparts.
To overcome this difficulty, we propose to approximate the
infinite-dimensional RCovDs by making use of two feature
mappings, namely random Fourier features and the Nyström method.
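The random-Fourier-feature route can be sketched as follows. This assumes, for illustration, a Gaussian kernel applied in the log-Euclidean tangent domain of SPD matrices; the feature count, bandwidth, and function names are assumptions of this sketch rather than the thesis's exact construction.

```python
import numpy as np
from scipy.linalg import logm

def rff_features(spd_mats, num_features=128, sigma=1.0, seed=0):
    """Sketch: random Fourier features for a Gaussian kernel in the
    log-Euclidean domain of SPD matrices, so that approximately
    z(X)^T z(Y) ~ exp(-||log X - log Y||_F^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = spd_mats[0].shape[0]
    # Sample frequencies from the kernel's spectral density (Gaussian)
    W = rng.standard_normal((num_features, d * d)) / sigma
    b = rng.uniform(0, 2 * np.pi, num_features)
    # Vectorize each SPD matrix in the tangent (log) domain
    V = np.stack([logm(X).ravel() for X in spd_mats])
    return np.sqrt(2.0 / num_features) * np.cos(V @ W.T + b)
```

Unlike Nyström features, these random features are data-independent, which trades some approximation quality for a one-off sampling cost.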
As for rank-deficient covariance matrices, unlike most
existing approaches, which employ inference tools with predefined
regularizers, we derive positive definite kernels that can be
decomposed into kernels on the cone of SPD matrices and
kernels on the Grassmann manifold, and we show their effectiveness
for the image set classification task.
Furthermore, inspired by the attractive properties of Riemannian
optimization techniques, we extend the recently introduced Keep
It Simple and Straightforward MEtric learning (KISSME) method to
scenarios where the input data is non-linearly distributed. To
this end, we make use of infinite-dimensional covariance
matrices and propose techniques for projecting onto the
positive cone in a Reproducing Kernel Hilbert Space (RKHS).
We also address the sensitivity of KISSME to the input
dimensionality. The algorithm depends heavily on
Principal Component Analysis (PCA) as a preprocessing step, which
can lead to difficulties, especially when the dimensionality is
not set carefully.
To address this issue, based on the KISSME algorithm, we develop
a Riemannian framework to jointly learn a mapping performing
dimensionality reduction and a metric in the induced space.
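To make the KISSME discussion concrete, here is a minimal Euclidean sketch of the core estimator and the positive-cone projection; the kernelized and joint dimensionality-reduction variants described above build on this same structure. Function names and the toy pair construction are assumptions of this sketch.

```python
import numpy as np

def kissme_metric(sim_pairs, dis_pairs):
    """Core KISSME estimator (sketch): the Mahalanobis matrix is the
    difference of inverse covariances of pairwise differences over
    similar and dissimilar pairs, then projected onto the positive
    semidefinite cone by clipping negative eigenvalues (the Euclidean
    analogue of the RKHS cone projection discussed in the text)."""
    def cov_of_diffs(pairs):
        D = np.stack([a - b for a, b in pairs])
        return D.T @ D / len(D)
    M = np.linalg.inv(cov_of_diffs(sim_pairs)) - \
        np.linalg.inv(cov_of_diffs(dis_pairs))
    # Projection onto the PSD cone
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0, None)) @ V.T

def mahalanobis(M, x, y):
    d = x - y
    return float(d @ M @ d)
```

The PCA sensitivity noted above enters because this estimator requires both difference covariances to be well-conditioned, which in practice is enforced by reducing the input dimensionality first.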
Lastly, in line with the recent trend in metric learning, we
devise end-to-end learning of a generic deep network for metric
learning using our derivations.
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
Transfer operators such as the Perron--Frobenius or Koopman operator play an
important role in the global analysis of complex dynamical systems. The
eigenfunctions of these operators can be used to detect metastable sets, to
project the dynamics onto the dominant slow processes, or to separate
superimposed signals. We extend transfer operator theory to reproducing kernel
Hilbert spaces and show that these operators are related to Hilbert space
representations of conditional distributions, known as conditional mean
embeddings in the machine learning community. Moreover, numerical methods to
compute empirical estimates of these embeddings are akin to data-driven methods
for the approximation of transfer operators such as extended dynamic mode
decomposition and its variants. One main benefit of the presented kernel-based
approaches is that these methods can be applied to any domain where a
similarity measure given by a kernel is available. We illustrate the results
with the aid of guiding examples and highlight potential applications in
molecular dynamics as well as video and text data analysis.
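The connection to extended dynamic mode decomposition mentioned in the abstract can be sketched numerically: from snapshot pairs, a regularized Gram-matrix quotient serves as an empirical transfer-operator matrix whose eigenvalues approximate Koopman eigenvalues. The Gaussian kernel, the regularization scheme, and the function name here are illustrative assumptions.

```python
import numpy as np

def kernel_edmd_eigs(X, Y, sigma=1.0, reg=1e-8):
    """Sketch of kernel-based EDMD: estimate Koopman eigenvalues from
    snapshot pairs (x_i, y_i = x_{i+1}) via Gaussian-kernel Gram matrices.
    The regularized product (G_xx + reg*n*I)^{-1} G_xy plays the role
    of the empirical operator matrix."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    n = len(X)
    G_xx = gram(X, X)
    G_xy = gram(X, Y)
    K_hat = np.linalg.solve(G_xx + reg * n * np.eye(n), G_xy)
    return np.linalg.eigvals(K_hat)
```

Because only Gram matrices enter, the same code applies to any domain with a kernel, which is exactly the flexibility the abstract highlights.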
Non-Linear Temporal Subspace Representations for Activity Recognition
Representations that can compactly and effectively capture the temporal
evolution of semantic content are important to computer vision and machine
learning algorithms that operate on multi-variate time-series data. We
investigate such representations motivated by the task of human action
recognition. Here each data instance is encoded by a multivariate feature (such
as via a deep CNN) where action dynamics are characterized by their variations
in time. As these features are often non-linear, we propose a novel pooling
method, kernelized rank pooling, that represents a given sequence compactly as
the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert
space, projections of data onto which capture their temporal order. We develop
this idea further and show that such a pooling scheme can be cast as an
order-constrained kernelized PCA objective. We then propose to use the
parameters of a kernelized low-rank feature subspace as the representation of
the sequences. We cast our formulation as an optimization problem on
generalized Grassmann manifolds and then solve it efficiently using Riemannian
optimization techniques. We present experiments on several action recognition
datasets using diverse feature modalities and demonstrate state-of-the-art
results.
Comment: Accepted at the IEEE International Conference on Computer Vision and
Pattern Recognition, CVPR, 2018. arXiv admin note: substantial text overlap
with arXiv:1705.0858
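The rank-pooling idea underlying this abstract can be sketched in its simplest linear form: learn hyperplane parameters whose projection of the frame features preserves temporal order, and use those parameters as the sequence representation. The paper's kernelized, subspace-based variants generalize this; the least-squares formulation, ridge term, and names below are assumptions of this sketch.

```python
import numpy as np

def rank_pool(seq):
    """Linear rank pooling (sketch): fit w so that the projection
    w^T x_t tracks the frame index t, here via ridge-regularized
    least squares; w then summarizes the sequence's dynamics."""
    T, d = seq.shape
    t = np.arange(1, T + 1, dtype=float)
    lam = 1e-3  # small ridge term for numerical stability
    # w = argmin ||seq @ w - t||^2 + lam * ||w||^2
    w = np.linalg.solve(seq.T @ seq + lam * np.eye(d), seq.T @ t)
    return w
```

Replacing the raw features with kernel mappings, and the single hyperplane with a low-rank subspace optimized on a generalized Grassmann manifold, yields the non-linear temporal subspace representations the paper proposes.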
Learning by correlation for computer vision applications: from Kernel methods to deep learning
Learning to spot analogies and differences within and across visual categories is an arguably powerful approach in machine learning and pattern recognition, directly inspired by human cognition. In this thesis, we investigate a variety of approaches that are primarily driven by correlation, and we tackle several computer vision applications.