3,714 research outputs found
Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences
This paper introduces sparse coding and dictionary learning for Symmetric
Positive Definite (SPD) matrices, which are often used in machine learning,
computer vision and related areas. Unlike traditional sparse coding schemes
that work in vector spaces, in this paper we discuss how SPD matrices can be
described by sparse combination of dictionary atoms, where the atoms are also
SPD matrices. We propose to seek sparse coding by embedding the space of SPD
matrices into Hilbert spaces through two types of Bregman matrix divergences.
This not only leads to an efficient way of performing sparse coding, but also
an online and iterative scheme for dictionary learning. We apply the proposed
methods to several computer vision tasks where images are represented by region
covariance matrices. Our proposed algorithms outperform state-of-the-art
methods on a wide range of classification tasks, including face recognition,
action recognition, material classification and texture categorization
Second-order Temporal Pooling for Action Recognition
Deep learning models for video-based action recognition usually generate
features for short clips (consisting of a few frames); such clip-level features
are aggregated to video-level representations by computing statistics on these
features. Typically zero-th (max) or the first-order (average) statistics are
used. In this paper, we explore the benefits of using second-order statistics.
Specifically, we propose a novel end-to-end learnable feature aggregation
scheme, dubbed temporal correlation pooling that generates an action descriptor
for a video sequence by capturing the similarities between the temporal
evolution of clip-level CNN features computed across the video. Such a
descriptor, while being computationally cheap, also naturally encodes the
co-activations of multiple CNN features, thereby providing a richer
characterization of actions than their first-order counterparts. We also
propose higher-order extensions of this scheme by computing correlations after
embedding the CNN features in a reproducing kernel Hilbert space. We provide
experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained
datasets such as MPII Cooking activities and JHMDB, as well as the recent
Kinetics-600. Our results demonstrate the advantages of higher-order pooling
schemes that when combined with hand-crafted features (as is standard practice)
achieves state-of-the-art accuracy.Comment: Accepted in the International Journal of Computer Vision (IJCV
Boosted Multiple Kernel Learning for First-Person Activity Recognition
Activity recognition from first-person (ego-centric) videos has recently
gained attention due to the increasing ubiquity of the wearable cameras. There
has been a surge of efforts adapting existing feature descriptors and designing
new descriptors for the first-person videos. An effective activity recognition
system requires selection and use of complementary features and appropriate
kernels for each feature. In this study, we propose a data-driven framework for
first-person activity recognition which effectively selects and combines
features and their respective kernels during the training. Our experimental
results show that use of Multiple Kernel Learning (MKL) and Boosted MKL in
first-person activity recognition problem exhibits improved results in
comparison to the state-of-the-art. In addition, these techniques enable the
expansion of the framework with new features in an efficient and convenient
way.Comment: First published in the Proceedings of the 25th European Signal
Processing Conference (EUSIPCO-2017) in 2017, published by EURASI
Diffeomorphic Metric Mapping of High Angular Resolution Diffusion Imaging based on Riemannian Structure of Orientation Distribution Functions
In this paper, we propose a novel large deformation diffeomorphic
registration algorithm to align high angular resolution diffusion images
(HARDI) characterized by orientation distribution functions (ODFs). Our
proposed algorithm seeks an optimal diffeomorphism of large deformation between
two ODF fields in a spatial volume domain and at the same time, locally
reorients an ODF in a manner such that it remains consistent with the
surrounding anatomical structure. To this end, we first review the Riemannian
manifold of ODFs. We then define the reorientation of an ODF when an affine
transformation is applied and subsequently, define the diffeomorphic group
action to be applied on the ODF based on this reorientation. We incorporate the
Riemannian metric of ODFs for quantifying the similarity of two HARDI images
into a variational problem defined under the large deformation diffeomorphic
metric mapping (LDDMM) framework. We finally derive the gradient of the cost
function in both Riemannian spaces of diffeomorphisms and the ODFs, and present
its numerical implementation. Both synthetic and real brain HARDI data are used
to illustrate the performance of our registration algorithm
- …