Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
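To make the sparse coding idea in the abstract above concrete, here is a minimal sketch of representing a signal as a linear combination of a few dictionary elements. It uses greedy matching pursuit, chosen here only for brevity; the abstract does not say which solver the monograph favors. The dictionary is hand-picked rather than learned, and the names `matching_pursuit`, `atoms`, and `n_nonzero` are illustrative assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse code of `signal` over unit-norm `atoms`.

    Assumption: every atom has unit Euclidean norm, so the inner product
    with the residual is the optimal coefficient for that atom alone.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        correlations = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(correlations[j]))
        coeffs[k] += correlations[k]
        # Remove that atom's contribution from the residual.
        residual = [r - correlations[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Hypothetical overcomplete dictionary: 4 unit-norm atoms in R^3.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
signal = [2.0, 0.0, 3.0]

coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
print(coeffs)  # [2.0, 0.0, 3.0, 0.0]
```

Only two of the four coefficients are nonzero, which is the sense in which the code is sparse. Dictionary learning, the focus of the monograph, would additionally adapt `atoms` to the data.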
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.
Comment: Appearing in International Journal of Computer Vision
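As a rough illustration of embedding subspaces into the space of symmetric matrices, the sketch below assumes the projection mapping X ↦ XXᵀ, where X holds an orthonormal basis of the subspace. This is a common choice, but the abstract above does not name the mapping, so treat it as an assumption. The function names are hypothetical.

```python
import math


def projection_embedding(basis):
    """Map an n x p orthonormal basis (given as n rows) to the n x n matrix X X^T.

    The result is symmetric and depends only on the subspace spanned by the
    columns of X, not on the particular basis chosen.
    """
    n = len(basis)
    return [
        [sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(n)]
        for i in range(n)
    ]


def frobenius_distance(A, B):
    """Frobenius norm of A - B, used here as a distance between embedded subspaces."""
    return math.sqrt(
        sum((a - b) ** 2 for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))
    )


# Two one-dimensional subspaces (lines) in R^2, plus a sign-flipped basis.
line_x = [[1.0], [0.0]]
line_x_flipped = [[-1.0], [0.0]]
line_y = [[0.0], [1.0]]

P_x = projection_embedding(line_x)
P_x_flipped = projection_embedding(line_x_flipped)
P_y = projection_embedding(line_y)

same_point = P_x == P_x_flipped          # same subspace, same embedding
distance = frobenius_distance(P_x, P_y)  # orthogonal lines
```

Once subspaces are represented as symmetric matrices, Euclidean tools such as sparse coding can be applied to the embedded points, which is the general strategy the abstract describes.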
When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data
Human action recognition from skeletal data is a hot research topic and
important in many open domain applications of computer vision, thanks to
recently introduced 3D sensors. In the literature, naive methods simply
transfer off-the-shelf techniques from video to the skeletal representation.
However, the current state-of-the-art is contended between two different
paradigms: kernel-based methods and feature learning with (recurrent) neural
networks. Both approaches show strong performances, yet they exhibit heavy, but
complementary, drawbacks. Motivated by this fact, our work aims at combining
the best of the two paradigms by proposing an approach where a
shallow network is fed with a covariance representation. Our intuition is that,
as long as the dynamics are effectively modeled, there is no need for the
classification network to be deep or recurrent in order to score favorably. We
validate this hypothesis in a broad experimental analysis over 6 publicly
available datasets.
Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
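A covariance representation of a skeletal sequence can be sketched as follows, assuming each frame is flattened into a vector of joint coordinates and the sequence is summarized by the sample covariance of those vectors over time. The abstract does not spell out the exact descriptor; the matrix logarithm implied by the title is omitted here, and `covariance_descriptor` is an illustrative name.

```python
def covariance_descriptor(frames):
    """Sample covariance (d x d) of T per-frame feature vectors of length d.

    Assumption: `frames` holds at least two frames, each a flat list of
    joint coordinates, so the descriptor size is independent of T.
    """
    t = len(frames)
    d = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / t for k in range(d)]
    centered = [[frame[k] - mean[k] for k in range(d)] for frame in frames]
    return [
        [sum(c[i] * c[j] for c in centered) / (t - 1) for j in range(d)]
        for i in range(d)
    ]


# Toy sequence: 3 frames, 2 coordinates, second coordinate is twice the first.
frames = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
cov = covariance_descriptor(frames)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

Because the descriptor has a fixed size regardless of sequence length, it can be fed to a network without recurrence, in line with the intuition stated in the abstract.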
Deep representation learning for human motion prediction and classification
Generative models of 3D human motion are often restricted to a small number
of activities and therefore cannot generalize well to novel movements or
applications. In this work we propose a deep learning framework for human
motion capture data that learns a generic representation from a large corpus of
motion capture data and generalizes well to new, unseen, motions. Using an
encoding-decoding network that learns to predict future 3D poses from the most
recent past, we extract a feature representation of human motion. Most work on
deep learning for sequence prediction focuses on video and speech. Since
skeletal data has a different structure, we present and evaluate different
network architectures that make different assumptions about time dependencies
and limb correlations. To quantify the learned features, we use the output of
different layers for action classification and visualize the receptive fields
of the network units. Our method outperforms the recent state of the art in
skeletal motion prediction, even though those methods use action-specific training data.
Our results show that deep feedforward networks, trained from a generic mocap
database, can successfully be used for feature extraction from human motion
data and that this representation can be used as a foundation for
classification and prediction.
Comment: This paper is published at the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 201
Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification
In order to encode the class correlation and class-specific information in
image representation, we propose a new local feature learning approach named
Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to
hierarchically learn feature transformation filter banks to transform raw pixel
image patches to features. The learned filter banks are expected to: (1) encode
common visual patterns of a flexible number of categories; (2) encode
discriminative information; and (3) hierarchically extract patterns at
different visual levels. Particularly, in each single layer of DDSFL, shareable
filters are jointly learned for classes that share similar patterns.
The discriminative power of the filters is achieved by enforcing features from
the same category to be close and features from different categories to be
far away from each other. Furthermore, we propose two exemplar selection
methods to iteratively select training data for more efficient and effective
learning. Based on the experimental results, DDSFL can achieve very promising
performance, and it is also strongly complementary to the
state-of-the-art Caffe features.
Comment: Pattern Recognition, Elsevier, 201