Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold-based techniques obtain the
lowest performance, and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV copes best with moderate scale
and translation changes.
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.
Comment: Appearing in International Journal of Computer Vision
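The abstract above does not name the isometric mapping it uses. A common choice in the Grassmann literature is the projection embedding, which sends a subspace with orthonormal basis X to the symmetric matrix X X^T; the sketch below assumes that choice. The function names (`gram_schmidt`, `projection_embedding`, `frobenius_distance`) are hypothetical and only illustrate the idea that the embedding depends on the subspace, not on the basis chosen for it.

```python
import math

def gram_schmidt(columns):
    """Orthonormalise a list of linearly independent vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([a / norm for a in w])
    return basis

def projection_embedding(columns):
    """Map span(columns) to the symmetric matrix X X^T (assumed embedding)."""
    q = gram_schmidt(columns)
    n = len(q[0])
    return [[sum(col[i] * col[j] for col in q) for j in range(n)]
            for i in range(n)]

def frobenius_distance(a, b):
    """Frobenius norm of the difference of two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

# Two different bases of the same plane in R^3, and a second, different plane.
plane_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
plane_a_other_basis = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
plane_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

same = frobenius_distance(projection_embedding(plane_a),
                          projection_embedding(plane_a_other_basis))
different = frobenius_distance(projection_embedding(plane_a),
                               projection_embedding(plane_b))
```

Here `same` is zero up to rounding because both bases span the same plane, while `different` is strictly positive. Once subspaces are represented as symmetric matrices in this way, Euclidean tools such as sparse coding can in principle be applied to them.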
A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset
This paper aims to determine which is the best human action recognition
method based on features extracted from RGB-D devices, such as the Microsoft
Kinect. A review of all the papers that make reference to MSR Action3D, the
most used dataset that includes depth information acquired from an RGB-D device,
has been performed. We found that the validation method used by each work
differs from the others, so a direct comparison among works cannot be made.
However, almost all the works compare their results without
taking this issue into account. Therefore, we present different rankings
according to the methodology used for the validation in order to clarify the
existing confusion.
Comment: 16 pages and 7 tables
Expanding the Family of Grassmannian Kernels: An Embedding Perspective
Modeling videos and image-sets as linear subspaces has proven beneficial for
many visual recognition tasks. However, it also incurs challenges arising from
the fact that linear subspaces do not obey Euclidean geometry, but lie on a
special type of Riemannian manifold known as the Grassmannian. To leverage the
techniques developed for Euclidean spaces (e.g., support vector machines) with
subspaces, several recent studies have proposed to embed the Grassmannian into
a Hilbert space by making use of a positive definite kernel. Unfortunately,
only two Grassmannian kernels are known, neither of which, as we will show, is
universal, which limits their ability to approximate a target function
arbitrarily well. Here, we introduce several positive definite Grassmannian
kernels, including universal ones, and demonstrate their superiority over
previously known kernels in various tasks, such as classification, clustering,
sparse coding and hashing.
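The abstract above does not say which two Grassmannian kernels were previously known. In the literature these are usually taken to be the projection kernel, ||X^T Y||_F^2, and the Binet-Cauchy kernel, det(X^T Y)^2, where X and Y are orthonormal bases; the sketch below assumes that reading. The helper names are hypothetical, and the bases are given as lists of orthonormal column vectors.

```python
import math

def cross_gram(x_cols, y_cols):
    """Return X^T Y for two bases given as lists of column vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in y_cols]
            for x in x_cols]

def determinant(m):
    """Determinant by Laplace expansion; adequate for small matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, pivot in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * pivot * determinant(minor)
    return total

def projection_kernel(x_cols, y_cols):
    """Squared Frobenius norm of X^T Y (assumed definition)."""
    return sum(v * v for row in cross_gram(x_cols, y_cols) for v in row)

def binet_cauchy_kernel(x_cols, y_cols):
    """Squared determinant of X^T Y (assumed definition)."""
    return determinant(cross_gram(x_cols, y_cols)) ** 2

# Orthonormal bases: the xy-plane, the same plane rotated in-plane, the xz-plane.
c, s = math.cos(0.3), math.sin(0.3)
xy_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
xy_plane_rotated = [[c, s, 0.0], [-s, c, 0.0]]
xz_plane = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

k_proj_same = projection_kernel(xy_plane, xy_plane_rotated)
k_bc_same = binet_cauchy_kernel(xy_plane, xy_plane_rotated)
k_proj_diff = projection_kernel(xy_plane, xz_plane)
k_bc_diff = binet_cauchy_kernel(xy_plane, xz_plane)
```

Both kernels give the same value for any orthonormal basis of a given subspace, which is what makes them functions on the Grassmannian rather than on individual matrices. For the two distinct planes, which share only one direction, both kernels return smaller values than for the identical planes.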
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions and pepper noise.