23 research outputs found
Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks
This paper addresses the problem of continuous gesture recognition from
sequences of depth maps using convolutional neutral networks (ConvNets). The
proposed method first segments individual gestures from a depth sequence based
on quantity of movement (QOM). For each segmented gesture, an Improved Depth
Motion Map (IDMM), which converts the depth sequence into one image, is
constructed and fed to a ConvNet for recognition. The IDMM effectively encodes
both spatial and temporal information and allows the fine-tuning with existing
ConvNet models for classification without introducing millions of parameters to
learn. The proposed method is evaluated on the Large-scale Continuous Gesture
Recognition of the ChaLearn Looking at People (LAP) challenge 2016. It achieved
the performance of 0.2655 (Mean Jaccard Index) and ranked place in
this challenge
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold based techniques obtain the
lowest performance and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV best deals with moderate scale
and translation changes
Expanding the Family of Grassmannian Kernels: An Embedding Perspective
Modeling videos and image-sets as linear subspaces has proven beneficial for
many visual recognition tasks. However, it also incurs challenges arising from
the fact that linear subspaces do not obey Euclidean geometry, but lie on a
special type of Riemannian manifolds known as Grassmannian. To leverage the
techniques developed for Euclidean spaces (e.g, support vector machines) with
subspaces, several recent studies have proposed to embed the Grassmannian into
a Hilbert space by making use of a positive definite kernel. Unfortunately,
only two Grassmannian kernels are known, none of which -as we will show- is
universal, which limits their ability to approximate a target function
arbitrarily well. Here, we introduce several positive definite Grassmannian
kernels, including universal ones, and demonstrate their superiority over
previously-known kernels in various tasks, such as classification, clustering,
sparse coding and hashing
Parametric Regression on the Grassmannian
We address the problem of fitting parametric curves on the Grassmann manifold
for the purpose of intrinsic parametric regression. As customary in the
literature, we start from the energy minimization formulation of linear
least-squares in Euclidean spaces and generalize this concept to general
nonflat Riemannian manifolds, following an optimal-control point of view. We
then specialize this idea to the Grassmann manifold and demonstrate that it
yields a simple, extensible and easy-to-implement solution to the parametric
regression problem. In fact, it allows us to extend the basic geodesic model to
(1) a time-warped variant and (2) cubic splines. We demonstrate the utility of
the proposed solution on different vision problems, such as shape regression as
a function of age, traffic-speed estimation and crowd-counting from
surveillance video clips. Most notably, these problems can be conveniently
solved within the same framework without any specifically-tailored steps along
the processing pipeline.Comment: 14 pages, 11 figure
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.Comment: Appearing in International Journal of Computer Visio
Multi-Order Statistical Descriptors for Real-Time Face Recognition and Object Classification
We propose novel multi-order statistical descriptors which can be used for high speed object classification or face recognition from videos or image sets. We represent each gallery set with a global second-order statistic which captures correlated global variations in all feature directions as well as the common set structure. A lightweight descriptor is then constructed by efficiently compacting the second-order statistic using Cholesky decomposition. We then enrich the descriptor with the first-order statistic of the gallery set to further enhance the representation power. By projecting the descriptor into a low-dimensional discriminant subspace, we obtain further dimensionality reduction, while the discrimination power of the proposed representation is still preserved. Therefore, our method represents a complex image set by a single descriptor having significantly reduced dimensionality. We apply the proposed algorithm on image set and video-based face and periocular biometric identification, object category recognition, and hand gesture recognition. Experiments on six benchmark data sets validate that the proposed method achieves significantly better classification accuracy with lower computational complexity than the existing techniques. The proposed compact representations can be used for real-time object classification and face recognition in videos. 2013 IEEE.This work was supported by NPRP through the Qatar National Research Fund (a member of Qatar Foundation) under Grant 7-1711-1-312.Scopu