Generalized Rank Pooling for Activity Recognition
Most popular deep models for action recognition split video sequences into
short sub-sequences consisting of a few frames; frame-based features are then
pooled for recognizing the activity. Usually, this pooling step discards the
temporal order of the frames, which could otherwise be used for better
recognition. Towards this end, we propose a novel pooling method, generalized
rank pooling (GRP), that takes as input features from the intermediate layers
of a CNN that is trained on tiny sub-sequences, and produces as output the
parameters of a subspace which (i) provides a low-rank approximation to the
features and (ii) preserves their temporal order. We propose to use these
parameters as a compact representation for the video sequence, which is then
used in a classification setup. We formulate an objective for computing this
subspace as a Riemannian optimization problem on the Grassmann manifold, and
propose an efficient conjugate gradient scheme for solving it. Experiments on
several activity recognition datasets show that our scheme leads to
state-of-the-art performance.
Comment: Accepted at IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR), 201
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold-based techniques obtain the
lowest performance and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV deals best with moderate scale
and translation changes.
Pooling Faces: Template based Face Recognition with Pooled Face Images
We propose a novel approach to template based face recognition. Our dual goal
is to both increase recognition accuracy and reduce the computational and
storage costs of template matching. To do this, we leverage an approach
that has proven effective in many other domains but, to our knowledge, has never been
fully explored for face images: average pooling of face photos. We show how
(and why!) the space of a template's images can be partitioned and then pooled
based on image quality and head pose and the effect this has on accuracy and
template size. We perform extensive tests on the IJB-A and Janus CS2 template
based face identification and verification benchmarks. These show that not only
does our approach outperform published state of the art despite requiring far
fewer cross template comparisons, but also, surprisingly, that image pooling
performs on par with deep feature pooling.
Comment: Appeared in the IEEE Computer Society Workshop on Biometrics, IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), June, 201
3D Point Capsule Networks
In this paper, we propose 3D point-capsule networks, an auto-encoder designed
to process sparse 3D point clouds while preserving spatial arrangements of the
input data. 3D capsule networks arise as a direct consequence of our novel
unified 3D auto-encoder formulation. Their dynamic routing scheme and the
peculiar 2D latent space deployed by our approach bring in improvements for
several common point cloud-related tasks, such as object classification, object
reconstruction and part segmentation as substantiated by our extensive
evaluations. Moreover, it enables new applications such as part interpolation
and replacement.
Comment: As published in CVPR 2019 (camera ready version), with supplementary
material