1,940 research outputs found
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods
DART: Distribution Aware Retinal Transform for Event-based Cameras
We introduce a generic visual descriptor, termed as distribution aware
retinal transform (DART), that encodes the structural context using log-polar
grids for event cameras. The DART descriptor is applied to four different
problems, namely object classification, tracking, detection and feature
matching: (1) The DART features are directly employed as local descriptors in a
bag-of-features classification framework and testing is carried out on four
standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS,
NCaltech-101). (2) Extending the classification system, tracking is
demonstrated using two key novelties: (i) For overcoming the low-sample problem
for the one-shot learning of a binary classifier, statistical bootstrapping is
leveraged with online learning; (ii) To achieve tracker robustness, the scale
and rotation equivariance property of the DART descriptors is exploited for the
one-shot learning. (3) To solve the long-term object tracking problem, an
object detector is designed using the principle of cluster majority voting. The
detection scheme is then combined with the tracker to result in a high
intersection-over-union score with augmented ground truth annotations on the
publicly available event camera dataset. (4) Finally, the event context encoded
by DART greatly simplifies the feature correspondence problem, especially for
spatio-temporal slices far apart in time, which has not been explicitly tackled
in the event-based vision domain.Comment: 12 pages, revision submitted to TPAMI in Nov 201
Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This offers a
significant gain in complexity and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene
Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences
This paper introduces sparse coding and dictionary learning for Symmetric
Positive Definite (SPD) matrices, which are often used in machine learning,
computer vision and related areas. Unlike traditional sparse coding schemes
that work in vector spaces, in this paper we discuss how SPD matrices can be
described by sparse combination of dictionary atoms, where the atoms are also
SPD matrices. We propose to seek sparse coding by embedding the space of SPD
matrices into Hilbert spaces through two types of Bregman matrix divergences.
This not only leads to an efficient way of performing sparse coding, but also
an online and iterative scheme for dictionary learning. We apply the proposed
methods to several computer vision tasks where images are represented by region
covariance matrices. Our proposed algorithms outperform state-of-the-art
methods on a wide range of classification tasks, including face recognition,
action recognition, material classification and texture categorization
- …