A robust and efficient video representation for action recognition
This paper introduces a state-of-the-art video representation and applies it
to efficient action recognition and detection. We first propose to improve the
popular dense trajectory features by explicit camera motion estimation. More
specifically, we extract feature point matches between frames using SURF
descriptors and dense optical flow. The matches are used to estimate a
homography with RANSAC. To improve the robustness of homography estimation, a
human detector is employed to remove outlier matches from the human body as
human motion is not constrained by the camera. Trajectories consistent with the
homography are considered as due to camera motion, and thus removed. We also
use the homography to cancel out camera motion from the optical flow. This
results in significant improvement on motion-based HOF and MBH descriptors. We
further explore the recent Fisher vector as an alternative feature encoding
approach to the standard bag-of-words histogram, and consider different ways to
include spatial layout information in these encodings. We present a large and
varied set of evaluations, considering (i) classification of short basic
actions on six datasets, (ii) localization of such actions in feature-length
movies, and (iii) large-scale recognition of complex events. We find that our
improved trajectory features significantly outperform previous dense
trajectories, and that Fisher vectors are superior to bag-of-words encodings
for video recognition tasks. In all three tasks, we show substantial
improvements over the state-of-the-art results.
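
The camera-motion step described above can be illustrated with a minimal sketch. It assumes a 3x3 homography H between two frames is already available (however it was estimated) and shows only how a match is compared against the motion H predicts. The function names and the 1-pixel threshold are hypothetical choices for illustration, not taken from the paper.

```python
import math


def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def residual_motion(H, p, q):
    """Displacement of the match p -> q that remains after removing
    the motion predicted by the camera homography H."""
    cx, cy = apply_homography(H, p[0], p[1])
    return q[0] - cx, q[1] - cy


def is_camera_motion(H, p, q, threshold=1.0):
    """True when the match is explained by H up to `threshold` pixels.
    The threshold value is an assumption, not a value from the paper."""
    rx, ry = residual_motion(H, p, q)
    return math.hypot(rx, ry) < threshold


# Hypothetical example: the camera pans 2 pixels to the right.
H = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]
background = is_camera_motion(H, (1, 1), (3, 1))  # moves with the camera
actor = is_camera_motion(H, (1, 1), (5, 2))       # moves independently
```

In this sketch a point whose residual is near zero would be treated as background and dropped, while a point with a large residual keeps its corrected displacement.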
Person Re-identification by Local Maximal Occurrence Representation and Metric Learning
Person re-identification is an important technique towards automatic search
of a person's presence in a surveillance video. Two fundamental problems are
critical for person re-identification: feature representation and metric
learning. An effective feature representation should be robust to illumination
and viewpoint changes, and a discriminant metric should be learned to match
various person images. In this paper, we propose an effective feature
representation called Local Maximal Occurrence (LOMO), and a subspace and
metric learning method called Cross-view Quadratic Discriminant Analysis
(XQDA). The LOMO feature analyzes the horizontal occurrence of local features,
and maximizes the occurrence to make a stable representation against viewpoint
changes. Besides, to handle illumination variations, we apply the Retinex
transform and a scale invariant texture operator. To learn a discriminant
metric, we propose to learn a discriminant low dimensional subspace by
cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is
learned on the derived subspace. We also present a practical computation method
for XQDA, as well as its regularization. Experiments on four challenging person
re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show
that the proposed method improves the state-of-the-art rank-1 identification
rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
Comment: This paper has been accepted by CVPR 2015. For source code and
extracted features, please visit
http://www.cbsr.ia.ac.cn/users/scliao/projects/lomo_xqda
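
The idea of maximizing the horizontal occurrence of local features can be sketched as an element-wise maximum over histograms taken at different horizontal positions of the same strip. This is a simplified illustration under that reading of the abstract; the function name and the toy histograms are hypothetical, and the full LOMO pipeline (Retinex, texture operator, multiple scales) is not reproduced here.

```python
def max_occurrence(histograms):
    """Element-wise maximum over histograms from one horizontal strip.

    Each inner list is the histogram of one sub-window; all sub-windows
    lie at the same height but at different horizontal positions.
    """
    if not histograms:
        raise ValueError("need at least one histogram")
    return [max(bin_values) for bin_values in zip(*histograms)]


# Two hypothetical sub-window histograms with three bins each.
strip = [[1, 0, 3], [2, 1, 0]]
pooled = max_occurrence(strip)
```

Because the maximum does not depend on the order of the sub-windows, the pooled histogram stays the same when a pattern shifts horizontally within the strip, which is the kind of viewpoint stability the abstract refers to.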
Towards Effective Codebookless Model for Image Classification
The bag-of-features (BoF) model for image classification has been thoroughly
studied over the last decade. Unlike the widely used BoF methods, which
model images with a pre-trained codebook, the alternative codebook-free image
modeling method, which we call the Codebookless Model (CLM), has attracted little
attention. In this paper, we present an effective CLM that represents an image
with a single Gaussian for classification. By embedding the Gaussian manifold into
a vector space, we show that the simple incorporation of our CLM into a linear
classifier achieves very competitive accuracy compared with state-of-the-art
BoF methods (e.g., Fisher Vector). Since our CLM lies in a high dimensional
Riemannian manifold, we further propose a joint learning method of low-rank
transformation with support vector machine (SVM) classifier on the Gaussian
manifold, in order to reduce computational and storage cost. To study and
alleviate the side effect of background clutter on our CLM, we also present a
simple yet effective partial background removal method based on saliency
detection. Experiments are extensively conducted on eight widely used databases
to demonstrate the effectiveness and efficiency of our CLM method.
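
Representing an image with a single Gaussian can be illustrated by estimating the mean and covariance of its local features. The sketch below then simply concatenates the mean with the upper triangle of the covariance; this naive flattening is an assumption made for illustration and is not the manifold embedding proposed in the paper. The function name and the toy features are hypothetical.

```python
def gaussian_descriptor(features):
    """Fit one Gaussian to a list of equal-length feature vectors and
    return [mean..., upper-triangular covariance entries...]."""
    n = len(features)
    d = len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(d)]
    # Maximum-likelihood covariance (divides by n, not n - 1).
    cov = [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / n
            for j in range(d)
        ]
        for i in range(d)
    ]
    # The covariance is symmetric, so the upper triangle is sufficient.
    upper = [cov[i][j] for i in range(d) for j in range(i, d)]
    return mean + upper


# Two hypothetical 2-D local features from one image.
descriptor = gaussian_descriptor([[1, 2], [3, 4]])
```

For d-dimensional features the resulting vector has d + d(d + 1)/2 entries regardless of how many local features the image contains, so it can be fed directly to a linear classifier.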