11,245 research outputs found
Continuous Action Recognition Based on Sequence Alignment
Continuous action recognition is more challenging than isolated recognition
because classification and segmentation must be simultaneously carried out. We
build on the well known dynamic time warping (DTW) framework and devise a novel
visual alignment technique, namely dynamic frame warping (DFW), which performs
isolated recognition based on per-frame representation of videos, and on
aligning a test sequence with a model sequence. Moreover, we propose two
extensions which enable to perform recognition concomitant with segmentation,
namely one-pass DFW and two-pass DFW. These two methods have their roots in the
domain of continuous recognition of speech and, to the best of our knowledge,
their extension to continuous visual action recognition has been overlooked. We
test and illustrate the proposed techniques with a recently released dataset
(RAVEL) and with two public-domain datasets widely used in action recognition
(Hollywood-1 and Hollywood-2). We also compare the performances of the proposed
isolated and continuous recognition algorithms with several recently published
methods
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold based techniques obtain the
lowest performance and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV best deals with moderate scale
and translation changes
- …