LSTA: Long Short-Term Attention for Egocentric Action Recognition
Egocentric activity recognition is one of the most challenging tasks in video
analysis. It requires fine-grained discrimination of small objects and their
manipulation. While some methods rely on strong supervision and attention
mechanisms, they are either annotation-intensive or do not take spatio-temporal
patterns into account. In this paper we propose LSTA as a mechanism to focus on
features from spatially relevant parts while attention is tracked smoothly
across the video sequence. We demonstrate the effectiveness of LSTA on
egocentric activity recognition with an end-to-end trainable two-stream
architecture, achieving state-of-the-art performance on four standard
benchmarks.
Comment: Accepted to CVPR 201
Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection
Abstract: This paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find a suitable representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool for inspecting linear relations between two sets of vectors, to two multiway data arrays (or tensors). The proposed method analyzes video volumes as inputs, avoiding the difficult problem of explicit motion estimation required in traditional methods, and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification with a simple Nearest Neighbor classifier. We also propose an automatic action detection method, which performs a 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data set showed that the proposed method is significantly better than various state-of-the-art methods with respect to accuracy. Our method has low time complexity and does not require any major tuning parameters.
Index Terms: Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from earlier schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications, from video surveillance to human-computer interaction, scientific
milestones in action recognition are reached ever more rapidly, so that
once-strong methods quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Dynamic mode decomposition in vector-valued reproducing kernel Hilbert spaces for extracting dynamical structure among observables
Understanding nonlinear dynamical systems (NLDSs) is challenging in a variety
of engineering and scientific fields. Dynamic mode decomposition (DMD), which
is a numerical algorithm for the spectral analysis of Koopman operators, has
been attracting attention as a way of obtaining global modal descriptions of
NLDSs without requiring explicit prior knowledge. However, since existing DMD
algorithms are in principle formulated based on the concatenation of scalar
observables, they are not directly applicable to data with dependent structures
among observables, which take, for example, the form of a sequence of graphs.
In this paper, we formulate Koopman spectral analysis for NLDSs with structures
among observables and propose an estimation algorithm for this problem. This
method can extract and visualize the underlying low-dimensional global dynamics
of NLDSs with structures among observables from data, which can be useful in
understanding the underlying dynamics of such NLDSs. To this end, we first
formulate the problem of estimating spectra of the Koopman operator defined in
vector-valued reproducing kernel Hilbert spaces, and then develop an estimation
procedure for this problem by reformulating tensor-based DMD. As a special case
of our method, we propose a method named Graph DMD, which is a numerical
algorithm for Koopman spectral analysis of graph dynamical systems, using a
sequence of adjacency matrices. We investigate the empirical performance of our
method by using synthetic and real-world data.
Comment: 34 pages with 4 figures, published in Neural Networks, 201
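
For orientation only, the basic idea behind DMD mentioned in the abstract above can be illustrated with a minimal sketch. It shows plain least-squares DMD on a two-dimensional state, not the vector-valued RKHS formulation or the Graph DMD proposed in the paper. The function names `dmd_2d` and `simulate_rotation` and the test system are hypothetical and chosen purely for illustration.

```python
import cmath
import math


def simulate_rotation(r, theta, n_steps):
    """Generate snapshots of x_{k+1} = A x_k, where A is a rotation by
    theta scaled by r (a hypothetical test system)."""
    x = (1.0, 0.0)
    snapshots = [x]
    c, s = r * math.cos(theta), r * math.sin(theta)
    for _ in range(n_steps - 1):
        x = (c * x[0] - s * x[1], s * x[0] + c * x[1])
        snapshots.append(x)
    return snapshots


def dmd_2d(snapshots):
    """Least-squares DMD for a 2-dimensional state.

    Finds the 2x2 matrix A minimising sum_k ||x_{k+1} - A x_k||^2 and
    returns (A, eigenvalues). The eigenvalues of A are the DMD eigenvalues.
    """
    X, Y = snapshots[:-1], snapshots[1:]
    # Gram matrix G = X X^T (symmetric 2x2).
    g11 = sum(x[0] * x[0] for x in X)
    g12 = sum(x[0] * x[1] for x in X)
    g22 = sum(x[1] * x[1] for x in X)
    # Cross matrix C = Y X^T (2x2).
    c11 = sum(y[0] * x[0] for x, y in zip(X, Y))
    c12 = sum(y[0] * x[1] for x, y in zip(X, Y))
    c21 = sum(y[1] * x[0] for x, y in zip(X, Y))
    c22 = sum(y[1] * x[1] for x, y in zip(X, Y))
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span the 2-D state space")
    # A = C G^{-1}, with G^{-1} = (1/det) [[g22, -g12], [-g12, g11]].
    a11 = (c11 * g22 - c12 * g12) / det
    a12 = (c12 * g11 - c11 * g12) / det
    a21 = (c21 * g22 - c22 * g12) / det
    a22 = (c22 * g11 - c21 * g12) / det
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    trace = a11 + a22
    det_a = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4.0 * det_a)
    eigenvalues = ((trace + disc) / 2.0, (trace - disc) / 2.0)
    return [[a11, a12], [a21, a22]], eigenvalues


if __name__ == "__main__":
    snaps = simulate_rotation(r=0.9, theta=0.3, n_steps=20)
    A, eig = dmd_2d(snaps)
    for lam in eig:
        print("modulus:", abs(lam), "phase:", cmath.phase(lam))
```

On noise-free data from a linear system such as this one, the modulus and phase of each recovered eigenvalue should match the decay rate and rotation angle used to generate the snapshots. Practical DMD implementations typically work with a truncated SVD of the snapshot matrix rather than a direct inverse of the Gram matrix.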