Skeleton-based Action Recognition of People Handling Objects
In visual surveillance systems, it is necessary to recognize the behavior of
people handling objects such as a phone, a cup, or a plastic bag. In this
paper, to address this problem, we propose a new framework for recognizing
object-related human actions by graph convolutional networks using human and
object poses. In this framework, we construct skeletal graphs of reliable human
poses by selectively sampling the informative frames in a video, which include
human joints with high confidence scores obtained in pose estimation. The
skeletal graphs generated from the sampled frames represent human poses related
to the object position in both the spatial and temporal domains, and these
graphs are used as inputs to the graph convolutional networks. Through
experiments over an open benchmark and our own data sets, we verify the
validity of our framework in that our method outperforms the state-of-the-art
method for skeleton-based action recognition.
Comment: Accepted in WACV 201
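The frame-sampling and graph-construction idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the confidence threshold, and the choice to attach the object as an extra graph node connected to every joint are assumptions made for clarity.

```python
import numpy as np

# Hypothetical sketch: keep only frames whose pose-estimation confidence is
# high, then build a skeletal graph that also encodes the object position.
CONF_THRESH = 0.5  # assumed confidence cutoff, not from the paper

def sample_reliable_frames(confidences, thresh=CONF_THRESH):
    """confidences: (T, J) per-joint scores.
    Keep frame indices whose mean joint confidence passes the threshold."""
    return np.where(confidences.mean(axis=1) >= thresh)[0]

def build_skeletal_graph(joints, object_pos, bones):
    """joints: (J, 2) joint coordinates; object_pos: (2,) object centre;
    bones: list of (i, j) joint pairs forming the skeleton.
    Returns node features (J+1, 2) and adjacency (J+1, J+1),
    with the object appended as one extra node linked to every joint."""
    J = joints.shape[0]
    nodes = np.vstack([joints, object_pos])   # object becomes node index J
    adj = np.eye(J + 1)                       # self-loops
    for i, j in bones:                        # skeletal edges
        adj[i, j] = adj[j, i] = 1.0
    adj[:J, J] = adj[J, :J] = 1.0             # joint-to-object edges
    return nodes, adj
```

The resulting per-frame graphs would then be stacked along time and fed to a graph convolutional network, as the abstract describes.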
Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition
Variations of human body skeletons may be considered as dynamic graphs, a
generic data representation for numerous real-world applications. In this
paper, we propose a spatio-temporal graph convolution (STGC) approach for
combining the strengths of local convolutional filtering with the
sequence-learning ability of autoregressive moving average (ARMA) models. To
encode dynamic graphs, the
constructed multi-scale local graph convolution filters, consisting of matrices
of local receptive fields and signal mappings, are recursively performed on
structured graph data of temporal and spatial domain. The proposed model is
generic and principled as it can be generalized into other dynamic models. We
theoretically prove the stability of STGC and provide an upper bound on the
signal transformation to be learned. Further, the proposed recursive model can
be stacked into a multi-layer architecture. To evaluate our model, we conduct
extensive experiments on four benchmark skeleton-based action datasets,
including the large-scale challenging NTU RGB+D. The experimental results
demonstrate the effectiveness of our proposed model and the improvement over
the state-of-the-art.
Comment: Accepted by AAAI 201
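One plausible reading of an STGC layer is multi-scale spatial filtering via powers of a normalized adjacency (the local receptive fields) combined with an ARMA-style recursion over time. The sketch below follows that reading; the weight shapes, the `tanh` nonlinearity, and the recursion coefficient `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def stgc_layer(x, adj, weights, alpha=0.5):
    """x: (T, N, F) sequence of graph signals over N skeleton joints.
    weights: list of (F, F_out) matrices, one per spatial scale k.
    Spatial part: sum_k A_hat^k X_t W_k (multi-scale local filtering).
    Temporal part: h_t = tanh(spatial_t + alpha * h_{t-1}) (AR recursion)."""
    a_hat = normalize_adj(adj)
    T, N, _ = x.shape
    f_out = weights[0].shape[1]
    out = np.zeros((T, N, f_out))
    h = np.zeros((N, f_out))
    for t in range(T):
        spatial = np.zeros((N, f_out))
        a_pow = np.eye(N)
        for w_k in weights:            # scales k = 1 .. K
            a_pow = a_pow @ a_hat      # A_hat^k
            spatial += a_pow @ x[t] @ w_k
        h = np.tanh(spatial + alpha * h)
        out[t] = h
    return out
```

Because the layer maps a `(T, N, F)` signal to `(T, N, F_out)`, such layers can be stacked into the multi-layer architecture the abstract mentions.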
EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks
Recurrent neural networks (RNNs) are capable of modeling temporal
dependencies of complex sequential data. In general, currently available RNN
structures tend to concentrate on controlling the contributions of current and
previous information; however, the different importance levels of the elements
within an input vector are typically ignored. We propose a simple yet
effective Element-wise-Attention Gate
(EleAttG), which can be easily added to an RNN block (e.g. all RNN neurons in
an RNN layer), to empower the RNN neurons to have attentiveness capability. For
an RNN block, an EleAttG is used for adaptively modulating the input by
assigning different levels of importance, i.e., attention, to each
element/dimension of the input. We refer to an RNN block equipped with an
EleAttG as an EleAtt-RNN block. Instead of modulating the input as a whole, the
EleAttG modulates the input at fine granularity, i.e., element-wise, and the
modulation is content adaptive. The proposed EleAttG, as an additional
fundamental unit, is general and can be applied to any RNN structures, e.g.,
standard RNN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). We
demonstrate the effectiveness of the proposed EleAtt-RNN by applying it to
different tasks, including action recognition from both skeleton-based data
and RGB videos, gesture recognition, and sequential MNIST classification.
Experiments show that adding attentiveness through EleAttGs to RNN blocks
significantly improves the power of RNNs.
Comment: IEEE Transactions on Image Processing (Accept). arXiv admin note:
substantial text overlap with arXiv:1807.0444
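The element-wise modulation described above can be sketched as a sigmoid gate applied to the input before a standard RNN step. This is a minimal illustration under the assumption that the gate is computed from the current input and the previous hidden state; the parameter names are hypothetical, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eleattg_rnn_step(x, h_prev, params):
    """One EleAtt-RNN step. x: (D,) input; h_prev: (H,) hidden state.
    The gate assigns an importance score in [0, 1] to each input element,
    so the input is modulated element-wise rather than as a whole."""
    Wa, Ua, ba = params["Wa"], params["Ua"], params["ba"]  # gate weights (assumed)
    Wx, Wh, b = params["Wx"], params["Wh"], params["b"]    # standard RNN weights
    a = sigmoid(Wa @ x + Ua @ h_prev + ba)  # (D,) element-wise attention
    x_mod = a * x                           # content-adaptive modulation
    return np.tanh(Wx @ x_mod + Wh @ h_prev + b)
```

Because the gate sits in front of the recurrence, the same construction could wrap an LSTM or GRU cell instead of the plain `tanh` RNN used here, matching the abstract's claim of generality.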