Where to Focus on for Human Action Recognition?
In this paper, we present a new attention model for recognizing human actions from RGB-D videos. We propose an attention mechanism based on 3D articulated pose; the objective is to focus on the most relevant body parts involved in the action. For action classification, we propose a network composed of spatio-temporal sub-networks that model the appearance of human body parts and an RNN attention sub-network that implements our attention mechanism. Furthermore, we train the proposed network end-to-end using a regularized cross-entropy loss, which jointly trains the RNN that delivers attention globally to the whole set of spatio-temporal features extracted from 3D ConvNets. Our method outperforms state-of-the-art methods on the largest human activity recognition dataset available to date (the NTU RGB+D Dataset), which is also multi-view, and on a human action recognition dataset with object interaction (the Northwestern-UCLA Multiview Action 3D Dataset).
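The abstract above describes weighting body-part features by attention before classification. A minimal sketch of that general idea follows, assuming the attention scores are already given; in the paper they come from an RNN driven by 3D pose, which is not reproduced here. The function names `softmax` and `attend` and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(part_features, scores):
    """Fuse per-part feature vectors into one vector using attention weights.

    part_features: one feature vector per body part (all the same length).
    scores: one unnormalized attention score per body part (assumed given).
    """
    weights = softmax(scores)
    dim = len(part_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, part_features))
        for d in range(dim)
    ]
    return weights, fused


# Hypothetical toy inputs: three body parts with 2-D features.
part_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]
weights, fused = attend(part_features, scores)
```

Here the first body part receives the largest weight because it has the highest score, so the fused vector is dominated by its features.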
Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition
Graph convolutional networks have been widely used in skeleton-based action
recognition. However, existing approaches are limited in fine-grained action
recognition due to the similarity of inter-class data. Moreover, the noisy data
from pose extraction increases the challenge of fine-grained recognition. In
this work, we propose a flexible attention block called Channel-Variable
Spatial-Temporal Attention (CVSTA) to enhance the discriminative power of
spatial-temporal joints and obtain a more compact intra-class feature
distribution. Based on CVSTA, we construct a Multi-Dimensional Refinement Graph
Convolutional Network (MDR-GCN), which can improve the discrimination among
channel-, joint- and frame-level features for fine-grained actions.
Furthermore, we propose a Robust Decouple Loss (RDL), which significantly
boosts the effect of the CVSTA and reduces the impact of noise. The proposed
method combining MDR-GCN with RDL outperforms the known state-of-the-art
skeleton-based approaches on the fine-grained datasets FineGym99 and FSD-10, and
also on the coarse-grained dataset NTU-RGB+D (X-view version).
Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching
Human action recognition from skeleton data, fueled by the Graph
Convolutional Network (GCN), has attracted a lot of attention due to its
powerful capability of modeling non-Euclidean structured data. However, many
existing GCN methods provide a pre-defined graph and fix it through the entire
network, which can lose implicit joint correlations. Besides, the mainstream
spectral GCN is approximated by a first-order hop, thus higher-order connections
are not well involved. Therefore, huge efforts are required to explore a better
GCN architecture. To address these problems, we turn to Neural Architecture
Search (NAS) and propose the first automatically designed GCN for
skeleton-based action recognition. Specifically, we enrich the search space by
providing multiple dynamic graph modules after fully exploring the
spatial-temporal correlations between nodes. Besides, we introduce multiple-hop
modules and aim to break the limitation of representational capacity caused
by the first-order approximation. Moreover, a sampling- and memory-efficient
evolution strategy is proposed to search for an optimal architecture for this task.
The resulting architecture proves the effectiveness of the higher-order
approximation and the dynamic graph modeling mechanism with temporal
interactions, which has barely been discussed before. To evaluate the performance of
the searched model, we conduct extensive experiments on two very large-scale
datasets and the results show that our model achieves state-of-the-art results.
Comment: Accepted by AAAI202
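The contrast drawn above between a first-order approximation and multiple-hop modules can be illustrated on a tiny skeleton graph. The sketch below is only an illustration of receptive field, not the paper's spectral formulation or its searched architecture: it shows which joints a node can aggregate from within k hops. The helper names and the 4-joint chain are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def k_hop_mask(adj, k):
    """Return a 0/1 matrix marking joints reachable within k hops (self included)."""
    n = len(adj)
    # Add self-loops so each joint also keeps its own feature.
    a_hat = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    power = a_hat
    for _ in range(k - 1):
        power = matmul(power, a_hat)
    return [[1 if value > 0 else 0 for value in row] for row in power]


# Hypothetical 4-joint chain: 0 - 1 - 2 - 3 (for example, shoulder to hand).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

one_hop = k_hop_mask(chain, 1)
two_hop = k_hop_mask(chain, 2)
three_hop = k_hop_mask(chain, 3)
```

With one hop, joint 0 only sees itself and joint 1; with three hops it sees the whole chain, which is the kind of higher-order connection a single first-order layer does not capture.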
Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Human action recognition in 3D skeleton sequences has attracted a lot of
research attention. Recently, Long Short-Term Memory (LSTM) networks have shown
promising performance in this task due to their strengths in modeling the
dependencies and dynamics in sequential data. As not all skeletal joints are
informative for action recognition, and the irrelevant joints often bring noise
which can degrade the performance, we need to pay more attention to the
informative ones. However, the original LSTM network does not have explicit
attention ability. In this paper, we propose a new class of LSTM network,
Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action
recognition. This network is capable of selectively focusing on the informative
joints in each frame of each skeleton sequence by using a global context memory
cell. To further improve the attention capability of our network, we also
introduce a recurrent attention mechanism, with which the attention performance
of the network can be enhanced progressively. Moreover, we propose a stepwise
training scheme in order to train our network effectively. Our approach
achieves state-of-the-art performance on five challenging benchmark datasets
for skeleton-based action recognition.
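The abstract above describes attending to informative joints with a global context and refining that attention recurrently. The following is a heavily simplified sketch of that loop, assuming plain dot-product scoring and a mean-initialized context; the actual GCA-LSTM uses LSTM states and learned layers, none of which appear here. The name `refine_context` and the toy joint features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def refine_context(joint_feats, iterations):
    """Alternate between scoring joints against a global context and updating it.

    The context starts as the mean joint feature (an assumption of this sketch).
    Each iteration scores every joint by its dot product with the context,
    then rebuilds the context as the attention-weighted sum of joint features.
    """
    n = len(joint_feats)
    dim = len(joint_feats[0])
    context = [sum(f[d] for f in joint_feats) / n for d in range(dim)]
    weights = [1.0 / n] * n
    for _ in range(iterations):
        weights = softmax([dot(f, context) for f in joint_feats])
        context = [
            sum(w * f[d] for w, f in zip(weights, joint_feats))
            for d in range(dim)
        ]
    return weights, context


# Hypothetical toy input: one strongly activated joint and two near-silent ones.
joints = [[2.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
weights_1, _ = refine_context(joints, iterations=1)
weights_2, _ = refine_context(joints, iterations=2)
```

On this toy input the weight on the first joint grows from one iteration to two, mirroring the idea that attention can be sharpened progressively.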