14 research outputs found
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn
This paper presents an image-classification-based approach to the skeleton-based
video action recognition problem. First, a dataset-independent
translation-scale invariant image mapping method is proposed, which transforms
the skeleton videos into colour images, named skeleton-images. Second, a
multi-scale deep convolutional neural network (CNN) architecture is proposed,
which can be built and fine-tuned on powerful pre-trained CNNs, e.g.,
AlexNet, VGGNet, or ResNet. Even though the skeleton-images are very
different from natural images, the fine-tuning strategy still works well.
Finally, we show that our method also works well on 2D skeleton video data.
We achieve state-of-the-art results on the popular benchmark datasets
NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the NTU RGB+D,
UTD-MHAD, and MSRC-12 datasets, including the largest and most challenging
one, NTU RGB+D, our method outperforms other methods by a large margin,
which demonstrates the efficacy of the proposed method.
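The core idea of the mapping step can be sketched in a few lines: shift and scale each skeleton sequence by its own extent (giving translation and scale invariance), then quantise the (x, y, z) coordinates into the (R, G, B) channels of a joints-by-frames image. The function below is a hypothetical minimal sketch of this kind of mapping, not the paper's exact implementation; the array layout and normalisation choices are assumptions.

```python
import numpy as np

def skeleton_to_image(skeleton):
    """Map a skeleton sequence of shape (frames, joints, 3) to a colour image.

    Hypothetical sketch: coordinates are shifted by the per-axis minimum
    (translation invariance) and divided by a single global extent
    (scale invariance), then quantised so (x, y, z) become the (R, G, B)
    channels of a (joints, frames, 3) uint8 image.
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)
    mins = skeleton.min(axis=(0, 1), keepdims=True)   # per-axis minimum over all frames/joints
    maxs = skeleton.max(axis=(0, 1), keepdims=True)
    scale = (maxs - mins).max()                       # one global scale for all axes
    norm = (skeleton - mins) / (scale + 1e-8)         # values in [0, 1]
    img = np.round(255.0 * norm).astype(np.uint8)     # quantise to 8-bit colour
    return img.transpose(1, 0, 2)                     # (joints, frames, 3)

# toy example: 4 frames, 5 joints
seq = np.random.rand(4, 5, 3)
img = skeleton_to_image(seq)
print(img.shape)  # (5, 4, 3)
```

Because the normalisation uses only the sequence's own minimum and extent, translating or uniformly rescaling the input skeleton leaves the resulting image unchanged, which is the invariance the abstract refers to.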
Richly Activated Graph Convolutional Network for Robust Skeleton-based Action Recognition
Current methods for skeleton-based human action recognition usually work with
complete skeletons. However, in real scenarios, it is inevitable to capture
incomplete or noisy skeletons, which could significantly deteriorate the
performance of current methods when some informative joints are occluded or
disturbed. To improve the robustness of action recognition models, a
multi-stream graph convolutional network (GCN) is proposed to explore
sufficient discriminative features spreading over all skeleton joints, so that
the distributed redundant representation reduces the sensitivity of the action
models to non-standard skeletons. Concretely, the backbone GCN is extended by a
series of ordered streams, which are responsible for learning discriminative
features from the joints less activated by preceding streams. Here, the
activation degrees of skeleton joints of each GCN stream are measured by the
class activation maps (CAM), and only the information from the unactivated
joints will be passed to the next stream, by which rich features over all
active joints are obtained. Thus, the proposed method is termed richly
activated GCN (RA-GCN). Compared to the state-of-the-art (SOTA) methods, the
RA-GCN achieves comparable performance on the standard NTU RGB+D 60 and 120
datasets. More crucially, on the synthetic occlusion and jittering datasets,
the performance deterioration due to the occluded and disturbed joints can be
significantly alleviated by utilizing the proposed RA-GCN.
Comment: Accepted by IEEE T-CSVT; 11 pages, 6 figures, 10 tables.
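The masking step that routes information between streams can be illustrated with a small sketch: given per-joint class-activation scores from the previous stream, the most-activated joints are suppressed so the next stream must learn from the remaining ones. The function below is a hypothetical toy version of this idea; the `keep_ratio` parameter and the hard top-k masking are assumptions, not the paper's exact mechanism.

```python
import numpy as np

def mask_activated_joints(features, cam, keep_ratio=0.5):
    """Zero out the most-activated joints before the next GCN stream.

    Hypothetical sketch of the CAM-based masking idea:
      features   : (joints, channels) input features for the next stream
      cam        : (joints,) per-joint activation scores from the previous stream
      keep_ratio : fraction of the least-activated joints to keep
    """
    joints = cam.shape[0]
    k = max(1, int(round(keep_ratio * joints)))
    order = np.argsort(cam)               # least-activated joints first
    mask = np.zeros(joints, dtype=bool)
    mask[order[:k]] = True                # keep only the k least-activated joints
    return features * mask[:, None]

cam = np.array([0.9, 0.1, 0.5, 0.2])     # joint 0 strongly activated by stream 1
feats = np.ones((4, 8))
out = mask_activated_joints(feats, cam, keep_ratio=0.5)
print(out.sum(axis=1))  # joints 1 and 3 kept -> [0. 8. 0. 8.]
```

Chaining several such streams, each trained on the joints its predecessors left unactivated, spreads discriminative features over the whole skeleton, which is what gives the model its robustness to occluded or jittered joints.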