SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition
Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on the spatial structure of the skeleton joints, in which the temporal dynamics of the sequence are encoded as variations along the columns and the spatial structure of each frame is represented along the rows of a matrix. To further improve such representations, we introduce a novel skeleton image representation, named SkeleMotion, to be used as input to Convolutional Neural Networks (CNNs). The proposed approach encodes temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joint motions. Different temporal scales are employed to compute motion values, aggregating more temporal dynamics into the representation, making it able to capture long-range joint interactions involved in actions as well as to filter out noisy motion values. Experimental results demonstrate the effectiveness of the proposed representation for 3D action recognition, outperforming the state-of-the-art on the NTU RGB+D 120 dataset.
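The encoding described above can be sketched in NumPy: per-joint displacements between frames a fixed temporal distance apart yield a magnitude map and an orientation map with rows indexing joints and columns indexing time. The function name and the exact orientation definition (angles of the displacement projected onto the coordinate planes) are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def skelemotion_maps(joints, scale=1):
    """Build SkeleMotion-style magnitude/orientation maps from a sequence.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    scale:  temporal distance (in frames) between the frame pair used to
            compute motion; larger scales capture longer-range dynamics.

    Returns:
        magnitude:   (J, T - scale) map, rows = joints, columns = time.
        orientation: (J, T - scale, 3) map of displacement angles.
    """
    # Per-joint displacement between frames t and t + scale.
    motion = joints[scale:] - joints[:-scale]          # (T - scale, J, 3)

    magnitude = np.linalg.norm(motion, axis=-1).T      # (J, T - scale)

    # One reasonable choice of orientation: angles of the displacement
    # projected onto the xy, yz and zx planes (an illustrative assumption;
    # the paper's exact formulation may differ).
    theta_xy = np.arctan2(motion[..., 1], motion[..., 0])
    theta_yz = np.arctan2(motion[..., 2], motion[..., 1])
    theta_zx = np.arctan2(motion[..., 0], motion[..., 2])
    orientation = np.stack([theta_xy, theta_yz, theta_zx], axis=-1)
    orientation = orientation.transpose(1, 0, 2)       # (J, T - scale, 3)

    return magnitude, orientation
```

Maps computed at several values of `scale` can then be stacked as channels of the image fed to the CNN.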
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis achieved outstanding
performance and demonstrated the effectiveness of 3D representation for action
recognition. The existing depth-based and RGB+D-based action recognition
benchmarks have a number of limitations, including the lack of large-scale
training samples, realistic number of distinct class categories, diversity in
camera views, varied environmental conditions, and variety of human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset,
and a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework is proposed for this task, which yields promising results for
recognition of the novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Richly Activated Graph Convolutional Network for Robust Skeleton-based Action Recognition
Current methods for skeleton-based human action recognition usually work with
complete skeletons. However, in real scenarios, it is inevitable to capture
incomplete or noisy skeletons, which could significantly deteriorate the
performance of current methods when some informative joints are occluded or
disturbed. To improve the robustness of action recognition models, a
multi-stream graph convolutional network (GCN) is proposed to explore
sufficient discriminative features spreading over all skeleton joints, so that
the distributed redundant representation reduces the sensitivity of the action
models to non-standard skeletons. Concretely, the backbone GCN is extended by a
series of ordered streams, each responsible for learning discriminative
features from the joints less activated by the preceding streams. Here, the
activation degrees of skeleton joints of each GCN stream are measured by the
class activation maps (CAM), and only the information from the unactivated
joints will be passed to the next stream, by which rich features over all
active joints are obtained. Thus, the proposed method is termed richly
activated GCN (RA-GCN). Compared to the state-of-the-art (SOTA) methods, the
RA-GCN achieves comparable performance on the standard NTU RGB+D 60 and 120
datasets. More crucially, on the synthetic occlusion and jittering datasets,
the performance deterioration due to the occluded and disturbed joints can be
significantly alleviated by utilizing the proposed RA-GCN.
Comment: Accepted by IEEE T-CSVT, 11 pages, 6 figures, 10 tables.
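The stream-masking idea described above can be sketched as follows: given per-joint activation scores from one stream (in the paper these come from class activation maps), the most-activated joints are zeroed out so the next stream must learn from the remaining ones. The function name, the fixed keep-ratio thresholding rule, and the input shapes are simplifying assumptions for illustration.

```python
import numpy as np

def mask_activated_joints(features, activation, keep_ratio=0.5):
    """Zero out the most-activated joints so the next stream must learn
    from the remaining ones (RA-GCN-style masking; the thresholding rule
    here is a simplification -- the paper derives activations from CAMs).

    features:   (T, J, C) joint features to be passed to the next stream.
    activation: (J,) per-joint activation scores, e.g. a CAM averaged
                over time.
    keep_ratio: fraction of least-activated joints to keep.
    """
    num_joints = activation.shape[0]
    n_keep = max(1, int(num_joints * keep_ratio))

    # Indices of the least-activated joints; only these stay visible.
    keep = np.argsort(activation)[:n_keep]

    mask = np.zeros(num_joints, dtype=features.dtype)
    mask[keep] = 1.0
    # Broadcast the per-joint mask over time and channel dimensions.
    return features * mask[None, :, None]
```

Chaining several streams this way spreads discriminative features over many joints, which is what makes the model less sensitive to occluded or jittered skeletons.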