3 research outputs found
Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition
In this paper, we focus on unsupervised representation learning for
skeleton-based action recognition. Existing approaches usually learn action
representations by sequential prediction, but they fail to fully capture
semantic information. To address this limitation, we propose a
novel framework named Prototypical Contrast and Reverse Prediction (PCRP),
which not only creates reverse sequential prediction to learn low-level
information (e.g., body posture at every frame) and high-level pattern (e.g.,
motion order), but also devises action prototypes to implicitly encode semantic
similarity shared among sequences. In general, we regard action prototypes as
latent variables and formulate PCRP as an expectation-maximization task.
Specifically, PCRP iteratively runs (1) an E-step, which determines the
distribution of prototypes by clustering the action encodings produced by the
encoder, and (2) an M-step, which optimizes the encoder by minimizing the
proposed ProtoMAE loss; this loss simultaneously pulls each action encoding
closer to its assigned prototype and performs the reverse prediction task.
Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that
PCRP outperforms state-of-the-art unsupervised methods and even surpasses some
supervised methods. Comment: Codes are available at https://github.com/Mikexu007/PCRP
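The EM alternation described in this abstract can be sketched as follows. This is a hypothetical simplification: the actual PCRP clusters deep encoder outputs and updates the encoder via the ProtoMAE loss, whereas here plain k-means over fixed vectors stands in for both steps, and all names and values are illustrative.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def e_step(encodings, prototypes):
    # E-step: assign each action encoding to its nearest prototype
    return [min(range(len(prototypes)), key=lambda j: dist(e, prototypes[j]))
            for e in encodings]

def m_step(encodings, assign, k):
    # M-step stand-in: move prototypes to cluster means (in PCRP the
    # encoder itself is optimized to pull encodings toward prototypes)
    protos = []
    for j in range(k):
        members = [e for e, a in zip(encodings, assign) if a == j]
        if members:
            protos.append([sum(c) / len(members) for c in zip(*members)])
        else:
            protos.append(random.choice(encodings))  # re-seed empty cluster
    return protos

random.seed(0)
# Two well-separated synthetic "action encoding" groups of 20 samples each
enc = ([[random.gauss(0.0, 0.1) for _ in range(4)] for _ in range(20)] +
       [[random.gauss(3.0, 0.1) for _ in range(4)] for _ in range(20)])
protos = [enc[0], enc[25]]
for _ in range(5):
    assign = e_step(enc, protos)
    protos = m_step(enc, assign, 2)
```

After a few iterations the prototypes settle on the two group centers, and each encoding is consistently assigned to its own group's prototype.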
A Survey on 3D Skeleton-Based Action Recognition Using Learning Method
3D skeleton-based action recognition, owing to the latent advantages of
skeleton data, has been an active topic in computer vision. As a consequence,
many impressive works, including both conventional handcrafted-feature-based
and learned-feature-based approaches, have been produced over the years.
However, previous surveys on action recognition mostly focus on video- or
RGB-dominated methods, and the few existing reviews related to skeleton data
mainly cover the representation of skeleton data or the performance of some
classic techniques on a certain dataset. Moreover, although deep learning
methods have been applied to this field for years, there is no related
research that offers an introduction or review from the perspective of deep
learning architectures. To address those limitations, this survey first
highlights the necessity of action recognition and the significance of
3D skeleton data. Then a comprehensive introduction to the mainstream
Recurrent Neural Network (RNN)-based, Convolutional Neural Network (CNN)-based,
and Graph Convolutional Network (GCN)-based action recognition techniques is
presented in a data-driven manner. Finally, we briefly discuss the largest
3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, together
with several existing top-ranking algorithms on those two datasets. To the
best of our knowledge, this is the first work to give an overall discussion of
deep learning-based action recognition using 3D skeleton data. Comment: 8 pages, 6 figures
Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition
Action recognition via 3D skeleton data has become an important emerging topic
in recent years. Most existing methods either extract hand-crafted descriptors
or learn action representations with supervised learning paradigms that
require massive labeled data. In this paper, we propose, for the first time, a
contrastive action learning paradigm named AS-CAL that can leverage different
augmentations of unlabeled skeleton data to learn action representations in an
unsupervised manner. Specifically, we first propose to contrast similarity
between augmented instances (query and key) of the input skeleton sequence,
which are transformed by multiple novel augmentation strategies, to learn
inherent action patterns ("pattern-invariance") of different skeleton
transformations. Second, to encourage learning the pattern-invariance with more
consistent action representations, we propose a momentum LSTM, which is
implemented as a momentum-based moving average of the LSTM-based query
encoder, to encode the long-term action dynamics of the key sequence. Third,
we introduce a queue to store the encoded keys, which allows our model to
flexibly reuse preceding keys and build a more consistent dictionary to
improve contrastive learning. Last, by temporally averaging the hidden states
of actions learned by the query encoder, a novel representation named
Contrastive Action Encoding (CAE) is proposed to represent human actions
effectively. Extensive
experiments show that our approach typically outperforms existing
hand-crafted methods by 10-50% in top-1 accuracy, and it achieves comparable
or even superior performance to numerous supervised learning
methods. Comment: Accepted by Information Sciences. Our codes are available at
https://github.com/Mikexu007/AS-CA
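The momentum encoder and key queue described in this abstract can be sketched as below. This is a hypothetical, MoCo-style simplification: the actual AS-CAL uses LSTM encoders over skeleton sequences, whereas here each encoder is reduced to a flat parameter list, and the momentum value and all names are illustrative assumptions.

```python
from collections import deque

MOMENTUM = 0.999  # assumed value; controls how slowly the key encoder drifts

def momentum_update(query_params, key_params, m=MOMENTUM):
    # Key encoder = momentum-based moving average of the query encoder,
    # so the keys in the dictionary stay mutually consistent over time.
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

# FIFO queue of encoded keys: newest keys are enqueued, oldest evicted,
# letting contrastive learning reuse preceding keys as extra negatives.
queue = deque(maxlen=4)

query_params = [1.0, 2.0]  # stand-in for the trained query encoder's weights
key_params = [0.0, 0.0]    # key encoder starts elsewhere and drifts slowly
for step in range(3):
    key_params = momentum_update(query_params, key_params)
    queue.append(tuple(key_params))  # enqueue the newest encoded key
```

With a momentum this close to 1, the key encoder moves only a tiny step toward the query encoder per update, which is the mechanism that keeps the queued dictionary consistent.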