Richly Activated Graph Convolutional Network for Robust Skeleton-based Action Recognition
Current methods for skeleton-based human action recognition usually work with
complete skeletons. However, in real scenarios, it is inevitable to capture
incomplete or noisy skeletons, which could significantly deteriorate the
performance of current methods when some informative joints are occluded or
disturbed. To improve the robustness of action recognition models, a
multi-stream graph convolutional network (GCN) is proposed to explore
sufficient discriminative features spreading over all skeleton joints, so that
the distributed redundant representation reduces the sensitivity of the action
models to non-standard skeletons. Concretely, the backbone GCN is extended by a
series of ordered streams which are responsible for learning discriminative
features from the joints less activated by preceding streams. Here, the
activation degrees of skeleton joints of each GCN stream are measured by the
class activation maps (CAM), and only the information from the unactivated
joints will be passed to the next stream, by which rich features over all
active joints are obtained. Thus, the proposed method is termed richly
activated GCN (RA-GCN). Compared to the state-of-the-art (SOTA) methods, the
RA-GCN achieves comparable performance on the standard NTU RGB+D 60 and 120
datasets. More crucially, on the synthetic occlusion and jittering datasets,
the performance deterioration due to the occluded and disturbed joints can be
significantly alleviated by utilizing the proposed RA-GCN.
Comment: Accepted by IEEE T-CSVT, 11 pages, 6 figures, 10 tables
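The stream-to-stream masking described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed threshold of 0.5, and the per-joint score lists standing in for class activation maps are all assumptions made for illustration. Each stream receives only the joints that no preceding stream has activated.

```python
# Hypothetical sketch of activation-based joint masking between streams.
# cam_scores stands in for per-joint class activation map (CAM) values;
# the 0.5 threshold is an assumption, not a value taken from the paper.

def next_stream_mask(cam_scores, prev_mask, threshold=0.5):
    """Keep (1) only joints that were visible to the current stream and
    were not activated by it; drop (0) everything else."""
    return [m if s < threshold else 0 for s, m in zip(cam_scores, prev_mask)]


def apply_mask(joints, mask):
    """Zero out the coordinates of every masked joint."""
    return [[c * m for c in joint] for joint, m in zip(joints, mask)]


# Four joints with 2D coordinates; the first stream sees all of them.
joints = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask_1 = [1, 1, 1, 1]

# Stream 1 activates joints 0 and 2, so stream 2 sees only joints 1 and 3.
mask_2 = next_stream_mask([0.9, 0.1, 0.6, 0.2], mask_1)
stream_2_input = apply_mask(joints, mask_2)

# Stream 2 activates joint 1, so stream 3 sees only joint 3.
mask_3 = next_stream_mask([0.0, 0.8, 0.0, 0.3], mask_2)
```

Because each mask is derived from the previous one, a joint that has been activated once stays hidden from all later streams, which pushes those streams toward the remaining joints.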
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided.
The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements.
All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current methods in the literature.
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn
This paper presents an image-classification-based approach to the skeleton-based
video action recognition problem. First, a dataset-independent
translation-scale invariant image mapping method is proposed, which transforms
the skeleton videos into colour images, named skeleton-images. Second, a
multi-scale deep convolutional neural network (CNN) architecture is proposed,
which can be built and fine-tuned on powerful pre-trained CNNs, e.g.,
AlexNet, VGGNet, and ResNet. Even though the skeleton-images are very
different from natural images, the fine-tuning strategy still works well.
Finally, we show that our method also works well on 2D skeleton video data.
We achieve state-of-the-art results on the popular benchmark datasets, e.g.,
NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging
NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods
by a large margin, which demonstrates the efficacy of the proposed method.
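One way to picture a translation-scale invariant mapping from a skeleton sequence to a colour image is sketched below. This is an illustrative assumption rather than the paper's exact formulation: the function name, the frames-as-rows layout, and the choice of normalising each sequence by its own per-axis minimum and its largest axis range are all hypothetical. The three coordinates of each joint become the three colour channels of one pixel.

```python
# Hypothetical sketch: map a skeleton sequence to an RGB-like pixel grid.
# sequence[t][j] is the (x, y, z) position of joint j in frame t.
# Normalising with statistics of the sequence itself (not of the whole
# dataset) is what makes the result independent of translation and scale.

def skeleton_to_image(sequence):
    """Return rows of (r, g, b) pixels: one row per frame, one pixel per joint."""
    axes = range(3)
    mins = [min(joint[a] for frame in sequence for joint in frame) for a in axes]
    maxs = [max(joint[a] for frame in sequence for joint in frame) for a in axes]
    # A single span shared by all axes preserves the skeleton's proportions.
    span = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    return [
        [
            tuple(int(round(255 * (joint[a] - mins[a]) / span)) for a in axes)
            for joint in frame
        ]
        for frame in sequence
    ]


# Two frames with two joints each.
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)],
    [(0.5, 1.0, 0.25), (2.0, 1.0, 1.0)],
]

# The same motion, three times larger and shifted by 10 along every axis.
moved = [[tuple(3 * c + 10 for c in joint) for joint in frame] for frame in sequence]

image = skeleton_to_image(sequence)
same_image = skeleton_to_image(moved) == image
```

In this sketch the shifted and rescaled sequence yields the same pixel grid as the original, which is the invariance property the mapping is named for. The resulting grid could then be resized and fed to a pre-trained image CNN for fine-tuning.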