Attention module-based spatial-temporal graph convolutional networks for skeleton-based action recognition
Skeleton-based action recognition is an important direction within human action recognition, because the skeleton carries key information for recognizing actions. Spatial-temporal graph convolutional networks (ST-GCN) automatically learn both temporal and spatial features from skeleton data and achieve remarkable performance on skeleton-based action recognition. However, ST-GCN learns only local information within a fixed neighborhood and does not capture the correlations between all joints (i.e., global information), so global information needs to be introduced into ST-GCN. We propose a dynamic-skeleton model, attention module-based ST-GCN, which addresses this problem by adding an attention module. The attention module captures global information, giving the model stronger expressive power and generalization capability. Experimental results on two large-scale datasets, Kinetics and NTU-RGB+D, demonstrate that our model achieves significant improvements over previous representative methods. © 2019 SPIE and IS&T
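The idea of letting every joint attend to every other joint can be sketched with a plain self-attention step over per-joint features. This is a minimal toy illustration, not the paper's actual module: the input here is a hypothetical list of per-joint feature vectors, whereas the paper's attention operates on ST-GCN feature maps.

```python
import math

def joint_attention(features):
    """Self-attention over skeleton joints: each joint attends to all
    joints, mixing global context into its representation.
    `features` is a toy list of per-joint feature vectors."""
    n, d = len(features), len(features[0])
    # Attention scores: scaled dot product between every pair of joints.
    scores = [[sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
               for j in range(n)] for i in range(n)]
    # Softmax each row so the weights over all joints sum to 1.
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Each joint's output is a weighted sum of all joints' features,
    # i.e., local features enriched with global information.
    return [[sum(weights[i][j] * features[j][k] for j in range(n))
             for k in range(d)] for i in range(n)]

joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 joints, 2-dim features
out = joint_attention(joints)
```

Because each output row is a convex combination of the input rows, every joint's new feature stays inside the range spanned by the originals while now reflecting all joints.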
Attentive multi-scale aggregation based action recognition and its application in power substation operation training
With the rapid development of the power system and the increasing demand for intelligence, substation operation training has received more attention. Action recognition is a monitoring and analysis system based on computer vision and artificial intelligence that automatically identifies and tracks personnel actions in video frames. The system accurately identifies abnormal behaviors such as illegal operations and provides real-time feedback to trainers or surveillance systems. The commonly adopted strategy for action recognition is to first extract human skeletons from videos and then recognize the skeleton sequences. Although graph convolutional network (GCN)-based skeleton recognition methods have achieved impressive performance, they operate in the spatial dimension and cannot accurately describe dependencies between different time intervals in the temporal dimension. Additionally, existing methods typically handle the temporal and spatial dimensions separately, lacking effective communication between them. To address these issues, we propose a skeleton-based method that aggregates convolutional information at different scales in the time dimension to form a new scale dimension. We also introduce a space-time-scale attention module that enables effective communication among the three dimensions and generates weights for prediction. Our proposed method is validated on the public datasets NTU60 and NTU120, with experimental results verifying its effectiveness. For substation operation training, we built a real-time recognition system based on our proposed method. We collected over 400 videos for evaluation, covering 5 categories of actions, and achieved an accuracy of over 98%.
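The "new scale dimension" described above can be illustrated by aggregating a per-frame signal at several temporal window sizes and stacking the results. This is a hedged sketch under simplifying assumptions: the windows here are fixed moving averages over a hypothetical 1-D feature sequence, whereas the paper aggregates learned convolutions of different kernel sizes over skeleton feature maps.

```python
def temporal_scales(seq, windows=(1, 3, 5)):
    """Aggregate a per-frame feature sequence at several temporal
    scales (moving averages with different window sizes), stacking
    the outputs into a new 'scale' dimension."""
    scales = []
    for w in windows:
        half = w // 2
        smoothed = []
        for t in range(len(seq)):
            # Clamp the window to the sequence boundaries.
            lo, hi = max(0, t - half), min(len(seq), t + half + 1)
            smoothed.append(sum(seq[lo:hi]) / (hi - lo))
        scales.append(smoothed)
    return scales  # shape: (num_scales, num_frames)

frames = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy per-frame features
pyramid = temporal_scales(frames)   # window 1 keeps the original signal
```

A space-time-scale attention module could then learn weights over this extra axis, letting the model emphasize short or long temporal contexts per prediction.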
Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons
Current methods for skeleton-based human action recognition usually assume completely observed skeletons. In real scenarios, however, captured skeletons are often incomplete and noisy, which deteriorates the performance of traditional models. To make action recognition models more robust to incomplete skeletons, we propose a multi-stream graph convolutional network (GCN) that explores sufficient discriminative features distributed over all skeleton joints. Each stream of the network is responsible only for learning features from the joints left unactivated by preceding streams, as identified by their class activation maps (CAM), so the proposed method activates markedly more joints than traditional methods. We therefore term the method richly activated GCN (RA-GCN), where the richly discovered features improve the robustness of the model. The RA-GCN achieves performance comparable to state-of-the-art methods on the NTU RGB+D dataset. Moreover, on a synthetic occlusion dataset, the RA-GCN significantly alleviates the performance deterioration.
Comment: Accepted by ICIP 2019, 5 pages, 3 figures, 3 tables
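The core mechanism, masking the joints already activated so a later stream must discover features elsewhere, can be sketched as follows. The per-joint scores here are a hypothetical stand-in for the paper's CAM values, and `mask_activated_joints` is an illustrative helper, not the authors' code.

```python
def mask_activated_joints(scores, k):
    """Zero out the k most strongly activated joints (by a CAM-like
    per-joint score) so a subsequent stream is forced to learn
    discriminative features from the remaining joints."""
    # Indices of the k highest-scoring (already activated) joints.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [0.0 if i in top else s for i, s in enumerate(scores)]

cam = [0.9, 0.1, 0.7, 0.3, 0.05]          # toy per-joint activations
masked = mask_activated_joints(cam, 2)    # suppress joints 0 and 2
```

Chaining this masking across streams is what spreads activation over more joints, which is why the model degrades more gracefully when some joints are occluded or missing.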