Learning Coupled Spatial-temporal Attention for Skeleton-based Action Recognition
In this paper, we propose a coupled spatial-temporal attention (CSTA) model
for skeleton-based action recognition, which aims to identify the most
discriminative joints and frames in the spatial and temporal domains
simultaneously. Conventional approaches usually treat all the joints or
frames in a skeletal sequence as equally important, which makes them
sensitive to ambiguous and redundant information. To address this, we first
learn two sets of weights for different joints and frames through two
subnetworks, respectively, enabling the model to attend to the relatively
informative joints and frames. Then, we compute the cross product of the
joint and frame weights to obtain the coupled spatial-temporal attention.
Moreover, the CSTA mechanism can be easily plugged into existing
hierarchical CNN models, yielding CSTA-CNN. Extensive experimental results
on the recently collected UESTC dataset and the currently largest NTU
dataset demonstrate the effectiveness of our proposed method for
skeleton-based action recognition.
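For illustration, here is a minimal PyTorch sketch of the coupled attention idea described above: two small subnetworks score joints and frames, and the two weight vectors are combined into a single (frame, joint) attention map. The module name and the linear-layer subnetworks are hypothetical placeholders, and the "cross product" of the two weight vectors is interpreted here as an outer product; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class CoupledSTAttention(nn.Module):
    """Sketch of coupled spatial-temporal attention (assumed design).

    Input: skeleton features of shape (batch, frames, joints, channels).
    Two subnetworks produce per-joint and per-frame weights; their outer
    product yields a coupled (frame, joint) attention map.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Placeholder subnetworks: one scalar score per joint / per frame.
        self.joint_net = nn.Linear(channels, 1)  # spatial (joint) scores
        self.frame_net = nn.Linear(channels, 1)  # temporal (frame) scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, J, C)
        # Joint weights: pool over frames, score each joint, normalize.
        joint_w = torch.softmax(
            self.joint_net(x.mean(dim=1)).squeeze(-1), dim=-1
        )  # (B, J)
        # Frame weights: pool over joints, score each frame, normalize.
        frame_w = torch.softmax(
            self.frame_net(x.mean(dim=2)).squeeze(-1), dim=-1
        )  # (B, T)
        # Outer product couples the two weight vectors into one map.
        attn = frame_w.unsqueeze(-1) * joint_w.unsqueeze(1)  # (B, T, J)
        # Reweight the input features with the coupled attention.
        return x * attn.unsqueeze(-1)

# Usage: 25 joints over 64 frames with 128-channel features.
feats = torch.randn(8, 64, 25, 128)
out = CoupledSTAttention(channels=128)(feats)  # same shape as feats
```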