SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching
This paper presents a study of the automatic design of neural network
architectures for skeleton-based action recognition. Specifically, we encode a
skeleton-based action instance into a tensor and carefully define a set of
operations to build two types of network cells: normal cells and reduction
cells. The recently developed DARTS (Differentiable Architecture Search) is
adopted to search for an effective network architecture built upon the two
types of cells. All operations are 2D-based in order to reduce the overall
computation and search space. Experiments on the challenging NTU RGB+D and
Kinetics datasets have verified that most of the networks developed to date
for skeleton-based action recognition are likely not compact and efficient.
The proposed method provides an approach to search for such a compact network
that achieves comparable or even better performance than the state-of-the-art
methods.
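As a rough illustration of the search mechanism the abstract describes, the
sketch below shows a DARTS-style mixed operation in PyTorch: each candidate 2D
operation on the encoded skeleton tensor is weighted by a learnable
architecture parameter, and the softmax-weighted sum forms one edge of a cell.
The candidate operation set, the (batch, channels, joints, frames) tensor
layout, and the channel count are illustrative assumptions, not the paper's
exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed edge: softmax-weighted sum of candidate operations.

    The candidate set below is a hypothetical stand-in for the paper's 2D
    operation set; all candidates preserve the input shape.
    """
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1)),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # Architecture parameters (alphas), optimized during the search.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# A skeleton instance encoded as a 2D tensor: here 16 channels after a stem
# convolution, 25 joints, 64 frames (all assumed values).
x = torch.randn(8, 16, 25, 64)
edge = MixedOp(16)
out = edge(x)  # same shape; after search, the strongest alpha selects the op
```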
Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition
Wearable sensor-based Human Action Recognition (HAR) has made significant
strides in recent years. However, the recognition accuracy of wearable
sensor-based HAR still lags behind that of systems based on visual
modalities, such as RGB video and depth data. Although diverse input
modalities can provide complementary cues and improve HAR accuracy, wearable
devices can only capture a limited range of non-visual time-series input,
such as accelerometer and gyroscope signals. This limitation hinders the
deployment of multimodal approaches that use visual and non-visual data in
parallel on current wearable devices. To address
this issue, we propose a novel Physical-aware Cross-modal Adversarial (PCA)
framework that utilizes only time-series accelerometer data from four inertial
sensors for the wearable sensor-based HAR problem. Specifically, we propose an
effective IMU2SKELETON network to produce corresponding synthetic skeleton
joints from accelerometer data. Subsequently, we impose additional constraints
on the synthetic skeleton data from a physical perspective, since accelerometer
data can be regarded as the second derivative of the skeleton sequence
coordinates. After that, the original accelerometer data and the constrained
skeleton sequences are fused for the final classification. In this way, a
wearable device can not only capture accelerometer data but also generate
synthetic skeleton sequences for real-time wearable sensor-based HAR
applications that must run anytime and anywhere. To demonstrate the
effectiveness of our proposed PCA
framework, we conduct extensive experiments on Berkeley-MHAD, UTD-MHAD, and
MMAct datasets. The results confirm that the proposed PCA approach achieves
competitive performance compared to previous methods on the mono-sensor-based
HAR classification problem.
Comment: First IMU2SKELETON GAN approach for the wearable HAR problem. arXiv
admin note: text overlap with arXiv:2208.0809
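The physical constraint mentioned above has a direct numerical reading:
acceleration is the second time-derivative of position, so the synthetic
skeleton can be penalized when its second-order finite difference disagrees
with the measured accelerometer signal. The sketch below is a minimal version
of such a loss, assuming aligned coordinate frames and a 30 Hz sampling rate;
the function name, tensor shapes, and the omission of gravity compensation are
all assumptions, not the paper's exact formulation.

```python
import torch

def physical_consistency_loss(pred_joints, accel, dt=1.0 / 30):
    """Penalize disagreement between synthetic skeleton motion and IMU data.

    pred_joints: (batch, frames, joints, 3) synthetic skeleton positions
    accel:       (batch, frames - 2, joints, 3) measured accelerations at the
                 sensor-mounted joints (alignment/calibration omitted here)
    dt:          sampling interval; 30 Hz is an assumed value
    """
    # Central finite difference: x''(t) ~ (x[t+1] - 2*x[t] + x[t-1]) / dt^2
    second_diff = (pred_joints[:, 2:] - 2 * pred_joints[:, 1:-1]
                   + pred_joints[:, :-2]) / (dt ** 2)
    return torch.mean((second_diff - accel) ** 2)
```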
Self-attention guided deep features for action recognition
Skeleton-based human action recognition is an important task in computer
vision. However, it is very challenging due to the complex spatio-temporal
variations of skeleton joints. In this work, we propose an end-to-end
trainable network consisting of a Deep Convolutional Model (DCM) and a
Self-Attention Model (SAM) for human action recognition from skeleton data.
Specifically, skeleton sequences are encoded into color images and fed into
the DCM to extract deep features. In the SAM, handcrafted features
representing the motion degree of joints are extracted, and the attention
weights are learned by a simple yet effective linear mapping. The
effectiveness of the proposed method has been verified on the NTU RGB+D,
SYSU-3D and UTD-MHAD datasets, achieving state-of-the-art results.
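To make the encoding step concrete, here is a minimal sketch of mapping a
skeleton sequence to a color image, with each joint's (x, y, z) coordinates
quantized to (R, G, B), joints as image rows, and frames as columns. The
min-max normalization and the 25-joint, 64-frame example are illustrative
assumptions, not the paper's exact recipe.

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a skeleton sequence as an RGB image.

    seq: (frames, joints, 3) array of joint coordinates
    returns: (joints, frames, 3) uint8 image
    """
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / (hi - lo + 1e-8)    # scale each coordinate to [0, 1]
    img = (norm * 255).astype(np.uint8)     # quantize to 8-bit color
    return img.transpose(1, 0, 2)           # joints x frames x RGB

# Example: a 64-frame sequence of 25 joints becomes a 25x64 RGB image,
# which can then be fed to the DCM like an ordinary image.
seq = np.random.randn(64, 25, 3)
image = skeleton_to_image(seq)
```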