SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching
This paper presents a study of the automatic design of neural network
architectures for skeleton-based action recognition. Specifically, we encode a
skeleton-based action instance into a tensor and carefully define a set of
operations to build two types of network cells: normal cells and reduction
cells. The recently developed DARTS (Differentiable Architecture Search) is
adopted to search for an effective network architecture built upon the two
types of cells. All operations are 2D-based in order to reduce the overall
computation and search space. Experiments on the challenging NTU RGB+D and
Kinetics datasets have verified that most of the networks developed to date
for skeleton-based action recognition are likely not compact and efficient.
The proposed method provides an approach to search for such a compact network
that achieves comparable or even better performance than the state-of-the-art
methods.
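As a rough illustration of the search mechanism the abstract describes, the
sketch below shows a DARTS-style mixed operation in PyTorch: each candidate 2D
operation on the encoded skeleton tensor is weighted by a learnable
architecture parameter, and the softmax-weighted sum forms one edge of a cell.
The candidate operation set, the (batch, channels, joints, frames) tensor
layout, and the channel count are illustrative assumptions, not the paper's
exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed edge: softmax-weighted sum of candidate operations.

    The candidate set below is a hypothetical stand-in for the paper's 2D
    operation set; all candidates preserve the input shape.
    """
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1)),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # Architecture parameters (alphas), optimized during the search.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# A skeleton instance encoded as a 2D tensor: here 16 channels after a stem
# convolution, 25 joints, 64 frames (all assumed values).
x = torch.randn(8, 16, 25, 64)
edge = MixedOp(16)
out = edge(x)  # same shape; after search, the strongest alpha selects the op
```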
Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition
Wearable sensor-based Human Action Recognition (HAR) has made significant
strides in recent years. However, the recognition accuracy of wearable
sensor-based HAR still lags behind that of systems based on visual
modalities, such as RGB video and depth data. Although diverse input
modalities can provide complementary cues and improve HAR accuracy, wearable
devices can only capture a limited range of non-visual time-series input,
such as accelerometer and gyroscope signals. This limitation hinders the
deployment of multimodal approaches that use visual and non-visual data in
parallel on current wearable devices. To address
this issue, we propose a novel Physical-aware Cross-modal Adversarial (PCA)
framework that utilizes only time-series accelerometer data from four inertial
sensors for the wearable sensor-based HAR problem. Specifically, we propose an
effective IMU2SKELETON network to produce corresponding synthetic skeleton
joints from accelerometer data. Subsequently, we impose additional constraints
on the synthetic skeleton data from a physical perspective, since accelerometer
data can be regarded as the second derivative of the skeleton sequence
coordinates. After that, the original accelerometer data and the constrained
skeleton sequences are fused for the final classification. In this way, a
wearable device can not only capture accelerometer data but also generate
synthetic skeleton sequences for real-time wearable sensor-based HAR
applications that must run anytime and anywhere. To demonstrate the
effectiveness of our proposed PCA
framework, we conduct extensive experiments on Berkeley-MHAD, UTD-MHAD, and
MMAct datasets. The results confirm that the proposed PCA approach achieves
competitive performance compared to previous methods on the mono-sensor-based
HAR classification problem.
Comment: First IMU2SKELETON GAN approach for the wearable HAR problem. arXiv
admin note: text overlap with arXiv:2208.0809
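The physical constraint mentioned above has a direct numerical reading:
acceleration is the second time-derivative of position, so the synthetic
skeleton can be penalized when its second-order finite difference disagrees
with the measured accelerometer signal. The sketch below is a minimal version
of such a loss, assuming aligned coordinate frames and a 30 Hz sampling rate;
the function name, tensor shapes, and the omission of gravity compensation are
all assumptions, not the paper's exact formulation.

```python
import torch

def physical_consistency_loss(pred_joints, accel, dt=1.0 / 30):
    """Penalize disagreement between synthetic skeleton motion and IMU data.

    pred_joints: (batch, frames, joints, 3) synthetic skeleton positions
    accel:       (batch, frames - 2, joints, 3) measured accelerations at the
                 sensor-mounted joints (alignment/calibration omitted here)
    dt:          sampling interval; 30 Hz is an assumed value
    """
    # Central finite difference: x''(t) ~ (x[t+1] - 2*x[t] + x[t-1]) / dt^2
    second_diff = (pred_joints[:, 2:] - 2 * pred_joints[:, 1:-1]
                   + pred_joints[:, :-2]) / (dt ** 2)
    return torch.mean((second_diff - accel) ** 2)
```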
Self-attention guided deep features for action recognition
Skeleton-based human action recognition is an important task in computer
vision. However, it is very challenging due to the complex spatio-temporal
variations of skeleton joints. In this work, we propose an end-to-end
trainable network consisting of a Deep Convolutional Model (DCM) and a
Self-Attention Model (SAM) for human action recognition from skeleton data.
Specifically, skeleton sequences are encoded into color images and fed into
the DCM to extract deep features. In the SAM, handcrafted features
representing the motion degree of joints are extracted, and the attention
weights are learned by a simple yet effective linear mapping. The
effectiveness of the proposed method has been verified on the NTU RGB+D,
SYSU-3D and UTD-MHAD datasets, achieving state-of-the-art results.
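To make the encoding step concrete, here is a minimal sketch of mapping a
skeleton sequence to a color image, with each joint's (x, y, z) coordinates
quantized to (R, G, B), joints as image rows, and frames as columns. The
min-max normalization and the 25-joint, 64-frame example are illustrative
assumptions, not the paper's exact recipe.

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a skeleton sequence as an RGB image.

    seq: (frames, joints, 3) array of joint coordinates
    returns: (joints, frames, 3) uint8 image
    """
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / (hi - lo + 1e-8)    # scale each coordinate to [0, 1]
    img = (norm * 255).astype(np.uint8)     # quantize to 8-bit color
    return img.transpose(1, 0, 2)           # joints x frames x RGB

# Example: a 64-frame sequence of 25 joints becomes a 25x64 RGB image,
# which can then be fed to the DCM like an ordinary image.
seq = np.random.randn(64, 25, 3)
image = skeleton_to_image(seq)
```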