Simple yet efficient real-time pose-based action recognition
Recognizing human actions is a core challenge for autonomous systems because they
share their operating space directly with humans. Such systems must be able to
recognize and assess human actions in real time. Training the corresponding
data-driven algorithms requires a significant amount of annotated data. We
demonstrate a pipeline that detects humans, estimates their poses,
tracks them over time, and recognizes their actions in real time with standard
monocular camera sensors. For action recognition, we encode the human pose into
a new data format called Encoded Human Pose Image (EHPI) that can then be
classified using standard methods from the computer vision community. With this
simple procedure, we achieve performance competitive with the state of the art in
pose-based action detection while ensuring real-time operation. In addition,
we present a use case from autonomous driving that demonstrates how such
a system can be trained to recognize human actions using simulation data.
Comment: Submitted to IEEE Intelligent Transportation Systems Conference
(ITSC) 2019. Code will be available soon at
https://github.com/noboevbo/ehpi_action_recognitio
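The core EHPI idea, encoding a pose sequence as an image that a standard image classifier can consume, can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' published code: we assume 2D keypoints of shape (frames, joints, 2), place joints on rows and frames on columns, min-max normalize x and y over the whole sequence, and store them in the red and green channels with blue left at zero. The function name `encode_ehpi` is hypothetical.

```python
import numpy as np

def encode_ehpi(poses):
    """Encode a pose sequence as an image (hypothetical helper, not the authors' code).

    poses: array-like of shape (T, J, 2) with (x, y) keypoints per frame.
    Returns a float array of shape (J, T, 3): rows are joints, columns are
    frames, channel 0 holds normalized x, channel 1 normalized y, channel 2 is 0.
    """
    poses = np.asarray(poses, dtype=np.float64)
    flat = poses.reshape(-1, 2)
    mins, maxs = flat.min(axis=0), flat.max(axis=0)
    # Avoid division by zero if a coordinate never changes over the sequence.
    span = np.where(maxs > mins, maxs - mins, 1.0)
    norm = (poses - mins) / span              # (T, J, 2), values in [0, 1]
    image = np.zeros((poses.shape[1], poses.shape[0], 3))
    image[..., :2] = norm.transpose(1, 0, 2)  # (J, T, 2): joints x frames
    return image

# Example: 32 frames of 15 joints detected in a 640x480 frame.
rng = np.random.default_rng(0)
sequence = rng.uniform([0, 0], [640, 480], size=(32, 15, 2))
ehpi = encode_ehpi(sequence)
print(ehpi.shape)  # (15, 32, 3)
```

Because the result is a fixed-size image, it can be passed to an ordinary image classifier; handling missing joints or variable-length sequences would need additional logic not shown here.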
Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
We present an autoencoder-based semi-supervised approach to classify
perceived human emotions from walking styles obtained from videos or
motion-captured data and represented as sequences of 3D poses. Given the motion
on each joint in the pose at each time step extracted from 3D pose sequences,
we hierarchically pool these joint motions in a bottom-up manner in the
encoder, following the kinematic chains in the human body. We also constrain
the latent embeddings of the encoder to contain the space of
psychologically motivated affective features underlying the gaits. We train the
decoder to reconstruct the motions per joint per time step in a top-down manner
from the latent embeddings. For the annotated data, we also train a classifier
to map the latent embeddings to emotion labels. Our semi-supervised approach
achieves a mean average precision of 0.84 on the Emotion-Gait benchmark
dataset, which contains both labeled and unlabeled gaits collected from
multiple sources. We outperform current state-of-the-art algorithms for both
emotion recognition and action recognition from 3D gaits by 7% to 23% in absolute
terms. More importantly, we improve the average precision by 10% to 50% in absolute
terms on classes that each make up less than 25% of the labeled part of the
Emotion-Gait benchmark dataset.
Comment: In proceedings of the 16th European Conference on Computer Vision,
2020. Total pages 18. Total figures 5. Total tables
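The bottom-up pooling of joint motions along kinematic chains can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the paper's encoder: the 12-joint tree and the names `KINEMATIC_TREE`, `joint_motion`, and `pool_bottom_up` are hypothetical, joint motion is taken to be the frame-to-frame displacement of each 3D joint, and plain averaging stands in for whatever learned layers the actual model uses.

```python
import numpy as np

# Hypothetical, heavily simplified kinematic tree: parent -> children.
KINEMATIC_TREE = {
    "root": ["spine", "left_hip", "right_hip"],
    "spine": ["neck", "left_shoulder", "right_shoulder"],
    "neck": ["head"],
    "left_shoulder": ["left_hand"],
    "right_shoulder": ["right_hand"],
    "left_hip": ["left_foot"],
    "right_hip": ["right_foot"],
}

def joint_motion(positions):
    """Per-frame displacement of each joint: (T, J, 3) -> (T - 1, J, 3)."""
    return np.diff(positions, axis=0)

def pool_bottom_up(features, tree, node="root"):
    """Pool a subtree from the leaves toward `node` by averaging.

    features: dict joint name -> (T, D) array.
    Returns a (T, D) summary of the subtree rooted at `node`.
    """
    # Children are pooled first, so information flows from extremities upward.
    children = [pool_bottom_up(features, tree, c) for c in tree.get(node, [])]
    return np.mean([features[node]] + children, axis=0)

joints = ["root", "spine", "neck", "head", "left_shoulder", "right_shoulder",
          "left_hand", "right_hand", "left_hip", "right_hip",
          "left_foot", "right_foot"]
rng = np.random.default_rng(0)
positions = rng.normal(size=(60, len(joints), 3))   # 60 frames of 3D poses
motion = joint_motion(positions)                     # (59, 12, 3)
features = {name: motion[:, i, :] for i, name in enumerate(joints)}
summary = pool_bottom_up(features, KINEMATIC_TREE)   # (59, 3)
```

The recursion only shows the order of aggregation, from hands and feet toward the root; in a trained model the averaging step would presumably be replaced by learned layers whose output feeds the latent embedding.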