Simple yet efficient real-time pose-based action recognition
Recognizing human actions is a core challenge for autonomous systems, as they directly share space with humans. Such systems must be able to recognize and assess human actions in real time. Training the corresponding data-driven algorithms requires a significant amount of annotated training data. We demonstrate a pipeline that detects humans, estimates their poses, tracks them over time, and recognizes their actions in real time with standard monocular camera sensors. For action recognition, we encode the human pose into
a new data format called Encoded Human Pose Image (EHPI) that can then be
classified using standard methods from the computer vision community. With this
simple procedure, we achieve performance competitive with the state of the art in pose-based action recognition while maintaining real-time operation. In addition, we present a use case in the context of autonomous driving that demonstrates how such a system can be trained to recognize human actions using simulation data.

Comment: Submitted to IEEE Intelligent Transportation Systems Conference (ITSC) 2019. Code will be available soon at https://github.com/noboevbo/ehpi_action_recognitio
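As a rough illustration of the EHPI idea described above, the sketch below encodes a 2D pose sequence as an image-like tensor whose rows are joints, columns are frames, and channels carry normalized x/y coordinates, so any standard image classifier can be applied. The function name encode_ehpi, the normalization scheme, and the channel layout are assumptions for illustration, not the authors' exact format.

```python
import numpy as np

def encode_ehpi(pose_sequence):
    """Encode a 2D pose sequence as an image-like tensor.

    pose_sequence: array of shape (num_frames, num_joints, 2) holding
    pixel coordinates. Returns a (num_joints, num_frames, 3) uint8
    "image" whose first two channels carry normalized x and y values.
    """
    seq = np.asarray(pose_sequence, dtype=np.float32)
    # Normalize each axis to [0, 255] over the whole clip, so the encoding
    # is invariant to where in the frame the person stands (assumed scheme).
    mins = seq.reshape(-1, 2).min(axis=0)
    maxs = seq.reshape(-1, 2).max(axis=0)
    norm = (seq - mins) / np.maximum(maxs - mins, 1e-6) * 255.0
    frames, joints, _ = norm.shape
    ehpi = np.zeros((joints, frames, 3), dtype=np.uint8)
    ehpi[..., 0] = norm[..., 0].T  # x coordinates -> first channel
    ehpi[..., 1] = norm[..., 1].T  # y coordinates -> second channel
    return ehpi  # feed to any standard image classifier

# Example: 32 frames of a 15-joint skeleton from a 640x480 video
dummy = np.random.rand(32, 15, 2) * (640, 480)
print(encode_ehpi(dummy).shape)  # (15, 32, 3)
```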
LEARNet: Dynamic Imaging Network for Micro Expression Recognition
Unlike prevalent facial expressions, micro expressions consist of subtle, involuntary muscle movements that are short-lived in nature. These minute muscle movements reflect a person's true emotions. Due to their short duration and low intensity, micro expressions are very difficult to perceive and interpret correctly. In this paper, we propose a dynamic representation of micro expressions that preserves the facial movement information of a video in a single frame. We also propose the Lateral Accretive Hybrid Network (LEARNet) to capture micro-level features of an expression in the facial region. LEARNet refines salient expression features in an accretive manner by incorporating accretion layers (AL) in the network. The response of an AL holds the hybrid feature maps generated by the preceding laterally connected convolution layers. Moreover, the LEARNet architecture incorporates a cross-decoupled relationship between convolution layers, which helps preserve tiny but influential facial muscle change information. The visual responses of the proposed LEARNet demonstrate the effectiveness of the system by preserving both high- and micro-level edge features of facial expressions. The proposed LEARNet is evaluated on four benchmark datasets: CASME-I, CASME-II, CAS(ME)^2 and SMIC. The experimental results show significant improvements of 4.03%, 1.90%, 1.79% and 2.82% over ResNet on the CASME-I, CASME-II, CAS(ME)^2 and SMIC datasets, respectively.

Comment: Dynamic imaging, accretion, lateral, micro expression recognition
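To make the lateral-accretion idea above concrete, here is a schematic PyTorch sketch of one accretion block as we read it from the abstract: two laterally connected convolution branches produce responses that are fused into a hybrid feature map held by an accretion layer. The class name, channel sizes, kernel sizes, and additive fusion rule are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class AccretionBlock(nn.Module):
    """Schematic accretion block: two laterally connected convolution
    branches whose responses are fused into a hybrid feature map that is
    held by an accretion layer (sizes and fusion rule are assumptions)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.lateral_a = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.lateral_b = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.accretion = nn.Conv2d(out_ch, out_ch, kernel_size=1)  # AL
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Summing the two lateral responses keeps subtle, low-intensity
        # muscle-movement cues seen at both receptive-field sizes.
        hybrid = self.act(self.lateral_a(x)) + self.act(self.lateral_b(x))
        return self.act(self.accretion(hybrid))

block = AccretionBlock(in_ch=3, out_ch=16)
print(block(torch.randn(1, 3, 112, 112)).shape)  # torch.Size([1, 16, 112, 112])
```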
Gait Recognition from Motion Capture Data
Gait recognition from motion capture data, as a pattern classification discipline, can be improved by the use of machine learning. This paper contributes to the state of the art with a statistical approach for extracting robust gait features directly from raw data by a modification of Linear Discriminant Analysis with the Maximum Margin Criterion. Experiments on the CMU MoCap database show that the suggested method outperforms thirteen relevant methods based on geometric features, as well as a method that learns features by a combination of Principal Component Analysis and Linear Discriminant Analysis. The methods are evaluated in terms of the distribution of biometric templates in their respective feature spaces, expressed through a number of class-separability coefficients and classification metrics. Results also indicate high portability of the learned features; that is, we can learn which aspects of walking people generally differ in and extract those as general gait features. Recognizing people without needing group-specific features is convenient, as particular people might not always provide annotated learning data. As a contribution to reproducible research, our evaluation framework and database have been made publicly available. This research makes motion capture technology directly applicable to human recognition.

Comment: Preprint. Full paper accepted at the ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), special issue on Representation, Analysis and Recognition of 3D Humans. 18 pages. arXiv admin note: substantial text overlap with arXiv:1701.00995, arXiv:1609.04392, arXiv:1609.0693
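For context on the feature-learning step above, the following is a minimal sketch of Linear Discriminant Analysis driven by the Maximum Margin Criterion: instead of maximizing tr((W^T Sw W)^{-1} W^T Sb W) as in classic LDA, MMC maximizes tr(W^T (Sb - Sw) W), so the projection is given by the leading eigenvectors of Sb - Sw and no inversion of the (possibly singular) within-class scatter is required. This is a generic MMC implementation, not the paper's specific modification.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Learn a linear feature map with the Maximum Margin Criterion:
    take the leading eigenvectors of Sb - Sw (between-class minus
    within-class scatter). Generic MMC, not the paper's exact variant."""
    X = np.asarray(X, dtype=np.float64)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_total)[:, None]
        Sb += len(Xc) * (diff @ diff.T)   # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)     # within-class scatter
    # No inversion of Sw is needed, so MMC still works when Sw is
    # singular (few samples, high-dimensional raw MoCap features).
    eigvals, eigvecs = np.linalg.eigh(Sb - Sw)
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]

# Example: 100 raw 60-dimensional gait vectors from 5 walkers
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 60)), rng.integers(0, 5, size=100)
W = mmc_projection(X, y, n_components=4)
features = X @ W  # (100, 4) gait templates for nearest-neighbour matching
```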
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements, it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the fields of action recognition and affective computing. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: the first proposes a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current methods in the literature.
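As a sketch of what the second module's two-branch stacked LSTM-RNN over 2D skeletons might look like, the toy PyTorch model below feeds raw joint coordinates to one stacked LSTM branch and frame-to-frame joint motion to the other, then classifies the concatenated final hidden states. The layer sizes, the motion-difference input, and the late-fusion scheme are assumptions for illustration; the thesis's actual architecture may differ.

```python
import torch
import torch.nn as nn

class TwoBranchLSTM(nn.Module):
    """Toy two-branch stacked-LSTM action classifier over 2D skeletons:
    one branch sees joint positions, the other frame-to-frame motion
    (branch sizes and late fusion are illustrative assumptions)."""

    def __init__(self, num_joints, hidden=128, num_classes=10):
        super().__init__()
        in_dim = num_joints * 2  # flattened (x, y) per frame
        self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2,
                                   batch_first=True)
        self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2,
                                     batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, seq):                  # seq: (batch, frames, joints*2)
        motion = seq[:, 1:] - seq[:, :-1]    # temporal differences
        _, (h_pose, _) = self.pose_branch(seq)
        _, (h_motion, _) = self.motion_branch(motion)
        fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)
        return self.classifier(fused)

model = TwoBranchLSTM(num_joints=15)
print(model(torch.randn(4, 32, 30)).shape)  # torch.Size([4, 10])
```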