Graph-based Facial Affect Analysis: A Review of Methods, Applications and Challenges
Facial affect analysis (FAA) using visual signals is important in
human-computer interaction. Early methods focus on extracting appearance and
geometry features associated with human affects, while ignoring the latent
semantic information among individual facial changes, leading to limited
performance and generalization. Recent work attempts to establish a graph-based
representation to model these semantic relationships and develop frameworks to
leverage them for various FAA tasks. In this paper, we provide a comprehensive
review of graph-based FAA, including the evolution of algorithms and their
applications. First, the FAA background knowledge is introduced, especially on
the role of the graph. We then discuss approaches that are widely used for
graph-based affective representation in literature and show a trend towards
graph construction. For the relational reasoning in graph-based FAA, existing
studies are categorized according to their usage of traditional methods or deep
models, with a special emphasis on the latest graph neural networks.
Performance comparisons of the state-of-the-art graph-based FAA methods are
also summarized. Finally, we discuss the challenges and potential directions.
To the best of our knowledge, this is the first survey of graph-based FAA methods. Our
findings can serve as a reference for future research in this field.
Comment: 20 pages, 12 figures, 5 tables
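As a toy illustration of the graph construction step the survey reviews (this is a generic sketch, not a specific method from the paper), facial landmarks can be turned into a graph by connecting each landmark to its k nearest neighbours and normalizing the adjacency for use in a graph convolution. All names and the choice of k here are illustrative assumptions.

```python
import numpy as np

def knn_adjacency(landmarks, k=3):
    """Symmetric k-NN adjacency over facial landmarks, normalized for graph conv."""
    n = len(landmarks)
    # pairwise Euclidean distances between landmarks
    d = np.linalg.norm(landmarks[:, None, :] - landmarks[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self from neighbour search
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(d[i])[:k]] = 1.0   # connect to k nearest landmarks
    A = np.maximum(A, A.T)                 # symmetrize the edge set
    A_hat = A + np.eye(n)                  # add self-loops
    d_inv_sqrt = np.diag(A_hat.sum(1) ** -0.5)
    return d_inv_sqrt @ A_hat @ d_inv_sqrt # symmetric normalization D^-1/2 A D^-1/2

landmarks = np.random.default_rng(0).random((68, 2))  # e.g. 68 2-D facial landmarks
A_norm = knn_adjacency(landmarks)
```

The normalized adjacency `A_norm` is what a graph neural network would multiply node features by at each layer; how the edges are chosen (k-NN here, but also physiology- or AU-based connections in the literature) is exactly the design space the survey categorizes.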
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, meanwhile, plays a key role in the action recognition and affective computing fields. The former is essential to understanding how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided.
The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements.
All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
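To make concrete why LSTMs suit the body-movement sequences described above, here is a minimal numpy sketch of a single LSTM cell stepped over a skeleton sequence. This is a generic textbook LSTM, not the thesis's implementation; the dimensions (17 joints × 2 coordinates, hidden size 16) are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from input x and previous hidden state h."""
    z = W @ x + U @ h + b                  # stacked pre-activations for all gates
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))           # input gate
    f = 1 / (1 + np.exp(-z[H:2 * H]))      # forget gate
    o = 1 / (1 + np.exp(-z[2 * H:3 * H]))  # output gate
    g = np.tanh(z[3 * H:])                 # candidate cell state
    c = f * c + i * g                      # cell state carries long-term context
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H, T = 34, 16, 30                       # 17 joints x 2 coords, hidden size, frames
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):                         # step over the skeleton sequence
    x_t = rng.normal(size=D)               # stand-in for one frame's joint coords
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The forget gate is what lets the cell state accumulate context over the whole sequence, which is the "long-term contextual information" property the abstract credits for the suitability of RNN-LSTMs to body-movement analysis.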
Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features
For skeleton-based gesture recognition, graph
convolutional networks (GCNs) have achieved remarkable performance, since the human skeleton is a natural graph. However,
the biological structure might not be the crucial one for motion
analysis. Also, spatial differential information such as joint distances
and angles between bones may be overlooked during the graph
convolution. In this paper, we focus on obtaining meaningful joint
groups and extracting their discriminative features via path
signature (PS) theory. Firstly, to characterize the constraints and
dependencies of various joints, we propose three types of paths,
i.e., spatial, temporal, and learnable paths. In particular, a learnable
path generation mechanism can group joints that are not
directly connected or are far apart, according to their kinematic characteristics. Secondly, to obtain informative and compact features,
a deep integration of PS with few parameters is introduced.
The whole computational process is packed into two modules, i.e., a
spatial-temporal path signature module (ST-PSM) and a learnable
path signature module (L-PSM), for convenience of utilization.
They are plug-and-play modules available for any neural network,
such as CNNs and GCNs, to enhance the feature extraction ability.
Extensive experiments have been conducted on three mainstream
datasets (ChaLearn 2013, ChaLearn 2016, and AUTSL). We
achieve state-of-the-art results with a simpler framework and a
much smaller model size. By inserting our two modules into
several GCN-based networks, we observe clear improvements,
demonstrating the effectiveness of our proposed method.
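To illustrate the path signature features the abstract builds on, the sketch below computes the level-1 and level-2 signature terms of a piecewise-linear path (e.g. one joint's trajectory over a gesture). This is the standard definition from PS theory, not the paper's ST-PSM/L-PSM modules; the trajectory data is synthetic.

```python
import numpy as np

def path_signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: (T, d) array, e.g. one joint's 2-D trajectory over T frames.
    Level 1 is the total increment; level 2 collects the iterated
    integrals S^{ij} = integral of (X_s - X_0)^i dX_s^j, computed
    exactly segment by segment for a piecewise-linear path.
    """
    x0 = path[0]
    s1 = path[-1] - x0                     # level 1: total displacement
    d = path.shape[1]
    s2 = np.zeros((d, d))                  # level 2: iterated integrals
    for k in range(len(path) - 1):
        dx = path[k + 1] - path[k]
        # exact contribution of one linear segment
        s2 += np.outer(path[k] - x0, dx) + 0.5 * np.outer(dx, dx)
    return s1, s2

rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(size=(30, 2)), axis=0)  # synthetic joint trajectory
s1, s2 = path_signature_level2(traj)
```

A useful sanity check is the shuffle identity `s2 + s2.T == outer(s1, s1)`: the symmetric part of level 2 is determined by level 1, so the genuinely new information at level 2 lies in its antisymmetric part (the signed "area" swept by the path), which is the kind of higher-order motion descriptor a plain graph convolution over joint coordinates would miss.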