Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection
Most existing AU detection works that consider AU relationships rely on
probabilistic graphical models with manually extracted features. This paper
proposes an end-to-end deep learning framework for facial AU detection that
uses a graph convolutional network (GCN) for AU relation modeling, which has
not been explored before. In particular, AU-related regions are extracted
first, and latent representations rich in AU information are learned through
an auto-encoder. Each latent representation vector is then fed into the GCN
as a node, and the connectivity of the GCN is determined by the relationships
among AUs. Finally, the assembled features updated through the GCN are
concatenated for AU detection. Extensive experiments on the BP4D and DISFA
benchmarks demonstrate that our framework significantly outperforms
state-of-the-art methods for facial AU detection. The proposed framework is
also validated through a series of ablation studies.
Comment: Accepted by MMM202
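As a concrete illustration of the relation-modeling step described above,
here is a minimal PyTorch sketch under our own assumptions (it is not the
authors' code): each AU's latent vector from the auto-encoder becomes a graph
node, a normalized adjacency matrix encoding AU relationships fixes the
connectivity, and the GCN-updated node features are concatenated for per-AU
prediction. All class names, dimensions, and the placeholder adjacency are
illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AUGraphConv(nn.Module):
    """One GCN layer: H' = ReLU(A_hat @ H @ W), where A_hat is a
    normalized AU-relationship adjacency matrix (assumed given)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # h: (batch, num_aus, in_dim); a_hat: (num_aus, num_aus)
        return F.relu(torch.einsum("ij,bjd->bid", a_hat, self.linear(h)))

class AURelationGCN(nn.Module):
    """Sketch: latent AU vectors are graph nodes; GCN-updated node
    features are concatenated and classified per AU."""
    def __init__(self, num_aus, latent_dim, hidden_dim):
        super().__init__()
        self.gcn1 = AUGraphConv(latent_dim, hidden_dim)
        self.gcn2 = AUGraphConv(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(num_aus * hidden_dim, num_aus)

    def forward(self, latents, a_hat):
        h = self.gcn2(self.gcn1(latents, a_hat), a_hat)
        return torch.sigmoid(self.classifier(h.flatten(1)))  # per-AU probs

# Toy usage: 12 AUs, identity adjacency as a placeholder; real edges
# would come from AU co-occurrence statistics, symmetrically
# normalized as in Kipf & Welling (2017).
num_aus = 12
adj = torch.eye(num_aus)
deg_inv_sqrt = adj.sum(1).rsqrt().diag()
a_hat = deg_inv_sqrt @ adj @ deg_inv_sqrt

model = AURelationGCN(num_aus, latent_dim=64, hidden_dim=32)
probs = model(torch.randn(4, num_aus, 64), a_hat)  # (4, 12)
```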
Spatio-Temporal Relation and Attention Learning for Facial Action Unit Detection
Spatio-temporal relations among facial action units (AUs) convey significant
information for AU detection yet have not been thoroughly exploited. The main
reasons are the limited capability of current AU detection works in
simultaneously learning spatial and temporal relations, and the lack of precise
localization information for AU feature learning. To tackle these limitations,
we propose a novel spatio-temporal relation and attention learning framework
for AU detection. Specifically, we introduce a spatio-temporal graph
convolutional network to capture both spatial and temporal relations from
dynamic AUs, in which the AU relations are formulated as a spatio-temporal
graph with adaptively learned instead of predefined edge weights. Moreover, the
learning of spatio-temporal relations among AUs requires individual AU
features. Considering the dynamism and shape irregularity of AUs, we propose an
attention regularization method to adaptively learn regional attentions that
capture highly relevant regions and suppress irrelevant regions so as to
extract a complete feature for each AU. Extensive experiments show that our
approach achieves substantial improvements over the state-of-the-art AU
detection methods on the BP4D and especially DISFA benchmarks.
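The core mechanism lends itself to a short sketch. The code below is our own
minimal PyTorch approximation, not the authors' implementation, and it omits
the attention-regularization branch: edge weights between AU nodes are free
parameters learned jointly with the network rather than predefined, a spatial
graph convolution mixes AU features within each frame, and a temporal
convolution propagates each AU's features across neighbouring frames.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STGCNBlock(nn.Module):
    """Spatio-temporal graph conv over AU nodes with adaptively
    learned (not predefined) edge weights."""
    def __init__(self, num_aus, in_dim, out_dim, t_kernel=3):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(num_aus, num_aus))
        self.spatial = nn.Linear(in_dim, out_dim)
        self.temporal = nn.Conv1d(out_dim, out_dim, t_kernel,
                                  padding=t_kernel // 2)

    def forward(self, x):
        # x: (batch, time, num_aus, in_dim)
        a = torch.softmax(self.edge_logits, dim=-1)      # learned adjacency
        h = F.relu(torch.einsum("ij,btjd->btid", a, self.spatial(x)))
        b, t, n, d = h.shape
        h = h.permute(0, 2, 3, 1).reshape(b * n, d, t)   # conv along time
        h = F.relu(self.temporal(h))
        return h.reshape(b, n, d, t).permute(0, 3, 1, 2)

# Toy usage: 8 AUs, 16-frame clips, 64-dimensional per-AU features
# from an upstream regional-attention module.
block = STGCNBlock(num_aus=8, in_dim=64, out_dim=64)
out = block(torch.randn(2, 16, 8, 64))  # (2, 16, 8, 64)
```

Because the adjacency is produced by a softmax over learnable logits, the
edge weights adapt to the data during training, which is the property the
abstract emphasizes over fixed, predefined AU graphs.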
Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks
Most state-of-the-art approaches for Facial Action Unit (AU) detection rely
upon evaluating facial expressions from static frames, encoding a snapshot of
heightened facial activity. In real-world interactions, however, facial
expressions are usually more subtle and evolve in a temporal manner requiring
AU detection models to learn spatial as well as temporal information. In this
paper, we focus on both spatial and spatio-temporal features encoding the
temporal evolution of facial AU activation. For this purpose, we propose the
Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU
detection using both frame and sequence-level features. While at the
frame-level the capsule layers of AULA-Caps learn spatial feature primitives to
determine AU activations, at the sequence-level, it learns temporal
dependencies between contiguous frames by focusing on relevant spatio-temporal
segments in the sequence. The learnt feature capsules are routed together such
that the model learns to selectively focus more on spatial or spatio-temporal
information depending upon the AU lifecycle. The proposed model is evaluated on
the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art
results on both datasets.
Comment: Updated Figure 6 and the Acknowledgements. Corrected typos. 11 pages,
6 figures, 3 tables
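The abstract does not spell out the routing scheme, so the sketch below shows
plain dynamic routing-by-agreement (Sabour et al., 2017), the mechanism
capsule networks of this kind are typically built on; the lower-level
capsules here stand in for the frame- and sequence-level features, and the
lifecycle-aware weighting itself is not reproduced. All shapes are
assumptions.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Capsule non-linearity: shrinks short vectors toward zero and
    long vectors toward unit length."""
    n2 = (s * s).sum(dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

def route(u_hat, iters=3):
    """Dynamic routing-by-agreement. u_hat holds prediction vectors
    from lower-level capsules, shaped (batch, num_in, num_out, dim)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(iters):
        c = torch.softmax(b, dim=2)                    # coupling coefficients
        v = squash((c.unsqueeze(-1) * u_hat).sum(1))   # (batch, num_out, dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)       # agreement update
    return v

# Toy usage: route 32 lower capsules (mixing spatial and spatio-temporal
# primitives) into 12 AU capsules whose vector lengths act as activations.
u_hat = torch.randn(4, 32, 12, 16)
au_caps = route(u_hat)            # (4, 12, 16)
au_scores = au_caps.norm(dim=-1)  # per-AU activation scores in [0, 1)
```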
Graph-based Facial Affect Analysis: A Review of Methods, Applications and Challenges
Facial affect analysis (FAA) using visual signals is important in
human-computer interaction. Early methods focus on extracting appearance and
geometry features associated with human affect while ignoring the latent
semantic information among individual facial changes, leading to limited
performance and generalization. Recent work attempts to establish a graph-based
representation to model these semantic relationships and develop frameworks to
leverage them for various FAA tasks. In this paper, we provide a comprehensive
review of graph-based FAA, including the evolution of algorithms and their
applications. First, the FAA background knowledge is introduced, especially on
the role of the graph. We then discuss approaches widely used for
graph-based affective representation in the literature and outline the trend
in graph construction. For relational reasoning in graph-based FAA, existing
studies are categorized according to their usage of traditional methods or deep
models, with a special emphasis on the latest graph neural networks.
Performance comparisons of the state-of-the-art graph-based FAA methods are
also summarized. Finally, we discuss the challenges and potential directions.
To the best of our knowledge, this is the first survey of graph-based FAA
methods. Our findings can serve as a reference for future research in this
field.
Comment: 20 pages, 12 figures, 5 tables
Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition
Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it remains difficult for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
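As a rough illustration of the propagation step, here is a minimal GGNN
layer in the spirit of Li et al. (2016), written as our own PyTorch sketch
rather than the SRERL code: each AU node aggregates messages from its
neighbours in the knowledge graph and updates its state with a GRU cell. The
random adjacency is a placeholder; in SRERL the edges would encode the
symbiosis and mutual-exclusion statistics mentioned above.

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """Minimal gated graph neural network step: message passing over
    the AU graph followed by a GRU state update."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adj, steps=3):
        # h: (batch, num_aus, dim); adj: (num_aus, num_aus), assumed
        # precomputed from AU co-occurrence statistics
        b, n, d = h.shape
        for _ in range(steps):
            m = torch.einsum("ij,bjd->bid", adj, self.msg(h))
            h = self.gru(m.reshape(b * n, d),
                         h.reshape(b * n, d)).reshape(b, n, d)
        return h

# Toy usage: 12 AU nodes with 128-d features from a multi-scale CNN.
layer = GGNNLayer(128)
adj = torch.rand(12, 12)  # placeholder for co-occurrence-derived edges
enhanced = layer(torch.randn(4, 12, 128), adj)  # (4, 12, 128)
```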