DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization
Facial action unit recognition has many applications from market research to
psychotherapy and from image captioning to entertainment. Despite its recent
progress, deployment of these models has been impeded due to their limited
generalization to unseen people and demographics. This work conducts an
in-depth analysis of performance across several dimensions: individuals (40
subjects), genders (male and female), skin types (darker and lighter), and
databases (BP4D and DISFA). To help suppress the variance in data, we use the
notion of self-supervised denoising autoencoders to design a method for deep
face normalization (DeepFN) that transfers facial expressions of different
people onto a common facial template which is then used to train and evaluate
facial action recognition models. We show that person-independent models yield
significantly lower performance (55% average F1 and accuracy across 40
subjects) than person-dependent models (60.3%), leading to a generalization gap
of 5.3%. However, normalizing the data with the newly introduced DeepFN
significantly increased the performance of person-independent models (59.6%),
effectively reducing the gap. Similarly, we observed generalization gaps when
considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were
significantly reduced with the use of DeepFN. These findings represent an
important step towards the creation of more generalizable facial action unit
recognition systems.
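To picture the normalization idea described above (encode an arbitrary face and decode its expression onto a common facial template with a denoising-autoencoder-style network), the following is a minimal sketch. The architecture, layer sizes, input resolution, loss, and all names are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal sketch of a denoising-autoencoder-style face normalizer
# (hypothetical layer sizes and names; not the authors' DeepFN code).
import torch
import torch.nn as nn

class DeepFaceNormalizer(nn.Module):
    """Encode a 128x128 face image and decode it onto a common facial template."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (128, 16, 16)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def training_step(model, input_face, template_face, optimizer):
    """One self-supervised step: reconstruct the expression on the template identity."""
    optimizer.zero_grad()
    normalized = model(input_face)
    loss = nn.functional.l1_loss(normalized, template_face)
    loss.backward()
    optimizer.step()
    return loss.item()
```

An AU classifier would then be trained and evaluated on the normalized outputs rather than on the raw faces, which is how the reported person-independent gains are obtained.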
Self-supervised Facial Action Unit Detection with Region and Relation Learning
Facial action unit (AU) detection is a challenging task due to the scarcity
of manual annotations. Recent works on AU detection with self-supervised
learning have emerged to address this problem, aiming to learn meaningful AU
representations from numerous unlabeled data. However, most existing AU
detection works with self-supervised learning utilize global facial features
only, while AU-related properties such as locality and relevance are not fully
explored. In this paper, we propose a novel self-supervised framework for AU
detection with region and relation learning. In particular, an AU-related
attention map is used to guide the model to focus on AU-specific regions,
enhancing the integrity of the AU local features. Meanwhile, an improved
Optimal Transport (OT) algorithm is introduced to exploit the correlation
characteristics among AUs. In addition, a Swin Transformer is employed to model
the long-distance dependencies within each AU region during feature learning.
The evaluation results on BP4D and DISFA demonstrate that our proposed method
is comparable or even superior to the state-of-the-art self-supervised learning
methods and supervised AU detection methods.
Comment: Accepted by ICASSP 202
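The OT component can be pictured with standard entropic optimal transport (Sinkhorn iterations) between AU feature sets. The sketch below shows only this generic form, not the paper's improved variant; the function name, uniform marginals, and hyperparameters (eps, n_iters) are assumptions for illustration.

```python
# Entropic OT (Sinkhorn iterations) between two sets of AU features.
# Generic illustration only; the paper's "improved OT" differs in its details.
import torch

def sinkhorn_plan(au_feats_a, au_feats_b, eps=0.1, n_iters=50):
    """Return a soft correspondence (transport plan) between two AU feature sets.

    au_feats_a: (n, d) features for n AUs; au_feats_b: (m, d) features for m AUs.
    """
    # Cost: pairwise squared Euclidean distance between AU embeddings.
    cost = torch.cdist(au_feats_a, au_feats_b, p=2) ** 2
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)       # uniform marginal over AUs in set A
    b = torch.full((m,), 1.0 / m)       # uniform marginal over AUs in set B
    K = torch.exp(-cost / eps)          # Gibbs kernel
    u = torch.ones(n)
    v = torch.ones(m)
    for _ in range(n_iters):            # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.t() @ u)
    plan = torch.diag(u) @ K @ torch.diag(v)
    return plan                         # plan[i, j]: how strongly AU i relates to AU j

# Example: relate 12 AU embeddings of dimension 64 to themselves.
feats = torch.randn(12, 64)
P = sinkhorn_plan(feats, feats)
```

The resulting plan acts as a soft AU-to-AU relation matrix that can be folded back into the feature learning objective.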
Spatio-Temporal Relation and Attention Learning for Facial Action Unit Detection
Spatio-temporal relations among facial action units (AUs) convey significant
information for AU detection yet have not been thoroughly exploited. The main
reasons are the limited capability of current AU detection works in
simultaneously learning spatial and temporal relations, and the lack of precise
localization information for AU feature learning. To tackle these limitations,
we propose a novel spatio-temporal relation and attention learning framework
for AU detection. Specifically, we introduce a spatio-temporal graph
convolutional network to capture both spatial and temporal relations from
dynamic AUs, in which the AU relations are formulated as a spatio-temporal
graph whose edge weights are learned adaptively rather than predefined. Moreover, the
learning of spatio-temporal relations among AUs requires individual AU
features. Considering the dynamism and shape irregularity of AUs, we propose an
attention regularization method to adaptively learn regional attentions that
capture highly relevant regions and suppress irrelevant regions so as to
extract a complete feature for each AU. Extensive experiments show that our
approach achieves substantial improvements over the state-of-the-art AU
detection methods on the BP4D and especially the DISFA benchmarks.
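To make the graph formulation concrete, here is a minimal sketch of a graph-convolution layer over per-AU features in which the AU-to-AU edge weights are learned rather than predefined. The handling of the temporal dimension (message passing is applied per frame here), the layer sizes, and all names are simplifications assumed for illustration, not the authors' network.

```python
# Minimal sketch of a graph convolution over AU nodes with an adaptively
# learned adjacency (hypothetical sizes and names; illustration only).
import torch
import torch.nn as nn

class AdaptiveAUGraphConv(nn.Module):
    """One graph-convolution layer whose AU-to-AU edge weights are learned."""
    def __init__(self, num_aus, in_dim, out_dim):
        super().__init__()
        # Learnable logits for the AU relation graph (no predefined edges).
        self.edge_logits = nn.Parameter(torch.zeros(num_aus, num_aus))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, time, num_aus, in_dim) per-AU features over a clip.
        adj = torch.softmax(self.edge_logits, dim=-1)   # row-normalized edge weights
        x = torch.einsum('ij,btjd->btid', adj, x)       # message passing across AUs
        return torch.relu(self.proj(x))

# Example: 8 frames, 12 AUs, 64-dimensional features per AU.
layer = AdaptiveAUGraphConv(num_aus=12, in_dim=64, out_dim=64)
out = layer(torch.randn(2, 8, 12, 64))                  # -> (2, 8, 12, 64)
```

A full spatio-temporal variant would additionally connect AU nodes across neighboring frames, and the per-AU input features would come from the attention-regularized regional features described in the abstract.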