Facial Action Unit Detection Using Attention and Relation Learning
The attention mechanism has recently attracted increasing attention in the field
of facial action unit (AU) detection. By finding the region of interest of each
AU with the attention mechanism, AU-related local features can be captured.
Most existing attention-based AU detection methods use prior knowledge to
predefine fixed attention maps, or to refine the predefined maps within a small
range, which limits their capacity to model diverse AUs. In this paper, we
propose an end-to-end deep learning based attention and relation learning
framework for AU detection with only AU labels, which has not been explored
before. In particular, multi-scale features shared across AUs are learned
first, and then both channel-wise and spatial attentions are adaptively
learned to select and extract AU-related local features. Moreover, pixel-level
relations for AUs are further captured to refine spatial attentions so as to
extract more relevant local features. Without changing the network
architecture, our framework can be easily extended for AU intensity estimation.
Extensive experiments show that our framework (i) soundly outperforms the
state-of-the-art methods for both AU detection and AU intensity estimation on
the challenging BP4D, DISFA, FERA 2015 and BP4D+ benchmarks, (ii) can
adaptively capture the correlated regions of each AU, and (iii) also works well
under severe occlusions and large poses.
Comment: This paper is accepted by IEEE Transactions on Affective Computing.
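The channel-wise and spatial attention selection described in the abstract can be sketched in miniature as follows. This is an illustrative NumPy sketch, not the paper's architecture: the squeeze-and-excitation-style channel gate, the sigmoid spatial gate, and all dimensions are assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel-wise gate: global average pool, then a small
    bottleneck MLP, then sigmoid reweighting of each channel.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = feat.mean(axis=(1, 2))                     # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Spatial gate: a per-pixel weight from the
    channel-averaged response, highlighting AU-related regions."""
    gate = sigmoid(feat.mean(axis=0))                     # (H, W)
    return feat * gate[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (8, 4, 4)
```

Both gates preserve the feature-map shape, so the module can be dropped between convolutional stages; in the paper the spatial attentions are additionally refined with pixel-level relations, which the sketch omits.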
Fine-Grained Expression Manipulation via Structured Latent Space
Fine-grained facial expression manipulation is a challenging problem, as
fine-grained expression details are difficult to capture. Most existing
expression manipulation methods resort to discrete expression labels, which
mainly edit global expressions and ignore the manipulation of fine details. To
tackle this limitation, we propose an end-to-end expression-guided generative
adversarial network (EGGAN), which utilizes structured latent codes and
continuous expression labels as input to generate images with expected
expressions. Specifically, we adopt an adversarial autoencoder to map a source
image into a structured latent space. Then, given the source latent code and
the target expression label, we employ a conditional GAN to generate a new
image with the target expression. Moreover, we introduce a perceptual loss and
a multi-scale structural similarity loss to preserve identity and global shape
during generation. Extensive experiments show that our method can manipulate
fine-grained expressions, and generate continuous intermediate expressions
between source and target expressions.
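The conditioning scheme above, a structured latent code paired with continuous expression labels, can be sketched as follows. The dimensions and the linear interpolation between source and target labels are illustrative assumptions, not details from the paper.

```python
import numpy as np

def condition_latent(z, expr):
    """Concatenate a structured latent code with a continuous
    expression label (e.g. intensity values in [0, 1]) to form
    the conditional generator input."""
    return np.concatenate([z, expr])

rng = np.random.default_rng(1)
z = rng.standard_normal(64)           # source latent code (placeholder size)
src = np.array([0.0, 0.2, 0.8])       # source expression intensities
tgt = np.array([1.0, 0.5, 0.0])       # target expression intensities

# Because the labels are continuous, interpolating them yields
# inputs for intermediate expressions between source and target:
for t in np.linspace(0.0, 1.0, 3):
    expr = (1 - t) * src + t * tgt
    g_in = condition_latent(z, expr)
print(g_in.shape)  # (67,)
```

Discrete one-hot labels permit only the endpoint expressions; the continuous labels here are what make the smooth intermediate expressions reported in the abstract possible.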