Facial Action Unit Detection Using Attention and Relation Learning
The attention mechanism has recently attracted increasing attention in the field
of facial action unit (AU) detection. By finding the region of interest of each
AU with the attention mechanism, AU-related local features can be captured.
Most existing attention-based AU detection works use prior knowledge to
predefine fixed attentions or refine the predefined attentions within a small
range, which limits their capacity to model various AUs. In this paper, we
propose an end-to-end deep-learning-based attention and relation learning
framework for AU detection with only AU labels, which has not been explored
before. In particular, multi-scale features shared by each AU are learned
first, and then both channel-wise and spatial attentions are adaptively
learned to select and extract AU-related local features. Moreover, pixel-level
relations for AUs are further captured to refine spatial attentions so as to
extract more relevant local features. Without changing the network
architecture, our framework can be easily extended for AU intensity estimation.
Extensive experiments show that our framework (i) soundly outperforms the
state-of-the-art methods for both AU detection and AU intensity estimation on
the challenging BP4D, DISFA, FERA 2015 and BP4D+ benchmarks, (ii) can
adaptively capture the correlated regions of each AU, and (iii) also works well
under severe occlusions and large poses.
Comment: This paper is accepted by IEEE Transactions on Affective Computing
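As a rough illustration of what "channel-wise and spatial attentions" can mean, the sketch below reweights a tiny feature map first per channel and then per pixel. It is not the paper's network: the function names are hypothetical, there are no learned layers, and the attention weights are simply assumed to be sigmoids of average activations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features):
    """One weight per channel, taken from the channel's global average.
    Assumption: a real model would pass this average through learned layers."""
    weights = []
    for channel in features:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights

def spatial_attention(features):
    """One weight per pixel, taken from the mean activation over channels."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [sigmoid(sum(features[c][i][j] for c in range(num_channels)) / num_channels)
         for j in range(width)]
        for i in range(height)
    ]

def apply_attention(features):
    """Scale every activation by its channel weight and its pixel weight."""
    ch = channel_attention(features)
    sp = spatial_attention(features)
    return [
        [[features[c][i][j] * ch[c] * sp[i][j] for j in range(len(sp[0]))]
         for i in range(len(sp))]
        for c in range(len(features))
    ]

# Two channels of a 2x2 feature map; only the top-right pixel is active.
features = [
    [[0.0, 2.0], [0.0, 0.0]],
    [[0.0, 4.0], [0.0, 0.0]],
]
attended = apply_attention(features)
```

In this toy input the more active channel and the active pixel receive the larger weights, which is the selection effect the abstract attributes to attention.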
Time-delay neural network for continuous emotional dimension prediction from facial expression sequences
"(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works."
Automatic continuous affective state prediction from naturalistic facial expressions is a very challenging but very important research topic in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the modeling of emotional state dynamics from the prediction of an individual emotional state based on input features. As a result, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, and the network can more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the use of a two-stage approach combined with the TDNN to take into account previously classified frames significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the third international Audio/Visual Emotion Recognition Challenge (AVEC2013).
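The two-stage idea can be sketched in a few lines: stage one predicts a value for each frame independently, and stage two combines the current prediction with a few delayed ones. This is only a sketch under stated assumptions: `frame_regressor` is a hypothetical stand-in for the first-stage regressor, and the second stage is a single linear time-delay unit with hand-picked weights, whereas a real TDNN would learn its weights and use nonlinear units.

```python
def frame_regressor(feature):
    """Hypothetical stage 1: map one frame's feature to a prediction."""
    return 0.5 * feature

def time_delay_unit(predictions, weights, bias=0.0):
    """Stage 2: weights[k] applies to the prediction k frames in the past
    (k = 0 is the current frame). The first frame is repeated at the start
    of the sequence so that every delay has a value."""
    outputs = []
    for t in range(len(predictions)):
        total = bias
        for k, w in enumerate(weights):
            total += w * predictions[max(t - k, 0)]
        outputs.append(total)
    return outputs

# A single noisy spike in the per-frame features.
frame_features = [0.0, 0.0, 2.0, 0.0, 0.0]
stage1 = [frame_regressor(f) for f in frame_features]
stage2 = time_delay_unit(stage1, weights=[0.5, 0.3, 0.2])
```

Because the second stage only sees first-stage predictions, the spike is spread over the following frames rather than passed through unchanged, which reflects the slowly changing dynamics the abstract describes.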
MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection
The Facial Action Coding System (FACS) encodes the action units (AUs) in
facial images, which has attracted extensive research attention due to its wide
use in facial expression analysis. Many methods that perform well on automatic
facial action unit (AU) detection primarily focus on modeling various types of
AU relations between corresponding local muscle areas, or simply on mining
global attention-aware facial features, but neglect the dynamic interactions
among local-global features. We argue that encoding AU features just from one
perspective may not capture the rich contextual information between regional
and global face features, as well as the detailed variability across AUs,
because of the diversity in expression and individual characteristics. In this
paper, we propose a novel Multi-level Graph Relational Reasoning Network
(termed MGRR-Net) for facial AU detection. Each layer of MGRR-Net performs
multi-level (i.e., region-level, pixel-wise and channel-wise) feature
learning. While region-level feature learning from local face patch
features via a graph neural network can encode the correlation across different
AUs, the pixel-wise and channel-wise feature learning via a graph attention
network can enhance the discrimination ability of AU features from global face
features. The fused features from the three levels lead to improved AU
discriminative ability. Extensive experiments on the DISFA and BP4D AU datasets
show that the proposed approach outperforms the state-of-the-art methods.
Comment: 10 pages, 4 figures, 8 tables; submitted to IEEE TMM for possible
publication. Copyright may be transferred without notice, after which this
version may no longer be accessible
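To make the graph-based relation idea concrete, the following sketch runs one attention-weighted message-passing step over a small graph whose nodes stand for AU region features. It is an assumption-laden simplification, not MGRR-Net: the names are hypothetical, and plain dot-product similarity is used in place of the learned attention scores of a graph attention network.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_step(node_features, neighbors):
    """Replace each node's feature with an attention-weighted average of its
    neighbors' features. Assumption: scores are dot products, not learned."""
    updated = []
    for i, feat in enumerate(node_features):
        idxs = neighbors[i]
        weights = softmax([dot(feat, node_features[j]) for j in idxs])
        updated.append([
            sum(w * node_features[j][d] for w, j in zip(weights, idxs))
            for d in range(len(feat))
        ])
    return updated

# Three hypothetical AU nodes; nodes 0 and 1 have similar features.
node_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
updated = attention_step(node_features, neighbors)
```

Nodes with similar features attend to each other more strongly, so correlated regions reinforce one another while each node still keeps a share of information from the rest of the graph.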