34,194 research outputs found
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Facial micro-expression (ME) recognition has posed a huge challenge to
researchers for its subtlety in motion and limited databases. Recently,
handcrafted techniques have achieved superior performance in micro-expression
recognition but at the cost of domain specificity and cumbersome parametric
tunings. In this paper, we propose an Enriched Long-term Recurrent
Convolutional Network (ELRCN) that first encodes each micro-expression frame
into a feature vector through CNN module(s), then predicts the micro-expression
by passing the feature vector through a Long Short-term Memory (LSTM) module.
The framework contains two different network variants: (1) Channel-wise
stacking of input data for spatial enrichment, (2) Feature-wise stacking of
features for temporal enrichment. We demonstrate that the proposed approach is
able to achieve reasonably good performance, without data augmentation. In
addition, we also present ablation studies conducted on the framework and
visualizations of what CNN "sees" when predicting the micro-expression classes.Comment: Published in Micro-Expression Grand Challenge 2018, Workshop of 13th
IEEE Facial & Gesture 201
Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields
Automated Facial Expression Recognition (FER) has been a challenging task for
decades. Many of the existing works use hand-crafted features such as LBP, HOG,
LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
Support Vector Machines for expression recognition. These methods often require
rigorous hyperparameter tuning to achieve good results. Recently Deep Neural
Networks (DNN) have shown to outperform traditional methods in visual object
recognition. In this paper, we propose a two-part network consisting of a
DNN-based architecture followed by a Conditional Random Field (CRF) module for
facial expression recognition in videos. The first part captures the spatial
relation within facial images using convolutional layers followed by three
Inception-ResNet modules and two fully-connected layers. To capture the
temporal relation between the image frames, we use linear chain CRF in the
second part of our network. We evaluate our proposed network on three publicly
available databases, viz. CK+, MMI, and FERA. Experiments are performed in
subject-independent and cross-database manners. Our experimental results show
that cascading the deep network architecture with the CRF module considerably
increases the recognition of facial expressions in videos and in particular it
outperforms the state-of-the-art methods in the cross-database experiments and
yields comparable results in the subject-independent experiments.Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture
Recognition Worksho
Region Based Adversarial Synthesis of Facial Action Units
Facial expression synthesis or editing has recently received increasing
attention in the field of affective computing and facial expression modeling.
However, most existing facial expression synthesis works are limited in paired
training data, low resolution, identity information damaging, and so on. To
address those limitations, this paper introduces a novel Action Unit (AU) level
facial expression synthesis method called Local Attentive Conditional
Generative Adversarial Network (LAC-GAN) based on face action units
annotations. Given desired AU labels, LAC-GAN utilizes local AU regional rules
to control the status of each AU and attentive mechanism to combine several of
them into the whole photo-realistic facial expressions or arbitrary facial
expressions. In addition, unpaired training data is utilized in our proposed
method to train the manipulation module with the corresponding AU labels, which
learns a mapping between a facial expression manifold. Extensive qualitative
and quantitative evaluations are conducted on the commonly used BP4D dataset to
verify the effectiveness of our proposed AU synthesis method.Comment: Accepted by MMM202
GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance
Although existing speech-driven talking face generation methods achieve
significant progress, they are far from real-world application due to the
avatar-specific training demand and unstable lip movements. To address the
above issues, we propose the GSmoothFace, a novel two-stage generalized talking
face generation model guided by a fine-grained 3d face model, which can
synthesize smooth lip dynamics while preserving the speaker's identity. Our
proposed GSmoothFace model mainly consists of the Audio to Expression
Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT)
module. Specifically, we first develop the A2EP module to predict expression
parameters synchronized with the driven speech. It uses a transformer to
capture the long-term audio context and learns the parameters from the
fine-grained 3D facial vertices, resulting in accurate and smooth
lip-synchronization performance. Afterward, the well-designed TAFT module,
empowered by Morphology Augmented Face Blending (MAFB), takes the predicted
expression parameters and target video as inputs to modify the facial region of
the target video without distorting the background content. The TAFT
effectively exploits the identity appearance and background context in the
target video, which makes it possible to generalize to different speakers
without retraining. Both quantitative and qualitative experiments confirm the
superiority of our method in terms of realism, lip synchronization, and visual
quality. See the project page for code, data, and request pre-trained models:
https://zhanghm1995.github.io/GSmoothFace
- …