On 3D Face Reconstruction via Cascaded Regression in Shape Space
Cascaded regression has been recently applied to reconstructing 3D faces from
single 2D images directly in shape space, and achieved state-of-the-art
performance. This paper thoroughly investigates such cascaded regression based
3D face reconstruction approaches from four perspectives that have not yet been
well studied: (i) the impact of the number of 2D landmarks; (ii) the impact of
the number of 3D vertices; (iii) the way of using standalone automated landmark
detection methods; and (iv) the convergence property. To answer these
questions, a simplified cascaded regression based 3D face reconstruction method
is devised, which can be integrated with standalone automated landmark
detection methods and reconstruct 3D face shapes that have the same pose and
expression as the input face images, rather than normalized pose and
expression. Moreover, an effective training method is proposed by perturbing
the automatically detected landmarks. Comprehensive evaluation experiments have
been conducted in comparison with other 3D face reconstruction methods. The results
not only deepen the understanding of cascaded regression based 3D face
reconstruction approaches, but also demonstrate the effectiveness of the
proposed method.
Comment: 11 pages, 11 figures
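To make the cascaded-regression idea above concrete, the sketch below shows a generic update loop directly in shape space; the feature extractor, the learned linear stages, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cascaded_shape_regression(image, detected_landmarks, initial_shape,
                              regressors, extract_features):
    """Generic cascaded regression directly in 3D shape space (illustrative).

    initial_shape:    (n_vertices, 3) mean or initialized 3D face shape.
    regressors:       list of learned linear stages (W, b), trained offline.
    extract_features: callable(image, detected_landmarks, shape) -> 1D vector.
    """
    shape = initial_shape.copy()
    for W, b in regressors:
        # Shape-indexed features computed around the current estimate.
        features = extract_features(image, detected_landmarks, shape)
        # Each stage predicts an additive update to the full 3D shape.
        delta = (W @ features + b).reshape(shape.shape)
        shape = shape + delta
    return shape
```

Training the stages on perturbed landmarks, as the abstract suggests, would amount to jittering `detected_landmarks` with random noise before each feature extraction during training.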
Pose Invariant 3D Face Reconstruction
3D face reconstruction is an important task in the field of computer vision.
Although 3D face reconstruction has been developing rapidly in recent years,
reconstruction under large poses remains a challenge, because much of the
information about a face in a large pose is not observable. To address this
issue, this paper proposes a novel 3D face reconstruction algorithm (PIFR)
based on the 3D Morphable Model (3DMM). Given a single face image as input,
the method first generates a frontal image by normalizing the input, and then
takes a weighted sum of the 3D parameters of the two images. Our method
overcomes the limitation of traditional single-image reconstruction under
large poses, works for arbitrary poses and expressions, and greatly improves
reconstruction accuracy. Experiments on the challenging AFW, LFPW, and AFLW
databases show that our algorithm significantly improves the accuracy of 3D
face reconstruction even under extreme poses.
Comment: 8 pages
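A minimal sketch of the weighted-combination step described above, assuming both the original and the frontalized image have already been fitted to a 3DMM; the scalar weight and the function name are placeholders rather than the paper's actual formulation.

```python
import numpy as np

def fuse_3dmm_parameters(params_original, params_frontalized, weight=0.5):
    """Blend 3DMM parameter vectors estimated from the input image and from
    its pose-normalized (frontal) counterpart (illustrative only)."""
    p_orig = np.asarray(params_original, dtype=float)
    p_front = np.asarray(params_frontalized, dtype=float)
    return weight * p_orig + (1.0 - weight) * p_front
```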
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Occlusion and pose variations, which can change facial appearance
significantly, are two major obstacles for automatic Facial Expression
Recognition (FER). Though automatic FER has made substantial progress in the
past few decades, the occlusion-robust and pose-invariant aspects of FER have
received relatively little attention, especially in real-world scenarios. This
paper addresses the real-world pose and occlusion robust FER problem with
three-fold contributions. First, to stimulate research on FER under
real-world occlusions and varying poses, we build several in-the-wild facial
expression datasets with manual annotations for the community. Second, we
propose a novel Region Attention Network (RAN) to adaptively capture the
importance of facial regions for occlusion- and pose-variant FER. The RAN
aggregates and embeds a varying number of region features produced by a backbone
convolutional neural network into a compact fixed-length representation. Last,
inspired by the fact that facial expressions are mainly defined by facial
action units, we propose a region biased loss to encourage high attention
weights for the most important regions. We validate our RAN and region biased
loss on both our built test datasets and four popular datasets: FERPlus,
AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region
biased loss largely improve the performance of FER under occlusion and pose
variations. Our method also achieves state-of-the-art results on FERPlus,
AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.
Comment: The test set and the code of this paper will be available at
https://github.com/kaiwang960112/Challenge-condition-FER-datase
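The attention-based aggregation and the region biased loss can be pictured with the PyTorch sketch below; the layer sizes, the sigmoid attention head, and the convention that index 0 holds the whole-face crop are assumptions made for the example, not the released code.

```python
import torch
import torch.nn as nn

class RegionAttentionPool(nn.Module):
    """Pool a variable number of region features into one fixed-length vector
    using learned attention weights (a toy version of the RAN idea)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, region_feats):                 # (batch, n_regions, feat_dim)
        weights = self.attention(region_feats)       # (batch, n_regions, 1)
        norm = weights.sum(dim=1, keepdim=True).clamp_min(1e-6)
        pooled = (weights / norm * region_feats).sum(dim=1)  # fixed-length embedding
        return pooled, weights.squeeze(-1)

def region_biased_loss(weights, margin=0.02):
    """Encourage the largest region weight to exceed the whole-face weight
    (assumed to sit at index 0 here) by a margin."""
    face_weight = weights[:, 0]
    best_region = weights[:, 1:].max(dim=1).values
    return torch.relu(margin + face_weight - best_region).mean()
```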
Self-supervised CNN for Unconstrained 3D Facial Performance Capture from an RGB-D Camera
We present a novel method for real-time 3D facial performance capture with
consumer-level RGB-D sensors. Our capturing system is targeted at robust and
stable 3D face capturing in the wild, in which the RGB-D facial data contain
noise, imperfections, and occlusions, and often exhibit high variability in
motion, pose, expression and lighting conditions, thus posing great challenges.
The technical contribution is a self-supervised deep learning framework, which
is trained directly from raw RGB-D data. The key novelties include: (1)
learning both the core tensor and the parameters for refining our parametric
face model; (2) using vertex displacement and UV map for learning surface
detail; (3) designing the loss function by incorporating temporal coherence and
same identity constraints based on pairs of RGB-D images and utilizing sparse
norms, in addition to the conventional terms for photo-consistency, feature
similarity, regularization as well as geometry consistency; and (4) augmenting
the training data set in new ways. The method is demonstrated in a live setup
that runs in real-time on a smartphone and an RGB-D sensor. Extensive
experiments show that our method is robust to severe occlusion, fast motion,
large rotation, exaggerated facial expressions, and diverse lighting conditions.
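The loss design in (3) can be read as a weighted sum of the terms the abstract names; the sketch below is only a schematic with placeholder tensor names and weights, not the authors' exact formulation or choice of norms.

```python
import torch
import torch.nn.functional as F

def fitting_loss(pred, target, w):
    """Schematic composite loss for self-supervised RGB-D face fitting.
    `pred` and `target` are dicts of tensors from a hypothetical network."""
    photo    = F.l1_loss(pred["rendered_rgb"], target["rgb"])             # photo-consistency
    geometry = F.l1_loss(pred["rendered_depth"], target["depth"])         # geometry consistency
    feature  = F.mse_loss(pred["landmarks_2d"], target["landmarks_2d"])   # feature similarity
    temporal = F.l1_loss(pred["params_t"], pred["params_t_prev"])         # temporal coherence
    identity = F.l1_loss(pred["id_params_a"], pred["id_params_b"])        # same-identity pair
    sparse   = pred["detail_displacements"].abs().mean()                  # sparse-norm regularizer
    return (w["photo"] * photo + w["geometry"] * geometry + w["feature"] * feature
            + w["temporal"] * temporal + w["identity"] * identity + w["sparse"] * sparse)
```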
3D Facial Expression Reconstruction using Cascaded Regression
This paper proposes a novel model fitting algorithm for 3D facial expression
reconstruction from a single image. Face expression reconstruction from a
single image is a challenging task in computer vision. Most state-of-the-art
methods fit the input image to a 3D Morphable Model (3DMM). These methods need
to solve a stochastic problem and cannot deal with expression and pose
variations. To solve this problem, we adopt a 3D face expression model and use
a combined feature which is robust to scale, rotation and different lighting
conditions. The proposed method applies a cascaded regression framework to
estimate parameters for the 3DMM. 2D landmarks are detected and used to
initialize the 3D shape and mapping matrices. In each iteration, residuals
between the current 3DMM parameters and the ground truth are estimated and then
used to update the 3D shapes. The mapping matrices are also calculated based on
the updated shapes and 2D landmarks. HOG features of the local patches and
displacements between 3D landmark projections and 2D landmarks are exploited.
Compared with existing methods, the proposed method is robust to expression and
pose changes and can reconstruct higher-fidelity 3D face shapes.
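One cascade iteration of the kind described above could look like the sketch below, combining HOG patch features with 2D-3D landmark displacements; the patch size, the `project` callable, and the linear regressor are assumptions made for illustration (the image is assumed grayscale with landmarks away from the border).

```python
import numpy as np
from skimage.feature import hog

def patch_hog_features(gray_image, points_2d, patch_size=32):
    """HOG descriptors of local patches around the projected 3D landmarks."""
    half = patch_size // 2
    feats = []
    for x, y in points_2d.astype(int):
        patch = gray_image[y - half:y + half, x - half:x + half]
        feats.append(hog(patch, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
    return np.concatenate(feats)

def cascade_step(gray_image, params, landmarks_2d, project, regressor):
    """One iteration: combined features -> predicted parameter residual -> update."""
    projections = project(params)                        # current 3D landmark projections
    displacement = (landmarks_2d - projections).ravel()  # 2D displacement features
    features = np.concatenate([patch_hog_features(gray_image, projections),
                               displacement])
    W, b = regressor                                     # learned linear stage
    return params + W @ features + b
```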
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and the regression-based methods. They differ in the
ways to utilize the facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. The CLMs explicitly leverage the global shape model but
build the local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in-the-wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".
Deep Face Feature for Face Alignment
In this paper, we present a deep learning based image feature extraction
method designed specifically for face images. To train the feature extraction
model, we construct a large scale photo-realistic face image dataset with
ground-truth correspondence between multi-view face images, which are
synthesized from real photographs via an inverse rendering procedure. The deep
face feature (DFF) is trained using correspondence between face images rendered
from different views. Using the trained DFF model, we can extract a feature
vector for each pixel of a face image, which distinguishes different facial
regions and is shown to be more effective than general-purpose feature
descriptors for face-related tasks such as matching and alignment. Based on the
DFF, we develop a robust face alignment method, which iteratively updates
landmarks, pose and 3D shape. Extensive experiments demonstrate that our method
can achieve state-of-the-art results for face alignment on highly
unconstrained face images.
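As a usage illustration of such per-pixel features, the snippet below matches a point between two face images by nearest-neighbour search over descriptor maps; the (H, W, C) feature maps are assumed to come from a trained per-pixel feature network like the DFF.

```python
import numpy as np

def match_point(feat_map_a, feat_map_b, point_a):
    """Find the pixel in image B whose descriptor is closest to the descriptor
    at point_a = (row, col) in image A (illustrative nearest-neighbour match)."""
    row, col = point_a
    query = feat_map_a[row, col]                              # (C,) descriptor
    distances = np.linalg.norm(feat_map_b - query, axis=-1)   # (H, W) distance map
    return np.unravel_index(np.argmin(distances), distances.shape)
```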
Robust Face Recognition by Constrained Part-based Alignment
Developing a reliable and practical face recognition system is a
long-standing goal in computer vision research. Existing literature suggests
that pixel-wise face alignment is the key to achieve high-accuracy face
recognition. By modeling a human face as a set of piece-wise planar surfaces, where
each surface corresponds to a facial part, we develop in this paper a Constrained
Part-based Alignment (CPA) algorithm for face recognition across pose and/or
expression. Our proposed algorithm is based on a trainable CPA model, which
learns appearance evidence of individual parts and a tree-structured shape
configuration among different parts. Given a probe face, CPA simultaneously
aligns all its parts by fitting them to the appearance evidence with
consideration of the constraint from the tree-structured shape configuration.
This objective is formulated as a norm minimization problem regularized by
graph likelihoods. CPA can be easily integrated with many existing classifiers
to perform part-based face recognition. Extensive experiments on benchmark face
datasets show that CPA outperforms or is on par with existing methods for
robust face recognition across pose, expression, and/or illumination changes.
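The part-based alignment objective can be read as a unary appearance term per part plus tree-structured pairwise shape terms; the sketch below only mirrors that structure with placeholder cost callables and is not the actual regularized norm-minimization formulation.

```python
def part_alignment_energy(part_positions, appearance_cost, tree_edges, shape_cost):
    """Illustrative energy: appearance evidence for each part plus shape
    constraints along the edges of a tree-structured part graph."""
    unary = sum(appearance_cost(i, pos) for i, pos in enumerate(part_positions))
    pairwise = sum(shape_cost(i, j, part_positions[i], part_positions[j])
                   for i, j in tree_edges)
    return unary + pairwise
```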
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
High Fidelity Face Manipulation with Extreme Poses and Expressions
Face manipulation has shown remarkable advances with the flourish of
Generative Adversarial Networks. However, due to the difficulties of
controlling structures and textures, it is challenging to model poses and
expressions simultaneously, especially for extreme manipulations at high
resolution. In this paper, we propose a novel framework that simplifies
face manipulation into two correlated stages: a boundary prediction stage and a
disentangled face synthesis stage. The first stage models poses and expressions
jointly via boundary images. Specifically, a conditional encoder-decoder
network is employed to predict the boundary image of the target face in a
semi-supervised way. Pose and expression estimators are introduced to improve
the prediction performance. In the second stage, the predicted boundary image
and the input face image are encoded into the structure and the texture latent
space by two encoder networks, respectively. A proxy network and a feature
threshold loss are further imposed to disentangle the latent space.
Furthermore, due to the lack of high-resolution face manipulation databases to
verify the effectiveness of our method, we collect a new high-quality
Multi-View Face (MVF-HQ) database. It contains 120,283 images at 6000x4000
resolution from 479 identities with diverse poses, expressions, and
illuminations. MVF-HQ is much larger in scale and much higher in resolution
than publicly available high-resolution face manipulation databases. We will
release MVF-HQ soon to push forward the advance of face manipulation.
Qualitative and quantitative experiments on four databases show that our method
dramatically improves the synthesis quality.
Comment: Accepted by IEEE Transactions on Information Forensics and Security (TIFS).
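The two-stage pipeline reads naturally as the sketch below: a boundary image is predicted for the target pose and expression, then the result is synthesized from disentangled structure and texture codes; every module name here is a placeholder rather than part of the published model.

```python
def manipulate_face(face_image, target_pose, target_expression,
                    boundary_net, structure_encoder, texture_encoder, decoder):
    """Schematic two-stage face manipulation (illustrative only)."""
    # Stage 1: predict the boundary image of the target face.
    boundary = boundary_net(face_image, target_pose, target_expression)
    # Stage 2: encode structure (from the boundary) and texture (from the input),
    # then decode the manipulated face from the disentangled latent codes.
    z_structure = structure_encoder(boundary)
    z_texture = texture_encoder(face_image)
    return decoder(z_structure, z_texture)
```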