6,941 research outputs found
Deep Multi-Center Learning for Face Alignment
Facial landmarks are highly correlated with each other since a certain
landmark can be estimated by its neighboring landmarks. Most of the existing
deep learning methods only use one fully-connected layer called shape
prediction layer to estimate the locations of facial landmarks. In this paper,
we propose a novel deep learning framework named Multi-Center Learning with
multiple shape prediction layers for face alignment. In particular, each shape
prediction layer emphasizes the detection of a certain cluster of semantically
related landmarks. Challenging landmarks are optimized first, and each cluster
of landmarks is then refined separately.
Moreover, to reduce the model complexity, we propose a model assembling method
to integrate multiple shape prediction layers into one shape prediction layer.
Extensive experiments demonstrate that our method is effective for handling
complex occlusions and appearance variations with real-time performance. The
code for our method is available at
https://github.com/ZhiwenShao/MCNet-Extension.
Comment: This paper has been accepted by Neurocomputing
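Since a shape prediction layer is a plain fully-connected mapping, the model assembling step described above can be pictured as copying, for each landmark, the output weight columns from the layer specialized for that landmark's cluster. A minimal NumPy sketch; the function name `assemble_shape_layers`, the weight shapes, and the cluster layout are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def assemble_shape_layers(layer_weights, clusters):
    """Merge several FC shape prediction layers into one.

    layer_weights: list of (feat_dim, 2 * n_landmarks) arrays, one per
        cluster-specialized layer (hypothetical shapes for illustration).
    clusters: list of landmark-index lists; clusters[i] names the landmarks
        that layer i is specialized for.
    """
    assembled = np.zeros_like(layer_weights[0])
    for w, indices in zip(layer_weights, clusters):
        for i in indices:
            # each landmark contributes an (x, y) pair of output columns
            assembled[:, 2 * i:2 * i + 2] = w[:, 2 * i:2 * i + 2]
    return assembled
```

The assembled matrix behaves like a single shape prediction layer at inference time, which is how the multiple layers can be integrated without extra runtime cost.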
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and regression-based methods. They differ in the
ways they utilize facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. The CLMs explicitly leverage the global shape model but
build the local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in the wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".
Face De-occlusion using 3D Morphable Model and Generative Adversarial Network
In recent decades, the 3D morphable model (3DMM) has been commonly used in
image-based photorealistic 3D face reconstruction. However, face images are
often corrupted by severe occlusions from non-face objects, including
eyeglasses, masks, and hands. Such objects block the correct capture of
landmarks and shading information, so the reconstructed 3D face model is
hardly reusable. In this paper, a novel method is proposed to restore
de-occluded face images based on the inverse use of a 3DMM and a generative
adversarial network. We supply the 3DMM as a prior to the proposed adversarial
network and combine global and local adversarial convolutional neural networks
to learn a face de-occlusion model. The 3DMM serves not only as a geometric
prior but also proposes the face region for the local discriminator.
Experimental results confirm the
effectiveness and robustness of the proposed algorithm in removing challenging
types of occlusions with various head poses and illumination. Furthermore, the
proposed method reconstructs the correct 3D face model with de-occluded
textures.
Comment: Presented in ICCV 201
A Detailed Look At CNN-based Approaches In Facial Landmark Detection
Facial landmark detection has been studied over decades. Numerous neural
network (NN)-based approaches have been proposed for detecting landmarks,
especially the convolutional neural network (CNN)-based approaches. In general,
CNN-based approaches can be divided into regression and heatmap approaches.
However, no research systematically studies the characteristics of different
approaches. In this paper, we investigate both types of CNN-based approaches,
summarize their advantages and disadvantages, and introduce a variation of the
heatmap approach, a pixel-wise classification (PWC) model. To the best of our
knowledge, using the PWC model to detect facial landmarks has not been
comprehensively studied. We further design a hybrid loss function and a
discrimination network for strengthening the landmarks' interrelationship
implied in the PWC model to improve the detection accuracy without modifying
the original model architecture. Six common facial landmark datasets, AFW,
Helen, LFPW, 300-W, IBUG, and COFW are adopted to train or evaluate our model.
A comprehensive evaluation is conducted and the result shows that the proposed
model outperforms other models on all tested datasets.
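For intuition, heatmap-style approaches (of which the PWC model is a variation) decode each landmark location from a per-landmark score map rather than regressing coordinates directly. A minimal sketch of argmax decoding; the function name and array shapes are illustrative assumptions, not this paper's implementation:

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Decode (x, y) landmark coordinates from per-landmark score maps.

    heatmaps: array of shape (n_landmarks, H, W), one score map per landmark.
    Returns an (n_landmarks, 2) integer array of (x, y) peak positions.
    """
    n, h, w = heatmaps.shape
    coords = np.zeros((n, 2), dtype=int)
    for k in range(n):
        # flat argmax, then unravel back to (row, col) = (y, x)
        y, x = np.unravel_index(np.argmax(heatmaps[k]), (h, w))
        coords[k] = (x, y)
    return coords
```

A regression head, by contrast, would output the 2·n coordinate vector directly from a fully-connected layer, which is the other family of approaches the paper compares.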
Feature Extraction via Recurrent Random Deep Ensembles and its Application in Group-level Happiness Estimation
This paper presents a novel ensemble framework for extracting highly
discriminative feature representations of images, applied to group-level
happiness intensity prediction in the wild. To generate enough diversity of
decisions, n convolutional neural networks are trained by bootstrapping the
training set, and n features are extracted from each image. A recurrent neural
network (RNN) then learns which network extracts the better feature and
generates the final feature representation for each individual image. Several
group emotion models (GEM) are used to aggregate face features within a group,
and a parameter-optimized support vector regressor (SVR) produces the final
results. Extensive experiments demonstrate the effectiveness of the proposed
recurrent random deep ensembles (RRDE) in both structural and decisional ways.
The best result yields a 0.55 root-mean-square error (RMSE) on the validation
set of the HAPPEI dataset, significantly better than the baseline of 0.78.
A Fast and Accurate Unconstrained Face Detector
We propose a method to address challenges in unconstrained face detection,
such as arbitrary pose variations and occlusions. First, a new image feature
called the Normalized Pixel Difference (NPD) is proposed. The NPD feature is
computed as the difference-to-sum ratio of two pixel values, inspired by the
Weber fraction in experimental psychology. The new feature is scale invariant,
bounded, and is able to reconstruct the original image. Second, we propose a
deep quadratic tree to learn the optimal subset of NPD features and their
combinations, so that complex face manifolds can be partitioned by the learned
rules. This way, only a single soft-cascade classifier is needed to handle
unconstrained face detection. Furthermore, we show that the NPD features can be
efficiently obtained from a lookup table, and the detection template can be
easily scaled, making the proposed face detector very fast. Experimental
results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the
proposed method achieves state-of-the-art performance in detecting
unconstrained faces with arbitrary pose variations and occlusions in cluttered
scenes.
Comment: This paper has been accepted by TPAMI. The source code is available
on the project page
http://www.cbsr.ia.ac.cn/users/scliao/projects/npdface/index.htm
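The difference-to-sum ratio described above can be written as f(x, y) = (x − y) / (x + y), with f(0, 0) defined as 0. A minimal sketch for two pixel intensities; the function name `npd` is an illustrative assumption:

```python
def npd(x, y):
    """Normalized Pixel Difference between two pixel values.

    Returns (x - y) / (x + y), defined as 0 when both pixels are 0.
    The value is bounded in [-1, 1] and antisymmetric:
    npd(x, y) == -npd(y, x).
    """
    if x == 0 and y == 0:
        return 0.0
    return (x - y) / (x + y)
```

Because the ratio depends only on the relative difference of the two intensities, uniformly scaling the image leaves the feature unchanged, which is the scale invariance the abstract refers to.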
WIDER FACE: A Face Detection Benchmark
Face detection is one of the most studied topics in the computer vision
community. Much of this progress has been driven by the availability of face
detection benchmark datasets. We show that there is a gap between current face
detection performance and the real world requirements. To facilitate future
face detection research, we introduce the WIDER FACE dataset, which is 10 times
larger than existing datasets. The dataset contains rich annotations, including
occlusions, poses, event categories, and face bounding boxes. Faces in the
proposed dataset are extremely challenging due to large variations in scale,
pose and occlusion, as shown in Fig. 1. Furthermore, we show that WIDER FACE
dataset is an effective training source for face detection. We benchmark
several representative detection systems, provide an overview of
state-of-the-art performance, and propose a solution to deal with large scale
variation. Finally, we discuss common failure cases that are worth further
investigation. The dataset can be downloaded at:
mmlab.ie.cuhk.edu.hk/projects/WIDERFace
Comment: 12 pages
When 3D-Aided 2D Face Recognition Meets Deep Learning: An extended UR2D for Pose-Invariant Face Recognition
Most face recognition works focus on specific modules or demonstrate a
research idea. This paper presents a pose-invariant 3D-aided 2D face
recognition system (UR2D) that is robust to pose variations as large as 90° by
leveraging deep learning technology. The architecture and the interface of UR2D
are described, and each module is introduced in detail. Extensive experiments
are conducted on the UHDB31 and IJB-A, demonstrating that UR2D outperforms
existing 2D face recognition systems such as VGG-Face, FaceNet, and a
commercial off-the-shelf software (COTS) by at least 9% on the UHDB31 dataset
and 3% on the IJB-A dataset on average in face identification tasks. UR2D also
achieves state-of-the-art performance of 85% on the IJB-A dataset by comparing
the Rank-1 accuracy score from template matching. It fills a gap by providing a
3D-aided 2D face recognition system whose results are comparable to 2D face
recognition systems using deep learning techniques.
Comment: Submitted to Special Issue on Biometrics in the Wild, Image and
Vision Computing
A Multi-Face Challenging Dataset for Robust Face Recognition
Face recognition in images is an active area of interest among the computer
vision researchers. However, recognizing human faces in an unconstrained
environment is a relatively less-explored area of research. Multiple face
recognition in unconstrained environment is a challenging task, due to the
variation of view-point, scale, pose, illumination and expression of the face
images. Partial occlusion of faces makes the recognition task even more
challenging. The contribution of this paper is twofold: introducing a
challenging multi-face dataset (the IIITS MFace dataset) for face recognition
in an unconstrained environment, and evaluating the performance of
state-of-the-art hand-designed and deep learning based face descriptors on it.
The proposed IIITS MFace dataset contains faces with challenges like pose
variation, occlusion, mask, spectacle, expressions, change of illumination,
etc. We experiment with several state-of-the-art face descriptors, including
recent deep learning based face descriptors like VGGFace, and compare with the
existing benchmark face datasets. Results of the experiments clearly show that
the difficulty level of the proposed dataset is much higher compared to the
benchmark datasets.
Comment: 15th International Conference on Control, Automation, Robotics and
Vision (ICARCV 2018)
Unsupervised Eyeglasses Removal in the Wild
Eyeglasses removal is challenging: it requires removing different kinds of
eyeglasses, e.g., rimless glasses, full-rim glasses, and sunglasses, while
recovering appropriate eyes. Due to the large visual variation, conventional
methods lack scalability. Most existing works focus on frontal face images in
controlled environments, such as the laboratory, and need to design specific
systems for different eyeglass types. To address this limitation, we propose a
unified eyeglass removal model called Eyeglasses Removal Generative Adversarial
Network (ERGAN), which can handle different types of glasses in the wild. The
proposed method does not depend on the dense annotation of eyeglasses location
but benefits from the large-scale face images with weak annotations.
Specifically, we study the two relevant tasks simultaneously, i.e., removing
and wearing eyeglasses. Given two facial images with and without eyeglasses,
the proposed model learns to swap the eye area in two faces. The generation
mechanism focuses on the eye area and avoids the difficulty of generating an
entirely new face. In the experiments, we show the proposed method achieves a competitive
removal quality in terms of realism and diversity. Furthermore, we evaluate
ERGAN on several subsequent tasks, such as face verification and facial
expression recognition. The experiments show that our method can serve as a
pre-processing step for these tasks.