Face Alignment Robust to Pose, Expressions and Occlusions
We propose an Ensemble of Robust Constrained Local Models for alignment of
faces in the presence of significant occlusions and of any unknown pose and
expression. To account for partial occlusions, we introduce Robust Constrained
Local Models, which comprise a deformable shape and local landmark
appearance model and reason over binary occlusion labels. Our occlusion
reasoning proceeds by a hypothesize-and-test search over occlusion labels.
Hypotheses are generated by Constrained Local Model based shape fitting over
randomly sampled subsets of landmark detector responses and are evaluated by
the quality of face alignment. To span the entire range of facial pose and
expression variations we adopt an ensemble of independent Robust Constrained
Local Models to search over a discretized representation of pose and
expression. We perform extensive evaluation on a large number of face images,
both occluded and unoccluded. We find that our face alignment system trained
entirely on facial images captured "in-the-lab" exhibits a high degree of
generalization to facial images captured "in-the-wild". Our results are
accurate and stable over a wide spectrum of occlusions, pose and expression
variations, resulting in excellent performance on many real-world face datasets.
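The hypothesize-and-test search over occlusion labels described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the shape model is reduced to a mean shape plus a 2D translation, and it uses an inlier count as a stand-in for the alignment-quality score. The names `fit_translation`, `hypothesize_and_test`, `subset_size`, `trials` and `tol` are hypothetical.

```python
import random

def fit_translation(mean_shape, detections, subset):
    """Least-squares translation aligning the mean shape to the detections in `subset`."""
    n = len(subset)
    dx = sum(detections[i][0] - mean_shape[i][0] for i in subset) / n
    dy = sum(detections[i][1] - mean_shape[i][1] for i in subset) / n
    return dx, dy

def hypothesize_and_test(mean_shape, detections, subset_size=3, trials=200, tol=2.0, seed=0):
    """Return (fitted_shape, occluded_flags) from a random-subset search.

    Assumption: a hypothesis is scored by how many landmarks agree with the
    fitted shape; landmarks that disagree with the best fit are labelled occluded.
    """
    rng = random.Random(seed)
    indices = list(range(len(mean_shape)))
    best_visible, best_shape = [], None
    for _ in range(trials):
        # Hypothesize: fit the shape to a random subset of detector responses.
        subset = rng.sample(indices, subset_size)
        dx, dy = fit_translation(mean_shape, detections, subset)
        shape = [(x + dx, y + dy) for x, y in mean_shape]
        # Test: count the landmarks whose detection lies within `tol` of the fit.
        visible = [
            i for i in indices
            if (shape[i][0] - detections[i][0]) ** 2
            + (shape[i][1] - detections[i][1]) ** 2 <= tol ** 2
        ]
        if len(visible) > len(best_visible):
            best_visible, best_shape = visible, shape
    occluded = [i not in best_visible for i in indices]
    return best_shape, occluded
```

A subset that contains a corrupted detection pulls the fit away from the remaining landmarks, so only subsets of unoccluded landmarks score well in this toy setting.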
Robust Facial Landmark Localization Based on Texture and Pose Correlated Initialization
Robust facial landmark localization remains a challenging task when faces are
partially occluded. Recently, cascaded pose regression has attracted
increasing attention due to its superior performance in facial landmark
localization and occlusion detection. However, such an approach is sensitive to
initialization: an improper initialization can severely degrade
performance. In this paper, we propose a Robust Initialization for Cascaded
Pose Regression (RICPR) by providing texture and pose correlated initial shapes
for the testing face. By examining the correlation of local binary pattern
histograms between the testing face and the training faces, the shapes of the
training faces that are most correlated with the testing face are selected as
the texture correlated initialization. To make the initialization more robust
to various poses, we estimate the rough pose of the testing face according to
five fiducial landmarks located by multitask cascaded convolutional networks.
Then the pose correlated initial shapes are constructed from the mean face
shape and the rough testing face pose. Finally, the texture correlated and the
pose correlated initial shapes are joined together as the robust
initialization. We evaluate RICPR on the challenging dataset of COFW. The
experimental results demonstrate that the proposed scheme achieves better
performance than state-of-the-art methods in facial landmark localization
and occlusion detection.
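The texture correlated initialization in this abstract can be sketched in Python as ranking training faces by histogram correlation. This is an illustration under assumptions, not the RICPR code. It assumes the local binary pattern histograms are already computed and passed in as plain lists, it uses Pearson correlation as the correlation measure, and the names `pearson`, `select_initial_shapes` and `k` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length histograms (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def select_initial_shapes(test_hist, train_hists, train_shapes, k=2):
    """Return the shapes of the k training faces most correlated with the test face."""
    ranked = sorted(
        range(len(train_hists)),
        key=lambda i: pearson(test_hist, train_hists[i]),
        reverse=True,
    )
    return [train_shapes[i] for i in ranked[:k]]
```

The selected shapes would then serve as starting points for the cascaded regressor. The pose correlated shapes from the abstract are not covered by this sketch.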
Deep Multi-Center Learning for Face Alignment
Facial landmarks are highly correlated with each other since a certain
landmark can be estimated by its neighboring landmarks. Most of the existing
deep learning methods only use one fully-connected layer called shape
prediction layer to estimate the locations of facial landmarks. In this paper,
we propose a novel deep learning framework named Multi-Center Learning with
multiple shape prediction layers for face alignment. In particular, each shape
prediction layer emphasizes the detection of a certain cluster of
semantically relevant landmarks. Challenging landmarks are addressed
first, and each cluster of landmarks is then further optimized.
Moreover, to reduce the model complexity, we propose a model assembling method
to integrate multiple shape prediction layers into one shape prediction layer.
Extensive experiments demonstrate that our method is effective for handling
complex occlusions and appearance variations with real-time performance. The
code for our method is available at
https://github.com/ZhiwenShao/MCNet-Extension.
Comment: This paper has been accepted by Neurocomputin
A Detailed Look At CNN-based Approaches In Facial Landmark Detection
Facial landmark detection has been studied for decades. Numerous neural
network (NN)-based approaches have been proposed for detecting landmarks,
especially the convolutional neural network (CNN)-based approaches. In general,
CNN-based approaches can be divided into regression and heatmap approaches.
However, no research systematically studies the characteristics of different
approaches. In this paper, we investigate both CNN-based approaches, generalize
their advantages and disadvantages, and introduce a variation of the heatmap
approach, a pixel-wise classification (PWC) model. To the best of our
knowledge, using the PWC model to detect facial landmarks has not been
comprehensively studied. We further design a hybrid loss function and a
discrimination network for strengthening the landmarks' interrelationship
implied in the PWC model to improve the detection accuracy without modifying
the original model architecture. Six common facial landmark datasets (AFW,
Helen, LFPW, 300-W, IBUG, and COFW) are adopted to train or evaluate our model.
A comprehensive evaluation is conducted, and the results show that the proposed
model outperforms other models on all tested datasets.
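The heatmap approach contrasted with regression in this abstract can be made concrete with a small Python sketch. A regression network outputs landmark coordinates directly, whereas a heatmap network outputs one score map per landmark and the coordinate is read off the peak. The sketch below is not the paper's PWC model. The Gaussian target and the names `gaussian_heatmap`, `decode_heatmap` and `sigma` are illustrative assumptions.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Build a score map with a 2D Gaussian bump centred on landmark (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_heatmap(heatmap):
    """Return the (x, y) pixel with the highest score (first one on ties)."""
    best_x, best_y, best_value = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_x, best_y, best_value = x, y, value
    return best_x, best_y
```

Decoding by peak location keeps the spatial structure of the prediction, but its precision is limited to the heatmap resolution unless a sub-pixel refinement is added.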
Deep Regression for Face Alignment
In this paper, we present a deep regression approach for face alignment. The
deep architecture consists of a global layer and multi-stage local layers. We
apply the back-propagation algorithm with the dropout strategy to jointly
optimize the regression parameters. We show that the resulting deep regressor
gradually and evenly approaches the true facial landmarks stage by stage,
avoiding the tendency to yield over-strong early-stage regressors and
over-weak later-stage regressors. Experimental results show that our approach
achieves the state-of-the-ar
A Convolution Tree with Deconvolution Branches: Exploiting Geometric Relationships for Single Shot Keypoint Detection
Recently, Deep Convolution Networks (DCNNs) have been applied to the task of
face alignment and have shown potential for learning improved feature
representations. Although deeper layers can capture abstract concepts like
pose, it is difficult to capture the geometric relationships among the
keypoints in DCNNs. In this paper, we propose a novel convolution-deconvolution
network for facial keypoint detection. Our model predicts the 2D locations of
the keypoints and their individual visibility along with 3D head pose, while
exploiting the spatial relationships among different keypoints. Unlike
existing approaches to modeling these relationships, we propose learnable
transform functions which capture the relationships between keypoints at
the feature level. However, due to extensive variations in pose, not all of these
relationships act at once, and hence we propose a pose-based routing function
which implicitly models the active relationships. Both transform functions and
the routing function are implemented through convolutions in a multi-task
framework. Our approach presents a single-shot keypoint detection method,
making it different from many existing cascade regression-based methods. We
also show that learning these relationships significantly improves the accuracy
of keypoint detection for in-the-wild face images from challenging datasets
such as AFW and AFLW.
Automatic Facial Expression Recognition Using Features of Salient Facial Patches
Extraction of discriminative features from salient facial patches plays a
vital role in effective facial expression recognition. The accurate detection
of facial landmarks improves the localization of the salient patches on face
images. This paper proposes a novel framework for expression recognition by
using appearance features of selected facial patches. A few prominent facial
patches that are active during emotion elicitation are extracted, depending on
the position of facial landmarks. These active patches are further processed
to obtain the salient patches which contain discriminative features for
classification of each pair of expressions, thereby selecting different facial
patches as salient for different pairs of expression classes. A one-against-one
classification method is adopted using these features. In addition, an
automated learning-free facial landmark detection technique is proposed,
which achieves performance similar to that of other state-of-the-art landmark
detection methods, yet requires significantly less execution time. The proposed
method is found to perform consistently well at different resolutions, hence
providing a solution for expression recognition in low resolution images.
Experiments on CK+ and JAFFE facial expression databases show the effectiveness
of the proposed system.
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and the regression-based methods. They differ in the
ways to utilize the facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. The CLMs explicitly leverage the global shape model but
build the local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performance on both controlled and "in-the-wild" benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".
Learning Deep Representation for Face Alignment with Auxiliary Attributes
In this study, we show that the landmark detection or face alignment task is not
a single and independent problem. Instead, its robustness can be greatly
improved with auxiliary information. Specifically, we jointly optimize landmark
detection together with the recognition of heterogeneous but subtly correlated
facial attributes, such as gender, expression, and appearance attributes. This
is non-trivial since different attribute inference tasks have different
learning difficulties and convergence rates. To address this problem, we
formulate a novel tasks-constrained deep model, which not only learns the
inter-task correlation but also employs dynamic task coefficients to facilitate
the optimization convergence when learning multiple complex tasks. Extensive
evaluations show that the proposed task-constrained learning (i) outperforms
existing face alignment methods, especially in dealing with faces with severe
occlusion and pose variation, and (ii) reduces model complexity drastically
compared to the state-of-the-art methods based on cascaded deep model.
Comment: to be published in the IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI
DeCaFA: Deep Convolutional Cascade for Face Alignment In The Wild
Face alignment is an active computer vision domain that consists of
localizing a number of facial landmarks that vary across datasets.
State-of-the-art face alignment methods consist either of end-to-end
regression or of refining the shape in a cascaded manner, starting from an
initial guess. In this paper, we introduce DeCaFA, an end-to-end deep
convolutional cascade architecture for face alignment. DeCaFA uses
fully-convolutional stages to keep full spatial resolution throughout the
cascade. Between each cascade stage, DeCaFA uses multiple chained transfer
layers with spatial softmax to produce landmark-wise attention maps for each of
several landmark alignment tasks. Weighted intermediate supervision, as well as
efficient feature fusion between the stages, allows the network to learn to
progressively refine the attention maps in an end-to-end manner. We show experimentally that
DeCaFA significantly outperforms existing approaches on 300W, CelebA and WFLW
databases. In addition, we show that DeCaFA can learn fine alignment with
reasonable accuracy from very few images using coarsely annotated data.
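The spatial softmax mentioned in this abstract can be illustrated with a short Python sketch that turns one 2D score map into a landmark-wise attention map. This is not the DeCaFA code. The expected-location readout (often called soft-argmax) is an added illustration that the abstract does not describe, and the names `spatial_softmax` and `expected_location` are hypothetical.

```python
import math

def spatial_softmax(logits):
    """Softmax over all pixels of one 2D score map, so the entries sum to 1."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(value - peak) for value in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def expected_location(attention):
    """Attention-weighted mean (x, y) pixel coordinate of the map."""
    ex = sum(p * x for row in attention for x, p in enumerate(row))
    ey = sum(p * y for y, row in enumerate(attention) for p in row)
    return ex, ey
```

Because the softmax is taken over spatial positions rather than channels, each landmark gets its own probability map over the image, which is the property that lets such a map act as attention between stages.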