Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and regression-based methods. They differ in how
they utilize facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. CLM methods explicitly leverage a global shape model but
build local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in-the-wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".
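The regression-based category surveyed above typically refines a shape estimate through a cascade of learned updates. A minimal numpy sketch of such a cascade follows; the feature function and regressor matrices here are toy stand-ins for illustration, not any specific published method:

```python
import numpy as np

def cascaded_regression(features, shape_init, regressors):
    """Generic cascaded shape regression: each stage maps
    shape-indexed features to an additive shape update."""
    shape = shape_init.copy()
    for W in regressors:
        shape = shape + W @ features(shape)
    return shape

# Toy demo: features are the residual to a target shape, so each
# 0.5*I regressor halves the remaining alignment error.
target = np.array([1.0, 2.0, 3.0, 4.0])
features = lambda s: target - s
regressors = [0.5 * np.eye(4)] * 3
result = cascaded_regression(features, np.zeros(4), regressors)
```

Each stage shrinks the residual by half, so three stages leave one eighth of the initial error.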
Learning Deep Representation for Face Alignment with Auxiliary Attributes
In this study, we show that the landmark detection, or face alignment, task is
not a single, independent problem. Instead, its robustness can be greatly
improved with auxiliary information. Specifically, we jointly optimize landmark
detection together with the recognition of heterogeneous but subtly correlated
facial attributes, such as gender, expression, and appearance attributes. This
is non-trivial since different attribute inference tasks have different
learning difficulties and convergence rates. To address this problem, we
formulate a novel tasks-constrained deep model, which not only learns the
inter-task correlation but also employs dynamic task coefficients to facilitate
the optimization convergence when learning multiple complex tasks. Extensive
evaluations show that the proposed task-constrained learning (i) outperforms
existing face alignment methods, especially in dealing with faces with severe
occlusion and pose variation, and (ii) reduces model complexity drastically
compared to the state-of-the-art methods based on cascaded deep models.
Comment: to be published in the IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI).
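The dynamic task coefficients described above can be sketched in a few lines. The weighting rule below is illustrative only (a softmax over recent relative loss improvements), not the paper's exact formulation: auxiliary tasks whose loss is still falling keep high coefficients, while converged tasks are down-weighted.

```python
import numpy as np

def dynamic_task_coefficients(loss_histories, k=5.0):
    """Illustrative dynamic task weighting: weight each task by the
    relative improvement of its recent loss trajectory."""
    trends = np.array([(h[0] - h[-1]) / (abs(h[0]) + 1e-8)
                       for h in loss_histories])
    w = np.exp(k * trends)
    return w / w.sum()

def total_loss(task_losses, coeffs):
    """Weighted sum of the main and auxiliary task losses."""
    return float(np.dot(coeffs, task_losses))

# Task 0 is still improving quickly; task 1 has nearly converged.
coeffs = dynamic_task_coefficients([[1.0, 0.4], [1.0, 0.98]])
```

Under this rule the still-improving task receives the larger coefficient, mimicking how the model keeps useful auxiliary gradients while suppressing converged ones.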
Human and Sheep Facial Landmarks Localisation by Triplet Interpolated Features
In this paper we present a method for localisation of facial landmarks on
human and sheep faces. We introduce a new feature extraction scheme called
triplet-interpolated feature used at each iteration of the cascaded shape
regression framework. It is able to extract features from similar semantic
locations given an estimated shape, even when head pose variations are large and
the facial landmarks are very sparsely distributed. Furthermore, we study the
impact of training data imbalance on model performance and propose a training
sample augmentation scheme that produces more initialisations for training
samples from the minority of the distribution. More specifically, the
augmentation number for a training sample is made negatively correlated with
the value of the fitted probability density function at the sample's position.
We evaluate the proposed
scheme on both human and sheep facial landmark localisation. On the benchmark
300-W human face dataset, we demonstrate the benefits of our proposed methods
and show very competitive performance compared to other methods. On a newly
created sheep face dataset, we achieve very good performance despite having
only a limited number of training samples and a sparse set of annotated
landmarks.
Comment: submitted to WACV201
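The augmentation rule above can be sketched as follows. This is a toy 1-D version: the fitted density is assumed to be a single Gaussian over one pose attribute, which is an illustrative simplification.

```python
import numpy as np

def augmentation_counts(attr, n_min=1, n_max=10):
    """Assign each training sample an augmentation count that is
    negatively correlated with its fitted density value: rare
    (minority) samples receive more initialisations."""
    mu, sigma = attr.mean(), attr.std() + 1e-8
    density = np.exp(-0.5 * ((attr - mu) / sigma) ** 2)  # unnormalised Gaussian
    scaled = (density - density.min()) / (density.max() - density.min() + 1e-12)
    return np.round(n_max - scaled * (n_max - n_min)).astype(int)

# Mostly near-frontal poses plus one extreme-profile outlier.
yaw = np.array([0.0, 1.0, -1.0, 0.5, -0.5, 40.0])
counts = augmentation_counts(yaw)
```

The low-density outlier (the 40-degree yaw sample) is assigned the most augmentations, while the dense near-frontal samples get the fewest.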
Facial Expression Recognition in the Wild using Rich Deep Features
Facial Expression Recognition is an active area of research in computer
vision with a wide range of applications. Several approaches have been
developed to solve this problem for different benchmark datasets. However,
Facial Expression Recognition in the wild remains an area where much work is
still needed to serve real-world applications. To this end, in this paper we
present a novel approach towards facial expression recognition. We fuse rich
deep features with domain knowledge through encoding discriminant facial
patches. We conduct experiments on two of the most popular benchmark datasets:
CK and TFE. Moreover, we present a novel dataset that, unlike its predecessors,
consists of natural - not acted - expression images. Experimental results show
that our approach achieves state-of-the-art results over standard benchmarks
and our own dataset.
Comment: in International Conference on Image Processing, 201
Face Alignment with Cascaded Semi-Parametric Deep Greedy Neural Forests
Face alignment is an active topic in computer vision, consisting of aligning
a shape model to the face. To this end, most modern approaches refine the shape
in a cascaded manner, starting from an initial guess. Those shape updates can
either be applied in the feature point space (i.e., explicit updates)
or in a low-dimensional, parametric space. In this paper, we propose a
semi-parametric cascade that first aligns a parametric shape, then captures
more fine-grained deformations of an explicit shape. For the purpose of
learning shape updates at each cascade stage, we introduce a deep greedy neural
forest (GNF) model, which is an improved version of deep neural forest (NF).
GNF appears as an ideal regressor for face alignment, as it combines
differentiability, high expressivity and fast evaluation runtime. The proposed
framework is very fast and achieves high accuracies on multiple challenging
benchmarks, including small, medium and large pose experiments.
Comment: 10 pages, 1 page appendix, 5 figures
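The two-stage idea can be sketched as below. This is a toy numpy version with made-up feature functions and linear regressors; the paper's learned GNF regressors are not reproduced here. A parametric stage first updates low-dimensional shape coefficients, then explicit stages refine individual points.

```python
import numpy as np

def semi_parametric_cascade(features, mean_shape, basis, param_reg, explicit_regs):
    """Stage 1: refine coefficients of a low-dimensional (PCA-style)
    shape model. Stage 2: apply explicit per-point updates."""
    p = param_reg @ features(mean_shape)          # parametric update
    shape = mean_shape + basis @ p                # decode to explicit shape
    for W in explicit_regs:                       # fine-grained refinement
        shape = shape + W @ features(shape)
    return shape

# Toy demo: features = residual to the target shape.
target = np.array([1.0, 2.0, 0.5, -0.5])
features = lambda s: target - s
basis = np.eye(4)[:, :2]                          # 2-D parametric subspace
param_reg = 0.8 * basis.T                         # coarse parametric regressor
explicit_regs = [0.5 * np.eye(4)] * 2
shape = semi_parametric_cascade(features, np.zeros(4), basis, param_reg, explicit_regs)
```

The parametric stage captures the coarse shape cheaply in the subspace, and the explicit stages recover deformations the subspace cannot express.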
A fast online cascaded regression algorithm for face alignment
Traditional machine-learning-based face alignment usually tracks the locations
of facial landmarks with a static model trained offline, where all of the
training data is available in advance. When new training samples arrive, the
static model must be retrained from scratch, which is excessively time- and
memory-consuming. In many real-time applications, the training data arrives
one sample or one batch at a time, so a static model limits performance on
sequential images with extensive variations. The most critical and challenging
aspect in this field is therefore dynamically updating the tracker's model to
continuously enhance its predictive and generalization capabilities. To
address this problem, we develop a fast and accurate online learning algorithm for
face alignment. In particular, we incorporate an online sequential extreme
learning machine into a parallel cascaded regression framework, coined
incremental cascade regression (ICR). To the best of our knowledge, this is the
first incremental cascaded framework with a non-linear regressor. One main
advantage of ICR is that the tracker model can be updated quickly in an
incremental way, without the entire retraining process, when a new input
arrives. Experimental results demonstrate that the proposed ICR is more
accurate and efficient on still or sequential images compared with recent
state-of-the-art cascade approaches. Furthermore, the incremental learning
proposed in this paper can update the trained model in real time.
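The incremental update at the heart of this approach can be sketched with a recursive least-squares rule of the kind used by online sequential extreme learning machines. This is a generic RLS output-layer update under a ridge prior, not the authors' exact implementation:

```python
import numpy as np

class IncrementalRegressor:
    """Recursive least-squares update in the spirit of an online
    sequential ELM output layer: each new batch refines the weights
    without retraining on all past data."""
    def __init__(self, dim_in, dim_out, reg=1.0):
        self.P = np.eye(dim_in) / reg          # running inverse covariance
        self.W = np.zeros((dim_in, dim_out))   # output weights

    def update(self, H, T):
        """H: (n, dim_in) feature batch, T: (n, dim_out) target batch."""
        K = self.P @ H.T @ np.linalg.inv(np.eye(len(H)) + H @ self.P @ H.T)
        self.P = self.P - K @ H @ self.P
        self.W = self.W + K @ (T - H @ self.W)

    def predict(self, H):
        return H @ self.W

# Fit y = 2x from two sequential batches, with no retraining from scratch.
model = IncrementalRegressor(1, 1, reg=1e-3)
model.update(np.array([[1.0], [2.0]]), np.array([[2.0], [4.0]]))
model.update(np.array([[3.0]]), np.array([[6.0]]))
```

By the matrix inversion lemma, the weights after each batch equal the ridge solution over all data seen so far, which is what makes the update equivalent to retraining but far cheaper.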
Evaluation of the Spatio-Temporal features and GAN for Micro-expression Recognition System
Owing to the development and advancement of artificial intelligence, numerous
works have been established on human facial expression recognition systems.
Meanwhile, the detection and classification of micro-expressions have attracted
attention from various research communities in recent years. In this
paper, we first review the processes of a conventional optical-flow-based
recognition system, which comprises facial landmark annotation, computation of
optical-flow-guided images, feature extraction and emotion class
categorization. Secondly, a few approaches have been proposed to improve the
feature extraction part, such as exploiting GAN to generate more image samples.
Particularly, several variations of optical flow are computed in order to
generate optimal images to lead to high recognition accuracy. Next, GAN, a
combination of Generator and Discriminator, is utilized to generate new "fake"
images to increase the sample size. Thirdly, a modified state-of-the-art
convolutional neural network is proposed. To verify the effectiveness of
the proposed method, the results are evaluated on spontaneous micro-expression
databases, namely SMIC, CASME II and SAMM. Both the F1-score and accuracy
performance metrics are reported in this paper.
Comment: 15 pages, 16 figures, 6 tables
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
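The standard deep FER pipeline described above (face detection, alignment, normalisation, deep representation, expression classification) can be caricatured in a few lines. All stage functions here are placeholders standing in for real detectors and CNNs:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fer_pipeline(image, detect, align, extract, classifier_W):
    """Canonical deep FER stages: face detection -> alignment ->
    normalisation -> deep representation -> expression classification."""
    face = align(detect(image))
    face = (face - face.mean()) / (face.std() + 1e-8)  # reduce illumination bias
    return softmax(classifier_W @ extract(face))

# Placeholder stages: crop as "detection", identity "alignment",
# flattening as the "deep" feature extractor, random linear classifier.
probs = fer_pipeline(
    np.random.default_rng(0).random((8, 8)),
    detect=lambda img: img[2:6, 2:6],
    align=lambda f: f,
    extract=lambda f: f.ravel(),
    classifier_W=np.random.default_rng(1).random((7, 16)),
)
```

The seven output probabilities correspond to the basic expression classes commonly used in FER benchmarks.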
Face Alignment using a 3D Deeply-initialized Ensemble of Regression Trees
Face alignment algorithms locate a set of landmark points in images of faces
taken in unrestricted situations. State-of-the-art approaches typically fail or
lose accuracy in the presence of occlusions, strong deformations, large pose
variations and ambiguous configurations. In this paper we present 3DDE, a
robust and efficient face alignment algorithm based on a coarse-to-fine cascade
of ensembles of regression trees. It is initialized by robustly fitting a 3D
face model to the probability maps produced by a convolutional neural network.
With this initialization we address self-occlusions and large face rotations.
Further, the regressor implicitly imposes a prior face shape on the solution,
addressing occlusions and ambiguous face configurations. Its coarse-to-fine
structure tackles the combinatorial explosion of parts deformation. In the
experiments performed, 3DDE improves the state of the art on the 300W, COFW,
AFLW and WFLW datasets. Finally, we perform cross-dataset experiments that
reveal the existence of a significant dataset bias in these benchmarks.
Comment: Accepted Version to Computer Vision and Image Understanding
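The initialisation step, fitting a face model to CNN probability maps, can be sketched roughly as follows. Here a simple weak-perspective affine fit to heatmap peaks stands in for the paper's robust 3D fitting, purely for illustration:

```python
import numpy as np

def init_from_heatmaps(heatmaps, model_3d):
    """Take each landmark's most probable heatmap location, then fit
    the model's 2D projection to those peaks by least squares."""
    peaks = np.array([np.unravel_index(h.argmax(), h.shape)[::-1]
                      for h in heatmaps], dtype=float)   # (x, y) per landmark
    A = np.hstack([model_3d[:, :2], np.ones((len(model_3d), 1))])
    M, *_ = np.linalg.lstsq(A, peaks, rcond=None)        # affine fit
    return A @ M                                         # projected init shape

# Toy demo: four landmarks whose peaks are a translated copy of the model.
model = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [4, 4, 0]], dtype=float)
maps = []
for x, y in [(2, 3), (6, 3), (2, 7), (6, 7)]:
    h = np.zeros((10, 10))
    h[y, x] = 1.0
    maps.append(h)
init = init_from_heatmaps(maps, model)
```

Because the fitted shape is constrained to be a projection of the model, the initialisation stays plausible even when individual heatmap peaks are noisy or occluded.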
Deep Hierarchical Machine: a Flexible Divide-and-Conquer Architecture
We propose the Deep Hierarchical Machine (DHM), a model inspired by the
divide-and-conquer strategy while emphasizing representation learning ability
and flexibility. A stochastic routing framework as used by recent deep neural
decision/regression forests is incorporated, but we remove the need to evaluate
unnecessary computation paths by utilizing a different topology and introducing
a probabilistic pruning technique. We also show a specialized version of DHM
(DSHM) for efficiency, which inherits the sparse feature extraction process
used in traditional decision trees with pixel-difference features. To achieve sparse
feature extraction, we propose to utilize sparse convolution operation in DSHM
and show one possibility of introducing sparse convolution kernels by using
local binary convolution layer. DHM can be applied to both classification and
regression problems, and we validate it on standard image classification and
face alignment tasks to show its advantages over past architectures.
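The stochastic routing and probabilistic pruning described above can be sketched as a soft binary tree. Each internal node routes a sample left with a sigmoid probability, and subtrees whose cumulative path probability is negligible are skipped; the weights and threshold below are illustrative, not the paper's trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_route(x, split_w, depth, prune_below=0.0):
    """Stochastic routing in a soft binary tree: node i sends the sample
    left with probability sigmoid(w_i . x); subtrees whose cumulative
    path probability falls below `prune_below` are skipped entirely,
    which is the probabilistic-pruning idea in sketch form."""
    leaf_probs = {}
    def descend(node, prob, level):
        if prob < prune_below:
            return                          # prune negligible paths
        if level == depth:
            leaf_probs[node] = prob         # arrived at a leaf
            return
        p_left = sigmoid(split_w[node] @ x)
        descend(2 * node + 1, prob * p_left, level + 1)
        descend(2 * node + 2, prob * (1.0 - p_left), level + 1)
    descend(0, 1.0, 0)
    return leaf_probs

x = np.array([1.0, -0.5])
split_w = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
full = soft_route(x, split_w, depth=2)
pruned = soft_route(x, split_w, depth=2, prune_below=0.15)
```

Without pruning the leaf probabilities form a full distribution over paths; with a pruning threshold, low-probability subtrees are never evaluated, which is how unnecessary computation paths are avoided.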