Learning Deep Representation for Face Alignment with Auxiliary Attributes
In this study, we show that the facial landmark detection (face alignment) task is not
a single, independent problem. Instead, its robustness can be greatly
improved with auxiliary information. Specifically, we jointly optimize landmark
detection together with the recognition of heterogeneous but subtly correlated
facial attributes, such as gender, expression, and appearance attributes. This
is non-trivial since different attribute inference tasks have different
learning difficulties and convergence rates. To address this problem, we
formulate a novel tasks-constrained deep model, which not only learns the
inter-task correlation but also employs dynamic task coefficients to facilitate
the optimization convergence when learning multiple complex tasks. Extensive
evaluations show that the proposed task-constrained learning (i) outperforms
existing face alignment methods, especially in dealing with faces with severe
occlusion and pose variation, and (ii) reduces model complexity drastically
compared to the state-of-the-art methods based on cascaded deep models.
Comment: to be published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
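The dynamic task coefficients described above can be illustrated with a toy scheme that down-weights an auxiliary task once its loss stops decreasing. This is a hypothetical simplification, not the paper's exact coefficient update: the function name, the sliding-window heuristic, and the clipping are all assumptions of this sketch.

```python
import numpy as np

def dynamic_task_weights(loss_history, window=3, eps=1e-8):
    """Down-weight auxiliary tasks whose loss has stopped decreasing.

    loss_history: dict mapping task name -> list of recent loss values.
    Returns a dict of coefficients in [0, 1], one per task.
    """
    weights = {}
    for task, losses in loss_history.items():
        if len(losses) < window + 1:
            weights[task] = 1.0  # not enough history: keep the task active
            continue
        recent = np.mean(losses[-window:])
        earlier = np.mean(losses[-window - 1:-1])
        # Relative loss improvement over the window, clipped to [0, 1].
        improvement = (earlier - recent) / (earlier + eps)
        weights[task] = float(np.clip(improvement * window, 0.0, 1.0))
    return weights

history = {
    "landmarks": [2.0, 1.6, 1.3, 1.1],   # still improving -> high weight
    "gender":    [0.5, 0.5, 0.5, 0.5],   # converged -> weight near zero
}
w = dynamic_task_weights(history)
```

An easier task (here, gender) converges first and is then effectively switched off, so it no longer dominates the shared gradient.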
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and regression-based methods. They differ in how they
utilize facial appearance and shape information. Holistic
methods explicitly build models to represent the global facial appearance and
shape information. CLM methods explicitly leverage the global shape model but
build local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in-the-wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".
Joint Multi-view Face Alignment in the Wild
The de facto algorithm for facial landmark estimation involves running a face
detector with a subsequent deformable model fitting on the bounding box. This
encompasses two basic problems: i) the detection and deformable fitting steps
are performed independently, while the detector might not provide the best-suited
initialisation for the fitting step, ii) the face appearance varies hugely
across different poses, which makes the deformable face fitting very
challenging and thus distinct models have to be used (e.g., one for profile and
one for frontal faces). In this work, we propose the first, to the best of our
knowledge, joint multi-view convolutional network to handle large pose
variations across faces in-the-wild, and elegantly bridge face detection and
facial landmark localisation tasks. Existing joint face detection and landmark
localisation methods focus only on a very small set of landmarks. By contrast,
our method can detect and align a large number of landmarks for semi-frontal
(68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a
plethora of datasets including standard static image datasets such as IBUG,
300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile
faces. Significant improvement over state-of-the-art methods on deformable face
tracking is observed on the 300VW benchmark. We also demonstrate state-of-the-art
results for face detection on the FDDB and MALF datasets.
Comment: submitted to IEEE Transactions on Image Processing.
DeCaFA: Deep Convolutional Cascade for Face Alignment In The Wild
Face alignment is an active computer vision problem that consists of
localizing a set of facial landmarks whose number varies across datasets.
State-of-the-art face alignment methods either consist in end-to-end
regression, or in refining the shape in a cascaded manner, starting from an
initial guess. In this paper, we introduce DeCaFA, an end-to-end deep
convolutional cascade architecture for face alignment. DeCaFA uses
fully-convolutional stages to keep full spatial resolution throughout the
cascade. Between each cascade stage, DeCaFA uses multiple chained transfer
layers with spatial softmax to produce landmark-wise attention maps for each of
several landmark alignment tasks. Weighted intermediate supervision, as well as
efficient feature fusion between the stages, allows the network to progressively
refine the attention maps in an end-to-end manner. We show experimentally that
DeCaFA significantly outperforms existing approaches on the 300W, CelebA, and WFLW
databases. In addition, we show that DeCaFA can learn fine alignment with
reasonable accuracy from very few images using coarsely annotated data.
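The landmark-wise attention maps with spatial softmax mentioned above can be sketched as follows. This is a minimal illustration of the generic spatial softmax (soft-argmax) operation, not DeCaFA's exact transfer layers; the function name and map sizes are made up.

```python
import numpy as np

def spatial_softmax_landmark(heatmap):
    """Convert one landmark attention map to an (x, y) coordinate.

    heatmap: 2D array of unnormalized scores, shape (H, W).
    Returns the probability-weighted mean coordinate, a differentiable
    alternative to a hard argmax.
    """
    h, w = heatmap.shape
    logits = heatmap - heatmap.max()      # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()                  # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    return float((probs * xs).sum()), float((probs * ys).sum())

# A map strongly peaked at (x=3, y=2) should decode to roughly (3, 2).
hm = np.zeros((5, 5))
hm[2, 3] = 10.0
x, y = spatial_softmax_landmark(hm)
```

Because the output is a smooth expectation over pixel locations, gradients flow through it, which is what lets a cascade refine such maps end-to-end.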
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
We present an algorithm for simultaneous face detection, landmark
localization, pose estimation, and gender recognition using deep convolutional
neural networks (CNNs). The proposed method, called HyperFace, fuses the
intermediate layers of a deep CNN using a separate CNN followed by a multi-task
learning algorithm that operates on the fused features. It exploits the synergy
among the tasks, which boosts their individual performances. Additionally, we
propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the
ResNet-101 model and achieves significant improvement in performance, and (2)
Fast-HyperFace that uses a high recall fast face detector for generating region
proposals to improve the speed of the algorithm. Extensive experiments show
that the proposed models are able to capture both global and local information
in faces and perform significantly better than many competitive algorithms for
each of these four tasks.
Comment: Accepted in Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
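The fusion of intermediate CNN layers described above can be sketched as resampling feature maps from different depths to a common spatial size and concatenating them along channels. The nearest-neighbour resizing below is an assumption standing in for the learned pooling/convolution layers a HyperFace-style network would actually use.

```python
import numpy as np

def fuse_intermediate_features(feature_maps, target_hw=(6, 6)):
    """Fuse feature maps from different CNN depths.

    feature_maps: list of arrays shaped (C_i, H_i, W_i).
    Each map is resampled (nearest-neighbour) to target_hw and the
    results are concatenated along the channel axis.
    """
    th, tw = target_hw
    resized = []
    for fm in feature_maps:
        c, h, w = fm.shape
        rows = np.arange(th) * h // th       # source row for each target row
        cols = np.arange(tw) * w // tw       # source col for each target col
        resized.append(fm[:, rows][:, :, cols])
    return np.concatenate(resized, axis=0)   # shape (sum C_i, th, tw)

shallow = np.random.rand(16, 24, 24)   # early layer: fine spatial detail
deep = np.random.rand(64, 6, 6)        # late layer: abstract features
fused = fuse_intermediate_features([shallow, deep])
```

The fused tensor carries both local detail (useful for landmarks) and abstract cues (useful for pose and gender), which is the synergy the abstract refers to.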
A Convolution Tree with Deconvolution Branches: Exploiting Geometric Relationships for Single Shot Keypoint Detection
Recently, Deep Convolution Networks (DCNNs) have been applied to the task of
face alignment and have shown potential for learning improved feature
representations. Although deeper layers can capture abstract concepts like
pose, it is difficult to capture the geometric relationships among the
keypoints in DCNNs. In this paper, we propose a novel convolution-deconvolution
network for facial keypoint detection. Our model predicts the 2D locations of
the keypoints and their individual visibility along with 3D head pose, while
exploiting the spatial relationships among different keypoints. Unlike
existing approaches to modeling these relationships, we propose learnable
transform functions that capture the relationships between keypoints at the
feature level. However, due to extensive variations in pose, not all of these
relationships act at once, and hence we propose a pose-based routing function
which implicitly models the active relationships. Both transform functions and
the routing function are implemented through convolutions in a multi-task
framework. Our approach presents a single-shot keypoint detection method,
making it different from many existing cascade regression-based methods. We
also show that learning these relationships significantly improves the accuracy
of keypoint detection for in-the-wild face images from challenging datasets
such as AFW and AFLW.
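The interplay of learnable transform functions and a pose-based routing function can be sketched as a pose-gated mixture of linear transforms. The names, the softmax gating, and the toy matrices below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def routed_transform(feature, transforms, pose_logits):
    """Apply a pose-gated mixture of keypoint-relation transforms.

    feature: (d,) feature vector for one keypoint.
    transforms: list of (d, d) matrices, one per candidate relationship.
    pose_logits: (len(transforms),) scores from a pose-routing branch.
    The softmax gates softly select which relationships are 'active'
    for the current pose.
    """
    gates = softmax(pose_logits)
    out = np.zeros_like(feature)
    for g, t in zip(gates, transforms):
        out += g * (t @ feature)
    return out

d = 4
transforms = [np.eye(d), -np.eye(d)]   # two toy relation transforms
feat = np.ones(d)
# Routing logits strongly favour the first transform, so the output
# should be close to that transform applied to the feature.
out = routed_transform(feat, transforms, np.array([10.0, -10.0]))
```

In a real network both the transform matrices and the routing branch would be learned; here they are fixed only to make the gating behaviour visible.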
Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning
Facial landmark localization plays a critical role in face recognition and
analysis. In this paper, we propose a novel cascaded backbone-branches fully
convolutional neural network (BB-FCN) for rapidly and accurately localizing
facial landmarks in unconstrained and cluttered settings. Our proposed BB-FCN
generates facial landmark response maps directly from raw images without any
preprocessing. BB-FCN follows a coarse-to-fine cascaded pipeline, which
consists of a backbone network for roughly detecting the locations of all
facial landmarks and one branch network for each type of detected landmark for
further refining their locations. Furthermore, to facilitate the facial
landmark localization under unconstrained settings, we propose a large-scale
benchmark named SYSU16K, which contains 16000 faces with large variations in
pose, expression, illumination and resolution. Extensive experimental
evaluations demonstrate that our proposed BB-FCN can significantly outperform
the state-of-the-art under both constrained (i.e., within detected facial
regions only) and unconstrained settings. We further confirm that high-quality
facial landmarks localized with our proposed network can also improve the
precision and recall of face detection.
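The coarse-to-fine pipeline above can be sketched as a two-stage peak search: the backbone's low-resolution response map gives a rough location, and a branch's full-resolution map is searched only in a small window around it. The function name, window size, and map sizes are illustrative assumptions.

```python
import numpy as np

def coarse_to_fine_peak(coarse_map, fine_map, window=2):
    """Locate one landmark coarse-to-fine.

    coarse_map: (h, w) low-resolution backbone response map.
    fine_map: (H, W) full-resolution branch response map, with H/h an
    integer scale factor. The fine map is searched only near the
    up-scaled coarse argmax, instead of exhaustively.
    """
    h, w = coarse_map.shape
    H, W = fine_map.shape
    scale = H // h
    cy, cx = np.unravel_index(np.argmax(coarse_map), coarse_map.shape)
    gy, gx = cy * scale, cx * scale            # coarse guess in fine coords
    y0, y1 = max(gy - window, 0), min(gy + scale + window, H)
    x0, x1 = max(gx - window, 0), min(gx + scale + window, W)
    patch = fine_map[y0:y1, x0:x1]             # small refinement window
    py, px = np.unravel_index(np.argmax(patch), patch.shape)
    return y0 + py, x0 + px

coarse = np.zeros((4, 4)); coarse[1, 2] = 1.0   # rough location
fine = np.zeros((16, 16)); fine[5, 9] = 1.0     # true peak nearby
y, x = coarse_to_fine_peak(coarse, fine)
```

Restricting the fine search to a window around the coarse estimate is what makes such a cascade fast while still recovering precise locations.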
The Intelligent ICU Pilot Study: Using Artificial Intelligence Technology for Autonomous Patient Monitoring
Currently, many critical care indices are repetitively assessed and recorded
by overburdened nurses, e.g. physical function or facial pain expressions of
nonverbal patients. In addition, much essential information on patients and
their environment is not captured at all, or is captured in a non-granular
manner, e.g. sleep disturbance factors such as bright light, loud background
noise, or excessive visitations. In this pilot study, we examined the
feasibility of using pervasive sensing technology and artificial intelligence
for autonomous and granular monitoring of critically ill patients and their
environment in the Intensive Care Unit (ICU). As an exemplar prevalent
condition, we also characterized delirious and non-delirious patients and their
environment. We used wearable sensors, light and sound sensors, and a
high-resolution camera to collect data on patients and their environment. We
analyzed collected data using deep learning and statistical analysis. Our
system performed face detection, face recognition, facial action unit
detection, head pose detection, facial expression recognition, posture
recognition, actigraphy analysis, sound pressure and light level detection, and
visitation frequency detection. We were able to detect patients' faces (mean
average precision (mAP) = 0.94), recognize patients' faces (mAP = 0.80), and
recognize their postures (F1 = 0.94). We also found that all facial expressions, 11 activity
features, visitation frequency during the day, visitation frequency during the
night, light levels, and sound pressure levels during the night were
significantly different between delirious and non-delirious patients
(p-value<0.05). In summary, we showed that granular and autonomous monitoring
of critically ill patients and their environment is feasible and can be used
for characterizing critical care conditions and related environmental factors.
A Detailed Look At CNN-based Approaches In Facial Landmark Detection
Facial landmark detection has been studied over decades. Numerous neural
network (NN)-based approaches have been proposed for detecting landmarks,
especially the convolutional neural network (CNN)-based approaches. In general,
CNN-based approaches can be divided into regression and heatmap approaches.
However, no research systematically studies the characteristics of different
approaches. In this paper, we investigate both CNN-based approaches, generalize
their advantages and disadvantages, and introduce a variation of the heatmap
approach, a pixel-wise classification (PWC) model. To the best of our
knowledge, using the PWC model to detect facial landmarks has not been
comprehensively studied. We further design a hybrid loss function and a
discrimination network for strengthening the landmarks' interrelationship
implied in the PWC model to improve the detection accuracy without modifying
the original model architecture. Six common facial landmark datasets, AFW,
Helen, LFPW, 300-W, IBUG, and COFW are adopted to train or evaluate our model.
A comprehensive evaluation is conducted, and the results show that the proposed
model outperforms other models on all tested datasets.
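The regression-versus-heatmap distinction, including the pixel-wise classification (PWC) variant, can be sketched at decoding time: a regression head emits coordinates directly, while a PWC head scores every pixel independently and takes the most confident one. Both decoders below are simplified illustrations (names and shapes are ours), assuming one landmark per map.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_pwc(logits_map):
    """Decode a pixel-wise-classification head: each pixel gets an
    independent 'is this the landmark?' probability, and the landmark
    is the highest-probability pixel.

    logits_map: (H, W) per-pixel logits for one landmark.
    Returns ((row, col), confidence).
    """
    probs = sigmoid(logits_map)
    y, x = np.unravel_index(np.argmax(probs), probs.shape)
    return (int(y), int(x)), float(probs[y, x])

def decode_regression(coord_output, image_hw):
    """Decode a regression head: the network directly emits normalized
    (x, y) in [0, 1], scaled here to pixel coordinates (row, col)."""
    h, w = image_hw
    x, y = coord_output
    return int(round(y * (h - 1))), int(round(x * (w - 1)))

logits = np.full((8, 8), -4.0); logits[3, 5] = 4.0   # confident pixel at (3, 5)
pwc_loc, conf = decode_pwc(logits)
reg_loc = decode_regression((5 / 7, 3 / 7), (8, 8))
```

Unlike regression, the PWC decoder also yields a per-landmark confidence for free, which is one reason heatmap-style heads are popular for occlusion-heavy data.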