1,185 research outputs found
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems
Learning Deep Representation for Face Alignment with Auxiliary Attributes
In this study, we show that landmark detection or face alignment task is not
a single and independent problem. Instead, its robustness can be greatly
improved with auxiliary information. Specifically, we jointly optimize landmark
detection together with the recognition of heterogeneous but subtly correlated
facial attributes, such as gender, expression, and appearance attributes. This
is non-trivial since different attribute inference tasks have different
learning difficulties and convergence rates. To address this problem, we
formulate a novel tasks-constrained deep model, which not only learns the
inter-task correlation but also employs dynamic task coefficients to facilitate
the optimization convergence when learning multiple complex tasks. Extensive
evaluations show that the proposed task-constrained learning (i) outperforms
existing face alignment methods, especially in dealing with faces with severe
occlusion and pose variation, and (ii) reduces model complexity drastically
compared to the state-of-the-art methods based on cascaded deep model.Comment: to be published in the IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI
Deep Multi-Center Learning for Face Alignment
Facial landmarks are highly correlated with each other since a certain
landmark can be estimated by its neighboring landmarks. Most of the existing
deep learning methods only use one fully-connected layer called shape
prediction layer to estimate the locations of facial landmarks. In this paper,
we propose a novel deep learning framework named Multi-Center Learning with
multiple shape prediction layers for face alignment. In particular, each shape
prediction layer emphasizes on the detection of a certain cluster of
semantically relevant landmarks respectively. Challenging landmarks are focused
firstly, and each cluster of landmarks is further optimized respectively.
Moreover, to reduce the model complexity, we propose a model assembling method
to integrate multiple shape prediction layers into one shape prediction layer.
Extensive experiments demonstrate that our method is effective for handling
complex occlusions and appearance variations with real-time performance. The
code for our method is available at
https://github.com/ZhiwenShao/MCNet-Extension.Comment: This paper has been accepted by Neurocomputin
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and the regression-based methods. They differ in the
ways to utilize the facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. The CLMs explicitly leverage the global shape model but
build the local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in the wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild"
Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Hand pose estimation from a single depth image is an essential topic in
computer vision and human computer interaction. Despite recent advancements in
this area promoted by convolutional neural network, accurate hand pose
estimation is still a challenging problem. In this paper we propose a Pose
guided structured Region Ensemble Network (Pose-REN) to boost the performance
of hand pose estimation. The proposed method extracts regions from the feature
maps of convolutional neural network under the guide of an initially estimated
pose, generating more optimal and representative features for hand pose
estimation. The extracted feature regions are then integrated hierarchically
according to the topology of hand joints by employing tree-structured fully
connections. A refined estimation of hand pose is directly regressed by the
proposed network and the final hand pose is obtained by utilizing an iterative
cascaded method. Comprehensive experiments on public hand pose datasets
demonstrate that our proposed method outperforms state-of-the-art algorithms.Comment: Accepted by Neurocomputin
Joint Multi-view Face Alignment in the Wild
The de facto algorithm for facial landmark estimation involves running a face
detector with a subsequent deformable model fitting on the bounding box. This
encompasses two basic problems: i) the detection and deformable fitting steps
are performed independently, while the detector might not provide best-suited
initialisation for the fitting step, ii) the face appearance varies hugely
across different poses, which makes the deformable face fitting very
challenging and thus distinct models have to be used (\eg, one for profile and
one for frontal faces). In this work, we propose the first, to the best of our
knowledge, joint multi-view convolutional network to handle large pose
variations across faces in-the-wild, and elegantly bridge face detection and
facial landmark localisation tasks. Existing joint face detection and landmark
localisation methods focus only on a very small set of landmarks. By contrast,
our method can detect and align a large number of landmarks for semi-frontal
(68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a
plethora of datasets including standard static image datasets such as IBUG,
300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile
faces. Significant improvement over state-of-the-art methods on deformable face
tracking is witnessed on 300VW benchmark. We also demonstrate state-of-the-art
results for face detection on FDDB and MALF datasets.Comment: submit to IEEE Transactions on Image Processin
Evaluation of the Spatio-Temporal features and GAN for Micro-expression Recognition System
Owing to the development and advancement of artificial intelligence, numerous
works were established in the human facial expression recognition system.
Meanwhile, the detection and classification of micro-expressions are attracting
attentions from various research communities in the recent few years. In this
paper, we first review the processes of a conventional optical-flow-based
recognition system, which comprised of facial landmarks annotations, optical
flow guided images computation, features extraction and emotion class
categorization. Secondly, a few approaches have been proposed to improve the
feature extraction part, such as exploiting GAN to generate more image samples.
Particularly, several variations of optical flow are computed in order to
generate optimal images to lead to high recognition accuracy. Next, GAN, a
combination of Generator and Discriminator, is utilized to generate new "fake"
images to increase the sample size. Thirdly, a modified state-of-the-art
Convolutional neural networks is proposed. To verify the effectiveness of the
the proposed method, the results are evaluated on spontaneous micro-expression
databases, namely SMIC, CASME II and SAMM. Both the F1-score and accuracy
performance metrics are reported in this paper.Comment: 15 pages, 16 figures, 6 table
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper gives futuristic challenges disscussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly study 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV
A Deep Journey into Super-resolution: A survey
Deep convolutional networks based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation performed, shows the consistent and rapid
growth in the accuracy in the past few years along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.Comment: Accepted in ACM Computing Survey
- …