2,516 research outputs found
Vision-based Human Gender Recognition: A Survey
Gender is an important demographic attribute of people. This paper provides a
survey of human gender recognition in computer vision. A review of approaches
exploiting information from face and whole body (either from a still image or
gait sequence) is presented. We highlight the challenges faced and survey the
representative methods of these approaches. Based on the results, good
performance has been achieved for datasets captured under controlled
environments, but there is still much work that can be done to improve the
robustness of gender recognition under real-life environments.
Comment: 30 pages
A Survey on Periocular Biometrics Research
Periocular refers to the facial region in the vicinity of the eye, including
eyelids, lashes and eyebrows. While face and irises have been extensively
studied, the periocular region has emerged as a promising trait for
unconstrained biometrics, following demands for increased robustness of face or
iris systems. With a surprisingly high discrimination ability, this region can
be easily obtained with existing setups for face and iris, and the requirement
of user cooperation can be relaxed, thus facilitating the interaction with
biometric systems. It is also available over a wide range of distances even
when the iris texture cannot be reliably obtained (low resolution) or under
partial face occlusion (close distances). Here, we review the state of the art
in periocular biometrics research. A number of aspects are described,
including: i) existing databases, ii) algorithms for periocular detection
and/or segmentation, iii) features employed for recognition, iv) identification
of the most discriminative regions of the periocular area, v) comparison with
iris and face modalities, vi) soft-biometrics (gender/ethnicity
classification), and vii) impact of gender transformation and plastic surgery
on the recognition accuracy. This work is expected to provide insight into the
most relevant issues in periocular biometrics, giving comprehensive coverage
of the existing literature and current state of the art.
Comment: Published in Pattern Recognition Letters
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a general
discussion about trends, important questions and future lines of research.
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to explore the discriminative information
contained in facial appearance and dynamics separately. This relies on
sophisticated feature-refinement and framework-design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain for each single image
and the temporal domain for the whole video as well. We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but it also offers an interpretable correspondence
between the spatial facial regions as well as temporal frames, and the task of
age estimation. We demonstrate the strong performance of our model in
experiments on a large, gender-balanced database with 400 subjects with ages
spanning from 8 to 76 years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data.
Comment: Accepted by Transactions on Image Processing (TIP)
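The temporal attention layer described in this abstract assigns a weight to each frame before the per-frame features are combined. A minimal sketch of that idea follows, assuming the weights come from a softmax over scalar frame scores; the function names, the scoring inputs, and the softmax choice are illustrative assumptions, not the SIAM implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Combine per-frame feature vectors into one video-level vector.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one scalar relevance score per frame (hypothetical input;
                    in a real model these would be produced by a learned layer).
    Returns the pooled vector and the attention weights.
    """
    if len(frame_features) != len(frame_scores):
        raise ValueError("need exactly one score per frame")
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

The returned weights can be inspected per frame, which is the kind of interpretability the abstract refers to.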
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Occlusion and pose variations, which can change facial appearance
significantly, are two major obstacles for automatic Facial Expression
Recognition (FER). Though automatic FER has made substantial progress in the
past few decades, occlusion-robust and pose-invariant issues of FER have
received relatively less attention, especially in real-world scenarios. This
paper addresses the real-world pose and occlusion robust FER problem with
three-fold contributions. First, to stimulate the research of FER under
real-world occlusions and variant poses, we build several in-the-wild facial
expression datasets with manual annotations for the community. Second, we
propose a novel Region Attention Network (RAN), to adaptively capture the
importance of facial regions for occlusion and pose variant FER. The RAN
aggregates and embeds a varied number of region features produced by a backbone
convolutional neural network into a compact fixed-length representation. Last,
inspired by the fact that facial expressions are mainly defined by facial
action units, we propose a region biased loss to encourage high attention
weights for the most important regions. We validate our RAN and region biased
loss on both our built test datasets and four popular datasets: FERPlus,
AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region
biased loss largely improve the performance of FER with occlusion and variant
pose. Our method also achieves state-of-the-art results on FERPlus, AffectNet,
RAF-DB, and SFEW. Code and the collected test data will be publicly available.
Comment: The test set and the code of this paper will be available at
https://github.com/kaiwang960112/Challenge-condition-FER-datase
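The abstract above says the RAN aggregates a varied number of region features into a compact fixed-length representation. One way to sketch this, assuming each region receives a sigmoid attention weight from a dot product with a query vector and the output is the weight-normalized average, is shown below. The `query` vector and the function names are hypothetical; the released code may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_regions(region_features, query):
    """Attention-weighted average of region features.

    region_features: list of feature vectors, any number of regions.
    query:           vector of the same length (a stand-in for learned
                     attention parameters; an assumption of this sketch).
    The output length equals the feature dimension regardless of how
    many regions are passed in.
    """
    if not region_features:
        raise ValueError("at least one region feature is required")
    weights = [
        sigmoid(sum(f * q for f, q in zip(feat, query)))
        for feat in region_features
    ]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w * feat[d] for w, feat in zip(weights, region_features)) / total
        for d in range(dim)
    ]
```

Because the weights are normalized by their sum, occluded or uninformative regions with low weights contribute little to the final representation.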
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
We present an algorithm for simultaneous face detection, landmarks
localization, pose estimation and gender recognition using deep convolutional
neural networks (CNN). The proposed method, called HyperFace, fuses the
intermediate layers of a deep CNN using a separate CNN followed by a multi-task
learning algorithm that operates on the fused features. It exploits the synergy
among the tasks, which boosts their individual performance. Additionally, we
propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the
ResNet-101 model and achieves significant improvement in performance, and (2)
Fast-HyperFace that uses a high recall fast face detector for generating region
proposals to improve the speed of the algorithm. Extensive experiments show
that the proposed models are able to capture both global and local information
in faces and perform significantly better than many competitive algorithms for
each of these four tasks.
Comment: Accepted in Transactions on Pattern Analysis and Machine Intelligence
(TPAMI)
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
Pose-adaptive Hierarchical Attention Network for Facial Expression Recognition
Multi-view facial expression recognition (FER) is a challenging task because
the appearance of an expression varies across poses. To alleviate the influence of
pose, recent methods either perform pose normalization or learn separate FER
classifiers for each pose. However, these methods usually have two stages and
rely on good performance of pose estimators. Unlike existing methods,
we propose a pose-adaptive hierarchical attention network (PhaNet) that can
jointly recognize facial expressions and poses in unconstrained
environments. Specifically, PhaNet discovers the most relevant regions to the
facial expression by an attention mechanism in hierarchical scales, and the
most informative scales are then selected to learn the pose-invariant and
expression-discriminative representations. PhaNet is end-to-end trainable by
minimizing the hierarchical attention losses, the FER loss and pose loss with
dynamically learned loss weights. We validate the effectiveness of the proposed
PhaNet on three multi-view datasets (BU-3DFE, Multi-pie, and KDEF) and two
in-the-wild FER datasets (AffectNet and SFEW). Extensive experiments
demonstrate that our framework outperforms state-of-the-art methods under both
within-dataset and cross-dataset settings, achieving average accuracies of
84.92%, 93.53%, 88.5%, 54.82% and 31.25%, respectively.
Comment: 12 pages, 15 figures
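The abstract mentions combining several losses with dynamically learned weights but does not say how the weights are learned. As an illustration only, the sketch below uses a common scheme in which each task i has a learnable log-variance s_i and contributes exp(-s_i) * L_i + s_i to the total. This is a swapped-in technique chosen for the example, not necessarily the one PhaNet uses, and all names are hypothetical.

```python
import math


def weighted_multitask_loss(losses, log_vars):
    """Combine task losses using learnable log-variance parameters.

    losses:   list of scalar task losses (e.g. attention, FER, pose).
    log_vars: one learnable scalar s_i per task (assumption of this sketch).
    A larger s_i down-weights its task loss, while the additive s_i term
    penalizes pushing every weight toward zero.
    """
    if len(losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))
```

In training, the `log_vars` values would be optimized together with the network parameters; here they are plain floats so the arithmetic is easy to follow.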
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, this problem is still challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by
simultaneously perceiving global structures and local details. Four
landmark-located patch networks are proposed to attend to local textures in
addition to the commonly used global encoder-decoder network. Beyond the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss and identity preserving loss.
The combined loss function leverages both frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Unlike previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity preserving image for downstream
tasks like face recognition and attribute estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art results on large pose face recognition.
Comment: Accepted at ICCV 2017, main paper & supplementary material, 11 pages
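The combined loss named in this abstract can be written schematically as a weighted sum of its three terms. The weighted-sum form and the coefficient symbols below are assumptions made for illustration; the abstract only states that adversarial, symmetry and identity preserving losses are combined.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{adv}}\,\mathcal{L}_{\text{adv}}
  + \lambda_{\text{sym}}\,\mathcal{L}_{\text{sym}}
  + \lambda_{\text{ip}}\,\mathcal{L}_{\text{ip}}
```

Here each \lambda is a hypothetical non-negative trade-off coefficient for the corresponding term.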