Facial Pose Estimation by Deep Learning from Label Distributions
Facial pose estimation has attracted considerable attention in many practical
applications, such as human-robot interaction, gaze estimation and driver
monitoring. Meanwhile, end-to-end deep learning-based facial pose estimation is
becoming more and more popular. However, facial pose estimation suffers from a
key challenge: the lack of sufficient training data for many poses, especially
for large poses. Inspired by the observation that faces in similar poses
look alike, we reformulate facial pose estimation as a label distribution
learning problem, considering each face image as an example associated with a
Gaussian label distribution rather than a single label, and construct a
convolutional neural network, trained with a multi-loss function on the
AFLW and 300W-LP datasets, to predict facial poses directly from
color images. Extensive experiments are conducted on several popular benchmarks,
including AFLW2000, BIWI, AFLW and AFW, where our approach shows a significant
advantage over other state-of-the-art methods.
Comment: 9 pages, 5 figures, Accepted by ICCV 2019 worksho
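The label-distribution idea in the abstract above can be illustrated with a minimal sketch. It assumes a single pose angle discretized into bins; the bin layout, the `sigma` value, and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def gaussian_label_distribution(angle_deg, bin_centers, sigma=3.0):
    """Turn one ground-truth angle into a normalized Gaussian distribution over bins."""
    weights = [math.exp(-((c - angle_deg) ** 2) / (2.0 * sigma ** 2))
               for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]

def expected_angle(distribution, bin_centers):
    """Read a continuous angle back out as the expectation over the bins."""
    return sum(p * c for p, c in zip(distribution, bin_centers))

# Hypothetical discretization: yaw from -99 to 99 degrees in 3-degree steps.
bin_centers = list(range(-99, 100, 3))
dist = gaussian_label_distribution(30.0, bin_centers, sigma=3.0)
peak_bin = bin_centers[dist.index(max(dist))]
recovered = expected_angle(dist, bin_centers)
```

Neighboring bins receive non-zero probability mass, so an image labeled with one pose also contributes supervision to nearby poses, which is the motivation the abstract gives for scarce large-pose data.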
Diversity in Faces
Face recognition is a long-standing challenge in the field of Artificial
Intelligence (AI). The goal is to create systems that accurately detect,
recognize, verify, and understand human faces. There are significant technical
hurdles in making these systems accurate, particularly in unconstrained
settings due to confounding factors related to pose, resolution, illumination,
occlusion, and viewpoint. However, with recent advances in neural networks,
face recognition has achieved unprecedented accuracy, largely built on
data-driven deep learning methods. While this is encouraging, a critical aspect
that is limiting facial recognition accuracy and fairness is inherent facial
diversity. Every face is different. Every face reflects something unique about
us. Aspects of our heritage - including race, ethnicity, culture, geography -
and our individual identity - age, gender, and other visible manifestations of
self-expression - are reflected in our faces. We expect face recognition to work
equally accurately for every face. Face recognition needs to be fair. As we
rely on data-driven methods to create face recognition technology, we need to
ensure necessary balance and coverage in training data. However, there are
still scientific questions about how to represent and extract pertinent facial
features and quantitatively measure facial diversity. Towards this goal,
Diversity in Faces (DiF) provides a data set of one million annotated human
face images for advancing the study of facial diversity. The annotations are
generated using ten well-established facial coding schemes from the scientific
literature. The facial coding schemes provide human-interpretable quantitative
measures of facial features. We believe that by making the extracted coding
schemes available on a large set of faces, we can accelerate research and
development towards creating more fair and accurate facial recognition systems.
Comment: Updated statistics after slight modification to dataset due to
inactive links and deletion
DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model
In this paper, we address the problem of how to robustly train a ConvNet for
regression, or deep robust regression. Traditionally, deep regression employs
the L2 loss function, known to be sensitive to outliers, i.e. samples that
either lie at an abnormal distance away from the majority of the training
samples, or that correspond to wrongly annotated targets. This means that,
during back-propagation, outliers may bias the training process due to the high
magnitude of their gradient. In this paper, we propose DeepGUM: a deep
regression model that is robust to outliers thanks to the use of a
Gaussian-uniform mixture model. We derive an optimization algorithm that
alternates between the unsupervised detection of outliers using
expectation-maximization, and the supervised training with cleaned samples
using stochastic gradient descent. DeepGUM is able to adapt to a continuously
evolving outlier distribution, avoiding the need to manually impose any threshold
on the proportion of outliers in the training set. Extensive experimental evaluations
on four different tasks (facial and fashion landmark detection, age and head
pose estimation) lead us to conclude that our novel robust technique provides
reliability in the presence of various types of noise and protection against a
high percentage of outliers.
Comment: accepted at ECCV 201
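The alternation described in the abstract above relies on an expectation-maximization step that separates inliers from outliers. The sketch below shows one such step for scalar residuals under a Gaussian-uniform mixture. It is a simplified illustration: the scalar setting, the fixed uniform density `gamma`, and the toy residual values are assumptions, and the network-training half of the alternation is omitted.

```python
import math

def e_step(residuals, pi, sigma, gamma):
    """Posterior probability that each residual comes from the Gaussian (inlier) part."""
    resp = []
    for e in residuals:
        inlier = pi * math.exp(-e * e / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        outlier = (1.0 - pi) * gamma  # constant density of the uniform component
        resp.append(inlier / (inlier + outlier))
    return resp

def m_step(residuals, resp):
    """Re-estimate the inlier prior and the Gaussian standard deviation."""
    total = sum(resp)
    pi = total / len(residuals)
    var = sum(r * e * e for r, e in zip(resp, residuals)) / total
    return pi, math.sqrt(var)

# Toy residuals: five small errors and one gross outlier (hypothetical values).
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 8.0]
pi, sigma, gamma = 0.5, 1.0, 0.05  # gamma assumes a uniform support of width 20
for _ in range(20):
    resp = e_step(residuals, pi, sigma, gamma)
    pi, sigma = m_step(residuals, resp)
```

The resulting responsibilities can serve as per-sample weights, so no hand-set threshold on the outlier proportion is needed; the proportion is estimated as `1 - pi`.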
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to explore the discriminative information
contained in facial appearance and dynamics separately. This relies on
sophisticated feature refinement and framework design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain for each single image
and the temporal domain for the whole video as well. We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but it also offers an interpretable correspondence
between the spatial facial regions as well as temporal frames, and the task of
age estimation. We demonstrate the strong performance of our model in
experiments on a large, gender-balanced database with 400 subjects with ages
spanning from 8 to 76 years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data.
Comment: Accepted by Transactions on Image Processing (TIP
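The temporal attention layer mentioned in the abstract above assigns a weight to each frame. A minimal sketch of that idea follows, assuming the weights come from a softmax over per-frame scores. In a trained model the scores would be produced by learned layers; the feature vectors and scores below are hypothetical placeholders.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(frame_features, frame_scores):
    """Pool per-frame feature vectors into one vector using attention weights."""
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frame_features))
              for d in range(dim)]
    return weights, pooled

# Three frames with 2-dimensional features (hypothetical values).
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_scores = [0.1, 2.0, 0.3]
weights, pooled = attend(frame_features, frame_scores)
```

Inspecting `weights` shows which frames dominate the pooled representation, which is the kind of interpretability the abstract attributes to the attention weights.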
Soft-ranking Label Encoding for Robust Facial Age Estimation
Automatic facial age estimation can be used in a wide range of real-world
applications. However, this process is challenging due to the randomness and
slowness of the aging process. Accordingly, in this paper, we propose a
comprehensive framework aimed at overcoming the challenges associated with
facial age estimation. First, we propose a novel age encoding method, referred
to as 'Soft-ranking', which encodes two important properties of facial age,
i.e., the ordinal property and the correlation between adjacent ages.
Therefore, Soft-ranking provides a richer supervision signal for training deep
models. Moreover, we also carefully analyze existing evaluation protocols for
age estimation, finding that the overlap in identity between the training and
testing sets affects the relative performance of different age encoding
methods. Finally, since existing face databases for age estimation are
generally small, deep models tend to suffer from an overfitting problem. To
address this issue, we propose a novel regularization strategy to encourage
deep models to learn more robust features from facial parts for age estimation
purposes. Extensive experiments indicate that the proposed techniques improve
the age estimation performance; moreover, we achieve state-of-the-art
performance on the three most popular age databases, i.e., Morph II,
CLAP2015, and CLAP2016.
FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild
Image-based age estimation aims to predict a person's age from facial images.
It is used in a variety of real-world applications. Although end-to-end deep
models have achieved impressive results for age estimation on benchmark
datasets, their performance in-the-wild still leaves much room for improvement
due to the challenges caused by large variations in head pose, facial
expressions, and occlusions. To address this issue, we propose a simple yet
effective method to explicitly incorporate facial semantics into age
estimation, so that the model would learn to correctly focus on the most
informative facial components from unaligned facial images regardless of head
pose and non-rigid deformation. To this end, we design a face parsing-based
network to learn semantic information at different scales and a novel face
parsing attention module to leverage these semantic features for age
estimation. To evaluate our method on in-the-wild data, we also introduce a new
challenging large-scale benchmark called IMDB-Clean. This dataset is created by
semi-automatically cleaning the noisy IMDB-WIKI dataset using a constrained
clustering method. Through comprehensive experiments on IMDB-Clean and other
benchmark datasets, under both intra-dataset and cross-dataset evaluation
protocols, we show that our method consistently outperforms all existing age
estimation methods and achieves a new state-of-the-art performance. To the best
of our knowledge, our work presents the first attempt to leverage face
parsing attention to achieve semantic-aware age estimation, which may
inspire other high-level facial analysis tasks.
Comment: Code and data will be available on
https://github.com/hhj1897/age_estimatio
Elastic Neural Networks: A Scalable Framework for Embedded Computer Vision
We propose a new framework for image classification with deep neural
networks. The framework introduces intermediate outputs to the computational
graph of a network. This enables flexible control of the computational load and
balances the tradeoff between accuracy and execution time.
Moreover, we present an interesting finding that the intermediate outputs can
act as a regularizer at training time, improving the prediction accuracy. In
the experimental section we demonstrate the performance of our proposed
framework with various commonly used pretrained deep networks in the use case
of apparent age estimation.
Comment: EUSIPCO 201
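The intermediate outputs described in the abstract above allow inference to stop early when computation is limited. The sketch below illustrates that control flow with toy stand-ins for layers and output heads. The budget-based stopping rule and all names are assumptions for illustration; the abstract does not specify how an output is selected.

```python
def run_with_budget(stages, x, budget):
    """Run stages in order while the cost budget allows, then use the last reached head.

    stages: list of (cost, layer_fn, head_fn) tuples (hypothetical structure).
    Returns (prediction, spent); prediction is None if no stage fits the budget.
    """
    spent = 0.0
    last_head = None
    for cost, layer_fn, head_fn in stages:
        if spent + cost > budget:
            break
        x = layer_fn(x)      # compute this block of the backbone
        spent += cost
        last_head = head_fn  # remember the intermediate output attached here
    prediction = last_head(x) if last_head is not None else None
    return prediction, spent

# Toy network: each "layer" adds 1, each head tags the value with its exit name.
stages = [
    (1.0, lambda v: v + 1, lambda v: ("exit1", v)),
    (1.0, lambda v: v + 1, lambda v: ("exit2", v)),
    (1.0, lambda v: v + 1, lambda v: ("exit3", v)),
]
small = run_with_budget(stages, 0, budget=2.5)
full = run_with_budget(stages, 0, budget=10.0)
none = run_with_budget(stages, 0, budget=0.5)
```

A larger budget reaches a deeper exit, trading execution time for accuracy in the way the abstract describes.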
Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning
This paper presents a novel Subject-dependent Deep Aging Path (SDAP), which
inherits the merits of both Generative Probabilistic Modeling and Inverse
Reinforcement Learning to model the facial structures and the longitudinal face
aging process of a given subject. The proposed SDAP is optimized using
tractable log-likelihood objective functions with Convolutional Neural Networks
(CNNs) based deep feature extraction. Instead of applying a fixed aging
development path for all input faces and subjects, SDAP is able to provide the
most appropriate aging development path for each individual subject that optimizes
the reward aging formulation. Unlike previous methods that can take only one
image as the input, SDAP further allows multiple images as inputs, i.e. all
information of a subject at either the same or different ages, to produce the
optimal aging path for the given subject. Finally, SDAP can efficiently
synthesize in-the-wild aging faces. The proposed model is evaluated on
both face aging synthesis and cross-age face verification. The
experimental results consistently show that SDAP achieves state-of-the-art
performance on numerous face aging databases, i.e. FG-NET, MORPH, AginG Faces
in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). Furthermore, we
also evaluate the performance of SDAP on the large-scale Megaface challenge to
demonstrate the advantages of the proposed solution.
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Age estimation is an important yet very challenging problem in computer
vision. Existing methods for age estimation usually apply a divide-and-conquer
strategy to deal with heterogeneous data caused by the non-stationary aging
process. However, the facial aging process is also a continuous process, and
the continuity relationship between different components has not been
effectively exploited. In this paper, we propose BridgeNet for age estimation,
which aims to mine the continuous relation between age labels effectively. The
proposed BridgeNet consists of local regressors and gating networks. Local
regressors partition the data space into multiple overlapping subspaces to
tackle heterogeneous data, and gating networks learn continuity-aware weights
for the results of local regressors by employing the proposed bridge-tree
structure, which introduces bridge connections into tree models to enforce the
similarity between neighboring nodes. Moreover, these two components of BridgeNet
can be jointly learned in an end-to-end way. We show experimental results on
the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet
outperforms the state-of-the-art methods.
Comment: CVPR 201
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.