Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to capture the discriminative information
contained in facial appearance and dynamics separately, which relies on
sophisticated feature refinement and framework design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain for each single image
and the temporal domain for the whole video as well. We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but it also offers an interpretable correspondence
between the spatial facial regions as well as temporal frames, and the task of
age estimation. We demonstrate the strong performance of our model in
experiments on a large, gender-balanced database of 400 subjects whose ages
span from 8 to 76 years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data. Comment: Accepted by Transactions on Image Processing (TIP).
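The temporal-attention idea in this abstract, scoring each frame and pooling frames by softmax-normalised weights, can be sketched in plain Python. This is a minimal illustration, not the paper's SIAM architecture: `score_w` stands in for a learned scoring layer, and features are plain lists rather than CNN activations.

```python
import math

def temporal_attention(frame_feats, score_w):
    """Pool per-frame feature vectors into one video-level feature.

    frame_feats: list of T feature vectors (lists of floats).
    score_w: scoring weights, one per feature dimension; in a real model
    this would be a learned layer, here it is a fixed illustrative vector.
    """
    # Unnormalised relevance score for each frame (dot product with score_w).
    scores = [sum(w * f for w, f in zip(score_w, feat)) for feat in frame_feats]
    # Softmax over time yields attention weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    # Attention-weighted average of the frame features.
    dim = len(frame_feats[0])
    pooled = [sum(a * feat[d] for a, feat in zip(attn, frame_feats))
              for d in range(dim)]
    return attn, pooled
```

The returned `attn` vector is what makes the approach interpretable: inspecting it shows which frames the model considered informative.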
Inferring Dynamic Representations of Facial Actions from a Still Image
Facial actions are spatio-temporal signals by nature, and therefore their
modeling is crucially dependent on the availability of temporal information. In
this paper, we focus on inferring such temporal dynamics of facial actions when
no explicit temporal information is available, i.e. from still images. We
present a novel approach to capture multiple scales of such temporal dynamics,
with an application to facial Action Unit (AU) intensity estimation and
dimensional affect estimation. In particular, 1) we propose a framework that
infers a dynamic representation (DR) from a still image, which captures the
bi-directional flow of time within a short time-window centered at the input
image; 2) we show that we can train our method without the need of explicitly
generating target representations, allowing the network to represent dynamics
more broadly; and 3) we propose to apply a multiple temporal scale approach
that infers DRs for different window lengths (MDR) from a still image. We
empirically validate the value of our approach on the task of frame ranking,
and show how our proposed MDR attains state of the art results on BP4D for AU
intensity estimation and on SEMAINE for dimensional affect estimation, using
only still images at test time. Comment: 10 pages, 5 figures.
Ordinal Distribution Regression for Gait-based Age Estimation
Computer vision researchers prefer to estimate age from face images because
facial features provide useful information. However, estimating age from face
images becomes challenging when people are distant from the camera or occluded.
A person's gait is a unique biometric feature that can be perceived efficiently
even at a distance. Thus, gait can be used to predict age when face images are
not available. However, existing gait-based classification or regression
methods ignore the ordinal relationship of different ages, which is an
important clue for age estimation. This paper proposes an ordinal distribution
regression with a global and local convolutional neural network for gait-based
age estimation. Specifically, we decompose gait-based age regression into a
series of binary classifications to incorporate the ordinal age information.
Then, an ordinal distribution loss is proposed to consider the inner
relationships among these classifications by penalizing the distribution
discrepancy between the estimated value and the ground truth. In addition, our
neural network comprises a global and three local sub-networks, and thus, is
capable of learning the global structure and local details from the head, body,
and feet. Experimental results indicate that the proposed approach outperforms
state-of-the-art gait-based age estimation methods on the OULP-Age dataset. Comment: Accepted by the journal "SCIENCE CHINA Information Sciences".
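The decomposition described above, turning age regression into a series of binary "is age greater than k?" classifications, is a standard ordinal-regression trick and can be sketched as follows. The encoding, the 0.5 decoding threshold, and `max_age` are illustrative choices, not details taken from the paper:

```python
def age_to_ordinal_targets(age, max_age):
    """Encode an age as max_age binary targets: t_k = 1 iff age > k."""
    return [1 if age > k else 0 for k in range(max_age)]

def ordinal_to_age(probs, threshold=0.5):
    """Decode a prediction: the estimated age is the number of
    'age > k' binary classifiers whose probability exceeds the threshold.
    probs: per-rank probabilities output by the binary heads."""
    return sum(1 for p in probs if p > threshold)
```

The ordinal distribution loss in the paper then penalises the discrepancy between the predicted binary-probability pattern and the target pattern, which this encoding makes monotone by construction.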
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Age estimation is an important yet very challenging problem in computer
vision. Existing methods for age estimation usually apply a divide-and-conquer
strategy to deal with heterogeneous data caused by the non-stationary aging
process. However, the facial aging process is also a continuous process, and
the continuity relationship between different components has not been
effectively exploited. In this paper, we propose BridgeNet for age estimation,
which aims to mine the continuous relation between age labels effectively. The
proposed BridgeNet consists of local regressors and gating networks. Local
regressors partition the data space into multiple overlapping subspaces to
tackle heterogeneous data, and gating networks learn continuity-aware weights
for the results of local regressors by employing the proposed bridge-tree
structure, which introduces bridge connections into tree models to enforce the
similarity between neighbor nodes. Moreover, these two components of BridgeNet
can be jointly learned in an end-to-end way. We show experimental results on
the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet
outperforms the state-of-the-art methods. Comment: CVPR 2019.
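The final prediction step described above, gating networks blending the outputs of overlapping local regressors, amounts to a softmax-weighted mixture. A minimal sketch, with `local_preds` and `gate_logits` as hypothetical stand-ins for the network outputs (the bridge-tree structure that produces continuity-aware logits is not modelled here):

```python
import math

def gated_combine(local_preds, gate_logits):
    """Blend local regressor outputs with softmax gating weights.

    local_preds: age estimate from each local regressor.
    gate_logits: one gating score per regressor; softmax turns them
    into weights that sum to 1, and the result is the weighted vote.
    """
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * p for w, p in zip(weights, local_preds))
```

Because the subspaces overlap and the weights vary smoothly, neighbouring regressors share responsibility for borderline ages instead of handing off abruptly.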
RankPose: Learning Generalised Feature with Rank Supervision for Head Pose Estimation
We address the challenging problem of RGB image-based head pose estimation.
We first reformulate head pose representation learning to constrain it to a
bounded space; representing head pose as a vector projection or as vector
angles proves helpful in improving performance. Further, a ranking loss combined with
MSE regression loss is proposed. The ranking loss supervises a neural network
with paired samples of the same person and penalises incorrect ordering of pose
prediction. Analysis of this new loss function suggests that it contributes to
a better local feature extractor, where features are generalised to Abstract
Landmarks, i.e. pose-related features, rather than pose-irrelevant information
such as identity, age, and lighting. Extensive experiments show
that our method significantly outperforms the current state-of-the-art schemes
on public datasets: AFLW2000 and BIWI. Our model reduces the previous
state-of-the-art MAE from 4.50 to 3.66 on AFLW2000 and from 4.0 to 3.71 on
BIWI. Source code will be made available at:
https://github.com/seathiefwang/RankHeadPose
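The combined objective described above, MSE regression plus a pairwise ranking term on same-person pairs, can be sketched for a scalar pose value. This is an illustration of the general hinge-ranking idea, not the paper's exact formulation (head pose is a vector there, and `margin` and `alpha` are hypothetical hyper-parameters):

```python
def rank_mse_loss(pred_a, pred_b, true_a, true_b, margin=0.0, alpha=1.0):
    """Loss for one pair of images of the same person.

    MSE anchors each prediction to its label; the hinge ranking term
    penalises the pair when the predicted ordering of the two poses
    contradicts the true ordering.
    """
    mse = ((pred_a - true_a) ** 2 + (pred_b - true_b) ** 2) / 2.0
    # Sign of the true ordering: +1 if sample a should rank above b.
    sign = 1.0 if true_a > true_b else -1.0
    rank = max(0.0, margin - sign * (pred_a - pred_b))
    return mse + alpha * rank
```

When the predicted ordering agrees with the labels the ranking term vanishes, so the extra supervision only fires on incorrectly ordered pairs.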
C3AE: Exploring the Limits of Compact Model for Age Estimation
Age estimation is a classic learning problem in computer vision. Many larger
and deeper CNNs have been proposed with promising performance, such as AlexNet,
VggNet, GoogLeNet and ResNet. However, these models are not practical for the
embedded/mobile devices. Recently, MobileNets and ShuffleNets have been
proposed to reduce the number of parameters, yielding lightweight models.
However, their representation has been weakened because of the adoption of
depth-wise separable convolution. In this work, we investigate the limits of
compact models for small-scale images and propose an extremely Compact yet
efficient Cascade Context-based Age Estimation model (C3AE). This model
possesses only 1/9 and 1/2000 of the parameters of MobileNets/ShuffleNets and
VggNet respectively, while achieving competitive performance. In particular, we
re-define the age estimation problem via a two-points representation, which is
implemented by a cascade model. Moreover, to fully utilize the facial context
information, a multi-branch CNN is proposed to aggregate multi-scale context.
Experiments are carried out on three age estimation datasets, where
state-of-the-art performance among compact models is achieved by a relatively
large margin. Comment: accepted by CVPR 2019.
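A plausible reading of the two-points representation mentioned above is that an age is expressed as a convex combination of its two neighbouring anchor ages, so the model only predicts the two mixing weights. The sketch below follows that reading; the anchor `bins` and both helper names are illustrative, not taken from the paper:

```python
def age_to_two_points(age, bins):
    """Encode an age as a convex combination of its two neighbouring
    anchor ages, yielding a distribution with at most two non-zeros.
    bins: sorted anchor ages covering the valid range."""
    assert bins[0] <= age <= bins[-1]
    dist = [0.0] * len(bins)
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= age <= hi:
            w = (hi - age) / (hi - lo)  # weight on the lower anchor
            dist[i], dist[i + 1] = w, 1.0 - w
            break
    return dist

def two_points_to_age(dist, bins):
    """Decode by taking the expectation over the anchor ages."""
    return sum(d * b for d, b in zip(dist, bins))
```

Encoding and decoding are exact inverses for any in-range age, which is what makes such a representation attractive for a cascade of small heads.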
A Coupled Evolutionary Network for Age Estimation
Age estimation of unknown persons is a challenging pattern analysis task due
to the lack of training data and the varied aging mechanisms of different
people. Label distribution learning-based methods usually make distribution
assumptions to simplify age estimation. However, age label distributions are
often complex and difficult to model parametrically. Inspired by the
biological evolutionary mechanism, we propose a Coupled Evolutionary Network
(CEN) with two concurrent evolutionary processes: evolutionary label
distribution learning and evolutionary slack regression. Evolutionary label
distribution learning adaptively learns and iteratively refines the age label
distributions without making strong assumptions on the
distribution patterns. To further utilize the ordered and continuous
information of age labels, we accordingly propose an evolutionary slack
regression to convert the discrete age label regression into the continuous age
interval regression. Experimental results on Morph, ChaLearn15 and
MegaAge-Asian datasets show the superiority of our method.
TRk-CNN: Transferable Ranking-CNN for image classification of glaucoma, glaucoma suspect, and normal eyes
In this paper, we proposed Transferable Ranking Convolutional Neural Network
(TRk-CNN) that can be effectively applied when the classes of images to be
classified show a high correlation with each other. The commonly used
multi-class classification method based on the softmax function is not
effective in this case because it ignores the inter-class relationship.
Although there is a Ranking-CNN that takes into account the ordinal classes, it
cannot reflect the inter-class relationship to the final prediction. TRk-CNN,
on the other hand, combines the weights of the primitive classification model
to reflect the inter-class information to the final classification phase. We
evaluated TRk-CNN in glaucoma image dataset that was labeled into three
classes: normal, glaucoma suspect, and glaucoma eyes. Based on the literature
we surveyed, this study is the first to classify a glaucoma fundus image
dataset into these three classes. We compared the evaluation
results of TRk-CNN with Ranking-CNN (Rk-CNN) and multi-class CNN (MC-CNN) using
the DenseNet as the backbone CNN model. As a result, TRk-CNN achieved an
average accuracy of 92.96%, specificity of 93.33%, sensitivity for glaucoma
suspect of 95.12% and sensitivity for glaucoma of 93.98%. In average accuracy,
TRk-CNN is 8.04% and 9.54% higher than Rk-CNN and MC-CNN respectively, and
notably 26.83% higher than MC-CNN in sensitivity for glaucoma suspect.
Our TRk-CNN is expected to be effectively applied to the medical image
classification problem where the disease state is continuous and increases in
the positive class direction.Comment: 49 pages, 12 figure
UVA: A Universal Variational Framework for Continuous Age Analysis
Conventional methods for facial age analysis tend to utilize accurate age
labels in a supervised way. However, existing age datasets cover only a
limited range of ages, leading to a long-tailed distribution. To alleviate this problem,
this paper proposes a Universal Variational Aging (UVA) framework to formulate
facial age priors in a disentangling manner. Benefiting from the variational
evidence lower bound, the facial images are encoded and disentangled into an
age-irrelevant distribution and an age-related distribution in the latent
space. A conditional introspective adversarial learning mechanism is introduced
to boost the image quality. In this way, when manipulating the age-related
distribution, UVA can achieve age translation with arbitrary ages. Further, by
sampling noise from the age-irrelevant distribution, we can generate
photorealistic facial images with a specific age. Moreover, given an input face
image, the mean value of age-related distribution can be treated as an age
estimator. These results indicate that UVA can efficiently and accurately
estimate the age-related distribution in a disentangled manner, even when the
training dataset exhibits a long-tailed age distribution. UVA is the first attempt to
achieve facial age analysis tasks, including age translation, age generation
and age estimation, in a universal framework. The qualitative and quantitative
experiments demonstrate the superiority of UVA on five popular datasets,
including CACD2000, Morph, UTKFace, MegaAge-Asian and FG-NET.