Cross-Age LFW: A Database for Studying Cross-Age Face Recognition in Unconstrained Environments
The Labeled Faces in the Wild (LFW) database has been widely used as the benchmark for unconstrained face verification, and thanks to big-data-driven machine learning methods, performance on it now approaches 100%. However, we argue that this accuracy may be too optimistic because of several limiting factors. Besides variations in pose, illumination, occlusion, and expression, cross-age faces pose another challenge in face recognition. Different ages of the same person produce large intra-class variation, and the aging process is unavoidable in real-world face verification; yet LFW pays little attention to it. We therefore construct Cross-Age LFW (CALFW), which deliberately searches for and selects 3,000 positive face pairs with age gaps in order to add aging-related intra-class variance. Negative pairs with the same gender and race are also selected, to reduce the influence of attribute differences between positive and negative pairs so that the task remains face verification rather than attribute classification. We evaluate several metric learning and deep learning methods on the new database. Compared to the accuracy on LFW, accuracy drops by about 10%-17% on CALFW.
Comment: 10 pages, 9 figures
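The verification protocol behind these numbers can be sketched as thresholded pair matching: a pair is declared "same person" when the distance between its two face embeddings falls below a threshold. A minimal NumPy sketch, where the embeddings, threshold, and pair labels are illustrative placeholders rather than anything from the paper:

```python
import numpy as np

def verify_pairs(emb_a, emb_b, threshold):
    """Declare each pair 'same' if the Euclidean distance between
    its two embeddings is below the threshold."""
    dists = np.linalg.norm(emb_a - emb_b, axis=1)
    return dists < threshold

def accuracy(emb_a, emb_b, labels, threshold):
    """Fraction of pairs whose same/different decision matches the label."""
    pred = verify_pairs(emb_a, emb_b, threshold)
    return np.mean(pred == np.asarray(labels, dtype=bool))

# Toy example: two positive pairs (close embeddings), one negative pair.
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
b = np.array([[0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
labels = [1, 1, 0]
acc = accuracy(a, b, labels, threshold=0.5)
```

In practice the threshold is chosen on held-out folds; the 10%-17% drop reported above reflects how age-gap positives push same-person distances past thresholds tuned on LFW-style pairs.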
Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks
Facial age variation significantly affects gender classification accuracy, because facial shape and skin texture change as people grow older. This calls for re-examining gender classification systems so that they take facial age information into account. In this paper, we propose Multi-expert Gender Classification on Age Group (MGA), an end-to-end multi-task learning scheme for age estimation and gender classification. First, two types of deep neural networks are utilized: a Convolutional Appearance Network (CAN) for facial appearance features and a Deep Geometry Network (DGN) for facial geometric features. Then, CAN and DGN are integrated by the proposed model integration strategy and fine-tuned in order to improve age and gender classification accuracy. Facial images are categorized into one of three age groups (young, adult, and elderly) based on their estimated age, and the system makes a gender prediction by average fusion of three gender classification experts, each trained to fit the gender characteristics of its age group. Rigorous experiments on challenging databases suggest that the proposed MGA outperforms several state-of-the-art approaches at a lower computational cost.
Comment: 12 pages
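The average-fusion step described above can be sketched as averaging the gender posteriors produced by the per-age-group experts and taking the argmax. The expert outputs below are hypothetical placeholders, not the paper's trained models:

```python
import numpy as np

def average_fusion(expert_probs):
    """Average the gender posteriors of the per-age-group experts.
    expert_probs: shape (n_experts, 2) with rows [P(male), P(female)]."""
    fused = np.mean(expert_probs, axis=0)
    return ("male", "female")[int(np.argmax(fused))], fused

# Hypothetical posteriors from the young/adult/elderly gender experts.
probs = np.array([[0.7, 0.3],
                  [0.6, 0.4],
                  [0.8, 0.2]])
label, fused = average_fusion(probs)
```

Averaging posteriors (rather than hard votes) lets a confident expert for the estimated age group dominate while still smoothing over age-estimation errors.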
Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition
Automatic understanding of human affect using visual signals is a problem
that has attracted significant interest over the past 20 years. However, human
emotional states are quite complex. To appraise such states displayed in
real-world settings, we need expressive emotional descriptors that are capable
of capturing and describing this complexity. The circumplex model of affect,
which is described in terms of valence (i.e., how positive or negative is an
emotion) and arousal (i.e., power of the activation of the emotion), can be
used for this purpose. Recent progress in the emotion recognition domain has
been achieved through the development of deep neural architectures and the
availability of very large training databases. In this context, Aff-Wild was the first large-scale "in-the-wild" database, containing around 1,200,000
frames. In this paper, we build upon this database, extending it with 260 more
subjects and 1,413,000 new video frames. We call the union of Aff-Wild with the
additional data, Aff-Wild2. The videos are downloaded from YouTube and have
large variations in pose, age, illumination conditions, ethnicity and
profession. Both database-specific as well as cross-database experiments are
performed in this paper, by utilizing the Aff-Wild2, along with the RECOLA
database. The developed deep neural architectures are based on the joint
training of state-of-the-art convolutional and recurrent neural networks with
attention mechanisms, thus exploiting the invariant properties of convolutional features while modeling, via the recurrent layers, the temporal dynamics that arise in human behaviour. The obtained results show promise for utilizing the extended Aff-Wild, as well as the developed deep neural architectures, for visual analysis of human behaviour in terms of continuous emotion dimensions.
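Continuous valence and arousal predictions such as these are commonly evaluated with the Concordance Correlation Coefficient (CCC), which penalizes both scale and location differences between the predicted and annotated traces. The abstract does not name its metric, so treat this as an illustrative convention from the affect-recognition literature rather than the paper's exact protocol:

```python
import numpy as np

def ccc(pred, gold):
    """Concordance Correlation Coefficient between a predicted and an
    annotated valence (or arousal) trace."""
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    mp, mg = pred.mean(), gold.mean()
    vp, vg = pred.var(), gold.var()
    cov = np.mean((pred - mp) * (gold - mg))
    return 2 * cov / (vp + vg + (mp - mg) ** 2)

t = np.linspace(0, 1, 100)
gold = np.sin(2 * np.pi * t)       # annotated valence trace (synthetic)
perfect = ccc(gold, gold)          # identical traces give CCC = 1
shifted = ccc(gold + 0.5, gold)    # a constant bias lowers CCC
```

Unlike Pearson correlation, CCC drops when predictions are systematically offset or rescaled, which matters when regressing dimensional affect over time.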
Probabilistic Attribute Tree in Convolutional Neural Networks for Facial Expression Recognition
In this paper, we propose a novel Probabilistic Attribute Tree-CNN (PAT-CNN) to explicitly deal with the large intra-class variations caused by identity-related attributes, e.g., age, race, and gender. Specifically, a novel PAT module with an associated PAT loss is proposed to learn features in a hierarchical tree structure organized according to attributes, such that the final features are less affected by those attributes. Expression-related features are then extracted from the leaf nodes. Samples are probabilistically assigned to tree nodes at different levels, so that expression-related features can be learned from all samples, weighted by their assignment probabilities. We further propose a semi-supervised strategy for learning the PAT-CNN from limited attribute-annotated samples, making the best use of the available data. Experimental results on five facial expression datasets demonstrate that the proposed PAT-CNN outperforms baseline models by explicitly modeling attributes. More impressively, a single PAT-CNN model achieves the best performance on faces in the wild (the SFEW dataset), compared with state-of-the-art methods that use ensembles of hundreds of CNNs.
Comment: 10 pages
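The probabilistic assignment idea above can be sketched as a softmax over per-node logits, followed by node statistics computed from all samples weighted by their assignment probabilities. This is a minimal illustration with made-up logits; the actual PAT module and loss are more involved than this:

```python
import numpy as np

def soft_assign(logits):
    """Softmax assignment of each sample to the children of a tree node."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def node_weighted_mean(features, node_probs):
    """Per-node feature means computed from ALL samples, each weighted
    by its probability of reaching that node.
    features: (n, d); node_probs: (n, k) -> returns (k, d)."""
    w = node_probs / node_probs.sum(axis=0, keepdims=True)
    return w.T @ features

feats = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = soft_assign(np.array([[2.0, 0.0], [0.0, 2.0]]))  # 2 samples, 2 nodes
means = node_weighted_mean(feats, probs)
```

Because assignments are soft, every sample contributes a little to every node, which is what allows expression-related features at each node to be learned from the full training set rather than a hard partition of it.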
Deep Learning for Face Recognition: Pride or Prejudiced?
Do very high accuracies of deep networks suggest pride of effective AI or are
deep networks prejudiced? Do they suffer from in-group biases (own-race-bias
and own-age-bias), and mimic the human behavior? Is in-group specific
information being encoded sub-consciously by the deep networks?
This research attempts to answer these questions and presents an in-depth
analysis of 'bias' in deep learning based face recognition systems. This is the
first work which decodes if and where bias is encoded for face recognition.
Taking cues from cognitive studies, we inspect if deep networks are also
affected by social in- and out-group effect. Networks are analyzed for own-race
and own-age bias, both of which have been well established in human beings. The
sub-conscious behavior of face recognition models is examined to understand if
they encode race or age specific features for face recognition. Analysis is
performed based on 36 experiments conducted on multiple datasets. Four deep
learning networks either trained from scratch or pre-trained on over 10M images
are used. Variations across class activation maps and feature visualizations
provide novel insights into the functioning of deep learning systems,
suggesting behavior similar to humans. It is our belief that a better
understanding of state-of-the-art deep learning networks would enable
researchers to address the challenge of bias in AI and develop fairer systems.
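The class activation maps analyzed above localize which spatial regions drive a class score. In the standard CAM formulation (a channel-wise weighted sum of the last convolutional features, which may differ in detail from this paper's exact visualization), the map is computed as follows; the feature maps and weights here are random stand-ins for a real network's state:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM: weight the final conv feature maps by the target class's
    fully-connected weights and sum over channels.
    feature_maps: (k, h, w); fc_weights: (n_classes, k) -> (h, w)."""
    w = fc_weights[class_idx]                     # (k,)
    return np.tensordot(w, feature_maps, axes=1)  # sum_k w[k] * fmap[k]

# Toy network state: 3 channels of 4x4 activations, 2 classes.
rng = np.random.default_rng(0)
fmaps = rng.random((3, 4, 4))
weights = rng.random((2, 3))
cam = class_activation_map(fmaps, weights, class_idx=0)
```

Comparing such maps across identities of different races or ages is one way to make the "where is bias encoded" question concrete: systematic shifts in the attended regions suggest group-specific features.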
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning representations. In the literature, autoencoders have been shown to perform well on a variety of tasks across multiple domains, establishing their widespread applicability. Typically, an autoencoder is trained to generate a
model that minimizes the reconstruction error between the input and the
reconstructed output, computed in terms of the Euclidean distance. While this
can be useful for applications related to unsupervised reconstruction, it may
not be optimal for classification. In this paper, we propose a novel Supervised
COSMOS Autoencoder which utilizes a multi-objective loss function to learn
representations that simultaneously encode the (i) "similarity" between the
input and reconstructed vectors in terms of their direction, (ii)
"distribution" of pixel values of the reconstruction with respect to the input
sample, while also incorporating (iii) "discriminability" in the feature
learning pipeline. The proposed autoencoder model incorporates a Cosine
similarity and Mahalanobis distance based loss function, along with supervision
via Mutual Information based loss. Detailed analysis of each component of the
proposed model motivates its applicability for feature learning in different
classification tasks. The efficacy of the Supervised COSMOS autoencoder is demonstrated via extensive experimental evaluations on different image datasets. The proposed model outperforms existing algorithms on the MNIST, CIFAR-10, and SVHN databases, and yields state-of-the-art results on the CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face recognition.
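A rough sketch of such a multi-objective reconstruction loss is given below, combining a cosine (direction) term and a Mahalanobis (distribution) term. The weights alpha/beta and the omission of the mutual-information supervision term are my simplifications for illustration, not the paper's exact formulation:

```python
import numpy as np

def cosine_term(x, x_hat):
    """Direction mismatch between input and reconstruction: 1 - cos."""
    num = np.sum(x * x_hat, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1)
    return np.mean(1.0 - num / den)

def mahalanobis_term(x, x_hat, cov_inv):
    """Distributional mismatch via per-sample Mahalanobis distance."""
    d = x - x_hat
    return np.mean(np.einsum("bi,ij,bj->b", d, cov_inv, d))

def cosmos_like_loss(x, x_hat, cov_inv, alpha=1.0, beta=1.0):
    """Weighted sum of the direction and distribution terms; the
    paper's mutual-information supervision term is omitted here."""
    return alpha * cosine_term(x, x_hat) + beta * mahalanobis_term(x, x_hat, cov_inv)

x = np.array([[1.0, 0.0], [0.0, 2.0]])
exact = cosmos_like_loss(x, x.copy(), np.eye(2))  # perfect reconstruction
noisy = cosmos_like_loss(x, x + 0.1, np.eye(2))   # perturbed reconstruction
```

The cosine term is scale-invariant while the Mahalanobis term is not, which is one intuition for why combining them captures both the direction and the distribution of the reconstruction error.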
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research directions for the development of robust facial models in real-world conditions.
A Survey of Deep Facial Attribute Analysis
Facial attribute analysis has received considerable attention as deep learning techniques have made remarkable breakthroughs in this field over the past few years. Deep learning based facial attribute analysis consists of two basic
sub-issues: facial attribute estimation (FAE), which recognizes whether facial
attributes are present in given images, and facial attribute manipulation
(FAM), which synthesizes or removes desired facial attributes. In this paper,
we provide a comprehensive survey of deep facial attribute analysis from the
perspectives of both estimation and manipulation. First, we summarize a general
pipeline that deep facial attribute analysis follows, which comprises two
stages: data preprocessing and model construction. Additionally, we introduce
the underlying theories of this two-stage pipeline for both FAE and FAM.
Second, the datasets and performance metrics commonly used in facial attribute
analysis are presented. Third, we create a taxonomy of state-of-the-art methods
and review deep FAE and FAM algorithms in detail. Furthermore, several
additional facial attribute related issues are introduced, as well as relevant
real-world applications. Finally, we discuss possible challenges and promising future research directions.
Comment: submitted to the International Journal of Computer Vision (IJCV)
Physical Attribute Prediction Using Deep Residual Neural Networks
Images taken from the Internet have been used alongside deep learning for many different tasks, such as smile detection and the prediction of ethnicity, hair style, hair colour, gender, and age. Motivated by these applications, we ask what other attributes can be predicted from facial images available on the Internet. In this paper we tackle the prediction of physical attributes from face images using Convolutional Neural Networks trained on our dataset, named FIRW. We crawled around 61,000 images from the web, then used face detection to crop faces from these real-world images. We chose ResNet-50 as our base network architecture. This network was pretrained for face recognition on the VGG-Face dataset, and we fine-tune it on our own dataset to predict physical attributes. Separate networks are trained to predict body type, ethnicity, gender, height, and weight; our models achieve the following accuracies on these tasks, respectively: 84.58%, 87.34%, 97.97%, 70.51%, and 63.99%. To validate our choice of ResNet-50 as the base architecture, we also tackle the well-known CelebA dataset. Our models achieve an average accuracy of 91.19% on CelebA, which is comparable to state-of-the-art approaches.
EmotioNet Challenge: Recognition of facial expressions of emotion in the wild
This paper details the methodology and results of the EmotioNet challenge.
This challenge is the first to test the ability of computer vision algorithms
in the automatic analysis of a large number of images of facial expressions of
emotion in the wild. The challenge was divided into two tracks. The first track
tested the ability of current computer vision algorithms in the automatic
detection of action units (AUs). Specifically, we tested the detection of 11
AUs. The second track tested the algorithms' ability to recognize emotion
categories in images of facial expressions. Specifically, we tested the
recognition of 16 basic and compound emotion categories. The results of the
challenge suggest that current computer vision and machine learning algorithms
are unable to reliably solve these two tasks. The limitations of current
algorithms are more apparent when trying to recognize emotion. We also show
that current algorithms are not affected by mild resolution changes, small
occluders, gender or age, but that 3D pose is a major limiting factor on
performance. We provide an in-depth discussion of the points that need special attention moving forward.
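Action unit detection of the kind tested in the first track is typically scored per AU with the binary F1 score; the abstract does not state the challenge's exact metric, so this is an illustrative sketch of that common convention:

```python
import numpy as np

def f1_score(pred, gold):
    """Binary F1 for one action unit: harmonic mean of precision and recall."""
    pred, gold = np.asarray(pred, bool), np.asarray(gold, bool)
    tp = np.sum(pred & gold)   # AU present and detected
    fp = np.sum(pred & ~gold)  # false alarm
    fn = np.sum(~pred & gold)  # missed activation
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy predictions for one AU over five images.
pred = [1, 1, 0, 0, 1]
gold = [1, 0, 0, 1, 1]
score = f1_score(pred, gold)  # tp=2, fp=1, fn=1 -> P = R = 2/3 -> F1 = 2/3
```

F1 is preferred over plain accuracy here because most AUs are inactive in most frames, so a classifier that never fires can still score high accuracy.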