Recognizing Families In the Wild: White Paper for the 4th Edition Data Challenge
Recognizing Families In the Wild (RFIW) is an annual large-scale, multi-track
automatic kinship recognition evaluation that supports various visual kin-based
problems at scales much larger than ever before. Organized as a Challenge in
conjunction with the 15th IEEE International Conference on Automatic Face and
Gesture Recognition (FG), RFIW provides a platform for publishing original work
and for gathering experts to discuss the next steps. This paper summarizes the
supported tasks (i.e., kinship verification, tri-subject verification, and
search & retrieval of missing children) and their evaluation protocols, which
include the practical motivation, technical background, data splits, metrics,
and benchmark results. Furthermore, top submissions (i.e., leader-board stats)
are listed and reviewed as a high-level analysis of the state of the problem.
Overall, this paper describes the 2020 RFIW challenge end-to-end, along with
forecasts of promising future directions.
Comment: White Paper for challenge in conjunction with 15th IEEE International
Conference on Automatic Face and Gesture Recognition (FG 2020)
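The abstract names kinship verification but does not say how a pair is scored. As a hedged illustration only, the sketch below assumes one common setup: each face is encoded as a feature vector, a pair is predicted as kin when the cosine similarity of the two vectors reaches a threshold, and accuracy is the fraction of correctly labeled pairs. The function names, the threshold, and the toy vectors are hypothetical and are not taken from the RFIW protocol.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_pairs(pairs, threshold):
    """Hypothetical rule: label a pair 1 (kin) if similarity >= threshold, else 0."""
    return [1 if cosine_similarity(a, b) >= threshold else 0 for a, b in pairs]


def accuracy(predictions, labels):
    """Fraction of pairs whose predicted label matches the ground truth."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Toy embeddings (made up for illustration): one kin pair, one non-kin pair.
pairs = [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1, 0]
predictions = verify_pairs(pairs, threshold=0.5)
print(predictions, accuracy(predictions, labels))  # prints [1, 0] 1.0
```

In practice the threshold would typically be tuned on validation pairs rather than fixed in advance.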
Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment
We address the problem of estimating the pose of a person's head from an RGB
image. Recent works employing CNNs for this problem have significantly improved
accuracy. However, we show that the following two methods, despite their
simplicity, can attain further improvement: (i) proper adjustment of the margin
of the bounding box of a detected face, and (ii) the choice of loss functions.
We show that integrating these two methods achieves a new state of the art on
standard benchmark datasets for in-the-wild head pose estimation.
Comment: IEEE International Conference on Automatic Face & Gesture Recognition
(FG 2019)
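The abstract does not state how the bounding-box margin is adjusted. A minimal sketch, assuming the margin is a fraction of the detected box's width and height added on each side and then clipped to the image, is shown below; `expand_box` and its `margin` parameter are hypothetical names, and a suitable margin value would have to be found empirically.

```python
def expand_box(box, margin, image_width, image_height):
    """Grow a face box (x1, y1, x2, y2) by `margin` times its width/height on
    every side, then clip the result to the image bounds.

    Hypothetical helper for illustration; `margin` is an assumed fraction,
    e.g. 0.25 adds a quarter of the box width to the left and right edges.
    """
    x1, y1, x2, y2 = box
    dx = margin * (x2 - x1)
    dy = margin * (y2 - y1)
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(image_width), x2 + dx),
        min(float(image_height), y2 + dy),
    )


# A 100x100 detection in a 640x480 frame, expanded by 25% per side.
print(expand_box((100, 100, 200, 200), 0.25, 640, 480))  # prints (75.0, 75.0, 225.0, 225.0)
# A detection near the top-left corner is clipped at 0.
print(expand_box((10, 10, 110, 110), 0.25, 640, 480))  # prints (0.0, 0.0, 135.0, 135.0)
```

Clipping keeps the crop inside the frame, so boxes near the border grow asymmetrically.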
Emotion Recognition for In-the-wild Videos
This paper briefly introduces our submission to the seven-basic-expression
classification track of the Affective Behavior Analysis in-the-wild
Competition, held in conjunction with the IEEE International Conference on
Automatic Face and Gesture Recognition (FG) 2020. Our method combines a Deep
Residual Network (ResNet) and a Bidirectional Long Short-Term Memory Network
(BLSTM), achieving 64.3% accuracy and a final metric of 43.4% on the validation
set.
Clustering based Contrastive Learning for Improving Face Representations
A good clustering algorithm can discover natural groupings in data. These
groupings, if used wisely, provide a form of weak supervision for learning
representations. In this work, we present Clustering-based Contrastive Learning
(CCL), a new clustering-based representation learning approach that uses labels
obtained from clustering along with video constraints to learn discriminative
face features. We demonstrate our method on the challenging task of learning
representations for video face clustering. Through several ablation studies, we
analyze the impact of creating pair-wise positive and negative labels from
different sources. Experiments on three challenging video face clustering
datasets (BBT-0101, BF-0502, and ACCIO) show that CCL achieves a new state of
the art on all datasets.
Comment: To appear at IEEE International Conference on Automatic Face and
Gesture Recognition (FG), 202
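The abstract mentions pairwise positive and negative labels drawn from clustering and from video constraints, but not the exact rules. The sketch below assumes commonly used video constraints (faces in the same track are the same person; two faces in the same frame are different people), falls back to cluster membership otherwise, and scores a pair with the classic margin-based contrastive loss. The rule order, the function names, and the `margin` value are assumptions, not the CCL formulation.

```python
from itertools import combinations


def build_pairs(cluster_ids, track_ids, frame_ids):
    """Return (positives, negatives) as sets of index pairs (i, j) with i < j.

    Assumed rules, checked in order:
      1. same track   -> positive (a track follows one person)
      2. same frame   -> negative (one frame shows different people)
      3. same cluster -> positive, otherwise negative
    """
    positives, negatives = set(), set()
    for i, j in combinations(range(len(cluster_ids)), 2):
        if track_ids[i] == track_ids[j]:
            positives.add((i, j))
        elif frame_ids[i] == frame_ids[j]:
            negatives.add((i, j))
        elif cluster_ids[i] == cluster_ids[j]:
            positives.add((i, j))
        else:
            negatives.add((i, j))
    return positives, negatives


def contrastive_loss(distance, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Positives are pulled together (squared distance); negatives are pushed
    apart until they are at least `margin` away.
    """
    if is_positive:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Four face detections: face 2 shares a cluster with faces 0 and 1 but appears
# in the same frame as face 0, and faces 2 and 3 belong to one track.
cluster_ids = [0, 0, 0, 1]
track_ids = [10, 11, 12, 12]
frame_ids = [1, 2, 1, 3]
positives, negatives = build_pairs(cluster_ids, track_ids, frame_ids)
print(sorted(positives), sorted(negatives))
# prints [(0, 1), (1, 2), (2, 3)] [(0, 2), (0, 3), (1, 3)]
```

Here the video constraints override the cluster labels: pair (0, 2) becomes a negative despite a shared cluster, and pair (2, 3) a positive despite different clusters.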
Attributes in Multiple Facial Images
Facial attributes are conventionally recognized from a single image. In
practice, however, each subject may have multiple face images. Eye size, for
example, should not change across images of the same subject, yet it may be
estimated differently in each image, which can negatively affect face
recognition. Computing attributes per subject rather than per image is
therefore an important problem. To address it, we train deep networks for
facial attribute prediction and examine the inconsistency among the attributes
computed from individual images. We then develop two approaches to resolve
this inconsistency. Experimental results show that the proposed methods can
handle facial attribute estimation on either multiple still images or video
frames, and can correct incorrectly annotated labels. The experiments are
conducted on two large public databases with facial attribute annotations.
Comment: Accepted by 2018 13th IEEE International Conference on Automatic Face
& Gesture Recognition (FG 2018 Spotlight)
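The abstract does not describe its two approaches. As a hedged baseline only, one simple way to obtain a single attribute value per subject is to average the per-image probabilities for that subject and threshold the mean; the function name, the 0.5 threshold, the attribute name `big_eyes`, and the toy probabilities below are illustrative assumptions.

```python
from collections import defaultdict


def aggregate_attributes(predictions, threshold=0.5):
    """Combine per-image attribute probabilities into one decision per subject.

    predictions: iterable of (subject_id, {attribute_name: probability}),
    one entry per image. Returns {subject_id: {attribute_name: bool}},
    where each bool is True if the subject's mean probability >= threshold.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for subject, probs in predictions:
        for attribute, p in probs.items():
            sums[subject][attribute] += p
            counts[subject][attribute] += 1
    return {
        subject: {
            attribute: sums[subject][attribute] / counts[subject][attribute] >= threshold
            for attribute in sums[subject]
        }
        for subject in sums
    }


# Per-image predictions disagree for subject "s1" (0.4 alone would give False);
# averaging yields one consistent label per subject.
per_image = [
    ("s1", {"big_eyes": 0.9}),
    ("s1", {"big_eyes": 0.4}),
    ("s1", {"big_eyes": 0.7}),
    ("s2", {"big_eyes": 0.2}),
]
print(aggregate_attributes(per_image))
# prints {'s1': {'big_eyes': True}, 's2': {'big_eyes': False}}
```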
Segmentation Guided Image-to-Image Translation with Adversarial Networks
Recently, image-to-image translation, which aims to map images from one domain
to another, has received increasing attention. Existing methods mainly solve
this task with a deep generative model and focus on exploring the relationships
between domains. However, these methods neglect to use higher-level,
instance-specific information to guide training, which leads to many
unrealistic, low-quality generated images. Existing methods also lack spatial
controllability during translation. To address these challenges, we propose a
novel Segmentation Guided Generative Adversarial Network (SGGAN), which
leverages semantic segmentation to further boost generation performance and
provide spatial mapping. In particular, a segmentor network is designed to
impose semantic information on the generated images. Experimental results on
multi-domain face image translation empirically demonstrate our method's
ability to perform spatial modification and its superior image quality compared
with several state-of-the-art methods.
Comment: Accepted for publication in 2019 14th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2019)
Head2Head: Video-based Neural Head Synthesis
In this paper, we propose a novel machine learning architecture for facial
reenactment. In contrast to model-based approaches and to recent frame-based
methods that use Deep Convolutional Neural Networks (DCNNs) to generate
individual frames, our method (a) exploits the special structure of facial
motion (paying particular attention to mouth motion) and (b) enforces temporal
consistency. We demonstrate that the proposed method can transfer the facial
expressions, pose, and gaze of a source actor to a target video in a
photo-realistic fashion, more accurately than state-of-the-art methods.
Comment: To be published in 15th IEEE International Conference on Automatic
Face and Gesture Recognition (FG 2020)
Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity
Human emotion analysis has been the focus of many studies, especially in the
field of Affective Computing, and is important for many applications, e.g.,
intelligent human-computer interaction, stress analysis, interactive games,
and animation. Solutions for automatic emotion analysis have also benefited
from the development of deep learning approaches and the availability of vast
amounts of visual facial data on the internet. This paper proposes a novel
method for human emotion recognition from a single RGB image. We construct a
large-scale dataset of facial videos (FaceVid), rich in facial
dynamics, identities, expressions, appearance and 3D pose variations. We use
this dataset to train a deep Convolutional Neural Network for estimating
expression parameters of a 3D Morphable Model and combine it with an effective
back-end emotion classifier. Our proposed framework runs at 50 frames per
second and is capable of robustly estimating parameters of 3D expression
variation and accurately recognizing facial expressions from in-the-wild
images. We present extensive experimental evaluation that shows that the
proposed method outperforms the compared techniques in estimating the 3D
expression parameters and achieves state-of-the-art performance in recognising
the basic emotions from facial images, as well as recognising stress from
facial videos.
Comment: to be published in 15th IEEE International Conference on Automatic
Face and Gesture Recognition (FG 2020)
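A 3D Morphable Model is commonly written as a linear model in which separate identity and expression coefficients displace a mean face shape; that separation is what allows expression parameters to be read independently of identity. The sketch below shows this standard linear form with toy three-coordinate bases; the basis vectors, coefficients, and function name are made up for illustration, and the paper's exact model and parameterization may differ.

```python
def morphable_shape(mean_shape, id_basis, id_coeffs, exp_basis, exp_coeffs):
    """Linear 3DMM: S = mean + sum_i alpha_i * U_i + sum_j beta_j * E_j.

    mean_shape: flat list of 3N vertex coordinates (x1, y1, z1, x2, ...).
    id_basis, exp_basis: lists of basis vectors, each of length 3N.
    id_coeffs (alpha), exp_coeffs (beta): one coefficient per basis vector.
    """
    shape = list(mean_shape)
    for coeffs, basis_set in ((id_coeffs, id_basis), (exp_coeffs, exp_basis)):
        for coeff, basis in zip(coeffs, basis_set):
            for k, value in enumerate(basis):
                shape[k] += coeff * value
    return shape


# Toy model with a single vertex: one identity and one expression direction.
mean = [0.0, 0.0, 0.0]
identity_basis = [[1.0, 0.0, 0.0]]
expression_basis = [[0.0, 1.0, 0.0]]
neutral = morphable_shape(mean, identity_basis, [2.0], expression_basis, [0.0])
expressive = morphable_shape(mean, identity_basis, [2.0], expression_basis, [0.5])
print(neutral, expressive)  # prints [2.0, 0.0, 0.0] [2.0, 0.5, 0.0]
```

Only the expression coefficients change between the two shapes, so within this linear model a back-end classifier that consumes the expression coefficients alone is unaffected by the identity coefficients.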
On the effect of age perception biases for real age regression
Automatic age estimation from facial images represents an important task in
computer vision. This paper analyses the effect of the gender, age, ethnicity,
makeup, and expression attributes of faces as sources of bias to improve deep
apparent age prediction. Following recent work showing that apparent age labels
benefit real age estimation compared with direct real-to-real age regression,
our main contribution is the integration, in an end-to-end architecture, of
face attributes for apparent age prediction with an additional loss for real
age regression. Experimental results on the APPA-REAL dataset indicate that the
proposed network successfully takes advantage of the adopted attributes to
improve both apparent and real age estimation. Our model outperformed a
state-of-the-art architecture proposed to address apparent and real age
regression separately. Finally, we present preliminary results and discussion
of a proof-of-concept application that uses the proposed model to regress the
apparent age of an individual based on the gender of an external observer.
Comment: Accepted in the 14th IEEE International Conference on Automatic Face
and Gesture Recognition (FG 2019)
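The abstract describes an apparent-age objective combined with an additional real-age regression loss but does not give the loss functions. A minimal sketch, assuming mean absolute error for both terms and a hypothetical weight `real_weight` on the real-age term, is:

```python
def mean_absolute_error(pred, target):
    """Average absolute difference between predicted and target ages."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def combined_age_loss(pred_apparent, true_apparent, pred_real, true_real, real_weight=1.0):
    """Apparent-age loss plus a weighted real-age regression term.

    Both the use of MAE and the `real_weight` parameter are assumptions made
    for illustration; they are not taken from the paper.
    """
    return (mean_absolute_error(pred_apparent, true_apparent)
            + real_weight * mean_absolute_error(pred_real, true_real))


# Two faces: apparent-age MAE is 1.0, real-age MAE is 2.5.
print(combined_age_loss([30.0, 40.0], [32.0, 40.0], [29.0, 45.0], [30.0, 41.0], real_weight=0.5))
# prints 2.25
```

In an end-to-end network both terms would be computed from heads sharing the same backbone, so the real-age loss also shapes the shared features.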
First Investigation Into the Use of Deep Learning for Continuous Assessment of Neonatal Postoperative Pain
This paper presents the first investigation into the use of a fully automated
deep learning framework for assessing neonatal postoperative pain. It
specifically investigates the use of a Bilinear Convolutional Neural Network
(B-CNN) to extract facial features during different levels of postoperative
pain, followed by modeling the temporal pattern using a Recurrent Neural
Network (RNN). Although acute and postoperative pain share some common
characteristics (e.g., visual action units), postoperative pain has different
dynamics and evolves in a unique pattern over time. Our experimental results
indicate a clear difference between the patterns of acute and postoperative
pain. They also suggest the effectiveness of combining a bilinear CNN with an
RNN model for the continuous assessment of postoperative pain intensity.
Comment: Accepted in the 15th IEEE International Conference on Automatic Face
and Gesture Recognition (FG 2020)
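Bilinear CNNs, as usually described, pool the outer product of two feature streams over all spatial locations and then apply a signed square root and L2 normalization. The sketch below shows only that pooling step on toy Python lists standing in for CNN activations; the function name and inputs are illustrative, averaging over locations is one common choice, and neither the temporal RNN stage nor the authors' exact settings are reproduced.

```python
import math


def bilinear_pool(features_a, features_b):
    """Bilinear pooling of two feature streams.

    features_a: list over spatial locations of length-M vectors.
    features_b: list over the same locations of length-N vectors.
    Returns a flattened M*N vector: the outer products f_a(l) f_b(l)^T averaged
    over locations, followed by a signed square root and L2 normalization.
    """
    m, n = len(features_a[0]), len(features_b[0])
    pooled = [0.0] * (m * n)
    for fa, fb in zip(features_a, features_b):
        for i in range(m):
            for j in range(n):
                pooled[i * n + j] += fa[i] * fb[j]
    pooled = [v / len(features_a) for v in pooled]
    # Signed square root: y = sign(x) * sqrt(|x|).
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0.0 else signed


# Two spatial locations, 2-D features from stream A and 1-D features from stream B.
descriptor = bilinear_pool([[1.0, 2.0], [3.0, 4.0]], [[1.0], [2.0]])
print(len(descriptor), sum(v * v for v in descriptor))
```

The resulting per-frame descriptors could then be fed, frame by frame, to a sequence model; that stage is outside this sketch.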