DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation
Dense 3D facial motion capture from only monocular in-the-wild pairs of RGB
images is a highly challenging problem with numerous applications, ranging from
facial expression recognition to facial reenactment. In this work, we propose
DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense
estimation of 3D non-rigid facial flow between pairs of monocular images. Our
DeepFaceFlow framework was trained and tested on two very large-scale facial
video datasets, one of them of our own collection and annotation, with the aid
of an occlusion-aware, 3D-based loss function. We conduct comprehensive
experiments probing different aspects of our approach and demonstrating its
improved performance against state-of-the-art flow and 3D reconstruction
methods. Furthermore, we incorporate our framework in a full-head
state-of-the-art facial video synthesis method and demonstrate the ability of
our method to better represent and capture facial dynamics, resulting
in highly realistic facial video synthesis. Given registered pairs of images,
our framework generates 3D flow maps at ~60 fps.
Comment: to be published in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 202
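The abstract does not give the exact form of the occlusion-aware, 3D-based loss; as a minimal sketch under assumptions (the function name, mask convention, and plain endpoint-error form are illustrative, not taken from the paper), such a loss can be written as a masked mean endpoint error over the dense 3D flow, so that occluded pixels contribute nothing:

```python
import numpy as np

def occlusion_aware_flow_loss(pred_flow, gt_flow, visibility):
    """Mean 3D endpoint error over visible pixels only (illustrative sketch).

    pred_flow, gt_flow: (H, W, 3) dense 3D flow maps.
    visibility: (H, W) binary mask, 1 where the facial point is visible.
    """
    # Per-pixel Euclidean (endpoint) error of the 3D flow vectors.
    epe = np.linalg.norm(pred_flow - gt_flow, axis=-1)
    # Occluded pixels are masked out so they do not drive the loss.
    mask = visibility.astype(np.float64)
    return float((epe * mask).sum() / np.maximum(mask.sum(), 1.0))
```

Any real implementation would add terms such as smoothness regularization; this only shows the masking idea.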
Facial Expressions Tracking and Recognition: Database Protocols for Systems Validation and Evaluation
Each human face is unique. It has its own shape, topology, and distinguishing
features. As such, developing and testing facial tracking systems are
challenging tasks. The existing face recognition and tracking algorithms in
Computer Vision mainly specify concrete situations according to particular
goals and applications, requiring validation methodologies with data that fits
their purposes. However, no database covers all possible variations of
external and internal factors, forcing researchers to acquire their own data
or to compile groups of databases.
To address this shortcoming, we propose a methodology for facial data
acquisition through definition of fundamental variables, such as subject
characteristics, acquisition hardware, and performance parameters. Following
this methodology, we also propose two protocols that allow the capturing of
facial behaviors under uncontrolled and real-life situations. As validation, we
executed both protocols, which led to the creation of two sample databases: FdMiee
(Facial database with Multi input, expressions, and environments) and FACIA
(Facial Multimodal database driven by emotional induced acting).
Using different types of hardware, FdMiee captures facial information under
variations in environment and facial behavior. FACIA is an extension of FdMiee
introducing a pipeline to acquire additional facial behaviors and speech using
an emotion-acting method. This work therefore eases the creation of databases
adaptable to an algorithm's requirements and applications, leading to
simplified validation and testing processes.
Comment: 10 pages, 6 images, Computers & Graphics
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Occlusion and pose variations, which can change facial appearance
significantly, are two major obstacles for automatic Facial Expression
Recognition (FER). Though automatic FER has made substantial progress in the
past few decades, occlusion-robust and pose-invariant issues of FER have
received relatively less attention, especially in real-world scenarios. This
paper addresses the real-world pose and occlusion robust FER problem with
three-fold contributions. First, to stimulate research on FER under
real-world occlusions and variant poses, we build several in-the-wild facial
expression datasets with manual annotations for the community. Second, we
propose a novel Region Attention Network (RAN), to adaptively capture the
importance of facial regions for occlusion and pose variant FER. The RAN
aggregates and embeds a variable number of region features produced by a backbone
convolutional neural network into a compact fixed-length representation. Last,
inspired by the fact that facial expressions are mainly defined by facial
action units, we propose a region biased loss to encourage high attention
weights for the most important regions. We validate our RAN and region biased
loss on both our built test datasets and four popular datasets: FERPlus,
AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region
biased loss largely improve the performance of FER with occlusion and variant
pose. Our method also achieves state-of-the-art results on FERPlus, AffectNet,
RAF-DB, and SFEW. Code and the collected test data will be publicly available.
Comment: The test set and the code of this paper will be available at
https://github.com/kaiwang960112/Challenge-condition-FER-datase
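The abstract only summarizes the RAN and the region biased loss; the sketch below illustrates the two ideas in isolation, with a hand-rolled linear attention scorer standing in (as an assumption) for the learned attention module: attention-weighted aggregation of a variable number of region features into one fixed-length vector, and a margin loss that pushes the best region's attention weight above the whole-face weight.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate_regions(region_feats, w):
    """Collapse a variable number of region features into one fixed-length vector.

    region_feats: (k, d) features of k cropped facial regions.
    w: (d,) parameters of a simple linear attention scorer (an assumption;
       the real network learns its attention module jointly with the backbone).
    """
    # One scalar attention weight per region, squashed to (0, 1).
    scores = sigmoid(region_feats @ w)                       # (k,)
    # Attention-weighted average gives a fixed-length representation.
    return (scores[:, None] * region_feats).sum(0) / scores.sum(), scores

def region_biased_loss(scores, full_face_score, margin=0.02):
    """Hinge-style penalty unless the best region's weight exceeds the
    whole-face weight by at least `margin` (margin value is illustrative)."""
    return max(0.0, margin - (scores.max() - full_face_score))
```

The loss is zero whenever some local region already dominates the whole-face crop, which is the stated intuition that expressions are defined by localized action units.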
Face Expression Recognition and Analysis: The State of the Art
The automatic recognition of facial expressions has been an active research
topic since the early nineties. There have been several advances in the past
few years in terms of face detection and tracking, feature extraction
mechanisms and the techniques used for expression classification. This paper
surveys some of the work published from 2001 to date. The paper presents a
time-line view of the advances made in this field, the applications of
automatic face expression recognizers, the characteristics of an ideal system,
the databases that have been used and the advances made in terms of their
standardization and a detailed summary of the state of the art. The paper also
discusses facial parameterization using FACS Action Units (AUs) and MPEG-4
Facial Animation Parameters (FAPs) and the recent advances in face detection,
tracking and feature extraction methods. Notes have also been presented on
emotions, expressions and facial features, discussion on the six prototypic
expressions and the recent studies on expression classifiers. The paper ends
with a note on the challenges and future work. This paper has been written
in a tutorial style with the intention of helping students and researchers who
are new to this field.
Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually-guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the power of the proposed method against state-of-the-art works,
with a 12%-14% gain in top-10 retrieval recall, 5% higher joint localization
accuracy, and a nearly 40% gain in face identity preservation. Moreover, the
evaluation results offer further insights to the subject matter, which could
inspire many promising future works along this direction.
Comment: IEEE TIP submission
Kernel Projection of Latent Structures Regression for Facial Animation Retargeting
Inspired by kernel methods that have been used extensively in achieving
efficient facial animation retargeting, this paper presents a solution to
retargeting facial animation to a virtual character's face model based on the
kernel projection of latent structure (KPLS) regression between semantically
similar facial expressions. Specifically, a given number of corresponding
semantically similar facial expressions are projected into the latent space. By
using the Nonlinear Iterative Partial Least Square method, decomposition of the
latent variables is achieved. Finally, the KPLS is obtained by solving a
kernelized version of the eigenvalue problem. By evaluating our methodology
with other kernel-based solutions, the efficiency of the presented methodology
in transferring facial animation to face models with different morphological
variations is demonstrated.
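The abstract names the ingredients (projection into a latent space, the Nonlinear Iterative Partial Least Square method, a kernelized eigenproblem) without detail; as a hedged sketch, the score-extraction step of kernel PLS with NIPALS-style iterations can be written as below. The RBF kernel, fixed iteration count, and deflation scheme are standard textbook choices assumed here, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of a Gaussian (RBF) kernel over the rows of X.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pls_scores(K, Y, n_components=2, n_iter=100):
    """Extract latent score vectors t_i by NIPALS-style kernel PLS iterations
    (simplified sketch of the usual kernel PLS formulation)."""
    n = K.shape[0]
    # Center the kernel matrix in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    K = H @ K @ H
    Y = Y - Y.mean(0)
    T = []
    for _ in range(n_components):
        u = Y[:, :1].copy()
        for _ in range(n_iter):
            t = K @ u
            t /= np.linalg.norm(t) + 1e-12
            c = Y.T @ t
            u = Y @ c
            u /= np.linalg.norm(u) + 1e-12
        T.append(t.ravel())
        # Deflate K and Y with respect to the extracted score t.
        P = np.eye(n) - t @ t.T
        K = P @ K @ P
        Y = Y - t @ (t.T @ Y)
    return np.array(T).T  # (n, n_components) latent scores
```

Because each deflation projects K onto the orthogonal complement of the extracted score, successive score vectors come out mutually orthogonal.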
M2FPA: A Multi-Yaw Multi-Pitch High-Quality Database and Benchmark for Facial Pose Analysis
Facial images in surveillance or mobile scenarios often have large view-point
variations in terms of pitch and yaw angles. These jointly occurring angle
variations make face recognition challenging. Current public face databases
mainly consider the case of yaw variations. In this paper, a new large-scale
Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose
Analysis (M2FPA), including face frontalization, face rotation, facial pose
estimation and pose-invariant face recognition. It contains 397,544 images of
229 subjects with variations in yaw, pitch, attributes, illumination, and accessories. M2FPA is
the most comprehensive multi-view face database for facial pose analysis.
Further, we provide an effective benchmark for face frontalization and
pose-invariant face recognition on M2FPA with several state-of-the-art methods,
including DR-GAN, TP-GAN and CAPG-GAN. We believe that the new database and
benchmark can significantly push forward the advance of facial pose analysis in
real-world applications. Moreover, a simple yet effective parsing guided
discriminator is introduced to capture the local consistency during GAN
optimization. Extensive quantitative and qualitative results on M2FPA and
Multi-PIE demonstrate the superiority of our face frontalization method.
Baseline results for both face synthesis and face recognition from
state-of-the-art methods demonstrate the challenge offered by this new database.
Comment: Accepted for publication at ICCV 2019; The M2FPA dataset is available
at https://pp2li.github.io/M2FPA-dataset
Facial Descriptors for Human Interaction Recognition In Still Images
This paper presents a novel approach in a rarely studied area of computer
vision: Human interaction recognition in still images. We explore whether the
facial regions and their spatial configurations contribute to the recognition
of interactions. In this respect, our method involves extraction of several
visual features from the facial regions, as well as incorporation of scene
characteristics and deep features into the recognition. The extracted
features are utilized within a discriminative learning framework for
recognizing interactions between people. Our designed facial descriptors are
based on the observation that the relative positions, sizes, and locations of the
faces are likely to be important for characterizing human interactions. Since
there is no available dataset in this relatively new domain, a comprehensive
new dataset which includes several images of human interactions is collected.
Our experimental results show that faces and scene characteristics contain
important information for recognizing interactions between people.
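The abstract does not specify the descriptors themselves; as an illustrative sketch in which the box format, normalization, and feature choices are all assumptions, a pairwise descriptor built from the relative position, distance, and size ratio of two detected face boxes might look like:

```python
import numpy as np

def pairwise_face_descriptor(boxes):
    """Hypothetical descriptor from two face boxes (x, y, w, h): relative
    offset, center distance, and area ratio, normalized by total face size."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = boxes
    # Scale term: sum of the two faces' side lengths, for scale invariance.
    s = (w1 * h1) ** 0.5 + (w2 * h2) ** 0.5
    c1 = np.array([x1 + w1 / 2, y1 + h1 / 2])   # center of face 1
    c2 = np.array([x2 + w2 / 2, y2 + h2 / 2])   # center of face 2
    d = c2 - c1
    return np.array([d[0] / s, d[1] / s,
                     np.linalg.norm(d) / s,      # normalized center distance
                     (w1 * h1) / (w2 * h2)])     # relative face size
```

Such geometry features would then be concatenated with the appearance, scene, and deep features the abstract mentions before discriminative learning.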
Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem that aims to
generate a high-resolution (HR) face image from a low-resolution (LR) input. In
contrast to the existing patch-wise super-resolution models that divide a face
image into regular patches and independently apply LR to HR mapping to each
patch, we implement deep reinforcement learning and develop a novel
attention-aware face hallucination (Attention-FH) framework, which recurrently
learns to attend a sequence of patches and performs facial part enhancement by
fully exploiting the global interdependency of the image. Specifically, our
proposed framework incorporates two components: a recurrent policy network for
dynamically specifying a new attended region at each time step based on the
status of the super-resolved image and the past attended region sequence, and a
local enhancement network for selected patch hallucination and global state
updating. The Attention-FH model jointly learns the recurrent policy network
and local enhancement network through maximizing a long-term reward that
reflects the hallucination result with respect to the whole HR image. Extensive
experiments demonstrate that our Attention-FH significantly outperforms the
state-of-the-art methods on in-the-wild face images with large pose and
illumination variations.
Comment: To be published in TPAMI
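In the paper, both the recurrent policy network and the local enhancement network are learned models trained with a long-term reward; the control flow they implement can be sketched as a simple attend-then-enhance loop, where the `policy` and `enhance` callables below are placeholders for those networks and the fixed square patch size is an assumption:

```python
import numpy as np

def attend_and_enhance(lr_img, policy, enhance, n_steps=5, patch=8):
    """Skeleton of the recurrent attend-then-enhance loop (illustrative sketch).

    policy(img, history) -> (y, x) top-left corner of the next attended patch,
        chosen from the current global image state and past attended regions.
    enhance(region) -> enhanced patch of the same shape.
    """
    img = lr_img.copy()
    history = []
    for _ in range(n_steps):
        y, x = policy(img, history)                       # pick next region
        region = img[y:y + patch, x:x + patch]
        img[y:y + patch, x:x + patch] = enhance(region)   # local enhancement
        history.append((y, x))                            # attended sequence
    return img, history
```

Training would score the final `img` against the whole HR target and propagate that reward back to the policy, which this skeleton deliberately omits.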