Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person
The performance of still-to-video face recognition (FR) systems can decline significantly
because faces captured in an unconstrained operational domain (OD) over multiple
video cameras have a different underlying data distribution compared to faces
captured under controlled conditions in the enrollment domain (ED) with a still
camera. This is particularly true when individuals are enrolled to the system
using a single reference still. To improve the robustness of these systems, it
is possible to augment the reference set by generating synthetic faces based on
the original still. However, without knowledge of the OD, many synthetic images
must be generated to account for all possible capture conditions. FR systems
may, therefore, require complex implementations and yield lower accuracy when
training on many less relevant images. This paper introduces an algorithm for
domain-specific face synthesis (DSFS) that exploits the representative
intra-class variation information available from the OD. Prior to operation, a
compact set of faces from unknown persons appearing in the OD is selected
through clustering in the captured condition space. The domain-specific
variations of these face images are projected onto the reference stills by
integrating an image-based face relighting technique inside the 3D
reconstruction framework. A compact set of synthetic faces is generated that
resembles individuals of interest under the capture conditions relevant to the
OD. In a particular implementation based on sparse representation
classification, the synthetic faces generated with the DSFS are employed to
form a cross-domain dictionary that accounts for structured sparsity.
Experimental results reveal that augmenting the reference gallery set of FR
systems using the proposed DSFS approach can provide a higher level of accuracy
compared to state-of-the-art approaches, with only a moderate increase in its
computational complexity.
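The selection step described in the abstract above (picking a compact set of operational-domain faces by clustering in capture-condition space) could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm or the features, so plain k-means is swapped in here, and `select_representatives` and the descriptor vectors are hypothetical.

```python
import math


def select_representatives(points, k, iterations=20):
    """Return sorted indices of the samples closest to k k-means centroids.

    `points` holds one hypothetical capture-condition descriptor per face
    (for example pose angles and an illumination score); the features used
    in the paper are not given in the abstract.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation from the first k samples (an assumption
    # made for reproducibility, not a recommendation).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # One real sample per centroid: the sample nearest to the cluster centre.
    chosen = {
        min(range(len(points)), key=lambda i: math.dist(points[i], c))
        for c in centroids
    }
    return sorted(chosen)
```

Because two centroids can share the same nearest sample, the function may return fewer than `k` indices; the selected samples would then serve as the sources of domain-specific variation.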
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmark motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
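One way to read "curves encoded as points on a hypersphere" in the abstract above is sketched below. It assumes a square-root-velocity style encoding followed by normalisation to unit norm; the abstract itself does not state the encoding, so the function names and the discretisation are illustrative assumptions only.

```python
import math


def curve_to_sphere(frames, eps=1e-12):
    """Map a landmark trajectory to a unit-norm vector.

    `frames` is a list of T frames, each a flat list of landmark coordinates.
    Assumed encoding: finite-difference velocity divided by the square root
    of its speed, then scaled so the whole vector has unit Euclidean norm.
    """
    q = []
    for prev, curr in zip(frames, frames[1:]):
        velocity = [c - p for p, c in zip(prev, curr)]
        speed = math.sqrt(sum(v * v for v in velocity))
        scale = math.sqrt(speed) if speed > eps else 1.0
        q.extend(v / scale for v in velocity)
    norm = math.sqrt(sum(x * x for x in q))
    if norm < eps:
        raise ValueError("a static curve has no direction on the sphere")
    return [x / norm for x in q]


def geodesic_distance(q1, q2):
    """Arc length between two unit vectors on the hypersphere."""
    inner = sum(a * b for a, b in zip(q1, q2))
    return math.acos(max(-1.0, min(1.0, inner)))
```

Under this encoding, translating every frame by the same offset leaves the point on the sphere unchanged, since only frame-to-frame differences enter the representation.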
3D Human Face Reconstruction and 2D Appearance Synthesis
3D human face reconstruction has been researched extensively for decades because of its wide range of applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains valuable because images are much easier to acquire and store.
In this dissertation, we first propose three image-based face reconstruction approaches that differ in their assumptions about the input.
In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. This approach assumes a calibrated camera.
As the first approach is limited to videos, the second approach focuses on a single image. It also improves the geometry by adding fine-grained detail using shading cues. For this approach we propose a novel albedo estimation and linear optimization algorithm.
In the third approach, we further relax the constraint on the input image to allow arbitrary in-the-wild images. The proposed approach can robustly reconstruct high-quality models even with extreme expressions and large poses.
We then explore the applicability of our face reconstructions to four applications: video face beautification, generation of personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. As a further application, we explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we achieve photo-realistic results.
We further propose a deep neural network that solves the HMD removal problem by treating it as a face inpainting problem. This approach does not need special hardware and runs in real time with satisfactory results.
Vec2Face-v2: Unveil Human Faces from their Blackbox Features via Attention-based Network in Face Recognition
In this work, we investigate the problem of face reconstruction given a
facial feature representation extracted from a blackbox face recognition
engine. Indeed, it is a very challenging problem in practice due to the
limitations of abstracted information from the engine. We, therefore, introduce
a new method named Attention-based Bijective Generative Adversarial Networks in
a Distillation framework (DAB-GAN) to synthesize the faces of a subject given
his/her extracted face recognition features. Given any unconstrained unseen
facial features of a subject, the DAB-GAN can reconstruct his/her facial images
in high definition. The DAB-GAN method includes a novel attention-based
generative structure with the newly defined Bijective Metrics Learning
approach. The framework starts by introducing a bijective metric so that the
distance measurement and metric learning process can be directly adopted in the
image domain for an image reconstruction task. The information from the
blackbox face recognition engine will be optimally exploited using the global
distillation process. Then an attention-based generator is presented to
synthesize realistic faces robustly with ID preservation. We
have evaluated our method on the challenging face recognition databases, i.e.,
CelebA, LFW, CFP-FP, CP-LFW, AgeDB, CA-LFW, and consistently achieved
state-of-the-art results. The advancement of DAB-GAN is also proven in both
image realism and ID preservation properties.
Comment: arXiv admin note: substantial text overlap with arXiv:2003.0695