6 research outputs found
DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization
Facial action unit recognition has many applications from market research to
psychotherapy and from image captioning to entertainment. Despite recent
progress, deployment of these models has been impeded by their limited
generalization to unseen people and demographics. This work conducts an
in-depth analysis of performance across several dimensions: individuals (40
subjects), genders (male and female), skin types (darker and lighter), and
databases (BP4D and DISFA). To help suppress the variance in data, we use the
notion of self-supervised denoising autoencoders to design a method for deep
face normalization (DeepFN) that transfers facial expressions of different
people onto a common facial template which is then used to train and evaluate
facial action recognition models. We show that person-independent models yield
significantly lower performance (55% average F1 and accuracy across 40
subjects) than person-dependent models (60.3%), leading to a generalization gap
of 5.3%. However, normalizing the data with the newly introduced DeepFN
significantly increased the performance of person-independent models (59.6%),
effectively reducing the gap. Similarly, we observed generalization gaps when
considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were
significantly reduced with the use of DeepFN. These findings represent an
important step towards the creation of more generalizable facial action unit
recognition systems.
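The generalization gaps quoted above are simple differences between within-condition and cross-condition scores; a minimal sketch using the averages reported in the abstract (the helper name is our own):

```python
# Illustrative only: generalization gap = person-dependent score
# minus person-independent score, in percentage points.

def generalization_gap(dependent_score: float, independent_score: float) -> float:
    """Gap between within-condition and cross-condition performance."""
    return round(dependent_score - independent_score, 1)

# Average F1/accuracy across 40 subjects, as reported in the abstract.
gap_raw = generalization_gap(60.3, 55.0)     # without normalization
gap_deepfn = generalization_gap(60.3, 59.6)  # after DeepFN normalization

print(gap_raw, gap_deepfn)  # 5.3 0.7
```

The same subtraction applies to the gender (2.4%), skin-type (5.3%), and cross-dataset (9.4%) gaps the abstract mentions.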
VIDIT: Virtual Image Dataset for Illumination Transfer
Deep image relighting is gaining more interest lately, as it allows photo
enhancement through illumination-specific retouching without human effort.
Aside from aesthetic enhancement and photo montage, image relighting is
valuable for domain adaptation, whether to augment datasets for training or to
normalize input test data. Accurate relighting is, however, very challenging
for various reasons, such as the difficulty in removing and recasting shadows
and the modeling of different surfaces. We present a novel dataset, the Virtual
Image Dataset for Illumination Transfer (VIDIT), in an effort to create a
reference evaluation benchmark and to push forward the development of
illumination manipulation methods. Virtual datasets are not only an important
step towards achieving real-image performance but have also proven capable of
improving training even when real datasets are possible to acquire and
available. VIDIT contains 300 virtual scenes used for training, where every
scene is captured 40 times in total: from 8 equally-spaced azimuthal angles,
each lit with 5 different illuminants.
Comment: For further information and data, see
https://github.com/majedelhelou/VIDI
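The capture grid described above (8 equally-spaced azimuthal angles under 5 illuminants, giving 40 shots per scene) can be enumerated directly; a hedged sketch, with the 45-degree spacing inferred from "equally-spaced" and illuminants kept as abstract indices since their exact settings are not given here:

```python
# Sketch of VIDIT's per-scene capture grid: 8 equally-spaced azimuthal
# angles x 5 illuminants = 40 captures per scene. Illuminant settings are
# represented as plain indices; the real dataset specifies them concretely.
from itertools import product

N_SCENES = 300                        # training scenes, per the abstract
AZIMUTHS = [i * 45 for i in range(8)] # 0, 45, ..., 315 degrees
ILLUMINANTS = list(range(5))          # placeholder illuminant ids

captures_per_scene = list(product(AZIMUTHS, ILLUMINANTS))
print(len(captures_per_scene))             # 40 captures per scene
print(N_SCENES * len(captures_per_scene))  # 12000 training images in total
```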
Reliability of camera systems to recognize facial features for access to specialized production areas
The article deals with the ergonomics and reliability of camera systems that recognize facial
features and identify persons for access to specialized areas. The monitoring of such areas relates
not only to crime prevention, but is also an integral part of access control for specialized
production areas (pharmaceutical production, chemical production, specialized food production,
etc.). It is therefore important to secure these premises adequately using a suitable system. One
option is a system based on user identification via specific facial features. For this purpose,
CCTV systems for recognition of facial features are available on the world market in different
price categories (conventional, semi-professional, and professional cameras). However, problematic
situations may occur during identification, for example when the user's face is partially masked.
This research focuses on that problem. Its main goal is to establish the scale of the negative
impact that a partially masked face has on camera systems recognizing facial features, primarily
on recognition time. The results are evaluated in detail. Some camera systems are not suitable for
specialized production areas due to their insufficient recognition ability. Of all the tested
devices, the HIKVISION iDS-2CD8426G0/F-I camera identification system proved optimal for
identification purposes. When designing such installations, it is therefore necessary to choose
camera systems whose ergonomics and reliability guarantee sufficient use in the mentioned areas,
while decreasing comfort and user-friendliness as little as possible. By measuring the ergonomics
and reliability of these CCTV systems, it can be stated that there are statistically significant
differences between conventional, semi-professional, and professional systems, reflecting not just
a design change but also a more efficient recognition method.
BareSkinNet: De-makeup and De-lighting via 3D Face Reconstruction
We propose BareSkinNet, a novel method that simultaneously removes makeup and
lighting influences from the face image. Our method leverages a 3D morphable
model and does not require a reference clean face image or a specified light
condition. By incorporating the process of 3D face reconstruction, we can easily
obtain 3D geometry and coarse 3D textures. Using this information, we can infer
normalized 3D face texture maps (diffuse, normal, roughness, and specular) by
an image-translation network. Consequently, reconstructed 3D face textures
without undesirable information will significantly benefit subsequent
processes, such as re-lighting or re-makeup. In experiments, we show that
BareSkinNet outperforms state-of-the-art makeup removal methods. In addition,
our method is remarkably helpful in removing makeup to generate consistent
high-fidelity texture maps, which makes it extendable to many realistic face
generation applications. It can also automatically build graphic assets of face
makeup images before and after with corresponding 3D data. This will assist
artists in accelerating their work, such as 3D makeup avatar creation.
Comment: accepted at PG202
Real-time face view correction for front-facing cameras
Face view is particularly important in person-to-person communication. Disparity between the camera location and the face orientation can result in undesirable facial appearances of the participants during video conferencing. This phenomenon becomes particularly notable on devices where the front-facing camera is placed at unconventional locations such as below the display or within the keyboard. In this paper, we take the video stream from a single RGB camera as input and generate a video stream that emulates the view from a virtual camera at a designated location. The most challenging issue of this problem is that the corrected view often requires out-of-plane head rotations. To address this challenge, we reconstruct the 3D face shape and re-render it into synthesized frames according to the virtual camera location. To output the corrected video stream with a natural appearance in real time, we propose several novel techniques, including accurate eyebrow reconstruction, high-quality blending between the corrected face image and the background, and a template-based 3D reconstruction of glasses. Our system works well for different lighting conditions and skin tones, and is able to handle users wearing glasses. Extensive experiments and user studies demonstrate that our proposed method can achieve high-quality results.
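The core re-rendering step, reconstructing 3D geometry and projecting it through a virtual camera at the desired location, can be sketched with a plain pinhole model (a toy illustration with made-up parameters, not the authors' pipeline):

```python
# Toy pinhole re-projection: given a reconstructed 3D face point, compute
# its 2D image position as seen from a camera placed elsewhere. All numbers
# and names here are illustrative, not taken from the paper.
import math

def project(point, cam_pos, pitch_deg=0.0, focal=800.0, cx=320.0, cy=240.0):
    """Project a 3D point into a camera rotated by `pitch_deg` about x."""
    # Translate into the camera frame.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # Rotate about the x-axis (pitch), emulating an out-of-plane correction.
    t = math.radians(pitch_deg)
    y, z = y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t)
    # Pinhole projection onto the image plane.
    return (focal * x / z + cx, focal * y / z + cy)

nose_tip = (0.0, 0.0, 0.5)  # a reconstructed 3D point, 0.5 m from the origin
low_cam = project(nose_tip, cam_pos=(0.0, -0.25, 0.0))  # camera below the face
virtual_cam = project(nose_tip, cam_pos=(0.0, 0.0, 0.0))  # eye-level camera
# The same 3D point lands at different pixels for the real (low) camera and
# the virtual eye-level camera; re-rendering every reconstructed point under
# the virtual camera yields the corrected view.
print(low_cam, virtual_cam)
```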
FreeStyleGAN: Free-view Editable Portrait Rendering with the Camera Manifold
Current Generative Adversarial Networks (GANs) produce photorealistic renderings of portrait images. Embedding real images into the latent space of such models enables high-level image editing. While recent methods provide considerable semantic control over the (re-)generated images, they can only generate a limited set of viewpoints and cannot explicitly control the camera. Such 3D camera control is required for 3D virtual and mixed reality applications. In our solution, we use a few images of a face to perform 3D reconstruction, and we introduce the notion of the GAN camera manifold, the key element allowing us to precisely define the range of images that the GAN can reproduce in a stable manner. We train a small face-specific neural implicit representation network to map a captured face to this manifold and complement it with a warping scheme to obtain free-viewpoint novel-view synthesis. We show how our approach, due to its precise camera control, enables the integration of a pre-trained StyleGAN into standard 3D rendering pipelines, allowing e.g., stereo rendering or consistent insertion of faces in synthetic 3D environments. Our solution proposes the first truly free-viewpoint rendering of realistic faces at interactive rates, using only a small number of casual photos as input, while simultaneously allowing semantic editing capabilities, such as facial expression or lighting changes.