6 research outputs found

    DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization

    Facial action unit recognition has many applications, from market research to psychotherapy and from image captioning to entertainment. Despite recent progress, deployment of these models has been impeded by their limited generalization to unseen people and demographics. This work conducts an in-depth analysis of performance across several dimensions: individuals (40 subjects), genders (male and female), skin types (darker and lighter), and databases (BP4D and DISFA). To help suppress the variance in the data, we use the notion of self-supervised denoising autoencoders to design a method for deep face normalization (DeepFN) that transfers facial expressions of different people onto a common facial template, which is then used to train and evaluate facial action recognition models. We show that person-independent models yield significantly lower performance (55% average F1 and accuracy across 40 subjects) than person-dependent models (60.3%), leading to a generalization gap of 5.3%. However, normalizing the data with the newly introduced DeepFN significantly increased the performance of person-independent models (59.6%), effectively reducing the gap. Similarly, we observed generalization gaps when considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were significantly reduced with the use of DeepFN. These findings represent an important step towards the creation of more generalizable facial action unit recognition systems.
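    As a rough illustration of the denoising-autoencoder idea behind DeepFN, the sketch below trains a toy encoder-decoder to reconstruct clean face crops from noisy inputs. The architecture, sizes, and training setup are hypothetical placeholders; the paper's actual network is not reproduced here.

```python
# Minimal denoising-autoencoder sketch (assumed architecture, not DeepFN itself).
import torch
import torch.nn as nn

class FaceNormalizer(nn.Module):
    """Encodes a face crop and decodes it back; DeepFN-style training would
    decode onto a common facial template instead of the clean input."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = FaceNormalizer()
faces = torch.rand(8, 3, 128, 128)              # a batch of face crops
noisy = faces + 0.1 * torch.randn_like(faces)   # self-supervised corruption
loss = nn.functional.mse_loss(model(noisy), faces)
loss.backward()
```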

    VIDIT: Virtual Image Dataset for Illumination Transfer

    Deep image relighting is gaining interest lately, as it allows photo enhancement through illumination-specific retouching without human effort. Aside from aesthetic enhancement and photo montage, image relighting is valuable for domain adaptation, whether to augment datasets for training or to normalize input test data. Accurate relighting is, however, very challenging for various reasons, such as the difficulty of removing and recasting shadows and of modeling different surfaces. We present a novel dataset, the Virtual Image Dataset for Illumination Transfer (VIDIT), in an effort to create a reference evaluation benchmark and to push forward the development of illumination manipulation methods. Virtual datasets are not only an important step towards achieving real-image performance but have also proven capable of improving training even when real datasets are available. VIDIT contains 300 virtual scenes used for training, where every scene is captured 40 times in total: from 8 equally-spaced azimuthal angles, each lit with 5 different illuminants. Comment: For further information and data, see https://github.com/majedelhelou/VIDI
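    The stated capture layout implies 8 azimuthal angles x 5 illuminants = 40 captures per scene. Below is a minimal sketch that enumerates such a layout; the file-name pattern and illuminant values are assumptions for illustration, not VIDIT's actual naming scheme.

```python
# Enumerate the 40 captures of one scene (8 angles x 5 illuminants).
from itertools import product

azimuths = range(0, 360, 45)                  # 8 equally-spaced azimuthal angles
illuminants = [2500, 3500, 4500, 5500, 6500]  # 5 illuminants (assumed values)

def scene_captures(scene_id: int):
    """Hypothetical file names for every capture of one scene."""
    return [f"scene{scene_id:03d}_az{az}_k{k}.png"
            for az, k in product(azimuths, illuminants)]

captures = scene_captures(42)
assert len(captures) == 40                    # matches the stated 40 captures per scene
```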

    Reliability of camera systems to recognize facial features for access to specialized production areas

    The article deals with the ergonomics and reliability of camera systems that recognize facial features and identify persons for access to specialized areas. Monitoring of areas relates not only to crime; it is also an integral part of access control for specialized production areas (pharmaceutical production, chemical production, specialized food production, etc.). It is therefore important to adequately secure these premises using a suitable system. One option is a system based on user identification from specific facial features. For this purpose, CCTV systems for facial-feature recognition are available on the world market in different price categories (conventional, semi-professional, and professional cameras). However, problematic situations may occur during identification, for example when the user's face is partially masked. This research focuses on that problem. The main goal of the research is to establish the scale of the negative impact that a partially masked face has on camera systems recognizing facial features, primarily on recognition time. The results are evaluated in detail. Some camera systems are not suitable for specialized production areas due to their insufficient recognition ability. Of all the tested devices, the HIKVISION iDS-2CD8426G0/F-I camera identification system proved optimal for identification purposes. When designing such installations, it is therefore necessary to choose camera systems whose ergonomics and reliability guarantee sufficient use in the mentioned areas while decreasing comfort and user-friendliness as little as possible. Measurements of the ergonomics and reliability of these CCTV systems show statistically significant differences between conventional, semi-professional, and professional systems; the difference is not just one of design but also of a more efficient recognition method.
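    The comparison described above amounts to timing recognition attempts per camera class and testing the groups for statistically significant differences. A hedged sketch of such a protocol follows; the latencies are illustrative and the camera API is hypothetical, not the article's actual data or software.

```python
# Time recognition attempts and compare camera classes with a one-way ANOVA.
import time
from scipy.stats import f_oneway

def timed_recognition(camera, subject):
    """Latency of one recognition attempt in seconds (hypothetical camera API)."""
    start = time.perf_counter()
    camera.recognize(subject)  # assumed SDK call, e.g. with a partially masked face
    return time.perf_counter() - start

# Illustrative latency samples (seconds) per camera class:
conventional      = [2.1, 2.4, 2.2, 2.6]
semi_professional = [1.4, 1.5, 1.3, 1.6]
professional      = [0.8, 0.9, 0.7, 0.9]

stat, p = f_oneway(conventional, semi_professional, professional)
print(f"F = {stat:.2f}, p = {p:.4f}")  # p < 0.05 indicates a significant difference
```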

    BareSkinNet: De-makeup and De-lighting via 3D Face Reconstruction

    We propose BareSkinNet, a novel method that simultaneously removes makeup and lighting influences from a face image. Our method leverages a 3D morphable model and does not require a reference clean face image or a specified lighting condition. By incorporating the process of 3D face reconstruction, we can easily obtain 3D geometry and coarse 3D textures. Using this information, we can infer normalized 3D face texture maps (diffuse, normal, roughness, and specular) with an image-translation network. Consequently, reconstructed 3D face textures without undesirable information significantly benefit subsequent processes, such as re-lighting or re-makeup. In experiments, we show that BareSkinNet outperforms state-of-the-art makeup removal methods. In addition, our method is remarkably helpful for removing makeup to generate consistent high-fidelity texture maps, which makes it extendable to many realistic face generation applications. It can also automatically build graphics assets of before-and-after face makeup images with corresponding 3D data, which will assist artists in accelerating their work, such as 3D makeup avatar creation. Comment: accepted at PG202
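    A minimal sketch of the image-translation step described above: one shared backbone with four prediction heads, one per normalized texture map. The architecture is a placeholder assumption, not the actual BareSkinNet network.

```python
# Toy image-translation network emitting four texture maps from a coarse texture.
import torch
import torch.nn as nn

class TextureTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # one head per normalized texture map
        self.heads = nn.ModuleDict({
            "diffuse":   nn.Conv2d(64, 3, 1),
            "normal":    nn.Conv2d(64, 3, 1),
            "roughness": nn.Conv2d(64, 1, 1),
            "specular":  nn.Conv2d(64, 1, 1),
        })

    def forward(self, coarse_texture):
        feat = self.backbone(coarse_texture)
        return {name: head(feat) for name, head in self.heads.items()}

maps = TextureTranslator()(torch.rand(1, 3, 256, 256))  # coarse texture from 3D fit
print({k: tuple(v.shape) for k, v in maps.items()})
```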

    Real-time face view correction for front-facing cameras

    Face view is particularly important in person-to-person communication. Disparity between the camera location and the face orientation can result in undesirable facial appearances of the participants during video conferencing. This phenomenon becomes particularly notable on devices where the front-facing camera is placed at unconventional locations, such as below the display or within the keyboard. In this paper, we take the video stream from a single RGB camera as input and generate a video stream that emulates the view from a virtual camera at a designated location. The most challenging issue is that the corrected view often requires out-of-plane head rotations. To address this challenge, we reconstruct the 3D face shape and re-render it into synthesized frames according to the virtual camera location. To output the corrected video stream with natural appearance in real time, we propose several novel techniques, including accurate eyebrow reconstruction, high-quality blending between the corrected face image and the background, and a template-based 3D reconstruction of glasses. Our system works well for different lighting conditions and skin tones, and is able to handle users wearing glasses. Extensive experiments and user studies demonstrate that our proposed method can achieve high-quality results.
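    At its core, the re-rendering step described above projects the reconstructed 3D face through a pinhole camera placed at the designated virtual location. The sketch below shows that projection; the intrinsics and pose are illustrative placeholders, not the paper's calibration.

```python
# Project reconstructed 3D face points into a virtual pinhole camera.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into Nx2 pixel coordinates of a camera (K, R, t)."""
    cam = R @ points_3d.T + t.reshape(3, 1)  # world -> camera coordinates
    uvw = K @ cam                            # camera -> image plane
    return (uvw[:2] / uvw[2]).T              # perspective divide -> pixels

K = np.array([[800.0, 0, 320],               # assumed intrinsics
              [0, 800.0, 240],
              [0, 0, 1]])
R = np.eye(3)                                # virtual camera rotation
t = np.array([0.0, -0.05, 0.6])              # shifted virtual viewpoint

face_points = np.random.rand(100, 3) * 0.2   # stand-in for the reconstructed face mesh
pixels = project(face_points, K, R, t)       # where each vertex lands in the new view
```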

    FreeStyleGAN: Free-view Editable Portrait Rendering with the Camera Manifold

    Current Generative Adversarial Networks (GANs) produce photorealistic renderings of portrait images. Embedding real images into the latent space of such models enables high-level image editing. While recent methods provide considerable semantic control over the (re-)generated images, they can only generate a limited set of viewpoints and cannot explicitly control the camera. Such 3D camera control is required for 3D virtual and mixed reality applications. In our solution, we use a few images of a face to perform 3D reconstruction, and we introduce the notion of the GAN camera manifold, the key element allowing us to precisely define the range of images that the GAN can reproduce in a stable manner. We train a small face-specific neural implicit representation network to map a captured face to this manifold and complement it with a warping scheme to obtain free-viewpoint novel-view synthesis. We show how our approach, due to its precise camera control, enables the integration of a pre-trained StyleGAN into standard 3D rendering pipelines, allowing, e.g., stereo rendering or consistent insertion of faces in synthetic 3D environments. Our solution proposes the first truly free-viewpoint rendering of realistic faces at interactive rates, using only a small number of casual photos as input, while simultaneously allowing semantic editing capabilities, such as facial expression or lighting changes.
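    Embedding a real image into a GAN's latent space, the step this work builds on, is often done by direct latent optimization. The sketch below shows that generic inversion loop with a stand-in generator; it is not FreeStyleGAN's actual pipeline or the StyleGAN API.

```python
# Generic GAN-inversion loop: optimize a latent code to reconstruct a target image.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())  # placeholder generator
target = torch.rand(3 * 64 * 64) * 2 - 1                           # flattened target image

w = torch.zeros(512, requires_grad=True)                           # latent code to optimize
opt = torch.optim.Adam([w], lr=0.05)

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(generator(w), target)            # reconstruction loss
    loss.backward()
    opt.step()
# After convergence, w can be edited (e.g., expression, lighting) and re-synthesized.
```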