57 research outputs found

    3D Generative Model Latent Disentanglement via Local Eigenprojection

    Designing realistic digital humans is extremely complex. Most data-driven generative models used to simplify the creation of their underlying geometric shape do not offer control over the generation of local shape attributes. In this paper, we overcome this limitation by introducing a novel loss function grounded in spectral geometry and applicable to different neural-network-based generative models of 3D head and body meshes. Encouraging the latent variables of mesh variational autoencoders (VAEs) or generative adversarial networks (GANs) to follow the local eigenprojections of identity attributes, we improve latent disentanglement and properly decouple the attribute creation. Experimental results show that our local eigenprojection disentangled (LED) models not only offer improved disentanglement with respect to the state-of-the-art, but also maintain good generation capabilities with training times comparable to the vanilla implementations of the models. Our code and pre-trained models are available at github.com/simofoti/LocalEigenprojDisentangled


    The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance the privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding. The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating the facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provides a way to preserve the utility of the data, such as expression and pose information; (3) balances the utility of the data and the capacity of the privacy protection. The Broader Impacts of the dissertation focus on the significance of privacy protection on visual data, and the inadequacy of current privacy enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts at achieving both goals simultaneously.

    FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

    This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs), without using intermediate representations such as 2D images, depth maps, or point clouds. Our 3D coarse-to-fine Face Landmarks NeRF (FLNeRF) model efficiently samples from the NeRF on the whole face with individual facial features for accurate landmarks. To mitigate the limited number of facial expressions in the available data, local and non-linear NeRF warp is applied at facial features in fine scale to simulate a large range of emotions, including exaggerated facial expressions (e.g., cheek blowing, wide mouth opening, eye blinking), for training FLNeRF. With such expression augmentation, our model can predict 3D landmarks not limited to the 20 discrete expressions given in the data. Robust 3D NeRF facial landmarks contribute to many downstream tasks. As an example, we modify MoFaNeRF to enable high-quality face editing and swapping using face landmarks on NeRF, allowing more direct control and a wider range of complex expressions. Experiments show that the improved model using landmarks achieves comparable or better results.
    Comment: Hao Zhang and Tianyuan Dai contributed equally. Project website: https://github.com/ZHANG1023/FLNeR

    VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment

    We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions. Our method is real-time and produces high-fidelity and view-consistent output, suitable for 3D teleconferencing systems based on holographic displays. Existing cutting-edge 3D-aware reenactment methods often use neural radiance fields or 3D meshes to produce view-consistent appearance encoding, but, at the same time, they rely on linear face models, such as 3DMM, to disentangle that appearance from facial expressions. As a result, their reenactment results often exhibit identity leakage from the driver or have unnatural expressions. To address these problems, we propose a neural self-supervised disentanglement approach that lifts both the source image and driver video frame into a shared 3D volumetric representation based on tri-planes. This representation can then be freely manipulated with expression tri-planes extracted from the driving images and rendered from an arbitrary view using neural radiance fields. We achieve this disentanglement via self-supervised learning on a large in-the-wild video dataset. We further introduce a highly effective fine-tuning approach to improve the generalizability of the 3D lifting using the same real-world data. We demonstrate state-of-the-art performance on a wide range of datasets, and also showcase high-quality 3D-aware head reenactment on highly challenging and diverse subjects, including non-frontal head poses and complex expressions for both source and driver.

    High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors

    High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes a 3D-aware generative prior. Different from existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local and low-dimensional subspace in the latent space of a 3D-GAN. We propose an efficient method to construct the personalized generative prior based on a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering with novel views, and face reenactment can be realized by navigating the latent space. Our proposed method is applicable to different driving signals, including RGB images, 3DMM coefficients, and audio. Compared with existing works, we obtain superior novel view synthesis results and faithful face reenactment performance.
    Comment: 8 pages, 7 figures

    Latent Disentanglement for the Analysis and Generation of Digital Human Shapes

    Analysing and generating digital human shapes is crucial for a wide variety of applications ranging from movie production to healthcare. The most common approaches for the analysis and generation of digital human shapes involve the creation of statistical shape models. At the heart of these techniques is the definition of a mapping between shapes and a low-dimensional representation. However, making these representations interpretable is still an open challenge. This thesis explores latent disentanglement as a powerful technique to make the latent space of geometric deep learning based statistical shape models more structured and interpretable. In particular, it introduces two novel techniques to disentangle the latent representation of variational autoencoders and generative adversarial networks with respect to the local shape attributes characterising the identity of the generated body and head meshes. This work was inspired by a shape completion framework that was proposed as a viable alternative to intraoperative registration in minimally invasive surgery of the liver. In addition, one of these methods for latent disentanglement was also applied to plastic surgery, where it was shown to improve the diagnosis of craniofacial syndromes and aid surgical planning

    Unsupervised Face Alignment by Robust Nonrigid Mapping

    We propose a novel approach to unsupervised facial image alignment. Unlike previous approaches, which are confined to affine transformations on either the entire face or separate patches, we extract a nonrigid mapping between facial images. Based on a regularized face model, we frame unsupervised face alignment into the Lucas-Kanade image registration approach. We propose a robust optimization scheme to handle appearance variations. The method is fully automatic and can cope with pose variations and expressions, all in an unsupervised manner. Experiments on a large set of images showed that the approach is effective.