57 research outputs found
3D Generative Model Latent Disentanglement via Local Eigenprojection
Designing realistic digital humans is extremely complex. Most data-driven generative models used to simplify the creation of their underlying geometric shape do not offer control over the generation of local shape attributes. In this paper, we overcome this limitation by introducing a novel loss function grounded in spectral geometry and applicable to different neural-network-based generative models of 3D head and body meshes. Encouraging the latent variables of mesh variational autoencoders (VAEs) or generative adversarial networks (GANs) to follow the local eigenprojections of identity attributes, we improve latent disentanglement and properly decouple the attribute creation. Experimental results show that our local eigenprojection disentangled (LED) models not only offer improved disentanglement with respect to the state-of-the-art, but also maintain good generation capabilities with training times comparable to the vanilla implementations of the models. Our code and pre-trained models are available at github.com/simofoti/LocalEigenprojDisentangled
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasing used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concern, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance the privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provide a way to preserve the utility of the data, such as expression and pose information; (3) balance the utility of the data and capacity of the privacy protection.
The Broader Impacts of the dissertation focus on the significance of privacy protection on visual data, and the inadequacy of current privacy enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. This work in this dissertation represents one of the first attempts in achieving both goals simultaneously
FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields
This paper presents the first significant work on directly predicting 3D face
landmarks on neural radiance fields (NeRFs), without using intermediate
representations such as 2D images, depth maps, or point clouds. Our 3D
coarse-to-fine Face Landmarks NeRF (FLNeRF) model efficiently samples from the
NeRF on the whole face with individual facial features for accurate landmarks.
To mitigate the limited number of facial expressions in the available data,
local and non-linear NeRF warp is applied at facial features in fine scale to
simulate large emotions range, including exaggerated facial expressions (e.g.,
cheek blowing, wide opening mouth, eye blinking), for training FLNeRF. With
such expression augmentation, our model can predict 3D landmarks not limited to
the 20 discrete expressions given in the data. Robust 3D NeRF facial landmarks
contribute to many downstream tasks. As an example, we modify MoFaNeRF to
enable high-quality face editing and swapping using face landmarks on NeRF,
allowing more direct control and wider range of complex expressions.
Experiments show that the improved model using landmarks achieves comparable to
better results.Comment: Hao Zhang and Tianyuan Dai contributed equally. Project website:
https://github.com/ZHANG1023/FLNeR
VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
We present a 3D-aware one-shot head reenactment method based on a fully
volumetric neural disentanglement framework for source appearance and driver
expressions. Our method is real-time and produces high-fidelity and
view-consistent output, suitable for 3D teleconferencing systems based on
holographic displays. Existing cutting-edge 3D-aware reenactment methods often
use neural radiance fields or 3D meshes to produce view-consistent appearance
encoding, but, at the same time, they rely on linear face models, such as 3DMM,
to achieve its disentanglement with facial expressions. As a result, their
reenactment results often exhibit identity leakage from the driver or have
unnatural expressions. To address these problems, we propose a neural
self-supervised disentanglement approach that lifts both the source image and
driver video frame into a shared 3D volumetric representation based on
tri-planes. This representation can then be freely manipulated with expression
tri-planes extracted from the driving images and rendered from an arbitrary
view using neural radiance fields. We achieve this disentanglement via
self-supervised learning on a large in-the-wild video dataset. We further
introduce a highly effective fine-tuning approach to improve the
generalizability of the 3D lifting using the same real-world data. We
demonstrate state-of-the-art performance on a wide range of datasets, and also
showcase high-quality 3D-aware head reenactment on highly challenging and
diverse subjects, including non-frontal head poses and complex expressions for
both source and driver
High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
High-fidelity facial avatar reconstruction from a monocular video is a
significant research problem in computer graphics and computer vision.
Recently, Neural Radiance Field (NeRF) has shown impressive novel view
rendering results and has been considered for facial avatar reconstruction.
However, the complex facial dynamics and missing 3D information in monocular
videos raise significant challenges for faithful facial reconstruction. In this
work, we propose a new method for NeRF-based facial avatar reconstruction that
utilizes 3D-aware generative prior. Different from existing works that depend
on a conditional deformation field for dynamic modeling, we propose to learn a
personalized generative prior, which is formulated as a local and low
dimensional subspace in the latent space of 3D-GAN. We propose an efficient
method to construct the personalized generative prior based on a small set of
facial images of a given individual. After learning, it allows for
photo-realistic rendering with novel views and the face reenactment can be
realized by performing navigation in the latent space. Our proposed method is
applicable for different driven signals, including RGB images, 3DMM
coefficients, and audios. Compared with existing works, we obtain superior
novel view synthesis results and faithfully face reenactment performance.Comment: 8 pages, 7 figure
Latent Disentanglement for the Analysis and Generation of Digital Human Shapes
Analysing and generating digital human shapes is crucial for a wide variety of applications ranging from movie production to healthcare. The most common approaches for the analysis and generation of digital human shapes involve the creation of statistical shape models. At the heart of these techniques is the definition of a mapping between shapes and a low-dimensional representation. However, making these representations interpretable is still an open challenge. This thesis explores latent disentanglement as a powerful technique to make the latent space of geometric deep learning based statistical shape models more structured and interpretable. In particular, it introduces two novel techniques to disentangle the latent representation of variational autoencoders and generative adversarial networks with respect to the local shape attributes characterising the identity of the generated body and head meshes. This work was inspired by a shape completion framework that was proposed as a viable alternative to intraoperative registration in minimally invasive surgery of the liver. In addition, one of these methods for latent disentanglement was also applied to plastic surgery, where it was shown to improve the diagnosis of craniofacial syndromes and aid surgical planning
Unsupervised Face Alignment by Robust Nonrigid Mapping
We propose a novel approach to unsupervised facial im-age alignment. Differently from previous approaches, that are confined to affine transformations on either the entire face or separate patches, we extract a nonrigid mapping be-tween facial images. Based on a regularized face model, we frame unsupervised face alignment into the Lucas-Kanade image registration approach. We propose a robust optimiza-tion scheme to handle appearance variations. The method is fully automatic and can cope with pose variations and ex-pressions, all in an unsupervised manner. Experiments on a large set of images showed that the approach is effective. 1
- …