41 research outputs found
SIDER: Single-Image Neural Optimization for Facial Geometric Detail Recovery
We present SIDER (Single-Image neural optimization for facial geometric DEtail Recovery), a novel photometric optimization method that recovers detailed facial geometry from a single image in an unsupervised manner. Inspired by classical coarse-to-fine optimization techniques and recent advances in implicit neural representations of 3D shape, SIDER combines a geometry prior based on statistical models with Signed Distance Functions (SDFs) to recover facial details from single images. First, it estimates a coarse geometry using a morphable model represented as an SDF. Next, it reconstructs facial geometry details by optimizing a photometric loss with respect to the ground-truth image. In contrast to prior work, SIDER does not rely on any dataset priors and does not require additional supervision from multiple views, lighting changes, or ground-truth 3D shape. Extensive qualitative and quantitative evaluation demonstrates that our method achieves state-of-the-art results on facial geometric detail recovery using only a single in-the-wild image.
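The two-stage scheme described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the morphable-model prior is replaced by a sphere SDF, the differentiable renderer by a toy Lambertian shading of random sample points, and all network sizes and targets are placeholder assumptions.

    import torch
    import torch.nn as nn

    class SDFNet(nn.Module):
        """Coordinate MLP mapping a 3D point to a signed distance."""
        def __init__(self, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden), nn.Softplus(beta=100),
                nn.Linear(hidden, hidden), nn.Softplus(beta=100),
                nn.Linear(hidden, 1))
        def forward(self, x):
            return self.net(x)

    def prior_sdf(x):
        # Stand-in for the statistical morphable-model SDF prior.
        return x.norm(dim=-1, keepdim=True) - 0.5

    sdf = SDFNet()
    opt = torch.optim.Adam(sdf.parameters(), lr=1e-4)

    # Stage 1: coarse fit of the network to the prior SDF.
    for step in range(200):
        x = torch.rand(1024, 3) * 2 - 1
        loss = (sdf(x) - prior_sdf(x)).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: photometric refinement. Real SIDER renders the SDF
    # differentiably and compares against the input photograph; here a
    # toy Lambertian shading of random points stands in for rendering.
    target = torch.rand(1024)                 # placeholder pixel intensities
    light = torch.tensor([0.0, 0.0, 1.0])
    for step in range(200):
        x = (torch.rand(1024, 3) * 2 - 1).requires_grad_(True)
        s = sdf(x)
        (normals,) = torch.autograd.grad(s.sum(), x, create_graph=True)
        normals = nn.functional.normalize(normals, dim=-1)
        shading = (normals @ light).clamp(min=0)
        photo_loss = (shading - target).pow(2).mean()
        opt.zero_grad(); photo_loss.backward(); opt.step()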
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality
Social presence, the feeling of being there with a real person, will fuel the
next generation of communication systems driven by digital humans in virtual
reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny
effect rely on person-specific (PS) models. However, these PS models are
time-consuming to build and are typically trained with limited data
variability, which results in poor generalization and robustness. Major sources
of variability that affect the accuracy of facial expression transfer
algorithms include using different VR headsets (e.g., camera configuration,
slop of the headset), facial appearance changes over time (e.g., beard,
make-up), and environmental factors (e.g., lighting, backgrounds). This is a
major drawback for the scalability of these models in VR. This paper makes
progress in overcoming these limitations by proposing an end-to-end
multi-identity architecture (MIA) trained with specialized augmentation
strategies. MIA drives the shape component of the avatar from three cameras in
the VR headset (two eyes, one mouth), in untrained subjects, using minimal
personalized information (i.e., neutral 3D mesh shape). Similarly, if the PS
texture decoder is available, MIA is able to drive the full avatar
(shape+texture) robustly, outperforming PS models in challenging scenarios. Our
key contribution to improving robustness and generalization is that our method
implicitly decouples, in an unsupervised manner, the facial expression from
nuisance factors (e.g., headset, environment, facial appearance). We
demonstrate the superior performance and robustness of the proposed method
versus state-of-the-art PS approaches in a variety of experiments.
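A minimal PyTorch sketch of a multi-identity encoder-decoder in this spirit is shown below. It is a hypothetical illustration, not the paper's MIA: the branch architecture, vertex count, and code dimension are all assumed, and the nuisance decoupling itself would come from the specialized augmentation during training rather than from this structure alone.

    import torch
    import torch.nn as nn

    class MIASketch(nn.Module):
        """Hypothetical multi-identity model: three headset camera views
        -> shared expression code -> vertex offsets on a neutral mesh."""
        def __init__(self, n_verts=7306, code_dim=128):
            super().__init__()
            self.branches = nn.ModuleList([nn.Sequential(
                nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten()) for _ in range(3)])
            self.fuse = nn.Linear(3 * 32, code_dim)
            self.decode = nn.Sequential(
                nn.Linear(code_dim + 3 * n_verts, 256), nn.ReLU(),
                nn.Linear(256, 3 * n_verts))

        def forward(self, views, neutral_mesh):
            # One branch per camera (two eyes, one mouth); features are fused
            # into an expression code intended to be free of nuisance factors.
            feats = torch.cat([b(v) for b, v in zip(self.branches, views)], -1)
            code = self.fuse(feats)
            # Minimal personalization: condition on the neutral 3D mesh shape.
            inp = torch.cat([code, neutral_mesh.flatten(1)], -1)
            offsets = self.decode(inp).view(-1, neutral_mesh.shape[1], 3)
            return neutral_mesh + offsets

    views = [torch.randn(2, 1, 64, 64) for _ in range(3)]   # eye/mouth crops
    neutral = torch.randn(2, 7306, 3)                       # neutral mesh shape
    driven = MIASketch()(views, neutral)                    # (2, 7306, 3)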
FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset
We present FaceVerse, a fine-grained 3D Neural Face Model, which is built
from a hybrid East Asian face dataset containing 60K fused RGB-D images and 2K
high-fidelity 3D head scan models. A novel coarse-to-fine structure is proposed
to take better advantage of our hybrid dataset. In the coarse module, we
generate a base parametric model from large-scale RGB-D images, which is able
to predict coarse but accurate 3D face models across different genders, ages, etc. Then
in the fine module, a conditional StyleGAN architecture trained with
high-fidelity scan models is introduced to enrich elaborate facial geometric
and texture details. Note that, unlike previous methods, our base and
fine modules are both adjustable, which enables an innovative application
of adjusting both the basic attributes and the facial details of 3D face
models. Furthermore, we propose a single-image fitting framework based on
differentiable rendering. Extensive experiments show that our method outperforms
state-of-the-art methods. Code: https://github.com/LizhenWangT/FaceVers
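The single-image fitting stage can be sketched as follows. This is a toy stand-in for the paper's framework: a random linear basis replaces the learned base model, a landmark reprojection term replaces full differentiable rendering, and every dimension and weight below is an assumption.

    import torch

    # Toy linear face basis standing in for the learned base parametric model.
    V, K_ID, K_EXP = 500, 40, 20
    mean = torch.randn(V, 3)
    basis_id = torch.randn(K_ID, V, 3) * 0.01
    basis_exp = torch.randn(K_EXP, V, 3) * 0.01

    target_lmk = torch.randn(68, 2)          # detected 2D landmarks (placeholder)
    lmk_idx = torch.randint(0, V, (68,))     # landmark vertex indices (placeholder)

    id_coef = torch.zeros(K_ID, requires_grad=True)
    exp_coef = torch.zeros(K_EXP, requires_grad=True)
    opt = torch.optim.Adam([id_coef, exp_coef], lr=1e-2)

    for step in range(300):
        verts = (mean
                 + torch.einsum('k,kvc->vc', id_coef, basis_id)
                 + torch.einsum('k,kvc->vc', exp_coef, basis_exp))
        # Crude weak-perspective projection of the landmark vertices.
        proj = verts[lmk_idx, :2] / (verts[lmk_idx, 2:3] + 2.0)
        loss = ((proj - target_lmk).pow(2).mean()
                + 1e-3 * (id_coef.pow(2).sum() + exp_coef.pow(2).sum()))
        opt.zero_grad(); loss.backward(); opt.step()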
Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation
Head generation with diverse identities is an important task in computer
vision and computer graphics, widely used in multimedia applications. However,
current full head generation methods require a large number of 3D scans or
multi-view images to train the model, resulting in high data-acquisition
costs. To address this issue, we propose Head3D, a method to generate full 3D
heads with limited multi-view images. Specifically, our approach first extracts
facial priors represented by tri-planes learned in EG3D, a 3D-aware generative
model, and then applies feature distillation to extend the 3D frontal faces
into complete heads without compromising head integrity. To mitigate the domain
gap between the face and head models, we present dual-discriminators to guide
the frontal and back head generation, respectively. Our model achieves
cost-efficient and diverse complete head generation with photo-realistic
renderings and high-quality geometry representations. Extensive experiments
demonstrate the effectiveness of our proposed Head3D, both qualitatively and
quantitatively.
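The tri-plane representation that Head3D distills from EG3D can be queried as in the following sketch (a generic tri-plane lookup, not the paper's code; channel counts and resolutions are assumptions):

    import torch
    import torch.nn.functional as F

    def sample_triplane(planes, pts):
        """planes: (3, C, H, W) feature planes (XY, XZ, YZ).
        pts: (N, 3) points in [-1, 1]. Returns (N, C) features,
        summed over the three axis-aligned projections."""
        coords = torch.stack([pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]])
        grid = coords.unsqueeze(2)                                # (3, N, 1, 2)
        feats = F.grid_sample(planes, grid, align_corners=True)  # (3, C, N, 1)
        return feats.squeeze(-1).sum(dim=0).t()                   # (N, C)

    planes = torch.randn(3, 32, 64, 64)         # assumed channels and resolution
    pts = torch.rand(1000, 3) * 2 - 1
    print(sample_triplane(planes, pts).shape)   # torch.Size([1000, 32])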
3D Face Arbitrary Style Transfer
Style transfer of 3D faces has attracted increasing attention. However,
previous methods mainly use images of artistic faces for style transfer while
ignoring arbitrary style images such as abstract paintings. To solve this
problem, we propose a novel method, namely Face-guided Dual Style Transfer
(FDST). To begin with, FDST employs a 3D decoupling module to separate facial
geometry and texture. Then we propose a style fusion strategy for facial
geometry. Subsequently, we design an optimization-based DDSG mechanism for
textures that can guide the style transfer by two style images. Besides the
normal style image input, DDSG can utilize the original face input as another
style input as a face prior. In this way, high-quality arbitrary face style
transfer results can be obtained. Furthermore, FDST can be applied to many
downstream tasks, including region-controllable style transfer, high-fidelity
face texture reconstruction, large-pose face reconstruction, and artistic face
reconstruction. Comprehensive quantitative and qualitative results show that
our method achieves performance comparable to state-of-the-art approaches. All
source code and pre-trained weights will be released to the public.
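The dual style guidance idea, optimizing one output against two style references (the painting and the original face as a prior), can be sketched with classic Gram-matrix style losses. This is an illustrative analogue, not the paper's DDSG: the feature extractor, weights, and resolutions below are assumptions (a pretrained VGG would normally supply the features).

    import torch
    import torch.nn as nn

    def gram(feat):
        # (B, C, H, W) -> (B, C, C) normalized Gram matrix.
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    # Tiny frozen conv stack standing in for pretrained VGG features.
    extract = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
    for p in extract.parameters():
        p.requires_grad_(False)

    style_img = torch.rand(1, 3, 64, 64)  # arbitrary style image (e.g., painting)
    face_img = torch.rand(1, 3, 64, 64)   # original face, used as the second style
    out = face_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=1e-2)

    g_style = gram(extract(style_img))
    g_face = gram(extract(face_img))
    for step in range(200):
        g_out = gram(extract(out))
        # Dual guidance: trade off painting style against the face prior.
        loss = (0.7 * (g_out - g_style).pow(2).mean()
                + 0.3 * (g_out - g_face).pow(2).mean())
        opt.zero_grad(); loss.backward(); opt.step()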
Learning Neural Parametric Head Models
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation that disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly captured dataset of over 3700 head scans from 203 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets in both quality and completeness of geometry, averaging around 3.5M mesh faces per scan. We will publicly release our dataset along with a public benchmark for neural head avatar construction as well as an evaluation on a hidden test set for inference-time fitting. Finally, we demonstrate that our approach outperforms state-of-the-art methods in terms of fitting error and reconstruction quality.
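The identity/expression split can be sketched as a canonical SDF composed with a deformation field. A minimal PyTorch illustration follows (not the authors' model; latent sizes and MLP widths are assumptions, and the ensemble of local fields around anchor points is omitted):

    import torch
    import torch.nn as nn

    def mlp(n_in, n_out, hidden=128):
        return nn.Sequential(nn.Linear(n_in, hidden), nn.Softplus(beta=100),
                             nn.Linear(hidden, hidden), nn.Softplus(beta=100),
                             nn.Linear(hidden, n_out))

    class HeadModelSketch(nn.Module):
        """Expression deformation field composed with a canonical identity SDF."""
        def __init__(self, id_dim=64, exp_dim=64):
            super().__init__()
            self.deform = mlp(3 + exp_dim, 3)   # posed point -> canonical offset
            self.sdf = mlp(3 + id_dim, 1)       # identity-conditioned canonical SDF

        def forward(self, x, z_id, z_exp):
            z_exp = z_exp.expand(x.shape[0], -1)
            x_canon = x + self.deform(torch.cat([x, z_exp], -1))
            z_id = z_id.expand(x.shape[0], -1)
            return self.sdf(torch.cat([x_canon, z_id], -1))

    model = HeadModelSketch()
    x = torch.rand(1024, 3) * 2 - 1
    dists = model(x, torch.randn(1, 64), torch.randn(1, 64))   # (1024, 1)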
H3D-Net: Few-shot high-fidelity 3D head reconstruction
Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and to computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction by endowing coordinate-based representations with a probabilistic shape prior that enables faster convergence and better generalization when using few input images (down to three). First, we learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations. At test time, we jointly overfit two coordinate-based neural networks to the scene, one modeling the geometry and another estimating the surface radiance, using implicit differentiable rendering. We devise a two-stage optimization strategy in which the learned prior is used to initialize and constrain the geometry during an initial optimization phase. Then, the prior is unfrozen and fine-tuned to the scene. By doing this, we achieve high-fidelity head reconstructions, including hair and shoulders, with a high level of detail that consistently outperforms both state-of-the-art 3D morphable model methods in the few-shot scenario and non-parametric methods when large sets of views are available.
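The two-stage strategy, optimizing a latent code against a frozen learned prior and then unfreezing the prior for scene-specific fine-tuning, can be sketched as below. The decoder here is an untrained stand-in for the pretrained head prior, and the loss is a placeholder for the implicit differentiable-rendering objective.

    import torch
    import torch.nn as nn

    # Stand-in for a latent-conditioned SDF decoder pretrained on head scans.
    decoder = nn.Sequential(nn.Linear(3 + 256, 256), nn.ReLU(),
                            nn.Linear(256, 256), nn.ReLU(),
                            nn.Linear(256, 1))
    z = torch.zeros(1, 256, requires_grad=True)   # per-scene latent code

    def scene_loss():
        # Placeholder for the rendering loss against the few input views.
        x = torch.rand(512, 3) * 2 - 1
        s = decoder(torch.cat([x, z.expand(512, -1)], -1))
        return s.pow(2).mean()

    # Stage 1: optimize only the latent; the learned prior stays frozen.
    opt1 = torch.optim.Adam([z], lr=1e-3)
    for step in range(100):
        loss = scene_loss()
        opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: unfreeze the prior and fine-tune everything to the scene.
    opt2 = torch.optim.Adam([z, *decoder.parameters()], lr=1e-4)
    for step in range(100):
        loss = scene_loss()
        opt2.zero_grad(); loss.backward(); opt2.step()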