Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar
Rendering photorealistic and dynamically moving human heads is crucial for
ensuring a pleasant and immersive experience in AR/VR and video conferencing
applications. However, existing methods often struggle to model challenging
facial regions (e.g., mouth interior, eyes, hair/beard), resulting in
unrealistic and blurry results. In this paper, we propose the Neural
Point-based Volumetric Avatar (NPVA), a method that adopts the neural point
representation as well as the
neural volume rendering process and discards the predefined connectivity and
hard correspondence imposed by mesh-based approaches. Specifically, the neural
points are strategically constrained around the surface of the target
expression via a high-resolution UV displacement map, achieving increased
modeling capacity and more accurate control. We introduce three technical
innovations to improve the rendering and training efficiency: a patch-wise
depth-guided (shading point) sampling strategy, a lightweight radiance decoding
process, and a Grid-Error-Patch (GEP) ray sampling strategy during training. By
design, our NPVA is better equipped to handle topologically changing regions
and thin structures while also ensuring accurate expression control when
animating avatars. Experiments conducted on three subjects from the Multiface
dataset demonstrate the effectiveness of our designs, outperforming previous
state-of-the-art methods, especially in handling challenging facial regions.
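To make the surface-guided constraint concrete, here is a minimal sketch of displacing points along surface normals by offsets sampled from a UV displacement map. Everything here (function names, the nearest-pixel lookup, scalar offsets) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def displace_points(base_points, base_normals, uv_coords, displacement_map):
    """Offset surface points along their normals by values looked up
    from a UV displacement map (hypothetical simplification of the
    paper's surface-guided constraint)."""
    h, w = displacement_map.shape
    # Convert [0, 1] UV coordinates to nearest integer pixel indices.
    px = np.clip((uv_coords[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip((uv_coords[:, 1] * (h - 1)).astype(int), 0, h - 1)
    offsets = displacement_map[py, px]            # (N,) scalar offsets
    return base_points + offsets[:, None] * base_normals

# Toy usage: 4 points on a flat patch, displaced by a random map.
pts = np.random.rand(4, 3)
nrm = np.tile(np.array([0.0, 0.0, 1.0]), (4, 1))
uvs = np.random.rand(4, 2)
disp = np.random.randn(256, 256) * 0.01
print(displace_points(pts, nrm, uvs, disp))
```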
Realistic Real-Time Rendering of Global Illumination and Hair through Machine Learning Precomputations
Over the last decade, machine learning has gained a lot of traction in many areas, and with the advent of new GPU models that include acceleration hardware for neural network inference, real-time applications have also started to take advantage of these algorithms. In general, machine learning and neural network methods are not designed to run at the speeds required for rendering in high-performance real-time environments, except for very specific and typically limited uses. For example, several methods have been developed recently for denoising low-quality path-traced images, or for upsampling images rendered at lower resolution, that can run in real time. This thesis collects two methods that attempt to improve realistic scene rendering in such high-performance environments by using machine learning. Paper I presents a neural network application for compressing surface lightfields into a set of unconstrained spherical Gaussians to render surfaces with global illumination in a real-time environment. Paper II describes a filter based on a small convolutional neural network that can be used to denoise hair rendered with stochastic transparency in real time.
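Paper I's compression target, a sum of spherical Gaussian lobes, can be evaluated cheaply per view direction. The sketch below uses the common parameterization G(v) = a · exp(λ(v·μ − 1)); the thesis's exact unconstrained form may differ, and all names are illustrative:

```python
import numpy as np

def eval_spherical_gaussians(view_dir, lobe_axes, sharpness, amplitudes):
    """Evaluate a set of spherical Gaussian lobes
    G(v) = a * exp(lambda * (dot(v, mu) - 1)) for a unit view direction.
    Returns one value per lobe; sum them for the total radiance."""
    v = view_dir / np.linalg.norm(view_dir)
    cos_terms = lobe_axes @ v                    # (K,) dot products v . mu_k
    return amplitudes * np.exp(sharpness * (cos_terms - 1.0))

# Toy usage: 3 lobes with scalar amplitudes.
mus = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=float)
lam = np.array([8.0, 4.0, 2.0])
amp = np.array([1.0, 0.5, 0.25])
print(eval_spherical_gaussians(np.array([0.0, 0.0, 1.0]), mus, lam, amp).sum())
```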
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
The reconstruction of dense 3D models of face geometry and appearance from a
single image is highly challenging and ill-posed. To constrain the problem,
many approaches rely on strong priors, such as parametric face models learned
from limited 3D scan data. However, prior models restrict generalization of the
true diversity in facial geometry, skin reflectance and illumination. To
alleviate this problem, we present the first approach that jointly learns 1) a
regressor for face shape, expression, reflectance and illumination on the basis
of 2) a concurrently learned parametric face model. Our multi-level face model
combines the advantage of 3D Morphable Models for regularization with the
out-of-space generalization of a learned corrective space. We train end-to-end
on in-the-wild images without dense annotations by fusing a convolutional
encoder with a differentiable expert-designed renderer and a self-supervised
training loss, both defined at multiple detail levels. Our approach compares
favorably to the state-of-the-art in terms of reconstruction quality, better
generalizes to real world faces, and runs at over 250 Hz.Comment: CVPR 2018 (Oral). Project webpage:
https://gvv.mpi-inf.mpg.de/projects/FML
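For intuition, the multi-level idea can be read as a linear 3D Morphable Model plus a learned per-vertex corrective field. A minimal sketch under that assumption (all shapes and names are illustrative, not the paper's code):

```python
import numpy as np

def reconstruct_shape(mean_shape, shape_basis, expr_basis,
                      alpha, delta, corrective):
    """Multi-level shape model: a linear 3DMM (mean + identity and
    expression bases) plus a learned per-vertex corrective term."""
    base = mean_shape + shape_basis @ alpha + expr_basis @ delta  # (3N,)
    return base + corrective                                      # (3N,)

# Toy usage with N = 5 vertices, 4 identity and 3 expression coefficients.
N = 5
mean = np.zeros(3 * N)
B_id = np.random.randn(3 * N, 4) * 0.1
B_ex = np.random.randn(3 * N, 3) * 0.05
shape = reconstruct_shape(mean, B_id, B_ex,
                          np.random.randn(4), np.random.randn(3),
                          np.random.randn(3 * N) * 0.01)
print(shape.reshape(N, 3))
```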
Neural Face Editing with Intrinsic Image Disentangling
Traditional face editing methods often require a number of sophisticated and
task-specific algorithms to be applied one after the other, a process that
is tedious, fragile, and computationally intensive. In this paper, we propose
an end-to-end generative adversarial network that infers a face-specific
disentangled representation of intrinsic face properties, including shape (i.e.
normals), albedo, and lighting, and an alpha matte. We show that this network
can be trained on "in-the-wild" images by incorporating an in-network
physically-based image formation module and appropriate loss functions. Our
disentangling latent representation allows for semantically relevant edits,
where one aspect of facial appearance can be manipulated while keeping
orthogonal properties fixed, and we demonstrate its use for a number of facial
editing applications. Comment: CVPR 2017 oral
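The in-network image formation the abstract mentions can be approximated by Lambertian shading under spherical harmonics lighting, modulated by albedo and alpha-matted over a background. The sketch below uses only zeroth- and first-order SH (4 coefficients) for brevity, whereas the paper likely uses a higher order; names are illustrative:

```python
import numpy as np

def render_face(albedo, normals, sh_coeffs, alpha, background):
    """Physically-based formation sketch: Lambertian shading from
    low-order SH lighting, multiplied by albedo, then alpha-composited
    over a background. Not the paper's exact module."""
    h, w, _ = normals.shape
    ones = np.ones((h, w, 1))
    basis = np.concatenate([ones, normals], axis=-1)             # (H, W, 4)
    shading = np.clip(basis @ sh_coeffs, 0.0, None)[..., None]   # (H, W, 1)
    face = albedo * shading
    return alpha * face + (1.0 - alpha) * background

# Toy usage on an 8x8 image with forward-facing normals.
H = W = 8
img = render_face(np.full((H, W, 3), 0.7),
                  np.tile([0.0, 0.0, 1.0], (H, W, 1)),
                  np.array([0.8, 0.1, 0.1, 0.5]),
                  np.ones((H, W, 1)), np.zeros((H, W, 3)))
print(img.shape, img.mean())
```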
Disentangling Factors of Variation by Mixing Them
We propose an approach to learn image representations that consist of
disentangled factors of variation without exploiting any manual labeling or
data domain knowledge. A factor of variation corresponds to an image attribute
that can be discerned consistently across a set of images, such as the pose or
color of objects. Our disentangled representation consists of a concatenation
of feature chunks, each chunk representing a factor of variation. It supports
applications such as transferring attributes from one image to another, by
simply mixing and unmixing feature chunks, and classification or retrieval
based on one or several attributes, by considering a user-specified subset of
feature chunks. We learn our representation without any labeling or knowledge
of the data domain, using an autoencoder architecture with two novel training
objectives: first, we propose an invariance objective to encourage that
encoding of each attribute, and decoding of each chunk, are invariant to
changes in other attributes and chunks, respectively; second, we include a
classification objective, which ensures that each chunk corresponds to a
consistently discernible attribute in the represented image, hence avoiding
degenerate feature mappings where some chunks are completely ignored. We
demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA
datasets. Comment: CVPR 2018
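Attribute transfer by mixing reduces to swapping slices of the latent code. A minimal sketch, assuming fixed-size contiguous chunks (the chunk layout and names are illustrative):

```python
import numpy as np

def mix_chunks(z_a, z_b, take_from_b, chunk_size):
    """Swap selected feature chunks from code z_b into z_a. Each chunk
    is meant to capture one factor of variation."""
    z = z_a.copy()
    for c in take_from_b:
        s = c * chunk_size
        z[s:s + chunk_size] = z_b[s:s + chunk_size]
    return z

# Toy usage: 4 chunks of 8 dims; take chunk 2 (e.g. "color") from z_b.
z_a, z_b = np.random.randn(32), np.random.randn(32)
z_mixed = mix_chunks(z_a, z_b, take_from_b=[2], chunk_size=8)
# A decoder applied to z_mixed would transfer that attribute from b to a.
print(np.allclose(z_mixed[16:24], z_b[16:24]))
```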
A Generative Model of People in Clothing
We present the first image-based generative model of people in clothing for
the full body. We sidestep the commonly used complex graphics rendering
pipeline and the need for high-quality 3D scans of dressed people. Instead, we
learn generative models from a large image database. The main challenge is to
cope with the high variance in human pose, shape and appearance. For this
reason, pure image-based approaches have not been considered so far. We show
that this challenge can be overcome by splitting the generating process in two
parts. First, we learn to generate a semantic segmentation of the body and
clothing. Second, we learn a conditional model on the resulting segments that
creates realistic images. The full model is differentiable and can be
conditioned on pose, shape or color. The results are samples of people in
different clothing items and styles. The proposed model can generate entirely
new people with realistic clothing. In several experiments we present
encouraging results that suggest an entirely data-driven approach to people
generation is possible.
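The two-part split described above can be sketched as a pipeline: sample a semantic segmentation conditioned on pose, then sample an image conditioned on that segmentation. The stand-in models below are placeholders just to show the data flow, not the paper's networks:

```python
import numpy as np

def generate_person(pose, seg_model, image_model, rng):
    """Two-stage sampling: first draw a body/clothing segmentation
    conditioned on pose, then draw an RGB image conditioned on it."""
    z1 = rng.standard_normal(64)
    segmentation = seg_model(pose, z1)        # (H, W) class labels
    z2 = rng.standard_normal(64)
    return image_model(segmentation, z2)      # (H, W, 3) RGB image

# Dummy stand-in models so the sketch runs end to end.
rng = np.random.default_rng(0)
seg_model = lambda pose, z: rng.integers(0, 8, size=(64, 64))
image_model = lambda seg, z: rng.random((64, 64, 3))
print(generate_person(np.zeros(17 * 2), seg_model, image_model, rng).shape)
```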
Tex2Shape: Detailed Full Human Body Geometry From a Single Image
We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape including face, hair, and clothing including wrinkles at interactive frame rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method.
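The core idea, regressing detail in texture space and applying it to a smooth body, can be illustrated as a per-vertex lookup into a vector displacement map. A minimal sketch with a nearest-pixel lookup (names and shapes are assumptions, not the paper's code):

```python
import numpy as np

def apply_vector_displacement(vertices, uv_coords, displacement_map):
    """Add per-vertex detail by sampling a 3-channel vector displacement
    map at each vertex's UV coordinate and adding it to the smooth
    body-model vertex positions."""
    h, w, _ = displacement_map.shape
    px = np.clip((uv_coords[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip((uv_coords[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return vertices + displacement_map[py, px]   # (N, 3)

# Toy usage: 6 vertices displaced by a small random 3-channel map.
verts = np.random.rand(6, 3)
uvs = np.random.rand(6, 2)
disp = np.random.randn(128, 128, 3) * 0.005
print(apply_vector_displacement(verts, uvs, disp))
```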