Lightweight Face Relighting
In this paper we present a method to relight human faces in real time, using consumer-grade graphics cards even with limited 3D capabilities. We show how to render faces using a combination of a simple, hardware-accelerated parametric model simulating skin shading and a detail texture map, and we provide robust procedures to estimate all the necessary parameters for a given face. Our model strikes a balance between the difficulty of realistic face rendering (given the very specific reflectance properties of skin) and the goal of real-time rendering with limited hardware capabilities. This is accomplished by automatically generating an optimal set of parameters for a simple rendering model. We discuss the issues in face rendering to weigh the pros and cons of various rendering models and to generalize our approach to most current hardware constraints. We provide results demonstrating the usability of our approach and the improvements it introduces in both the performance and the visual quality of the resulting faces.
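As a concrete (hypothetical) illustration of the kind of lightweight shading such an approach targets, the NumPy sketch below combines a wrap-diffuse term, a simple specular lobe, and a multiplicative detail texture. The function, its parameters, and the wrap-diffuse choice are illustrative assumptions, not the paper's actual parametric model or its parameter-estimation procedure.

```python
import numpy as np

def shade_face(normals, light_dir, albedo, detail_tex,
               wrap=0.5, spec_power=16.0, spec_scale=0.2):
    """Cheap per-pixel skin shading: wrap-diffuse + simple specular lobe,
    modulated by a detail texture (a hypothetical stand-in model)."""
    light_dir = light_dir / np.linalg.norm(light_dir)
    ndotl = np.einsum('...k,k->...', normals, light_dir)
    # Wrapped Lambert softens the terminator, a common cheap skin approximation.
    diffuse = np.clip((ndotl + wrap) / (1.0 + wrap), 0.0, 1.0)
    # Fixed view along +z; half-vector for a Blinn-style specular highlight.
    view = np.array([0.0, 0.0, 1.0])
    half = light_dir + view
    half /= np.linalg.norm(half)
    ndoth = np.clip(np.einsum('...k,k->...', normals, half), 0.0, 1.0)
    specular = spec_scale * ndoth ** spec_power
    # The detail texture adds high-frequency variation on top of the smooth model.
    return albedo * detail_tex * diffuse[..., None] + specular[..., None]

# Toy usage: a flat 4x4 patch of normals facing the camera.
normals = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))
albedo = np.tile(np.array([0.8, 0.6, 0.5]), (4, 4, 1))
detail = np.random.uniform(0.9, 1.1, (4, 4, 1))
img = shade_face(normals, np.array([0.3, 0.4, 1.0]), albedo, detail)
print(img.shape)  # (4, 4, 3)
```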
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 2017.
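To make the feature-transform idea concrete, here is a minimal PyTorch sketch in which the hidden representation is split into 2-D channel pairs and each pair is rotated by the known transformation angle before decoding; the layer sizes and the pairing scheme are illustrative assumptions, not the paper's architecture. Training would compare the decoded output against the correspondingly transformed input image.

```python
import math
import torch
import torch.nn as nn

class TransformingAutoencoder(nn.Module):
    """Sketch of a transforming encoder-decoder with an explicit feature
    transform layer acting on the hidden representation."""
    def __init__(self, dim=784, latent_pairs=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * latent_pairs))
        self.decoder = nn.Sequential(nn.Linear(2 * latent_pairs, 256), nn.ReLU(),
                                     nn.Linear(256, dim))

    def feature_transform(self, z, theta):
        # View the latent as (pairs, 2) and rotate every pair by theta, so an
        # input rotation corresponds to an explicit rotation in feature space.
        b = z.shape[0]
        z = z.view(b, -1, 2)
        c, s = torch.cos(theta), torch.sin(theta)
        rot = torch.stack([torch.stack([c, -s], dim=-1),
                           torch.stack([s, c], dim=-1)], dim=-2)  # (b, 2, 2)
        z = torch.einsum('bij,bpj->bpi', rot, z)
        return z.reshape(b, -1)

    def forward(self, x, theta):
        return self.decoder(self.feature_transform(self.encoder(x), theta))

# Toy usage: decode a batch as if the inputs were rotated by 30 degrees.
model = TransformingAutoencoder()
x = torch.rand(8, 784)
theta = torch.full((8,), math.pi / 6)
print(model(x, theta).shape)  # torch.Size([8, 784])
```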
A Dataset of Relighted 3D Interacting Hands
Two-hand interaction is one of the most challenging signals to analyze
due to the self-similarity, complicated articulations, and occlusions of hands.
Although several datasets have been proposed for two-hand interaction
analysis, none of them achieves both 1) diverse and realistic image appearances
and 2) diverse and large-scale ground-truth (GT) 3D poses at the same time. In
this work, we propose Re:InterHand, a dataset of relighted 3D interacting hands
that achieves both goals. To this end, we employ a state-of-the-art hand
relighting network with our accurately tracked two-hand 3D poses. We compare
our Re:InterHand with existing 3D interacting hands datasets and show its
benefits. Our Re:InterHand is available at
https://mks0601.github.io/ReInterHand/.
Comment: Accepted by NeurIPS 2023 (Datasets and Benchmarks Track).
Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing
Robust face reconstruction from a monocular image under general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in a self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization-based image formation models, as well as the underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the-art results. However, optimization-based approaches are inherently slow and lack robustness. In this paper, we build on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables a big leap forward in the reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent reconstruction of face attributes, our method leads to practical applications such as relighting and self-shadow removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.
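The analysis-by-synthesis principle behind such self-supervised reconstruction can be sketched with a toy differentiable shading model. The example below assumes per-vertex Lambertian shading under first-order spherical-harmonics lighting and recovers albedo and lighting by minimizing a photometric loss; the paper instead predicts the parameters with a CNN encoder and renders with a differentiable ray tracer that also models specular albedo and self-shadows.

```python
import torch

def sh_irradiance(normals, sh_coeffs):
    """First-order spherical-harmonics irradiance (bands 0 and 1 only),
    a simplified stand-in for a richer illumination model."""
    basis = torch.stack([torch.ones_like(normals[:, 0]),
                         normals[:, 0], normals[:, 1], normals[:, 2]], dim=1)  # (V, 4)
    return basis @ sh_coeffs  # (V, 3) for sh_coeffs of shape (4, 3)

# Toy analysis-by-synthesis: recover per-vertex diffuse albedo and lighting
# from "observed" colors by minimizing a photometric loss.
torch.manual_seed(0)
normals = torch.nn.functional.normalize(torch.randn(500, 3), dim=1)
true_albedo = torch.rand(500, 3)
true_sh = 0.3 * torch.randn(4, 3)
true_sh[0] += 0.8  # mostly ambient light
observed = true_albedo * sh_irradiance(normals, true_sh)

albedo = torch.full((500, 3), 0.5, requires_grad=True)
sh = torch.zeros(4, 3, requires_grad=True)
optimizer = torch.optim.Adam([albedo, sh], lr=0.05)
for step in range(300):
    optimizer.zero_grad()
    rendered = albedo * sh_irradiance(normals, sh)
    loss = (rendered - observed).abs().mean()  # photometric L1 loss
    loss.backward()
    optimizer.step()
print(float(loss))
```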
Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method
Recovering the shape and appearance of real-world objects from natural 2D
images is a long-standing and challenging inverse rendering problem. In this
paper, we introduce a novel hybrid differentiable rendering method to
efficiently reconstruct the 3D geometry and reflectance of a scene from
multi-view images captured by conventional hand-held cameras. Our method
follows an analysis-by-synthesis approach and consists of two phases. In the
initialization phase, we use traditional SfM and MVS methods to reconstruct a
virtual scene roughly matching the real scene. Then in the optimization phase,
we adopt a hybrid approach to refine the geometry and reflectance, where the
geometry is first optimized using an approximate differentiable rendering
method, and the reflectance is optimized afterward using a physically-based
differentiable rendering method. Our hybrid approach combines the efficiency of
approximate methods with the high-quality results of physically-based methods.
Extensive experiments on synthetic and real data demonstrate that our method
can produce reconstructions with similar or higher quality than
state-of-the-art methods while being more efficient.
Comment: IJCAI202
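The two-phase structure can be sketched as follows. The functions approximate_render and physically_based_render below are hypothetical toy placeholders, not the paper's renderers; the point is the loop structure: geometry is refined first against a cheap approximate differentiable objective, then reflectance is refined against a more accurate one.

```python
import torch

def approximate_render(geometry, reflectance):
    # Hypothetical fast, smooth stand-in for an approximate differentiable renderer.
    return torch.tanh(geometry) * reflectance.mean()

def physically_based_render(geometry, reflectance):
    # Hypothetical stand-in for a slower, physically-based differentiable renderer.
    return torch.sin(geometry) * reflectance

target = torch.rand(64)                              # "captured" observations
geometry = torch.zeros(64, requires_grad=True)       # from SfM/MVS in the paper
reflectance = torch.full((64,), 0.5, requires_grad=True)

# Phase 1: refine geometry with the approximate differentiable renderer.
opt_geom = torch.optim.Adam([geometry], lr=0.05)
for _ in range(200):
    opt_geom.zero_grad()
    loss = (approximate_render(geometry, reflectance.detach()) - target).pow(2).mean()
    loss.backward()
    opt_geom.step()

# Phase 2: refine reflectance with the physically-based differentiable renderer.
opt_refl = torch.optim.Adam([reflectance], lr=0.05)
for _ in range(200):
    opt_refl.zero_grad()
    loss = (physically_based_render(geometry.detach(), reflectance) - target).pow(2).mean()
    loss.backward()
    opt_refl.step()
print(float(loss))
```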
PhotoMat: A Material Generator Learned from Single Flash Photos
Authoring high-quality digital materials is key to realism in 3D rendering.
Previous generative models for materials have been trained exclusively on
synthetic data; such data is limited in availability and has a visual gap to
real materials. We circumvent this limitation by proposing PhotoMat: the first
material generator trained exclusively on real photos of material samples
captured using a cell phone camera with flash. Supervision on individual
material maps is not available in this setting. Instead, we train a generator
for a neural material representation that is rendered with a learned relighting
module to create arbitrarily lit RGB images; these are compared against real
photos using a discriminator. We then train a material maps estimator to decode
material reflectance properties from the neural material representation. We
train PhotoMat with a new dataset of 12,000 material photos captured with
handheld phone cameras under flash lighting. We demonstrate that our generated
materials have better visual quality than previous material generators trained
on synthetic data. Moreover, we can fit analytical material models to closely
match these generated neural materials, thus allowing for further editing and
use in 3D rendering.
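The adversarial setup can be illustrated with a minimal PyTorch sketch: a generator produces a neural material code, a learned relighting module renders it under a sampled light, and a discriminator compares the result against real flash photos. All module architectures, dimensions, and the light parameterization below are illustrative assumptions, not PhotoMat's actual networks.

```python
import torch
import torch.nn as nn

latent_dim, material_dim, img_pixels = 64, 128, 32 * 32 * 3

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, material_dim))
relighter = nn.Sequential(nn.Linear(material_dim + 3, 256), nn.ReLU(),
                          nn.Linear(256, img_pixels))
discriminator = nn.Sequential(nn.Linear(img_pixels, 256), nn.ReLU(),
                              nn.Linear(256, 1))

opt_g = torch.optim.Adam(list(generator.parameters()) + list(relighter.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_photos = torch.rand(16, img_pixels)  # stand-in for real flash photos
z = torch.randn(16, latent_dim)           # latent code for the material generator
light = torch.randn(16, 3)                # sampled light/flash position

# Generate a material code and relight it into an RGB image.
fake = relighter(torch.cat([generator(z), light], dim=1))

# Discriminator step: real flash photos vs. relit generated materials.
d_loss = bce(discriminator(real_photos), torch.ones(16, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(16, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator + relighting-module step.
g_loss = bce(discriminator(fake), torch.ones(16, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
print(float(d_loss), float(g_loss))
```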