Physically-Based Editing of Indoor Scene Lighting from a Single Image
We present a method to edit complex indoor lighting from a single image with
its predicted depth and light source segmentation masks. This is an extremely
challenging problem that requires modeling complex light transport, and
disentangling HDR lighting from material and geometry with only a partial LDR
observation of the scene. We tackle this problem using two novel components: 1)
a holistic scene reconstruction method that estimates scene reflectance and
parametric 3D lighting, and 2) a neural rendering framework that re-renders the
scene from our predictions. We use physically-based indoor light
representations that allow for intuitive editing, and infer both visible and
invisible light sources. Our neural rendering framework combines
physically-based direct illumination and shadow rendering with deep networks to
approximate global illumination. It can capture challenging lighting effects,
such as soft shadows, directional lighting, specular materials, and
interreflections. Previous single image inverse rendering methods usually
entangle scene lighting and geometry and only support applications like object
insertion. Instead, by combining parametric 3D lighting estimation with neural
scene rendering, we demonstrate the first automatic method to achieve full
scene relighting, including light source insertion, removal, and replacement,
from a single image. All source code and data will be publicly released.
In-the-wild Material Appearance Editing using Perceptual Attributes
Intuitively editing the appearance of materials from a single image is a challenging task given the complexity of the interactions between light and matter, and the ambivalence of human perception. This problem has been traditionally addressed by estimating additional factors of the scene like geometry or illumination, thus solving an inverse rendering problem and subordinating the final quality of the results to the quality of these estimations. We present a single-image appearance editing framework that allows us to intuitively modify the material appearance of an object by increasing or decreasing high-level perceptual attributes describing such appearance (e.g., glossy or metallic). Our framework takes as input an in-the-wild image of a single object, where geometry, material, and illumination are not controlled, and inverse rendering is not required. We rely on generative models and devise a novel architecture with Selective Transfer Unit (STU) cells that preserve the high-frequency details of the input image in the edited one. To train our framework we leverage a dataset with pairs of synthetic images rendered with physically-based algorithms, and the corresponding crowd-sourced ratings of high-level perceptual attributes. We show that our material editing framework outperforms the state of the art, and showcase its applicability on synthetic images, in-the-wild real-world photographs, and video sequences.
RenderNet: A deep convolutional network for differentiable rendering from 3D shapes
The traditional computer graphics rendering pipeline is designed for procedurally
generating quality 2D images from 3D shapes with high performance. The
non-differentiability due to discrete operations such as visibility computation
makes it hard to explicitly correlate rendering parameters and the resulting
image, posing a significant challenge for inverse rendering tasks. Recent work
on differentiable rendering achieves differentiability either by designing
surrogate gradients for non-differentiable operations or via an approximate but
differentiable renderer. These methods, however, are still limited when it
comes to handling occlusion, and restricted to particular rendering effects. We
present RenderNet, a differentiable rendering convolutional network with a
novel projection unit that can render 2D images from 3D shapes. Spatial
occlusion and shading calculation are automatically encoded in the network. Our
experiments show that RenderNet can successfully learn to implement different
shaders, and can be used in inverse rendering tasks to estimate shape, pose,
lighting and texture from a single image. Comment: 14 pages, 9 figures
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
With the power of convolutional neural networks (CNNs), CNN-based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large amount of labeled data. State-of-the-art methods synthesize such
data using a coarse morphable face model, which, however, has difficulty
generating detailed photo-realistic images of faces (with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstructions with much less computation time than the
state of the art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
InverseFaceNet: Deep Monocular Inverse Face Rendering
We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. By estimating all parameters from just a single image, advanced editing possibilities on a single face image, such as appearance editing and relighting, become feasible in real time. Most previous learning-based face reconstruction approaches do not jointly recover all dimensions, or are severely limited in terms of visual quality. In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus. Our approach builds on a novel loss function that measures model-space similarity directly in parameter space and significantly improves reconstruction accuracy. We further propose a self-supervised bootstrapping process in the network training loop, which iteratively updates the synthetic training corpus to better reflect the distribution of real-world imagery. We demonstrate that this strategy outperforms completely synthetically trained networks. Finally, we show high-quality reconstructions and compare our approach to several state-of-the-art approaches.