13,658 research outputs found
DIP: Differentiable Interreflection-aware Physics-based Inverse Rendering
We present a physics-based inverse rendering method that learns the
illumination, geometry, and materials of a scene from posed multi-view RGB
images. To model the illumination of a scene, existing inverse rendering works
either ignore indirect illumination entirely or model it with coarse
approximations, leading to sub-optimal illumination, geometry, and material
predictions. In this work, we propose a physics-based illumination model that
explicitly traces the incoming indirect light at each surface point via
interreflection and then estimates each identified indirect light with an
efficient neural network. Furthermore, we apply the Leibniz integral rule to
resolve the non-differentiability in the proposed illumination model caused by
one type of environment light, the tangent lights. As a result, the proposed
interreflection-aware illumination model can be learned end-to-end together
with geometry and material estimation. As a by-product, our physics-based
inverse rendering model also enables flexible and realistic material editing
as well as relighting. Extensive experiments on both synthetic and real-world
datasets demonstrate that the proposed method performs favorably against
existing inverse rendering methods on novel view synthesis and inverse
rendering.
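The abstract describes the illumination model only at a high level. Below is a minimal PyTorch sketch of the core idea as stated: trace cosine-sampled hemisphere directions from a surface point and, where a ray re-hits the scene, query a small network for the radiance leaving that hit point back toward the shading point. The names (IndirectRadianceNet, trace_fn) and the one-bounce, diffuse-only estimator are our illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class IndirectRadianceNet(nn.Module):
    """Hypothetical stand-in for the abstract's 'efficient neural
    network': predicts outgoing RGB radiance at surface point x
    along direction d."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Softplus(),  # radiance is non-negative
        )

    def forward(self, x, d):
        return self.mlp(torch.cat([x, d], dim=-1))

def cosine_sample_hemisphere(normal, n):
    """Cosine-weighted directions about a unit normal, shape (n, 3)."""
    u1, u2 = torch.rand(n), torch.rand(n)
    r, phi = u1.sqrt(), 2.0 * torch.pi * u2
    local = torch.stack([r * phi.cos(), r * phi.sin(), (1.0 - u1).sqrt()], -1)
    helper = (torch.tensor([1.0, 0.0, 0.0]) if normal[2].abs() < 0.9
              else torch.tensor([0.0, 1.0, 0.0]))
    t = torch.linalg.cross(helper, normal)
    t = t / t.norm()
    b = torch.linalg.cross(normal, t)
    return local @ torch.stack([t, b, normal])  # rows are the frame axes

def indirect_diffuse(x, normal, albedo, radiance_net, trace_fn, n_samples=16):
    """One-bounce interreflection estimate of the diffuse indirect term.

    `trace_fn(origins, dirs) -> (hit_points, hit_mask)` is an assumed
    ray tracer; rays that escape the scene carry no interreflected light.
    """
    dirs = cosine_sample_hemisphere(normal, n_samples)   # (n, 3)
    origins = x.expand(n_samples, 3) + 1e-4 * normal     # offset avoids self-hits
    hits, mask = trace_fn(origins, dirs)
    li = torch.zeros(n_samples, 3)
    if mask.any():
        # Radiance leaving each hit point back toward x is the indirect
        # light arriving at x along the corresponding sampled direction.
        li[mask] = radiance_net(hits[mask], -dirs[mask])
    # With cosine-weighted sampling, the diffuse rendering-equation
    # integral (albedo/pi) * E[L_i * cos / pdf] reduces to albedo * mean(L_i).
    return albedo * li.mean(dim=0)
```

The cosine-weighted sampler keeps the estimator simple: the cosine and pdf factors cancel, leaving just the albedo times the mean of the sampled incoming radiance.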
3D-Aware Scene Manipulation via Inverse Graphics
We aim to obtain an interpretable, expressive, and disentangled scene
representation that contains comprehensive structural and textural information
for each object. Previous scene representations learned by neural networks are
often uninterpretable, limited to a single object, or lacking 3D knowledge. In
this work, we propose 3D scene de-rendering networks (3D-SDN) to address the
above issues by integrating disentangled representations for semantics,
geometry, and appearance into a deep generative model. Our scene encoder
performs inverse graphics, translating a scene into a structured object-wise
representation. Our decoder has two components: a differentiable shape renderer
and a neural texture generator. The disentanglement of semantics, geometry, and
appearance supports 3D-aware scene manipulation, e.g., rotating and moving
objects freely while keeping shape and texture consistent, and changing an
object's appearance without affecting its shape. Experiments demonstrate that
our editing scheme based on 3D-SDN is superior to its 2D counterpart.
Comment: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website:
http://3dsdn.csail.mit.edu
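As a rough illustration of what an object-wise, disentangled representation buys you, here is a hypothetical Python sketch (the record layout and helper names are our assumptions, not the 3D-SDN code): each object carries separate semantic, geometric, and appearance slots, so a 3D-aware edit touches exactly one slot and re-decoding leaves the others intact.

```python
from dataclasses import dataclass, replace
import numpy as np

@dataclass
class ObjectCode:
    """Hypothetical per-object record in the spirit of a structured,
    object-wise scene representation: one disentangled slot per factor."""
    semantic_class: str       # semantics branch, e.g. 'car'
    pose: np.ndarray          # geometry branch: 4x4 object-to-world transform
    shape_code: np.ndarray    # geometry branch: latent shape code
    texture_code: np.ndarray  # appearance branch: latent texture code

def rotate_object(obj: ObjectCode, yaw_rad: float) -> ObjectCode:
    """3D-aware edit: only the pose changes, so re-decoding keeps the
    object's shape and texture consistent."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    r_yaw = np.eye(4)
    r_yaw[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return replace(obj, pose=r_yaw @ obj.pose)

def recolor_object(obj: ObjectCode, new_texture: np.ndarray) -> ObjectCode:
    """Appearance edit: swap the texture code without touching shape."""
    return replace(obj, texture_code=new_texture)
```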
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, an NN LF will likely have lower quality than a pixel-basis representation of the same size. Second, only little training data is available for sparse LF videos, e.g., 9 exemplars per frame. Third, there is no generalization across LFs, only across view and time. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, an NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.
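To make the "one-off" idea concrete, here is a minimal, hypothetical PyTorch sketch of a per-video coordinate network; the 5-D input layout, layer sizes, and training snippet are our assumptions, and the paper's differentiable occlusion-aware warping step is omitted.

```python
import torch
import torch.nn as nn

class OneOffLFNet(nn.Module):
    """Sketch of a 'one-off' light-field network: a per-video MLP from
    normalized (x, y, u, v, t) coordinates (pixel, view, time) to RGB.
    Sizes are illustrative, not the paper's architecture."""
    def __init__(self, hidden=128, depth=4):
        super().__init__()
        layers = [nn.Linear(5, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers += [nn.Linear(hidden, 3), nn.Sigmoid()]  # RGB in [0, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, coords):  # coords: (batch, 5)
        return self.net(coords)

# Fit to the sparse input views only; novel views and times are then
# rendered by evaluating the net at intermediate continuous coordinates,
# which is where the interpolation property described above comes in.
net = OneOffLFNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
coords = torch.rand(4096, 5) * 2 - 1  # placeholder training coordinates
target = torch.rand(4096, 3)          # placeholder colors from the input views
opt.zero_grad()
loss = nn.functional.mse_loss(net(coords), target)
loss.backward()
opt.step()
```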