3D-Aware Scene Manipulation via Inverse Graphics
We aim to obtain an interpretable, expressive, and disentangled scene
representation that contains comprehensive structural and textural information
for each object. Previous scene representations learned by neural networks are
often uninterpretable, limited to a single object, or lacking 3D knowledge. In
this work, we propose 3D scene de-rendering networks (3D-SDN) to address the
above issues by integrating disentangled representations for semantics,
geometry, and appearance into a deep generative model. Our scene encoder
performs inverse graphics, translating a scene into a structured object-wise
representation. Our decoder has two components: a differentiable shape renderer
and a neural texture generator. The disentanglement of semantics, geometry, and
appearance supports 3D-aware scene manipulation, e.g., rotating and moving
objects freely while keeping their shape and texture consistent, or changing an
object's appearance without affecting its shape. Experiments demonstrate that our
editing scheme based on 3D-SDN is superior to its 2D counterpart.
Comment: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website: http://3dsdn.csail.mit.edu
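The abstract describes an object-wise code in which semantics, geometry, and appearance are kept separate, so that an edit can change one factor without touching the others. The sketch below is a hypothetical illustration of that idea, not the 3D-SDN implementation (see the linked repository for that): the field names `label`, `shape`, `yaw`, `translation`, and `texture`, and the single-angle pose, are assumptions.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class ObjectCode:
    """Hypothetical object-wise code: semantics, geometry, appearance kept apart."""
    label: str               # semantics: object class, e.g. "car"
    shape: np.ndarray        # geometry: shape latent (assumed representation)
    yaw: float               # geometry: rotation about the vertical axis, radians
    translation: np.ndarray  # geometry: 3D position (x, y, z)
    texture: np.ndarray      # appearance: texture latent (assumed representation)


def rotate_and_move(obj: ObjectCode, d_yaw: float, offset) -> ObjectCode:
    """3D-aware edit: change only the pose; shape and texture are reused as-is."""
    return replace(
        obj,
        yaw=(obj.yaw + d_yaw) % (2 * np.pi),
        translation=obj.translation + np.asarray(offset, dtype=float),
    )


def restyle(obj: ObjectCode, new_texture) -> ObjectCode:
    """Appearance edit: swap the texture code without touching geometry."""
    return replace(obj, texture=np.asarray(new_texture, dtype=float))


car = ObjectCode("car", np.zeros(8), 0.0, np.array([1.0, 0.0, 5.0]), np.ones(4))
moved = rotate_and_move(car, np.pi / 2, [0.5, 0.0, 0.0])
# moved.translation is [1.5, 0.0, 5.0]; moved.shape and moved.texture are car's own arrays
```

Because each edit returns a new record that reuses the untouched fields, a decoder fed `moved` would see exactly the same shape and texture codes as for `car`, which is the property the abstract relies on for consistent rotation and translation.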
Enhanced Perspective Generation by Consensus of NeX neural models
Neural rendering is a relatively new field of research that aims to produce high-quality perspectives of a 3D scene from a reduced set of sample images. This is done with the help of deep artificial neural networks that model the geometry and color characteristics of the scene. The NeX model relies on neural basis expansion to yield accurate results with a lower computational load than the earlier NeRF model. In this work, a procedure is proposed to further enhance the quality of the perspectives generated by NeX. Our proposal is based on combining the outputs of several NeX models through a consensus mechanism. The approach is compared to the original NeX on a wide range of scenes. We find that our method significantly outperforms the original procedure in both quantitative and qualitative terms.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
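The abstract does not specify how the consensus is computed. One simple possibility, assumed here purely for illustration, is to render the same target view with K independently trained models and fuse the images pixel by pixel. `consensus_render` is a hypothetical helper, and the median/mean choice is an assumption, not the authors' mechanism.

```python
import numpy as np


def consensus_render(renders, mode="median"):
    """Fuse K renderings of the same viewpoint into one image (hypothetical helper).

    renders: array-like of shape (K, H, W, C), one image per independently trained model.
    mode: "median" (robust to a single outlier model) or "mean".
    """
    stack = np.asarray(renders, dtype=float)
    if stack.ndim != 4:
        raise ValueError("expected an array of shape (K, H, W, C)")
    if mode == "median":
        # Per-pixel, per-channel median across the K models.
        return np.median(stack, axis=0)
    if mode == "mean":
        # Per-pixel, per-channel average across the K models.
        return stack.mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")
```

With three models where one produces an artifact at a pixel, the median keeps the value the other two agree on, whereas the mean is pulled toward the outlier; this robustness is one reason a median is a natural first choice for such a sketch.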
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention, and several methods have been proposed to
learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over the state of the art on the
IntPhys intuitive physics benchmark. We apply our method to a second dataset
with increasing levels of occlusion, showing that it realistically predicts
segmentation masks up to 30 frames into the future. Finally, we also show results
on predicting the motion of objects in real videos.
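The compositional renderer mentioned above maps object-space state to pixel-space masks, and occlusion is what makes some objects invisible in those masks. The sketch below is a hand-written stand-in, not the paper's renderer: the spherical objects, orthographic projection, and the `render_masks` helper are assumptions. It composes objects with a depth buffer so that a fully hidden object contributes no pixels even though its 3D position is still well defined, which is why the formulation treats positions as latent variables.

```python
import numpy as np


def render_masks(positions, radii, height, width):
    """Compose per-object projections into one label map, respecting occlusion.

    positions: (N, 3) array of (x, y, depth) in pixel units; smaller depth = closer.
    radii: (N,) projected disk radii in pixels (objects assumed to be spheres,
           viewed orthographically).
    Returns an (height, width) int array: -1 for background, otherwise the index
    of the closest object covering that pixel.
    """
    positions = np.asarray(positions, dtype=float)
    radii = np.asarray(radii, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]           # pixel row (y) and column (x) grids
    labels = np.full((height, width), -1, dtype=int)
    zbuf = np.full((height, width), np.inf)         # depth of the closest object so far
    for i, ((x, y, depth), r) in enumerate(zip(positions, radii)):
        covered = (xs - x) ** 2 + (ys - y) ** 2 <= r ** 2
        closer = covered & (depth < zbuf)           # only overwrite pixels this object wins
        labels[closer] = i
        zbuf[closer] = depth
    return labels
```

For example, placing a small sphere directly behind a larger one (same x, y, greater depth) yields a label map in which the small sphere has no visible pixels; a model that only tracked visible masks would lose it, while a latent position can carry it through the occlusion.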
- …