Search CORE

24,192 research outputs found

3D-Aware Scene Manipulation via Inverse Graphics

Authors: Freeman William T.
Hsu Tzu Ming Harry
Tenenbaum Joshua B.
Torralba Antonio
Wu Jiajun
Yao Shunyu
Zhu Jun-Yan
Publication date: 18/12/2018

We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address these issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model. Our scene encoder performs inverse graphics, translating a scene into a structured object-wise representation. Our decoder has two components: a differentiable shape renderer and a neural texture generator. The disentanglement of semantics, geometry, and appearance supports 3D-aware scene manipulation, e.g., rotating and moving objects freely while keeping their shape and texture consistent, and changing an object's appearance without affecting its shape. Experiments demonstrate that our editing scheme based on 3D-SDN is superior to its 2D counterpart.

Comment: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website: http://3dsdn.csail.mit.edu

arXiv.org e-Print Archive

DSpace@MIT

Enhanced Perspective Generation by Consensus of NeX neural models

Authors: Domínguez-Merino Enrique
Fernández Rodríguez José David
López-Rubio Ezequiel
Ortiz-de-lazcano-Lobato Juan Miguel
Pacheco dos Santos Lima Junior Marcos Sergio
Publication date: 01/07/2022

Neural rendering is a relatively new field of research that aims to produce high-quality perspectives of a 3D scene from a reduced set of sample images. This is done with the help of deep artificial neural networks that model the geometry and color characteristics of the scene. The NeX model relies on neural basis expansion to yield accurate results with a lower computational load than the earlier NeRF model. In this work, a procedure is proposed to further enhance the quality of the perspectives generated by NeX. Our proposal combines the outputs of several NeX models through a consensus mechanism. The approach is compared to the original NeX on a wide range of scenes. We find that our method significantly outperforms the original procedure in both quantitative and qualitative terms.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga
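
The abstract does not specify how the consensus among NeX models is computed. As a minimal sketch, assuming a per-pixel median as the consensus rule (an illustrative choice, not necessarily the authors' mechanism), the renderings that several models produce for the same viewpoint could be fused as shown below. The function name `consensus_median` and the nested-list grayscale image format are hypothetical.

```python
from statistics import median
from typing import List

# Hypothetical image format: a grayscale image stored as rows of pixel intensities.
Image = List[List[float]]


def consensus_median(renders: List[Image]) -> Image:
    """Fuse several renderings of the same view by taking the per-pixel median.

    Assumption: the median is only an illustrative consensus rule; the
    abstract does not state which mechanism the paper actually uses.
    """
    if not renders:
        raise ValueError("need at least one rendering")
    height, width = len(renders[0]), len(renders[0][0])
    # Every rendering must share the same resolution to be combined pixel by pixel.
    for r in renders:
        if len(r) != height or any(len(row) != width for row in r):
            raise ValueError("renderings must have identical shapes")
    # For each pixel position, take the median value across all models' outputs.
    return [
        [median(r[y][x] for r in renders) for x in range(width)]
        for y in range(height)
    ]


# Usage: three hypothetical model outputs for a 1x2 image, one with an outlier pixel.
fused = consensus_median([[[0.2, 0.4]], [[0.25, 0.5]], [[0.9, 0.45]]])
print(fused)  # [[0.25, 0.45]]
```

A median rather than a mean keeps a single model's outlier pixel (0.9 above) from pulling the fused value, which is one plausible reason to prefer a consensus over simple averaging.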

Occlusion resistant learning of intuitive physics from videos

Authors: Dupoux Emmanuel
Laptev Ivan
Riochet Ronan
Sivic Josef
Publication date: 30/04/2020

To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and predict the future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to cases where no occlusions, or only limited ones, occur. In this work, we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables that enable the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, which models the physics in object space, and a compositional renderer, which models how objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server