DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding
While deep neural networks have led to human-level performance on computer
vision tasks, they have yet to demonstrate similar gains for holistic scene
understanding. In particular, 3D context has been shown to be an extremely
important cue for scene understanding - yet very little research has been done
on integrating context information with deep models. This paper presents an
approach to embed 3D context into the topology of a neural network trained to
perform holistic scene understanding. Given a depth image depicting a 3D scene,
our network aligns the observed scene with a predefined 3D scene template, and
then reasons about the existence and location of each object within the scene
template. In doing so, our model recognizes multiple objects in a single
forward pass of a 3D convolutional neural network, capturing both global scene
and local object information simultaneously. To create training data for this
3D network, we generate partly hallucinated depth images which are rendered by
replacing real objects with a repository of CAD models of the same object
category. Extensive experiments demonstrate the effectiveness of our algorithm
compared to state-of-the-art methods. Source code and data are available at
http://deepcontext.cs.princeton.edu.
Comment: Accepted by ICCV201
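The abstract's core mechanism is a single forward pass of a 3D convolutional network over a voxelized depth scene. The paper's architecture is not reproduced here; the snippet below is only a minimal sketch of the underlying operation, a 3D convolution over an occupancy grid, with all sizes, names, and data purely illustrative:

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive valid-mode 3D convolution over a voxel grid."""
    kd, kh, kw = kernel.shape
    out_shape = tuple(v - k + 1 for v, k in zip(volume.shape, kernel.shape))
    out = np.zeros(out_shape)
    for z in range(out_shape[0]):
        for y in range(out_shape[1]):
            for x in range(out_shape[2]):
                out[z, y, x] = np.sum(volume[z:z+kd, y:y+kh, x:x+kw] * kernel)
    return out

# Toy stand-in for a voxelized depth scene: a sparse 32^3 occupancy grid.
# One convolution pass yields a feature volume from which both global scene
# structure and local per-object responses could in principle be read off.
rng = np.random.default_rng(0)
voxels = (rng.random((32, 32, 32)) > 0.9).astype(float)  # ~10% occupied
kernel = np.ones((3, 3, 3)) / 27.0                       # local occupancy average
features = conv3d(voxels, kernel)
print(features.shape)  # (30, 30, 30)
```

In the actual paper, learned kernels and further layers replace this hand-set averaging filter, and the network heads predict object existence and location per template slot.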
CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, which we then apply to real-world images. To that end,
we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully-rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through
Physically-Based Rendering' published in ECCV, 201
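The standard model behind intrinsic image decomposition factors each pixel into reflectance (albedo) times shading; taking logarithms turns the product into a sum, which is how such decompositions are commonly posed. A minimal numeric illustration (the arrays are toy stand-ins, not CGIntrinsics data):

```python
import numpy as np

# Intrinsic image model: image = reflectance * shading, per pixel.
reflectance = np.array([[0.8, 0.2], [0.5, 0.5]])  # illustrative albedo
shading     = np.array([[1.0, 1.0], [0.3, 0.9]])  # illustrative shading
image       = reflectance * shading

# Log-domain identity: log I = log R + log S, so the pixelwise product
# becomes an additive constraint suitable for regression losses.
log_err = np.abs(np.log(image) - (np.log(reflectance) + np.log(shading)))
print(log_err.max() < 1e-12)  # True
```

Ground-truth decompositions from rendered scenes supply exactly these R and S factors per pixel, which is what makes synthetic data attractive for this task.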
Physical Primitive Decomposition
Objects are made of parts, each with distinct geometry, physics,
functionality, and affordances. Developing such a distributed, physical,
interpretable representation of objects will facilitate intelligent agents to
better explore and interact with the world. In this paper, we study physical
primitive decomposition---understanding an object through its components, each
with physical and geometric attributes. As annotated data for object parts and
physics are rare, we propose a novel formulation that learns physical
primitives by explaining both an object's appearance and its behaviors in
physical events. Our model performs well on block towers and tools in both
synthetic and real scenarios; we also demonstrate that visual and physical
observations often provide complementary signals. We further present ablation
and behavioral studies to better understand our model and contrast it with
human performance.
Comment: ECCV 2018. Project page: http://ppd.csail.mit.edu
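The abstract's central idea is representing an object as parts that each carry both geometric and physical attributes, so that physical behavior follows from the parts. A toy sketch of such a representation (this is an illustrative data structure, not the paper's learned model; all sizes and densities are made up):

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    """One physical primitive: geometric extent plus a material property."""
    size: tuple      # (x, y, z) box extent in meters
    density: float   # material density in kg / m^3

    @property
    def mass(self):
        # Physical attribute derived from geometry + material.
        x, y, z = self.size
        return x * y * z * self.density

# A toy tool decomposed into two primitives: wooden handle + steel head.
handle = Primitive(size=(0.02, 0.02, 0.30), density=700.0)   # wood
head   = Primitive(size=(0.03, 0.03, 0.10), density=7850.0)  # steel
total_mass = handle.mass + head.mass
print(total_mass)
```

The point of the decomposition is exactly this compositionality: appearance constrains the geometry of each part, while observed behavior in physical events (falls, collisions) constrains properties like density that appearance alone cannot determine.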