12,110 research outputs found
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to
this paper. Project page: http://genre.csail.mit.edu
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
It is common to implicitly assume access to intelligently captured inputs
(e.g., photos from a human photographer), yet autonomously capturing good
observations is itself a major challenge. We address the problem of learning to
look around: if a visual agent has the ability to voluntarily acquire new views
to observe its environment, how can it learn efficient exploratory behaviors to
acquire informative observations? We propose a reinforcement learning solution,
where the agent is rewarded for actions that reduce its uncertainty about the
unobserved portions of its environment. Based on this principle, we develop a
recurrent neural network-based approach to perform active completion of
panoramic natural scenes and 3D object shapes. Crucially, the learned policies
are not tied to any recognition task nor to the particular semantic content
seen during training. As a result, 1) the learned "look around" behavior is
relevant even for new tasks in unseen environments, and 2) training data
acquisition involves no manual labeling. Through tests in diverse settings, we
demonstrate that our approach learns useful generic policies that transfer to
new unseen tasks and environments. Completion episodes are shown at
https://goo.gl/BgWX3W
Self-Supervised Intrinsic Image Decomposition
Intrinsic decomposition from a single image is a highly challenging task, due
to its inherent ambiguity and the scarcity of training data. In contrast to
traditional fully supervised learning approaches, in this paper we propose
learning intrinsic image decomposition by explaining the input image. Our
model, the Rendered Intrinsics Network (RIN), joins together an image
decomposition pipeline, which predicts reflectance, shape, and lighting
conditions given a single image, with a recombination function, a learned
shading model used to recompose the original input based off of intrinsic image
predictions. Our network can then use unsupervised reconstruction error as an
additional signal to improve its intermediate representations. This allows
large-scale unlabeled data to be useful during training, and also enables
transferring learned knowledge to images of unseen object categories, lighting
conditions, and shapes. Extensive experiments demonstrate that our method
performs well on both intrinsic image decomposition and knowledge transfer.Comment: NIPS 2017 camera-ready version, project page:
http://rin.csail.mit.edu
Dense 3D Object Reconstruction from a Single Depth View
In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs
the complete 3D structure of a given object from a single arbitrary depth view
using generative adversarial networks. Unlike existing work which typically
requires multiple views of the same object or class labels to recover the full
3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation
of a depth view of the object as input, and is able to generate the complete 3D
occupancy grid with a high resolution of 256^3 by recovering the
occluded/missing regions. The key idea is to combine the generative
capabilities of autoencoders and the conditional Generative Adversarial
Networks (GAN) framework, to infer accurate and fine-grained 3D structures of
objects in high-dimensional voxel space. Extensive experiments on large
synthetic datasets and real-world Kinect datasets show that the proposed
3D-RecGAN++ significantly outperforms the state of the art in single view 3D
object reconstruction, and is able to reconstruct unseen types of objects.Comment: TPAMI 2018. Code and data are available at:
https://github.com/Yang7879/3D-RecGAN-extended. This article extends from
arXiv:1708.0796
- …