Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. However, there is another level
of ambiguity that is often overlooked: among plausible shapes, multiple
candidates may still fit the 2D image equally well; i.e., the ground-truth
shape is non-deterministic given a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus overcomes both levels of ambiguity described
above. Experiments demonstrate that ShapeHD outperforms the state of the art
by a large margin in both shape completion and shape reconstruction on
multiple real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
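The "penalize only if unrealistic" idea can be illustrated with a minimal numpy sketch. This is not the authors' implementation: `toy_disc`, the MSE reconstruction term, and the `threshold` margin are all hypothetical stand-ins for ShapeHD's learned discriminator and voxel losses.

```python
import numpy as np

def reconstruction_loss(pred, target):
    # Standard supervised term (MSE here for brevity; ShapeHD's actual
    # losses operate on voxels and learned shape descriptors).
    return float(np.mean((pred - target) ** 2))

def naturalness_penalty(pred, discriminator, threshold=0.8):
    # Adversarially learned prior: the discriminator scores how realistic
    # a shape looks. The penalty is zero whenever the score clears the
    # threshold, so the model is punished only for unrealistic outputs,
    # never merely for deviating from the (ambiguous) ground truth.
    score = discriminator(pred)
    return max(0.0, threshold - score)

def total_loss(pred, target, discriminator, w=0.1):
    return reconstruction_loss(pred, target) + w * naturalness_penalty(pred, discriminator)

# Toy discriminator: calls a voxel grid "realistic" if it is near-binary,
# i.e., it penalizes the blurry mean shapes described above.
toy_disc = lambda v: 1.0 - float(np.mean(np.minimum(v, 1 - v)))

pred = np.full((8, 8, 8), 0.5)                       # blurry "mean shape"
target = (np.random.rand(8, 8, 8) > 0.5).astype(float)
loss = total_loss(pred, target, toy_disc)
```

Under this toy discriminator, a crisp binary shape incurs no penalty regardless of how far it is from the target, while a uniformly "blurry" grid does.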
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task both
for indoor and outdoor robot navigation. In this work we address unsupervised
learning of scene depth and robot ego-motion where supervision is provided by
monocular videos, as cameras are the cheapest, least restrictive and most
ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach which produces
higher-quality results, is able to model moving objects, and is shown to
transfer across data domains, e.g., from outdoor to indoor scenes. The main
idea is to introduce geometric structure into the learning process by modeling
the scene and the individual objects; camera ego-motion and object motions are
learned from monocular videos as input. Furthermore, an online refinement
method is introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion, e.g., through learned flow. Our results are
comparable in quality to those that use stereo as supervision, and
significantly improve depth prediction on scenes and datasets that contain
substantial object motion. The approach is of practical relevance, as it
allows transfer across environments, by transferring models trained on data
collected for robot navigation in urban scenes to indoor navigation settings.
The code associated with this paper can be found at
https://sites.google.com/view/struct2depth.
Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)
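The supervision signal in this line of work is view synthesis: warp one video frame toward another using the predicted depth and ego-motion, and score the photometric error. The 1-D numpy sketch below illustrates the principle only; the `disparity` scalar is a hypothetical stand-in for the per-pixel depth and camera/object motion the paper actually predicts.

```python
import numpy as np

def photometric_loss(target, source, disparity):
    # View-synthesis supervision: warp the source frame toward the target
    # using a (here, constant 1-D) disparity, then compare photometrically.
    # Low error means the predicted geometry and motion are consistent
    # with what the camera actually saw.
    h = len(target)
    coords = np.arange(h) + disparity
    coords = np.clip(np.round(coords).astype(int), 0, h - 1)
    warped = source[coords]
    return float(np.mean(np.abs(warped - target)))

# Toy example: the source is the target shifted right by 2 pixels, so a
# disparity of +2 should reconstruct it (up to border effects) far better
# than a disparity of 0.
target = np.sin(np.linspace(0, 3, 64))
source = np.roll(target, 2)
good = photometric_loss(target, source, 2.0)
bad = photometric_loss(target, source, 0.0)
```

In the paper's setting, the same error also drives the online refinement step: it can be minimized at test time on the incoming video to adapt the model to an unknown domain.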
Weakly supervised 3D Reconstruction with Adversarial Constraint
Supervised 3D reconstruction has witnessed significant progress through the
use of deep neural networks. However, this increase in performance requires
large-scale annotations of 2D/3D data. In this paper, we explore inexpensive
2D supervision as an alternative to expensive 3D CAD annotation. Specifically,
we use foreground masks as weak supervision through a raytrace pooling layer
that enables perspective projection and backpropagation. Additionally, since
3D reconstruction from masks is an ill-posed problem, we propose to constrain
the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that
match mask observations. We demonstrate that learning a log-barrier solution
to this constrained optimization problem resembles the GAN objective, enabling
the use of existing tools for training GANs. We evaluate and analyze the
manifold-constrained reconstruction on various datasets for single- and
multi-view reconstruction of both synthetic and real images.
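The connection between the log-barrier formulation and the GAN objective can be sketched in a few lines. This is an illustrative reading of the abstract, not the paper's code: `realism_score` stands in for a learned discriminator's output on the predicted shape, and `mu` is a hypothetical barrier weight.

```python
import numpy as np

def silhouette_loss(pred_mask, gt_mask):
    # Weak 2D supervision: compare the projected silhouette of the
    # predicted shape against the observed foreground mask.
    return float(np.mean((pred_mask - gt_mask) ** 2))

def log_barrier_objective(pred_mask, gt_mask, realism_score, mu=0.01, eps=1e-8):
    # Constraint "the shape lies on the realistic manifold" (realism
    # score bounded away from zero) enforced via a log barrier. The
    # barrier term -log D(x) has the same form as the non-saturating
    # GAN generator loss, which is why standard GAN tooling applies.
    return silhouette_loss(pred_mask, gt_mask) - mu * np.log(realism_score + eps)

pred_mask = np.random.rand(32, 32)
gt_mask = (np.random.rand(32, 32) > 0.5).astype(float)
```

The barrier blows up as the shape leaves the realistic manifold (score near zero) and contributes almost nothing when the shape is judged realistic.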
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
It is common to implicitly assume access to intelligently captured inputs
(e.g., photos from a human photographer), yet autonomously capturing good
observations is itself a major challenge. We address the problem of learning to
look around: if a visual agent has the ability to voluntarily acquire new views
to observe its environment, how can it learn efficient exploratory behaviors to
acquire informative observations? We propose a reinforcement learning solution,
where the agent is rewarded for actions that reduce its uncertainty about the
unobserved portions of its environment. Based on this principle, we develop a
recurrent neural network-based approach to perform active completion of
panoramic natural scenes and 3D object shapes. Crucially, the learned policies
are not tied to any recognition task nor to the particular semantic content
seen during training. As a result, 1) the learned "look around" behavior is
relevant even for new tasks in unseen environments, and 2) training data
acquisition involves no manual labeling. Through tests in diverse settings, we
demonstrate that our approach learns useful generic policies that transfer to
new unseen tasks and environments. Completion episodes are shown at
https://goo.gl/BgWX3W
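The uncertainty-reduction reward described above can be sketched minimally in numpy. This is a simplification for illustration, not the paper's formulation: here the agent's belief is a Bernoulli occupancy probability per cell, and the reward for a view is the drop in total belief entropy it produces.

```python
import numpy as np

def belief_entropy(p):
    # Bernoulli entropy of the agent's per-cell beliefs about the
    # still-unobserved environment (p = probability a cell is occupied).
    p = np.clip(p, 1e-8, 1 - 1e-8)
    return float(np.sum(-p * np.log(p) - (1 - p) * np.log(1 - p)))

def look_around_reward(belief_before, belief_after):
    # Task-agnostic reward: how much did the chosen view reduce the
    # agent's uncertainty about unobserved portions of the scene?
    # No labels or recognition task enter the signal.
    return belief_entropy(belief_before) - belief_entropy(belief_after)

# A view that resolves cells (pushes beliefs toward 0 or 1) earns positive
# reward; a view that changes nothing earns zero.
before = np.full(100, 0.5)        # maximal uncertainty everywhere
after = before.copy()
after[:30] = 0.95                 # 30 cells mostly resolved by the new view
```

Because the reward depends only on the agent's own beliefs, no manual labeling is needed and the learned policy is not tied to any downstream recognition task, which is what makes the behavior transferable.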