Self-Supervised Intrinsic Image Decomposition
Intrinsic decomposition from a single image is a highly challenging task, due
to its inherent ambiguity and the scarcity of training data. In contrast to
traditional fully supervised learning approaches, in this paper we propose
learning intrinsic image decomposition by explaining the input image. Our
model, the Rendered Intrinsics Network (RIN), joins together an image
decomposition pipeline, which predicts reflectance, shape, and lighting
conditions given a single image, with a recombination function, a learned
shading model used to recompose the original input from the intrinsic image
predictions. Our network can then use unsupervised reconstruction error as an
additional signal to improve its intermediate representations. This allows
large-scale unlabeled data to be useful during training, and also enables
transferring learned knowledge to images of unseen object categories, lighting
conditions, and shapes. Extensive experiments demonstrate that our method
performs well on both intrinsic image decomposition and knowledge transfer.
Comment: NIPS 2017 camera-ready version, project page: http://rin.csail.mit.edu
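The reconstruction signal described in this abstract can be illustrated with a minimal sketch. It assumes the classic pixel-wise model (image = reflectance x shading) in place of the learned shading model the abstract mentions, so it is a simplification and not the RIN architecture. The names `recombine` and `reconstruction_error` and the toy 2x2 images are hypothetical.

```python
def recombine(reflectance, shading):
    """Pixel-wise product I = R * S (assumed stand-in for a learned shader)."""
    return [[r * s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reflectance, shading)]


def reconstruction_error(image, reflectance, shading):
    """Mean squared error between the input image and its recombination."""
    recon = recombine(reflectance, shading)
    diffs = [(p - q) ** 2
             for p_row, q_row in zip(image, recon)
             for p, q in zip(p_row, q_row)]
    return sum(diffs) / len(diffs)


# Toy grayscale data, invented for illustration only.
image = [[0.2, 0.4], [0.6, 0.8]]
good_reflectance = [[0.4, 0.8], [0.6, 0.8]]
good_shading = [[0.5, 0.5], [1.0, 1.0]]
flat_shading = [[1.0, 1.0], [1.0, 1.0]]

good_error = reconstruction_error(image, good_reflectance, good_shading)
bad_error = reconstruction_error(image, good_reflectance, flat_shading)
```

A decomposition that explains the input yields a near-zero error, while an inconsistent one is penalized. No ground-truth reflectance or shading is needed to compute this quantity, which is why unlabeled images can contribute to training.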
LightDepth: Single-View Depth Self-Supervision from Illumination Decline
Single-view depth estimation can be remarkably effective if there is enough
ground-truth depth data for supervised training. However, there are scenarios,
especially in medicine in the case of endoscopies, where such data cannot be
obtained. In such cases, multi-view self-supervision and synthetic-to-real
transfer serve as alternative approaches, but with a considerable
performance reduction compared to the supervised case. Instead, we propose a
single-view self-supervised method that achieves a performance similar to the
supervised case. In some medical devices, such as endoscopes, the camera and
light sources are co-located at a small distance from the target surfaces.
Thus, we can exploit that, for any given albedo and surface orientation, pixel
brightness is inversely proportional to the square of the distance to the
surface, providing a strong single-view self-supervisory signal. In our
experiments, our self-supervised models deliver accuracies comparable to those
of fully supervised ones, while being applicable without depth ground-truth
data.
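The inverse-square relation stated in this abstract can be sketched as follows. The sketch assumes a single co-located point light, a Lambertian surface, and a known scalar `gain`. These are illustrative assumptions, and the function names are hypothetical, not taken from the paper.

```python
import math


def render_brightness(depth, albedo, cos_theta, gain=1.0):
    """Brightness under a co-located light: proportional to 1 / depth**2."""
    return gain * albedo * cos_theta / depth ** 2


def depth_from_brightness(brightness, albedo, cos_theta, gain=1.0):
    """Invert the model above to recover the distance to the surface."""
    return math.sqrt(gain * albedo * cos_theta / brightness)


def photometric_loss(observed, predicted_depth, albedo, cos_theta, gain=1.0):
    """Absolute difference between observed and re-rendered brightness."""
    rendered = render_brightness(predicted_depth, albedo, cos_theta, gain)
    return abs(observed - rendered)


# Toy values, invented for illustration only.
observed = render_brightness(depth=3.0, albedo=0.8, cos_theta=0.9)
recovered_depth = depth_from_brightness(observed, albedo=0.8, cos_theta=0.9)
loss_correct = photometric_loss(observed, 3.0, albedo=0.8, cos_theta=0.9)
loss_wrong = photometric_loss(observed, 4.0, albedo=0.8, cos_theta=0.9)
```

Doubling the distance quarters the rendered brightness, so a wrong depth prediction produces a measurable photometric error from a single view.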
Single-image RGB Photometric Stereo With Spatially-varying Albedo
We present a single-shot system to recover surface geometry of objects with
spatially-varying albedos, from images captured under a calibrated RGB
photometric stereo setup, with three light directions multiplexed across
different color channels in the observed RGB image. Since the problem is
ill-posed point-wise, we assume that the albedo map can be modeled as
piece-wise constant with a restricted number of distinct albedo values. We show
that under ideal conditions, the shape of a non-degenerate local constant
albedo surface patch can theoretically be recovered exactly. Moreover, we
present a practical and efficient algorithm that uses this model to robustly
recover shape from real images. Our method first reasons about shape locally in
a dense set of patches in the observed image, producing shape distributions for
every patch. These local distributions are then combined to produce a single
consistent surface normal map. We demonstrate the efficacy of the approach
through experiments on both synthetic renderings and real captured
images.
Comment: 3DV 2016. Project page at http://www.ttic.edu/chakrabarti/rgbps
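To illustrate why a known, constant albedo makes the per-pixel problem solvable, here is a minimal sketch. It assumes a Lambertian surface, one calibrated light direction per color channel, no shadows, and a known per-channel albedo. It is a simplified single-pixel illustration, not the patch-based algorithm the abstract describes, and all names and values are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (coplanar)")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def normal_from_rgb(rgb, lights, albedo):
    """Recover a unit normal from rgb[c] = albedo[c] * dot(lights[c], n)."""
    rhs = [rgb[c] / albedo[c] for c in range(3)]
    n = solve3(lights, rhs)
    length = math.sqrt(sum(x * x for x in n))
    return [x / length for x in n]


# Toy setup, invented for illustration: one light direction per channel.
lights = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [-1.0, -1.0, 1.0]]
albedo = [0.9, 0.5, 0.7]
true_normal = [0.0, 0.6, 0.8]
rgb = [albedo[c] * sum(l * n for l, n in zip(lights[c], true_normal))
       for c in range(3)]
estimated_normal = normal_from_rgb(rgb, lights, albedo)
```

When the albedo is unknown and varies per pixel, the three channel values no longer determine the normal uniquely, which is the ill-posedness the abstract refers to.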
Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation
This paper addresses the task of designing a modular neural network
architecture that jointly solves different tasks. As an example we use the
tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between
depth and semantic prediction maps on their joint refinement. While most
previous works solely focus on measuring improvements in accuracy, we propose a
way to quantify the cross-modality influence. We show that there is a
relationship between final accuracy and cross-modality influence, although not
a simple linear one. Hence a larger cross-modality influence does not
necessarily translate into an improved accuracy. We find that a beneficial
balance between the cross-modality influences can be achieved by network
architecture and conjecture that this relationship can be utilized to
understand different network design choices. Towards this end we propose a
Convolutional Neural Network (CNN) architecture that fuses the
state-of-the-art results for depth estimation and semantic labeling. By
balancing the cross-modality influences between depth and semantic prediction,
we achieve improved results for both tasks using the NYU-Depth v2 benchmark.
Comment: Accepted to ICRA 201
CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, then applying these learned models to real-world images. To that end,
we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully-rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through
Physically-Based Rendering' published in ECCV, 201