157 research outputs found

    Self-Supervised Intrinsic Image Decomposition

    Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data. In contrast to traditional fully supervised learning approaches, in this paper we propose learning intrinsic image decomposition by explaining the input image. Our model, the Rendered Intrinsics Network (RIN), joins together an image decomposition pipeline, which predicts reflectance, shape, and lighting conditions given a single image, with a recombination function, a learned shading model used to recompose the original input from the intrinsic image predictions. Our network can then use unsupervised reconstruction error as an additional signal to improve its intermediate representations. This allows large-scale unlabeled data to be useful during training, and also enables transferring learned knowledge to images of unseen object categories, lighting conditions, and shapes. Extensive experiments demonstrate that our method performs well on both intrinsic image decomposition and knowledge transfer.
    Comment: NIPS 2017 camera-ready version, project page: http://rin.csail.mit.edu
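The reconstruction signal described in this abstract can be illustrated with a small sketch. It is not the RIN model: the paper uses a learned shading model, whereas the code below swaps in a fixed Lambertian shading rule as a stand-in, works on a flat list of grayscale pixels, and uses hypothetical function names (`lambertian_shading`, `recombine`, `reconstruction_error`) chosen only for illustration.

```python
def lambertian_shading(normal, light):
    """Fixed Lambertian stand-in for the learned shading model (assumption)."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


def recombine(reflectance, normals, light):
    """Recompose pixel intensities as reflectance times shading."""
    return [r * lambertian_shading(n, light) for r, n in zip(reflectance, normals)]


def reconstruction_error(image, reflectance, normals, light):
    """Mean squared error between the input image and its recomposition.

    No ground-truth intrinsics are needed: only the input image itself.
    """
    rendered = recombine(reflectance, normals, light)
    return sum((a - b) ** 2 for a, b in zip(image, rendered)) / len(image)


# Hypothetical two-pixel example lit from straight ahead.
light = (0.0, 0.0, 1.0)
normals = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
reflectance = [0.5, 0.9]
image = recombine(reflectance, normals, light)

good = reconstruction_error(image, reflectance, normals, light)
bad = reconstruction_error(image, [0.2, 0.9], normals, light)
```

A correct decomposition reproduces the input and gives zero error, while a wrong reflectance estimate gives a positive error, which is what lets unlabeled images contribute a training signal.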

    LightDepth: Single-View Depth Self-Supervision from Illumination Decline

    Single-view depth estimation can be remarkably effective if there is enough ground-truth depth data for supervised training. However, there are scenarios, especially in medicine in the case of endoscopies, where such data cannot be obtained. In such cases, multi-view self-supervision and synthetic-to-real transfer serve as alternative approaches, but with a considerable performance reduction compared with the supervised case. Instead, we propose a single-view self-supervised method that achieves a performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located at a small distance from the target surfaces. Thus, we can exploit the fact that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, providing a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without depth ground-truth data.
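The inverse-square relation stated in this abstract can be sketched as a simple per-pixel model. This is an illustrative sketch, not the LightDepth implementation: it assumes a point light co-located with the camera, a Lambertian surface, and a single scalar `light_power` constant; all function and parameter names are hypothetical.

```python
import math


def rendered_brightness(depth, albedo, cos_theta, light_power=1.0):
    """Brightness of a surface point lit from the camera position.

    Falls off with the square of the distance to the surface.
    """
    return light_power * albedo * cos_theta / depth ** 2


def depth_from_brightness(brightness, albedo, cos_theta, light_power=1.0):
    """Invert the model above: distance implied by an observed brightness."""
    return math.sqrt(light_power * albedo * cos_theta / brightness)


def photometric_loss(observed, depths, albedos, cos_thetas, light_power=1.0):
    """Mean squared error between observed and re-rendered brightness."""
    rendered = [
        rendered_brightness(d, a, c, light_power)
        for d, a, c in zip(depths, albedos, cos_thetas)
    ]
    return sum((o - r) ** 2 for o, r in zip(observed, rendered)) / len(observed)


# Hypothetical three-pixel example: doubling the depth quarters the brightness.
near = rendered_brightness(2.0, 0.8, 1.0)
far = rendered_brightness(4.0, 0.8, 1.0)

true_depths = [2.0, 3.0, 5.0]
albedos = [0.8, 0.6, 0.9]
cos_thetas = [1.0, 0.9, 0.7]
observed = [rendered_brightness(d, a, c) for d, a, c in zip(true_depths, albedos, cos_thetas)]

loss_true = photometric_loss(observed, true_depths, albedos, cos_thetas)
loss_wrong = photometric_loss(observed, [2.5, 3.0, 5.0], albedos, cos_thetas)
```

Under these assumptions a depth prediction can be scored against the image alone: the loss is zero for the true depths and grows as the predicted depth drifts, with no depth ground truth involved.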

    Single-image RGB Photometric Stereo With Spatially-varying Albedo

    We present a single-shot system to recover surface geometry of objects with spatially-varying albedos, from images captured under a calibrated RGB photometric stereo setup, with three light directions multiplexed across different color channels in the observed RGB image. Since the problem is ill-posed point-wise, we assume that the albedo map can be modeled as piece-wise constant with a restricted number of distinct albedo values. We show that under ideal conditions, the shape of a non-degenerate local constant albedo surface patch can theoretically be recovered exactly. Moreover, we present a practical and efficient algorithm that uses this model to robustly recover shape from real images. Our method first reasons about shape locally in a dense set of patches in the observed image, producing shape distributions for every patch. These local distributions are then combined to produce a single consistent surface normal map. We demonstrate the efficacy of the approach through experiments on both synthetic renderings and real captured images.
    Comment: 3DV 2016. Project page at http://www.ttic.edu/chakrabarti/rgbps
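To make the RGB multiplexing idea concrete, here is a minimal sketch of the simplest case only: one pixel, a Lambertian surface, no cross-talk between color channels, and a per-channel albedo that is already known. That last assumption is exactly what the paper does not have (it infers albedo through the piece-wise constant model), so this is not the authors' algorithm. The helper names and the light directions are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (coplanar)")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def normal_from_rgb(rgb, lights, albedo):
    """Recover a unit normal from one RGB pixel.

    Model per channel c: rgb[c] = albedo[c] * dot(lights[c], normal).
    """
    rhs = [rgb[c] / albedo[c] for c in range(3)]
    n = solve3(lights, rhs)
    length = math.sqrt(sum(x * x for x in n))
    return [x / length for x in n]


# Hypothetical calibrated setup: one light direction per color channel.
lights = [
    [0.5, 0.0, 0.866],
    [-0.25, 0.433, 0.866],
    [-0.25, -0.433, 0.866],
]
albedo = [0.9, 0.5, 0.3]
true_normal = [0.0, 0.6, 0.8]
rgb = [albedo[c] * sum(l * n for l, n in zip(lights[c], true_normal)) for c in range(3)]

recovered = normal_from_rgb(rgb, lights, albedo)
```

With the albedo known, the three channels give three linear equations in the three components of the normal. When the albedo is unknown, each pixel adds unknowns without adding equations, which is why the abstract calls the problem ill-posed point-wise and constrains albedo over patches.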

    Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation

    This paper addresses the task of designing a modular neural network architecture that jointly solves different tasks. As an example we use the tasks of depth estimation and semantic segmentation given a single RGB image. The main focus of this work is to analyze the cross-modality influence between depth and semantic prediction maps on their joint refinement. While most previous works solely focus on measuring improvements in accuracy, we propose a way to quantify the cross-modality influence. We show that there is a relationship between final accuracy and cross-modality influence, although not a simple linear one. Hence a larger cross-modality influence does not necessarily translate into improved accuracy. We find that a beneficial balance between the cross-modality influences can be achieved by network architecture and conjecture that this relationship can be utilized to understand different network design choices. Towards this end we propose a Convolutional Neural Network (CNN) architecture that fuses state-of-the-art results for depth estimation and semantic labeling. By balancing the cross-modality influences between depth and semantic prediction, we achieve improved results for both tasks using the NYU-Depth v2 benchmark.
    Comment: Accepted to ICRA 201

    CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering

    Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data for training CNN-based intrinsic image decomposition models, then applying these learned models to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state-of-the-art on both IIW and SAW, and performance improves even further when IIW and SAW data is added during training. Our work demonstrates the surprising effectiveness of carefully-rendered synthetic data for the intrinsic images task.
    Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering' published in ECCV, 201