51 research outputs found

Deep Depth Completion of a Single RGB-D Image

Authors: Funkhouser Thomas; Zhang Yinda
Publication date: 01/01/2018

The goal of our work is to complete the depth channel of an RGB-D image. Commodity-grade depth cameras often fail to sense depth for shiny, bright, transparent, and distant surfaces. To address this problem, we train a deep network that takes an RGB image as input and predicts dense surface normals and occlusion boundaries. Those predictions are then combined with raw depth observations provided by the RGB-D camera to solve for depths for all pixels, including those missing in the original observation. This method was chosen over others (e.g., inpainting depths directly) as the result of extensive experiments with a new depth completion benchmark dataset, where holes are filled in training data through the rendering of surface reconstructions created from multiview RGB-D scans. Experiments with different network inputs, depth representations, loss functions, optimization methods, inpainting methods, and deep depth estimation networks show that our proposed approach provides better depth completions than these alternatives.

Comment: Accepted by CVPR2018 (Spotlight). Project webpage: http://deepcompletion.cs.princeton.edu/ This version includes supplementary materials which provide more implementation details, quantitative evaluation, and qualitative results. Due to file size limit, please check project website for high-res pape
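The idea of combining sparse raw depth with predicted surface orientation to solve for all depths can be illustrated with a deliberately simplified 1-D analogue. This is only a sketch under stated assumptions, not the authors' solver: per-pixel depth differences stand in for surface normals, occlusion boundaries are ignored, and the function name `complete_depth_1d`, the `weight` parameter, and the tridiagonal least-squares formulation are all hypothetical choices made for illustration.

```python
def complete_depth_1d(observed, gradients, weight=1.0):
    """Fill missing depths on a 1-D scanline by least squares.

    observed  : list of floats, or None where the sensor returned no depth
    gradients : predicted d[i+1] - d[i] for each neighbouring pair
                (a stand-in for normal-derived constraints; an assumption)
    Minimizes sum_obs (d_i - o_i)^2 + weight * sum_i (d_{i+1} - d_i - g_i)^2.
    """
    n = len(observed)
    if n < 2 or len(gradients) != n - 1:
        raise ValueError("need n >= 2 depths and n - 1 gradients")
    if all(o is None for o in observed):
        raise ValueError("at least one observed depth is required")

    # Build the normal equations; the matrix is tridiagonal.
    diag = [0.0] * n
    rhs = [0.0] * n
    for i, o in enumerate(observed):
        if o is not None:          # data term for observed pixels
            diag[i] += 1.0
            rhs[i] += o
    for i, g in enumerate(gradients):  # constraint between pixel i and i + 1
        diag[i] += weight
        diag[i + 1] += weight
        rhs[i] -= weight * g
        rhs[i + 1] += weight * g
    off = -weight                  # sub- and super-diagonal entries

    # Thomas algorithm: forward elimination, then back substitution.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom
    depth = [0.0] * n
    depth[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        depth[i] = d_prime[i] - c_prime[i] * depth[i + 1]
    return depth


# Usage: a planar ramp seen only at its two ends.
observed = [2.0] + [None] * 8 + [2.9]
estimate = complete_depth_1d(observed, [0.1] * 9)
```

In this toy setting, pixels with no raw depth are recovered entirely from the gradient constraints anchored by the few observed values, which mirrors the role the abstract assigns to the predicted normals.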

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding

Authors: Bai Mingru; Izadi Shahram; Kohli Pushmeet; Xiao Jianxiong; Zhang Yinda
Publication date: 01/01/2017

While deep neural networks have led to human-level performance on computer vision tasks, they have yet to demonstrate similar gains for holistic scene understanding. In particular, 3D context has been shown to be an extremely important cue for scene understanding, yet very little research has been done on integrating context information with deep models. This paper presents an approach to embed 3D context into the topology of a neural network trained to perform holistic scene understanding. Given a depth image depicting a 3D scene, our network aligns the observed scene with a predefined 3D scene template, and then reasons about the existence and location of each object within the scene template. In doing so, our model recognizes multiple objects in a single forward pass of a 3D convolutional neural network, capturing both global scene and local object information simultaneously. To create training data for this 3D network, we generate partly hallucinated depth images which are rendered by replacing real objects with a repository of CAD models of the same object category. Extensive experiments demonstrate the effectiveness of our algorithm compared to the state of the art. Source code and data are available at http://deepcontext.cs.princeton.edu.

Comment: Accepted by ICCV201

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Authors: Funkhouser Thomas; Seff Ari; Song Shuran; Xiao Jianxiong; Yu Fisher; Zhang Yinda
Publication date: 04/06/2016

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry. Large labeled training datasets, expensive and tedious to produce, are required to optimize millions of parameters in deep network models. Lagging behind the growth in model capacity, the available datasets are quickly becoming outdated in terms of size and density. To circumvent this bottleneck, we propose to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop. Starting from a large set of candidate images for each category, we iteratively sample a subset, ask people to label them, classify the others with a trained model, split the set into positives, negatives, and unlabeled based on the classification confidence, and then iterate with the unlabeled set. To assess the effectiveness of this cascading procedure and enable further progress in visual recognition research, we construct a new image dataset, LSUN. It contains around one million labeled images for each of 10 scene categories and 20 object categories. We experiment with training popular convolutional networks and find that they achieve substantial performance gains when trained on this dataset.
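The iterative labeling loop described in this abstract (sample, human-label, classify the rest, split by confidence, repeat on the unlabeled remainder) can be sketched on toy data. This is a minimal illustration under loud assumptions, not the LSUN pipeline: each "image" is a single number, `human_label` is a hypothetical simulated annotator, and the "trained model" is a trivial confidence band between the largest human-labeled negative and the smallest human-labeled positive. The names `cascade_label`, `sample_size`, and `max_rounds` are invented for the example.

```python
import random


def human_label(x):
    # Hypothetical stand-in for a human annotator: the true class is x > 0.5.
    return x > 0.5


def cascade_label(features, sample_size=20, max_rounds=10, seed=0):
    """Label a pool with few human queries via a confidence cascade."""
    rng = random.Random(seed)
    labels = {}                              # index -> bool
    unlabeled = list(range(len(features)))
    human_queries = 0
    # Toy "model": anything <= lo is confidently negative,
    # anything >= hi is confidently positive (an assumption).
    lo, hi = float("-inf"), float("inf")

    for _ in range(max_rounds):
        if not unlabeled:
            break
        # 1. Sample a subset and ask people to label it.
        sample = rng.sample(unlabeled, min(sample_size, len(unlabeled)))
        human_queries += len(sample)
        for i in sample:
            y = human_label(features[i])
            labels[i] = y
            # 2. "Train" the model by tightening the confidence band.
            if y:
                hi = min(hi, features[i])
            else:
                lo = max(lo, features[i])
        # 3. Classify the rest and split by confidence.
        sampled = set(sample)
        remaining = []
        for i in unlabeled:
            if i in sampled:
                continue
            if features[i] >= hi:
                labels[i] = True             # confident positive
            elif features[i] <= lo:
                labels[i] = False            # confident negative
            else:
                remaining.append(i)          # still uncertain
        # 4. Iterate with the unlabeled set only.
        unlabeled = remaining
    return labels, unlabeled, human_queries


# Usage: 1000 candidates, at most 20 human labels per round.
pool = [i / 1000 for i in range(1000)]
labels, still_unlabeled, queries = cascade_label(pool)
```

Each round spends human effort only on items the current model cannot place confidently, which is the sense in which the scheme amplifies human labeling effort.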

arXiv.org e-Print Archive

CiteSeerX

FrameBreak: Dramatic Image Extrapolation by Guided Shift-Maps

Author: Zhang Yinda
Publication date: 18/12/2012

Master's, Master of Engineering

ScholarBank@NUS