
    USING CNNS TO UNDERSTAND LIGHTING WITHOUT REAL LABELED TRAINING DATA

    The task of computer vision is to make computers understand the physical world through images. Lighting is the medium through which we capture images of the physical world. Without lighting there is no image, and different lighting leads to different images of the same physical world. In this dissertation, we study how to understand lighting from images. With the emergence of large datasets and deep learning in recent years, learning-based methods play an increasingly important role in computer vision, and deep Convolutional Neural Networks (CNNs) now dominate most problems in the field. Despite their success, deep CNNs are notorious for their data-hungry nature compared with traditional learning-based methods. While collecting images from the internet is easy and fast, labeling those images is time consuming, expensive, and sometimes even impossible. In this work, we focus on understanding lighting from faces and natural scenes, for which ground truth lighting labels are impossible to obtain. As a preliminary topic, we first study the capacity of deep CNNs. Designing deep CNNs with less capacity and good generalization is one way to reduce the amount of labeled data needed to train them, and understanding the capacity of deep CNNs is the first step towards that goal. In this work, we empirically study the capacity of deep CNNs by studying the redundancy of parameters in them. More specifically, we aim at optimizing the number of neurons in a network, and thus the number of parameters. To achieve that goal, we incorporate sparse constraints into the objective function and apply a forward-backward splitting method to solve this sparse constrained optimization problem efficiently. The proposed method can significantly reduce the number of parameters, showing that networks with small capacity can work well. We then study an important problem in computer vision: inverse lighting from a single face image. 
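The forward-backward splitting scheme mentioned above alternates a gradient (forward) step on the smooth loss with a proximal (backward) step that enforces the sparse constraint, driving redundant parameters exactly to zero. A minimal sketch on a toy least-squares problem follows; the problem data, the l1 weight, and the iteration count are illustrative stand-ins, not the dissertation's actual network objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem standing in for the network loss:
# only 3 of 20 "neuron" parameters are truly needed.
n_samples, n_params = 100, 20
X = rng.standard_normal((n_samples, n_params))
w_true = np.zeros(n_params)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true

lam = 0.1                                            # sparsity weight
L_const = np.linalg.norm(X, 2) ** 2 / n_samples      # Lipschitz constant of the gradient
step = 1.0 / L_const                                 # step size <= 1/L ensures convergence
w = np.zeros(n_params)

for _ in range(500):
    # Forward step: gradient descent on the smooth data term.
    grad = X.T @ (X @ w - y) / n_samples
    w = w - step * grad
    # Backward step: proximal operator of lam * ||w||_1 (soft-thresholding),
    # which zeroes out parameters that contribute little to the fit.
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

print("nonzero parameters:", int(np.count_nonzero(w)))
```

The soft-thresholding step is what distinguishes this from plain gradient descent: it produces exact zeros, so the surviving nonzero count directly measures how many neurons the model actually needs.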
Lacking massive ground truth lighting labels, we generate a large amount of synthetic data with ground truth lighting to train a deep network. However, due to the large domain gap between real and synthetic data, a network trained on synthetic data cannot generalize well to real data. We thus propose to train the deep CNN with real data together with synthetic data. We apply an existing method to estimate the lighting conditions of real face images; however, these lighting labels are noisy. We therefore propose a Label Denoising Adversarial Network (LDAN) that makes use of the synthetic data to help train a deep CNN to regress lighting from real face images, denoising the labels of the real images. We show that the proposed method generates more consistent lighting for faces taken under the same lighting condition. Third, we study how to relight a face image using deep CNNs. We formulate this problem as a supervised image-to-image translation problem. Due to the lack of an "in the wild" face dataset suitable for this task, we apply a physically based face relighting method to generate a large-scale, high-resolution, "in the wild" portrait relighting dataset (DPR). A deep CNN is then trained on this dataset to generate a relighted portrait image by taking a source image and a target lighting as input. We show that our training procedure regularizes the generated results, removing the artifacts caused by physically based relighting methods. Fourth, we study how to understand lighting in a natural scene from an RGB image. We propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting representation, and jointly predict reflectance and surface normals. The global SH models the holistic lighting while local SHs account for the spatial variation of lighting. A novel non-negative lighting constraint is proposed to encourage the estimated SHs to be physically meaningful. 
To seamlessly make use of the GLoSH model, we design a coarse-to-fine network structure. Lacking labels for reflectance and lighting, we use synthetic data for model pre-training and fine-tune the model on real data in a self-supervised way. We show that the proposed method outperforms state-of-the-art methods in understanding the lighting, reflectance, and shading of a natural scene.
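The spherical harmonics lighting representation that runs through the last two chapters can be made concrete with the standard second-order (9-coefficient) real SH basis for Lambertian shading. The snippet below is a generic sketch, not the GLoSH implementation itself, and the lighting coefficients are invented for illustration; it also shows why a non-negative lighting constraint is natural, since an unconstrained SH expansion can produce negative irradiance, which here is simply clamped.

```python
import numpy as np

def sh_basis(n):
    """Second-order (9-term) real spherical-harmonics basis at unit normal n."""
    x, y, z = n
    return np.array([
        0.282095,                     # Y_00 (constant / ambient)
        0.488603 * y,                 # Y_1-1
        0.488603 * z,                 # Y_10
        0.488603 * x,                 # Y_11
        1.092548 * x * y,             # Y_2-2
        1.092548 * y * z,             # Y_2-1
        0.315392 * (3 * z * z - 1),   # Y_20
        1.092548 * x * z,             # Y_21
        0.546274 * (x * x - y * y),   # Y_22
    ])

def shade(albedo, normal, sh_coeffs):
    """Lambertian shading: reflectance times SH-represented irradiance.
    The max() clamp enforces that irradiance cannot be negative."""
    return albedo * max(sh_basis(normal) @ sh_coeffs, 0.0)

# Hypothetical lighting: an ambient term plus light coming from above (+z).
L = np.zeros(9)
L[0], L[2] = 1.0, 0.8

up = np.array([0.0, 0.0, 1.0])
down = np.array([0.0, 0.0, -1.0])
print(shade(0.5, up, L), shade(0.5, down, L))
```

With this lighting, an upward-facing surface is lit while a downward-facing one would receive negative irradiance without the clamp, i.e. the unconstrained expansion is not physically meaningful on its own.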

    Transfer of albedo and local depth variation to photo-textures

    Acquisition of displacement and albedo maps for full building façades is a difficult problem, traditionally achieved through a labor-intensive artistic process. In this paper, we present a material appearance transfer method, Transfer by Analogy, designed to infer surface detail and diffuse reflectance for textured surfaces like those present in building façades. We begin by acquiring small exemplars (displacement and albedo maps) in accessible areas, where capture conditions can be controlled. We then transfer these properties to a complete photo-texture constructed from reference images captured under diffuse daylight illumination. Our approach allows super-resolution inference of albedo and displacement from information in the photo-texture. When transferring appearance from multiple exemplars to façades containing multiple materials, our approach also sidesteps the need for segmentation. We show how we use these methods to create relightable models with a high degree of texture detail, reproducing the visually rich self-shadowing effects that would normally be difficult to capture using just simple consumer equipment. Copyright © 2012 by the Association for Computing Machinery, Inc.
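At its core, an appearance transfer by analogy matches patches of the photo-texture against exemplar patches and carries the exemplar's measured properties across. The sketch below is a deliberately minimal nearest-neighbour version with random stand-in data; it is not the paper's actual algorithm, which additionally performs super-resolution inference of both albedo and displacement.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical exemplar: paired (appearance, albedo) data captured under
# controlled conditions. Here, random stand-ins for flattened 3x3 patches.
exemplar_app = rng.random((200, 9))    # appearance features per exemplar patch
exemplar_albedo = rng.random(200)      # measured albedo per exemplar patch

def transfer(photo_patches):
    """Nearest-neighbour analogy: each photo-texture patch inherits the
    albedo of its most similar exemplar patch (Euclidean distance)."""
    d = np.linalg.norm(photo_patches[:, None, :] - exemplar_app[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return exemplar_albedo[idx]

photo = rng.random((50, 9))            # patches from the full-façade photo-texture
albedo = transfer(photo)
print(albedo.shape)
```

Because every output value is copied from some exemplar, the transferred map stays within the physically measured range of the exemplar, which is part of what makes the controlled small-exemplar capture useful.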

    Toward color image segmentation in analog VLSI: Algorithm and hardware

    Standard techniques for segmenting color images are based on finding normalized RGB discontinuities, color histogramming, or clustering techniques in RGB or CIE color spaces. The use of the psychophysical variable hue in HSI space has not been popular due to its numerical instability at low saturations. In this article, we propose the use of a simplified hue description suitable for implementation in analog VLSI. We demonstrate that if the integrated white condition holds, hue is invariant to certain types of highlights, shading, and shadows. This is due to the additive/shift invariance property, a property that other color variables lack. The more restrictive uniformly varying lighting model associated with the multiplicative/scale invariance property, shared by both hue and normalized RGB, allows invariance to transparencies and to simple models of shading and shadows. Using binary hue discontinuities in conjunction with a first-order type of surface interpolation, we demonstrate these invariant properties and compare them against the performance of RGB, normalized RGB, and CIE color spaces. We argue that working in HSI space offers an effective method for segmenting scenes in the presence of confounding cues due to shading, transparency, highlights, and shadows. Based on this work, we designed and fabricated for the first time an analog CMOS VLSI circuit with on-board phototransistor input that computes normalized color and hue.
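The two invariance properties discussed above are easy to verify numerically with the standard opponent-axis hue formula. The snippet below is an illustrative check, not the circuit's simplified hue: because both atan2 arguments are channel differences, hue survives a uniform additive shift (the highlight model under the integrated white condition) as well as a uniform scaling (the shading/shadow model), while normalized RGB survives only the scaling.

```python
import math

def hue(r, g, b):
    """Hue angle (radians) from RGB via the standard opponent-axis formula."""
    return math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)

def normalized_rgb(r, g, b):
    """Chromaticity coordinates: each channel divided by the channel sum."""
    s = r + g + b
    return (r / s, g / s, b / s)

r, g, b = 0.6, 0.3, 0.1

# Additive/shift invariance: adding the same offset to all channels leaves
# hue unchanged, since (g - b) and (2r - g - b) are both unaffected.
print(hue(r, g, b), hue(r + 0.2, g + 0.2, b + 0.2))

# Multiplicative/scale invariance: scaling all channels leaves both hue and
# normalized RGB unchanged.
print(hue(0.5 * r, 0.5 * g, 0.5 * b))
print(normalized_rgb(0.5 * r, 0.5 * g, 0.5 * b))

# Normalized RGB is NOT shift invariant -- the property it lacks.
print(normalized_rgb(r + 0.2, g + 0.2, b + 0.2))
```

Note the numerical-instability caveat from the abstract also falls out of the formula: at low saturation both atan2 arguments approach zero, so the hue angle becomes ill-conditioned.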

    Category-Specific Object Reconstruction from a Single Image

    Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations, and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.
    Comment: First two authors contributed equally. To appear at CVPR 201

    Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View

    We propose a method for predicting the 3D shape of a deformable surface from a single view. By contrast with previous approaches, we do not need a pre-registered template of the surface, and our method is robust to the lack of texture and to partial occlusions. At the core of our approach is a geometry-aware deep architecture that tackles the problem as is usually done in analytic solutions: first perform 2D detection of the mesh and then estimate a 3D shape that is geometrically consistent with the image. We train this architecture in an end-to-end manner using a large dataset of synthetic renderings of shapes under different levels of deformation, material properties, textures, and lighting conditions. We evaluate our approach on a test split of this dataset and on available real benchmarks, consistently improving upon state-of-the-art solutions with a significantly lower computational time.
    Comment: Accepted at CVPR 201