CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, then applying these learned models to real-world images. To that end,
we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully-rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through
Physically-Based Rendering' published in ECCV, 201
CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition
Most of the traditional work on intrinsic image decomposition relies on
deriving priors about scene characteristics. On the other hand, recent research
uses deep learning models as in-and-out black boxes and does not consider the
well-established, traditional image formation process as the basis of their
intrinsic learning process. As a consequence, although current deep learning
approaches show superior performance when considering quantitative benchmark
results, traditional approaches are still dominant in achieving high
qualitative results. In this paper, the aim is to exploit the best of the two
worlds. A method is proposed that (1) is empowered by deep learning
capabilities, (2) considers a physics-based reflection model to steer the
learning process, and (3) exploits the traditional approach to obtain intrinsic
images by using reflectance and shading gradient information. The proposed
model is fast to compute and allows for the integration of all intrinsic
components. To train the new model, an object-centered, large-scale dataset
with intrinsic ground-truth images is created. The evaluation results
demonstrate that the new model outperforms existing methods. Visual inspection
shows that the image formation loss function augments color reproduction and
the use of gradient information produces sharper edges. Datasets, models and
higher-resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet.
Comment: CVPR 201
Accidental Pinhole and Pinspeck Cameras
We identify and study two types of “accidental” images that can be formed in scenes. The first is an accidental pinhole camera image. The second consists of “inverse” pinhole camera images, formed by subtracting an image with a small occluder present from a reference image without the occluder. Both types of accidental cameras arise in a variety of situations, for example in an indoor scene illuminated by natural light or on a street where a person walks under the shadow of a building. The images produced by accidental cameras are often mistaken for shadows or interreflections. However, accidental images can reveal information about the scene outside the image, the lighting conditions, or the aperture by which light enters the scene.
National Science Foundation (U.S.) (CAREER Award 0747120); United States. Office of Naval Research. Multidisciplinary University Research Initiative (N000141010933); National Science Foundation (U.S.) (CGV 1111415); National Science Foundation (U.S.) (CGV 0964004)
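The subtraction that defines the “inverse” pinhole (pinspeck) image in the abstract above can be sketched as follows. This is only an illustration of the per-pixel difference, not the authors' implementation: the function name `pinspeck_image` and the representation of images as nested lists of intensities are assumptions made for the example.

```python
def pinspeck_image(reference, occluded):
    """Return reference - occluded, pixel by pixel.

    `reference` is the image without the occluder and `occluded` is the
    image with the small occluder present. Both are hypothetical 2-D
    lists of intensities with identical shapes.
    """
    if len(reference) != len(occluded):
        raise ValueError("images must have the same number of rows")
    result = []
    for ref_row, occ_row in zip(reference, occluded):
        if len(ref_row) != len(occ_row):
            raise ValueError("images must have the same number of columns")
        # The occluder removes light, so the difference is the light it blocked.
        result.append([r - o for r, o in zip(ref_row, occ_row)])
    return result


reference = [[5, 5], [5, 5]]
occluded = [[5, 3], [4, 5]]
difference = pinspeck_image(reference, occluded)
```

Pixels that the occluder does not affect come out as zero, so only the blocked light remains in the difference image.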
Building colour terms: A combined GIS and stereo vision approach to identifying building pixels in images to determine appropriate colour terms
Color information is a useful attribute to include in a building’s description to assist the listener in identifying the intended target. Often this information is only available as image data, and not readily accessible for use in constructing referring expressions for verbal communication. The method presented uses a GIS building polygon layer in conjunction with street-level captured imagery to automatically filter foreground objects and select pixels that correspond to building façades. These selected pixels are then used to define the most appropriate color term for the building, and the corresponding fuzzy color term histogram. The technique uses a single camera capturing images at a high frame rate, with the baseline distance between frames calculated from a GPS speed log. The expected distance from the camera to the building is measured from the polygon layer and refined from the calculated depth map, after which building pixels are selected. In addition, significant foreground planar surfaces between the known road edge and building façade are identified as possible boundary walls and hedges. The output is a dataset of the most appropriate color terms for both the building and boundary walls. Initial trials demonstrate the usefulness of the technique in automatically capturing color terms for buildings in urban regions.
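The baseline-from-speed step described above can be illustrated with a short sketch. The abstract does not give formulas, so two assumptions are made here: the baseline between consecutive frames is taken as vehicle speed divided by frame rate, and depth is recovered with the standard rectified pinhole stereo relation (depth = focal length × baseline / disparity), which presumes the camera moves roughly perpendicular to its viewing direction. The function names, parameters, and numeric values are hypothetical.

```python
def baseline_m(speed_m_per_s, frame_rate_hz):
    """Distance travelled between two consecutive frames, in metres.

    Assumes constant speed over one frame interval (speed from a GPS log).
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return speed_m_per_s / frame_rate_hz


def depth_from_disparity(focal_length_px, baseline, disparity_px):
    """Depth in metres from the standard pinhole stereo relation.

    Assumption, not stated in the abstract: frames form a rectified
    stereo pair, so depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity_px


# Hypothetical values: 10 m/s vehicle speed, 20 frames per second,
# 1000 px focal length, 25 px disparity for a facade pixel.
b = baseline_m(10.0, 20.0)
z = depth_from_disparity(1000.0, b, 25.0)
```

A depth computed this way could then be compared with the expected camera-to-building distance from the polygon layer to decide which pixels belong to the façade.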