CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, then applying these learned models to real-world images. To that end,
we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics and, optionally, IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through
Physically-Based Rendering' published in ECCV, 201
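The CGIntrinsics abstract relies on "full ground truth decompositions". Under the usual Lambertian assumption, an image is the per-pixel product I = A ⊙ S of albedo A and shading S, so a rendered dataset lets a network be supervised on A and S directly. The sketch below is a generic illustration, not the CGIntrinsics training loss: it assumes a scale-invariant MSE in the log domain (a common choice because albedo and shading are only recoverable up to a global scale), and the function names are hypothetical.

```python
import numpy as np

def si_log_mse(pred, target, eps=1e-6):
    """Scale-invariant MSE in log space: multiplying `pred` by a global
    constant adds a constant to d, which the variance form cancels out
    (as long as values stay above `eps`)."""
    d = np.log(np.maximum(pred, eps)) - np.log(np.maximum(target, eps))
    return float(np.mean(d ** 2) - np.mean(d) ** 2)

def supervised_decomposition_loss(pred_albedo, pred_shading,
                                  gt_albedo, gt_shading, image):
    """Hypothetical supervised loss for rendered data with full ground truth.
    Albedo is (H, W, 3); shading may be (H, W, 1) and broadcasts over RGB."""
    l_albedo = si_log_mse(pred_albedo, gt_albedo)
    l_shading = si_log_mse(pred_shading, gt_shading)
    # Tie the two predictions back to the input via I = A * S.
    l_recon = si_log_mse(pred_albedo * pred_shading, image)
    return l_albedo + l_shading + l_recon
```

Predictions that are off only by a global scale (for example 2A and S/2) score zero, which is the intended behavior when the absolute scale of albedo and shading is ambiguous.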
VIDIT: Virtual Image Dataset for Illumination Transfer
Deep image relighting has recently attracted growing interest, as it allows photo
enhancement through illumination-specific retouching without human effort.
Aside from aesthetic enhancement and photo montage, image relighting is
valuable for domain adaptation, whether to augment datasets for training or to
normalize input test data. Accurate relighting is, however, very challenging
for various reasons, such as the difficulty in removing and recasting shadows
and the modeling of different surfaces. We present a novel dataset, the Virtual
Image Dataset for Illumination Transfer (VIDIT), in an effort to create a
reference evaluation benchmark and to push forward the development of
illumination manipulation methods. Virtual datasets are not only an important
step towards achieving real-image performance but have also been shown to
improve training even when real datasets can be acquired and are available.
VIDIT contains 300 virtual scenes used for training, where every
scene is captured 40 times in total: from 8 equally spaced azimuthal angles,
each lit with 5 different illuminants.
Comment: For further information and data, see
https://github.com/majedelhelou/VIDI
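The VIDIT capture protocol can be checked arithmetically: 8 azimuths × 5 illuminants = 40 renders per scene, so the 300 training scenes yield 300 × 40 = 12,000 training images, and 8 equally spaced azimuths are 360°/8 = 45° apart. The sketch below enumerates these capture conditions; the 0° starting azimuth and the integer illuminant IDs are placeholder assumptions, not VIDIT's actual angles, light types, or file naming.

```python
from itertools import product

NUM_SCENES = 300      # training scenes, per the abstract
NUM_AZIMUTHS = 8      # equally spaced azimuthal angles
NUM_ILLUMINANTS = 5   # illuminants used at each azimuth

def capture_conditions(num_azimuths=NUM_AZIMUTHS, num_illuminants=NUM_ILLUMINANTS):
    """Return (azimuth_degrees, illuminant_id) pairs for one scene.
    Assumption: azimuths start at 0 degrees; illuminant IDs are placeholders."""
    step = 360.0 / num_azimuths
    return [(k * step, ill)
            for k, ill in product(range(num_azimuths), range(num_illuminants))]

conds = capture_conditions()
print(len(conds))               # 40 renders per scene
print(len(conds) * NUM_SCENES)  # 12000 training images
```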
Unsupervised Deep Single-Image Intrinsic Decomposition using Illumination-Varying Image Sequences
Machine-learning-based Single Image Intrinsic Decomposition (SIID) methods
decompose a captured scene into its albedo and shading images by learning from
a large set of known, realistic ground-truth decompositions. Collecting and
annotating such a dataset does not scale to sufficient variety and realism. We
free ourselves from this limitation by training on unannotated images.
Our method leverages the observation that two images of the same scene but
with different lighting provide useful information on their intrinsic
properties: by definition, albedo is invariant to lighting conditions, and
cross-combining the estimated albedo of a first image with the estimated
shading of a second one should lead back to the second one's input image. We
transcribe this relationship into a siamese training scheme for a deep
convolutional neural network that decomposes a single image into albedo and
shading. The siamese setting allows us to introduce a new loss function
including such cross-combinations, and to train solely on (time-lapse) images,
removing the need for any ground-truth annotations.
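The cross-combination constraint described above can be written as a loss over a pair of images I1, I2 of the same scene under different lighting. The numpy sketch below assumes a Lambertian product model I ≈ A ⊙ S, an L1 penalty, and equal weights for every term; these choices and the function names are illustrative assumptions and not necessarily the paper's exact loss.

```python
import numpy as np

def _l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(a - b)))

def siamese_intrinsic_loss(img1, img2, albedo1, shading1, albedo2, shading2):
    """Hypothetical unsupervised loss for two views of one scene under
    different lighting. albedo_k / shading_k are the network's predictions
    for img_k; arrays must have broadcast-compatible shapes."""
    # Each decomposition should reconstruct its own input image.
    self_recon = _l1(albedo1 * shading1, img1) + _l1(albedo2 * shading2, img2)
    # Albedo is lighting-invariant, so the albedo of one image combined with
    # the shading of the other should reproduce the other image.
    cross_recon = _l1(albedo1 * shading2, img2) + _l1(albedo2 * shading1, img1)
    # Direct consistency between the two albedo estimates.
    albedo_consistency = _l1(albedo1, albedo2)
    return self_recon + cross_recon + albedo_consistency
```

Only images are needed at training time; the loss is zero exactly when both predicted albedos agree and each predicted shading explains its own image, which is why no ground-truth decomposition is required.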
As a result, our method has three desirable properties: i) it takes advantage of
the time-varying information of image sequences in the (pre-computed) training
step, ii) it requires no ground-truth data for training, and iii) it can
decompose single images of unseen scenes at runtime. To demonstrate and
evaluate our work, we additionally propose a new rendered dataset containing
illumination-varying scenes and a set of quantitative metrics to evaluate SIID
algorithms. Despite its unsupervised nature, our method is competitive with
state-of-the-art methods, including supervised and non-data-driven ones.
Comment: To appear in Pacific Graphics 201
Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement
Humans use abstract concepts for understanding instead of hard features.
Recent interpretability research has focused on human-centered concept
explanations of neural networks. Concept Activation Vectors (CAVs) estimate a
model's sensitivity and possible biases to a given concept. In this paper, we
extend CAVs from post-hoc analysis to ante-hoc training in order to reduce
model bias through fine-tuning using an additional Concept Loss. Previously,
concepts were defined on the final layer of the network; we generalize them to
intermediate layers using class prototypes. This facilitates class learning in
the last convolutional layer, which is known to be the most informative. We also
introduce Concept Distillation to create richer concepts using a pre-trained
knowledgeable model as the teacher. Our method can sensitize or desensitize a
model towards concepts. We show applications of concept-sensitive training to
debias several classification problems. We also use concepts to induce prior
knowledge into intrinsic image decomposition (IID), a reconstruction problem. Concept-sensitive training can
improve model interpretability, reduce biases, and induce prior knowledge.
Please visit https://avani17101.github.io/Concept-Distilllation/ for code and
more details.
Comment: NeurIPS 202
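For the Concept Distillation entry: a CAV is a direction in a layer's activation space that separates examples of a concept from other inputs, and a model's sensitivity to the concept is the directional derivative of its output along that direction. The sketch below illustrates this together with a concept loss that can sensitize or desensitize a model. To stay self-contained it makes simplifying assumptions: the CAV is the normalized difference of class means (a stand-in for the linear classifier normally trained to find it), the gradient is passed in rather than computed by autodiff, and the loss is based on the cosine between gradient and CAV. It omits the paper's class prototypes and teacher model, and is not claimed to be the paper's exact formulation.

```python
import numpy as np

def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from non-concept activations toward concept ones.
    Simplification: difference of means instead of a trained linear classifier.
    Inputs are (num_examples, num_features) activation matrices."""
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_sensitivity(grad, cav):
    """Directional derivative of the model output along the CAV.
    `grad` is d(output)/d(activations) at one input, e.g. from autodiff."""
    return float(np.dot(grad, cav))

def concept_loss(grad, cav, desensitize=True):
    """Hypothetical concept loss. Minimizing |cos| makes the output insensitive
    to the concept; minimizing (1 - cos) pushes it to rely on the concept."""
    cos = np.dot(grad, cav) / (np.linalg.norm(grad) * np.linalg.norm(cav) + 1e-12)
    return float(abs(cos)) if desensitize else float(1.0 - cos)
```

In a real training loop, `grad` would come from backpropagating the class logit to the chosen layer, and the concept loss would be added to the task loss during fine-tuning.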
EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
Multimodal large-scale datasets for outdoor scenes are mostly designed for
urban driving problems. The scenes are highly structured and semantically
different from scenarios seen in nature-centered scenes such as gardens or
parks. To promote machine learning methods for nature-oriented applications,
such as agriculture and gardening, we propose the multimodal synthetic dataset
for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images
captured from more than 100 garden models. Each image is annotated with various
low/high-level vision modalities, including semantic segmentation, depth,
surface normals, intrinsic colors, and optical flow. Experiments with
state-of-the-art methods for semantic segmentation and monocular depth
prediction, two important tasks in computer vision, show a positive impact of
pre-training deep networks on our dataset for unstructured natural scenes. The
dataset and related materials will be available at
https://lhoangan.github.io/eden.
Comment: Accepted for publication at WACV 202
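Because every EDEN image carries several pixel-aligned annotation modalities, a loader has to keep them spatially consistent. The sketch below is a hypothetical in-memory record for one sample; the class name, field names, dtypes, and channel counts are illustrative assumptions, not the dataset's released file format.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class EdenSample:
    """Hypothetical record for one annotated image of H x W pixels."""
    rgb: np.ndarray              # (H, W, 3) rendered image
    semantics: np.ndarray        # (H, W) integer class labels
    depth: np.ndarray            # (H, W) per-pixel depth
    normals: np.ndarray          # (H, W, 3) surface normals
    intrinsic_color: np.ndarray  # (H, W, 3) intrinsic colors
    flow: np.ndarray             # (H, W, 2) optical flow (dx, dy)

    def __post_init__(self):
        # All modalities must be pixel-aligned with the RGB image.
        h, w = self.rgb.shape[:2]
        expected = {
            "rgb": (h, w, 3), "semantics": (h, w), "depth": (h, w),
            "normals": (h, w, 3), "intrinsic_color": (h, w, 3), "flow": (h, w, 2),
        }
        for name, shape in expected.items():
            actual = getattr(self, name).shape
            if actual != shape:
                raise ValueError(f"{name}: expected shape {shape}, got {actual}")
```

For the pre-training experiments mentioned above, a semantic segmentation model would read `rgb` and `semantics`, and a monocular depth model would read `rgb` and `depth`.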