InverseRenderNet: Learning single image inverse rendering
We show how to train a fully convolutional neural network to perform inverse
rendering from a single, uncontrolled image. The network takes an RGB image as
input, regresses albedo and normal maps from which we compute lighting
coefficients. Our network is trained using large uncontrolled image collections
without ground truth. By incorporating a differentiable renderer, our network
can learn from self-supervision. Since the problem is ill-posed, we introduce
two additional sources of supervision: first, we learn a statistical natural
illumination prior; second, our key insight is to perform offline multiview
stereo (MVS) on images
containing rich illumination variation. From the MVS pose and depth maps, we
can cross project between overlapping views such that Siamese training can be
used to ensure consistent estimation of photometric invariants. MVS depth also
provides direct coarse supervision for normal map estimation. We believe this
is the first attempt to use MVS supervision for learning inverse rendering.
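The self-supervised objective described above can be sketched with a simple Lambertian, order-1 spherical-harmonics renderer. This is a minimal illustration, not the paper's implementation: the function names, the closed-form least-squares lighting solve, and the use of numpy in place of a deep learning framework are all assumptions.

```python
import numpy as np

def render_lambertian(albedo, normals, sh_coeffs):
    """Shade each pixel with order-1 spherical-harmonics lighting."""
    h, w, _ = normals.shape
    # Basis per pixel: a constant term plus the three normal components.
    basis = np.concatenate([np.ones((h, w, 1)), normals], axis=2)  # (h, w, 4)
    shading = basis @ sh_coeffs                                    # (h, w)
    return albedo * shading[..., None]                             # (h, w, 3)

def self_supervised_loss(image, albedo, normals):
    """Fit lighting coefficients in closed form from the predicted albedo
    and normals, then score the re-rendering against the input image."""
    h, w, _ = normals.shape
    basis = np.concatenate([np.ones((h, w, 1)), normals], axis=2).reshape(-1, 4)
    # Grey shading implied per pixel by the image and the predicted albedo.
    target = (image / np.clip(albedo, 1e-6, None)).mean(axis=2).reshape(-1)
    sh_coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    recon = render_lambertian(albedo, normals, sh_coeffs)
    return np.mean((recon - image) ** 2), sh_coeffs
```

Because the lighting is solved from the network's own outputs, the reconstruction loss supervises albedo and normals jointly without any ground truth, which is the sense in which the renderer enables self-supervision.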
Intrinsic Image Decomposition using Paradigms
Intrinsic image decomposition is the classical task of mapping an image to
albedo. The WHDR dataset allows methods to be evaluated by comparing
predictions to human judgements ("lighter", "same as", "darker"). The best
modern intrinsic image methods learn a map from image to albedo using rendered
models and human judgements. This is convenient for practical methods, but
cannot explain how a visual agent without geometric, surface and illumination
models and a renderer could learn to recover intrinsic images.
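The WHDR-style evaluation mentioned above can be sketched as follows. The tuple layout of the comparisons, the relative-reflectance threshold `delta=0.10`, and the label strings are assumptions chosen only to match the judgement vocabulary quoted above ("lighter", "same as", "darker"):

```python
import numpy as np

def whdr(albedo, comparisons, delta=0.10):
    """Weighted Human Disagreement Rate: the weighted fraction of human
    judgements that the predicted albedo contradicts."""
    total = wrong = 0.0
    for p1, p2, judgement, weight in comparisons:
        r1, r2 = float(albedo[p1]), float(albedo[p2])
        # Translate the predicted reflectances into the annotators'
        # vocabulary: how point 1 compares to point 2.
        if r1 > (1.0 + delta) * r2:
            predicted = "lighter"
        elif r2 > (1.0 + delta) * r1:
            predicted = "darker"
        else:
            predicted = "same as"
        total += weight
        if predicted != judgement:
            wrong += weight
    return wrong / total if total > 0 else 0.0
```

A lower score means the predicted albedo agrees with more (weighted) human judgements; the threshold keeps near-equal reflectance pairs from counting as disagreements.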
This paper describes a method that learns intrinsic image decomposition
without seeing WHDR annotations, rendered data, or ground truth data. The
method relies on paradigms - fake albedos and fake shading fields - together
with a novel smoothing procedure that ensures good behavior at short scales on
real images. Long-scale error is controlled by averaging. Our method achieves
WHDR scores competitive with those of strong recent methods allowed to see
training WHDR annotations, rendered data, and ground truth data. Because our
method is unsupervised, we can compute estimates of the test/train variance of
WHDR scores; these are quite large, and it is unsafe to rely on small
differences in reported WHDR.
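The paradigm idea, fake piecewise-constant albedos multiplied by fake smooth shading fields, can be sketched like this. The rectangle-based albedo and linear-ramp shading below are placeholder constructions of my own, not the paper's generators:

```python
import numpy as np

def fake_albedo(h, w, n_patches=8, rng=None):
    """Paradigm albedo: piecewise constant, built from random rectangles."""
    if rng is None:
        rng = np.random.default_rng()
    a = np.full((h, w), 0.5)
    for _ in range(n_patches):
        y0, x0 = rng.integers(0, h), rng.integers(0, w)
        y1, x1 = rng.integers(y0 + 1, h + 1), rng.integers(x0 + 1, w + 1)
        a[y0:y1, x0:x1] = rng.uniform(0.1, 0.9)
    return a

def fake_shading(h, w, rng=None):
    """Paradigm shading: a smooth, slowly varying field (here a random
    linear ramp, purely for illustration)."""
    if rng is None:
        rng = np.random.default_rng()
    yy, xx = np.mgrid[0:h, 0:w]
    gy, gx = rng.uniform(-0.5, 0.5, size=2)
    return np.clip(0.6 + gy * yy / h + gx * xx / w, 0.1, 1.0)

# A training pair: the network sees the product and must recover the factors,
# so no ground-truth albedo from real scenes is ever required.
rng = np.random.default_rng(0)
albedo = fake_albedo(64, 64, rng=rng)
shading = fake_shading(64, 64, rng=rng)
image = albedo * shading
```

Training on such synthetic factorisations is what lets the method avoid WHDR annotations, rendered data, and ground truth entirely, at the cost of needing the smoothing and averaging corrections described above to behave well on real images.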