164 research outputs found
CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition
Most traditional work on intrinsic image decomposition relies on
deriving priors about scene characteristics. On the other hand, recent research
uses deep learning models as in-and-out black boxes and does not consider the
well-established, traditional image formation process as the basis of their
intrinsic learning process. As a consequence, although current deep learning
approaches show superior performance when considering quantitative benchmark
results, traditional approaches still dominate in qualitative results. In this
paper, the aim is to exploit the best of the two
worlds. A method is proposed that (1) is empowered by deep learning
capabilities, (2) considers a physics-based reflection model to steer the
learning process, and (3) exploits the traditional approach to obtain intrinsic
images by exploiting reflectance and shading gradient information. The proposed
model is fast to compute and allows for the integration of all intrinsic
components. To train the new model, an object-centered, large-scale dataset
with intrinsic ground-truth images is created. The evaluation results
demonstrate that the new model outperforms existing methods. Visual inspection
shows that the image formation loss function augments color reproduction and
the use of gradient information produces sharper edges. Datasets, models and
higher resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet.
Comment: CVPR 201
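The image formation loss mentioned above rests on the classic Lambertian model, in which an image is the element-wise product of albedo and shading. A minimal sketch of such a reconstruction loss (the function name and toy data are illustrative, not the paper's actual implementation):

```python
import numpy as np

def image_formation_loss(image, albedo, shading):
    """Mean squared error between the input image and its
    reconstruction under the Lambertian model I = A * S
    (element-wise product of albedo and shading)."""
    reconstruction = albedo * shading
    return float(np.mean((image - reconstruction) ** 2))

# Toy 2x2 grayscale example: a perfect decomposition reconstructs
# the image exactly, so the loss is zero.
albedo = np.array([[0.5, 0.8], [0.2, 1.0]])
shading = np.array([[1.0, 0.5], [0.9, 0.3]])
image = albedo * shading
loss = image_formation_loss(image, albedo, shading)
```

Penalising the reconstruction (rather than only the predicted components) is what ties the learned decomposition back to the physical image formation process.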
MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs
Pooling is essentially an operation from the field of Mathematical
Morphology, with max pooling as a limited special case. The more general
setting of MorphPooling greatly extends the tool set for building neural
networks. In addition to pooling operations, encoder-decoder networks used for
pixel-level predictions also require unpooling. It is common to combine
unpooling with convolution or deconvolution for up-sampling. However, using its
morphological properties, unpooling can be generalised and improved. Extensive
experimentation on two tasks and three large-scale datasets shows that
morphological pooling and unpooling lead to improved predictive performance at
much reduced parameter counts.
Comment: Accepted paper at the British Machine Vision Conference (BMVC) 202
Intrinsic Appearance Decomposition Using Point Cloud Representation
Intrinsic decomposition aims to infer the albedo and shading of an image.
Since it is a heavily ill-posed problem, previous methods rely on prior
assumptions from 2D images; however, the exploration of the data representation
itself has been limited. The point cloud is known as a rich format of scene
representation, which naturally aligns the geometric information and the color
information of an image. Our proposed method, Point Intrinsic Net (PoInt-Net),
jointly predicts the albedo, light source direction, and shading
using a point cloud representation. Experiments reveal the benefits of PoInt-Net:
in terms of accuracy, it outperforms 2D representation approaches on multiple
metrics across datasets; in terms of efficiency, it trains on small-scale point
clouds and performs stably on point clouds of any scale; in terms of robustness,
it is trained only on a single-object-level dataset, yet demonstrates reasonable
generalization ability to unseen objects and scenes.
Comment: 14 pages, 14 figures
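A point cloud aligns geometry and color per point, which makes physically motivated quantities easy to express. As a hedged illustration of the kind of per-point computation involved (not the network itself), Lambertian shading of each point from its surface normal and an estimated light direction:

```python
import numpy as np

def lambert_shading(normals, light_dir):
    """Per-point Lambertian shading s_i = max(0, n_i . l), given unit
    surface normals of shape (N, 3) and a light direction (3,)."""
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)  # normalise the light direction
    return np.clip(normals @ l, 0.0, None)

# Two points: one facing the light, one perpendicular to it.
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
shading = lambert_shading(normals, [0.0, 0.0, 2.0])
```

Because the computation is per point, the same function applies unchanged to point clouds of any size, which mirrors the scale-stability claim above.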
Multi-Loss Weighting with Coefficient of Variations
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. Finding a good set of weights is often
done by adopting them into the set of hyper-parameters, which are set using an
extensive grid search. This is computationally expensive. In this paper, we
propose a weighting scheme based on the coefficient of variation and set the
weights based on properties observed while training the model. The proposed
method incorporates a measure of uncertainty to balance the losses, and as a
result the loss weights evolve during training without requiring another
(learning-based) optimisation. In contrast to many loss weighting methods in
the literature, we focus on single-task multi-loss problems, such as monocular
depth estimation and semantic segmentation, and show that multi-task approaches
for loss weighting do not work well on such single-task problems. The validity of the
approach is shown empirically for depth estimation and semantic segmentation on
multiple datasets.
Comment: Paper was accepted at the IEEE Winter Conference on Applications of
Computer Vision 2021 (WACV2021)
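The idea above can be sketched as follows: each loss is weighted by the coefficient of variation (standard deviation over mean) of its recent values, so losses that are still changing receive more weight while converged losses are down-weighted. This is a simplified sketch of the principle, not the paper's exact scheme; all names are illustrative:

```python
import numpy as np

def cov_loss_weights(loss_history):
    """Weight each loss by the coefficient of variation (std / mean)
    of its recent values, normalised to sum to one. Losses whose
    values still vary a lot receive larger weights."""
    h = np.asarray(loss_history, dtype=float)  # shape: (steps, n_losses)
    cov = h.std(axis=0) / (h.mean(axis=0) + 1e-8)
    return cov / (cov.sum() + 1e-8)

# Loss 0 has converged (constant); loss 1 is still decreasing,
# so it receives essentially all of the weight.
history = [[1.0, 1.0], [1.0, 0.5], [1.0, 0.25]]
weights = cov_loss_weights(history)
```

Since the weights are recomputed from training statistics, they evolve over time without introducing extra learned parameters or a grid search.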
Physics-based Shading Reconstruction for Intrinsic Image Decomposition
We investigate the use of photometric invariance and deep learning to compute
intrinsic images (albedo and shading). We propose albedo and shading gradient
descriptors which are derived from physics-based models. Using the descriptors,
albedo transitions are masked out and an initial sparse shading map is
calculated directly from the corresponding RGB image gradients in a
learning-free unsupervised manner. Then, an optimization method is proposed to
reconstruct the full dense shading map. Finally, we integrate the generated
shading map into a novel deep learning framework to refine it and also to
predict the corresponding albedo image to achieve intrinsic image decomposition. By
doing so, we are the first to directly address the texture and intensity
ambiguity problems of shading estimation. Large-scale experiments show
that our approach, steered by physics-based invariant descriptors, achieves
superior results on the MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant
Intrinsic Images, Spectral Intrinsic Images, and As Realistic As Possible
datasets, and competitive results on the Intrinsic Images in the Wild dataset,
while achieving state-of-the-art shading estimation.
Comment: Submitted to Computer Vision and Image Understanding (CVIU)
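The photometric invariance exploited above can be illustrated with a small chromaticity test: under (approximately) white illumination, a pure shading change scales all RGB channels equally and leaves chromaticity unchanged, while an albedo transition changes it. A hedged sketch of masking out albedo transitions this way (the function and threshold are illustrative, not the paper's descriptors):

```python
import numpy as np

def is_albedo_transition(rgb_a, rgb_b, tau=0.05):
    """Return True if the transition between two RGB pixels changes
    chromaticity (RGB normalised by its sum) by more than tau,
    suggesting an albedo edge rather than a pure shading change."""
    a = np.asarray(rgb_a, dtype=float)
    b = np.asarray(rgb_b, dtype=float)
    chrom_a = a / (a.sum() + 1e-8)
    chrom_b = b / (b.sum() + 1e-8)
    return float(np.abs(chrom_a - chrom_b).sum()) > tau

red = np.array([0.8, 0.2, 0.2])
dark_red = 0.5 * red                  # same chromaticity: shading change
green = np.array([0.2, 0.8, 0.2])     # different chromaticity: albedo edge
```

Gradients that survive such a mask can then be attributed to shading, which is the basis for the sparse shading map computed in a learning-free manner.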