Intrinsic Appearance Decomposition Using Point Cloud Representation
Intrinsic decomposition aims to infer the albedo and shading of an image.
Since it is a heavily ill-posed problem, previous methods rely on prior
assumptions from 2D images; however, the data representation itself has
received limited exploration. The point cloud is known as a rich format of
scene representation, which naturally aligns the geometric information and the
color information of an image. Our proposed method, Point Intrinsic Net
(PoInt-Net), jointly predicts the albedo, light source direction, and shading
using a point cloud representation. Experiments reveal the benefits of
PoInt-Net: in terms of accuracy, it outperforms 2D representation approaches on
multiple metrics across datasets; in terms of efficiency, it trains on
small-scale point clouds and performs stably on point clouds of any scale; in
terms of robustness, it trains only on a single object-level dataset and
demonstrates reasonable generalization to unseen objects and scenes.
Comment: 14 pages, 14 figures
Multi-Loss Weighting with Coefficient of Variations
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. A good set of weights is often found by
treating the weights as hyper-parameters and tuning them with an extensive grid
search, which is computationally expensive. In this paper, we propose a
weighting scheme based on the coefficient of variation and set the weights
based on properties observed while training the model. The proposed method
incorporates a measure of uncertainty to balance the losses; as a result, the
loss weights evolve during training without requiring another (learning-based)
optimisation. In contrast to many loss weighting methods in the literature, we
focus on single-task multi-loss problems, such as monocular depth estimation
and semantic segmentation, and show that multi-task approaches for loss
weighting do not work on these single-task problems. The validity of the
approach is shown empirically for depth estimation and semantic segmentation on
multiple datasets.
Comment: Paper was accepted at the IEEE Winter Conference on Applications of
Computer Vision 2021 (WACV2021)
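A minimal sketch of loss weighting by the coefficient of variation (standard deviation divided by mean), assuming each loss is tracked as the ratio of its current value to its running mean and weighted by the coefficient of variation of that ratio, estimated online with Welford's algorithm. The class name `CoVWeighting` and these exact statistics are hypothetical simplifications for illustration, not the paper's reference implementation.

```python
import math

class CoVWeighting:
    """Hypothetical sketch: weight each loss by the coefficient of variation
    (std / mean) of its loss ratio L_t / mean(L), tracked online (Welford)."""

    def __init__(self, num_losses: int):
        self.k = num_losses
        self.t = 0
        self.loss_mean = [0.0] * num_losses   # running mean of raw loss values
        self.ratio_mean = [0.0] * num_losses  # running mean of loss ratios
        self.ratio_m2 = [0.0] * num_losses    # running sum of squared deviations

    def update(self, losses):
        """Observe one step of loss values and return the new weights."""
        self.t += 1
        for i, L in enumerate(losses):
            # Update the running mean of the raw loss, then form the ratio.
            self.loss_mean[i] += (L - self.loss_mean[i]) / self.t
            ratio = L / self.loss_mean[i] if self.loss_mean[i] > 0 else 1.0
            # Welford update of the ratio's mean and squared deviations.
            delta = ratio - self.ratio_mean[i]
            self.ratio_mean[i] += delta / self.t
            self.ratio_m2[i] += delta * (ratio - self.ratio_mean[i])
        return self.weights()

    def weights(self):
        """Coefficient of variation per loss, normalised to sum to 1."""
        if self.t < 2:
            return [1.0 / self.k] * self.k    # uniform until a variance is defined
        cvs = []
        for i in range(self.k):
            std = math.sqrt(self.ratio_m2[i] / (self.t - 1))
            mean = self.ratio_mean[i]
            cvs.append(std / mean if mean > 0 else 0.0)
        z = sum(cvs)
        return [c / z for c in cvs] if z > 0 else [1.0 / self.k] * self.k

# Usage: weight two losses over a few training steps.
cov = CoVWeighting(num_losses=2)
for losses in ([1.0, 1.0], [1.0, 2.0], [1.0, 1.5]):
    weights = cov.update(losses)
    total = sum(w * L for w, L in zip(weights, losses))
```

In this sketch a loss whose ratio stays constant gets weight 0, so the weighting shifts toward losses that are still changing, with no extra learned parameters. In a deep learning framework the weights would be computed from detached loss values and multiplied with the loss tensors.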
Physics-based Shading Reconstruction for Intrinsic Image Decomposition
We investigate the use of photometric invariance and deep learning to compute
intrinsic images (albedo and shading). We propose albedo and shading gradient
descriptors which are derived from physics-based models. Using the descriptors,
albedo transitions are masked out and an initial sparse shading map is
calculated directly from the corresponding RGB image gradients in a
learning-free unsupervised manner. Then, an optimization method is proposed to
reconstruct the full dense shading map. Finally, we integrate the generated
shading map into a novel deep learning framework to refine it and to predict
the corresponding albedo image, achieving intrinsic image decomposition. By
doing so, we are the first to directly address the texture and intensity
ambiguity problems of shading estimation. Large-scale experiments show that our
approach, steered by physics-based invariant descriptors, achieves superior
results on MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant Intrinsic
Images, Spectral Intrinsic Images, and As Realistic As Possible, and
competitive results on the Intrinsic Images in the Wild dataset, while
achieving state-of-the-art shading estimation.
Comment: Submitted to Computer Vision and Image Understanding (CVIU)
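The mask-then-reconstruct idea can be sketched in one dimension with a classic Retinex-style heuristic: since log I = log A + log S, a change in normalised rgb (chromaticity) is attributed to albedo and its gradient is zeroed, the remaining log-intensity gradients are kept as sparse shading gradients, and integrating them recovers shading up to a global scale. Plain chromaticity thresholding here stands in for the paper's physics-based descriptors; the threshold value and the function names are assumptions.

```python
import numpy as np

def sparse_shading_gradients(rgb: np.ndarray, chroma_thresh: float = 0.05, eps: float = 1e-6):
    """Return masked log-shading gradients and the albedo-edge mask for an (N, 3) scanline."""
    intensity = rgb.sum(axis=1) + eps
    chroma = rgb / intensity[:, None]                  # normalised rgb: invariant to a shading scale
    d_log_i = np.diff(np.log(intensity))               # log I = log A + log S, so gradients add
    d_chroma = np.abs(np.diff(chroma, axis=0)).sum(axis=1)
    albedo_edge = d_chroma > chroma_thresh             # color change => treat as albedo transition
    grad = np.where(albedo_edge, 0.0, d_log_i)         # mask albedo transitions out
    return grad, albedo_edge

def integrate_shading(grad: np.ndarray) -> np.ndarray:
    """Integrate log-shading gradients; in 1D least squares reduces to a cumulative sum.
    The first sample is anchored at shading 1 because the global scale is unrecoverable."""
    log_s = np.concatenate([[0.0], np.cumsum(grad)])
    return np.exp(log_s)

# Toy scanline: red then green albedo, with a shading step from 1 to 2 in the red region.
albedo = np.array([[0.8, 0.1, 0.1]] * 3 + [[0.1, 0.8, 0.1]])
true_shading = np.array([1.0, 2.0, 2.0, 2.0])
rgb = albedo * true_shading[:, None]

grad, edges = sparse_shading_gradients(rgb)
shading = integrate_shading(grad)
albedo_est = rgb / shading[:, None]
```

Here the shading step between the first two samples is kept while the red-to-green albedo edge is masked, so the recovered shading matches the true profile. In 2D the cumulative sum would be replaced by a least-squares (Poisson) integration, which is closer in spirit to the optimization step described in the abstract; gradients where albedo and shading change together are lost by this simple mask.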
EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
Multimodal large-scale datasets for outdoor scenes are mostly designed for
urban driving problems. The scenes are highly structured and semantically
different from scenarios seen in nature-centered scenes such as gardens or
parks. To promote machine learning methods for nature-oriented applications,
such as agriculture and gardening, we propose the multimodal synthetic dataset
for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images
captured from more than 100 garden models. Each image is annotated with various
low/high-level vision modalities, including semantic segmentation, depth,
surface normals, intrinsic colors, and optical flow. Experimental results with
state-of-the-art methods for semantic segmentation and monocular depth
prediction, two important tasks in computer vision, show a positive impact of
pre-training deep networks on our dataset for unstructured natural scenes. The
dataset and related materials will be available at
https://lhoangan.github.io/eden.
Comment: Accepted for publication at WACV 202
ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition
In general, intrinsic image decomposition algorithms interpret shading as one
unified component including all photometric effects. As shading transitions are
generally smoother than reflectance (albedo) changes, these methods may fail to
distinguish strong photometric effects from reflectance variations.
Therefore, in this paper, we propose to decompose the shading component into
direct (illumination) and indirect shading (ambient light and shadows)
subcomponents. The aim is to distinguish strong photometric effects from
reflectance variations. An end-to-end deep convolutional neural network
(ShadingNet) is proposed that operates in a fine-to-coarse manner with a
specialized fusion and refinement unit exploiting the fine-grained shading
model. It is designed to learn reflectance cues separately from specific
photometric effects, in order to analyze its disentanglement capability. A
large-scale dataset of scene-level synthetic images of outdoor natural
environments is provided with fine-grained intrinsic image ground truths.
Large-scale experiments show that our approach using fine-grained shading
decompositions outperforms state-of-the-art algorithms utilizing unified
shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD
datasets.
Comment: Submitted to International Journal of Computer Vision (IJCV)
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
Intrinsic image decomposition is the process of recovering the image
formation components (reflectance and shading) from an image. Previous methods
employ either explicit priors to constrain the problem or implicit constraints
as formulated by their losses (deep learning). These methods can be negatively
influenced by strong illumination conditions causing shading-reflectance
leakages.
Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is
proposed for intrinsic image decomposition. Edges correspond to
illumination-invariant gradients. To handle hard negative illumination
transitions, a
hierarchical approach is taken including global and local refinement layers. We
make use of attention layers to further strengthen the learning process.
An extensive ablation study and large scale experiments are conducted showing
that it is beneficial for edge-driven hybrid IID networks to make use of
illumination-invariant descriptors and that separating global and local cues
helps improve the performance of the network. Finally, it is shown that the
proposed method obtains state-of-the-art performance and is able to
generalise well to real-world images. The project page with pretrained models,
finetuned models and network code can be found at
https://ivi.fnwi.uva.nl/cv/pienet/
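To make "illumination-invariant gradients" concrete, the sketch below compares plain intensity gradients with gradients of normalised rgb, one classic photometric invariant that does not change when a pixel's intensity is scaled by shading. The abstract does not state that PIE-Net uses exactly this descriptor; the function `edge_magnitudes` and the toy images are illustrative assumptions.

```python
import numpy as np

def edge_magnitudes(img: np.ndarray, eps: float = 1e-6):
    """Return (intensity_edges, invariant_edges) for an (H, W, 3) image."""
    intensity = img.sum(axis=2) + eps               # (H, W) total brightness
    chroma = img / intensity[..., None]             # normalised rgb, cancels a shading scale
    iy, ix = np.gradient(intensity)                 # intensity gradients along rows, columns
    cy, cx = np.gradient(chroma, axis=(0, 1))       # per-channel chromaticity gradients
    intensity_edges = np.hypot(ix, iy)
    invariant_edges = np.sqrt((cx ** 2 + cy ** 2).sum(axis=2))
    return intensity_edges, invariant_edges

# Toy image 1: uniform grey albedo under a left-to-right shading ramp.
ramp = np.array([1.0, 2.0, 3.0, 4.0])
shaded = np.full((4, 4, 3), 0.5) * ramp[None, :, None]

# Toy image 2: red/green albedo edge under constant shading.
color_edge = np.zeros((4, 4, 3))
color_edge[:, :2] = [0.8, 0.1, 0.1]
color_edge[:, 2:] = [0.1, 0.8, 0.1]

i1, v1 = edge_magnitudes(shaded)      # strong intensity edges, near-zero invariant edges
i2, v2 = edge_magnitudes(color_edge)  # invariant edges respond at the color boundary
```

The shading ramp produces intensity edges but essentially no invariant edges, while the albedo boundary shows the opposite pattern; this is the kind of cue that separates shading transitions from reflectance transitions.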