CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition
Most traditional work on intrinsic image decomposition relies on deriving priors about scene characteristics. Recent research, on the other hand, uses deep learning models as end-to-end black boxes and does not consider the well-established image formation process as the basis of the intrinsic learning process. As a consequence, although current deep learning approaches show superior performance on quantitative benchmarks, traditional approaches still dominate in qualitative results. In this paper, the aim is to exploit the best of the two
worlds. A method is proposed that (1) is empowered by deep learning
capabilities, (2) considers a physics-based reflection model to steer the
learning process, and (3) follows the traditional approach of obtaining intrinsic images from reflectance and shading gradient information. The proposed model is fast to compute and allows for the integration of all intrinsic components. To train the new model, an object-centered, large-scale dataset with intrinsic ground-truth images is created. The evaluation results
demonstrate that the new model outperforms existing methods. Visual inspection
shows that the image formation loss function augments color reproduction and
the use of gradient information produces sharper edges. Datasets, models and
higher-resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet.
Comment: CVPR 201
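To make the physics-based steering concrete, below is a minimal sketch (in PyTorch) of an image formation loss of the kind the abstract describes: the predicted albedo and shading are recombined under a Lambertian assumption (I = A x S) and compared against the input, with a gradient term for sharper edges. The function name, weighting, and exact composition are assumptions for illustration, not the paper's actual formulation.

import torch
import torch.nn.functional as F

def image_formation_loss(albedo, shading, image, w_grad=0.5):
    # Lambertian recombination: the intrinsic components should
    # reproduce the input image (I = A * S); an assumption here.
    recon = albedo * shading
    loss = F.mse_loss(recon, image)
    # Hypothetical gradient term: matching finite differences encourages
    # sharper edges, as the abstract's visual inspection reports.
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    loss = loss + w_grad * (F.l1_loss(dx(recon), dx(image))
                            + F.l1_loss(dy(recon), dy(image)))
    return loss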
Prior to Segment: Foreground Cues for Weakly Annotated Classes in Partially Supervised Instance Segmentation
Instance segmentation methods require large datasets with expensive and thus
limited instance-level mask labels. Partially supervised instance segmentation
aims to improve mask prediction with limited mask labels by utilizing the more
abundant weak box labels. In this work, we show that a class-agnostic mask head, commonly used in partially supervised instance segmentation, has
difficulties learning a general concept of foreground for the weakly annotated
classes using box supervision only. To resolve this problem, we introduce an
object mask prior (OMP) that provides the mask head with the general concept of
foreground implicitly learned by the box classification head under the
supervision of all classes. This helps the class-agnostic mask head to focus on
the primary object in a region of interest (RoI) and improves generalization to
the weakly annotated classes. We test our approach on the COCO dataset using
different splits of strongly and weakly supervised classes. Our approach
significantly improves over the Mask R-CNN baseline and obtains competitive
performance with the state-of-the-art, while offering a much simpler architecture.
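As a hedged sketch of how such a prior could be wired in, the class-agnostic mask head below receives the OMP as an extra input channel alongside the RoI features; the channel sizes and the way the prior is injected are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class MaskHeadWithPrior(nn.Module):
    # Class-agnostic mask head conditioned on an object mask prior (OMP).
    # A sketch under assumed channel sizes; the paper's wiring may differ.
    def __init__(self, in_channels=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels + 1, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.predictor = nn.Conv2d(256, 1, 1)  # one foreground channel

    def forward(self, roi_features, omp):
        # omp: (N, 1, H, W) foreground prior, e.g. an activation map
        # distilled from the box classification head (assumed source).
        x = torch.cat([roi_features, omp], dim=1)
        return self.predictor(self.convs(x))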
Physics-based Shading Reconstruction for Intrinsic Image Decomposition
We investigate the use of photometric invariance and deep learning to compute
intrinsic images (albedo and shading). We propose albedo and shading gradient
descriptors which are derived from physics-based models. Using the descriptors,
albedo transitions are masked out and an initial sparse shading map is
calculated directly from the corresponding RGB image gradients in a
learning-free unsupervised manner. Then, an optimization method is proposed to
reconstruct the full dense shading map. Finally, we integrate the generated
shading map into a novel deep learning framework to refine it and to predict the corresponding albedo image, achieving intrinsic image decomposition. By doing so, we are the first to directly address the texture and intensity ambiguity problems of shading estimation. Large-scale experiments show that our approach, steered by physics-based invariant descriptors, achieves superior results on the MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant Intrinsic Images, Spectral Intrinsic Images, and As Realistic As Possible datasets, competitive results on the Intrinsic Images in the Wild dataset, and state-of-the-art shading estimations.
Comment: Submitted to Computer Vision and Image Understanding (CVIU)
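A learning-free step like the one described above could look roughly as follows: chromaticity (a photometric invariant under a Lambertian assumption) is used to mask out albedo transitions, so that the remaining RGB intensity gradients are attributed to shading. The invariant choice and the threshold value are assumptions for illustration, not the paper's descriptors.

import numpy as np

def sparse_shading_gradients(rgb, chroma_thresh=0.02):
    # rgb: (H, W, 3) float image. Chromaticity is invariant to shading,
    # so strong chromaticity changes indicate albedo transitions.
    intensity = rgb.sum(axis=-1) + 1e-6
    chroma = rgb / intensity[..., None]
    d_chroma = np.gradient(chroma, axis=(0, 1))          # (dy, dx) per channel
    chroma_change = np.linalg.norm(np.stack(d_chroma), axis=(0, -1))
    # Keep intensity gradients only where chromaticity is locally constant:
    # such transitions are explained by shading, not reflectance.
    shading_mask = chroma_change < chroma_thresh
    dI_y, dI_x = np.gradient(intensity)
    return dI_y * shading_mask, dI_x * shading_mask, shading_mask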
Three for one and one for three: Flow, Segmentation, and Surface Normals
Optical flow, semantic segmentation, and surface normals represent different
information modalities, yet together they bring better cues for scene
understanding problems. In this paper, we study the mutual influence of the three modalities: how each one impacts the others, and how effective they are in combination. We employ a modular approach using a convolutional refinement network that is trained with supervision but isolated from RGB images to enforce joint modality
features. To assist the training process, we create a large-scale synthetic
outdoor dataset that supports dense annotation of semantic segmentation,
optical flow, and surface normals. The experimental results show positive
influence among the three modalities, especially for object boundaries, region consistency, and scene structure.
Comment: BMVC 201
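The sketch below illustrates one plausible shape for such a refinement module: it consumes only the three modality predictions (no RGB), so any improvement must come from cross-modality cues. The channel counts and residual design are assumptions, not the paper's exact network.

import torch
import torch.nn as nn

class ModalityRefineNet(nn.Module):
    # Joint refinement of flow (2 ch), segmentation logits, and normals (3 ch),
    # deliberately isolated from RGB input. Sizes are illustrative.
    def __init__(self, num_classes=19):
        super().__init__()
        in_ch = 2 + num_classes + 3
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),  # residual correction
        )

    def forward(self, flow, seg_logits, normals):
        x = torch.cat([flow, seg_logits, normals], dim=1)
        refined = x + self.body(x)
        return torch.split(refined, [2, seg_logits.shape[1], 3], dim=1)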
ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition
In general, intrinsic image decomposition algorithms interpret shading as one
unified component including all photometric effects. As shading transitions are
generally smoother than reflectance (albedo) changes, these methods may fail in
distinguishing strong photometric effects from reflectance variations.
Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows)
subcomponents. The aim is to distinguish strong photometric effects from
reflectance variations. An end-to-end deep convolutional neural network
(ShadingNet) is proposed that operates in a fine-to-coarse manner with a
specialized fusion and refinement unit exploiting the fine-grained shading
model. It is designed to learn specific reflectance cues separated from
specific photometric effects to analyze the disentanglement capability. A
large-scale dataset of scene-level synthetic images of outdoor natural
environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading
decompositions outperforms state-of-the-art algorithms utilizing unified
shading on NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD
datasets.
Comment: Submitted to International Journal of Computer Vision (IJCV)
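To illustrate the fine-grained shading model, the sketch below recombines the predicted subcomponents before comparing against the input; the additive composition of direct and indirect shading and the Lambertian recombination are assumptions made for this example, not the paper's confirmed formulation.

import torch
import torch.nn.functional as F

def fine_grained_reconstruction_loss(albedo, s_direct, s_indirect, image):
    # Assumed composition: total shading is the sum of a direct
    # (illumination) term and an indirect (ambient light, shadows) term.
    shading = s_direct + s_indirect
    # Lambertian recombination I = A * S ties the components to the input.
    return F.mse_loss(albedo * shading, image)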
Automatic generation of dense non-rigid optical flow
As of today, hardly any large-scale datasets with dense optical flow of non-rigid motion from real-world imagery exist. The reason lies mainly in the difficulty of generating optical flow ground truth through human annotation. To
circumvent the need for human annotation, we propose a framework to
automatically generate optical flow from real-world videos. The method extracts
and matches objects from video frames to compute initial constraints, and
applies a deformation over the objects of interest to obtain dense optical flow
fields. We propose several ways to augment the optical flow variations.
Extensive experimental results show that models trained on our automatically generated optical flow outperform those trained on rigid synthetic data, using FlowNet-S, PWC-Net, and LiteFlowNet. The datasets and algorithms of our optical flow generation framework are available at https://github.com/lhoangan/arap_flow.
Comment: The paper is under consideration at Computer Vision and Image Understanding
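The core idea lends itself to a compact sketch: when the deformation applied to an object is known analytically, the dense ground-truth flow is simply the induced displacement field. The deform callable and the sinusoidal example warp below are hypothetical placeholders, not the repository's API.

import numpy as np

def flow_from_deformation(mask, deform):
    # mask: (H, W) binary object mask; deform maps (ys, xs) -> new coords.
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    new_ys, new_xs = deform(ys, xs)
    # Ground-truth flow is the per-pixel displacement (u, v).
    flow = np.stack([new_xs - xs, new_ys - ys], axis=-1)
    return flow * mask[..., None]  # flow is defined on the object

# Example: a small sinusoidal (non-rigid) warp over a toy mask.
mask = np.ones((64, 64), dtype=np.float32)
flow = flow_from_deformation(
    mask, lambda ys, xs: (ys + np.sin(xs / 8.0), xs))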