The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-Lambertian scenes, where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple media -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential to accurately model realistic scenes, but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach to multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real-world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv version contains the CVPR camera-ready
version of the paper (although we have included larger figures) as well as an
appendix detailing the model architecture
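The training recipe described above -- blending two videos and asking the network to un-mix them -- implies a loss that cannot depend on the order of the recovered layers. The sketch below illustrates that idea with a permutation-invariant reconstruction loss; the uniform 50/50 blend and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def blend(v1, v2):
    # Form a synthetic composite of two training videos (assumption: a
    # uniform 50/50 mix; the paper's blending weights may differ).
    return 0.5 * (v1 + v2)

def permutation_invariant_loss(pred_layers, gt_layers):
    # Separated layers have no canonical order, so score the prediction
    # against the best assignment of predicted to ground-truth layers.
    p1, p2 = pred_layers
    g1, g2 = gt_layers
    err_keep = np.mean((p1 - g1) ** 2) + np.mean((p2 - g2) ** 2)
    err_swap = np.mean((p1 - g2) ** 2) + np.mean((p2 - g1) ** 2)
    return min(err_keep, err_swap)
```

With two layers only two assignments exist, so the minimum over permutations is a single comparison; with more layers one would minimize over all matchings.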
Fast Fourier Intrinsic Network
We address the problem of decomposing an image into albedo and shading. We
propose the Fast Fourier Intrinsic Network, FFI-Net in short, that operates in
the spectral domain, splitting the input into several spectral bands. Weights
in FFI-Net are optimized in the spectral domain, allowing faster convergence to
a lower error. FFI-Net is lightweight and does not need auxiliary networks for
training. The network is trained end-to-end with a novel spectral loss which
measures the global distance between the network prediction and corresponding
ground truth. FFI-Net achieves state-of-the-art performance on the MPI-Sintel,
MIT Intrinsic, and IIW datasets.
Comment: WACV 2021 - camera ready
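A loss that "measures the global distance" in the spectral domain can be illustrated by comparing the Fourier transforms of the prediction and the ground truth. This is a minimal sketch under assumed details: `spectral_loss` is a hypothetical name, and FFI-Net's actual band splitting and distance may differ:

```python
import numpy as np

def spectral_loss(pred, gt):
    # Compare prediction and ground truth in the frequency domain.
    # Because the FFT mixes information from every pixel into each
    # coefficient, the distance is global rather than purely local.
    f_pred = np.fft.fft2(pred)
    f_gt = np.fft.fft2(gt)
    return np.mean(np.abs(f_pred - f_gt) ** 2)
```

By Parseval's theorem this particular choice is proportional to a pixel-space MSE; a practical spectral loss would typically reweight individual frequency bands, which is where operating in the spectral domain pays off.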
DPF: Learning Dense Prediction Fields with Weak Supervision
Nowadays, many visual scene understanding problems are addressed by dense
prediction networks. But pixel-wise dense annotations are very expensive (e.g.,
for scene parsing) or impossible (e.g., for intrinsic image decomposition),
motivating us to leverage cheap point-level weak supervision. However, existing
pointly-supervised methods still use the same architecture designed for full
supervision. In stark contrast to them, we propose a new paradigm that makes
predictions for point coordinate queries, as inspired by the recent success of
implicit representations, like distance or radiance fields. As such, the method
is named dense prediction fields (DPFs). DPFs generate expressive
intermediate features for continuous sub-pixel locations, thus allowing outputs
of an arbitrary resolution. DPFs are naturally compatible with point-level
supervision. We showcase the effectiveness of DPFs using two substantially
different tasks: high-level semantic parsing and low-level intrinsic image
decomposition. In these two cases, supervision comes in the form of
single-point semantic category and two-point relative reflectance,
respectively. As benchmarked on three large-scale public datasets --
PASCALContext, ADE20K, and IIW -- DPFs set new state-of-the-art performance on
all of them by significant margins.
Code can be accessed at https://github.com/cxx226/DPF
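The key mechanism above -- making predictions for continuous point-coordinate queries -- requires sampling a dense feature map at sub-pixel locations. A minimal sketch using bilinear interpolation follows; `sample_feature` is a hypothetical helper, and DPF's actual interpolation scheme may differ:

```python
import numpy as np

def sample_feature(feat, x, y):
    # feat: (H, W, C) dense feature map; (x, y) continuous sub-pixel
    # coordinates. Bilinear interpolation lets queries fall anywhere,
    # not just on the pixel grid, enabling outputs at arbitrary resolution.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feat.shape[1] - 1)
    y1 = min(y0 + 1, feat.shape[0] - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot
```

A prediction head applied to the sampled feature then yields the per-query output, which is exactly the granularity at which point-level supervision arrives.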
Estimating Reflectance Layer from A Single Image: Integrating Reflectance Guidance and Shadow/Specular Aware Learning
Estimating the reflectance layer from a single image is a challenging task. It
becomes more challenging when the input image contains shadows or specular
highlights, which often lead to an inaccurate estimate of the reflectance layer.
Therefore, we propose a two-stage learning method, including reflectance
guidance and a Shadow/Specular-Aware (S-Aware) network to tackle the problem.
In the first stage, an initial reflectance layer free from shadows and
specularities is obtained with the constraint of novel losses that are guided
by prior-based shadow-free and specular-free images. To further enforce the
reflectance layer to be independent of shadows and specularities in the
second-stage refinement, we introduce an S-Aware network that distinguishes the
reflectance image from the input image. Our network employs a classifier to
categorize shadow/shadow-free, specular/specular-free classes, enabling the
activation features to function as attention maps that focus on shadow/specular
regions. Our quantitative and qualitative evaluations show that our method
outperforms state-of-the-art methods in estimating reflectance layers free
from shadows and specularities.
Comment: Accepted to AAAI202
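The idea of letting classifier activations serve as attention maps over shadow/specular regions can be sketched as follows. `s_aware_attention` is a hypothetical name, and the real network's attention mechanism may differ from this assumption:

```python
import numpy as np

def s_aware_attention(features, activation_map):
    # features: (H, W, C) refinement features; activation_map: (H, W)
    # raw activations from the shadow/specular classifier. Squashing the
    # activations to [0, 1] turns them into soft attention weights that
    # emphasize the regions the classifier found discriminative.
    attn = 1.0 / (1.0 + np.exp(-activation_map))  # sigmoid
    return features * attn[..., None]             # broadcast over channels
```

Reusing the classifier's own evidence this way needs no extra attention supervision: regions that look shadowed or specular get stronger refinement signal for free.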