Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN
Shadow removal is an essential task for scene understanding. Many studies
consider only matching the image contents, which often causes two types of
ghosts: color inconsistencies in shadow regions or artifacts on shadow
boundaries. In this paper, we tackle these issues in two ways. First, to
carefully learn a border-artifact-free image, we propose a novel network
structure named the dual hierarchical aggregation network (DHAN). It contains
a series of growth dilated convolutions as the backbone without any
down-samplings, and we hierarchically aggregate multi-context features for
attention and prediction, respectively. Second, we argue that training on a
limited dataset restricts the textural understanding of the network, which
leads to color inconsistencies in the shadow region. Currently, the largest
dataset contains 2k+ shadow/shadow-free image pairs. However, it has only 0.1k+
unique scenes since many samples share exactly the same background with
different shadow positions. Thus, we design a shadow matting generative
adversarial network (SMGAN) to synthesize realistic shadow mattings from a
given shadow mask and shadow-free image. With the help of novel masks or
scenes, we enhance the current datasets using synthesized shadow images.
Experiments show that our DHAN can erase the shadows and produce high-quality
ghost-free images. After training on the synthesized and real datasets, our
network outperforms other state-of-the-art methods by a large margin. The code
is available at http://github.com/vinthony/ghost-free-shadow-removal/
Comment: Accepted by AAAI 202
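As a rough illustration of why a backbone of dilated convolutions can gather wide context without any down-sampling, the sketch below computes the receptive field of a stack of stride-1 dilated convolutions. The kernel size of 3 and the doubling dilation schedule are assumptions chosen for illustration, not values taken from the DHAN paper.

```python
from typing import Sequence


def receptive_field(dilations: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field (pixels along one axis) of stacked stride-1 dilated convolutions."""
    field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical schedule: the dilation rate doubles at every layer (assumption).
growing = [1, 2, 4, 8, 16, 32]
# Baseline with the same depth but no dilation.
plain = [1] * len(growing)

print(receptive_field(growing))  # 127
print(receptive_field(plain))    # 13
```

With the same six layers and full-resolution feature maps, the growing schedule sees roughly ten times more context per axis than the undilated stack.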
Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data
This paper presents a new method for shadow removal using unpaired data,
enabling us to avoid tedious annotations and obtain more diverse training
samples. However, directly employing adversarial learning and cycle-consistency
constraints is insufficient to learn the underlying relationship between the
shadow and shadow-free domains, since the mapping between shadow and
shadow-free images is not simply one-to-one. To address the problem, we
formulate Mask-ShadowGAN, a new deep framework that automatically learns to
produce a shadow mask from the input shadow image and then takes the mask to
guide the shadow generation via re-formulated cycle-consistency constraints.
Particularly, the framework simultaneously learns to produce shadow masks and
learns to remove shadows, to maximize the overall performance. Also, we
prepared an unpaired dataset for shadow removal and demonstrated the
effectiveness of Mask-ShadowGAN in various experiments, even though it was trained on
unpaired data.
Comment: Accepted to ICCV 201
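The role of the mask in the cycle can be sketched with a toy example. Everything below is a stand-in: `toy_remove_shadow` and `toy_add_shadow` are hypothetical hand-written functions replacing the learned generators, and deriving the mask by thresholding the difference between the input and the shadow-free output (with a threshold of 0.1) is an assumption made for illustration. The point is only that one shadow-free image corresponds to many shadow images, so the reconstruction needs the mask to recover the original.

```python
from typing import List

Image = List[float]  # flattened grayscale image, intensities in [0, 1]


def toy_remove_shadow(img: Image) -> Image:
    """Stand-in for the shadow-removal generator: brightens dark pixels."""
    return [p * 2.0 if p < 0.5 else p for p in img]


def difference_mask(shadow: Image, shadow_free: Image, threshold: float = 0.1) -> List[int]:
    """Marks pixels that removal brightened by more than `threshold` (assumed rule)."""
    return [1 if (f - s) > threshold else 0 for s, f in zip(shadow, shadow_free)]


def toy_add_shadow(img: Image, mask: List[int]) -> Image:
    """Stand-in for the shadow generator: darkens only the masked pixels."""
    return [p * 0.5 if m else p for p, m in zip(img, mask)]


def l1_loss(a: Image, b: Image) -> float:
    """Mean absolute difference, used here as the cycle-consistency loss."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


shadow = [0.8, 0.3, 0.3, 0.9]
shadow_free = toy_remove_shadow(shadow)
mask = difference_mask(shadow, shadow_free)

# Cycle guided by the recovered mask reproduces the original shadow image.
guided = l1_loss(shadow, toy_add_shadow(shadow_free, mask))
# Cycle with a wrong (empty) mask cannot put the shadow back where it was.
wrong_mask = l1_loss(shadow, toy_add_shadow(shadow_free, [0] * len(shadow)))

print(mask, guided, wrong_mask)
```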
Enhanced Boundary Learning for Glass-like Object Segmentation
Glass-like objects such as windows, bottles, and mirrors exist widely in the
real world. Sensing these objects has many applications, including robot
navigation and grasping. However, this task is very challenging due to the
arbitrary scenes behind glass-like objects. This paper aims to solve the
glass-like object segmentation problem via enhanced boundary learning. In
particular, we first propose a novel refined differential module that outputs
finer boundary cues. We then introduce an edge-aware point-based graph
convolution network module to model the global shape along the boundary. We use
these two modules to design a decoder that generates accurate and clean
segmentation results, especially on the object contours. Both modules are
lightweight and effective: they can be embedded into various segmentation
models. In extensive experiments on three recent glass-like object segmentation
datasets, including Trans10k, MSD, and GDD, our approach establishes new
state-of-the-art results. We also illustrate the strong generalization
properties of our method on three generic segmentation datasets, including
Cityscapes, BDD, and COCO Stuff. Code and models are available at
https://github.com/hehao13/EBLNet.
Comment: ICCV-2021. Code is available at https://github.com/hehao13/EBLNe
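To make "boundary cues" concrete, the sketch below derives a boundary map from a binary segmentation mask by marking foreground pixels that touch background. This is a generic construction offered as an assumption about what a boundary target might look like; it does not reproduce the refined differential module or the point-based graph convolution module, and the function name `boundary_map` is hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = glass-like object, 0 = background


def boundary_map(mask: Mask) -> Mask:
    """Marks foreground pixels with a background or out-of-image 4-neighbour."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixels are never boundary pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    out[y][x] = 1
                    break
    return out


# A 3x3 object centred in a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = boundary_map(mask)
for row in edges:
    print(row)
```

Only the ring of eight outer object pixels is marked; the interior pixel stays 0.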
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
Shadows in videos are difficult to detect because of the large shadow
deformation between frames. In this work, we argue that accounting for shadow
deformation is essential when designing a video shadow detection method. To
this end, we introduce the shadow deformation attention trajectory (SODA), a
new type of video self-attention module, specially designed to handle the large
shadow deformations in videos. Moreover, we present a new shadow contrastive
learning mechanism (SCOTCH) which aims at guiding the network to learn a
unified shadow representation from massive positive shadow pairs across
different videos. We demonstrate empirically the effectiveness of our two
contributions in an ablation study. Furthermore, we show that SCOTCH and SODA
significantly outperform existing techniques for video shadow detection. Code
is available at the project page:
https://lihaoliu-cambridge.github.io/scotch_and_soda/
Comment: Accepted to CVPR 202
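The idea of pulling positive shadow pairs toward a unified representation can be sketched with a generic InfoNCE-style contrastive loss. Whether SCOTCH uses exactly this form is an assumption; the temperature of 0.1, the two-dimensional feature vectors, and the function names are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Vector, positive: Vector,
             negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss: low when the anchor is closest to its positive."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy features: two shadow regions from different videos and two non-shadow regions.
shadow_a = [1.0, 0.0]
shadow_b = [0.9, 0.1]
non_shadow = [[0.0, 1.0], [-1.0, 0.0]]

# Shadow features already close together: small loss.
aligned = info_nce(shadow_a, shadow_b, non_shadow)
# Treating a non-shadow feature as the positive: large loss.
swapped = info_nce(shadow_a, non_shadow[0], [shadow_b, non_shadow[1]])

print(aligned < swapped)  # True
```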
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
Pedestrian attribute recognition has been an emerging research topic in the
area of video surveillance. To predict the existence of a particular attribute,
it is necessary to localize the regions related to the attribute. However, in
this task, the region annotations are not available. How to carve out these
attribute-related regions remains challenging. Existing methods applied
attribute-agnostic visual attention or heuristic body-part localization
mechanisms to enhance the local feature representations, while neglecting to
employ attributes to define local feature areas. We propose a flexible
Attribute Localization Module (ALM) to adaptively discover the most
discriminative regions and learn the regional features for each attribute at
multiple levels. Moreover, a feature pyramid architecture is also introduced to
enhance the attribute-specific localization at low-levels with high-level
semantic guidance. The proposed framework does not require additional region
annotations and can be trained end-to-end with multi-level deep supervision.
Extensive experiments show that the proposed method achieves state-of-the-art
results on three pedestrian attribute datasets, including PETA, RAP, and
PA-100K.
Comment: Accepted by ICCV 201
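Multi-level deep supervision can be sketched as a multi-label loss applied to the prediction of every level and then summed. The equal weighting of levels, the use of binary cross-entropy, and the three example attributes are assumptions for illustration; the sketch does not model the Attribute Localization Module itself.

```python
import math
from typing import Sequence


def bce(probabilities: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all attributes of one pedestrian image."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def deep_supervision_loss(per_level: Sequence[Sequence[float]],
                          labels: Sequence[int]) -> float:
    """Sums the attribute loss over every supervised level (equal weights assumed)."""
    return sum(bce(level, labels) for level in per_level)


# Hypothetical attributes: backpack, hat, long hair (1 = present).
labels = [1, 0, 1]
# Predicted probabilities from three feature levels, low-level first.
per_level = [
    [0.6, 0.4, 0.7],
    [0.8, 0.2, 0.9],
    [0.9, 0.1, 0.95],
]

loss = deep_supervision_loss(per_level, labels)
print(loss)
```

Because every level receives its own loss term, gradients reach the low-level branches directly rather than only through the final prediction.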