6,218 research outputs found
Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data
This paper presents a new method for shadow removal using unpaired data,
enabling us to avoid tedious annotations and obtain more diverse training
samples. However, directly employing adversarial learning and cycle-consistency
constraints is insufficient to learn the underlying relationship between the
shadow and shadow-free domains, since the mapping between shadow and
shadow-free images is not simply one-to-one. To address the problem, we
formulate Mask-ShadowGAN, a new deep framework that automatically learns to
produce a shadow mask from the input shadow image and then takes the mask to
guide the shadow generation via re-formulated cycle-consistency constraints.
Particularly, the framework simultaneously learns to produce shadow masks and
learns to remove shadows, to maximize the overall performance. Also, we
prepared an unpaired dataset for shadow removal and demonstrated the
effectiveness of Mask-ShadowGAN in various experiments, even though it was
trained on unpaired data.
Comment: Accepted to ICCV 201
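As a rough, hypothetical illustration of the idea above (not the paper's actual implementation, which learns the mask inside the GAN training loop), a shadow mask can be derived by thresholding the brightness gain between a shadow image and its generated shadow-free counterpart, and the cycle-consistency term can then be guided by that mask. The function names are invented, and the mean-based threshold is an illustrative stand-in for a proper binarization:

```python
import numpy as np

def shadow_mask(shadow_img, shadow_free_img, threshold=None):
    # Brightness gain of the generated shadow-free image over the input;
    # shadow pixels brighten the most after shadow removal.
    gain = (shadow_free_img.astype(float) - shadow_img.astype(float)).mean(axis=-1)
    if threshold is None:
        threshold = gain.mean()  # illustrative stand-in for a real binarization
    return (gain > threshold).astype(float)

def masked_cycle_loss(original, cycled, mask):
    # Mask-guided cycle consistency: the re-shadowed image must match the
    # original within the detected shadow region (mask broadcast over channels).
    return np.abs((original - cycled) * mask[..., None]).mean()
```

Training would alternate this mask estimation with adversarial updates, so mask quality and removal quality can improve together, which matches the abstract's claim that the two sub-tasks are learned simultaneously.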
SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, and More
The emergence of large models, also known as foundation models, has brought
significant advancements to AI research. One such model is Segment Anything
(SAM), which is designed for image segmentation tasks. However, as with other
foundation models, our experimental findings suggest that SAM may fail or
perform poorly in certain segmentation tasks, such as shadow detection and
camouflaged object detection (concealed object detection). This study first
paves the way for applying the large pre-trained image segmentation model SAM
to these downstream tasks, even in situations where SAM performs poorly. Rather
than fine-tuning the SAM network, we propose SAM-Adapter, which
incorporates domain-specific information or visual prompts into the
segmentation network by using simple yet effective adapters. Our extensive
experiments show that SAM-Adapter can significantly elevate the performance of
SAM in challenging tasks, and it can even outperform task-specific network
models, achieving state-of-the-art performance in the tasks we tested:
camouflaged object detection and shadow detection. We believe our work opens up
opportunities for utilizing SAM in downstream tasks, with potential
applications in various fields, including medical image processing,
agriculture, remote sensing, and more.
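As a minimal sketch of the adapter idea (invented names; not the SAM-Adapter code, which inserts learned modules into SAM's image encoder), a bottleneck MLP can add a trainable residual to each frozen block, so only the adapter's few parameters are tuned:

```python
import numpy as np

def make_adapter(dim, bottleneck, rng):
    # Bottleneck MLP weights: the only trainable parameters. The up-projection
    # is zero-initialised so that, before any tuning, the adapted model
    # reproduces the frozen foundation model exactly.
    return {"down": rng.normal(scale=0.02, size=(dim, bottleneck)),
            "up": np.zeros((bottleneck, dim))}

def adapted_block(x, frozen_block, adapter):
    # Output of the frozen block plus a residual, task-specific signal
    # (e.g. a domain prompt) computed by the small adapter MLP.
    prompt = np.maximum(x @ adapter["down"], 0.0) @ adapter["up"]
    return frozen_block(x) + prompt
```

The zero-initialised up-projection is a common adapter design choice: training starts from the unmodified foundation model and gradually injects domain-specific corrections.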
Don't worry about mistakes! Glass Segmentation Network via Mistake Correction
Recall a time when we were in an unfamiliar mall. We might mistakenly believe
that a piece of glass does, or does not, exist in front of us. Such mistakes
remind us to walk more safely at the same or a similar place next time. To
absorb this human wisdom of correcting mistakes, we propose a
novel glass segmentation network to detect transparent glass, dubbed
GlassSegNet. Motivated by this human behavior, GlassSegNet utilizes two key
stages: the identification stage (IS) and the correction stage (CS). The IS is
designed to simulate the detection procedure of human recognition for
identifying transparent glass by global context and edge information. The CS
then progressively refines the coarse prediction by correcting mistake regions
based on gained experience. Extensive experiments show clear improvements of
our GlassSegNet over thirty-four state-of-the-art methods on three benchmark
datasets.
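As an abstract, hypothetical sketch of the two-stage pipeline (names invented; the real IS and CS are deep sub-networks driven by global context, edge features, and learned experience), the correction stage can be viewed as refining the coarse identification map by pushing uncertain, mistake-prone pixels toward a confident decision:

```python
import numpy as np

def identification_stage(coarse_prob):
    # Stand-in for the IS: a coarse per-pixel glass probability map.
    # In GlassSegNet this comes from global context and edge information.
    return coarse_prob

def correction_stage(coarse, mistake_weight=0.5):
    # Stand-in for the CS: progressively sharpen the coarse prediction,
    # amplifying confident pixels and suppressing likely mistakes.
    return np.clip(coarse + mistake_weight * (coarse - 0.5), 0.0, 1.0)
```

This is only a caricature of "correcting mistake regions based on gained experience"; the real correction stage is learned, but the refine-after-identify structure is the point being illustrated.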
Telephone conversation impairs sustained visual attention via a central bottleneck
Recent research has shown that holding telephone conversations disrupts one's driving ability. We asked whether this effect could be attributed to a visual attention impairment. In Experiment 1, participants conversed on a telephone or listened to a narrative while engaged in multiple object tracking (MOT), a task requiring sustained visual attention. We found that MOT was disrupted in the telephone conversation condition, relative to single-task MOT performance, but that listening to a narrative had no effect. In Experiment 2, we asked which component of conversation might be interfering with MOT performance. We replicated the conversation and single-task conditions of Experiment 1 and added two conditions in which participants heard a sequence of words over a telephone. In the shadowing condition, participants simply repeated each word in the sequence. In the generation condition, participants were asked to generate a new word based on each word in the sequence. Word generation interfered with MOT performance, but shadowing did not. The data indicate that telephone conversation disrupts attention at a central stage, the act of generating verbal stimuli, rather than at a peripheral stage, such as listening or speaking
Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN
Shadow removal is an essential task for scene understanding. Many studies
consider only matching the image contents, which often causes two types of
ghosts: color inconsistencies in shadow regions or artifacts along shadow
boundaries. In this paper, we tackle these issues in two ways. First, to
carefully learn a boundary-artifact-free image, we propose a novel network
structure named the dual hierarchical aggregation network (DHAN). It contains
a series of dilated convolutions with growing dilation rates as the backbone,
without any down-sampling, and we hierarchically aggregate multi-context
features for attention and prediction, respectively. Second, we argue that
training on a limited dataset restricts the network's textural understanding,
which leads to color inconsistencies in shadow regions. Currently, the largest
dataset contains 2k+ shadow/shadow-free image pairs. However, it has only 0.1k+
unique scenes since many samples share exactly the same background with
different shadow positions. Thus, we design a shadow matting generative
adversarial network (SMGAN) to synthesize realistic shadow mattings from a
given shadow mask and shadow-free image. With the help of novel masks or
scenes, we enhance the current datasets using synthesized shadow images.
Experiments show that our DHAN can erase the shadows and produce high-quality
ghost-free images. After training on the synthesized and real datasets, our
network outperforms other state-of-the-art methods by a large margin. The code
is available at: http://github.com/vinthony/ghost-free-shadow-removal/
Comment: Accepted by AAAI 202
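A toy sketch of the matting-based composition (illustrative only; SMGAN learns a soft matte adversarially rather than applying the fixed attenuation assumed here): given a shadow-free image and a shadow mask, a synthetic shadow image is the shadow-free image attenuated by a matte inside the mask, which is how new shadow/shadow-free pairs can augment a dataset with few unique scenes:

```python
import numpy as np

def synthesize_shadow(shadow_free, mask, darkening=0.5):
    # Composite a synthetic shadow: a per-pixel matte attenuates the
    # shadow-free image inside the mask region. A learned SMGAN-style matte
    # would be soft and spatially varying; this hard matte is a stand-in.
    matte = 1.0 - darkening * mask[..., None]
    return shadow_free * matte
```

Pairing each synthesized shadow image with its source shadow-free image yields new training pairs for novel masks or scenes, which is the augmentation role the abstract describes.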
Explicit Visual Prompting for Universal Foreground Segmentations
Foreground segmentation is a fundamental problem in computer vision, which
includes salient object detection, forgery detection, defocus blur detection,
shadow detection, and camouflage object detection. Previous works have
typically relied on domain-specific solutions to address accuracy and
robustness issues in those applications. In this paper, we present a unified
framework for a number of foreground segmentation tasks without any
task-specific designs. We take inspiration from the widely-used pre-training
and then prompt tuning protocols in NLP and propose a new visual prompting
model, named Explicit Visual Prompting (EVP). Unlike previous visual
prompting, which typically learns a dataset-level implicit embedding, our key
insight is to make the tunable parameters focus on the explicit visual content
of each individual image, i.e., the features from frozen patch embeddings and
the high-frequency components. Our method freezes a pre-trained
model and then learns task-specific knowledge using a few extra parameters.
Despite introducing only a small number of tunable parameters, EVP outperforms
full fine-tuning and other parameter-efficient fine-tuning methods.
Experiments on fourteen datasets across five tasks show that the proposed
method surpasses other task-specific methods while remaining considerably
simpler. It also demonstrates scalability across different architectures,
pre-trained weights, and tasks. The code is available
at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt
Comment: arXiv admin note: substantial text overlap with arXiv:2303.1088
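As a hedged sketch of one ingredient, the high-frequency components that EVP builds prompts from can be obtained by zeroing the lowest spatial frequencies of an image in the Fourier domain. The function name and the `ratio` parameter below are illustrative assumptions, not EVP's exact formulation:

```python
import numpy as np

def high_freq_component(img, ratio=0.25):
    # Shift the 2-D spectrum so the DC term sits at the centre, zero out a
    # central low-frequency window, and invert: what remains is the
    # high-frequency residue (edges, texture) that prompt layers could use.
    f = np.fft.fftshift(np.fft.fft2(img, axes=(0, 1)), axes=(0, 1))
    h, w = img.shape[:2]
    ch, cw = int(h * ratio / 2), int(w * ratio / 2)
    f[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f, axes=(0, 1)), axes=(0, 1)))
```

A sanity check on the design: a constant image has no high-frequency content, so its output should be (numerically) zero, while edges and texture survive the filtering.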