Learning Segmentation Masks with the Independence Prior
An instance with a bad mask might make a composite image that uses it look
fake. This encourages us to learn segmentation by generating realistic
composite images. To achieve this, we propose a novel framework that exploits a
new proposed prior called the independence prior based on Generative
Adversarial Networks (GANs). The generator produces an image with multiple
category-specific instance providers, a layout module and a composition module.
Firstly, each provider independently outputs a category-specific instance image
with a soft mask. Then the provided instances' poses are corrected by the
layout module. Lastly, the composition module combines these instances into a
final image. Trained with an adversarial loss and a mask-area penalty, each
provider learns a mask that is as small as possible yet still covers a
complete category-specific instance. Weakly supervised semantic segmentation
methods widely use grouping cues that model the association between image
parts; these cues are either hand-crafted, learned with costly segmentation
labels, or modeled only on local pairs. Unlike them, our method automatically
models the dependence between arbitrary parts and learns instance segmentation. We
apply our framework in two cases: (1) Foreground segmentation on
category-specific images with box-level annotation. (2) Unsupervised learning
of instance appearances and masks with only one image of homogeneous object
cluster (HOC). We obtain appealing results in both tasks, which shows that the
independence prior is useful for instance segmentation and that it is possible
to learn instance masks without supervision from only one image.
Comment: 7+5 pages, 13 figures, Accepted to AAAI 201
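The training objective described above, an adversarial loss combined with a
mask-area penalty, can be illustrated with a minimal PyTorch sketch. All names
and the weighting constant are assumptions for illustration, not the paper's
implementation:

```python
import torch

def provider_loss(disc_scores, soft_masks, area_weight=0.01):
    """Hypothetical generator-side objective: fool the discriminator
    while keeping each provider's soft mask as small as possible.

    disc_scores: (B,) discriminator outputs for composite images.
    soft_masks:  (B, K, H, W) soft masks from K instance providers.
    """
    # Non-saturating adversarial term: push composites toward "real".
    adv = torch.nn.functional.softplus(-disc_scores).mean()
    # Mask-area penalty: mean activation over all soft masks.
    area = soft_masks.mean()
    return adv + area_weight * area

scores = torch.zeros(4)                 # discriminator is undecided
masks = torch.full((4, 2, 8, 8), 0.5)   # half-on masks
loss = provider_loss(scores, masks)
```

The area term is what keeps each mask "as small as possible", while the
adversarial term forces it to stay large enough that the composite still
looks real.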
Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation
Weakly-supervised semantic segmentation (WSSS) using image-level labels has
recently attracted much attention for reducing annotation costs. Existing WSSS
methods utilize localization maps from the classification network to generate
pseudo segmentation labels. However, since localization maps obtained from the
classifier focus only on sparse discriminative object regions, it is difficult
to generate high-quality segmentation labels. To address this issue, we
introduce a discriminative region suppression (DRS) module, a simple yet
effective method for expanding object activation regions. DRS suppresses the
attention on discriminative regions and spreads it to adjacent
non-discriminative regions, generating dense localization maps. DRS requires
few or no additional parameters and can be plugged into any network.
Furthermore, we introduce an additional learning strategy, named localization
map refinement learning, that self-enhances the localization maps. Benefiting
from this refinement learning, localization maps are refined by recovering
missing object parts and removing noise.
Owing to its simplicity and effectiveness, our approach achieves 71.4% mIoU on
the PASCAL VOC 2012 segmentation benchmark using only image-level labels.
Extensive experiments demonstrate the effectiveness of our approach. The code
is available at https://github.com/qjadud1994/DRS.
Comment: AAAI 2021, Accepted
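The core suppression idea, clipping activations above a fraction of each
channel's spatial maximum so attention spreads to surrounding regions, can be
sketched as follows. The function name and the `delta` value are illustrative
assumptions, not the released implementation:

```python
import torch

def drs(feature_maps, delta=0.5):
    """Minimal sketch of parameter-free discriminative region suppression.

    feature_maps: (B, C, H, W) activations from the classifier backbone.
    Activations above delta * (per-channel spatial max) are clipped, so
    attention spreads from the most discriminative pixels outward.
    """
    b, c, h, w = feature_maps.shape
    # Per-sample, per-channel spatial maximum.
    ch_max = feature_maps.view(b, c, -1).max(dim=2).values.view(b, c, 1, 1)
    return torch.minimum(feature_maps, delta * ch_max)

x = torch.tensor([[[[0.1, 1.0], [0.4, 0.2]]]])  # one 2x2 map, max = 1.0
y = drs(x, delta=0.5)
```

Only the dominant activation (1.0) is clipped to 0.5; weaker activations pass
through unchanged, which is why the resulting localization maps become denser.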
P-NOC: Adversarial CAM Generation for Weakly Supervised Semantic Segmentation
To mitigate the necessity for large amounts of supervised segmentation
annotation sets, multiple Weakly Supervised Semantic Segmentation (WSSS)
strategies have been devised. These often rely on advanced data and model
regularization strategies to encourage the development of useful properties
(e.g., prediction completeness and fidelity to semantic boundaries) in
segmentation priors, notwithstanding the lack of annotated information. In this
work, we first create a strong baseline by analyzing complementary WSSS
techniques and regularizing strategies, considering their strengths and
limitations. We then propose a new Class-specific Adversarial Erasing strategy,
comprising two adversarial CAM generating networks being gradually refined to
produce robust semantic segmentation proposals. Empirical results suggest that
our approach substantially improves the effectiveness of the baseline,
yielding noticeable gains on both the Pascal VOC 2012 and MS COCO 2014
datasets.
Comment: 19 pages, 10 figures
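The adversarial erasing step underlying such class-specific strategies can be
illustrated with a small sketch: regions strongly activated by a present
class's CAM are masked out, forcing a second network to find evidence
elsewhere. Names and the threshold `tau` are assumptions, not the paper's
actual procedure:

```python
import torch

def erase_by_cam(images, cams, labels, tau=0.7):
    """Hypothetical sketch of class-specific adversarial erasing.

    images: (B, 3, H, W) input images.
    cams:   (B, num_classes, H, W) class activation maps in [0, 1].
    labels: (B, num_classes) multi-hot image-level labels.
    """
    # Keep only the CAMs of classes present in the image.
    present = cams * labels[:, :, None, None]
    # A pixel is erased if any present class activates above tau.
    erase = (present.amax(dim=1, keepdim=True) > tau).float()
    return images * (1.0 - erase)

imgs = torch.ones(1, 3, 2, 2)
cams = torch.tensor([[[[0.9, 0.2], [0.5, 0.8]]]])  # one class present
labels = torch.ones(1, 1)
erased = erase_by_cam(imgs, cams, labels, tau=0.7)
```

Here the two pixels with CAM values 0.9 and 0.8 are erased across all color
channels, while the weakly activated pixels survive.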
Confidence-and-Refinement Adaptation Model for Cross-Domain Semantic Segmentation
With the rapid development of convolutional neural networks (CNNs), significant progress has been achieved in semantic segmentation. Despite this success, such deep learning approaches require large-scale real-world datasets with pixel-level annotations. However, since pixel-level labeling of semantics is extremely laborious, many researchers turn to synthetic data with free annotations. Due to the clear domain gap, however, a segmentation model trained on synthetic images tends to perform poorly on real-world datasets. Unsupervised domain adaptation (UDA) for semantic segmentation, which aims to alleviate this domain discrepancy, has recently gained increasing research attention. Existing methods in this scope either simply align the features or outputs across the source and target domains, or have to deal with complex image processing and post-processing problems. In this work, we propose a novel multi-level UDA model named the Confidence-and-Refinement Adaptation Model (CRAM), which contains a confidence-aware entropy alignment (CEA) module and a style feature alignment (SFA) module. Through CEA, adaptation is performed locally via adversarial learning in the output space, making the segmentation model pay attention to high-confidence predictions. Furthermore, to enhance model transfer in the shallow feature space, the SFA module is applied to minimize the appearance gap across domains. Experiments on two challenging UDA benchmarks, "GTA5-to-Cityscapes" and "SYNTHIA-to-Cityscapes", demonstrate the effectiveness of CRAM. We achieve performance comparable to existing state-of-the-art works, with advantages in simplicity and convergence speed.
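The confidence signal that such entropy-based alignment relies on can be
sketched as a per-pixel normalized entropy map. The function below is an
illustrative assumption of how a confidence map might be derived, not the
CEA module itself:

```python
import torch

def confidence_map(logits, eps=1e-8):
    """Sketch of a confidence map from per-pixel prediction entropy.

    logits: (B, num_classes, H, W) segmentation outputs.
    Returns (B, H, W) values in [0, 1]: high where the softmax
    distribution is sharp, low where the prediction is uncertain.
    """
    p = torch.softmax(logits, dim=1)
    entropy = -(p * torch.log(p + eps)).sum(dim=1)            # (B, H, W)
    entropy = entropy / torch.log(torch.tensor(float(logits.shape[1])))
    return 1.0 - entropy

logits = torch.zeros(1, 4, 2, 2)   # uniform prediction: maximum entropy
conf = confidence_map(logits)      # confidence is (near) zero everywhere
```

An adversarial alignment loss weighted by such a map would focus the
adaptation on pixels the model is already confident about, matching the
abstract's description of attending to high-confidence predictions.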
A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction
The rapid development of deep learning has brought great progress to image
segmentation, one of the fundamental tasks of computer vision. However, the
current segmentation algorithms mostly rely on the availability of pixel-level
annotations, which are often expensive, tedious, and laborious. To alleviate
this burden, the past years have witnessed increasing attention to building
label-efficient, deep-learning-based image segmentation algorithms. This paper
offers a comprehensive review on label-efficient image segmentation methods. To
this end, we first develop a taxonomy to organize these methods according to
the supervision provided by different types of weak labels (including no
supervision, inexact supervision, incomplete supervision and inaccurate
supervision) and supplemented by the types of segmentation problems (including
semantic segmentation, instance segmentation and panoptic segmentation). Next,
we summarize the existing label-efficient image segmentation methods from a
unified perspective that discusses an important question: how to bridge the gap
between weak supervision and dense prediction -- the current methods are mostly
based on heuristic priors, such as cross-pixel similarity, cross-label
constraint, cross-view consistency, and cross-image relation. Finally, we share
our opinions about the future research directions for label-efficient deep
image segmentation.
Comment: Accepted to IEEE TPAMI
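Of the heuristic priors the survey names, cross-view consistency is perhaps
the simplest to state in code: the per-pixel class distributions predicted for
two augmented views of the same image should agree. The sketch below uses a
mean-squared-error agreement term as one illustrative choice; the surveyed
methods use various divergences:

```python
import torch

def consistency_loss(prob_view1, prob_view2):
    """Minimal illustration of the cross-view consistency prior.

    prob_view1, prob_view2: (B, C, H, W) softmax probabilities, where
    prob_view2 has been warped back into view1's coordinate frame so
    the two predictions are spatially comparable.
    """
    return ((prob_view1 - prob_view2) ** 2).mean()

p1 = torch.full((1, 2, 2, 2), 0.5)
loss_same = consistency_loss(p1, p1)   # identical predictions agree
```

Minimizing this term over unlabeled images is one way such priors bridge the
gap between weak supervision and dense prediction.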
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an
in-depth analysis of their challenges as well as technical improvements in
recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication
ZeroWaste Dataset: Towards Deformable Object Segmentation in Extreme Clutter
Less than 35% of recyclable waste is actually recycled in the US, which leads
to increased soil and sea pollution and is one of the major concerns of
environmental researchers and the general public. At the heart of the
problem are the inefficiencies of the waste sorting process (separating paper,
plastic, metal, glass, etc.) due to the extremely complex and cluttered nature
of the waste stream. Automated waste detection has great potential to enable
more efficient, reliable, and safe waste sorting practices, but it requires
label-efficient detection of deformable objects in extremely cluttered scenes.
This challenging computer vision task currently lacks suitable datasets or
methods in the available literature. In this paper, we take a step towards
computer-aided waste detection and present the first in-the-wild
industrial-grade waste detection and segmentation dataset, ZeroWaste. This
dataset contains over 1800 fully segmented video frames collected from a real
waste sorting plant, along with waste material labels for training and
evaluating segmentation methods; over 6000 unlabeled frames that can be used
for semi-supervised and self-supervised learning; and frames of the conveyor
belt before and after the sorting process, comprising a novel setup that can
be used for weakly supervised segmentation. Our experimental results
demonstrate that state-of-the-art segmentation methods struggle to correctly
detect and classify target objects, which suggests the challenging nature of
our proposed real-world task of fine-grained object detection in cluttered
scenes. We believe that ZeroWaste
will catalyze research in object detection and semantic segmentation in extreme
clutter as well as applications in the recycling domain.
Our project page can be found at http://ai.bu.edu/zerowaste/