Weakly- and Semi-Supervised Panoptic Segmentation
We present a weakly supervised model that jointly performs both semantic- and
instance-segmentation -- a particularly relevant problem given the substantial
cost of obtaining pixel-perfect annotation for these tasks. In contrast to many
popular instance segmentation approaches based on object detectors, our method
does not predict any overlapping instances. Moreover, we are able to segment
both "thing" and "stuff" classes, and thus explain all the pixels in the image.
"Thing" classes are weakly-supervised with bounding boxes, and "stuff" with
image-level tags. We obtain state-of-the-art results on Pascal VOC, for both
full and weak supervision (which achieves about 95% of fully-supervised
performance). Furthermore, we present the first weakly-supervised results on
Cityscapes for both semantic- and instance-segmentation. Finally, we use our
weakly supervised framework to analyse the relationship between annotation
quality and predictive performance, which is of interest to dataset creators.
Comment: ECCV 2018. The first two authors contributed equally.
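The key property claimed above, that every pixel receives exactly one non-overlapping (class, instance) label covering both "thing" and "stuff" classes, can be illustrated with a minimal NumPy sketch. The class ids, instance ids, and encoding offset here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy per-pixel predictions: semantic class ids and instance ids.
semantic = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 1]])   # 0: road (stuff), 1: car (thing), 2: sidewalk (stuff)
instance = np.array([[0, 0, 7],
                     [0, 7, 7],
                     [0, 0, 8]])   # instance ids, meaningful only on "thing" pixels

THING_CLASSES = {1}
OFFSET = 1000  # encode (class, instance) as class * OFFSET + instance

panoptic = np.where(
    np.isin(semantic, list(THING_CLASSES)),
    semantic * OFFSET + instance,   # "thing" pixels carry class and instance
    semantic * OFFSET,              # "stuff" pixels carry class only
)
```

Because the output is a single id per pixel, instances cannot overlap and every pixel is explained by construction.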
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 of these
images have high quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material.
DPF: Learning Dense Prediction Fields with Weak Supervision
Nowadays, many visual scene understanding problems are addressed by dense
prediction networks. But pixel-wise dense annotations are very expensive (e.g.,
for scene parsing) or impossible (e.g., for intrinsic image decomposition),
motivating us to leverage cheap point-level weak supervision. However, existing
pointly-supervised methods still use the same architecture designed for full
supervision. In stark contrast to them, we propose a new paradigm that makes
predictions for point coordinate queries, as inspired by the recent success of
implicit representations, like distance or radiance fields. As such, the method
is named dense prediction fields (DPFs). DPFs generate expressive
intermediate features for continuous sub-pixel locations, thus allowing outputs
of an arbitrary resolution. DPFs are naturally compatible with point-level
supervision. We showcase the effectiveness of DPFs using two substantially
different tasks: high-level semantic parsing and low-level intrinsic image
decomposition. In these two cases, supervision comes in the form of
single-point semantic category and two-point relative reflectance,
respectively. As benchmarked on three large-scale public datasets,
PASCAL-Context, ADE20K, and IIW, DPFs set new state-of-the-art performance on
all of them by significant margins.
Code can be accessed at https://github.com/cxx226/DPF
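The query-based formulation described above can be illustrated with a minimal NumPy sketch: features are interpolated at a continuous sub-pixel coordinate and fed to a small per-point head. The shapes, names, and linear head are illustrative assumptions, not the DPF architecture itself:

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a feature map feat of shape (C, H, W) at continuous (x, y)."""
    C, H, W = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bot = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bot

class PointHead:
    """Toy linear head mapping a sampled feature vector to class scores."""
    def __init__(self, in_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_classes, in_dim)) * 0.01
        self.b = np.zeros(n_classes)

    def __call__(self, f):
        return self.W @ f + self.b

feat = np.random.default_rng(1).standard_normal((8, 16, 16))
head = PointHead(in_dim=8, n_classes=21)
scores = head(bilinear_sample(feat, 3.7, 5.2))  # query at a sub-pixel location
```

Since queries are continuous coordinates rather than grid indices, outputs can be produced at arbitrary resolution and supervised directly at annotated points.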
Towards holistic scene understanding: Semantic segmentation and beyond
This dissertation addresses visual scene understanding and enhances
segmentation performance and generalization, training efficiency of networks,
and holistic understanding. First, we investigate semantic segmentation in the
context of street scenes and train semantic segmentation networks on
combinations of various datasets. In Chapter 2 we design a framework of
hierarchical classifiers over a single convolutional backbone, and train it
end-to-end on a combination of pixel-labeled datasets, improving
generalizability and the number of recognizable semantic concepts. Chapter 3
focuses on enriching semantic segmentation with weak supervision and proposes a
weakly-supervised algorithm for training with bounding box-level and
image-level supervision instead of only with per-pixel supervision. The memory
and computational load challenges that arise from simultaneous training on
multiple datasets are addressed in Chapter 4. We propose two methodologies for
selecting informative and diverse samples from datasets with weak supervision
to reduce our networks' ecological footprint without sacrificing performance.
Motivated by memory and computation efficiency requirements, in Chapter 5, we
rethink simultaneous training on heterogeneous datasets and propose a universal
semantic segmentation framework. This framework achieves consistent increases
in performance metrics and semantic knowledgeability by exploiting various
scene understanding datasets. Chapter 6 introduces the novel task of part-aware
panoptic segmentation, which extends our reasoning towards holistic scene
understanding. This task combines scene and parts-level semantics with
instance-level object detection. In conclusion, our contributions span
convolutional network architectures, weakly-supervised learning, and part and
panoptic segmentation, paving the way towards a holistic, rich, and sustainable
visual scene understanding.
Comment: PhD Thesis, Eindhoven University of Technology, October 202
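The hierarchical-classifier design mentioned for Chapter 2, a shared backbone feature routed through a root classifier over supercategories and per-supercategory subclassifiers, can be sketched as follows. The taxonomy, dimensions, and random weights are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
feat = rng.standard_normal(8)                 # shared backbone feature (one pixel)

W_root = rng.standard_normal((2, 8))          # root: e.g. {vehicle, surface}
W_sub = {0: rng.standard_normal((3, 8)),      # vehicle -> {car, truck, bus}
         1: rng.standard_normal((2, 8))}      # surface -> {road, sidewalk}

p_root = softmax(W_root @ feat)
# Leaf probability = P(supercategory) * P(class | supercategory).
p_leaf = np.concatenate([p_root[k] * softmax(W_sub[k] @ feat) for k in (0, 1)])
```

The leaf probabilities sum to one by construction, so datasets labeled at different levels of the hierarchy can supervise the same shared backbone.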