Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation
We propose a convolutional network with hierarchical classifiers for
per-pixel semantic segmentation, which is able to be trained on multiple,
heterogeneous datasets and exploit their semantic hierarchy. Our network is the
first to be simultaneously trained on three different datasets from the
intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and
is able to handle different semantic level-of-detail, class imbalances, and
different annotation types, i.e. dense per-pixel and sparse bounding-box
labels. We assess our hierarchical approach by comparing against flat,
non-hierarchical classifiers, and show improvements in mean pixel accuracy of
13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB
classes. Our implementation achieves inference rates of 17 fps at a resolution
of 520x706 for 108 classes running on a GPU. Comment: IEEE Intelligent Vehicles 201
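The hierarchical classification described above can be illustrated with a small sketch: a root classifier over coarse classes, with per-parent subclass classifiers, combined so that an absolute fine-class probability is p(parent) * p(child | parent). This is a minimal NumPy illustration, not the authors' implementation; the class hierarchy below is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical 2-level hierarchy: root classes and their subclasses.
HIERARCHY = {
    "vehicle": ["car", "truck"],
    "traffic_sign": ["stop", "yield", "speed_limit"],
    "road": ["road"],
}

def hierarchical_probs(root_logits, sub_logits):
    """
    root_logits: (H, W, R) logits over root classes.
    sub_logits:  dict parent -> (H, W, K_parent) logits over its subclasses.
    Returns dict parent -> (H, W, K_parent) of absolute fine-class
    probabilities p(child) = p(parent) * p(child | parent).
    """
    roots = list(HIERARCHY)
    p_root = softmax(root_logits)               # p(parent | pixel)
    out = {}
    for i, parent in enumerate(roots):
        p_cond = softmax(sub_logits[parent])    # p(child | parent, pixel)
        out[parent] = p_root[..., i:i + 1] * p_cond
    return out

# Toy check on a 1x1 "image": fine-class probabilities sum to 1 over all leaves.
rng = np.random.default_rng(0)
root = rng.normal(size=(1, 1, len(HIERARCHY)))
subs = {p: rng.normal(size=(1, 1, len(c))) for p, c in HIERARCHY.items()}
probs = hierarchical_probs(root, subs)
total = sum(p.sum() for p in probs.values())
print(round(float(total), 6))  # 1.0
```

Because each subclass softmax is conditioned on its parent, heads for different datasets can be trained independently while remaining mutually consistent at the root level.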
A Domain Agnostic Normalization Layer for Unsupervised Adversarial Domain Adaptation
We propose a normalization layer for unsupervised domain adaptation in semantic
scene segmentation. Normalization layers are known to improve convergence and
generalization and are part of many state-of-the-art fully-convolutional neural
networks. We show that conventional normalization layers worsen the performance
of current Unsupervised Adversarial Domain Adaptation (UADA), which is a method
to improve network performance on unlabeled datasets and the focus of our
research. Therefore, we propose a novel Domain Agnostic Normalization layer and
thereby unlock the benefits of normalization layers for unsupervised
adversarial domain adaptation. In our evaluation, we adapt from the synthetic
GTA5 dataset to the real Cityscapes dataset, a common benchmark experiment,
and surpass the state-of-the-art. As our normalization layer is domain agnostic
at test time, we furthermore demonstrate that UADA using Domain Agnostic
Normalization improves performance on unseen domains, specifically on
Apolloscape and Mapillary.
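The abstract does not give the layer's exact form. One way a normalization layer can be made domain agnostic is to compute statistics per sample over spatial dimensions only, so that no batch statistics from different domains are ever mixed. The following is a minimal NumPy sketch under that assumption, not the paper's exact layer.

```python
import numpy as np

def domain_agnostic_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """
    Normalize each feature map with statistics computed per sample and
    per channel over the spatial dimensions only, so the statistics never
    depend on which domain the other samples in the batch came from.
    x: (N, H, W, C).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Two batches with very different input statistics, standing in for a
# synthetic ("GTA5-like") and a real ("Cityscapes-like") domain.
src = np.random.default_rng(1).normal(5.0, 2.0, size=(2, 8, 8, 4))
tgt = np.random.default_rng(2).normal(-3.0, 0.5, size=(2, 8, 8, 4))
for batch in (src, tgt):
    y = domain_agnostic_norm(batch)
    # Per sample and channel: ~zero mean, ~unit variance in both domains.
    print(np.allclose(y.mean(axis=(1, 2)), 0, atol=1e-6),
          np.allclose(y.var(axis=(1, 2)), 1, atol=1e-3))
```

With per-sample statistics, the normalized features have matched first and second moments across domains, which is the property that makes such a layer usable unchanged at test time on unseen domains.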
Towards holistic scene understanding: Semantic segmentation and beyond
This dissertation addresses visual scene understanding and enhances
segmentation performance and generalization, training efficiency of networks,
and holistic understanding. First, we investigate semantic segmentation in the
context of street scenes and train semantic segmentation networks on
combinations of various datasets. In Chapter 2 we design a framework of
hierarchical classifiers over a single convolutional backbone, and train it
end-to-end on a combination of pixel-labeled datasets, improving
generalizability and the number of recognizable semantic concepts. Chapter 3
focuses on enriching semantic segmentation with weak supervision and proposes a
weakly-supervised algorithm for training with bounding box-level and
image-level supervision instead of only with per-pixel supervision. The memory
and computational load challenges that arise from simultaneous training on
multiple datasets are addressed in Chapter 4. We propose two methodologies for
selecting informative and diverse samples from datasets with weak supervision
to reduce our networks' ecological footprint without sacrificing performance.
Motivated by memory and computation efficiency requirements, in Chapter 5, we
rethink simultaneous training on heterogeneous datasets and propose a universal
semantic segmentation framework. This framework achieves consistent increases
in performance metrics and semantic knowledgeability by exploiting various
scene understanding datasets. Chapter 6 introduces the novel task of part-aware
panoptic segmentation, which extends our reasoning towards holistic scene
understanding. This task combines scene and parts-level semantics with
instance-level object detection. In conclusion, our contributions span
convolutional network architectures, weakly-supervised learning, and part and
panoptic segmentation, paving the way towards holistic, rich, and sustainable
visual scene understanding. Comment: PhD Thesis, Eindhoven University of Technology, October 202
Concepts and analogies in cybernetics: Mathematical investigations of the role of analogy in concept formation and problem solving; with emphasis for conflict resolution via object and morphism eliminations
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. We address two problematic areas of cybernetics, namely Analogical Problem Solving (APS) and Analogical Learning (AL). Both of these human faculties unquestionably require intelligence. In addition, we point out that the shifting of representations is the main unifying theme underlying these two intellectual tasks. We focus our attention on the formulation and clarification of the notion of analogy, which has been loosely treated and used in the literature, and also on its role in the shifting of representations.
We describe analogizing situations in a new representational scheme, borrowed from mathematics and modified and extended to cater for our targets. We call it a k-structure; it closely resembles semantic networks and directed graphs, and its main components are the so-called objects and morphisms. We argue for and substantiate the need for such a representation scheme by analysing what its constituents stand for, by cataloguing its virtues, the main ones being its visual appeal and its mathematical clarity, and by listing its disadvantages when compared to other representation systems. Emphasis is also given to its descriptive power and usefulness by implementing it in a number of APS and AL situations. Besides representation issues, attention is paid to the intelligence mechanisms involved in APS and AL. A cornerstone in APS and a fundamental theme in AL is the 'skeletization of k-structures'; APS is conceived as 'harmonization of skeletons'. The methodology we develop involves techniques which are computer implemented and extensively studied in theoretical terms via a proposed theory for extended k-structures. To name but a few: 1. 'the separation of the context of a concept from the concept itself', based on the ideas of k-opens and k-spaces; 2. 'object and morphism elimination' of a controversial nature; and 3. 'conflict or deadlock or dilemma resolution', which naturally arises in a k-structure interaction. The overall system is then applied to capture the essence of EVANS' (1963) analogy-type problems and WINSTON's (1970) learning-type situations. In our attempt not to be too informal, we use basic notions and terminology from abstract Algebra, Topology and Category theory.
We rather tend to be "non-logical" (analogical) in EVANS' and WINSTON's sense; "non-numeric", in MESAROVIC's (1970) terms (we rather deal with abstract conceptual entities); "non-linguistic" (we do not touch natural language); and "non-resolution" oriented, in the sense of BLEDSOE (1977). However, we sometimes give hints about logical deductive axiomatic systems, employing First Order Predicate Calculus (FOPC), and about semiotics, by which we denote syntactic-semantic-pragmatic features of our system and issues of the problem domains it acts upon. We believe in what we call a shift from the traditional 'Heuristic search paradigm' era to the 'Analogy paradigm' era underlying Artificial Intelligence and Cybernetics. We justify this merely by listing a number of A.I. works which employ, in some way or another, the concept of analogy over the last fifteen years or so, where a noticeable peak is obvious during the last years and especially in 1977. Finally, we hope that if the proposed conceptual framework and techniques developed do not straightforwardly constitute some kind of platform for Artificial Intelligence, they would at least give some insights into and illuminate our understanding of the two most fundamental faculties the human brain is occupied with, namely problem solving and learning.
On Boosting Semantic Street Scene Segmentation with Weak Supervision
Training convolutional networks for semantic segmentation requires per-pixel
ground truth labels, which are very time consuming and hence costly to obtain.
Therefore, in this work, we research and develop a hierarchical deep network
architecture and the corresponding loss for semantic segmentation that can be
trained from weak supervision, such as bounding boxes or image level labels, as
well as from strong per-pixel supervision. We demonstrate that the hierarchical
structure and the simultaneous training on strong (per-pixel) and weak
(bounding boxes) labels, even from separate datasets, consistently increases the
performance against per-pixel only training. Moreover, we explore the more
challenging case of adding weak image-level labels. We collect street scene
images and weak labels from the immense Open Images dataset to generate the
OpenScapes dataset, and we use this novel dataset to increase segmentation
performance on two established per-pixel labeled datasets, Cityscapes and
Vistas. We report performance gains up to +13.2% mIoU on crucial street scene
classes, and inference speed of 20 fps on a Titan V GPU for Cityscapes at 512 x
1024 resolution. Our network and OpenScapes dataset are shared with the
research community. Comment: Oral presentation IEEE IV 201
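A common way to derive a training signal from a bounding box, in the spirit of the weak supervision described above, is a multiple-instance-style loss: inside each box, at least one pixel should be confidently predicted as the box's class. The following is a minimal NumPy sketch of that idea, not the paper's actual loss; the toy image and box are invented.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def box_weak_loss(logits, boxes):
    """Multiple-instance-style loss for bounding-box supervision:
    inside each box, penalize only the box's best-scoring pixel,
    i.e. loss = -log max_{pixels in box} p(class). logits: (H, W, C);
    boxes: list of (class_id, y0, y1, x0, x1)."""
    p = softmax(logits)
    total = 0.0
    for cls, y0, y1, x0, x1 in boxes:
        total += -np.log(p[y0:y1, x0:x1, cls].max() + 1e-12)
    return total / max(len(boxes), 1)

# Toy 8x8 image, 3 classes, one box labeled with class 1.
boxes = [(1, 2, 6, 2, 6)]
flat = np.zeros((8, 8, 3))       # uniform predictions everywhere
peaked = flat.copy()
peaked[4, 4, 1] = 5.0            # one confident pixel inside the box
loss_flat = box_weak_loss(flat, boxes)      # -log(1/3) ~ 1.10
loss_peaked = box_weak_loss(peaked, boxes)  # ~ 0.01
print(loss_peaked < loss_flat)  # True
```

Because only the box's maximum-probability pixel is penalized, the loss does not force every pixel inside the box to take the box's class, which is what makes sparse bounding-box labels compatible with dense per-pixel training.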