Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training
Unsupervised Domain Adaptation (UDA) aims at improving the generalization
capability of a model trained on a source domain to perform well on a target
domain for which no labeled data is available. In this paper, we consider the
semantic segmentation of urban scenes and we propose an approach to adapt a
deep neural network trained on synthetic data to real scenes addressing the
domain shift between the two different data distributions. We introduce a novel
UDA framework where a standard supervised loss on labeled synthetic data is
supported by an adversarial module and a self-training strategy aiming at
aligning the two domain distributions. The adversarial module is driven by a
pair of fully convolutional discriminators handling different domains: the
first discriminates between ground-truth and generated maps, while the second
distinguishes segmentation maps of synthetic scenes from those of real-world scenes. The
self-training module exploits the confidence estimated by the discriminators on
unlabeled data to select the regions used to reinforce the learning process.
Furthermore, the confidence is thresholded with an adaptive mechanism based on
the per-class overall confidence. Experimental results prove the effectiveness
of the proposed strategy in adapting a segmentation network trained on
synthetic datasets like GTA5 and SYNTHIA, to real world datasets like
Cityscapes and Mapillary.
Comment: 8 pages, 3 figures, 2 table
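The adaptive per-class confidence thresholding described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code; the function name, the `factor` parameter, and the exact thresholding rule are assumptions. The idea: a pixel is kept for self-training only if its estimated confidence exceeds a fraction of the overall mean confidence of its predicted class.

```python
import numpy as np

def adaptive_confidence_mask(confidence, pred_labels, num_classes, factor=0.9):
    """Select pixels for self-training (illustrative sketch).

    A pixel is kept if its confidence exceeds `factor` times the mean
    confidence of its predicted class, so each class gets its own
    adaptive threshold instead of one fixed global cutoff.
    """
    mask = np.zeros_like(confidence, dtype=bool)
    for c in range(num_classes):
        class_pixels = pred_labels == c
        if not class_pixels.any():
            continue  # class absent from this prediction
        threshold = factor * confidence[class_pixels].mean()
        mask |= class_pixels & (confidence > threshold)
    return mask
```

A per-class threshold avoids the common failure mode of a single global cutoff, where rare classes with systematically lower confidence would never be selected.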
ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning
The state of the art in semantic segmentation is steadily increasing in
performance, resulting in more precise and reliable segmentations in many
different applications. However, progress is limited by the cost of generating
labels for training, which sometimes requires hours of manual labor for a
single image. Because of this, semi-supervised methods have been applied to
this task, with varying degrees of success. A key challenge is that common
augmentations used in semi-supervised classification are less effective for
semantic segmentation. We propose a novel data augmentation mechanism called
ClassMix, which generates augmentations by mixing unlabelled samples,
leveraging the network's predictions to respect object boundaries. We
evaluate this augmentation technique on two common semi-supervised semantic
segmentation benchmarks, showing that it attains state-of-the-art results.
Lastly, we also provide extensive ablation studies comparing different design
decisions and training regimes.
Comment: This paper has been accepted to WACV202
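The mixing mechanism this abstract describes can be sketched as follows. This is a hypothetical rendition of the idea, not the published implementation: half of the classes predicted in one unlabelled image are selected, and their pixels are pasted onto a second image, with pseudo-labels mixed by the same binary mask so that pasted content keeps object-shaped boundaries.

```python
import numpy as np

def classmix(img_a, img_b, pred_a, pred_b, rng=None):
    """ClassMix-style mixing (illustrative sketch).

    Pastes the pixels of roughly half of the classes predicted in image A
    onto image B, and mixes the two pseudo-label maps with the same mask.
    Images are H x W x C arrays; predictions are H x W integer label maps.
    """
    rng = rng or np.random.default_rng(0)
    classes = np.unique(pred_a)
    chosen = rng.choice(classes, size=max(1, len(classes) // 2), replace=False)
    mask = np.isin(pred_a, chosen)          # True where A's pixels are pasted
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_lbl = np.where(mask, pred_a, pred_b)
    return mixed_img, mixed_lbl, mask
```

Because the mask follows the network's own predicted class regions, the pasted content tends to align with object boundaries rather than arbitrary rectangles, which is the key difference from CutMix-style augmentations.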
Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we utilize SDE as an auxiliary task comprehensively across the entire learning framework: First, we automatically select the most useful samples to be annotated for semantic segmentation based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. And fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully-supervised baseline performance and even 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth
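The second contribution above, geometry-aware mixing of images and labels, can be sketched in simplified form. This is an assumption-laden illustration (the function name and the exact masking rule are invented for exposition): within a candidate region, pixels from one image are pasted onto another only where the pasted content is closer to the camera, so the occlusion ordering of the mixed scene stays plausible.

```python
import numpy as np

def depthmix(img_a, img_b, depth_a, depth_b, region):
    """Geometry-aware mixing sketch (hypothetical rendition of the idea).

    Within `region` (an H x W boolean array), paste pixels of image A onto
    image B only where A's estimated depth is smaller than B's, i.e. where
    A's content would occlude B's. Images are H x W x C arrays.
    """
    mask = region & (depth_a < depth_b)     # paste only closer content
    mixed = np.where(mask[..., None], img_a, img_b)
    return mixed, mask
```

Using self-supervised depth to gate the paste mask is what distinguishes this from purely 2D mixing schemes: a distant object is never pasted in front of a nearby one.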
One Class One Click: Quasi Scene-level Weakly Supervised Point Cloud Semantic Segmentation with Active Learning
Reliance on vast annotations to achieve leading performance severely
restricts the practicality of large-scale point cloud semantic segmentation.
To reduce data annotation costs, effective labeling schemes have been
developed that attain competitive results under weak supervision.
Revisiting current weak label forms, we introduce One
Class One Click (OCOC), a low cost yet informative quasi scene-level label,
which encapsulates point-level and scene-level annotations. An active weakly
supervised framework is proposed to leverage scarce labels by involving weak
supervision from global and local perspectives. Contextual constraints are
imposed by an auxiliary scene classification task, respectively based on global
feature embedding and point-wise prediction aggregation, which restricts the
model prediction merely to OCOC labels. Furthermore, we design a context-aware
pseudo labeling strategy, which effectively supplements point-level supervisory
signals. Finally, an active learning scheme with an uncertainty measure,
temporal output discrepancy, is integrated to identify informative samples and
guide sub-cloud queries, quickly attaining desirable OCOC annotations and
reducing the labeling cost to an extremely low level. Extensive experimental
analysis on three LiDAR benchmarks collected from airborne, mobile and ground
platforms demonstrates that our proposed method achieves very promising results
despite the scarcity of labels. It considerably outperforms genuine scene-level
weakly supervised methods by up to 25% in average F1 score and achieves
competitive results against fully supervised schemes. On the terrestrial LiDAR
dataset Semantic3D, using approximately 0.02% of the labels, our method
achieves an average F1 score of 85.2%, an increase of 11.58% over the baseline
model.
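The uncertainty measure named above, temporal output discrepancy, can be sketched as follows. This is an illustrative reading of the term, not the paper's exact formulation: uncertainty is measured as the distance between a model's predictive outputs at two consecutive training checkpoints, and samples whose outputs change the most are treated as the most informative to query.

```python
import numpy as np

def temporal_output_discrepancy(probs_t, probs_t1):
    """Uncertainty via temporal output discrepancy (illustrative sketch).

    Computes the L2 distance between a model's class-probability outputs
    at two consecutive training checkpoints. A large discrepancy suggests
    the model is still unsure about the sample, making it a good
    active-learning query candidate.
    """
    return np.linalg.norm(probs_t1 - probs_t, axis=-1)
```

In an active-learning loop, one would rank unlabeled sub-clouds by this score and request annotations for the highest-scoring ones first.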
SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation
Learning models on one labeled dataset that generalize well on another domain
is a difficult task, as several shifts might happen between the data domains.
This is notably the case for lidar data, where models can exhibit large
performance discrepancies due, for instance, to different lidar patterns or
changes in acquisition conditions. This paper addresses the corresponding
Unsupervised Domain Adaptation (UDA) task for semantic segmentation. To
mitigate this problem, we introduce an unsupervised auxiliary task of learning
an implicit underlying surface representation simultaneously on source and
target data. As both domains share the same latent representation, the model is
forced to accommodate discrepancies between the two sources of data. This novel
strategy differs from classical minimization of statistical divergences or
lidar-specific domain adaptation techniques. Our experiments demonstrate that
our method achieves a better performance than the current state of the art,
both in real-to-real and synthetic-to-real scenarios.
Comment: Project repository: github.com/valeoai/SALUD