280 research outputs found
Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
Most progress in semantic segmentation reports on daytime images taken under
favorable illumination conditions. We instead address the problem of semantic
segmentation of nighttime images and improve the state of the art by adapting
daytime models to nighttime without using nighttime annotations. Moreover, we
design a new evaluation framework to address the substantial uncertainty of
semantics in nighttime images. Our central contributions are: 1) a curriculum
framework to gradually adapt semantic segmentation models from day to night via
labeled synthetic images and unlabeled real images, both for progressively
darker times of day, which exploits cross-time-of-day correspondences for the
real images to guide the inference of their labels; 2) a novel
uncertainty-aware annotation and evaluation framework and metric for semantic
segmentation, designed for adverse conditions and including image regions
beyond human recognition capability in the evaluation in a principled fashion;
3) the Dark Zurich dataset, which comprises 2416 unlabeled nighttime and 2920
unlabeled twilight images with correspondences to their daytime counterparts
plus a set of 151 nighttime images with fine pixel-level annotations created
with our protocol, which serves as a first benchmark to perform our novel
evaluation. Experiments show that our guided curriculum adaptation
significantly outperforms state-of-the-art methods on real nighttime sets both
for standard metrics and our uncertainty-aware metric. Furthermore, our
uncertainty-aware evaluation reveals that selective invalidation of predictions
can lead to better results on data with ambiguous content such as our nighttime
benchmark and benefit safety-oriented applications which involve invalid inputs.
Comment: ICCV 2019 camera-ready
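To make the uncertainty-aware evaluation concrete, here is a minimal numpy sketch of one way to score predictions that may abstain: low-confidence pixels are invalidated, correct invalidation of regions annotated as beyond recognition is rewarded, and invalidation of recognizable regions is penalized. The label encoding, threshold rule, and scoring details are illustrative assumptions, not the paper's exact metric definition.

```python
import numpy as np

INVALID = 255  # assumed encoding for regions beyond human recognition

def uncertainty_aware_iou(pred, conf, gt, num_classes, tau=0.5):
    """Illustrative uncertainty-aware IoU (not the paper's exact metric).

    Predictions with confidence below `tau` are invalidated. Invalidating
    a pixel whose ground truth is INVALID is rewarded, while invalidating
    a pixel with a valid label is penalized, so indiscriminate
    invalidation does not pay off.
    """
    pred = pred.copy()
    pred[conf < tau] = INVALID
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c) & (gt != INVALID))
        fn = np.sum((pred != c) & (gt == c))
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    # Treat INVALID as its own class so correct abstention is rewarded.
    tp_i = np.sum((pred == INVALID) & (gt == INVALID))
    fp_i = np.sum((pred == INVALID) & (gt != INVALID))
    fn_i = np.sum((pred != INVALID) & (gt == INVALID))
    if tp_i + fp_i + fn_i > 0:
        ious.append(tp_i / (tp_i + fp_i + fn_i))
    return float(np.mean(ious))
```

Sweeping `tau` from 0 upward shows whether selective invalidation improves the score, echoing the finding quoted in the abstract.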
Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
We address the problem of semantic nighttime image segmentation and improve
the state of the art by adapting daytime models to nighttime without using
nighttime annotations. Moreover, we design a new evaluation framework to
address the substantial uncertainty of semantics in nighttime images. Our
central contributions are: 1) a curriculum framework to gradually adapt
semantic segmentation models from day to night through progressively darker
times of day, exploiting cross-time-of-day correspondences between daytime
images from a reference map and dark images to guide the label inference in the
dark domains; 2) a novel uncertainty-aware annotation and evaluation framework
and metric for semantic segmentation, including image regions beyond human
recognition capability in the evaluation in a principled fashion; 3) the Dark
Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight
images with correspondences to their daytime counterparts plus a set of 201
nighttime images with fine pixel-level annotations created with our protocol,
which serves as a first benchmark for our novel evaluation. Experiments show
that our map-guided curriculum adaptation significantly outperforms
state-of-the-art methods on nighttime sets both for standard metrics and our
uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals
that selective invalidation of predictions can improve results on data with
ambiguous content such as our benchmark and benefit safety-oriented
applications involving invalid inputs.
Comment: IEEE T-PAMI 2022
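The map-guided label inference can be pictured as fusing the model's prediction on a dark image with its prediction on the corresponding daytime image of the same scene. The numpy sketch below assumes the daytime probabilities have already been warped into the dark view and that a per-pixel warp-validity mask is available; the fixed blending weight is an illustrative choice, not the authors' exact fusion rule.

```python
import numpy as np

def fuse_guided_labels(dark_probs, warped_day_probs, valid_warp, alpha=0.5):
    """Fuse dark-image and warped daytime predictions into pseudo-labels.

    dark_probs, warped_day_probs: (C, H, W) softmax outputs.
    valid_warp: (H, W) boolean mask where the alignment is trustworthy.
    """
    fused = dark_probs.copy()
    fused[:, valid_warp] = (alpha * dark_probs[:, valid_warp]
                            + (1.0 - alpha) * warped_day_probs[:, valid_warp])
    pseudo = fused.argmax(axis=0)   # (H, W) hard pseudo-labels
    conf = fused.max(axis=0)        # can gate the self-training loss
    return pseudo, conf
```

Retraining the model on such guided pseudo-labels, stage by stage through twilight and night, is the essence of the curriculum.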
DTBS: Dual-Teacher Bi-directional Self-training for Domain Adaptation in Nighttime Semantic Segmentation
Due to the poor illumination and the difficulty in annotating, nighttime
conditions pose a significant challenge for autonomous vehicle perception
systems. Unsupervised domain adaptation (UDA) has been widely applied to
semantic segmentation on such images to adapt models from normal conditions to
target nighttime-condition domains. Self-training (ST) is a common paradigm in
UDA in which a momentum teacher provides pseudo-labels, but it suffers from
confirmation bias: the one-directional knowledge transfer from a single teacher
is insufficient to adapt to a large domain shift. To mitigate this issue, we
propose to alleviate the domain gap by incrementally accounting for style
influence and illumination change. We therefore introduce a
one-stage Dual-Teacher Bi-directional Self-training (DTBS) framework for smooth
knowledge transfer and feedback. Based on two teacher models, we present a
novel pipeline that decouples the style shift and the illumination shift. In
addition, we propose a new Re-weight exponential moving average (EMA) that
merges the knowledge of the style and illumination factors and feeds it back
to the student model. In this way, our method can be embedded in other UDA methods to
enhance their performance. For example, on the Cityscapes-to-ACDC-night task
our method yields 53.8% mIoU, an improvement of +5% over the previous state of
the art. The code is available at https://github.com/hf618/DTBS.
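The dual-teacher scheme lends itself to a compact sketch: two EMA teachers track one student, and their knowledge is merged by a weighted average when producing pseudo-labels. The PyTorch fragment below is a simplified reading of the abstract; the momentum value and merging weight are assumptions, not DTBS's exact Re-weight EMA schedule.

```python
import torch

@torch.no_grad()
def update_teachers(student, style_teacher, illum_teacher, momentum=0.999):
    """EMA update: both teachers slowly track the student's weights."""
    for p_s, p_t1, p_t2 in zip(student.parameters(),
                               style_teacher.parameters(),
                               illum_teacher.parameters()):
        p_t1.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
        p_t2.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

@torch.no_grad()
def merged_teacher_logits(style_teacher, illum_teacher, x, w_style=0.5):
    """Merge the style and illumination teachers for pseudo-labeling,
    feeding their combined knowledge back to the student."""
    return w_style * style_teacher(x) + (1.0 - w_style) * illum_teacher(x)
```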
GPS-GLASS: Learning Nighttime Semantic Segmentation Using Daytime Video and GPS data
Semantic segmentation for autonomous driving should be robust against various
in-the-wild environments. Nighttime semantic segmentation is especially
challenging due to a lack of annotated nighttime images and a large domain gap
from daytime images with sufficient annotation. In this paper, we propose a
novel GPS-based training framework for nighttime semantic segmentation. Given
GPS-aligned pairs of daytime and nighttime images, we perform cross-domain
correspondence matching to obtain pixel-level pseudo supervision. Moreover, we
conduct flow estimation between daytime video frames and apply GPS-based
scaling to acquire another pixel-level pseudo supervision. Using these pseudo
supervisions with a confidence map, we train a nighttime semantic segmentation
network without any annotation from nighttime images. Experimental results
demonstrate the effectiveness of the proposed method on several nighttime
semantic segmentation datasets. Our source code is available at
https://github.com/jimmy9704/GPS-GLASS.
Comment: ICCVW 2022
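The two pseudo-supervision sources (cross-domain correspondence matching and GPS-scaled flow) must be reconciled per pixel. Below is a minimal numpy sketch of one plausible combination rule, keeping the more confident source and ignoring low-confidence pixels; the threshold and the ignore-index convention are assumptions, not the paper's exact procedure.

```python
import numpy as np

def combine_pseudo_supervision(labels_corr, conf_corr,
                               labels_flow, conf_flow,
                               ignore_index=255, min_conf=0.8):
    """Per-pixel merge of two pseudo-label maps with confidence maps."""
    take_flow = conf_flow > conf_corr
    labels = np.where(take_flow, labels_flow, labels_corr)
    conf = np.where(take_flow, conf_flow, conf_corr)
    # Pixels where neither source is confident contribute no supervision.
    return np.where(conf >= min_conf, labels, ignore_index)
```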
Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions
Due to the scarcity of dense pixel-level semantic annotations for images
recorded in adverse visual conditions, there has been a keen interest in
unsupervised domain adaptation (UDA) for the semantic segmentation of such
images. UDA adapts models trained on normal conditions to the target
adverse-condition domains. Meanwhile, multiple datasets with driving scenes
provide corresponding images of the same scenes across multiple conditions,
which can serve as a form of weak supervision for domain adaptation. We propose
Refign, a generic extension to self-training-based UDA methods which leverages
these cross-domain correspondences. Refign consists of two steps: (1) aligning
the normal-condition image to the corresponding adverse-condition image using
an uncertainty-aware dense matching network, and (2) refining the adverse
prediction with the normal prediction using an adaptive label correction
mechanism. We design custom modules to streamline both steps and set the new
state of the art for domain-adaptive semantic segmentation on several
adverse-condition benchmarks, including ACDC and Dark Zurich. The approach
introduces no extra training parameters, minimal computational overhead --
during training only -- and can be used as a drop-in extension to improve any
given self-training-based UDA method. Code is available at
https://github.com/brdav/refign.
Comment: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023
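Refign's refinement step can be pictured as letting a confident, aligned normal-condition prediction overrule the adverse-condition prediction when forming pseudo-labels. The PyTorch sketch below replaces the paper's adaptive label correction with a fixed confidence cutoff for brevity, so it illustrates the idea rather than the exact mechanism.

```python
import torch

def refine_pseudo_label(adverse_probs, aligned_normal_probs, thresh=0.9):
    """adverse_probs, aligned_normal_probs: (C, H, W) softmax outputs,
    with the normal-condition map already warped into the adverse view."""
    normal_conf, normal_cls = aligned_normal_probs.max(dim=0)  # (H, W)
    adverse_cls = adverse_probs.argmax(dim=0)                  # (H, W)
    trust_normal = normal_conf > thresh
    return torch.where(trust_normal, normal_cls, adverse_cls)
```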
BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation
Semantic image segmentation is a critical component in many computer vision
systems, such as autonomous driving. In such applications, adverse conditions
(heavy rain, nighttime, snow, extreme lighting) on the one hand pose specific
challenges, yet are typically underrepresented in the available datasets.
Generating more training data is cumbersome and expensive, and the process
itself is error-prone due to the inherent aleatoric uncertainty. To address
this challenging problem, we propose BTSeg, which exploits image-level
correspondences as a weak supervision signal to learn a segmentation model that
is agnostic to adverse conditions. To this end, our approach uses the Barlow
twins loss from the field of unsupervised learning and treats images taken at
the same location but under different adverse conditions as "augmentations" of
the same unknown underlying base image. This allows the training of a
segmentation model that is robust to appearance changes introduced by different
adverse conditions. We evaluate our approach on ACDC and the new challenging
ACG benchmark to demonstrate its robustness and generalization capabilities.
Our approach performs favorably when compared to the current state-of-the-art
methods, while also being simpler to implement and train. The code will be
released upon acceptance.
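The Barlow Twins objective that BTSeg borrows is standard: it drives the cross-correlation matrix of two embeddings toward the identity, so paired views agree while feature dimensions decorrelate. Below is the usual PyTorch formulation, applied conceptually to features of the same scene under two conditions; the off-diagonal weight is a conventional choice, not a value from the paper.

```python
import torch

def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-9):
    """z_a, z_b: (N, D) embeddings of the two condition 'augmentations'."""
    n, _ = z_a.shape
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    c = (z_a.T @ z_b) / n                          # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag
```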
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
In unsupervised domain adaptation (UDA), a model trained on source data (e.g.
synthetic) is adapted to target data (e.g. real-world) without access to target
annotation. Most previous UDA methods struggle with classes that have a similar
visual appearance on the target domain as no ground truth is available to learn
the slight appearance differences. To address this problem, we propose a Masked
Image Consistency (MIC) module to enhance UDA by learning spatial context
relations of the target domain as additional clues for robust visual
recognition. MIC enforces the consistency between predictions of masked target
images, where random patches are withheld, and pseudo-labels that are generated
based on the complete image by an exponential moving average teacher. To
minimize the consistency loss, the network has to learn to infer the
predictions of the masked regions from their context. Due to its simple and
universal concept, MIC can be integrated into various UDA methods across
different visual recognition tasks such as image classification, semantic
segmentation, and object detection. MIC significantly improves the
state-of-the-art performance across the different recognition tasks for
synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For
instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8%
on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an
improvement of +2.1 and +3.0 percent points over the previous state of the art.
The implementation is available at https://github.com/lhoyer/MIC.
Comment: CVPR 2023
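The masked-consistency signal is easy to sketch: withhold random patches from the target image for the student, and supervise it with pseudo-labels the EMA teacher computes from the complete image. The fragment below assumes segmentation networks mapping (B, 3, H, W) images to (B, C, H, W) logits and image sides divisible by the patch size; the patch size and mask ratio are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def mic_consistency_loss(student, ema_teacher, images,
                         patch=32, mask_ratio=0.5):
    b, _, h, w = images.shape
    # Random patch mask: 1 = visible, 0 = withheld (h, w divisible by patch).
    grid = (torch.rand(b, 1, h // patch, w // patch,
                       device=images.device) > mask_ratio).float()
    mask = F.interpolate(grid, size=(h, w), mode="nearest")
    with torch.no_grad():
        pseudo = ema_teacher(images).argmax(dim=1)  # labels from full image
    logits = student(images * mask)                 # prediction from masked image
    return F.cross_entropy(logits, pseudo)
```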
Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation
Standard unsupervised domain adaptation methods adapt models from a source to
a target domain using labeled source data and unlabeled target data jointly. In
model adaptation, on the other hand, access to the labeled source data is
prohibited, i.e., only the source-trained model and unlabeled target data are
available. We investigate normal-to-adverse condition model adaptation for
semantic segmentation, whereby image-level correspondences are available in the
target domain. The target set consists of unlabeled pairs of adverse- and
normal-condition street images taken at GPS-matched locations. Our method --
CMA -- leverages such image pairs to learn condition-invariant features via
contrastive learning. In particular, CMA encourages features in the embedding
space to be grouped according to their condition-invariant semantic content and
not according to the condition under which respective inputs are captured. To
obtain accurate cross-domain semantic correspondences, we warp the normal image
to the viewpoint of the adverse image and leverage warp-confidence scores to
create robust, aggregated features. With this approach, we achieve
state-of-the-art semantic segmentation performance for model adaptation on
several normal-to-adverse adaptation benchmarks, such as ACDC and Dark Zurich.
We also evaluate CMA on a newly procured adverse-condition generalization
benchmark and report favorable results compared to standard unsupervised domain
adaptation methods, despite the comparative handicap of CMA due to source data
inaccessibility. Code is available at https://github.com/brdav/cma.
Comment: International Conference on Computer Vision (ICCV) 2023
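The condition-invariant grouping that CMA encourages can be illustrated with a standard InfoNCE objective over matched feature pairs: the adverse-image feature and the warped normal-image feature at the same location form a positive pair, and other locations in the batch serve as negatives. The sketch omits CMA's warp-confidence weighting and feature aggregation; the temperature is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def condition_contrastive_loss(f_adverse, f_warped_normal, temperature=0.1):
    """f_adverse, f_warped_normal: (N, D) features at matched locations."""
    a = F.normalize(f_adverse, dim=1)
    p = F.normalize(f_warped_normal, dim=1)
    logits = a @ p.T / temperature           # (N, N) pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)  # InfoNCE over the batch
```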
LoopDA: Constructing Self-loops to Adapt Nighttime Semantic Segmentation
Due to the lack of training labels and the difficulty of annotating, dealing
with adverse driving conditions such as nighttime has posed a huge challenge to
the perception system of autonomous vehicles. Therefore, adapting knowledge
from a labelled daytime domain to an unlabelled nighttime domain has been
widely researched. In addition to labelled daytime datasets, existing nighttime
datasets usually provide nighttime images with corresponding daytime reference
images captured at nearby locations for reference. The key challenge is to
minimize the performance gap between the two domains. In this paper, we propose
LoopDA for domain-adaptive nighttime semantic segmentation. It consists of
self-loops that reconstruct the input data from the predicted semantic maps by
rendering them into the encoded features. In a warm-up training stage, the
self-loops comprise an inner loop and an outer loop, which are responsible for
intra-domain refinement and inter-domain alignment,
respectively. To reduce the impact of day-night pose shifts, in the later
self-training stage, we propose a co-teaching pipeline that involves an offline
pseudo-supervision signal and an online reference-guided signal 'DNA'
(Day-Night Agreement), bringing substantial benefits to enhance nighttime
segmentation. Our model outperforms prior methods on Dark Zurich and Nighttime
Driving datasets for semantic segmentation. Code and pretrained models are
available at https://github.com/fy-vision/LoopDA.
Comment: Accepted to WACV 2023
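The 'DNA' signal can be read as an agreement test between the nighttime prediction and a daytime reference prediction of the same scene. The sketch below trusts a nighttime pseudo-label only where the two class maps agree; this is a simplified interpretation of the abstract, not LoopDA's exact co-teaching rule, and it assumes the daytime map has been warped into the nighttime view.

```python
import torch

def day_night_agreement(night_probs, warped_day_probs):
    """night_probs, warped_day_probs: (B, C, H, W) softmax outputs."""
    night_cls = night_probs.argmax(dim=1)   # (B, H, W)
    day_cls = warped_day_probs.argmax(dim=1)
    agree = night_cls == day_cls            # boolean reliability mask
    return night_cls, agree
```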