Unsupervised domain adaptation (UDA) aims to adapt a model trained on a labeled
source domain to an unlabeled target domain. Existing UDA-based semantic
segmentation approaches typically reduce the domain shift at the pixel level,
feature level, and output level. However, almost all of them largely neglect the
contextual dependency, which is generally shared across different domains,
leading to suboptimal performance. In this paper, we propose a novel
Context-Aware Mixup (CAMix) framework for domain adaptive semantic
segmentation, which exploits this important clue of context-dependency as
explicit prior knowledge in a fully end-to-end trainable manner for enhancing
the adaptability toward the target domain. First, we present a contextual
mask generation strategy by leveraging the accumulated spatial distributions
and prior contextual relationships. The generated contextual mask is critical
in this work and guides the context-aware domain mixup at three different
levels. In addition, given the context knowledge, we introduce a
significance-reweighted consistency loss to penalize the inconsistency between
the mixed student prediction and the mixed teacher prediction, which alleviates
the negative transfer of the adaptation, e.g., early performance degradation.
Extensive experiments and analysis demonstrate the effectiveness of our method
against state-of-the-art approaches on widely used UDA benchmarks.

Comment: Accepted to IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
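
As a rough illustration of the two ideas named in the abstract, mask-guided mixing of source and target data and a per-pixel reweighted consistency term between student and teacher predictions, the following minimal sketch may help. It is not the CAMix implementation: the function names `mask_mix` and `weighted_consistency`, the binary mask, the squared-difference penalty, and the externally supplied weight map are all assumptions made for illustration, since the abstract does not specify how the contextual mask or the significance weights are computed.

```python
from typing import List

Grid = List[List[float]]  # a 2D map of per-pixel values


def mask_mix(mask: Grid, source: Grid, target: Grid) -> Grid:
    """Per-pixel mix: take the source value where mask is 1, the target value where mask is 0.

    The mask is assumed to be given; how a contextual mask is generated is not modeled here.
    """
    return [
        [m * s + (1 - m) * t for m, s, t in zip(mrow, srow, trow)]
        for mrow, srow, trow in zip(mask, source, target)
    ]


def weighted_consistency(student: Grid, teacher: Grid, weight: Grid) -> float:
    """Weighted mean of squared differences between two per-pixel score maps.

    The weight map is a hypothetical stand-in for a significance weighting.
    """
    total = 0.0
    norm = 0.0
    for srow, trow, wrow in zip(student, teacher, weight):
        for s, t, w in zip(srow, trow, wrow):
            total += w * (s - t) ** 2
            norm += w
    # Avoid division by zero when every pixel has zero weight.
    return total / norm if norm > 0 else 0.0


# Usage: mix two tiny 2x2 maps, then compare a student map with a teacher map.
mask = [[1, 0], [0, 1]]
source = [[1.0, 1.0], [1.0, 1.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
mixed = mask_mix(mask, source, target)
loss = weighted_consistency([[1.0, 0.0]], [[0.0, 0.0]], [[1.0, 1.0]])
```

In this sketch, pixels with a larger weight contribute more to the consistency term, so down-weighting unreliable pixels reduces their influence on the penalty.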