UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline
State-of-the-art 3D semantic segmentation models are trained on off-the-shelf
public benchmarks, but their recognition accuracy inevitably drops when these
well-trained models are deployed to a new domain. In this paper, we introduce a
Unified Domain Adaptive 3D semantic segmentation pipeline (UniDA3D) that
improves weak generalization and bridges the point distribution gap between
domains. Unlike previous studies that focus on only a single adaptation task,
UniDA3D can tackle several adaptation tasks in the 3D segmentation field by
means of a unified source-and-target active sampling strategy, which selects a
maximally informative subset from both the source and target domains for
effective model adaptation. In addition, benefiting from the rise of
multi-modal 2D-3D datasets, UniDA3D investigates the possibility of a
multi-modal sampling strategy by developing a cross-modality feature
interaction module that extracts a representative pair of image and point
features to achieve bi-directional image-point feature interaction for safe
model adaptation. Experimentally, UniDA3D is verified to be effective in
multiple adaptation tasks: 1) unsupervised domain adaptation, 2) unsupervised
few-shot domain adaptation, and 3) active domain adaptation. The results
demonstrate that simply coupling UniDA3D with off-the-shelf 3D segmentation
baselines enhances the domain generalization ability of these baselines.
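The abstract does not spell out the scoring rule behind the source-and-target
active sampling strategy. As a rough illustration only, the sketch below
assumes predictive entropy as the informativeness score (a common
active-learning choice, not necessarily the one UniDA3D uses) and picks the
top-scoring samples from a pool that mixes source and target predictions. The
function names and toy probabilities are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def select_informative(probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` most uncertain samples, in ascending order.

    Assumption: uncertainty (entropy) stands in for "informativeness".
    """
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return sorted(ranked[:budget])


# Hypothetical softmax outputs for a few source and target points.
source_probs = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25]]
target_probs = [[0.34, 0.33, 0.33], [0.90, 0.05, 0.05], [0.60, 0.30, 0.10]]

# Pool both domains and remember where each prediction came from.
pool: List[Tuple[str, int, List[float]]] = (
    [("source", i, p) for i, p in enumerate(source_probs)]
    + [("target", i, p) for i, p in enumerate(target_probs)]
)
chosen = select_informative([p for _, _, p in pool], budget=2)
selected = [(pool[k][0], pool[k][1]) for k in chosen]
print(selected)
```

Because both domains share one ranking, the budget is spent wherever the model
is least certain, regardless of which domain a sample came from.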
Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation
Existing techniques for adapting semantic segmentation networks built on deep
convolutional neural networks (CNNs) across source and target domains treat all
samples from the two domains in a global or category-aware manner. They do not
consider inter-class variation within the target domain itself or within an
estimated category, which limits their ability to encode domains with a
multi-modal data distribution. To overcome this limitation, we introduce a
learnable clustering module and a novel domain adaptation framework called
cross-domain grouping and alignment. To cluster samples across domains so as to
maximize domain alignment without forgetting precise segmentation ability on
the source domain, we present two loss functions that encourage semantic
consistency and orthogonality among the clusters. We also present a loss that
addresses class imbalance, another limitation of previous methods. Our
experiments show that our method consistently boosts adaptation performance in
semantic segmentation, outperforming the state of the art on various domain
adaptation settings.
Comment: AAAI 202
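The exact form of the orthogonality loss is not given in the abstract. A
minimal sketch, assuming each cluster is summarized by a centroid vector and
orthogonality is encouraged by penalizing the squared cosine similarity between
every pair of distinct centroids. The function names and toy centroids are
hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def orthogonality_penalty(centroids: List[Sequence[float]]) -> float:
    """Sum of squared cosine similarities over all distinct centroid pairs.

    The penalty is zero exactly when the centroids are pairwise orthogonal.
    """
    k = len(centroids)
    return sum(
        cosine(centroids[i], centroids[j]) ** 2
        for i in range(k)
        for j in range(i + 1, k)
    )


# Hypothetical cluster centroids in a 3-D feature space.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.5]]
overlapping = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(orthogonality_penalty(orthogonal), orthogonality_penalty(overlapping))
```

Squaring the cosine penalizes both aligned and anti-aligned centroids, so the
only way to reach zero is for clusters to occupy mutually orthogonal directions.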
Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation
Deep-learning models for 3D point cloud semantic segmentation exhibit limited
generalization capabilities when trained and tested on data captured with
different sensors or in varying environments due to domain shift. Domain
adaptation methods can be employed to mitigate this domain shift, for instance,
by simulating sensor noise, developing domain-agnostic generators, or training
point cloud completion networks. Often, these methods are tailored for range
view maps or necessitate multi-modal input. In contrast, domain adaptation in
the image domain can be executed through sample mixing, which emphasizes input
data manipulation rather than employing distinct adaptation modules. In this
study, we introduce compositional semantic mixing for point cloud domain
adaptation, representing the first unsupervised domain adaptation technique for
point cloud segmentation based on semantic and geometric sample mixing. We
present a two-branch symmetric network architecture capable of concurrently
processing point clouds from a source domain (e.g. synthetic) and point clouds
from a target domain (e.g. real-world). Each branch operates within one domain
by integrating selected data fragments from the other domain and utilizing
semantic information derived from source labels and target (pseudo) labels.
Additionally, our method can leverage a limited number of human point-level
annotations (semi-supervised) to further enhance performance. We assess our
approach in both synthetic-to-real and real-to-real scenarios using LiDAR
datasets and demonstrate that it significantly outperforms state-of-the-art
methods in both unsupervised and semi-supervised settings.
Comment: TPAMI. arXiv admin note: text overlap with arXiv:2207.0977
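The core mixing step can be illustrated under simplifying assumptions: points
are plain (x, y, z) tuples, a fragment is chosen purely by semantic class, and
any geometric transforms the method may apply to fragments are omitted.
`semantic_mix` and the class ids below are hypothetical. Each branch pastes
labeled fragments from the other domain into its own cloud, with target
fragments carrying pseudo-labels.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]


def semantic_mix(
    base_pts: List[Point],
    base_labels: List[int],
    other_pts: List[Point],
    other_labels: List[int],
    classes: Set[int],
) -> Tuple[List[Point], List[int]]:
    """Append to the base cloud every point of the other cloud whose label is in `classes`."""
    picked = [(p, l) for p, l in zip(other_pts, other_labels) if l in classes]
    mixed_pts = list(base_pts) + [p for p, _ in picked]
    mixed_labels = list(base_labels) + [l for _, l in picked]
    return mixed_pts, mixed_labels


# Hypothetical class ids: 0 = road, 1 = car, 2 = pedestrian.
src_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
src_labels = [0, 1, 2]                      # ground-truth source labels
tgt_pts = [(5.0, 5.0, 0.0), (6.0, 5.0, 0.0)]
tgt_pseudo = [0, 0]                         # pseudo-labels from the current model

# Target branch: paste source "car" and "pedestrian" fragments into the target cloud.
tgt_mix = semantic_mix(tgt_pts, tgt_pseudo, src_pts, src_labels, {1, 2})
# Source branch: paste target "road" fragments into the source cloud.
src_mix = semantic_mix(src_pts, src_labels, tgt_pts, tgt_pseudo, {0})
print(tgt_mix[1], src_mix[1])
```

The manipulation happens purely on the input data, so either mixed cloud can be
fed to an unchanged segmentation backbone.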
DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency
Video semantic segmentation is a pivotal aspect of video representation
learning. However, significant domain shifts make it challenging to learn
invariant spatio-temporal features across the labeled source domain and the
unlabeled target domain. To address this challenge, we propose DA-STC, a novel
method for domain adaptive video semantic segmentation that incorporates a
bidirectional multi-level spatio-temporal fusion module and a category-aware
spatio-temporal feature alignment module to encourage consistent learning of
domain-invariant features. First, we perform bidirectional spatio-temporal
fusion at the image sequence level and the shallow feature level, constructing
two fused intermediate video domains. This prompts the video semantic
segmentation model to consistently learn spatio-temporal features of shared
patch sequences that are influenced by domain-specific contexts, thereby
mitigating the feature gap between the source and target domains. Second, we
propose a category-aware feature alignment module that promotes the consistency
of spatio-temporal features, facilitating adaptation to the target domain.
Specifically, we adaptively aggregate the domain-specific deep features of each
category along the spatio-temporal dimensions; the aggregated features are then
constrained to achieve cross-domain intra-class feature alignment and
inter-class feature separation. Extensive experiments demonstrate the
effectiveness of our method, which achieves state-of-the-art mIoU on multiple
challenging benchmarks. Furthermore, we extend DA-STC to the image domain,
where it also achieves superior performance for domain adaptive semantic
segmentation. The source code and models will be made available at
https://github.com/ZHE-SAPI/DA-STC.
Comment: 18 pages, 9 figures
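A simplified sketch of category-aware alignment, assuming each class is
summarized by the plain mean of its features pooled over frames and locations
(the method aggregates adaptively; that weighting is not reproduced here). The
intra-class term pulls same-class source and target prototypes together, and
the inter-class term is a hinge that penalizes different-class prototypes
closer than a hypothetical `margin`. All names and values are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def prototypes(feats: List[Sequence[float]], labels: List[int]) -> Dict[int, Vector]:
    """Mean feature vector per class (a plain average, not an adaptive weighting)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, c in zip(feats, labels):
        if c not in sums:
            sums[c] = [0.0] * len(f)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], f)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_losses(
    src: Dict[int, Vector], tgt: Dict[int, Vector], margin: float
) -> Tuple[float, float]:
    """Return (intra-class pull, inter-class hinge push) over classes seen in both domains."""
    shared = sorted(set(src) & set(tgt))
    intra = sum(sq_dist(src[c], tgt[c]) for c in shared)
    inter = sum(
        max(0.0, margin - sq_dist(src[i], tgt[j]))
        for i in shared
        for j in shared
        if i != j
    )
    return intra, inter


# Hypothetical features for class 0 and class 1 in each domain.
src_proto = prototypes([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1])
tgt_proto = prototypes([[0.8, 0.0], [0.0, 1.2]], [0, 1])
print(alignment_losses(src_proto, tgt_proto, margin=1.0))
print(alignment_losses(src_proto, tgt_proto, margin=2.0))
```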
Domain Adaptive Semantic Segmentation by Optimal Transport
Scene segmentation is widely used for environment perception in autonomous
driving, and semantic scene segmentation (3S) has received a great deal of
attention because of the rich semantic information it provides. It assigns a
label to each pixel in an image, thus enabling automatic image labeling.
Current approaches are mainly based on convolutional neural networks (CNNs),
but they rely on a large number of labels. Therefore, achieving semantic
segmentation with a small amount of labeled data is becoming increasingly
important. In this paper, we propose a domain adaptation (DA) framework based
on optimal transport (OT) and an attention mechanism to address this issue.
Concretely, we first use a CNN to generate the output space, owing to its
strong feature representation. Second, we use OT to achieve a more robust
alignment of the source and target domains in the output space, where the OT
plan defines an effective attention mechanism that improves the adaptation of
the model. In particular, OT reduces the number of network parameters and makes
the network more interpretable. Third, to better describe the multi-scale
property of features, we construct a multi-scale segmentation network to
perform domain adaptation. Finally, to verify the performance of the proposed
method, we compare it experimentally with three benchmark methods and four SOTA
methods on three scene datasets. The mean intersection-over-union (mIoU) is
significantly improved, and visualization results under multiple domain
adaptation scenarios also show that the proposed method outperforms the
compared semantic segmentation methods.
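The abstract does not state which OT solver is used. As an assumption, the
sketch below computes an entropy-regularized transport plan with Sinkhorn
iterations between source and target output-space vectors, then row-normalizes
the plan so each source sample receives attention weights over the target
samples. `sinkhorn`, `eps`, `n_iters`, and the toy vectors are hypothetical
choices, not the paper's settings.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: Sequence[float], b: Sequence[float],
             eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropy-regularized OT plan whose row sums approach `a` and column sums approach `b`."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance used as the transport cost."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


# Hypothetical output-space vectors (e.g. per-sample class scores).
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9]]
cost = [[sq_dist(s, t) for t in tgt] for s in src]
a = [0.5, 0.5]  # uniform mass on source samples
b = [0.5, 0.5]  # uniform mass on target samples

plan = sinkhorn(cost, a, b)
# Row-normalize: each source sample's weights over the target samples sum to 1.
attention = [[plan[i][j] / a[i] for j in range(len(b))] for i in range(len(a))]
print(attention)
```

A smaller `eps` gives a sharper, closer-to-unregularized plan at the cost of
slower convergence and possible underflow in `exp`.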
DAugNet: Unsupervised, Multi-source, Multi-target, and Life-long Domain Adaptation for Semantic Segmentation of Satellite Images
Domain adaptation of satellite images has recently gained increasing attention
as a way to overcome the limited generalization ability of machine learning
models when segmenting large-scale satellite images. Most existing approaches
adapt the model from one domain to another. However, such a single-source,
single-target setting prevents these methods from being scalable solutions,
since multiple source and target domains with different data distributions are
now usually available. Besides, the continuous proliferation of satellite
images requires classifiers to adapt to continuously increasing data. We
propose a novel approach, coined DAugNet, for unsupervised, multi-source,
multi-target, and life-long domain adaptation of satellite images. It consists
of a classifier and a data augmentor. The data augmentor, a shallow network,
can perform style transfer between multiple satellite images in an
unsupervised manner, even when new data are added over time. In each training
iteration, it provides the classifier with diversified data, which makes the
classifier robust to large data distribution differences between the domains.
Our extensive experiments show that DAugNet generalizes significantly better
to new geographic locations than existing approaches.
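DAugNet learns its style-transfer augmentor; that network is not reproduced
here. As a swapped-in, non-learned stand-in, the sketch below matches each
channel's mean and standard deviation to those of a style image (the idea
behind AdaIN-style normalization), which is one simple way to restyle a
satellite image toward another domain. The names and toy values are
hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # channels x flattened pixels


def channel_stats(channel: Sequence[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of one channel."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return mean, math.sqrt(var)


def match_style(content: Image, style: Image, eps: float = 1e-8) -> Image:
    """Shift and scale each content channel to the style channel's mean and std."""
    out: Image = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch)
        s_mean, s_std = channel_stats(s_ch)
        # eps avoids division by zero for constant channels.
        out.append([(x - c_mean) / (c_std + eps) * s_std + s_mean for x in c_ch])
    return out


# Hypothetical 2-band images with 3 pixels each.
content = [[0.0, 2.0, 4.0], [1.0, 1.0, 1.0]]
style = [[10.0, 11.0, 12.0], [0.5, 0.7, 0.9]]
restyled = match_style(content, style)
print(restyled)
```

In a training loop, one could draw the style image at random from a growing
bank of domains each iteration, echoing the diversified-data idea, though
DAugNet's own procedure may differ.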