Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels
Domain adaptation for semantic segmentation across datasets that share the
same categories has seen several recent successes. However, a more general
scenario is when the source and target datasets correspond to non-overlapping
label spaces. For example, the categories in segmentation datasets vary widely
depending on the type of environment or application, yet share many valuable
semantic relations. Existing approaches based on feature alignment or
discrepancy minimization do not take such category shift into account. In this
work, we present Cluster-to-Adapt (C2A), a computationally efficient
clustering-based approach for domain adaptation across segmentation datasets
with completely different, but possibly related, categories. We show that such
a clustering objective, enforced in a transformed feature space, serves to
automatically select categories across the source and target domains that can
be aligned to improve target performance, while preventing negative transfer
for unrelated categories. We demonstrate the effectiveness of our approach
through experiments on the challenging problem of outdoor-to-indoor adaptation
for semantic segmentation, in few-shot as well as zero-shot settings, with
consistent improvements over existing approaches and baselines in all cases.
Comment: Accepted to L3D workshop at CVPR 202
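To make the idea concrete, here is a minimal sketch of a DEC-style clustering objective enforced in a transformed feature space, in PyTorch. The learnable prototypes, temperature, and square-and-renormalize target sharpening are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def log_assignments(features, prototypes, temperature=0.1):
    # Soft-assign L2-normalized pixel features (N, D) to K learnable
    # cluster prototypes (K, D) via cosine similarity.
    f = F.normalize(features, dim=1)
    p = F.normalize(prototypes, dim=1)
    return ((f @ p.t()) / temperature).log_softmax(dim=1)  # (N, K)

def sharpen(assign):
    # Square-and-renormalize the soft assignments to obtain confident
    # pseudo-targets (as in Deep Embedded Clustering).
    t = assign ** 2 / assign.sum(dim=0, keepdim=True)
    return (t / t.sum(dim=1, keepdim=True)).detach()

def clustering_loss(src_feats, tgt_feats, prototypes):
    # Pixels from either domain that fall into the same cluster are pulled
    # toward a shared prototype, aligning related categories across domains
    # while unrelated categories keep separate prototypes.
    loss = 0.0
    for feats in (src_feats, tgt_feats):
        log_a = log_assignments(feats, prototypes)
        loss = loss + F.kl_div(log_a, sharpen(log_a.exp()),
                               reduction="batchmean")
    return loss
```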
UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline
State-of-the-art 3D semantic segmentation models are trained on off-the-shelf
public benchmarks, but they inevitably suffer a drop in recognition accuracy
when deployed to a new domain. In this paper, we introduce a Unified Domain
Adaptive 3D semantic segmentation pipeline (UniDA3D) to enhance their weak
generalization ability and bridge the point-distribution gap between domains.
Unlike previous studies that focus only on a single adaptation task, UniDA3D
can tackle several adaptation tasks in the 3D segmentation field by designing
a unified source-and-target active sampling strategy, which selects a
maximally informative subset from both the source and target domains for
effective model adaptation. Moreover, benefiting from the rise of multi-modal
2D-3D datasets, UniDA3D investigates a multi-modal sampling strategy by
developing a cross-modality feature interaction module that extracts a
representative pair of image and point features, achieving bi-directional
image-point feature interaction for safe model adaptation. Experimentally,
UniDA3D proves effective in many adaptation tasks, including: 1) unsupervised
domain adaptation, 2) unsupervised few-shot domain adaptation, and 3) active
domain adaptation. The results demonstrate that, by easily coupling UniDA3D
with off-the-shelf 3D segmentation baselines, the domain generalization
ability of these baselines can be enhanced.
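As an illustration of a source-and-target active sampling step, the sketch below scores each candidate by mean predictive entropy and keeps a budgeted top subset from each domain's pool. The entropy score and the budgets are placeholder assumptions, not UniDA3D's actual informativeness criterion.

```python
import torch

@torch.no_grad()
def informativeness(model, sample):
    # Mean per-point predictive entropy: a simple stand-in for the
    # informativeness measure used to rank candidate samples.
    logits = model(sample)                    # (num_points, num_classes)
    p = logits.softmax(dim=1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean().item()

def active_subset(model, pool, budget):
    # Keep the `budget` most informative samples from one domain's pool.
    scores = torch.tensor([informativeness(model, s) for s in pool])
    return [pool[i] for i in scores.topk(budget).indices]

# A unified strategy draws from both domains and adapts on the union:
#   selected = (active_subset(model, source_pool, source_budget)
#               + active_subset(model, target_pool, target_budget))
```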
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
We tackle the task of semi-supervised video object segmentation, i.e.,
segmenting the pixels belonging to an object in a video given the ground-truth
pixel mask for the first frame. We build on the recently introduced one-shot
video object segmentation (OSVOS) approach, which uses a pretrained network
and fine-tunes it on the first frame. While achieving impressive
performance, at test time OSVOS uses the fine-tuned network in unchanged form
and is not able to adapt to large changes in object appearance. To overcome
this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS)
which updates the network online using training examples selected based on the
confidence of the network and the spatial configuration. Additionally, we add a
pretraining step based on objectness, which is learned on PASCAL. Our
experiments show that both extensions are highly effective and improve the
state of the art on DAVIS to an intersection-over-union score of 85.7%.
Comment: Accepted at BMVC 2017. This version contains minor changes for the camera-ready version.
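A minimal sketch of the online adaptation step: pixels the network is highly confident about become self-selected training examples, and the network is updated on them at test time. The confidence threshold and step count are placeholders; the paper additionally derives certain negatives from the spatial distance to the last predicted mask, which is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def online_step(net, optimizer, frame, conf=0.97, n_updates=3):
    # Predict foreground probabilities for the current frame.
    net.eval()
    with torch.no_grad():
        prob = torch.sigmoid(net(frame))      # (1, 1, H, W)

    positives = prob > conf                   # confident foreground pixels
    negatives = prob < (1.0 - conf)           # confident background pixels
    target = positives.float()
    weight = (positives | negatives).float()  # train only on confident pixels

    # A few online gradient steps on the self-selected examples let the
    # network track large changes in object appearance.
    net.train()
    for _ in range(n_updates):
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(
            net(frame), target, weight=weight)
        loss.backward()
        optimizer.step()
```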
A Novel BiLevel Paradigm for Image-to-Image Translation
Image-to-image (I2I) translation is a pixel-level mapping that requires large
amounts of paired training data and often suffers from high scene diversity
and strong category bias. To tackle these problems, we propose a novel BiLevel
(BiL) learning paradigm that alternates the training of two models, one at an
instance-specific (IS) level and one at a general-purpose (GP) level. In each
scene, the IS model learns to maintain the specific scene attributes. It is
initialized from the GP model, which learns generalizable translation
knowledge from all scenes. This GP initialization gives the IS model an
efficient starting point, enabling fast adaptation to a new scene with scarce
training data. We conduct extensive I2I translation experiments on human-face
and street-view datasets. Quantitative results validate that our approach
significantly boosts the performance of classical I2I translation models such
as PG2 and Pix2Pix. Our visualization results show both higher image quality
and more appropriate instance-specific details; e.g., the translated image of
a person looks more like that person in terms of identity.
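One way to realize the alternation is sketched below: each instance-specific (IS) model is cloned from the general-purpose (GP) model, fine-tuned briefly on its scene, and the GP weights are then nudged toward the adapted weights. The L1 reconstruction objective, the `sample_pair` loader, and the Reptile-style averaging update are assumptions for illustration, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F

def bilevel_round(gp_model, scenes, lr=1e-4, is_steps=50, meta_lr=0.1):
    # One alternation over all scenes (schematic).
    for scene in scenes:
        # GP initialization: the IS model starts from generalizable weights,
        # enabling fast adaptation with scarce paired data.
        is_model = copy.deepcopy(gp_model)
        opt = torch.optim.Adam(is_model.parameters(), lr=lr)
        for _ in range(is_steps):
            x, y = scene.sample_pair()        # hypothetical paired loader
            loss = F.l1_loss(is_model(x), y)  # assumed reconstruction loss
            opt.zero_grad(); loss.backward(); opt.step()
        # Fold the scene-adapted weights back into the GP model
        # (a Reptile-style meta-update, assumed here).
        with torch.no_grad():
            for gp_p, is_p in zip(gp_model.parameters(),
                                  is_model.parameters()):
                gp_p.add_(meta_lr * (is_p - gp_p))
    return gp_model
```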