Exploring Object Relation in Mean Teacher for Cross-Domain Detection
Rendering synthetic data (e.g., 3D CAD-rendered images) to generate
annotations for learning deep models in vision tasks has attracted increasing
attention in recent years. However, simply applying the models learnt on
synthetic images may lead to high generalization error on real images due to
domain shift. To address this issue, recent progress in cross-domain
recognition has featured the Mean Teacher, which frames unsupervised domain
adaptation as semi-supervised learning. The domain gap is
thus naturally bridged with consistency regularization in a teacher-student
scheme. In this work, we advance this Mean Teacher paradigm to be applicable
for cross-domain detection. Specifically, we present Mean Teacher with Object
Relations (MTOR), which remolds Mean Teacher on the Faster R-CNN backbone by
integrating object relations into the measure of consistency cost between the
teacher and student modules. Technically, MTOR first learns
relational graphs that capture similarities between pairs of regions for
teacher and student respectively. The whole architecture is then optimized with
three consistency regularizations: 1) region-level consistency to align the
region-level predictions between teacher and student, 2) inter-graph
consistency for matching the graph structures between teacher and student, and
3) intra-graph consistency to enhance the similarity between regions of the
same class within the student's graph. Extensive experiments are conducted on the
transfers across Cityscapes, Foggy Cityscapes, and SIM10k, and superior results
are reported when comparing to state-of-the-art approaches. More remarkably, we
obtain a new single-model record of 22.8% mAP on the Syn2Real detection dataset.
Comment: CVPR 2019; The codes and model of our MTOR are publicly available at:
https://github.com/caiqi/mean-teacher-cross-domain-detectio
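To make the three consistency terms concrete, here is a minimal NumPy sketch of how such losses could be computed; this is not the authors' code, and the cosine-similarity graphs, mean-squared-error forms, and equal term weights are all simplifying assumptions:

```python
import numpy as np

def cosine_graph(feats):
    # Pairwise cosine-similarity graph over region features (one row per region).
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def mtor_consistency(student_probs, teacher_probs,
                     student_feats, teacher_feats, labels):
    """Illustrative sketch of MTOR's three consistency terms.

    student_probs / teacher_probs: (R, C) per-region class probabilities.
    student_feats / teacher_feats: (R, D) region features.
    labels: (R,) pseudo-classes used by the intra-graph term (an assumption
    here; the paper derives class assignments from teacher predictions).
    """
    # 1) region-level consistency: align per-region predictions.
    region = np.mean((student_probs - teacher_probs) ** 2)
    # 2) inter-graph consistency: match the two relational graph structures.
    g_s, g_t = cosine_graph(student_feats), cosine_graph(teacher_feats)
    inter = np.mean((g_s - g_t) ** 2)
    # 3) intra-graph consistency: pull together same-class student regions.
    same = (labels[:, None] == labels[None, :]).astype(float)
    intra = np.sum((1.0 - g_s) * same) / max(same.sum(), 1.0)
    return region + inter + intra
```

When teacher and student agree exactly, all three terms vanish, which is the fixed point the regularization pushes toward.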
One-Shot Unsupervised Cross-Domain Detection
Despite impressive progress in object detection over the last years, it is
still an open challenge to reliably detect objects across visual domains.
Although the topic has attracted attention recently, current approaches all
rely on the ability to access a sizable amount of target data for use at
training time. This is a heavy assumption, as often it is not possible to
anticipate the domain where a detector will be used, nor to access it in
advance for data acquisition. Consider for instance the task of monitoring
image feeds from social media: as every image is created and uploaded by a
different user, it belongs to a different target domain that is impossible to
foresee during training. This paper addresses this setting, presenting an
object detection algorithm able to perform unsupervised adaption across domains
by using only one target sample, seen at test time. We achieve this by
introducing a multi-task architecture that one-shot adapts to any incoming
sample by iteratively solving a self-supervised task on it. We further enhance
this auxiliary adaptation with cross-task pseudo-labeling. A thorough benchmark
analysis against the most recent cross-domain detection methods and a detailed
ablation study show the advantage of our method, which sets the
state-of-the-art in the defined one-shot scenario.
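The core idea of adapting on a single target sample via a self-supervised task can be sketched as follows. This toy NumPy version uses a linear model and a rotation-prediction pretext task as stand-ins for the paper's multi-task network; the model, task, and hyperparameters are all illustrative assumptions:

```python
import numpy as np

def rotations(img):
    # Four 90-degree rotations of a (H, W) image: the self-supervised targets.
    return np.stack([np.rot90(img, k) for k in range(4)])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def one_shot_adapt(W, img, steps=50, lr=0.01):
    """Adapt linear weights W of shape (4, D) on a single target image by
    iteratively solving a rotation-prediction task, loosely in the spirit
    of test-time self-supervised adaptation (not the authors' architecture).
    """
    x = rotations(img).reshape(4, -1)   # one training example per rotation
    y = np.eye(4)                       # rotation labels 0..3, one-hot
    for _ in range(steps):
        p = softmax(x @ W.T)            # predicted rotation probabilities
        grad = (p - y).T @ x / 4        # cross-entropy gradient w.r.t. W
        W = W - lr * grad
    return W
```

Each incoming sample gets its own few gradient steps, so the detector's features are tuned to that sample's domain before prediction.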
Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, existing image synthesis approaches work in either
the geometry domain or the appearance domain alone, which often introduces
various synthesis artifacts. This paper presents an innovative Hierarchical
Composition GAN (HIC-GAN) that incorporates image synthesis in geometry and
appearance domains into an end-to-end trainable network and achieves superior
synthesis realism in both domains simultaneously. We design an innovative
hierarchical composition mechanism that is capable of learning realistic
composition geometry and handling occlusions while multiple foreground objects
are involved in image composition. In addition, we introduce a novel attention
mask mechanism that guides the adaptation of the appearance of foreground
objects and also provides a better training reference for learning in the
geometry domain. Extensive experiments on scene text image synthesis, portrait editing
and indoor rendering tasks show that the proposed HIC-GAN achieves superior
synthesis performance both qualitatively and quantitatively.
Comment: 11 pages, 8 figures
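The hierarchical composition with occlusion handling that the abstract describes can be illustrated with a much simpler deterministic version. Here soft attention masks blend foreground objects onto the canvas in order, so later objects occlude earlier ones; the real HIC-GAN learns both the masks and the composition geometry, which this sketch does not attempt:

```python
import numpy as np

def compose(background, foregrounds, masks):
    """Toy hierarchical composition: paste each foreground over the canvas
    with its soft attention mask in [0, 1]. Objects are composited in
    order, so a later object occludes earlier ones where masks overlap.
    Inspired by, but far simpler than, HIC-GAN's learned mechanism.
    """
    canvas = background.astype(float)
    for fg, m in zip(foregrounds, masks):
        if canvas.ndim == 3:            # broadcast a (H, W) mask over channels
            m = m[..., None]
        canvas = m * fg + (1.0 - m) * canvas   # alpha blend
    return canvas
```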
AIR-DA: Adversarial Image Reconstruction for Unsupervised Domain Adaptive Object Detection
Unsupervised domain adaptive object detection is a challenging vision task
where object detectors are adapted from a label-rich source domain to an
unlabeled target domain. Recent advances prove the efficacy of the adversarial
based domain alignment where the adversarial training between the feature
extractor and domain discriminator results in domain-invariance in the feature
space. However, due to the domain shift, domain discrimination, especially on
low-level features, is an easy task. This results in an imbalance of the
adversarial training between the domain discriminator and the feature
extractor. In this work, we achieve a better domain alignment by introducing an
auxiliary regularization task to improve the training balance. Specifically, we
propose Adversarial Image Reconstruction (AIR) as the regularizer to facilitate
the adversarial training of the feature extractor. We further design a
multi-level feature alignment module to enhance the adaptation performance. Our
evaluations across several datasets of challenging domain shifts demonstrate
that the proposed method outperforms previous one- and two-stage methods in
most settings.
Comment: Accepted at IEEE Robotics and Automation Letters 202
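Putting the pieces of the abstract together, the overall training objective combines the detection loss, the adversarial domain term, and the AIR reconstruction regularizer. A minimal NumPy sketch of such a combined loss follows; the weights lam_adv and lam_rec and the exact loss forms are assumptions, and in practice the adversarial term would reach the feature extractor through a gradient reversal layer:

```python
import numpy as np

def air_da_objective(det_loss, domain_logits, domain_labels,
                     recon, images, lam_adv=0.1, lam_rec=1.0):
    """Illustrative combined objective: detection loss plus an adversarial
    domain-classification term (binary cross-entropy over per-sample
    domain logits) plus the image-reconstruction regularizer that keeps
    the adversarial game between discriminator and extractor balanced.
    """
    p = 1.0 / (1.0 + np.exp(-domain_logits))   # sigmoid domain probabilities
    eps = 1e-12
    adv = -np.mean(domain_labels * np.log(p + eps)
                   + (1.0 - domain_labels) * np.log(1.0 - p + eps))
    rec = np.mean((recon - images) ** 2)       # AIR reconstruction term
    return det_loss + lam_adv * adv + lam_rec * rec
```

The reconstruction term gives the feature extractor a gradient signal that does not depend on fooling the discriminator, which is how it rebalances the adversarial training the abstract describes.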