Domain Adaptive Faster R-CNN for Object Detection in the Wild
Object detection typically assumes that training and test data are drawn from
an identical distribution, which, however, does not always hold in practice.
Such a distribution mismatch will lead to a significant performance drop. In
this work, we aim to improve the cross-domain robustness of object detection.
We tackle the domain shift on two levels: 1) the image-level shift, such as
image style, illumination, etc., and 2) the instance-level shift, such as object
appearance, size, etc. We build our approach based on the recent
state-of-the-art Faster R-CNN model, and design two domain adaptation
components, on image level and instance level, to reduce the domain
discrepancy. The two domain adaptation components are based on H-divergence
theory, and are implemented by learning a domain classifier in an adversarial
training manner. The domain classifiers on different levels are further
reinforced with a consistency regularization to learn a domain-invariant region
proposal network (RPN) in the Faster R-CNN model. We evaluate our newly
proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K,
etc. The results demonstrate the effectiveness of our proposed approach for
robust object detection in various domain shift scenarios.
Comment: Accepted to CVPR 201
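The two domain classifiers and the consistency regularization described in this abstract can be illustrated with a small numeric sketch. It is not the authors' implementation: it assumes each classifier outputs a scalar probability that a feature comes from the target domain, it measures consistency as the distance between the mean image-level prediction and each instance-level prediction, and it omits the adversarial (gradient-reversal) part of training entirely. The names domain_bce and consistency_loss are hypothetical.

```python
import math

def domain_bce(prob_target, domain_label):
    """Binary cross-entropy of one domain prediction (0 = source, 1 = target)."""
    eps = 1e-12  # clamp to keep log() finite
    p = min(max(prob_target, eps), 1.0 - eps)
    return -(domain_label * math.log(p) + (1 - domain_label) * math.log(1.0 - p))

def consistency_loss(image_level_probs, instance_level_probs):
    """Sum of distances between the mean image-level prediction and each
    instance-level prediction (assumed form of the consistency term)."""
    mean_img = sum(image_level_probs) / len(image_level_probs)
    return sum(abs(mean_img - p) for p in instance_level_probs)

# Toy predictions for a single target-domain image (made-up values).
image_probs = [0.8, 0.6, 0.7, 0.9]  # one per feature-map location
instance_probs = [0.75, 0.25]       # one per region proposal

image_loss = sum(domain_bce(p, 1) for p in image_probs) / len(image_probs)
instance_loss = sum(domain_bce(p, 1) for p in instance_probs) / len(instance_probs)
cst = consistency_loss(image_probs, instance_probs)
print(image_loss, instance_loss, cst)
```

In an adversarial setup the feature extractor would be trained to increase the two classification losses while the classifiers try to decrease them; the consistency term only ties the two levels together.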
Exploring Object Relation in Mean Teacher for Cross-Domain Detection
Rendering synthetic data (e.g., 3D CAD-rendered images) to generate
annotations for learning deep models in vision tasks has attracted increasing
attention in recent years. However, simply applying the models learnt on
synthetic images may lead to high generalization error on real images due to
domain shift. To address this issue, recent progress in cross-domain
recognition has featured the Mean Teacher, which directly simulates
unsupervised domain adaptation as semi-supervised learning. The domain gap is
thus naturally bridged with consistency regularization in a teacher-student
scheme. In this work, we advance this Mean Teacher paradigm to be applicable
for cross-domain detection. Specifically, we present Mean Teacher with Object
Relations (MTOR), which adapts Mean Teacher to the Faster R-CNN backbone
by integrating object relations into the measure of consistency cost
between teacher and student modules. Technically, MTOR first learns
relational graphs that capture similarities between pairs of regions for
teacher and student respectively. The whole architecture is then optimized with
three consistency regularizations: 1) region-level consistency to align the
region-level predictions between teacher and student, 2) inter-graph
consistency for matching the graph structures between teacher and student, and
3) intra-graph consistency to enhance the similarity between regions of the same
class within the student's graph. Extensive experiments are conducted on the
transfers across Cityscapes, Foggy Cityscapes, and SIM10k, and superior results
are reported in comparison with state-of-the-art approaches. More remarkably, we
obtain a new single-model record of 22.8% mAP on the Syn2Real detection
dataset.
Comment: CVPR 2019; The codes and model of our MTOR are publicly available at:
https://github.com/caiqi/mean-teacher-cross-domain-detectio
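The teacher-student scheme and the inter-graph consistency mentioned in this abstract can be sketched in a few lines. This is only an illustration under assumptions: the teacher weights follow the student as an exponential moving average (the standard Mean Teacher update), the relational graph is taken to be the matrix of pairwise cosine similarities between region features, and graph mismatch is measured as a mean squared difference. The abstract does not specify the similarity or distance functions, and all function names here are hypothetical.

```python
import math

def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relation_graph(region_feats):
    """Pairwise similarity matrix over region features (assumed graph form)."""
    return [[cosine(a, b) for b in region_feats] for a in region_feats]

def inter_graph_consistency(g_teacher, g_student):
    """Mean squared difference between two graphs over the same regions."""
    n = len(g_teacher)
    total = sum((g_teacher[i][j] - g_student[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

# Toy features for the same two regions as seen by teacher and student.
teacher_feats = [[1.0, 0.0], [0.0, 1.0]]
student_feats = [[1.0, 0.0], [1.0, 1.0]]
loss = inter_graph_consistency(relation_graph(teacher_feats),
                               relation_graph(student_feats))
new_teacher = ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9)
print(loss, new_teacher)
```

The region-level and intra-graph terms listed in the abstract would be added to this loss in the same way; they are left out to keep the sketch short.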
Open Set Logo Detection and Retrieval
Current logo retrieval research focuses on closed set scenarios. We argue
that the logo domain is too large for this strategy and requires an open set
approach. To foster research in this direction, a large-scale logo dataset,
called Logos in the Wild, is collected and released to the public. A typical
open set logo retrieval application is, for example, assessing the
effectiveness of advertisement in sports event broadcasts. Given a query sample
in the form of a logo image, the task is to find all further occurrences of this
logo in a set of images or videos. Currently, common logo retrieval approaches
are unsuitable for this task because of their closed world assumption. Thus, an
open set logo retrieval method is proposed in this work which allows searching
for previously unseen logos by a single query sample. A two-stage concept with
separate logo detection and comparison is proposed where both modules are based
on task-specific CNNs. If trained with the Logos in the Wild data, significant
performance improvements are observed, especially compared with
state-of-the-art closed set approaches.
Comment: accepted at VISAPP 201
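The comparison stage of the two-stage concept can be pictured as a nearest-match search over detected logo regions. The sketch below is an assumption-laden illustration, not the paper's method: it presumes that a detector has already produced candidate boxes, that a comparison network has mapped each box and the query image to an embedding vector, and that matches are decided by cosine similarity against a hand-picked threshold. The function retrieve, the tuple layout, and the threshold value are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, detections, threshold=0.8):
    """Return detections whose embedding matches the query, best first.

    detections: list of (image_id, box, embedding) tuples from stage one.
    """
    hits = []
    for image_id, box, embedding in detections:
        score = cosine(query_embedding, embedding)
        if score >= threshold:
            hits.append((image_id, box, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

# Toy detections with made-up 2-D embeddings.
detections = [
    ("a", (0, 0, 10, 10), [1.0, 0.0]),
    ("b", (5, 5, 20, 20), [0.0, 1.0]),
    ("c", (1, 1, 4, 4), [0.9, 0.1]),
]
hits = retrieve([1.0, 0.0], detections)
print([image_id for image_id, _, _ in hits])
```

Because the query is only compared against embeddings, a logo class never seen during training can still be searched for, which is the open set property the abstract argues for.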