Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean-teacher
Adapting visual object detectors to operational target domains is a
challenging task, commonly achieved using unsupervised domain adaptation (UDA)
methods. When the labeled data come from multiple source domains,
treating them as separate domains and performing a multi-source domain
adaptation (MSDA) improves the accuracy and robustness over mixing these source
domains and performing a UDA, as observed by recent studies in MSDA. Existing
MSDA methods learn domain invariant and domain-specific parameters (for each
source domain) for the adaptation. However, unlike in single-source UDA methods,
learning domain-specific parameters makes the model size grow in proportion
to the number of source domains used. This paper proposes a novel MSDA method
called Prototype-based Mean-Teacher (PMT), which uses class prototypes instead
of domain-specific subnets to preserve domain-specific information. These
prototypes are learned using a contrastive loss, aligning the same categories
across domains and separating different categories far apart. Because of the
use of prototypes, the parameter size of our method does not increase
significantly with the number of source domains, thus reducing memory issues
and possible overfitting. Empirical studies show PMT outperforms
state-of-the-art MSDA methods on several challenging object detection datasets.
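As a rough illustration of the prototype idea in the abstract above (same categories pulled together across domains, different categories pushed apart), the sketch below computes per-class mean features and an InfoNCE-style loss between two domains. It is not the PMT authors' implementation: the function names, the cosine similarity, the temperature value, and the InfoNCE form are all assumptions chosen for illustration.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def class_prototypes(features, labels):
    """Map each class label to the mean of its feature vectors."""
    grouped = {}
    for feature, label in zip(features, labels):
        grouped.setdefault(label, []).append(feature)
    return {label: mean_vector(fs) for label, fs in grouped.items()}

def prototype_contrastive_loss(protos_a, protos_b, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    For each class shared by both domains, the domain-A prototype is the
    anchor, the same-class domain-B prototype is the positive, and the
    other domain-B prototypes are negatives.
    """
    shared = sorted(set(protos_a) & set(protos_b))
    if not shared:
        raise ValueError("the two domains share no classes")
    total = 0.0
    for c in shared:
        logits = [cosine(protos_a[c], protos_b[k]) / temperature for k in shared]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[shared.index(c)]
    return total / len(shared)
```

The loss is small when each class prototype in one domain is closest to the same class in the other domain, and large when prototypes of different classes are confused. Because only one vector per class and domain is stored, the memory cost of this sketch grows with the number of classes and domains rather than with per-domain subnetworks.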
Efficient Teacher: Semi-Supervised Object Detection for YOLOv5
Semi-Supervised Object Detection (SSOD) has been successful in improving the
performance of both R-CNN series and anchor-free detectors. However, one-stage
anchor-based detectors lack the structure to generate high-quality or flexible
pseudo labels, leading to serious inconsistency problems in SSOD. In this
paper, we propose the Efficient Teacher framework for scalable and effective
one-stage anchor-based SSOD training, consisting of Dense Detector, Pseudo
Label Assigner, and Epoch Adaptor. Dense Detector is a baseline model that
extends RetinaNet with dense sampling techniques inspired by YOLOv5. The
Efficient Teacher framework introduces a novel pseudo label assignment
mechanism, named Pseudo Label Assigner, which makes more refined use of pseudo
labels from Dense Detector. Epoch Adaptor is a method that enables a stable and
efficient end-to-end semi-supervised training schedule for Dense Detector. The
Pseudo Label Assigner prevents the occurrence of bias caused by a large number
of low-quality pseudo labels that may interfere with the Dense Detector during
the student-teacher mutual learning mechanism, and the Epoch Adaptor utilizes
domain and distribution adaptation to allow Dense Detector to learn globally
distributed consistent features, making the training independent of the
proportion of labeled data. Our experiments show that the Efficient Teacher
framework achieves state-of-the-art results on VOC, COCO-standard, and
COCO-additional using fewer FLOPs than previous methods. To the best of our
knowledge, this is the first attempt to apply Semi-Supervised Object Detection
to YOLOv5.
Comment: 14 pages
Correlation Debiasing for Unbiased Scene Graph Generation in Videos
Dynamic scene graph generation (SGG) from videos requires not only a
comprehensive understanding of objects across scenes that are prone to
temporal fluctuations but also a model of the temporal motions and interactions
with different objects. Moreover, the long-tailed distribution of visual
relationships is the crucial bottleneck of most dynamic SGG methods, since most
of them focus on capturing spatio-temporal context using complex architectures,
which leads to the generation of biased scene graphs. To address these
challenges, we propose FloCoDe: Flow-aware temporal consistency and Correlation
Debiasing with uncertainty attenuation for unbiased dynamic scene graphs.
FloCoDe employs feature warping using flow to detect temporally consistent
objects across the frames. In addition, it uses correlation debiasing to learn
the unbiased relation representation for long-tailed classes. Moreover, to
attenuate the predictive uncertainties, it uses a mixture of sigmoidal
cross-entropy loss and contrastive loss to incorporate label correlations to
identify the commonly co-occurring relations and help debias the long-tailed
ones. Extensive experimental evaluation shows a performance gain of up to
4.1%, demonstrating its superiority in generating more unbiased scene graphs.
Comment: 11 pages, 5 tables, 4 figures
Training-based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection
Semi-supervised object detection (SSOD) aims to improve the performance and
generalization of existing object detectors by utilizing limited labeled data
and extensive unlabeled data. Despite many advances, recent SSOD methods are
still challenged by inadequate model refinement using the classical exponential
moving average (EMA) strategy, the consensus of Teacher-Student models in the
latter stages of training (i.e., losing their distinctiveness), and
noisy/misleading pseudo-labels. This paper proposes a novel training-based
model refinement (TMR) stage and a simple yet effective representation
disagreement (RD) strategy to address the limitations of classical EMA and the
consensus problem. The TMR stage of Teacher-Student models optimizes a
lightweight scaling operation to refine the models' weights and prevent
overfitting or forgetting patterns learned from unlabeled data. Meanwhile, the
RD strategy helps keep these models divergent to encourage the student model to
explore complementary representations. Our approach can be integrated into
established SSOD methods and is empirically validated using two baseline
methods, with and without cascade regression, to generate more reliable
pseudo-labels. Extensive experiments demonstrate the superior performance of
our approach over state-of-the-art SSOD methods. Specifically, the proposed
approach outperforms the baseline Unbiased-Teacher-v2 (and Unbiased-Teacher-v1)
method by an average mAP margin of 2.23, 2.1, and 3.36 (and 2.07, 1.9, and 3.27)
on the COCO-standard, COCO-additional, and Pascal VOC datasets, respectively.
Comment: Under review
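For context, the classical EMA strategy that the abstract above refers to can be sketched as follows. This shows only the standard mean-teacher weight update, not the proposed TMR stage or RD strategy; the dictionary-of-floats weight format, the function name, and the default decay value are assumptions made for illustration.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an exponential moving average.

    teacher, student: dicts mapping parameter names to float values
    (a stand-in for real tensors; this format is an assumption).
    decay: fraction of the old teacher weight kept at each step.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Usage: the teacher drifts slowly toward the student after each step.
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1 the teacher changes slowly, so late in training the teacher and student weights tend to become nearly identical, which is the consensus problem the abstract describes.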