This paper focuses on long-tailed object detection in the semi-supervised
learning setting, which poses realistic challenges but has rarely been studied
in the literature. We propose a novel pseudo-labeling-based detector called
CascadeMatch. Our detector features a cascade network architecture, which has
multi-stage detection heads with progressive confidence thresholds. To avoid
manually tuning the thresholds, we design a new adaptive pseudo-label mining
mechanism to automatically identify suitable values from data. To mitigate
confirmation bias, where a model reinforces its own errors by training on
incorrect pseudo-labels that it produced, each detection head is trained on the
ensemble pseudo-labels of all detection heads. Experiments on two long-tailed
datasets, i.e., LVIS and COCO-LT, demonstrate that CascadeMatch surpasses
existing state-of-the-art semi-supervised approaches -- across a wide range of
detection architectures -- in handling long-tailed object detection. For
instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP^Fix on LVIS when
using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP^Fix when using
Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can
even handle the challenging sparsely annotated object detection problem.

Comment: International Journal of Computer Vision (IJCV), 202
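
The idea of training each head on ensemble pseudo-labels can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ensemble_pseudo_labels`, the nested-list input layout, and the single fixed `threshold` are assumptions made for illustration, and the adaptive threshold mining described in the abstract is not reproduced here. The sketch averages the class probabilities that several heads assign to the same candidate boxes and keeps only the boxes whose averaged top score clears the threshold.

```python
def ensemble_pseudo_labels(head_scores, threshold):
    """Average per-head class scores and keep confident boxes.

    head_scores: list over heads; each entry is a list over boxes;
                 each box is a list of per-class probabilities.
                 (Hypothetical layout, chosen for illustration.)
    threshold:   fixed confidence cut-off (an assumption; the paper
                 describes mining thresholds adaptively from data).
    Returns a list of (box_index, class_index, averaged_score).
    """
    num_heads = len(head_scores)
    num_boxes = len(head_scores[0])
    labels = []
    for b in range(num_boxes):
        num_classes = len(head_scores[0][b])
        # Ensemble step: mean probability per class across all heads.
        avg = [
            sum(head_scores[h][b][c] for h in range(num_heads)) / num_heads
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=avg.__getitem__)
        # Keep the box only if the ensemble is confident enough.
        if avg[best] >= threshold:
            labels.append((b, best, avg[best]))
    return labels


# Three heads scoring two boxes over two classes.
scores = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.5, 0.5]],
    [[0.8, 0.2], [0.3, 0.7]],
]
labels = ensemble_pseudo_labels(scores, threshold=0.7)
```

In this toy input only the first box survives, since the heads agree on it with high averaged confidence; the second box is dropped because its averaged top score falls below the cut-off. Every head would then be supervised with the same retained labels rather than with its own individual predictions.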