Simple Image-level Classification Improves Open-vocabulary Object Detection
Open-Vocabulary Object Detection (OVOD) aims to detect novel objects beyond a
given set of base categories on which the detection model is trained. Recent
OVOD methods focus on adapting image-level pre-trained vision-language
models (VLMs), such as CLIP, to the region-level object detection task via,
e.g., region-level knowledge distillation, regional prompt learning, or
region-text pre-training, to expand the detection vocabulary. These methods have
demonstrated remarkable performance in recognizing regional visual concepts,
but they are weak at exploiting the VLMs' powerful global scene understanding
learned from billion-scale image-level text descriptions. This
limits their ability to detect hard objects with small, blurred, or
occluded appearance from novel/base categories, whose detection heavily relies
on contextual information. To address this, we propose a novel approach, namely
Simple Image-level Classification for Context-Aware Detection Scoring
(SIC-CADS), which leverages the superior global knowledge yielded by CLIP to
complement current OVOD models from a global perspective. The core of
SIC-CADS is a multi-modal multi-label recognition (MLR) module that learns the
object co-occurrence-based contextual information from CLIP to recognize all
possible object categories in the scene. These image-level MLR scores can then
be utilized to refine the instance-level detection scores of current OVOD
models on those hard objects. This is verified by extensive empirical
results on two popular benchmarks, OV-LVIS and OV-COCO, which show that
SIC-CADS achieves significant and consistent improvement when combined with
different types of OVOD models. Further, SIC-CADS also improves the
cross-dataset generalization ability on Objects365 and OpenImages. The code is
available at https://github.com/mala-lab/SIC-CADS.
Comment: Accepted at AAAI 202
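The score-refinement idea described above can be sketched as a simple fusion of per-box detection scores with image-level multi-label scores. The weighted geometric-mean rule and the `alpha` weight below are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def refine_detection_scores(det_scores, mlr_scores, alpha=0.5):
    """Fuse instance-level detection scores with image-level multi-label
    recognition (MLR) scores via a weighted geometric mean.

    det_scores: (num_boxes, num_classes) per-box class scores
    mlr_scores: (num_classes,) image-level class probabilities
    alpha: weight on the original detection scores (assumed hyperparameter)
    """
    return det_scores ** alpha * mlr_scores[None, :] ** (1.0 - alpha)

# Toy example: class 0 is strongly supported by global context (mlr = 0.9),
# class 1 is not (mlr = 0.1), so fusion boosts the former and suppresses
# the latter relative to the raw detector scores.
det = np.array([[0.2, 0.7],
                [0.05, 0.9]])
mlr = np.array([0.9, 0.1])
refined = refine_detection_scores(det, mlr)
```

Under this kind of fusion, a low-confidence box for a class that the image-level recognizer deems likely present (e.g., a small or occluded object in a supportive scene) gets its score raised, which is the intuition behind context-aware detection scoring.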
Unsupervised Recognition of Unknown Objects for Open-World Object Detection
Open-World Object Detection (OWOD) extends the object detection problem to a
realistic and dynamic scenario, where a detection model is required to be
capable of detecting both known and unknown objects and incrementally learning
newly introduced knowledge. Current OWOD models, such as ORE and OW-DETR, focus
on pseudo-labeling regions with high objectness scores as unknowns, whose
performance relies heavily on the supervision of known objects. While they can
detect unknowns that exhibit features similar to the known objects, they
suffer from a severe label bias problem: they tend to classify all regions
(including unknown object regions) that are dissimilar to the known objects as
part of the background. To eliminate this label bias, this paper proposes a
novel approach that learns an unsupervised discriminative model to recognize
true unknown objects from raw pseudo labels generated by unsupervised region
proposal methods. The resulting model can be further refined by a
classification-free self-training method which iteratively extends pseudo
unknown objects to the unlabeled regions. Experimental results show that our
method 1) significantly outperforms the prior SOTA in detecting unknown objects
while maintaining competitive performance on detecting known object classes on
the MS COCO dataset, and 2) achieves better generalization ability on the LVIS
and Objects365 datasets.
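The classification-free self-training loop can be sketched as follows. The scoring function, promotion threshold, and iteration count here are illustrative assumptions; the sketch only shows the mechanism of iteratively promoting confidently scored unlabeled regions into the pseudo-unknown set:

```python
def self_train(model_score, proposals, pseudo_unknowns, threshold=0.8, iters=3):
    """Iteratively extend the pseudo-unknown set.

    model_score(p, known): scores how likely region proposal p is a true
        unknown object, given the current pseudo-unknown set (assumed API).
    Regions scoring at or above `threshold` are promoted from the
    unlabeled pool into the pseudo-unknown set; the loop stops early
    when no new region is promoted.
    """
    pseudo_unknowns = set(pseudo_unknowns)
    unlabeled = [p for p in proposals if p not in pseudo_unknowns]
    for _ in range(iters):
        promoted = [p for p in unlabeled if model_score(p, pseudo_unknowns) >= threshold]
        if not promoted:
            break
        pseudo_unknowns |= set(promoted)
        unlabeled = [p for p in unlabeled if p not in pseudo_unknowns]
    return pseudo_unknowns

# Toy scorer: region p is recognized once region p - 1 is already a pseudo
# unknown, so the set grows by one region per iteration (a chain of promotions).
score = lambda p, known: 1.0 if p - 1 in known else 0.0
extended = self_train(score, {1, 2, 3, 4, 5}, {1}, threshold=0.8, iters=3)
```

In the toy run, three iterations promote regions 2, 3, and 4 in turn, illustrating how self-training spreads pseudo labels into regions the initial raw pseudo labels missed.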