DETReg: Unsupervised Pretraining with Region Priors for Object Detection
Recent self-supervised pretraining methods for object detection largely focus
on pretraining the backbone of the object detector, neglecting key parts of the
detection architecture. Instead, we introduce DETReg, a new self-supervised
method that pretrains the entire object detection network, including the object
localization and embedding components. During pretraining, DETReg predicts
object localizations to match the localizations from an unsupervised region
proposal generator and simultaneously aligns the corresponding feature
embeddings with embeddings from a self-supervised image encoder. We implement
DETReg using the DETR family of detectors and show that it improves over
competitive baselines when finetuned on COCO, PASCAL VOC, and Airbus Ship
benchmarks. In low-data regimes, including semi-supervised and few-shot
learning settings, DETReg establishes many state-of-the-art results, e.g., on
COCO we see a +6.0 AP improvement for 10-shot detection and an improvement of
over 2 AP when training with only 1% of the labels. For code and pretrained
models, visit the project page at https://amirbar.net/detreg (CVPR 2022 camera-ready).
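The abstract describes two pretraining signals: box predictions are matched against unsupervised region proposals, and the matched queries' embeddings are aligned with features from a frozen self-supervised image encoder. The sketch below is only a rough illustration of how such a combined objective could look; crop_and_resize, hungarian_match, the uniform loss weights, and the omitted GIoU term are assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def detreg_pretrain_loss(detector, encoder, image, proposal_boxes):
    """Minimal sketch of a DETReg-style pretraining objective (assumed form):
    match predicted boxes to unsupervised region proposals and align the
    predicted object embeddings with a frozen self-supervised encoder.
    `detector`, `encoder`, and the helpers below are placeholders."""
    # Detector returns per-query boxes, objectness logits, and object embeddings.
    pred_boxes, pred_logits, pred_embed = detector(image)

    # Target embeddings: encode each proposal crop with the frozen SSL encoder.
    with torch.no_grad():
        crops = torch.stack([crop_and_resize(image, b) for b in proposal_boxes])
        target_embed = encoder(crops)                      # (num_proposals, d)

    # Bipartite matching between queries and proposals (hypothetical helper).
    src_idx, tgt_idx = hungarian_match(pred_boxes, proposal_boxes)

    # Box regression: L1 between matched predicted boxes and proposal boxes
    # (a GIoU term would typically be added; omitted here for brevity).
    loss_box = F.l1_loss(pred_boxes[src_idx], proposal_boxes[tgt_idx])

    # Objectness: matched queries are positives, the rest negatives.
    obj_target = torch.zeros_like(pred_logits)
    obj_target[src_idx] = 1.0
    loss_obj = F.binary_cross_entropy_with_logits(pred_logits, obj_target)

    # Embedding alignment with the self-supervised features.
    loss_embed = F.l1_loss(pred_embed[src_idx], target_embed[tgt_idx])

    return loss_box + loss_obj + loss_embed
```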
MOST: Multiple Object localization with Self-supervised Transformers for object discovery
We tackle the challenging task of unsupervised object localization in this
work. Recently, transformers trained with self-supervised learning have been
shown to exhibit object localization properties without being trained for this
task. In this work, we present Multiple Object localization with
Self-supervised Transformers (MOST) that uses features of transformers trained
using self-supervised learning to localize multiple objects in real world
images. MOST analyzes the similarity maps of the features using box counting, a
fractal analysis tool, to identify tokens lying on foreground patches. The
identified tokens are then clustered together, and tokens of each cluster are
used to generate bounding boxes on foreground regions. Unlike recent
state-of-the-art object localization methods, MOST can localize multiple
objects per image and outperforms state-of-the-art algorithms on several object
localization and discovery benchmarks on the PASCAL VOC 07, PASCAL VOC 12, and COCO20k
datasets. Additionally, we show that MOST can be used for self-supervised
pre-training of object detectors, and yields consistent improvements on fully
and semi-supervised object detection and unsupervised region proposal generation.
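The key tool named in the abstract is box counting, a standard fractal-analysis statistic computed on binarized token similarity maps. The snippet below shows a generic box-counting dimension estimator only to make the idea concrete; MOST's actual thresholding, pooling, and foreground-selection rules are not reproduced here and may differ.

```python
import numpy as np

def box_counting_dimension(binary_map, scales=(1, 2, 4, 8)):
    """Generic box counting on a binarized 2D map: count how many grid cells
    of each size contain at least one 'on' entry, then fit log(count) against
    log(1/size). Shown only to illustrate the kind of statistic MOST computes
    on token similarity maps; the paper's exact procedure may differ."""
    h, w = binary_map.shape
    counts, inv_sizes = [], []
    for s in scales:
        # Pad so the map tiles evenly into s x s boxes.
        ph, pw = (-h) % s, (-w) % s
        padded = np.pad(binary_map, ((0, ph), (0, pw)))
        blocks = padded.reshape(padded.shape[0] // s, s, padded.shape[1] // s, s)
        occupied = (blocks.max(axis=(1, 3)) > 0).sum()
        counts.append(max(occupied, 1))
        inv_sizes.append(1.0 / s)
    # Slope of the log-log fit approximates the box-counting dimension.
    slope, _ = np.polyfit(np.log(inv_sizes), np.log(counts), 1)
    return slope

# Hypothetical usage: score a token's similarity map; compact (object-like)
# support tends to produce a different slope than scattered background noise.
sim_map = (np.random.rand(28, 28) > 0.8).astype(float)   # stand-in for a thresholded map
print(box_counting_dimension(sim_map))
```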
Improved Region Proposal Network for Enhanced Few-Shot Object Detection
Despite significant success of deep learning in object detection tasks, the
standard training of deep neural networks requires access to a substantial
quantity of annotated images across all classes. Data annotation is an arduous
and time-consuming endeavor, particularly when dealing with infrequent objects.
Few-shot object detection (FSOD) methods have emerged as a solution to the
limitations of classic object detection approaches based on deep learning. FSOD
methods demonstrate remarkable performance by achieving robust object detection
using a significantly smaller amount of training data. A challenge for FSOD is
that instances from novel classes, which do not belong to the fixed set of
training classes, appear in the background, and the base model may pick them up
as potential objects. These objects behave similarly to label noise because
they are classified as one of the training dataset classes, leading to FSOD
performance degradation. We develop a semi-supervised algorithm to detect and
then utilize these unlabeled novel objects as positive samples during the FSOD
training stage to improve FSOD performance. Specifically, we develop a
hierarchical ternary classification region proposal network (HTRPN) to localize
the potential unlabeled novel objects and assign them new objectness labels to
distinguish these objects from the base training dataset classes. Our improved
hierarchical sampling strategy for the region proposal network (RPN) also
boosts the perception ability of the object detection model for large objects.
We evaluate our approach on the COCO and PASCAL VOC benchmarks commonly used in
the FSOD literature. Our experimental results indicate that our method is
effective and outperforms the existing state-of-the-art (SOTA) FSOD methods.
Our implementation is provided as a supplement to support reproducibility of
the results.
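The central component described is a hierarchical ternary classification RPN that separates base-class objects, potential unlabeled novel objects, and background. The following sketch shows what a ternary objectness head could look like; the layer sizes, anchor count, and the way novel-object targets are mined are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TernaryObjectnessHead(nn.Module):
    """Illustrative ternary objectness head in the spirit of the abstract's
    HTRPN: each anchor is scored as base-class object, potential unlabeled
    novel object, or background, instead of the usual binary RPN score.
    Hyperparameters here are assumptions for the sake of the example."""
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        # 3 logits per anchor: base object / potential novel object / background.
        self.cls = nn.Conv2d(in_channels, num_anchors * 3, 1)

    def forward(self, feat):
        x = torch.relu(self.conv(feat))
        logits = self.cls(x)                               # (B, A*3, H, W)
        b, _, h, w = logits.shape
        return logits.view(b, -1, 3, h, w)                 # (B, A, 3, H, W)

# Hypothetical usage: ternary cross-entropy with labels 0=base, 1=novel, 2=background.
head = TernaryObjectnessHead()
feat = torch.randn(2, 256, 50, 50)
logits = head(feat)                                        # (2, 3, 3, 50, 50)
labels = torch.randint(0, 3, (2, 3, 50, 50))
loss = nn.functional.cross_entropy(
    logits.permute(0, 1, 3, 4, 2).reshape(-1, 3), labels.reshape(-1))
```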
LSTD: A Low-Shot Transfer Detector for Object Detection
Recent advances in object detection are mainly driven by deep learning with
large-scale detection benchmarks. However, the fully-annotated training set is
often limited for a target detection task, which may deteriorate the
performance of deep detectors. To address this challenge, we propose a novel
low-shot transfer detector (LSTD) in this paper, where we leverage rich
source-domain knowledge to construct an effective target-domain detector with
very few training examples. The main contributions are described as follows.
First, we design a flexible deep architecture of LSTD to alleviate transfer
difficulties in low-shot detection. This architecture can integrate the
advantages of both SSD and Faster RCNN in a unified deep framework. Second, we
introduce a novel regularized transfer learning framework for low-shot
detection, where the transfer knowledge (TK) and background depression (BD)
regularizations are proposed to leverage object knowledge respectively from
source and target domains, in order to further enhance fine-tuning with a few
target images. Finally, we examine our LSTD on a number of challenging low-shot
detection experiments, where LSTD outperforms other state-of-the-art
approaches. The results demonstrate that LSTD is a preferable deep detector for
low-shot scenarios. (Accepted at AAAI 2018.)
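Of the two regularizers named in the abstract, background depression (BD) is the more mechanical one: it suppresses feature activations outside the ground-truth object regions so that a handful of target-domain images do not overfit to background clutter. The sketch below shows an assumed form of such a penalty; the exact masking and weighting used by LSTD may differ.

```python
import torch

def background_depression_loss(feature_map, gt_boxes, image_size):
    """Rough sketch of a background-depression (BD) style regularizer:
    penalize convolutional feature activations that fall outside the
    ground-truth object regions. The masking and L2 penalty below are an
    assumed form, not the paper's exact formulation."""
    b, c, h, w = feature_map.shape
    img_h, img_w = image_size
    # Build a binary foreground mask on the feature grid from the GT boxes.
    mask = torch.zeros(b, 1, h, w, device=feature_map.device)
    for i, boxes in enumerate(gt_boxes):                   # boxes: (n, 4) in pixels
        for x1, y1, x2, y2 in boxes:
            fx1, fy1 = int(x1 / img_w * w), int(y1 / img_h * h)
            fx2, fy2 = int(x2 / img_w * w) + 1, int(y2 / img_h * h) + 1
            mask[i, 0, fy1:fy2, fx1:fx2] = 1.0
    # L2 penalty on activations at background (unmasked) locations.
    background = feature_map * (1.0 - mask)
    return background.pow(2).mean()

# Hypothetical usage on a ResNet-style feature map with one annotated box.
feat = torch.randn(1, 256, 38, 38)
boxes = [torch.tensor([[50.0, 60.0, 200.0, 220.0]])]
loss_bd = background_depression_loss(feat, boxes, image_size=(600, 600))
```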