IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
Knowledge distillation (KD) has been proven to be useful for training compact
object detection models. However, we observe that KD is often effective when
the teacher model and its student counterpart share similar proposal
information. This explains why existing KD methods are less effective for 1-bit
detectors: there is a significant information discrepancy between the
real-valued teacher and the 1-bit student. This paper presents an Information
Discrepancy-aware
strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate
information discrepancies and significantly reduce the performance gap between
a 1-bit detector and its real-valued counterpart. We formulate the distillation
process as a bi-level optimization problem. At the inner level, we select
the representative proposals with maximum information discrepancy. We then
introduce a novel entropy distillation loss to reduce the disparity based on
the selected proposals. Extensive experiments demonstrate IDa-Det's superiority
over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and
COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with
ResNet-18 backbone. Our code is open-sourced at
https://github.com/SteveTsui/IDa-Det
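The inner-level selection step can be illustrated with a toy sketch. Note that everything below is an illustrative assumption of ours: the function names are hypothetical, and the entropy gap between predictive distributions stands in for the paper's actual information-discrepancy measure.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete class distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def select_discrepant_proposals(teacher_dists, student_dists, k):
    """Inner level (sketch): rank proposals by the entropy gap between the
    teacher's and student's predictive distributions; keep the top-k."""
    gaps = [abs(entropy(t) - entropy(s))
            for t, s in zip(teacher_dists, student_dists)]
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k]

teacher = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]   # confident on proposal 0
student = [[0.34, 0.33, 0.33], [0.4, 0.3, 0.3]]  # near-uniform on proposal 0
print(select_discrepant_proposals(teacher, student, k=1))  # → [0]
```

Proposal 0 is selected because the real-valued teacher is confident there while the (simulated) 1-bit student is not, which is exactly the kind of discrepancy the distillation loss would then target.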
Knowledge Distillation for Object Detection: from generic to remote sensing datasets
Knowledge distillation, a well-known model compression technique, is an
active research area in both computer vision and remote sensing communities. In
this paper, we evaluate in a remote sensing context various off-the-shelf
object detection knowledge distillation methods which have been originally
developed on generic computer vision datasets such as Pascal VOC. In
particular, methods covering both logit mimicking and feature imitation
approaches are applied for vehicle detection using well-known benchmarks such
as the xView and VEDAI datasets. Extensive experiments are performed to
compare the relative performance and interrelationships of the methods.
Experimental results show high variance and confirm the importance of result
aggregation and cross-validation on remote sensing datasets.
Comment: Accepted for publication at IGARSS 202
Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation
Efficient object detection methods have recently received great attention in
remote sensing. Although deep convolutional networks often have excellent
detection accuracy, their deployment on resource-limited edge devices is
difficult. Knowledge distillation (KD) is a strategy for addressing this issue
since it makes models lightweight while maintaining accuracy. However, existing
KD methods for object detection have encountered two constraints. First, they
discard potentially important background information and only distill nearby
foreground regions. Second, they only rely on the global context, which limits
the student detector's ability to acquire local information from the teacher
detector. To address the aforementioned challenges, we propose Attention-based
Feature Distillation (AFD), a new KD approach that distills both local and
global information from the teacher detector. To enhance local distillation, we
introduce a multi-instance attention mechanism that effectively distinguishes
between background and foreground elements. This approach prompts the student
detector to focus on the pertinent channels and pixels, as identified by the
teacher detector. Local distillation lacks global information, thus attention
global distillation is proposed to reconstruct the relationship between various
pixels and pass it from teacher to student detector. The performance of AFD is
evaluated on two public aerial image benchmarks, and the evaluation results
demonstrate that AFD in object detection can attain the performance of other
state-of-the-art models while being efficient
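A hedged sketch of the general idea behind attention-based feature distillation: derive a normalized spatial attention map from feature activations and penalize teacher/student disagreement. The specific formulation here (mean absolute activation, L2 penalty) is a common convention in the attention-transfer literature, not necessarily AFD's exact loss.

```python
def spatial_attention(feat):
    """Spatial attention map (sketch): per-pixel mean absolute activation
    across channels, normalized to sum to 1 for comparability.
    feat is a [C][H][W] nested list."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    raw = [[sum(abs(feat[c][i][j]) for c in range(C)) / C
            for j in range(W)] for i in range(H)]
    total = sum(v for row in raw for v in row) or 1.0
    return [[v / total for v in row] for row in raw]

def attention_distill_loss(t_feat, s_feat):
    """L2 distance between teacher and student attention maps."""
    a_t, a_s = spatial_attention(t_feat), spatial_attention(s_feat)
    return sum((a_t[i][j] - a_s[i][j]) ** 2
               for i in range(len(a_t)) for j in range(len(a_t[0])))

t = [[[1.0, 2.0], [3.0, 4.0]]]       # one channel, 2x2 teacher feature map
print(attention_distill_loss(t, t))  # → 0.0
```

Normalizing both maps makes the loss insensitive to the overall activation scale, so the student learns *where* the teacher attends rather than the raw magnitudes.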
Dual Relation Knowledge Distillation for Object Detection
Knowledge distillation is an effective method for model compression. However,
applying knowledge distillation to detection tasks remains challenging. Two key
issues lead to poor distillation performance in detection: the serious
imbalance between foreground and background features, and the lack of
sufficient feature representation for small objects.
To solve the above issues, we propose a new distillation method named dual
relation knowledge distillation (DRKD), including pixel-wise relation
distillation and instance-wise relation distillation. The pixel-wise relation
distillation embeds pixel-wise features in the graph space and applies graph
convolution to capture the global pixel relation. By distilling the global
pixel relation, the student detector can learn the relation between foreground
and background features, and avoid the difficulty of distilling features
directly under feature imbalance. Besides, we find that the instance-wise
relation supplements valuable knowledge beyond independent features for small
objects. Thus, the instance-wise relation distillation is designed, which
calculates the similarity of different instances to obtain a relation matrix.
More importantly, a relation filter module is designed to highlight valuable
instance relations. The proposed dual relation knowledge distillation is
general and can be easily applied to both one-stage and two-stage detectors.
Our method achieves state-of-the-art performance, which improves Faster R-CNN
based on ResNet50 from 38.4% to 41.6% mAP and improves RetinaNet based on
ResNet50 from 37.4% to 40.3% mAP on COCO 2017.
Comment: Accepted by IJCAI-202
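The instance-wise relation idea can be sketched as a pairwise cosine-similarity matrix over instance embeddings, with the student trained to match the teacher's matrix. This is an illustrative approximation of ours, not the paper's exact DRKD formulation (which also includes a relation filter module).

```python
import math

def relation_matrix(embeddings):
    """Pairwise cosine-similarity matrix over instance embeddings (sketch)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    return [[cos(a, b) for b in embeddings] for a in embeddings]

def relation_distill_loss(t_emb, s_emb):
    """Mean squared difference between teacher and student relation matrices."""
    r_t, r_s = relation_matrix(t_emb), relation_matrix(s_emb)
    n = len(r_t)
    return sum((r_t[i][j] - r_s[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

emb = [[1.0, 0.0], [0.0, 1.0]]          # two orthogonal instance embeddings
print(relation_matrix(emb)[0][1])       # → 0.0 (orthogonal instances)
print(relation_distill_loss(emb, emb))  # → 0.0
```

Because the matrix captures similarities *between* instances rather than each instance's features in isolation, even a small object contributes supervisory signal through its relations to the other instances.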
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
3D perception based on the representations learned from multi-camera
bird's-eye-view (BEV) is trending, as cameras are cost-effective for mass
production in the autonomous driving industry. However, a distinct performance
gap remains between multi-camera BEV and LiDAR based 3D object detection.
One key reason is that LiDAR captures accurate depth and other geometry
measurements, while it is notoriously challenging to infer such 3D information
from merely image input. In this work, we propose to boost the representation
learning of a multi-camera BEV based student detector by training it to imitate
the features of a well-trained LiDAR based teacher detector. We propose an
effective balancing strategy to encourage the student to focus on learning the
crucial features from the teacher, and generalize knowledge transfer to
multi-scale layers with temporal fusion. We conduct extensive evaluations on
multiple representative models of multi-camera BEV. Experiments reveal that our
approach yields significant improvements over the student models, leading to
state-of-the-art performance on the popular nuScenes benchmark.
Comment: ICCV 202
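A toy sketch of one way such a balancing strategy could look: foreground-weighted feature imitation on a BEV grid, where cells containing objects are up-weighted so the student concentrates on the crucial regions of the teacher's features. The weighting scheme and all names here are our assumptions, not DistillBEV's actual strategy.

```python
def weighted_feature_imitation(t_feat, s_feat, fg_mask, fg_weight=5.0):
    """Weighted MSE over a BEV grid (sketch): cells flagged as foreground
    contribute fg_weight times more than background cells."""
    num, den = 0.0, 0.0
    for i in range(len(t_feat)):
        for j in range(len(t_feat[0])):
            w = fg_weight if fg_mask[i][j] else 1.0
            num += w * (t_feat[i][j] - s_feat[i][j]) ** 2
            den += w
    return num / den

teacher = [[1.0, 2.0], [3.0, 4.0]]
student = [[0.0, 2.0], [3.0, 4.0]]   # wrong only at the (0, 0) foreground cell
mask    = [[1, 0], [0, 0]]
print(weighted_feature_imitation(teacher, student, mask))  # → 0.625
```

Without the up-weighting, the many empty background cells of a sparse BEV map would dominate the loss and dilute the signal from object regions.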