IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
Knowledge distillation (KD) has been proven to be useful for training compact
object detection models. However, we observe that KD is often effective when
the teacher model and its student counterpart share similar proposal
information. This explains why existing KD methods are less effective for 1-bit
detectors: there is a significant information discrepancy between the
real-valued teacher and the 1-bit student. This paper presents an Information
Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can
effectively eliminate information discrepancies and significantly reduce the
performance gap between a 1-bit detector and its real-valued counterpart. We
formulate the distillation process as a bi-level optimization problem. At the
inner level, we select
the representative proposals with maximum information discrepancy. We then
introduce a novel entropy distillation loss that reduces the disparity on the
selected proposals. Extensive experiments demonstrate IDa-Det's superiority
over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and
COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster-RCNN with a
ResNet-18 backbone. Our code is open-sourced at
https://github.com/SteveTsui/IDa-Det
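The bi-level idea above (first select the proposals where teacher and student disagree most, then distill only on those) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the IDa-Det implementation: the per-proposal discrepancy is assumed to be the KL divergence between teacher and student class distributions, and the same KL term stands in for the paper's entropy distillation loss, whose exact form the abstract does not give. The names `proposal_discrepancy`, `select_max_discrepancy`, `distill_loss` and the top-k parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one proposal's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def proposal_discrepancy(teacher_logits, student_logits) -> List[float]:
    # Assumption: KL between teacher and student class distributions stands in
    # for IDa-Det's information-discrepancy measure.
    return [kl_divergence(softmax(t), softmax(s))
            for t, s in zip(teacher_logits, student_logits)]

def select_max_discrepancy(scores: Sequence[float], k: int) -> List[int]:
    """Inner level: indices of the k proposals with the largest discrepancy."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def distill_loss(teacher_logits, student_logits, k: int) -> float:
    """Outer level: mean discrepancy over the selected proposals only.
    Hypothetical stand-in for the paper's entropy distillation loss."""
    scores = proposal_discrepancy(teacher_logits, student_logits)
    idx = select_max_discrepancy(scores, k)
    return sum(scores[i] for i in idx) / len(idx)

# Three proposals; only proposal 1 differs between teacher and student.
teacher = [[2.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
student = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
print(select_max_discrepancy(proposal_discrepancy(teacher, student), k=1))  # [1]
print(distill_loss(teacher, student, k=1))
```

In a training loop, the selection would be recomputed at every step (inner level) while the loss on the selected proposals updates the student (outer level); proposals on which the two models already agree contribute nothing.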
Dual Relation Knowledge Distillation for Object Detection
Knowledge distillation is an effective method for model compression. However,
it remains challenging to apply knowledge distillation to detection tasks.
Two key issues lead to poor distillation performance in detection. One is the
severe imbalance between foreground and background features; the other is that
small objects lack sufficient feature representation.
To solve the above issues, we propose a new distillation method named dual
relation knowledge distillation (DRKD), including pixel-wise relation
distillation and instance-wise relation distillation. The pixel-wise relation
distillation embeds pixel-wise features in the graph space and applies graph
convolution to capture the global pixel relation. By distilling the global
pixel relation, the student detector can learn the relation between foreground
and background features, avoiding the difficulty that the feature imbalance
causes when features are distilled directly. In addition, we find that
instance-wise relations supply valuable knowledge for small objects beyond
what independent features provide. Thus, we design instance-wise relation
distillation, which computes the similarity between different instances to
obtain a relation matrix.
More importantly, a relation filter module is designed to highlight valuable
instance relations. The proposed dual relation knowledge distillation is
general and can be easily applied to both one-stage and two-stage detectors.
Our method achieves state-of-the-art performance, improving Faster R-CNN with a
ResNet50 backbone from 38.4% to 41.6% mAP and RetinaNet with a ResNet50
backbone from 37.4% to 40.3% mAP on COCO 2017. Comment: Accepted by IJCAI-202
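Instance-wise relation distillation, as described above, compares similarity matrices between instances rather than raw features. A minimal sketch, assuming cosine similarity as the instance relation and a weighted squared error between teacher and student relation matrices; the `weights` argument is a hypothetical placeholder for the output of the relation filter module, whose design the abstract does not specify, and the pixel-wise graph-convolution branch is not modeled.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]

def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity between two instance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)

def relation_matrix(instances: Sequence[Sequence[float]]) -> Matrix:
    """n x n matrix of pairwise instance similarities (assumed: cosine)."""
    return [[cosine(a, b) for b in instances] for a in instances]

def relation_distill_loss(teacher_inst, student_inst,
                          weights: Optional[Matrix] = None) -> float:
    """Weighted squared error between teacher and student relation matrices.

    `weights` is a hypothetical stand-in for the relation filter module:
    larger entries emphasize relations treated as more valuable.
    """
    rt = relation_matrix(teacher_inst)
    rs = relation_matrix(student_inst)
    n = len(rt)
    if weights is None:
        weights = [[1.0] * n for _ in range(n)]
    num = sum(weights[i][j] * (rt[i][j] - rs[i][j]) ** 2
              for i in range(n) for j in range(n))
    den = sum(weights[i][j] for i in range(n) for j in range(n))
    return num / den if den > 0 else 0.0

# Teacher and student features differ in dimension; relations are still 3 x 3.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
print(relation_distill_loss(teacher, student))  # close to 0: same relative geometry
```

Because each relation matrix is n x n for n instances, teacher and student features need not share a channel dimension, which is one practical appeal of relation-based distillation.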
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
Conventional knowledge distillation (KD) methods for object detection mainly
concentrate on homogeneous teacher-student detectors. However, the design of a
lightweight detector for deployment is often significantly different from a
high-capacity detector. Thus, we investigate KD among heterogeneous
teacher-student pairs for broader applicability. We observe that the core
difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap
between the backbone features of heterogeneous detectors due to the different
optimization manners. Conventional homogeneous KD (homo-KD) methods suffer from
this gap and struggle to achieve satisfactory performance when applied directly
to hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD)
framework, leveraging heterogeneous detection heads as assistants to guide the
optimization of the student detector to reduce this gap. In HEAD, the assistant
is an additional detection head that is attached to the student backbone and
has the same architecture as the teacher head. Thus, hetero-KD is transformed
into homo-KD, allowing efficient knowledge transfer from the teacher to the
student. Moreover, we extend HEAD into a Teacher-Free HEAD (TF-HEAD) framework
when a well-trained teacher detector is unavailable. Our method achieves
significant improvements over current detection KD methods. For example,
on the MS-COCO dataset, TF-HEAD helps R18 RetinaNet achieve 33.9 mAP (+2.2),
while HEAD further pushes the limit to 36.2 mAP (+4.5). Comment: ECCV 2022, Code: https://github.com/LutingWang/HEA
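The assistant-head wiring described above can be illustrated with toy heads. This is a sketch under assumptions, not the HEAD code: heads are single linear layers over flat feature vectors, the KD term is assumed to be an MSE between assistant and teacher head outputs, the assistant is assumed to receive its own detection loss, and `make_linear_head`, `head_forward`, `total_loss` and `kd_weight` are hypothetical names. In a real framework the KD gradient would flow through the assistant head into the student backbone; plain Python here only shows the forward pass and how the losses combine.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Head = Callable[[Sequence[float]], Vector]

def make_linear_head(weights: List[List[float]]) -> Head:
    """Toy head: one linear layer. Teacher and assistant heads share this architecture."""
    def head(feat: Sequence[float]) -> Vector:
        return [sum(w * f for w, f in zip(row, feat)) for row in weights]
    return head

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def head_forward(student_feat, teacher_feat, student_head: Head,
                 assistant_head: Head, teacher_head: Head) -> Tuple[Vector, Vector, float]:
    student_out = student_head(student_feat)      # deployed head; may differ from teacher's
    assistant_out = assistant_head(student_feat)  # training-only head on the student backbone
    teacher_out = teacher_head(teacher_feat)      # frozen teacher head on teacher features
    # Assumption: KD compares outputs of architecturally identical heads, so the
    # heterogeneous backbone features are never compared directly.
    kd = mse(assistant_out, teacher_out)
    return student_out, assistant_out, kd

def total_loss(student_det: float, assistant_det: float, kd: float,
               kd_weight: float = 1.0) -> float:
    """Hypothetical combination: detection losses for both heads plus weighted KD."""
    return student_det + assistant_det + kd_weight * kd

w = [[1.0, 0.0], [0.0, 1.0]]
feat = [0.5, 2.0]
s_out, a_out, kd = head_forward(feat, feat, lambda f: [sum(f)],
                                make_linear_head(w), make_linear_head(w))
print(s_out, a_out, kd)  # [2.5] [0.5, 2.0] 0.0
```

The student head (here a deliberately different toy function) is what would be deployed; the assistant head exists only during training, which is why the KD term can be defined between two heads of the same architecture.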
Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
Previous works on human motion prediction follow the pattern of building a
mapping relation between the sequence observed and the one to be predicted.
However, due to the inherent complexity of multivariate time series data, it
remains challenging to find the extrapolation relation between motion
sequences. In this paper, we present a new prediction pattern that introduces
previously overlooked human poses and approaches the prediction task from the
perspective of interpolation. These poses occur after the predicted sequence
and form the privileged sequence. Specifically, we first propose an InTerPolation
learning Network (ITP-Network) that encodes both the observed sequence and the
privileged sequence to interpolate the in-between predicted sequence, wherein
the embedded Privileged-sequence-Encoder (Priv-Encoder) learns the privileged
knowledge (PK) simultaneously. Then, we propose a Final Prediction Network
(FP-Network), which cannot observe the privileged sequence but is equipped
with a novel PK-Simulator that distills the PK learned by the previous
network. This simulator takes the observed sequence as input but approximates
the behavior of the Priv-Encoder, enabling the FP-Network to imitate the
interpolation process. Extensive experimental results demonstrate that our
prediction pattern achieves state-of-the-art performance on the H3.6M,
CMU-Mocap and 3DPW benchmarks for both short-term and long-term prediction. Comment: accepted by ECCV202
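The interpolation view and the PK distillation step can be sketched as follows. Both pieces are simplifying assumptions, not the ITP-Network or PK-Simulator: poses are assumed to be flat vectors of joint coordinates, the hypothetical `lerp_fill` fills the gap between the last observed pose and the first privileged pose by linear interpolation (the paper learns this mapping with a network), and the hypothetical `pk_distill_loss` assumes an MSE between the simulator's embedding and a frozen Priv-Encoder embedding.

```python
from typing import List, Sequence

Pose = List[float]  # assumption: one pose = flat vector of joint coordinates

def lerp_fill(last_observed: Sequence[float], first_privileged: Sequence[float],
              num_frames: int) -> List[Pose]:
    """Naive interpolation baseline: num_frames poses strictly between two anchors."""
    frames = []
    for t in range(1, num_frames + 1):
        alpha = t / (num_frames + 1)  # fraction of the way from observed to privileged
        frames.append([(1 - alpha) * a + alpha * b
                       for a, b in zip(last_observed, first_privileged)])
    return frames

def pk_distill_loss(simulator_emb: Sequence[float],
                    priv_encoder_emb: Sequence[float]) -> float:
    """Assumed MSE between the PK-Simulator output (computed from observed poses
    only) and the frozen Priv-Encoder output (computed from privileged poses)."""
    return sum((s - p) ** 2 for s, p in zip(simulator_emb, priv_encoder_emb)) / len(priv_encoder_emb)

print(lerp_fill([0.0, 0.0], [4.0, 8.0], num_frames=3))  # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```

The contrast is the point of the sketch: with both endpoints known, filling the middle is a much easier problem than extrapolating past the observed sequence, and the distillation loss is what lets a model that only sees the observed poses approximate the embedding that would have come from the privileged ones.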