Dual Discriminator Adversarial Distillation for Data-free Model Compression
Knowledge distillation has been widely used to produce portable and efficient
neural networks which can be well applied on edge devices for computer vision
tasks. However, almost all top-performing knowledge distillation methods need
to access the original training data, which is usually very large and is
often unavailable. To tackle this problem, we propose a novel data-free
approach, Dual Discriminator Adversarial Distillation (DDAD), which distills
a neural network without any training data or meta-data. Specifically,
a generator trained through dual discriminator adversarial distillation
creates samples that mimic the original training data. The
generator not only uses the pre-trained teacher's intrinsic statistics in
existing batch normalization layers but also obtains the maximum discrepancy
from the student model. Then the generated samples are used to train the
compact student network under the supervision of the teacher. The proposed
method obtains an efficient student network which closely approximates its
teacher network, despite using no original training data. Extensive experiments
are conducted to demonstrate the effectiveness of the proposed approach on
CIFAR-10, CIFAR-100 and Caltech101 datasets for classification tasks. Moreover,
we extend our method to semantic segmentation tasks on several public datasets
such as CamVid and NYUv2. All experiments show that our method outperforms all
baselines for data-free knowledge distillation.
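
The two generator signals described above (matching the teacher's stored batch
normalization statistics and maximizing the teacher-student discrepancy) can be
sketched in a few lines. This is only an illustrative sketch, not the paper's
implementation: the squared-gap statistics loss, the mean absolute discrepancy,
the weight alpha, and every function name below are assumptions, and plain
Python lists stand in for network activations and outputs.

```python
from statistics import fmean, pvariance


def bn_statistics_loss(features, running_mean, running_var):
    """Squared gap between batch statistics and stored BN statistics.

    features: list of samples, each a list of per-channel activations.
    running_mean, running_var: per-channel statistics stored by the teacher.
    (Assumed form; the abstract does not give the exact loss.)
    """
    channels = list(zip(*features))  # transpose to per-channel columns
    loss = 0.0
    for values, mu, var in zip(channels, running_mean, running_var):
        loss += (fmean(values) - mu) ** 2 + (pvariance(values) - var) ** 2
    return loss


def discrepancy(teacher_out, student_out):
    """Mean absolute difference between teacher and student outputs."""
    diffs = [
        abs(t - s)
        for t_row, s_row in zip(teacher_out, student_out)
        for t, s in zip(t_row, s_row)
    ]
    return fmean(diffs)


def generator_loss(features, running_mean, running_var,
                   teacher_out, student_out, alpha=1.0):
    """Generator objective: match BN statistics, maximize the discrepancy.

    alpha is a hypothetical weighting between the two terms.
    """
    stats = bn_statistics_loss(features, running_mean, running_var)
    return stats - alpha * discrepancy(teacher_out, student_out)


def student_loss(teacher_out, student_out):
    """Student objective: minimize the same discrepancy on generated samples."""
    return discrepancy(teacher_out, student_out)
```

In this sketch the generator and the student pull the discrepancy term in
opposite directions, which is the adversarial part; alternating updates of the
two would form the training loop.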