4 research outputs found
Knowledge distillation via adaptive instance normalization
This paper addresses the problem of model compression via knowledge
distillation. To this end, we propose a new knowledge distillation method based
on transferring feature statistics, specifically the channel-wise mean and
variance, from the teacher to the student. Our method goes beyond the standard
way of enforcing the mean and variance of the student to be similar to those of
the teacher through an L2 loss, which we found to be of limited
effectiveness. Specifically, we propose a new loss based on adaptive instance
normalization to effectively transfer the feature statistics. The main idea is
to transfer the learned statistics back to the teacher via adaptive instance
normalization (conditioned on the student) and let the teacher network
"evaluate" via a loss whether the statistics learned by the student are
reliably transferred. We show that our distillation method outperforms other
state-of-the-art distillation methods over a large set of experimental settings
including different (a) network architectures, (b) teacher-student capacities,
(c) datasets, and (d) domains.
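To make the statistics-transfer idea concrete, below is a minimal PyTorch-style
sketch. It assumes 4-D convolutional feature maps, and the helper names
(channel_stats, adain, adain_transfer_loss) are hypothetical; it illustrates the
general mechanism the abstract describes, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def channel_stats(feat, eps=1e-5):
    # feat: (N, C, H, W) -> channel-wise mean and std over spatial positions.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = (feat.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std

def adain(content, style_mean, style_std, eps=1e-5):
    # Re-normalize `content` so its channel statistics match the given "style" statistics.
    c_mean, c_std = channel_stats(content, eps)
    return style_std * (content - c_mean) / c_std + style_mean

def stat_l2_loss(student_feat, teacher_feat):
    # Naive baseline: directly match channel-wise mean/std with an L2 loss
    # (the approach the abstract reports as being of limited effectiveness).
    s_mean, s_std = channel_stats(student_feat)
    t_mean, t_std = channel_stats(teacher_feat)
    return F.mse_loss(s_mean, t_mean) + F.mse_loss(s_std, t_std)

def adain_transfer_loss(student_feat, teacher_feat):
    # AdaIN-style alternative: inject the student's statistics into the teacher's
    # feature map and penalize the discrepancy, so the teacher effectively
    # "evaluates" whether the student's statistics are reliably transferred.
    s_mean, s_std = channel_stats(student_feat)
    recombined = adain(teacher_feat, s_mean, s_std)
    return F.mse_loss(recombined, teacher_feat)
```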
Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup
Knowledge distillation, which involves extracting the "dark knowledge" from a
teacher network to guide the learning of a student network, has emerged as an
essential technique for model compression and transfer learning. Unlike
previous works that focus on the accuracy of the student network, here we study a
little-explored but important question, i.e., knowledge distillation
efficiency. Our goal is to achieve a performance comparable to conventional
knowledge distillation with a lower computation cost during training. We show
that the UNcertainty-aware mIXup (UNIX) can serve as a clean yet effective
solution. The uncertainty sampling strategy is used to evaluate the
informativeness of each training sample. Adaptive mixup is applied to uncertain
samples to compact knowledge. We further show that the redundancy of
conventional knowledge distillation lies in the excessive learning of easy
samples. By combining uncertainty and mixup, our approach reduces the
redundancy and makes better use of each query to the teacher network. We
validate our approach on CIFAR100 and ImageNet. Notably, with only 79% of the
computation cost, we outperform conventional knowledge distillation on CIFAR100
and achieve a comparable result on ImageNet.
Comment: The code is available at: https://github.com/xuguodong03/UNIXK
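A rough sketch of how an uncertainty-aware mixup step could look in PyTorch
follows. The entropy-based uncertainty score, the fixed keep_ratio, and the
function names are assumptions made for illustration, not details taken from
the UNIX codebase.

```python
import torch
import torch.nn.functional as F

def uncertainty(student_logits):
    # Per-sample uncertainty as the entropy of the student's prediction.
    probs = F.softmax(student_logits, dim=1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

def uncertainty_aware_mixup(x, student_logits, keep_ratio=0.5, alpha=0.4):
    # Keep only the most uncertain samples and mix them pairwise, so each
    # query to the teacher carries information about two hard inputs.
    n = x.size(0)
    k = max(1, int(keep_ratio * n))
    idx = uncertainty(student_logits).topk(k).indices
    x_hard = x[idx]
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(k, device=x.device)
    x_mixed = lam * x_hard + (1.0 - lam) * x_hard[perm]
    return x_mixed, idx, perm, lam
```

Under this reading, the cheap student forward pass scores every sample, but only
the smaller mixed batch is sent through the expensive teacher, which is where
the reported computation saving would come from.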
Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation
Automated machine learning (AutoML) can produce complex model ensembles by
stacking, bagging, and boosting many individual models like trees, deep
networks, and nearest neighbor estimators. While highly accurate, the resulting
predictors are large, slow, and opaque as compared to their constituents. To
improve the deployment of AutoML on tabular data, we propose FAST-DAD to
distill arbitrarily complex ensemble predictors into individual models like
boosted trees, random forests, and deep networks. At the heart of our approach
is a data augmentation strategy based on Gibbs sampling from a self-attention
pseudolikelihood estimator. Across 30 datasets spanning regression and
binary/multiclass classification tasks, FAST-DAD distillation produces
significantly better individual models than one obtains through standard
training on the original data. Our individual distilled models are over 10x
faster and more accurate than ensemble predictors produced by AutoML tools like
H2O/AutoSklearn.
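The augment-then-distill recipe can be outlined in a short NumPy skeleton. Here
conditional_sampler stands in for the paper's self-attention pseudolikelihood
estimator, and teacher_predict and student are placeholders for an AutoML
ensemble and a single fast model; this is only a sketch of the workflow under
those assumptions, not FAST-DAD itself.

```python
import numpy as np

def gibbs_augment(X, conditional_sampler, n_rounds=1, rng=None):
    # Pseudo-Gibbs sweep over a tabular matrix X of shape (n_samples, n_features):
    # each feature is resampled in turn from an estimate of p(x_j | x_-j).
    # `conditional_sampler(X, j, rng)` is a placeholder for that conditional model.
    rng = np.random.default_rng() if rng is None else rng
    X_aug = X.copy()
    for _ in range(n_rounds):
        for j in range(X_aug.shape[1]):
            X_aug[:, j] = conditional_sampler(X_aug, j, rng)
    return X_aug

def distill(teacher_predict, student, X_train, X_synth):
    # Label both real and Gibbs-augmented rows with the (slow) ensemble teacher,
    # then fit a single fast model on the enlarged, teacher-labeled dataset.
    X_all = np.vstack([X_train, X_synth])
    y_soft = teacher_predict(X_all)  # e.g. class probabilities or regression outputs
    student.fit(X_all, y_soft)
    return student
```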
Compacting Deep Neural Networks for Internet of Things: Methods and Applications
Deep Neural Networks (DNNs) have shown great success in completing complex
tasks. However, DNNs inevitably bring high computational cost and storage
consumption due to the complexity of hierarchical structures, thereby hindering
their wide deployment in Internet-of-Things (IoT) devices, which have limited
computational capability and storage capacity. Therefore, it is a necessity to
investigate the technologies to compact DNNs. Despite tremendous advances in
compacting DNNs, few surveys summarize compacting-DNNs technologies, especially
for IoT applications. Hence, this paper presents a comprehensive study on
compacting-DNNs technologies. We categorize compacting-DNNs technologies into
three major types: 1) network model compression, 2) Knowledge Distillation
(KD), and 3) modification of network structures. We also elaborate on the diversity
of these approaches and make side-by-side comparisons. Moreover, we discuss how
compacted DNNs are used in various IoT applications and outline future
directions.
Comment: 25 pages, 11 figures