Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks
Much of the focus in the area of knowledge distillation has been on
distilling knowledge from a larger teacher network to a smaller student
network. However, there has been little research on how the concept of
distillation can be leveraged to distill the knowledge encapsulated in the
training data itself into a reduced form. In this study, we explore the concept
of progressive label distillation, where we leverage a series of
teacher-student network pairs to progressively generate distilled training data
for learning deep neural networks with greatly reduced input dimensions. To
investigate the efficacy of the proposed progressive label distillation
approach, we experimented with learning a deep limited vocabulary speech
recognition network based on generated 500ms input utterances distilled
progressively from 1000ms source training data, and demonstrated a significant
increase in test accuracy of almost 78% compared to direct learning.
Comment: 9 pages
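Below is a minimal sketch of the progressive teacher-student idea described in this abstract: each stage trains a student on shortened inputs against soft labels produced by the previous stage's network, and that student then becomes the teacher for the next, shorter stage. The stage lengths, the tiny fully connected classifier, the placeholder data, and the temperature-scaled KL loss are illustrative assumptions, not the authors' actual architecture or training setup.

```python
# Minimal sketch of progressive label distillation (assumptions: toy MLP
# classifier, random placeholder utterances, temperature-scaled KL loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_classifier(input_len: int, num_classes: int = 10) -> nn.Module:
    return nn.Sequential(nn.Linear(input_len, 256), nn.ReLU(),
                         nn.Linear(256, num_classes))

def distill_stage(teacher, student, x_long, x_short, epochs=5, T=2.0):
    """Train `student` on shortened inputs against the soft labels the
    current `teacher` produces on the longer inputs."""
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    teacher.eval()
    with torch.no_grad():
        soft = F.softmax(teacher(x_long) / T, dim=1)      # distilled labels
    for _ in range(epochs):
        opt.zero_grad()
        log_p = F.log_softmax(student(x_short) / T, dim=1)
        loss = F.kl_div(log_p, soft, reduction="batchmean") * T * T
        loss.backward()
        opt.step()
    return student

# Progressively shrink the input window, e.g. 1000 -> 750 -> 500 samples.
lengths = [1000, 750, 500]
x = torch.randn(64, lengths[0])        # placeholder full-length utterances
teacher = make_classifier(lengths[0])  # assumed pre-trained on the source labels
for prev_len, cur_len in zip(lengths[:-1], lengths[1:]):
    student = make_classifier(cur_len)
    teacher = distill_stage(teacher, student,
                            x[:, :prev_len],   # inputs at the teacher's length
                            x[:, :cur_len])    # cropped inputs for the student
```

At inference time only the final, shortest-input student is kept, which is what gives the input efficiency the abstract refers to.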
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
We propose a lifelong learning system that has the ability to reuse and
transfer knowledge from one task to another while efficiently retaining the
previously learned knowledge-base. Knowledge is transferred by learning
reusable skills to solve tasks in Minecraft, a popular video game which is an
unsolved and high-dimensional lifelong learning problem. These reusable skills,
which we refer to as Deep Skill Networks, are then incorporated into our novel
Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using
two techniques: (1) a deep skill array and (2) skill distillation, our novel
variation of policy distillation (Rusu et al. 2015) for learning skills. Skill
distillation enables the H-DRLN to efficiently retain knowledge and therefore
scale in lifelong learning by accumulating knowledge and encapsulating
multiple reusable skills into a single distilled network. The H-DRLN exhibits
superior performance and lower learning sample complexity compared to the
regular Deep Q-Network (Mnih et al. 2015) in sub-domains of Minecraft.
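As a rough illustration of the skill distillation step, the sketch below compresses several pre-trained skill networks into a single student by matching temperature-softened output distributions, in the spirit of policy distillation. The toy Q-networks, the single shared output head, the random replay states, and the hyperparameters are assumptions; the paper additionally embeds the distilled skills inside the hierarchical H-DRLN controller, which is not shown here.

```python
# Minimal sketch of skill distillation (assumptions: toy Q-networks, random
# placeholder replay states, single shared output head for all skills).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, N_SKILLS = 32, 6, 3

def q_net() -> nn.Module:
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, N_ACTIONS))

skills = [q_net() for _ in range(N_SKILLS)]  # assume each is a trained skill DQN
student = q_net()                            # single distilled multi-skill network
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
tau = 0.1   # temperature below 1 sharpens the teacher's Q-value distribution

for step in range(1000):
    skill_id = step % N_SKILLS
    states = torch.randn(32, STATE_DIM)      # placeholder replay states for this skill
    with torch.no_grad():
        teacher_p = F.softmax(skills[skill_id](states) / tau, dim=1)
    student_logp = F.log_softmax(student(states) / tau, dim=1)
    loss = F.kl_div(student_logp, teacher_p, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```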
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. To extend these applications to accuracy-critical
domains, researchers have investigated approaches that boost accuracy through
deeper or wider network structures, which bring an exponential increase in
computational and storage cost and delay response time. In this paper, we
propose a general training framework named self distillation, which notably
enhances the accuracy of convolutional neural networks by shrinking the size of
the network rather than enlarging it. Unlike traditional knowledge
distillation, a knowledge transfer method between networks that forces a
student network to approximate the softmax outputs of a pre-trained teacher
network, the proposed self distillation framework distills knowledge within the
network itself: the network is first divided into several sections, and the
knowledge in the deeper sections is then squeezed into the shallower ones.
Experiments further demonstrate the generality of the proposed self
distillation framework: accuracy improves by 2.65% on average, ranging from
0.61% on ResNeXt to 4.07% on VGG19. In addition, it provides flexible
depth-wise scalable inference on resource-limited edge devices. Our code will
be released on GitHub soon.
Comment: 10 pages
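A minimal sketch of the self-distillation idea follows, assuming a toy backbone split into sections with an auxiliary classifier per section: the shallow heads are trained on both the ground-truth labels and the softened outputs of the deepest head. The MLP backbone, loss weights, and temperature are illustrative assumptions, and the feature-level hint losses used in the paper are omitted.

```python
# Minimal sketch of self distillation (assumptions: toy MLP backbone split
# into three sections, one linear classifier head per section).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.sections = nn.ModuleList(
            [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(3)])
        # One classifier per section; the last (deepest) head acts as the teacher.
        self.heads = nn.ModuleList(
            [nn.Linear(64, num_classes) for _ in range(3)])

    def forward(self, x):
        logits = []
        for section, head in zip(self.sections, self.heads):
            x = section(x)
            logits.append(head(x))
        return logits                        # ordered shallow -> deep

def self_distill_loss(logits, target, T=3.0, alpha=0.3):
    deep = logits[-1]
    loss = F.cross_entropy(deep, target)     # deepest head learns from labels only
    soft = F.softmax(deep.detach() / T, dim=1)
    for shallow in logits[:-1]:
        ce = F.cross_entropy(shallow, target)
        kd = F.kl_div(F.log_softmax(shallow / T, dim=1), soft,
                      reduction="batchmean") * T * T
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss

net = SelfDistillNet()
x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
self_distill_loss(net(x), y).backward()
```

Because every section carries its own classifier, inference can stop at any head, which is the depth-wise scalable inference the abstract mentions.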
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Many recent works on knowledge distillation have provided ways to transfer
the knowledge of a trained network for improving the learning process of a new
one, but finding a good technique for knowledge distillation is still an open
problem. In this paper, we provide a new perspective based on a decision
boundary, which is one of the most important components of a classifier. The
generalization performance of a classifier is closely related to the adequacy
of its decision boundary, so a good classifier bears a good decision boundary.
Therefore, transferring information closely related to the decision boundary
is a promising approach to knowledge distillation. To realize this goal, we
utilize an adversarial attack to discover samples supporting the decision
boundary. Based on this idea, the proposed algorithm trains a student
classifier on the adversarial samples supporting the decision boundary, thereby
transferring more accurate information about that boundary. Experiments show that
the proposed method indeed improves knowledge distillation and achieves
state-of-the-art performance.
Comment: Accepted to AAAI 2019
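A minimal sketch of the boundary-supporting idea follows, under assumed toy models: a targeted gradient attack on the teacher nudges an input toward a neighbouring class until the teacher's decision flips, and the student is then distilled on both the clean and the boundary samples. The attack step size, iteration count, and class-selection heuristic are assumptions rather than the authors' exact procedure.

```python
# Minimal sketch of distillation with boundary-supporting adversarial samples
# (assumptions: toy teacher/student MLPs, random placeholder inputs).
import torch
import torch.nn as nn
import torch.nn.functional as F

def boundary_sample(teacher, x, target_cls, steps=10, lr=0.05):
    """Move x toward `target_cls` under the teacher until its prediction
    flips, yielding samples that lie close to the decision boundary."""
    x_adv = x.clone().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(teacher(x_adv), target_cls)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv - lr * grad.sign()).detach().requires_grad_(True)
        if (teacher(x_adv).argmax(1) == target_cls).all():
            break
    return x_adv.detach()

teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))  # assumed pre-trained
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 5))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 4.0

x = torch.randn(8, 32)                            # placeholder training batch
target = (teacher(x).argmax(1) + 1) % 5           # pick some other class to aim at
x_bss = boundary_sample(teacher, x, target)
for batch in (x, x_bss):                          # distil on clean + boundary samples
    with torch.no_grad():
        p_t = F.softmax(teacher(batch) / T, dim=1)
    loss = F.kl_div(F.log_softmax(student(batch) / T, dim=1), p_t,
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```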
