Neural Network Memorization Dissection
Deep neural networks (DNNs) can easily fit a random labeling of the training
data with zero training error. What is the difference between DNNs trained with
random labels and the ones trained with true labels? Our paper answers this
question with two contributions. First, we study the memorization properties of
DNNs. Our empirical experiments shed light on how DNNs prioritize the learning
of simple input patterns. In the second part, we propose to measure the
similarity between what different DNNs have learned and memorized. With the
proposed approach, we analyze and compare DNNs trained on data with true labels
and random labels. The analysis shows that DNNs have \textit{One way to Learn}
and \textit{N ways to Memorize}. We also use gradient information to gain an
understanding of the analysis results.
Comment: Workshop on Machine Learning with Guarantees, NeurIPS 201
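The random-label setup the abstract refers to can be illustrated with a minimal sketch: labels are replaced by uniformly drawn classes, independent of the inputs, so any network that fits them must memorize rather than learn a pattern. All names and sizes below are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 1000 samples over 10 classes with true labels.
n_samples, n_classes = 1000, 10
true_labels = rng.integers(0, n_classes, size=n_samples)

# Random-label training data: every label is redrawn uniformly at random,
# severing any relationship between inputs and targets.
random_labels = rng.integers(0, n_classes, size=n_samples)

# By chance, roughly 1/n_classes of the random labels agree with the true ones,
# so a network reaching zero training error on them has memorized the rest.
agreement = np.mean(true_labels == random_labels)
print(f"label agreement: {agreement:.2f}")
```

A network achieving zero training error on `random_labels` therefore carries no generalizable signal for the ~90% of examples whose labels were corrupted, which is what makes the comparison to true-label training informative.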
Introspective Learning by Distilling Knowledge from Online Self-explanation
In recent years, many explanation methods have been proposed to explain
individual classifications of deep neural networks. However, how to leverage
the created explanations to improve the learning process has been less
explored. As the privileged information, the explanations of a model can be
used to guide the learning process of the model itself. In the community,
another intensively investigated privileged information used to guide the
training of a model is the knowledge from a powerful teacher model. The goal of
this work is to leverage the self-explanation to improve the learning process
by borrowing ideas from knowledge distillation. We start by investigating the
effective components of the knowledge transferred from the teacher network to
the student network. Our investigation reveals that both the responses in
non-ground-truth classes and class-similarity information in teacher's outputs
contribute to the success of the knowledge distillation. Motivated by the
conclusion, we propose an implementation of introspective learning by
distilling knowledge from online self-explanations. The models trained with the
introspective learning procedure outperform the ones trained with the standard
learning procedure, as well as the ones trained with different regularization
methods. When compared to the models learned from peer networks or teacher
networks, our models also show competitive performance and require neither
peers nor teachers.
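The distillation signal the abstract analyzes, the teacher's responses on non-ground-truth classes and the class-similarity structure they encode, can be sketched as the standard temperature-softened KL term. The logits and temperature below are illustrative values, not taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution,
    # exposing the teacher's responses on non-ground-truth classes.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one sample over 3 classes; class 0 is the ground truth.
teacher_logits = np.array([4.0, 2.5, 0.5])
student_logits = np.array([3.0, 1.0, 1.0])

T = 4.0
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL divergence between the softened distributions -- the distillation term.
# It penalizes mismatch on *all* classes, so the relative mass the teacher
# puts on wrong-but-similar classes is transferred to the student.
kd_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print(f"distillation loss: {kd_loss:.4f}")
```

In the self-explanation setting described above, the soft target would come from the model's own explanation signal rather than a separate teacher network, but the loss structure is the same.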
When Do Curricula Work?
Inspired by human learning, researchers have proposed ordering examples
during training based on their difficulty. Both curriculum learning, exposing a
network to easier examples early in training, and anti-curriculum learning,
showing the most difficult examples first, have been suggested as improvements
to the standard i.i.d. training. In this work, we set out to investigate the
relative benefits of ordered learning. We first investigate the \emph{implicit
curricula} resulting from architectural and optimization bias and find that
samples are learned in a highly consistent order. Next, to quantify the benefit
of \emph{explicit curricula}, we conduct extensive experiments over thousands
of orderings spanning three kinds of learning: curriculum, anti-curriculum, and
random-curriculum -- in which the size of the training dataset is dynamically
increased over time, but the examples are randomly ordered. We find that for
standard benchmark datasets, curricula have only marginal benefits, and that
randomly ordered samples perform as well or better than curricula and
anti-curricula, suggesting that any benefit is entirely due to the dynamic
training set size. Inspired by common use cases of curriculum learning in
practice, we investigate the role of limited training time budget and noisy
data in the success of curriculum learning. Our experiments demonstrate that
curriculum, but not anti-curriculum, can indeed improve performance either
with a limited training time budget or in the presence of noisy data.
Comment: ICLR 202