Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. To extend their applicability to accuracy-critical
domains, researchers have been investigating ways to boost accuracy through
either deeper or wider network structures, which bring with them an
exponential increase in computational and storage cost and longer response
times. In this paper, we propose a general training framework named self
distillation, which notably enhances the performance (accuracy) of
convolutional neural networks by shrinking the size of the network rather
than enlarging it. Unlike traditional knowledge distillation, a knowledge
transfer methodology between networks that forces student neural networks to
approximate the softmax layer outputs of pre-trained teacher neural networks,
the proposed self distillation framework distills knowledge within the
network itself. The network is first divided into several sections. Then the
knowledge in the deeper portion of the network is squeezed into the shallower
portions. Experiments demonstrate the generality of the proposed self
distillation framework: accuracy improves by 2.65% on average, ranging from
0.61% on ResNeXt (minimum) to 4.07% on VGG19 (maximum). In addition, the
framework provides the flexibility of depth-wise scalable inference on
resource-limited edge devices. Our code will be released on GitHub soon.
Comment: 10 pages
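To make the idea of squeezing deep knowledge into shallow sections concrete, here is a minimal sketch in plain Python. It assumes each section ends in its own classifier ("exit"), treats the deepest exit as the teacher, and trains every shallower exit on both the hard label and the teacher's temperature-softened output. The weighting `alpha`, the `temperature`, and all function names are illustrative assumptions, not the paper's exact loss. The `early_exit_predict` helper is a hypothetical confidence-threshold rule showing one way depth-wise scalable inference could work.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy between softmax(logits) and a one-hot label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions; p is the target."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(exit_logits: Sequence[Sequence[float]], label: int,
                           alpha: float = 0.3, temperature: float = 3.0) -> float:
    """Loss for one sample, given logits from every exit ordered shallow to deep.

    Assumption: the deepest exit learns from the label only, and each shallower
    exit mixes hard-label cross-entropy with KL to the deepest exit's softened
    output (scaled by T^2, as is common in knowledge distillation).
    """
    teacher = softmax(exit_logits[-1], temperature)
    loss = cross_entropy(exit_logits[-1], label)
    for logits in exit_logits[:-1]:
        student = softmax(logits, temperature)
        hard = cross_entropy(logits, label)
        soft = kl_divergence(teacher, student) * temperature ** 2
        loss += (1 - alpha) * hard + alpha * soft
    return loss


def early_exit_predict(exit_logits: Sequence[Sequence[float]],
                       threshold: float = 0.9) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) from the first exit whose top
    probability reaches `threshold`, falling back to the deepest exit.

    Logits are precomputed here for simplicity; on a device, deeper sections
    would only be evaluated when the shallower exits are not confident.
    """
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold or i == last:
            return i, best
    raise ValueError("exit_logits must not be empty")
```

For example, `early_exit_predict([[0.0, 0.0], [5.0, 0.0]])` skips the undecided first exit and returns `(1, 0)` from the second. Using the deepest exit as the teacher keeps the whole scheme inside one network, so no separate pre-trained teacher is needed.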
Regularizing Deep Networks by Modeling and Predicting Label Structure
We construct custom regularization functions for use in supervised training
of deep neural networks. Our technique is applicable when the ground-truth
labels themselves exhibit internal structure; we derive a regularizer by
learning an autoencoder over the set of annotations. Training thereby becomes a
two-phase procedure. The first phase models labels with an autoencoder. The
second phase trains the actual network of interest by attaching an auxiliary
branch that must predict output via a hidden layer of the autoencoder. After
training, we discard this auxiliary branch.
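The two-phase procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions: labels are flattened into vectors, the phase-1 encoder and decoder are plain callables (a linear map in the example), squared error stands in for the per-pixel loss a segmentation network would actually use, and `lam` and every function name are hypothetical. Phase 2 adds a term that routes the auxiliary branch's output through the frozen decoder and compares the result with the ground truth.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error; a stand-in for the real per-pixel loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def phase1_loss(encoder: Callable[[Vector], Vector],
                decoder: Callable[[Vector], Vector],
                label: Vector) -> float:
    """Phase 1: the autoencoder learns to reconstruct the labels alone."""
    return squared_error(decoder(encoder(label)), label)


def phase2_loss(main_pred: Vector, aux_code: Vector,
                frozen_decoder: Callable[[Vector], Vector],
                label: Vector, lam: float = 0.5) -> float:
    """Phase 2: task loss plus the label-structure regularizer.

    aux_code is the auxiliary branch's prediction of the autoencoder's hidden
    code; it must decode (through the frozen phase-1 decoder) to the label.
    """
    task = squared_error(main_pred, label)
    reg = squared_error(frozen_decoder(aux_code), label)
    return task + lam * reg


# Hypothetical linear autoencoder: 3-dimensional labels, 2-dimensional code.
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoder = lambda y: matvec(E, y)
decoder = lambda z: matvec(D, z)
```

At test time only the main head's prediction is used; the auxiliary branch and the decoder are dropped, which is why the regularizer adds no inference cost.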
We experiment in the context of semantic segmentation, demonstrating that this
regularization strategy leads to consistent accuracy boosts over baselines,
both when training from scratch and in combination with ImageNet pretraining.
Gains are also consistent over different choices of convolutional network
architecture. As our regularizer is discarded after training, our method has
zero cost at test time; the performance improvements are essentially free. We
are simply able to learn better network weights by building an abstract model
of the label space, and then training the network to understand this
abstraction alongside the original task. Comment: to appear at CVPR 201
Distilling Representations from GAN Generator via Squeeze and Span
In recent years, generative adversarial networks (GANs) have been actively
studied and have been shown to produce high-quality, realistic images in
various domains. The controllable synthesis ability of GAN generators
suggests that they maintain informative, disentangled, and explainable image
representations, but leveraging and transferring these representations to
downstream tasks remains largely unexplored. In this paper, we propose to
distill knowledge from GAN generators by squeezing and spanning their
representations. We use a network to squeeze the generator features into
representations that are invariant to semantic-preserving transformations
before distilling them into the student network. We then span the distilled
representation from the synthetic domain to the real domain by also using real
training data, which remedies the mode collapse of GANs and boosts the
student network's performance in the real domain. Experiments demonstrate the
efficacy of our method and its significance for self-supervised
representation learning. Code is available at
https://github.com/yangyu12/squeeze-and-span. Comment: 16 pages, NeurIPS 202
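The squeeze step places two requirements on the representation: it should be invariant to semantic-preserving transformations, and the student should match it. A generic way to express both is with cosine-similarity losses, sketched below in plain Python. This is an assumed stand-in rather than the paper's actual objective: the function names, the weight `beta`, and the choice of cosine distance are illustrative, and the span step (training with real images) is not modeled. Vectors are assumed to be nonzero.

```python
import math
from typing import Sequence

Batch = Sequence[Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distill_loss(student_feats: Batch, squeezed_feats: Batch) -> float:
    """Mean (1 - cosine) between student embeddings of synthetic images and
    the squeezed generator representations of the same images."""
    pairs = list(zip(student_feats, squeezed_feats))
    return sum(1.0 - cosine(s, g) for s, g in pairs) / len(pairs)


def invariance_loss(squeezed_view1: Batch, squeezed_view2: Batch) -> float:
    """Mean (1 - cosine) between squeezed representations of two
    semantic-preserving transformations of the same synthetic image."""
    pairs = list(zip(squeezed_view1, squeezed_view2))
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def squeeze_objective(student_feats: Batch, squeezed_view1: Batch,
                      squeezed_view2: Batch, beta: float = 1.0) -> float:
    """Hypothetical combined objective: distillation plus invariance."""
    return (distill_loss(student_feats, squeezed_view1)
            + beta * invariance_loss(squeezed_view1, squeezed_view2))
```

Cosine distance ignores feature magnitude, which is a common choice when the teacher's and student's feature scales are unrelated; the paper may use a different distance.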
Detecting Adversarial Examples by Measuring their Stress Response
Machine learning (ML) and deep neural networks (DNNs) have achieved great success in a variety of application domains; however, despite significant effort to make these networks robust, they remain vulnerable to adversarial attacks, in which inputs that are perceptually indistinguishable from natural data can be misclassified with high prediction confidence. Defenses against adversarial examples can be broadly classified as correcting or detecting: correcting defenses aim to negate the effects of the attack and classify the input correctly, while detecting defenses aim to detect and reject the input as adversarial. In this work, a new approach for detecting adversarial examples is proposed. The approach takes advantage of the robustness of natural images to noise. As noise is added to a natural image, the prediction probability of its true class drops, but the drop is not sudden or precipitous. The same does not seem to hold for adversarial examples. In other words, the stress response profile of natural images appears to differ from that of adversarial examples, so adversarial examples could be detected by their stress response profile. The approach is evaluated on the MNIST, CIFAR-10, and ImageNet datasets. Experimental results show that it is effective at detecting some adversarial examples on small-scale images with simple content, with little sacrifice in benign accuracy. Dissertation/Thesis, Masters Thesis, Computer Science 201
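A stress response profile can be sketched in plain Python as follows. The sketch adds Gaussian noise of increasing standard deviation, averages the model's probability for the class predicted on the clean input, and flags the input when the profile falls too sharply between consecutive noise levels. Several choices here are assumptions, not the thesis's method: the predicted class stands in for the true class (which is unknown at detection time), the drop criterion and `threshold` are hypothetical, and `predict_proba` is any callable returning class probabilities.

```python
import random
from typing import Callable, List, Sequence

ProbaFn = Callable[[List[float]], List[float]]


def stress_profile(predict_proba: ProbaFn, x: Sequence[float],
                   sigmas: Sequence[float], trials: int = 20,
                   seed: int = 0) -> List[float]:
    """Mean probability of the clean-input predicted class at each noise level."""
    rng = random.Random(seed)
    clean = predict_proba(list(x))
    cls = max(range(len(clean)), key=clean.__getitem__)
    profile = []
    for sigma in sigmas:
        total = 0.0
        for _ in range(trials):
            noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
            total += predict_proba(noisy)[cls]
        profile.append(total / trials)
    return profile


def max_drop(profile: Sequence[float]) -> float:
    """Largest decrease between consecutive noise levels (0.0 if none)."""
    return max((a - b for a, b in zip(profile, profile[1:])), default=0.0)


def looks_adversarial(profile: Sequence[float], threshold: float = 0.3) -> bool:
    """Hypothetical rule: a sudden, precipitous drop suggests an adversarial input."""
    return max_drop(profile) > threshold
```

A gradual profile such as `[0.99, 0.95, 0.90]` is accepted as natural, while `[0.99, 0.30, 0.20]` is flagged. In practice the noise levels and threshold would have to be calibrated per dataset on natural images.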
- …