Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
It is desirable to train convolutional networks (CNNs) to run more
efficiently during inference. In many cases, however, the computational budget
that the system has for inference cannot be known beforehand during training,
or the inference budget depends on changing real-time resource availability.
Thus, it is inadequate to train only inference-efficient CNNs, whose inference
costs are fixed and cannot adapt to varying inference budgets. We propose a
novel approach for cost-adjustable inference in CNNs: Stochastic Downsampling
Point (SDPoint). During training, SDPoint applies feature map downsampling at a
randomly chosen point in the layer hierarchy, with a random downsampling ratio.
The different stochastic downsampling configurations, known as SDPoint
instances (of the same model), have different computational costs while being
trained to minimize the same prediction loss. Sharing network parameters across
instances provides a significant regularization boost. During inference, one
can handpick an SDPoint instance that best fits the inference budget. The
effectiveness of SDPoint, as both a cost-adjustable inference approach and a
regularizer, is validated through extensive experiments on image classification.
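To make the mechanism concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of a forward pass that downsamples the feature map at a randomly sampled point with a randomly sampled ratio during training, and accepts a fixed (point, ratio) pair at inference to select an SDPoint instance. The network class, block layout, and ratio set are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDPointNet(nn.Module):
    """Hypothetical sketch: a small CNN whose forward pass applies
    stochastic downsampling at a randomly chosen point (training only)."""

    def __init__(self, num_classes=10, ratios=(0.5, 0.75)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
            for c_in, c_out in [(3, 32), (32, 64), (64, 128)]
        ])
        self.head = nn.Linear(128, num_classes)
        self.ratios = ratios

    def forward(self, x, point=None, ratio=None):
        # Training: sample the downsampling point and ratio at random.
        # Inference: pass a fixed (point, ratio) to pick an SDPoint instance;
        # leaving both as None runs the full-cost (no downsampling) instance.
        if self.training and point is None:
            point = random.randrange(1, len(self.blocks))
            ratio = random.choice(self.ratios)
        for i, block in enumerate(self.blocks):
            if point is not None and i == point:
                x = F.interpolate(x, scale_factor=ratio, mode='bilinear',
                                  align_corners=False)
            x = block(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)
```

All instances share the same parameters, so the same prediction loss trains every configuration; only the point at which spatial resolution drops (and hence the compute cost) differs.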
Scale-Space Hypernetworks for Efficient Biomedical Imaging
Convolutional Neural Networks (CNNs) are the predominant model used for a
variety of medical image analysis tasks. At inference time, these models are
computationally intensive, especially with volumetric data. In principle, it is
possible to trade accuracy for computational efficiency by manipulating the
rescaling factor in the downsample and upsample layers of CNN architectures.
However, properly exploring the accuracy-efficiency trade-off is prohibitively
expensive with existing models. To address this, we introduce Scale-Space
HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying
internal rescaling factors. A single SSHN characterizes an entire Pareto
accuracy-efficiency curve of models that match, and occasionally surpass, the
outcomes of training many separate networks with fixed rescaling factors. We
demonstrate the proposed approach in several medical image analysis
applications, comparing SSHN against strategies with both fixed and dynamic
rescaling factors. We find that SSHN consistently provides a better
accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs
enable the user to quickly choose a rescaling factor that appropriately
balances accuracy and computational efficiency for their particular needs at
inference.

Comment: Code available at https://github.com/JJGO/scale-space-hypernetwork
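The core idea can be sketched roughly as follows: a small hypernetwork maps the rescaling factor to the weights of a layer in the primary CNN, so one set of learned parameters covers a spectrum of rescaling factors. This is a hypothetical PyTorch illustration, not the released code at the URL above; the layer shapes and hypernetwork architecture are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleHyperConv(nn.Module):
    """Hypothetical sketch of a scale-conditioned layer: a tiny hypernetwork
    generates conv weights from the rescaling factor, and the feature map is
    resized by that same factor before the convolution."""

    def __init__(self, c_in, c_out, k=3, hidden=64):
        super().__init__()
        self.c_in, self.c_out, self.k = c_in, c_out, k
        self.hyper = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, c_out * c_in * k * k),
        )
        self.bias = nn.Parameter(torch.zeros(c_out))

    def forward(self, x, scale):
        # Generate conv weights conditioned on the rescaling factor.
        s = torch.tensor([[scale]], dtype=x.dtype, device=x.device)
        w = self.hyper(s).view(self.c_out, self.c_in, self.k, self.k)
        # Rescale the feature map, then convolve with the generated weights.
        x = F.interpolate(x, scale_factor=scale, mode='bilinear',
                          align_corners=False)
        return F.conv2d(x, w, self.bias, padding=self.k // 2)

# Usage: sweep the rescaling factor at inference to trace an
# accuracy-efficiency curve from a single trained model.
layer = ScaleHyperConv(3, 16)
feats = layer(torch.randn(1, 3, 64, 64), scale=0.5)  # ~4x fewer conv FLOPs
```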
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. To extend these applications to accuracy-critical
domains, researchers have investigated ways to boost accuracy through deeper or
wider network structures, which bring an exponential increase in computational
and storage cost and delay response time. In this paper, we propose a general
training framework named self distillation, which notably enhances the
performance (accuracy) of convolutional neural networks by shrinking the size
of the network rather than enlarging it. Different from traditional knowledge
distillation, a knowledge transfer methodology among networks that forces
student neural networks to approximate the softmax layer outputs of pre-trained
teacher neural networks, the proposed self distillation framework distills
knowledge within the network itself. The network is first divided into several
sections, and the knowledge in the deeper sections is then squeezed into the
shallower ones. Experiments further demonstrate the generality of the proposed
self distillation framework: the average accuracy improvement is 2.65%, ranging
from a minimum of 0.61% on ResNeXt to a maximum of 4.07% on VGG19. In addition,
it also provides flexibility for depth-wise scalable inference on
resource-limited edge devices. Our code will be released on GitHub soon.

Comment: 10 pages
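A minimal sketch of the training objective implied by this abstract, assuming the common formulation in which each section has an auxiliary classifier trained on the labels and additionally pushed toward the softened outputs of the deepest classifier. The temperature and weighting below are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(section_logits, labels, temperature=3.0, alpha=0.3):
    """Hypothetical self-distillation objective.

    section_logits: list of logits ordered shallow -> deep, one per section.
    The deepest classifier is trained on the labels; every shallower classifier
    gets a label loss plus a KL term toward the deepest classifier's softened
    softmax (knowledge squeezed from deep sections into shallow ones).
    """
    deepest = section_logits[-1]
    teacher = F.softmax(deepest.detach() / temperature, dim=1)
    loss = F.cross_entropy(deepest, labels)
    for logits in section_logits[:-1]:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / temperature, dim=1), teacher,
                      reduction='batchmean') * temperature ** 2
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```

Because every section ends in its own classifier, inference can stop at a shallow section when resources are tight, which is the depth-wise scalable inference mentioned above.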