Training Behavior of Sparse Neural Network Topologies
Improvements in the performance of deep neural networks have often come
through the design of larger and more complex networks. As a result, fast
memory is a significant limiting factor in our ability to improve network
performance. One approach to overcoming this limit is the design of sparse
neural networks, which can be both very large and efficiently trained. In this
paper we experiment with training sparse neural network topologies. We test
pruning-based topologies, which are derived from an initially dense network
whose connections are pruned, as well as RadiX-Nets, a class of network
topologies with proven connectivity and sparsity properties. Results show that
sparse networks obtain accuracies comparable to dense networks, but extreme
levels of sparsity cause instability in training, which merits further study.
Comment: 6 pages. Presented at the 2019 IEEE High Performance Extreme Computing (HPEC) Conference. Received "Best Paper" award.
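As a rough illustration of the pruning-based topologies described above (a minimal sketch, not the paper's code; the layer sizes and the 90% pruning amount are assumptions for illustration), a sparse connectivity pattern can be derived from an initially dense network with magnitude pruning in PyTorch:

    # Derive a sparse topology from a dense MLP via magnitude pruning (sketch).
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Hypothetical small dense network; sizes are illustrative only.
    model = nn.Sequential(
        nn.Linear(784, 300), nn.ReLU(),
        nn.Linear(300, 100), nn.ReLU(),
        nn.Linear(100, 10),
    )

    # Zero out the smallest-magnitude 90% of weights in each Linear layer,
    # leaving a fixed sparse connectivity pattern that is then trained further.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)

    # Each pruned layer now holds weight_orig and a binary weight_mask;
    # the mask defines the sparse topology used during subsequent training.
    zeros = sum((m.weight == 0).sum().item()
                for m in model.modules() if isinstance(m, nn.Linear))
    total = sum(m.weight.numel()
                for m in model.modules() if isinstance(m, nn.Linear))
    print(f"overall sparsity: {zeros / total:.2%}")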
CondenseNet: An Efficient DenseNet using Learned Group Convolutions
Deep neural networks are increasingly used on mobile devices, where
computational resources are limited. In this paper we develop CondenseNet, a
novel network architecture with unprecedented efficiency. It combines dense
connectivity with a novel module called learned group convolution. The dense
connectivity facilitates feature re-use in the network, whereas learned group
convolutions remove connections between layers for which this feature re-use is
superfluous. At test time, our model can be implemented using standard group
convolutions, allowing for efficient computation in practice. Our experiments
show that CondenseNets are far more efficient than state-of-the-art compact
convolutional networks such as MobileNets and ShuffleNets.
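To make the test-time claim concrete, the sketch below shows a standard group convolution, the operation CondenseNet reduces to after training (a minimal sketch under assumed channel counts and group number, not the authors' implementation):

    # Standard group convolution: the test-time form of learned group convolution.
    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 32, 32)   # feature maps with 64 input channels

    # groups=4 splits the 64 input channels into 4 independent groups of 16;
    # each group of output channels only sees its own input group, cutting the
    # parameter and FLOP cost of this 1x1 convolution by a factor of 4.
    group_conv = nn.Conv2d(in_channels=64, out_channels=128,
                           kernel_size=1, groups=4, bias=False)
    dense_conv = nn.Conv2d(64, 128, kernel_size=1, bias=False)

    print(group_conv(x).shape)                                  # [1, 128, 32, 32]
    print(group_conv.weight.numel(), "vs", dense_conv.weight.numel())  # 4x fewer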
Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference
Applying ML advancements to healthcare can improve patient outcomes. However,
the sheer operational complexity of ML models, combined with legacy hardware
and multi-modal gigapixel images, poses a severe deployment limitation for
real-time, on-device inference. We consider filter pruning as a solution,
exploring segmentation models in cardiology and ophthalmology. Our preliminary
results show a compression rate of up to 1148x with minimal loss in quality,
stressing the need to consider task complexity and architectural details when
using off-the-shelf models. At high compression rates, filter-pruned models
exhibit faster inference on a CPU than the GPU baseline. We also demonstrate
that such models' robustness and generalisability characteristics exceed those
of the baseline and weight-pruned counterparts. We uncover intriguing questions
and take a step towards realising cost-effective disease diagnosis, monitoring,
and preventive solutions.
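Filter pruning, unlike unstructured weight pruning, removes whole convolutional filters so the layer physically shrinks and runs faster on ordinary CPUs. The sketch below shows one common variant, L1-norm filter ranking, under assumed layer sizes and keep ratio; it is not the paper's pipeline:

    # L1-norm filter pruning of a single conv layer (illustrative sketch).
    import torch
    import torch.nn as nn

    conv = nn.Conv2d(64, 256, kernel_size=3, padding=1, bias=False)
    keep_ratio = 0.25                               # keep the 25% strongest filters

    # Rank output filters by the L1 norm of their weights and keep the largest.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # one score per filter
    n_keep = int(conv.out_channels * keep_ratio)
    keep_idx = torch.topk(scores, n_keep).indices.sort().values

    # Build a physically smaller layer containing only the kept filters.
    pruned = nn.Conv2d(conv.in_channels, n_keep, kernel_size=3,
                       padding=1, bias=False)
    pruned.weight.data.copy_(conv.weight.data[keep_idx])

    x = torch.randn(1, 64, 128, 128)
    print(conv(x).shape, "->", pruned(x).shape)     # 256 output channels -> 64

    # In a full network, the next layer's input channels must also be sliced
    # to match the kept filters, which is where the end-to-end speedup comes from.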