Training Behavior of Sparse Neural Network Topologies
Improvements in the performance of deep neural networks have often come
through the design of larger and more complex networks. As a result, fast
memory is a significant limiting factor in our ability to improve network
performance. One approach to overcoming this limit is the design of sparse
neural networks, which can be both very large and efficiently trained. In this
paper we experiment with training on sparse neural network topologies. We test
pruning-based topologies, which are derived from an initially dense network
whose connections are pruned, as well as RadiX-Nets, a class of network
topologies with proven connectivity and sparsity properties. Results show that
sparse networks obtain accuracies comparable to dense networks, but extreme
levels of sparsity cause instability in training, which merits further study.
Comment: 6 pages. Presented at the 2019 IEEE High Performance Extreme
Computing (HPEC) Conference. Received "Best Paper" awar
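
To make the idea of a pruning-based topology concrete, the following is a minimal sketch, not the paper's procedure. It assumes magnitude-based pruning (the abstract does not state which pruning criterion is used), and the function names `magnitude_prune_mask` and `apply_mask` are hypothetical.

```python
def magnitude_prune_mask(weights, sparsity):
    """Build a 0/1 mask that removes the smallest-magnitude connections.

    weights  -- dense weight matrix as a list of rows of floats
    sparsity -- fraction of connections to remove, in [0, 1)
    Assumption: pruning by magnitude; other criteria are possible.
    """
    # Rank every connection by absolute value, remembering its position.
    entries = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    n_prune = int(sparsity * len(entries))

    # Start from a fully connected mask and cut the weakest connections.
    mask = [[1] * len(row) for row in weights]
    for _, i, j in entries[:n_prune]:
        mask[i][j] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out pruned connections; the mask defines the sparse topology."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


dense = [[0.5, -0.1], [0.05, -2.0]]
mask = magnitude_prune_mask(dense, sparsity=0.5)
sparse = apply_mask(dense, mask)
```

In a training loop, the same mask would typically be reapplied after each weight update so that pruned connections stay at zero.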
Weightless: Lossy Weight Encoding For Deep Neural Network Compression
The large memory requirements of deep neural networks limit their deployment
and adoption on many devices. Model compression methods effectively reduce the
memory requirements of these models, usually through applying transformations
such as weight pruning or quantization. In this paper, we present a novel
scheme for lossy weight encoding which complements conventional compression
techniques. The encoding is based on the Bloomier filter, a probabilistic data
structure that can save space at the cost of introducing random errors.
Leveraging the ability of neural networks to tolerate these imperfections and
by re-training around the errors, the proposed technique, Weightless, can
compress DNN weights by up to 496x with the same model accuracy. This results
in up to a 1.51x improvement over the state-of-the-art
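
The abstract names quantization as one of the conventional transformations that Weightless complements. The sketch below illustrates only that conventional step, under the assumption of simple uniform quantization; it is not the Bloomier-filter encoding itself, and `uniform_quantize` and `dequantize` are hypothetical names.

```python
def uniform_quantize(weights, num_bits):
    """Map each weight to one of 2**num_bits evenly spaced levels.

    Returns (codes, lo, step); each code is an integer that can be
    stored in num_bits bits instead of a full-precision float.
    """
    levels = 2 ** num_bits
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights are identical: a single level represents them exactly.
        return [0] * len(weights), lo, 0.0
    step = (hi - lo) / (levels - 1)
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate weights from their integer codes."""
    return [lo + c * step for c in codes]


weights = [-1.0, -0.4, 0.3, 1.0]
codes, lo, step = uniform_quantize(weights, num_bits=2)
restored = dequantize(codes, lo, step)
```

The reconstruction error per weight is bounded by half a quantization step, which is the kind of small imperfection that re-training can often absorb.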
Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.
Comment: the SIGNLL Conference on Computational Natural Language Learning
(CoNLL, 2018
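
The claim that hidden state size can grow without adding parameters can be checked with simple parameter counting. The sketch below assumes a block-diagonal recurrent weight matrix as the predefined sparsity pattern; the abstract does not specify the pattern, so this choice and all function names are illustrative assumptions.

```python
import math


def dense_recurrent_params(hidden):
    """Hidden-to-hidden parameter count of a dense recurrent matrix."""
    return hidden * hidden


def block_diag_recurrent_params(hidden, blocks):
    """Parameter count when the matrix is block-diagonal with equal blocks."""
    if hidden % blocks:
        raise ValueError("hidden size must be divisible by the block count")
    block = hidden // blocks
    return blocks * block * block


def widest_hidden_within_budget(budget, blocks):
    """Largest hidden size whose block-diagonal matrix fits the budget."""
    block = math.isqrt(budget // blocks)
    return blocks * block


budget = dense_recurrent_params(256)              # dense baseline
wide = widest_hidden_within_budget(budget, 4)     # hidden size with 4 blocks
wide_params = block_diag_recurrent_params(wide, 4)
```

Under this assumption, four blocks allow the hidden state to double from 256 to 512 units at the same hidden-to-hidden parameter count.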