Ternary Weight Networks
We introduce ternary weight networks (TWNs) - neural networks with weights
constrained to +1, 0 and -1. The Euclidean distance between the full (float or
double) precision weights and the ternary weights, together with a scaling factor,
is minimized. In addition, a threshold-based ternary function is optimized to obtain
an approximate solution that can be computed quickly and easily. TWNs have
stronger expressive ability than the recently proposed binary precision
counterparts and are thus more effective. Meanwhile, TWNs achieve up to 16x or
32x model compression and require fewer multiplications than their full
precision counterparts. Benchmarks on MNIST, CIFAR-10, and the large-scale
ImageNet dataset show that TWNs perform only slightly worse than their full
precision counterparts while substantially outperforming the analogous binary
precision networks.
Comment: 5 pages, 3 figures, conference
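As a concrete illustration of the threshold-based ternary function described in the abstract, the sketch below ternarizes one layer's weights with a per-layer scaling factor. The 0.7 * mean(|W|) threshold heuristic follows the scheme commonly attributed to the TWN paper, and the names (ternarize, delta, alpha) are illustrative choices, not code from the paper.

import numpy as np

def ternarize(weights):
    # Approximate full-precision weights W by alpha * t with t in {-1, 0, +1}.
    # Minimal sketch: delta = 0.7 * mean(|W|) separates zeros from +/-1, and
    # alpha is the mean magnitude of the weights that exceed the threshold.
    w = np.asarray(weights, dtype=np.float64)
    delta = 0.7 * np.mean(np.abs(w))                        # ternarization threshold
    mask = np.abs(w) > delta                                # entries mapped to +/-1
    t = np.where(mask, np.sign(w), 0.0)                     # ternary codes
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # layer-wise scale factor
    return alpha, t

# Example: ternary approximation of one weight matrix.
rng = np.random.default_rng(0)
alpha, t = ternarize(rng.normal(size=(64, 128)))
approx = alpha * t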
ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation
Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are
useful for many practical tasks in machine learning. Synaptic weights, as well
as neuron activation functions, within a deep network are typically stored in
high-precision formats, e.g. 32-bit floating point. However, since storage
capacity is limited and each memory access consumes power, storage capacity and
memory access are two crucial factors in these networks. Here we present a
method and the ADaPTION toolbox, which extend the popular deep learning library
Caffe to support training of deep CNNs with reduced numerical precision weights
and activations using fixed point notation. ADaPTION includes tools to measure
the dynamic range of weights and activations. Using the ADaPTION tools, we
quantized several CNNs, including VGG16, down to 16-bit weights and activations
with only a 0.8% drop in Top-1 accuracy. The quantization, especially of the
activations, leads to an increase in sparsity of up to 50%, especially in early
and intermediate layers, which we exploit to skip multiplications with zero,
thus performing faster and computationally cheaper inference.
Comment: 10 pages, 5 figures
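To make the fixed-point idea concrete, the sketch below measures a tensor's dynamic range, picks an integer/fraction bit split for a 16-bit format, and rounds to the nearest representable value. This is a generic illustration under stated assumptions, not the ADaPTION toolbox's actual API; the function names choose_q_format and to_fixed_point are hypothetical.

import numpy as np

def choose_q_format(values, total_bits=16):
    # Pick a Qm.f split from the measured dynamic range: one sign bit,
    # enough integer bits to cover max(|x|), and the rest for the fraction.
    max_abs = float(np.max(np.abs(values)))
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits
    return int_bits, frac_bits

def to_fixed_point(values, frac_bits, total_bits=16):
    # Round to the nearest fixed-point code, saturate, and de-quantize back
    # to float so the reduced precision can be simulated in a float framework.
    scale = 2.0 ** frac_bits
    q = np.round(np.asarray(values, dtype=np.float64) * scale)
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return np.clip(q, lo, hi) / scale

# Example: quantize a layer's activations to 16-bit fixed point.
acts = np.random.default_rng(1).uniform(-4.0, 4.0, size=1000)
_, frac = choose_q_format(acts)
acts_q = to_fixed_point(acts, frac)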