VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in the literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to the fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation, which yields a 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.
Comment: 11 pages, 12 figures
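To make the idea of an integer-valued stochastic stream concrete, here is a minimal software sketch. It assumes the common unipolar encoding (a value in [0, 1] is the probability of a 1 bit) and assumes that an integral stream encoding a value in [0, m] can be formed as the element-wise sum of m binary streams. The function names are hypothetical and illustrate the arithmetic only; they are not the paper's circuits.

```python
import random

def bernoulli_stream(p, length, rng):
    """Unipolar stochastic stream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def integral_stream(value, m, length, rng):
    """Integer-valued stream for a value in [0, m].

    Assumption: built as the element-wise sum of m binary streams,
    each with probability value / m, so every element lies in {0, ..., m}.
    """
    streams = [bernoulli_stream(value / m, length, rng) for _ in range(m)]
    return [sum(bits) for bits in zip(*streams)]

def decode(stream):
    """Estimate the encoded value as the mean of the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N = 4096                 # stream length; longer streams give lower variance

# Binary stochastic multiplication: a bitwise AND of independent streams.
a = bernoulli_stream(0.5, N, rng)
b = bernoulli_stream(0.4, N, rng)
product = [x & y for x, y in zip(a, b)]

# Integral stream encoding 1.5 with m = 2.
s = integral_stream(1.5, 2, N, rng)

print(decode(product))   # close to 0.5 * 0.4
print(decode(s))         # close to 1.5
```

Because each element of the integral stream carries more than one bit of information, a value outside [0, 1] can be represented directly rather than by rescaling, which is one way to read the abstract's claim about shorter streams.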
ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation
Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are
useful for many practical tasks in machine learning. Synaptic weights, as well
as neuron activation functions within the deep network are typically stored
with high-precision formats, e.g. 32 bit floating point. However, since storage
capacity is limited and each memory access consumes power, both storage
capacity and memory access are crucial factors in these networks. Here we
present a method and the ADaPTION toolbox to extend the popular deep
learning library Caffe to support training of deep CNNs with reduced numerical
precision of weights and activations using fixed point notation. ADaPTION
includes tools to measure the dynamic range of weights and activations. Using
the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit
weights and activations with only a 0.8% drop in Top-1 accuracy. The
quantization, especially of the activations, leads to an increase in sparsity
of up to 50%, especially in early and intermediate layers, which we exploit to skip
multiplications with zero, thus performing faster and computationally cheaper
inference.
Comment: 10 pages, 5 figures
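The two steps the abstract describes, measuring dynamic range and quantizing to fixed point, can be sketched as follows. This is a plain-Python illustration under stated assumptions (signed fixed point with one sign bit, saturation on overflow, Python's built-in rounding). The function names are hypothetical and are not part of ADaPTION or Caffe.

```python
import math

def integer_bits_needed(values):
    """Integer bits (excluding sign) needed to cover the largest magnitude."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0
    return max(0, math.floor(math.log2(max_abs)) + 1)

def quantize_fixed_point(values, total_bits, int_bits):
    """Quantize to signed fixed point: 1 sign bit, int_bits integer bits,
    and the remaining bits fractional. Out-of-range values saturate."""
    frac_bits = total_bits - 1 - int_bits
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    quantized = []
    for v in values:
        q = max(q_min, min(q_max, round(v * scale)))
        quantized.append(q / scale)
    return quantized

def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

# Hypothetical activations, chosen only for illustration.
activations = [0.0004, 0.02, 0.5, 1.75, 3.2, 0.001]

int_bits = integer_bits_needed(activations)   # dynamic range of the data
quantized = quantize_fixed_point(activations, total_bits=8, int_bits=int_bits)

print(int_bits)
print(quantized)
print(sparsity(activations), sparsity(quantized))
```

Small activations that fall below half of the smallest representable step round to exactly zero, which is the mechanism behind the increased sparsity the abstract reports; multiplications with those zeros can then be skipped.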