8,089 research outputs found
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
After the tremendous success of convolutional neural networks in image
classification, object detection, speech recognition, etc., there is now rising
demand for deployment of these compute-intensive ML models on tightly power
constrained embedded and mobile systems at low cost as well as for pushing the
throughput in data centers. This has triggered a wave of research towards
specialized hardware accelerators. Their performance is often constrained by
I/O bandwidth and the energy consumption is dominated by I/O transfers to
off-chip memory. We introduce and evaluate a novel, hardware-friendly
compression scheme for the feature maps present within convolutional neural
networks. We show that an average compression ratio of 4.4x relative to
uncompressed data and a gain of 60% over existing method can be achieved for
ResNet-34 with a compression block requiring <300 bit of sequential cells and
minimal combinational logic
Scalable Neural Network Decoders for Higher Dimensional Quantum Codes
Machine learning has the potential to become an important tool in quantum
error correction as it allows the decoder to adapt to the error distribution of
a quantum chip. An additional motivation for using neural networks is the fact
that they can be evaluated by dedicated hardware which is very fast and
consumes little power. Machine learning has been previously applied to decode
the surface code. However, these approaches are not scalable as the training
has to be redone for every system size which becomes increasingly difficult. In
this work the existence of local decoders for higher dimensional codes leads us
to use a low-depth convolutional neural network to locally assign a likelihood
of error on each qubit. For noiseless syndrome measurements, numerical
simulations show that the decoder has a threshold of around when
applied to the 4D toric code. When the syndrome measurements are noisy, the
decoder performs better for larger code sizes when the error probability is
low. We also give theoretical and numerical analysis to show how a
convolutional neural network is different from the 1-nearest neighbor
algorithm, which is a baseline machine learning method
Deep Spiking Neural Networks: Study on the MNIST and N-MNIST Data Sets
Deep learning, i.e., the use of deep convolutional neural networks (DCNN), is a powerful tool for pattern recognition (image classification) and natural language (speech) processing. Deep convolutional networks use multiple convoltuion layers to learn the input data. They have been used to classify the large dataset Imagenet with an accuracy of 96.6%. In this work deep spiking networks are considered. This is new paradigm for implementing artificial neural networks using mechanisms that incorporate spike-timing dependent plasticity which is a learning algorithm discovered by neuroscientists. Advances in deep learning has opened up multitude of new avenues that once were limited to science fiction. The promise of spiking networks is that they are less computationally intensive and much more energy efficient as the spiking algorithms can be implemented on a neuromorphic chip such as Intel’s LOIHI chip (operates at low power because it runs asynchronously using spikes). Our work is based on the work of Masquelier and Thorpe, and Kheradpisheh et al. In particular a study is done of how such networks classify MNIST image data and N-MNIST spiking data. The networks used in consist of multiple convolution/pooling layers of spiking neurons trained using spike timing dependent plasticity (STDP) and a final classification layer done using a support vector machine (SVM)
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Deep neural networks have achieved impressive results in computer vision and
machine learning. Unfortunately, state-of-the-art networks are extremely
compute and memory intensive which makes them unsuitable for mW-devices such as
IoT end-nodes. Aggressive quantization of these networks dramatically reduces
the computation and memory footprint. Binary-weight neural networks (BWNs)
follow this trend, pushing weight quantization to the limit. Hardware
accelerators for BWNs presented up to now have focused on core efficiency,
disregarding I/O bandwidth and system-level efficiency that are crucial for
deployment of accelerators in ultra-low power devices. We present Hyperdrive: a
BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel
binary-weight streaming approach, which can be used for arbitrarily sized
convolutional neural network architecture and input resolution by exploiting
the natural scalability of the compute units both at chip-level and
system-level by arranging Hyperdrive chips systolically in a 2D mesh while
processing the entire feature map together in parallel. Hyperdrive achieves 4.3
TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than
state-of-the-art BWN accelerators, even if its core uses resource-intensive
FP16 arithmetic for increased robustness
- …