Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
Recent advances in neural networks (NNs) exhibit unprecedented success at
transforming large, unstructured data streams into compact higher-level
semantic information for tasks such as handwriting recognition, image
classification, and speech recognition. Ideally, systems would employ
near-sensor computation to execute these tasks at sensor endpoints to maximize
data reduction and minimize data movement. However, near-sensor computing
presents its own set of challenges such as operating power constraints, energy
budgets, and communication bandwidth capacities. In this paper, we propose a
stochastic-binary hybrid design which splits the computation between the
stochastic and binary domains for near-sensor NN applications. In addition, our
design uses a new stochastic adder and multiplier that are significantly more
accurate than existing adders and multipliers. We also show that retraining the
binary portion of the NN computation can compensate for precision losses
introduced by shorter stochastic bit-streams, allowing faster run times at
minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary
design can achieve 9.8x energy efficiency savings, and application-level
accuracies within 0.05% compared to conventional all-binary designs.
Comment: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 201
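The abstract does not describe the paper's new adder and multiplier, but the baseline stochastic-computing operators it improves on can be sketched in a few lines: a unipolar value in [0, 1] is encoded as a random bitstream, multiplication becomes a bitwise AND, and scaled addition becomes a multiplexer. All function names below are illustrative, not from the paper:

```python
import random

def to_stream(p, n):
    """Encode a probability p in [0, 1] as a length-n stochastic bitstream."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b):
    """Classic unipolar stochastic multiplier: bitwise AND of two streams."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b):
    """Classic stochastic adder: a MUX whose output approximates (a + b) / 2."""
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def decode(stream):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)

random.seed(0)
n = 100_000
a, b = to_stream(0.6, n), to_stream(0.5, n)
print(decode(sc_multiply(a, b)))    # approx 0.6 * 0.5 = 0.30
print(decode(sc_scaled_add(a, b)))  # approx (0.6 + 0.5) / 2 = 0.55
```

The accuracy of both operators degrades as streams get shorter, which is the precision loss the paper's retraining step compensates for.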
Deep Learning with Limited Numerical Precision
Training of large-scale deep neural networks is often constrained by the
available computational resources. We study the effect of limited precision
data representation and computation on neural network training. Within the
context of low-precision fixed-point computations, we observe the rounding
scheme to play a crucial role in determining the network's behavior during
training. Our results show that deep networks can be trained using only 16-bit
wide fixed-point number representation when using stochastic rounding, and
incur little to no degradation in the classification accuracy. We also
demonstrate an energy-efficient hardware accelerator that implements
low-precision fixed-point arithmetic with stochastic rounding.
Comment: 10 pages, 6 figures, 1 table
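The stochastic rounding the abstract credits can be sketched as follows: a value rounds up with probability equal to its fractional remainder on the fixed-point grid, making the rounding unbiased in expectation. This is a minimal sketch of the scheme, not the accelerator's implementation:

```python
import math
import random

def stochastic_round(x, frac_bits=8, rng=random):
    """Round x onto a fixed-point grid with spacing 2**-frac_bits.

    The value rounds up with probability equal to its fractional
    remainder on the grid, so E[stochastic_round(x)] == x (unbiased).
    Round-to-nearest instead maps every occurrence of the same value
    to the same grid point, and that bias can accumulate in training.
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lo = math.floor(scaled)
    if rng.random() < scaled - lo:
        lo += 1
    return lo / scale

random.seed(0)
samples = [stochastic_round(0.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # approx 0.3 on average;
                                    # round-to-nearest gives 77/256 every time
```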
Progressive Stochastic Binarization of Deep Networks
A plethora of recent research has focused on improving the memory footprint
and inference speed of deep networks by reducing the complexity of (i)
numerical representations (for example, by deterministic or stochastic
quantization) and (ii) arithmetic operations (for example, by binarization of
weights).
We propose a stochastic binarization scheme for deep networks that allows for
efficient inference on hardware by restricting itself to additions of small
integers and fixed shifts. Unlike previous approaches, the underlying
randomized approximation is progressive, thus permitting an adaptive control of
the accuracy of each operation at run-time. In a low-precision setting, we
match the accuracy of previous binarized approaches. Our representation is
unbiased - it approaches continuous computation with increasing sample size. In
a high-precision regime, the computational costs are competitive with previous
quantization schemes. Progressive stochastic binarization also permits
localized, dynamic accuracy control within a single network, thereby providing
a new tool for adaptively focusing computational attention.
We evaluate our method on networks of various architectures, already
pretrained on ImageNet. With representational costs comparable to previous
schemes, we obtain accuracies close to the original floating point
implementation. This includes pruned networks, except the known special case of
certain types of separated convolutions. By focusing computational attention
using progressive sampling, we further reduce inference costs on ImageNet by up
to 33% (before network pruning).
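A minimal illustration of the underlying idea, assuming the simplest unbiased estimator (the paper's actual scheme restricts itself to additions of small integers and fixed shifts): each binary sample is 1 with probability equal to the weight, and averaging more samples progressively tightens the estimate:

```python
import random

def stochastic_binarize(w, samples, rng=random):
    """Unbiased stochastic binarization of a weight w in [0, 1].

    Each sample is 1 with probability w, so the running average is an
    unbiased estimate of w. Drawing more samples tightens the estimate,
    which is the run-time accuracy knob the abstract calls progressive.
    """
    hits = sum(1 for _ in range(samples) if rng.random() < w)
    return hits / samples

random.seed(0)
w = 0.37
for k in (1, 16, 256, 4096):
    print(k, stochastic_binarize(w, k))  # estimates approach 0.37 as k grows
```

Because the estimator is unbiased, per-operation sample counts can differ across a network, which is how localized, dynamic accuracy control becomes possible.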
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures
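As a concrete instance of the first technique surveyed, one-shot magnitude pruning can be sketched in a few lines. This is illustrative only; practical pipelines prune iteratively and fine-tune between rounds:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights.

    A minimal sketch of network pruning: keep the large weights,
    set the rest to zero so they need not be stored or multiplied.
    """
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [w if abs(w) >= threshold else 0.0 for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], sparsity=0.4)
print(pruned)  # -> [0.9, 0.0, 0.4, 0.0, -0.7]
```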
INsight: A Neuromorphic Computing System for Evaluation of Large Neural Networks
Deep neural networks have demonstrated impressive results in various
cognitive tasks such as object detection and image classification. In order to
execute large networks, Von Neumann computers store the large number of weight
parameters in external memories, and processing elements are time-shared,
which leads to power-hungry I/O operations and processing bottlenecks. This
paper describes a neuromorphic computing system that is designed from the
ground up for the energy-efficient evaluation of large-scale neural networks.
The computing system consists of a non-conventional compiler, a neuromorphic
architecture, and a space-efficient microarchitecture that leverages existing
integrated circuit design methodologies. The compiler factorizes a trained,
feedforward network into a sparsely connected network, compresses the weights
linearly, and generates a time-delay neural network that reduces the number of
connections. The connections and units in the simplified network are mapped to
silicon synapses and neurons. We demonstrate an implementation of the
neuromorphic computing system based on a field-programmable gate array that
performs the MNIST hand-written digit classification with 97.64% accuracy.
Developing a Bubble Chamber Particle Discriminator Using Semi-Supervised Learning
The identification of non-signal events is a major hurdle to overcome for
bubble chamber dark matter experiments such as PICO-60. The current practice of
manually developing a discriminator function to eliminate background events is
difficult because the available calibration data is frequently impure and present only
in small quantities. In this study, several different discriminator
input/preprocessing formats and neural network architectures are applied to the
task. First, they are optimized in a supervised learning context. Next, two
novel semi-supervised learning algorithms are trained, and found to replicate
the Acoustic Parameter (AP) discriminator previously used in PICO-60 with a
mean of 97% accuracy.
Comment: 27 pages, 10 figures
Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry.
Deep learning has achieved spectacular performance in image and speech recognition and synthesis. It outperforms other machine learning algorithms in problems where large amounts of data are available. In the area of measurement technology, instruments based on the photonic time stretch have established record real-time measurement throughput in spectroscopy, optical coherence tomography, and imaging flow cytometry. These extreme-throughput instruments generate approximately 1 Tbit/s of continuous measurement data and have led to the discovery of rare phenomena in nonlinear and complex systems as well as new types of biomedical instruments. Owing to the abundance of data they generate, time-stretch instruments are a natural fit to deep learning classification. Previously we had shown that high-throughput label-free cell classification with high accuracy can be achieved through a combination of time-stretch microscopy, image processing and feature extraction, followed by deep learning for finding cancer cells in the blood. Such a technology holds promise for early detection of primary cancer or metastasis. Here we describe a new deep learning pipeline, which entirely avoids the slow and computationally costly signal processing and feature extraction steps by a convolutional neural network that directly operates on the measured signals. The improvement in computational efficiency enables low-latency inference and makes this pipeline suitable for cell sorting via deep learning. Our neural network takes less than a few milliseconds to classify the cells, fast enough to provide a decision to a cell sorter for real-time separation of individual target cells. We demonstrate the applicability of our new method in the classification of OT-II white blood cells and SW-480 epithelial cancer cells with more than 95% accuracy in a label-free fashion
TableNet: a multiplier-less implementation of neural networks for inferencing
We consider the use of look-up tables (LUT) to simplify the hardware
implementation of a deep learning network for inferencing after weights have
been successfully trained. The use of LUT replaces the matrix multiply and add
operations with a small number of LUTs and addition operations, resulting in a
completely multiplier-less implementation. We compare the different tradeoffs
of this approach in terms of accuracy versus LUT size and the number of
operations, and show that similar performance can be obtained with a memory
footprint comparable to that of a full-precision deep neural network, without the use
of any multipliers. We illustrate this with several architectures such as MLP
and CNN.
Comment: 7 pages, 8 figures
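The LUT idea can be sketched as follows, assuming a small quantized activation grid (the grid and table layout below are illustrative, not TableNet's exact construction): each weight's products with every activation level are precomputed once, so a dot product at inference time needs only lookups and additions:

```python
def build_luts(weights, levels):
    """Precompute weight * activation for every quantized activation level.

    One table per trained weight; after this step, inference requires
    no multipliers at all.
    """
    return [{i: w * v for i, v in enumerate(levels)} for w in weights]

def lut_dot(luts, act_indices):
    """Multiplier-less dot product: a sum of table lookups."""
    return sum(lut[i] for lut, i in zip(luts, act_indices))

levels = [0.0, 0.25, 0.5, 0.75]       # assumed 2-bit activation grid
weights = [0.5, -1.0, 2.0]
luts = build_luts(weights, levels)
print(lut_dot(luts, [1, 3, 2]))       # 0.5*0.25 - 1.0*0.75 + 2.0*0.5 = 0.375
```

The accuracy/size trade-off the abstract studies corresponds to choosing how many activation levels (and hence how large a table) each LUT covers.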
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
This paper describes a novel approach of packing sparse convolutional neural
networks for their efficient systolic array implementations. By combining
subsets of columns in the original filter matrix associated with a
convolutional layer, we increase the utilization efficiency of the systolic
array substantially (e.g., ~4x) due to the increased density of nonzeros in the
resulting packed filter matrix. In combining columns, for each row, all filter
weights but one with the largest magnitude are pruned. We retrain the remaining
weights to preserve high accuracy. We demonstrate that, in mitigating data
privacy concerns, the retraining can be accomplished with only a fraction of the
original dataset (e.g., 10% for CIFAR-10). We study the effectiveness of this
joint optimization for both high utilization and classification accuracy with
ASIC and FPGA designs based on efficient bit-serial implementations of
multiplier-accumulators. We present analysis and empirical evidence of the
superior performance of our column combining approach over prior art under
metrics such as energy efficiency (3x) and inference latency (12x).
Comment: To appear in ASPLOS 201
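The combining rule described above (within each group of combined columns, each row keeps only its largest-magnitude weight and prunes the rest) can be sketched as follows; which columns to group together is part of the paper's joint optimization and is simply taken as given here:

```python
def combine_columns(matrix, groups):
    """Pack groups of sparse filter-matrix columns into single columns.

    Within each group, every row keeps only its largest-magnitude
    entry; the pruned weights are what retraining later compensates
    for. The packed matrix is denser, raising systolic-array
    utilization.
    """
    packed = []
    for row in matrix:
        packed.append([max((row[c] for c in g), key=abs) for g in groups])
    return packed

filt = [[0.0, 0.9, 0.1],
        [0.4, 0.0, -0.6]]
print(combine_columns(filt, groups=[(0, 1, 2)]))
# each row keeps its largest-magnitude weight: [[0.9], [-0.6]]
```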
Towards Accurate Binary Convolutional Neural Network
We introduce a novel scheme to train binary convolutional neural networks
(CNNs) -- CNNs with weights and activations constrained to {-1,+1} at run-time.
It has been known that using binary weights and activations drastically reduces
memory size and accesses, and can replace arithmetic operations with more
efficient bitwise operations, leading to much faster test-time inference and
lower power consumption. However, previous works on binarizing CNNs usually
result in severe prediction accuracy degradation. In this paper, we address
this issue with two major innovations: (1) approximating full-precision weights
with the linear combination of multiple binary weight bases; (2) employing
multiple binary activations to alleviate information loss. The implementation
of the resulting binary CNN, denoted as ABC-Net, is shown to achieve much
closer performance to its full-precision counterpart, and even reach the
comparable prediction accuracy on ImageNet and forest trail datasets, given
adequate binary weight bases and activations
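The representation W ≈ sum_i alpha_i * B_i with B_i in {-1, +1} can be illustrated with a simple greedy residual fit. This is a stand-in for ABC-Net's jointly optimized bases, shown only to make the representation concrete:

```python
def binary_basis_approx(weights, num_bases):
    """Greedily approximate real weights by a sum of scaled {-1,+1} bases.

    Each round takes B_i = sign(residual) and alpha_i = mean |residual|,
    then subtracts alpha_i * B_i from the residual. ABC-Net optimizes the
    bases and scales jointly; this greedy fit only demonstrates that a
    few binary bases can track full-precision weights closely.
    """
    residual = list(weights)
    bases, alphas = [], []
    for _ in range(num_bases):
        b = [1.0 if r >= 0 else -1.0 for r in residual]
        a = sum(abs(r) for r in residual) / len(residual)
        bases.append(b)
        alphas.append(a)
        residual = [r - a * s for r, s in zip(residual, b)]
    return [sum(a * b[i] for a, b in zip(alphas, bases))
            for i in range(len(weights))]

w = [0.8, -0.3, 0.5, -0.9]
for k in (1, 3, 5):
    print(k, binary_basis_approx(w, k))  # error shrinks as bases are added
```

At inference time each alpha_i * B_i term needs only sign flips and one scale, which is why multiple binary bases preserve the speed benefits of binarization.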