Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
Recent advances in neural networks (NNs) exhibit unprecedented success at
transforming large, unstructured data streams into compact higher-level
semantic information for tasks such as handwriting recognition, image
classification, and speech recognition. Ideally, systems would employ
near-sensor computation to execute these tasks at sensor endpoints to maximize
data reduction and minimize data movement. However, near-sensor computing
presents its own set of challenges such as operating power constraints, energy
budgets, and communication bandwidth capacities. In this paper, we propose a
stochastic-binary hybrid design which splits the computation between the
stochastic and binary domains for near-sensor NN applications. In addition, our
design uses a new stochastic adder and multiplier that are significantly more
accurate than existing adders and multipliers. We also show that retraining the
binary portion of the NN computation can compensate for precision losses
introduced by shorter stochastic bit-streams, allowing faster run times at
minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary
design can achieve 9.8x energy efficiency savings, and application-level
accuracies within 0.05% compared to conventional all-binary designs.
Comment: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 201
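The abstract does not describe the paper's new adder and multiplier, but the baseline stochastic-computing operators it improves on can be sketched in a few lines: a unipolar value in [0, 1] is encoded as a random bitstream, multiplication becomes a bitwise AND, and scaled addition becomes a multiplexer. All function names below are illustrative, not from the paper:

```python
import random

def to_stream(p, n):
    """Encode a probability p in [0, 1] as a length-n stochastic bitstream."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b):
    """Classic unipolar stochastic multiplier: bitwise AND of two streams."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b):
    """Classic stochastic adder: a MUX whose output approximates (a + b) / 2."""
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def decode(stream):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)

random.seed(0)
n = 100_000
a, b = to_stream(0.6, n), to_stream(0.5, n)
print(decode(sc_multiply(a, b)))    # approx 0.6 * 0.5 = 0.30
print(decode(sc_scaled_add(a, b)))  # approx (0.6 + 0.5) / 2 = 0.55
```

The accuracy of both operators degrades as streams get shorter, which is the precision loss the paper's retraining step compensates for.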
Deep Learning with Limited Numerical Precision
Training of large-scale deep neural networks is often constrained by the
available computational resources. We study the effect of limited precision
data representation and computation on neural network training. Within the
context of low-precision fixed-point computations, we observe the rounding
scheme to play a crucial role in determining the network's behavior during
training. Our results show that deep networks can be trained using only 16-bit
wide fixed-point number representation when using stochastic rounding, and
incur little to no degradation in the classification accuracy. We also
demonstrate an energy-efficient hardware accelerator that implements
low-precision fixed-point arithmetic with stochastic rounding.
Comment: 10 pages, 6 figures, 1 table
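The stochastic rounding the abstract credits can be sketched as follows: a value rounds up with probability equal to its fractional remainder on the fixed-point grid, making the rounding unbiased in expectation. This is a minimal sketch of the scheme, not the accelerator's implementation:

```python
import math
import random

def stochastic_round(x, frac_bits=8, rng=random):
    """Round x onto a fixed-point grid with spacing 2**-frac_bits.

    The value rounds up with probability equal to its fractional
    remainder on the grid, so E[stochastic_round(x)] == x (unbiased).
    Round-to-nearest instead maps every occurrence of the same value
    to the same grid point, and that bias can accumulate in training.
    """
    scale = 1 << frac_bits
    scaled = x * scale
    lo = math.floor(scaled)
    if rng.random() < scaled - lo:
        lo += 1
    return lo / scale

random.seed(0)
samples = [stochastic_round(0.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # approx 0.3 on average;
                                    # round-to-nearest gives 77/256 every time
```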
Progressive Stochastic Binarization of Deep Networks
A plethora of recent research has focused on improving the memory footprint
and inference speed of deep networks by reducing the complexity of (i)
numerical representations (for example, by deterministic or stochastic
quantization) and (ii) arithmetic operations (for example, by binarization of
weights).
We propose a stochastic binarization scheme for deep networks that allows for
efficient inference on hardware by restricting itself to additions of small
integers and fixed shifts. Unlike previous approaches, the underlying
randomized approximation is progressive, thus permitting an adaptive control of
the accuracy of each operation at run-time. In a low-precision setting, we
match the accuracy of previous binarized approaches. Our representation is
unbiased - it approaches continuous computation with increasing sample size. In
a high-precision regime, the computational costs are competitive with previous
quantization schemes. Progressive stochastic binarization also permits
localized, dynamic accuracy control within a single network, thereby providing
a new tool for adaptively focusing computational attention.
We evaluate our method on networks of various architectures, already
pretrained on ImageNet. With representational costs comparable to previous
schemes, we obtain accuracies close to the original floating point
implementation. This includes pruned networks, except the known special case of
certain types of separated convolutions. By focusing computational attention
using progressive sampling, we further reduce inference costs on ImageNet by up
to 33% (before network pruning).
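A minimal illustration of the underlying idea, assuming the simplest unbiased estimator (the paper's actual scheme restricts itself to additions of small integers and fixed shifts): each binary sample is 1 with probability equal to the weight, and averaging more samples progressively tightens the estimate:

```python
import random

def stochastic_binarize(w, samples, rng=random):
    """Unbiased stochastic binarization of a weight w in [0, 1].

    Each sample is 1 with probability w, so the running average is an
    unbiased estimate of w. Drawing more samples tightens the estimate,
    which is the run-time accuracy knob the abstract calls progressive.
    """
    hits = sum(1 for _ in range(samples) if rng.random() < w)
    return hits / samples

random.seed(0)
w = 0.37
for k in (1, 16, 256, 4096):
    print(k, stochastic_binarize(w, k))  # estimates approach 0.37 as k grows
```

Because the estimator is unbiased, per-operation sample counts can differ across a network, which is how localized, dynamic accuracy control becomes possible.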
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures
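As a concrete instance of the first technique surveyed, one-shot magnitude pruning can be sketched in a few lines. This is illustrative only; practical pipelines prune iteratively and fine-tune between rounds:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights.

    A minimal sketch of network pruning: keep the large weights,
    set the rest to zero so they need not be stored or multiplied.
    """
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [w if abs(w) >= threshold else 0.0 for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], sparsity=0.4)
print(pruned)  # -> [0.9, 0.0, 0.4, 0.0, -0.7]
```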
INsight: A Neuromorphic Computing System for Evaluation of Large Neural Networks
Deep neural networks have demonstrated impressive results in various
cognitive tasks such as object detection and image classification. In order to
execute large networks, Von Neumann computers store the large number of weight
parameters in external memories, and processing elements are time-shared,
which leads to power-hungry I/O operations and processing bottlenecks. This
paper describes a neuromorphic computing system that is designed from the
ground up for the energy-efficient evaluation of large-scale neural networks.
The computing system consists of a non-conventional compiler, a neuromorphic
architecture, and a space-efficient microarchitecture that leverages existing
integrated circuit design methodologies. The compiler factorizes a trained,
feedforward network into a sparsely connected network, compresses the weights
linearly, and generates a time-delay neural network that reduces the number of
connections. The connections and units in the simplified network are mapped to
silicon synapses and neurons. We demonstrate an implementation of the
neuromorphic computing system based on a field-programmable gate array that
performs the MNIST hand-written digit classification with 97.64% accuracy.
Developing a Bubble Chamber Particle Discriminator Using Semi-Supervised Learning
The identification of non-signal events is a major hurdle to overcome for
bubble chamber dark matter experiments such as PICO-60. The current practice of
manually developing a discriminator function to eliminate background events is
difficult because the available calibration data is frequently impure and present only
in small quantities. In this study, several different discriminator
input/preprocessing formats and neural network architectures are applied to the
task. First, they are optimized in a supervised learning context. Next, two
novel semi-supervised learning algorithms are trained, and found to replicate
the Acoustic Parameter (AP) discriminator previously used in PICO-60 with a
mean of 97% accuracy.
Comment: 27 pages, 10 figures
Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry.
Deep learning has achieved spectacular performance in image and speech recognition and synthesis. It outperforms other machine learning algorithms in problems where large amounts of data are available. In the area of measurement technology, instruments based on the photonic time stretch have established record real-time measurement throughput in spectroscopy, optical coherence tomography, and imaging flow cytometry. These extreme-throughput instruments generate approximately 1 Tbit/s of continuous measurement data and have led to the discovery of rare phenomena in nonlinear and complex systems as well as new types of biomedical instruments. Owing to the abundance of data they generate, time-stretch instruments are a natural fit to deep learning classification. Previously we had shown that high-throughput label-free cell classification with high accuracy can be achieved through a combination of time-stretch microscopy, image processing and feature extraction, followed by deep learning for finding cancer cells in the blood. Such a technology holds promise for early detection of primary cancer or metastasis. Here we describe a new deep learning pipeline, which entirely avoids the slow and computationally costly signal processing and feature extraction steps by a convolutional neural network that directly operates on the measured signals. The improvement in computational efficiency enables low-latency inference and makes this pipeline suitable for cell sorting via deep learning. Our neural network takes less than a few milliseconds to classify the cells, fast enough to provide a decision to a cell sorter for real-time separation of individual target cells. We demonstrate the applicability of our new method in the classification of OT-II white blood cells and SW-480 epithelial cancer cells with more than 95% accuracy in a label-free fashion
TableNet: a multiplier-less implementation of neural networks for inferencing
We consider the use of look-up tables (LUT) to simplify the hardware
implementation of a deep learning network for inferencing after weights have
been successfully trained. The use of LUT replaces the matrix multiply and add
operations with a small number of LUTs and addition operations, resulting in a
completely multiplier-less implementation. We compare the different tradeoffs
of this approach in terms of accuracy versus LUT size and the number of
operations, and show that similar performance can be obtained with a memory
footprint comparable to that of a full-precision deep neural network, without the use
of any multipliers. We illustrate this with several architectures such as MLP
and CNN.
Comment: 7 pages, 8 figures
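The LUT idea can be sketched as follows, assuming a small quantized activation grid (the grid and table layout below are illustrative, not TableNet's exact construction): each weight's products with every activation level are precomputed once, so a dot product at inference time needs only lookups and additions:

```python
def build_luts(weights, levels):
    """Precompute weight * activation for every quantized activation level.

    One table per trained weight; after this step, inference requires
    no multipliers at all.
    """
    return [{i: w * v for i, v in enumerate(levels)} for w in weights]

def lut_dot(luts, act_indices):
    """Multiplier-less dot product: a sum of table lookups."""
    return sum(lut[i] for lut, i in zip(luts, act_indices))

levels = [0.0, 0.25, 0.5, 0.75]       # assumed 2-bit activation grid
weights = [0.5, -1.0, 2.0]
luts = build_luts(weights, levels)
print(lut_dot(luts, [1, 3, 2]))       # 0.5*0.25 - 1.0*0.75 + 2.0*0.5 = 0.375
```

The accuracy/size trade-off the abstract studies corresponds to choosing how many activation levels (and hence how large a table) each LUT covers.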
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
This paper describes a novel approach of packing sparse convolutional neural
networks for their efficient systolic array implementations. By combining
subsets of columns in the original filter matrix associated with a
convolutional layer, we increase the utilization efficiency of the systolic
array substantially (e.g., ~4x) due to the increased density of nonzeros in the
resulting packed filter matrix. In combining columns, for each row, all filter
weights but one with the largest magnitude are pruned. We retrain the remaining
weights to preserve high accuracy. We demonstrate that, in mitigating data
privacy concerns, the retraining can be accomplished with only a fraction of the
original dataset (e.g., 10% for CIFAR-10). We study the effectiveness of this
joint optimization for both high utilization and classification accuracy with
ASIC and FPGA designs based on efficient bit-serial implementations of
multiplier-accumulators. We present analysis and empirical evidence of the
superior performance of our column combining approach over prior art under
metrics such as energy efficiency (3x) and inference latency (12x).
Comment: To appear in ASPLOS 201
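The combining rule described above (within each group of combined columns, each row keeps only its largest-magnitude weight and prunes the rest) can be sketched as follows; which columns to group together is part of the paper's joint optimization and is simply taken as given here:

```python
def combine_columns(matrix, groups):
    """Pack groups of sparse filter-matrix columns into single columns.

    Within each group, every row keeps only its largest-magnitude
    entry; the pruned weights are what retraining later compensates
    for. The packed matrix is denser, raising systolic-array
    utilization.
    """
    packed = []
    for row in matrix:
        packed.append([max((row[c] for c in g), key=abs) for g in groups])
    return packed

filt = [[0.0, 0.9, 0.1],
        [0.4, 0.0, -0.6]]
print(combine_columns(filt, groups=[(0, 1, 2)]))
# each row keeps its largest-magnitude weight: [[0.9], [-0.6]]
```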
Towards Accurate Binary Convolutional Neural Network
We introduce a novel scheme to train binary convolutional neural networks
(CNNs) -- CNNs with weights and activations constrained to {-1,+1} at run-time.
It has been known that using binary weights and activations drastically reduces
memory size and accesses, and can replace arithmetic operations with more
efficient bitwise operations, leading to much faster test-time inference and
lower power consumption. However, previous works on binarizing CNNs usually
result in severe prediction accuracy degradation. In this paper, we address
this issue with two major innovations: (1) approximating full-precision weights
with the linear combination of multiple binary weight bases; (2) employing
multiple binary activations to alleviate information loss. The implementation
of the resulting binary CNN, denoted as ABC-Net, is shown to achieve much
closer performance to its full-precision counterpart, and even reach the
comparable prediction accuracy on ImageNet and forest trail datasets, given
adequate binary weight bases and activations
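The representation W ≈ sum_i alpha_i * B_i with B_i in {-1, +1} can be illustrated with a simple greedy residual fit. This is a stand-in for ABC-Net's jointly optimized bases, shown only to make the representation concrete:

```python
def binary_basis_approx(weights, num_bases):
    """Greedily approximate real weights by a sum of scaled {-1,+1} bases.

    Each round takes B_i = sign(residual) and alpha_i = mean |residual|,
    then subtracts alpha_i * B_i from the residual. ABC-Net optimizes the
    bases and scales jointly; this greedy fit only demonstrates that a
    few binary bases can track full-precision weights closely.
    """
    residual = list(weights)
    bases, alphas = [], []
    for _ in range(num_bases):
        b = [1.0 if r >= 0 else -1.0 for r in residual]
        a = sum(abs(r) for r in residual) / len(residual)
        bases.append(b)
        alphas.append(a)
        residual = [r - a * s for r, s in zip(residual, b)]
    return [sum(a * b[i] for a, b in zip(alphas, bases))
            for i in range(len(weights))]

w = [0.8, -0.3, 0.5, -0.9]
for k in (1, 3, 5):
    print(k, binary_basis_approx(w, k))  # error shrinks as bases are added
```

At inference time each alpha_i * B_i term needs only sign flips and one scale, which is why multiple binary bases preserve the speed benefits of binarization.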