13 research outputs found
Significance Driven Hybrid 8T-6T SRAM for Energy-Efficient Synaptic Storage in Artificial Neural Networks
Multilayered artificial neural networks (ANN) have found widespread utility
in classification and recognition applications. The scale and complexity of
such networks together with the inadequacies of general purpose computing
platforms have led to a significant interest in the development of efficient
hardware implementations. In this work, we focus on designing energy efficient
on-chip storage for the synaptic weights. In order to minimize the power
consumption of typical digital CMOS implementations of such large-scale
networks, the digital neurons could be operated reliably at scaled voltages by
reducing the clock frequency. On the contrary, the on-chip synaptic storage
designed using a conventional 6T SRAM is susceptible to bitcell failures at
reduced voltages. However, the intrinsic error resiliency of NNs to small
synaptic weight perturbations enables us to scale the operating voltage of the
6TSRAM. Our analysis on a widely used digit recognition dataset indicates that
the voltage can be scaled by 200mV from the nominal operating voltage (950mV)
for practically no loss (less than 0.5%) in accuracy (22nm predictive
technology). Scaling beyond that causes substantial performance degradation
owing to increased probability of failures in the MSBs of the synaptic weights.
We, therefore propose a significance driven hybrid 8T-6T SRAM, wherein the
sensitive MSBs are stored in 8T bitcells that are robust at scaled voltages due
to decoupled read and write paths. In an effort to further minimize the area
penalty, we present a synaptic-sensitivity driven hybrid memory architecture
consisting of multiple 8T-6T SRAM banks. Our circuit to system-level simulation
framework shows that the proposed synaptic-sensitivity driven architecture
provides a 30.91% reduction in the memory access power with a 10.41% area
overhead, for less than 1% loss in the classification accuracy.Comment: Accepted in Design, Automation and Test in Europe 2016 conference
(DATE-2016
Embracing Visual Experience and Data Knowledge: Efficient Embedded Memory Design for Big Videos and Deep Learning
Energy efficient memory designs are becoming increasingly important, especially for applications related to mobile video technology and machine learning. The growing popularity of smart phones, tablets and other mobile devices has created an exponential demand for video applications in today?s society. When mobile devices display video, the embedded video memory within the device consumes a large amount of the total system power. This issue has created the need to introduce power-quality tradeoff techniques for enabling good quality video output, while simultaneously enabling power consumption reduction. Similarly, power efficiency issues have arisen within the area of machine learning, especially with applications requiring large and fast computation, such as neural networks. Using the accumulated data knowledge from various machine learning applications, there is now the potential to create more intelligent memory with the capability for optimized trade-off between energy efficiency, area overhead, and classification accuracy on the learning systems. In this dissertation, a review of recently completed works involving video and machine learning memories will be covered. Based on the collected results from a variety of different methods, including: subjective trials, discovered data-mining patterns, software simulations, and hardware power and performance tests, the presented memories provide novel ways to significantly enhance power efficiency for future memory devices. An overview of related works, especially the relevant state-of-the-art research, will be referenced for comparison in order to produce memory design methodologies that exhibit optimal quality, low implementation overhead, and maximum power efficiency.National Science FoundationND EPSCoRCenter for Computationally Assisted Science and Technology (CCAST
MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators
As a result of the increasing demand for deep neural network (DNN)-based
services, efforts to develop dedicated hardware accelerators for DNNs are
growing rapidly. However,while accelerators with high performance and
efficiency on convolutional deep neural networks (Conv-DNNs) have been
developed, less progress has been made with regards to fully-connected DNNs
(FC-DNNs). In this paper, we propose MATIC (Memory Adaptive Training with
In-situ Canaries), a methodology that enables aggressive voltage scaling of
accelerator weight memories to improve the energy-efficiency of DNN
accelerators. To enable accurate operation with voltage overscaling, MATIC
combines the characteristics of destructive SRAM reads with the error
resilience of neural networks in a memory-adaptive training process.
Furthermore, PVT-related voltage margins are eliminated using bit-cells from
synaptic weights as in-situ canaries to track runtime environmental variation.
Demonstrated on a low-power DNN accelerator that we fabricate in 65 nm CMOS,
MATIC enables up to 60-80 mV of voltage overscaling (3.3x total energy
reduction versus the nominal voltage), or 18.6x application error reduction.Comment: 6 pages, 12 figures, 3 tables. Published at Design, Automation and
Test in Europe Conference and Exhibition (DATE) 201
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
The edge processing of deep neural networks (DNNs) is becoming increasingly
important due to its ability to extract valuable information directly at the
data source to minimize latency and energy consumption. Frequency-domain model
compression, such as with the Walsh-Hadamard transform (WHT), has been
identified as an efficient alternative. However, the benefits of
frequency-domain processing are often offset by the increased
multiply-accumulate (MAC) operations required. This paper proposes a novel
approach to an energy-efficient acceleration of frequency-domain neural
networks by utilizing analog-domain frequency-based tensor transformations. Our
approach offers unique opportunities to enhance computational efficiency,
resulting in several high-level advantages, including array micro-architecture
with parallelism, ADC/DAC-free analog computations, and increased output
sparsity. Our approach achieves more compact cells by eliminating the need for
trainable parameters in the transformation matrix. Moreover, our novel array
micro-architecture enables adaptive stitching of cells column-wise and
row-wise, thereby facilitating perfect parallelism in computations.
Additionally, our scheme enables ADC/DAC-free computations by training against
highly quantized matrix-vector products, leveraging the parameter-free nature
of matrix multiplications. Another crucial aspect of our design is its ability
to handle signed-bit processing for frequency-based transformations. This leads
to increased output sparsity and reduced digitization workload. On a
1616 crossbars, for 8-bit input processing, the proposed approach
achieves the energy efficiency of 1602 tera operations per second per Watt
(TOPS/W) without early termination strategy and 5311 TOPS/W with early
termination strategy at VDD = 0.8 V
Bit Error Robustness for Energy-Efficient DNN Accelerators
Deep neural network (DNN) accelerators received considerable attention in
past years due to saved energy compared to mainstream hardware. Low-voltage
operation of DNN accelerators allows to further reduce energy consumption
significantly, however, causes bit-level failures in the memory storing the
quantized DNN weights. In this paper, we show that a combination of robust
fixed-point quantization, weight clipping, and random bit error training
(RandBET) improves robustness against random bit errors in (quantized) DNN
weights significantly. This leads to high energy savings from both low-voltage
operation as well as low-precision quantization. Our approach generalizes
across operating voltages and accelerators, as demonstrated on bit errors from
profiled SRAM arrays. We also discuss why weight clipping alone is already a
quite effective way to achieve robustness against bit errors. Moreover, we
specifically discuss the involved trade-offs regarding accuracy, robustness and
precision: Without losing more than 1% in accuracy compared to a normally
trained 8-bit DNN, we can reduce energy consumption on CIFAR-10 by 20%. Higher
energy savings of, e.g., 30%, are possible at the cost of 2.5% accuracy, even
for 4-bit DNNs