178 research outputs found
MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators
As a result of the increasing demand for deep neural network (DNN)-based
services, efforts to develop dedicated hardware accelerators for DNNs are
growing rapidly. However,while accelerators with high performance and
efficiency on convolutional deep neural networks (Conv-DNNs) have been
developed, less progress has been made with regards to fully-connected DNNs
(FC-DNNs). In this paper, we propose MATIC (Memory Adaptive Training with
In-situ Canaries), a methodology that enables aggressive voltage scaling of
accelerator weight memories to improve the energy-efficiency of DNN
accelerators. To enable accurate operation with voltage overscaling, MATIC
combines the characteristics of destructive SRAM reads with the error
resilience of neural networks in a memory-adaptive training process.
Furthermore, PVT-related voltage margins are eliminated using bit-cells from
synaptic weights as in-situ canaries to track runtime environmental variation.
Demonstrated on a low-power DNN accelerator that we fabricate in 65 nm CMOS,
MATIC enables up to 60-80 mV of voltage overscaling (3.3x total energy
reduction versus the nominal voltage), or 18.6x application error reduction.Comment: 6 pages, 12 figures, 3 tables. Published at Design, Automation and
Test in Europe Conference and Exhibition (DATE) 201
Design techniques for dense embedded memory in advanced CMOS technologies
University of Minnesota Ph.D. dissertation. February 2012. Major: Electrical Engineering. Advisor: Chris H. Kim. 1 computer file (PDF); viii, 116 pages.On-die cache memory is a key component in advanced processors since it can boost micro-architectural level performance at a moderate power penalty. Demand for denser memories only going to increase as the number of cores in a microprocessor goes up with technology scaling. A commensurate increase in the amount of cache memory is needed to fully utilize the larger and more powerful processing units. 6T SRAMs have been the embedded memory of choice for modern microprocessors due to their logic compatibility, high speed, and refresh-free operation. However, the relatively large cell size and conflicting requirements for read and write make aggressive scaling of 6T SRAMs challenging in sub-22 nm. In this dissertation, circuit techniques and simulation methodologies are presented to demonstrate the potential of alternative options such as gain cell eDRAMs and spin-torque-transfer magnetic RAMs (STT-MRAMs) for high density embedded memories.Three unique test chip designs are presented to enhance the retention time and access speed of gain cell eDRAMs. Proposed bit-cells utilize preferential boostings, beneficial couplings, and aggregated cell leakages for expanding signal window between data `1' and `0'. The design space of power-delay product can be further enhanced with various assist schemes that harness the innate properties of gain cell eDRAMs. Experimental results from the test chips demonstrate that the proposed gain cell eDRAMs achieve overall faster system performances and lower static power dissipations than SRAMs in a generic 65 nm low-power (LP) CMOS process. A magnetic tunnel junction (MTJ) scaling scenario and an efficient HSPICE simulation methodology are proposed for exploring the scalability of STT-MRAMs under variation effects from 65 nm to 8 nm. A constant JC0*RA/VDD scaling method is adopted to achieve optimal read and write performances of STT-MRAMs and thermal stabilities for a 10 year retention are achieved by adjusting free layer thicknesses as well as projecting crystalline anisotropy improvements. Studies based on the proposed methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material properties in order to overcome the poor write performance from 22 nm node
An Energy and Performance Exploration of Network-on-Chip Architectures
In this paper, we explore the designs of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router and a speculative virtual channel router and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs placed and routed in a 90-nm CMOS process show that all the architectures dissipate significant idle state power. The additional energy required to route a packet through the router is then shown to be dominated by the data path. This leads to the key result that, if this trend continues, the use of more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each of the networks. The speculative virtual channel router was shown to have a very similar efficiency to the wormhole router, while providing a better performance, supporting its use for general purpose designs. Finally, area metrics are also presented to allow a comparison of implementation costs
- âŠ