DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips
To understand and improve DRAM performance, reliability, security and energy
efficiency, prior works study characteristics of commodity DRAM chips.
Unfortunately, state-of-the-art open source infrastructures capable of
conducting such studies are obsolete, poorly supported, or difficult to use, or
their inflexibility limits the types of studies they can conduct.
We propose DRAM Bender, a new FPGA-based infrastructure that enables
experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three
key features at the same time. First, DRAM Bender enables directly interfacing
with a DRAM chip through its low-level interface. This allows users to issue
DRAM commands in arbitrary order and with finer-grained time intervals compared
to other open source infrastructures. Second, DRAM Bender exposes easy-to-use
C++ and Python programming interfaces, allowing users to quickly and easily
develop different types of DRAM experiments. Third, DRAM Bender is easily
extensible. The modular design of DRAM Bender allows extending it to (i)
support existing and emerging DRAM interfaces, and (ii) run on new commercial
or custom FPGA boards with little effort.
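The kind of command-granularity experiment this interface enables can be sketched in Python. This is a hedged illustration only: the CommandTrace class, the hammer routine, and the timing constants are hypothetical stand-ins, not DRAM Bender's actual C++/Python API.

```python
# Hypothetical sketch of a command-level DRAM test: the user schedules raw
# DRAM commands (ACT, PRE) at explicit cycle offsets instead of going
# through a memory controller. All names and timings are illustrative.

ACT, PRE = "ACT", "PRE"

class CommandTrace:
    """Builds a cycle-accurate list of (cycle, command, bank, row) tuples."""
    def __init__(self, tRAS=24, tRP=11):   # example timings in cycles
        self.tRAS, self.tRP = tRAS, tRP
        self.cmds = []
        self.cycle = 0

    def issue(self, cmd, bank=0, row=0, wait=0):
        self.cycle += wait
        self.cmds.append((self.cycle, cmd, bank, row))
        return self

def hammer(trace, aggressor_row, n=3):
    """Repeatedly activate and precharge one row (single-sided RowHammer)."""
    for _ in range(n):
        trace.issue(ACT, row=aggressor_row, wait=trace.tRP)   # row precharged for tRP first
        trace.issue(PRE, row=aggressor_row, wait=trace.tRAS)  # keep row open for tRAS
    return trace

trace = hammer(CommandTrace(), aggressor_row=42, n=3)
print(len(trace.cmds))   # 6 commands: 3 x (ACT, PRE)
```

A real DRAM Bender program would emit such a trace to the FPGA; the point is that the user, not a memory controller, decides the order and spacing of commands.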
To demonstrate that DRAM Bender is a versatile infrastructure, we conduct
three case studies, two of which lead to new observations about the DRAM
RowHammer vulnerability. In particular, we show that data patterns supported by
DRAM Bender uncover a larger set of bit-flips on a victim row compared to the
data patterns commonly used by prior work. We demonstrate the extensibility of
DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3
support. DRAM Bender is freely and openly available at
https://github.com/CMU-SAFARI/DRAM-Bender.
Comment: To appear in TCAD 202
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. 
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In addition to signal interconnects, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.
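As an illustration of what a power-area-timing abstraction can look like, here is a small Python sketch of a toy PAT model for a parallel off-chip link. The function, its parameters, and all coefficients are invented placeholders, not the thesis's actual models.

```python
# Toy power-area-timing (PAT) model for a parallel off-chip link, of the
# kind a cross-layer pathfinding tool might evaluate per design point.
# All coefficients below are assumed placeholder values.

def link_pat(lanes, gbps_per_lane, length_mm,
             pj_per_bit=2.0, um2_per_lane=5000.0, ps_per_mm=6.7):
    bw_gbps = lanes * gbps_per_lane
    power_mw = pj_per_bit * bw_gbps          # pJ/bit * Gbit/s = mW
    area_um2 = lanes * um2_per_lane          # PHY area scales with lane count
    latency_ps = length_mm * ps_per_mm       # time of flight on the PCB trace
    return {"bw_gbps": bw_gbps, "power_mw": power_mw,
            "area_um2": area_um2, "latency_ps": latency_ps}

# Compare two design points that hit the same 512 Gb/s target.
wide_slow = link_pat(lanes=64, gbps_per_lane=8, length_mm=50)
narrow_fast = link_pat(lanes=16, gbps_per_lane=32, length_mm=50)
print(wide_slow["power_mw"], narrow_fast["area_um2"])
```

Sweeping such a model over lane counts and per-lane rates lets an architect compare a wide-and-slow link against a narrow-and-fast one for the same bandwidth target, before any circuit or layout collateral exists.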
Improving Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) as Next-Generation Memories: A Circuit Perspective
In the memory hierarchy of computer systems, the traditional semiconductor memories Static RAM (SRAM) and Dynamic RAM (DRAM) have served for several decades as cache and main memory. With technology scaling, they face increasingly intractable challenges in power, density, reliability and scalability. As a result, they become less appealing in the multi/many-core era, with its ever-increasing working-set sizes and memory intensity.
Recently, there has been increasing interest in using emerging non-volatile memory technologies as replacements for SRAM and DRAM, owing to advantages such as non-volatility, high device density, near-zero cell leakage and resilience to soft errors. Among the new memory technologies, Phase Change Memory (PCM) and Spin-Torque-Transfer Magnetic-RAM (STT-MRAM) are the most promising candidates for building main memory and cache, respectively. However, both possess unique limitations that prevent them from being effectively adopted.
In this dissertation, I present my circuit design work on tackling the limitations of PCM and STT-MRAM. At the bit level, both PCM and STT-MRAM suffer from excessive write energy, and PCM has very limited write endurance. For PCM, I implement Differential Write to remove the large number of unnecessary bit-writes that do not alter the stored data. It is then extended to STT-MRAM as Early Write Termination, with specific optimizations to eliminate the overhead of the pre-write read. At the array level, PCM enjoys high density but cannot provide competitive throughput due to its long write latency and limited number of read/write circuits. I propose a Pseudo-Multi-Port Bank design to exploit intra-bank parallelism by recycling and reusing shared peripheral circuits between accesses in a time-multiplexed manner. On the other hand, although STT-MRAM features satisfactory throughput, its conventional array architecture is constrained in density and scalability by the pitch of the per-column bitline pair. I propose a Common-Source-Line Array architecture which uses a shared source-line along the row, leaving only one bitline per column.
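The bit-level idea behind Differential Write (and, by extension, Early Write Termination) can be sketched in a few lines of Python. This is a simplified software illustration of the principle, not the dissertation's circuit:

```python
# Differential Write idea (simplified): before writing, compare against the
# stored word and program only the bits that differ, saving write energy
# and endurance on unchanged cells.

def differential_write(stored, new, width=8):
    """Return (bits_to_program, updated_word) for one word."""
    diff = (stored ^ new) & ((1 << width) - 1)   # 1s mark bits that change
    return bin(diff).count("1"), new

flips, word = differential_write(0b1100_1010, 0b1100_0110)
print(flips)   # 2: only the two differing bits are programmed
```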
For these techniques, I provide circuit-level analyses as well as architecture/system-level and/or process/device-level discussions. In addition, relevant background and related work are thoroughly surveyed and potential future research topics are discussed, offering insights into and prospects for these next-generation memories.
Low energy digital circuit design using sub-threshold operation
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. Includes bibliographical references (p. 189-202).
Scaling of process technologies to deep sub-micron dimensions has made power management a significant concern for circuit designers. For emerging low power applications such as distributed micro-sensor networks or medical applications, low energy operation is the primary concern instead of speed, with the eventual goal of harvesting energy from the environment. Sub-threshold operation offers a promising solution for ultra-low-energy applications because it often achieves the minimum energy per operation. While initial explorations into sub-threshold circuits demonstrate its promise, sub-threshold circuit design remains in its infancy. This thesis makes several contributions that make sub-threshold design more accessible to circuit designers. First, a model for energy consumption in sub-threshold provides an analytical solution for the optimum VDD to minimize energy. Fitting this model to a generic circuit allows easy estimation of the impact of processing and environmental parameters on the minimum energy point. Second, analysis of device sizing for sub-threshold circuits shows the trade-offs between sizing for minimum energy and for minimum voltage operation. A programmable FIR filter test chip fabricated in 0.18 µm bulk CMOS provides measurements to confirm the model and the sizing analysis. Third, a low-overhead method for integrating sub-threshold operation with high performance applications extends dynamic voltage scaling across orders of magnitude of frequency and provides energy scalability down to the minimum energy point. A 90nm bulk CMOS test chip confirms the range of operation for ultra-dynamic voltage scaling. Finally, sub-threshold operation is extended to memories. Analysis of traditional SRAM bitcells and architectures leads to development of a new bitcell for robust sub-threshold SRAM operation. The sub-threshold SRAM is analyzed experimentally in a 65nm bulk CMOS test chip.
by Benton H. Calhoun. Ph.D.
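The minimum-energy-point argument can be made concrete with a toy, dimensionless Python model (illustrative constants, not the thesis's fitted equations): dynamic energy falls quadratically with VDD while leakage energy rises as sub-threshold delay grows exponentially, so their sum has an interior minimum.

```python
# Toy sub-threshold energy model per operation (normalized units):
#   dynamic ~ C * VDD^2,  leakage ~ I_leak * VDD * delay(VDD),
# with delay exploding exponentially as VDD drops below threshold.
import math

def energy_per_op(vdd, n_vt=0.05, k_leak=100.0):
    dynamic = vdd ** 2                           # capacitive switching energy
    leak = k_leak * vdd * math.exp(-vdd / n_vt)  # leakage over the long cycle
    return dynamic + leak

vdds = [0.10 + 0.01 * i for i in range(61)]      # sweep 0.10 V .. 0.70 V
vmin = min(vdds, key=energy_per_op)
print(round(vmin, 2))   # the sweep's minimum-energy VDD (~0.34 with these toy constants)
```

Lowering VDD below this point costs energy again, because each operation takes so long that leakage dominates; that is exactly the trade-off the thesis's analytical model captures.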
Nanopower CMOS transponders for UHF and microwave RFID systems
First, we present an analysis and discussion of the design options and tradeoffs for a passive microwave transponder. We derive a set of criteria for optimizing the voltage multiplier, the power-matching network and the backscatter modulator in order to maximize the operating range. To meet the strict power requirements, the communication protocol between transponder and reader is chosen so as to keep the architecture of the passive transponder very simple and therefore ultra-low-power. At the circuit level, the digital section is implemented in sub-threshold CMOS logic with very low supply voltage and clock frequency. We present different solutions for supplying power to the transponder that keep the power consumption in the deep sub-µW regime and drastically reduce the high sensitivity of sub-threshold logic to temperature and process variations. Moreover, a low-voltage, low-power EEPROM in a standard CMOS process has been implemented. Finally, we present the implementation of the entire passive transponder, operating in the UHF or microwave frequency range.
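As a rough illustration of the voltage-multiplier tradeoff discussed above, here is a textbook-style ideal Dickson multiplier model in Python (an assumed simplification: unloaded output and identical diode drops, not this work's optimized circuit):

```python
# Ideal N-stage Dickson voltage multiplier: each stage adds the RF input
# amplitude minus one diode drop. Unloaded, lossless textbook model.

def dickson_vout(n_stages, v_amp, v_diode=0.3):
    """Ideal, unloaded output voltage of an N-stage multiplier."""
    return n_stages * (v_amp - v_diode)

# A weak UHF input of 0.5 V amplitude needs several stages to reach the
# ~1 V supply of the sub-threshold logic: 5 * (0.5 - 0.3) = 1.0 V ideally.
print(dickson_vout(5, 0.5))
```

The model makes the optimization pressure visible: with a weak input, every millivolt shaved off the effective diode drop (e.g., via low-threshold devices) directly multiplies into output voltage and operating range.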
Circuit Aging Compensation and Energy-Efficient Neural Network Implementation Using Approximate Computing
Thesis (Ph.D.) -- Seoul National University Graduate School, Dept. of Electrical and Computer Engineering, August 2020.
Approximate computing reduces the cost (energy and/or latency) of computations by relaxing their correctness (i.e., precision) to a level that depends on the application. Moreover, it can be realized at various levels of the computing-system design hierarchy, from the circuit level to the application level.
This dissertation presents methodologies for applying approximate computing across these hierarchies: compensating aging-induced delay in logic circuits through dynamic computation approximation (Chapter 1), designing an energy-efficient neural network by combining low-power and low-latency approximate neuron models (Chapter 2), and co-designing an in-memory gradient descent module with a neural processing unit to address the memory bottleneck incurred by memory I/O for high-precision data (Chapter 3).
The first chapter of this dissertation presents a novel design methodology that turns aging-induced timing violations into computation approximation errors, without a reliability guardband or an increased supply voltage. It relies on accurately monitoring the critical-path delay at run time. The proposal is evaluated at two levels: the RTL component level and the system level. The experimental results at the RTL component level show a significant improvement in the (normalized) mean squared error caused by timing violations, and at the system level they show that the proposed approach successfully transforms aging-induced timing-violation errors into much less harmful computation approximation errors, recovering image quality to perceptually acceptable levels. The approach reduces dynamic and static power consumption by 21.45% and 10.78%, respectively, with 0.8% area overhead compared to the conventional approach.
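A software analogy of this dynamic computation approximation might look as follows. This is a hedged sketch: the real mechanism is a delay-configurable circuit, and the 16-bit datapath, delay model, and truncation policy below are invented for illustration.

```python
# When the monitored critical-path delay no longer fits the clock period,
# drop low-order bits of the computation so the (shortened) carry chain
# still meets timing. All numbers are illustrative.

def approximate_add(a, b, aged_delay_ps, clock_ps=1000, delay_per_bit_ps=30):
    # How many result bits can ripple through in the remaining slack?
    usable_bits = max(1, (clock_ps - aged_delay_ps) // delay_per_bit_ps)
    drop = max(0, 16 - usable_bits)       # 16-bit datapath assumed
    mask = ~((1 << drop) - 1)             # zero out the dropped LSBs
    return ((a & mask) + (b & mask)) & 0xFFFF

fresh = approximate_add(1234, 5678, aged_delay_ps=0)      # full precision
aged  = approximate_add(1234, 5678, aged_delay_ps=700)    # LSBs approximated
print(fresh, aged)   # 6912 6848: a bounded approximation error, not a timing failure
```

The point mirrors the chapter's claim: instead of a catastrophic timing-violation error on an aged chip, the result degrades gracefully by a bounded low-order error.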
The second chapter of this dissertation presents an energy-efficient neural network built from two alternative neuron models: Stochastic-Computing (SC) and Spiking (SP) neurons. SC has been adopted in various fields to improve power efficiency by performing arithmetic computations stochastically, approximating the binary computation of conventional systems. A recent work showed that a deep neural network (DNN) can be implemented with stochastic computing, greatly reducing power consumption. However, a Stochastic DNN (SC-DNN) suffers from high latency because it processes only one bit per cycle. To address this problem, this chapter proposes adopting a Spiking DNN (SP-DNN) as the input interface for the SC-DNN, since SP neurons effectively process more bits per cycle. Moreover, this chapter resolves the encoding mismatch between the two neuron models without hardware cost by compensating for it through synapse-weight calibration. The resulting hybrid DNN (SPSC-DNN) uses SP-DNN bottom layers and SC-DNN top layers. Exploiting the reduced latency of the SP-DNN and the low power consumption of the SC-DNN, the proposed SPSC-DNN achieves improved energy efficiency with a lower error rate than SC-DNN and SP-DNN in the same network configuration.
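The stochastic-computing arithmetic underlying the SC-DNN can be illustrated with standard SC bitstreams in Python (generic textbook SC, not the chapter's specific neuron model): a value p in [0, 1] is a bitstream whose fraction of 1s is p, and ANDing two independent streams multiplies their values, one bit per cycle — hence SC-DNN's high latency.

```python
# Standard stochastic-computing encoding and multiplication.
import random

def to_stream(p, length, rng):
    """Encode probability p as a random bitstream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stream_value(bits):
    """Decode a bitstream back to its probability estimate."""
    return sum(bits) / len(bits)

rng = random.Random(0)
n = 10000
a, b = to_stream(0.8, n, rng), to_stream(0.5, n, rng)
prod = [x & y for x, y in zip(a, b)]          # AND gate = multiplier in SC
print(round(stream_value(prod), 2))           # approximates 0.8 * 0.5 = 0.4
```

A single AND gate replaces a binary multiplier, which is where the power saving comes from; the cost is that n cycles are needed for n bits of stream, which the SP-DNN front end is meant to amortize.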
The third chapter of this dissertation proposes the GradPIM architecture, which accelerates parameter updates through in-memory processing co-designed with 8-bit floating-point training in a Neural Processing Unit (NPU) for deep neural networks. By keeping the high-precision processing, such as the parameter update that incorporates high-precision weights, in memory, GradPIM achieves high computational efficiency using 8-bit floating point in the NPU and gains power efficiency by eliminating massive high-precision data transfers between the NPU and off-chip memory. A simple extension of DDR4 SDRAM that exploits bank-group parallelism makes the operation designs in the processing-in-memory (PIM) module efficient in terms of hardware cost and performance. The experimental results show that the proposed architecture improves the performance of the parameter-update phase of training by up to 40% and greatly reduces the memory bandwidth requirement, while adding only minimal overhead to the protocol and the DRAM area.
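The division of labor GradPIM exploits can be sketched in Python. This is a loose illustration under assumed numbers: the quantizer below is an integer stand-in for 8-bit floating point, and train_step is plain SGD, not the chapter's full update algorithms.

```python
# Keep high-precision master weights "near memory" and update them there;
# the NPU only ever sees low-precision values, so high-precision data
# never crosses the memory bus.

def quantize8(x, scale=128.0):
    """Toy 8-bit stand-in: clamp to [-1, 1) on a 1/128 grid."""
    q = max(-128, min(127, round(x * scale)))
    return q / scale

def train_step(master_w, grad, lr=0.1):
    # "In-memory" side: full-precision parameter update (plain SGD here).
    master_w = master_w - lr * grad
    # Only the quantized weight is shipped to the NPU for the next pass.
    return master_w, quantize8(master_w)

w = 0.5
for _ in range(3):
    w, w_npu = train_step(w, grad=0.25)
print(round(w, 4), w_npu)   # master stays full precision; NPU copy snaps to the 8-bit grid
```

Without the in-memory master copy, each step would ship high-precision weights over the bus twice; keeping the update in memory is what removes that traffic.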
Chapter I: Dynamic Computation Approximation for Aging Compensation
1.1 Introduction
1.1.1 Chip Reliability
1.1.2 Reliability Guardband
1.1.3 Approximate Computing in Logic Circuits
1.1.4 Computation Approximation for Aging Compensation
1.1.5 Motivational Case Study
1.2 Previous Work
1.2.1 Aging-induced Delay
1.2.2 Delay-Configurable Circuits
1.3 Proposed System
1.3.1 Overview of the Proposed System
1.3.2 Proposed Adder
1.3.3 Proposed Multiplier
1.3.4 Proposed Monitoring Circuit
1.3.5 Aging Compensation Scheme
1.4 Design Methodology
1.5 Evaluation
1.5.1 Experimental Setup
1.5.2 RTL Component Level: Adder/Multiplier
1.5.3 RTL Component Level: Monitoring Circuit
1.5.4 System Level
1.6 Summary
Chapter II: Energy-Efficient Neural Network by Combining Approximate Neuron Models
2.1 Introduction
2.1.1 Deep Neural Network (DNN)
2.1.2 Low-power Designs for DNN
2.1.3 Stochastic-Computing Deep Neural Network
2.1.4 Spiking Deep Neural Network
2.2 Hybrid of Stochastic and Spiking DNNs
2.2.1 Stochastic-Computing vs. Spiking Deep Neural Network
2.2.2 Combining Spiking Layers and Stochastic Layers
2.2.3 Encoding Mismatch
2.3 Evaluation
2.3.1 Latency and Test Error
2.3.2 Energy Efficiency
2.4 Summary
Chapter III: GradPIM: In-memory Gradient Descent in Mixed-Precision DNN Training
3.1 Introduction
3.1.1 Neural Processing Unit
3.1.2 Mixed-precision Training
3.1.3 Mixed-precision Training with In-memory Gradient Descent
3.1.4 DNN Parameter Update Algorithms
3.1.5 Modern DRAM Architecture
3.1.6 Motivation
3.2 Previous Work
3.2.1 Processing-In-Memory
3.2.2 Co-design of Neural Processing Unit and Processing-In-Memory
3.2.3 Low-precision Computation in NPU
3.3 GradPIM
3.3.1 GradPIM Architecture
3.3.2 GradPIM Operations
3.3.3 Timing Considerations
3.3.4 Update Phase Procedure
3.3.5 Commanding GradPIM
3.4 NPU Co-design with GradPIM
3.4.1 NPU Architecture
3.4.2 Data Placement
3.5 Evaluation
3.5.1 Evaluation Methodology
3.5.2 Experimental Results
3.5.3 Sensitivity Analysis
3.5.4 Layer Characterizations
3.5.5 Distributed Data Parallelism
3.6 Summary
3.6.1 Discussion
Bibliography
Abstract (in Korean)
Integrated Circuits for Programming Flash Memories in Portable Applications
Smart devices such as smart grids and smart home devices are infrastructure systems that connect the world around us more closely than ever before. These devices can communicate with each other and help us manage our environment. This concept is called the Internet of Things (IoT). Few smart nodes exist that are both low-power and programmable. Floating-gate (FG) transistors could be used to create adaptive sensor nodes by providing programmable bias currents. FG transistors are mostly used in digital applications like Flash memories. However, FG transistors can be used in analog applications, too. Unfortunately, due to the expensive infrastructure required for programming these transistors, they have not been economical for use in portable applications. In this work, we present low-power approaches to programming FG transistors which make them a good candidate for future wireless sensor nodes and portable systems. First, we focus on the design of low-power circuits that can be used in programming FG transistors, such as high-voltage charge pumps, low-drop-out regulators, and voltage reference cells. Then, to reduce both the power consumption in programmable sensor nodes and the programming infrastructure, we present a method to program FG transistors using negative voltages. We also present charge-pump structures to generate the necessary negative voltages for programming in this new configuration.
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or system level alone, the focus of this book is to address the different reliability challenges across different levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.
NASA Space Engineering Research Center Symposium on VLSI Design
The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next-generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment, along with new directions in CAD and design, are addressed by the featured speakers.