Architectural Techniques to Enable Reliable and Scalable Memory Systems
High capacity and scalable memory systems play a vital role in enabling our
desktops, smartphones, and pervasive technologies like Internet of Things
(IoT). Unfortunately, memory systems are becoming increasingly prone to faults.
This is because we rely on technology scaling to improve memory density, and at
small feature sizes, memory cells tend to break easily. Today, memory
reliability is seen as the key impediment towards using high-density devices,
adopting new technologies, and even building the next Exascale supercomputer.
To ensure even a bare-minimum level of reliability, present-day solutions tend
to have high performance, power and area overheads. Ideally, we would like
memory systems to remain robust, scalable, and implementable while keeping the
overheads to a minimum. This dissertation describes how simple cross-layer
architectural techniques can provide orders of magnitude higher reliability and
enable seamless scalability for memory systems while incurring negligible
overheads. Comment: PhD thesis, Georgia Institute of Technology (May 2017)
Strong, thorough, and efficient memory protection against existing and emerging DRAM errors
Memory protection is necessary to ensure the correctness of data in the presence of unavoidable faults. As such, large-scale systems typically employ Error Correcting Codes (ECC) to trade off redundant storage and bandwidth for increased reliability. Single Device Data Correction (SDDC) ECC mechanisms are required to meet the reliability demands of servers and large-scale systems by tolerating even severe faults that disable an entire memory chip. In the future, however, stronger memory protection will be required due to increasing levels of system integration, shrinking process technology, and growing transfer rates. The energy efficiency of memory protection is also important, as DRAM already consumes a significant fraction of the system energy budget. This dissertation develops a novel set of ECC schemes to provide strong, safe, flexible, and thorough protection against existing and emerging types of DRAM errors. This research also reduces the energy consumption of such protection while only marginally impacting performance. First, this dissertation develops Bamboo ECC, a technique with stronger-than-SDDC correction and very safe detection capabilities (≥ 99.999994% of data errors of any severity are detected). Bamboo ECC changes the ECC layout based on frequent DRAM error patterns, can correct concurrent errors from multiple devices, and all but eliminates the risk of silent data corruption. Bamboo ECC also provides flexible configurations to enable more adaptive graceful downgrade schemes in which the system continues to operate correctly after even severe chip faults, albeit at a reduced capacity to protect against future faults. These strength, safety, and flexibility advantages translate to a significantly more reliable memory sub-system for future exascale computing. Then, this dissertation focuses on emerging error types from scaling process technology and increasing data bandwidth.
As DRAM process technology scales down to below 10nm, DRAM cells are becoming more vulnerable to errors from an imperfect manufacturing process. At the same time, DRAM signal transfers are getting more susceptible to timing and electrical noise as DRAM interfaces keep increasing signal transfer rates and decreasing I/O voltage levels. With individual DRAM chips getting more vulnerable to errors, industry and academia have proposed mechanisms to tolerate these emerging types of errors; yet they are inefficient because they rely on multiple levels of redundancy in the case of cell errors and ad-hoc schemes with suboptimal protection coverage for transmission errors. Active Guardband ECC and All-Inclusive ECC make systematic use of ECC and existing mechanisms to provide thorough end-to-end protection without requiring redundancy beyond what is common today. Finally, this dissertation targets the energy efficiency of memory protection. Frugal ECC combines ECC with fine-grained compression to provide versatile and energy-efficient protection. Frugal ECC compresses main memory at cache-block granularity, using any leftover space to store ECC information. Frugal ECC allows more energy-efficient memory configurations while maintaining SDDC protection. Its tailored compression scheme minimizes insufficiently compressed blocks and results in acceptable performance overhead. The strong, thorough, and efficient protection described by this dissertation may allow for more aggressive design of future computing systems with larger integration, finer process technology, higher transfer rates, and better energy efficiency.
Electrical and Computer Engineering
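The space-reuse idea behind Frugal ECC can be illustrated with a small Python sketch. Everything specific here is an assumption: run-length encoding stands in for the dissertation's tailored compression scheme, a CRC-32 stands in for real ECC check bits (it only detects errors and corrects nothing, so it is not SDDC), and the 64-byte block size and the names `rle_compress` and `pack_block` are hypothetical.

```python
import zlib

BLOCK_BYTES = 64   # assumed cache-block size
CHECK_BYTES = 4    # CRC-32 as a stand-in for ECC check bits


def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoder: emits (run_length, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def pack_block(block: bytes):
    """Return (stored_bytes, inline) for one cache block.

    inline is True when the compressed data plus the check bits fit in
    the space of the original block, so protection needs no extra storage.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected one full cache block")
    check = zlib.crc32(block).to_bytes(CHECK_BYTES, "little")
    compressed = rle_compress(block)
    if len(compressed) + CHECK_BYTES <= BLOCK_BYTES:
        return compressed + check, True
    # Insufficiently compressed: the check bits would need space elsewhere.
    return block, False
```

A block of identical bytes packs inline with room to spare, while a block with no repeated bytes falls into the second branch, which corresponds to the "insufficiently compressed blocks" the abstract says the tailored scheme tries to minimize.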
Row-hammering prevention and main memory performance improvement using time window counters
Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), 2020. 8. 안정호 (Jung Ho Ahn).
Computer systems using DRAM are exposed to row-hammer (RH) attacks, which can flip data in a DRAM row without accessing that row directly, by frequently activating its adjacent rows. There have been a number of proposals to prevent RH, including both probabilistic and deterministic solutions. However, the probabilistic solutions provide protection with no capability to detect attacks and have a non-zero probability of failing to protect. Meanwhile, counter-based deterministic solutions either incur large area overhead or suffer from a noticeable performance drop on adversarial memory access patterns.
To overcome these challenges, we propose a new counter-based RH prevention solution named Time Window Counter (TWiCe) based row refresh, which accurately detects potential RH attacks using only a small number of counters with minimal performance impact. We first make a key observation that the number of rows that can cause RH is limited by the maximum values of row activation frequency and DRAM cell retention time. We calculate the maximum number of required counter entries per DRAM bank, with which TWiCe prevents RH with a strong deterministic guarantee. TWiCe incurs no performance overhead on normal DRAM operations and less than 0.7% area and energy overheads over contemporary DRAM devices. Our evaluation shows that TWiCe makes no more than 0.006% of additional DRAM row activations for adversarial memory access patterns, including RH attack scenarios.
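The counting idea in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's design: the class name `TimeWindowCounterTable`, the thresholds `th_rh` and `th_prune`, and the pruning rule (drop rows whose activations per elapsed interval fall below `th_prune`) are assumptions chosen for illustration, and the refresh of adjacent rows is only recorded in a list.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    act_count: int = 0   # activations seen since the entry was allocated
    life: int = 1        # number of intervals the entry has been tracked


@dataclass
class TimeWindowCounterTable:
    th_rh: int           # hypothetical: activations that trigger a neighbor refresh
    th_prune: int        # hypothetical: activations per interval needed to stay tracked
    entries: dict = field(default_factory=dict)
    refreshed: list = field(default_factory=list)

    def activate(self, row: int) -> None:
        """Record one activation of `row`."""
        entry = self.entries.setdefault(row, Entry())
        entry.act_count += 1
        if entry.act_count >= self.th_rh:
            # Stand-in for refreshing the physically adjacent rows.
            self.refreshed.append((row - 1, row + 1))
            del self.entries[row]

    def end_interval(self) -> None:
        """Called once per time window: prune rows activated too rarely."""
        for row in list(self.entries):
            entry = self.entries[row]
            if entry.act_count < self.th_prune * entry.life:
                del self.entries[row]
            else:
                entry.life += 1


table = TimeWindowCounterTable(th_rh=8, th_prune=2)
for _ in range(3):
    for _ in range(3):
        table.activate(100)   # heavily activated row: 3 activations per interval
    table.activate(200)       # benign row: 1 activation per interval
    table.end_interval()
```

In this run the heavily activated row stays in the table across intervals until it reaches `th_rh`, at which point its neighbors are flagged once, while the benign row is pruned at the end of every interval. Pruning is what keeps the number of live entries small in this sketch.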
To reduce the area and energy overhead further, we propose the threshold-adjusted rank-level TWiCe. We first introduce pseudo-associative TWiCe (pa-TWiCe), which can search hundreds of TWiCe table entries energy-efficiently. In addition, by exploiting the pa-TWiCe structure, we propose rank-level TWiCe, which reduces the number of required entries further by managing the table entries at the rank level. We also adjust the thresholds of TWiCe to reduce the number of entries without increasing false-positive detections on general workloads.
Finally, we propose extending TWiCe as a hot-page detector to improve main-memory performance. The TWiCe table contains the row addresses that have been frequently activated recently, and they are likely to be activated again due to temporal locality in memory accesses. We show how the hot-page detection in TWiCe can be combined with a DRAM page swap methodology to reduce the DRAM latency for the hot pages. Our evaluation shows that low-latency DRAM using TWiCe achieves up to 12.2% IPC improvement over a baseline DDR4 device for a multi-threaded workload.
Introduction
1.1 Time Window Counter Based Row Refresh to Prevent Row-hammering
1.2 Optimizing Time Window Counter
1.3 Using Time Window Counters to Improve Main Memory Performance
1.4 Outline
Background of DRAM and Row-hammering
2.1 DRAM Device Organization
2.2 Sparing DRAM Rows to Combat Reliability Challenges
2.3 Main Memory Subsystem Organization and Operation
2.4 Row-hammering (RH)
2.5 Previous RH Prevention Solutions
2.6 Limitations of the Previous RH Solutions
TWiCe: Time Window Counter based RH Prevention
3.1 TWiCe: Time Window Counter
3.2 Proof of RH Prevention
3.3 Counter Table Size
3.4 Architecting TWiCe
3.4.1 Location of TWiCe Table
3.4.2 Augmenting DRAM Interface with a New Adjacent Row Refresh (ARR) Command
3.5 Analysis
3.6 Evaluation
Optimizing TWiCe to Reduce Implementation Cost
4.1 Pseudo-associative TWiCe
4.2 Rank-level TWiCe
4.3 Adjusting Threshold to Reduce Table Size
4.4 Analysis
4.5 Evaluation
Augmenting TWiCe for Hot-page Detection
5.1 Necessity of Counters for Detecting Hot Pages
5.2 Previous Studies on Migration for Asymmetric Low-latency DRAM
5.3 Extending TWiCe for Dynamic Hot-page Detection
5.4 Additional Components and Methodology
5.5 Analysis and Evaluation
5.5.1 Overhead Analysis
5.5.2 Evaluation
Conclusion
6.1 Future work
Bibliography
Abstract in Korean
Doctor
Fault-tolerant satellite computing with modern semiconductors
Miniaturized satellites enable a variety of space missions which were in the past infeasible, impractical or uneconomical with traditionally-designed heavier spacecraft. CubeSats especially can be launched and manufactured rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but are also responsible for their low reliability: until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis. Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA-reconfiguration and mixed-criticality. Experimentally, we achieve 1.94 W power consumption at 300 MHz with a Xilinx Kintex Ultrascale+ proof-of-concept, which is well within the power budget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.
European Space Agency
Computer Systems, Imagery and Media
A Solder-Defined Computer Architecture for Backdoor and Malware Resistance
This research is about securing control of those devices we most depend on for integrity and confidentiality. An emerging concern is that complex integrated circuits may be subject to exploitable defects or backdoors, and measures for inspection and audit of these chips are neither supported nor scalable. One approach for providing a "supply chain firewall" may be to forgo such components, and instead to build central processing units (CPUs) and other complex logic from simple, generic parts. This work investigates the capability and speed ceiling when open-source hardware methodologies are fused with maker-scale assembly tools and visible-scale final inspection. The author has designed, and demonstrated in simulation, a 36-bit CPU and protected memory subsystem that use only synchronous static random access memory (SRAM) and trivial glue logic integrated circuits as components. The design presently lacks preemptive multitasking, ability to load firmware into the SRAMs used as logic elements, and input/output. Strategies are presented for adding these missing subsystems, again using only SRAM and trivial glue logic. A load-store architecture is employed with four clock cycles per instruction. Simulations indicate that a clock speed of at least 64 MHz is probable, corresponding to 16 million instructions per second (16 MIPS), despite the architecture containing no microprocessors, field programmable gate arrays, programmable logic devices, application specific integrated circuits, or other purchased complex logic. The lower speed, larger size, higher power consumption, and higher cost of an "SRAM minicomputer," compared to traditional microcontrollers, may be offset by the fully open architecture (hardware and firmware), along with more rigorous user control, reliability, transparency, and auditability of the system.
SRAM logic is also particularly well suited for building arithmetic logic units, and can implement complex operations such as population count, a hash function for associative arrays, or a pseudorandom number generator with good statistical properties in as few as eight clock cycles per 36-bit word processed. 36-bit unsigned multiplication can be implemented in software in 47 instructions or fewer (188 clock cycles). A general theory is developed for fast SRAM parallel multipliers should they be needed.
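The idea of using memory as logic can be sketched in Python: a small table, standing in for an SRAM, is indexed by a slice of the operand and returns a precomputed result. The 6-bit slice width, the table layout, and the function name `popcount36` are assumptions made for illustration and are not taken from the design described above. The final lines only re-check the abstract's arithmetic at four clock cycles per instruction.

```python
SLICE_BITS = 6    # assumed slice width: a 36-bit word is 6 slices of 6 bits
WORD_BITS = 36

# Lookup table playing the role of an SRAM: address -> number of set bits.
POPCOUNT_TABLE = [bin(addr).count("1") for addr in range(1 << SLICE_BITS)]


def popcount36(word: int) -> int:
    """Count set bits in a 36-bit word using only table lookups and adds."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 36 bits")
    mask = (1 << SLICE_BITS) - 1
    total = 0
    for shift in range(0, WORD_BITS, SLICE_BITS):
        total += POPCOUNT_TABLE[(word >> shift) & mask]
    return total


CYCLES_PER_INSTRUCTION = 4
mips = 64 / CYCLES_PER_INSTRUCTION               # 64 MHz clock, in MIPS
multiply_cycles = 47 * CYCLES_PER_INSTRUCTION    # cycles for 47 instructions
```

A hardware version would perform the lookups in parallel memories and combine them in later stages; the loop here only shows that the operation reduces to addressing a table.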
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today's points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such as circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with the latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.
Aerospace Vehicle Design, Spacecraft Section. Volume 1: Project Groups 3-5
Three groups of student engineers in an aerospace vehicle design course present their designs for a vehicle that can be used to resupply the Space Station Freedom and provide an emergency crew return to earth capability. The vehicle's requirements include a lifetime that exceeds six years, low cost, the capability for withstanding pressurization, launch, orbit, and reentry hazards, and reliability. The vehicle's subsystems are analyzed. These subsystems are structures, communication and command data systems, attitude and articulation control, life support and crew systems, power and propulsion, reentry and recovery systems, and mission management, planning, and costing.
Mu2e Technical Design Report
The Mu2e experiment at Fermilab will search for charged lepton flavor
violation via the coherent conversion process mu- N --> e- N with a sensitivity
approximately four orders of magnitude better than the current world's best
limits for this process. The experiment's sensitivity offers discovery
potential over a wide array of new physics models and probes mass scales well
beyond the reach of the LHC. We describe herein the preliminary design of the
proposed Mu2e experiment. This document was created in partial fulfillment of
the requirements necessary to obtain DOE CD-2 approval. Comment: compressed file, 888 pages, 621 figures, 126 tables; full resolution
available at http://mu2e.fnal.gov; corrected typo in background summary,
Table 3.
COBE's search for structure in the Big Bang
The launch of Cosmic Background Explorer (COBE) and the definition of Earth Observing System (EOS) are two of the major events at NASA-Goddard. The three experiments contained in COBE (Differential Microwave Radiometer (DMR), Far Infrared Absolute Spectrophotometer (FIRAS), and Diffuse Infrared Background Experiment (DIRBE)) are very important in measuring the big bang. DMR measures the isotropy of the cosmic background (direction of the radiation). FIRAS looks at the spectrum over the whole sky, searching for deviations, and DIRBE operates in the infrared part of the spectrum gathering evidence of the earliest galaxy formation. By special techniques, the radiation coming from the solar system will be distinguished from that of extragalactic origin. Unique graphics will be used to represent the temperature of the emitting material. A cosmic event will be modeled of such importance that it will affect cosmological theory for generations to come. EOS will monitor changes in the Earth's geophysics during a whole solar color cycle.
Technology 2001: The Second National Technology Transfer Conference and Exposition, volume 1
Papers from the technical sessions of the Technology 2001 Conference and Exposition are presented. The technical sessions featured discussions of advanced manufacturing, artificial intelligence, biotechnology, computer graphics and simulation, communications, data and information management, electronics, electro-optics, environmental technology, life sciences, materials science, medical advances, robotics, software engineering, and test and measurement.