Doctor of Philosophy in Computing
dissertation
The computing landscape is undergoing a major change, primarily enabled by ubiquitous wireless networks and the rapid increase in the use of mobile devices which access a web-based information infrastructure. It is expected that most intensive computing will happen either in servers housed in large datacenters (warehouse-scale computers), e.g., cloud computing and other web services, or in many-core high-performance computing (HPC) platforms in scientific labs. It is clear that the primary challenge to scaling such computing systems into the exascale realm is the efficient supply of large amounts of data to hundreds or thousands of compute cores, i.e., building an efficient memory system. Main memory systems are at an inflection point, due to the convergence of several major application and technology trends. Examples include the increasing importance of energy consumption, reduced access stream locality, increasing failure rates, limited pin counts, increasing heterogeneity and complexity, and the diminished importance of cost-per-bit. In light of these trends, the memory system requires a major overhaul. The key to architecting the next generation of memory systems is a combination of the prudent incorporation of novel technologies, and a fundamental rethinking of certain conventional design decisions. In this dissertation, we study every major element of the memory system (the memory chip, the processor-memory channel, the memory access mechanism, and memory reliability) and identify the key bottlenecks to efficiency. Based on this, we propose a novel main memory system with the following innovative features: (i) overfetch-aware re-organized chips, (ii) low-cost silicon photonic memory channels, (iii) largely autonomous memory modules with a packet-based interface to the processor, and (iv) a RAID-based reliability mechanism. Such a system is energy-efficient, high-performance, low-complexity, reliable, and cost-effective, making it ideally suited to meet the requirements of future large-scale computing systems.
Parallel convolution processing using an integrated photonic tensor core
With the proliferation of ultra-high-speed mobile networks and
internet-connected devices, along with the rise of artificial intelligence, the
world is generating exponentially increasing amounts of data - data that needs
to be processed in a fast, efficient and smart way. These developments are
pushing the limits of existing computing paradigms, and highly parallelized,
fast and scalable hardware concepts are becoming progressively more important.
Here, we demonstrate a computationally specific integrated photonic tensor core,
the optical analog of an ASIC, capable of operating at Tera-Multiply-Accumulate
per second (TMAC/s) speeds. The photonic core achieves parallelized photonic
in-memory computing using phase-change memory arrays and photonic chip-based
optical frequency combs (soliton microcombs). The computation is reduced to
measuring the optical transmission of reconfigurable and non-resonant passive
components and can operate at a bandwidth exceeding 14 GHz, limited only by the
speed of the modulators and photodetectors. Given recent advances in hybrid
integration of soliton microcombs at microwave line rates, ultra-low loss
silicon nitride waveguides, and high speed on-chip detectors and modulators,
our approach provides a path towards full CMOS wafer-scale integration of the
photonic tensor core. While we focus on convolution processing, more generally
our results indicate the major potential of integrated photonics for parallel,
fast, and efficient computational hardware in demanding AI applications such as
autonomous driving, live video processing, and next generation cloud computing
services
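As a rough illustration of why convolution suits hardware that is rated in multiply-accumulate operations, the sketch below rewrites a small 2D convolution as one dot product per image patch and counts the MACs involved. It is a plain software analogy and not the photonic implementation: the function names `im2col` and `convolve_as_matvec`, the stride of 1, and the absence of padding are all assumptions made for this example.

```python
def im2col(image, k):
    """Flatten every k x k patch of a 2D list into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(k) for j in range(k)]
        for r in range(h - k + 1)
        for c in range(w - k + 1)
    ]


def convolve_as_matvec(image, kernel):
    """Apply the kernel as a patch-matrix times weight-vector product.

    Returns the flattened output and the number of multiply-accumulates used.
    As in most neural-network code, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    weights = [kernel[i][j] for i in range(k) for j in range(k)]
    patches = im2col(image, k)
    # Each output value is one dot product, i.e. k*k multiply-accumulates.
    out = [sum(p * w for p, w in zip(patch, weights)) for patch in patches]
    macs = len(patches) * len(weights)
    return out, macs


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
result, mac_count = convolve_as_matvec(image, kernel)
```

Every row of the patch matrix is independent of the others, which is the kind of parallelism the abstract refers to.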
Optically-Connected Memory: Architectures and Experimental Characterizations
Growing demands on future data centers and high-performance computing systems are driving the development of processor-memory interconnects with greater performance and flexibility than can be provided by existing electronic interconnects. A redesign of the systems' memory devices and architectures will be essential to enabling high-bandwidth, low-latency, resilient, energy-efficient memory systems that can meet the challenges of exascale systems and beyond. By leveraging an optics-based approach, this thesis presents the design and implementation of an optically-connected memory system that exploits both the bandwidth density and distance-independent energy dissipation of photonic transceivers, in combination with the flexibility and scalability offered by optical networks. By replacing the electronic memory bus with an optical interconnection network, novel memory architectures can be created that are otherwise infeasible. With remote optically-connected memory nodes accessible to processors as if they are local, programming models can be designed to utilize and efficiently share greater amounts of data. Processors that would otherwise be idle, being starved for data while waiting for scarce memory resources, can instead operate at high utilizations, leading to drastic improvements in the overall system performance. This work presents a prototype optically-connected memory module and a custom processor-based optical-network-aware memory controller that communicate transparently and all-optically across an optical interconnection network. The memory modules and controller are optimized to facilitate memory accesses across the optical network using a packet-switched, circuit-switched, or hybrid packet-and-circuit-switched approach. The novel memory controller is experimentally demonstrated to be compatible with existing processor-memory access protocols, with the memory controller acting as the optics-computing interface to render the optical network transparent. 
Additionally, the flexibility of the optical network enables additional performance benefits including increased memory bandwidth through optical multicasting. This optically-connected architecture can further enable more resilient memory system realizations by expanding on current error detection and correction memory protocols. The integration of optics with memory technology constitutes a critical step for both optics and computing. The scalability challenges facing main memory systems today, especially concerning bandwidth and power consumption, are well matched to the strengths of optical communications-based systems. Additionally, ongoing efforts focused on developing low-cost optical components and subsystems that are suitable for computing environments may benefit from the high-volume memory market. This work therefore takes the first step in merging the areas of optics and memory, developing the necessary architectures and protocols to interface the two technologies, and demonstrating potential benefits while identifying areas for future work. Future computing systems will undoubtedly benefit from this work through the deployment of high-performance, flexible, energy-efficient optically-connected memory architectures.
The impact of global communication latency at extreme scales on Krylov methods
Krylov Subspace Methods (KSMs) are popular numerical tools for solving large linear systems of equations. We consider their role in solving sparse systems on future massively parallel distributed memory machines, by estimating future performance of their constituent operations. To this end we construct a model that is simple, but which takes topology and network acceleration into account as they are important considerations. We show that, as the number of nodes of a parallel machine increases to very large numbers, the increasing latency cost of reductions may well become a problematic bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can be used to tackle the potential problem, and appropriate pipeline depths
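The scaling argument can be made concrete with a toy cost model. This is a minimal sketch and not the model built in the paper: it assumes a binary-tree reduction whose latency grows as the logarithm of the node count, constant local work per iteration (weak scaling), two global reductions per iteration, and a pipelined variant in which each reduction overlaps with `depth` iterations of local work. All function and parameter names are hypothetical, and times are in microseconds.

```python
import math


def allreduce_latency(nodes, hop_latency_us):
    """Assumed cost of one global reduction: up and back down a binary tree."""
    if nodes <= 1:
        return 0.0
    return 2 * math.ceil(math.log2(nodes)) * hop_latency_us


def iteration_time(nodes, local_work_us, hop_latency_us, reductions=2, depth=0):
    """Per-iteration time; depth=0 models the traditional blocking formulation."""
    reduce_cost = reductions * allreduce_latency(nodes, hop_latency_us)
    if depth == 0:
        # Blocking: every reduction sits on the critical path.
        return local_work_us + reduce_cost
    # Pipelined: reduction latency is spread over `depth` iterations of local work.
    return max(local_work_us, reduce_cost / depth)


def min_hiding_depth(nodes, local_work_us, hop_latency_us, reductions=2):
    """Smallest pipeline depth at which local work fully hides the reductions."""
    reduce_cost = reductions * allreduce_latency(nodes, hop_latency_us)
    return math.ceil(reduce_cost / local_work_us)


blocking = iteration_time(2**20, local_work_us=10.0, hop_latency_us=1.0)
pipelined = iteration_time(2**20, local_work_us=10.0, hop_latency_us=1.0, depth=8)
depth_needed = min_hiding_depth(2**20, local_work_us=10.0, hop_latency_us=1.0)
```

Under these assumed numbers the blocking iteration is dominated by reduction latency, while a sufficiently deep pipeline brings the iteration time back to the local work alone.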
Stacked Memory Architecture for Improved Performance and Capacity
Doctoral thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), 2019. 2. Jung Ho Ahn.
The advance of DRAM manufacturing technology is slowing down, whereas the density and performance needs of DRAM continue to increase. This demand has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase the density at the cost of longer latency, lower bandwidth, or both, it is essential to use them with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers to fast memory often block memory channels from servicing memory requests from applications for a long period. This in turn significantly increases the high-percentile response time of latency-sensitive applications. In this thesis, we propose a high-density managed DRAM architecture, dubbed 3D-XPath, for applications demanding both low latency and high capacity for memory. 3D-XPath DRAM stacks conventional DRAM dies with high-density DRAM dies explored in this thesis and connects these DRAM dies with 3D-XPath. In particular, 3D-XPath allows unused memory channels to service memory requests from applications when the primary channels that should handle those requests are blocked by page transfers, considerably reducing the high-percentile response time. This can also improve the throughput of applications frequently copying memory blocks between kernel and user memory spaces. Our evaluation shows that 3D-XPath DRAM decreases the high-percentile response time of latency-sensitive applications by ~30% while improving the throughput of I/O-intensive applications by ~39%, compared with DRAM without 3D-XPath.
Recent computer systems are evolving toward the integration of more CPU cores into a single socket, which requires higher memory bandwidth and capacity. Increasing the number of channels per socket is a common solution to the bandwidth demand. To better utilize the additional channels, the data bus width is reduced and the burst length is increased. However, this longer burst length increases DRAM access latency. On the memory capacity side, process scaling has been the answer for decades, but cell capacitance now limits how small a cell can be. 3D stacked memory solves this problem by stacking dies on top of other dies.
We made a key observation on a real multicore machine: when running the SPEC CPU 2006 rate benchmarks, the multiple memory controllers are never all fully utilized. To bring these idle channels into play, we propose a memory channel sharing architecture that boosts the peak bandwidth of one memory channel and reduces the burst latency on 3D stacked memory. With channel sharing, the total performance on multi-programmed and multi-threaded workloads improved by up to 4.3% and 3.6%, respectively, and the average read latency was reduced by up to 8.22% and 10.18%.
Contents
Abstract
Contents
List of Figures
List of Tables
Introduction
1.1 3D-XPath: High-Density Managed DRAM Architecture with Cost-effective Alternative Paths for Memory Transactions
1.2 Boosting Bandwidth - Dynamic Channel Sharing on 3D Stacked Memory
1.3 Research contribution
1.4 Outline
3D-stacked Heterogeneous Memory Architecture with Cost-effective Extra Block Transfer Paths
2.1 Background
2.1.1 Heterogeneous Main Memory Systems
2.1.2 Specialized DRAM
2.1.3 3D-stacked Memory
2.2 HIGH-DENSITY DRAM ARCHITECTURE
2.2.1 Key Design Challenges
2.2.2 Plausible High-density DRAM Designs
2.3 3D-STACKED DRAM WITH ALTERNATIVE PATHS FOR MEMORY TRANSACTIONS
2.3.1 3D-XPath Architecture
2.3.2 3D-XPath Management
2.4 EXPERIMENTAL METHODOLOGY
2.5 EVALUATION
2.5.1 OLDI Workloads
2.5.2 Non-OLDI Workloads
2.5.3 Sensitivity Analysis
2.6 RELATED WORK
Boosting Bandwidth - Dynamic Channel Sharing on 3D Stacked Memory
3.1 Background: Memory Operations
3.1.1 Memory Controller
3.1.2 DRAM column access sequence
3.2 Related Work
3.3 CHANNEL SHARING ENABLED MEMORY SYSTEM
3.3.1 Hardware Requirements
3.3.2 Operation Sequence
3.4 Analysis
3.4.1 Experiment Environment
3.4.2 Performance
3.4.3 Overhead
CONCLUSION
REFERENCES
Abstract in Korean
Resource and thermal management in 3D-stacked multi-/many-core systems
Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures.
3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead.
This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing.
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory.
At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads.
On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.
2018-03-09T00:00:00
LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems
pre-print
Memory system reliability is a serious and growing concern in modern servers. Existing chipkill-level memory protection mechanisms suffer from several drawbacks. They activate a large number of chips on every memory access; this increases energy consumption and reduces performance due to the reduction in rank-level parallelism. Additionally, they increase access granularity, resulting in wasted bandwidth in the absence of sufficient access locality. They also restrict systems to use narrow-I/O x4 devices, which are known to be less energy-efficient than the wider x8 DRAM devices. In this paper, we present LOT-ECC, a localized and multi-tiered protection scheme that attempts to solve these problems. We separate error detection and error correction functionality, and employ simple checksum and parity codes effectively to provide strong fault-tolerance, while simultaneously simplifying implementation. Data and codes are localized to the same DRAM row to improve access efficiency. We use system firmware to store correction codes in DRAM data memory and modify the memory controller to handle data mapping. We thus build an effective fault-tolerance mechanism that provides strong reliability guarantees, activates as few chips as possible (reducing power consumption by up to 44.8% and reducing latency by up to 46.9%), and reduces circuit complexity, all while working with commodity DRAMs and operating systems. Finally, we propose the novel concept of a heterogeneous DIMM that enables the extension of LOT-ECC to x16 and wider DRAM parts.
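The separation of detection from correction described above can be sketched in a few lines. This is an illustrative toy, not the LOT-ECC code layout: the one-byte additive checksum, the byte-wise XOR parity, the dictionary used as "storage", and all function names are assumptions chosen for brevity. A per-chunk checksum identifies which chunk (standing in for one DRAM chip's share of a line) is bad, and the parity across chunks rebuilds it.

```python
from functools import reduce


def checksum(chunk):
    """One-byte additive checksum, used only to detect a corrupted chunk."""
    return sum(chunk) % 256


def xor_parity(chunks):
    """Byte-wise XOR across equally sized chunks, used only to correct."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def write_line(chunks):
    """Store data together with its detection and correction codes."""
    return {
        "data": list(chunks),
        "checksums": [checksum(c) for c in chunks],
        "parity": xor_parity(chunks),
    }


def read_line(stored):
    """Return the data, rebuilding at most one chunk whose checksum mismatches."""
    data = stored["data"]
    bad = [i for i, c in enumerate(data) if checksum(c) != stored["checksums"][i]]
    if not bad:
        return list(data)
    if len(bad) > 1:
        raise ValueError("more than one chunk failed; single parity cannot recover")
    i = bad[0]
    survivors = [c for j, c in enumerate(data) if j != i]
    # XOR of the surviving chunks and the parity yields the missing chunk.
    repaired = xor_parity(survivors + [stored["parity"]])
    return data[:i] + [repaired] + data[i + 1:]


original = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
line = write_line(original)
line["data"][1] = b"\x00\x00"  # simulate one failed chip
recovered = read_line(line)
```

Because detection only needs the cheap checksum, the parity is read only when a mismatch is found, which is the intuition behind keeping the two tiers separate.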