
    Energy Saving Techniques for Phase Change Memory (PCM)

    In recent years, the energy consumption of computing systems has increased, and a large fraction of this energy is consumed in main memory. To address this, researchers have proposed using non-volatile memories such as phase change memory (PCM), which has low read latency and read power, and nearly zero leakage power. However, the write latency and write power of PCM are very high, and these drawbacks, along with PCM's limited write endurance, present significant challenges to its widespread adoption. Several architecture-level techniques have been proposed to address these issues. In this report, we review several techniques for managing the power consumption of PCM and classify them based on their characteristics to provide insight into them. The aim of this work is to encourage researchers to propose even better techniques for improving the energy efficiency of PCM-based main memory.

    Comment: Survey, phase change RAM (PCRAM)

    Get Out of the Valley: Power-Efficient Address Mapping for GPUs

    GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that the set of memory address bits exhibiting high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored to the highly concurrent memory request behavior of GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit across the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate it into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel, and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31× and power-efficiency by 1.25× compared to state-of-the-art permutation-based address mapping.
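    Both halves of this approach are easy to sketch in software. The Python below is an illustrative reconstruction rather than the paper's implementation: window_entropy approximates the window-based metric by computing the Shannon entropy of one address bit over a sliding window of recent requests, and pae_map folds higher-order (row) bits into an assumed bank/channel bit field with XORs, as a tree of XOR gates would in hardware. The window size, field positions, and all names are assumptions for illustration only.

        import math
        from collections import deque

        def window_entropy(addresses, bit, window=256):
            """Shannon entropy of one address bit over a sliding window of requests."""
            win, series = deque(maxlen=window), []
            for a in addresses:
                win.append((a >> bit) & 1)
                p1 = sum(win) / len(win)
                series.append(0.0 if p1 in (0.0, 1.0) else
                              -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1)))
            return series

        # Hypothetical field layout: bits [9:7] select the bank/channel.
        BANK_SHIFT, BANK_BITS = 7, 3
        MASK = (1 << BANK_BITS) - 1

        def pae_map(addr, row_shift=16):
            """Fold (XOR) high-entropy row bits into the bank/channel field,
            concentrating entropy where channel/bank selection needs it."""
            bank = (addr >> BANK_SHIFT) & MASK
            bank ^= (addr >> row_shift) & MASK                # lower row bits
            bank ^= (addr >> (row_shift + BANK_BITS)) & MASK  # upper row bits
            return (addr & ~(MASK << BANK_SHIFT)) | (bank << BANK_SHIFT)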

    Understanding and Improving the Latency of DRAM-Based Memory Systems

    Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic random-access memory), which has been used as the physical substrate for main memory. In stark contrast with capacity and bandwidth, DRAM latency has remained almost constant, decreasing by only 1.3x in the same time frame. Therefore, long DRAM latency continues to be a critical performance bottleneck in modern systems. Increasing core counts and the emergence of ever more data-intensive and latency-critical applications further stress the importance of providing low-latency memory access. In this dissertation, we identify three main problems that contribute significantly to the long latency of DRAM accesses, and we present a series of new techniques to address them. Our new techniques significantly improve both system performance and energy efficiency. We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency. The key conclusion of this dissertation is that augmenting the DRAM architecture with simple, low-cost features and developing a better understanding of manufactured DRAM chips together lead to significant memory latency reduction as well as energy efficiency improvement. We hope and believe that the proposed architectural techniques, and the detailed experimental data and observations on real commodity DRAM chips presented in this dissertation, will enable the development of other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems.

    Comment: PhD Dissertation

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM-based main memory subsystems cannot scale sufficiently in the future to meet the requirements of data-centric workloads related to Artificial Intelligence (AI), Big Data, and the Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems present a fundamental trade-off: pushing for higher density inevitably increases access latency, and pushing for reduced access latency often decreases density. This trade-off is so fundamental in DRAM-based main memory subsystems that merely re-architecting DRAM subsystems cannot improve it, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we make two key contributions toward overcoming the density (represented as the total chip area for a given capacity) and access latency challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, significantly reducing the total on-chip area occupied by the DRAM bank and its access peripherals. This approach is fundamentally different from the well-known approach of through-silicon via (TSV) based 3D stacking of DRAM tiers, because the M3D integration approach does not require a separate DRAM chip per tier, whereas the 3D-stacking approach does. Our evaluation results for PARSEC benchmarks show that our M3D DRAM cell-array organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has an inherently more relaxed density-latency trade-off than DRAM, as a more viable option for replacing DRAM. We introduce low-temperature partial-RESET operations for writing '0's in PCM cells. Unlike traditional RESET operations that write '0's in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two contributions into this single thesis is the density-versus-latency argument. The existing PCM technology has 3 to 4× higher write latency than DRAM; nevertheless, it can store 2 to 4 bits in a single cell, compared to the one bit per cell of DRAM. Therefore, unlike with DRAM, it is possible to increase the density of PCM without consequently increasing its latency; in other words, PCM exhibits an inherently more relaxed density-latency trade-off. Thus, both of our contributions, re-designing DRAM with M3D integration technology and making PCM a more viable replacement for DRAM by eliminating its write disturbance errors, serve the common overarching goal of improving the density-latency trade-off in main memory subsystems. We also discuss possible future research directions aimed at extending the impact of our proposed ideas so that they can transform the performance of future main memory subsystems.
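    The partial-RESET contribution can be illustrated with a toy cell model; all resistance and disturbance values below are invented placeholders (the thesis characterizes the real device behavior): a conventional full RESET drives the cell to maximum resistance with a high-amplitude pulse whose heat can disturb neighboring cells, while a partial RESET uses a weaker, lower-temperature pulse that still lifts the cell safely above the read threshold for '0' without that thermal cross-talk.

        # Toy single-level-cell PCM model contrasting full RESET with partial
        # RESET for writing '0'. All values are invented placeholders.
        READ_THRESHOLD = 100_000     # ohms; above this, the cell reads as '0'

        class PcmCell:
            def __init__(self):
                self.resistance = 1_000          # low resistance: SET state, '1'

            def read(self):
                return 0 if self.resistance > READ_THRESHOLD else 1

        def write_zero(cell, neighbors, partial=True):
            if partial:
                # Weaker pulse: the cell still clears the read threshold with
                # margin, but the heat is too low to disturb adjacent cells.
                cell.resistance = 300_000
            else:
                # Full RESET: maximum resistance, but thermal cross-talk can
                # partially re-crystallize neighbors, drifting stored '0's
                # toward '1' (a write disturbance error).
                cell.resistance = 1_000_000
                for n in neighbors:
                    n.resistance //= 10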

    Preventing Row Hammering and Improving Main Memory Performance Using Time Window Counters

    Ph.D. dissertation, Seoul National University, Graduate School of Convergence Science and Technology (Intelligent Convergence Systems), August 2020. Advisor: Jung Ho Ahn.

    Computer systems using DRAM are exposed to row-hammer (RH) attacks, which can flip data in a DRAM row without directly accessing it, simply by frequently activating its adjacent rows. A number of solutions, both probabilistic and deterministic, have been proposed to prevent RH. However, the probabilistic solutions provide protection with no capability to detect attacks and have a non-zero probability of missing protection, while existing counter-based deterministic solutions either incur large area overhead or suffer noticeable performance drops on adversarial memory access patterns. To overcome these challenges, we propose a new counter-based RH prevention solution named Time Window Counter (TWiCe) based row refresh, which accurately detects potential RH attacks using only a small number of counters, with minimal performance impact. We first make the key observation that the number of rows that can cause RH is limited by the maximum row activation frequency and the DRAM cell retention time. From this, we calculate the maximum number of counter entries required per DRAM bank, with which TWiCe prevents RH with a strong deterministic guarantee. TWiCe incurs no performance overhead on normal DRAM operations and less than 0.7% area and energy overheads over contemporary DRAM devices. Our evaluation shows that TWiCe adds no more than 0.006% additional DRAM row activations under adversarial memory access patterns, including RH attack scenarios. To further reduce the area and energy overheads, we propose a threshold-adjusted rank-level TWiCe. We first introduce pseudo-associative TWiCe (pa-TWiCe), which can search hundreds of TWiCe table entries energy-efficiently. Building on the pa-TWiCe structure, we then propose rank-level TWiCe, which further reduces the number of required entries by managing the table at rank granularity. We also adjust the thresholds of TWiCe to reduce the number of entries without increasing false-positive detections on general workloads. Finally, we propose extending TWiCe into a hot-page detector to improve main-memory performance. The TWiCe table contains the row addresses that have been activated frequently and recently, and these rows are likely to be activated again due to temporal locality in memory accesses. We show how the hot-page detection in TWiCe can be combined with a DRAM page-swap methodology to reduce DRAM latency for hot pages. Our evaluation shows that a low-latency DRAM using TWiCe achieves up to 12.2% IPC improvement over a baseline DDR4 device for multi-threaded workloads.
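    The core bookkeeping of TWiCe can be sketched compactly. The Python below is a simplified reconstruction under assumed, illustrative thresholds (the dissertation derives the real ones from DRAM timing and retention parameters and implements the table in hardware): each bank tracks recently activated rows in a small counter table, periodically prunes rows activated too rarely to ever reach the RH threshold within the time window, issues an adjacent-row refresh when a counter crosses the threshold, and can double as a hot-page detector by reporting its highest counters.

        # Minimal sketch of a time-window counter (TWiCe-style) table for one
        # DRAM bank. Thresholds are placeholders, not the dissertation's values.
        RH_THRESHOLD = 32_768      # activations that trigger adjacent-row refresh
        PRUNE_THRESHOLD = 4        # minimum count to survive a checkpoint
        CHECKPOINT_ACTS = 8_192    # activations between pruning checkpoints

        class TwiceTable:
            def __init__(self):
                self.counters = {}     # row address -> activations in window
                self.acts = 0

            def activate(self, row):
                self.counters[row] = self.counters.get(row, 0) + 1
                self.acts += 1
                if self.counters[row] >= RH_THRESHOLD:
                    self.adjacent_row_refresh(row)
                    self.counters[row] = 0     # neighbors refreshed; restart
                if self.acts % CHECKPOINT_ACTS == 0:
                    self.prune()

            def prune(self):
                # Rows activated too rarely cannot reach the RH threshold
                # within the window, so their entries can be dropped.
                self.counters = {r: c for r, c in self.counters.items()
                                 if c >= PRUNE_THRESHOLD}

            def adjacent_row_refresh(self, row):
                print(f"ARR: refresh rows adjacent to {row}")

            def hot_pages(self, top=8):
                # Reuse the table as a hot-page detector: frequently activated
                # rows are likely to be activated again soon.
                return sorted(self.counters, key=self.counters.get,
                              reverse=True)[:top]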

    Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

    Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements with significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit-record write within a transaction. We do so by ensuring that the status of all committed transactions can be determined during recovery, by statically storing the necessary metadata with the blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction IDs and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads.

    Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed Systems.
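    A rough software analogue conveys the intuition behind Eager Commit. The sketch below is a toy model, not the paper's hardware mechanism (which lives in the memory controller and CPU cache): each block written by a transaction carries the transaction's ID and write-set size as metadata, so recovery can classify transactions as committed or incomplete by scanning block metadata, with no separate commit-record write. All names and structures are illustrative assumptions.

        # Toy model of Eager Commit-style recovery: commit status is
        # reconstructed from per-block metadata, not a commit record.
        from dataclasses import dataclass

        @dataclass
        class Block:
            addr: int
            data: bytes
            txid: int    # transaction that wrote this block
            total: int   # number of blocks that transaction wrote in all

        def recover(persisted_blocks):
            """A transaction is committed iff all its blocks reached memory."""
            by_tx = {}
            for b in persisted_blocks:
                by_tx.setdefault(b.txid, []).append(b)
            committed = {tx for tx, blocks in by_tx.items()
                         if len(blocks) == blocks[0].total}
            # Blocks from incomplete transactions are discarded on recovery.
            return [b for b in persisted_blocks if b.txid in committed]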