Search CORE

10,321 research outputs found

Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

Author: Lu Youyou
Mutlu Onur
Shu Jiwu
Sun Long
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/05/2017
Field of study

Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements at significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit record write within a transaction. We do so by ensuring that we can determine the status of all committed transactions during recovery by storing necessary metadata information statically with blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction ID and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed System

arXiv.org e-Print Archive

Crossref

Emulating and evaluating hybrid memory for managed languages on NUMA hardware

Author: Akram Shoaib
Eeckhout Lieven
McKinley Kathryn S.
Sartor Jennifer
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Non-volatile memory (NVM) has the potential to become a mainstream memory technology and challenge DRAM. Researchers evaluating the speed, endurance, and abstractions of hybrid memories with DRAM and NVM typically use simulation, making it easy to evaluate the impact of different hardware technologies and parameters. Simulation is, however, extremely slow, limiting the applications and datasets in the evaluation. Simulation also precludes critical workloads, especially those written in managed languages such as Java and C#. Good methodology embraces a variety of techniques for evaluating new ideas, expanding the experimental scope, and uncovering new insights. This paper introduces a platform to emulate hybrid memory for managed languages using commodity NUMA servers. Emulation complements simulation but offers richer software experimentation. We use a thread-local socket to emulate DRAM and a remote socket to emulate NVM. We use standard C library routines to allocate heap memory on the DRAM and NVM sockets for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, using 15 applications and various datasets and workload configurations. We show emulation and simulation confirm each other's trends in terms of writes to NVM for different software configurations, increasing our confidence in predicting future system effects. Emulation brings novel insights, such as the non-linear effects of multi-programmed workloads on NVM writes, and that Java applications write significantly more than their C++ equivalents. We make our software infrastructure publicly available to advance the evaluation of novel memory management schemes on hybrid memories

Crossref

Ghent University Academic Bibliography

Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

Author: Antognetti P.
Arafa M.
Arjomand M.
Bhattacharyya A.
Blagodurov S.
Cao Y.
Chang Y.-M.
Cho B.-H.
Das A.
Das A.
Dray C.
Goda A.
Huang Y.
Jayasena N. S.
Kang U.
Kim Y.
Lee D.
Mallik A.
Mutlu O.
Mutlu O.
Pourshirazi B.
Qureshi M. K.
Qureshi M. K.
Redaelli A.
Rixner S.
Sandhu B. S.
Seong N. H.
Seshadri V.
Srinivasan J.
Stuecheli J.
Yoon H.
Yue J.
Zhang L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020
Field of study

Modern computing systems are embracing hybrid memory comprising of DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access latency much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it to a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium on Memory Managemen

arXiv.org e-Print Archive

Crossref

Improving Phase Change Memory Performance with Data Content Aware Access

Author: Ahn S. J.
Alshboul M.
Awad A.
Awad A.
Bock S.
Bock S.
Bondurant D.
Boroumand A.
Burr G. W.
Chen J.
Chhabra S.
Dogan H.
Du Y.
Ferreira A. P.
Frigo P.
Gueron S.
Guerra J.
Ham T. J.
Hashemi M.
Hsieh K.
Hwang W.
Jia Y.
Jiang L.
Joo Y.
Kang U.
Karlsson M.
Kim J.
Kim Y.
Kim Y.
Lalam A.
Lam C. H.
Lee J. I.
Mallik A.
Marathe V. J.
Meza J.
Morikawa T.
Mutlu O.
Mutlu O.
Pourshirazi B.
Qureshi M. K.
Qureshi M. K.
Saileshwar G.
Seong N. H.
Seshadri V.
Stuecheli J.
Villa C.
Wang Y.
Wang Z.
Wuttig M.
Yamada N.
Yang J.
Yue J.
Zhang L.
Zhou M.
Zhou M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020
Field of study

A prominent characteristic of write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and memory system energy consumption compared to the best of performance-oriented state-of-the-art techniques.Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium on Memory Management (ISMM

arXiv.org e-Print Archive

Crossref