
1,523 research outputs found

Improving Phase Change Memory Performance with Data Content Aware Access

Author: Ahn S. J.
Alshboul M.
Awad A.
Awad A.
Bock S.
Bock S.
Bondurant D.
Boroumand A.
Burr G. W.
Chen J.
Chhabra S.
Dogan H.
Du Y.
Ferreira A. P.
Frigo P.
Gueron S.
Guerra J.
Ham T. J.
Hashemi M.
Hsieh K.
Hwang W.
Jia Y.
Jiang L.
Joo Y.
Kang U.
Karlsson M.
Kim J.
Kim Y.
Kim Y.
Lalam A.
Lam C. H.
Lee J. I.
Mallik A.
Marathe V. J.
Meza J.
Morikawa T.
Mutlu O.
Mutlu O.
Pourshirazi B.
Qureshi M. K.
Qureshi M. K.
Saileshwar G.
Seong N. H.
Seshadri V.
Stuecheli J.
Villa C.
Wang Y.
Wang Z.
Wuttig M.
Yamada N.
Yang J.
Yue J.
Zhang L.
Zhou M.
Zhou M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020

A prominent characteristic of the write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to both the data to be written and the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells in only one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros or all-ones) by comprehensively considering the number of set bits in the data to be written and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and memory system energy consumption compared to the best of performance-oriented state-of-the-art techniques. Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium on Memory Management (ISMM
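
The first step described in this abstract (deciding which known content is cheaper to overwrite) can be illustrated with a minimal sketch. It assumes a simplified linear cost model in which overwriting all-zeros content needs one SET per 1-bit and overwriting all-ones content needs one RESET per 0-bit. The function names and the `set_cost` and `reset_cost` values are hypothetical and are not taken from the paper, whose estimator may differ.

```python
def popcount(data: bytes) -> int:
    """Number of 1-bits in the data to be written."""
    return sum(bin(byte).count("1") for byte in data)


def choose_overwrite_target(data: bytes,
                            set_cost: float = 2.0,
                            reset_cost: float = 1.0) -> str:
    """Pick the known content that is cheaper to overwrite with `data`.

    Assumed model (illustrative only):
      - over all-zeros, only the 1-bits need programming (SET);
      - over all-ones, only the 0-bits need programming (RESET).
    """
    total_bits = 8 * len(data)
    ones = popcount(data)
    cost_over_zeros = ones * set_cost
    cost_over_ones = (total_bits - ones) * reset_cost
    return "all-zeros" if cost_over_zeros <= cost_over_ones else "all-ones"


if __name__ == "__main__":
    for sample in (b"\x00\x01", b"\xff\xfe", b"\x0f"):
        print(sample.hex(), "->", choose_overwrite_target(sample))
```

Under this model, data with few set bits is steered toward all-zeros locations and data with many set bits toward all-ones locations; changing the two cost parameters shifts the crossover point.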

arXiv.org e-Print Archive

Crossref

Bit-Flip Aware Data Structures for Phase Change Memory

Author: Kulandai Arockia David Roy
Publication venue: e-Publications@Marquette
Publication date: 01/10/2022

Large, byte-addressable, low-cost, and fast non-volatile memories such as Phase Change Memory are appearing in the marketplace. They have the capability to unify memory and storage and allow us to rethink the present memory hierarchy. An important drawback of Phase Change Memory is its limited write endurance. In addition, Phase Change Memory shares with other Non-Volatile Random Access Memories an asymmetry in the energy costs of writes and reads. The best use of Non-Volatile Random Access Memories limits the number of times a memory cell changes its contents, an event called a bit-flip. While the future of main memory is still unknown, we should already start to create data structures for these memories in order to shape the future era. This thesis investigates the creation of bit-flip aware data structures. The thesis first considers general ways in which a data structure can save bit-flips by smart overwrites and by using the exclusive-or of pointers. It then shows how a simple content-dependent encoding can reduce bit-flips for web corpora. It then shows how to build hash-based dictionary structures for Linear Hashing and Spiral Storage. Finally, the thesis presents Gray counters, close to bit-flip optimal counters that even enable age-based wear leveling with counters managed by the Non-Volatile Random Access Memories themselves instead of by the Operating Systems.
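
To illustrate why a Gray-coded counter saves bit-flips, the following minimal sketch compares a plain binary counter with the standard reflected binary Gray code. This is an assumption for illustration: the thesis's own Gray counters may be constructed differently, and the helper names are hypothetical.

```python
def gray(n: int) -> int:
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def bit_flips(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


def total_flips(encode, increments: int) -> int:
    """Total bit-flips when a stored counter steps from 0 up to `increments`."""
    return sum(bit_flips(encode(i), encode(i + 1)) for i in range(increments))


if __name__ == "__main__":
    steps = 255
    print("binary counter flips:", total_flips(lambda n: n, steps))
    print("Gray counter flips:  ", total_flips(gray, steps))
```

Each Gray-code increment changes exactly one stored bit, whereas a binary increment flips every trailing 1 plus one more bit, so the binary counter incurs roughly twice as many flips over a long run.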

epublications@Marquette

Compression architecture for bit-write reduction in non-volatile memory technologies

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Crossref

Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management

Author: Cheung Ray C.C.
Luo Qiwu
Sun Yichuang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/09/2018

Solid-state disks (SSDs) have tended to replace traditional motor-driven hard disks in high-end storage devices over the past few decades. However, the costs of various inherent features, such as out-of-place updates [which rely on garbage collection (GC)] and limited endurance (which relies on wear leveling), need to be reduced to a large extent before that replacement is complete. Both GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual-layer HDI (DL-HDI) framework, which is composed of a cold data pre-classifier and a hot data post-identifier. Together, these efficiently track the frequency and recency of data accesses. Then, we design an adaptive parallelism manager (APM) to assign the clustered data chunks to distinct resident blocks in the SSD so as to prolong its endurance. Finally, the experimental results from our SSD prototype indicate that the DVPFTL scheme reliably improves the parallelizability and endurance of NAND flash devices with lower GC costs, compared with related works. Peer reviewed

University of Hertfordshire Research Archive

Mitigating Limited PCM Write Bandwidth and Endurance in Hybrid Memory Systems

Author: Du Yu
Publication venue
Publication date: 18/06/2015

With the rise of big data and cloud computing, there is increasing demand for memory capacity to solve large problems and consolidate computation tasks. For large-capacity memory systems, DRAM is a significant source of energy consumption. Non-volatile memory, such as Phase-Change Memory (PCM), is a promising technology for constructing energy-efficient memory. Unlike DRAM, PCM has negligible background (static) power and allows high-density packaging. But PCM also has limited write bandwidth and write endurance. Hybrid memory systems have been proposed to combine the high density and low standby power of PCM with the good write performance of DRAM. This thesis addresses two challenges that are unique to hybrid memory systems. The first challenge is the limited PCM bandwidth, which can become a performance bottleneck. The second challenge is non-contiguous physical memory caused by retired memory pages. Since PCM cells have limited write endurance, the number of uncorrectable errors inevitably increases over the device's lifetime. Memory pages with detected errors are normally retired by the OS, which creates unusable “holes” in the physical memory. These unusable holes make it difficult to construct traditional superpages, which can incur significant performance overhead. In this thesis, I propose three solutions to address these two challenges. First, I observed that an unbalanced distribution of modified data bits among PCM chips significantly increases PCM write time and hurts effective write bandwidth. I propose new XOR-based mapping schemes between program data bits and PCM cells to improve PCM write throughput by spreading modified data bits evenly among PCM chips. Second, I propose a compressed DRAM cache scheme to improve DRAM effective capacity and reduce write traffic to PCM. A new adaptive delta-compression technique for modified data is used to achieve a large compression ratio. Third, I propose Gap-tolerant Sequential Mapping, a new memory page mapping scheme, to construct superpages from non-contiguous physical memory. The three proposed solutions have simple and practical designs and can be easily adopted in future hybrid memory systems.
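
The observation about unbalanced modified bits can be made concrete with a small sketch. It does not reproduce the thesis's XOR-based mapping; it only measures the imbalance that such a mapping aims to remove. The layout (byte `i` of a line stored on chip `i % num_chips`) and the per-chip budget `bits_per_round` are hypothetical assumptions made for illustration.

```python
def modified_bits_per_chip(old: bytes, new: bytes, num_chips: int = 8) -> list[int]:
    """Count modified bits landing on each chip.

    Assumed layout (illustrative only): byte i is stored on chip i % num_chips.
    """
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    counts = [0] * num_chips
    for i, (o, n) in enumerate(zip(old, new)):
        counts[i % num_chips] += bin(o ^ n).count("1")
    return counts


def write_rounds(counts: list[int], bits_per_round: int = 4) -> int:
    """Rounds needed if every chip can program `bits_per_round` bits per round.

    The slowest (most heavily modified) chip determines the write time.
    """
    worst = max(counts)
    return -(-worst // bits_per_round)  # ceiling division


if __name__ == "__main__":
    old = bytes(8)
    skewed = b"\xff" + bytes(7)   # all 8 modified bits on one chip
    spread = b"\x01" * 8          # 8 modified bits, one per chip
    for name, new in (("skewed", skewed), ("spread", spread)):
        counts = modified_bits_per_chip(old, new)
        print(name, counts, "rounds:", write_rounds(counts))
```

Both writes modify the same number of bits, but in this model the skewed write takes longer because a single chip must program all of them, which is the effect that spreading modified bits evenly is meant to avoid.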

D-Scholarship@Pitt