
    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that NVM access latency is much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change in which each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend an existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, enables and exploits asymmetries in DRAM-NVM hybrid tiered memory and improves both performance and reliability for both single-core and multi-programmed workloads. Comment: 15 pages, 29 figures, accepted at the ACM SIGPLAN International Symposium on Memory Management.
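    The initial-placement idea above can be sketched in a few lines. The tier names, the per-allocation-site predictor, and the intensity thresholds below are illustrative assumptions, not MNEME's actual mechanism; the sketch only shows how a predicted access intensity for a newly referenced page could map onto four tiers spanning two memory types.

```python
# Sketch of asymmetry-aware initial page placement (illustrative only).
# Four tiers: fast/slow segments of DRAM and NVM, ordered by access cost.
from collections import defaultdict

TIERS = ["dram_fast", "dram_slow", "nvm_fast", "nvm_slow"]

class IntensityPredictor:
    """Predicts a new page's access intensity from the history of pages
    allocated at the same program site (a simplifying assumption here)."""
    def __init__(self):
        self.site_counts = defaultdict(lambda: [0, 0])  # site -> [accesses, pages]

    def record(self, site, accesses):
        self.site_counts[site][0] += accesses
        self.site_counts[site][1] += 1

    def predict(self, site):
        total, pages = self.site_counts[site]
        return total / pages if pages else 0.0

def place_new_page(predicted_intensity, free_pages):
    """Pick the fastest tier the predicted intensity warrants and that still
    has free capacity; fall back to other tiers otherwise."""
    # Thresholds (accesses per epoch) are made-up illustration values.
    wanted = ("dram_fast" if predicted_intensity > 1000 else
              "dram_slow" if predicted_intensity > 100 else
              "nvm_fast"  if predicted_intensity > 10 else
              "nvm_slow")
    start = TIERS.index(wanted)
    for tier in TIERS[start:] + TIERS[:start][::-1]:
        if free_pages.get(tier, 0) > 0:
            free_pages[tier] -= 1
            return tier
    raise MemoryError("no free pages in any tier")

predictor = IntensityPredictor()
predictor.record(site="loop_A", accesses=5000)
free = {"dram_fast": 2, "dram_slow": 8, "nvm_fast": 32, "nvm_slow": 128}
print(place_new_page(predictor.predict("loop_A"), free))  # -> dram_fast
```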

    FHPM: Fine-grained Huge Page Management For Virtualization

    As more data-intensive tasks with large footprints are deployed in virtual machines (VMs), huge pages are widely used to eliminate the increasing address translation overhead. However, once the huge page mapping is established, all the base page regions in the huge page share a single extended page table (EPT) entry, so the hypervisor loses awareness of accesses to base page regions. None of the state-of-the-art solutions can obtain access information at base page granularity for huge pages. We observe that this can lead to incorrect decisions by the hypervisor, such as incorrect data placement in a tiered memory system and unshared base page regions when sharing pages. This paper proposes FHPM, a fine-grained huge page management system for virtualization that requires no hardware or guest OS modification. FHPM can identify access information at base page granularity and dynamically promote and demote pages. A key insight of FHPM is to redirect the EPT huge page directory entries (PDEs) to new companion pages so that the MMU can track access information within huge pages. FHPM then promotes and demotes pages according to the current hot page pressure to balance address translation overhead and memory usage. At the same time, FHPM proposes a VM-friendly page splitting and collapsing mechanism to avoid extra VM-exits. In combination, FHPM minimizes the monitoring and management overhead and ensures that the hypervisor sees fine-grained VM memory accesses and makes proper decisions. We apply FHPM to improve tiered memory management (FHPM-TMM) and to promote page sharing (FHPM-Share). FHPM-TMM achieves a performance improvement of up to 33% and 61% over pure huge page and pure base page management, respectively. FHPM-Share can save 41% more memory than Ingens, a state-of-the-art page sharing solution, with comparable performance.
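    A minimal sketch of the fine-grained tracking and demotion decision follows. The per-base-page access bits stand in for the information FHPM recovers through its companion pages; the thresholds and the notion of "hot page pressure" as a single number are assumptions for illustration, not FHPM's implementation.

```python
# Sketch of base-page-granularity tracking inside a huge page (illustrative).
# A 2 MiB huge page covers 512 base pages of 4 KiB; we keep one access bit
# per base page, the kind of information FHPM recovers via companion pages.
BASE_PAGES_PER_HUGE = 512

class HugePageStats:
    def __init__(self):
        self.accessed = [False] * BASE_PAGES_PER_HUGE

    def touch(self, offset_in_bytes):
        self.accessed[(offset_in_bytes // 4096) % BASE_PAGES_PER_HUGE] = True

    def hot_fraction(self):
        return sum(self.accessed) / BASE_PAGES_PER_HUGE

def decide(stats, hot_page_pressure):
    """Keep a huge page mapped when most of it is hot; split (demote) it when
    only a few base pages are hot and fast memory is under pressure.
    The thresholds are made-up illustration values, not FHPM's."""
    frac = stats.hot_fraction()
    if frac > 0.5:
        return "keep-huge"          # translation benefit outweighs waste
    if hot_page_pressure > 0.8 and frac < 0.1:
        return "demote-to-base"     # reclaim the mostly-cold base pages
    return "keep-huge"

stats = HugePageStats()
for off in range(0, 16 * 4096, 4096):   # touch 16 of 512 base pages
    stats.touch(off)
print(decide(stats, hot_page_pressure=0.9))   # -> demote-to-base
```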

    High Probability Write Patterns: An Optimization Policy for Tiered Memory Systems with Intel Optane DC Persistent Memory

    High-capacity non-volatile memory is the new main memory. NVM provides up to 8x the memory capacity of DRAM, but can reduce bandwidth by up to 7x and increase latency by up to 2x. Used alone, NVM provides large capacity but poor performance, so it is typically deployed alongside DRAM. However, if the two memories are not managed properly, performance can be as bad as using NVM alone. Much optimization work targets tiered memory systems, the most studied way of combining the two memories. We found that memory systems using both DRAM and NVM that were designed before Intel Optane DC Persistent Memory (DCPMM) was commercialized did not take DCPMM's actual performance characteristics into consideration. We present High Probability Write Patterns (HPWP), an optimization policy for tiered memory systems that accounts for the performance of commercialized DCPMM. HPWP avoids issuing write operations to DCPMM as much as possible, exploiting the fact that DCPMM's write performance is about three times worse than its read performance. In a tiered memory system equipped with DCPMM, HPWP provides up to 19% performance improvement in a key-value store compared to previous studies.
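    The write-avoidance idea can be sketched as a placement score. The 3x write/read cost asymmetry comes from the abstract; the scoring formula, the page-profile fields, and the victim-selection policy below are illustrative assumptions, not the HPWP implementation.

```python
# Sketch of write-aware tier selection for a DRAM + DCPMM system (illustrative).
# DCPMM writes are ~3x more expensive than reads, so write-heavy pages should
# stay in DRAM and read-mostly or cold pages are the safest DCPMM candidates.
WRITE_PENALTY = 3.0   # relative cost of a DCPMM write vs. a DCPMM read

class PageProfile:
    def __init__(self, reads=0, writes=0):
        self.reads = reads
        self.writes = writes

    def dcpmm_cost(self):
        # Estimated cost of serving this page from DCPMM next epoch.
        return self.reads + WRITE_PENALTY * self.writes

def choose_dcpmm_victims(pages, pages_to_demote):
    """Demote the pages that would be cheapest to serve from DCPMM,
    i.e. the ones with the fewest write-weighted accesses."""
    ranked = sorted(pages.items(), key=lambda kv: kv[1].dcpmm_cost())
    return [addr for addr, _ in ranked[:pages_to_demote]]

pages = {
    0x1000: PageProfile(reads=900, writes=5),    # read-mostly: good DCPMM candidate
    0x2000: PageProfile(reads=50, writes=400),   # write-heavy: keep in DRAM
    0x3000: PageProfile(reads=10, writes=1),     # cold: good DCPMM candidate
}
victims = choose_dcpmm_victims(pages, pages_to_demote=2)
print([hex(a) for a in victims])   # -> ['0x3000', '0x1000']
```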

    MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers

    We present MaxMem, a tiered main memory management system that aims to maximize Big Data application colocation and performance. MaxMem uses an application-agnostic and lightweight memory occupancy control mechanism based on fast memory miss ratios to provide application QoS under increasing colocation. By relying on memory access sampling and binning to quickly identify per-process memory heat gradients, MaxMem maximizes performance for many applications sharing tiered main memory simultaneously. MaxMem is designed as a user-space memory manager to be easily modifiable and extensible, without complex kernel code development. On a system with tiered main memory consisting of DRAM and Intel Optane persistent memory modules, our evaluation confirms that MaxMem provides 11% and 38% better throughput, and up to 80% and an order of magnitude lower 99th percentile latency, than HeMem and Linux AutoNUMA, respectively, with a Big Data key-value store in dynamic colocation scenarios. Comment: 12 pages, 10 figures.
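    The occupancy-control loop described above can be sketched as a simple rebalancer over per-process fast-tier quotas. The QoS targets, step size, and data layout below are illustrative assumptions, not MaxMem's actual parameters; the point is only that sampled fast-memory miss ratios drive quota movement between colocated processes.

```python
# Sketch of miss-ratio-driven fast-memory occupancy control (illustrative).
# Each process gets a DRAM (fast-tier) quota; the controller grows the quota
# of processes whose sampled miss ratio exceeds their QoS target by
# reclaiming pages from processes that have slack.
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    quota_pages: int        # current fast-tier quota
    miss_ratio: float       # sampled fraction of accesses missing fast memory
    target_miss: float      # QoS target for the miss ratio

def rebalance(procs, step=1024):
    """Move up to `step` pages of quota per round from the process with the
    most slack to the process most in violation of its target."""
    violators = [p for p in procs if p.miss_ratio > p.target_miss]
    donors    = [p for p in procs if p.miss_ratio < p.target_miss]
    if not violators or not donors:
        return None
    worst = max(violators, key=lambda p: p.miss_ratio - p.target_miss)
    best  = min(donors,    key=lambda p: p.miss_ratio - p.target_miss)
    moved = min(step, best.quota_pages)
    best.quota_pages  -= moved
    worst.quota_pages += moved
    return (best.name, worst.name, moved)

procs = [
    Proc("kvstore",   quota_pages=8192,  miss_ratio=0.20, target_miss=0.05),
    Proc("analytics", quota_pages=32768, miss_ratio=0.01, target_miss=0.10),
]
print(rebalance(procs))          # -> ('analytics', 'kvstore', 1024)
print(procs[0].quota_pages)      # -> 9216
```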

    Lightweight Frequency-Based Tiering for CXL Memory Systems

    Modern workloads are demanding increasingly larger memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this trend by utilizing traditional DRAM alongside slow-tier CXL-memory devices in the same system. Unfortunately, most prior tiering systems are recency-based and cannot accurately identify hot and cold pages, since a recently accessed page is not necessarily a hot page. On the other hand, more accurate frequency-based systems suffer from high memory and runtime overhead as a result of tracking large memories. In this paper, we propose FreqTier, a fast and accurate frequency-based tiering system for CXL memory. We observe that memory tiering systems can tolerate a small amount of tracking inaccuracy without compromising overall application performance. Based on this observation, FreqTier probabilistically tracks the access frequency of each page, enabling accurate identification of hot and cold pages while maintaining minimal memory overhead. Finally, FreqTier intelligently adjusts the intensity of tiering operations based on the application's memory access behavior, thereby significantly reducing the amount of migration traffic and application interference. We evaluate FreqTier on two emulated CXL memory devices with different bandwidths. On the high bandwidth CXL device, FreqTier outperforms state-of-the-art tiering systems while using 4× less local DRAM memory for in-memory caching workloads. On GAP graph analytics and XGBoost workloads with a 1:32 local DRAM to CXL-memory ratio, FreqTier outperforms prior works by 1.04-2.04× (1.39× on average). Even on the low bandwidth CXL device, FreqTier outperforms AutoNUMA by 1.14× on average.
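    Probabilistic frequency tracking with bounded memory can be illustrated with a count-min-sketch-style counter table. FreqTier's actual data structure is not specified in the abstract, so the structure, sizes, and hot threshold below are assumptions chosen only to show the idea of trading a small amount of accuracy for a fixed memory footprint.

```python
# Sketch of probabilistic per-page access-frequency tracking (illustrative).
# A small count-min-sketch-style table bounds memory overhead regardless of
# how many pages exist, at the cost of occasional over-estimates, which is
# the kind of inaccuracy a tiering system can tolerate.
import hashlib

ROWS, COLS = 4, 1 << 16   # ~256K counters total, independent of memory size

class FreqSketch:
    def __init__(self):
        self.table = [[0] * COLS for _ in range(ROWS)]

    def _slots(self, page_addr):
        for row in range(ROWS):
            h = hashlib.blake2b(page_addr.to_bytes(8, "little"),
                                digest_size=4, salt=bytes([row])).digest()
            yield row, int.from_bytes(h, "little") % COLS

    def record(self, page_addr):
        for row, col in self._slots(page_addr):
            self.table[row][col] += 1

    def estimate(self, page_addr):
        # Taking the minimum over rows bounds over-estimates from collisions.
        return min(self.table[row][col] for row, col in self._slots(page_addr))

sketch = FreqSketch()
for _ in range(500):
    sketch.record(0xdeadb000)          # hot page
sketch.record(0x1000)                  # cold page
HOT_THRESHOLD = 100                    # made-up threshold for illustration
print(sketch.estimate(0xdeadb000) >= HOT_THRESHOLD)   # -> True
print(sketch.estimate(0x1000) >= HOT_THRESHOLD)       # -> False (almost surely)
```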