Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Modern computing systems are embracing hybrid memory comprising DRAM and
non-volatile memory (NVM) to combine the best properties of both memory
technologies: low latency, high reliability, and high density. A
prominent characteristic of DRAM-NVM hybrid memory is that NVM access
latency is much higher than DRAM access latency. We call this inter-memory
asymmetry. We observe that parasitic components on a long bitline are a major
source of high latency in both DRAM and NVM, and a significant contributor
to the high-voltage operations in NVM that degrade its reliability.
We propose an architectural change, where each long bitline in DRAM and NVM is
split into two segments by an isolation transistor. One segment can be accessed
with lower latency and operating voltage than the other. By introducing tiers,
we enable non-uniform accesses within each memory type (which we call
intra-memory asymmetry), leading to performance and reliability trade-offs in
DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we
exploit both inter- and intra-memory asymmetries to allocate and migrate memory
pages between the tiers in DRAM and NVM. Second, we improve the OS's page
allocation decisions by predicting the access intensity of a newly-referenced
memory page in a program and placing it in a matching tier during its initial
allocation. This minimizes page migrations during program execution, lowering
the performance overhead. Third, we propose a solution to migrate pages between
the tiers of the same memory without transferring data over the memory channel,
minimizing channel occupancy and improving performance. Our overall approach,
which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid
tiered memory improves both performance and reliability for both single-core
and multi-programmed workloads.
Comment: 15 pages, 29 figures, accepted at the ACM SIGPLAN International Symposium
on Memory Management.
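MNEME's second idea, placing a newly referenced page directly in a matching tier by predicting its access intensity, can be sketched as follows. The predictor (a running average of access counts keyed by allocation site) and the intensity thresholds are illustrative assumptions, not the paper's actual mechanism; the four tier names only mirror the two-segment split of DRAM and NVM described above.

```python
# Asymmetry-aware initial page placement, in the spirit of MNEME:
# predict a new page's access intensity from its allocation site and
# place it in a matching tier at allocation time, avoiding later
# migrations. All thresholds and names here are illustrative.

# Tiers ordered fastest-first: near/far bitline segments of DRAM and NVM.
TIERS = ["dram_near", "dram_far", "nvm_near", "nvm_far"]

class InitialPlacer:
    def __init__(self):
        # Running mean of observed access counts per allocation site.
        self.site_intensity = {}

    def record(self, site, accesses):
        """Feed back the measured access count of a retired page."""
        old = self.site_intensity.get(site, accesses)
        self.site_intensity[site] = 0.5 * old + 0.5 * accesses

    def place(self, site):
        """Pick a tier for a newly referenced page from this site."""
        intensity = self.site_intensity.get(site, 0.0)
        if intensity >= 1000:
            return "dram_near"   # hottest pages: low-latency DRAM segment
        if intensity >= 100:
            return "dram_far"
        if intensity >= 10:
            return "nvm_near"    # low-voltage NVM segment, better reliability
        return "nvm_far"         # cold pages: high-density slow segment

placer = InitialPlacer()
placer.record("alloc_site_A", 5000)  # a hot allocation site
placer.record("alloc_site_B", 3)     # a cold one
```

Pages from unseen sites default to the slow tier, matching the abstract's goal of minimizing migrations rather than starting optimistic and demoting later.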
Scalable tiered main memory management for big data applications
Tiered memory is becoming an important technology to meet the demands of big data applications. New memory technologies such as non-volatile memory (NVM) and compute express link (CXL) allow for terabytes of main memory. NVM and CXL offer lower performance than that of DRAM, so they will not replace DRAM in datacenter servers. Instead, systems with these technologies provide a tiered memory system with a fast memory tier provided by DRAM and a slow memory tier provided by NVM or CXL. Effective use of tiered memory requires placing application data in the appropriate memory tier based on access frequencies as well as sharing tiered memory among many applications with different performance constraints. Existing memory management techniques struggle
to manage this large and complex memory hierarchy. To allow systems to leverage the capacity offered by tiered memory, my thesis work
presents two systems for terabyte-scale tiered memory management for big data applications. First, I present HeMem, which explores lightweight, scalable, and asynchronous techniques to manage tiered memory with a single big data application running on an isolated system partition. HeMem uses special hardware performance counters to sample application memory access patterns and places application pages in the appropriate memory tier asynchronously in the background. Second, I present MaxMem, which builds on HeMem and explores how to share tiered memory among many
big data applications. MaxMem uses a quality-of-service aware policy to determine fast memory allocations among applications and ensures that the most frequently accessed data remains in the fast memory tier. This allows MaxMem to balance application performance and server resource utilization.
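HeMem's sampling-and-placement loop can be illustrated with a toy policy: accumulate per-page heat from sampled accesses (a stand-in for the hardware performance counters the abstract mentions) and keep the hottest pages in the fast tier up to its capacity. Real HeMem does this asynchronously in background threads; this sequential sketch shows only the policy, not the system.

```python
# Toy rendition of HeMem-style tiering: sample page accesses, rank
# pages by sampled heat, and fill the fast tier with the hottest ones.
from collections import Counter

def place_pages(access_samples, fast_capacity):
    """Return (fast_set, slow_set) given sampled page accesses."""
    heat = Counter(access_samples)
    hot_first = [page for page, _ in heat.most_common()]
    fast = set(hot_first[:fast_capacity])
    slow = set(hot_first[fast_capacity:])
    return fast, slow

# Page IDs seen by the (hypothetical) access sampler.
samples = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]
fast, slow = place_pages(samples, fast_capacity=2)  # pages 4 and 1 are hottest
```

Sampling rather than tracking every access is what keeps the overhead low enough to run continuously in the background at terabyte scale.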
FHPM: Fine-grained Huge Page Management For Virtualization
As more data-intensive tasks with large footprints are deployed in virtual
machines (VMs), huge pages are widely used to eliminate the increasing address
translation overhead. However, once the huge page mapping is established, all
the base page regions in the huge page share a single extended page table (EPT)
entry, so that the hypervisor loses awareness of accesses to base page regions.
None of the state-of-the-art solutions can obtain access information at base
page granularity for huge pages. We observe that this can lead to incorrect
decisions by the hypervisor, such as misplaced data in a tiered memory
system and base page regions left unshared during page sharing.
This paper proposes FHPM, a fine-grained huge page management for
virtualization without hardware and guest OS modification. FHPM can identify
access information at base page granularity, and dynamically promote and demote
pages. A key insight of FHPM is to redirect the EPT huge page directory entries
(PDEs) to new companion pages so that the MMU can track access information
within huge pages. Then, FHPM can promote and demote pages according to the
current hot page pressure to balance address translation overhead and memory
usage. At the same time, FHPM proposes a VM-friendly page splitting and
collapsing mechanism to avoid extra VM-exits. In combination, FHPM minimizes
the monitoring and management overhead and ensures that the hypervisor gets
fine-grained VM memory accesses to make the proper decision. We apply FHPM to
improve tiered memory management (FHPM-TMM) and to promote page sharing
(FHPM-Share). FHPM-TMM achieves performance improvements of up to 33% and 61%
over pure huge page and pure base page management, respectively. FHPM-Share can
save 41% more memory than Ingens, a state-of-the-art page sharing solution,
with comparable performance.
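The decision that FHPM's companion-page tracking enables can be sketched as follows: once per-base-page access bits inside a huge page are visible to the hypervisor, it can keep uniformly hot huge pages mapped huge, split ones whose heat is concentrated in a few base pages, and keep monitoring the rest. The thresholds below are made-up tuning knobs, not values from the paper.

```python
# Promote/demote decision over a 2 MiB huge page, given the per-base-page
# access bits that FHPM-style companion-page tracking would expose.

BASE_PAGES_PER_HUGE = 512  # 2 MiB huge page / 4 KiB base pages

def classify_huge_page(access_bits, hot_threshold=0.8, cold_threshold=0.1):
    """Decide what to do with a huge page from its base-page access bits."""
    assert len(access_bits) == BASE_PAGES_PER_HUGE
    frac = sum(access_bits) / BASE_PAGES_PER_HUGE
    if frac >= hot_threshold:
        return "keep_huge"   # mostly hot: huge mapping saves TLB misses
    if frac <= cold_threshold:
        return "split"       # few hot base pages: split, place just those hot
    return "monitor"         # mixed: keep watching via the companion page
```

Balancing these two thresholds is exactly the trade-off the abstract describes between address translation overhead (favoring huge mappings) and precise placement or sharing (favoring base pages).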
HPWP: a write-aware optimization policy for DCPMM-based tiered memory (Korean title garbled in source)
High-capacity non-volatile memory is the new main memory. NVM provides up to 8x the memory capacity of DRAM, but can reduce bandwidth by up to 7x and increase latency by up to 2x. Used alone, NVM provides large capacity but low performance, so it is typically deployed together with DRAM. However, if the two memories are not managed properly, performance degrades to that of NVM alone. Much optimization work targets tiered memory systems, the most-studied way to use the two memories together. We found that memory systems using both DRAM and NVM that were designed before Intel Optane DC Persistent Memory (DCPMM) was commercialized did not take DCPMM's actual performance into consideration.
We present High Probability Write Patterns (HPWP), an optimization policy for tiered memory systems that accounts for the performance of commercialized DCPMM. HPWP avoids generating DCPMM write operations as much as possible, exploiting the fact that DCPMM write performance is three times worse than its read performance. In a tiered memory system equipped with DCPMM, HPWP provides up to 19% performance improvement for a key-value store compared to previous studies.
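The core of a write-aware policy like this can be sketched by costing pages asymmetrically: a page's estimated DCPMM cost weights writes three times more than reads (the ratio the abstract reports), so write-heavy pages win DRAM slots even when their total access count is modest. The scoring formula and the function names are illustrative assumptions, not HPWP's actual mechanism.

```python
# Write-aware DRAM/DCPMM placement sketch: rank pages by the cost they
# would incur if served from DCPMM, weighting writes 3x (the write/read
# latency ratio reported for DCPMM), and keep the costliest in DRAM.

WRITE_PENALTY = 3.0  # DCPMM write latency relative to read latency

def dcpmm_cost(reads, writes):
    """Estimated relative cost of serving a page from DCPMM."""
    return reads + WRITE_PENALTY * writes

def pick_dram_pages(page_stats, dram_slots):
    """Keep in DRAM the pages whose DCPMM cost would be highest."""
    ranked = sorted(page_stats, key=lambda p: dcpmm_cost(*p[1:]), reverse=True)
    return {page for page, _, _ in ranked[:dram_slots]}

# (page, reads, writes): page "b" has fewer accesses than "a" but is
# write-heavy, so a write-aware policy prefers it for DRAM.
stats = [("a", 100, 0), ("b", 10, 40), ("c", 50, 5)]
dram = pick_dram_pages(stats, dram_slots=1)
```

An access-count-only policy would pick page "a" here; weighting writes flips the choice to "b", which is the behavior the abstract's write-avoidance argument predicts.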
MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers
We present MaxMem, a tiered main memory management system that aims to
maximize Big Data application colocation and performance. MaxMem uses an
application-agnostic and lightweight memory occupancy control mechanism based
on fast memory miss ratios to provide application QoS under increasing
colocation. By relying on memory access sampling and binning to quickly
identify per-process memory heat gradients, MaxMem maximizes performance for
many applications sharing tiered main memory simultaneously. MaxMem is designed
as a user-space memory manager to be easily modifiable and extensible, without
complex kernel code development. On a system with tiered main memory consisting
of DRAM and Intel Optane persistent memory modules, our evaluation confirms
that MaxMem provides 11% and 38% better throughput and up to 80% and an order
of magnitude lower 99th percentile latency than HeMem and Linux AutoNUMA,
respectively, with a Big Data key-value store in dynamic colocation scenarios.
Comment: 12 pages, 10 figures.
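An application-agnostic occupancy controller in the spirit of MaxMem can be sketched as a feedback loop over fast-memory miss ratios: each round, processes comfortably under their QoS target donate fast-memory quota to processes exceeding theirs. The step size and the donor/taker pairing rule below are illustrative, not MaxMem's actual mechanism.

```python
# One round of miss-ratio-driven fast-memory rebalancing across
# colocated processes, MaxMem-style: under-target processes give up
# quota, over-target processes receive it.

def rebalance(quotas, miss_ratios, targets, step=1):
    """Return new per-process fast-memory quotas after one adjustment."""
    new = dict(quotas)
    donors = [p for p in quotas
              if miss_ratios[p] < targets[p] and new[p] > step]
    takers = [p for p in quotas if miss_ratios[p] > targets[p]]
    for donor, taker in zip(donors, takers):
        new[donor] -= step  # donor is under target; it can spare fast memory
        new[taker] += step  # taker is missing too often; grow its share
    return new

quotas = {"kv_store": 8, "batch_job": 8}
new_quotas = rebalance(
    quotas,
    miss_ratios={"kv_store": 0.20, "batch_job": 0.01},
    targets={"kv_store": 0.05, "batch_job": 0.05},
)
```

Driving the loop with miss ratios rather than application-specific metrics is what keeps the mechanism application-agnostic, as the abstract emphasizes.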
Lightweight Frequency-Based Tiering for CXL Memory Systems
Modern workloads are demanding increasingly larger memory capacity. Compute
Express Link (CXL)-based memory tiering has emerged as a promising solution for
addressing this trend by utilizing traditional DRAM alongside slow-tier
CXL-memory devices in the same system. Unfortunately, most prior tiering
systems are recency-based, which cannot accurately identify hot and cold pages,
since a recently accessed page is not necessarily a hot page. On the other
hand, more accurate frequency-based systems suffer from high memory and runtime
overhead as a result of tracking large memories.
In this paper, we propose FreqTier, a fast and accurate frequency-based
tiering system for CXL memory. We observe that memory tiering systems can
tolerate a small amount of tracking inaccuracy without compromising the overall
application performance. Based on this observation, FreqTier probabilistically
tracks the access frequency of each page, enabling accurate identification of
hot and cold pages while maintaining minimal memory overhead. Finally, FreqTier
intelligently adjusts the intensity of tiering operations based on the
application's memory access behavior, thereby significantly reducing the amount
of migration traffic and application interference.
We evaluate FreqTier on two emulated CXL memory devices with different
bandwidths. On the high-bandwidth CXL device, FreqTier can outperform
state-of-the-art tiering systems while using 4x less local DRAM memory
for in-memory caching workloads. On GAP graph analytics and XGBoost workloads
with a 1:32 local DRAM to CXL-memory ratio, FreqTier outperforms prior works by
1.04-2.04x (1.39x on average). Even on the low-bandwidth CXL
device, FreqTier outperforms AutoNUMA by 1.14x on average.
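FreqTier's key observation is that approximate frequency counts suffice for hot/cold classification. A count-min sketch is one standard way to track per-page access frequency in memory sublinear in the number of pages; whether FreqTier uses exactly this structure is an assumption here, the point is only to show how probabilistic tracking trades a small, one-sided counting error for a large memory saving.

```python
# Count-min sketch for per-page access frequency: fixed memory
# regardless of how many pages are tracked, and estimates that can
# only overcount (never undercount), which is tolerable for the
# hot/cold classification a tiering system needs.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, page):
        # One independent hash per row, derived from the row index.
        for i in range(self.depth):
            h = hashlib.blake2b(f"{i}:{page}".encode(), digest_size=8)
            yield i, int.from_bytes(h.digest(), "big") % self.width

    def touch(self, page):
        for i, j in self._slots(page):
            self.rows[i][j] += 1

    def estimate(self, page):
        # Minimum across rows bounds the inflation from hash collisions.
        return min(self.rows[i][j] for i, j in self._slots(page))

cms = CountMinSketch()
for _ in range(500):
    cms.touch(0xDEAD000)   # hot page, touched 500 times
cms.touch(0xBEEF000)       # cold page, touched once
```

A small overcount can at worst promote a lukewarm page early; it cannot make a hot page look cold, which matches the abstract's claim that tiering tolerates a little tracking inaccuracy without hurting application performance.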