
    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that NVM access latency is much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change in which each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend an existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, enables and exploits asymmetries in DRAM-NVM hybrid tiered memory and improves both performance and reliability for both single-core and multi-programmed workloads. Comment: 15 pages, 29 figures, accepted at the ACM SIGPLAN International Symposium on Memory Management.
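    The initial-placement idea above can be sketched in a few lines. The tier names, the per-allocation-site predictor, and the intensity thresholds below are illustrative assumptions, not MNEME's actual mechanism; the sketch only shows how a predicted access intensity for a newly referenced page could map onto four tiers spanning two memory types.

```python
# Sketch of asymmetry-aware initial page placement (illustrative only).
# Four tiers: fast/slow segments of DRAM and NVM, ordered by access cost.
from collections import defaultdict

TIERS = ["dram_fast", "dram_slow", "nvm_fast", "nvm_slow"]

class IntensityPredictor:
    """Predicts a new page's access intensity from the history of pages
    allocated at the same program site (a simplifying assumption here)."""
    def __init__(self):
        self.site_counts = defaultdict(lambda: [0, 0])  # site -> [accesses, pages]

    def record(self, site, accesses):
        self.site_counts[site][0] += accesses
        self.site_counts[site][1] += 1

    def predict(self, site):
        total, pages = self.site_counts[site]
        return total / pages if pages else 0.0

def place_new_page(predicted_intensity, free_pages):
    """Pick the fastest tier the predicted intensity warrants and that still
    has free capacity; fall back to other tiers otherwise."""
    # Thresholds (accesses per epoch) are made-up illustration values.
    wanted = ("dram_fast" if predicted_intensity > 1000 else
              "dram_slow" if predicted_intensity > 100 else
              "nvm_fast"  if predicted_intensity > 10 else
              "nvm_slow")
    start = TIERS.index(wanted)
    for tier in TIERS[start:] + TIERS[:start][::-1]:
        if free_pages.get(tier, 0) > 0:
            free_pages[tier] -= 1
            return tier
    raise MemoryError("no free pages in any tier")

predictor = IntensityPredictor()
predictor.record(site="loop_A", accesses=5000)
free = {"dram_fast": 2, "dram_slow": 8, "nvm_fast": 32, "nvm_slow": 128}
print(place_new_page(predictor.predict("loop_A"), free))  # -> dram_fast
```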

    FHPM: Fine-grained Huge Page Management For Virtualization

    As more data-intensive tasks with large footprints are deployed in virtual machines (VMs), huge pages are widely used to eliminate the increasing address translation overhead. However, once the huge page mapping is established, all the base page regions in the huge page share a single extended page table (EPT) entry, so the hypervisor loses awareness of accesses to base page regions. None of the state-of-the-art solutions can obtain access information at base page granularity for huge pages. We observe that this can lead to incorrect decisions by the hypervisor, such as incorrect data placement in a tiered memory system and unshared base page regions when sharing pages. This paper proposes FHPM, a fine-grained huge page management system for virtualization that requires no hardware or guest OS modification. FHPM can identify access information at base page granularity and dynamically promote and demote pages. A key insight of FHPM is to redirect the EPT huge page directory entries (PDEs) to new companion pages so that the MMU can track access information within huge pages. FHPM then promotes and demotes pages according to the current hot page pressure to balance address translation overhead and memory usage. At the same time, FHPM proposes a VM-friendly page splitting and collapsing mechanism to avoid extra VM-exits. In combination, FHPM minimizes the monitoring and management overhead and ensures that the hypervisor sees fine-grained VM memory accesses and makes proper decisions. We apply FHPM to improve tiered memory management (FHPM-TMM) and to promote page sharing (FHPM-Share). FHPM-TMM achieves a performance improvement of up to 33% and 61% over pure huge page and pure base page management, respectively. FHPM-Share can save 41% more memory than Ingens, a state-of-the-art page sharing solution, with comparable performance.
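    A minimal sketch of the fine-grained tracking and demotion decision follows. The per-base-page access bits stand in for the information FHPM recovers through its companion pages; the thresholds and the notion of "hot page pressure" as a single number are assumptions for illustration, not FHPM's implementation.

```python
# Sketch of base-page-granularity tracking inside a huge page (illustrative).
# A 2 MiB huge page covers 512 base pages of 4 KiB; we keep one access bit
# per base page, the kind of information FHPM recovers via companion pages.
BASE_PAGES_PER_HUGE = 512

class HugePageStats:
    def __init__(self):
        self.accessed = [False] * BASE_PAGES_PER_HUGE

    def touch(self, offset_in_bytes):
        self.accessed[(offset_in_bytes // 4096) % BASE_PAGES_PER_HUGE] = True

    def hot_fraction(self):
        return sum(self.accessed) / BASE_PAGES_PER_HUGE

def decide(stats, hot_page_pressure):
    """Keep a huge page mapped when most of it is hot; split (demote) it when
    only a few base pages are hot and fast memory is under pressure.
    The thresholds are made-up illustration values, not FHPM's."""
    frac = stats.hot_fraction()
    if frac > 0.5:
        return "keep-huge"          # translation benefit outweighs waste
    if hot_page_pressure > 0.8 and frac < 0.1:
        return "demote-to-base"     # reclaim the mostly-cold base pages
    return "keep-huge"

stats = HugePageStats()
for off in range(0, 16 * 4096, 4096):   # touch 16 of 512 base pages
    stats.touch(off)
print(decide(stats, hot_page_pressure=0.9))   # -> demote-to-base
```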

    High Probability Write Patterns: An Optimization Policy for Tiered Memory Systems with Intel Optane DC Persistent Memory

    High-capacity non-volatile memory is the new main memory. NVM provides up to 8x the memory capacity of DRAM, but can reduce bandwidth by up to 7x and increase latency by up to 2x. Used alone, NVM provides large capacity but poor performance, so it is typically deployed alongside DRAM. However, if the two memories are not managed properly, performance can be as bad as using NVM alone. Much optimization work targets tiered memory systems, the most studied way of combining the two memories. We found that memory systems using both DRAM and NVM that were designed before Intel Optane DC Persistent Memory (DCPMM) was commercialized did not take DCPMM's actual performance characteristics into consideration. We present High Probability Write Patterns (HPWP), an optimization policy for tiered memory systems that accounts for the performance of commercialized DCPMM. HPWP avoids issuing write operations to DCPMM as much as possible, exploiting the fact that DCPMM's write performance is about three times worse than its read performance. In a tiered memory system equipped with DCPMM, HPWP provides up to 19% performance improvement in a key-value store compared to previous studies.
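    The write-avoidance idea can be sketched as a placement score. The 3x write/read cost asymmetry comes from the abstract; the scoring formula, the page-profile fields, and the victim-selection policy below are illustrative assumptions, not the HPWP implementation.

```python
# Sketch of write-aware tier selection for a DRAM + DCPMM system (illustrative).
# DCPMM writes are ~3x more expensive than reads, so write-heavy pages should
# stay in DRAM and read-mostly or cold pages are the safest DCPMM candidates.
WRITE_PENALTY = 3.0   # relative cost of a DCPMM write vs. a DCPMM read

class PageProfile:
    def __init__(self, reads=0, writes=0):
        self.reads = reads
        self.writes = writes

    def dcpmm_cost(self):
        # Estimated cost of serving this page from DCPMM next epoch.
        return self.reads + WRITE_PENALTY * self.writes

def choose_dcpmm_victims(pages, pages_to_demote):
    """Demote the pages that would be cheapest to serve from DCPMM,
    i.e. the ones with the fewest write-weighted accesses."""
    ranked = sorted(pages.items(), key=lambda kv: kv[1].dcpmm_cost())
    return [addr for addr, _ in ranked[:pages_to_demote]]

pages = {
    0x1000: PageProfile(reads=900, writes=5),    # read-mostly: good DCPMM candidate
    0x2000: PageProfile(reads=50, writes=400),   # write-heavy: keep in DRAM
    0x3000: PageProfile(reads=10, writes=1),     # cold: good DCPMM candidate
}
victims = choose_dcpmm_victims(pages, pages_to_demote=2)
print([hex(a) for a in victims])   # -> ['0x3000', '0x1000']
```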

    MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers

    We present MaxMem, a tiered main memory management system that aims to maximize Big Data application colocation and performance. MaxMem uses an application-agnostic and lightweight memory occupancy control mechanism based on fast memory miss ratios to provide application QoS under increasing colocation. By relying on memory access sampling and binning to quickly identify per-process memory heat gradients, MaxMem maximizes performance for many applications sharing tiered main memory simultaneously. MaxMem is designed as a user-space memory manager to be easily modifiable and extensible, without complex kernel code development. On a system with tiered main memory consisting of DRAM and Intel Optane persistent memory modules, our evaluation confirms that MaxMem provides 11% and 38% better throughput, and up to 80% and an order of magnitude lower 99th percentile latency, than HeMem and Linux AutoNUMA, respectively, with a Big Data key-value store in dynamic colocation scenarios. Comment: 12 pages, 10 figures.
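    The occupancy-control loop described above can be sketched as a simple rebalancer over per-process fast-tier quotas. The QoS targets, step size, and data layout below are illustrative assumptions, not MaxMem's actual parameters; the point is only that sampled fast-memory miss ratios drive quota movement between colocated processes.

```python
# Sketch of miss-ratio-driven fast-memory occupancy control (illustrative).
# Each process gets a DRAM (fast-tier) quota; the controller grows the quota
# of processes whose sampled miss ratio exceeds their QoS target by
# reclaiming pages from processes that have slack.
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    quota_pages: int        # current fast-tier quota
    miss_ratio: float       # sampled fraction of accesses missing fast memory
    target_miss: float      # QoS target for the miss ratio

def rebalance(procs, step=1024):
    """Move up to `step` pages of quota per round from the process with the
    most slack to the process most in violation of its target."""
    violators = [p for p in procs if p.miss_ratio > p.target_miss]
    donors    = [p for p in procs if p.miss_ratio < p.target_miss]
    if not violators or not donors:
        return None
    worst = max(violators, key=lambda p: p.miss_ratio - p.target_miss)
    best  = min(donors,    key=lambda p: p.miss_ratio - p.target_miss)
    moved = min(step, best.quota_pages)
    best.quota_pages  -= moved
    worst.quota_pages += moved
    return (best.name, worst.name, moved)

procs = [
    Proc("kvstore",   quota_pages=8192,  miss_ratio=0.20, target_miss=0.05),
    Proc("analytics", quota_pages=32768, miss_ratio=0.01, target_miss=0.10),
]
print(rebalance(procs))          # -> ('analytics', 'kvstore', 1024)
print(procs[0].quota_pages)      # -> 9216
```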

    Lightweight Frequency-Based Tiering for CXL Memory Systems

    Modern workloads are demanding increasingly larger memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this trend by utilizing traditional DRAM alongside slow-tier CXL-memory devices in the same system. Unfortunately, most prior tiering systems are recency-based and cannot accurately identify hot and cold pages, since a recently accessed page is not necessarily a hot page. On the other hand, more accurate frequency-based systems suffer from high memory and runtime overhead as a result of tracking large memories. In this paper, we propose FreqTier, a fast and accurate frequency-based tiering system for CXL memory. We observe that memory tiering systems can tolerate a small amount of tracking inaccuracy without compromising overall application performance. Based on this observation, FreqTier probabilistically tracks the access frequency of each page, enabling accurate identification of hot and cold pages while maintaining minimal memory overhead. Finally, FreqTier intelligently adjusts the intensity of tiering operations based on the application's memory access behavior, thereby significantly reducing the amount of migration traffic and application interference. We evaluate FreqTier on two emulated CXL memory devices with different bandwidths. On the high bandwidth CXL device, FreqTier outperforms state-of-the-art tiering systems while using 4× less local DRAM memory for in-memory caching workloads. On GAP graph analytics and XGBoost workloads with a 1:32 local DRAM to CXL-memory ratio, FreqTier outperforms prior works by 1.04-2.04× (1.39× on average). Even on the low bandwidth CXL device, FreqTier outperforms AutoNUMA by 1.14× on average.
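    Probabilistic frequency tracking with bounded memory can be illustrated with a count-min-sketch-style counter table. FreqTier's actual data structure is not specified in the abstract, so the structure, sizes, and hot threshold below are assumptions chosen only to show the idea of trading a small amount of accuracy for a fixed memory footprint.

```python
# Sketch of probabilistic per-page access-frequency tracking (illustrative).
# A small count-min-sketch-style table bounds memory overhead regardless of
# how many pages exist, at the cost of occasional over-estimates, which is
# the kind of inaccuracy a tiering system can tolerate.
import hashlib

ROWS, COLS = 4, 1 << 16   # ~256K counters total, independent of memory size

class FreqSketch:
    def __init__(self):
        self.table = [[0] * COLS for _ in range(ROWS)]

    def _slots(self, page_addr):
        for row in range(ROWS):
            h = hashlib.blake2b(page_addr.to_bytes(8, "little"),
                                digest_size=4, salt=bytes([row])).digest()
            yield row, int.from_bytes(h, "little") % COLS

    def record(self, page_addr):
        for row, col in self._slots(page_addr):
            self.table[row][col] += 1

    def estimate(self, page_addr):
        # Taking the minimum over rows bounds over-estimates from collisions.
        return min(self.table[row][col] for row, col in self._slots(page_addr))

sketch = FreqSketch()
for _ in range(500):
    sketch.record(0xdeadb000)          # hot page
sketch.record(0x1000)                  # cold page
HOT_THRESHOLD = 100                    # made-up threshold for illustration
print(sketch.estimate(0xdeadb000) >= HOT_THRESHOLD)   # -> True
print(sketch.estimate(0x1000) >= HOT_THRESHOLD)       # -> False (almost surely)
```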