
    Adjacent LSTM-Based Page Scheduling for Hybrid DRAM/NVM Memory Systems

    Recent advances in memory technologies have led to the rapid growth of hybrid systems that combine traditional DRAM with Non-Volatile Memory (NVM) technologies, as the latter provide lower cost per byte, low leakage power, and larger capacities than DRAM while guaranteeing comparable access latency. Such heterogeneous memory systems impose new challenges for page placement and migration among the alternative memory technologies. In this paper, we present a novel approach for efficient page placement on heterogeneous DRAM/NVM systems. We design an adjacent LSTM-based approach for page placement that relies strongly on page-access prediction while sharing knowledge among pages with behavioral similarity. The proposed approach improves performance by up to 65.5% over existing approaches, achieving near-optimal results and saving 20.2% energy consumption on average. Moreover, we propose a new page replacement policy, clustered-LRU, that improves performance by up to 8.1% compared to the default Least Recently Used (LRU) policy.
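
    To make the core mechanism concrete, the sketch below shows an LSTM that predicts per-page accesses from a window of recent access counts, with one model shared across behaviorally similar pages. It is a minimal illustration in PyTorch, not the paper's model: the feature encoding, window length, and layer sizes are assumptions.

        # Minimal sketch: predict whether a page is accessed in the next epoch
        # from its recent per-epoch access counts. Sizes are illustrative.
        import torch
        import torch.nn as nn

        class PageAccessLSTM(nn.Module):
            def __init__(self, input_size=1, hidden_size=32):
                super().__init__()
                self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
                self.head = nn.Linear(hidden_size, 1)   # access-probability logit

            def forward(self, x):                       # x: (pages, window, 1)
                out, _ = self.lstm(x)
                return self.head(out[:, -1, :])         # predict from last state

        # One model shared by a cluster of behaviorally similar pages: pages
        # predicted hot are candidates for placement in DRAM next epoch.
        model = PageAccessLSTM()
        window = torch.rand(8, 16, 1)                   # 8 pages, 16 past epochs
        hot = torch.sigmoid(model(window)).squeeze(1) > 0.5
        print(hot)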

    Data placement in HPC architectures with heterogeneous off-chip memory

    The performance of HPC applications is often bounded by the underlying memory system's performance. The trend of increasing the number of cores on a chip imposes even higher memory bandwidth and capacity requirements. The limitations of traditional memory technologies are pushing research toward hybrid memory systems that, besides DRAM, include one or more modules based on higher-density non-volatile memory technologies, where one module provides the required bandwidth while the other provides the required capacity for the application. This creates many challenges for data placement and migration policies between the modules of such a hybrid memory system. In this paper, we propose an architecture with a hybrid memory design that places two technologically different memory modules in a flat address space. On such a system, we evaluate several HPC workloads against different data placement and migration policies, compare their performance in terms of execution time and the number of non-volatile memory writes, and consider how the design can be applied to future HPC architectures. Our results show that the hybrid memory system with dynamic page migration and limited DRAM capacity can achieve performance comparable to a hypothetical, hard-to-implement, DRAM-only system.
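
    The migration policy family evaluated here can be illustrated with a minimal sketch: an epoch-based promoter that keeps the hottest pages in a small DRAM module and demotes the rest to NVM. The epoch counting and single hotness threshold are assumptions for illustration, not the paper's exact policies.

        # Minimal sketch of dynamic page migration in a flat DRAM/NVM space:
        # each epoch, promote the hottest pages into limited DRAM and demote
        # cold DRAM pages to NVM (demotions of dirty pages cost NVM writes).
        from collections import Counter

        DRAM_CAPACITY = 4                    # pages; deliberately tiny for the demo

        def migrate(access_counts: Counter, in_dram: set) -> set:
            """Return the DRAM-resident page set for the next epoch."""
            hottest = {p for p, _ in access_counts.most_common(DRAM_CAPACITY)}
            promotions = hottest - in_dram   # NVM -> DRAM moves
            demotions = in_dram - hottest    # DRAM -> NVM moves
            print(f"promote {sorted(promotions)}, demote {sorted(demotions)}")
            return hottest

        epoch_counts = Counter({1: 90, 2: 80, 3: 5, 4: 70, 5: 60, 6: 2})
        dram_pages = migrate(epoch_counts, in_dram={1, 2, 3, 6})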

    Low-Overhead Migration of Read-Only and Read-Mostly Data for Adapting Applications to Hybrid Memory Systems

    Memory systems containing different types of memory with varying capacity, latency, and bandwidth are rapidly becoming mainstream. Conventional memory management techniques do not suffice for these systems; they require alternative strategies to appropriately and effectively adapt application memory placement to these heterogeneous memory tiers. Software-based placement and movement strategies are the most desirable due to their flexibility and ease of adoption by end users. However, there are substantial sources of overhead when synchronizing low-level data movement with the operating system and running applications. This thesis proposes a novel method of reducing these memory movement overheads on hybrid memory systems. Many data objects are only written to early in their life cycle (i.e., shortly after allocation) and are effectively read-only after these initial writes. If this read-only and read-mostly data is duplicated across memory tiers, as opposed to moved, the application can in many cases avoid certain types of transfer overhead, such as page table entry (PTE) and MMU cache (TLB) synchronization stalls. This work describes the design and implementation of a kernel module, mtier, that implements this optimization on memory that has been explicitly marked as read-only. Our evaluation demonstrates that this approach has the potential to substantially reduce data movement overheads, especially in applications that are multi-threaded and require frequent movement of data, allowing a flexible, software-based approach to memory management in hybrid systems.
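
    The key contrast is duplication versus migration: copying a read-mostly page leaves the original mapping valid, so no PTE remap or TLB shootdown is needed. A minimal user-level sketch of that idea follows; the names are hypothetical, and the real mtier module operates at the kernel level on pages explicitly marked read-only.

        # Minimal sketch: serve reads from a fast-tier duplicate while the
        # slow-tier original stays mapped; a write invalidates the duplicate
        # and falls back to the single authoritative copy.
        class Tiers:
            def __init__(self):
                self.fast = {}               # page id -> data (DRAM copy)
                self.slow = {}               # page id -> data (NVM original)

            def duplicate_read_only(self, page, data):
                self.slow[page] = data       # original mapping untouched
                self.fast[page] = data       # reads served from the fast copy

            def read(self, page):
                return self.fast.get(page, self.slow.get(page))

            def write(self, page, data):
                self.fast.pop(page, None)    # drop the stale duplicate
                self.slow[page] = data       # expensive, synchronized path

        t = Tiers()
        t.duplicate_read_only(7, b"model weights")
        assert t.read(7) == b"model weights" # fast-tier hit, no remap stall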

    A Page Scheduler using Machine Learning for Hybrid Memory Systems

    As the demand for machine learning and big data workloads grows, memory becomes increasingly important to application performance. Main memory is extended by using hybrid memory systems that include different types of memory components, and data placement across these components has a significant impact on application performance. We propose SMA, an RNN-based page scheduler that learns page access patterns and ensures that pages to be accessed by applications in the future are prepared in fast memory in advance. This paper builds on the existing observation that there is a set of pages that are important to application performance. A single RNN model manages all important pages for efficient page management, while the remaining pages are managed using a history-based method. This work reduces training time and memory usage compared to the existing state-of-the-art machine-learning-based page scheduler while providing higher accuracy. It also shows that a single RNN model can learn general page access patterns, achieving accuracy similar to the existing page scheduler on applications that were not included in the model's training dataset.
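
    The split between learned and history-based management can be sketched briefly; the threshold logic and the stand-in predictor below are illustrative assumptions, not SMA's actual model.

        # Minimal sketch: one learned model covers the small set of
        # performance-critical pages; all other pages use a cheap
        # history-based predictor.
        def history_predict(history):        # fallback: repeat last behavior
            return history[-1] if history else 0

        def schedule(pages, important, rnn_predict):
            """Pick pages to prefetch into fast memory for the next epoch."""
            prefetch = []
            for page, history in pages.items():
                predictor = rnn_predict if page in important else history_predict
                if predictor(history):       # 1 = access expected next epoch
                    prefetch.append(page)
            return prefetch

        traces = {10: [1, 1, 1], 11: [0, 0, 1], 12: [1, 0, 0]}
        # Stand-in for the trained RNN: a majority vote over the window.
        fake_rnn = lambda h: int(sum(h) > len(h) / 2)
        print(schedule(traces, important={10, 12}, rnn_predict=fake_rnn))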

    Energy Saving Techniques for Phase Change Memory (PCM)

    In recent years, the energy consumption of computing systems has increased, and a large fraction of this energy is consumed in main memory. To address this, researchers have proposed the use of non-volatile memory, such as phase change memory (PCM), which offers low read latency and read power, and nearly zero leakage power. However, the write latency and power of PCM are very high, and this, along with PCM's limited write endurance, presents significant challenges to its widespread adoption. Several architecture-level techniques have been proposed to address these issues. In this report, we review several techniques for managing the power consumption of PCM and classify them based on their characteristics to provide insights into them. The aim of this work is to encourage researchers to propose even better techniques for improving the energy efficiency of PCM-based main memory.
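
    One representative technique from the class this report surveys is data-comparison write (read-before-write), which reads the old line and programs only the bits that differ. The sketch below illustrates that idea at the bit level; it is an example of the technique class, not a specific scheme from the report.

        # Data-comparison write: XOR old and new contents and program only
        # the differing bits, saving write energy and cell wear.
        def dcw_write(old: int, new: int, width: int = 8) -> int:
            diff = old ^ new                         # bits that must change
            flipped = bin(diff).count("1")
            print(f"programming {flipped}/{width} bits instead of all {width}")
            return new

        line = dcw_write(old=0b1011_0010, new=0b1011_0110)   # one bit differs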

    Near-Memory Address Translation

    Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations, as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that limiting the associativity of the virtual-to-physical mapping incurs no penalty and, when combined with careful data placement in the MPU's memory, can break the translate-then-fetch serialization, allowing translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its share of the data, ensuring that translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages, respectively.
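
    The enabling observation is that once associativity is limited, a virtual page can reside in only a few candidate frames, so the data fetch can start for all candidates while a small near-memory inverted table resolves the right one. The sketch below illustrates that lookup; the set and associativity sizes are illustrative, not DIPTA's actual configuration.

        # Limited-associativity translation: a virtual page number (VPN) maps
        # to a fixed set of ASSOC candidate frames, and a per-partition
        # inverted table (frame -> resident VPN) picks the correct one.
        ASSOC, NUM_SETS = 4, 1024

        def candidate_frames(vpn: int):
            s = vpn % NUM_SETS                       # set fixed by the VPN
            return [s * ASSOC + way for way in range(ASSOC)]

        # Inverted table kept next to the data it describes.
        inverted = {candidate_frames(42)[2]: 42}

        def translate(vpn: int):
            for frame in candidate_frames(vpn):      # at most ASSOC checks,
                if inverted.get(frame) == vpn:       # overlapped with fetch
                    return frame
            return None                              # not resident: page fault

        print(translate(42))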