6 research outputs found

    Automating the application data placement in hybrid memory systems

    Multi-tiered memory systems, such as those based on Intel® Xeon Phi™ processors, are equipped with several memory tiers with different characteristics including, among others, capacity, access latency, bandwidth, energy consumption, and volatility. The proper distribution of the application data objects across the available memory layers is key to shortening the time-to-solution, but the way developers and end-users determine the most appropriate memory tier in which to place the application data objects has not been properly addressed to date. In this paper we present a novel methodology to build an extensible framework that automatically identifies and places the application's most relevant memory objects into the Intel Xeon Phi fast on-package memory. Our proposal works on top of in-production binaries by first exploring the application behavior and then substituting the dynamic memory allocations. This makes the proposal valuable even for end-users who cannot modify the application source code. We demonstrate the value of a framework based on our methodology for several relevant HPC applications, using different allocation strategies to help end-users improve performance with minimal intervention. The results of our evaluation reveal that our proposal identifies the key objects to be promoted into fast on-package memory in order to optimize performance, even surpassing hardware-based solutions. This work has been performed in the Intel-BSC Exascale Lab. Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. We would like to thank Intel's DCG HEAT team for allowing us to access their computational resources. We also want to acknowledge this team, especially Larry Meadows and Jason Sewall, as well as Pardo Keppel, for the productive discussions. We thank Raphaël Léger for allowing us to access the MAXW-DGTD application and its input. Peer Reviewed. Postprint (author's final draft).
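    The core idea described above (profile first, then reroute selected dynamic allocations to the fast tier) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the call-site ids, the hot-site table, and the capacity budget are hypothetical, and a real tool would direct hot allocations to an actual fast-memory allocator such as memkind's hbw_malloc rather than track a counter.

    ```c
    /* Sketch of profile-guided allocation routing. The hot-site list
     * stands in for the result of a prior profiling run. */
    #include <stdlib.h>

    #define FAST_CAPACITY (16UL * 1024 * 1024 * 1024) /* illustrative 16 GiB budget */

    static size_t fast_used = 0;

    /* Allocation call sites the (hypothetical) profile marked as hot. */
    static int is_hot_site(int site_id) {
        static const int hot[] = { 3, 7, 12 };
        for (size_t i = 0; i < sizeof hot / sizeof *hot; i++)
            if (hot[i] == site_id) return 1;
        return 0;
    }

    /* Replacement for malloc at instrumented call sites: hot objects go
     * to the fast tier while budget remains; everything else, and any
     * overflow, falls back to ordinary DRAM. In this sketch both tiers
     * are served by malloc and only the decision is modeled. */
    void *place_alloc(int site_id, size_t size, int *went_fast) {
        if (is_hot_site(site_id) && fast_used + size <= FAST_CAPACITY) {
            fast_used += size;
            *went_fast = 1;
        } else {
            *went_fast = 0;
        }
        return malloc(size);
    }
    ```

    In the paper's setting the substitution happens on the binary, so unmodified applications pick up the routing transparently; the sketch only models the placement decision itself.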

    Exposing the characteristics of heterogeneous memory architectures to parallel applications

    The complexity of memory systems has increased considerably over the past decade. Consequently, supercomputers include memories at several levels, heterogeneous and non-uniform, with significantly different properties. Developers of scientific applications face a huge challenge: harnessing the memory system efficiently to improve performance and productivity. In this work, we present an interface to manage the complexity of the memory system, composed of a set of memory attributes and an API to express and manage these various characteristics using metrics, for example bandwidth, latency, and capacity. It allows runtime systems, parallel libraries, and scientific applications to select the appropriate memory by expressing their needs for each allocation without having to modify the code for each platform.
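    An attribute-driven selection interface of the kind the abstract describes might look like the following sketch. The tier table, struct fields, and function names are hypothetical illustrations of the concept, not the paper's actual API (hwloc's memory-attributes interface is a real example of the same idea).

    ```c
    /* Sketch: pick a memory tier by expressing a per-allocation goal
     * (maximize bandwidth or minimize latency) against tier attributes. */
    #include <stddef.h>

    typedef struct {
        const char *name;
        double bandwidth_gbs;   /* GB/s */
        double latency_ns;      /* ns   */
        size_t capacity_bytes;
    } mem_tier;

    enum mem_goal { GOAL_BANDWIDTH, GOAL_LATENCY };

    /* Return the index of the best tier with enough capacity for the
     * request, or -1 if no tier fits. */
    int select_tier(const mem_tier *tiers, int n, size_t need, enum mem_goal goal) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (tiers[i].capacity_bytes < need) continue;
            if (best < 0 ||
                (goal == GOAL_BANDWIDTH && tiers[i].bandwidth_gbs > tiers[best].bandwidth_gbs) ||
                (goal == GOAL_LATENCY   && tiers[i].latency_ns    < tiers[best].latency_ns))
                best = i;
        }
        return best;
    }
    ```

    The point of such an interface is that the caller states a metric per allocation, and the same code selects DRAM, HBM, or PMEM correctly on any platform whose tiers are described in the table.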

    ecoHMEM: Improving object placement methodology for hybrid memory systems in HPC

    Recent byte-addressable persistent memory (PMEM) technology offers capacities comparable to storage devices and access times much closer to DRAM than other non-volatile memory technologies. To palliate the large performance gap with DRAM, DRAM and PMEM are usually combined. Users can either manage the placement into the different memory spaces in software or leverage the DRAM as a cache for the virtual address space of the PMEM. We present a novel methodology for automatic object-level placement, including efficient runtime object matching and bandwidth-aware placement. Our experiments leveraging Intel® Optane™ Persistent Memory show performance ranging from matching to greatly exceeding that of state-of-the-art software and hardware solutions, attaining over 2x runtime improvement in mini-applications and over 6% in OpenFOAM, a complex production application. This paper received funding from the Intel-BSC Exascale Laboratory SoW 5.1, the European Union's Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement No. 749516, the EPEEC project from the European Union's Horizon 2020 research and innovation program under grant agreement No. 801051, the DEEP-SEA project from the European Commission's EuroHPC program under grant agreement 955606, and the Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). Peer Reviewed. Postprint (author's final draft).
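    One simple way to realize bandwidth-aware object placement, in the spirit of the methodology above, is a greedy knapsack over profiled access density. The numbers and the greedy policy below are illustrative assumptions, not ecoHMEM's exact algorithm.

    ```c
    /* Sketch: rank objects by profiled accesses per byte and fill the
     * DRAM budget greedily; everything that does not fit stays in PMEM. */
    #include <stdlib.h>

    typedef struct {
        size_t size;
        unsigned long accesses;  /* from a hypothetical profiling run */
        int in_dram;             /* output: 1 = DRAM, 0 = PMEM */
    } object;

    /* Descending order of access density (accesses per byte). */
    static int by_density_desc(const void *a, const void *b) {
        const object *x = a, *y = b;
        double dx = (double)x->accesses / x->size;
        double dy = (double)y->accesses / y->size;
        return (dx < dy) - (dx > dy);
    }

    /* Sorts objs in place, then assigns the hottest bytes to DRAM
     * until the budget runs out. */
    void place_objects(object *objs, int n, size_t dram_budget) {
        qsort(objs, n, sizeof *objs, by_density_desc);
        size_t used = 0;
        for (int i = 0; i < n; i++) {
            objs[i].in_dram = (used + objs[i].size <= dram_budget);
            if (objs[i].in_dram) used += objs[i].size;
        }
    }
    ```

    The appeal of object-level placement over a hardware DRAM cache is that a few hot objects typically account for most of the bandwidth demand, so a static decision made once per object can beat reactive caching.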

    Runtime-guided management of stacked DRAM memories in task parallel programs

    Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited capacity is insufficient for modern HPC systems. For this reason, both stacked DRAM and off-chip memories are expected to co-exist in HPC architectures, giving rise to different approaches for architecting the stacked DRAM in the system. This paper proposes a runtime approach to transparently manage stacked DRAM memories in task-based programming models. In this approach the runtime system is in charge of copying the data accessed by the tasks to the stacked DRAM, without any complex hardware support or modifications to the application code. To mitigate the cost of copying data between the stacked DRAM and the off-chip memory, the proposal includes an optimization to parallelize the copies across idle or additional helper threads. In addition, the runtime system is aware of the reuse pattern of the data accessed by the tasks, and can exploit this information to avoid unprofitable copies of data to the stacked DRAM. Results on the Intel Knights Landing processor show that the proposed techniques achieve an average speedup of 14% against the state-of-the-art library for managing the stacked DRAM and 29% against a stacked DRAM architected as a hardware cache. This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union's Horizon 2020 research and innovation programme (grant agreement 779877). M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Peer Reviewed. Postprint (author's final draft).
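    The reuse-aware decision of whether a copy into stacked DRAM is worthwhile can be modeled with a simple cost comparison: copy only when the expected reuses amortize the copy cost. This is a minimal sketch of that reasoning, with illustrative bandwidth figures; it is not the paper's runtime heuristic.

    ```c
    /* Sketch: decide whether copying a task's data into stacked DRAM
     * pays off, comparing one-time copy cost against per-reuse savings. */
    #include <stddef.h>

    /* fast_bw / slow_bw are sustained bandwidths in bytes/s (e.g. MCDRAM
     * vs. off-chip DDR). Returns 1 if the copy is expected to pay off. */
    int worth_copying(size_t bytes, int expected_reuses,
                      double fast_bw, double slow_bw) {
        /* One-time cost: read from slow memory plus write into fast memory. */
        double copy_cost = bytes / slow_bw + bytes / fast_bw;
        /* Per-reuse saving: each access reads fast memory instead of slow. */
        double saving_per_use = bytes / slow_bw - bytes / fast_bw;
        return expected_reuses * saving_per_use > copy_cost;
    }
    ```

    A runtime system sees task dependencies ahead of execution, so it can estimate `expected_reuses` from the task graph; data touched only once is left in off-chip memory, which is exactly the class of unprofitable copies the proposal avoids.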
