Mapping parallel loops on multicore systems

Plata, Oscar; Romero, Felipe; Tabik, Siham; Utrera Iglesias, Gladys Miriam

Publication date: 1 January 2010

Abstract

The compute nodes in contemporary HPC systems contain one or more multicore processors. As a result, each node constitutes a shared-memory multiprocessor, often combining CMP and SMT concurrency technologies. This configuration introduces different levels of sharing in the cache hierarchy, resulting in non-uniform data-sharing overheads. In this paper we analyze the data-sharing patterns that a real multithreaded application exhibits when executing on a multicore system, with emphasis on the use of the shared last-level cache (LLC) by the concurrent threads. As a consequence of this study, we explore the loop mapping problem in such systems with the aim of optimizing the shared use of the LLC by all parallel threads. We propose a three-phase loop mapping strategy that deals with workload imbalances, minimizes cache-sharing interference, and maximizes intra-core and inter-core data reuse in the cache hierarchy. Preliminary results show some benefits of our approach. However, this is a work in progress and much more research is being done.

Postprint (author's final draft)


Available Versions

UPCommons

oai:upcommons.upc.edu:2117/161...

Last time updated on 17/04/2020

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/161...

Last time updated on 16/06/2016