A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems

Abstract

International audienceNowadays, performance optimization involves careful data and task placement to deal with parallel application needs with respect to the underlying hardware topology. Monitoring the application behavior provides useful information that still needs to be matched with the actual placement, for instance to understand whether bottlenecks are caused by the sequential code itself or by shared resources in parallel programs. We propose an insightful monitoring tool based on two cornerstones of hardware performance counters monitoring and hardware locality mod-eling, respectively named PAPI and hwloc. It enables a dynamic visual analysis of parallel applications' phases at runtime, revealing their possibly variable and heterogeneous behaviors and needs. A purpose designed application shows that the topology-aware visual representation of hardware counters can help guring out shared resource bottlenecks and ease the task placement decision process in runtime systems. 1 Introduction The memory wall makes data locality increasingly important on the road to exascale. Data and computing tasks have to be colocated to better exploit the performance of parallel platforms. Many research projects focus on locality-aware data and/or task placement, for parallel programing models ranging from MPI and OpenMP to graphs of tasks. However nding out which placement is the best remains a dicult exercise that depends on the topology and characteristics of the hardware and on the application needs. Indeed, the hardware is increasingly complex, and software anities can be of dierent kinds. For instance memory-bound tasks may prefer being scattered all across the machine, while, on the contrary, communication and synchronization may want to keep them close. Runtime systems require help identifying these needs and bottlenecks before they can place tasks accordingly. Performance monitoring is a very active software area that oers many tools to gather information about the execution of tasks, the bottlenecks, etc. We introduce , in this paper, a new way to analyze performance by crossing the roads of performance monitoring and topology-aware placement. We propose an extension of the Hardware Locality software (hwloc [2]) that enhances its graphica

    Similar works