Automation of Determination of Optimal Intra-Compute Node Parallelism

Brown, James C.; Gómez-Iglesias, Antonio

Authors: James C. Brown, Antonio Gómez-Iglesias
Publication date: 1 January 2016

Abstract

Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex, multi-step, iterative process that requires determining the optimal degree of parallel scalability and optimizing memory access behavior. Further, there are multiple cases to consider: programs that use only MPI or only OpenMP, and hybrid (MPI+OpenMP) programs. This paper presents a set of three coordinated workflows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workflows using the PerfExpert infrastructure. Finally, the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results will provide valuable information for advancing toward full automation of the workflows. The software implementing the parallelism scalability optimization is open source and available for download.

Texas Advanced Computing Center (TACC); Computer Science
