Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers
We present a new method for mapping applications' MPI tasks to cores of a
parallel computer such that applications' communication time is reduced. We
address the case of sparse node allocation, where the nodes assigned to a job
are not necessarily located in a contiguous block nor within close proximity to
each other in the network, although our methods generalize to contiguous
allocations as well. The goal is to assign tasks to cores so that
interdependent tasks are performed by "nearby" cores, thus lowering the
distance messages must travel, the amount of congestion in the network, and the
overall cost of communication. Our new method applies a geometric partitioning
algorithm to both the tasks and the processors, and assigns task parts to the
corresponding processor parts. We also present a number of algorithmic
optimizations that exploit specific features of the network or application. We
show that, for the structured finite difference mini-application MiniGhost, our
mapping methods reduced communication time by up to 75% relative to MiniGhost's
default mapping on 128K cores of a Cray XK7 with sparse allocation. For the
atmospheric modeling code E3SM/HOMME, our methods reduced communication time by
up to 31% on 32K cores of an IBM BlueGene/Q with contiguous allocation.
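
The core idea of partitioning both tasks and processors geometrically, then pairing corresponding parts, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain recursive coordinate bisection for whatever geometric partitioner the paper uses, assumes one task per core, and assumes every task and every core has known coordinates (for example, a task's position in the application grid and a core's position in the network). The function names `rcb_order` and `map_tasks_to_cores` are hypothetical.

```python
def rcb_order(points, ids=None):
    """Order point indices by recursive coordinate bisection (RCB).

    At each level the points are split in half along the dimension
    with the largest extent; leaves are emitted left to right.
    """
    if ids is None:
        ids = list(range(len(points)))
    if len(ids) <= 1:
        return list(ids)

    dims = len(points[ids[0]])

    def extent(d):
        values = [points[i][d] for i in ids]
        return max(values) - min(values)

    # Cut along the longest dimension; ties go to the lowest dimension.
    cut_dim = max(range(dims), key=extent)
    # Break coordinate ties by index so the ordering is deterministic.
    ordered = sorted(ids, key=lambda i: (points[i][cut_dim], i))
    mid = len(ordered) // 2
    return rcb_order(points, ordered[:mid]) + rcb_order(points, ordered[mid:])


def map_tasks_to_cores(task_coords, core_coords):
    """Pair the k-th task part with the k-th core part.

    Assumes equal numbers of tasks and cores, so both recursion trees
    have the same shape and their leaves correspond one to one.
    """
    if len(task_coords) != len(core_coords):
        raise ValueError("this sketch assumes one task per core")
    task_order = rcb_order(task_coords)
    core_order = rcb_order(core_coords)
    return dict(zip(task_order, core_order))


# Four tasks on a 2x2 application grid.
tasks = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Four cores from a sparse allocation, listed in arbitrary order.
cores = [(9, 5), (0, 0), (9, 0), (0, 5)]
mapping = map_tasks_to_cores(tasks, cores)
```

In this toy case, tasks that are neighbors in the application grid land on cores that are neighbors in the allocation, even though the cores were listed in arbitrary order. A realistic mapper would also have to handle multiple tasks per node, network-specific distance (such as torus wraparound), and the optimizations the abstract mentions, none of which are modeled here.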