    Compiler Optimizations for Non-contiguous Remote Data Movement

    Abstract. Remote Memory Access (RMA) programming is one of the core concepts behind modern parallel programming languages such as UPC and Fortran 2008 or high-performance libraries such as MPI-3 One Sided or SHMEM. Many applications have to communicate non-contiguous data due to their data layout in main memory. Previous studies showed that such non-contiguous transfers can reduce communication performance by up to an order of magnitude. In this work, we demonstrate a simple scheme for statically optimizing non-contiguous RMA transfers by combining partial packing, communication overlap, and remote access pipelining. We determine accurate performance models for the various operations to find near-optimal pipeline parameters. The proposed approach is applicable to all RMA languages and does not depend on the availability of special hardware features such as scatter-gather lists or strided copies. We show that our proposed superpipelining leads to significant improvements compared to either full packing or sending each contiguous segment individually. We outline how our approach can be used to optimize non-contiguous data transfers in PGAS programs automatically. We observed a 37% performance gain over the fastest of either packing or individual sending for a realistic application.
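The trade-off between full packing, individual sends, and pipelined partial packing can be illustrated with a toy cost model. The sketch below is not the paper's performance model: it assumes a simple latency/bandwidth cost per message (`alpha + bytes * beta`), a linear packing cost per byte (`gamma`), and perfect overlap between packing one chunk and sending the previous one. All function names and parameter values are hypothetical.

```python
def send_time(nbytes, alpha, beta):
    # Assumed cost of one remote put: fixed latency plus per-byte cost.
    return alpha + nbytes * beta


def individual_sends(n, seg, alpha, beta):
    # One message per contiguous segment, no packing, no overlap.
    return n * send_time(seg, alpha, beta)


def full_packing(n, seg, alpha, beta, gamma):
    # Pack all n segments into one buffer, then send a single message.
    return n * seg * gamma + send_time(n * seg, alpha, beta)


def pipelined(n, seg, alpha, beta, gamma, k):
    # Partial packing: pack k segments at a time and send each packed chunk
    # while the next chunk is being packed (two-stage pipeline).
    pack_done = 0.0   # time at which the current chunk is fully packed
    send_done = 0.0   # time at which the current chunk is fully sent
    remaining = n
    while remaining > 0:
        segs = min(k, remaining)
        pack_done += segs * seg * gamma
        # A chunk can be sent only after it is packed and the link is free.
        send_done = max(send_done, pack_done) + send_time(segs * seg, alpha, beta)
        remaining -= segs
    return send_done


def best_chunk(n, seg, alpha, beta, gamma):
    # Exhaustive search over chunk sizes under the assumed model.
    return min(range(1, n + 1),
               key=lambda k: pipelined(n, seg, alpha, beta, gamma, k))


if __name__ == "__main__":
    # Hypothetical parameters: 1024 segments of 64 bytes each.
    n, seg = 1024, 64
    alpha, beta, gamma = 2e-6, 1e-9, 0.5e-9
    k = best_chunk(n, seg, alpha, beta, gamma)
    print("best chunk size:", k)
    print("individual:", individual_sends(n, seg, alpha, beta))
    print("full pack: ", full_packing(n, seg, alpha, beta, gamma))
    print("pipelined: ", pipelined(n, seg, alpha, beta, gamma, k))
```

In this model, choosing `k = n` degenerates to full packing, so the best pipelined configuration is never slower than full packing. Whether it also beats individual sends depends on the assumed latency and packing costs.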