3 research outputs found

    Improving the Performance of the MPI_Allreduce Collective Operation through Rank Renaming

    Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014. Collective operations, a key issue in the global efficiency of HPC applications, are optimized in current MPI libraries by choosing at runtime among a set of algorithms, based on platform-dependent parameters established beforehand, such as the message size or the number of processes. However, with progressively more cores per node, the cost of a collective algorithm must be attributed mainly to the process-to-processor mapping, because of its decisive influence on network traffic. Hierarchical design of collective algorithms seeks to minimize data movement through the slowest communication channels of the multi-core cluster. Nevertheless, the hierarchical implementation of some collectives becomes inefficient, and even impracticable, due to the definition of the operation itself. This paper proposes a new approach that starts from a frequently found regular mapping, either sequential or round-robin. While the mapping is kept, the assignment of ranks to processes is temporarily changed prior to the execution of the collective algorithm. The new assignment makes the communication pattern adapt to the hierarchy of communication channels. We explore this technique for the Ring algorithm when used in the well-known MPI_Allreduce collective, and discuss the obtained performance results. Extensions to other algorithms and collective operations are proposed. The work presented in this paper has been partially supported by the EU under the COST programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)", and by the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.
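
The effect of rank renaming on a ring pattern can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: it assumes a round-robin mapping (rank r runs on node r mod N), a process count divisible by the node count, and a ring in which each position sends to the next one. The function names and the renaming formula are illustrative assumptions.

```python
def node_of_round_robin(rank, num_nodes):
    """Node hosting a process under a round-robin mapping (assumed layout)."""
    return rank % num_nodes


def renamed_rank(rank, num_procs, num_nodes):
    """Hypothetical renaming: processes on the same node get consecutive ranks."""
    per_node = num_procs // num_nodes  # assumes num_procs % num_nodes == 0
    return (rank % num_nodes) * per_node + rank // num_nodes


def inter_node_links(ring_order, num_nodes):
    """Count ring links whose two endpoints live on different nodes.

    ring_order[i] is the original rank of the process at ring position i;
    position i sends to position (i + 1) modulo the ring length.
    """
    size = len(ring_order)
    return sum(
        node_of_round_robin(ring_order[i], num_nodes)
        != node_of_round_robin(ring_order[(i + 1) % size], num_nodes)
        for i in range(size)
    )


def renamed_ring(num_procs, num_nodes):
    """Ring order after renaming: position = new rank, value = original rank."""
    order = [None] * num_procs
    for rank in range(num_procs):
        order[renamed_rank(rank, num_procs, num_nodes)] = rank
    return order


if __name__ == "__main__":
    procs, nodes = 8, 2
    print("original ring:", inter_node_links(list(range(procs)), nodes))
    print("renamed ring: ", inter_node_links(renamed_ring(procs, nodes), nodes))
```

With 8 processes on 2 nodes, every link of the original ring crosses the network, whereas the renamed ring crosses it only twice. In a real MPI program, one way to obtain such a reordering is to call `MPI_Comm_split` with a single color and the new rank as the key; whether the paper uses that mechanism is not stated in the abstract.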

    A Fast and Efficient Algorithm for Topology-Aware Coallocation

    Abstract. Modern distributed applications require coallocation of massive amounts of resources. Grid level allocation systems must efficiently decide where these applications can be executed. To this end, the resource requests are described as labeled graphs, which must be matched with equivalent labeled graphs of available resources. The coallocation problem described in the paper has real-world requirements and inputs that differ from those of a classical graph matching problem. We propose a new algorithm to solve the coallocation problem. The algorithm is especially tailored for medium to large grid systems, and is currently being integrated into the QosCosGrid system’s allocation module.
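
To make the problem statement concrete, the sketch below matches a labeled request graph against a labeled resource graph by brute force. It is not the algorithm proposed in the paper, which the abstract does not describe. The labels (CPU counts on nodes, bandwidth on edges), the "capacity must be at least the requirement" rule, and all names are assumptions chosen for illustration.

```python
from itertools import permutations


def find_coallocation(req_nodes, req_edges, res_nodes, res_edges):
    """Return a mapping request node -> resource node, or None if none fits.

    req_nodes / res_nodes: dict of node name -> CPU count (required / available).
    req_edges / res_edges: dict of (a, b) -> bandwidth (required / available),
    treated as undirected. Exhaustive search: only usable for tiny graphs.
    """
    def bandwidth(u, v):
        # Look up an undirected edge; a missing edge means zero bandwidth.
        return res_edges.get((u, v), res_edges.get((v, u), 0))

    req_names = list(req_nodes)
    for candidate in permutations(res_nodes, len(req_names)):
        mapping = dict(zip(req_names, candidate))
        nodes_ok = all(res_nodes[mapping[n]] >= req_nodes[n] for n in req_names)
        edges_ok = all(
            bandwidth(mapping[a], mapping[b]) >= needed
            for (a, b), needed in req_edges.items()
        )
        if nodes_ok and edges_ok:
            return mapping
    return None


if __name__ == "__main__":
    request_nodes = {"t1": 4, "t2": 2}
    request_edges = {("t1", "t2"): 10}
    resource_nodes = {"A": 2, "B": 8, "C": 4}
    resource_edges = {("A", "B"): 100, ("B", "C"): 5, ("A", "C"): 1}
    print(find_coallocation(request_nodes, request_edges,
                            resource_nodes, resource_edges))
```

The exhaustive search grows factorially with the number of resources, which is why a dedicated algorithm is needed for medium to large grid systems.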