720 research outputs found

    Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

    Full text link
    Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which might result inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism is effective in reducing the impact of load imbalance and increases the parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de-facto programming system on current supercomputers. We demonstrate the effectiveness of this strategy by decoupling the reduce, particle communication, halo exchange and I/O operations in a set of scientific and data-analytics applications. A performance evaluation on 8,192 processes of a Cray XC40 supercomputer shows that the proposed approach can achieve up to 4x performance improvement.Comment: The 46th International Conference on Parallel Processing (ICPP-2017

    EXA2PRO programming environment:Architecture and applications

    Get PDF
    The EXA2PRO programming environment will integrate a set of tools and methodologies that will allow to systematically address many exascale computing challenges, including performance, performance portability, programmability, abstraction and reusability, fault tolerance and technical debt. The EXA2PRO tool-chain will enable the efficient deployment of applications in exascale computing systems, by integrating high-level software abstractions that offer performance portability and efficient exploitation of exascale systems' heterogeneity, tools for efficient memory management, optimizations based on trade-offs between various metrics and fault-tolerance support. Hence, by addressing various aspects of productivity challenges, EXA2PRO is expected to have significant impact in the transition to exascale computing, as well as impact from the perspective of applications. The evaluation will be based on 4 applications from 4 different domains that will be deployed in JUELICH supercomputing center. The EXA2PRO will generate exploitable results in the form of a tool-chain that support diverse exascale heterogeneous supercomputing centers and concrete improvements in various exascale computing challenges

    Advanced Simulation and Computing FY12-13 Implementation Plan, Volume 2, Revision 0.5

    Full text link

    Parallel programming systems for scalable scientific computing

    Get PDF
    High-performance computing (HPC) systems are more powerful than ever before. However, this rise in performance brings with it greater complexity, presenting significant challenges for researchers who wish to use these systems for their scientific work. This dissertation explores the development of scalable programming solutions for scientific computing. These solutions aim to be effective across a diverse range of computing platforms, from personal desktops to advanced supercomputers.To better understand HPC systems, this dissertation begins with a literature review on exascale supercomputers, massive systems capable of performing 10¹⁸ floating-point operations per second. This review combines both manual and data-driven analyses, revealing that while traditional challenges of exascale computing have largely been addressed, issues like software complexity and data volume remain. Additionally, the dissertation introduces the open-source software tool (called LitStudy) developed for this research.Next, this dissertation introduces two novel programming systems. The first system (called Rocket) is designed to scale all-versus-all algorithms to massive datasets. It features a multi-level software-based cache, a divide-and-conquer approach, hierarchical work-stealing, and asynchronous processing to maximize data reuse, exploit data locality, dynamically balance workloads, and optimize resource utilization. The second system (called Lightning) aims to scale existing single-GPU kernel functions across multiple GPUs, even on different nodes, with minimal code adjustments. Results across eight benchmarks on up to 32 GPUs show excellent scalability.The dissertation concludes by proposing a set of design principles for developing parallel programming systems for scalable scientific computing. These principles, based on lessons from this PhD research, represent significant steps forward in enabling researchers to efficiently utilize HPC systems
    corecore