4 research outputs found

    Final Report for Enhancing the MPI Programming Model for PetaScale Systems

    Get PDF
    This project performed research into enhancing the MPI programming model in two ways: developing improved algorithms and implementation strategies, tested and realized in the MPICH implementation, and exploring extensions to the MPI standard to better support PetaScale and ExaScale systems

    Barrier elision for production parallel programs

    Full text link
    Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situation, programmers often choose conservative synchronization patterns leading to suboptimal performance. In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution. In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers occurring in the same calling context. We present an automatic on-the-fly optimization and a multi-pass guided optimization. We apply our techniques to NWChem - a 6 million line computational chemistry code written in C/C++/Fortran that uses several runtime libraries such as Global Arrays, ComEx, DMAPP, and MPI. Our technique elides a surprisingly high fraction of barriers (as many as 63%) in production runs. This redundancy elimination translates to application speedups as high as 14% on 2048 cores. Our techniques also provided valuable insight about the application behavior, later used by NWChem developers. Overall, we demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack

    A Formal Approach to Detect Functionally Irrelevant Barriers in MPI Programs ⋆

    No full text
    Abstract. We examine the unsolved problem of automatically and efficiently detecting functionally irrelevant barriers in MPI programs. A functionally irrelevant barrier is a set of MPI_Barrier calls, one per MPI process, such that their removal does not alter the overall MPI communication structure of the program. Static analysis methods are incapable of solving this problem, as MPI programs can compute many quantities at runtime, including send targets, receive sources, tags, and communicators, and also can have data-dependent control flows. We offer an algorithm called Fib to solve this problem based on dynamic (runtime) analysis. Fib applies to MPI programs that employ 24 widely used twosided MPI operations. We show that it is sufficient to detect barrier calls whose removal causes a wildcard receive statement placed before or after a barrier to now begin matching a send statement with which it did not match before. Fib determines whether a barrier becomes relevant in any interleaving of the MPI processes of a given MPI program. Since the number of interleavings can grow exponentially with the number of processes, Fib employs a sound method to drastically reduce this number, by computing only the relevant interleavings. We show that many MPI programs do not have data dependent control flows, thus making the results of Fib applicable to all the input data the program can accept.

    Doctor of Philosophy

    Get PDF
    dissertationMessage passing (MP) has gained a widespread adoption over the years, so much so, that even heterogeneous embedded multicore systems are running programs that are developed using message passing libraries. Such a phenomenon is a shift in computing practices, since, traditionally MP programs have been developed specifically for high performance computing. With growing importance and the complexity of MP programs in today's times, it becomes absolutely imperative to have formal tools and sound methodologies that can help reason about the correctness of the program. It has been demonstrated by many researchers in the area of concurrent program verification that a suitable strategy to verify programs which rely heavily on nondeterminism, is dynamic verification. Dynamic verification integrates the best features of testing and model checking. In the area of MP program verification, however, there have been only a handful of dynamic verifiers. These dynamic verifiers, despite their strengths, suffer from the explosion in execution scenarios. All existing dynamic verifiers, to our knowledge, exhaustively explore the nondeterministic choices in an MP program. It is apparent that an MP program with many nondeterministic constructs will quickly inundate such tools. This dissertation focuses on the problem of containing the exponential space of execution scenarios (or interleavings) while providing a soundness and completeness guarantee over safety properties of MP programs (specifically deadlocks). We present a predictive verification methodology and an associated framework, called MAAPED(Messaging Application Analysis with Predictive Error Discovery), that operates in polynomial time over MP programs to detect deadlocks among other safety property violations. In brief, we collect a single execution trace of an MP program and without re-running other execution schedules, reliably construct the artifacts necessary to predict any mishappening in an unexplored execution schedule with the aforementioned formal guarantee. The main contributions of the thesis are the following: The Functionally Irrelevant Barrier Algorithm to increase program productivity and ease in verification complexity. A sound pragmatic strategy to reduce the interleaving space of existing dynamic verifiers which is complete only for a certain class of MPI programs. A generalized matches-before ordering for MP programs. A predictive polynomial time verification framework as an alternate solution in the dynamic MP verification landscape. A soundness and completeness proof for the predictive framework's deadlock detection strategy for many formally characterized classes of MP programs. In the process of developing solutions that are mentioned above, we also collected important experiences relating to the development of dynamic verification schedulers. We present those experiences as a minor contribution of this thesis
    corecore