
    MPIWiz: subgroup reproducible replay of MPI applications

    ABSTRACT Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been a particularly challenging task due to their high degree of concurrent execution and non-deterministic behavior. Deterministic replay is a potentially powerful technique for addressing these challenges, with existing MPI replay tools adopting either data-replay or order-replay approaches. Unfortunately, each approach has its trade-offs. Data-replay generates substantial log sizes by recording every communication message. Order-replay generates small logs, but requires all processes to be replayed together. We believe that these drawbacks are the primary reasons that inhibit the wide adoption of deterministic replay as the critical enabler of cyclic debugging of MPI applications. This paper describes subgroup reproducible replay (SRR), a hybrid deterministic replay method that provides the benefits of both data-replay and order-replay while balancing their trade-offs. SRR divides all processes into disjoint groups. It records the contents of messages crossing group boundaries as in data-replay, but records just message orderings for communication within a group as in order-replay. In this way, SRR can exploit the communication locality of traffic patterns in MPI applications. During replay, developers can then replay each group individually. SRR reduces recording overhead by not recording intra-group communication, and at the same time reduces replay overhead by limiting the size of each replay group. Exposing these trade-offs gives the user the necessary control for making deterministic replay practical for MPI applications. We have implemented a prototype, MPIWiz, to demonstrate and evaluate SRR. MPIWiz employs a replay framework that allows transparent binary instrumentation of both library and system calls. As a result, MPIWiz replays MPI applications with no source code modification or relinking, and handles non-determinism in both MPI and OS system calls. Our preliminary results show that MPIWiz can reduce recording overhead by over a factor of four relative to data-replay, yet without requiring the entire application to be replayed as in order-replay. Recording increases execution time by 27%, while the application can be replayed in just 53% of its base execution time.
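    The following is a minimal sketch of the recording decision the abstract describes for SRR: full payload logging for messages that cross a group boundary, ordering-only logging for messages inside a group. It is not the MPIWiz implementation (MPIWiz uses transparent binary instrumentation, whereas this sketch uses the standard PMPI profiling interface), and the group map and the helpers group_of, log_order, and log_data are hypothetical.

    /* Hedged sketch of SRR-style recording via the PMPI profiling interface;
     * link it ahead of the MPI library, no application source changes needed.
     * Simplification: ranks are taken relative to MPI_COMM_WORLD. */
    #include <mpi.h>
    #include <stdio.h>

    #define GROUP_SIZE 8                      /* assumed subgroup size */

    static int my_rank;

    static int group_of(int rank) { return rank / GROUP_SIZE; }

    /* order-replay style record: only the nondeterministic match outcome */
    static void log_order(int src, int tag) {
        fprintf(stderr, "[rank %d] ORDER src=%d tag=%d\n", my_rank, src, tag);
    }

    /* data-replay style record: a real tool would write the full payload to
     * a per-rank log file; only the metadata is printed in this sketch */
    static void log_data(int src, int tag, const void *buf, int nbytes) {
        (void)buf;
        fprintf(stderr, "[rank %d] DATA  src=%d tag=%d bytes=%d\n",
                my_rank, src, tag, nbytes);
    }

    int MPI_Init(int *argc, char ***argv) {
        int rc = PMPI_Init(argc, argv);
        PMPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        return rc;
    }

    int MPI_Recv(void *buf, int count, MPI_Datatype type, int src, int tag,
                 MPI_Comm comm, MPI_Status *status) {
        MPI_Status st;
        int rc = PMPI_Recv(buf, count, type, src, tag, comm, &st);
        if (rc == MPI_SUCCESS) {
            int tsize, recvd;
            PMPI_Type_size(type, &tsize);
            PMPI_Get_count(&st, type, &recvd);
            if (group_of(st.MPI_SOURCE) == group_of(my_rank))
                log_order(st.MPI_SOURCE, st.MPI_TAG);       /* intra-group */
            else
                log_data(st.MPI_SOURCE, st.MPI_TAG, buf, recvd * tsize);
        }
        if (status != MPI_STATUS_IGNORE) *status = st;
        return rc;
    }

    With GROUP_SIZE equal to the total number of ranks this degenerates to pure order-replay, and with GROUP_SIZE of 1 to pure data-replay, which is the control over the trade-off that the abstract says SRR exposes to the user.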

    Doctor of Philosophy

    Almost all high performance computing applications are written in MPI, which will continue to be the case for at least the next several years. Given the huge and growing importance of MPI, and the size and sophistication of MPI codes, scalable and incisive MPI debugging tools are essential. Existing MPI debugging tools have, despite their strengths, many glaring deficiencies, especially when it comes to debugging in the presence of nondeterminism-related bugs, which are bugs that do not always show up during testing. These bugs usually become manifest when the systems are ported to different platforms for production runs. This dissertation focuses on the problem of developing scalable dynamic verification tools for MPI programs that can provide a coverage guarantee over the space of MPI nondeterminism. That is, the tools should be able to detect different outcomes of nondeterministic events in an MPI program and enforce all those different outcomes through repeated executions of the program with the same test harness. We propose to achieve the coverage guarantee by introducing efficient distributed causality tracking protocols that are based on the matches-before order. The matches-before order is introduced to address the shortcomings of the Lamport happens-before order [40], which is not sufficient to capture causality for MPI program executions due to the complexity of the MPI semantics. The two protocols we propose are the Lazy Lamport Clocks Protocol (LLCP) and the Lazy Vector Clocks Protocol (LVCP). LLCP provides good scalability with a small possibility of missing potential outcomes of nondeterministic events, while LVCP provides a full coverage guarantee with a scalability trade-off. In practice, we show through our experiments that LLCP provides the same coverage as LVCP. This thesis makes the following contributions:
    • The MPI matches-before order, which captures the causality between MPI events in an MPI execution.
    • Two distributed causality tracking protocols for MPI programs that rely on the matches-before order.
    • A Distributed Analyzer for MPI programs (DAMPI), which implements the two aforementioned protocols to provide scalable and modular dynamic verification for MPI programs.
    • Scalability enhancements through algorithmic improvements for ISP, a dynamic verifier for MPI programs.
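    As a hedged illustration of the piggybacked logical clocks that causality tracking protocols of this kind build on, the sketch below shows classical Lamport-clock tracking over MPI point-to-point messages: the clock ticks on a send, travels with the message, and is merged with max() on the receive. It is not LLCP or DAMPI itself; clocked_send and clocked_recv are hypothetical helpers, and a real tool would piggyback the clock transparently through PMPI wrappers rather than changing call sites.

    /* Minimal Lamport-clock piggybacking over MPI_Send/MPI_Recv (sketch). */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    static long lamport = 0;                  /* this rank's logical clock */

    /* send: tick the clock, then prepend it to the payload */
    static void clocked_send(const void *buf, int nbytes, int dest, int tag,
                             MPI_Comm comm) {
        char *msg = malloc(sizeof(long) + nbytes);
        lamport++;                            /* tick on the send event */
        memcpy(msg, &lamport, sizeof(long));
        memcpy(msg + sizeof(long), buf, nbytes);
        MPI_Send(msg, (int)(sizeof(long) + nbytes), MPI_BYTE, dest, tag, comm);
        free(msg);
    }

    /* receive: merge the piggybacked clock with max(), then tick */
    static void clocked_recv(void *buf, int nbytes, int src, int tag,
                             MPI_Comm comm, MPI_Status *status) {
        char *msg = malloc(sizeof(long) + nbytes);
        long piggy;
        MPI_Recv(msg, (int)(sizeof(long) + nbytes), MPI_BYTE, src, tag, comm,
                 status);
        memcpy(&piggy, msg, sizeof(long));
        if (piggy > lamport) lamport = piggy; /* take the max ... */
        lamport++;                            /* ... then tick */
        memcpy(buf, msg + sizeof(long), nbytes);
        free(msg);
    }

    A vector-clock variant would replace the scalar clock with a per-rank vector merged element-wise with max. The matches-before refinement described in the abstract is needed because, roughly, for a wildcard receive (src = MPI_ANY_SOURCE) the interesting causality question is which other sends could also have matched, which plain happens-before clocks like the one above do not answer.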