
    netloc: Towards a Comprehensive View of the HPC System Topology

    The increasing complexity of High Performance Computing (HPC) server architectures and networks has made topology- and affinity-awareness a critical component of HPC application optimization. Although there is a portable mechanism for accessing the server-internal topology, there is no equally portable mechanism for accessing the network topology of modern HPC systems. The Network Locality (netloc) project provides mechanisms for portably discovering and abstractly representing the network topology of modern HPC systems. Additionally, netloc can merge the network topology with the server-internal topologies, resulting in a comprehensive map of the HPC system topology. Using a modular infrastructure, netloc supports a variety of network types and discovery techniques. By representing the network topology as a graph, netloc supports any network topology configuration. The netloc architecture hides the topology discovery mechanism from application developers, allowing them to focus on traversing and analyzing the resulting map of the HPC system topology.
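
    Since the abstract describes the merged map as a graph that the application traverses, a minimal C sketch of such a traversal is given below. The topo_node_t type and the host/switch layout are hypothetical stand-ins for illustration only, not netloc's actual API.

        /* Hypothetical sketch: walking a merged HPC topology map as a graph.
         * The types and names below are illustrative, NOT the real netloc API. */
        #include <stdio.h>

        typedef enum { NODE_HOST, NODE_SWITCH } node_type_t;

        typedef struct topo_node {
            node_type_t type;             /* host (server) or network switch        */
            const char *name;             /* e.g., hostname or switch identifier    */
            struct topo_node **neighbors; /* adjacent vertices in the network graph */
            int n_neighbors;
            int visited;
        } topo_node_t;

        /* Depth-first walk over the graph, printing each vertex once. */
        static void walk(topo_node_t *n) {
            if (n->visited) return;
            n->visited = 1;
            printf("%s: %s\n", n->type == NODE_HOST ? "host" : "switch", n->name);
            for (int i = 0; i < n->n_neighbors; i++)
                walk(n->neighbors[i]);
        }

        int main(void) {
            /* Two hosts hanging off one leaf switch. */
            topo_node_t h1 = { NODE_HOST, "node01", NULL, 0, 0 };
            topo_node_t h2 = { NODE_HOST, "node02", NULL, 0, 0 };
            topo_node_t *sw_nbrs[] = { &h1, &h2 };
            topo_node_t sw = { NODE_SWITCH, "leaf-switch-0", sw_nbrs, 2, 0 };
            topo_node_t *h_nbrs[] = { &sw };
            h1.neighbors = h_nbrs; h1.n_neighbors = 1;
            h2.neighbors = h_nbrs; h2.n_neighbors = 1;
            walk(&sw);
            return 0;
        }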

    Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems

    Scientists use advanced computing techniques to assist in answering the complex questions at the forefront of discovery. The High Performance Computing (HPC) scientific applications created by these scientists are running longer and scaling to larger systems. These applications must be able to tolerate the inevitable failure of a subset of processes (process failures) that occur as a result of pushing the reliability boundaries of HPC systems. HPC system reliability is emerging as a problem in future exascale systems, where the time to failure is measured in minutes or hours instead of days or months. Resilient applications (i.e., applications that can continue to run despite process failures) depend on resilient communication and runtime environments to sustain the application across process failures. Unfortunately, these environments are uncommon and not typically present on HPC systems. In order to preserve performance, scalability, and scientific accuracy, a resilient application may choose the invasiveness of the recovery solution, from completely transparent to completely application-directed. Therefore, resilient communication and runtime environments must provide customizable fault recovery mechanisms. Resilient applications often use rollback recovery techniques for fault tolerance: particularly popular are checkpoint/restart (C/R) techniques. HPC applications commonly use the Message Passing Interface (MPI) standard for communication. This thesis identifies a complete set of capabilities that compose to form a coordinated C/R infrastructure for MPI applications running on HPC systems. These capabilities, when integrated into an MPI implementation, provide applications with transparent, yet optionally application-configurable, fault tolerance. By adding these capabilities to Open MPI we demonstrate support for C/R process fault tolerance, automatic recovery, proactive process migration, and parallel debugging. We also discuss how this infrastructure is being used to support further research into fault tolerance.
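
    To make the "completely transparent to completely application-directed" spectrum concrete, the sketch below shows an application opting into recovery notifications via a callback. The cr_register_restart_cb() interface is a hypothetical example for illustration, not Open MPI's actual extension API; the stub here simulates the restart notification so the sketch is self-contained.

        /* Hypothetical sketch of application-directed recovery on top of a
         * transparent C/R runtime. cr_register_restart_cb() is an illustrative
         * stand-in, NOT an actual Open MPI interface. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Invoked by the (hypothetical) runtime after a restart from checkpoint,
         * letting the application refresh state it manages itself. */
        static void on_restart(void *arg) {
            int *iteration = (int *)arg;
            fprintf(stderr, "restarted at iteration %d; reopening output files\n",
                    *iteration);
        }

        /* Stub registration; a real runtime would provide this and call the
         * callback after rebuilding the process from a snapshot. */
        static void (*restart_cb)(void *);
        static void *restart_arg;
        static void cr_register_restart_cb(void (*cb)(void *), void *arg) {
            restart_cb = cb; restart_arg = arg;
        }

        int main(int argc, char **argv) {
            int iteration = 0;
            MPI_Init(&argc, &argv);
            cr_register_restart_cb(on_restart, &iteration); /* opt in to hooks */
            /* In this stub, simulate a restart notification via an env var. */
            if (getenv("RESTARTED") && restart_cb) restart_cb(restart_arg);
            for (iteration = 0; iteration < 1000; iteration++) {
                /* ... compute and communicate; checkpoints taken transparently ... */
            }
            MPI_Finalize();
            return 0;
        }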

    MPI over Scripting Languages: Usability and Performance Tradeoffs

    We present a comparative study of two popular implementations that make MPI available in MATLAB: MatlabMPI and MPITB. We evaluate their performance through micro-benchmarks on a high-performance Linux cluster and compare the results to those of their corresponding implementations on Octave, as well as to the LAM/MPI library accessed through its C API. We have found significant performance advantages to using an MPI implementation that utilizes highly tuned libraries built for high-speed interconnects such as Myrinet. However, a price must be paid in terms of higher installation and setup times and a more complicated API. We conclude that even though there are advantages to using MPI within a high-level scripting language such as MATLAB or Octave, there are important philosophical differences between the programming models of scripting languages and a relatively low-level communication library interface such as MPI. This points to the need for more sophisticated long-term support for parallel programming from the language compiler and runtime system.
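
    For reference, below is a minimal version of the kind of C micro-benchmark used in such comparisons: a two-process ping-pong that times round trips through the standard MPI C API. The message size and repetition count are arbitrary choices for illustration.

        /* Minimal ping-pong latency micro-benchmark over the MPI C API. */
        #include <mpi.h>
        #include <stdio.h>
        #include <string.h>

        #define REPS      1000
        #define MSG_BYTES 1024

        int main(int argc, char **argv) {
            int rank, size;
            char buf[MSG_BYTES];
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);
            memset(buf, 0, MSG_BYTES);

            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < REPS; i++) {
                if (rank == 0) {        /* send, then wait for the echo */
                    MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) { /* echo the message back */
                    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            if (rank == 0)
                printf("avg round trip: %g us\n",
                       (MPI_Wtime() - t0) / REPS * 1e6);
            MPI_Finalize();
            return 0;
        }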

    A checkpoint and restart service specification for Open MPI

    HPC systems are growing in both complexity and size, increasing the opportunity for system failures. Checkpoint and restart techniques are one of many fault tolerance techniques developed for such adverse runtime conditions. Because of the variety of available approaches for checkpoint and restart, HPC system libraries, such as MPI, seeking to incorporate these techniques would benefit greatly from a portable, extensible checkpoint and restart framework. This paper presents a specification for such a framework in Open MPI that allows for the integration of a variety of checkpoint/restart systems and protocols. The modular design of the framework allows researchers to contribute to specialized areas without requiring knowledge of the entirety of the code base.
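
    The modular design described here suggests a pluggable service interface of function pointers, in the style of Open MPI's component architecture. The struct below is a simplified hypothetical rendering of such an interface, not the paper's actual specification; all names and signatures are assumptions.

        /* Hypothetical sketch of a pluggable checkpoint/restart service
         * interface, in the function-pointer style of Open MPI's component
         * architecture. Illustrative only, NOT the paper's actual spec. */
        typedef int (*crs_init_fn)(void);
        typedef int (*crs_checkpoint_fn)(int pid, const char *snapshot_dir);
        typedef int (*crs_restart_fn)(const char *snapshot_dir);
        typedef int (*crs_finalize_fn)(void);

        typedef struct crs_module {
            const char       *name;        /* e.g., "blcr", "self"              */
            crs_init_fn       init;        /* prepare the service               */
            crs_checkpoint_fn checkpoint;  /* write a process snapshot          */
            crs_restart_fn    restart;     /* rebuild a process from a snapshot */
            crs_finalize_fn   finalize;    /* release resources                 */
        } crs_module_t;

        /* The framework selects one module at runtime and calls it through this
         * table, so a new checkpointer can be added without touching callers. */
        static int take_checkpoint(const crs_module_t *m, int pid, const char *dir) {
            return m->checkpoint(pid, dir); /* caller is agnostic to the backend */
        }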

    An Extensible Framework for Distributed Testing of MPI Implementations

    Complex code bases require continual testing to ensure that both new development and routine maintenance do not create unintended side effects. Automation of regression testing is a common mechanism to ensure consistency, accuracy, and repeatability of results. The MPI Testing Tool (MTT) is a flexible framework specifically designed for testing MPI implementations across multiple organizations and environments. The MTT offers a unique combination of features not available in any individual testing framework, including a built-in multiplicative effect for creating and running tests, historical correctness and performance analysis, and support for multiple cluster resource managers.
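
    To make the "multiplicative effect" concrete: a framework like MTT runs each test against every MPI build it produces (compilers × configurations × tests). A representative test is simply a small self-checking MPI program; the ring-pass check below is a hypothetical example of that kind of test, not taken from MTT's suite.

        /* A small self-checking MPI test of the kind a framework such as MTT
         * runs against every MPI build. Illustrative, not from MTT's suite. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, size, token = 0;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

            /* Pass a counter around a ring; every rank increments it once. */
            if (rank == 0) {
                token = 1;
                MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                token++;
                MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            }

            /* Self-check: after one full lap the token must equal the size. */
            if (rank == 0) {
                if (token != size) {
                    fprintf(stderr, "FAIL: expected %d, got %d\n", size, token);
                    MPI_Abort(MPI_COMM_WORLD, 1);
                }
                printf("PASS\n");
            }
            MPI_Finalize();
            return 0;
        }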

    The design and implementation of checkpoint/restart process fault tolerance for Open MPI

    To be able to fully exploit ever-larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementations that incorporated fault tolerance capabilities have been limited by a lack of modularity, scalability, and usability. This paper presents the design and implementation of an infrastructure to support checkpoint/restart fault tolerance in the Open MPI project. We identify the general capabilities required for distributed checkpoint/restart and realize these capabilities as extensible frameworks within Open MPI's modular component architecture. Our design features an abstract interface for providing and accessing fault tolerance services without sacrificing performance, robustness, or flexibility. Although our implementation includes support for some initial checkpoint/restart mechanisms, the framework is meant to be extensible and to encourage experimentation with alternative techniques within a production-quality MPI implementation.
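
    A minimal sketch of the coordination step such an infrastructure performs is shown below, expressed at the application level for clarity (the paper's infrastructure does this transparently inside the library). It assumes no outstanding nonblocking operations when the checkpoint is requested, and local_checkpoint() is a hypothetical placeholder for a single-process checkpointer such as BLCR.

        /* Minimal sketch of coordinated checkpointing: quiesce globally, take a
         * local snapshot on every rank, then resume. local_checkpoint() is a
         * hypothetical placeholder for a single-process checkpoint backend. */
        #include <mpi.h>
        #include <stdio.h>

        static int local_checkpoint(int rank) {
            /* A real backend would write this process's state to stable storage. */
            fprintf(stderr, "rank %d: snapshot written\n", rank);
            return 0;
        }

        static void coordinated_checkpoint(MPI_Comm comm) {
            int rank;
            MPI_Comm_rank(comm, &rank);
            /* 1. Quiesce: with all ranks inside the barrier and no outstanding
             *    nonblocking operations, no application messages are in flight. */
            MPI_Barrier(comm);
            /* 2. Everyone snapshots, yielding a consistent global state.        */
            local_checkpoint(rank);
            /* 3. Resume: a second barrier keeps ranks from racing ahead and
             *    sending to peers that have not finished checkpointing.         */
            MPI_Barrier(comm);
        }

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            coordinated_checkpoint(MPI_COMM_WORLD);
            MPI_Finalize();
            return 0;
        }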

    A paradigm for parallel matrix algorithms: Scalable Cholesky

    A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The underlying philosophy is to use block recursion as the exclusive control structure, down to a 2^p × 2^p base case, below which hardware favors an iterative style to fill its pipeline. Use of Morton-ordered matrices yields excellent locality within the memory hierarchy, including block sharing among distributed computers. The recursion generalizes nicely to an SPMD program where such sharing is the only communication. Cholesky factorization of an n × n SPD matrix is used as a simple nontrivial example to expose the paradigm. The program amounts to four functions, two of which are finalizers for the other two. This insight allows final blocks to be shared with inter-node communication in Θ(n^2) for an algorithm requiring Θ(n^3) flops.
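
    The block-recursive control structure is straightforward to render in code. Below is a minimal sequential C sketch of the recursion, using a flat row-major layout rather than Morton order and assuming n is a power of two; the paper's contribution lies in the Morton-ordered, distributed SPMD realization of this same pattern.

        /* Sequential sketch of block-recursive Cholesky (A = L * L^T, lower
         * triangle computed in place). Row-major storage with leading dimension
         * ld; n assumed a power of two. */
        #include <math.h>
        #include <stdio.h>

        /* Solve X * L^T = B in place (B becomes X); L is n x n lower triangular. */
        static void trsm(const double *L, double *B, int n, int ld) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double s = B[i * ld + j];
                    for (int k = 0; k < j; k++)
                        s -= B[i * ld + k] * L[j * ld + k];
                    B[i * ld + j] = s / L[j * ld + j];
                }
        }

        /* C -= B * B^T, touching only C's lower triangle. */
        static void syrk(double *C, const double *B, int n, int ld) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j <= i; j++) {
                    double s = 0.0;
                    for (int k = 0; k < n; k++)
                        s += B[i * ld + k] * B[j * ld + k];
                    C[i * ld + j] -= s;
                }
        }

        /* Block recursion on quadrants, down to a scalar base case. */
        static void chol(double *A, int n, int ld) {
            if (n == 1) { A[0] = sqrt(A[0]); return; }
            int h = n / 2;
            double *A11 = A, *A21 = A + h * ld, *A22 = A + h * ld + h;
            chol(A11, h, ld);      /* L11 = chol(A11)     */
            trsm(A11, A21, h, ld); /* L21 = A21 * L11^-T  */
            syrk(A22, A21, h, ld); /* A22 -= L21 * L21^T  */
            chol(A22, h, ld);      /* L22 = chol(A22)     */
        }

        int main(void) {
            /* A small symmetric positive-definite test matrix. */
            double A[4][4] = { { 4, 2, 0, 0 },
                               { 2, 5, 1, 0 },
                               { 0, 1, 6, 2 },
                               { 0, 0, 2, 7 } };
            chol(&A[0][0], 4, 4);
            for (int i = 0; i < 4; i++) {
                for (int j = 0; j <= i; j++) printf("%8.4f ", A[i][j]);
                printf("\n");
            }
            return 0;
        }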