
    CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

    To use future generations of supercomputers efficiently, the High Performance Computing (HPC) community must address two prime challenges: fault tolerance and power consumption. Checkpoint/Restart (CR) has been and still is the most widely used technique to deal with hard failures. Application-level CR is the most effective CR technique in terms of overhead, but it takes a lot of implementation effort. This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be used out of the box, and the library can easily be extended with more data types. As a means of overhead reduction, the library offers a built-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node-level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of the failure detection and communication recovery mechanisms. By using both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks.
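    To make the idea of application-level checkpoint/restart concrete, here is a minimal Python sketch. It is not CRAFT's API (CRAFT is a C++ library whose interface is not shown in this abstract); the function names `write_checkpoint`, `read_checkpoint`, and `run`, the JSON file format, and the `fail_at` parameter are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state to a temporary file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def read_checkpoint(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def run(path, n_iters, checkpoint_every, fail_at=None):
    """Sum 0..n_iters-1, checkpointing periodically (hypothetical workload)."""
    state = read_checkpoint(path, {"iteration": 0, "total": 0})
    while state["iteration"] < n_iters:
        if fail_at is not None and state["iteration"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["iteration"]
        state["iteration"] += 1
        if state["iteration"] % checkpoint_every == 0:
            write_checkpoint(path, state)
    return state["total"]
```

    Only the data the application actually needs (here an iteration counter and a running total) is saved, which is what keeps the overhead of this style of checkpointing low. After a failure, calling `run` again with the same path resumes from the last saved iteration.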

    Solving Mixed-integer Control Problems by Sum Up Rounding With Guaranteed Integer Gap

    Optimal control problems that involve time-dependent discrete decisions have recently received increasing attention, since they arise in practical applications with high potential for optimization. Typical examples are the choice of gears in transport problems or processes in which valves are used. We present rounding strategies for direct methods of optimal control that lead to an approximation of the objective function and constraints whose quality can be estimated by the fineness of the control discretization grid. For the first time, it is shown that a finite number of switches suffices in both the linear and the nonlinear case, even in the presence of path and control constraints. A numerical example is given to illustrate the methodology.
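    The rounding idea named in the title can be sketched for a single binary control. This sketch assumes the commonly stated form of sum-up rounding: on each grid interval the binary value is set to 1 when the accumulated integral of the relaxed control, minus the integral of the binary values already chosen, reaches half the interval length. The abstract does not spell out the rule, so the function names and the threshold form below are assumptions, not the authors' implementation.

```python
def sum_up_rounding(relaxed, dt):
    """Round relaxed controls in [0, 1] to {0, 1} on a grid with step sizes dt."""
    if len(relaxed) != len(dt):
        raise ValueError("relaxed and dt must have the same length")
    binary = []
    gap = 0.0  # integral of (relaxed - binary) accumulated so far
    for q, h in zip(relaxed, dt):
        gap += q * h
        p = 1 if gap >= 0.5 * h else 0
        gap -= p * h
        binary.append(p)
    return binary


def max_integral_gap(relaxed, binary, dt):
    """Largest absolute accumulated difference between the two controls."""
    gap = 0.0
    worst = 0.0
    for q, p, h in zip(relaxed, binary, dt):
        gap += (q - p) * h
        worst = max(worst, abs(gap))
    return worst
```

    For example, `sum_up_rounding([0.5] * 4, [0.25] * 4)` alternates between 1 and 0. Under this rule the accumulated gap stays within half the largest grid step, which illustrates how refining the control grid tightens the approximation.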

    Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

    Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation. Comment: 10 pages, 5 figures. Added reference to other recent sparse matrix formats.
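    The memory saving described above can be illustrated by counting stored matrix entries. This is a rough sketch under stated assumptions: ELLPACK-style storage pads every row to the longest row, while the pJDS-like variant here sorts rows by length in descending order and pads each block of `block_rows` consecutive rows only to the longest row in that block. The function names, the block size parameter, and the example row lengths are hypothetical, and index arrays and the padding of a partial last block are ignored.

```python
def ellpack_slots(row_lengths):
    """Stored entries when every row is padded to the longest row."""
    if not row_lengths:
        return 0
    return len(row_lengths) * max(row_lengths)


def pjds_slots(row_lengths, block_rows):
    """Stored entries when sorted rows are padded per block of block_rows rows."""
    if block_rows < 1:
        raise ValueError("block_rows must be at least 1")
    ordered = sorted(row_lengths, reverse=True)
    total = 0
    for start in range(0, len(ordered), block_rows):
        block = ordered[start:start + block_rows]
        # After sorting, the first row of each block is its longest row.
        total += len(block) * block[0]
    return total


def relative_saving(row_lengths, block_rows):
    """Fraction of ELLPACK-style slots that block-wise padding avoids."""
    dense = ellpack_slots(row_lengths)
    return 1.0 - pjds_slots(row_lengths, block_rows) / dense
```

    With one long row and many short ones, for instance `row_lengths = [8, 1, 1, 2, 1, 1, 1, 1]` and `block_rows = 4`, the ELLPACK-style layout stores 64 slots while the block-wise layout stores 36. The saving grows with the spread in row lengths, which is consistent with the abstract's observation that the benefit depends on the sparsity pattern.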

    The Network Origins of Economic Growth

    In this paper, we propose a new approach to represent a country's outward orientation. Prior work mostly uses indicators of aggregate trade intensity, trade policy or trade restrictiveness. Our approach offers a broader perspective as it measures a country's level of integration not only by its set of direct trade connections with the rest of the world but also through the full architecture of its second, third, and all other higher-order connections. We apply our methodology to a sample of 167 countries spanning the period from 1962 to 2009 and perform a Bayesian model-averaging analysis on the determinants of growth. We find a prominent positive effect of integration on a country's level of per capita income, while the aforementioned traditional measures of outward orientation display only a secondary, largely insignificant, weight. This, we argue, highlights the network basis of economic growth and adds a novel perspective to the notion of economic openness. We also perform several sensitivity checks and conclude that our baseline findings are extremely robust to different data input and alternative assumptions about the computation of country integration.

    GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

    While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack. Comment: 32 pages, 11 figures.