Search CORE

46 research outputs found

CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

Author: Hager Georg
Kreutzer Moritz
Shahzad Faisal
Thies Jonas
Wellein Gerhard
Zeiser Thomas
Publication venue
Publication date: 07/08/2017
Field of study

In order to efficiently use the future generations of supercomputers, fault tolerance and power consumption are two of the prime challenges anticipated by the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has been and still is the most widely used technique to deal with hard failures. Application-level CR is the most effective CR technique in terms of overhead efficiency but it takes a lot of implementation effort. This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be directly used out of the box. The library can be easily extended to add more data types. As means of overhead reduction, the library offers a build-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of failure detection and communication recovery mechanism. By utilizing both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Author: Basermann Achim
Bishop Alan R.
Fehske Holger
Hager Georg
Kreutzer Moritz
Wellein Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.Comment: 10 pages, 5 figures. Added reference to other recent sparse matrix format

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

Author: Basermann Achim
Fehske Holger
Galgon Martin
Hager Georg
Kreutzer Moritz
Pieper Andreas
Röhrig-Zöllner Melven
Shahzad Faisal
Thies Jonas
Wellein Gerhard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack.Comment: 32 pages, 11 figure

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Preconditioned Krylov solvers on GPUs

Author: Anzt Hartwig
Dongarra Jack
Gates Mark
Kreutzer Moritz
Köhler Martin
Wellein Gerhard
Publication venue: North-Holland Publishing
Publication date: 29/01/2018
Field of study

KITopen

Performance Engineering for Sparse Eigensolers on Heterogenous Clusters

Author: Basermann Achim
Hager Georg
Kreutzer Moritz
Röhrig-Zöllner Melven
Thies Jonas
Wellein Gerhard
Publication venue
Publication date: 04/04/2016
Field of study

Institute of Transport Research:Publications

ESSEX: Equipping Sparse Solvers for Exascale

Author: Alvermann Andreas
Basermann Achim
Fehske Holger
Galgon Martin
Hager Georg
Kreutzer Moritz
Krämer Lukas
Lang Bruno
Pieper Andreas
Röhrig-Zöllner Melven
Shahzad Faisal
Thies Jonas
Wellein Gerhard
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014
Field of study

The ESSEX project investigates computational issues arising at exascale for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The project pursues a coherent co-design of all software layers where a holistic performance engineering process guides code development across the classic boundaries of application, numerical method and basic kernel library. Within ESSEX the numerical methods cover both widely applicable solvers such as classic Krylov, Jacobi-Davidson or recent FEAST methods as well as domain specific iterative schemes relevant for the ESSEX quantum physics application. This report introduces the project structure and presents selected results which demonstrate the potential impact of ESSEX for efficient sparse solvers on highly scalable heterogeneous supercomputers

Institute of Transport Research:Publications

GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems

Author: A Pieper
A Pieper
A Stathopoulos
Achim Basermann
Andreas Pieper
C Augonnet
C Chevalier
CG Baker
DP O’Leary
E Chow
E Polizzi
Faisal Shahzad
G Hager
G Schofield
G Schubert
Georg Hager
Gerhard Wellein
GW Stewart
Holger Fehske
Jonas Thies
LS Blackford
M Galgon
M Kreutzer
M Röhrig-Zöllner
Martin Galgon
Melven Röhrig-Zöllner
Moritz Kreutzer
P Ghysels
R Lehoucq
S Kaczmarz
S Tabik
S Williams
TC Oppe
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref