143 research outputs found

Performance Engineering for Real and Complex Tall & Skinny Matrix Multiplication Kernels on GPUs

Author: Ernst Dominik
Hager Georg
Thies Jonas
Wellein Gerhard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. NVIDIA's current CUBLAS implementation delivers only a fraction of the potential performance as indicated by the roofline model in this case. We describe the challenges and key characteristics of an implementation that can achieve close to optimal performance. We further evaluate different strategies of parallelization and thread distribution, and devise a flexible, configurable mapping scheme. To ensure flexibility and allow for highly tailored implementations we use code generation combined with autotuning. For a large range of matrix sizes in the domain of interest we achieve at least 2/3 of the roofline performance and often substantially outperform state-of-the-art CUBLAS results on an NVIDIA Volta GPGPU. Comment: 12 pages, 22 figures. Extended version of arXiv:1905.03136v1 for journal submission.
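The roofline bound mentioned in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes the product has the form C = A^T B with A of size k x m and B of size k x n (k much larger than m and n), that every matrix entry moves through memory exactly once, and it uses hypothetical placeholder values for peak performance and memory bandwidth.

```python
def arithmetic_intensity(m, n, k, complex_entries=False):
    """Flop per byte for C = A^T B with A (k x m) and B (k x n), assuming k >> m, n."""
    bytes_per_entry = 16 if complex_entries else 8   # double-precision complex or real
    flops_per_update = 8 if complex_entries else 2   # one multiply-add per (i, j, l) triple
    flops = flops_per_update * m * n * k
    # Assumption: each entry of A and B is read once and C is written once.
    traffic = bytes_per_entry * (k * (m + n) + m * n)
    return flops / traffic


def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped by compute peak or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    # Hypothetical hardware numbers, chosen only for illustration.
    peak, bandwidth = 7000.0, 800.0
    for width in (1, 4, 16, 64):
        intensity = arithmetic_intensity(width, width, 10**7)
        bound = roofline_gflops(intensity, peak, bandwidth)
        print(f"m = n = {width:3d}: {intensity:6.3f} flop/byte, bound {bound:8.1f} GFlop/s")
```

Under these assumptions the intensity grows roughly like m*n / (4*(m + n)) flop/byte for real entries, so narrow matrices are bound by memory bandwidth while wider ones move towards the compute limit.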

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

OPEN FAU Online-Publikationssystem der Friedrich-Alexander-Universität Erlangen-Nürnberg

Design of a parallel hybrid direct/iterative solver for CFD problems

Author: Thies Jonas
Wubs Fred
Publication venue: University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science
Publication date: 01/01/2011

We discuss the parallel implementation of a hybrid direct/iterative solver for a special class of saddle point matrices arising from the discretization of the steady Navier-Stokes equations on an Arakawa C-grid, the F-matrices. The two-level method described here has the following properties: (i) it is very robust, even at comparatively high Reynolds numbers; (ii) a single parameter controls fill and convergence, making the method straightforward to use; (iii) the convergence rate is independent of the number of unknowns; (iv) it can be implemented on distributed memory machines in a natural way; (v) the matrix on the second level has the same structure and numerical properties as the original problem, so the method can be applied recursively. The implementation focusses on generality, modularity, code reuse and recursiveness. The solver is implemented using building blocks of the Trilinos libraries. We show its performance on a parallel computer for the Navier-Stokes equations.

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

University of Groningen Digital Archive

Dissertations of the University of Groningen

Exascale Sparse Eigensolver Developments for Quantum Physics Applications

Author: Basermann Achim
Thies Jonas
Publication date: 19/09/2019

In the German Research Foundation (DFG) project ESSEX (Equipping Sparse Solvers for Exascale), we develop scalable sparse eigensolver libraries for large quantum physics problems. Partners in ESSEX are the Universities of Erlangen, Greifswald, Wuppertal, Tokyo and Tsukuba as well as DLR. The project pursues a coherent co-design of all software layers where a holistic performance engineering process guides code development across the classic boundaries of application, numerical method and basic kernel library. The basic building block library supports an elaborate MPI+X approach that is able to fully exploit hardware heterogeneity while exposing functional parallelism and data parallelism to all other software layers in a flexible way. The advanced building blocks were defined and employed by the developments at the algorithms layer. Here, ESSEX provides state-of-the-art library implementations of classic linear sparse eigenvalue solvers including block Jacobi-Davidson, Kernel Polynomial Method (KPM), and Chebyshev filter diagonalization (ChebFD) that are ready to use for production on modern heterogeneous compute nodes with best performance and numerical accuracy. Research in this direction included the development of appropriate parallel adaptive AMG software for the block Jacobi-Davidson method. Contour integral-based approaches were also covered in ESSEX and were extended in two directions: The FEAST method was further developed for improved scalability, and the Sakurai-Sugiura method (SSM) was extended to nonlinear sparse eigenvalue problems. These developments were strongly supported by Japanese project partners from University of Tokyo, Computer Science, and University of Tsukuba, Applied Mathematics. The applications layer delivers scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators.

Institute of Transport Research:Publications

Wind-assisted, electric, and pure wind propulsion - the path towards zero-emission RoRo ships

Author: Ringsberg Jonas
Thies Fabian
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2021

Electrical and wind propulsion, together with energy stored in batteries and renewable energies harnessed onboard, can lead the way towards zero-emission ships. This study compares wind propulsion solutions and battery storage possibilities for a RoRo ship operating in the Baltic Sea. The ship energy systems simulation model ShipCLEAN is used to predict the performance of the zero-emission ship in real-life operating conditions. The study showcases how a ship can be converted from a conventional, diesel-powered ship to a zero-emission ship. For the zero-emission ship, all energy needed for auxiliaries and propulsion is taken from renewable sources onboard or from batteries. Challenges and opportunities, as well as necessary adaptations of the route and logistics, are discussed. Results of the study show which wind propulsion technology is the most suitable for the example RoRo ship, and how the installation of suitably sized battery packs for zero-emission operation affects the cargo capacity of the ship.

Chalmers Research

Retrofitting WASP to a RoPax vessel—design, performance and uncertainties

Author: Ringsberg Jonas
Thies Fabian
Publication venue: 'MDPI AG'
Publication date: 01/01/2023

Wind-assisted propulsion (WASP) is one of the most promising ship propulsion alternatives that radically reduce greenhouse gas emissions and are available today. Using the example of a RoPax ferry, this study presents the performance potential of WASP systems under realistic weather conditions. Different design alternatives and system layouts are discussed. Further, uncertainties in the performance prediction of WASP systems are analyzed. Included in the analysis are the sail forces as well as the aero- and hydrodynamic interaction effects, i.e., the sail–sail and sail–deck interaction as well as the drift and yaw of the ship. As a result, this study provides guidelines on the most important parameters when designing and modeling a WASP ship. Finally, the study presents an analysis of the expected accuracy of the employed empirical/analytical performance prediction model ShipCLEAN.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Chalmers Research

CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

Author: Hager Georg
Kreutzer Moritz
Shahzad Faisal
Thies Jonas
Wellein Gerhard
Zeiser Thomas
Publication date: 07/08/2017

In order to efficiently use the future generations of supercomputers, fault tolerance and power consumption are two of the prime challenges anticipated by the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has been and still is the most widely used technique to deal with hard failures. Application-level CR is the most effective CR technique in terms of overhead efficiency but it takes a lot of implementation effort. This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be directly used out of the box. The library can be easily extended to add more data types. As a means of overhead reduction, the library offers a built-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node-level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of failure detection and communication recovery mechanisms. By utilizing both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks.
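The idea of application-level checkpoint/restart described in this abstract can be sketched in a few lines. The example below is a generic illustration in Python and does not use or imitate the CRAFT API (CRAFT is a C++ library); the function names, the pickle file format, and the simulated failure are all assumptions made for the sketch.

```python
import os
import pickle
import tempfile


def write_checkpoint(path, state):
    """Write the state to a temporary file, then rename it so readers never see a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same file system


def read_checkpoint(path, default):
    """Return the last saved state, or the default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run(path, n_steps, checkpoint_every=10, fail_at=None):
    """Sum the integers 0..n_steps-1, saving the loop state every few iterations."""
    state = read_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated hard failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            write_checkpoint(path, state)
    return state["total"]


def demo():
    """Crash once at step 25, then restart from the last checkpoint (step 20)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "state.ckpt")
        try:
            run(path, 50, fail_at=25)
        except RuntimeError:
            pass  # the restarted run below picks up the saved state
        return run(path, 50)
```

The application decides what belongs in the checkpoint (here only the loop counter and the running sum), which is what keeps the overhead small compared with saving the whole process image.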

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Development of a ship performance model for power estimation of inland waterway vessels

Author: Ringsberg Jonas
Thies Fabian
Zhang Chengqian
Publication date: 01/01/2023

A ship performance model is an important factor in energy-efficient navigation. It formulates a speed–power relationship that can be used to adjust the engine loads for dynamic energy optimisation. However, currently available models have been developed for sea-going vessels, where the environmental conditions are significantly different from those experienced on inland waterways. Inland waterway shipping has great potential to become a mode of transport that can both improve safety and reduce emissions. Therefore, this paper presents the development of an energy performance model specifically for inland waterway vessels (IWVs). The holistic ship energy system model is based on empirical methods, from resistance to engine performance prediction, established in a modular code architecture. The resistance and propulsion prediction in confined waterways is captured by a newly developed method that superposes shallow-water and bank effects. Verification against model tests and high-fidelity simulations indicates that the selected empirical methods achieved good accuracy for predicting ship performance. The resistance prediction error was 5.2% for single vessels and 8% for pusher-barge convoys based on empirical methods. The results of a case study investigating the performance of a self-propelled vessel under dynamic waterway data indicate that the developed model could be used for onboard power monitoring and energy optimisation during operation.

Chalmers Research

Software and Performance Engineering for Iterative Eigensolvers

Author: Thies Jonas
Publication date: 27/06/2017

The complexity of the latest HPC architectures increasingly limits the productivity of researchers in numerical algorithms and the 'time to market' for parallel algorithms. Implementing a new method on a supercomputer today involves at least three levels of parallelism and typically several programming models like MPI, OpenMP and CUDA. Frameworks like Trilinos and PETSc have for many years been useful for testing new ideas in parallel algorithms. But when it comes to, e.g., CPU/GPU clusters, they fail to deliver convincing performance to date. We look at sparse solvers from a software engineer's point of view and advocate a programming model we call 'SPMD+OK', introducing performance models in the test-driven development process and a strategy to facilitate the integration of algorithmic developments into existing applications. Harnessing the peta-scale with the block Jacobi-Davidson method is used as a running example in this talk, and the libraries GHOST and PHIST are presented (https://bitbucket.org/essex/).

Institute of Transport Research:Publications