18 research outputs found
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Multicoloring of grid-structured PDE solvers on shared-memory multiprocessors
In order to execute a parallel PDE (partial differential equation) solver on a shared-memory multiprocessor, we have to avoid memory conflicts in accessing multidimensional data grids. A new multicoloring technique is proposed for speeding up sparse matrix operations. The new technique enables parallel access to grid-structured data elements in the shared memory without causing conflicts. The coloring scheme is formulated as an algebraic mapping which can be easily implemented with low overhead on commercial multiprocessors. The proposed multicoloring scheme has been tested on an Alliant FX/80 multiprocessor for solving 2D and 3D problems using the CGNR method. Compared to the results reported by Saad (1989) on an identical Alliant system, our results show 30 times higher performance in Mflops. Multicoloring transforms sparse matrices into ones with a diagonal diagonal block (DDB) structure, enabling parallel LU decomposition in solving PDE problems. The multicoloring technique can also be extended to solve other scientific problems characterized by sparse matrices.
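The abstract does not state the algebraic mapping itself. As a minimal sketch of the general idea, assuming the classical coloring of a 2D grid with a 5-point stencil, the color of a point can be computed as a linear form of its indices taken modulo the number of colors. The function names below are hypothetical and this is not the paper's scheme; it only illustrates why points of one color can be updated concurrently without touching each other's neighbors.

```python
def color(i, j, num_colors=2):
    """Algebraic color of grid point (i, j): a linear form modulo num_colors."""
    return (i + j) % num_colors


def neighbors(i, j, n):
    """Yield the 5-point-stencil neighbors of (i, j) inside an n x n grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < n and 0 <= nj < n:
            yield ni, nj


def is_conflict_free(n, num_colors=2):
    """True if no grid point shares a color with any of its stencil neighbors."""
    return all(
        color(i, j, num_colors) != color(ni, nj, num_colors)
        for i in range(n)
        for j in range(n)
        for ni, nj in neighbors(i, j, n)
    )


def points_by_color(n, num_colors=2):
    """Group grid points by color; each group can be processed concurrently."""
    groups = [[] for _ in range(num_colors)]
    for i in range(n):
        for j in range(n):
            groups[color(i, j, num_colors)].append((i, j))
    return groups
```

With two colors this is the familiar red-black ordering: `points_by_color(4)` splits the 16 grid points into two groups of 8, and every neighbor of a point in one group lies in the other group.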
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
Automatic Performance Optimization of Stencil Codes
Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only a few computations are performed while a comparatively large number of values have to be accessed, i.e., stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed.
In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target basically any stencil code, while others are more specialized. For example, support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements while, on the other hand, it simplifies an explicit vectorization.
Other notable optimizations described in detail are redundancy elimination techniques to eliminate common subexpressions both in a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson’s equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal and non-Newtonian fluid flow.
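For illustration, an unoptimized three-dimensional Jacobi sweep for Poisson's equation might look like the sketch below. It assumes a uniform grid spacing `h`, the standard 7-point discretization of -Δu = f, and fixed boundary values; the function names are hypothetical, and none of the optimizations described in the abstract (tiling, layout transformation, vectorization) are applied. Each interior update reads six neighbor values and one right-hand-side value but performs only a handful of arithmetic operations, which is the low computational intensity the abstract refers to.

```python
def make_grid(n, value=0.0):
    """Create an n x n x n grid (nested lists) filled with a constant value."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def jacobi_sweep(u, f, h):
    """Return a new grid after one Jacobi sweep for -laplace(u) = f.

    u and f are nested lists indexed [i][j][k]; boundary values of u stay fixed.
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    # Copy the grid so that every update reads only values from the old iterate.
    new = [[row[:] for row in plane] for plane in u]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                neighbor_sum = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                )
                # Solve (6*u_center - neighbor_sum) / h^2 = f for u_center.
                new[i][j][k] = (neighbor_sum + h * h * f[i][j][k]) / 6.0
    return new
```

A quick sanity check is that a constant field with zero right-hand side is left unchanged by the sweep, since the average of six equal neighbors is that same value.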
Geometric multigrid for the gyrokinetic Poisson equation from fusion plasma applications
In order to face climate change and to preserve our ecosystem, we have to reduce the overall emission of carbon dioxide into the atmosphere.
A promising addition to renewable energies is nuclear fusion. Delivering an almost infinite amount of clean and safe energy from almost inexhaustible resources on Earth, plasma fusion would solve all the world's climate and energy problems. However, the reaction is extremely complex and unstable, and it cannot yet be maintained for a sufficiently long time.
As the construction and operation of fusion reactors, e.g. tokamaks, is exceptionally expensive, numerical simulations are required in order to increase our knowledge about the fusion process.
One existing code for plasma simulations in a tokamak is called GyselaX, in which a subproblem consists of solving a two-dimensional Poisson equation on many cross-sections of the reactor geometry.
The EoCoE (Energy Oriented Center of Excellence: toward exascale for energy) project, funded by the European Commission, aims for the improvement of the current solver for this equation in order to reduce the simulation times.
In [1] and [2], a geometric multigrid approach using finite differences for the discretization and a combined line smoothing procedure has been developed.
Additionally, an implicit extrapolation technique is used to increase the approximation order of the solution.
In this master's thesis, this GmgPolar solver is detailed and implemented in C++. Moreover, several improvements have been applied to the solver and some parts of the code have been parallelised.
As the full optimization and parallelisation exceed the scope of this thesis, future work will be required before comparing the solver with two other possible approaches and integrating it into GyselaX to reduce the simulation time.
[1] Kühn, M. J.; Kruse, C.; Rüde, U. Energy-Minimizing, Symmetric Discretizations for Anisotropic Meshes and Energy Functional Extrapolation, SIAM J. Sci. Comput., Vol. 43(4), pp. A2448-A2473 (2021).
[2] Kühn, M. J.; Kruse, C.; Rüde, U. Implicitly extrapolated geometric multigrid on disk-like domains for the gyrokinetic Poisson equation from fusion plasma applications, Preprint: https://hal.archives-ouvertes.fr/hal-03003307/, Submitted to Journal of Scientific Computing, 2021.
The Sixth Copper Mountain Conference on Multigrid Methods, part 1
The Sixth Copper Mountain Conference on Multigrid Methods was held on 4-9 Apr. 1993, at Copper Mountain, CO. This book is a collection of many of the papers presented at the conference and as such represents the conference proceedings. NASA LaRC graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy in this field is amply expressed in these important papers, and the collection clearly shows its rapid trend to further diversity and depth
Software for Exascale Computing - SPPEXA 2016-2019
This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.
Investigating Schwarz domain decomposition based preconditioners for efficient geophysical electromagnetic field simulation
In this thesis, I researched and implemented a number of Schwarz domain decomposition
algorithms with the intent of finding an efficient method to solve the geophysical
EM problem. I began by using finite difference and finite element discretizations
to investigate the domain decomposition algorithms for the Poisson problem. I found
that the Schwarz methods were best used as a preconditioner to a Krylov iteration.
The optimized Schwarz (OS) preconditioner outperformed the related restricted additive
Schwarz (RAS) preconditioner and both of the local and global OS fixed point
iterations. Using finite differences, the OS preconditioner performed much better than
the RAS preconditioner, but using finite elements in parallel with the FEniCS assembly
library, their performance was similar. The FEniCS library automatically partitions
the global mesh into subdomains and produces irregular partition boundaries. By
creating a serial rectangular subdomain code in FEniCS, I regained the benefit of
the OS preconditioner, suggesting that the irregular partitioning scheme was detrimental
to the convergence behaviour of the OS preconditioner. Based on my work
for the Poisson problem, I decided to attempt both a RAS and OS preconditioned
GMRES iteration for the electromagnetic problem. Due to the unstructured meshes
and source/receiver refinement used in EM modelling I could not avoid the irregular
mesh partitioning, and the OS preconditioner lagged the RAS preconditioner in
terms of iteration count. On the bright side, the RAS preconditioner worked very
well, and outperformed any of the preconditioners bundled with PETSc in terms of
both iteration count and time to solution.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.