551 research outputs found
Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
We investigate implementation of lattice Quantum Chromodynamics (QCD) code on
the Intel Xeon Phi Knights Landing (KNL). The most time consuming part of the
numerical simulations of lattice QCD is a solver of linear equation for a large
sparse matrix that represents the strong interaction among quarks. To establish
widely applicable prescriptions, we examine rather general methods for the SIMD
architecture of KNL, such as using intrinsics and manual prefetching, to the
matrix multiplication and iterative solver algorithms. Based on the performance
measured on the Oakforest-PACS system, we discuss the performance tuning on KNL
as well as the code design for facilitating such tuning on SIMD architecture
and massively parallel machines.Comment: 8 pages, 12 figures. Talk given at LHAM'17 "5th International
Workshop on Legacy HPC Application Migration" in CANDAR'17 "The Fifth
International Symposium on Computing and Networking" and to appear in the
proceeding
Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio
Many problems in fluid modelling require the efficient solution of highly
anisotropic elliptic partial differential equations (PDEs) in "flat" domains.
For example, in numerical weather- and climate-prediction an elliptic PDE for
the pressure correction has to be solved at every time step in a thin spherical
shell representing the global atmosphere. This elliptic solve can be one of the
computationally most demanding components in semi-implicit semi-Lagrangian time
stepping methods which are very popular as they allow for larger model time
steps and better overall performance. With increasing model resolution,
algorithmically efficient and scalable algorithms are essential to run the code
under tight operational time constraints. We discuss the theory and practical
application of bespoke geometric multigrid preconditioners for equations of
this type. The algorithms deal with the strong anisotropy in the vertical
direction by using the tensor-product approach originally analysed by B\"{o}rm
and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the
analysis to three dimensions under slightly weakened assumptions, and
numerically demonstrate its efficiency for the solution of the elliptic PDE for
the global pressure correction in atmospheric forecast models. For this we
compare the performance of different multigrid preconditioners on a
tensor-product grid with a semi-structured and quasi-uniform horizontal mesh
and a one dimensional vertical grid. The code is implemented in the Distributed
and Unified Numerics Environment (DUNE), which provides an easy-to-use and
scalable environment for algorithms operating on tensor-product grids. Parallel
scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR
supercomputer.Comment: 22 pages, 6 Figures, 2 Table
A domain decomposing parallel sparse linear system solver
The solution of large sparse linear systems is often the most time-consuming
part of many science and engineering applications. Computational fluid
dynamics, circuit simulation, power network analysis, and material science are
just a few examples of the application areas in which large sparse linear
systems need to be solved effectively. In this paper we introduce a new
parallel hybrid sparse linear system solver for distributed memory
architectures that contains both direct and iterative components. We show that
by using our solver one can alleviate the drawbacks of direct and iterative
solvers, achieving better scalability than with direct solvers and more
robustness than with classical preconditioned iterative solvers. Comparisons to
well-known direct and iterative solvers on a parallel architecture are
provided.Comment: To appear in Journal of Computational and Applied Mathematic
- …