Performance analysis and optimization of the JOREK code for many-core CPUs
This report investigates the performance of the JOREK code on the Intel
Knights Landing and Skylake processor architectures. The OpenMP scaling of the
matrix construction part of the code was analyzed, and improved synchronization
methods were implemented. A new switch was implemented to control the number of
threads used for the linear equation solver independently from other parts of
the code. The matrix construction subroutine was vectorized, and the data
locality was also improved. These steps led to a factor-of-two speedup for the
matrix construction.
Enhanced Preconditioner for JOREK MHD Solver
The JOREK extended magneto-hydrodynamic (MHD) code is a widely used
simulation code for studying the non-linear dynamics of large-scale
instabilities in divertor tokamak plasmas. Due to the large scale-separation
intrinsic to these phenomena both in space and time, the computational costs
for simulations in realistic geometry and with realistic parameters can be very
high, motivating considerable investment in optimization. This article
describes a set of developments to the JOREK solver and preconditioner that
lead to significant overall benefits for large production simulations. These
comprise, in particular, enhanced convergence in highly non-linear scenarios
and a general reduction of memory consumption and computational costs. The
developments include faster construction of
preconditioner matrices, a domain decomposition of preconditioning matrices for
solver libraries that can handle distributed matrices, interfaces for
additional solver libraries, an option to use matrix compression methods, and
the implementation of a complex solver interface for the preconditioner. The
most significant development presented is a generalization of the
physics-based preconditioner to "mode groups", which makes it possible to
account for the dominant interactions between toroidal Fourier modes in highly
non-linear simulations. At the cost of a moderate increase in memory
consumption, the technique can strongly enhance convergence in suitable cases,
allowing significantly larger time steps to be used. For all developments,
benchmarks based on typical simulation cases demonstrate the resulting
improvements.
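
The "mode groups" idea can be illustrated with a small toy problem. The sketch below is not taken from JOREK. It assumes a baseline preconditioner that keeps one diagonal block per toroidal mode and compares it with a variant that also keeps the coupling blocks inside each group of modes. The matrix, the grouping, and the names build_toy_matrix, group_preconditioner and iteration_spectral_radius are all hypothetical and chosen only for illustration.

```python
import numpy as np

BLOCK = 2  # unknowns per mode in this toy problem (assumption)


def build_toy_matrix(n_modes=4, strong=2.0, weak=0.1):
    """Toy system: one diagonal block per mode, strong coupling inside the
    mode pairs (0, 1) and (2, 3), weak coupling between all other pairs."""
    diag_block = np.array([[4.0, 1.0], [1.0, 4.0]])
    eye = np.eye(BLOCK)
    A = np.zeros((n_modes * BLOCK, n_modes * BLOCK))
    for i in range(n_modes):
        for j in range(n_modes):
            rows = slice(i * BLOCK, (i + 1) * BLOCK)
            cols = slice(j * BLOCK, (j + 1) * BLOCK)
            if i == j:
                A[rows, cols] = diag_block
            elif i // 2 == j // 2:
                A[rows, cols] = strong * eye
            else:
                A[rows, cols] = weak * eye
    return A


def group_preconditioner(A, groups, block_size=BLOCK):
    """Keep only the entries of A that couple modes within the same group."""
    M = np.zeros_like(A)
    for group in groups:
        idx = np.concatenate(
            [np.arange(m * block_size, (m + 1) * block_size) for m in group]
        )
        M[np.ix_(idx, idx)] = A[np.ix_(idx, idx)]
    return M


def iteration_spectral_radius(A, M):
    """Spectral radius of I - M^{-1} A. A smaller value means the
    preconditioned stationary iteration converges faster."""
    T = np.eye(A.shape[0]) - np.linalg.solve(M, A)
    return float(np.max(np.abs(np.linalg.eigvals(T))))


A = build_toy_matrix()
M_single = group_preconditioner(A, [[0], [1], [2], [3]])  # one block per mode
M_grouped = group_preconditioner(A, [[0, 1], [2, 3]])     # two mode groups

rho_single = iteration_spectral_radius(A, M_single)
rho_grouped = iteration_spectral_radius(A, M_grouped)
print("one block per mode:", rho_single)
print("two mode groups:   ", rho_grouped)
```

For this toy matrix the grouped variant yields a clearly smaller spectral radius, because the strong couplings are now part of the preconditioner. The price is larger blocks to store and factorize, which mirrors the trade-off between memory consumption and convergence described in the abstract.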