4 research outputs found

    Performance analysis and optimization of the JOREK code for many-core CPUs

    Get PDF
    This report investigates the performance of the JOREK code on the Intel Knights Landing and Skylake processor architectures. The OpenMP scaling of the matrix construction part of the code was analyzed and improved synchronization methods were implemented. A new switch was implemented to control the number of threads used for the linear equation solver independently from other parts of the code. The matrix construction subroutine was vectorized, and the data locality was also improved. These steps led to a factor of two speedup for the matrix construction

    Enhanced Preconditioner for JOREK MHD Solver

    Get PDF
    The JOREK extended magneto-hydrodynamic (MHD) code is a widely used simulation code for studying the non-linear dynamics of large-scale instabilities in divertor tokamak plasmas. Due to the large scale-separation intrinsic to these phenomena both in space and time, the computational costs for simulations in realistic geometry and with realistic parameters can be very high, motivating the investment of considerable effort for optimization. In this article, a set of developments regarding the JOREK solver and preconditioner is described, which lead to overall significant benefits for large production simulations. This comprises in particular enhanced convergence in highly non-linear scenarios and a general reduction of memory consumption and computational costs. The developments include faster construction of preconditioner matrices, a domain decomposition of preconditioning matrices for solver libraries that can handle distributed matrices, interfaces for additional solver libraries, an option to use matrix compression methods, and the implementation of a complex solver interface for the preconditioner. The most significant development presented consists in a generalization of the physics based preconditioner to "mode groups", which allows to account for the dominant interactions between toroidal Fourier modes in highly non-linear simulations. At the cost of a moderate increase of memory consumption, the technique can strongly enhance convergence in suitable cases allowing to use significantly larger time steps. For all developments, benchmarks based on typical simulation cases demonstrate the resulting improvements

    Performance analysis and optimization of the JOREK code for many-core CPUs

    No full text
    This report investigates the performance of the JOREK code on the Intel Knights Landing and Skylake processor architectures. The OpenMP scaling of the matrix construction part of the code was analyzed and improved synchronization methods were implemented. A new switch was implemented to control the number of threads used for the linear equation solver independently from other parts of the code. The matrix construction subroutine was vectorized, and the data locality was also improved. These steps led to a factor of two speedup for the matrix construction

    Performance analysis and optimization of the JOREK code for many-core CPUs

    No full text
    This report investigates the performance of the JOREK code on the Intel Knights Landing and Skylake processor architectures. The OpenMP scaling of the matrix construction part of the code was analyzed and improved synchronization methods were implemented. A new switch was implemented to control the number of threads used for the linear equation solver independently from other parts of the code. The matrix construction subroutine was vectorized, and the data locality was also improved. These steps led to a factor of two speedup for the matrix construction
    corecore