From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming
Early Performance Assessment of the ThunderX2 Processor for Lattice Based Simulations
This paper presents an early performance assessment of the ThunderX2, the most recent Arm-based multi-core processor designed for HPC applications. As benchmarks we use well-known stencil-based LBM and LQCD algorithms, widely used to study fluid flows and the interaction properties of elementary particles, respectively. We run benchmark kernels derived from OpenMP production codes, measure performance as a function of the number of threads, and evaluate the impact of different data layout choices. We then analyze our results in the framework of the roofline model and compare them with the performance measured on mainstream Intel Skylake processors. We find that these Arm-based processors reach levels of performance competitive with those of other state-of-the-art options.
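As a reading aid for the roofline analysis mentioned in this abstract, here is a minimal sketch of the basic roofline bound: attainable performance is the smaller of the compute peak and the product of arithmetic intensity and memory bandwidth. The function names and all numbers are hypothetical placeholders chosen for illustration; they are not measurements of the ThunderX2 or Skylake and are not taken from the paper.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Basic roofline bound on attainable performance, in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory
    peak_gflops:          peak compute throughput (placeholder value)
    bandwidth_gbs:        sustained memory bandwidth in GB/s (placeholder value)
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s bandwidth (illustrative only).
PEAK = 1000.0
BANDWIDTH = 100.0

low_intensity = roofline_gflops(0.5, PEAK, BANDWIDTH)    # limited by memory bandwidth
high_intensity = roofline_gflops(20.0, PEAK, BANDWIDTH)  # limited by compute peak
ridge = ridge_point(PEAK, BANDWIDTH)
```

Under these placeholder values, a kernel with intensity below the ridge point is bounded by memory bandwidth, which is the regime where data layout choices typically matter most.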