3,768 research outputs found
The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform
The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center and irregular memory access pattern workloads, therefore its popularity and acceptance is raising within the HPC community. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Due to the increasing importance of Arm architecture in the HPC scenario, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on Cavium ThunderX2 SoC. We consider our work as a contribution to the Arm ecosystem: along with this technical report, we plan in fact to release our code for boosting the tuning of the HPCG benchmark within the Arm community.Postprint (author's final draft
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise
improved performance for applications by reducing the data movement between CPU
and memory. However, detecting such applications is not a trivial task. In this
ongoing work, we extend the state-of-the-art platform-independent software
analysis tool with NMC related metrics such as memory entropy, spatial
locality, data-level, and basic-block-level parallelism. These metrics help to
identify the applications more suitable for NMC architectures.Comment: 22nd ACM International Workshop on Software and Compilers for
Embedded Systems (SCOPES '19), May 201
- …