Vectorizing unstructured mesh computations for many-core architectures.

Dagum; Dutykh; Giles; Giles; Giles; Kim; Lindtjorn; Mudalige; Poole

Publication date: 1 February 2016
Publisher: Wiley
DOI: 10.1002/cpe.3621

Abstract

Achieving optimal performance on the latest multi-core and many-core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and the Xeon-Phi on a key class of irregular applications: unstructured mesh computations. Using single instruction multiple thread (SIMT) and single instruction multiple data (SIMD) programming models, we show how unstructured mesh computations map to OpenCL or vector intrinsics through the use of code generation techniques in the OP2 Domain Specific Library, and we explore how irregular memory accesses and race conditions can be organized on different hardware. We benchmark Intel Xeon CPUs and the Xeon-Phi, using a tsunami simulation and a representative CFD benchmark. Results are compared with previous work on CPUs and NVIDIA GPUs to provide a comparison of achievable performance on current many-core systems. We show that auto-vectorization and the OpenCL SIMT model do not map efficiently to CPU vector units because of vectorization issues and threading overheads. In contrast, using SIMD vector intrinsics imposes some restrictions and requires more involved programming techniques but results in efficient code and near-optimal performance, two times faster than non-vectorized code. We observe that the Xeon-Phi does not provide good performance for these applications but is still comparable with a pair of mid-range Xeon chips.
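The abstract refers to organizing irregular memory accesses and race conditions for vector hardware. The sketch below illustrates one common way to structure an edge-based loop for that purpose: gather indirectly addressed node data into contiguous lanes, compute on all lanes with the same operation, then scatter the increments one lane at a time so that edges sharing a node do not conflict. It is written in plain Python purely for illustration. It is not OP2 code and not the paper's implementation, and every name in it (`VEC`, `flux`, `edge_to_node`, `edge_loop_blocked`) is hypothetical.

```python
# Illustrative sketch only: not OP2 code and not the paper's implementation.
# All names (VEC, flux, edge_to_node, ...) are hypothetical.

VEC = 4  # assumed vector width (lanes per SIMD register)


def flux(a, b):
    """Toy per-edge kernel: half the difference of the two node values."""
    return 0.5 * (a - b)


def edge_loop_scalar(edge_to_node, node_val, node_res):
    """Reference loop: one edge at a time, indirect reads and increments."""
    for n0, n1 in edge_to_node:
        f = flux(node_val[n0], node_val[n1])
        node_res[n0] -= f
        node_res[n1] += f


def edge_loop_blocked(edge_to_node, node_val, node_res):
    """Same computation organized as gather / lane-wise compute / scatter."""
    for start in range(0, len(edge_to_node), VEC):
        block = edge_to_node[start:start + VEC]  # last block may be short
        # Gather: pack indirectly addressed node data into contiguous lanes.
        a = [node_val[n0] for n0, _ in block]
        b = [node_val[n1] for _, n1 in block]
        # Compute: the same operation on every lane (the vectorizable part).
        f = [flux(x, y) for x, y in zip(a, b)]
        # Scatter: apply increments lane by lane, because two edges in one
        # block may share a node and a vector-wide write would race.
        for (n0, n1), fl in zip(block, f):
            node_res[n0] -= fl
            node_res[n1] += fl
```

In a real SIMD implementation the gather and compute steps would use vector registers, while the serialized scatter is only one possible way to handle conflicting increments; the source does not state which scheme the authors use on each platform.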

Available Versions

Crossref

doi:10.1002/cpe.3621

Warwick Research Archives Portal Repository

oai:wrap.warwick.ac.uk:83587
