Optimisation of computational fluid dynamics applications on multicore and manycore architectures

Abstract

This thesis presents a number of optimisations used for mapping the underlying computational patterns of finite volume CFD applications onto the architectural features of modern multicore and manycore processors. Their effectiveness and impact is demonstrated in a block-structured and an unstructured code of representative size to industrial applications and across a variety of processor architectures that make up contemporary high-performance computing systems. The importance of vectorization and the ways through which this can be achieved is demonstrated in both structured and unstructured solvers together with the impact that the underlying data layout can have on performance. The utility of auto-tuning for ensuring performance portability across multiple architectures is demonstrated and used for selecting optimal parameters such as prefetch distances for software prefetching or tile sizes for strip mining/loop tiling. On the manycore architectures, running more than one thread per physical core is found to be crucial for good performance on processors with in-order core designs but not required on out-of-order architectures. For architectures with high-bandwidth memory packages, their exploitation, whether explicitly or implicitly, is shown to be imperative for best performance. The implementation of all of these optimisations led to application speed-ups ranging between 2.7X and 3X on the multicore CPUs and 5.7X to 24X on the manycore processors.Open Acces

    Similar works