Cooperative auto-tuning of parallel skeletons

Collins, Alexander James

Cooperative auto-tuning of parallel skeletons

Authors: Alexander James Collins
Publication date: 26 November 2015
Publisher: The University of Edinburgh

Abstract

Improving program performance through the use of multiple homogeneous processing elements, or cores, is common-place. However, these architectures increase the complexity required at the software level. Existing work is focused on optimising programs that run in isolation on these systems, but ignores the fact that, in reality, these systems run multiple parallel programs concurrently with programs competing for system resources. In order to improve performance in this shared environment, cooperative tuning of multiple, concurrently running parallel programs is required. Moreover, the set of programs running on the system – the system workload – is dynamic and rapidly changing. This makes cooperative tuning a challenge, as it must react rapidly to changes in the system workload. This thesis explores the scope for performance improvement from cooperatively tuning skeleton parallel programs, and techniques that can be used to cooperatively auto-tune parallel programs. Parallel skeletons provide a clear separation between algorithm description and implementation, and provide tuning knobs that the system can use to make high-level changes to a programs implementation. This work is in three parts: (i) how many threads should be allocated to each program running on the system, (ii) on which cores should a programs threads be executed and (iii) what values should be chosen for high-level parameters of the parallel skeletons. We demonstrate that significant performance improvements are available in each of these areas, compared to the current state-of-the-art

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

Edinburgh Research Archive

oai:era.ed.ac.uk:1842/15791

Last time updated on 07/06/2021