
    Source-to-Source Transformations for Parallel Optimizations in STAPL

    Programs that use the STAPL C++ parallel programming library express their control and data flow explicitly through the use of skeletons. Skeletons can be simple parallel operations like map and reduce, or the result of composing several skeletons. Composition is implemented by tracking the dependencies among individual data elements in the STAPL runtime system. However, the operations and dependencies within a compose skeleton can be determined at compile time from the C++ abstract syntax tree. This enables the use of source-to-source transformations to fuse the composed skeletons. Transformations can also be used to replace skeletons entirely with equivalent code. Both transformations greatly reduce STAPL runtime overhead, and zip fusion also allows a compiler to optimize the work functions as a single unit. We present a Clang compiler plugin and wrapper that automatically perform these transformations, and demonstrate their ability to improve performance.
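
    To make the fusion idea concrete, here is a minimal sketch in plain C++ of what map-reduce fusion achieves. The names and structure are illustrative only and deliberately avoid STAPL's actual skeleton API.

        #include <vector>
        #include <algorithm>
        #include <numeric>

        // Composed form: two skeleton applications, each a full pass,
        // with the intermediate results materialized between them.
        double composed(const std::vector<double>& in) {
            std::vector<double> tmp(in.size());
            std::transform(in.begin(), in.end(), tmp.begin(),
                           [](double x) { return x * x; });        // map
            return std::accumulate(tmp.begin(), tmp.end(), 0.0);   // reduce
        }

        // Fused form a source-to-source pass could emit: one traversal,
        // no intermediate container, and both work functions visible to
        // the compiler as a single unit.
        double fused(const std::vector<double>& in) {
            double acc = 0.0;
            for (double x : in)
                acc += x * x;  // map and reduce bodies merged
            return acc;
        }

    The fused form eliminates the intermediate container and the per-element dependency tracking, which is the runtime overhead the paper's transformations target.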

    Automatic translation of non-repetitive OpenMP to MPI

    Cluster platforms with distributed-memory architectures have become widely available, low-cost solutions for high-performance computing. A productive programming environment that hides the complexity of clusters while still allowing efficient programs is urgently needed. Despite multiple efforts to provide a shared-memory abstraction, message passing (MPI) remains the state-of-the-art programming model for distributed-memory architectures. Writing efficient MPI programs is challenging. In contrast, OpenMP is a shared-memory programming model known for its productivity. Researchers have introduced automatic source-to-source translation schemes from OpenMP to MPI so that programmers can use OpenMP while targeting clusters. Those schemes limited their focus to OpenMP programs with repetitive communication patterns, where the analysis of communication can be simplified. This dissertation relaxes this limitation and presents a novel OpenMP-to-MPI translation scheme that covers OpenMP programs with both repetitive and non-repetitive communication patterns. We target laboratory-size clusters of ten to a hundred nodes, commonly found in research laboratories and small enterprises. With our translation scheme, six non-repetitive and four repetitive OpenMP benchmarks were efficiently scaled to a cluster of 64 cores; by contrast, the state-of-the-art translator scaled only the four repetitive benchmarks. Our translation scheme was also shown to match or outperform the state-of-the-art translator. We further compare it with available hand-coded MPI and Unified Parallel C (UPC) programs.
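
    As a rough illustration of the kind of rewrite such a translator performs (not the dissertation's actual scheme, which is more general), consider how a simple OpenMP reduction loop maps onto SPMD-style MPI code:

        #include <mpi.h>

        // OpenMP source: iterations split among threads.
        //   #pragma omp parallel for reduction(+:sum)
        //   for (int i = 0; i < n; ++i) sum += a[i] * b[i];

        double dot_mpi(const double* a, const double* b, int n) {
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Block-distribute the iteration space across MPI ranks,
            // mirroring OpenMP's static loop schedule.
            int chunk = (n + size - 1) / size;
            int lo = rank * chunk;
            int hi = (lo + chunk < n) ? lo + chunk : n;

            double local = 0.0;
            for (int i = lo; i < hi; ++i)
                local += a[i] * b[i];

            // The reduction clause becomes an explicit collective.
            double sum = 0.0;
            MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
            return sum;
        }

    The hard part the dissertation addresses is not this regular case but programs whose communication partners change between iterations, where the message pattern cannot be computed once and reused.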

    Doctor of Philosophy

    High Performance Computing (HPC) on-node parallelism is of extreme importance for guaranteeing and maintaining scalability across large clusters of hundreds of thousands of multicore nodes. HPC programming is dominated by the hybrid model "MPI + X", with MPI exploiting parallelism across nodes and "X" being some shared-memory parallel programming model that handles multicore parallelism across CPUs or GPUs. OpenMP has become the de facto "X" standard in HPC for exploiting the multicore architectures of modern CPUs. Data races are among the most common and insidious concurrency errors in shared-memory programming models, and OpenMP programs are not immune to them. The ease with which OpenMP parallelizes programs can also make them prone to data races, which become hard to find in large applications with thousands of lines of code. Unfortunately, prior tools have been unable to impact practice owing to their poor coverage or poor scalability. In this work, we develop several new approaches for low-overhead data race detection. Our approaches aim to guarantee high precision and accuracy of race checking while maintaining low runtime and memory overhead. We present two race checkers for C/C++ OpenMP programs that target two different classes of programs. The first, ARCHER, is fast but requires a large amount of memory, so it ideally targets applications that need only a small portion of the available on-node memory. SWORD, on the other hand, combines fast, zero-memory-overhead data collection with an offline analysis that can take a long time, though it often reports most races quickly. Given that race checking was previously impossible for large OpenMP applications, our contributions are the best available advances in what is known to be a difficult NP-complete problem. We performed an extensive evaluation of the tools on existing OpenMP programs and HPC benchmarks. Results show that both tools guarantee identification of all the races of a program in a given run without reporting any false alarms. The tools are user-friendly and hence serve as an important instrument in the daily work of programmers, helping them identify data races early during development and production testing. Furthermore, our demonstrated success on real-world applications puts these tools on the top list of debugging tools for scientists at large.
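
    For readers unfamiliar with the bug class, the following is a minimal example of the kind of OpenMP data race such checkers flag:

        #include <cstdio>

        int main() {
            int sum = 0;
            // Racy: every thread performs an unsynchronized
            // read-modify-write on `sum`. The fix is to add
            // reduction(+:sum) to the pragma.
            #pragma omp parallel for
            for (int i = 0; i < 1000; ++i)
                sum += i;
            std::printf("%d\n", sum);  // nondeterministic without the fix
            return 0;
        }

    Races like this one are easy to spot in ten lines but, as the abstract notes, become extremely hard to find when buried in applications with thousands of lines of code.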

    The Cost and Benefits of Coordination Programming: Two Case Studies in Concurrent Collection and S-Net

    Electronic version of an article published as Pavel Zaichenkov et al., Parallel Processing Letters, Vol. 26 (3), 2016, 24 pages. DOI: http://www.worldscientific.com/doi/abs/10.1142/S0129626416500110 © 2016 World Scientific Publishing Company, http://www.worldscientific.com/worldscinet/ppl
    This is an evaluation study of the expressiveness provided and the performance delivered by the coordination language S-Net in comparison with Intel's Concurrent Collections (CnC). An S-Net application is a network of black-box compute components connected through anonymous data streams, with the standard input and output streams linking the application to the environment. Our case study is based on two applications: a face-detection algorithm implemented as a pipeline of feature classifiers, and a numerical algorithm from the linear-algebra domain, namely Cholesky decomposition. The selected applications are representative and have been used by Intel researchers as evaluation testbeds for CnC in the past. We implement various versions of both algorithms in S-Net and compare them with equivalent CnC implementations, both with and without tuning, previously published by the CnC community. Our experiments on a large-scale server system demonstrate that S-Net delivers scalability and absolute performance on the studied examples very similar to those of tuned CnC codes, even without specific tuning. At the same time, S-Net achieves a much more complete separation of concerns between the compute and coordination layers than CnC even intends to.
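
    As a toy sketch only (S-Net expresses this declaratively and runs the stages in parallel; this sequential C++ fragment merely shows the shape of a stream-connected cascade of black-box stages, as in the face-detection pipeline):

        #include <functional>
        #include <queue>
        #include <vector>

        using Record = int;                           // placeholder payload
        using Stage  = std::function<bool(Record&)>;  // classifier: pass/reject

        // A pipeline is a composition of opaque stages over a stream of
        // records; the coordination layer knows nothing about their insides.
        std::vector<Record> run_pipeline(std::queue<Record> input,
                                         const std::vector<Stage>& stages) {
            std::vector<Record> accepted;
            while (!input.empty()) {
                Record r = input.front();
                input.pop();
                bool pass = true;
                for (const auto& s : stages)       // each stage is a black box
                    if (!(pass = s(r))) break;     // early reject, as in cascades
                if (pass) accepted.push_back(r);
            }
            return accepted;
        }

    The separation-of-concerns claim in the abstract is precisely that the stage implementations and this coordination structure can be developed and changed independently.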

    OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices

    Heterogeneous computing is increasingly used in a diversity of computing systems, ranging from HPC to the real-time embedded domain, to cope with performance requirements. Given the variety of accelerators (e.g., FPGAs and GPUs), high-level parallel programming models are desirable to exploit their performance capabilities while maintaining an adequate level of productivity. In that regard, OpenMP is a well-known high-level programming model that incorporates powerful task and accelerator models capable of efficiently exploiting structured and unstructured parallelism in heterogeneous computing. This paper presents a novel compiler transformation technique that automatically transforms OpenMP code into CUDA graphs, combining the programmability benefits of a high-level programming model such as OpenMP with the performance benefits of a low-level programming model such as CUDA. Evaluations have been performed on two NVIDIA GPUs from the HPC and embedded domains, i.e., the V100 and the Jetson AGX, respectively. This work has been supported by the EU H2020 project AMPERE under grant agreement no. 871669.
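
    The abstract does not show the generated code; a plausible sketch of the transformation's target, using the CUDA runtime's stream-capture API to record dependent kernels as a graph and replay it, looks like this (the kernels and the CUDA 12+ cudaGraphInstantiate signature are assumptions for illustration):

        #include <cuda_runtime.h>

        __global__ void stepA(float* d) { d[threadIdx.x] += 1.0f; }
        __global__ void stepB(float* d) { d[threadIdx.x] *= 2.0f; }

        // Two dependent OpenMP tasks (stepA -> stepB) become a CUDA graph
        // that is captured once and relaunched cheaply each iteration.
        // `d` is a device pointer; assumes n <= 1024.
        void run_as_graph(float* d, int n, int iters) {
            cudaStream_t s;
            cudaStreamCreate(&s);

            cudaGraph_t graph;
            cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
            stepA<<<1, n, 0, s>>>(d);  // the stepA -> stepB edge is
            stepB<<<1, n, 0, s>>>(d);  // recorded by the capture
            cudaStreamEndCapture(s, &graph);

            cudaGraphExec_t exec;
            cudaGraphInstantiate(&exec, graph, 0);  // CUDA 12+ signature

            // One launch call replays the whole task graph, amortizing
            // per-kernel launch overhead across iterations.
            for (int i = 0; i < iters; ++i)
                cudaGraphLaunch(exec, s);
            cudaStreamSynchronize(s);

            cudaGraphExecDestroy(exec);
            cudaGraphDestroy(graph);
            cudaStreamDestroy(s);
        }

    Replaying an instantiated graph avoids re-submitting each kernel and re-deriving its dependencies on every iteration, which is the main performance benefit CUDA graphs offer over plain stream launches.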