DCAFE: Dynamic load-balanced loop Chunking & Aggressive Finish Elimination for Recursive Task Parallel Programs
In this paper, we present two symbiotic optimizations to optimize recursive
task parallel (RTP) programs by reducing the task creation and termination
overheads. Our first optimization, Aggressive Finish-Elimination (AFE),
substantially reduces redundant join operations. The second optimization,
Dynamic Load-Balanced loop Chunking (DLBC), extends prior work on loop
chunking to decide the number of parallel tasks at runtime, based on the
number of available worker threads. Further, we discuss the impact of
exceptions on our optimizations and extend them to handle RTP programs that may
throw exceptions. We implemented DCAFE (= DLBC+AFE) in the X10v2.3 compiler and
tested it on a set of benchmark kernels on two hardware platforms (a 16-core
Intel system and a 64-core AMD system). With respect to the base X10 compiler
extended with the loop chunking of Nandivada et al. (2013) (LC), DCAFE
achieved a geometric mean speedup of 5.75x and 4.16x on the Intel and AMD
systems, respectively. We
also evaluate energy consumption on the Intel system and show that, on
average, the DCAFE versions consume 71.2% less energy than the LC versions.
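
The abstract does not spell out the DLBC algorithm, so the following is only a minimal sketch of the general idea it describes: choosing the number of loop chunks at runtime from the number of available worker threads, rather than creating one task per iteration. It is written in Python rather than X10, and the function name `chunk_ranges`, the `idle_workers` parameter, and the "one chunk for the current worker plus one per idle worker" policy are all assumptions made for illustration, not the paper's method.

```python
def chunk_ranges(n_iters, idle_workers):
    """Split range(n_iters) into contiguous (lo, hi) chunks, one per task.

    Hypothetical policy (an assumption, not taken from the paper): keep one
    chunk for the current worker and create one more per idle worker, but
    never more chunks than there are iterations.
    """
    if n_iters <= 0:
        return []
    n_tasks = max(1, min(n_iters, idle_workers + 1))
    # Distribute iterations as evenly as possible: the first `extra`
    # chunks receive one additional iteration each.
    base, extra = divmod(n_iters, n_tasks)
    ranges = []
    lo = 0
    for t in range(n_tasks):
        hi = lo + base + (1 if t < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges
```

With no idle workers the whole loop stays in a single sequential chunk, so no extra tasks are created; with many idle workers the loop is split into at most `n_iters` chunks.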