Search CORE

2 research outputs found

Pricing Python Parallelism: A Dynamic Language Cost Model for Heterogeneous Platforms

Author: Ardalani N.
Ardö H.
Armih K.
Baghsorkhi S.
Belikov E.
Catanzaro B.
Chikin A.
Fumero J.
Fumero J.
Hayashi A.
Jacob D.
Jacob D.
Jacob D.
Kamil S.
Kim G.
Kothapalli K.
Lam S. K.
Leung A.
Luebke D.
Luo G.
Meng J.
Qunaibit M.
Samadi M.
Trinder P.
Wang G.
Wolfe M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2020
Field of study

Execution times may be reduced by offloading parallel loop nests to a GPU. Auto-parallelizing compilers are common for static languages, often using a cost model to determine when the GPU execution speed will outweigh the offload overheads. Nowadays scientific software is increasingly written in dynamic languages and would benefit from compute accelerators. The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures. We present the first analytical cost model for auto-parallelizing loop nests in a dynamic language on heterogeneous architectures. Predicting execution time in a language like Python is extremely challenging, since aspects like the element types, size of the iteration space, and amenability to parallelization can only be determined at runtime. Hence the cost model must be both staged, to combine compile and run-time information, and lightweight to minimize runtime overhead. GPU execution time prediction must account for factors like data transfer, block-structured execution, and starvation. We show that a comparatively simple, staged analytical model can accurately determine during execution when it is profitable to offload a loop nest. We evaluate our model on three heterogeneous platforms across 360 experiments with 12 loop-intensive Python benchmark programs. The results show small misprediction intervals and a mean slowdown of just 13.6%, relative to the optimal (oracular) offload strategy

Crossref

Enlighten

Cache Size in a Cost Model for Heterogeneous Skeletons

Author: G. J. Michaelson
K. A. Armih
P. W. Trinder
Publication venue
Publication date: 01/01/2011
Field of study

High performance architectures are increasingly heterogeneous with shared and distributed memory components. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This paper proposes a new architectural cost model that accounts for cache size and improves on heterogeneous architectures, and demonstrates a skeleton-based programming model that simplifies programming heterogeneous architectures. We further demonstrate that the cost model can be exploited by skeletons to improve load balancing on heterogeneous architectures. The heterogeneous skeleton model facilitates performance portability, using the architectural cost model to automatically balance load across heterogeneous components of the architecture. For both a data parallel benchmark, and realistic image processing program we obtain good performance for the heterogeneous skeleton on homogeneous shared and distributed memory architectures, and on three heterogeneous architectures. We also show that taking cache size into account in the model leads to improved balance and performance

CiteSeerX

Enlighten