Search CORE

2 research outputs found

On polynomial Code Generation

Author: Cohen Albert
Darte Alain
Feautrier Paul
Publication venue: HAL CCSD
Publication date: 23/01/2018
Field of study

International audienceIn static analysis, one often has to deal with polynomials in the program control variables, either native to the source code or created by enabling analyses. We have explained elsewhere how to compute dependences in such situations and use them for building polynomial schedules. It remains to explain how to generate polynomial code. The present proposal is to target new parallel programming languages of the async/finish family, like X10 or Habanero, which are "polynomial friendly" and for which efficient compilers exists. Both these languages have barrier-like constructs-clocks for X10 and phasers for Habanero-which may be used to synchronize activities. To understand the behaviour of a clocked program, one has to count the number of clock advance operations since the creation of each activity. Advances with equal counts are synchronized , and these counts may be polynomials. The trick is therefore to insure that before executing an operation, its activity has executed as many advances as the current value of its schedule. This can be obtained by inserting auxilliary loops for executing the necessary advances. This scheme fails if the schedule is not monotone increasing with respect to the execution order in each activity. This problem may be solved by reordering the activities-which is possible since the real execution order is given by the schedule-or in extreme cases by index set splitting

INRIA a CCSD electronic archive server

Revisiting Loop Transformations with X10 Clocks

Author: Yuki Tomofumi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2015
Field of study

International audienceLoop transformations are known to be important for performance of compute-intensive programs, and are often used to expose parallelism. However, many transformations involving loops often obfuscate the code, and are cumbersome to apply by hand. The goal of this paper is to explore alternative methods for expressing parallelism that are more friendly to the programmer. In particular, we seek to expose parallelism without significantly changing the original loop structure. We illustrate how clocks in X10 can be used to express some of the traditional loop transformations, in the presence of parallelism, in a manner that we believe to be less invasive. Specifically, expressing parallelism corresponding to one-dimensional affine schedules can be achieved without modifying the original loop structure and/or statements

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot