5 research outputs found

    Algebraic Tiling

    In this paper, we present ongoing work that aims to propose a new loop tiling technique in which tiles are characterized by their volumes (the number of embedded iterations) instead of their sizes (the lengths of their edges). Tiles of quasi-equal volumes are generated dynamically while the tiled loops are running, whatever the original loop bounds, which may be constant or depend linearly on surrounding loop iterators. The adopted strategy is to successively and hierarchically slice the iteration domain into parts of quasi-equal volumes, from the outermost to the innermost loop dimensions. Since the number of such slices can be chosen exactly, quasi-perfect load balancing is reached by setting, for each parallel loop, the number of slices equal to the number of parallel threads, or to a multiple of this number. Moreover, the approach avoids partial tiles by construction, thus yielding a perfect covering of the iteration domain that minimizes the loop control cost. Finally, algebraic tiling makes dynamic scheduling of the parallel threads largely unnecessary for the handled parallel tiled loops.
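    To make the idea of slicing by volume concrete, here is a minimal sketch for one specific case: a triangular nest whose inner bound depends linearly on the outer iterator. The function names `slice_bounds` and `volumes` and the closed-form quadratic solve are assumptions made for illustration; the abstract does not say how the paper computes its slice bounds, and the sketch covers only the outermost dimension.

```python
import math


def slice_bounds(n, parts):
    """Split rows 0..n-1 of the triangular nest

        for i in range(n):
            for j in range(i + 1): ...

    into `parts` contiguous row ranges holding roughly equal
    numbers of iterations. Returns parts + 1 boundaries.
    """
    total = n * (n + 1) // 2          # volume of the whole domain
    bounds = [0]
    for k in range(1, parts):
        target = k * total / parts    # cumulative volume wanted after slice k
        # Rows [0, b) contain b * (b + 1) / 2 iterations;
        # solve b * (b + 1) / 2 = target for b and round to a row index.
        b = round((-1 + math.sqrt(1 + 8 * target)) / 2)
        bounds.append(min(max(b, bounds[-1]), n))
    bounds.append(n)
    return bounds


def volumes(bounds):
    """Count the iterations actually contained in each slice."""
    return [sum(i + 1 for i in range(lo, hi))
            for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    n, threads = 1000, 4
    by_volume = slice_bounds(n, threads)
    by_size = [k * n // threads for k in range(threads + 1)]
    print("equal-volume slices:", volumes(by_volume))
    print("equal-size slices:  ", volumes(by_size))
```

    Each slice covers whole rows and the slices partition the domain, so no partial tiles arise. Slicing the same rows into ranges of equal length instead gives the last thread several times more iterations than the first.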

    A general algorithm for tiling the register level

    Tiling is a well-known loop transformation that can be used to exploit data reuse at the register level and to improve a program's ILP. Previous work on tiling, as well as commercial compilers, can perform tiling for the register level in more than one dimension when the iteration space is rectangular. However, they either cannot handle non-rectangular iteration spaces or can handle only limited cases of them. Non-rectangular iteration spaces are commonly found in linear algebra algorithms or can arise as a result of applying previous transformations such as loop skewing. In this paper we present a new general algorithm to perform tiling for the register level in more than one dimension in both rectangular and non-rectangular iteration spaces. Our method uses index set splitting to distinguish loop nests that traverse boundary tiles of the tiled iteration space from loop nests that traverse non-boundary tiles. We evaluate our method using as benchmarks typical linear algebra algorithms that have non-rectangular iteration spaces. Results measured on both ALPHA 21064 and MIPS R10000 machines show that our method achieves speedups in the range of 1.11 to 5.96 over commercial compilers and preprocessors able to perform optimizing code transformations.
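    As a rough illustration of index set splitting in a non-rectangular space, the sketch below tiles the outer loop of a lower-triangular matrix-vector product by a factor of 2. It is not the paper's general algorithm: the kernel, the tile factor, and the function names `trmv_reference` and `trmv_register_tiled` are assumptions chosen for the example. Python gives no control over registers, so the local accumulators only stand in for register-allocated values.

```python
def trmv_reference(a, x):
    """y[i] = sum of a[i][j] * x[j] for j <= i (triangular iteration space)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            y[i] += a[i][j] * x[j]
    return y


def trmv_register_tiled(a, x):
    """Same computation with the i loop tiled by 2 (unroll-and-jam)."""
    n = len(x)
    y = [0.0] * n
    i = 0
    while i + 1 < n:
        acc0 = acc1 = 0.0            # stand-ins for two registers
        # Non-boundary part: j in [0, i] is valid for rows i and i + 1,
        # so each x[j] is loaded once and reused for both rows.
        for j in range(i + 1):
            xj = x[j]
            acc0 += a[i][j] * xj
            acc1 += a[i + 1][j] * xj
        # Boundary part: j = i + 1 belongs to row i + 1 only.
        acc1 += a[i + 1][i + 1] * x[i + 1]
        y[i], y[i + 1] = acc0, acc1
        i += 2
    if i < n:                        # leftover row when n is odd
        for j in range(i + 1):
            y[i] += a[i][j] * x[j]
    return y
```

    Splitting the j range at the diagonal keeps the jammed inner loop free of per-iteration bound checks, and the short boundary code handles the iterations that exist for only one of the two rows.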

    Register tiling in nonrectangular iteration spaces

    Microprocessor-based systems are increasingly becoming the workhorse for all scientific and engineering computation. They have numerical processing capabilities that already rival older generations of supercomputers. Over the last decade, microprocessor design strategies have focused on increasing the computational power available on a single chip. These advances in computational capacity have been achieved by reducing cycle time and also via architectural changes such as pipelined floating-point functional units, multiple instruction issue and out-of-order execution. Nevertheless, a high computation bandwidth is meaningless unless it is matched by a similarly powerful memory subsystem. Unfortunately, while on-chip operation speeds have improved dramatically, the performance of memory has not. The result has been an imbalance between computation speed and memory speed, and this imbalance has led machine designers to use complex memory systems based on a hierarchy of levels.