Search CORE

5 research outputs found

Algebraic Tiling

Author: Clauss Philippe
Rossetti Clément
Publication venue: HAL CCSD
Publication date: 16/01/2023

In this paper, we present ongoing work that aims to propose a new loop tiling technique in which tiles are characterized by their volumes (the number of embedded iterations) instead of their sizes (the lengths of their edges). Tiles of quasi-equal volumes are dynamically generated while the tiled loops are running, whatever the original loop bounds, which may be constant or depend linearly on surrounding loop iterators. The adopted strategy is to successively and hierarchically slice the iteration domain into parts of quasi-equal volumes, from the outermost to the innermost loop dimensions. Since the number of such slices can be chosen exactly, quasi-perfect load balancing is reached by setting, for each parallel loop, the number of slices equal to the number of parallel threads, or to a multiple of this number. Moreover, the approach avoids partial tiles by construction, thus yielding a perfect covering of the iteration domain that minimizes the loop control cost. Finally, algebraic tiling makes dynamic scheduling of the parallel threads largely unnecessary for the handled parallel tiled loops.
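To make the idea of volume-based slicing concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not say how slice boundaries are computed, so this sketch assumes a triangular nest (`for i in range(n): for j in range(i + 1)`), uses a closed-form iteration count, and finds boundaries by binary search. It slices only the outermost dimension, whereas the paper describes a hierarchical slicing of all dimensions. The names `rows_volume`, `slice_bounds`, and `slice_volumes` are hypothetical.

```python
def rows_volume(i):
    """Iterations contained in rows 0..i-1 of the assumed triangular nest
    `for i in range(n): for j in range(i + 1)` (row r holds r + 1 iterations)."""
    return i * (i + 1) // 2


def slice_bounds(n, parts):
    """Cut the outer loop range [0, n) into `parts` slices of quasi-equal volume.

    Boundary k is the smallest row index whose cumulative volume reaches
    k/parts of the total. Binary search is an assumption of this sketch.
    """
    total = rows_volume(n)
    bounds = [0]
    for k in range(1, parts):
        lo, hi = bounds[-1], n
        while lo < hi:
            mid = (lo + hi) // 2
            # Integer comparison of rows_volume(mid) >= k * total / parts.
            if rows_volume(mid) * parts >= k * total:
                hi = mid
            else:
                lo = mid + 1
        bounds.append(lo)
    bounds.append(n)
    return bounds


def slice_volumes(bounds):
    """Number of iterations in each slice [bounds[k], bounds[k + 1])."""
    return [rows_volume(b) - rows_volume(a) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    b = slice_bounds(1000, 4)
    print(b, slice_volumes(b))
```

With one slice per thread, each thread would run rows `bounds[k]` to `bounds[k + 1] - 1`. The slices differ in row count but hold nearly the same number of iterations, which is the property the abstract relies on for load balancing.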

INRIA a CCSD electronic archive server

Forecasting with Dynamic Factor Models in both finite and infinite dimensional factor spaces.

Author: Della Marra Fabio
Publication venue: Proceedings of Simai 2016. The XIII biannual congress of Simai.
Publication date: 01/01/2016

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

A general algorithm for tiling the register level

Author: A. Fernández
E. Morancho
J. M. Llabería
M. Jiménez
Publication date: 01/01/1998

Tiling is a well-known loop transformation that can be used to exploit data reuse at the register level and to improve a program's ILP. Previous work on tiling, as well as commercial compilers, can perform tiling for the register level in more than one dimension when the iteration space is rectangular. However, they either cannot handle non-rectangular iteration spaces or can handle only limited cases of them. Non-rectangular iteration spaces are commonly found in linear algebra algorithms or can arise as a result of applying previous transformations such as loop skewing. In this paper we present a new general algorithm to perform tiling for the register level in more than one dimension in both rectangular and non-rectangular iteration spaces. Our method uses index set splitting to distinguish loop nests that traverse boundary tiles of the tiled iteration space from loop nests that traverse non-boundary tiles. We evaluate our method using as benchmarks typical linear algebra algorithms having non-rectangular iteration spaces. Results measured on both ALPHA 21064 and MIPS R10000 machines show that our method achieves speedups in the range of 1.11 to 5.96 over commercial compilers and preprocessors able to perform optimizing code transformations.
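The role of index set splitting can be illustrated with a small hand-written sketch in Python. It is not the paper's general algorithm: it assumes one specific kernel (an upper-triangular matrix-vector product), a fixed tile of two rows, and Python locals standing in for registers. The function names are hypothetical. For each pair of rows, the column range is split into a boundary part (`j == i`, valid only for the first row) and a non-boundary part (`j >= i + 1`, valid for both rows), so the inner loop of the non-boundary part needs no bounds checks.

```python
def triangular_matvec(a, x):
    """Reference kernel: y[i] = sum of a[i][j] * x[j] for j in [i, n)."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(i, n):
            y[i] += a[i][j] * x[j]
    return y


def triangular_matvec_tiled(a, x):
    """Same kernel with rows processed in pairs (a 2-row register tile).

    The j range of each pair is split by hand into a boundary part and a
    full-tile part; this mimics what index set splitting would produce.
    """
    n = len(x)
    y = [0] * n
    i = 0
    while i + 1 < n:
        # Boundary part: column j == i belongs to row i only.
        y0 = a[i][i] * x[i]
        y1 = 0
        # Non-boundary part: columns [i + 1, n) are valid for both rows,
        # so x[j] is loaded once and reused for two accumulators.
        for j in range(i + 1, n):
            xj = x[j]
            y0 += a[i][j] * xj
            y1 += a[i + 1][j] * xj
        y[i], y[i + 1] = y0, y1
        i += 2
    if i < n:
        # Leftover row when n is odd: its only column is the diagonal.
        y[i] = a[i][i] * x[i]
    return y
```

In a compiled language the two accumulators and `xj` would typically live in registers, which is where the reuse benefit described in the abstract comes from; in Python the sketch only shows the loop structure.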

CiteSeerX

Register tiling in nonrectangular iteration spaces

Author: Agustín Fernández
José M. Llabería
Marta Jiménez
Publication venue: Association for Computing Machinery (ACM)

Register tiling in nonrectangular iteration spaces

Author: Marta Jiménez
Advisor: Agustín Fernández

To my parents and, especially, to Roger. Microprocessor-based systems are increasingly becoming the workhorse for all scientific and engineering computation. They have numerical processing capabilities that already rival older generations of supercomputers. Over the last decade, microprocessor design strategies have focused on increasing the computational power available on a single chip. These advances in computational capacity have been achieved by reducing cycle time and also via architectural changes such as pipelined floating-point functional units, multiple instruction issue and out-of-order execution. Nevertheless, a high computation bandwidth is meaningless unless it is matched by a similarly powerful memory subsystem. Unfortunately, while on-chip operation speeds have improved dramatically, the performance of memory has not. The result has been an imbalance between computation speed and memory speed, and this imbalance has led machine designers to use complex memory systems based on a hierarchy of levels

CiteSeerX