18 research outputs found

Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

Author: Aristidis Sotiropoulos
C.-T. King
D. Patterson
E. Hodzic
F. Desprez
G. Goumas
Georgios Tsoukalas
H. R. Arabnia
J. Ramanujam
J. Xue
J.-P. Sheu
K. Hogstedt
M. Kandemir
Maria Athanasaki
N. J. Boden
N. Manjikian
N. Park
Nectarios Koziris
P. Boulet
P. Tsanakas
Panayiotis Tsanakas
S. M. Bhandarkar
T. Andronikos
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Parallelization of nested loop codes for non-uniform memory access (NUMA) architectures

Author: Athanasaki Maria
Αθανασάκη Μαρία
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 01/01/2005

Hellenic National Archive of Doctoral Dissertations

An Efficient Code Generation Technique for Tiled Iteration Spaces

Author: Georgios Goumas
Maria Athanasaki
Nectarios Koziris
Publication date: 01/01/2003

This paper presents a novel approach to generating tiled code for nested for-loops that have undergone a tiling transformation. Tiling, or supernode transformation, has been widely used to improve locality in multi-level memory hierarchies, as well as to execute loops efficiently on parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler task, especially when non-rectangular tile shapes and iteration space bounds are concerned. Our method considerably enhances previous work on rewriting tiled loops by considering parallelepiped tiles and arbitrary iteration space shapes. In order to generate tiled code, we first enumerate all tiles containing points within the iteration space and then sweep all points within each tile. For the first subproblem, we refine previous results concerning the computation of new loop bounds of an iteration space that has been transformed by a non-unimodular transformation. For the second subproblem, we transform the initial parallelepiped tile into a rectangular one, in order to generate efficient code with the aid of a non-unimodular transformation matrix and its Hermite Normal Form (HNF). Experimental results show that the proposed method significantly accelerates the compilation process and generates much more efficient code.
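
The two-step scheme named in this abstract (enumerate the tiles, then sweep the points inside each tile) can be illustrated with a deliberately naive sketch. It is not the paper's method: it uses a bounding-box scan with a membership test instead of computed loop bounds or the HNF. The tile matrix `P`, the extents `N`, and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from math import floor

# Hypothetical 2-D example: the columns of P are the tile side vectors
# (3, 0) and (1, 2), so each tile is a parallelepiped, not a rectangle.
P = ((3, 1),
     (0, 2))
# Hypothetical rectangular iteration space: 0 <= x < 7, 0 <= y < 5.
N = (7, 5)


def inverse_2x2(m):
    """Exact inverse of a 2x2 integer matrix, using rational arithmetic."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((Fraction(d, det), Fraction(-b, det)),
            (Fraction(-c, det), Fraction(a, det)))


H = inverse_2x2(P)  # tiling matrix: tile coordinates are floor(H * point)


def tile_of(point):
    """Return the coordinates of the tile that contains the given point."""
    return tuple(floor(row[0] * point[0] + row[1] * point[1]) for row in H)


def tile_ranges():
    """Step 1: ranges of tile coordinates that can contain points.

    Each row of H is linear, so its extremes over a box occur at corners.
    """
    corners = [tile_of(c) for c in product((0, N[0] - 1), (0, N[1] - 1))]
    return [range(min(c[k] for c in corners), max(c[k] for c in corners) + 1)
            for k in range(2)]


def points_in_tile(tile):
    """Step 2: sweep the points of one tile (naive bounding-box scan)."""
    verts = [(P[0][0] * (tile[0] + a) + P[0][1] * (tile[1] + b),
              P[1][0] * (tile[0] + a) + P[1][1] * (tile[1] + b))
             for a, b in product((0, 1), repeat=2)]
    for x in range(min(v[0] for v in verts), max(v[0] for v in verts) + 1):
        for y in range(min(v[1] for v in verts), max(v[1] for v in verts) + 1):
            inside_space = 0 <= x < N[0] and 0 <= y < N[1]
            if inside_space and tile_of((x, y)) == tile:
                yield (x, y)


def tiled_traversal():
    """Visit every iteration point exactly once, tile by tile."""
    r0, r1 = tile_ranges()
    for tile in product(r0, r1):
        for point in points_in_tile(tile):
            yield tile, point
```

The membership test is what makes this sketch slow: candidate points outside the tile are generated and then rejected. Avoiding that waste by deriving tight bounds is the kind of problem the abstract describes.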

CiteSeerX

DSpace@NTUA (National Technical Univ. of Athens)

Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces

Author: Evangelos Koukis
Maria Athanasaki
Nectarios Koziris
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2002

In this paper we propose several alternative methods for the compile-time scheduling of tiled nested loops onto a fixed-size parallel architecture. We investigate the distribution of tiles among processors, given a choice between a non-overlapping communication mode, which involves successive computation and communication steps, and an overlapping communication mode, which assumes a pipelined, concurrent execution of communication and computation. In order to utilize the available processors as efficiently as possible, we can adopt a cyclic assignment schedule, assign neighboring tiles to the same CPU, or adapt the size and shape of tiles so that the required number of processors exactly equals the number available. We compare the proposed schedules theoretically and experimentally, so as to design one that achieves the minimum total execution time, depending on the cluster configuration (i.e. number and type of nodes, interconnect bandwidth, etc.), the internal characteristics of the underlying architecture (i.e. NIC and DMA latencies, etc.), and the iteration space size and shape.
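
The assignment and communication choices listed in this abstract can be made concrete with a toy model. This is a rough sketch under simplifying assumptions, not the cost model of the paper: every step is taken to have the same computation time `t_comp` and communication time `t_comm`, pipeline fill and drain are ignored, and the number of steps is supplied by the caller, since it may differ between the two modes. All names are hypothetical.

```python
from math import ceil


def cyclic_owner(column, num_procs):
    """Cyclic assignment: tile columns are dealt out round-robin."""
    return column % num_procs


def block_owner(column, num_columns, num_procs):
    """Block assignment: neighboring tile columns share a processor."""
    block = ceil(num_columns / num_procs)
    return column // block


def step_cost(t_comp, t_comm, overlapping):
    """Cost of one schedule step in the toy model.

    Non-overlapping: computation and communication happen one after the other.
    Overlapping: they proceed concurrently, so the slower one dominates.
    """
    return max(t_comp, t_comm) if overlapping else t_comp + t_comm


def total_time(num_steps, t_comp, t_comm, overlapping):
    """Total time of a schedule with uniform steps (fill and drain ignored)."""
    return num_steps * step_cost(t_comp, t_comm, overlapping)
```

For example, with six tile columns on three processors, `cyclic_owner` yields owners 0, 1, 2, 0, 1, 2, while `block_owner` yields 0, 0, 1, 1, 2, 2.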

CiteSeerX

Code Generation Methods for Tiling Transformations

Author: Georgios Goumas
Maria Athanasaki
Nectarios Koziris
Publication date: 01/01/2002

Tiling, or supernode transformation, has been widely used to improve locality in multi-level memory hierarchies, as well as to execute loops efficiently on parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler task due to non-rectangular tile shapes and arbitrary iteration space bounds. In this paper, we first survey code generation methods for nested loops that are transformed using non-unimodular transformations. All methods are based on Fourier-Motzkin (FM) elimination. Next, we consider and enhance previous work on rewriting tiled loops by considering parallelepiped tiles and arbitrary iteration space shapes. In order to generate tiled code, all methods first enumerate the tiles containing points within the iteration space, and then sweep the points within each tile. For the former, we extend previous work in order to access all tile origins correctly, while for the latter, we propose transforming the initial parallelepiped tile iteration space into a rectangular one, so as to generate code efficiently with the aid of a non-unimodular transformation matrix and its Hermite Normal Form (HNF). The resulting systems of inequalities are much simpler than those that have appeared in the literature; thus their solutions are determined more efficiently using FM elimination. Experimental results comparing all presented approaches show that the proposed method generates tiled code significantly faster, rewriting any n-D tiled loop in a much more efficient and direct way.
Keywords: loop tiling, supernodes, non-unimodular transformations, Fourier-Motzkin elimination, code generation
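
Since the surveyed methods all rely on Fourier-Motzkin elimination, a minimal sketch of one elimination step may help. It is a textbook version with no redundancy removal, not code from the paper. The representation (each inequality is a pair `(coeffs, bound)` meaning `sum(coeffs[i] * x[i]) <= bound`) and the function names are assumptions.

```python
from fractions import Fraction
from math import ceil, floor


def fm_eliminate(system, k):
    """Eliminate variable k from a system of inequalities coeffs . x <= bound.

    Every inequality that bounds x_k from above is paired with every one that
    bounds it from below; inequalities that do not mention x_k are kept as is.
    """
    upper, lower, rest = [], [], []
    for coeffs, bound in system:
        if coeffs[k] > 0:
            upper.append((coeffs, bound))
        elif coeffs[k] < 0:
            lower.append((coeffs, bound))
        else:
            rest.append((coeffs, bound))
    result = list(rest)
    for cu, bu in upper:
        for cl, bl in lower:
            a, b = cu[k], -cl[k]  # both positive multipliers
            combined = tuple(b * u + a * l for u, l in zip(cu, cl))
            result.append((combined, b * bu + a * bl))  # x_k coefficient is 0
    return result


def interval(system, k):
    """Integer bounds on x_k, assuming no other variable appears in the system."""
    lo, hi = None, None
    for coeffs, bound in system:
        c = coeffs[k]
        if c > 0:
            b = floor(Fraction(bound, c))
            hi = b if hi is None else min(hi, b)
        elif c < 0:
            b = ceil(Fraction(bound, c))
            lo = b if lo is None else max(lo, b)
    return lo, hi


# Hypothetical triangular iteration space: i >= 0, j >= 0, i + j <= 4.
triangle = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 4)]
outer = fm_eliminate(triangle, 1)  # project out j to get the bounds of i
```

Here `outer` contains only inequalities in `i`, and `interval(outer, 0)` gives the outer loop range 0 to 4; the inner loop bounds for `j` are then read from the original system for each fixed `i`.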

CiteSeerX

DSpace@NTUA (National Technical Univ. of Athens)

Automatic Code Generation for Executing Tiled Nested Loops Onto Parallel Architectures

Author: Georgios Goumas
Maria Athanasaki
Nectarios Koziris
Publication date: 01/01/2002

This paper presents a novel approach to generating tiled code for nested for-loops using a tiling transformation. Tiling, or supernode transformation, has been widely used to improve locality in multi-level memory hierarchies, as well as to execute loops efficiently on non-uniform memory access architectures. However, automatic code generation for tiled loops can be a very complex compiler task due to non-rectangular tile shapes and iteration space bounds. Our method considerably enhances previous work on rewriting tiled loops by considering parallelepiped tiles and arbitrary iteration space shapes. The complexity of code generation for a tiling transformation is now reduced to the complexity of code generation for any linear transformation. Experimental results comparing all approaches presented so far show that the proposed approach generates tiled code significantly faster.

CiteSeerX

Crossref

DSpace@NTUA (National Technical Univ. of Athens)

A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping

Author: Aristidis Sotiropoulos
Georgios Tsoukalas
Maria Athanasaki
Nectarios Koziris
Publication date: 01/01/2002

This paper proposes a novel approach for the parallel execution of tiled iteration spaces on a cluster of SMP PC nodes. Each SMP node has multiple CPUs and a single memory-mapped PCI-SCI Network Interface Card. We apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. In this way, intranode (intragroup) communication is eliminated. Groups are executed atomically inside each node. Nodes exchange data between successive group computations. We schedule groups much more efficiently by exploiting the inherent overlap between communication and computation phases among successive atomic group executions. The applied non-blocking schedule resembles a pipelined datapath in which group computation phases are overlapped with communication phases, instead of being interleaved with them. Our experimental results show that the proposed method outperforms previous approaches involving blocking communication or conventional grouping schemes.

CiteSeerX

DSpace@NTUA (National Technical Univ. of Athens)

Compiling Tiled Iteration Spaces for Clusters

Author: Georgios Goumas
Maria Athanasaki
Nectarios Koziris
Nikolaos Drosinos
Publication date: 01/01/2002

This paper presents a complete end-to-end framework to automatically generate message-passing code for tiled iteration spaces. It considers general parallelepiped tiling transformations and general convex iteration spaces. We aim to address all problems concerning data-parallel code generation efficiently by transforming the initial non-rectangular tile into a rectangular one. In this way, data distribution and communication become simple and straightforward. We have implemented our parallelizing techniques in a tool that automatically generates MPI code, and ran several experiments on a cluster of PCs. Our experimental results show the merit of general parallelepiped tiling transformations and confirm previous theoretical work on scheduling-optimal tile shapes.

CiteSeerX

DSpace@NTUA (National Technical Univ. of Athens)

Automatic parallel code generation for tiled nested loops

Author: Georgios Goumas
Maria Athanasaki
Nectarios Koziris
Nikolaos Drosinos
Publication venue: ACM Press
Publication date: 01/01/2004

This paper presents an overview of our work on a complete end-to-end framework for automatically generating message-passing parallel code for tiled nested for-loops. It considers general parallelepiped tiling transformations and general convex iteration spaces. We address all problems regarding both the generation of sequential tiled code and its parallelization. We have implemented our techniques in a tool that automatically generates MPI parallel code, and conducted several series of experiments concerning the compilation time of our tool, the efficiency of the generated code, and the speedup attained on a cluster of PCs. Apart from confirming the value of our techniques, our experimental results show the merit of general parallelepiped tiling transformations and verify previous theoretical work on scheduling-optimal tile shapes.

CiteSeerX

DSpace@NTUA (National Technical Univ. of Athens)