15 research outputs found

    Determining the Idle Time of a Tiling: New Results

    Get PDF
    In the framework of fully permutable loops, tiling has been studied extensively as a source-to-source program transformation. We build upon recent results by Högstedt, Carter, and Ferrante [12], who aim at determining the cumulative idle time spent by all processors while executing the partitioned (tiled) computation domain. We propose new, much shorter proofs of all their results and extend them in several important directions. More precisely, we provide an accurate solution for all values of the rise parameter, which relates the shape of the iteration space to that of the tiles, and for all possible distributions of the tiles to processors. In contrast, the authors of [12] deal only with a limited number of cases and provide upper bounds rather than exact formulas.
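
    For reference, the sketch below illustrates the tiling transformation discussed in this abstract on a doubly-nested, fully permutable loop. The tile sizes TI and TJ, the loop bounds, and the stencil-style update are illustrative choices only, not taken from the paper or from [12].

        #include <vector>
        #include <algorithm>

        // Original loop nest: for (i, j) in [1, N) x [1, M): A[i][j] += A[i-1][j] + A[i][j-1].
        // Tiled version: the iteration space is partitioned into TI x TJ tiles, which become
        // the units of work that a scheduler can distribute to processors.
        void tiled_update(std::vector<std::vector<double>>& A, int N, int M, int TI, int TJ) {
            for (int ii = 1; ii < N; ii += TI)                            // over tile rows
                for (int jj = 1; jj < M; jj += TJ)                        // over tile columns
                    for (int i = ii; i < std::min(ii + TI, N); ++i)       // points inside one tile
                        for (int j = jj; j < std::min(jj + TJ, M); ++j)
                            A[i][j] += A[i - 1][j] + A[i][j - 1];
        }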

    Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces

    Full text link

    An optimal scheduling scheme for tiling in distributed systems

    Full text link

    MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators

    Get PDF
    Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code-crafting effort to achieve performance improvements. Unfortunately, in most cases, performance tuning has to be done manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent: an optimization preferred on one machine may not be suitable for another. Second, as the optimization search space grows, manually finding an optimized configuration becomes hard. Therefore, developing new compiler techniques for optimizing applications is of considerable interest. This thesis aims at developing new techniques that help programmers write efficient algorithms and code targeting hardware acceleration technologies in a more effective manner. Our work is organized around a compilation framework, called MetaFork, for concurrency platforms, and its application to automatic parallelization. MetaFork is a high-level programming language extending C/C++ which combines several models of concurrency, including fork-join, SIMD and pipelining parallelism. MetaFork is also a compilation framework which aims at facilitating the design and implementation of concurrent programs through four key features that make MetaFork unique and novel: (1) it performs automatic code translation between concurrency platforms targeting multi-core architectures; (2) it provides a high-level language for expressing concurrency, as in the fork-join model, the SIMD paradigm and pipelining parallelism; (3) it generates parallel code from serial code, with an emphasis on code depending on machine or program parameters (e.g. cache size, number of processors, number of threads per thread block); and (4) it optimizes code depending on parameters that are unknown at compile time.
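
    MetaFork's own surface syntax is not reproduced here; as a generic illustration of the fork-join model of concurrency that the abstract refers to, the following plain C++ sketch forks a child task, does work in the parent, and joins before combining the results (the Fibonacci example and the serial cutoff are illustrative choices). Compile with a C++11 compiler and thread support (e.g. -pthread).

        #include <future>
        #include <iostream>

        // Fork-join parallel Fibonacci: each call forks a child task for fib(n - 1),
        // computes fib(n - 2) itself, and joins before combining the two results.
        long fib(int n) {
            if (n < 2) return n;
            if (n < 16) return fib(n - 1) + fib(n - 2);   // serial cutoff keeps the task count small
            std::future<long> child = std::async(std::launch::async, fib, n - 1);   // fork
            long y = fib(n - 2);                          // work done by the parent meanwhile
            return child.get() + y;                       // join, then combine
        }

        int main() {
            std::cout << fib(22) << std::endl;            // prints 17711
            return 0;
        }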

    Computación paralela y entornos heterogéneos

    Get PDF
    This thesis is set in the context of solving problems in parallel, specifically on systems where the machines have different characteristics. The techniques known for homogeneous environments need to be revisited and adapted to these new heterogeneous environments. The objectives of this thesis focus on developing models that allow us to determine the parameter values that minimize the time needed to solve a problem on a heterogeneous system, and on developing tools that ease programming and execution in this kind of environment.
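
    As a minimal illustration of the kind of parameter choice the abstract alludes to, the sketch below splits the iterations of a loop among machines of different speeds so that, under the (illustrative) assumptions of perfectly divisible work and known relative speeds, all machines finish at roughly the same time. It is not the model or the tool developed in the thesis.

        #include <vector>
        #include <numeric>
        #include <iostream>

        // Split N iterations among heterogeneous processors proportionally to their
        // relative speeds, so that each one takes roughly the same wall-clock time.
        std::vector<int> proportional_split(int N, const std::vector<double>& speed) {
            double total = std::accumulate(speed.begin(), speed.end(), 0.0);
            std::vector<int> share(speed.size());
            int assigned = 0;
            for (std::size_t p = 0; p < speed.size(); ++p) {
                share[p] = static_cast<int>(N * speed[p] / total);
                assigned += share[p];
            }
            share.back() += N - assigned;   // give the rounding remainder to the last processor
            return share;
        }

        int main() {
            // Three machines, the first twice as fast as the other two (illustrative values).
            for (int s : proportional_split(1000, {2.0, 1.0, 1.0}))
                std::cout << s << " ";      // prints 500 250 250
            std::cout << std::endl;
            return 0;
        }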

    Determining the Idle Time of a Tiling

    No full text
    This paper investigates the idle time associated with a parallel computation, that is, the time that processors are idle because they are either waiting for data from other processors or waiting to synchronize with other processors. We study doubly-nested loops corresponding to parallelogram- or trapezoidal-shaped iteration spaces that have been parallelized by the well-known tiling transformation. We introduce the notion of rise r, which relates the shape of the iteration space to that of the tiles. For parallelogram-shaped iteration spaces, we show that when r ≤ −2, the idle time is linear in P, the number of processors, but when r ≥ −1, it is quadratic in P. In the context of hierarchical tiling, where multiple levels of tiling are used, a good choice of rise can lead to less idle time and better performance. While idle time is not the only cost that should be considered in evaluating a tiling strategy, current architectural trends (of deeper memory hierarchies and multipl..
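
    To make the notion of idle time concrete, the toy model below schedules a rectangular grid of tiles on P processors with a cyclic row distribution and unit tile execution time, and counts the time each processor spends waiting between its first and last tile. The dependence pattern, the distribution, and the cost model are illustrative assumptions; they are not the paper's model, and the sketch does not reproduce its formulas.

        #include <vector>
        #include <algorithm>
        #include <iostream>

        // Toy idle-time model: a TI x TJ grid of tiles, each taking one time unit, where
        // tile (i, j) waits for tiles (i-1, j) and (i, j-1). Tile row i runs on processor
        // i % P. A processor is idle whenever it is not executing a tile between the start
        // of its first tile and the end of its last one.
        long idle_time(int TI, int TJ, int P) {
            std::vector<std::vector<long>> finish(TI, std::vector<long>(TJ, 0));
            std::vector<long> ready(P, 0), busy(P, 0), first(P, -1), last(P, 0);
            for (int i = 0; i < TI; ++i) {
                int p = i % P;
                for (int j = 0; j < TJ; ++j) {
                    long deps = 0;
                    if (i > 0) deps = std::max(deps, finish[i - 1][j]);
                    if (j > 0) deps = std::max(deps, finish[i][j - 1]);
                    long start = std::max(deps, ready[p]);
                    finish[i][j] = start + 1;            // unit execution time per tile
                    ready[p] = finish[i][j];
                    busy[p] += 1;
                    if (first[p] < 0) first[p] = start;
                    last[p] = finish[i][j];
                }
            }
            long idle = 0;
            for (int p = 0; p < P; ++p)
                if (first[p] >= 0) idle += (last[p] - first[p]) - busy[p];
            return idle;
        }

        int main() {
            std::cout << idle_time(8, 2, 4) << std::endl;   // total idle time for an 8 x 2 tile grid on 4 processors
            return 0;
        }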