15 research outputs found

    Determining the Idle Time of a Tiling: New Results

    Get PDF
    In the framework of fully permutable loops, tiling has been studied extensively as a source-to-source program transformation. We build upon recent results by Högstedt, Carter, and Ferrante [12], who aim at determining the cumulative idle time spent by all processors while executing the partitioned (tiled) computation domain. We propose new, much shorter proofs of all their results and extend them in several important directions. More precisely, we provide an accurate solution for all values of the rise parameter, which relates the shape of the iteration space to that of the tiles, and for all possible distributions of the tiles to processors. In contrast, the authors of [12] deal only with a limited number of cases and provide upper bounds rather than exact formulas.
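
    For reference, the sketch below illustrates the tiling transformation discussed in this abstract on a doubly-nested, fully permutable loop. The tile sizes TI and TJ, the loop bounds, and the stencil-style update are illustrative choices only, not taken from the paper or from [12].

        #include <vector>
        #include <algorithm>

        // Original loop nest: for (i, j) in [1, N) x [1, M): A[i][j] += A[i-1][j] + A[i][j-1].
        // Tiled version: the iteration space is partitioned into TI x TJ tiles, which become
        // the units of work that a scheduler can distribute to processors.
        void tiled_update(std::vector<std::vector<double>>& A, int N, int M, int TI, int TJ) {
            for (int ii = 1; ii < N; ii += TI)                            // over tile rows
                for (int jj = 1; jj < M; jj += TJ)                        // over tile columns
                    for (int i = ii; i < std::min(ii + TI, N); ++i)       // points inside one tile
                        for (int j = jj; j < std::min(jj + TJ, M); ++j)
                            A[i][j] += A[i - 1][j] + A[i][j - 1];
        }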

    Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces

    Full text link

    An optimal scheduling scheme for tiling in distributed systems

    Full text link

    MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators

    Get PDF
    Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code-crafting effort to achieve performance improvements. Unfortunately, in most cases, performance tuning has to be done manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent: an optimization preferred on one machine may not be suitable for another. Second, as the optimization search space grows, manually finding an optimized configuration becomes hard. Therefore, developing new compiler techniques for optimizing applications is of considerable interest. This thesis aims at developing new techniques that help programmers write efficient algorithms and code targeting hardware acceleration technologies in a more effective manner. Our work is organized around a compilation framework, called MetaFork, for concurrency platforms, and its application to automatic parallelization. MetaFork is a high-level programming language extending C/C++ which combines several models of concurrency, including fork-join, SIMD and pipelining parallelism. MetaFork is also a compilation framework which aims at facilitating the design and implementation of concurrent programs through four key features that make MetaFork unique and novel: (1) it performs automatic code translation between concurrency platforms targeting multi-core architectures; (2) it provides a high-level language for expressing concurrency, as in the fork-join model, the SIMD paradigm and pipelining parallelism; (3) it generates parallel code from serial code, with an emphasis on code depending on machine or program parameters (e.g. cache size, number of processors, number of threads per thread block); and (4) it optimizes code depending on parameters that are unknown at compile time.
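
    MetaFork's own surface syntax is not reproduced here; as a generic illustration of the fork-join model of concurrency that the abstract refers to, the following plain C++ sketch forks a child task, does work in the parent, and joins before combining the results (the Fibonacci example and the serial cutoff are illustrative choices). Compile with a C++11 compiler and thread support (e.g. -pthread).

        #include <future>
        #include <iostream>

        // Fork-join parallel Fibonacci: each call forks a child task for fib(n - 1),
        // computes fib(n - 2) itself, and joins before combining the two results.
        long fib(int n) {
            if (n < 2) return n;
            if (n < 16) return fib(n - 1) + fib(n - 2);   // serial cutoff keeps the task count small
            std::future<long> child = std::async(std::launch::async, fib, n - 1);   // fork
            long y = fib(n - 2);                          // work done by the parent meanwhile
            return child.get() + y;                       // join, then combine
        }

        int main() {
            std::cout << fib(22) << std::endl;            // prints 17711
            return 0;
        }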

    Computación paralela y entornos heterogéneos

    Get PDF
    This thesis is set in the context of solving problems in parallel, specifically on systems where the machines have different characteristics. The techniques known for homogeneous environments need to be revisited and adapted to these new heterogeneous environments. The objectives of this thesis focus on developing models that allow us to determine the parameter values that minimize the time needed to solve a problem on a heterogeneous system, and on developing tools that ease programming and execution in this kind of environment.
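
    As a minimal illustration of the kind of parameter choice the abstract alludes to, the sketch below splits the iterations of a loop among machines of different speeds so that, under the (illustrative) assumptions of perfectly divisible work and known relative speeds, all machines finish at roughly the same time. It is not the model or the tool developed in the thesis.

        #include <vector>
        #include <numeric>
        #include <iostream>

        // Split N iterations among heterogeneous processors proportionally to their
        // relative speeds, so that each one takes roughly the same wall-clock time.
        std::vector<int> proportional_split(int N, const std::vector<double>& speed) {
            double total = std::accumulate(speed.begin(), speed.end(), 0.0);
            std::vector<int> share(speed.size());
            int assigned = 0;
            for (std::size_t p = 0; p < speed.size(); ++p) {
                share[p] = static_cast<int>(N * speed[p] / total);
                assigned += share[p];
            }
            share.back() += N - assigned;   // give the rounding remainder to the last processor
            return share;
        }

        int main() {
            // Three machines, the first twice as fast as the other two (illustrative values).
            for (int s : proportional_split(1000, {2.0, 1.0, 1.0}))
                std::cout << s << " ";      // prints 500 250 250
            std::cout << std::endl;
            return 0;
        }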

    Determining the Idle Time of a Tiling

    No full text
    This paper investigates the idle time associated with a parallel computation, that is, the time that processors are idle because they are either waiting for data from other processors or waiting to synchronize with other processors. We study doubly-nested loops corresponding to parallelogram- or trapezoidal-shaped iteration spaces that have been parallelized by the well-known tiling transformation. We introduce the notion of rise r, which relates the shape of the iteration space to that of the tiles. For parallelogram-shaped iteration spaces, we show that when r ≤ −2, the idle time is linear in P, the number of processors, but when r ≥ −1, it is quadratic in P. In the context of hierarchical tiling, where multiple levels of tiling are used, a good choice of rise can lead to less idle time and better performance. While idle time is not the only cost that should be considered in evaluating a tiling strategy, current architectural trends (of deeper memory hierarchies and multipl..
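
    To make the notion of idle time concrete, the toy model below schedules a rectangular grid of tiles on P processors with a cyclic row distribution and unit tile execution time, and counts the time each processor spends waiting between its first and last tile. The dependence pattern, the distribution, and the cost model are illustrative assumptions; they are not the paper's model, and the sketch does not reproduce its formulas.

        #include <vector>
        #include <algorithm>
        #include <iostream>

        // Toy idle-time model: a TI x TJ grid of tiles, each taking one time unit, where
        // tile (i, j) waits for tiles (i-1, j) and (i, j-1). Tile row i runs on processor
        // i % P. A processor is idle whenever it is not executing a tile between the start
        // of its first tile and the end of its last one.
        long idle_time(int TI, int TJ, int P) {
            std::vector<std::vector<long>> finish(TI, std::vector<long>(TJ, 0));
            std::vector<long> ready(P, 0), busy(P, 0), first(P, -1), last(P, 0);
            for (int i = 0; i < TI; ++i) {
                int p = i % P;
                for (int j = 0; j < TJ; ++j) {
                    long deps = 0;
                    if (i > 0) deps = std::max(deps, finish[i - 1][j]);
                    if (j > 0) deps = std::max(deps, finish[i][j - 1]);
                    long start = std::max(deps, ready[p]);
                    finish[i][j] = start + 1;            // unit execution time per tile
                    ready[p] = finish[i][j];
                    busy[p] += 1;
                    if (first[p] < 0) first[p] = start;
                    last[p] = finish[i][j];
                }
            }
            long idle = 0;
            for (int p = 0; p < P; ++p)
                if (first[p] >= 0) idle += (last[p] - first[p]) - busy[p];
            return idle;
        }

        int main() {
            std::cout << idle_time(8, 2, 4) << std::endl;   // total idle time for an 8 x 2 tile grid on 4 processors
            return 0;
        }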