7 research outputs found

    Optimal Compilation of HPF Remappings

    No full text
    International audienceApplications with varying array access patterns require to dynamically change array mappings on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through therealign andredistribute directives and implicitly at procedure calls and returns. However such features are left out of the HPF subset or of the currently discussed hpf kernel for effeciency reasons. This paper presents a new compilation technique to handle hpf remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: A minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a Dec Alpha farm

    Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

    Get PDF
    Advanced Research Projects Agency (ARPA)National Aeronautics and Space AdministrationOpe

    DDT: a research tool for automatic data distribution in HPF

    Get PDF
    This article describes the main features and implementation of our automatic data distribution research tool. The tool (DDT) accepts programs written in Fortran 77 and generates High Performance Fortran (HPF) directives to map arrays onto the memories of the processors and parallelize loops, and executable statements to remap these arrays. DDT works by identifying a set of computational phases (procedures and loops). The algorithm builds a search space of candidate solutions for these phases which is explored looking for the combination that minimizes the overall cost; this cost includes data movement cost and computation cost. The movement cost reflects the cost of accessing remote data during the execution of a phase and the remapping costs that have to be paid in order to execute the phase with the selected mapping. The computation cost includes the cost of executing a phase in parallel according to the selected mapping and the owner computes rule. The tool supports interprocedural analysis and uses control flow information to identify how phases are sequenced during the execution of the application.Peer ReviewedPostprint (published version

    Contributions à la performance du calcul scientifique et embarqué

    Get PDF
    Habilitation à diriger les recherches (HDR)Ce document résume mes résultats de recherche après vingt années d'activité, y compris les travaux en collaboration avec trois étudiants ayant soutenu leur thèse de doctorat~: Julien Zory, Youcef Bouchebaba et Mehdi Amini. Il fournit une vue générale des résultats organisés selon les principales motivations qui ont soutenu leur développement, à savoir la performance, l'élégance, les expériences et la diffusion des connaissances. Ces travaux couvrent l'analyse statique pour la compilation ou la détection de problèmes dans les programmes, la génération de communications avec des méthodes polyédriques, la génération de code pour des accélérateurs matériels, mais aussi des primitives cryptographiques et un algorithme distribué. Ce document ne donne cependant pas une présentation détaillée des résultats, pour laquelle nous orientons le lecteur vers les articles de journaux, de conférences ou de séminaires correspondants. Les quatre thèmes abordés sont~: la performance -- l'essentiel des travaux vise à optimiser les performances de codes sur diverses architectures, du super calculateur à mémoire distribuée, à la carte graphique (GPGPU), jusqu'au système embarqué spécialisé~; l'élégance -- est un objectif des phases de conception, de même que trouver si possible des solutions optimales, tout en devant rester pratiques~; les expériences -- la plupart des algorithmes présentés sont implémentés, en général dans des logiciels libres, par exemple intégrés à des gros projets comme le logiciel PIPS ou diffusés de manière indépendante, de manière à conduire des expériences qui montrent l'intérêt pratique des méthodes~; les connaissances -- une large part de de mon activité est dédiée à la transmission des connaissances, pour des étudiants, des professionnels ou même le grand public. Le document se conclut par un projet de recherche présenté sous la forme d'une discussion et d'un ensemble de sujets de stage ou de thèse

    Optimal Compilation of HPF Remappings (Extended Abstracts)

    No full text
    ) Fabien Coelho Corinne Ancourt Centre de Recherche en Informatique, ' Ecole des mines de Paris, 35, rue Saint-Honor'e, 77305 Fontainebleau Cedex, France. Phone: +33 1 64 69 47 08. Fax: + 33 1 64 69 47 09. URL: http://www.cri.ensmp.fr/pips October 23, 1995 Abstract Applications with varying array access patterns require to dynamically change array mappings on distributed-memory parallel machines. Hpf (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However such features are left out of the hpf subset or of the currently discussed hpf kernel for efficiency reasons. This paper presents a new compilation technique to handle hpf remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replicat..

    Automatic data layout for distributed memory machines

    No full text
    The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. This thesis discusses the design and implementation of a data layout selection tool that generates Fortran D or HPF style data layout specifications automatically. Because the tool is not embedded in the target compiler and will be run only a few times during the tuning phase of an application, it can use techniques that may be considered too computationally expensive for inclusion in today's compilers. The proposed framework for automatic data layout selection builds and examines explicit search spaces of candidate data layouts. A candidate layout is an efficient layout for some part of the program. After the generation of search spaces, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model is used to predict the execution time of each candidate layout and the costs of possible remappings between candidate data layouts. The machine model uses the novel training set approach which determines the costs of arithmetic operations and simple communication patterns. In the proposed framework, instances of NP-complete problems are solved during the construction of candidate layout search spaces and the final selection of candidate layouts from each search space. Rather than resorting to heuristics prematurely, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions of these NP-complete problems. A prototype of the data layout assistant tool has been implemented. Experiments indicate that good data layouts can be determined efficiently

    Automatic Data Layout for Distributed Memory Machines

    No full text
    This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19106The goal of languages like Fortran D or High Performance Fortran(HPF) is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. This thesis discusses the design and implementation of a data layout selection tool that generates Fortran D or HPF style data layout specifications automatically. Because the tool is not embedded in the target compiler and will be run only a few times during the tuning phase of an application, it can use techniques that may be considered too computationally expensive for inclusion in today's compilers. The proposed framework for automatic data layout selection builds and examines explicit search spaces of candidate data layouts. A candidate layout is an efficient layout for some part of the program. After the generation of search spaces, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model is used to predict the execution time of each candidate layout and the costs of possible remappings between candidate data layouts. The machine model uses the novel training set approach which determines the costs of arithmetic operations and simple communication patterns. In the proposed framework, instances of NP-complete problems a resolved during the construction of candidate layout search spaces and the final selection of candidate layouts from each search space. Rather than resorting to heuristics prematurely, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions of these NP-complete problems. A prototype of the data layout assistant tool has been implemented. Experiments indicate that good data layouts can be determined efficiently
    corecore