Search CORE

15 research outputs found

Towards an Achievable Performance for the Loop Nests

Author: A Darte
AH Ashouri
AW Lim
DA Padua
G Fursin
Georgios Tournavitis
J Demšar
K Kennedy
K Stock
MJ Wolfe
Padua
R Allen
R Cammarota
T Grosser
U Bondhugula
W Li
Zhangxiaowen Gong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided using heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize the loop nests based on their hardware performance counter values and a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made for both auto-vectorized, serial compilation and for auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions. Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018)

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
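
To make the notion of per-compiler headroom in the abstract above concrete, here is a minimal sketch. It assumes headroom is the geometric-mean ratio between a compiler's time on each loop nest and the best time any compiler achieved on that nest; the abstract does not give the paper's exact definition. The function name `headroom`, the compiler names `cc1`/`cc2`, and all timings are hypothetical and invented purely for illustration.

```python
from math import prod


def headroom(times, compiler):
    """Geometric-mean speedup available to `compiler` if every loop nest
    instead ran as fast as the quickest compiler for that nest.

    times: dict mapping loop-nest name -> {compiler name: seconds}.
    NOTE: this definition is an assumption, not taken from the paper.
    """
    ratios = [t[compiler] / min(t.values()) for t in times.values()]
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical timings in seconds (not measured data).
timings = {
    "nest_a": {"cc1": 2.0, "cc2": 1.0},
    "nest_b": {"cc1": 1.0, "cc2": 4.0},
}

for name in ("cc1", "cc2"):
    print(name, round(headroom(timings, name), 3))
```

A headroom of 1.0 would mean the compiler is already the fastest on every nest; larger values indicate how much could be gained by always picking the best compiler per nest.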

Parafrase restructuring of FORTRAN code for parallel processing

Author: Wadhwa Atul

Parafrase transforms a FORTRAN code, subroutine by subroutine, into a parallel code for a vector and/or shared-memory multiprocessor system. Parafrase is not a compiler; it transforms a code and provides information for a vector or concurrent process. Parafrase uses data dependencies to reveal parallelism among instructions. The data dependency test distinguishes between recurrences and statements that can be directly vectorized or parallelized. A number of transformations are required to build a data dependency graph.

NASA Technical Reports Server

The preprocessed doacross loop

Author: Mirchandaney Ravi
Saltz Joel H.

Dependencies between loop iterations cannot always be characterized during program compilation. Doacross loops typically make use of a priori knowledge of inter-iteration dependencies to carry out required synchronizations. A type of doacross loop is proposed that allows the scheduling of iterations of a loop among processors without advance knowledge of inter-iteration dependencies. The method proposed for loop iterations requires that parallelizable preprocessing and postprocessing steps be carried out during program execution.

NASA Technical Reports Server
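
The idea of discovering inter-iteration dependencies at run time, as in the abstract above, can be illustrated with a wavefront-style inspector/executor. This is a related, commonly used technique and not necessarily the synchronization scheme of the paper. The sketch assumes a loop body of the form `x[w[i]] = x[r[i]] + 1`, where the index arrays `w` and `r` are known only at run time; the names `inspector`, `executor`, and `sequential` are hypothetical.

```python
def inspector(w, r, size):
    """Preprocessing step: assign each iteration a wavefront level so that
    any two iterations that conflict on an element get increasing levels."""
    last_write = [0] * size  # level of the latest writer of each element
    last_read = [0] * size   # highest level among readers of each element
    level = []
    for wi, ri in zip(w, r):
        # Must follow earlier writers of what we read or write (flow, output)
        # and earlier readers of what we write (anti-dependence).
        lvl = 1 + max(last_write[ri], last_write[wi], last_read[wi])
        level.append(lvl)
        last_write[wi] = lvl
        last_read[ri] = max(last_read[ri], lvl)
    return level


def executor(x, w, r, level):
    """Run the loop wave by wave; iterations inside one wave are mutually
    independent and could be handed to different processors."""
    x = list(x)
    for lvl in range(1, max(level, default=0) + 1):
        for i in (i for i, l in enumerate(level) if l == lvl):
            x[w[i]] = x[r[i]] + 1
    return x


def sequential(x, w, r):
    """Reference: the original loop in program order."""
    x = list(x)
    for i in range(len(w)):
        x[w[i]] = x[r[i]] + 1
    return x


w, r = [1, 2, 3, 0], [0, 1, 0, 3]
levels = inspector(w, r, size=4)
print(levels, executor([0] * 4, w, r, levels) == sequential([0] * 4, w, r))
```

The inner loop of `executor` runs serially here only to keep the sketch self-contained; the point is that the inspector's output makes the safe parallel schedule explicit.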

Semi-automatic process partitioning for parallel computation

Author: Koelbel Charles
Mehrotra Piyush
Van Rosendale John

On current multiprocessor architectures, one must carefully distribute data in memory in order to achieve high performance. Process partitioning is the operation of rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. A semi-automatic approach to process partitioning is considered in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting task system. This approach is illustrated with a picture processing example written in BLAZE, which is transformed into a task system maximizing locality of memory reference.

NASA Technical Reports Server

Hardware Barrier Synchronization: Static Barrier MIMD (SBM)

Author: Dietz Henry G.
O'Keefe Matthew T.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/1990

In this paper, we give the design and performance analysis of a new, highly efficient synchronization mechanism called “Static Barrier MIMD” or “SBM.” Unlike traditional barrier synchronization, the proposed barriers are designed to facilitate the use of static (compile-time) code scheduling for eliminating some synchronizations. For this reason, our barrier hardware is more general than most hardware barrier mechanisms, allowing any subset of the processors to participate in each barrier. Since code scheduling typically operates on fine-grain parallelism, it is also vital that barriers be able to execute in a small number of clock ticks. The SBM is actually only one of two new classes of barrier machines proposed to facilitate static code scheduling; the other architecture is the “Dynamic Barrier MIMD,” or “DBM,” which is described in a companion paper. The DBM differs from the SBM in that the DBM employs more complex hardware to make the system less dependent on the precision of the static analysis and code scheduling; for example, an SBM cannot efficiently manage simultaneous execution of independent parallel programs, whereas a DBM can.

Purdue E-Pubs

The PARSE Programming Paradigm. Part I: Software Development Methodology. Part II: Software Development Support Tools

Author: Casavant T. L.
Dietz Henry G.
Sheu P. C.-Y.
Siegel H. J.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/06/1987

The programming methodology of PARSE (parallel software environment), a software environment being developed for reconfigurable non-shared memory parallel computers, is described. This environment will consist of an integrated collection of language interfaces, automatic and semi-automatic debugging and analysis tools, and an operating system, all of which are made more flexible by the use of a knowledge-based implementation for the tools that make up PARSE. The programming paradigm lets the user choose freely among three basic approaches/abstractions for programming a parallel machine: logic-based descriptive, sequential-control procedural, and parallel-control procedural programming. All of these result in efficient parallel execution. The current work discusses the methodology underlying PARSE, whereas the companion paper, “The PARSE Programming Paradigm - II: Software Development Support Tools,” details each of the component tools.

Purdue E-Pubs

Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

Author: Bodin Francois
Gannon Dennis
Mehrotra Piyush
Priol Thierry

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism in Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++, which is based on a concurrent aggregate parallel model.

NASA Technical Reports Server

Structured dataflow analysis for arrays and its use in an optimizing compiler

Author: Aho
Allen
Annaratone
Banerjee
Borkar
Callahan
Cocke
Cohn
Colwell
Fisher
Graham
Gross
Hecht
Kanade
Kennedy
Kuck
Lam
Lamport
Padua
Scarborough
Tamura
Tarjan
Triolet
Ullman
Wolfe
Publication venue: 'Wiley'

Crossref

A Rules Based Approach to Analyze Data Dependent Transformation Strategies of a Supercompiler for Parallel Computers.

Author: Mcguffee James Woodson
Publication venue: LSU Digital Commons
Publication date: 01/01/1994

A supercompiler is a program that attempts to automatically restructure serial code into an equivalent parallel form. This restructuring is achieved through the application of various transformation strategies designed to remove data dependences. A data dependence is a relation between two programming statements that prevents those two statements from being executed in parallel. This research develops a rules based system to analyze the various data dependent transformation strategies of a supercompiler for parallel computers. With the information obtained from user input and the automated analysis of a program segment, this rules based analysis will be able to determine which of the available transformation strategies is the optimal one to be applied for a particular program segment.

Louisiana State University
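
The definition of a data dependence in the abstract above can be illustrated with a small brute-force check. This is only a sketch, not the rules based system of the dissertation: it assumes a single-statement loop that reads one array element and writes another on each iteration, and it detects only loop-carried flow dependences (a read of a value written by an earlier iteration). The function name and the index lambdas are hypothetical.

```python
def has_loop_carried_flow_dependence(n, write_index, read_index):
    """Return True if, over iterations 0..n-1, some iteration reads an
    array element that an earlier iteration wrote.

    write_index, read_index: functions mapping iteration i to the element
    written and read by that iteration (illustrative assumption).
    """
    written = set()
    for i in range(n):
        if read_index(i) in written:  # value produced by an earlier iteration
            return True
        written.add(write_index(i))
    return False


# a[i] = a[i - 1] + 1 : each iteration needs the previous result.
print(has_loop_carried_flow_dependence(4, lambda i: i, lambda i: i - 1))
# a[i] = a[i] * 2 : every iteration touches only its own element.
print(has_loop_carried_flow_dependence(4, lambda i: i, lambda i: i))
```

A loop such as `a[i] = a[i + 1]` passes this particular check yet still carries an anti-dependence, so a real analysis would have to test the other dependence kinds as well.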