2 research outputs found

    A Transformation--Based Approach for the Design of Parallel/Distributed Scientific Software: the FFT

    Full text link
    We describe a methodology for designing efficient parallel and distributed scientific software. This methodology utilizes sequences of mechanizable algebra--based optimizing transformations. In this study, we apply our methodology to the FFT, starting from a high--level algebraic algorithm description. Abstract multiprocessor plans are developed and refined to specify which computations are to be done by each processor. Templates are then created that specify the locations of computations and data on the processors, as well as data flow among processors. Templates are developed in both the MPI and OpenMP programming styles. Preliminary experiments comparing code constructed using our methodology with code from several standard scientific libraries show that our code is often competitive and sometimes performs better. Interestingly, our code handled a larger range of problem sizes on one target architecture.Comment: 45 pages, 2 figure

    Conformal Computing: Algebraically connecting the hardware/software boundary using a uniform approach to high-performance computation for software and hardware applications

    Full text link
    We present a systematic, algebraically based, design methodology for efficient implementation of computer programs optimized over multiple levels of the processor/memory and network hierarchy. Using a common formalism to describe the problem and the partitioning of data over processors and memory levels allows one to mathematically prove the efficiency and correctness of a given algorithm as measured in terms of a set of metrics (such as processor/network speeds, etc.). The approach allows the average programmer to achieve high-level optimizations similar to those used by compiler writers (e.g. the notion of "tiling"). The approach presented in this monograph makes use of A Mathematics of Arrays (MoA, Mullin 1988) and an indexing calculus (i.e. the psi-calculus) to enable the programmer to develop algorithms using high-level compiler-like optimizations through the ability to algebraically compose and reduce sequences of array operations. Extensive discussion and benchmark results are presented for the Fast Fourier Transform and other important algorithms
    corecore