A Transformation-Based Approach for the Design of Parallel/Distributed Scientific Software: the FFT
We describe a methodology for designing efficient parallel and distributed
scientific software. This methodology utilizes sequences of mechanizable
algebra-based optimizing transformations. In this study, we apply our
methodology to the FFT, starting from a high-level algebraic algorithm
description. Abstract multiprocessor plans are developed and refined to specify
which computations are to be done by each processor. Templates are then created
that specify the locations of computations and data on the processors, as well
as data flow among processors. Templates are developed in both the MPI and
OpenMP programming styles.
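
As a rough illustration of what a per-processor plan for the FFT might look like, the following Python sketch builds, for each stage of an iterative radix-2 FFT, a list of butterflies assigned to each of a hypothetical number of processors, then executes that plan sequentially. The names `butterfly_plan` and `fft_from_plan`, the contiguous block assignment, and the radix-2 formulation are assumptions made for this example; they are not taken from the paper, and no actual MPI or OpenMP calls are used.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[r] = x[i]
    return out


def butterfly_plan(n, num_procs):
    """For each stage, split the n/2 butterflies into contiguous blocks,
    one block per (hypothetical) processor."""
    plan = []
    half = 1
    while half < n:
        butterflies = []
        for start in range(0, n, 2 * half):
            for k in range(half):
                # (top index, bottom index, twiddle exponent, butterfly span)
                butterflies.append((start + k, start + k + half, k, 2 * half))
        chunk = -(-len(butterflies) // num_procs)  # ceiling division
        plan.append([butterflies[p * chunk:(p + 1) * chunk]
                     for p in range(num_procs)])
        half *= 2
    return plan


def fft_from_plan(x, num_procs):
    """Run a radix-2 FFT by walking the plan; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = bit_reverse_permute([complex(v) for v in x])
    for stage in butterfly_plan(n, num_procs):
        # Butterflies within one stage touch disjoint index pairs, so the
        # per-processor work lists could run concurrently; here they run in turn.
        for proc_work in stage:
            for top, bottom, k, span in proc_work:
                w = cmath.exp(-2j * cmath.pi * k / span)
                a, b = data[top], w * data[bottom]
                data[top], data[bottom] = a + b, a - b
    return data
```

Separating the plan from its execution is the point of the sketch: the same plan could, in principle, be handed to a message-passing or shared-memory back end without changing the arithmetic.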
Preliminary experiments comparing code constructed using our methodology with
code from several standard scientific libraries show that our code is often
competitive and sometimes performs better. Interestingly, our code handled a
larger range of problem sizes on one target architecture.
Comment: 45 pages, 2 figures
Conformal Computing: Algebraically connecting the hardware/software boundary using a uniform approach to high-performance computation for software and hardware applications
We present a systematic, algebraically based design methodology for
efficient implementation of computer programs optimized over multiple levels of
the processor/memory and network hierarchy. Using a common formalism to
describe the problem and the partitioning of data over processors and memory
levels allows one to mathematically prove the efficiency and correctness of a
given algorithm as measured in terms of a set of metrics (such as
processor and network speeds). The approach allows the average programmer to
achieve high-level optimizations similar to those used by compiler writers
(e.g. the notion of "tiling").
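
For readers unfamiliar with tiling, a minimal Python sketch of a tiled matrix multiplication follows. It is a generic textbook illustration, not the monograph's derivation: the function name `matmul_tiled` and the default tile size are assumptions chosen for the example, and in pure Python the loop restructuring shows the access pattern only, without the cache benefit it would bring in a compiled language.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows,
    visiting the iteration space one tile-sized block at a time."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops step over blocks; inner loops stay inside one block, so
    # each block of a, b and c is reused while it is still "hot".
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The result is the same as an untiled triple loop; only the order in which the products are accumulated changes.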
The approach presented in this monograph makes use of A Mathematics of Arrays
(MoA, Mullin 1988) and an indexing calculus (i.e. the psi-calculus) to enable
the programmer to develop algorithms using high-level compiler-like
optimizations through the ability to algebraically compose and reduce sequences
of array operations. Extensive discussion and benchmark results are presented
for the Fast Fourier Transform and other important algorithms.
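
The idea of composing and reducing a sequence of array operations can be sketched loosely in Python. The example below is only an informal analogy to the psi-calculus, not MoA's formal definitions: the `View` class and the one-dimensional `take`, `drop`, and `reverse` helpers are hypothetical names introduced here. Each operation composes an index mapping instead of copying data, so a whole chain reduces to a single index function that is applied once.

```python
class View:
    """A lazy 1-D view: a length plus a map from view index to source index."""

    def __init__(self, length, index_of):
        self.length = length
        self.index_of = index_of


def source(n):
    """Identity view over an array of n elements."""
    return View(n, lambda i: i)


def take(k, v):
    """Keep the first k elements of v (or all of them if v is shorter)."""
    return View(min(k, v.length), v.index_of)


def drop(k, v):
    """Skip the first k elements of v."""
    k = min(k, v.length)
    return View(v.length - k, lambda i: v.index_of(i + k))


def reverse(v):
    """Reverse the order of the elements of v."""
    return View(v.length, lambda i: v.index_of(v.length - 1 - i))


def materialize(v, data):
    """Apply the composed index map once; no intermediate arrays are built."""
    return [data[v.index_of(i)] for i in range(v.length)]
```

For instance, `materialize(reverse(take(3, drop(2, source(len(data))))), data)` reads each needed element of `data` directly, whereas the eager form `data[2:][:3][::-1]` builds a temporary list at every step.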