Evaluation of High Performance Fortran through Application Kernels
Since the definition of the High Performance Fortran (HPF) standard, we have been maintaining a suite of application kernel codes with the aim of using them to evaluate the available compilers. This paper presents the results and conclusions from this study, for sixteen codes, on compilers from IBM, DEC, and the Portland Group Inc. (PGI), and on three machines: a DEC Alphafarm, an IBM SP-2, and a Cray T3D. From this, we hope to show the prospective HPF user that scalable performance is possible with modest effort, yet also where the current weaknesses lie.
HPF to OpenMP on the Origin2000: a case study
The geophysics group at CRS4 has long developed echo reconstruction codes in HPF on distributed-memory machines. Now, however, with the arrival of shared-memory machines and their native OpenMP compilers, the transfer to OpenMP would seem to present the logical next step in our code development strategy. Recent experience with porting one of our important HPF codes to OpenMP does not bear this out, at least not on the Origin2000. The OpenMP code suffers from the immaturity of the standard, and the operating system's handling of UNIX threads seems to severely penalize OpenMP performance. On the other hand, the HPF code on the Origin2000 is fast, scalable and not disproportionately sensitive to load on the machine.
DDT: a research tool for automatic data distribution in HPF
This article describes the main features and implementation of our automatic data distribution research tool. The tool (DDT) accepts programs written in Fortran 77 and generates High Performance Fortran (HPF) directives to map arrays onto the memories of the processors and parallelize loops, and executable statements to remap these arrays. DDT works by identifying a set of computational phases (procedures and loops). The algorithm builds a search space of candidate solutions for these phases which is explored looking for the combination that minimizes the overall cost; this cost includes data movement cost and computation cost. The movement cost reflects the cost of accessing remote data during the execution of a phase and the remapping costs that have to be paid in order to execute the phase with the selected mapping. The computation cost includes the cost of executing a phase in parallel according to the selected mapping and the owner computes rule. The tool supports interprocedural analysis and uses control flow information to identify how phases are sequenced during the execution of the application.
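The cost-driven search described in this abstract can be illustrated with a minimal sketch. It assumes a much simpler setting than the tool itself: the phases run in a fixed linear sequence (no control flow), each phase has a small set of candidate mappings, and all costs are made-up numbers. The names `best_mappings`, `phase_costs`, `remap_cost`, `BLOCK_ROWS` and `BLOCK_COLS` are hypothetical and do not come from DDT.

```python
def best_mappings(phase_costs, remap_cost):
    """Pick one mapping per phase so that total cost is minimal.

    phase_costs: list with one dict per phase, mapping name -> cost of
                 running that phase under that mapping (computation plus
                 remote accesses). Hypothetical numbers, not DDT output.
    remap_cost:  function (previous_mapping, next_mapping) -> cost of
                 redistributing the arrays between two phases.
    Returns (total_cost, [mapping chosen for each phase]).
    """
    # best[m] = cheapest (cost, plan) for the phases seen so far, ending in m
    best = {m: (c, [m]) for m, c in phase_costs[0].items()}
    for phase in phase_costs[1:]:
        new_best = {}
        for m, c in phase.items():
            # cheapest way to arrive at mapping m from any previous mapping
            arrive, plan = min(
                ((t + remap_cost(p, m), path) for p, (t, path) in best.items()),
                key=lambda entry: entry[0],
            )
            new_best[m] = (arrive + c, plan + [m])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


# Illustrative data only: three phases, two candidate mappings each.
phase_costs = [
    {"BLOCK_ROWS": 10, "BLOCK_COLS": 40},
    {"BLOCK_ROWS": 50, "BLOCK_COLS": 12},
    {"BLOCK_ROWS": 10, "BLOCK_COLS": 30},
]


def remap_cost(prev, nxt):
    # Assumed flat price for any redistribution; zero if the mapping is kept.
    return 0 if prev == nxt else 15


total, plan = best_mappings(phase_costs, remap_cost)
print(total, plan)
```

With these numbers, remapping twice is cheaper than keeping a single mapping throughout, which is the kind of trade-off between remapping cost and per-phase cost that the abstract describes.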
Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers
Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration; Ope
HPCCP/CAS Workshop Proceedings 1998
This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey
Code-Optimierung im Polyedermodell - Effizienzsteigerung von parallelen Schleifensätzen (Code Optimization in the Polyhedron Model: Improving the Efficiency of Parallel Loop Nests)
A safe basis for automatic loop parallelization is the polyhedron model, which represents the iteration domain of a loop nest as a polyhedron in $\mathbb{Z}^n$. However, turning the parallel loop program in the model into efficient code meets with several obstacles, due to which performance may deteriorate seriously, especially on distributed memory architectures. We introduce a fine-grained model of the computation performed and show how this model can be applied to create efficient code.
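As a small illustration of what "iteration domain as a polyhedron" means, the sketch below takes a triangular two-level loop nest and describes the same set of iterations by linear inequalities over integer points. The loop nest, the constraint encoding and the function names `loop_nest_points` and `polyhedron_points` are assumptions chosen for illustration; they are not taken from the thesis.

```python
from itertools import product


def loop_nest_points(n):
    """Iterations of: for i in 0..n-1: for j in 0..i, in execution order."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def polyhedron_points(n):
    """The same iterations as integer points (i, j) with a . (i, j) <= b."""
    constraints = [
        ((-1, 0), 0),      # -i     <= 0    (i >= 0)
        ((1, 0), n - 1),   #  i     <= n-1
        ((0, -1), 0),      # -j     <= 0    (j >= 0)
        ((-1, 1), 0),      #  j - i <= 0    (j <= i)
    ]
    # Scan a bounding box and keep the points that satisfy every inequality.
    box = product(range(n), repeat=2)
    return [
        p for p in box
        if all(a[0] * p[0] + a[1] * p[1] <= b for a, b in constraints)
    ]


points = polyhedron_points(4)
print(len(points), points == loop_nest_points(4))
```

Scanning a bounding box is only a convenient way to enumerate the points here; generating efficient loops that visit exactly the points of a polyhedron is the harder code-generation problem the abstract refers to.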