11 research outputs found

    Automatically Harnessing Sparse Acceleration

    Get PDF
    Sparse linear algebra is central to many scientific programs, yet compilers fail to optimize it well. High-performance libraries are available, but adoption costs are significant. Moreover, libraries tie programs into vendor-specific software and hardware ecosystems, creating non-portable code. In this paper, we develop a new approach based on our specification Language for implementers of Linear Algebra Computations (LiLAC). Rather than requiring the application developer to (re)write every program for a given library, the burden is shifted to a one-off description by the library implementer. The LiLAC-enabled compiler uses this to insert appropriate library routines without source code changes. LiLAC provides automatic data marshaling, maintaining state between calls and minimizing data transfers. Appropriate places for library insertion are detected in compiler intermediate representation, independent of source languages. We evaluated this approach on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. Across heterogeneous platforms, applications, and data sets, we show speedups of 1.1× to over 10× without user intervention. (Comment: Accepted to CC 2020.)
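
    As a hedged illustration (not from the paper; LiLAC's actual specification syntax is not shown in this abstract), the C sketch below is the kind of sparse kernel a LiLAC-enabled compiler would detect in its intermediate representation and replace with a tuned vendor routine, without the application source changing:

        #include <stdio.h>

        /* Sparse matrix-vector product y = A*x, with A in CSR form. This is
           the match site: a LiLAC-style compiler recognizes the pattern in
           the IR and substitutes a library call, with data marshaling between
           the program's arrays and the library's format handled for it. */
        static void spmv_csr(int n, const int *rowptr, const int *col,
                             const double *val, const double *x, double *y)
        {
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                    sum += val[k] * x[col[k]];
                y[i] = sum;
            }
        }

        int main(void)
        {
            /* the 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form */
            int    rowptr[] = {0, 2, 3, 5};
            int    col[]    = {0, 2, 1, 0, 2};
            double val[]    = {2, 1, 3, 4, 5};
            double x[]      = {1, 1, 1}, y[3];

            spmv_csr(3, rowptr, col, val, x, y);
            for (int i = 0; i < 3; i++)
                printf("y[%d] = %g\n", i, y[i]);   /* expect 3, 3, 9 */
            return 0;
        }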

    Idiom Recognition in the Polaris Parallelizing Compiler

    No full text
    The elimination of induction variables and the parallelization of reductions in FORTRAN programs have been shown to be integral to performance improvement on parallel computers [7, 8]. As part of the Polaris project [5], compiler passes that recognize these idioms have been implemented and evaluated. Developing these techniques to the point necessary to achieve significant speedups on real applications has prompted solutions to problems that have not been addressed in previous reports on idiom recognition techniques. These include analysis techniques capable of disproving zero-trip loops, symbolic handling facilities to compute closed forms of recurrences, and interfaces to other compilation passes such as the data-dependence test. In comparison, the recognition phase of solving induction variables, which has received most attention so far, has in fact turned out to be relatively straightforward. This paper provides an overview of techniques described in more detail in [12].
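
    For illustration (the loop and names are hypothetical, not taken from the paper), the C sketch below shows a generalized induction variable and the closed-form substitution that makes the loop parallelizable:

        #include <stdio.h>

        #define N 8
        #define LAST (N * (N - 1) / 2)   /* largest index written: i*(i+1)/2 at i=N-1 */

        int main(void)
        {
            int a[LAST + 1] = {0};
            int b[LAST + 1] = {0};

            /* Original form: k is a generalized induction variable. Its
               increment depends on i, and the cross-iteration flow of k
               prevents naive parallelization of the loop. */
            int k = 0;
            for (int i = 0; i < N; i++) {
                a[k] = i + 1;
                k = k + i + 1;
            }

            /* After closed-form substitution k = i*(i+1)/2, every iteration
               is independent and the loop can run in parallel. */
            for (int i = 0; i < N; i++)
                b[i * (i + 1) / 2] = i + 1;

            /* The two versions produce identical results. */
            int same = 1;
            for (int i = 0; i <= LAST; i++)
                same = same && (a[i] == b[i]);
            printf("versions agree: %s\n", same ? "yes" : "no");
            return 0;
        }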

    Targeting a Shared-Address-Space Version of the Seismic Benchmark Seis1.1

    No full text
    We report on our experiences retargeting the seismic processing message-passing application Seis1.1 [11] to an SGI Challenge shared-memory multiprocessor. Our primary purpose in doing so is to provide a shared-address-space (SAS) version of the application for inclusion in the new high-performance SPEChpc96 Benchmark Suite. As a result of this work we have determined the language constructs necessary to express Seis1.1 in the SAS programming model. In addition, we have characterized the performance of the SAS versus message-passing programming models for this application on a shared-memory multiprocessor.
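
    As a minimal sketch of the SAS model the abstract contrasts with message passing (OpenMP is used here as a modern stand-in; the abstract does not name the actual SGI constructs, and the array name is illustrative):

        #include <stdio.h>

        #define N 16

        /* In the SAS style, every processor indexes one shared array
           directly; only the loop iterations are divided up, and no
           explicit sends or receives of trace data are needed, unlike
           the message-passing version of the same computation. */
        int main(void)
        {
            double trace[N];

            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                trace[i] = 2.0 * i;   /* stands in for per-trace seismic work */

            for (int i = 0; i < N; i++)
                printf("%g ", trace[i]);
            printf("\n");
            return 0;
        }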

    Parallelization in the Presence of Generalized Induction and Reduction Variables

    No full text
    The elimination of induction variables and the parallelization of reductions in FORTRAN programs have been shown to be integral to performance improvement on parallel computers [9, 10]. As part of the Polaris project, compiler passes that recognize these idioms have been implemented and evaluated. Developing these techniques to the point necessary for achieving significant speedups on real applications has prompted solutions to problems that have not been addressed in previous reports. These include analysis capabilities to disprove zero-trip loops, symbolic handling facilities to compute closed forms of recurrences, and interfaces to other compilation passes, such as the data-dependence test. In comparison, the analysis phase of induction variables, which has received most attention so far, has turned out to be relatively straightforward. The parallelization of loops requires resolution of many types of data dependences ([2], [5], [17]); in particular, cross-iteration dependences…
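
    A minimal C sketch of the reduction transformation described above, with illustrative names (NPROCS, partial); it is written sequentially so the privatized partial sums stay visible, whereas in Polaris each chunk would run on its own processor:

        #include <stdio.h>

        #define N 1000
        #define NPROCS 4   /* illustrative processor count */

        int main(void)
        {
            double a[N], partial[NPROCS] = {0.0}, sum = 0.0;

            for (int i = 0; i < N; i++)
                a[i] = 1.0;

            /* Parallelizable phase: each "processor" p accumulates a private
               sum over its chunk, so there is no cross-iteration dependence
               between chunks. */
            for (int p = 0; p < NPROCS; p++)
                for (int i = p * (N / NPROCS); i < (p + 1) * (N / NPROCS); i++)
                    partial[p] += a[i];

            /* Short sequential combine step. */
            for (int p = 0; p < NPROCS; p++)
                sum += partial[p];

            printf("sum = %g\n", sum);   /* expect 1000 */
            return 0;
        }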

    cSpace: A Parallel C Information Retrieval Application

    No full text
    This paper treats the parallelization and performance of cSpace, a parallel information retrieval application written in C…

    Seismic: A Hybrid C/Fortran Seismic Processing Benchmark
    Bill Pottenger and Rudolf Eigenmann

    No full text
    This article is divided into the following sections: The SPEChpc Suite; The Seismic Benchmark; The Parallelization of Seismic under the SAS Model; The Challenge Architecture; The Performance of Seismic; and Conclusion. (1) The SPEChpc Suite: The SPEChpc benchmark suite has been established under the auspices of the Standard Performance Evaluation Corporation (SPEC) [3]. SPEChpc is defined by a joint effort of industrial members, high-performance computer vendors, and academic institutions. The primary goal is to determine a set of industrially significant applications that can be used to characterize the performance of high-performance computers across a wide range of machine organizations. A secondary goal is to identify a representative workload for high-performance machines which will be made available for scientific study. SPEChpc includes multiple program versions for each application, each targeted at a different class of machine architecture. Currently, both message-passing and sequential program versions are provided for each member of the suite. Seismic version 1.2 has been accepted as the SPECseis96 member of SPEChpc96.

    On the Automatic Parallelization of Sparse and Irregular Fortran Codes

    No full text
    This paper studies how well automatic parallelization techniques work on a collection of real codes with sparse and irregular access patterns. In conducting this work, we have compared existing technology in the commercial parallelizer PFA from SGI with the Polaris restructurer [7]. In cases… (This work is supported by U.S. Army contract #DABT63-95-C-0097 and is not necessarily representative of the positions or policies of the Army or the Government.)
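
    For illustration (array contents hypothetical, not from the paper), the sketch below shows the kind of indirectly addressed loop such codes contain: whether the writes conflict depends on the run-time contents of idx, which a static dependence test cannot see.

        #include <stdio.h>

        #define N 8

        int main(void)
        {
            double x[N] = {0};
            int idx[N]  = {3, 1, 7, 0, 5, 2, 6, 4};  /* a permutation, so in
                                                        fact parallel-safe */

            /* Irregular, indirectly addressed update: the compiler cannot
               prove at compile time that distinct iterations write distinct
               elements of x. */
            for (int i = 0; i < N; i++)
                x[idx[i]] += 1.0;

            for (int i = 0; i < N; i++)
                printf("%g ", x[i]);   /* expect eight 1s */
            printf("\n");
            return 0;
        }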

    Polaris: The Next Generation in Parallelizing Compilers

    No full text
    It is the goal of the Polaris project to develop a new parallelizing compiler that will overcome limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes possible. The techniques needed are interprocedural analysis, scalar and array privatization, symbolic dependence analysis, and advanced induction and reduction recognition and elimination, along with run-time techniques to handle data-dependent behavior.
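
    As an illustration of one listed technique, array privatization (the loop is hypothetical, not taken from any benchmark): because the temporary array is fully written before it is read in every outer iteration, each iteration can receive a private copy, removing the apparent dependence.

        #include <stdio.h>

        #define N 4
        #define M 3

        int main(void)
        {
            double a[N][M], t[M];

            for (int i = 0; i < N; i++) {
                for (int j = 0; j < M; j++)
                    t[j] = i + j;          /* t fully defined here ...        */
                for (int j = 0; j < M; j++)
                    a[i][j] = 2.0 * t[j];  /* ... before any use: with a
                                              private t per iteration, the
                                              i-loop is parallel */
            }

            printf("a[N-1][M-1] = %g\n", a[N - 1][M - 1]);  /* expect 10 */
            return 0;
        }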

    Restructuring Programs for High-Speed Computers with Polaris

    No full text
    The ability to automatically parallelize standard programming languages results in program portability across a wide range of machine architectures. It is the goal of the Polaris project to develop a new parallelizing compiler that overcomes limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from whole applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes feasible for a range of whole applications. The techniques needed are interprocedural analysis, scalar and array privatization, symbolic dependence analysis, and advanced induction and reduction recognition and elimination, along with run-time techniques to permit the parallelization of loops with unknown dependence relations.
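
    A minimal sketch of one such run-time technique, in the inspector/executor style (Polaris's actual run-time mechanism, the LRPD test, is more general, so this is illustrative only): an inspector checks the index array for conflicts before committing to parallel execution.

        #include <stdio.h>
        #include <string.h>

        #define N 8

        /* Inspector: the writes x[idx[i]] are independent exactly when idx
           contains no duplicates (and stays in range). */
        static int is_permutation(const int *idx, int n)
        {
            char seen[N];
            memset(seen, 0, sizeof seen);
            for (int i = 0; i < n; i++) {
                if (idx[i] < 0 || idx[i] >= n || seen[idx[i]])
                    return 0;           /* conflict: fall back to serial */
                seen[idx[i]] = 1;
            }
            return 1;                   /* no conflicts: parallel is safe */
        }

        int main(void)
        {
            double x[N] = {0};
            int idx[N]  = {2, 0, 3, 1, 6, 7, 4, 5};

            if (is_permutation(idx, N)) {
                /* executor: iterations proven independent at run time */
                for (int i = 0; i < N; i++)
                    x[idx[i]] = i;
            } else {
                for (int i = 0; i < N; i++)   /* serial fallback */
                    x[idx[i]] = i;
            }

            printf("x[0] = %g\n", x[0]);   /* expect 1 */
            return 0;
        }

    Whether such a test pays off depends on how often the inspection cost can be amortized, for example when the same index array is reused across many invocations of the loop.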