11 research outputs found

    Automatically Harnessing Sparse Acceleration

    Get PDF
    Sparse linear algebra is central to many scientific programs, yet compilers fail to optimize it well. High-performance libraries are available, but adoption costs are significant. Moreover, libraries tie programs into vendor-specific software and hardware ecosystems, creating non-portable code. In this paper, we develop a new approach based on our specification Language for implementers of Linear Algebra Computations (LiLAC). Rather than requiring the application developer to (re)write every program for a given library, the burden is shifted to a one-off description by the library implementer. The LiLAC-enabled compiler uses this to insert appropriate library routines without source code changes. LiLAC provides automatic data marshaling, maintaining state between calls and minimizing data transfers. Appropriate places for library insertion are detected in compiler intermediate representation, independent of source languages. We evaluated this approach on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. Across heterogeneous platforms, applications, and data sets, we show speedups of 1.1× to over 10× without user intervention. (Comment: Accepted to CC 2020.)
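
    As a hedged illustration (not from the paper; LiLAC's actual specification syntax is not shown in this abstract), the C sketch below is the kind of sparse kernel a LiLAC-enabled compiler would detect in its intermediate representation and replace with a tuned vendor routine, without the application source changing:

        #include <stdio.h>

        /* Sparse matrix-vector product y = A*x, with A in CSR form. This is
           the match site: a LiLAC-style compiler recognizes the pattern in
           the IR and substitutes a library call, with data marshaling between
           the program's arrays and the library's format handled for it. */
        static void spmv_csr(int n, const int *rowptr, const int *col,
                             const double *val, const double *x, double *y)
        {
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                    sum += val[k] * x[col[k]];
                y[i] = sum;
            }
        }

        int main(void)
        {
            /* the 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form */
            int    rowptr[] = {0, 2, 3, 5};
            int    col[]    = {0, 2, 1, 0, 2};
            double val[]    = {2, 1, 3, 4, 5};
            double x[]      = {1, 1, 1}, y[3];

            spmv_csr(3, rowptr, col, val, x, y);
            for (int i = 0; i < 3; i++)
                printf("y[%d] = %g\n", i, y[i]);   /* expect 3, 3, 9 */
            return 0;
        }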

    Idiom Recognition in the Polaris Parallelizing Compiler

    No full text
    The elimination of induction variables and the parallelization of reductions in FORTRAN programs have been shown to be integral to performance improvement on parallel computers [7, 8]. As part of the Polaris project [5], compiler passes that recognize these idioms have been implemented and evaluated. Developing these techniques to the point necessary to achieve significant speedups on real applications has prompted solutions to problems that have not been addressed in previous reports on idiom recognition techniques. These include analysis techniques capable of disproving zero-trip loops, symbolic handling facilities to compute closed forms of recurrences, and interfaces to other compilation passes such as the data-dependence test. In comparison, the recognition phase of solving induction variables, which has received most attention so far, has in fact turned out to be relatively straightforward. This paper provides an overview of techniques described in more detail in [12].
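
    For illustration (the loop and names are hypothetical, not taken from the paper), the C sketch below shows a generalized induction variable and the closed-form substitution that makes the loop parallelizable:

        #include <stdio.h>

        #define N 8
        #define LAST (N * (N - 1) / 2)   /* largest index written: i*(i+1)/2 at i=N-1 */

        int main(void)
        {
            int a[LAST + 1] = {0};
            int b[LAST + 1] = {0};

            /* Original form: k is a generalized induction variable. Its
               increment depends on i, and the cross-iteration flow of k
               prevents naive parallelization of the loop. */
            int k = 0;
            for (int i = 0; i < N; i++) {
                a[k] = i + 1;
                k = k + i + 1;
            }

            /* After closed-form substitution k = i*(i+1)/2, every iteration
               is independent and the loop can run in parallel. */
            for (int i = 0; i < N; i++)
                b[i * (i + 1) / 2] = i + 1;

            /* The two versions produce identical results. */
            int same = 1;
            for (int i = 0; i <= LAST; i++)
                same = same && (a[i] == b[i]);
            printf("versions agree: %s\n", same ? "yes" : "no");
            return 0;
        }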

    Targeting a Shared-Address-Space Version of the Seismic Benchmark Seis1.1

    No full text
    We report on our experiences retargeting the seismic processing message-passing application Seis1.1 [11] to an SGI Challenge shared-memory multiprocessor. Our primary purpose in doing so is to provide a shared-address-space (SAS) version of the application for inclusion in the new high-performance SPEChpc96 Benchmark Suite. As a result of this work we have determined the language constructs necessary to express Seis1.1 in the SAS programming model. In addition, we have characterized the performance of the SAS versus message-passing programming models for this application on a shared-memory multiprocessor.
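
    As a minimal sketch of the SAS model the abstract contrasts with message passing (OpenMP is used here as a modern stand-in; the abstract does not name the actual SGI constructs, and the array name is illustrative):

        #include <stdio.h>

        #define N 16

        /* In the SAS style, every processor indexes one shared array
           directly; only the loop iterations are divided up, and no
           explicit sends or receives of trace data are needed, unlike
           the message-passing version of the same computation. */
        int main(void)
        {
            double trace[N];

            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                trace[i] = 2.0 * i;   /* stands in for per-trace seismic work */

            for (int i = 0; i < N; i++)
                printf("%g ", trace[i]);
            printf("\n");
            return 0;
        }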

    Parallelization in the Presence of Generalized Induction and Reduction Variables

    No full text
    The elimination of induction variables and the parallelization of reductions in FORTRAN programs have been shown to be integral to performance improvement on parallel computers [9, 10]. As part of the Polaris project, compiler passes that recognize these idioms have been implemented and evaluated. Developing these techniques to the point necessary for achieving significant speedups on real applications has prompted solutions to problems that have not been addressed in previous reports. These include analysis capabilities to disprove zero-trip loops, symbolic handling facilities to compute closed forms of recurrences, and interfaces to other compilation passes, such as the data-dependence test. In comparison, the analysis phase of induction variables, which has received most attention so far, has turned out to be relatively straightforward. The parallelization of loops requires resolution of many types of data dependences ([2], [5], [17]); in particular, cross-iteration dependences…
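
    A minimal C sketch of the reduction transformation described above, with illustrative names (NPROCS, partial); it is written sequentially so the privatized partial sums stay visible, whereas in Polaris each chunk would run on its own processor:

        #include <stdio.h>

        #define N 1000
        #define NPROCS 4   /* illustrative processor count */

        int main(void)
        {
            double a[N], partial[NPROCS] = {0.0}, sum = 0.0;

            for (int i = 0; i < N; i++)
                a[i] = 1.0;

            /* Parallelizable phase: each "processor" p accumulates a private
               sum over its chunk, so there is no cross-iteration dependence
               between chunks. */
            for (int p = 0; p < NPROCS; p++)
                for (int i = p * (N / NPROCS); i < (p + 1) * (N / NPROCS); i++)
                    partial[p] += a[i];

            /* Short sequential combine step. */
            for (int p = 0; p < NPROCS; p++)
                sum += partial[p];

            printf("sum = %g\n", sum);   /* expect 1000 */
            return 0;
        }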

    cSpace: A Parallel C Information Retrieval Application

    No full text
    This paper treats the parallelization and performance of cSpace, a parallel information retrieval application written in C…

    Seismic: A Hybrid C/Fortran Seismic Processing Benchmark
    Bill Pottenger and Rudolf Eigenmann

    No full text
    This article is divided into the following sections: The SPEChpc Suite; The Seismic Benchmark; The Parallelization of Seismic under the SAS Model; The Challenge Architecture; The Performance of Seismic; and Conclusion. (1) The SPEChpc Suite: The SPEChpc benchmark suite has been established under the auspices of the Standard Performance Evaluation Corporation (SPEC) [3]. SPEChpc is defined by a joint effort of industrial members, high-performance computer vendors, and academic institutions. The primary goal is to determine a set of industrially significant applications that can be used to characterize the performance of high-performance computers across a wide range of machine organizations. A secondary goal is to identify a representative workload for high-performance machines which will be made available for scientific study. SPEChpc includes multiple program versions for each application, each targeted at a different class of machine architecture. Currently, both message-passing and sequential program versions are provided for each member of the suite. Seismic version 1.2 has been accepted as the SPECseis96 member of SPEChpc96.

    On the Automatic Parallelization of Sparse and Irregular Fortran Codes

    No full text
    This paper studies how well automatic parallelization techniques work on a collection of real codes with sparse and irregular access patterns. In conducting this work, we have compared existing technology in the commercial parallelizer PFA from SGI with the Polaris restructurer [7]. In cases… (This work is supported by U.S. Army contract #DABT63-95-C-0097 and is not necessarily representative of the positions or policies of the Army or the Government.)
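
    For illustration (array contents hypothetical, not from the paper), the sketch below shows the kind of indirectly addressed loop such codes contain: whether the writes conflict depends on the run-time contents of idx, which a static dependence test cannot see.

        #include <stdio.h>

        #define N 8

        int main(void)
        {
            double x[N] = {0};
            int idx[N]  = {3, 1, 7, 0, 5, 2, 6, 4};  /* a permutation, so in
                                                        fact parallel-safe */

            /* Irregular, indirectly addressed update: the compiler cannot
               prove at compile time that distinct iterations write distinct
               elements of x. */
            for (int i = 0; i < N; i++)
                x[idx[i]] += 1.0;

            for (int i = 0; i < N; i++)
                printf("%g ", x[i]);   /* expect eight 1s */
            printf("\n");
            return 0;
        }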

    Polaris: The Next Generation in Parallelizing Compilers

    No full text
    It is the goal of the Polaris project to develop a new parallelizing compiler that will overcome limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes possible. The techniques needed are interprocedural analysis, scalar and array privatization, symbolic dependence analysis, and advanced induction and reduction recognition and elimination, along with run-time techniques to handle data-dependent behavior.
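
    As an illustration of one listed technique, array privatization (the loop is hypothetical, not taken from any benchmark): because the temporary array is fully written before it is read in every outer iteration, each iteration can receive a private copy, removing the apparent dependence.

        #include <stdio.h>

        #define N 4
        #define M 3

        int main(void)
        {
            double a[N][M], t[M];

            for (int i = 0; i < N; i++) {
                for (int j = 0; j < M; j++)
                    t[j] = i + j;          /* t fully defined here ...        */
                for (int j = 0; j < M; j++)
                    a[i][j] = 2.0 * t[j];  /* ... before any use: with a
                                              private t per iteration, the
                                              i-loop is parallel */
            }

            printf("a[N-1][M-1] = %g\n", a[N - 1][M - 1]);  /* expect 10 */
            return 0;
        }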

    Restructuring Programs for High-Speed Computers with Polaris

    No full text
    The ability to automatically parallelize standard programming languages results in program portability across a wide range of machine architectures. It is the goal of the Polaris project to develop a new parallelizing compiler that overcomes limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from whole applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes feasible for a range of whole applications. The techniques needed are interprocedural analysis, scalar and array privatization, symbolic dependence analysis, and advanced induction and reduction recognition and elimination, along with run-time techniques to permit the parallelization of loops with unknown dependence relations.
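
    A minimal sketch of one such run-time technique, in the inspector/executor style (Polaris's actual run-time mechanism, the LRPD test, is more general, so this is illustrative only): an inspector checks the index array for conflicts before committing to parallel execution.

        #include <stdio.h>
        #include <string.h>

        #define N 8

        /* Inspector: the writes x[idx[i]] are independent exactly when idx
           contains no duplicates (and stays in range). */
        static int is_permutation(const int *idx, int n)
        {
            char seen[N];
            memset(seen, 0, sizeof seen);
            for (int i = 0; i < n; i++) {
                if (idx[i] < 0 || idx[i] >= n || seen[idx[i]])
                    return 0;           /* conflict: fall back to serial */
                seen[idx[i]] = 1;
            }
            return 1;                   /* no conflicts: parallel is safe */
        }

        int main(void)
        {
            double x[N] = {0};
            int idx[N]  = {2, 0, 3, 1, 6, 7, 4, 5};

            if (is_permutation(idx, N)) {
                /* executor: iterations proven independent at run time */
                for (int i = 0; i < N; i++)
                    x[idx[i]] = i;
            } else {
                for (int i = 0; i < N; i++)   /* serial fallback */
                    x[idx[i]] = i;
            }

            printf("x[0] = %g\n", x[0]);   /* expect 1 */
            return 0;
        }

    Whether such a test pays off depends on how often the inspection cost can be amortized, for example when the same index array is reused across many invocations of the loop.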