Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation
Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent
a powerful and affordable tool for scientists who look to speed up simulations
of complex systems. However, porting code to such devices requires a detailed
understanding of heterogeneous programming tools and effective strategies for
parallelization. In this paper we present a source-to-source compilation
approach with whole-program analysis to automatically transform single-threaded
FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized
kernels.
The main contributions of our work are: (1) whole-source refactoring that allows
any subroutine in the code to be offloaded to an accelerator; (2) minimization
of the data transfer between the host and the accelerator by eliminating
redundant transfers; and (3) pragmatic auto-parallelization of the code to be
offloaded to the accelerator by identification of parallelizable maps and
reductions.
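The map/reduction distinction in contribution (3) can be illustrated with a toy loop classifier. This is a minimal sketch, not the paper's compiler: the loop IR, the helper `classify_loop`, and its tuple encoding of array accesses are all hypothetical.

```python
# Hypothetical sketch of map/reduction classification over a toy loop IR.
# An indexed access a(i) is encoded as the tuple ("a", "i"); a scalar as its name.

def classify_loop(body, loop_var):
    """Classify a loop body (list of (target, sources) assignments) as
    'map', 'reduction', or 'sequential'."""
    for target, sources in body:
        indexed = isinstance(target, tuple) and target[1] == loop_var
        if indexed and target not in sources:
            continue                 # each iteration writes its own element: map-like
        if not indexed and target in sources:
            return "reduction"       # a scalar accumulates across iterations
        return "sequential"          # a cross-iteration dependence we cannot rule out
    return "map"

# a(i) = b(i) + c(i): independent element-wise writes -> map
print(classify_loop([(("a", "i"), [("b", "i"), ("c", "i")])], "i"))  # map
# s = s + a(i): scalar accumulation with an associative operator -> reduction
print(classify_loop([("s", ["s", ("a", "i")])], "i"))                # reduction
```

A map can be emitted directly as an OpenCL kernel with one work-item per element; a reduction additionally needs a parallel combining tree.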
We have validated the code transformation performance of the compiler on the
NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy
Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow
water component of the ocean model Gmodel; the Linear Baroclinic Model, an
atmospheric climate model; and Flexpart-WRF, a particle dispersion simulator.
The automatic parallelization component has been tested on a 2-D Shallow
Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and
produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated
versions of the 2DSW and the UFLES are respectively 9x and 20x faster on GPU than the
original code on CPU; in both cases this matches the performance of manually
ported code.
Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as a full paper from the ParCFD conference entry
Non-Strict Independence-Based Program Parallelization Using Sharing and Freeness Information.
The current ubiquity of multi-core processors has brought renewed interest in program parallelization. Logic programs allow studying the parallelization of programs with complex, dynamic data structures with (declarative) pointers in a comparatively simple semantic setting. In this context, automatic parallelizers which exploit and-parallelism rely on notions of independence in order to ensure certain efficiency properties. “Non-strict” independence is a more relaxed notion than the traditional notion of “strict” independence which still ensures the relevant efficiency properties and can allow considerably more parallelism. Non-strict independence cannot be determined solely at run-time (“a priori”) and thus global analysis is a requirement. However, extracting non-strict independence information from available analyses and domains is non-trivial. This paper provides, on one hand, an extended presentation of our classic techniques for compile-time detection of non-strict independence based on extracting information from (abstract interpretation-based) analyses using the now well understood and popular Sharing + Freeness domain. This includes algorithms for combined compile-time/run-time detection which involve special run-time checks for this type of parallelism. In addition, we propose herein novel annotation (parallelization) algorithms, URLP and CRLP, which are specially suited to non-strict independence. We also propose new ways of using the Sharing + Freeness information to optimize how the run-time environments of goals are kept apart during parallel execution. Finally, we also describe the implementation of these techniques in our parallelizing compiler and recall some early performance results. We provide as well an extended description of our pictorial representation of sharing and freeness information.
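The strict/non-strict distinction can be sketched in a few lines. This is a deliberately simplified toy, not the paper's algorithms: goals are modelled only by the sets of run-time variables they reach, and the "free after the first goal" condition is a coarse stand-in for the Sharing + Freeness information the paper actually uses.

```python
# Toy sketch of independence checks between two goals, each modelled
# as the set of variables it can reach at run time.

def strictly_independent(vars_g1, vars_g2):
    """Strict independence: the two goals share no variables at all."""
    return not (vars_g1 & vars_g2)

def non_strict_candidate(vars_g1, vars_g2, free_after_g1):
    """Relaxed (non-strict-style) check: shared variables are tolerated
    when analysis shows they remain free (unbound) after the first goal,
    so parallel execution cannot expose a conflicting binding."""
    shared = vars_g1 & vars_g2
    return shared <= free_after_g1

g1, g2 = {"X", "Y"}, {"Y", "Z"}
print(strictly_independent(g1, g2))         # False: the goals share Y
print(non_strict_candidate(g1, g2, {"Y"}))  # True: Y stays free after g1
```

The point of the example is only that the relaxed check admits goal pairs the strict check rejects, which is where the extra parallelism comes from.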
Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism
Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non-fork-join) parallelism. Preliminary experiments show that the performance sacrificed is reasonable, although the granularity of unrestricted parallelism contributes to better observed speedups.
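The idea of raising parallelism primitives to the source level can be illustrated with two tiny operations, a fork that schedules a goal and a join that waits for its result. This is a hypothetical Python analogy, not the paper's Prolog primitives: the names `fork`, `join`, and `sum_range` are invented for the example.

```python
# Sketch of source-level parallelism primitives, loosely analogous to
# scheduling a goal for and-parallel execution and later waiting on it.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def fork(goal, *args):
    """Schedule a goal for parallel execution; return a handle."""
    return pool.submit(goal, *args)

def join(handle):
    """Wait for a forked goal and retrieve its result."""
    return handle.result()

def sum_range(lo, hi):
    return sum(range(lo, hi))

h = fork(sum_range, 0, 500)   # forked goal runs in another thread...
left = sum_range(500, 1000)   # ...while the current thread keeps working
print(join(h) + left)         # 499500
```

Because `join` is an ordinary call rather than the closing half of a lexically nested construct, it can be placed anywhere after the fork, which is what makes unrestricted (non-fork-join) schedules expressible.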
Software for Parallel Computing: the LAM Implementation of MPI
Many econometric problems can benefit from the application of parallel computing techniques, and recent advances in hardware and software have made such application feasible. There are a number of freely available software libraries that make it possible to write message passing parallel programs using personal computers or Unix workstations. This review discusses one of these: the LAM (Local Area Multicomputer) implementation of MPI (the Message Passing Interface).
The Removal of Numerical Drift from Scientific Models
Computer programs often behave differently under different compilers or in
different computing environments. Relative debugging is a collection of
techniques by which these differences are analysed. Differences may arise
because of different interpretations of errors in the code, because of bugs in
the compilers or because of numerical drift, and all of these were observed in
the present study. Numerical drift arises when small and acceptable differences
in values computed by different systems are integrated, so that the results
drift apart. This is well understood and need not degrade the validity of the
program results. Coding errors and compiler bugs may degrade the results and
should be removed. This paper describes a technique for the comparison of two
program runs which removes numerical drift and therefore exposes coding and
compiler errors. The procedure is highly automated and requires very little
intervention by the user. The technique is applied to the Weather Research and
Forecasting model, the most widely used weather and climate modelling code.
Comment: 12 pages
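The drift mechanism described above can be reproduced in miniature. This is a toy illustration, not the paper's tool: `step` stands in for one model time step, and the tiny `eps` models an acceptable per-step arithmetic difference between two systems. Copying one run's state into the other at each comparison point removes the accumulated drift, so any remaining difference would point to a coding or compiler error.

```python
# Toy illustration of numerical drift between two "runs" of the same model.

def step(x, eps):
    """One model time step; eps models a tiny roundoff-level difference."""
    return x * 1.001 + eps

def drift(n, reset):
    a = b = 1.0
    worst = 0.0
    for _ in range(n):
        a = step(a, 0.0)
        b = step(b, 1e-12)            # slightly different arithmetic
        worst = max(worst, abs(a - b))
        if reset:
            b = a                     # copy state across: drift cannot accumulate
    return worst

# Without the reset, the integrated differences grow by orders of magnitude.
print(drift(10000, reset=False) > drift(10000, reset=True))  # True
```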
Semi-automatic fault localization
One of the most expensive and time-consuming components of the debugging
process is locating the errors or faults. To locate faults, developers must identify
statements involved in failures and select suspicious statements that might contain
faults. In practice, this localization is done by developers in a tedious and manual
way, using only a single execution, targeting only one fault, and having a limited
perspective into a large search space.
The thesis of this research is that fault localization can be partially automated
with the use of commonly available dynamic information gathered from test-case
executions in a way that is effective, efficient, tolerant of test cases that pass but also
execute the fault, and scalable to large programs that potentially contain multiple
faults. The overall goal of this research is to develop effective and efficient fault
localization techniques that scale to programs of large size and with multiple faults.
There are three principal steps performed to reach this goal: (1) Develop practical
techniques for locating suspicious regions in a program; (2) Develop techniques to
partition test suites into smaller, specialized test suites to target specific faults; and
(3) Evaluate the usefulness and cost of these techniques.
In this dissertation, the difficulties and limitations of previous work in the area
of fault-localization are explored. A technique, called Tarantula, is presented that
addresses these difficulties. Empirical evaluation of the Tarantula technique shows
that it is efficient and effective for many faults. The evaluation also demonstrates
that the Tarantula technique can lose effectiveness as the number of faults increases.
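Tarantula's core is a suspiciousness score computed per statement from test coverage, as commonly published: statements executed mostly by failing tests score near 1, those executed mostly by passing tests near 0. The sketch below uses hypothetical variable names but the standard formula.

```python
# Tarantula suspiciousness: (failed ratio) / (passed ratio + failed ratio).

def tarantula(passed, failed, total_passed, total_failed):
    """passed/failed: counts of passing/failing tests that execute the statement."""
    fail_ratio = failed / total_failed if total_failed else 0.0
    pass_ratio = passed / total_passed if total_passed else 0.0
    denom = pass_ratio + fail_ratio
    return fail_ratio / denom if denom else 0.0

# A statement covered by all 3 failing tests but only 1 of 7 passing tests
# is highly suspicious:
print(round(tarantula(passed=1, failed=3, total_passed=7, total_failed=3), 3))  # 0.875
```

Ranking statements by this score gives developers an ordered list of places to inspect, which is where the efficiency claims in the evaluation come from.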
To address the loss of effectiveness for programs with multiple faults, supporting
techniques have been developed and are presented. The empirical evaluation of these
supporting techniques demonstrates that they can enable effective fault localization in
the presence of multiple faults. A new mode of debugging, called parallel debugging, is
developed and empirical evidence demonstrates that it can provide a savings in terms
of both total expense and time to delivery. A prototype visualization is provided to
display the fault-localization results as well as to provide a method to interact and
explore those results. Finally, a study on the effects of the composition of test suites
on fault-localization is presented.
Ph.D. Committee Chair: Harrold, Mary Jean; Committee Member: Orso, Alessandro; Committee Member: Pande, Santosh; Committee Member: Reiss, Steven; Committee Member: Rugaber, Spence
Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs
Massive amounts of legacy sequential code need to be parallelized to make better use of modern multiprocessor architectures. Nevertheless, writing parallel programs is still a difficult task. Automated parallelization methods can be effective both at the statement and loop levels and, recently, at the task level, but they are still restricted to specific source code constructs or application domains. We present in this article an innovative toolset that supports developers when performing manual code analysis and parallelization decisions. It automatically collects and represents the program profile and data dependencies in an interactive graphical format that facilitates the analysis and discovery of manual parallelization opportunities. The toolset can be used for arbitrary sequential C programs and parallelization patterns. Also, its program-scope data dependency tracing at runtime can complement the tools based on static code analysis and can also benefit from it at the same time. We also tested the effectiveness of the toolset in terms of time to reach parallelization decisions and of their quality. We measured a significant improvement for several real-world representative applications
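The run-time dependency tracing mentioned above can be sketched with a last-writer table. This is a minimal hypothetical illustration, not the article's toolset: recording the last statement to write each location turns every later read into a read-after-write dependence edge, and the resulting edges are what rule parallelization in or out.

```python
# Minimal sketch of dynamic data-dependence tracing via a last-writer table.

last_writer = {}   # location -> last statement that wrote it
deps = set()       # read-after-write dependence edges (writer, reader)

def write(addr, stmt):
    last_writer[addr] = stmt

def read(addr, stmt):
    if addr in last_writer and last_writer[addr] != stmt:
        deps.add((last_writer[addr], stmt))   # edge: writer -> reader

# Trace of a tiny program: S1 writes x; S2 reads x and writes y; S3 reads y.
write("x", "S1")
read("x", "S2"); write("y", "S2")
read("y", "S3")
print(sorted(deps))  # [('S1', 'S2'), ('S2', 'S3')]
```

A graphical front end over such edges, weighted by profile data, is essentially what lets a developer see which program regions are safe and profitable to parallelize by hand.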