Search CORE

3,704 research outputs found

An OpenMP based Parallelization Compiler for C Applications

Author: Hamid Arabnejad
Jorge G. Barbosa
João Bispo
João M. P. Cardoso
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Directive-drive programming models, such as OpenMP, are one solution for exploiting the potential of multi-core architectures, and enable developers to accelerate software applications by adding annotations on for-type loops and other code regions. However, manual parallelization of applications is known to be a non trivial and time consuming process, requiring parallel programming skills. Automatic parallelization approaches can reduce the burden on the application development side. This paper presents an OpenMP based automatic parallelization compiler, named AutoPar-Clava, for automatic identification and annotation of loops in C code. By using static analysis, parallelizable regions are detected, and a compilable OpenMP parallel code from the sequential version is produced. In order to reduce the accesses to shared memory by each thread, each variable is categorized into the proper OpenMP scoping. Also, AutoPar-Clava is able to support reduction on arrays, which is available since OpenMP 4.5. The effectiveness of AutoPar-Clava is evaluated by means of the Polyhedral Benchmark suite, and targeting a N-cores x86-based computing platform. The achieved results are very promising and compare favorably with closely related auto-parallelization compilers such as Intel C/C++ Compiler (i.e., icc), ROSE, TRACO, and Cetus

Repositório Aberto da Universidade do Porto

Recommended from our members

Experiences in porting mini-applications to OpenACC and OpenMP on heterogeneous systems

Author: Budiardja RD
Daley C
Gayatri R
Hernandez O
Joubert W
Vergara Larrea VG
Publication venue: eScholarship, University of California
Publication date: 25/10/2020
Field of study

This article studies mini-applications—Minisweep, GenASiS, GPP, and FF—that use computational methods commonly encountered in HPC. We have ported these applications to develop OpenACC and OpenMP versions, and evaluated their performance on Titan (Cray XK7 with K20x GPUs), Cori (Cray XC40 with Intel KNL), Summit (IBM AC922 with Volta GPUs), and Cori-GPU (Cray CS-Storm 500NX with Intel Skylake and Volta GPUs). Our goals are for these new ports to be useful to both application and compiler developers, to document and describe the lessons learned and the methodology to create optimized OpenMP and OpenACC versions, and to provide a description of possible migration paths between the two specifications. Cases where specific directives or code patterns result in improved performance for a given architecture are highlighted. We also include discussions of the functionality and maturity of the latest compilers available on the above platforms with respect to OpenACC or OpenMP implementations

eScholarship - University of California

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Author: Davidson Gavin
Vanderbauwhede Wim
Publication venue
Publication date: 13/11/2017
Field of study

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized kernels. The main contributions of our work are: (1) whole-source refactoring to allow any subroutine in the code to be offloaded to an accelerator. (2) Minimization of the data transfer between the host and the accelerator by eliminating redundant transfers. (3) Pragmatic auto-parallelization of the code to be offloaded to the accelerator by identification of parallelizable maps and reductions. We have validated the code transformation performance of the compiler on the NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow water component of the ocean model Gmodel; the Linear Baroclinic Model, an atmospheric climate model and Flexpart-WRF, a particle dispersion simulator. The automatic parallelization component has been tested on as 2-D Shallow Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated versions of the 2DSW and the UFLES are resp. 9x and 20x faster on GPU than the original code on CPU, in both cases this is the same performance as manually ported code.Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as full paper from ParCFD conference entr

arXiv.org e-Print Archive

Enlighten

An LLVM Instrumentation Plug-in for Score-P

Author: Brendel Ronny
Döbel Sebastian
Herold Christian
Tschüter Ronny
Weber Matthias
Wesarg Bert
Ziegenbalg Johannes
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2017
Field of study

Reducing application runtime, scaling parallel applications to higher numbers of processes/threads, and porting applications to new hardware architectures are tasks necessary in the software development process. Therefore, developers have to investigate and understand application runtime behavior. Tools such as monitoring infrastructures that capture performance relevant data during application execution assist in this task. The measured data forms the basis for identifying bottlenecks and optimizing the code. Monitoring infrastructures need mechanisms to record application activities in order to conduct measurements. Automatic instrumentation of the source code is the preferred method in most application scenarios. We introduce a plug-in for the LLVM infrastructure that enables automatic source code instrumentation at compile-time. In contrast to available instrumentation mechanisms in LLVM/Clang, our plug-in can selectively include/exclude individual application functions. This enables developers to fine-tune the measurement to the required level of detail while avoiding large runtime overheads due to excessive instrumentation.Comment: 8 page

arXiv.org e-Print Archive

Crossref