Search CORE

2,111 research outputs found

Automatic Test Case Reduction for OpenCL

Author: Alastair F Donaldson
Andrei Lascu
Moritz Pflanzer
Publication venue
Publication date: 05/03/2020
Field of study

ABSTRACT We report on an extension to the C-Reduce tool, for automatic reduction of C test cases, to handle OpenCL kernels. This enables an automated method for detecting bugs in OpenCL compilers, by generating large random kernels using the CLsmith generator, identifying kernels that yield result differences across OpenCL platforms and optimisation levels, and using our novel extension to C-Reduce to automatically reduce such kernels to minimal forms that can be filed as bug reports. A major part of our effort involved the design of ShadowKeeper, a new plugin for the Oclgrind simulator that provides accurate detection of accesses to uninitialised data. We present experimental results showing the effectiveness of our method for finding bugs in a number of OpenCL compilers

CiteSeerX

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Author: Davidson Gavin
Vanderbauwhede Wim
Publication venue
Publication date: 13/11/2017
Field of study

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized kernels. The main contributions of our work are: (1) whole-source refactoring to allow any subroutine in the code to be offloaded to an accelerator. (2) Minimization of the data transfer between the host and the accelerator by eliminating redundant transfers. (3) Pragmatic auto-parallelization of the code to be offloaded to the accelerator by identification of parallelizable maps and reductions. We have validated the code transformation performance of the compiler on the NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow water component of the ocean model Gmodel; the Linear Baroclinic Model, an atmospheric climate model and Flexpart-WRF, a particle dispersion simulator. The automatic parallelization component has been tested on as 2-D Shallow Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated versions of the 2DSW and the UFLES are resp. 9x and 20x faster on GPU than the original code on CPU, in both cases this is the same performance as manually ported code.Comment: 12 pages, 5 figures, submitted to "Computers and Fluids" as full paper from ParCFD conference entr

arXiv.org e-Print Archive

Enlighten

Loo.py: transformation-based code generation for GPUs and CPUs

Author: Asanovic K.
Ellson J.
Rubinsteyn A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/05/2014
Field of study

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form

arXiv.org e-Print Archive

Crossref

Engineering a static verification tool for GPU kernels

Author: A.F. Donaldson
C. Barrett
C. Flanagan
C.-T. Chou
E. Bardsley
J. Ferrante
J.-C. Filliâtre
K.L. McMillan
L. Moura de
M. Barnett
P. Collingbourne
R. Karrenberg
W.-F. Chiang
Z. Rakamarić
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community by helping to inform future tooling efforts. © 2014 Springer International Publishing

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

GPUVerify: A Verifier for GPU Kernels

Author: Adam Betts
Alastair Donaldson
Boyer M.
Cadar C.
Collingbourne P.
Graf S.
Leino K. R. M.
Li G.
Lokhmotov A.
Nathan Chong
Nyland L.
Paul Thomson
Shaz Qadeer
Tripakis S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream ker-nel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational se-mantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invari-ants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, auto-matic verification of a large number of real-world kernels

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Symbolic crosschecking of data-parallel floating-point code

Author: Cadar
collingbourne
Kelly PHJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/12/2013
Field of study

Spiral - Imperial College Digital Repository