634 research outputs found

    Hardware acceleration of reaction-diffusion systems:a guide to optimisation of pattern formation algorithms using OpenACC

    Get PDF
    Reaction Diffusion Systems (RDS) have widespread applications in computational ecology, biology, computer graphics and the visual arts. For the former applications a major barrier to the development of effective simulation models is their computational complexity - it takes a great deal of processing power to simulate enough replicates such that reliable conclusions can be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with less resources. Existing optimizations of RDS tend to be low-level and GPGPU based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of speed-up and portability

    Engineering a static verification tool for GPU kernels

    Get PDF
    We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community by helping to inform future tooling efforts. © 2014 Springer International Publishing

    Using the High Productivity Language Chapel to Target GPGPU Architectures

    Get PDF
    It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA.NSF CCF 0702260Cray Inc. Cray-SRA-2010-016962010-2011 Nvidia Research Fellowshipunpublishednot peer reviewe

    Operational tsunami modelling with TsunAWI – recent developments and applications

    Get PDF
    In this article, the tsunami model TsunAWI (Alfred Wegener Institute) and its application for hindcasts, inundation studies, and the operation of the tsunami scenario repository for the Indonesian tsunami early warning system are presented. TsunAWI was developed in the framework of the German-Indonesian Tsunami Early Warning System (GITEWS) and simulates all stages of a tsunami from the origin and the propagation in the ocean to the arrival at the coast and the inundation on land. It solves the non-linear shallow water equations on an unstructured finite element grid that allows to change the resolution seamlessly between a coarse grid in the deep ocean and a fine representation of coastal structures. During the GITEWS project and the following maintenance phase, TsunAWI and a framework of pre- and postprocessing routines was developed step by step to provide fast computation of enhanced model physics and to deliver high quality results

    Visual simulation of soil-microbial system using GPGPU technology

    Get PDF
    General Purpose (use of) Graphics Processing Units (GPGPU) is a promising technology for simulation upscaling; in particular for bottom–up modelling approaches seeking to translate micro-scale system processes to macro-scale properties. Many existing simulations of soil ecosystems do not recover the emergent system scale properties and this may be a consequence of “missing” information at finer scales. Interpretation of model output can be challenging and we advocate the “built-in” visual simulation afforded by GPGPU implementations. We apply this GPGPU approach to a reaction–diffusion soil ecosystem model with the intent of linking micro (micron) and core (cm) spatial scales to investigate how microbes respond to changing environments and the consequences on soil respiration. The performance is evaluated in terms of computational speed up, spatial upscaling and visual feedback. We conclude that a GPGPU approach can significantly improve computational efficiency and offers the potential added benefit of visual immediacy. For massive spatial domains distribution over GPU devices may still be required

    VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

    Get PDF
    GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often not able to exploit the compute power offered by GPUs on these devices mainly due to the lack of support of traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU computing on these devices, as it supports compute and promises to be portable across different architectures. In this paper we propose VComputeBench, a set of benchmarks that help developers understand the differences in performance and portability of Vulkan. We also evaluate the suitability of Vulkan as an emerging cross-platform GPGPU framework by conducting a thorough analysis of its performance compared to CUDA and OpenCL on mobile as well as on desktop platforms. Our experiments show that Vulkan provides better platform support on mobile devices and can be regarded as a good crossplatform GPGPU framework. It offers comparable performance and with some low-level optimizations it can offer average speedups of 1.53x and 1.66x compared to CUDA and OpenCL respectively on desktop platforms and 1.59x average speedup compared to OpenCL on mobile platforms. However, while Vulkan’s low-level control can enhance performance, it requires a significantly higher programming effort.EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU

    GPGPU microbenchmarking for irregular application optimization

    Get PDF
    Irregular applications, such as unstructured mesh operations, do not easily map onto the typical GPU programming paradigms endorsed by GPU manufacturers, which mostly focus on maximizing concurrency for latency hiding. In this work, we show how alternative techniques focused on latency amortization can be used to control overall latency while requiring less concurrency. We used a custom-built microbenchmarking framework to test several GPU kernels and show how the GPU behaves under relevant workloads. We demonstrate that coalescing is not required for efficacious performance; an uncoalesced access pattern can achieve high bandwidth - even over 80% of the theoretical global memory bandwidth in certain circumstances. We also make other further observations on specific relevant behaviors of GPUs. We hope that this study opens the door for further investigation into techniques that can exploit latency amortization when latency hiding does not achieve sufficient performance

    GPU driven finite difference WENO scheme for real time solution of the shallow water equations

    Get PDF
    The shallow water equations are applicable to many common engineering problems involving modelling of waves dominated by motions in the horizontal directions (e.g. tsunami propagation, dam breaks). As such events pose substantial economic costs, as well as potential loss of life, accurate real-time simulation and visualization methods are of great importance. For this purpose, we propose a new finite difference scheme for the 2D shallow water equations that is specifically formulated to take advantage of modern GPUs. The new scheme is based on the so-called Picard integral formulation of conservation laws combined with Weighted Essentially Non-Oscillatory reconstruction. The emphasis of the work is on third order in space and second order in time solutions (in both single and double precision). Further, the scheme is well-balanced for bathymetry functions that are not surface piercing and can handle wetting and drying in a GPU-friendly manner without resorting to long and specific case-by-case procedures. We also present a fast single kernel GPU implementation with a novel boundary condition application technique that allows for simultaneous real-time visualization and single precision simulations even on large ( > 2000 × 2000) grids on consumer-level hardware - the full kernel source codes are also provided online at https://github.com/pparna/swe_pifweno3
    corecore