96,407 research outputs found

    The President as International Leader

    Get PDF
    In this thesis, we address issues associated with programming modern heterogeneous systems while focusing on a special kind of heterogeneous systems that include multicore CPUs and one or more GPUs, called GPU-based systems.We consider the skeleton programming approach to achieve high level abstraction for efficient and portable programming of these GPU-based systemsand present our work on SkePU library which is a skeleton library for these systems. We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using newly made skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call, on a given platform. To show how to achieve performance, we provide a case-study on optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another. Another contribution of this thesis is implementation of the runtime support for the SkePU skeleton library. This is achieved with the help of the StarPUruntime system. By this implementation,support for dynamic scheduling and load balancing for the SkePU skeleton programs is achieved. Furthermore, a capability to do hybrid executionby parallel execution on all available CPUs and GPUs in a system, even for a single skeleton invocation, is developed. SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data parallel skeletons to be usable as tasks inside the farm construct. Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work

    SkelCL - A Portable Skeleton Library for High-Level GPU Programming

    Get PDF
    While CUDA and OpenCL made general-purpose programming for Graphics Processing Units (GPU) popular, using these programming approaches remains complex and error-prone because they lack high-level abstractions. The especially challenging systems with multiple GPU are not addressed at all by these low-level programming models. We propose SkelCL – a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution. We demonstrate that SkelCL greatly simplifies programming GPU systems. We report the competitive performance results of SkelCL using both a simple Mandelbrot set computation and an industrial-strength medical imaging application. Because the library is implemented using OpenCL, it is portable across GPU hardware of different vendors

    Using eSkel to Implement the Multiple Baseline Stereo Application

    Get PDF
    We give an overview of the Edinburgh Skeleton Library eSkel, a structured parallel programming library which offers a range of skeletal parallel programming constructs to the C/MPI programmer. Then we illustrate the efficacy of such a high level approach through an application of multiple baseline stereo. We describe the application and show different ways to introduce parallelism using algorithmic skeletons. Some performance results will be reported

    Towards an Adaptive Skeleton Framework for Performance Portability

    Get PDF
    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the protoype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler.We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance

    BSF-skeleton: A Template for Parallelization of Iterative Numerical Algorithms on Cluster Computing Systems

    Full text link
    This article describes a method for creating applications for cluster computing systems using the parallel BSF skeleton based on the original BSF (Bulk Synchronous Farm) model of parallel computations developed by the author earlier. This model uses the master/slave paradigm. The main advantage of the BSF model is that it allows to estimate the scalability of a parallel algorithm before its implementation. Another important feature of the BSF model is the representation of problem data in the form of lists that greatly simplifies the logic of building applications. The BSF skeleton is designed for creating parallel programs in C++ using the MPI library. The scope of the BSF skeleton is iterative numerical algorithms of high computational complexity. The BSF skeleton has the following distinctive features. - The BSF-skeleton completely encapsulates all aspects that are associated with parallelizing a program. - The BSF skeleton allows error-free compilation at all stages of application development. - The BSF skeleton supports OpenMP programming model and workflows.Comment: Submitted to Methods

    A study into the feasibility of generic programming for the construction of complex software

    Get PDF
    A high degree of abstraction and capacity for reuse can be obtained in software design through the use of Generic Programming (GP) concepts. Despite widespread use of GP in computing, some areas such as the construction of generic component libraries as the skeleton for complex computing systems with extensive domains have been neglected. Here we consider the design of a library of generic components based on the GP paradigm implemented with Java. Our aim is to investigate the feasibility of using GP paradigm in the construction of complex computer systems where the management of users interacting with the system and the optimisation of the system’s resources is required.Postprint (author's final draft

    BPLG–BMCS: GPU-sorting algorithm using a tuning skeleton library

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-015-1591-9[Abstract] In this work, we present an efficient and portable sorting operator for GPUs. Specifically, we propose an algorithmic variant of the bitonic merge sort which reduces the number of processing stages and internal steps, increasing the workload per thread and focusing on a multi-batch execution for multiple problems of a small size. This proposal is well matched to current GPU architectures and we apply different CUDA optimizations to improve performance. For portability, we use a library based on tuning building blocks. Thanks to this parametrization, the library can easily be tuned for different CUDA GPU architectures. Our proposals obtain competitive performance on two recent NVIDIA GPU architectures, providing an improvement of up to 11,794 × over CUDPP and up to 6467 × over ModernGPU.Xunta de Galicia; GRC2013/055Ministerio de Economía y Competitividad; TIN2013-42148-PCOST Program Action; IC130

    Automatic rigging and animation of 3D characters

    Get PDF
    Animating an articulated 3D character currently requires manual rigging to specify its internal skeletal structure and to define how the input motion deforms its surface. We present a method for animating characters automatically. Given a static character mesh and a generic skeleton, our method adapts the skeleton to the character and attaches it to the surface, allowing skeletal motion data to animate the character. Because a single skeleton can be used with a wide range of characters, our method, in conjunction with a library of motions for a few skeletons, enables a user-friendly animation system for novices and children. Our prototype implementation, called Pinocchio, typically takes under a minute to rig a character on a modern midrange PC.Solidworks CorporationNational Science Foundation (U.S.). Graduate Research Fellowshi

    Using the SkelCL Library for High-Level GPU Programming of 2D Applications

    Get PDF
    Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library offers pre-implemented recurring computation and communication patterns (skeletons) which greatly simplify programming for single- and multi-GPU systems. In this paper, we focus on applications that work on two-dimensional data. We extend SkelCL by the matrix data type and the MapOverlap skeleton which specifies computations that depend on neighboring elements in a matrix. The abstract data types and a high-level data (re)distribution mechanism of SkelCL shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. We demonstrate how the extended SkelCL is used to implement real-world image processing applications on two-dimensional data. We show that both from a productivity and a performance point of view it is beneficial to use the high-level abstractions of SkelCL
    • …
    corecore