696 research outputs found

    Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library

    Get PDF
    Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers preimplemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multiGPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. In this paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous systems in order to simplify their programming

    A Framework for Megascale Agent Based Model Simulations on Graphics Processing Units

    Get PDF
    Agent-based modeling is a technique for modeling dynamic systems from the bottom up. Individual elements of the system are represented computationally as agents. The system-level behaviors emerge from the micro-level interactions of the agents. Contemporary state-of-the-art agent-based modeling toolkits are essentially discrete-event simulators designed to execute serially on the Central Processing Unit (CPU). They simulate Agent-Based Models (ABMs) by executing agent actions one at a time. In addition to imposing an un-natural execution order, these toolkits have limited scalability. In this article, we investigate data-parallel computer architectures such as Graphics Processing Units (GPUs) to simulate large scale ABMs. We have developed a series of efficient, data parallel algorithms for handling environment updates, various agent interactions, agent death and replication, and gathering statistics. We present three fundamental innovations that provide unprecedented scalability. The first is a novel stochastic memory allocator which enables parallel agent replication in O(1) average time. The second is a technique for resolving precedence constraints for agent actions in parallel. The third is a method that uses specialized graphics hardware, to gather and process statistical measures. These techniques have been implemented on a modern day GPU resulting in a substantial performance increase. We believe that our system is the first ever completely GPU based agent simulation framework. Although GPUs are the focus of our current implementations, our techniques can easily be adapted to other data-parallel architectures. We have benchmarked our framework against contemporary toolkits using two popular ABMs, namely, SugarScape and StupidModel.GPGPU, Agent Based Modeling, Data Parallel Algorithms, Stochastic Simulations

    Viewpoints: A high-performance high-dimensional exploratory data analysis tool

    Full text link
    Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator will need software that is written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large, high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields that range from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage.Comment: 18 pages, 3 figures, PASP in press, this version corresponds more closely to that to be publishe

    A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

    Full text link
    The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning. First, we survey current practices in distributed learning and identify the different types of parallelism used. Then, we present empirical results comparing their performance on large image and language training tasks. Additionally, we address their statistical efficiency and memory consumption behavior. Based on our results, we discuss algorithmic and implementation portions of each framework which hinder performance

    Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

    Full text link
    In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.Comment: 37 pages, 16 figure

    N-body simulations of gravitational dynamics

    Full text link
    We describe the astrophysical and numerical basis of N-body simulations, both of collisional stellar systems (dense star clusters and galactic centres) and collisionless stellar dynamics (galaxies and large-scale structure). We explain and discuss the state-of-the-art algorithms used for these quite different regimes, attempt to give a fair critique, and point out possible directions of future improvement and development. We briefly touch upon the history of N-body simulations and their most important results.Comment: invited review (28 pages), to appear in European Physics Journal Plu
    • …
    corecore