Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application
Graphics Processing Units (GPUs) are becoming popular accelerators in modern
High-Performance Computing (HPC) clusters. Installing GPUs on every node of the
cluster is inefficient, resulting in high costs and power consumption as well
as underutilisation of the accelerators. The research reported in this paper is
motivated by the use of a small number of physical GPUs, with cluster nodes given
access to remote GPUs on demand, for a financial risk application. We
hypothesise that sharing GPUs between several nodes, referred to as
multi-tenancy, reduces the execution time and energy consumed by an
application. Two data transfer modes between the CPU and the GPUs, namely
concurrent and sequential, are explored. The key result from the experiments is
that multi-tenancy with few physical GPUs using sequential data transfers
lowers the execution time and the energy consumed, thereby improving the
overall performance of the application.
Comment: Accepted to the Journal of Parallel and Distributed Computing (JPDC), 10 June 201
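The abstract contrasts sequential and concurrent CPU-GPU data transfer modes. As a rough illustration only (not the paper's code, which targets remote virtual GPUs shared between cluster nodes), the following CUDA sketch shows the two modes on a single local device; the buffer size, the number of streams, and the scale kernel are assumptions made for the example.

```cuda
// Hedged sketch (not the paper's code): contrasting sequential and concurrent
// host-to-device transfers with CUDA streams. Buffer size, chunk count and the
// trivial scale kernel are illustrative assumptions only.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *d, size_t n, float f) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main() {
    const size_t N = 1 << 24;                // ~16M floats (assumed size)
    const int CHUNKS = 4;                    // number of streams (assumed)
    const size_t CHUNK = N / CHUNKS;

    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));   // pinned memory, required for async copies
    cudaMalloc(&d, N * sizeof(float));

    // Sequential mode: one blocking copy, then compute.
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(N + 255) / 256, 256>>>(d, N, 2.0f);
    cudaDeviceSynchronize();

    // Concurrent mode: chunked async copies overlapped with kernels on streams.
    cudaStream_t s[CHUNKS];
    for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&s[c]);
    for (int c = 0; c < CHUNKS; ++c) {
        size_t off = c * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        scale<<<(CHUNK + 255) / 256, 256, 0, s[c]>>>(d + off, CHUNK, 2.0f);
    }
    cudaDeviceSynchronize();

    for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(s[c]);
    cudaFree(d);
    cudaFreeHost(h);
    printf("done\n");
    return 0;
}
```

In the concurrent mode the chunked asynchronous copies on separate streams can overlap with kernel execution, which is the general idea behind overlapping transfers with computation.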
A Survey of Methods For Analyzing and Improving GPU Energy Efficiency
Recent years have witnessed a phenomenal growth in the computational
capabilities and applications of GPUs. However, this trend has also led to a
dramatic increase in their power consumption. This paper surveys research works
on analyzing and improving energy efficiency of GPUs. It also provides a
classification of these techniques on the basis of their main research idea.
Further, it attempts to synthesize research works which compare energy
efficiency of GPUs with other computing systems, e.g., FPGAs and CPUs. The aim
of this survey is to provide researchers with knowledge of the state of the art in
GPU power management and to motivate them to architect highly energy-efficient
GPUs of tomorrow.
Comment: Accepted with minor revision in ACM Computing Surveys (impact factor 3.85, five-year impact factor 7.85).
Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Convolution is a fundamental operation in many applications, such as computer
vision, natural language processing, image processing, etc. Recent successes of
convolutional neural networks in various deep learning applications place even
higher demands on fast convolution. The high computation throughput and memory
bandwidth of graphics processing units (GPUs) make them a natural choice for
accelerating convolution operations. However, maximally exploiting the
available memory bandwidth of GPUs for convolution is a challenging task. This
paper introduces a general model to address the mismatch between the memory
bank width of GPUs and computation data width of threads. Based on this model,
we develop two convolution kernels, one for the general case and the other for
a special case with one input channel. By carefully optimizing memory access
patterns and computation patterns, we design a communication-optimized kernel
for the special case and a communication-reduced kernel for the general case.
Experimental data based on implementations on Kepler GPUs show that our kernels
achieve 5.16X and 35.5% average performance improvement over the latest cuDNN
library, for the special case and the general case, respectively.
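As a rough, hedged baseline for the kind of memory-access pattern such convolution kernels start from, here is a plain shared-memory tiled 1D convolution in CUDA; the filter radius, tile size, and all names are assumptions, and the sketch does not implement the paper's model for matching memory bank width to per-thread data width.

```cuda
// Hedged sketch, not the paper's kernels: a basic shared-memory tiled 1D
// convolution. Filter radius, tile size and names are illustrative assumptions.
#include <cuda_runtime.h>

#define RADIUS 3          // assumed filter radius
#define TILE   256        // assumed threads per block (blockDim.x == TILE)

// Filter coefficients, filled from the host via cudaMemcpyToSymbol.
__constant__ float c_filter[2 * RADIUS + 1];

__global__ void conv1d(const float *in, float *out, int n) {
    __shared__ float tile[TILE + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    // Stage the tile plus its halo into shared memory, zero-padding the borders.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int left  = gid - RADIUS;
        int right = gid + TILE;
        tile[lid - RADIUS] = (left >= 0) ? in[left]  : 0.0f;
        tile[lid + TILE]   = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();

    if (gid < n) {
        float acc = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += tile[lid + k] * c_filter[k + RADIUS];
        out[gid] = acc;
    }
}
```

The paper's kernels go further than this baseline by reorganising such accesses so that the shared-memory bank width and the data width processed by each thread line up.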
Tree Contraction, Connected Components, Minimum Spanning Trees: a GPU Path to Vertex Fitting
Standard parallel computing operations are considered in the context of algorithms for solving 3D graph problems which have applications, e.g., in vertex finding in HEP. Exploiting GPUs for tree-accumulation and graph algorithms is challenging: GPUs offer extreme computational power and high memory-access bandwidth, but their model of fine-grained parallelism may not suit the irregular distribution of linked representations of graph data structures. Achieving data-race-free computations may demand serialization through atomic transactions, inevitably producing poor parallel performance. A Minimum Spanning Tree algorithm for GPUs is presented, its implementation discussed, and its efficiency evaluated on GPU and multicore architectures.
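The remark about atomic transactions can be made concrete with a small, hedged CUDA sketch (not the paper's algorithm): one atomicMin-based label-propagation pass over an edge list, a building block commonly used in GPU connected-components and Borůvka-style minimum-spanning-tree codes. The edge-list layout and all names are assumptions.

```cuda
// Hedged sketch, not the paper's algorithm: one label-propagation pass over an
// edge list. Repeated until no label changes, it assigns every vertex the
// minimum label in its connected component. Names and layout are assumptions.
#include <cuda_runtime.h>

__global__ void propagate(const int *src, const int *dst, int m,
                          int *label, int *changed) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= m) return;

    int a = label[src[e]];
    int b = label[dst[e]];
    if (a == b) return;

    // Atomically pull the higher-labelled endpoint down to the lower label;
    // the atomic is what serializes conflicting updates from different edges.
    if (a < b) {
        if (atomicMin(&label[dst[e]], a) > a) *changed = 1;
    } else {
        if (atomicMin(&label[src[e]], b) > b) *changed = 1;
    }
}
```

A host loop would relaunch this kernel, clearing changed before each pass, until no label changes; every conflicting update goes through the atomic, which is exactly the serialization the abstract identifies as a performance risk.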
Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches, CUDA and OpenCL, are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers pre-implemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system's main memory and multiple GPUs. In this paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL compared to OpenCL and CUDA. In addition, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous systems in order to simplify their programming.
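SkelCL itself builds on OpenCL, so the following sketch is hedged and illustrative only: written in CUDA purely for brevity, with all names and sizes assumed, it shows the kind of per-device allocation, partitioning, and transfer boilerplate that an abstract vector type with a high-level (re)distribution mechanism is meant to hide.

```cuda
// Hedged sketch, not SkelCL code: manual multi-GPU bookkeeping (per-device
// allocation, data partitioning, copies and launches). Device-count handling,
// sizes and the square kernel are assumptions.
#include <cuda_runtime.h>
#include <vector>

__global__ void square(float *d, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= d[i];
}

void square_on_all_gpus(std::vector<float> &host) {
    int devices = 0;
    cudaGetDeviceCount(&devices);
    size_t chunk = host.size() / devices;          // assume it divides evenly

    std::vector<float*> dptr(devices);
    for (int d = 0; d < devices; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&dptr[d], chunk * sizeof(float));
        cudaMemcpy(dptr[d], host.data() + d * chunk,
                   chunk * sizeof(float), cudaMemcpyHostToDevice);
        square<<<(chunk + 255) / 256, 256>>>(dptr[d], chunk);
    }
    for (int d = 0; d < devices; ++d) {
        cudaSetDevice(d);
        cudaMemcpy(host.data() + d * chunk, dptr[d],
                   chunk * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dptr[d]);
    }
}
```

Each device gets an even share of the vector with its own copies and launch; a skeleton library would instead express the same computation as a single map over a distributed vector.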
