4,117 research outputs found

Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

Author: Prades Javier
Reano Carlos
Silla Federico
Varghese Blesson
Publication date: 14/06/2016

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is inefficient, resulting in high costs and power consumption as well as underutilisation of the accelerator. The research reported in this paper is motivated by the use of few physical GPUs, providing cluster nodes with on-demand access to remote GPUs for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and energy consumed by an application. Two data transfer modes between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.

Comment: Accepted to the Journal of Parallel and Distributed Computing (JPDC), 10 June 201

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

RiuNet

A Survey of Techniques for Improving Security of GPUs

Author: Abhinaya S. B.
Ali Irfan
Mittal Sparsh
Reddy Manish
Publication date: 01/01/2018

The graphics processing unit (GPU), although a powerful performance booster, also has many security vulnerabilities. Because of these, the GPU can act as a safe haven for stealthy malware and as the weakest 'link' in the security 'chain'. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. Beyond informing users and researchers about GPU security techniques, this survey aims to increase their awareness of GPU security vulnerabilities and potential countermeasures.

arXiv.org e-Print Archive

Research Archive of Indian Institute of Technology Hyderabad

CGAMES'2009

Publication venue: University of Wolverhampton, School of Computing and Information Technology
Publication date: 01/01/2009

Wolverhampton Intellectual Repository and E-theses

Parallel simulation of Population Dynamics P systems: updates and roadmap

Author: Macías Ramos Luis Felipe
Martínez del Amor Miguel Ángel
Pérez Jiménez Mario de Jesús
Valencia Cabrera Luis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Population Dynamics P (PDP) systems are a type of multienvironment P systems that serve as a formal modeling framework for real ecosystems. The accurate simulation of these probabilistic models, e.g. with the Direct distribution based on Consistent Blocks Algorithm, entails long run times. Hence, parallel platforms such as GPUs have been employed to speed up the simulation. In 2012, the first GPU simulator of PDP systems was presented. However, it was able to run only randomly generated PDP systems. In this paper, we present current updates made to this simulator, involving an input module for binary files and an output module for CSV files. Finally, the simulator has been experimentally validated with a real ecosystem model, and its performance has been tested with two high-end GPUs: Tesla C1060 and K40.

Ministerio de Economía y Competitividad TIN2012-37434; Junta de Andalucía P08-TIC-0420

idUS. Depósito de Investigación Universidad de Sevilla

High Performance Algorithms for Counting Collisions and Pairwise Interactions

Author: A Selle
CMV Benítez
DE Knuth
GE Blelloch
J Elseberg
J Zheng
JP Longmore
L Greengard
M Tang
R Bridson
S Redon
T Brochu
X Provot
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/08/2019

The problem of counting collisions or interactions is common in areas such as computer graphics and scientific simulation. Since it is a major bottleneck in applications in these areas, much research has been carried out on the subject, mainly focused on techniques that allow calculations to be performed within pruned sets of objects. This paper focuses on how interaction calculations (such as collisions) within these sets can be done more efficiently than with existing approaches. Two algorithms are proposed: a sequential algorithm that has linear complexity at the cost of high memory usage; and a parallel algorithm, mathematically proved to be correct, that manages to use GPU resources more efficiently than existing approaches. The proposed and existing algorithms were implemented, and experiments show a speedup of 21.7 for the sequential algorithm (on small problem sizes), and 1.12 for the parallel proposal (on large problem sizes). By improving interaction calculation, this work contributes to research areas that promote interconnection in the modern world, such as computer graphics and robotics.

Comment: Accepted in ICCS 2019 and published in Springer's LNCS series. Supplementary content at https://mjsaldanha.com/articles/1-hpc-ssp

arXiv.org e-Print Archive

Crossref

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Author: Liu Weifeng
Vinter Brian
Publication venue: 'Elsevier BV'
Publication date: 14/09/2015

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR

Comment: 22 pages, 8 figures, Published at Parallel Computing (PARCO

arXiv.org e-Print Archive

Copenhagen University Research Information System