
    Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format.

    Abstract: The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose new storage formats. Unfortunately, dynamically transforming CSR into these formats has significant runtime and storage overheads. We propose a novel algorithm, CSR-Adaptive, which keeps the CSR format intact and maps well to GPUs. Our implementation addresses the aforementioned challenges by (i) efficiently accessing DRAM by streaming data into the local scratchpad memory and (ii) dynamically assigning different numbers of rows to each parallel GPU compute unit. CSR-Adaptive achieves an average speedup of 14.7× over existing CSR-based algorithms and 2.3× over clSpMV cocktail, which uses an assortment of matrix formats.
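
    For readers unfamiliar with the format, here is a minimal serial CSR SpMV sketch in C; the struct layout and names are illustrative, not taken from the paper. Mapping one GPU thread to each iteration of the outer loop is the naive parallelization whose load imbalance and irregular memory accesses the abstract describes.

        #include <stdio.h>

        /* Compressed sparse row (CSR): only nonzeros are stored, along with
         * their column indices and one offset per row into those arrays. */
        typedef struct {
            int           nrows;
            const int    *row_ptr;  /* nrows+1 offsets into vals/col_idx */
            const int    *col_idx;  /* column index of each nonzero      */
            const double *vals;     /* the nonzero values themselves     */
        } csr_matrix;

        /* y = A * x. Rows have varying nonzero counts (load imbalance) and
         * x is read through col_idx (irregular access) -- the GPU problems
         * the abstract describes for one-thread-per-row mappings. */
        static void spmv_csr(const csr_matrix *A, const double *x, double *y)
        {
            for (int row = 0; row < A->nrows; ++row) {
                double sum = 0.0;
                for (int k = A->row_ptr[row]; k < A->row_ptr[row + 1]; ++k)
                    sum += A->vals[k] * x[A->col_idx[k]];
                y[row] = sum;
            }
        }

        int main(void)
        {
            /* 3x3 example: [[1,0,2],[0,3,0],[4,0,5]] */
            const int    row_ptr[] = {0, 2, 3, 5};
            const int    col_idx[] = {0, 2, 1, 0, 2};
            const double vals[]    = {1, 2, 3, 4, 5};
            const csr_matrix A     = {3, row_ptr, col_idx, vals};
            const double x[]       = {1, 1, 1};
            double y[3];

            spmv_csr(&A, x, y);
            printf("%g %g %g\n", y[0], y[1], y[2]);  /* 3 3 9 */
            return 0;
        }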

    clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library

    Abstract: Sparse linear algebra is a cornerstone of modern computational science. These algorithms ignore the zero-valued entries found in many domains in order to work on much larger problems at much faster rates than dense algorithms. Nonetheless, optimizing these algorithms is not straightforward. Highly optimized algorithms for multiplying a sparse matrix by a dense vector, for instance, are the subject of a vast corpus of research and can be hundreds of times longer than naïve implementations. Optimized sparse linear algebra libraries are thus needed so that users can build applications without enormous effort. Hardware vendors release proprietary libraries that are highly optimized for their devices, but they limit interoperability and promote vendor lock-in. Open libraries often work across multiple devices and can quickly take advantage of new innovations, but they may not reach peak performance. The goal of this work is to provide a sparse linear algebra library that offers both of these advantages. We thus describe clSPARSE, a permissively licensed open-source sparse linear algebra library that offers state-of-the-art optimized algorithms implemented in OpenCL. We test clSPARSE on GPUs from AMD and Nvidia and show performance benefits over both the proprietary cuSPARSE library and the open-source ViennaCL library.
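
    As a back-of-the-envelope illustration of why ignoring zeros lets sparse codes handle much larger problems, the C sketch below compares dense and CSR storage for a hypothetical million-row matrix with about five nonzeros per row; the sizes are chosen for illustration, not taken from the paper.

        #include <stdio.h>

        int main(void)
        {
            long long n   = 1000000LL;  /* 1M x 1M matrix (illustrative) */
            long long nnz = 5000000LL;  /* ~5 nonzeros per row           */

            /* 8-byte values; CSR adds a 4-byte column index per nonzero
             * and n+1 4-byte row offsets. */
            double dense_gb = (double)n * (double)n * 8.0 / 1e9;
            double csr_gb   = ((double)nnz * (8.0 + 4.0) + (n + 1) * 4.0) / 1e9;

            printf("dense: %.0f GB, CSR: %.3f GB\n", dense_gb, csr_gb);
            return 0;  /* prints "dense: 8000 GB, CSR: 0.064 GB" */
        }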

    Interference from GPU System Service Requests

    Heterogeneous systems combine general-purpose CPUs with domain-specific accelerators like GPUs. Recent heterogeneous system designs have enabled GPUs to request OS services, but the domain-specific nature of accelerators means that they must rely on the CPUs to handle these requests. Such system service requests can unintentionally harm the performance of unrelated CPU applications. Tests on a real heterogeneous processor demonstrate that GPU system service requests can degrade contemporaneous CPU application performance by up to 44% and can reduce energy efficiency by limiting CPU sleep time. The reliance on busy CPU cores to perform the system services can also slow down GPU work by up to 18%. This new form of interference is found only in accelerator-rich heterogeneous designs and may be exacerbated in future systems with more accelerators. We explore mitigation strategies from other fields that, in the face of such interference, can increase CPU and GPU performance by over 20% and 2×, respectively, and CPU sleep time by 4.8×. However, these strategies do not always help and offer no performance guarantees. We therefore describe a technique to guarantee quality of service to CPU workloads by dynamically adding backpressure to GPU requests.
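
    As a minimal sketch of what dynamically adding backpressure could look like, the following hypothetical host-side throttle caps outstanding GPU service requests and lets a QoS monitor shrink the cap when CPU workloads suffer; the names and mechanism are assumptions for illustration, and the paper's actual technique may differ.

        #include <pthread.h>

        /* Hypothetical throttle: at most `budget` GPU-originated system
         * service requests may be outstanding; further requests block,
         * applying backpressure to the GPU. */
        typedef struct {
            pthread_mutex_t lock;
            pthread_cond_t  cv;
            int outstanding;
            int budget;  /* lowered by a QoS monitor to protect CPU work */
        } svc_throttle;

        void throttle_enter(svc_throttle *t)  /* call before servicing */
        {
            pthread_mutex_lock(&t->lock);
            while (t->outstanding >= t->budget)
                pthread_cond_wait(&t->cv, &t->lock);
            t->outstanding++;
            pthread_mutex_unlock(&t->lock);
        }

        void throttle_exit(svc_throttle *t)   /* call after completion */
        {
            pthread_mutex_lock(&t->lock);
            t->outstanding--;
            pthread_cond_broadcast(&t->cv);
            pthread_mutex_unlock(&t->lock);
        }

        void throttle_set_budget(svc_throttle *t, int budget)
        {
            pthread_mutex_lock(&t->lock);
            t->budget = budget;                /* dynamic backpressure */
            pthread_cond_broadcast(&t->cv);
            pthread_mutex_unlock(&t->lock);
        }

        /* Usage: svc_throttle t = { PTHREAD_MUTEX_INITIALIZER,
         *   PTHREAD_COND_INITIALIZER, 0, 4 }; wrap each request in
         *   throttle_enter/throttle_exit. */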