A Survey of Techniques for Improving Security of GPUs

Author: Abhinaya S. B.
Ali Irfan
Mittal Sparsh
Reddy Manish
Publication date: 01/01/2018

The graphics processing unit (GPU), although a powerful performance booster, also has many security vulnerabilities. Because of these, the GPU can act as a safe haven for stealthy malware and as the weakest 'link' in the security 'chain'. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. Beyond informing users and researchers about GPU security techniques, this survey aims to increase their awareness of GPU security vulnerabilities and potential countermeasures.

arXiv.org e-Print Archive

Research Archive of Indian Institute of Technology Hyderabad

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps

Author: Blanaru Florin-Gabriel
Dohrmann Steve
Fumero Alfonso Juan
Kotselidis Christos-Efthymios
Stratikopoulos Athanasios
Viswanathan Sandhya
Publication date: 01/10/2023

The University of Manchester - Institutional Repository

Active data structures on GPGPUs

Author: Hall C.
Monro S.
O'Donnell J.
Publication date: 01/01/2013

Active data structures support operations that may affect a large number of elements of an aggregate data structure. They are well suited to extremely fine-grain parallel systems, including circuit parallelism. General-purpose GPUs were designed to support regular graphics algorithms, but their intermediate level of granularity makes them potentially viable for active data structures as well. We consider the characteristics of active data structures and discuss the feasibility of implementing them on GPGPUs. We describe the GPU implementations of two such data structures (ESF arrays and index intervals), assess their performance, and discuss the potential of active data structures as an unconventional programming model that can exploit the capabilities of emerging fine-grain architectures such as GPUs.
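To make "an operation that may affect a large number of elements" concrete, the sketch below inserts a key into a sorted array in a single data-parallel step: every cell computes its new value from only its own old value and its left neighbour's, so one GPU thread per cell could run it at once. This is a hypothetical illustration, not the paper's ESF array or index-interval implementation; the function name `parallel_insert` and the convention of a spare `inf` slot at the end are assumptions.

```python
from typing import List


def parallel_insert(cells: List[float], key: float) -> List[float]:
    """Insert key into a sorted array as one data-parallel step (hypothetical sketch).

    Assumes the last cell is a spare slot holding float('inf').
    Reads only the old array and writes a new one, so every cell
    could be updated simultaneously without races.
    """
    n = len(cells)
    new = [0.0] * n
    for i in range(n):  # conceptually, all i execute at the same time
        left = cells[i - 1] if i > 0 else float("-inf")
        if cells[i] <= key:
            new[i] = cells[i]  # elements not larger than key stay put
        elif left <= key:
            new[i] = key  # the first larger cell receives the key
        else:
            new[i] = left  # every other larger element shifts right by one
    return new
```

For example, inserting 4 into `[1, 3, 5, inf]` gives `[1, 3, 4, 5]`: a single step touches every cell to the right of the insertion point.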

Enlighten

A parallel priority queue with fast updates for GPU architectures

Author: Iacono John
Karsin Ben
Sitchinava Nodari
Publication date: 01/06/2019

The high computational throughput of modern graphics processing units (GPUs) makes them the de facto architecture for high-performance computing applications. However, to achieve peak performance, GPUs require highly parallel workloads, as well as memory access patterns that exhibit good locality of reference. As a result, many state-of-the-art algorithms and data structures designed for GPUs sacrifice work-optimality to achieve the necessary parallelism. Furthermore, some abstract data types are avoided completely because no corresponding data structure performs well on the GPU. One such abstract data type is the priority queue. Many well-known algorithms rely on priority queue operations as a building block. While various priority queue structures have been developed that are parallel, cache-aware, or cache-oblivious, none has been shown to be efficient on GPUs. In this paper, we present the parBucketHeap, a parallel, cache-efficient data structure designed for modern GPU architectures that supports standard priority queue operations, as well as bulk update. We analyze the structure in several well-known computational models and show that it provides optimal parallelism and is cache-efficient. We implement the parBucketHeap and use it to solve the single-source shortest path (SSSP) problem. Experimental results indicate that, for sufficiently large, dense graphs with high diameter, we outperform current state-of-the-art SSSP algorithms on the GPU by up to a factor of 5. Unlike existing GPU SSSP algorithms, our approach is work-optimal and places significantly less load on the GPU, reducing power consumption.
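The abstract notes that algorithms such as SSSP are built on priority queue operations. As a sequential point of reference, here is a plain Dijkstra over Python's `heapq` binary heap, with decrease-key emulated by re-inserting a vertex and skipping stale entries on extraction. This is not the parBucketHeap; the adjacency-list encoding and the function name `sssp` are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: vertex -> list of (neighbour, non-negative edge weight)
Graph = Dict[int, List[Tuple[int, float]]]


def sssp(graph: Graph, source: int) -> Dict[int, float]:
    """Single-source shortest paths via a binary-heap priority queue."""
    dist = {source: 0.0}
    pq = [(0.0, source)]  # entries are (tentative distance, vertex)
    done = set()
    while pq:
        d, u = heapq.heappop(pq)  # extract-min
        if u in done:
            continue  # stale entry left behind by an emulated decrease-key
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # decrease-key emulated by re-insertion
    return dist
```

Each successful edge relaxation costs one separate heap push here; the bulk-update operation the abstract describes is not modelled in this sketch. Unreachable vertices are simply absent from the result.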

arXiv.org e-Print Archive

DI-fusion

Data transfer optimizations for heterogeneous managed runtime systems

Author: Blanaru Florin-Gabriel
Publication date: 01/08/2022

The University of Manchester - Institutional Repository

Performance Optimisations for Heterogeneous Managed Runtime Systems

Author: Papadimitriou Michail
Publication date: 31/12/2021

The University of Manchester - Institutional Repository

G-Safe: Safe GPU Sharing in Multi-Tenant Environments

Author: Argyros Anargyros
Bilas Angelos
Chazapis Antony
Mavridis Stelios
Pavlidakis Manos
Vasiliadis Giorgos
Publication date: 17/01/2024

Modern GPU applications, such as machine learning (ML) frameworks, can only partially utilize beefy GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different users can improve resource utilization and consequently cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space (GPU context). Previous GPU memory protection approaches have limited deployability because they require specialized hardware extensions or access to source code, which is often unavailable for the GPU-accelerated libraries that ML frameworks rely on heavily. In this paper, we present G-Safe, a PTX-level bounds-checking approach for GPUs that confines the GPU kernels of each application to the memory partition allocated to them. G-Safe relies on three mechanisms: (1) it divides the common GPU address space into separate partitions for different applications; (2) it intercepts and checks data transfers, fencing erroneous operations; and (3) it instruments all GPU kernels at the PTX level (which is available even in closed GPU libraries), fencing all kernel memory accesses that fall outside application memory bounds. We implement G-Safe as an external, dynamically linked library that can be preloaded at application startup. G-Safe's approach is transparent to applications and can support real-life, complex frameworks, such as Caffe and PyTorch, that issue billions of GPU kernels. Our evaluation shows that, for such frameworks, the overhead of G-Safe compared to native (unprotected) execution is between 4% and 12%, and 9% on average.
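The partition-and-fence idea can be pictured with a small model: the shared address space is split into aligned, power-of-two partitions; a data transfer is allowed only if it lies wholly inside the caller's partition; and a kernel address is "fenced" by masking it so it cannot leave that partition. The alignment requirement and the masking scheme are assumptions borrowed from classic software fault isolation (SFI); the abstract does not say how G-Safe computes its fences, and the real tool rewrites PTX rather than running Python. The names `Partition`, `contains`, `fence`, and `split_address_space` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """One application's slice of a shared GPU address space (hypothetical model)."""
    base: int  # first address; assumed to be a multiple of size
    size: int  # length in bytes; assumed to be a power of two

    def contains(self, addr: int, nbytes: int) -> bool:
        """Check that a transfer covering [addr, addr + nbytes) stays inside the partition."""
        return nbytes >= 0 and self.base <= addr and addr + nbytes <= self.base + self.size

    def fence(self, addr: int) -> int:
        """Force an arbitrary address into the partition by masking (SFI-style assumption)."""
        # Keep only the offset bits, then re-attach this partition's base.
        return self.base | (addr & (self.size - 1))


def split_address_space(total: int, apps: int) -> List[Partition]:
    """Divide an address space of `total` bytes into equal partitions, one per app."""
    size = total // apps
    if size <= 0 or size & (size - 1):
        raise ValueError("partition size must be a positive power of two")
    return [Partition(base=i * size, size=size) for i in range(apps)]
```

For example, with a 1 MiB space split four ways, fencing an address 5 bytes into partition 2 on behalf of application 1 yields partition 1's base plus 5, so the stray access lands in the caller's own memory instead of a neighbour's. Masking needs no branch, which is one reason SFI-style schemes favour it over explicit compare-and-trap checks.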

arXiv.org e-Print Archive

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology

Author: Hyun Bongjoon
Kim Taehun
Lee Dongjae
Rhu Minsoo
Publication date: 01/08/2023

Processing-in-memory (PIM) has been explored for decades by computer architects, yet it has never seen the light of day in real-world products due to its high design overheads and the lack of a killer application. With the advent of critical memory-intensive workloads, several commercial PIM technologies have been introduced to the market, ranging from domain-specific PIM architectures to more general-purpose PIM architectures. In this work, we take a deep dive into UPMEM's commercial PIM technology, a general-purpose, highly programmable PIM-enabled parallel architecture. Our first key contribution is the development of a flexible simulation framework for PIM. The simulator we developed (PIMulator) compiles UPMEM-PIM source code into machine-level instructions, which are subsequently consumed by our cycle-level performance simulator. Using PIMulator, we demystify UPMEM's PIM design through a detailed characterization study. Building on top of our characterization, we conduct a series of case studies to pathfind important architectural features that we deem will be critical for future PIM architectures to support.

arXiv.org e-Print Archive