A Survey of Techniques for Improving Security of GPUs
The graphics processing unit (GPU), although a powerful performance booster, also
has many security vulnerabilities. Because of these, the GPU can act as a
safe haven for stealthy malware and as the weakest 'link' in the security 'chain'.
In this paper, we present a survey of techniques for analyzing and improving
GPU security. We classify the works on key attributes to highlight their
similarities and differences. Beyond informing users and researchers about
GPU security techniques, this survey aims to increase their awareness of GPU
security vulnerabilities and potential countermeasures.
Active data structures on GPGPUs
Active data structures support operations that may affect a large number of elements of an aggregate data structure. They are well suited for extremely fine-grained parallel systems, including circuit parallelism. General-purpose GPUs were designed to support regular graphics algorithms, but their intermediate level of granularity also makes them potentially viable for active data structures. We consider the characteristics of active data structures and discuss the feasibility of implementing them on GPGPUs. We describe the GPU implementations of two such data structures (ESF arrays and index intervals), assess their performance, and discuss the potential of active data structures as an unconventional programming model that can exploit the capabilities of emerging fine-grained architectures such as GPUs.
A parallel priority queue with fast updates for GPU architectures
The high computational throughput of modern graphics processing units (GPUs)
makes them the de facto architecture for high-performance computing
applications. However, to achieve peak performance, GPUs require highly
parallel workloads, as well as memory access patterns that exhibit good
locality of reference. As a result, many state-of-the-art algorithms and data
structures designed for GPUs sacrifice work-optimality to achieve the necessary
parallelism. Furthermore, some abstract data types are avoided completely due
to there being no corresponding data structure that performs well on the GPU.
One such abstract data type is the priority queue. Many well-known algorithms
rely on priority queue operations as a building block. While various priority
queue structures have been developed that are parallel, cache-aware, or
cache-oblivious, none has been shown to be efficient on GPUs. In this paper, we
present the parBucketHeap, a parallel, cache-efficient data structure designed
for modern GPU architectures that supports standard priority queue operations,
as well as bulk update. We analyze the structure in several well-known
computational models and show that it both provides optimal parallelism and is
cache-efficient. We implement the parBucketHeap and, using it, we solve the
single-source shortest path (SSSP) problem. Experimental results indicate that,
for sufficiently large, dense graphs with high diameter, we outperform current
state-of-the-art SSSP algorithms on the GPU by up to a factor of 5. Unlike
existing GPU SSSP algorithms, our approach is work-optimal and places
significantly less load on the GPU, reducing power consumption.
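To illustrate how SSSP uses priority queue operations as a building block, here is a minimal sequential sketch of textbook Dijkstra using Python's `heapq`. It is not the parBucketHeap and makes no attempt at parallelism or cache efficiency; the function name `sssp` and the adjacency-list layout are assumptions made for illustration only.

```python
import heapq

def sssp(graph, source):
    """Textbook Dijkstra. `graph` maps node -> list of (neighbor, weight).

    Illustrative only: a sequential binary heap stands in for the
    priority queue; this is NOT the parBucketHeap from the abstract.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)          # delete-min
        if d > dist.get(u, float("inf")):
            continue                        # stale entry, already improved
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # heapq has no decrease-key, so push a fresh entry instead
                heapq.heappush(heap, (nd, v))
    return dist

# Small hypothetical graph with non-negative edge weights.
graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)], "d": []}
print(sssp(graph, "a"))
```

Every relaxation that improves a distance triggers a queue update, which is why an efficient priority queue with bulk updates matters for this problem.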
G-Safe: Safe GPU Sharing in Multi-Tenant Environments
Modern GPU applications, such as machine learning (ML) frameworks, can only
partially utilize beefy GPUs, leading to GPU underutilization in cloud
environments. Sharing GPUs across multiple applications from different users
can improve resource utilization and consequently cost, energy, and power
efficiency. However, GPU sharing creates memory safety concerns because kernels
must share a single GPU address space (GPU context). Previous GPU memory
protection approaches have limited deployability because they require
specialized hardware extensions or access to source code. The latter is often
unavailable for GPU-accelerated libraries heavily utilized by ML frameworks. In
this paper, we present G-Safe, a PTX-level bounds checking approach for GPUs
that confines the GPU kernels of each application to the memory partition
allocated to it. G-Safe relies on three mechanisms: (1) It divides the common
GPU address space into separate partitions for different applications. (2) It
intercepts and checks data transfers, fencing erroneous operations. (3) It
instruments all GPU kernels at the PTX level (available in closed GPU
libraries), fencing all kernel memory accesses outside application memory
bounds. We implement G-Safe as an external, dynamically linked library that can
be pre-loaded at application startup time. G-Safe's approach is transparent to
applications and can support real-life, complex frameworks, such as Caffe and
PyTorch, that issue billions of GPU kernels. Our evaluation shows that the
overhead of G-Safe relative to native (unprotected) execution for such
frameworks is between 4% and 12%, and 9% on average.
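The first two mechanisms, and the general idea of fencing an address, can be sketched in a few lines of Python. This is a host-side model under stated assumptions, not G-Safe's implementation: the abstract does not say how fencing is encoded, so the power-of-two partition sizes, the bitmask scheme, and the names `Partition`, `make_partitions`, and `check_transfer` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    base: int  # first address; assumed aligned to `size`
    size: int  # bytes; assumed to be a power of two

    def contains(self, addr, nbytes=1):
        # True if [addr, addr + nbytes) lies entirely inside the partition.
        return self.base <= addr and addr + nbytes <= self.base + self.size

    def fence(self, addr):
        # Assumed scheme: keep the offset bits, force the partition bits.
        # In-bounds addresses are unchanged; others wrap into the partition.
        return self.base | (addr & (self.size - 1))

def make_partitions(total_size, count):
    # Mechanism (1): split one address space into equal partitions.
    size = total_size // count
    if size <= 0 or size & (size - 1):
        raise ValueError("partition size must be a positive power of two")
    return [Partition(i * size, size) for i in range(count)]

def check_transfer(part, dst, nbytes):
    # Mechanism (2): reject a data transfer that leaves the partition.
    if not part.contains(dst, nbytes):
        raise ValueError("transfer outside the application's partition")

parts = make_partitions(1 << 20, 4)   # four hypothetical tenants
tenant = parts[1]
print(hex(tenant.fence(parts[2].base + 7)))  # redirected into tenant's range
```

Masking needs no branch per access, while an explicit comparison such as `contains` can report the violation instead of silently redirecting it; which trade-off G-Safe makes at the PTX level is not stated in the abstract.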
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
Processing-in-memory (PIM) has been explored for decades by computer
architects, yet it has never seen the light of day in real-world products due
to its high design overheads and lack of a killer application. With the
advent of critical memory-intensive workloads, several commercial PIM
technologies have been introduced to the market ranging from domain-specific
PIM architectures to more general-purpose PIM architectures. In this work, we
take a deep dive into UPMEM's commercial PIM technology, a general-purpose PIM-enabled
parallel architecture that is highly programmable. Our first key contribution
is the development of a flexible simulation framework for PIM. The simulator we
developed (aka PIMulator) enables the compilation of UPMEM-PIM source code
into machine-level instructions, which are subsequently consumed
by our cycle-level performance simulator. Using PIMulator, we demystify UPMEM's
PIM design through a detailed characterization study. Building on top of our
characterization, we conduct a series of case studies to pathfind important
architectural features that we deem will be critical for future PIM
architectures to suppor