
    A Survey of Techniques for Improving Security of GPUs

    The graphics processing unit (GPU), although a powerful performance booster, also has many security vulnerabilities. Due to these, the GPU can act as a safe haven for stealthy malware and the weakest 'link' in the security 'chain'. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. Beyond informing users and researchers about GPU security techniques, this survey aims to increase their awareness of GPU security vulnerabilities and potential countermeasures.

    Data Resource Management in Throughput Processors

    Graphics Processing Units (GPUs) are becoming common in data centers for tasks like neural network training and image processing due to their high performance and efficiency. GPUs maintain high throughput by running thousands of threads simultaneously, issuing instructions from ready threads to hide latency in others that are stalled. While this is effective for keeping the arithmetic units busy, the challenge in GPU design is moving the data for computation at the same high rate. Any inefficiency in data movement and storage will compromise the throughput and energy efficiency of the system. Since energy consumption and cooling make up a large part of the cost of provisioning and running a data center, making GPUs more suitable for this environment requires removing the bottlenecks and overheads that limit their efficiency. The performance of GPU workloads is often limited by the throughput of the memory resources inside each GPU core, and though many of the power-hungry structures in CPUs are not found in GPU designs, there is overhead for storing each thread's state. When sharing a GPU between workloads, contention for resources also causes interference and slowdown. This thesis develops techniques to manage and streamline the data movement and storage resources in GPUs in each of these places.

    The first part of this thesis resolves data movement restrictions inside each GPU core. The GPU memory system is optimized for sequential accesses, but many workloads load data in irregular or transposed patterns that cause a throughput bottleneck even when all loads are cache hits. This work identifies and leverages opportunities to merge requests across threads before sending them to the cache. While requests are waiting for merges, they can be reordered to achieve a higher cache hit rate. These methods yielded a 38% speedup for memory-throughput-limited workloads.

    Another opportunity for optimization is found in the register file. Since it must store the registers for thousands of active threads, it is the largest on-chip data storage structure on a GPU. The second work in this thesis replaces the register file with a smaller, more energy-efficient register buffer. Compiler directives allow the GPU to know ahead of time which registers will be accessed, allowing the hardware to store only the registers that will be imminently accessed in the buffer, with the rest moved to main memory. This technique reduced total GPU energy by 11%.

    Finally, in a data center, many different applications will be launching GPU jobs, and just as multiple processes can share the same CPU to increase its utilization, running multiple workloads on the same GPU can increase its overall throughput. However, co-runners interfere with each other in unpredictable ways, especially when sharing memory resources. The final part of this thesis controls this interference, allowing a GPU to be shared between two tiers of workloads: one tier with a high performance target and another suitable for batch jobs without deadlines. At a 90% performance target, this technique increased GPU throughput by 9.3%.

    GPUs' high efficiency and performance make them a valuable accelerator in the data center. The contributions in this thesis further increase their efficiency by removing data movement and storage overheads and unlock additional performance by enabling resources to be shared between workloads while controlling interference.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146122/1/jklooste_1.pd
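
    The coalescing problem targeted by the first technique can be illustrated with a small CUDA sketch (hypothetical kernels, not code from the thesis): two kernels read the same square matrix, one in row order, so that a warp's loads merge into a few wide transactions, and one in column order, so that each load touches a different cache line and throughput collapses even when every access hits in cache.

        #include <cuda_runtime.h>

        // Coalesced: consecutive threads read consecutive addresses, so the
        // 32 loads of a warp merge into a few wide memory transactions.
        __global__ void read_rows(const float* in, float* out, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n * n) out[i] = in[i];
        }

        // Strided: the same threads read a column of a row-major matrix, so
        // consecutive threads load addresses n floats apart. This is the
        // transposed pattern that bottlenecks throughput even on cache hits.
        __global__ void read_cols(const float* in, float* out, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n * n) {
                int row = i % n, col = i / n;
                out[i] = in[row * n + col];
            }
        }

        int main() {
            const int n = 1024;
            float *in, *out;
            cudaMalloc(&in,  n * n * sizeof(float));
            cudaMalloc(&out, n * n * sizeof(float));
            dim3 block(256), grid((n * n + 255) / 256);
            read_rows<<<grid, block>>>(in, out, n);  // fast path
            read_cols<<<grid, block>>>(in, out, n);  // markedly slower
            cudaDeviceSynchronize();
            cudaFree(in); cudaFree(out);
            return 0;
        }

    Merging the column-order requests across threads before they reach the cache, as the thesis proposes, recovers throughput without changing the program itself.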

    Master of Science

    The advent of the era of cheap and pervasive many-core and multicore parallel systems has highlighted the disparity in performance achieved between novice and expert developers targeting parallel architectures. This disparity is most noticeable with software for running general-purpose computations on graphics processing units (GPGPU programs). Current methods for implementing GPGPU programs require an expert-level understanding of the memory hierarchy and execution model of the hardware to reach peak performance. Even for experts, rewriting a program to exploit these hardware features can be tedious and error-prone. Compilers and their ability to make code transformations can assist in the implementation of GPGPU programs, handling many of the target-specific details. This thesis presents CUDA-CHiLL, a source-to-source compiler transformation and code generation framework for the parallelization and optimization of computations expressed in sequential loop nests for running on many-core GPUs. This system uniquely uses a complete scripting language to describe composable compiler transformations that can be written, shared, and reused by non-expert application and library developers. CUDA-CHiLL is built on the polyhedral program transformation and code generation framework CHiLL, which is capable of robust composition of transformations while preserving the correctness of the program at each step. Through its use of powerful abstractions and a scripting interface, CUDA-CHiLL allows a developer to focus on optimization strategies and ignore the error-prone details and low-level constructs of GPGPU programming. The high-level framework can be used inside an orthogonal auto-tuning system that can quickly evaluate the space of possible implementations. Although specific to CUDA at the moment, many of the abstractions would hold for any GPGPU framework, particularly OpenCL. The contributions of this thesis include a programming-language approach to providing transformation abstraction and composition, a unifying framework for general and GPU-specific transformations, and a demonstration of the framework on standard benchmarks showing it capable of matching or outperforming hand-tuned GPU kernels.
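
    To make the recipe-driven approach concrete, here is a hedged sketch of the kind of kernel such a framework generates (illustrative code, not actual CUDA-CHiLL output). The sequential loop nest appears as a comment; the kernel below corresponds roughly to a recipe that tiles the two outer loops across thread blocks, maps the points of each tile to threads, and stages the operand tiles in shared memory. TILE stands in for the tile size a recipe or auto-tuner would choose, and n is assumed to be a multiple of TILE.

        #include <cuda_runtime.h>

        // Sequential source the transformation recipe starts from:
        //   for (i = 0; i < n; i++)
        //     for (j = 0; j < n; j++)
        //       for (k = 0; k < n; k++)
        //         c[i][j] += a[i][k] * b[k][j];

        #define TILE 16

        __global__ void mm_tiled(const float* a, const float* b,
                                 float* c, int n) {
            __shared__ float as[TILE][TILE];   // staged tile of a
            __shared__ float bs[TILE][TILE];   // staged tile of b
            int row = blockIdx.y * TILE + threadIdx.y;
            int col = blockIdx.x * TILE + threadIdx.x;
            float acc = 0.0f;
            for (int t = 0; t < n / TILE; t++) {
                as[threadIdx.y][threadIdx.x] = a[row * n + t * TILE + threadIdx.x];
                bs[threadIdx.y][threadIdx.x] = b[(t * TILE + threadIdx.y) * n + col];
                __syncthreads();               // tiles fully staged
                for (int k = 0; k < TILE; k++)
                    acc += as[threadIdx.y][k] * bs[k][threadIdx.x];
                __syncthreads();               // safe to overwrite tiles
            }
            c[row * n + col] = acc;
        }

        int main() {
            const int n = 512;                 // multiple of TILE
            float *a, *b, *c;
            cudaMalloc(&a, n * n * sizeof(float));
            cudaMalloc(&b, n * n * sizeof(float));
            cudaMalloc(&c, n * n * sizeof(float));
            dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
            mm_tiled<<<grid, block>>>(a, b, c, n);
            cudaDeviceSynchronize();
            cudaFree(a); cudaFree(b); cudaFree(c);
            return 0;
        }

    Expressing the tiling, thread mapping, and shared-memory staging as a reusable recipe, rather than writing them by hand as above, is precisely the tedium the framework removes.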

    BEC: Bit-Level Static Analysis for Reliability against Soft Errors

    Soft errors are a type of transient digital signal corruption that occurs in digital hardware components such as the internal flip-flops of CPU pipelines, the register file, memory cells, and even internal communication buses. Soft errors are caused by environmental radioactivity, magnetic interference, lasers, and temperature fluctuations, either unintentionally or as part of a deliberate attempt to compromise a system and expose confidential data. We propose a bit-level error coalescing (BEC) static program analysis and its two use cases to understand and improve program reliability against soft errors. The BEC analysis tracks each bit corruption in the register file and classifies the effect of the corruption by its semantics at compile time. The usefulness of the proposed analysis is demonstrated in two scenarios: fault injection campaign pruning and reliability-aware program transformation. Experimental results show that bit-level analysis pruned up to 30.04% of exhaustive fault injection campaigns (13.71% on average), without loss of accuracy. Program vulnerability was reduced by up to 13.11% (4.94% on average) through bit-level vulnerability-aware instruction scheduling. The analysis has been implemented within LLVM and evaluated on the RISC-V architecture. To the best of our knowledge, the proposed BEC analysis is the first bit-level compiler analysis for program reliability against soft errors. The proposed method is generic and not limited to a specific computer architecture.
    Comment: 13 pages, 4 figures, to be published in International Symposium on Code Generation and Optimization (CGO) 202
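
    A minimal sketch of the exhaustive fault-injection baseline that such an analysis prunes (a hypothetical host-side harness, not the paper's tooling): flip each bit of a live register value, rerun the consuming computation, and compare against the golden result to classify the flip as masked or corrupting. In this toy example only the low byte is live, so a bit-level analysis could prove the other 24 injections masked at compile time and skip them.

        #include <cstdint>
        #include <cstdio>

        // Toy stand-in for the consumer of one register: only bits 0-7
        // of the input can influence the result.
        static uint32_t consume(uint32_t r) {
            return (r & 0xFFu) + 1;
        }

        int main() {
            uint32_t reg = 0x1234ABCDu;
            uint32_t golden = consume(reg);
            int masked = 0, corrupting = 0;
            // Exhaustive single-bit campaign over one 32-bit register.
            for (int bit = 0; bit < 32; bit++) {
                uint32_t faulty = consume(reg ^ (1u << bit));
                if (faulty == golden) masked++; else corrupting++;
            }
            // A bit-level static analysis can classify bits 8-31 as dead
            // here and prune those injections without running them.
            printf("masked=%d corrupting=%d\n", masked, corrupting);
            return 0;
        }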

    A formally verified compiler back-end

    This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
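
    The preservation property can be summarized informally; the following is a simplified rendering (not the exact Coq statement), where S is the source program, C the generated assembly, and B an observable behavior that is well defined for S:

        \[
          \mathit{compile}(S) = \mathsf{OK}(C)
          \;\wedge\; S \Downarrow B
          \;\Longrightarrow\; C \Downarrow B
        \]

    Because the proof covers the formal semantics of both Cminor and the assembly target, any safety property implied by the behavior B of S carries over to C.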