69 research outputs found
Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators
Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control, which guarantees fairness, and Disengaged Fair Queueing, which is effective in limiting resource idleness but probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access.
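As a rough illustration (not the paper's implementation, and all names below are hypothetical), a disengaged timeslice with overuse control can be modeled as a round-robin grant loop in which the kernel only re-engages occasionally to sample actual usage and debit any overuse against an application's next slice:

```python
import itertools

class DisengagedTimeslice:
    """Toy model of disengaged timeslice scheduling: the scheduler
    intervenes only at coarse-grained intervals, granting one application
    direct device access per timeslice and debiting observed overuse."""

    def __init__(self, apps, slice_len=10):
        self.apps = list(apps)              # application identifiers
        self.slice_len = slice_len          # nominal cycles per timeslice
        self.debt = {a: 0 for a in apps}    # cycles owed from past overuse
        self._rr = itertools.cycle(self.apps)

    def next_grant(self):
        """Pick the next app and the budget it may run for; an app that
        overused a previous slice gets a correspondingly shorter one."""
        app = next(self._rr)
        budget = max(0, self.slice_len - self.debt[app])
        self.debt[app] = max(0, self.debt[app] - self.slice_len)
        return app, budget

    def account(self, app, used):
        """Called when the kernel re-engages and samples actual usage."""
        over = used - self.slice_len
        if over > 0:
            self.debt[app] += over
```

The point of the sketch is the structure, not the policy details: the device runs unmonitored between `next_grant` and `account`, which is what keeps the common case free of kernel crossings.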
Glider: A GPU Library Driver for Improved System Security
Legacy device drivers implement both device resource management and
isolation. This results in a large code base with a wide high-level interface,
making the driver vulnerable to security attacks. This is particularly
problematic for increasingly popular accelerators like GPUs that have large,
complex drivers. We solve this problem with library drivers, a new driver
architecture. A library driver implements resource management as an untrusted
library in the application process address space, and implements isolation as a
kernel module that is smaller and has a narrower lower-level interface (i.e.,
closer to hardware) than a legacy driver. We articulate a set of device and
platform hardware properties that are required to retrofit a legacy driver into
a library driver. To demonstrate the feasibility and superiority of library
drivers, we present Glider, a library driver implementation for two GPUs of
popular brands, Radeon and Intel. Glider reduces the TCB size and attack
surface by about 35% and 84% respectively for a Radeon HD 6450 GPU and by about
38% and 90% respectively for an Intel Ivy Bridge GPU. Moreover, it incurs no
performance cost. Indeed, Glider outperforms a legacy driver for applications
requiring intensive interactions with the device driver, such as applications
using the OpenGL immediate mode API.
FairGV: Fair and Fast GPU Virtualization
Increasingly, high-performance computing (HPC) application developers are opting to use cloud resources due to their higher availability. Virtualized GPUs would be an obvious and attractive option for HPC application developers using cloud hosting services. Unfortunately, existing GPU virtualization software is not ready to address the fairness, utilization, and performance limitations associated with consolidating mixed HPC workloads. This paper presents FairGV, a radically redesigned GPU virtualization system that achieves system-wide weighted fair sharing and strong performance isolation in mixed workloads that use GPUs with variable degrees of intensity. To achieve its objectives, FairGV introduces a trap-less GPU processing architecture, a new fair queuing method integrated with work-conserving and GPU-centric co-scheduling policies, and a collaborative scheduling method for non-preemptive GPUs. Our prototype implementation achieves near ideal fairness (≥ 0.97 Min-Max Ratio) with little performance degradation (≤ 1.02 aggregated overhead) in a range of mixed HPC workloads that leverage GPUs.
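For reference, the Min-Max Ratio fairness metric quoted above can be computed as follows (a sketch of the usual definition; the function name and exact normalization are assumptions, not FairGV's code):

```python
def min_max_ratio(usage, weights):
    """Min-Max Ratio fairness metric (assumed form): normalize each
    client's GPU usage by its weight, then divide the smallest
    normalized share by the largest. 1.0 means perfectly weighted-fair;
    values near 0 mean some client is starved relative to its weight."""
    shares = [u / w for u, w in zip(usage, weights)]
    return min(shares) / max(shares)
```

For example, a client with weight 2 receiving twice the GPU time of a weight-1 client scores a ratio of 1.0.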
LoGA: Low-Overhead GPU Accounting Using Events
Over the last few years, GPUs have become common in computing. However, current GPUs are not designed for a shared environment like a cloud, creating a number of challenges whenever a GPU must be multiplexed between multiple users. In particular, the round-robin scheduling used by today's GPUs does not distribute the available GPU computation time fairly among applications. Most of the previous work addressing this problem resorted to scheduling all GPU computation in software, which induces high overhead. While there is a GPU scheduler called NEON which reduces the scheduling overhead compared to previous work, NEON's accounting mechanism frequently disables GPU access for all but one application, resulting in considerable overhead if that application does not saturate the GPU by itself.
In this paper, we present LoGA, a novel accounting mechanism for GPU computation time. LoGA monitors the GPU's state to detect GPU-internal context switches, and infers the amount of GPU computation time consumed by each process from the time between these context switches. This method allows LoGA to measure GPU computation time consumed by applications while keeping all applications running concurrently. As a result, LoGA achieves a lower accounting overhead than previous work, especially for applications that do not saturate the GPU by themselves. We have developed a prototype which combines LoGA with the pre-existing NEON scheduler. Experiments with that prototype have shown that LoGA induces no accounting overhead while still delivering accurate measurements of applications' consumed GPU computation time.
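The accounting idea, attributing the time between observed GPU-internal context switches to the process that held the GPU in that interval, can be sketched as follows (hypothetical names and event format, not LoGA's actual code):

```python
def account_gpu_time(events):
    """Given an ordered stream of GPU-internal context-switch events as
    (timestamp, process) pairs -- each marking the moment `process`
    takes over the GPU -- attribute the interval between consecutive
    switches to the process that was running during it."""
    usage = {}
    for (t0, proc), (t1, _) in zip(events, events[1:]):
        usage[proc] = usage.get(proc, 0) + (t1 - t0)
    return usage
```

Because the measurement is purely observational, every application keeps running concurrently while being accounted, which is the source of the low overhead claimed above.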
Smart Resource Sharing for Concurrency and Security
Different layers of the computer system, from the low-level hardware accelerators and networks-on-chip (NoC) in multi-core systems to the upper-level operating systems and software applications, rely on the sharing of hardware computing resources. Unfortunately such sharing, when not carefully managed, can introduce a host of protection problems and sources of information leakage. We describe a set of methods by which it is possible to systematically scale performance via hardware sharing without weakening security properties, by being aware of the design and characteristics of individual layers and components. The key is to efficiently deal with the security vulnerabilities that sharing introduces, in both time and space, through the creation of new security-conscious sharing interfaces. Our systematic approach is to first refine coordination techniques into more detailed patterns, and then to bridge the gap between less efficient universal measures and provably more performant and secure patterns. Specifically, we demonstrate the usefulness of a sharing pattern for hardware and software systems where separation is of concern (interference and timing-channel mitigation, etc.). The most important insight is that in order to fully utilize computing resources (to improve performance and availability), the entities that share these resources must coordinate in a pre-calculated way. More dynamic approaches to improving performance and concurrency are likely to introduce new interference in the system. While we show that certain static scheduling measures in lower-level hardware such as networks-on-chip can provably eliminate timing channels, the dynamic nature of software systems makes covert channels harder to confine. Moreover, software systems also face other types of security problems beyond side channels.
Improving concurrency and performance without weakening security requires a slightly different approach. To study the obstacles that hinder software applications' scaling in a system because of security concerns, we delve into the Android operating system and its appification ecosystem. A prime avenue for attack is introduced by its distributed sharing pattern. We propose a centralized approach, built around a single reliable service, as a method to enable computation reuse among applications. The proposed centralization favors well-protected application-to-system communications over vulnerable application-to-application communications. Thus, not only is computation concurrency boosted, but the possibility of an app being attacked through the attack-prone Inter-Component Calls (ICCs) that distributed computation sharing requires is also eliminated. This approach further improves security with the addition of a novel application-centric grouping for isolation. We show through a prototype on Android how our approach supports and protects inter-app resource sharing while improving concurrency at scale.
Efficient and portable multi-tasking for heterogeneous systems
Modern computing systems comprise heterogeneous designs which combine multiple
and diverse architectures on a single system. These designs provide potentials for
high performance under reduced power requirements but require advanced resource
management and workload scheduling across the available processors.
Programmability frameworks, such as OpenCL and CUDA, enable resource management
and workload scheduling on heterogeneous systems. These frameworks fully
assign the control of resource allocation and scheduling to the application. This design
sufficiently serves the needs of dedicated application systems but introduces significant
challenges for multi-tasking environments where multiple users and applications
compete for access to system resources.
This thesis considers these challenges and presents three major contributions that
enable efficient multi-tasking on heterogeneous systems. The presented contributions
are compatible with existing systems, remain portable across vendors and do not require
application changes or recompilation.
The first contribution of this thesis is an optimization technique that reduces host-device
communication overhead for OpenCL applications. It does this without modification
or recompilation of the application source code and is portable across platforms.
This work enables efficiency and performance improvements for diverse application
workloads found on multi-tasking systems.
The second contribution is the design and implementation of a secure, user-space
virtualization layer that integrates the accelerator resources of a system with the standard
multi-tasking and user-space virtualization facilities of the commodity Linux OS.
It enables fine-grained sharing of mixed-vendor accelerator resources, targets heterogeneous
systems found in data center nodes, and requires no modification to the OS,
OpenCL or applications.
Lastly, the third contribution is a technique and software infrastructure that enable
resource sharing control on accelerators, while supporting software managed scheduling
on accelerators. The infrastructure remains transparent to existing systems and
applications and requires no modifications or recompilation. It enforces fair accelerator
sharing, which is required for multi-tasking.
Slice Counts Search for Real-Time Guarantees and Better Schedulability of GPUs
Thesis (Master's) -- Seoul National University Graduate School: Dept. of Computer Science and Engineering, 2021.8. Advisor: Chang-Gun Lee. This paper proposes a conditionally optimal slice counts searching algorithm to improve GPUs' real-time guarantees and schedulability. Despite the growing importance of GPUs due to the recent advances in deep learning, there is still a lack of technology to utilize them in real time. This paper models a GPU as a uniprocessor and uses non-preemptive EDF to schedule GPU kernels. It then resolves the schedulability degradation caused by the non-preemptive uniprocessor assumption by searching for the slice count of each kernel that makes the GPU task set schedulable.
1 Introduction 1
2 Related Works 3
3 Real-Time Guarantee of GPUs through Non-Preemptive Uniprocessor Assumption 5
4 Problem Description 9
5 Slice Counts Search 11
5.1 Blocking point, blocking tolerance, and blocking candidates 13
5.2 Searching slice counts for a task set 14
5.3 Stop conditions 18
5.4 Optimality of the slice counts search 18
5.5 Applying slice counts search in a real system 20
6 Experiment Results 21
6.1 Simulation Experiment 21
6.2 Implementation Results 23
7 Conclusion 25
References 26
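The core trade-off of the thesis above, splitting kernels into slices to shrink the blocking that non-preemptive execution imposes on urgent jobs, can be sketched with a deliberately simplified sufficient test (this is not the thesis's exact analysis; the task model and overhead term are assumptions for illustration):

```python
def schedulable(tasks, slices, overhead=0.0):
    """Crude sufficient test for non-preemptive EDF on a uniprocessor
    with implicit deadlines: total utilization plus the worst blocking
    any task can suffer, normalized by its period, must not exceed 1.
    tasks:  list of (C, T) = (WCET, period) pairs
    slices: per-task slice counts; more slices mean shorter
            non-preemptive regions, at `overhead` extra cost per slice."""
    # slicing adds a per-slice overhead to each task's WCET
    C = [c + k * overhead for (c, _), k in zip(tasks, slices)]
    T = [t for _, t in tasks]
    U = sum(c / t for c, t in zip(C, T))
    if U > 1.0:
        return False
    for i in range(len(tasks)):
        # a job can be blocked by at most one non-preemptive slice of a
        # job with a later deadline; approximate with the longest other slice
        B = max((C[j] / slices[j] for j in range(len(tasks)) if j != i),
                default=0.0)
        if U + B / T[i] > 1.0:
            return False
    return True
```

The searching problem the thesis addresses is then visible in miniature: raising a slice count relaxes the blocking term `B` for other tasks but inflates utilization once per-slice overhead is non-zero, so some intermediate slice count may be the only schedulable choice.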
GPrioSwap: Towards a Swapping Policy for GPUs
Over the last few years, Graphics Processing Units (GPUs) have become popular in computing, and have found their way into a number of cloud platforms. However, integrating a GPU into a cloud environment requires the cloud provider to efficiently virtualize the GPU. While several research projects have addressed this challenge in the past, few of these projects attempt to properly enable sharing of GPU memory between multiple clients: To date, GPUswap is the only project that enables sharing of GPU memory without inducing unnecessary application overhead, while maintaining both fairness and high utilization of GPU memory. However, GPUswap includes only a rudimentary swapping policy, and therefore induces a rather large application overhead.
In this paper, we work towards a practicable swapping policy for GPUs. To that end, we analyze the behavior of various GPU applications to determine their memory access patterns. Based on our insights about these patterns, we derive a swapping policy that includes a developer-assigned priority for each GPU buffer in its swapping decisions. Experiments with our prototype implementation show that a swapping policy based on buffer priorities can significantly reduce the swapping overhead.
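A minimal sketch of such a priority-driven swapping decision (hypothetical names and tie-breaking; GPUswap's real policy also has to respect allocation granularity and access patterns):

```python
def pick_victims(buffers, needed):
    """Evict the lowest-priority (and, within a priority, largest)
    buffers first until `needed` bytes of GPU memory are freed.
    buffers: list of (name, size_bytes, priority) -- a higher
    developer-assigned priority means more frequent expected access."""
    order = sorted(buffers, key=lambda b: (b[2], -b[1]))  # low prio, big first
    victims, freed = [], 0
    for name, size, _ in order:
        if freed >= needed:
            break
        victims.append(name)
        freed += size
    return victims
```

Preferring large, low-priority buffers keeps frequently accessed data resident while minimizing the number of buffers that have to migrate to host memory.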