69 research outputs found
Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators
Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control, which guarantees fairness, and Disengaged Fair Queueing, which is effective in limiting resource idleness but probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access.
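As a rough illustration (not the paper's implementation, and all names below are hypothetical), a disengaged timeslice with overuse control can be modeled as a round-robin grant loop in which the kernel only re-engages occasionally to sample actual usage and debit any overuse against an application's next slice:

```python
import itertools

class DisengagedTimeslice:
    """Toy model of disengaged timeslice scheduling: the scheduler
    intervenes only at coarse-grained intervals, granting one application
    direct device access per timeslice and debiting observed overuse."""

    def __init__(self, apps, slice_len=10):
        self.apps = list(apps)              # application identifiers
        self.slice_len = slice_len          # nominal cycles per timeslice
        self.debt = {a: 0 for a in apps}    # cycles owed from past overuse
        self._rr = itertools.cycle(self.apps)

    def next_grant(self):
        """Pick the next app and the budget it may run for; an app that
        overused a previous slice gets a correspondingly shorter one."""
        app = next(self._rr)
        budget = max(0, self.slice_len - self.debt[app])
        self.debt[app] = max(0, self.debt[app] - self.slice_len)
        return app, budget

    def account(self, app, used):
        """Called when the kernel re-engages and samples actual usage."""
        over = used - self.slice_len
        if over > 0:
            self.debt[app] += over
```

The point of the sketch is the structure, not the policy details: the device runs unmonitored between `next_grant` and `account`, which is what keeps the common case free of kernel crossings.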
Glider: A GPU Library Driver for Improved System Security
Legacy device drivers implement both device resource management and
isolation. This results in a large code base with a wide high-level interface,
making the driver vulnerable to security attacks. This is particularly
problematic for increasingly popular accelerators like GPUs that have large,
complex drivers. We solve this problem with library drivers, a new driver
architecture. A library driver implements resource management as an untrusted
library in the application process address space, and implements isolation as a
kernel module that is smaller and has a narrower lower-level interface (i.e.,
closer to hardware) than a legacy driver. We articulate a set of device and
platform hardware properties that are required to retrofit a legacy driver into
a library driver. To demonstrate the feasibility and superiority of library
drivers, we present Glider, a library driver implementation for two GPUs of
popular brands, Radeon and Intel. Glider reduces the TCB size and attack
surface by about 35% and 84% respectively for a Radeon HD 6450 GPU and by about
38% and 90% respectively for an Intel Ivy Bridge GPU. Moreover, it incurs no
performance cost. Indeed, Glider outperforms a legacy driver for applications
requiring intensive interactions with the device driver, such as applications
using the OpenGL immediate mode API.
FairGV: Fair and Fast GPU Virtualization
Increasingly, high-performance computing (HPC) application developers are opting to use cloud resources due to their higher availability. Virtualized GPUs would be an obvious and attractive option for HPC application developers using cloud hosting services. Unfortunately, existing GPU virtualization software is not ready to address the fairness, utilization, and performance limitations associated with consolidating mixed HPC workloads. This paper presents FairGV, a radically redesigned GPU virtualization system that achieves system-wide weighted fair sharing and strong performance isolation in mixed workloads that use GPUs with variable degrees of intensity. To achieve its objectives, FairGV introduces a trap-less GPU processing architecture, a new fair queuing method integrated with work-conserving and GPU-centric co-scheduling policies, and a collaborative scheduling method for non-preemptive GPUs. Our prototype implementation achieves near ideal fairness (≥ 0.97 Min-Max Ratio) with little performance degradation (≤ 1.02 aggregated overhead) in a range of mixed HPC workloads that leverage GPUs.
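For reference, the Min-Max Ratio fairness metric quoted above can be computed as follows (a sketch of the usual definition; the function name and exact normalization are assumptions, not FairGV's code):

```python
def min_max_ratio(usage, weights):
    """Min-Max Ratio fairness metric (assumed form): normalize each
    client's GPU usage by its weight, then divide the smallest
    normalized share by the largest. 1.0 means perfectly weighted-fair;
    values near 0 mean some client is starved relative to its weight."""
    shares = [u / w for u, w in zip(usage, weights)]
    return min(shares) / max(shares)
```

For example, a client with weight 2 receiving twice the GPU time of a weight-1 client scores a ratio of 1.0.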
LoGA: Low-Overhead GPU Accounting Using Events
Over the last few years, GPUs have become common in computing. However, current GPUs are not designed for a shared environment like a cloud, creating a number of challenges whenever a GPU must be multiplexed between multiple users. In particular, the round-robin scheduling used by today's GPUs does not distribute the available GPU computation time fairly among applications. Most of the previous work addressing this problem resorted to scheduling all GPU computation in software, which induces high overhead. While there is a GPU scheduler called NEON which reduces the scheduling overhead compared to previous work, NEON's accounting mechanism frequently disables GPU access for all but one application, resulting in considerable overhead if that application does not saturate the GPU by itself.
In this paper, we present LoGA, a novel accounting mechanism for GPU computation time. LoGA monitors the GPU's state to detect GPU-internal context switches, and infers the amount of GPU computation time consumed by each process from the time between these context switches. This method allows LoGA to measure GPU computation time consumed by applications while keeping all applications running concurrently. As a result, LoGA achieves a lower accounting overhead than previous work, especially for applications that do not saturate the GPU by themselves. We have developed a prototype which combines LoGA with the pre-existing NEON scheduler. Experiments with that prototype have shown that LoGA induces no accounting overhead while still delivering accurate measurements of applications' consumed GPU computation time.
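The accounting idea, attributing the time between observed GPU-internal context switches to the process that held the GPU in that interval, can be sketched as follows (hypothetical names and event format, not LoGA's actual code):

```python
def account_gpu_time(events):
    """Given an ordered stream of GPU-internal context-switch events as
    (timestamp, process) pairs -- each marking the moment `process`
    takes over the GPU -- attribute the interval between consecutive
    switches to the process that was running during it."""
    usage = {}
    for (t0, proc), (t1, _) in zip(events, events[1:]):
        usage[proc] = usage.get(proc, 0) + (t1 - t0)
    return usage
```

Because the measurement is purely observational, every application keeps running concurrently while being accounted, which is the source of the low overhead claimed above.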
Smart Resource Sharing for Concurrency and Security
Different layers of the computer system, from the low-level hardware accelerators and networks-on-chip (NoC) in multi-core systems to the upper-level operating systems and software applications, rely on the sharing of hardware computing resources. Unfortunately such sharing, when not carefully managed, can introduce a host of protection problems and sources of information leakage. We describe a set of methods by which it is possible to systematically scale performance via hardware sharing without weakening security properties, by being aware of the design and characteristics of individual layers and components. The key is to efficiently deal with the security vulnerabilities that sharing introduces, in both time and space, through the creation of new security-conscious sharing interfaces. Our systematic approach is to first refine coordination techniques into more detailed patterns, and then to bridge the gap between less efficient universal measures and provably more performant and secure patterns. Specifically, we demonstrate the usefulness of a sharing pattern for hardware and software systems where separation is of concern (interference and timing-channel mitigation, etc.). The most important insight is that in order to fully utilize computing resources (to improve performance and availability), the entities that share these resources must coordinate in a pre-calculated way. More dynamic approaches to improving performance and concurrency are likely to introduce new interference in the system. While we show that certain static scheduling measures in lower-level hardware such as networks-on-chip can provably eliminate timing channels, the dynamic nature of software systems makes covert channels harder to confine. Moreover, software systems also face other types of security problems beyond side channels.
Improving concurrency and performance without weakening security requires a slightly different approach. To study the obstacles that hinder software applications' scaling in a system because of security concerns, we delve into the Android operating system and its appification ecosystem. A prime avenue for attack is introduced by its distributed sharing pattern. We propose a centralized approach, built around a single reliable service, as a method to enable computation reuse among applications. The proposed centralization favors well-protected application-to-system communications over vulnerable application-to-application communications. Thus, not only is computation concurrency boosted, but the possibility of an app being attacked through the attack-prone Inter-Component Calls (ICCs) that distributed computation sharing requires is also eliminated. This approach further improves security with the addition of a novel application-centric grouping for isolation. We show through a prototype on Android how our approach supports and protects inter-app resource sharing while improving concurrency at scale.
Efficient and portable multi-tasking for heterogeneous systems
Modern computing systems comprise heterogeneous designs which combine multiple
and diverse architectures on a single system. These designs provide potentials for
high performance under reduced power requirements but require advanced resource
management and workload scheduling across the available processors.
Programmability frameworks, such as OpenCL and CUDA, enable resource management
and workload scheduling on heterogeneous systems. These frameworks fully
assign the control of resource allocation and scheduling to the application. This design
sufficiently serves the needs of dedicated application systems but introduces significant
challenges for multi-tasking environments where multiple users and applications
compete for access to system resources.
This thesis considers these challenges and presents three major contributions that
enable efficient multi-tasking on heterogeneous systems. The presented contributions
are compatible with existing systems, remain portable across vendors and do not require
application changes or recompilation.
The first contribution of this thesis is an optimization technique that reduces host-device
communication overhead for OpenCL applications. It does this without modification
or recompilation of the application source code and is portable across platforms.
This work enables efficiency and performance improvements for diverse application
workloads found on multi-tasking systems.
The second contribution is the design and implementation of a secure, user-space
virtualization layer that integrates the accelerator resources of a system with the standard
multi-tasking and user-space virtualization facilities of the commodity Linux OS.
It enables fine-grained sharing of mixed-vendor accelerator resources, targets heterogeneous
systems found in data center nodes, and requires no modification to the OS,
OpenCL or applications.
Lastly, the third contribution is a technique and software infrastructure that enable
resource sharing control on accelerators, while supporting software managed scheduling
on accelerators. The infrastructure remains transparent to existing systems and
applications and requires no modifications or recompilation. It enforces fair accelerator
sharing, which is required for multi-tasking.
Slice Counts Search for Real-Time Guarantees and Better Schedulability of GPUs
Thesis (Master's) -- Seoul National University Graduate School: Dept. of Computer Science and Engineering, 2021.8. Advisor: Chang-Gun Lee. This paper proposes a conditionally optimal slice counts searching algorithm to improve GPUs' real-time guarantees and schedulability. Despite the growing importance of GPUs due to the recent advances in deep learning, there is still a lack of technology to utilize them in real time. This paper models a GPU as a uniprocessor and uses non-preemptive EDF to schedule GPU kernels. It then resolves the schedulability degradation caused by the non-preemptive uniprocessor assumption by searching for the slice count of each kernel that makes the GPU task set schedulable.
1 Introduction 1
2 Related Works 3
3 Real-Time Guarantee of GPUs through Non-Preemptive Uniprocessor Assumption 5
4 Problem Description 9
5 Slice Counts Search 11
5.1 Blocking point, blocking tolerance, and blocking candidates 13
5.2 Searching slice counts for a task set 14
5.3 Stop conditions 18
5.4 Optimality of the slice counts search 18
5.5 Applying slice counts search in a real system 20
6 Experiment Results 21
6.1 Simulation Experiment 21
6.2 Implementation Results 23
7 Conclusion 25
References 26
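The core trade-off of the thesis above, splitting kernels into slices to shrink the blocking that non-preemptive execution imposes on urgent jobs, can be sketched with a deliberately simplified sufficient test (this is not the thesis's exact analysis; the task model and overhead term are assumptions for illustration):

```python
def schedulable(tasks, slices, overhead=0.0):
    """Crude sufficient test for non-preemptive EDF on a uniprocessor
    with implicit deadlines: total utilization plus the worst blocking
    any task can suffer, normalized by its period, must not exceed 1.
    tasks:  list of (C, T) = (WCET, period) pairs
    slices: per-task slice counts; more slices mean shorter
            non-preemptive regions, at `overhead` extra cost per slice."""
    # slicing adds a per-slice overhead to each task's WCET
    C = [c + k * overhead for (c, _), k in zip(tasks, slices)]
    T = [t for _, t in tasks]
    U = sum(c / t for c, t in zip(C, T))
    if U > 1.0:
        return False
    for i in range(len(tasks)):
        # a job can be blocked by at most one non-preemptive slice of a
        # job with a later deadline; approximate with the longest other slice
        B = max((C[j] / slices[j] for j in range(len(tasks)) if j != i),
                default=0.0)
        if U + B / T[i] > 1.0:
            return False
    return True
```

The searching problem the thesis addresses is then visible in miniature: raising a slice count relaxes the blocking term `B` for other tasks but inflates utilization once per-slice overhead is non-zero, so some intermediate slice count may be the only schedulable choice.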
GPrioSwap: Towards a Swapping Policy for GPUs
Over the last few years, Graphics Processing Units (GPUs) have become popular in computing, and have found their way into a number of cloud platforms. However, integrating a GPU into a cloud environment requires the cloud provider to efficiently virtualize the GPU. While several research projects have addressed this challenge in the past, few of these projects attempt to properly enable sharing of GPU memory between multiple clients: To date, GPUswap is the only project that enables sharing of GPU memory without inducing unnecessary application overhead, while maintaining both fairness and high utilization of GPU memory. However, GPUswap includes only a rudimentary swapping policy, and therefore induces a rather large application overhead.
In this paper, we work towards a practicable swapping policy for GPUs. To that end, we analyze the behavior of various GPU applications to determine their memory access patterns. Based on our insights about these patterns, we derive a swapping policy that includes a developer-assigned priority for each GPU buffer in its swapping decisions. Experiments with our prototype implementation show that a swapping policy based on buffer priorities can significantly reduce the swapping overhead.
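A minimal sketch of such a priority-driven swapping decision (hypothetical names and tie-breaking; GPUswap's real policy also has to respect allocation granularity and access patterns):

```python
def pick_victims(buffers, needed):
    """Evict the lowest-priority (and, within a priority, largest)
    buffers first until `needed` bytes of GPU memory are freed.
    buffers: list of (name, size_bytes, priority) -- a higher
    developer-assigned priority means more frequent expected access."""
    order = sorted(buffers, key=lambda b: (b[2], -b[1]))  # low prio, big first
    victims, freed = [], 0
    for name, size, _ in order:
        if freed >= needed:
            break
        victims.append(name)
        freed += size
    return victims
```

Preferring large, low-priority buffers keeps frequently accessed data resident while minimizing the number of buffers that have to migrate to host memory.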