
    Practical study of bare metal virtualization solutions

    With the hardware breakthroughs accomplished through the years, the idea of software-defined hardware has become a reality. Hypervisors such as KVM, Xen, Hyper-V and ESXi enable today's cloud, with hardware consolidation bringing a reduction in operating costs. In this scope, it is imperative to assess the performance of the different virtualization implementations in order to discover potential bottlenecks and bugs. In this work, the performance of the prominent Type-1 virtualization platforms is analyzed using guests representative of the Windows NT and Linux kernels, in the form of Windows 10 LTSB and Ubuntu Server 16.04 LTS. The effectiveness of each hypervisor's CPU scheduler is put to the test, as well as storage backend performance under multiple scenarios (iSCSI, NFS and local). In short, this project provides a snapshot of the current state of the virtualization market, covering CPU, memory, and 2D & 3D graphics performance of oVirt, Proxmox, XenServer, Hyper-V and VMware vSphere. All benchmarks were executed with each platform's default settings and driven by automation scripts, in order to accelerate the process and exclude variability as much as possible. The selected benchmarks were PassMark PerformanceTest 9 for Windows performance, UnixBench to gauge the performance of Linux guests, and (ez)FIO for in-depth analysis of filesystem performance across platforms. A few generalizations can be made from the information gathered: XenServer, oVirt and Proxmox require the presence of xentools/virtio in order to provide good I/O throughput; GPU passthrough provides native performance as long as there is no resource overcommitment; VMware vSphere provides impressive CPU performance, edging out the competition with 98% of native performance; Hyper-V offers mediocre 2D desktop performance (28% of native), so it should not be used for VMs that provide interactive desktops; similarly, Hyper-V's performance plunges in memory-related workloads compared to the remaining platforms and bare metal, reaching a mere 83%; the remote I/O results crown iSCSI as the best performer, with double the performance of NFS; and all the open-source platforms (Proxmox, oVirt and XenServer) display impressive remote I/O performance over both iSCSI and NFS.
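    A minimal sketch of the kind of automation such runs rely on is shown below; it is not the authors' tooling, and the scenario names, mount points and fio job parameters are illustrative assumptions. It simply drives fio with a fixed job description against each storage backend and extracts one headline metric from the JSON output.

```python
# Illustrative sketch (not the authors' scripts): run the same fio job against
# several storage backends and collect one metric per scenario.
import json
import subprocess

# Hypothetical mount points for the three storage scenarios.
SCENARIOS = {
    "local": "/mnt/local/testfile",
    "nfs": "/mnt/nfs/testfile",
    "iscsi": "/mnt/iscsi/testfile",
}

def run_fio(filename: str, rw: str = "randread", runtime_s: int = 60) -> dict:
    """Run a single fio job with fixed parameters and return its parsed JSON output."""
    cmd = [
        "fio", "--name=bench", f"--filename={filename}", "--size=1G",
        f"--rw={rw}", "--bs=4k", "--iodepth=32", "--direct=1",
        "--time_based", f"--runtime={runtime_s}", "--output-format=json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

if __name__ == "__main__":
    for name, path in SCENARIOS.items():
        result = run_fio(path)
        iops = result["jobs"][0]["read"]["iops"]
        print(f"{name}: {iops:.0f} read IOPS")
```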

    Glider: A GPU Library Driver for Improved System Security

    Legacy device drivers implement both device resource management and isolation. This results in a large code base with a wide high-level interface, making the driver vulnerable to security attacks. This is particularly problematic for increasingly popular accelerators like GPUs, which have large, complex drivers. We solve this problem with library drivers, a new driver architecture. A library driver implements resource management as an untrusted library in the application's address space, and implements isolation as a kernel module that is smaller and has a narrower, lower-level interface (i.e., closer to hardware) than a legacy driver. We articulate a set of device and platform hardware properties that are required to retrofit a legacy driver into a library driver. To demonstrate the feasibility and superiority of library drivers, we present Glider, a library driver implementation for GPUs of two popular brands, Radeon and Intel. Glider reduces the TCB size and attack surface by about 35% and 84%, respectively, for a Radeon HD 6450 GPU, and by about 38% and 90%, respectively, for an Intel Ivy Bridge GPU. Moreover, it incurs no performance cost. Indeed, Glider outperforms a legacy driver for applications requiring intensive interactions with the device driver, such as applications using the OpenGL immediate mode API.
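    To make the architectural split more concrete, the following toy model (purely illustrative; Glider itself is kernel and user-space C code for the Radeon and Intel drivers) contrasts a small kernel-side isolation module with an untrusted per-process resource-management library. All class and method names are assumptions, not Glider's API.

```python
# Toy model of the library-driver split described above; not Glider's code.
# The trusted kernel part only enforces isolation (here: GPU page ownership),
# while all resource management lives in an untrusted per-process library.

class IsolationModule:
    """Stand-in for the small kernel module with its narrow, low-level interface."""
    def __init__(self):
        self.page_owner = {}                # GPU page number -> channel id

    def create_channel(self, pid: int) -> int:
        return pid                          # toy: one hardware channel per process

    def map_pages(self, channel: int, pages: list[int]) -> None:
        # Isolation is the only policy enforced here: a page may not be
        # mapped into two different channels.
        for p in pages:
            if self.page_owner.get(p, channel) != channel:
                raise PermissionError(f"page {p} belongs to another channel")
        for p in pages:
            self.page_owner[p] = channel


class ResourceManagerLibrary:
    """Stand-in for the untrusted user-space library: buffer bookkeeping and
    command construction happen here, outside the trusted computing base."""
    def __init__(self, kernel: IsolationModule, pid: int, page_base: int):
        self.kernel = kernel
        self.channel = kernel.create_channel(pid)
        self.next_page = page_base          # toy per-process allocator

    def allocate_buffer(self, num_pages: int) -> list[int]:
        pages = list(range(self.next_page, self.next_page + num_pages))
        self.next_page += num_pages
        self.kernel.map_pages(self.channel, pages)   # the only kernel crossing
        return pages


# Two processes manage their own resources; only page mapping goes through
# the kernel-side isolation module.
kernel = IsolationModule()
app_a = ResourceManagerLibrary(kernel, pid=101, page_base=0)
app_b = ResourceManagerLibrary(kernel, pid=202, page_base=4096)
print(app_a.allocate_buffer(4))   # [0, 1, 2, 3]
print(app_b.allocate_buffer(2))   # [4096, 4097]
```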

    GPrioSwap: Towards a Swapping Policy for GPUs

    Over the last few years, Graphics Processing Units (GPUs) have become popular in computing and have found their way into a number of cloud platforms. However, integrating a GPU into a cloud environment requires the cloud provider to virtualize the GPU efficiently. While several research projects have addressed this challenge in the past, few of them attempt to properly enable sharing of GPU memory between multiple clients: to date, GPUswap is the only project that enables sharing of GPU memory without inducing unnecessary application overhead, while maintaining both fairness and high utilization of GPU memory. However, GPUswap includes only a rudimentary swapping policy and therefore induces a rather large application overhead. In this paper, we work towards a practicable swapping policy for GPUs. To that end, we analyze the behavior of various GPU applications to determine their memory access patterns. Based on our insights into these patterns, we derive a swapping policy that includes a developer-assigned priority for each GPU buffer in its swapping decisions. Experiments with our prototype implementation show that a swapping policy based on buffer priorities can significantly reduce the swapping overhead.
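    As a rough illustration of such a policy (not GPrioSwap's actual kernel implementation; the buffer names, sizes and priority values below are made up), victim selection could simply free GPU pages in ascending order of the developer-assigned priority:

```python
# Sketch of priority-driven victim selection for swapping GPU buffers to
# system RAM; purely illustrative, not the GPUswap/GPrioSwap kernel code.
from dataclasses import dataclass

@dataclass
class GpuBuffer:
    name: str
    pages: int
    priority: int      # developer-assigned: lower value = swap out first

def choose_victims(buffers: list[GpuBuffer], pages_needed: int) -> list[GpuBuffer]:
    """Pick buffers to evict until enough GPU pages are freed,
    preferring low-priority (rarely accessed) buffers."""
    victims, freed = [], 0
    for buf in sorted(buffers, key=lambda b: b.priority):
        if freed >= pages_needed:
            break
        victims.append(buf)
        freed += buf.pages
    return victims

# Example: constant data the kernels rarely touch gets a low priority,
# frequently accessed working buffers a high one.
buffers = [
    GpuBuffer("lookup_table", pages=512, priority=0),
    GpuBuffer("weights", pages=2048, priority=5),
    GpuBuffer("activations", pages=1024, priority=9),
]
print([b.name for b in choose_victims(buffers, pages_needed=600)])
# -> ['lookup_table', 'weights']
```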

    LoGA: Low-Overhead GPU Accounting Using Events

    Over the last few years, GPUs have become common in computing. However, current GPUs are not designed for a shared environment like a cloud, creating a number of challenges whenever a GPU must be multiplexed between multiple users. In particular, the round-robin scheduling used by today's GPUs does not distribute the available GPU computation time fairly among applications. Most of the previous work addressing this problem resorted to scheduling all GPU computation in software, which induces high overhead. While there is a GPU scheduler called NEON which reduces the scheduling overhead compared to previous work, NEON's accounting mechanism frequently disables GPU access for all but one application, resulting in considerable overhead if that application does not saturate the GPU by itself. In this paper, we present LoGA, a novel accounting mechanism for GPU computation time. LoGA monitors the GPU's state to detect GPU-internal context switches, and infers the amount of GPU computation time consumed by each process from the time between these context switches. This method allows LoGA to measure GPU computation time consumed by applications while keeping all applications running concurrently. As a result, LoGA achieves a lower accounting overhead than previous work, especially for applications that do not saturate the GPU by themselves. We have developed a prototype which combines LoGA with the pre-existing NEON scheduler. Experiments with that prototype have shown that LoGA induces no accounting overhead while still delivering accurate measurements of applications' consumed GPU computation time.
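    The accounting idea can be sketched as follows: given a time-ordered trace of detected GPU-internal context switches, the interval between consecutive switches is charged to the process whose context was active. The event format and numbers here are illustrative assumptions, not LoGA's actual interface to the GPU.

```python
# Sketch of event-based GPU time accounting; illustrative only.
from collections import defaultdict

def account_gpu_time(switch_events):
    """switch_events: list of (timestamp_us, pid) pairs, one per detected
    GPU context switch, sorted by timestamp. Returns pid -> busy time (us)."""
    busy = defaultdict(int)
    for (t_prev, pid_prev), (t_next, _) in zip(switch_events, switch_events[1:]):
        # The process whose context became active at t_prev ran until t_next.
        busy[pid_prev] += t_next - t_prev
    return dict(busy)

# Example trace: process 42 runs for 300 us, then 77 for 150 us, then 42 again.
events = [(1000, 42), (1300, 77), (1450, 42), (1700, 77)]
print(account_gpu_time(events))   # {42: 550, 77: 150}
```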

    ์—ฃ์ง€ ํด๋ผ์šฐ๋“œ ํ™˜๊ฒฝ์„ ์œ„ํ•œ ์—ฐ์‚ฐ ์˜คํ”„๋กœ๋”ฉ ์‹œ์Šคํ…œ

    Ph.D. dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2020. Advisor: Soo-Mook Moon. The purpose of my dissertation is to build lightweight edge computing systems which provide seamless offloading services even when users move across multiple edge servers. I focus on two specific application domains: 1) web applications and 2) DNN applications. First, I propose an edge computing system which offloads computations from web-supported devices to edge servers. The proposed system exploits the portability of web apps, i.e., that they are distributed as source code and runnable without installation, when migrating the execution state of web apps. This significantly reduces the complexity of state migration, allowing a web app to migrate within a few seconds. The proposed system also supports offloading of WebAssembly, a standard low-level instruction format for web apps, achieving up to an 8.4x speedup compared to offloading of pure JavaScript code. Second, I propose incremental offloading of neural networks (IONN), which offloads DNN execution while the DNN model is still being deployed, thus reducing the overhead of DNN model deployment. I also extend IONN to support large-scale edge server environments by proactively migrating DNN layers to the edge servers that mobile users are predicted to visit. Simulations with an open-source mobility dataset show that the proposed system can significantly reduce the overhead of deploying a DNN model.
    Contents: Introduction; Seamless Offloading of Web App Computations; IONN: Incremental Offloading of Neural Network Computations; PerDNN: Offloading DNN Computations to Pervasive Edge Servers; Related Works; Conclusion.
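    A heavily simplified sketch of the partitioning decision behind such offloading is given below, assuming a linear chain of DNN layers: pick the split point that minimizes estimated query latency, i.e., client-side execution up to the split, plus transfer of the intermediate data, plus server-side execution of the remainder. IONN additionally weighs the cost of uploading the model itself and handles DNNs with multiple paths; all per-layer numbers here are made-up assumptions.

```python
# Illustrative DNN partition-point selection for client/edge offloading.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    client_ms: float      # execution time on the mobile device
    server_ms: float      # execution time on the edge server
    out_bytes: int        # size of the layer's output activation

def best_split(layers: list[Layer], input_bytes: int,
               uplink_bytes_per_ms: float) -> int:
    """Return k such that layers[:k] run on the client and layers[k:] on the
    edge server (k == 0: fully offloaded, k == len(layers): fully local)."""
    best_k, best_latency = 0, float("inf")
    for k in range(len(layers) + 1):
        client = sum(l.client_ms for l in layers[:k])
        server = sum(l.server_ms for l in layers[k:])
        if k == len(layers):
            transfer = 0.0                                    # nothing sent
        elif k == 0:
            transfer = input_bytes / uplink_bytes_per_ms      # send raw input
        else:
            transfer = layers[k - 1].out_bytes / uplink_bytes_per_ms
        latency = client + transfer + server
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k

layers = [
    Layer("conv1_pool", client_ms=15, server_ms=2, out_bytes=80_000),
    Layer("conv2",      client_ms=70, server_ms=7, out_bytes=200_000),
    Layer("fc",         client_ms=30, server_ms=3, out_bytes=4_000),
]
# With a 600 KB input and a ~2 MB/s uplink, splitting after conv1_pool wins.
print(best_split(layers, input_bytes=600_000, uplink_bytes_per_ms=2_000))  # -> 1
```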

    A Performance Comparison of VMware GPU Virtualization Techniques in Cloud Gaming

    Cloud gaming is an application deployment scenario in which an interactive gaming application runs remotely in a cloud, receives commands from a thin client, and streams the rendered scenes back to the client as a video sequence over the Internet; it is of interest to both the research community and industry. The academic community has developed open-source cloud gaming systems such as GamingAnywhere for research, while industrial pioneers such as OnLive and Gaikai have gained a large user base in the cloud gaming market. Graphics Processing Unit (GPU) virtualization plays an important role in such an environment, as it is the critical component that allows virtual machines to run 3D applications with performance guarantees. Currently, GPU pass-through and GPU sharing are the two main techniques of GPU virtualization. The former enables a single virtual machine to access a physical GPU directly and exclusively, while the latter makes a physical GPU shareable by multiple virtual machines. VMware Inc., one of the most popular virtualization solution vendors, provides concrete implementations of both: a GPU pass-through solution called Virtual Dedicated Graphics Acceleration (vDGA) and a GPU-sharing solution called Virtual Shared Graphics Acceleration (vSGA). VMware also recently announced another GPU-sharing solution called vGPU. Nevertheless, the feasibility and performance of these solutions in cloud gaming have not been studied yet. In this work, an experimental study is conducted to evaluate the feasibility and performance of the GPU pass-through and GPU-sharing solutions offered by VMware in cloud gaming scenarios. The results confirm that the vDGA and vGPU techniques can meet the demands of cloud gaming. In particular, these two solutions achieved good performance in the tested graphics card benchmarks, and delivered acceptable image quality and response delay for the tested games.