
    HeteroCore GPU to exploit TLP-resource diversity


    FairGV: Fair and Fast GPU Virtualization

    Increasingly, high-performance computing (HPC) application developers are opting to use cloud resources due to their higher availability. Virtualized GPUs would be an obvious and attractive option for HPC application developers using cloud hosting services. Unfortunately, existing GPU virtualization software is not ready to address the fairness, utilization, and performance limitations associated with consolidating mixed HPC workloads. This paper presents FairGV, a radically redesigned GPU virtualization system that achieves system-wide weighted fair sharing and strong performance isolation in mixed workloads that use GPUs with variable degrees of intensity. To achieve its objectives, FairGV introduces a trap-less GPU processing architecture, a new fair queuing method integrated with work-conserving and GPU-centric co-scheduling policies, and a collaborative scheduling method for non-preemptive GPUs. Our prototype implementation achieves near-ideal fairness (≥ 0.97 Min-Max Ratio) with little performance degradation (≤ 1.02 aggregated overhead) in a range of mixed HPC workloads that leverage GPUs.
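
    For intuition, the fair queuing ingredient FairGV builds on can be sketched as start-time fair queuing over GPU kernel requests. The Python sketch below is illustrative only: the tenant names, weights, and GPU-time estimates are assumptions, and it is a deliberately simplified stand-in rather than the authors' implementation.

        # A minimal sketch of weighted fair queuing over GPU kernel requests,
        # in the spirit of FairGV's fair queuing method (not the authors' code).
        # Tenant names and `est_gpu_time` values are illustrative assumptions.
        import heapq

        class WeightedFairQueue:
            """Each tenant accumulates virtual time inversely proportional
            to its weight, so heavier tenants receive proportionally more
            GPU time. Simplified: real start-time fair queuing also syncs
            with a global virtual clock so idle tenants do not bank credit."""

            def __init__(self):
                self.virtual_time = {}   # per-tenant virtual finish time
                self.queue = []          # min-heap of (tag, seq, tenant, kernel)
                self.seq = 0             # tie-breaker for equal tags

            def enqueue(self, tenant, weight, kernel, est_gpu_time):
                # Tag = previous finish time + cost scaled by 1/weight.
                start = self.virtual_time.get(tenant, 0.0)
                finish = start + est_gpu_time / weight
                self.virtual_time[tenant] = finish
                heapq.heappush(self.queue, (finish, self.seq, tenant, kernel))
                self.seq += 1

            def dispatch(self):
                # Non-preemptive: the chosen kernel runs to completion.
                if not self.queue:
                    return None
                _, _, tenant, kernel = heapq.heappop(self.queue)
                return tenant, kernel

        q = WeightedFairQueue()
        q.enqueue("vm_a", weight=2.0, kernel="matmul", est_gpu_time=10.0)
        q.enqueue("vm_b", weight=1.0, kernel="fft", est_gpu_time=10.0)
        print(q.dispatch())  # vm_a first: its weighted tag (5.0) is smaller

    With equal GPU-time requests, the tenant with twice the weight is served first and, over many requests, receives twice the GPU time, which is the weighted fair sharing property the abstract describes.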

    CUsched: multiprogrammed workload scheduling on GPU architectures

    Graphics Processing Units (GPUs) are currently widely used in High Performance Computing (HPC) applications to speed up the execution of massively parallel codes. GPUs are well suited to such HPC environments because these applications share a common characteristic with the gaming codes GPUs were designed for: only one application uses the GPU at a time. Although minimal support for multi-programmed systems exists, modern GPUs do not allow resource sharing among different processes. This lack of support restricts the usage of GPUs in desktop and mobile environments to a small number of applications (e.g., games and multimedia players). In this paper we study the multi-programming support available in current GPUs and show that such support is not sufficient. We propose a set of hardware extensions to current GPU architectures to efficiently support multi-programmed GPU workloads, allowing concurrent execution of codes from different user processes. We implement several hardware schedulers on top of these extensions and analyze the behaviour of different work scheduling algorithms using system-wide and per-process metrics.
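
    As a rough software analogue of the kind of work scheduling policy such hardware schedulers implement, the Python sketch below assigns pending thread blocks from several processes to SMs in round-robin order; the process names, block counts, and SM count are invented for illustration and do not reflect the paper's proposed hardware extensions.

        # A toy round-robin block scheduler for a multiprogrammed GPU:
        # interleaving blocks from different processes prevents any single
        # process from monopolizing the SMs. Inputs are illustrative.
        from collections import deque

        def round_robin_schedule(processes, num_sms):
            """processes: {pid: blocks_remaining}. Returns, per SM, the
            ordered list of process IDs whose blocks it will execute."""
            pending = deque(processes.items())
            assignment = {sm: [] for sm in range(num_sms)}
            slot = 0
            while pending:
                pid, blocks = pending.popleft()
                assignment[slot % num_sms].append(pid)  # one block dispatched
                slot += 1
                if blocks > 1:
                    pending.append((pid, blocks - 1))   # re-queue the rest
            return assignment

        # Two concurrent processes sharing a 4-SM GPU.
        print(round_robin_schedule({"procA": 6, "procB": 2}, num_sms=4))

    Even in this toy form, the per-process view differs from the system-wide view: procB finishes early because its blocks are interleaved with procA's rather than queued behind them, which is the kind of trade-off the paper's per-process metrics are meant to expose.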

    Resource Aware GPU Scheduling in Kubernetes Infrastructure

    Nowadays, an ever-increasing number of artificial intelligence inference workloads is pushed to and executed on the cloud. To effectively serve and manage the computational demands, data center operators have provisioned their infrastructures with accelerators. Specifically for GPUs, support for efficient management is lacking, as state-of-the-art schedulers and orchestrators treat GPUs only as typical compute resources, ignoring their unique characteristics and application properties. This phenomenon, combined with the GPU over-provisioning problem, leads to severe resource under-utilization. Even though prior work has addressed this problem by colocating applications on a single accelerator device, its resource-agnostic nature fails to resolve the resource under-utilization and quality-of-service violations, especially for latency-critical applications. In this paper, we design a resource-aware GPU scheduling framework able to efficiently colocate applications on the same GPU accelerator card. We integrate our solution with Kubernetes, one of the most widely used cloud orchestration frameworks. We show that our scheduler can achieve 58.8% lower 99th-percentile end-to-end job execution time, while delivering 52.5% higher GPU memory usage, 105.9% higher GPU utilization on average, and 44.4% lower energy consumption on average, compared to state-of-the-art schedulers, for a variety of representative ML workloads.
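
    To illustrate what resource-aware colocation means in practice, the Python sketch below scores candidate GPUs for an incoming pod using free device memory and projected utilization. The field names, the 0.8 interference threshold, and the bin-packing preference are assumptions chosen for illustration, not the paper's actual scoring logic or Kubernetes integration.

        # A hedged sketch of a resource-aware GPU colocation filter/scorer,
        # of the kind a Kubernetes scheduler extension might apply.
        # All thresholds and field names below are illustrative assumptions.
        from dataclasses import dataclass

        @dataclass
        class GpuState:
            mem_free_mb: int       # free device memory
            utilization: float     # current SM utilization, 0.0-1.0

        @dataclass
        class PodRequest:
            mem_mb: int            # requested GPU memory
            expected_util: float   # profiled utilization of the workload
            latency_critical: bool

        def score_gpu(gpu: GpuState, pod: PodRequest) -> float:
            """Return a colocation score for a candidate GPU, or -1 if the
            pod must not be placed there. Higher scores pack GPUs more
            tightly while shielding latency-critical pods from interference."""
            if pod.mem_mb > gpu.mem_free_mb:
                return -1.0                  # would exceed device memory
            projected = gpu.utilization + pod.expected_util
            if pod.latency_critical and projected > 0.8:
                return -1.0                  # too much interference risk
            # Prefer the busiest feasible GPU (bin packing), leaving the
            # other GPUs free for future large or latency-critical pods.
            return min(projected, 1.0)

        gpus = [GpuState(8000, 0.30), GpuState(16000, 0.65)]
        pod = PodRequest(mem_mb=4000, expected_util=0.25, latency_critical=True)
        feasible = [i for i in range(len(gpus)) if score_gpu(gpus[i], pod) >= 0]
        print("place on GPU", max(feasible, key=lambda i: score_gpu(gpus[i], pod)))

    A purely resource-agnostic scheduler would treat both GPUs as interchangeable; here the second GPU is rejected because the projected utilization would risk QoS violations for the latency-critical pod, which captures the distinction the abstract draws.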