1,136 research outputs found

    Enabling preemptive multiprogramming on GPUs

    Get PDF
    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems are usually running multiple applications, from one or several users. However GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements, such as responsiveness, fairness or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels, according to their priorities. We extend the NVIDIA GK110 (Kepler) like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve execution time of high-priority processes by 15.6x, the average application turnaround time between 1.5x to 2x, and system fairness up to 3.4x.We would like to thank the anonymous reviewers, Alexan- der Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Al- varez, and Marc Jorda on their comments and help improving our work and this paper. This work is supported by Euro- pean Commission through TERAFLUX (FP7-249013), Mont- Blanc (FP7-288777), and RoMoL (GA-321253) projects, NVIDIA through the CUDA Center of Excellence program, Spanish Government through Programa Severo Ochoa (SEV-2011-0067) and Spanish Ministry of Science and Technology through TIN2007-60625 and TIN2012-34557 projects.Peer ReviewedPostprint (author’s final draft

    Adaptive space-time sharing with SCOJO.

    Get PDF
    Coscheduling is a technique used to improve the performance of parallel computer applications under time sharing, i.e., to provide better response times than standard time sharing or space sharing. Dynamic coscheduling and gang scheduling are two main forms of coscheduling. In SCOJO (Share-based Job Coscheduling), we have introduced our own original framework to employ loosely coordinated dynamic coscheduling and a dynamic directory service in support of scheduling cross-site jobs in grid scheduling. SCOJO guarantees effective CPU shares by taking coscheduling effects into consideration and supports both time and CPU share reservation for cross-site job. However, coscheduling leads to high memory pressure and still involves problems like fragmentation and context-switch overhead, especially when applying higher multiprogramming levels. As main part of this thesis, we employ gang scheduling as more directly suitable approach for combined space-time sharing and extend SCOJO for clusters to incorporate adaptive space sharing into gang scheduling. We focus on taking advantage of moldable and malleable characteristics of realistic job mixes to dynamically adapt to varying system workloads and flexibly reduce fragmentation. In addition, our adaptive scheduling approach applies standard job-scheduling techniques like a priority and aging system, backfilling or easy backfilling. We demonstrate by the results of a discrete-event simulation that this dynamic adaptive space-time sharing approach can deliver better response times and bounded relative response times even with a lower multiprogramming level than traditional gang scheduling.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H825. Source: Masters Abstracts International, Volume: 43-01, page: 0237. Adviser: A. Sodan. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

    Get PDF
    The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processors) systems. It allows to mix High Performance Computing and real-time. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing (but not restricted to) IA-64 architecture. The basic idea of ARTiS is to assign a selected set of processors to real-time operations. A migration mechanism of non-preemptible tasks insures a latency level on these real-time processors. Furthermore, specific load-balancing strategies permit ARTiS to benefit from the full power of the SMP systems: the real-time reservation, while guaranteed, is not exclusive and does not imply a waste of resources. This document describes the theoretical approach of ARTiS as well as the details of the Linux implementation. Several kind of measurements are also presented in order to validate the results

    A Preemption-Based Meta-Scheduling System for Distributed Computing

    Get PDF
    This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to the users, delivering high system throughput and accommodating maximum number of applications into the systems. The author claims that the above mentioned objectives are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments. In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers. Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well defined scheduling policies to preempt and migrate executing applications. In order to provide the users with the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed by this research. The SRS library is different from many user-level check-pointing libraries since it allows reconfiguration of applications between migrations. This reconfiguration can be achieved by changing the processor configuration and/or data distribution. The experimental results provided in this dissertation demonstrates the utility of the metascheduling framework for distributed computing systems. And lastly, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system and it allows the application library writers to upload applications with different capabilities into the system. GradSolve is also unique with respect to maintaining traces of the execution of the applications and using the traces for subsequent executions of the application

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    Get PDF
    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation

    Modeling of an Adaptive Parallel System with Malleable Applications in a Distributed Computing Environment

    Get PDF
    Adaptive parallel applications that can change resources during execution, promise increased application performance and better system utilization. Furthermore, they open the opportunity for developing a new class of parallel applications driven by unpredictable data and events. The research issues in an adaptive parallel system are complex and interrelated. The nature and complexities of the relationships among these issues are not well researched and understood. Before developing adaptive applications or an infrastructure support for adaptive applications, these issues need to be investigated and studied in detail. One way of understanding and investigating these issues is by modeling and simulation. A model for adaptive parallel systems has been developed to enable the investigation of the impact of malleable workloads on its performance. The model can be used to determine how different model parameters impact the performance of the system and to determine the relationships among them Subsequently, a discrete event simulator has been developed to numerically simulate the model. Using the simulator, the impact of the variation in the number of malleable jobs in the workload, the flexibility, the negotiation cost, and the adaptation cost on system performance have been studied. The results and conclusions of these simulation experiments are presented in this dissertation. In general, the simulation results reveal that the performance improves with an increase in the number of malleable jobs in a workload, and that the performance saturates at a certain percentage of rigid to malleable jobs mix. A high percentage of malleable jobs is not necessary to achieve significant improvement in performance. The performance in general improves as the flexibility increases up to a certain point; then, it saturates. The negotiation cost impacts the performance, but not significantly. The number of negotiations for a given workload increases as number of malleable jobs increases up to a certain point, and then it decreases as number of malleable jobs increases further. The performance degrades as the application adaptation cost increases. The impact of the application adaptation cost on performance is much more significant compared to that of the negotiation cost

    Modeling of an Adaptive Parallel System with Malleable Applications in a Distributed Computing Environment

    Get PDF
    Adaptive parallel applications that can change resources during execution, promise increased application performance and better system utilization. Furthermore, they open the opportunity for developing a new class of parallel applications driven by unpredictable data and events. The research issues in an adaptive parallel system are complex and interrelated. The nature and complexities of the relationships among these issues are not well researched and understood. Before developing adaptive applications or an infrastructure support for adaptive applications, these issues need to be investigated and studied in detail. One way of understanding and investigating these issues is by modeling and simulation. A model for adaptive parallel systems has been developed to enable the investigation of the impact of malleable workloads on its performance. The model can be used to determine how different model parameters impact the performance of the system and to determine the relationships among them Subsequently, a discrete event simulator has been developed to numerically simulate the model. Using the simulator, the impact of the variation in the number of malleable jobs in the workload, the flexibility, the negotiation cost, and the adaptation cost on system performance have been studied. The results and conclusions of these simulation experiments are presented in this dissertation. In general, the simulation results reveal that the performance improves with an increase in the number of malleable jobs in a workload, and that the performance saturates at a certain percentage of rigid to malleable jobs mix. A high percentage of malleable jobs is not necessary to achieve significant improvement in performance. The performance in general improves as the flexibility increases up to a certain point; then, it saturates. The negotiation cost impacts the performance, but not significantly. The number of negotiations for a given workload increases as number of malleable jobs increases up to a certain point, and then it decreases as number of malleable jobs increases further. The performance degrades as the application adaptation cost increases. The impact of the application adaptation cost on performance is much more significant compared to that of the negotiation cost

    Scheduling independent stochastic tasks under deadline and budget constraints

    Get PDF
    International audienceThis paper discusses scheduling strategies for the problem of maximizing the expected number of tasks that can be executed on a cloud platform within a given budget and under a deadline constraint. The execution times of tasks follow IID probability laws. The main questions are how many processors to enroll and whether and when to interrupt tasks that have been executing for some time. We provide complexity results and an asymptotically optimal strategy for the problem instance with discrete probability distributions and without deadline. We extend the latter strategy for the general case with continuous distributions and a deadline and we design an efficient heuristic which is shown to outperform standard approaches when running simulations for a variety of useful distribution laws
    • …
    corecore