8 research outputs found
Extra unit-speed machines are almost as powerful as speedy machines for flow time scheduling
We study online scheduling of jobs to minimize the flow time and stretch on parallel machines. We consider algorithms that are given extra resources so as to compensate for the lack of future information. Recent results show that a modest increase in machine speed can provide very competitive performance; in particular, using O(1) times faster machines, the algorithm SRPT (shortest remaining processing time) is 1-competitive for both flow time [C. A. Phillips et al., in Proceedings of STOC, ACM, New York, 1997, pp. 140-149] and stretch [W. T. Chan et al., in Proceedings of MFCS, Springer-Verlag, Berlin, 2005, pp. 236-247] and HDF (highest density first) is O(1)-competitive for weighted flow time [L. Becchetti et al., in Proceedings of RANDOM-APPROX, Springer-Verlag, Berlin, 2001, pp. 36-47]. Using extra unit-speed machines instead of faster machines to achieve competitive performance is more challenging, as a faster machine can speed up a job but extra unit-speed machines cannot. This paper gives a nontrivial relationship between the extra-speed and extra-machine analyses. It shows that competitive results via faster machines can be transformed to similar results via extra machines, hence giving the first algorithms that, using O(1) times unit-speed machines, are 1-competitive for flow time and stretch and O(1)-competitive for weighted flow time. © 2008 Society for Industrial and Applied Mathematics.published_or_final_versio
Parallel Real-Time Scheduling for Latency-Critical Applications
In order to provide safety guarantees or quality of service guarantees, many of today\u27s systems consist of latency-critical applications, e.g. applications with timing constraints. The problem of scheduling multiple latency-critical jobs on a multiprocessor or multicore machine has been extensively studied for sequential (non-parallizable) jobs and different system models and different objectives have been considered. However, the computational requirement of a single job is still limited by the capacity of a single core. To provide increasingly complex functionalities of applications and to complete their higher computational demands within the same or even more stringent timing constraints, we must exploit the internal parallelism of jobs, where individual jobs are parallel programs and can potentially utilize more than one core in parallel. However, there is little work considering scheduling multiple parallel jobs that are latency-critical.
This dissertation focuses on developing new scheduling strategies, analysis tools, and practical platform design techniques to enable efficient and scalable parallel real-time scheduling for latency-critical applications on multicore systems. In particular, the research is focused on two types of systems: (1) static real-time systems for tasks with deadlines where the temporal properties of the tasks that need to execute is known a priori and the goal is to guarantee the temporal correctness of the tasks prior to their executions; and (2) online systems for latency-critical jobs where multiple jobs arrive over time and the goal to optimize for a performance objective of jobs during the execution.
For static real-time systems for parallel tasks, several scheduling strategies, including global earliest deadline first, global rate monotonic and a novel federated scheduling, are proposed, analyzed and implemented. These scheduling strategies have the best known theoretical performance for parallel real-time tasks under any global strategy, any fixed priority scheduling and any scheduling strategy, respectively. In addition, federated scheduling is generalized to systems with multiple criticality levels and systems with stochastic tasks. Both numerical and empirical experiments show that federated scheduling and its variations have good schedulability performance and are efficient in practice.
For online systems with multiple latency-critical jobs, different online scheduling strategies are proposed and analyzed for different objectives, including maximizing the number of jobs meeting a target latency, maximizing the profit of jobs, minimizing the maximum latency and minimizing the average latency. For example, a simple First-In-First-Out scheduler is proven to be scalable for minimizing the maximum latency. Based on this theoretical intuition, a more practical work-stealing scheduler is developed, analyzed and implemented. Empirical evaluations indicate that, on both real world and synthetic workloads, this work-stealing implementation performs almost as well as an optimal scheduler
A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing
This PhD project is motivated by the need to develop and achieve better and energy efficient computing through the use of parallelism and heterogeneous systems. Our contribution consists of both theoretical aspects, as well as in-depth and comprehensive empirical studies that aim to provide more insight into parallel and heterogeneous computing. Our first problem is a theoretical problem that focuses on the scheduling of a special category of jobs known as deteriorating jobs. These kind of jobs will require more effort to complete them if postponed to a later time. They are intended to model several industrial processes including steel production, fire-fighting and financial management. We study the problem in the context of parallel machine scheduling in an online setting where jobs have arbitrary release times. Our main results show that List Scheduling is -competitive and that no deterministic algorithm is better than , where is the largest deteriorating rate. We also extend our results to online deterministic algorithms and show that no deterministic online algorithm is better than -competitive. Our next study concerns the scheduling of jobs with precedence constraints on parallel machines. We are interested in the precedence constraint known as chain precedence constraint where each job can have at most one predecessor and at most one successor. The jobs are modelled as directed acyclic graphs where nodes represent the jobs and edges represent the precedence constraints between jobs. The jobs have a strict deadline that must be met. The parallel machines are considered to be unrelated and a communication network connects each pair of machines. Execution of the jobs on the machines as well as communication across the network incurs costs in the form of time and energy. These costs are given by cost matrices that covers processing and communication. The goal is to construct a feasible schedule that minimizes the total energy required to execute the chain of jobs on the machines, such that all deadlines are met. We present a dynamic programming solution to the problem that leads to a pseudo polynomial time algorithm with running time , where is the largest deadline. We show that the algorithm computes an optimal schedule where one exists. We then proceed to a similar problem that involves the scheduling of jobs to minimize flow time plus energy. This problem is based on a dynamic speed scaling heuristic in literature that is able to adjust the speed of a processor based on the number of \emph{active jobs}, called AJC. We present a comprehensive empirical study that consists of several job selection, speed selection and processor allocation heuristics. We also consider both single processor and multi processor settings. Our main goal is to investigate the viability of designing a fixed-speed counterpart for AJC, that is not as computationally intensive as AJC, while being very simple. We also evaluate the performance of this fixed speed heuristic and compare it with that of AJC. Our fourth and final study involves the use of graphics processing unit (GPU) as an accelerator for compute intensive tasks. The GPU has become a very popular multi processor for heterogeneous computing both from an economical point of view and performance standpoint. Firstly, we contribute to the development of a Bioinformatics tool, called GapsMis, by implementing a heterogeneous version that uses graphics processors for acceleration. GapsMis is a tool designed for the alignment of sequences, like protein and DNA sequences, and allows for the insertion of gaps in the alignment. Then we present a case study that aims to highlight the various aspects, including benefits and challenges, involved in developing heterogeneous applications that is vendor-agnostic. In order to do this we select four algorithms as case studies including GapsMis and the algorithm presented in our second problem. The other two algorithms are based on the Velocity-Verlet integration and the Fruchterman-Reingold force-based method for graph layout. We make use of the Open Computing Language (OpenCL) and C++ for implementation of the algorithms on a range of graphics processors from Advanced Micro Devices (AMD) and NVIDIA Corporation. We evaluate several factors that can affect performance of these applications on each hardware. We also compare the performance of our algorithms in a multi-GPU setting and against single and multi-core CPU implementations. Furthermore, several metrics are defined to capture several aspects of performance including execution time of application kernel(s), execution time of application including communication times, throughput, power and energy consumption
Recommended from our members
Group-EDF: A New Approach and an Efficient Non-Preemptive Algorithm for Soft Real-Time Systems
Hard real-time systems in robotics, space and military missions, and control devices are specified with stringent and critical time constraints. On the other hand, soft real-time applications arising from multimedia, telecommunications, Internet web services, and games are specified with more lenient constraints. Real-time systems can also be distinguished in terms of their implementation into preemptive and non-preemptive systems. In preemptive systems, tasks are often preempted by higher priority tasks. Non-preemptive systems are gaining interest for implementing soft-real applications on multithreaded platforms. In this dissertation, I propose a new algorithm that uses a two-level scheduling strategy for scheduling non-preemptive soft real-time tasks. Our goal is to improve the success ratios of the well-known earliest deadline first (EDF) approach when the load on the system is very high and to improve the overall performance in both underloaded and overloaded conditions. Our approach, known as group-EDF (gEDF), is based on dynamic grouping of tasks with deadlines that are very close to each other, and using a shortest job first (SJF) technique to schedule tasks within the group. I believe that grouping tasks dynamically with similar deadlines and utilizing secondary criteria, such as minimizing the total execution time can lead to new and more efficient real-time scheduling algorithms. I present results comparing gEDF with other real-time algorithms including, EDF, best-effort, and guarantee scheme, by using randomly generated tasks with varying execution times, release times, deadlines and tolerances to missing deadlines, under varying workloads. Furthermore, I implemented the gEDF algorithm in the Linux kernel and evaluated gEDF for scheduling real applications
New resource augmentation analysis of the total stretch of SRPT and SJF in multiprocessor scheduling
LNCS v. 3618 entitled: Mathematical Foundations of Computer Science 2005: 30th International Symposium, MFCS 2005, Gdansk, Poland, August29-September 2. 2005, ProceedingsThis paper studies online job scheduling on multiprocessors and, in particular, investigates the algorithms SRPT and SJF for minimizing total stretch, where the stretch of a job is its flow time (response time) divided by its processing time. SRPT is perhaps the most well-studied algorithm for minimizing total flow time or stretch. This paper gives the first resource augmentation analysis of the total stretch of SRPT, showing that it is indeed O(1)-speed 1-competitive. This paper also gives a simple lower bound result that SRPT is not s-speed 1-competitive for any s < 1.5. This paper also makes contribution to the analysis of SJF. Extending the work of [4], we are able to show that SJF is O(1)-speed 1-competitive for minimizing total stretch. More interestingly, we find that the competitiveness of SJF can be reduced arbitrarily by increasing the processor speed (precisely, SJF is O(s)-speed (1/s)-competitive for any s ≥ 1). We conjecture that SRPT also admits a similar result. © Springer-Verlag Berlin Heidelberg 2005.link_to_subscribed_fulltex
New resource augmentation analysis of the total stretch of SRPT and SJF in multiprocessor scheduling
This paper studies online job scheduling on multiprocessors and, in particular, investigates the algorithms Shortest Remaining Processing Time First (SRPT) and Shortest Job First (SJF) for minimizing total stretch, where the stretch of a job is its flow time (response time) divided by its processing time. SRPT is perhaps the most well-studied algorithm for minimizing total flow time or stretch. This paper gives the first resource augmentation analysis of the total stretch of SRPT, showing that it is indeed O (1)-speed 1-competitive. This paper also gives a simple lower bound result showing that SRPT is not s-speed 1-competitive for any s < 1.5. This paper also makes contribution to the analysis of SJF. Extending the work of [L. Becchetti, S. Leonardi, A. Marchetti-Spaccamela, K. Pruhs, Online weighted flow time and deadline scheduling, in: RANDOM-APPROX, 2001, pp. 36-47], we are able to show that SJF is O (1)-speed 1-competitive for minimizing total stretch. More interestingly, we find that the competitiveness of SJF can be reduced arbitrarily by increasing the processor speed (precisely, SJF is O (s)-speed (1 / s)-competitive for any s ≥ 1). We conjecture that SRPT also admits a similar result. © 2006 Elsevier B.V. All rights reserved.link_to_subscribed_fulltex
New Resource Augmentation Analysis of the Total Stretch of SRPT and SJF in Multiprocessor Scheduling
LNCS v. 3618 entitled: Mathematical Foundations of Computer Science 2005: 30th International Symposium, MFCS 2005, Gdansk, Poland, August29-September 2. 2005, ProceedingsThis paper studies online job scheduling on multiprocessors and, in particular, investigates the algorithms SRPT and SJF for minimizing total stretch, where the stretch of a job is its flow time (response time) divided by its processing time. SRPT is perhaps the most well-studied algorithm for minimizing total flow time or stretch. This paper gives the first resource augmentation analysis of the total stretch of SRPT, showing that it is indeed O(1)-speed 1-competitive. This paper also gives a simple lower bound result that SRPT is not s-speed 1-competitive for any s < 1.5. This paper also makes contribution to the analysis of SJF. Extending the work of [4], we are able to show that SJF is O(1)-speed 1-competitive for minimizing total stretch. More interestingly, we find that the competitiveness of SJF can be reduced arbitrarily by increasing the processor speed (precisely, SJF is O(s)-speed (1/s)-competitive for any s ≥ 1). We conjecture that SRPT also admits a similar result. © Springer-Verlag Berlin Heidelberg 2005.link_to_subscribed_fulltex