REAL-TIME SCHEDULING ON ASYMMETRIC MULTIPROCESSOR PLATFORMS
Kecheng Yang
A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment
of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.
Chapel Hill
2018
Approved by:
James H. Anderson
Sanjoy K. Baruah
Enrico Bini
Shahriar Nirjon
F. Donelson Smith
©2018
Kecheng Yang
ALL RIGHTS RESERVED
ABSTRACT
Kecheng Yang: Real-Time Scheduling on Asymmetric Multiprocessor Platforms
(Under the direction of James H. Anderson)
Real-time scheduling analysis is crucial for time-critical systems, in which provable timing guarantees
are more important than observed raw performance. Techniques for real-time scheduling analysis initially
targeted uniprocessor platforms but have since evolved to encompass multiprocessor platforms. However,
work directed at multiprocessors has largely focused on symmetric platforms, in which every processor is
identical. Today, it is common for a multiprocessor to include heterogeneous processing elements, as this
offers advantages with respect to size, weight, and power (SWaP) limitations. As a result, realizing modern
real-time systems on asymmetric multiprocessor platforms is an inevitable trend. Unfortunately, principles
and mechanisms regarding real-time scheduling on such platforms are relatively lacking.
The goal of this dissertation is to enrich such principles and mechanisms, by bridging existing analysis
for symmetric multiprocessor platforms to asymmetric ones and by developing new techniques that are
unique to asymmetric multiprocessor platforms. The specific contributions are threefold.
First, for a platform consisting of processors that differ with respect to processing speeds only, this
dissertation shows that the preemptive global earliest-deadline-first (G-EDF) scheduler is optimal for scheduling
soft real-time (SRT) task systems. Furthermore, it shows that semi-partitioned scheduling, which is a hybrid
of conventional global and partitioned scheduling approaches, can be applied to optimally schedule both hard
real-time (HRT) and SRT task systems.
Second, on platforms that consist of processors with different functionalities, tasks that belong to different
functionalities may process the same source data consecutively and therefore have producer/consumer
relationships among them, which are represented by directed acyclic graphs (DAGs). End-to-end
response-time bounds for such DAGs are derived in this dissertation under a G-EDF-based scheduling approach, and it
is shown that such bounds can be improved by a linear-programming-based deadline-setting technique.
Third, processor virtualization can lead a symmetric physical platform to be asymmetric. In fact, for a
designated virtual-platform capacity, there exist an infinite number of allocation schemes for virtual processors
and a choice must be made. In this dissertation, a particular asymmetric virtual-processor allocation scheme,
called minimum-parallelism (MP) form, is shown to dominate all other schemes including symmetric ones.
ACKNOWLEDGEMENTS
When I joined the Computer Science Department at UNC, I had absolutely zero experience in research. I
wanted to stay for five years for a Ph.D., but I was also fine with leaving with a Master’s degree. Now, here I
am having completed this dissertation. I have enjoyed my life in Chapel Hill as a Ph.D. student so much, and
it was the wonderful people I met here who made my life so enjoyable. I would like to take this opportunity
to thank many of you.
First and foremost, I would like to thank my advisor Jim Anderson. As a matter of fact, Jim is the creator
of my academic career. I do not know why he chose to work with a zero-experience guy like me in the first
place, but I have been and will always be grateful for this. I also enjoyed the research discussions with Jim a
lot. He always promptly got the point and provided me with valuable feedback, and this made our conversations a
great pleasure for me. Furthermore, I have learned a great deal about English writing for free in the Computer
Science program—this is also because of Jim. I would also like to thank the members of my dissertation
committee: Sanjoy Baruah, Enrico Bini, Shahriar Nirjon, and Don Smith. I have also enjoyed and learned a
lot from our discussions about and beyond this dissertation.
I also wish to thank all of my co-authors: Tanya Amert, Pontus Ekberg, Glenn Elliott, Zhishan Guo,
Catherine Nemitz, Nathan Otterness, Luca Santinelli, Sergey Voronov, Shige Wang, and Ming Yang. I am
also thankful to many other people I worked with in our group: Bipasa Chattopadhyay, Micaiah Chisholm,
Calvin Deutschbein, Jeremy Erickson, Shiwei Fang, Bashima Islam, Tamzeed Islam, Namhoon Kim, Seulki
Lee, Rui Liu, Mac Mollison, Sims Osborne, Abhishek Singh, Stephen Tang, and Bryan Ward. I am also
grateful to many staff members in the department, most notably Fay Alexander, Robin Brennan, Bridgette
Cyr, Jodie Gregoritsch, Adia Ware, and Missy Wood.
Outside of research, soccer games have played a significant role in my life in Chapel Hill. I very much
enjoyed playing soccer weekly in the Rivercrabs team and the Newbees team. I could not list the names of all
my teammates, because, for many of them, I know their nicknames in the field only. Nevertheless, I would
like to thank them all for the numerous soccer afternoons and nights I enjoyed.
Finally, I would like to thank my family, in particular, my parents for their support and patience and
Yiqian for her love.
The research in this dissertation was supported by NSF grants CNS 1016954, CNS 1115284, CNS
1218693, CPS 1239135, CNS 1409175, CPS 1446631, and CNS 1563845, AFOSR grant FA9550-14-1-0161,
ARO grant W911NF-14-1-0499, a grant from General Motors, and a Dissertation Completion Fellowship
from the Graduate School at UNC.
TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Real-Time Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Workload Model and Temporal Correctness Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Schedulability, Feasibility, and Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Asymmetric Multiprocessor Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Asymmetric Platforms Due to Differing Processor Speeds . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.2 Asymmetric Platforms Due to Differing Processor Functionalities . . . . . . . . . . . . . . . . . . . 8
1.4.3 Asymmetric Platforms Due to Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 2: Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1 HRT-Feasibility and HRT EDF Scheduling on Uniform Multiprocessors . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Level Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 HRT EDF Scheduling on Uniform Multiprocessors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Tardiness Bounds under G-EDF Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Semi-Partitioned Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 EDF-fm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 EDF-os . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.3 EDF-ms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Intra-Task Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 DAG-based Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Compositional Real-Time Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Chapter 3: Global EDF Scheduling on Uniform Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 A Necessary and Sufficient SRT-Feasibility Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Preemptive and Non-Preemptive G-EDF Scheduling on Uniform Multiprocessors. . . . . . . . . . . . 32
3.4 Tardiness Increasing without Bound under Non-Preemptive G-EDF . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Tardiness Bounds under Preemptive G-EDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5.1 Varying-Period Periodic Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5.2 Deriving Tardiness Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 4: Semi-Partitioned Scheduling on Uniform Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 EDF-sh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.1 Algorithm EDF-sh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 Tardiness Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.2.1 Migrating Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2.2.2 Fixed Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 EDF-tu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 Feasible Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 Algorithm EDF-tu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.3 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.3.1 HRT Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.3.2 SRT Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.3.4 Alternate Assignment Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Chapter 5: Allowing Intra-Task Parallelism on Uniform Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3 Response-Time Bounds under Preemptive G-EDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3.1 Basic Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.3.2 Improved Bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4 Response-Time Bounds under Non-Preemptive G-EDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4.1 Basic Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4.2 Improved Bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Chapter 6: DAG-Based Task Systems on Unrelated Heterogeneous Platforms . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2 Offset-Based Independent Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3 Response-Time Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.3.1 Response-Time Bounds for Obi-Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.3.2 From DAG-Based Task Sets to Obi-Task Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.4 Setting Relative Deadlines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4.1 Linear Program. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.2 Objective Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.5 DAG Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.6 Early Releasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.7 Case Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.8 Schedulability Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.8.1 Improvements Enabled by Basic Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.8.2 Improvements Enabled by DAG Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Chapter 7: Minimum-Parallelism Multiprocessor Supply on Identical Platforms . . . . . . . . . . . . . . . . . . . . . . 142
7.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.1.1 Periodic Resource Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.1.2 VPs in a Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.1.3 Parallel Supply Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.3 Non-Concrete Asynchronous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.3.1 A Common Period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.3.2 Different Periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.4 Synchronous and Concrete Asynchronous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.5 Indomitability of MP Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
7.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Chapter 8: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.2 Other Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
LIST OF TABLES
6.1 Case-study task response-time bounds and obi-task offsets assuming implicit deadlines.
Bold entries denote sinks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2 Case-study relative-deadline settings, obi-task response-time bounds, and obi-task offsets
when using linear programming to (a) minimize average end-to-end response-time bounds,
(b) minimize maximum end-to-end response-time bounds, and (c) minimize maximum
proportional end-to-end response-time bounds. Bold entries denote sinks. . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.3 Observed end-to-end response times with/without early releasing and analytical end-to-end
response-time bounds for the implicit-deadline setting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1 Summary of theorems applying to different VP synchronization assumptions. . . . . . . . . . . . . . . . 144
LIST OF FIGURES
1.1 An example program structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 “Jointly executing” schedule for Example 2.1 generated by the Level Algorithm. . . . . . . . . . . . . . . . . . . 14
2.2 The actual schedule for “jointly executing” J1 and J2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Actual schedule for Example 2.1 generated by the Level Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Final schedule for Example 2.1 generated by the Level Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 EDF-os task assignment for Example 2.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Counterexample schedules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Feasible schedule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 A non-preemptive schedule for the system in Section 3.4. Note that the deadline tardiness
is upper bounded by 3 time units. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Transforming a sporadic task into a VPP task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1 EDF-sh task assignment for Example 4.1. This is the same system as in Example 2.2, but
EDF-sh has a different assignment from EDF-os. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 EDF-sh task assignment for Example 4.2. The width of each column indicates the processor
speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Schedulability under EDF-sh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 Absolute tardiness bounds of EDF-ms and EDF-sh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5 Relative tardiness bounds of EDF-ms and EDF-sh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.6 Assignment and schedule for Example 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.7 Assignment and schedule for Example 4.4. The width of each rectangle represents the
speed of its corresponding processor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.8 A correct schedule for Example 4.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.9 EDF-tu execution phase illustration for Example 4.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.10 Number of migrating tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.11 Maximum number of preemptions of migrating tasks per frame. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.1 Average maximum absolute response-time bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2 Average maximum relative response-time bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.1 A DAG G1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2 Example schedule for the DAG in G1 in Figure 6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3 Example schedule of the obi-tasks corresponding to the DAG-based tasks in G1 in Figure 6.1. . . . . . . . 122
6.4 More highly prioritizing the right-side path in this DAG decreases its end-to-end
response-time bound. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.5 Illustration of DAG combining. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.6 Example schedule of the obi-tasks corresponding to the DAG-based tasks in G1 in
Figure 6.1, when early releasing is allowed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.7 DAGs in the case-study system. G2 has two sinks, so to analyze it, a virtual sink τ62 must
be added that has a WCET of 0 and a response-time bound of 0. We show the resulting
graph in Figure 6.8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.8 G′2, where a virtual sink is created for G2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.9 AMERBs as a function of total utilization in each CE pool in the case where each task set
has five DAGs, 20 tasks per DAG, and edgeProb=0.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.10 AMERBs as a function of total utilization in each CE pool in the case where the number
of identical DAGs per template is fixed to 40. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.11 AMERBs as a function of the number of identical DAGs per template in the case where
total utilization in each CE pool is fixed to eight. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.1 Worst-case supply of Γi (adapted from (Shin and Lee, 2003)). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.2 Example illustrating parallel supply (adapted from (Lipari and Bini, 2010)). . . . . . . . . . . . . . . . . . . . . . . 147
7.3 The graph of Z(t,Γi), as an illustration of Properties 7.1, 7.2, and 7.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.4 Illustration of Claim 7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.5 Illustration of the worst case of psf∞(t,C) for non-concrete asynchronous VPs. . . . . . . . . . . . . . . . . . . . . 152
7.6 Illustration for the cases in Lemma 7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.7 Illustration of the counterexample in Section 7.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.8 Illustration of the counterexample in Section 7.4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.9 A possible scenario for any concrete phases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.10 Illustration of Case 2 of Theorem 7.7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
LIST OF ABBREVIATIONS
AMERB Average Maximum End-to-end Response-time Bound
CE Computational Element
CPU Central Processing Unit
DAG Directed Acyclic Graph
DMPR Deterministic Multiprocessor Periodic Resource
DSP Digital Signal Processor
DVFS Dynamic Voltage and Frequency Scaling
EDF Earliest-Deadline-First
EDP Explicit Deadline Periodic
G-EDF Global Earliest-Deadline-First
G-FL Global Fair-Lateness
FIFO First-In-First-Out
FPGA Field-Programmable Gate Array
GPU Graphics Processing Unit
HAC Hardware Accelerator
HOG Histogram of Oriented Gradients
HRT Hard Real-Time
I/O Input/Output
LP Linear Program
MC Mixed-Criticality
MP Minimum-Parallelism
MPR Multiprocessor Periodic Resource
MSF Multi Supply Function
PR Periodic Resource
PSF Parallel Supply Function
SBF Supply Bound Function
SRT Soft Real-Time
SWaP Size, Weight, and Power
VP Virtual Processor
VPP Varying-Period Periodic
WCET Worst-Case Execution Time
CHAPTER 1: INTRODUCTION
In today’s world, computing has become more and more ubiquitous and pervasive. An increasing number
of people rely on various computer systems, from lightweight embedded ones to large-scale distributed ones,
for a growing range of daily activities. As a result, in many modern systems, computing devices are required
to interact with physical processes, some of which are time-critical. Thus, computations in such systems not
only need to produce logically sound results but also need to finish in a timely manner. Systems that
require both logical and temporal correctness are called real-time systems.
To validate temporal correctness, real-time scheduling-analysis techniques are crucial. After decades
of effort, researchers have established a solid foundation for real-time scheduling on uniprocessor platforms,
which had been the standard hardware setting for traditional real-time systems for many years. Since
the multicore revolution, attention has shifted to supporting multiprocessor platforms. Meanwhile, the
multicore revolution is currently undergoing a second wave of innovation in the form of heterogeneous
processing elements, which offer advantages with respect to size, weight, and power (SWaP) limitations. As
a result, realizing modern real-time systems on asymmetric multiprocessor platforms is an inevitable trend.
Unfortunately, principles and mechanisms regarding real-time scheduling on such platforms are relatively
lacking.
The goal of this dissertation is to enrich such principles and mechanisms, by bridging existing analysis
for symmetric multiprocessor platforms to asymmetric ones and by developing new techniques that are
unique for asymmetric multiprocessor platforms.
We begin this chapter with an introduction to real-time systems. We also present an overview of a
category of asymmetric platforms that are considered in this dissertation. We then state the thesis, summarize
the contributions, and outline the remaining chapters of this dissertation.
1.1 Real-Time Systems
A real-time system requires the validation of both logical and temporal correctness. Logical correctness,
which needs to be verified for almost all computing systems including non-real-time ones, requires the system
to always produce the right outputs, whereas temporal correctness further requires the system to do so at the
right time. Since logical correctness is a general problem faced by virtually any system, the particular interest
of real-time systems research mostly lies in temporal correctness.
1.1.1 Workload Model and Temporal Correctness Criterion
In a real-time system, the timing of the completion of workloads is not only a matter of efficiency but
also a matter of correctness. It is therefore critical to validate the temporal properties of a real-time system
before it runs, or even before it is built. Thus, properly modeling the workloads is the first step towards
establishing temporal correctness in a real-time system.
Since the seminal paper by Liu and Layland (1973), the classic sporadic task model has been widely
accepted as a fundamental real-time workload model. A sporadic task τi releases a (potentially infinite) set
of jobs, where consecutive job releases are separated by at least Ti time units and each job has a worst-case
execution requirement Ci, which is defined by its worst-case execution time on a unit-speed processor. If
the separation between every pair of consecutive jobs happens to be exactly Ti time units, then τi is also called a
periodic task. For this reason, Ti is called the period of τi, regardless of whether τi is periodic or sporadic.
The utilization of τi is defined as ui = Ci/Ti, which indicates the amount of computing resources τi will
request in the long term.
Each job of τi has an absolute deadline, or simply deadline, Di time units after its release time, where
Di is called the relative deadline of τi. Deadlines are called implicit if Di = Ti. The jth job of τi is denoted
by τi,j, and its release time, finish time, and deadline are denoted by ri,j, fi,j, and di,j, respectively. The
response time of job τi,j is defined as fi,j − ri,j, i.e., the time duration from its release time to its finish time.
Furthermore, if τi,j misses its deadline at di,j, then the difference between its finish time and its deadline is
called its tardiness; on the other hand, if τi,j meets its deadline, then its tardiness is defined to be zero. That
is, the tardiness of job τi,j is defined as max{0, fi,j − di,j}.
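The definitions above can be written out as a minimal Python sketch. The class name SporadicTask, the helper names, and the sample numbers are illustrative assumptions rather than notation used elsewhere in this dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SporadicTask:
    """Hypothetical container for the parameters (C_i, T_i, D_i) of a task."""
    wcet: float      # C_i: worst-case execution requirement on a unit-speed processor
    period: float    # T_i: minimum separation between consecutive job releases
    deadline: float  # D_i: relative deadline (D_i == T_i means implicit deadline)

    @property
    def utilization(self) -> float:
        # u_i = C_i / T_i
        return self.wcet / self.period


def response_time(release: float, finish: float) -> float:
    # f_{i,j} - r_{i,j}
    return finish - release


def tardiness(finish: float, abs_deadline: float) -> float:
    # max{0, f_{i,j} - d_{i,j}}
    return max(0.0, finish - abs_deadline)


# Illustrative job: released at time 30 with an implicit deadline, finishing at 43.
task = SporadicTask(wcet=2.0, period=10.0, deadline=10.0)
release = 30.0
abs_deadline = release + task.deadline
finish = 43.0
job_response = response_time(release, finish)
job_tardiness = tardiness(finish, abs_deadline)
```

In this made-up instance the job misses its deadline at time 40 by 3 time units, so its response time is 13 and its tardiness is 3; a job finishing at or before time 40 would have tardiness zero.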
In a hard real-time (HRT) system, its temporal correctness requires that every job must be guaranteed to
complete by its deadline, i.e., the tardiness of every job must be zero. In contrast, in a soft real-time (SRT)1
system, its temporal correctness can be validated as long as the tardiness of every job can be upper-bounded
by some constant.
1In this dissertation, we adopt the tardiness-based definition of SRT systems. For other definitions of SRT systems, please see
Erickson’s dissertation (Erickson, 2014) for a review.
In both HRT and SRT systems, the response time for every job must be upper-bounded for the corre-
sponding temporal correctness to be established. In HRT systems, such as flight control systems, deadlines
represent very specific response-time requirements for the jobs and such requirements are explicit to the
system designers. Therefore, response-time bounds equal to corresponding relative deadlines are required for
temporal correctness. In SRT systems, such as real-time multimedia streaming systems, deadlines may be
just a notion of urgency provisioned by the system designers, where some response-time guarantees that are
not specified explicitly but are derived from analysis may be acceptable. For example, a response time of
1ms, 2ms, or 100ms may not be noticeable in a live basketball game, and even 2,000ms or 5,000ms could
be sufficiently good for most audiences. Some constant response-time bounds, e.g., a relative deadline plus a
constant tardiness bound, are sufficient to guarantee that the response times of jobs will not unboundedly
grow, so that the live streaming will not be choppy, given adequate buffering at the very beginning.
1.1.2 Schedulability, Feasibility, and Optimality
When multiple real-time tasks share a common processing platform for their execution, competition
among tasks for processing resources is inevitable. When such competition occurs, the “judge” who dictates
the resource allocation for the tasks is called a scheduler, and the algorithm a scheduler runs is called a
scheduling algorithm. The term schedulable (feasible, respectively) is defined for a real-time system with
respect to a particular scheduling algorithm (some scheduling algorithm, respectively), while the term optimal
is defined for a scheduling algorithm with respect to all feasible real-time systems.
Definition 1.1. (schedulable) A real-time system is called HRT-schedulable (SRT-schedulable, respectively)
under a scheduling algorithm A if and only if the tardiness of every job in that system is guaranteed to be
zero (upper-bounded by a constant, respectively) under the scheduling algorithm A.
Definition 1.2. (feasible) A real-time system is called HRT-feasible (SRT-feasible, respectively) if and only
if this system is HRT-schedulable (SRT-schedulable, respectively) under some scheduling algorithm.
Definition 1.3. (optimal) A scheduling algorithm is called HRT-optimal (SRT-optimal, respectively) if
and only if all HRT-feasible (SRT-feasible, respectively) systems are HRT-schedulable (SRT-schedulable,
respectively) under this scheduling algorithm.
The concept of optimality is also often defined with respect to a subset of real-time systems. In this case,
feasibility should be interpreted accordingly within that subset of real-time systems as well. For example,2
“Scheduling algorithm A is optimal on uniprocessors.”
should be interpreted as
“All feasible systems on uniprocessors are schedulable under A.”
where “all feasible systems on uniprocessors” should be interpreted as “all real-time systems on uniprocessors
that are schedulable under some algorithm.”
1.2 Asymmetric Multiprocessor Platforms
The aforementioned classic sporadic task model and its extensions have been widely studied on unipro-
cessor platforms since it was proposed by Liu and Layland (1973). Since the multicore revolution, it has
mostly been studied with respect to symmetric multiprocessor platforms, where every processor is viewed as
identical. In contrast, there is a relative lack of similar work addressing asymmetric multiprocessor platforms,
where processors may differ from each other. Due to the additional complexity that may arise on such
platforms, asymmetric multiprocessor platforms must be modeled more carefully.
The following is a taxonomy of multiprocessors from symmetric to asymmetric, adopted from Pinedo
(1995) and Funk (2004).
• Identical. Every job is executed on any processor at the same speed, which is usually normalized to be
1.0 for simplicity.
• Uniform. Different processors may have different speeds, but on a given processor, every job is
executed at the same speed. The speed of processor p is denoted sp.
• Unrelated. The execution speed of a job depends on both the processor on which it is executed and the
task to which it belongs, i.e., a given processor may execute jobs of different tasks at different speeds.
The execution speed of task τi on processor p is denoted sp,i.
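As a minimal sketch of the three models, the following Python fragment treats each platform as a function that returns the speed at which processor p executes jobs of task i. The helper names (identical_speed, uniform_speed, unrelated_speed, completion_time) and the sample numbers are illustrative assumptions, not notation from the cited literature.

```python
def identical_speed(p, i):
    # Identical: every job runs at the normalized speed 1.0 on every processor.
    return 1.0


def uniform_speed(speeds):
    # Uniform: the speed depends on the processor only (s_p).
    return lambda p, i: speeds[p]


def unrelated_speed(matrix):
    # Unrelated: the speed depends on both processor and task (s_{p,i}).
    return lambda p, i: matrix[p][i]


def completion_time(work, speed_fn, p, i):
    """Time needed to finish `work` units of task i when run entirely on processor p."""
    return work / speed_fn(p, i)


# Illustrative platforms with two processors (indices 0 and 1) and two tasks.
uniform = uniform_speed([2.0, 1.0])
unrelated = unrelated_speed([[2.0, 0.5],
                             [1.0, 3.0]])

t_identical = completion_time(6.0, identical_speed, 0, 1)
t_uniform = completion_time(6.0, uniform, 0, 1)
t_unrelated_p0 = completion_time(6.0, unrelated, 0, 1)
t_unrelated_p1 = completion_time(6.0, unrelated, 1, 1)
```

Under the unrelated model in this example, processor 0 is the faster choice for task 0 but the slower one for task 1, which neither the identical nor the uniform model can express.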
The uniform multiprocessor model might be the most straightforward step from a symmetric multiproces-
sor platform to an asymmetric one. It allows each processor on the platform to have its own speed; workloads
2In this example, “HRT” and “SRT” are omitted for simplicity. They can be added in the same fashion as the three definitions above.
performed on each processor progress proportionally, or uniformly, to the corresponding speed. This is
indeed a rather idealistic platform model, because different tasks may often scale differently on two given
different-speed processors, i.e., the speeds of processors can be difficult to determine in practice. Nonetheless,
this model is still of interest because of its simplicity. Theoretical results involving this model can be viewed
as a baseline for moving forward to more complicated system models and might reveal fundamental intuitions
on the differences between symmetric and asymmetric multiprocessor platforms.
On the other hand, the unrelated multiprocessor model might be a more expressive model, which may
be able to represent the case in which processors may not only differ with respect to processing speeds but
also have different functionalities. However, for a particular job, each processor is still characterized by a
single-value speed, although the value may differ across jobs. This implies
uniform execution within a single job. That is, executing the first half of a job on processor p and executing
the second half on processor q is the same as executing the first half of a job on processor q and executing the
second half on processor p. However, this might not be true in practice if processors p and q have different
functionalities. For example, if processor p is “better” at doing the computations in the first half while
processor q is “better” for those in the second half, then the former allocation may have a significantly shorter
worst-case execution time than the latter. Thus, in many actual systems, migrations among processors of
different functionalities are not allowed, at least at the job level. In addition, it is often the case that only a
few different functionalities exist while multiple processors may be of the same functionality. With these
constraints, an unrelated multiprocessor may be able to model processors of different functionalities.
Moreover, even an identical multiprocessor platform may not necessarily imply the absence of asymmetry,
if hardware virtualization is considered. Compositional real-time systems have been proposed in the literature
to support open systems (Deng and Liu, 1997), where separate software components execute together on a
common hardware platform while each component can be developed, analyzed, and certified independently.
In such systems, each component has the “illusion” of executing on a dedicated virtual platform, and it
should be possible to validate the temporal correctness of each component independently. Therefore, a virtual
platform needs to be specified, usually by virtual processors, which are used to characterize partially available
physical processors. Since different characteristics of the availability of a certain physical processor may
result in different virtual processors, a virtual multiprocessor platform can be in fact asymmetric, even if the
underlying physical platform consists of only identical processors.
1.3 Thesis Statement
Due to different sources of asymmetry, the modeling of an asymmetric multiprocessor platform is
significantly more complicated than that of simpler traditional platforms, and therefore designing real-time
systems on such platforms is also much more challenging. In this dissertation, we consider a set of asymmetric
multiprocessor models, each of which addresses a particular source of asymmetry, and we develop and analyze
scheduling algorithms for real-time systems design under each model. This leads to the following thesis
statement:
Multiprocessor platforms can be asymmetric due to differing processor speeds, differing pro-
cessor functionalities, or virtualization. On platforms with different-speed processors, optimal
SRT scheduling does not require radically new scheduling techniques, but rather can be done by
simply adapting certain techniques that are known to be SRT-optimal on platforms with identi-
cal processors. On platforms comprised of processors with different functionalities, dataflow
computations can be more effectively supported if prioritization schemes are used that holisti-
cally account for end-to-end data-processing objectives. On asymmetric virtual multiprocessor
platforms, there exists a single virtual-processor allocation scheme that dominates all other
schemes.
1.4 Contributions
The above thesis is supported by the following contributions, summarized by the three aspects of
asymmetry considered in this dissertation.
1.4.1 Asymmetric Platforms Due to Differing Processor Speeds
For a system consisting of processors that differ with respect to processing speeds only, we model
it as a uniform multiprocessor. Since the identical multiprocessor model is a special case of the uniform
multiprocessor model, where the speed of every processor happens to be 1.0, we begin with the global earliest-
deadline-first (G-EDF) scheduling algorithm, which has been widely studied on identical multiprocessors.
In Chapter 3, we examine the SRT-optimality of G-EDF on uniform multiprocessors. Devi and Anderson
(2008) have established the SRT-optimality of G-EDF on identical multiprocessors, but the problem of
whether this optimality result also holds on uniform multiprocessors has been open since then. As noted
in Section 1.1.2, when optimality is discussed for a certain set of systems, feasibility must be considered
accordingly. Therefore, we first establish an SRT-feasibility condition for uniform platforms as follows:
A set of n sporadic tasks τ = {τ1,τ2, . . . ,τn} is SRT-feasible on m uniform processors with
speeds {s1,s2, . . . ,sm} if and only if
Un ≤ Sm, and (1.1)
Uk ≤ Sk, for k = 1,2, . . . ,m−1, (1.2)
where Uk denotes the sum of the k largest utilizations of tasks in τ and Sk denotes the sum of the k
largest processor speeds.
This in fact matches the HRT-feasibility condition for implicit-deadline periodic tasks on uniform platforms
(Funk et al., 2001). According to Devi and Anderson (2008), both preemptive and non-preemptive G-EDF
scheduling algorithms are SRT-optimal on identical multiprocessors. Therefore, we also investigate G-EDF
on uniform multiprocessors with respect to these two variants. By constructing a counterexample in Chapter 3,
we show that the non-preemptive G-EDF scheduling algorithm is not SRT-optimal on uniform platforms. In
contrast, the preemptive G-EDF scheduling algorithm remains SRT-optimal on uniform platforms. A rather
long and complicated proof will be presented in Chapter 3 to support this optimality result.
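Conditions (1.1) and (1.2) are straightforward to check mechanically. The following Python function is a minimal sketch of such a check; the name is_srt_feasible, the tolerance eps, and the sample task sets are illustrative assumptions. When fewer than k tasks exist, the sketch takes Uk to be the sum of all utilizations.

```python
def is_srt_feasible(utilizations, speeds, eps=1e-9):
    """Check U_n <= S_m and U_k <= S_k for k = 1, ..., m-1."""
    u = sorted(utilizations, reverse=True)  # non-increasing utilizations
    s = sorted(speeds, reverse=True)        # non-increasing speeds
    m = len(s)

    # Condition (1.1): total utilization must not exceed total speed.
    if sum(u) > sum(s) + eps:
        return False

    # Condition (1.2): the k heaviest tasks must fit on the k fastest processors.
    for k in range(1, m):
        if sum(u[:k]) > sum(s[:k]) + eps:
            return False
    return True


# Illustrative platform with speeds {2, 1}.
ok = is_srt_feasible([1.5, 0.5], [2.0, 1.0])
too_heavy_task = is_srt_feasible([2.5, 0.2], [2.0, 1.0])
overloaded = is_srt_feasible([1.0, 1.0, 1.0, 0.5], [2.0, 1.0])
```

The second task set violates (1.2) even though its total utilization of 2.7 is below the total speed of 3: a single task of utilization 2.5 cannot be served by a fastest processor of speed 2, because the jobs of a sporadic task execute sequentially.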
In Chapter 4, we shift our focus from G-EDF, which follows a global scheduling approach, to a different
scheduling approach, called semi-partitioned scheduling. Traditionally, a scheduling algorithm usually
follows either a global or a partitioned approach. The global scheduling approach allows arbitrary migrations, while
the partitioned scheduling approach allows no migration at all. One classic hybrid of these two approaches is
clustered scheduling, where only migrations within a subset of processors, or a cluster, are allowed. Semi-
partitioned scheduling is another hybrid, where only migrations of a subset of tasks, called migrating tasks,
are allowed and all other tasks, called fixed tasks, cannot migrate. In Chapter 4, we will design and analyze
two semi-partitioned scheduling algorithms, namely EDF-sh (earliest-deadline-first-based semi-partitioned
scheduler for uniform heterogeneous multiprocessors) and EDF-tu (earliest-deadline-first-based tunable
scheduler for uniform platforms), for uniform multiprocessors. EDF-sh is neither HRT- nor SRT-optimal, but
it restricts task migrations to occur at job boundaries only, i.e., no job migrates even if it is from a migrating
task. EDF-tu includes a tunable parameter, so that its HRT- or SRT-optimality and its tardiness bounds can be
tuned by adjusting this parameter at the expense of potentially increased runtime overheads.
In Chapter 5, we re-visit the sporadic task model we consider in the prior two chapters. In the conventional
sporadic task model, each task is considered as a sequential schedulable entity that is often implemented
by a single thread in a real system. Therefore, the jobs of a single sporadic task must execute in sequence.
That is, a job cannot commence execution until all its predecessors finish, even if available processors exist.
This intra-task precedence constraint might jeopardize the feasibility of a system where such a constraint
is in fact unnecessary. For example, when a video is processed frame-by-frame independently, consecutive
jobs (i.e., processing consecutive frames) can execute in parallel, if sufficient processors are available on a
multiprocessor platform. To remove this intra-task precedence constraint when it is unnecessary, in Chapter 5
we introduce a new task model, called npc-sporadic (non-precedence-constraint sporadic) task model, in
which multiple jobs of the same task may execute in parallel (while each individual job remains sequential).
With the npc-sporadic model, we will show that Equation (1.2) is no longer necessary for SRT-feasibility on uniform
platforms. Furthermore, the non-preemptive G-EDF scheduling algorithm becomes SRT-optimal
again, and the preemptive G-EDF scheduling algorithm remains SRT-optimal with better tardiness bounds.
1.4.2 Asymmetric Platforms Due to Differing Processor Functionalities
In Chapter 6, we consider platforms that consist of processors with different functionalities. As mentioned
in Section 1.2, migrations on such platforms may be subtle, so that even the unrelated multiprocessor model
might fail to characterize these platforms. Specifically, on an unrelated multiprocessor platform,
a slowest processor can be identified for each task, and that task’s worst-case execution time (WCET) can be
correspondingly defined with respect to its execution on that processor. By then determining WCETs for
the task on other processors, appropriate per-processor speed parameters can be defined for the task. Such
WCET values and speed parameters are valid, provided the task never migrates. However, if migrations are
allowed, then it can be the case that the highest WCET specified for the task—that obtained by considering
its slowest processor—is not actually the largest possible WCET for the task, even if migration costs are
assumed to be negligible. That is, migrations can cause anomalies with respect to execution-time and speed
assumptions. To see this, consider a task τ1, which executes the function f() in Figure 1.1. Suppose that τ1
has a WCET of 100ms when executing entirely on a processor p, and a WCET of 200ms when executing
entirely on a processor q, where processors p and q are two processors of different functionalities. Then, the
following anomaly is possible: suppose that the WCET for executing from the beginning of f() to the line
flag is 90ms on processor p and 50ms on processor q, and the WCET for executing from the line flag to
the end of f() is 10ms on processor p and 150ms on processor q. In such a situation, if a job of τ1 first
executes on processor p until the line flag and then migrates to processor q to finish its execution, then it
could execute for 240ms, which is greater than any of the WCETs defined for τ1 when it executes entirely on
a single processor type.
f() {
    ...
    /* some code for execution */
    /* flag */
    ...
    /* some other code for execution */
}
Figure 1.1: An example program structure.
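The arithmetic behind this anomaly can be replayed with a few lines of Python. This is only a sketch of the example above: the dictionary layout and the function name worst_case are illustrative assumptions, and migration costs are taken to be zero, as in the text.

```python
# Per-processor WCETs (in ms) of the two segments of f():
# (from the beginning to the line flag, from the line flag to the end).
SEGMENT_WCET = {
    "p": (90, 10),
    "q": (50, 150),
}


def worst_case(first_proc, second_proc):
    """WCET of a job that runs its first segment on first_proc and,
    after a cost-free migration at flag, its second segment on second_proc."""
    return SEGMENT_WCET[first_proc][0] + SEGMENT_WCET[second_proc][1]


entirely_on_p = worst_case("p", "p")  # 90 + 10
entirely_on_q = worst_case("q", "q")  # 50 + 150
p_then_q = worst_case("p", "q")       # 90 + 150
q_then_p = worst_case("q", "p")       # 50 + 10
```

The four combinations evaluate to 100ms, 200ms, 240ms, and 60ms, respectively. The migrating job that starts on p and finishes on q therefore exceeds both single-processor WCETs, while the opposite migration stays below both.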
The above anomaly occurs because the two processors of different functionalities may “favor” different
kinds of computations, which could exist in different pieces of code, even in the same task. Furthermore, the
line of flag could potentially be any line in the code of the task, and this makes the WCETs and a set of
valid speeds for the unrelated multiprocessor model extremely difficult to obtain. Therefore, we consider
the situation where processors are pooled by their functionalities and tasks are assigned to a particular pool
consisting of a set of identical processors. In this case, inter-pool migrations are not allowed so that the
described anomaly is eliminated, but migrations among processors in the same pool may be allowed. Tasks
are scheduled within each pool, which is a symmetric platform; if the tasks are independent, this is a rather
well-studied problem. However, if producer/consumer relationships may exist among tasks, the isolated
pools would be re-connected by such relationships between tasks and a more holistic analysis on the entire
task set on processors of different functionalities is needed. In Chapter 6, we deal with such systems where
the producer/consumer relationships among tasks may be represented by directed acyclic graphs (DAGs).
By applying G-EDF-based scheduling, an end-to-end response-time bound can be derived for each DAG.
Moreover, such bounds can be further improved by leveraging linear-programming (LP)-based techniques to
tune the deadline setting for each individual task.
1.4.3 Asymmetric Platforms Due to Virtualization
In Chapter 7, we consider a potentially asymmetric virtual multiprocessor platform that is implemented
on a symmetric physical multiprocessor platform. Virtual platforms, which are usually described by virtual
processors, have been proposed to enable the design, analysis, and validation of sub-systems on a common
shared physical platform.
In early work in this direction pertaining to uniprocessor platforms, Shin and Lee (2003) proposed
a virtual processor (VP) model called the periodic resource (PR) model, which allows the considerable
body of work on periodic and/or sporadic task scheduling to be exploited in reasoning about the allocation
of processor time to components. In the PR model, a VP is specified by the parameters (Π,Θ), with the
interpretation that Θ time units of processor time are guaranteed to the supported component every Π time
units. While this simple model sufficed in the uniprocessor case, it is inadequate in the multiprocessor case,
because the important issue of parallelism is ignored. To deal with this issue, Shin et al. (2008) proposed
extending the PR model by adding an additional parameter. Specifically, under their multiprocessor periodic
resource (MPR) model, the supply allocated to a component is specified by (Π,Θ,m′), with the interpretation
that Θ time units of processor time are guaranteed to the component every Π time units with at most m′ VPs
providing allocation in parallel. That is, the new parameter m′ specifies the maximum degree of parallelism.
In the MPR model, all VPs allocated to a component are required to have a common period Π that is strictly
synchronized.
A key characteristic of the MPR model is its flexibility. For example, consider a component that is to be
allocated 80% of the capacity of a quad-core machine. The supply interface for that component could be
defined as (100,320,4), meaning that every 100 time units, the component receives 320 units of processing
time on up to four processors. Such a specification does not indicate the precise manner in which processing
time is allocated. For example, the component could be allocated 80% of the capacity of each processor, or
100% of three processors and 20% of the fourth, among other choices. Which choice is best?
In the example above, the second-listed choice is known as minimum-parallelism (MP) form. Under
MP form, each component is allocated at most one partially available processor, with all other processors
allocated to it being fully available. MP form was first proposed by Leontyev and Anderson (2009), who
have shown that MP form is the best for supporting SRT tasks. We extend this result in Chapter 7 by showing
that MP form is in fact the best for HRT tasks as well, provided that each VP is modeled by the PR model
and the virtual platform is characterized by parallel supply functions (Bini et al., 2009a).
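To make the two allocation choices from the quad-core example concrete, the following Python sketch splits an MPR interface (Π, Θ, m′) into per-VP periodic resources (Π, Θj). The function names mp_form and even_split are illustrative assumptions, and the sketch covers only the division of the budget, not the supply-function analysis of Chapter 7.

```python
def mp_form(period, budget, max_parallelism):
    """Split the budget so that at most one VP is partially available."""
    if budget > period * max_parallelism:
        raise ValueError("budget exceeds the capacity of m' processors")
    full, rest = divmod(budget, period)
    vps = [(period, period)] * int(full)  # fully available VPs
    if rest > 0:
        vps.append((period, rest))        # the single partially available VP
    return vps


def even_split(period, budget, max_parallelism):
    """Alternative choice: spread the budget evenly over all m' VPs."""
    return [(period, budget / max_parallelism)] * max_parallelism


# The interface (100, 320, 4) from the example above.
mp = mp_form(100, 320, 4)
even = even_split(100, 320, 4)
```

For (100, 320, 4), mp_form yields three VPs with (Π, Θj) = (100, 100) and one with (100, 20), whereas even_split yields four VPs with (100, 80). Both provide the same total budget of 320 time units per period.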
1.5 Organization
In Chapter 2, we review general background and related prior work; each subsequent chapter is self-
contained with notations and definitions that are specific to it. Focusing on the uniform multiprocessor model,
we then prove both the negative and positive results regarding the SRT-optimality of G-EDF in Chapter 3,
present two semi-partitioned scheduling algorithms in Chapter 4, and discuss the npc-sporadic task model in
Chapter 5. Next, in Chapter 6, we provide techniques to establish and improve end-to-end response-time
bounds for DAG-based task systems on processors of different functionalities. Afterwards, we establish the
dominance of MP form on virtual multiprocessor platforms in Chapter 7. Finally, we conclude in Chapter 8.
CHAPTER 2: BACKGROUND
In this chapter, we provide needed background for this dissertation by surveying related prior work, and
by highlighting the contributions of this dissertation in the context of the existing literature.
2.1 HRT-Feasibility and HRT EDF Scheduling on Uniform Multiprocessors
Under the uniform multiprocessor model, a platform π has m processors, where processor p is identified
by its speed sp (1 ≤ p ≤ m, sp ∈ ℝ). In this dissertation, when considering a uniform multiprocessor, we index
processors in non-increasing-speed order, i.e., π = {s1, s2, . . . , sm}, where sp ≥ sp+1 for 1 ≤ p ≤ m−1; we
also index tasks in non-increasing-utilization order, i.e., τ = {τ1, τ2, . . . , τn}, where ui ≥ ui+1 for 1 ≤ i ≤ n−1.
By leveraging the Level Algorithm (Horvath et al., 1977), Funk et al. (2001) showed that an implicit-
deadline periodic task system τ = {τ1,τ2, . . . ,τn} is HRT-feasible if and only if
Un ≤ Sm, and (2.1)
Uk ≤ Sk, for k = 1,2, . . . ,m−1, (2.2)
where $U_k = \sum_{i=1}^{k} u_i$ and $S_k = \sum_{i=1}^{k} s_i$.
Subsequently, Funk and her colleagues developed several EDF-based scheduling algorithms that support
HRT tasks on uniform multiprocessors (Funk, 2004).
In the rest of this section, we review the Level Algorithm and those EDF-based HRT algorithms by Funk
(2004). A detailed understanding of the Level Algorithm is needed because it will be used as a subroutine in
Chapter 4.
2.1.1 Level Algorithm
The Level Algorithm was proposed by Horvath et al. (1977) for scheduling a set of non-real-time jobs on
a uniform multiprocessor with the goal of minimizing makespan, i.e., the time required for finishing all jobs.
A job’s level is defined by its remaining execution time. The greater a job’s level, the faster the processor on
which it is scheduled, and all jobs that attain the same level are thereafter jointly executed, equally sharing the
processors on which they are scheduled. The following example illustrates the Level Algorithm.
Example 2.1. Consider using the Level Algorithm to schedule four jobs, with initial execution requirements
J1 = 12, J2 = 12, J3 = 8.5, and J4 = 7.5, on a uniform platform π = {s1 = 4, s2 = 3, s3 = 2, s4 = 1}. J1 and
J2 have the same execution cost, or level, so they are jointly executed from the beginning; J3 and J4 attain the
same level at time 1, so they are jointly executed after time 1. At time 2, all jobs attain the same level, and
hence all jobs are jointly executed afterward. Figure 2.1 shows the resulting schedule generated by the Level
Algorithm for this example. Figure 2.2 shows the actual schedule for “jointly executing.” Figure 2.3 shows
the actual schedule for the system. Figure 2.4 shows a slight variation of the actual schedule that leverages
the fact that, when jobs start to jointly execute, we can make every processor involved in this joint execution
start with its currently executing job to reduce unnecessary preemptions and migrations. ♦
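The level-merging behavior in Example 2.1 can be reproduced with a small fluid simulation. The Python sketch below is a simplified abstraction, not the algorithm as specified by Horvath et al. (1977): it treats jointly executed jobs as sharing their processors equally at every instant, so it reports when levels merge and when jobs finish, but it does not construct the slice-level schedules of Figures 2.2 to 2.4. The function name, the tolerance eps, and the requirement of at least one positive-speed processor are assumptions of this sketch.

```python
def level_algorithm_makespan(requirements, speeds, eps=1e-9):
    """Fluid simulation of the Level Algorithm; returns (makespan, event_times)."""
    remaining = [float(x) for x in requirements]
    speeds = sorted((float(s) for s in speeds), reverse=True)
    now, events = 0.0, []
    while True:
        # Unfinished jobs, highest level (remaining execution) first.
        active = sorted((i for i, r in enumerate(remaining) if r > eps),
                        key=lambda i: -remaining[i])
        if not active:
            return now, events
        # Jobs whose levels coincide form a group and are jointly executed.
        groups = []
        for i in active:
            if groups and remaining[groups[-1][0]] - remaining[i] <= eps:
                groups[-1].append(i)
            else:
                groups.append([i])
        # Higher-level groups receive faster processors; each group shares
        # the processors it receives equally among its jobs.
        rates, nxt = [], 0
        for g in groups:
            share = speeds[nxt:nxt + len(g)]
            nxt += len(g)
            rates.append(sum(share) / len(g))
        # Next event: a group finishes, or it catches up with the level below.
        dt = min(remaining[g[0]] / r for g, r in zip(groups, rates) if r > 0)
        for k in range(len(groups) - 1):
            gap = remaining[groups[k][0]] - remaining[groups[k + 1][0]]
            closing = rates[k] - rates[k + 1]
            if closing > eps:
                dt = min(dt, gap / closing)
        for g, r in zip(groups, rates):
            for i in g:
                remaining[i] = max(0.0, remaining[i] - r * dt)
        now += dt
        events.append(now)


# Example 2.1: four jobs on the platform with speeds {4, 3, 2, 1}.
makespan, event_times = level_algorithm_makespan([12, 12, 8.5, 7.5], [4, 3, 2, 1])
```

For the data of Example 2.1, the simulation reports events at times 1, 2, and 4: J3 and J4 reach a common level at time 1, all four jobs reach a common level at time 2, and all jobs finish at time 4.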
Theorem 2.1 (Theorem 1 in (Horvath et al., 1977)). Let J = {J1, J2, . . . , Jn} denote a set of independent
non-real-time jobs to be scheduled on an m-processor uniform multiprocessor π. Let Xi denote the sum of the
i largest execution requirements in J. Then the Level Algorithm constructs a minimum makespan, which is
given by
$$\max\left( \max_{1 \le i \le m-1} \left\{ \frac{X_i}{S_i} \right\},\ \frac{X_n}{S_m} \right).$$
This is very similar to the feasibility condition given by (2.1) and (2.2), because that feasibility condition
was, in fact, derived from the Level Algorithm (Funk et al., 2001).
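The expression in Theorem 2.1 can be evaluated directly. The following Python function is a minimal sketch; the name min_makespan is an illustrative assumption, and when fewer than i jobs exist the sketch takes Xi to be the sum of all execution requirements.

```python
def min_makespan(requirements, speeds):
    """Evaluate max( max_{1<=i<=m-1} X_i / S_i , X_n / S_m )."""
    x = sorted(requirements, reverse=True)  # largest requirements first
    s = sorted(speeds, reverse=True)        # fastest processors first
    m = len(s)

    # X_n / S_m: total work over total speed.
    candidates = [sum(x) / sum(s)]
    # X_i / S_i for i = 1, ..., m-1: the i largest jobs on the i fastest processors.
    for i in range(1, m):
        candidates.append(sum(x[:i]) / sum(s[:i]))
    return max(candidates)


# Data of Example 2.1.
example_makespan = min_makespan([12, 12, 8.5, 7.5], [4, 3, 2, 1])
```

For Example 2.1, the candidate values are 12/4 = 3, 24/7 ≈ 3.43, 32.5/9 ≈ 3.61, and 40/10 = 4, so the minimum makespan is 4, which agrees with the time at which all jobs finish in that example.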
2.1.2 HRT EDF Scheduling on Uniform Multiprocessors
Most prior work regarding EDF scheduling for HRT systems on uniform multiprocessors is by Funk and
her colleagues. This work is summarized in detail in Funk’s dissertation (Funk, 2004).
Figure 2.1: “Jointly executing” schedule for Example 2.1 generated by the Level Algorithm. (Horizontal axis: time, 0 to 4; vertical axis: processor speed, s1=4, s2=3, s3=2, s4=1.)
Figure 2.2: The actual schedule for “jointly executing” J1 and J2.
Figure 2.3: Actual schedule for Example 2.1 generated by the Level Algorithm. (Same axes as Figure 2.1.)
Figure 2.4: Final schedule for Example 2.1 generated by the Level Algorithm. (Same axes as Figure 2.1.)
Funk and Baruah (2003) considered the HRT-schedulability problem for EDF with full migration (f-EDF)
on a uniform multiprocessor. In earlier work, Baruah et al. (2003) showed that f-EDF scheduling
on uniform multiprocessors is robust with respect to the processing platform. That is, replacing a uniform
multiprocessor by a more powerful one does not compromise HRT-schedulability under f-EDF, where a
uniform multiprocessor π1 is said to be more powerful than another one π2 if m1, the number of processors
on π1, is at least m2, the number of processors on π2, and the ith fastest processor on π1 is no slower
than the ith fastest processor on π2 for 1 ≤ i ≤ m2. In light of this robustness result, Funk and Baruah (2003)
reduced the f-EDF-HRT-schedulability problem on an arbitrary uniform multiprocessor platform π to an
HRT-feasibility problem on a less powerful platform π′. Therefore, an HRT-schedulability test follows by
leveraging existing HRT-feasibility results.
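The "more powerful" relation used in the robustness result is a simple componentwise comparison. The Python predicate below is a minimal sketch of that definition; the name is_more_powerful and the sample platforms are illustrative assumptions.

```python
def is_more_powerful(pi1, pi2):
    """True if pi1 has at least as many processors as pi2 and its i-th fastest
    processor is no slower than the i-th fastest processor of pi2."""
    a = sorted(pi1, reverse=True)  # speeds of pi1, fastest first
    b = sorted(pi2, reverse=True)  # speeds of pi2, fastest first
    # zip stops after the m2 processors of pi2, matching 1 <= i <= m2.
    return len(a) >= len(b) and all(x >= y for x, y in zip(a, b))


dominates = is_more_powerful([4, 3, 2], [3, 3])     # more and faster processors
incomparable = is_more_powerful([4, 1], [3, 2])     # second-fastest is slower
too_few = is_more_powerful([4, 3], [4, 3, 1])       # fewer processors
```

Note that the relation is only a partial order: in the second case neither platform is more powerful than the other, since {3, 2} is slower on the fastest processor and {4, 1} is slower on the second-fastest one.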
Funk and Baruah (2005a) then shifted their attention to EDF scheduling with restricted migration
(r-EDF), which requires migrations to occur at job boundaries only. They presented an r-EDF scheduler
for uniform multiprocessors by extending results by Baruah and Carpenter (2005) pertaining to identical
multiprocessors. The r-EDF scheduler proposed by Funk and Baruah (2005a) migrates tasks at job boundaries
only by assigning each job to a particular processor at its release and then using a uniprocessor EDF scheduler
on each individual processor. They provided a utilization-bound-based HRT-schedulability test for the r-EDF
scheduler on a uniform multiprocessor, and presented two techniques to improve schedulability, namely
“semi-partitioning” and “virtual processors.”1
Funk and Baruah (2005b) also investigated partitioned EDF (p-EDF) scheduling on a uniform multi-
processor. They focused on p-EDF scheduling with the any-fit-decreasing (AFD) task-assignment heuristic.
They presented a framework to approximate the utilization bound for applying AFD-EDF on an arbitrary
uniform multiprocessor. This bound depends only on the maximum per-task utilization and the total system
utilization, and any task system satisfying this bound can be successfully partitioned by the AFD heuristic
and therefore is HRT-schedulable under AFD-EDF.
Contributions of this dissertation. In Chapter 3, we will extend the consideration of HRT-feasibility on a
uniform multiprocessor to incorporate SRT systems. Furthermore, the problem of whether G-EDF
is SRT-optimal will be discussed. In Chapter 4, we will present two EDF-based semi-partitioned
scheduling algorithms for uniform multiprocessors. In particular, one of them restricts migrations to occur at
job boundaries only, and the other one is SRT-optimal and can be HRT-optimal under certain settings.
2.2 Tardiness Bounds under G-EDF Scheduling
As mentioned in Chapter 1, we have adopted the tardiness-based definition of SRT systems in this
dissertation. In this section, we review related prior work on tardiness bounds. Most of this work focuses on
identical multiprocessors.
1 Both terms, as used in this dissertation, have a different meaning from that in (Funk and Baruah, 2005a). We refer interested readers
to (Funk and Baruah, 2005a) for details concerning the usage of these two techniques in their context.
Devi and Anderson (2008) proved tardiness bounds for implicit-deadline sporadic task systems under
G-EDF scheduling on an identical multiprocessor. Their bounds apply to both preemptive and non-preemptive
G-EDF for any (HRT- or SRT-) feasible system. Thus, the SRT-optimality of G-EDF was first established
by them. Subsequently, G-EDF tardiness bounds were improved by Erickson et al. (2010a) by introducing
the concept of compliant vectors, and extended by Erickson et al. (2010b) to apply to sporadic tasks with
arbitrary deadlines. Recently, Valente (2016) proposed an alternative approach to compute tighter tardiness
bounds for preemptive G-EDF at the cost of higher complexity. This work was then improved by Leoncini
et al. (2017) to compute such tardiness bounds more efficiently.
Meanwhile, beyond G-EDF scheduling, Leontyev and Anderson (2007b) derived tardiness bounds for
sporadic task systems under first-in-first-out (FIFO) scheduling on an identical multiprocessor. Leontyev
and Anderson (2010) then extended the tardiness bounds for both G-EDF and FIFO, resulting in a framework
for deriving such tardiness bounds under a class of global schedulers called window-constrained schedulers,
which include the G-EDF scheduler, the FIFO scheduler, and all other G-EDF-like schedulers. Under
G-EDF-like scheduling, each job has a priority point: the earlier the priority point, the higher the priority;
every job of the same task has the same constant difference between its priority point and release time. The
G-EDF and FIFO schedulers are both G-EDF-like schedulers: under G-EDF, the priority point of a job is
at its absolute deadline; under FIFO, the priority point of a job is at its release time. This work raised the
question: which G-EDF-like scheduler provides the best tardiness bounds? Erickson et al. (2014) answered
this question by presenting the global fair-lateness (G-FL) scheduler.
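The priority-point rule described above can be illustrated with a minimal Python sketch. The Job class and the function names are hypothetical and introduced only for illustration; the only facts taken from the text are that the priority point under G-EDF is the absolute deadline, that under FIFO it is the release time, and that an earlier priority point means a higher priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record: task name, release time, relative deadline."""
    task: str
    release: float
    relative_deadline: float


def gedf_priority_point(job):
    # Under G-EDF, the priority point is the absolute deadline.
    return job.release + job.relative_deadline


def fifo_priority_point(job):
    # Under FIFO, the priority point is the release time.
    return job.release


def highest_priority(jobs, priority_point):
    # The earlier the priority point, the higher the priority.
    return min(jobs, key=priority_point)


jobs = [
    Job("A", release=0.0, relative_deadline=10.0),
    Job("B", release=2.0, relative_deadline=4.0),
]
gedf_choice = highest_priority(jobs, gedf_priority_point).task
fifo_choice = highest_priority(jobs, fifo_priority_point).task
```

In this made-up instance the two schedulers disagree: job B has the earlier absolute deadline, whereas job A was released first.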
The only prior work that addressed G-EDF-related tardiness bounds on asymmetric multiprocessors was
by Tong and Liu (2016). They derived tardiness bounds under a G-EDF variant on a multiprocessor platform
consisting of processors of different speeds. However, they imposed certain utilization restrictions on the task
system to which their tardiness bounds apply and therefore did not establish any SRT-optimality result. We
will discuss such limitations in more detail in Chapter 3.
Contributions of this dissertation. In Chapter 3, we will focus on extending the SRT-optimality results
for G-EDF on identical multiprocessors to uniform multiprocessors. Assuming a uniform multiprocessor
platform, we will disprove the SRT-optimality of non-preemptive G-EDF by giving a counterexample, and
we will prove the SRT-optimality of preemptive G-EDF by deriving a tardiness bound. This work extends
the literature on SRT-optimality from identical multiprocessors to uniform multiprocessors. Moreover, in
contrast to (Tong and Liu, 2016), our work focuses on optimality results and therefore imposes no utilization
restrictions other than those required for feasibility.
2.3 Semi-Partitioned Scheduling Algorithms
In Chapter 4, we will present two semi-partitioned scheduling algorithms for uniform multiprocessors.
In this section, we review existing work on semi-partitioned scheduling prior to this dissertation. Most of this
work targeted identical multiprocessors.
Semi-partitioned scheduling. Traditionally, a multiprocessor scheduling algorithm follows either a global
or a partitioned approach. The former allows any migrations while the latter allows no migration at all. As a
hybrid, semi-partitioned scheduling allows only a subset of tasks to migrate. A semi-partitioned scheduling
algorithm usually has two phases: an assignment phase and an execution phase. During the assignment
phase, each task is allocated a non-zero share on certain processors such that the total allocated share on
each processor does not exceed the processor’s capacity and the total allocated share of a task matches its
utilization. If a task has non-zero shares on only one (respectively, multiple) processor(s), then it is a fixed
(respectively, migrating) task. During the execution phase, the scheduler needs to schedule tasks according to
their allocated shares while providing timing guarantees for each task.
Semi-partitioned scheduling was proposed by Anderson et al. (2005), who presented EDF-fm. Subsequently,
a number of semi-partitioned scheduling algorithms for identical multiprocessors were proposed:
Andersson and Tovar (2006) proposed EKG, Kato and Yamasaki (2007) proposed Ehd2-SIP, Kato and
Yamasaki (2008) proposed EDDP, Andersson et al. (2008) proposed EDF-SS, Bletsas and Andersson (2009)
proposed a concept of “notional processors,” Kato and Yamasaki (2009) proposed EDF-WM, Guan et al.
(2010) proposed SPA1 and SPA2, Bletsas and Andersson (2011) proposed NPS-F, Burns et al. (2012) proposed
a “C=D” task-splitting scheme, Bhatti et al. (2012) proposed 2L-HiSA, Fan and Quan (2012) proposed
HSP-light and HSP, Sousa et al. (2013) proposed Carousel-EDF, Anderson et al. (2016) proposed EDF-os,
and Patterson and Chantem (2016) proposed EDF-hv. In addition, empirical studies for semi-partitioned
scheduling were conducted by Bastoni et al. (2011) and Brandenburg and Gül (2016).
Beyond identical multiprocessors, techniques from semi-partitioned scheduling were used to design
EDF-ms (Leontyev and Anderson, 2007a), which is able to support a multiprocessor platform consisting
of processors of different speeds. EDF-ms in fact is not a semi-partitioned scheduler; instead, it divides
processors into groups by their speeds and assigns jobs to groups by a “semi-partitioned” approach similar to
EDF-fm at the group level; global scheduling is employed within each group.
In the following, we will review EDF-fm, EDF-os, and EDF-ms in more detail, as they will be directly
referred to in Chapter 4.
2.3.1 EDF-fm
EDF-fm (Anderson et al., 2005) is an EDF-based semi-partitioned scheduling algorithm for sporadic
task systems on an identical multiprocessor. It has an assignment phase and an execution phase. In the
assignment phase, processors and tasks are considered in turn. Each considered task is assigned to the
currently considered processor until that processor’s capacity is exhausted. When this happens, the remaining
share, if any, of the task that exceeds the capacity of the currently considered processor is assigned to the next
processor, which then becomes the “currently considered processor” for subsequent unassigned tasks. Thus, there
are at most two migrating tasks on each processor—the first assigned one and the last assigned one—and
each migrating task has non-zero shares on exactly two processors.
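The sequential-fill assignment described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the published algorithm: the function name, the 1-based processor indices, and the example utilizations are assumptions, and every processor is assumed to have unit capacity with each task utilization at most 1.

```python
from fractions import Fraction


def edf_fm_assign(utilizations, m):
    """Fill processors 1..m in turn; returns shares[i] = {processor: share}."""
    shares = [dict() for _ in utilizations]
    p = 1                      # currently considered processor
    remaining = Fraction(1)    # unused capacity of processor p
    for i, u in enumerate(utilizations):
        left = Fraction(u)
        while left > 0:
            if p > m:
                raise ValueError("total utilization exceeds m")
            give = min(left, remaining)
            shares[i][p] = give
            left -= give
            remaining -= give
            if remaining == 0:
                # Capacity exhausted: any leftover share spills to the next processor.
                p += 1
                remaining = Fraction(1)
    return shares


# Hypothetical task system: utilizations 3/5, 3/5, 3/5, 1/5 on two processors.
utils = [Fraction(3, 5), Fraction(3, 5), Fraction(3, 5), Fraction(1, 5)]
fm_shares = edf_fm_assign(utils, 2)
migrating = [i + 1 for i, s in enumerate(fm_shares) if len(s) > 1]
```

In this made-up instance only the second task is split, with a share of 2/5 on processor 1 and 1/5 on processor 2, so it is the last task assigned to processor 1 and the first task assigned to processor 2.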
In the execution phase, all jobs of a fixed task are dispatched to the processor to which that task was
assigned, whereas the jobs of a migrating task need to be dispatched to processors by a more sophisticated
mechanism. The goal of this job-dispatching mechanism is to limit migrations to occur on job boundaries
only while ensuring that the allocated shares on the two processors to which a migrating task was assigned
are maintained in the long run. To this end, EDF-fm dispatches an appropriate fraction of jobs of a migrating
task to each of the two processors to which it was assigned, by leveraging results from Pfair Scheduling
(Baruah et al., 1996). This job-dispatching mechanism provides the following property, where ϕi,p denotes
the fraction of jobs of task τi to execute on processor p.
Property 2.1. For the first z jobs of task τi, at least ⌊ϕi,p · z⌋ and at most ⌈ϕi,p · z⌉ of them are assigned to
processor p.
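To make Property 2.1 concrete, the following Python sketch checks it for a simple dispatching rule. The rule shown is not the Pfair-based mechanism used by EDF-fm; it is a substitute chosen for brevity, and the function names and the labels "p" and "q" for the two processors of a migrating task are assumptions. The z-th job goes to processor p exactly when the floor of ϕ · z increases, where ϕ is the fraction of jobs intended for p, and to the other processor q otherwise.

```python
from fractions import Fraction
from math import ceil, floor


def dispatch(z, phi):
    """Processor for the z-th job (z = 1, 2, ...) of a task with fraction phi on p."""
    return "p" if floor(phi * z) > floor(phi * (z - 1)) else "q"


def satisfies_property(phi, horizon):
    """Check the floor/ceiling bounds on both processors for every prefix of jobs."""
    on_p = 0
    for z in range(1, horizon + 1):
        if dispatch(z, phi) == "p":
            on_p += 1
        on_q = z - on_p
        if not floor(phi * z) <= on_p <= ceil(phi * z):
            return False
        if not floor((1 - phi) * z) <= on_q <= ceil((1 - phi) * z):
            return False
    return True


phi = Fraction(2, 5)  # hypothetical fraction of jobs intended for processor p
pattern = [dispatch(z, phi) for z in range(1, 11)]
holds = satisfies_property(phi, 1000)
```

Under this rule the first z jobs place exactly ⌊ϕ · z⌋ jobs on p and the remaining z − ⌊ϕ · z⌋ = ⌈(1 − ϕ) · z⌉ jobs on q, so both counts stay within the stated bounds.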
On each processor, jobs of migrating tasks are statically prioritized over jobs of fixed tasks, and jobs of
same-type tasks (migrating or fixed) are prioritized against each other by EDF. With such assignment and
execution phases, EDF-fm is able to guarantee bounded tardiness for each task, as long as the sum of the
utilizations of any two migrating tasks that share a common processor does not exceed 1.0. Clearly, with this
utilization restriction, EDF-fm is not SRT-optimal. Nonetheless, any SRT-feasible system with a maximum
per-task utilization of at most 0.5 is guaranteed to be SRT-schedulable under EDF-fm.
2.3.2 EDF-os
EDF-os (Anderson et al., 2016) is an EDF-based semi-partitioned SRT-optimal scheduling algorithm for
identical multiprocessors. EDF-os also has an assignment phase and an execution phase.
In the assignment phase, EDF-os considers tasks in non-increasing-utilization order in the following two
steps.
• First, it uses a worst-fit bin-packing heuristic to assign as many tasks as possible to be fixed.
• Second, it considers the remaining tasks to be assigned to processors in turn, and allocates these tasks
on either one (in which case, the task is fixed) or more (in which case, the task is migrating) processors.
The following example illustrates the assignment phase of EDF-os and will be re-visited in Chapter 4.
Example 2.2. Consider scheduling the task set τ = {(5,6), (6,9), (4,6), (2,3), (2,3), (10,30), (1,6)}
(tasks are listed in non-increasing-utilization order) on four identical processors. Figure 2.5 depicts the task
assignment used by EDF-os. In the first step of the EDF-os assignment phase, the first four tasks are assigned
to the four processors as fixed tasks. In the second step, the fifth task needs capacity from processors 1, 2,
and 3 to be allocated, so it is a migrating task that assigns jobs to processors 1, 2, and 3. Similarly, τ6 is a
migrating task, because it has non-zero shares on both processors 3 and 4. However, the last task τ7 is a fixed
task since processor 4 is the only processor on which τ7 has a non-zero share. For the two migrating tasks,
processor 1 is the first processor of τ5, while processor 3 is the first processor of τ6. ♦
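The assignment in Example 2.2 can be reproduced with a short Python sketch. The function name and data layout are hypothetical, and one detail is an assumption rather than something stated above: the first step is taken to stop at the first task that does not fit on the processor with the most remaining capacity, which is the reading that reproduces the example. Ties in the worst-fit step are broken in favor of the lowest processor index, also by assumption.

```python
from fractions import Fraction


def edf_os_assign(tasks, m):
    """tasks: (C, T) pairs in non-increasing utilization order.
    Returns shares[i] = {processor (1-based): share}."""
    util = [Fraction(c, t) for c, t in tasks]
    remaining = {p: Fraction(1) for p in range(1, m + 1)}
    shares = [dict() for _ in tasks]

    # Step 1: worst-fit assignment of fixed tasks (assumed to stop at the first misfit).
    i = 0
    while i < len(tasks):
        p = max(remaining, key=lambda q: (remaining[q], -q))
        if util[i] > remaining[p]:
            break
        shares[i][p] = util[i]
        remaining[p] -= util[i]
        i += 1

    # Step 2: give each remaining task leftover capacity, processor by processor.
    p = 1
    for j in range(i, len(tasks)):
        left = util[j]
        while left > 0:
            while p <= m and remaining[p] == 0:
                p += 1
            if p > m:
                raise ValueError("total utilization exceeds platform capacity")
            give = min(left, remaining[p])
            shares[j][p] = give
            remaining[p] -= give
            left -= give
    return shares


example = [(5, 6), (6, 9), (4, 6), (2, 3), (2, 3), (10, 30), (1, 6)]
os_shares = edf_os_assign(example, 4)
os_migrating = [i + 1 for i, s in enumerate(os_shares) if len(s) > 1]
```

With this input the sketch marks τ5 and τ6 as migrating, gives τ5 shares of 1/6, 1/3, and 1/6 on processors 1, 2, and 3, gives τ6 a share of 1/6 on each of processors 3 and 4, and leaves τ7 fixed on processor 4, in agreement with the example.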
In the execution phase, EDF-os applies the same job-dispatching mechanism as EDF-fm, given that
the fraction of jobs of each task to execute on each processor can be derived from the allocated shares in
the assignment phase. We use ϕi,p to denote such fraction of task τi on processor p. Anderson et al. (2016)
showed that the following property follows from Property 2.1. This property will be referred to and reiterated
in Chapter 4.
Property 2.2. For any k consecutive jobs of a migrating task τi, at most ϕi,p · k + 2 of them are assigned to
processor p.
[Figure: share of each processor’s capacity (0% to 100%) allocated to each task. Processor 1: τ1 = 5/6, τ5 = 1/6. Processor 2: τ2 = 2/3, τ5 = 1/3. Processor 3: τ3 = 2/3, τ5 = 1/6, τ6 = 1/6. Processor 4: τ4 = 2/3, τ6 = 1/6, τ7 = 1/6.]
Figure 2.5: EDF-os task assignment for Example 2.2.
On each processor, the priority rules are as follows. Note that the processor with the lowest index on which
a migrating task is allocated is called its first processor.
• Jobs of migrating tasks are statically prioritized over those of fixed tasks.
• Jobs of fixed tasks are prioritized against each other on an EDF basis.
• On a migrating task’s first processor, its jobs have lower priority than those of other migrating tasks, but still higher
priority than those of fixed tasks.
With such assignment and execution phases, EDF-os is SRT-optimal, i.e., tardiness of each task is
bounded under EDF-os for any SRT-feasible system.
2.3.3 EDF-ms
EDF-ms was proposed by Leontyev and Anderson (2007a) to support a multiprocessor platform consisting
of processors of different speeds. Technically speaking, EDF-ms is not a semi-partitioned scheduling
algorithm, because any task can migrate among at least some processors under EDF-ms, i.e., no task is
actually fixed to a single processor. However, EDF-ms groups the processors of the same speed together and
performs “semi-partitioned” scheduling at the group level. It has an assignment phase similar to EDF-fm
but at the group level. As a result, in each group, there are at most two intergroup tasks, and those tasks
executing within a single group are called fixed tasks (this is a different definition from that under EDF-fm or
EDF-os, in which fixed tasks are tasks executing on a single processor only). In the execution phase, jobs are
dispatched to groups in the same way as jobs are dispatched to processors under EDF-fm. Within each group,
in which every processor is identical, jobs are prioritized by G-EDF with the exception that any job of an
intergroup task will be immediately promoted to execute until completion once its slack reaches zero. At time
t, the slack of job τi,j is defined by (di,j − t) − (Ci − ei,j(t)), where ei,j(t) denotes the amount of completed
work of τi,j at time t.
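As a small worked instance of the slack definition, the following Python sketch evaluates it for made-up numbers. The function name and parameter names are hypothetical; the formula itself is the one given above.

```python
def slack(deadline, wcet, completed, t):
    """Slack of a job at time t: time until its deadline minus its remaining work."""
    return (deadline - t) - (wcet - completed)


# Hypothetical job: absolute deadline 10, worst-case requirement 4, 1 unit completed.
slack_at_5 = slack(deadline=10, wcet=4, completed=1, t=5)
slack_at_7 = slack(deadline=10, wcet=4, completed=1, t=7)
```

If this job belonged to an intergroup task and made no progress between times 5 and 7, its slack would fall from 2 to 0, which is the point at which the promotion rule applies.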
Leontyev and Anderson (2007a) showed that, under EDF-ms, all intergroup tasks are guaranteed to meet
their implicit deadlines and the tardiness of any fixed task is bounded. However, EDF-ms assumes certain
utilization restrictions to facilitate the assignment phase and therefore is not SRT-optimal. Furthermore,
EDF-ms requires each group to have at least two processors, i.e., EDF-ms does not support platforms on
which there is a processor that has a distinct speed. Nonetheless, EDF-ms is still the only work prior to this
dissertation that is related to both semi-partitioned scheduling and uniform multiprocessors, so we will regard
EDF-ms as a baseline in Chapter 4.
Contributions of this dissertation. All of the scheduling algorithms discussed in this section except EDF-ms
apply to identical multiprocessors only, i.e., they do not support a multiprocessor platform consisting of
processors of different speeds. In Chapter 4, we will present two semi-partitioned scheduling algorithms
designed for uniform multiprocessors where multiple speeds may exist. We will show that both of our
algorithms dominate EDF-ms, which is able to support only certain uniform multiprocessors,
in terms of schedulability.
2.4 Intra-Task Parallelism
In the sporadic task model, each task is assumed to be sequential as it usually models a piece of sequential
code. However, if multiple invocations (i.e., jobs) of such a piece of code are active at the same time, intra-task
parallelism could be possible, even if this piece of code itself is sequential. To clarify, in this dissertation we
assume tasks under the conventional sporadic task model are strictly sequential, i.e., consecutive jobs of the
same task cannot execute in parallel; in contrast, we will introduce the npc-sporadic task model in Chapter 5,
under which jobs of the same task may execute in parallel as long as each individual job is still sequential. It
is clear that both models are the same for HRT tasks with relative deadlines at most their periods, since every
job must be finished by the release time of the next one in this case. Thus, prior work exploiting intra-task
parallelism under the npc-sporadic task model targeted SRT systems or HRT systems with arbitrary deadlines
(the relative deadline of a task can be greater than its period).
In work on HRT systems, Baker and Baruah (2009) derived an HRT-schedulability test for arbitrary-
deadline task systems on an identical multiprocessor, and both the conventional sporadic task model and the
npc-sporadic task model were considered in this work. Subsequently, the impact of such intra-task parallelism
was also considered by Baruah et al. (2012), Bonifaci et al. (2013), and Parri et al. (2015) in the context of
HRT DAG-based task systems on an identical multiprocessor.
In work on SRT systems, Erickson and Anderson (2011) derived a response-time bound for npc-sporadic
task systems under preemptive G-EDF on an identical multiprocessor. This is the only existing work prior to
this dissertation pertaining to SRT npc-sporadic task systems.
Contributions of this dissertation. In Chapter 5, we will extend the work by Erickson and Anderson (2011)
to support processors of different speeds by deriving response-time bounds for npc-sporadic task systems on
a uniform multiprocessor. Furthermore, while Erickson and Anderson (2011) considered preemptive G-EDF
only, we will derive such bounds under both preemptive and non-preemptive G-EDF.
2.5 DAG-based Tasks
The literature on real-time systems includes much work pertaining to the scheduling of DAG-based task
systems, but most of this work assumes HRT systems and identical multiprocessor platforms. Baruah et al.
(2012) provided intractability results, speed-up bounds, and EDF-schedulability tests for scheduling one
sporadic DAG. Bonifaci et al. (2013) then studied the feasibility problem for multiple DAGs. Saifullah et al.
(2013) conducted schedulability analysis under the synchronous parallel task model, which is a special case
of the DAG task model. Subsequently, Li et al. (2013), Baruah (2014), Parri et al. (2015), and Jiang et al.
(2016) focused on the global scheduling of DAGs, whereas Li et al. (2014), Baruah (2015a,b), Jiang et al.
(2017), and Li et al. (2017) focused on the federated scheduling of DAGs. Federated scheduling for DAGs
is very similar to partitioned scheduling for sporadic tasks. The difference is that the utilization of a single
DAG may exceed the capacity of a single processor, and federated scheduling dedicates multiple processors
solely to such a DAG.
Liu and Anderson (2010) leveraged a task-transformation approach to schedule DAG-based task systems
using G-EDF. In contrast to most of the work discussed above, which enforces HRT-schedulability and
therefore has to restrict system utilizations, Liu and Anderson (2010) derived tardiness bounds for DAG-based
task systems with no utilization loss. Nonetheless, this work also assumes an identical multiprocessor
platform.
Beyond the real-time systems community, DAG-based systems implemented on heterogeneous platforms
have been considered before (e.g., (Grandpierre et al., 1999; Bajaj and Agrawal, 2004; Stavrinides and
Karatza, 2011)). However, such work focuses on one-shot, aperiodic DAG-based jobs, rather than periodic
or sporadic DAG-based task systems. Moreover, real-time issues are considered only obliquely from the
perspectives of job admission control or job makespan minimization.
Contributions of this dissertation. In Chapter 6, we will extend the task-transformation techniques by
Liu and Anderson (2010) to heterogeneous platforms where processors may have different functionalities.
Furthermore, whereas Liu and Anderson (2010) focused primarily on per-node tardiness bounds, we
will focus primarily on per-DAG end-to-end response-time bounds. In addition, we will further study deadline-
setting techniques in this context and present an LP-based method to set deadlines in order to improve the
end-to-end response-time bounds of DAGs. The preliminary version of this work published in (Yang et al.,
2016) also led to follow-up work by Dong et al. (2017), which presented an alternative technique for deriving
tighter end-to-end response-time bounds in certain scenarios.
2.6 Compositional Real-Time Systems
Existing work in compositional real-time systems has been directed at both uniprocessors and multiprocessors.
In work on uniprocessors, Mercer et al. (1994) proposed a mechanism that abstracts the notion of a
processor-capacity reservation as a uniprocessor with reduced speed. Abeni and Buttazzo (1998) proposed
the constant bandwidth server (CBS) to integrate HRT tasks and multimedia applications with soft timing
requirements in a single system. Lipari and Baruah (2001) then extended CBS to a hierarchical scheduling
framework. Mok et al. (2001) proposed the bounded-delay partition model, in which a partition specified
by (α,∆) provides processor supply between α · t and α · (t −∆) from time 0 to time t. Based on the
bounded-delay partition model, Lipari and Bini (2003) derived the “best” setting of parameters for a given
application. Shin and Lee (2003) proposed the periodic resource (PR) model, in which a virtual
processor (VP) is specified by the parameters (Π,Θ), with the interpretation that Θ time units of processor
time is guaranteed to the supported component every Π time units. Easwaran et al. (2007) then extended the
PR model to the explicit deadline periodic (EDP) resource model by adding a parameter ∆ to the PR model,
with the interpretation that the supply must be provided in the first ∆ time units in each period.
In work on multiprocessors, Leontyev and Anderson (2009) initially proposed MP form to schedule each
component using at most one partially available processor in SRT systems. Shin et al. (2008) proposed the
multiprocessor periodic resource (MPR) model, in which the supply allocated to a component is specified by
(Π,Θ,m′), with the interpretation that Θ time units of processor time is guaranteed to the component every Π
time units with at most m′ VPs providing allocation in parallel. Easwaran et al. (2009) derived a cluster-based
hierarchical scheduler by applying the MPR model. Burmyakov et al. (2014) extended the MPR model by
providing information of resource allocation at each degree of parallelism. Xu et al. (2015) extended the
MPR model to the deterministic MPR (DMPR) model by requiring VPs to be allocated in MP form, and
proposed a cache-aware analysis framework.
In much of the discussed work, a supply bound function (SBF) is provided to characterize the minimum
resource allocation of a component, in order to perform schedulability analysis. Furthermore, Bini et al.
(2009b) proposed the multi supply function (MSF) that provides a separate SBF for each VP on a virtual
multiprocessor platform. Subsequently, Bini et al. (2009a) proposed the parallel supply function (PSF),
which is strictly more powerful than MSF. We will review PSF in detail in Chapter 7, in which PSF is heavily
used. To the best of our knowledge, PSF is the most expressive means of characterizing resource-allocation
supply on multiprocessors.
Contributions of this dissertation. In Chapter 7, we will establish the dominance of MP form in terms of the
fundamental supply-bound functions characterized by PSF for virtual multiprocessor platforms. Compared to
the work by Leontyev and Anderson (2009) pertaining to SRT systems, our work applies to HRT systems as
well. Compared to the work by Xu et al. (2015), which assumes a common, synchronous period among all
VPs, our work applies to virtual platforms consisting of VPs that may have different, asynchronous periods.
2.7 Chapter Summary
In this chapter, we reviewed prior work on several topics related to this dissertation, namely feasibility
and EDF scheduling for HRT systems on uniform multiprocessors, tardiness bounds for SRT systems,
semi-partitioned scheduling, intra-task parallelism, DAG-based tasks, and compositional real-time systems.
In light of such prior work, we emphasized the difference between the work in this dissertation and prior
work and briefly highlighted the contributions of this dissertation in the context of the existing literature with
respect to each topic.
CHAPTER 3: GLOBAL EDF SCHEDULING ON UNIFORM PLATFORMS1
In this chapter, we study the problem of whether G-EDF is SRT-optimal on uniform multiprocessors. Devi
and Anderson (2008) have shown that G-EDF is SRT-optimal on identical multiprocessors, no matter whether
preemptive or non-preemptive scheduling is assumed. In light of the fact that any identical multiprocessor is
a special case of uniform multiprocessors, one conjecture following the work by Devi and Anderson (2008)
is that G-EDF is also SRT-optimal on uniform multiprocessors. However, such an extension was found to be
surprisingly difficult and this conjecture remained open until our work, summarized in this chapter, closed it.
The key difficulty faced when trying to extend prior tardiness analysis for identical multiprocessors to
uniform ones is that, in the uniform case, tasks can execute on processors that are “too slow.” The specific
problematic property required in the prior analysis is the following.
(P) If any job τi, j executes continuously, then it must complete within Ti time units, regardless of
the processor on which it executes.
Clearly, (P) can be violated on a uniform multiprocessor, if τi, j executes entirely on processors of speed
less than ui.
It is tempting to obviate all problematic issues pertaining to Property (P) by simply enforcing scheduling
policies that uphold it. Such an approach was taken by Tong and Liu (2016), who considered a variant of
G-EDF in which each task τi is only allowed to execute on processors with speed at least ui. However, such a
requirement results in non-optimal scheduling. For example, Tong and Liu’s G-EDF variant is not able to
correctly schedule a set of two tasks on two processors such that u1 = 2, u2 = 2, s1 = 3, and s2 = 1. From
(3.5) and (3.6), which together are an SRT-feasibility condition, as will be shown later in Section 3.2, we see
1Contents of this chapter previously appeared in preliminary form in the following papers:
Yang, K. and Anderson, J. (2015a). On the soft real-time optimality of global EDF on multiprocessors: From identical to uniform
heterogeneous. In Proceedings of the 21st IEEE International Conference on Embedded and Real-Time Computing Systems and
Applications, pages 1–10.
Yang, K. and Anderson, J. (2016b). Tardiness bounds for global EDF scheduling on a uniform multiprocessor. In Proceedings of
the 7th International Real-Time Scheduling Open Problems Seminar, pages 3–4.
Yang, K. and Anderson, J. (2017). On the soft real-time optimality of global EDF on uniform multiprocessors. In Proceedings of
the 38th IEEE Real-Time Systems Symposium, pages 319–330.
that this task set is SRT-feasible. However, under their algorithm, a task τi can only execute on a processor of
speed at least ui, so both tasks in this example must exclusively execute on the processor with speed s1 = 3.
That processor will be over-utilized if each task releases jobs as soon as possible and always executes for its
worst-case cost, since u1 + u2 = 4 > s1 = 3.
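The two-task example can be checked numerically with a minimal Python sketch. The function names are hypothetical. The feasibility check assumes that (3.5) and (3.6), which are stated in Section 3.2 and not reproduced here, take the form Un ≤ Sm and Uk ≤ Sk for k = 1, . . . , m−1, where Uk and Sk denote the sums of the k largest utilizations and speeds; that form is an assumption of this sketch.

```python
def srt_feasible(utils, speeds):
    """Assumed form of (3.5) and (3.6): total demand fits, and so do the k largest."""
    u = sorted(utils, reverse=True)
    s = sorted(speeds, reverse=True)
    if sum(u) > sum(s):
        return False
    k_max = min(len(u), len(s) - 1)
    return all(sum(u[:k]) <= sum(s[:k]) for k in range(1, k_max + 1))


def eligible_speeds(u_i, speeds):
    """Processors a task may use when restricted to speeds of at least u_i."""
    return [s for s in speeds if s >= u_i]


utils = [2, 2]
speeds = [3, 1]
feasible = srt_feasible(utils, speeds)
allowed = [eligible_speeds(u, speeds) for u in utils]
# Both tasks are confined to the speed-3 processor, whose capacity they exceed.
overloaded = sum(utils) > 3
```

Under the assumed condition the task set is feasible, yet the speed restriction leaves each task only the speed-3 processor, whose capacity of 3 is below the combined utilization of 4.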
In other work, we successfully eliminated the need for Property (P) by relaxing the task model to allow
consecutive jobs of the same task to execute in parallel. Under this relaxed task model, we were able to
establish the SRT optimality of G-EDF on uniform platforms (Yang and Anderson, 2014a). Details of this
work will be summarized in Chapter 5.
For the sequential task model being considered in this chapter, we also found that Property (P) is not
necessary if the underlying uniform platform has only two processors (Yang and Anderson, 2015a). However,
we believe that it is unlikely that the particular proof strategy used in (Yang and Anderson, 2015a) can be
extended to the more general m-processor case.
To clearly place our contribution in its proper context in light of this prior work, we emphasize here
several assumptions made hereafter in this chapter:
• we are interested in any SRT-feasible task set, i.e., no constraints on task utilizations other than (3.5)
and (3.6) are assumed;
• intra-task parallelism is strictly forbidden, i.e., jobs of the same task must execute in sequence;
• the uniform platform may have m processors, where m can be any positive integer value.
We will show that, in spite of being SRT-optimal for identical multiprocessors (Devi and Anderson, 2008),
non-preemptive G-EDF is not SRT-optimal for uniform multiprocessors, by providing a counterexample
where an SRT-feasible system may experience unbounded tardiness under non-preemptive G-EDF scheduling.
In contrast, preemptive G-EDF is indeed SRT-optimal for uniform multiprocessors. We will prove this by
deriving a tardiness bound for an arbitrary SRT-feasible system under preemptive G-EDF scheduling; the
proof strategy is significantly different from that in (Devi and Anderson, 2008) for identical multiprocessors.
Organization. In the following sections, we provide needed background and notation (Section 3.1), establish
a necessary and sufficient SRT-feasibility condition (Section 3.2), formally define the two considered variants
of G-EDF (Section 3.3), and then disprove the SRT-optimality for non-preemptive G-EDF (Section 3.4) and
prove the SRT-optimality for preemptive G-EDF (Section 3.5).
3.1 System Model
In this chapter, our focus is the uniform multiprocessor model. Specifically, we consider the scheduling
of a set τ of n sequential tasks on a uniform platform π consisting of m processors, where the processors
are indexed by their speeds in non-increasing order, i.e., si ≥ si+1 for i = 1, 2, . . . , m−1. We denote the sum
of the k largest speeds on π as Sk = ∑_{i=1}^{k} si for k = 1, 2, . . . , m. Furthermore, we assume m ≥ 2, for otherwise,
uniprocessor analysis can be applied. We also assume n ≥ m, for otherwise, there is no point in ever
scheduling any task on any of the m−n slower processors, so m and n can conceptually be deemed as equal
in this case.
We consider conventional sporadic tasks. A sporadic task τi releases a sequence of jobs with a
minimum separation of Ti time units between invocations. The parameter Ti is called the period of τi. Task τi also
has a worst-case execution requirement Ci, which is defined as the maximum execution time of any one job
(invocation) of τi on a unit-speed processor. We let Cmax = max{Ci | 1 ≤ i ≤ n}. The utilization of task τi is
given by ui = Ci/Ti. We assume that tasks are indexed in non-increasing order by utilization, i.e., ui ≥ ui+1
for i = 1, 2, . . . , n−1. We denote the sum of the k largest utilizations in τ as Uk = ∑_{i=1}^{k} ui for k = 1, 2, . . . , n.
Furthermore, we denote the ratio between the largest and the smallest utilizations as ρ = u1/un. As for
scheduling, we assume that deadlines are implicit, i.e., each task τi has a relative deadline parameter equal to
its period Ti. Furthermore, under the conventional sporadic task model, tasks are sequential and intra-task
parallelism is not allowed. That is, an invocation of a task cannot commence execution until all previous
invocations of that task complete. In this chapter, we assume that time is continuous.
The jth job (or invocation) of task τi is denoted as τi,j. Job τi,j has a release time denoted ri,j, an absolute
deadline denoted di,j = ri,j + Ti, and a completion (or finish) time denoted fi,j. The tardiness of job τi,j is
defined by max{0, fi,j − di,j} and its response time by fi,j − ri,j. The tardiness of task τi in some schedule
is the maximum tardiness of any of its jobs in that schedule. A job is pending if it is released but has not
completed, and is ready if it is pending and all preceding jobs of the same task have completed.
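The quantities defined in this section can be computed directly, as the following Python sketch shows for a made-up instance. The task parameters, processor speeds, and function names are hypothetical; the definitions of utilization, prefix sums, ρ, Cmax, tardiness, and response time are the ones given above.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical task set as (C_i, T_i) pairs and hypothetical processor speeds.
tasks = [(2, 4), (3, 9), (1, 6)]
speeds = [2, 1]

utils = sorted((Fraction(c, t) for c, t in tasks), reverse=True)
U = list(accumulate(utils))                         # U[k-1] is the sum of the k largest utilizations
S = list(accumulate(sorted(speeds, reverse=True)))  # S[k-1] is the sum of the k largest speeds
rho = utils[0] / utils[-1]                          # ratio of largest to smallest utilization
c_max = max(c for c, _ in tasks)


def tardiness(finish, release, period):
    # Implicit deadlines: the absolute deadline is the release time plus the period.
    return max(0, finish - (release + period))


def response_time(finish, release):
    return finish - release
```

For this instance the utilizations are 1/2, 1/3, and 1/6, so the prefix sums are 1/2, 5/6, and 1, and ρ = 3. A job of the second task released at time 0 and finishing at time 11 would have tardiness 2 and response time 11.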
Ideal schedule. We define an ideal multiprocessor π^I for the task set τ as one that consists of n uniform
processors where the speeds of the n processors exactly match the utilizations of the n tasks in τ, respectively,
i.e., the speed of the ith processor is s^I_i = ui for i = 1, 2, . . . , n. We define the ideal schedule I to be the
partitioned schedule for τ on π^I, where each task τi in τ is assigned to the processor of speed s^I_i. Then, in I,
every job in τ commences execution at its release time and completes execution within one period (it exactly
executes for one period if and only if its actual execution requirement matches its worst-case execution
requirement). Thus, all deadlines are met in I.
Definition of lag. Let A(S,τi, t1, t2) denote the cumulative processor capacity allocated to task τi in an
arbitrary schedule S within the time interval [t1, t2). By the definition of the ideal schedule I,
A(I,τi, t1, t2)≤ ui · (t2− t1). (3.1)
Also, if τi releases jobs periodically and every job’s actual execution requirement equals its worst case of Ci,
then for any t1 and t2 such that ri,1 ≤ t1 ≤ t2,
A(I,τi, t1, t2) = ui · (t2− t1). (3.2)
For an arbitrary schedule S , we denote the difference between the allocation to a task τi in I and in S within
time interval [0, t) as
lag(τi, t,S) = A(I,τi,0, t)−A(S,τi,0, t). (3.3)
The lag function captures the allocation difference between an arbitrary actual schedule S and the ideal
schedule I. If lag(τi, t,S) is positive, then S has performed less work on τi until time t, i.e., τi is “under-
allocated,” while if lag(τi, t,S) is negative, then τi is “over-allocated.” Also, for any two time instants t1 and
t2 such that t1 ≤ t2, we have
lag(τi, t2,S) = lag(τi, t1,S)+A(I,τi, t1, t2)−A(S,τi, t1, t2). (3.4)
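As a rough illustration of (3.2) through (3.4), the following Python sketch computes allocations and lag for a task whose execution in S is given as a list of (start, end, speed) intervals. It assumes the periodic, worst-case setting of (3.2), so that the ideal allocation is ui times the elapsed time since the first release. The function names and the interval representation are assumptions made for this sketch; the sample numbers are borrowed from the two-task example of Section 3.4, where τ2 (utilization 2, first release at time 1) runs only on the unit-speed processor.

```python
def allocation(intervals, t1, t2):
    """Capacity A(S, tau_i, t1, t2) from (start, end, speed) execution intervals."""
    total = 0
    for start, end, speed in intervals:
        overlap = min(end, t2) - max(start, t1)
        if overlap > 0:
            total += overlap * speed
    return total


def lag(utilization, first_release, intervals, t):
    """lag(tau_i, t, S) under the periodic, worst-case assumption of (3.2)."""
    ideal = utilization * max(0, t - first_release)   # A(I, tau_i, 0, t)
    return ideal - allocation(intervals, 0, t)        # definition (3.3)


# tau_2 of Section 3.4, Case 1: always executing on the speed-1 processor.
tau2_on_slow = [(1, 5, 1), (5, 9, 1)]
lag_at_5 = lag(2, 1, tau2_on_slow, 5)
lag_at_9 = lag(2, 1, tau2_on_slow, 9)
```

Under these assumptions the lag of τ2 grows from 4 at time 5 to 8 at time 9, and the two values satisfy (3.4): 8 = 4 + 2 · 4 − 4.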
At a given time instant t, we say that a task is pending if it has any pending jobs at time t. If task τi
is pending at time t, then it has exactly one ready job τi, j at time t. The deadline of that job is called the
effective deadline of τi at time t and is denoted di(t) = di, j. Similarly, the effective release time of τi at time t
is denoted ri(t) = ri, j. Because deadlines are implicit, di(t) = ri(t)+Ti. Also, ri(t)≤ t holds, for otherwise,
τi, j would not be ready at time t. The following lemma gives a sufficient lag-based condition for a task to be
pending.
Lemma 3.1. If lag(τi, t,S)> 0, then τi is pending at time t in S .
Proof. Suppose that lag(τi, t,S) > 0 holds but τi is not a pending task at time t in S. Then, all jobs of τi
released at or before t have completed by time t. Thus, letting W denote the total actual execution requirement
of all such jobs, we have A(S,τi,0, t) =W . In the ideal schedule I, only released jobs can be scheduled
and will not execute for more than their actual execution requirement. Thus, A(I,τi,0, t)≤W holds as well.
By (3.3), these facts imply lag(τi, t,S) = A(I,τi,0, t)−A(S,τi,0, t)≤ 0. This contradicts our assumption
that lag(τi, t,S)> 0 holds.
3.2 A Necessary and Sufficient SRT-Feasibility Condition.
For HRT task sets, Funk et al. (2001) showed that a set of implicit-deadline periodic tasks is feasible on a
uniform platform if and only if the following constraints hold:
Un ≤ Sm, (3.5)
Uk ≤ Sk, for k = 1,2, . . . ,m−1. (3.6)
It can be shown that this constraint set is also a feasibility condition for implicit-deadline sporadic tasks.
Furthermore, the sufficiency of this constraint set for HRT task sets implies its sufficiency for SRT task sets. In
fact, these constraints are necessary for SRT task sets as well. To see this, note that if Un > Sm holds (contrary
to (3.5)), then the total workload over-utilizes the platform, so some task will be increasingly tardy without
bound if tasks release jobs as soon as possible and always execute for their worst-case costs. Furthermore,
if Uk > Sk holds (contrary to (3.6)), then the set of k highest-utilization tasks will be “under-allocated” at
every time instant if they release jobs as soon as possible and always execute for their worst-case costs. This
is because k tasks can be allocated to at most k processors at any time instant and the sum of the speeds
of any k processors can be at most Sk. Thus, ∑ki=1 lag(τi, t,S) will increase without bound, which implies
that lag(τi, t,S) will increase without bound for some i. This implies that task τi will be increasingly tardy
without bound.
To summarize, (3.5) and (3.6) are also a necessary and sufficient feasibility condition for SRT task sets.
Therefore, when henceforth referring to this constraint set as a feasibility condition, we do not need to further
specify whether this is meant for HRT or SRT task sets.
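The constraint set (3.5) and (3.6) is straightforward to check mechanically. The sketch below is one possible Python rendering; the function name is_feasible and the list-based inputs are assumptions of this sketch, and it relies on the assumption n ≥ m used throughout this chapter.

```python
def is_feasible(utilizations, speeds):
    """Check (3.5) and (3.6) for n tasks on an m-processor uniform platform."""
    u = sorted(utilizations, reverse=True)   # u_1 >= u_2 >= ... >= u_n
    s = sorted(speeds, reverse=True)         # fastest processor first
    n, m = len(u), len(s)
    if n < m:
        raise ValueError("this sketch assumes n >= m")
    if sum(u) > sum(s):                      # (3.5): U_n <= S_m
        return False
    u_k = s_k = 0
    for k in range(m - 1):                   # (3.6): U_k <= S_k for k = 1..m-1
        u_k += u[k]
        s_k += s[k]
        if u_k > s_k:
            return False
    return True
```

For example, two tasks of utilization 2 each pass the check on speeds {3, 1}, whereas utilizations {3.5, 0.5} fail (3.6) for k = 1 even though the total utilization still equals the total speed.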
3.3 Preemptive and Non-Preemptive G-EDF Scheduling on Uniform Multiprocessors.
From a scheduling point of view, uniform platforms differ from identical ones in a significant way: on a
uniform platform, besides which tasks are scheduled at any time, the scheduler must also decide where they
are scheduled, because different processors may have different speeds. Thus, we must refine the notion of
G-EDF scheduling to be clear about where tasks are scheduled. In particular, we consider the following two
G-EDF scheduling algorithms.
Preemptive G-EDF: If at most m jobs are ready, then all ready jobs are scheduled; otherwise,
the m ready jobs with earliest deadlines are scheduled. At any time, the ready job with the kth
earliest deadline is scheduled on the kth fastest processor for any k. (Note that this implies that a
job may migrate from one processor to another during its execution.) Deadline ties are broken
arbitrarily.
Non-Preemptive G-EDF: Any job enters a deadline-prioritized queue once it becomes ready.
Whenever this queue is non-empty and some processor(s) are idle, the ready job at the head of the
queue is dequeued and scheduled on the fastest idle processor, where it executes without preemption
or migration until its completion.
Note that, when a uniform multiprocessor reduces to an identical one (i.e., si = 1.0 for all i), the preemptive
and non-preemptive G-EDF schedulers above also reduce to simpler and more intuitive preemptive and
non-preemptive G-EDF schedulers for identical multiprocessors.
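The preemptive rule above amounts to pairing a deadline-sorted list of ready jobs with a speed-sorted list of processors. The following Python sketch shows that pairing at a single time instant; the function name, the (deadline, name) job representation, and the tie-breaking by name are assumptions of this sketch (the text only requires that ties be broken arbitrarily).

```python
def preemptive_gedf_assignment(ready_jobs, speeds):
    """Map each scheduled job name to the speed of the processor it runs on.

    ready_jobs: list of (absolute deadline, job name) pairs.
    speeds: list of processor speeds (m entries).
    """
    by_deadline = sorted(ready_jobs)              # earliest deadline first
    fastest_first = sorted(speeds, reverse=True)  # kth fastest processor
    # zip stops after min(#ready jobs, m) pairs, so at most m jobs are scheduled.
    return {name: speed for (_, name), speed in zip(by_deadline, fastest_first)}
```

With three ready jobs and two processors, only the two earliest-deadline jobs are scheduled, and the job with the earliest deadline runs on the faster processor. Re-evaluating this mapping whenever the ready set changes is what allows a job to migrate during its execution.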
3.4 Tardiness Increasing without Bound under Non-Preemptive G-EDF
Devi and Anderson (2008) proved that both preemptive and non-preemptive G-EDF are SRT-optimal on
identical multiprocessors. If a simple extension to uniform multiprocessors existed for the analysis in (Devi
and Anderson, 2008), they would have been SRT-optimal on uniform multiprocessors too. Unfortunately, this
is not true.
In this section, we prove that non-preemptive G-EDF is not SRT-optimal on uniform platforms. We do so
by giving a counterexample that establishes a stronger result: no work-conserving non-preemptive scheduler
is SRT-optimal on uniform platforms. We begin by formally defining “non-preemptive” and “work-conserving”
in this context.
Non-preemptive. A non-preemptive scheduler schedules ready jobs, and once a job is scheduled, it con-
tinually executes without preemption until it completes. In this section, we further define that under
non-preemptive scheduling, once a job is scheduled, it continually executes on the processor on which it
was scheduled without preemption until it completes, i.e., non-preemptivity means no preemption and no
migration occurs within a single job. This implicitly holds for any non-preemptive scheduler on identical
multiprocessors, where every processor is the same and therefore there is no point to migrating a job that
is currently executing; however, we do have to clarify this here, since the processors in a uniform platform
could be of different speeds.
Work-conserving. A work-conserving scheduler prevents the situation where at least one processor
is idle while at least one task has an incomplete ready job that is not scheduled, i.e., whenever a task
could be scheduled somewhere, it is scheduled. Both preemptive and non-preemptive G-EDF are clearly
work-conserving.
The counterexample. We consider a two-processor uniform platform π = {sh = 3,sl = 1} and a task set of
two tasks, τ1 = (4,2) and τ2 = (4,2), to be scheduled on π. Also, we consider the situation where τ1 releases
its first job at time 0 and then releases jobs as soon as possible, and τ2 releases its first job at time 1 and
then releases jobs as soon as possible. Furthermore, we assume every job has an execution requirement that
matches its worst-case execution requirement.
At time 0, τ1,1 is released, and a work-conserving scheduler must schedule it on either sh or sl .
Case 1: τ1,1 is scheduled on sh (Figure 3.1 (a)). Then, τ1,1 continuously executes on sh until time 1.33.
Therefore, at time 1 when τ2,1 is released, sl and only sl is available. Thus, a work-conserving scheduler must
schedule τ2,1 on sl , where τ2,1 continuously executes until time 5, which means both τ1,2 (released at time 2)
and τ1,3 (released at time 4) must be scheduled on sh and each of them continuously executes on sh for 1.33
time units. Thus, at time 5 when τ2,1 completes and τ2,2 is ready, sl and only sl is available, which means that
a work-conserving scheduler must schedule τ2,2 on sl , where τ2,2 continuously executes until time 9. This
pattern repeats in the schedule. Figure 3.1 (a) shows the schedule. Observe that τ2 is always scheduled on sl
and therefore becomes unboundedly tardy.
Case 2: τ1,1 is scheduled on sl (Figure 3.1 (b)). Then, τ1,1 continuously executes on sl until time 4, which
means both τ2,1 (released at time 1) and τ2,2 (released at time 3) must be scheduled on sh and each of them
continuously executes on sh for 1.33 time units. Thus, at time 4 when τ1,1 completes and τ1,2 is ready, sl
and only sl is available, which means that a work-conserving scheduler must schedule τ1,2 on sl , where τ1,2
continuously executes until time 8. This pattern repeats in the schedule. Figure 3.1 (b) shows the schedule.
Observe that τ1 is always scheduled on sl and therefore becomes unboundedly tardy.
Figure 3.1: Counterexample schedules on processors sh = 3 and sl = 1 over time. (a) Case 1. (b) Case 2.
Thus, in this system, under any work-conserving non-preemptive scheduler, there must be one task that
has unbounded tardiness. However, by (3.5) and (3.6), this system is actually feasible. Figure 3.2 shows a
feasible schedule for this system where all deadlines are met.
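The growth of tardiness in Case 1 can be reproduced with a small event-driven simulation. The Python sketch below hard-codes the counterexample (speeds 3 and 1, two tasks with execution requirement 4 and period 2, phases 0 and 1) and dispatches jobs according to the non-preemptive G-EDF rule of Section 3.3, which always picks the fastest idle processor and therefore follows Case 1. The function name, the data layout, and the tie-breaking by task index are assumptions of this sketch; exact rational arithmetic is used to avoid rounding 4/3.

```python
from fractions import Fraction


def simulate_np_gedf(num_dispatches):
    """Per-task lists of job tardiness under non-preemptive G-EDF (Case 1)."""
    speeds = {"sh": Fraction(3), "sl": Fraction(1)}
    cost, period = Fraction(4), Fraction(2)
    phase = {1: Fraction(0), 2: Fraction(1)}      # first release of each task
    next_job = {1: 1, 2: 1}                       # index of each task's next job
    ready_at = dict(phase)                        # when that job becomes ready
    free_at = {p: Fraction(0) for p in speeds}    # when each processor goes idle
    tardiness = {1: [], 2: []}

    def release(i):
        return phase[i] + (next_job[i] - 1) * period

    for _ in range(num_dispatches):
        # Earliest instant at which a job is ready and a processor is idle.
        t = max(min(ready_at.values()), min(free_at.values()))
        ready = [i for i in ready_at if ready_at[i] <= t]
        task = min(ready, key=lambda i: (release(i) + period, i))  # earliest deadline
        idle = [p for p in free_at if free_at[p] <= t]
        proc = max(idle, key=lambda p: speeds[p])                  # fastest idle
        finish = t + cost / speeds[proc]
        tardiness[task].append(max(Fraction(0), finish - (release(task) + period)))
        free_at[proc] = finish
        next_job[task] += 1
        # Sequential tasks: the next job is ready once released and its predecessor is done.
        ready_at[task] = max(release(task), finish)
    return tardiness


result = simulate_np_gedf(8)
```

In this run, every job of τ1 meets its deadline on sh, while the first three jobs of τ2 (completing at times 5, 9, and 13 on sl) are tardy by 2, 4, and 6 time units, which matches the pattern described for Case 1.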
Figure 3.2: Feasible schedule.
Figure 3.3: A non-preemptive schedule for the system in Section 3.4. Note that the deadline tardiness is upper bounded
by 3 time units. (Annotation in the figure: τ1,3 is ready and sh is idle, but τ1,3 is not scheduled on sh, so the
schedule is not work-conserving.)
In this counterexample, each task releases subsequent jobs as soon as possible, so it is valid not only
for sporadic tasks, but also for periodic tasks where the two tasks have phases 0 and 1, respectively. Also,
the two-processor uniform platform considered in this section is a special case of the more general uniform
platform where the number of processors is arbitrary. Thus, the following theorem holds.
Theorem 3.1. No work-conserving non-preemptive scheduler is SRT-optimal for sequential sporadic or
periodic tasks on uniform multiprocessors.
One might wonder whether this system with this job release pattern is SRT-feasible under the
non-preemptive restriction, i.e., whether non-preemptive scheduling can achieve bounded tardiness
for every task in this system. In fact, it can. For example, Figure 3.3 shows a non-preemptive schedule
for this system in which deadline tardiness is at most 3 time units.
The key in the schedule in Figure 3.3 is that it is not work-conserving. At time 4, τ1,3 is ready and sh
is idle, but in this schedule τ1,3 is not scheduled until time 5. At time 5, when τ2,1 has completed on sl , we
schedule τ2,2 on sh, and schedule τ1,3 on sl . Then, the schedule can be repeated in a way that the two tasks
are scheduled on the faster processor in turn. As shown in Figure 3.3, the maximum tardiness of τ1 is 3 time
units (τ1,3, τ1,7, ...); the maximum tardiness of τ2 is 2 time units (τ2,1, τ2,5, τ2,9,...).
3.5 Tardiness Bounds under Preemptive G-EDF
With the above negative result regarding SRT-optimality of non-preemptive G-EDF, we turn our attention
to preemptive G-EDF and consider the question: is preemptive G-EDF SRT-optimal on uniform multiproces-
sors? In this section, we answer this question by proving a tardiness bound for any feasible sporadic task
system on a uniform multiprocessor. Furthermore, we actually prove a tardiness bound for a more general
task model—VPP tasks—as introduced next. Also, we omit “preemptive” and assume all references to
G-EDF in Section 3.5 to mean the preemptive G-EDF scheduling algorithm as defined in Section 3.3.
3.5.1 Varying-Period Periodic Tasks
An implicit-deadline varying-period periodic (VPP) task τVi has a pre-defined utilization uVi and also
releases a sequence of jobs. However, in contrast to the ordinary periodic task model, each VPP job τVi, j
has its own worst-case execution requirement, denoted Ci, j. After its first invocation, a VPP task τVi will
release each job τVi, j+1 exactly Ti, j =Ci, j/ui time units after τVi, j’s release. Also, each job τVi, j has a deadline
Ti, j time units after its release. For each VPP task τVi , CVi is defined as CVi = max{Ci, j | j ≥ 1} and TVi is
defined as TVi = max{Ti, j | j ≥ 1}. Note that an ordinary periodic task τi is a special case of a VPP task
where Ci, j =Ci,k holds (and hence Ti, j = Ti,k holds) for any j and k. In accordance with the specification of
periodic and sporadic tasks, we also specify a VPP task by τVi = (CVi ,TVi ) where TVi =CVi /ui.
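To make the VPP release rule concrete, the following Python sketch derives release times and absolute deadlines from a sequence of per-job worst-case execution requirements. The function name and the sample values are assumptions of this sketch, not part of the task model's definition.

```python
from fractions import Fraction


def vpp_releases(first_release, utilization, wcets):
    """Return (release, absolute deadline) pairs for the jobs of one VPP task."""
    jobs, release = [], first_release
    for c in wcets:
        period = c / utilization              # T_{i,j} = C_{i,j} / u_i
        jobs.append((release, release + period))
        release += period                     # next job is released exactly T_{i,j} later
    return jobs


example = vpp_releases(Fraction(0), Fraction(1, 2), [Fraction(2), Fraction(1), Fraction(2)])
```

With utilization 0.5 and per-job requirements 2, 1, 2, the jobs are released at times 0, 4, and 6, and each job's deadline coincides with the next release, so the period varies while the utilization stays fixed.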
Sporadic tasks. In fact, not only an ordinary periodic task but also a sporadic task is a special case of a
VPP task. We show this by showing that any instance2 of a sporadic task τi = (Ci,Ti) can be viewed as
an instance of a VPP task τVi = (CVi ,TVi ) where Ci =CVi and Ti = TVi . This transformation is depicted in
Figure 3.4 and explained next.
2An instance of a task is defined by a set of concrete job release times and actual execution requirements that satisfy the specification
of the task.
Figure 3.4: Transforming a sporadic task into a VPP task. (Example shown: sporadic task τi with Ci = 2, Ti = 4,
ui = 0.5, and VPP task τVi with CVi = 2, TVi = 4, uVi = 0.5; the legend distinguishes job releases, worst-case
execution requirements, and actual execution requirements.)
In the sporadic task model, a given instance of a task τi = (Ci,Ti) might have two consecutive jobs that
have a release separation of more than Ti time units. Let τi, j and τi, j+1 be two such jobs, i.e., ri, j+1− ri, j > Ti.
Let us denote ri, j+1− ri, j as (k+1)Ti+Q where k is an integer such that k ≥ 0 and Q is a real number such
that 0≤Q< Ti (k and Q can be easily calculated from ri, j+1− ri, j). To see that this is indeed an instance of a VPP
task, we add k jobs, with the ℓth one released at time ri, j + ℓ ·Ti for 1≤ ℓ≤ k, all with a worst-case execution
requirement of Ci, plus an additional job, if Q> 0, released at time ri, j+1−Q, where this job has a worst-case
execution requirement of Q ·Ci/Ti. We do this whenever ri, j+1− ri, j > Ti holds for two consecutive job releases
ri, j and ri, j+1. The resulting job release times fit the specification of VPP task τVi = (CVi ,TVi ) where CVi =Ci
and TVi = Ti. Given the job release times, in order to obtain an instance of τVi , we can simply define the actual
execution requirement of each added job to be zero; the resulting instance of VPP task τVi is then indeed the
instance of sporadic task τi considered at the beginning of this paragraph.
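The padding construction just described can be sketched in Python as follows. The function name sporadic_to_vpp, the (release, worst-case requirement, is_filler) triples, and the sample release pattern are assumptions of this sketch; filler jobs are the added jobs, whose actual execution requirement is taken to be zero.

```python
from fractions import Fraction


def sporadic_to_vpp(releases, cost, period):
    """Pad a sporadic release sequence so that it fits the VPP release rule."""
    jobs = []
    for r, r_next in zip(releases, releases[1:] + [None]):
        jobs.append((r, cost, False))               # original job, WCET C_i
        if r_next is None:
            break
        gap = r_next - r
        if gap < period:
            raise ValueError("sporadic releases must be at least T_i apart")
        k = gap // period - 1                       # gap = (k + 1) * T_i + Q
        q = gap - (k + 1) * period                  # 0 <= Q < T_i
        for l in range(1, k + 1):
            jobs.append((r + l * period, cost, True))   # k full-size filler jobs
        if q > 0:
            jobs.append((r_next - q, q * cost / period, True))  # short filler job
    return jobs


padded = sporadic_to_vpp([Fraction(0), Fraction(4), Fraction(14)], Fraction(2), Fraction(4))
```

In this example (Ci = 2, Ti = 4, releases at 0, 4, and 14), the separation of 10 between the last two jobs is written as 2 · 4 + 2, so one full filler job is added at time 8 and one short filler job with worst-case requirement 1 at time 12. Every job's successor is then released exactly its requirement divided by ui = 0.5 time units later, as the VPP model demands.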
Thus, the VPP task model is a more general model than the sporadic task model and the following claim
is true.
Claim 3.1. The tardiness of any sporadic task system satisfying (3.5) and (3.6) is bounded, if the tardiness
of any VPP task system satisfying (3.5) and (3.6) is bounded.3
Henceforth, we consider a VPP task set in the remainder of this section and omit the superscript “V ”.
3Provided the superscript “V ” is added accordingly in (3.5) and (3.6).
Furthermore, the following theorem shows that, under G-EDF, any job(s) executing less than their
worst-case execution requirement will not cause any tardiness increase.
Theorem 3.2. For a given VPP task set τ , let S denote a G-EDF schedule, and let S ′ denote a corresponding
G-EDF schedule where some job(s) have less execution requirement (“corresponding” means that S and S ′
include exactly the same jobs, released at exactly the same time instants). Then, no job finishes later in S ′.
Proof. We prove the theorem by considering jobs inductively in deadline order. (We assume that deadline
ties are broken the same way in both schedules.) Note that, under G-EDF scheduling, the scheduling of a
given job is not impacted by any lower priority jobs.
Base case. The highest-priority job cannot finish later in S ′. In particular, because this job has the
highest priority, it will execute continuously on the fastest processor once released.
Inductive step. Let 𝒥 denote the set of k highest-priority jobs, and assume that these jobs do not finish
later in S ′. Also, let J denote the (k+1)st highest-priority job. We show that J also does not finish later in S ′.
Because no job in 𝒥 finishes later in S ′, at any time instant t, the number of ready jobs in S ′ does not
increase4 in comparison to S . Therefore, up to any time instant t after its release, J is allocated in S ′ no less
computing capacity than in S , unless J has completed in S ′ but not in S prior to time t. Finally, because J’s
execution requirement is no greater in S ′ than in S , it cannot finish later in S ′.
Theorem 3.2 implies the following claim.
Claim 3.2. The tardiness of any instance of a VPP task system is upper-bounded by the tardiness of the
instance where all jobs execute for their worst-case execution requirement of this VPP task system.
By Claims 3.1 and 3.2, in order to derive a tardiness bound for any feasible sporadic task set, we just
need to derive a tardiness bound for any VPP task set satisfying (3.5) and (3.6), and all jobs can be assumed to
execute for exactly their worst-case execution requirement. Such a tardiness bound will be shown next in
Section 3.5.2.
4To see this, note that, if a job J′ in 𝒥 is not ready in S but ready in S ′ at some time t, then J′ must be pending at time t in both S
and S ′, and some preceding job of the same task must be ready in S but completed in S ′.
3.5.2 Deriving Tardiness Bounds
In this section, we prove tardiness bounds for an arbitrary feasible VPP task set satisfying (3.5) and (3.6)
on a uniform platform π, under the following assumption.
(A) Every job of any task executes for its worst-case execution requirement of Ci.
Our objective is to derive tardiness bounds when the G-EDF scheduler is used to schedule τ . We do so
by reasoning about lag values in an arbitrary G-EDF schedule S for τ . The concept of lag is useful for our
purposes because a task that has positive lag at one of its deadlines will have a tardy job. Focusing on VPP
task sets where Assumption (A) holds facilitates much of the lag-based reasoning that is needed.
Properties of lag values and deadlines. We begin by proving a number of properties concerning lag values
and deadlines and relationships between the two. The first such property is given in the following lemma.
Lemma 3.2. If task τi is pending at time t in S , then its effective deadline di(t) has the following relationship
with lag(τi, t,S).
t− lag(τi, t,S)/ui < di(t)≤ t− lag(τi, t,S)/ui +Ti (3.7)
Proof. Let ei, j(t) denote the remaining execution requirement for the ready job τi, j of τi at time t in S.
Because τi, j is ready at time t, it has not finished execution by then, so
0< ei, j(t)≤Ci, j. (3.8)
Furthermore, all jobs of τi prior to τi, j have completed by time t in S. Let W denote the total execution
requirement for all of these jobs. Then, given Assumption (A),5
A(S,τi,0, t) =W +Ci, j− ei, j(t). (3.9)
5Without Assumption (A), τi, j may execute in total for less than its worst-case execution requirement of Ci, and therefore only “≤”
can be claimed in (3.9).
Now consider the ideal schedule I. In it, all jobs of τi prior to τi, j have completed by time ri(t)≤ t. Given
Assumption (A), within [ri(t), t), I continuously6 executes job τi, j at a rate of ui. Thus,
A(I,τi,0, t) =W +(t− ri(t))ui. (3.10)
Therefore, an expression for lag(τi, t,S) can be derived as follows.
lag(τi, t,S) = {by (3.3)}
A(I,τi,0, t)−A(S,τi,0, t)
= {by (3.9) and (3.10)}
(t− ri(t))ui− (Ci− ei, j(t))
= {because di(t) = ri(t)+Ti}
(t−di(t)+Ti)ui− (Ci− ei, j(t))
= {because Ti ·ui =Ci}
(t−di(t))ui+ ei, j(t)
By (3.8) and the above expression for lag(τi, t,S), we have
(t−di(t))ui < lag(τi, t,S)≤ (t−di(t))ui+Ci, j.
By the definition of the VPP task model, Ci, j ≤Ci for any j. Thus,
(t−di(t))ui < lag(τi, t,S)≤ (t−di(t))ui+Ci. (3.11)
Rearranging the terms in (3.11) yields (3.7).
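As a quick numeric sanity check of (3.7), the Python sketch below computes the window in which the effective deadline must lie, given a lag value. The function name and the sample numbers are assumptions of this sketch; the numbers come from Case 1 of Section 3.4, where at time 4 task τ2 (utilization 2, period 2) has lag 2 · 3 − 3 = 3 and its ready job τ2,1 has deadline 3.

```python
from fractions import Fraction


def effective_deadline_window(t, lag_value, utilization, period):
    """Return (lo, hi) such that lo < d_i(t) <= hi must hold by (3.7)."""
    lo = t - lag_value / utilization
    return lo, lo + period


lo, hi = effective_deadline_window(Fraction(4), Fraction(3), Fraction(2), Fraction(2))
```

Here the window is (2.5, 4.5], which indeed contains the effective deadline 3. Read in the other direction, a large lag forces the effective deadline far into the past, which is the intuition behind Corollary 3.1.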
Corollary 3.1. If lag(τi, t,S)≤ L for all t, then the tardiness of task τi is at most L/ui.
Proof. Suppose that
lag(τi, t,S)≤ L (3.12)
6Without Assumption (A), τi, j might not execute “continuously,” so only “≤” can be claimed in (3.10).
holds but τi has tardiness exceeding L/ui. Then, there exists a job τi, j that is still pending at some time t ≥ di, j
where
t−di, j > L/ui. (3.13)
Because τi, j is pending at time t, τi is a pending task at time t and its ready job at time t cannot be a job
released later than τi, j. Thus, τi’s effective deadline at t satisfies di(t)≤ di, j. Therefore,
t−di(t)≥ t−di, j
> {by (3.13)}
L/ui
≥ {by (3.12)}
lag(τi, t,S)/ui.
That is, t− lag(τi, t,S)/ui > di(t), which contradicts Lemma 3.2.
Recall that if lag(τi, t,S) is negative, then τi is over-allocated in schedule S compared to schedule I.
However, the actual schedule S cannot execute jobs that are not released and therefore can never get more
than a full job “ahead” of I . Thus, we have the following trivial lower bound7 on lag(τi, t,S), which we state
without proof.
Lemma 3.3. lag(τi, t,S)≥−Cmax.
The following lemma uses the relationship between effective deadlines and lag values established in
Lemma 3.2 to obtain a sufficient lag-based condition for one task to have an earlier effective deadline than
another.
Lemma 3.4. If tasks τi and τk are both pending at time t, and if
lag(τi, t,S)≥ (ui/uk) · lag(τk, t,S)+Ci (3.14)
holds, then di(t)< dk(t).
7A tighter bound is possible, but this simple bound is sufficient for our purposes.
Proof.
di(t)≤ {by Lemma 3.2}
t− lag(τi, t,S)/ui +Ti
≤ {by (3.14)}
t− ((ui/uk) · lag(τk, t,S)+Ci)/ui +Ti
= {canceling ui and using Ci/ui = Ti}
t− lag(τk, t,S)/uk
< {by Lemma 3.2}
dk(t)
The lemma follows.
Corollary 3.2. If tasks τi and τk are both pending at time t and if lag(τi, t,S)≥ ρ · lag(τk, t,S)+Cmax holds,
where ρ = u1/un, then di(t)< dk(t).
Proof. Because tasks are indexed from highest utilization to lowest, lag(τi, t,S)≥ ρ · lag(τk, t,S)+Cmax =
(u1/un) · lag(τk, t,S)+Cmax ≥ (ui/uk) · lag(τk, t,S)+Ci. By Lemma 3.4, the corollary follows.
Proof strategies for deriving tardiness bounds. Given the relationships established above between lag
values and deadlines, we are now ready to derive tardiness bounds. According to Corollary 3.1, lag bounds
directly imply tardiness bounds. Thus, one natural strategy is to attempt to derive n individual lag bounds,
one per task. However, we were unable to make this strategy work. Intuitively, this is because, in deriving n
individual per-task lag bounds, we must consider how all tasks interact as they are scheduled together. When
doing this, it is difficult to avoid a case explosion that causes the entire proof to collapse. In particular, (3.5)
and (3.6) must ultimately be exploited in the proof. Every attempt we made in deriving per-task lag bounds
resulted in a case explosion that was so unwieldy, we could not discern how (3.5) and (3.6) could possibly
factor into the proof.
Notice that (3.5) simply requires that the platform is not over-utilized, which is something required in
reasoning about identical platforms as well. The constraints in (3.6), however, are unique to the uniform case.
Observe that these constraints reference the sum of the k largest utilizations and speeds. Accordingly, we
switched from working on proof strategies that focus on per-task lag bounds to one that focuses on the sum
of the k largest lag values.
Our proof strategy, formally explained. In order to describe this proof strategy more formally, we let τˆℓ(t)
denote the task that has the ℓth largest lag at time instant t, with ties broken arbitrarily. We also denote the ℓth
largest lag at time instant t as Lagℓ(t), i.e., Lagℓ(t) = lag(τˆℓ(t), t,S). Furthermore, we let Tℓ(t) denote the set
of tasks corresponding to Lag1(t),Lag2(t), . . . ,Lagℓ(t), i.e., Tℓ(t) = {τˆ1(t), τˆ2(t), . . . , τˆℓ(t)}.
To derive tardiness bounds, we show that the following m+1 inequalities, (B1), . . . ,(Bm), and (Bn), hold
at any time t.
Inequality Set (B):
Lag1(t)≤ β1 (B1)
Lag1(t)+Lag2(t)≤ β2 (B2)
Lag1(t)+Lag2(t)+Lag3(t)≤ β3 (B3)
...
Lag1(t)+Lag2(t)+ · · ·+Lagk(t)≤ βk (Bk)
...
Lag1(t)+Lag2(t)+ · · ·+Lagm(t)≤ βm (Bm)
Lag1(t)+Lag2(t)+ · · ·+Lagn(t)≤ βn (Bn)
If all constraints in the inequality set (B) hold at all time instants t, where β1,β2, . . . ,βm, and βn are constants
(which will depend on task-set parameters), then, by the definition of Lag1(t), β1 is an upper bound on
lag(τi, t,S) for any i and for any t. Given such an upper bound, by Corollary 3.1, tardiness bounds will
follow.
In order to prove that the constraints in (B) hold at all time instants t, we must carefully define
β1,β2, . . . ,βm, and βn. They are defined as follows.
β1 = x1 (X1)
β2 = β1+ x2 (X2)
β3 = β2+ x3 (X3)
...
βk = βk−1+ xk (Xk)
...
βm = βm−1+ xm (Xm)
βn = βm+ xn (Xn)
where
xn =−(n−m−1) ·Cmax (Y1)
xm = (n−m+1) ·Cmax (Y2)
xi = ρ · xi+1+Cmax, for i = m−1,m−2, . . . ,1 (Y3)
Note that, in (Y3), ρ = u1/un.
To see that β1,β2, . . . ,βm, and βn are well-defined, observe that xn and xm can be directly calculated by
(Y1) and (Y2) for any given task set. Then, xm−1,xm−2, . . . ,x1 can be calculated inductively by (Y3). Finally,
β1,β2, . . . ,βm, and βn can be calculated by (X1), . . . ,(Xm), and (Xn).
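The calculation order just described can be sketched in Python as follows, together with a check of the inequality set (B) for a given vector of lag values. The function names, the dictionary-based return values, and the sample parameters (n = 3, m = 2, ρ = 2, Cmax = 1) are assumptions made for this sketch. The constants xn and βn are returned separately from the indexed ones because, when n = m, they differ from xm and βm.

```python
def beta_constants(n, m, rho, c_max):
    """Return (x, x_n, beta, beta_n) with x and beta indexed by 1..m."""
    x = {m: (n - m + 1) * c_max}              # (Y2)
    for i in range(m - 1, 0, -1):
        x[i] = rho * x[i + 1] + c_max         # (Y3)
    x_n = -(n - m - 1) * c_max                # (Y1)
    beta, running = {}, 0
    for i in range(1, m + 1):
        running += x[i]                       # (X1), ..., (Xm)
        beta[i] = running
    beta_n = beta[m] + x_n                    # (Xn)
    return x, x_n, beta, beta_n


def inequality_set_b_holds(lags, beta, beta_n):
    """Check (B1), ..., (Bm), and (Bn) for one vector of per-task lag values."""
    prefix = 0
    for k, value in enumerate(sorted(lags, reverse=True), start=1):
        prefix += value                       # Lag_1 + ... + Lag_k
        if k in beta and prefix > beta[k]:
            return False
    return prefix <= beta_n                   # (Bn): sum over all n tasks


x, x_n, beta, beta_n = beta_constants(n=3, m=2, rho=2, c_max=1)
```

For these sample parameters, x2 = 2, x1 = 5, and xn = 0, giving β1 = 5, β2 = 7, and βn = 7, which agrees with the identity βn = βm−1 + 2Cmax used later in the proof. If (B1) holds at all times, Corollary 3.1 then bounds the tardiness of each task τi by β1/ui.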
Formal derivation of tardiness bounds. Having set up our proof strategy, we next present a critical
mathematical property of the lag and Lag functions when they are viewed as a function of t.
Property 3.1. For a given task τi and a given schedule S, lag(τi, t,S) is a continuous function of t. For a
given schedule S , Lagℓ(t) is a continuous function of t for each ℓ.
Proof. lag(τi, t,S) is a continuous function of t because, by (3.3), lag(τi, t,S) = A(I,τi,0, t)−A(S,τi,0, t),
and A(I,τi,0, t) and A(S,τi,0, t) are both (clearly) continuous functions of t. Furthermore, since taking
the maximum value of a set of continuous functions is also a continuous function, Lag1(t) is a continuous
function of t. For similar reasons, Lag2(t), Lag3(t), . . . , Lagn(t) are all continuous functions of t as well.
We are now ready to prove our main theorem.
Theorem 3.3. At every time instant t ≥ 0, each inequality in the set (B) holds.
Proof. Suppose, to the contrary, that the statement of the theorem is not true, and let tc denote the first time
instant such that any inequality in (B) is false. We show that the existence of tc leads to a contradiction.
Claim 3.3. tc > 0.
Proof. It follows by induction using (Y2) and (Y3) (and our assumption from Section 3.1 that
n ≥ m) that xi > 0 for i = 1,2, . . . ,m. By induction again, this time using (X1), . . . ,(Xm), it
further follows that βi > 0 for i = 1,2, . . . ,m. Finally, by (Xm), (Xn), (Y1), and (Y2), βn =
βm−1 +2Cmax > 0. Thus, because Lagi(0) = 0 holds for all i, all of the inequalities in (B) are
true at time 0, implying that tc > 0.
Let t−c = tc−ε , where ε→ 0+.8 By Claim 3.3, t−c ≥ 0, i.e., t−c is well-defined. Because tc is the first time
instant at which any inequality in (B) is false, all such inequalities hold prior to tc, including at time t−c . Also,
because the length of the interval [t−c , tc) is arbitrarily small, a task scheduled on a processor at time t−c will
be continuously scheduled within [t−c , tc).
We call an inequality in (B) critical if and only if it is false at time tc. If (Bk) is critical, then Lag1(t−c )+
· · ·+Lagk(t−c ) = βk. This is because (Bk) holds for any time instant before tc but is falsified at tc and the
left-hand-side of (Bk) is a continuous function of t, by Property 3.1.9 We now consider two cases, which
depend on which inequalities are critical.
Case 1: (Bn) is critical. In this case,
Lag1(t−c )+Lag2(t−c )+ · · ·+Lagn(t−c ) = βn. (3.15)
Therefore,
8ε does not have to be infinitely close to 0. Instead, it only needs to be a sufficiently small positive constant. However, the criteria
for “sufficiently small” are rather tedious, so we merely define ε → 0+ here for simplicity. Whenever ε is used, we will further
elaborate on its definition in that context.
9If Lag1(t−c )+ · · ·+Lagk(t−c )< βk, then a time t ∈ [t−c , tc) must exist such that Lag1(t)+ · · ·+Lagk(t) = βk. Therefore, a smaller
ε could have been selected so that t−c = t.
Lagm(t−c )+Lagm+1(t−c )+ · · ·+Lagn(t−c )
= {by (3.15)}
βn− (Lag1(t−c )+Lag2(t−c )+ · · ·+Lagm−1(t−c ))
≥ {because (Bm−1) holds at time t−c }
βn−βm−1
= {by (Xm), (Xn), (Y1), and (Y2)}
2Cmax. (3.16)
Furthermore, by definition, Lagm(t−c )≥ Lagm+1(t−c )≥ ·· · ≥ Lagn(t−c ), so Lagm(t−c ) is at least the average of
these n−m+1 values. Therefore,
Lagm(t−c )≥ (Lagm(t−c )+Lagm+1(t−c )+ · · ·+Lagn(t−c ))/(n−m+1)
≥ {by (3.16)}
2Cmax/(n−m+1)
> {because Cmax > 0 and n≥ m}
0. (3.17)
Because Lagm(t−c ) denotes the mth largest lag at time t−c , (3.17) implies that, at time t−c , at least m tasks have
positive lag. Thus, by Lemma 3.1, at least m tasks are pending at time t−c . Therefore, all of the m processors
are busy during the time interval [t−c , tc). Thus, by (3.5), the total lag in the system does not increase during
the interval [t−c , tc). That is, Lag1(tc)+Lag2(tc)+ · · ·+Lagn(tc) = ∑ni=1 lag(τi, tc,S)≤ ∑ni=1 lag(τi, t−c ,S) =
Lag1(t−c )+Lag2(t−c )+ · · ·+Lagn(t−c ), which by (3.15), implies
Lag1(tc)+Lag2(tc)+ · · ·+Lagn(tc)≤ βn.
This contradicts the assumption of Case 1 that (Bn) is critical.
Case 2: (Bk) is critical for some k such that 1≤ k ≤ m. In this case,
Lag1(t−c )+Lag2(t−c )+ · · ·+Lagk(t−c ) = βk. (3.18)
Our proof for Case 2 utilizes a number of claims, which we prove in turn.
Claim 3.4. Lagk(t−c )≥ xk.
Proof. If k= 1, then by (3.18), Lag1(t−c )= β1. Also, by (X1), β1 = x1, from which Lag1(t−c )≥ x1
follows. The remaining possibility, 2≤ k ≤ m, is addressed as follows.
Lagk(t−c ) = {by (3.18)}
βk− (Lag1(t−c )+Lag2(t−c )+ · · ·+Lagk−1(t−c ))
≥ {since (Bk−1) holds at time t−c }
βk−βk−1
= {by (Xk)}
xk
Claim 3.5. If k ≤ m−1, then Lagk+1(t−c )≤ xk+1.
Proof. Because k ≤ m−1 and (by assumption) n≥ m, Lagk+1(t−c ) is well-defined. The claim is
established by the following reasoning.
Lagk+1(t−c )≤ {because (Bk+1) holds at time t−c }
βk+1− (Lag1(t−c )+Lag2(t−c )+ · · ·+Lagk(t−c ))
= {by (3.18)}
βk+1−βk
= {by (Xk+1)}
xk+1
Claim 3.6. If k = m and n> m, then Lagk+1(t−c )≤ 0.
Proof. k = m implies that (3.18) can be re-written as
Lag1(t−c )+Lag2(t−c )+ · · ·+Lagm(t−c ) = βm. (3.19)
Also, because (Bn) holds at time t−c ,
Lag1(t−c )+Lag2(t−c )+ · · ·+Lagn(t−c )≤ βn. (3.20)
Therefore, given n> m (from the statement of the claim), by (3.19) and (3.20), we have
Lagm+1(t−c )+Lagm+2(t−c )+ · · ·+Lagn(t−c )≤ βn−βm. (3.21)
Therefore,
Lagm+1(t−c )≤ {by (3.21)}
βn−βm− (Lagm+2(t−c )+Lagm+3(t−c )+ · · ·+Lagn(t−c ))
≤ {by Lemma 3.3}
βn−βm− ((−Cmax) · (n−m−1))
= {by (Xn) and (Y1)}
− (n−m−1)Cmax+(n−m−1)Cmax
= {canceling}
0. (3.22)
Because k = m, the claim follows from (3.22).
Claim 3.7. If k ≤ m−1 or n> m, then Lagk(t−c )≥ ρ ·Lagk+1(t−c )+Cmax.
Proof. If k ≤ m−1, then
Lagk(t−c )≥ {by Claim 3.4}
xk
= {by (Y3)}
ρ · xk+1+Cmax
≥ {by Claim 3.5}
ρ ·Lagk+1(t−c )+Cmax. (3.23)
If k = m (recall that k ≤ m by the specification of Case 2) and n> m, then by Claim 3.6,
Lagm+1(t−c )≤ 0. (3.24)
Furthermore,
Lagm(t−c )≥ {by Claim 3.4}
xm
= {by (Y2)}
(n−m+1)Cmax
> {because Cmax > 0 and n> m}
Cmax
≥ {by (3.24), and because ρ = u1/un > 0}
ρ ·Lagm+1(t−c )+Cmax. (3.25)
Thus, by (3.23) and (3.25) the claim follows.
In considering the next claim, recall the following: at time $t$, $\mathrm{Lag}_\ell(t)$ denotes the $\ell$th largest lag among all $n$ tasks and $T_\ell(t)$ denotes the set of $\ell$ tasks with the largest lag values.
Claim 3.8. The k tasks in Tk(t−c ) are scheduled on the k fastest processors within [t−c , tc).
Proof. Because ρ = u1/un ≥ 1 and (by assumption) n≥ m, by (Y2) and (Y3), it can be shown
that xk ≥Cmax for 1≤ k ≤ m. Therefore, by Claim 3.4, we have Lagk(t−c )≥ xk ≥Cmax > 0. By
Lemma 3.1, this implies that each of the k tasks in Tk(t−c ) is pending at time t−c . Therefore, by
Policy (G), it suffices to prove that the k tasks in Tk(t−c ) have the k earliest effective deadlines at
time t−c (with no tie with the (k+1)st earliest effective deadline).10 If k = m and n = m, then
there are k tasks in total in the system. In this case, the k pending tasks in Tk(t−c ) clearly have the
k earliest effective deadlines at time t−c .
In the rest of the proof, we consider the remaining possibility, i.e., $k \le m-1$ or $n > m$. By Claim 3.7,
\[
\mathrm{Lag}_k(t_c^-) \ge \rho \cdot \mathrm{Lag}_{k+1}(t_c^-) + C_{\max}. \tag{3.26}
\]
By the definition of Lag, (3.26) implies that for any i and j such that 1≤ i≤ k and k+1≤ j≤ n,
we have Lagi(t−c ) ≥ ρ ·Lag j(t−c )+Cmax. This implies that, for any task τp ∈ Tk(t−c ) and any
task τq /∈ Tk(t−c ), lag(τp, t−c ,S) ≥ ρ · lag(τq, t−c ,S)+Cmax. Therefore, if a task τq /∈ Tk(t−c ) is
pending at time t−c , then by Lemma 3.4, its effective deadline is strictly greater than the effective
deadline of any task τp ∈ Tk(t−c ); if τq is not pending at time t−c , then it has no pending jobs and
no effective deadline by definition. Therefore, the k tasks in Tk(t−c ) have the k earliest effective
deadlines at time t−c . The claim follows.
10 Note that $\varepsilon$ can be selected to be small enough to ensure that no scheduling event happens within the interval $[t_c^-, t_c)$, including job completions. Therefore, any task scheduled at time $t_c^-$ will continuously execute during the time interval $[t_c^-, t_c)$ on the same processor.
Claim 3.9. $T_k(t_c^-) = T_k(t_c)$.
Proof. If $k = m$ and $n = m$, then there are $k$ tasks in total in the system, so the claim clearly holds. Therefore, in the rest of the proof, we assume $k \le m-1$ or $n > m$, which implies that either $k \le m-1$ holds or $k = m$ and $n > m$ hold, by the specification of Case 2. If $k \le m-1$, then
\[
\begin{aligned}
\mathrm{Lag}_k(t_c^-) &\ge x_k && \text{\{by Claim 3.4\}} \\
&= \rho \cdot x_{k+1} + C_{\max} && \text{\{by $(Y_3)$\}} \\
&\ge x_{k+1} + C_{\max} && \text{\{because $\rho = u_1/u_n \ge 1$ and $x_{k+1} \ge 0$\}} \\
&\ge \mathrm{Lag}_{k+1}(t_c^-) + C_{\max}. && \text{\{by Claim 3.5\}}
\end{aligned}
\tag{3.27}
\]
If $k = m$ and $n > m$, then
\[
\begin{aligned}
\mathrm{Lag}_k(t_c^-) &\ge x_m && \text{\{by Claim 3.4 and because $k = m$\}} \\
&= (n-m+1)C_{\max} && \text{\{by $(Y_2)$\}} \\
&> C_{\max} && \text{\{because $n > m$\}} \\
&\ge \mathrm{Lag}_{k+1}(t_c^-) + C_{\max}. && \text{\{because $\mathrm{Lag}_{k+1}(t_c^-) \le 0$, by Claim 3.6\}}
\end{aligned}
\tag{3.28}
\]
Thus, for $k \le m-1$ or $n > m$, by (3.27) and (3.28), we have
\[
\mathrm{Lag}_k(t_c^-) - \mathrm{Lag}_{k+1}(t_c^-) \ge C_{\max}. \tag{3.29}
\]
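By the definition of Lag, (3.29) implies that for any $i$ and $j$ such that $1 \le i \le k$ and $k+1 \le j \le n$, we have $\mathrm{Lag}_i(t_c^-) - \mathrm{Lag}_j(t_c^-) \ge C_{\max}$. This implies that, for any task $\tau_p \in T_k(t_c^-)$ and any task $\tau_q \notin T_k(t_c^-)$, we have
\[
\mathrm{lag}(\tau_p, t_c^-, S) - \mathrm{lag}(\tau_q, t_c^-, S) \ge C_{\max}. \tag{3.30}
\]
Therefore,
\[
\begin{aligned}
&\mathrm{lag}(\tau_p, t_c, S) - \mathrm{lag}(\tau_q, t_c, S) \\
&\quad= \mathrm{lag}(\tau_p, t_c^-, S) + A(I, \tau_p, t_c^-, t_c) - A(S, \tau_p, t_c^-, t_c) \\
&\qquad - \mathrm{lag}(\tau_q, t_c^-, S) - A(I, \tau_q, t_c^-, t_c) + A(S, \tau_q, t_c^-, t_c) && \text{\{by (3.4)\}} \\
&\quad\ge \mathrm{lag}(\tau_p, t_c^-, S) + 0 - \varepsilon \cdot s_1 - \mathrm{lag}(\tau_q, t_c^-, S) - \varepsilon \cdot u_1 + 0 && \text{\{since $0 \le A(I, \tau_i, t_1, t_2) \le u_1 (t_2 - t_1)$ and $0 \le A(S, \tau_i, t_1, t_2) \le s_1 (t_2 - t_1)$\}} \\
&\quad\ge C_{\max} - \varepsilon \cdot (s_1 + u_1) && \text{\{by (3.30) and rearranging\}} \\
&\quad> 0. && \text{\{because $C_{\max} > 0$ and $\varepsilon < C_{\max}/(s_1 + u_1)$\}}^{11}
\end{aligned}
\]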
Thus, at time tc, any task in Tk(t−c ) has a strictly greater lag than any task not in Tk(t−c ). This
implies that Tk(tc) and Tk(t−c ) consist of the same set of tasks.
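By Claim 3.8 and Claim 3.9, the $k$ tasks in $T_k(t_c)$ are continuously scheduled on the $k$ fastest processors during the time interval $[t_c^-, t_c)$, so
\[
\sum_{\tau_i \in T_k(t_c)} A(S, \tau_i, t_c^-, t_c) = S_k \cdot \varepsilon. \tag{3.31}
\]
Also, by (3.1),
\[
\sum_{\tau_i \in T_k(t_c)} A(I, \tau_i, t_c^-, t_c) \le \sum_{\tau_i \in T_k(t_c)} u_i \cdot \varepsilon \le U_k \cdot \varepsilon. \tag{3.32}
\]
Therefore,
\[
\begin{aligned}
&\mathrm{Lag}_1(t_c) + \mathrm{Lag}_2(t_c) + \cdots + \mathrm{Lag}_k(t_c) \\
&\quad= \mathrm{Lag}_1(t_c^-) + \mathrm{Lag}_2(t_c^-) + \cdots + \mathrm{Lag}_k(t_c^-) + \sum_{\tau_i \in T_k(t_c)} A(I, \tau_i, t_c^-, t_c) - \sum_{\tau_i \in T_k(t_c)} A(S, \tau_i, t_c^-, t_c) && \text{\{by Claim 3.9 and by (3.4)\}} \\
&\quad\le \beta_k + (U_k - S_k) \cdot \varepsilon && \text{\{by (3.31) and (3.32), and because $(B_k)$ holds at time $t_c^-$\}} \\
&\quad\le \beta_k. && \text{\{by (3.5) and (3.6); note that $U_m \le U_n$\}}
\end{aligned}
\]
This contradicts the assumption of Case 2 that $(B_k)$ is critical.
11 $\varepsilon$ can be selected small enough to ensure $\varepsilon < C_{\max}/(s_1 + u_1)$.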
Finishing up. We have shown that both Case 1 and Case 2 lead to a contradiction. That is, none of the
conditions (B1), . . . ,(Bm), or (Bn) is critical at tc. This contradicts the definition of tc as the first time instant
at which some inequality in (B) is false, i.e., such a tc does not exist. The theorem follows.
Using Theorem 3.3, we can easily derive a tardiness bound for every task as follows.
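Theorem 3.4. In S, the tardiness $\Delta_i$ of task $\tau_i$ is bounded as follows. If $\rho = 1$, then
\[
\Delta_i \le \frac{n \cdot C_{\max}}{u_i};
\]
if $\rho > 1$, then
\[
\Delta_i \le \frac{\rho^{m-1} \cdot (n-m+1)\,C_{\max} + \dfrac{\rho^{m-1}-1}{\rho-1} \cdot C_{\max}}{u_i}.
\]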
Proof. By Theorem 3.3, Lag1(t)≤ β1 holds at every time instant t. By the definition of Lag1(t), this implies
that, for each i, lag(τi, t,S)≤ β1 holds for all t. Thus, by Corollary 3.1, task τi has a tardiness bound of β1/ui.
By (X1), β1 = x1, so to complete the proof, we merely need to calculate x1.
If ρ = 1 (i.e., u1/un = 1, which implies that every task has the same utilization), then by (Y3), x1 =
xm+(m−1)Cmax. Thus, by (Y2), we have x1 = n ·Cmax.
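If $\rho > 1$, then rearranging $(Y_3)$ results in
\[
x_i + \frac{C_{\max}}{\rho-1} = \rho \left( x_{i+1} + \frac{C_{\max}}{\rho-1} \right).
\]
By iterating this recurrence, we have
\[
x_1 + \frac{C_{\max}}{\rho-1} = \rho^{m-1} \cdot \left( x_m + \frac{C_{\max}}{\rho-1} \right).
\]
Finally, applying $(Y_2)$ yields
\[
x_1 = \rho^{m-1} \cdot (n-m+1)\,C_{\max} + \frac{\rho^{m-1}-1}{\rho-1} \cdot C_{\max}.
\]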
The theorem follows.
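As an illustrative aid (not part of the original analysis), the closed form for x1 can be cross-checked against direct iteration of the two facts used in the proof: xm = (n−m+1)·Cmax from (Y2) and xk = ρ·xk+1 + Cmax from (Y3). The Python sketch below assumes exact rational inputs; the function names and the sample parameters are hypothetical.

```python
from fractions import Fraction


def x1_closed_form(n, m, c_max, rho):
    """x_1 as given by the closed form derived in the proof of Theorem 3.4."""
    rho, c_max = Fraction(rho), Fraction(c_max)
    if rho == 1:
        return n * c_max
    geometric = (rho ** (m - 1) - 1) / (rho - 1)
    return rho ** (m - 1) * (n - m + 1) * c_max + geometric * c_max


def x1_by_recurrence(n, m, c_max, rho):
    """x_1 obtained by iterating (Y3) downward from x_m, which is set by (Y2)."""
    rho, c_max = Fraction(rho), Fraction(c_max)
    x = (n - m + 1) * c_max      # x_m
    for _ in range(m - 1):       # x_{m-1}, ..., x_1
        x = rho * x + c_max
    return x


def tardiness_bound(n, m, c_max, rho, u_i):
    """Bound of Theorem 3.4 for a task with utilization u_i, i.e., x_1 / u_i."""
    return x1_closed_form(n, m, c_max, rho) / Fraction(u_i)


if __name__ == "__main__":
    # Hypothetical parameters: 6 tasks, 4 processors, C_max = 1, rho = 3/2.
    print(x1_closed_form(6, 4, 1, Fraction(3, 2)))
    print(x1_by_recurrence(6, 4, 1, Fraction(3, 2)))
```

Both functions should agree for any ρ ≥ 1, which gives a quick sanity check of the algebra above.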
Discussion. Theorem 3.4 provides a tardiness bound for any task in a VPP task system that satisfies (3.5) and
(3.6). As shown in Section 3.5.1, any sporadic task is a special case of a VPP task, and therefore the tardiness
bound in Theorem 3.4 also applies to a sporadic task system that satisfies (3.5) and (3.6). Furthermore, as
shown in Section 3.2, (3.5) and (3.6) are a necessary and sufficient SRT-feasibility condition for sporadic task
systems on a uniform multiprocessor. Thus, Theorem 3.4 leads to the conclusion that preemptive G-EDF is
SRT-optimal for sporadic task systems on uniform multiprocessors.
In addition, Theorem 3.4 also shows that (3.5) and (3.6) are a sufficient feasibility condition for VPP
task systems on a uniform multiprocessor, and the necessity can be shown by the same reasoning for sporadic
task systems in Section 3.2. Thus, (3.5) and (3.6) are a necessary and sufficient SRT-feasibility condition and
preemptive G-EDF is SRT-optimal also for VPP task systems on a uniform multiprocessor.
3.6 Chapter Summary
In this chapter, we have considered the problem of using G-EDF to schedule sporadic SRT tasks
on a uniform multiprocessor. We have shown that non-preemptive G-EDF is not SRT-optimal for uniform
multiprocessors by providing a counterexample, and this negative result in fact applies to any work-conserving
non-preemptive scheduling algorithm. On the other hand, we have proved that preemptive G-EDF is indeed
SRT-optimal for uniform multiprocessors by deriving tardiness bounds for an arbitrary SRT-feasible system,
and this result applies to the VPP task model, which is a more general task model than the sporadic task
model.
CHAPTER 4: SEMI-PARTITIONED SCHEDULING ON UNIFORM PLATFORMS1
In this chapter, we continue to study the problem of scheduling a set of sporadic tasks on a uniform
multiprocessor. In contrast to the global scheduling approach studied in the prior chapter, we will focus on
semi-partitioned scheduling in this chapter. Traditionally, a multiprocessor scheduling algorithm may follow
a global approach, in which any migration is allowed, or a partitioned approach, in which no migration is
allowed. As a hybrid of these two approaches, a semi-partitioned scheduling algorithm allows only a limited
number of tasks to migrate but requires all remaining ones to be fixed to processors.
We will present two semi-partitioned scheduling algorithms designed for uniform multiprocessors in this
chapter, namely EDF-sh (earliest-deadline-first-based semi-partitioned scheduler for uniform heterogeneous
multiprocessors) and EDF-tu (earliest-deadline-first-based tunable scheduler for uniform platforms). In
addition to only allowing a limited number of tasks to migrate, EDF-sh further requires these tasks to migrate
at job boundaries only, at the cost of supporting SRT tasks only and being not SRT-optimal. In contrast,
EDF-tu is always SRT-optimal, and may be HRT-optimal if a tunable parameter, called frame size, divides all
task periods. For SRT tasks, any frame size can be selected, and tardiness is upper bounded by the value
of the frame size as long as the system is feasible. However, a smaller frame size potentially leads to more
frequent preemptions and migrations.
Organization. In the following sections, we first introduce the common system model for both algorithms
(Section 4.1), and then present EDF-sh (Section 4.2) and EDF-tu (Section 4.3) in detail.
1 Contents of this chapter previously appeared in preliminary form in the following papers:
Yang, K. and Anderson, J. (2014b). Soft real-time semi-partitioned scheduling with restricted migrations on uniform heterogeneous multiprocessors. In Proceedings of the 22nd International Conference on Real-Time Networks and Systems, pages 215–224.
Yang, K. and Anderson, J. (2015b). An optimal semi-partitioned scheduler for uniform heterogeneous multiprocessors. In Proceedings of the 27th Euromicro Conference on Real-Time Systems, pages 199–210.
4.1 System Model
In this chapter, we consider scheduling $n$ sporadic tasks on $m$ processors, where $n \ge m$. Processor $p$ is identified by its speed $s_p$ ($1 \le p \le m$, $s_p \in \mathbb{R}$). We also assume implicit deadlines, i.e., each task has a relative deadline equal to its period. Thus, a task $\tau_i$ can be specified by $\tau_i = (C_i, T_i)$, where $C_i$ is its worst-case execution requirement and $T_i$ is its period. We define the utilization of a task $\tau_i$ as
\[
u_i = \frac{C_i}{T_i}. \tag{4.1}
\]
A job is an invocation of a task; the jth job of task τi is denoted τi, j. ri, j is its release time and di, j is its
absolute deadline, where di, j = ri, j +Ti. The tardiness of a job τi, j that completes at time tc is defined as
max{0, tc−di, j}, while its lateness is tc−di, j. The two differ only if τi, j completes before its deadline, in
which case its tardiness is zero but its lateness is negative. The speed of a processor refers to the amount of
work completed in one time unit when a job is executed on that processor. Moreover, jobs of the same task
cannot execute in parallel, i.e., a job can commence execution only when all prior jobs of the same task have
finished.
We index the processors in non-increasing-speed order, i.e., $\pi = \{s_1, s_2, \cdots, s_m\}$, where $s_p \ge s_{p+1}$ for $p \in \{1, 2, \cdots, m-1\}$; and we also index the tasks in non-increasing-utilization order, i.e., $\tau = \{\tau_1, \tau_2, \cdots, \tau_n\}$, where $u_i \ge u_{i+1}$ for $i \in \{1, 2, \cdots, n-1\}$. We also define $U_k = \sum_{i=1}^{k} u_i$ and $S_k = \sum_{i=1}^{k} s_i$.
Feasibility Condition. As shown in Section 3.2, the SRT-feasibility condition matches the HRT-feasibility
condition for scheduling implicit-deadline sporadic tasks on a uniform multiprocessor, and therefore we omit
the “SRT” and “HRT” for feasibility in this case. That is, a set of sporadic tasks τ is feasible on a uniform
multiprocessor $\pi$ if and only if
\[
U_n \le S_m \tag{4.2}
\]
and
\[
U_k \le S_k \quad \text{for } k = 1, 2, \cdots, m-1. \tag{4.3}
\]
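The feasibility test in (4.2) and (4.3) is straightforward to mechanize. The following Python sketch is only an illustration under the stated model: the function name `is_feasible` and the sample inputs are hypothetical, and exact rational arithmetic is used so that boundary cases such as Un = Sm are not distorted by rounding.

```python
from fractions import Fraction


def is_feasible(utilizations, speeds):
    """Check (4.2) and (4.3) for the given task utilizations and processor speeds."""
    # Index tasks and processors in non-increasing order, as in the system model.
    u = sorted((Fraction(x) for x in utilizations), reverse=True)
    s = sorted((Fraction(x) for x in speeds), reverse=True)
    # (4.2): total utilization must not exceed total speed.
    if sum(u) > sum(s):
        return False
    # (4.3): the k largest utilizations must fit on the k fastest processors,
    # for k = 1, ..., m-1.
    return all(sum(u[:k]) <= sum(s[:k]) for k in range(1, len(s)))


if __name__ == "__main__":
    print(is_feasible([2, 2], [3, 1]))   # two tasks of utilization 2 on speeds {3, 1}
    print(is_feasible([3, 1], [2, 2]))   # one task is heavier than the fastest processor
```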
Fixed and migrating tasks. Under semi-partitioned scheduling, each task is allocated a non-zero share
on certain processors such that the total allocated share on each processor does not exceed the processor’s
capacity and the total allocated share of a task matches its utilization. If a task has non-zero shares on only
one (multiple) processor(s), then it is a fixed (migrating) task.
4.2 EDF-sh
In this section, we present the first algorithm, EDF-sh. Like many other semi-partitioned algorithms, it
has an assignment phase and an execution phase. We first discuss utilization constraints that are required
under EDF-sh and describe its two phases (Section 4.2.1), and then prove tardiness bounds for task systems
under EDF-sh (Section 4.2.2). Finally, we present an evaluation (Section 4.2.3).
4.2.1 Algorithm EDF-sh
We design EDF-sh by extending EDF-os (Anderson et al., 2016) to uniform multiprocessors.
As a result, EDF-sh inherits most of the advantages of EDF-os, such as:
• Under EDF-sh, every job has bounded tardiness.
• Migrations are boundary-limited.
• The underlying platform can be fully utilized, i.e., Un can be as large as Sm.
In the tardiness-bound proof for EDF-os, and for EDF-sh here, it is essential that each task executes only
on processors that have a speed at least its utilization without overutilizing any processor. Unfortunately,
this cannot be ensured for all feasible task systems (recall (4.2) and (4.3)). For example, a task system $\tau = \{(2,1),(2,1)\}$ to be scheduled on $\pi = \{3,1\}$ is feasible, but if we assign jobs of each task only to processors with a speed at least its utilization, then the first processor will be overutilized: both tasks have utilization 2, only the first processor is fast enough for either of them, and their total utilization of 4 exceeds its speed of 3. Because of this difficulty, we further restrict task utilizations slightly by requiring
\[
\sum_{u_i > s_k} u_i \le \sum_{s_p > s_k} s_p \quad \text{for } k = 1, 2, \cdots, m, \tag{4.4}
\]
which is slightly more restrictive than (4.3). Nevertheless, the total utilization $U_n$ can be as large as the total speed $S_m$.
Note that (4.4) implies (4.3). Thus, we omit (4.3) and hence let (4.2) and (4.4) be our task system
utilization restriction for EDF-sh in Section 4.2.
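To make the additional restriction concrete, the check in (4.4) can be sketched as follows. This Python fragment is an illustration only; the function name `satisfies_4_4` is hypothetical, and the second sample task set (utilizations 3, 11/6, 5/3, 4/3, 1/2, 1/3, 1/3 on speeds {4, 2, 2, 1}) is chosen here simply as a system that passes the check.

```python
from fractions import Fraction


def satisfies_4_4(utilizations, speeds):
    """Check (4.4): for every speed s_k, the tasks with utilization above s_k
    must fit, in total, on the processors that are strictly faster than s_k."""
    u = [Fraction(x) for x in utilizations]
    s = [Fraction(x) for x in speeds]
    return all(
        sum(x for x in u if x > s_k) <= sum(y for y in s if y > s_k)
        for s_k in s
    )


if __name__ == "__main__":
    F = Fraction
    # Feasible by (4.2) and (4.3), yet rejected by (4.4).
    print(satisfies_4_4([2, 2], [3, 1]))
    # A system that satisfies (4.4).
    print(satisfies_4_4([3, F(11, 6), F(5, 3), F(4, 3), F(1, 2), F(1, 3), F(1, 3)],
                        [4, 2, 2, 1]))
```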
Similarly to EDF-os, EDF-sh has two phases, an assignment phase and an execution phase. In the assignment phase, we consider tasks in non-increasing-utilization order. When considering a task, we first check the current available capacity of each processor to see if this task can be fixed. If so, we assign this task to some processor as a fixed task via a bin-packing heuristic. The specific heuristic does not matter in terms of theoretical schedulability; we choose to use worst-fit here. The assignment phase of EDF-sh is defined by the pseudo-code in Algorithm 1, where $\psi_{i,p}$ denotes the share (which potentially can be zero) of task $\tau_i$ on processor $p$ and the total share allocation on processor $p$ is denoted as $\sigma_p = \sum_{k=1}^{n} \psi_{k,p}$. Algorithm 1 maintains that no processor is overutilized, i.e., $\sigma_p \le s_p$ holds for all $p$. Also, the total share allocation of a task $\tau_i$ matches its utilization, i.e., $\sum_{k=1}^{m} \psi_{i,k} = u_i$.
Algorithm 1 EDF-sh task assignment phase
    initially ψi,p = 0 and σp = 0 for all i and p
    index tasks in non-increasing-utilization order
    index processors in non-increasing-speed order
    /* p is the index of the last processor to which a migrating task was
       assigned (or 1, if no migrating task has been assigned yet). sp is the
       first processor for the next migrating task if its capacity has not been
       exhausted yet. */
    p := 1;
    for i := 1 to n do
        /* If task τi can be fixed, then we assign it as a fixed task via
           worst-fit here. */
        select k such that sk − σk is maximal;
        if sk − σk ≥ ui then
            ψi,k := ui;
            σk := σk + ui;
        else
            /* If task τi has to migrate, then we assign its shares on
               processors to exhaust processor capacities in turn from the
               fastest one to the slowest one. */
            remaining := ui;
            repeat
                ψi,p := min(remaining, sp − σp);
                σp := σp + ψi,p;
                remaining := remaining − ψi,p;
                if σp = sp then
                    p := p + 1;
                end
            until remaining = 0;
        end
    end
We use $\varphi_{i,p}$ to denote the long-term fraction of task $\tau_i$'s jobs that execute on processor $p$. $\varphi_{i,p}$ is commensurate with the share allocated:
\[
\varphi_{i,p} = \frac{\psi_{i,p}}{u_i}. \tag{4.5}
\]
The set of all fixed tasks on $s_p$ is denoted $\tau_p^f$, and $\sigma_p^f = \sum_{\tau_i \in \tau_p^f} \psi_{i,p}$.
[Figure: share of each task on each processor; vertical axis: processor capacity (0% to 100%), horizontal axis: processor ID (1 to 4).]
Figure 4.1: EDF-sh task assignment for Example 4.1. This is the same system as in Example 2.2, but EDF-sh has a different assignment from EDF-os.
Example 4.1. To illustrate the difference between the assignment phases of EDF-os and EDF-sh, we revisit
the system in Example 2.2. Note that any identical multiprocessor is a uniform one (with sp = 1 for all p), so
EDF-sh also works on identical multiprocessors. The assignment of the first five tasks by EDF-sh is exactly
the same as that by EDF-os. However, we will attempt to make all remaining tasks fixed as well, and this
results in τ6 being fixed on processor 4 and thereafter τ7 being fixed on processor 3. That is, EDF-sh will
have only one migrating task for this system. Figure 4.1 shows the resulting assignment by EDF-sh. ♦
Example 4.2. We now give an example of the task assignment phase of EDF-sh for the case where processor
speeds are different. In this example, we have a uniform multiprocessor π = {4,2,2,1}, upon which a set
of sporadic tasks τ = {(3,1),(11,6),(5,3),(4,3),(1,2),(2,6),(1,3)} will be scheduled. Via the worst-fit
heuristic, τ1, τ2, and τ3 are assigned as fixed tasks to s1, s2, and s3, respectively. Thereafter, no single
processor has enough capacity to fix τ4, so τ4 must migrate. It is assigned non-zero shares on s1, s2, and s3.
However, next, τ5 and τ6 can be fixed, specifically on s4. Finally, τ7 must migrate between s3 and s4. The
resulting task assignment is depicted in Figure 4.2. For the two migrating tasks, s3 is the last processor of τ4,
and s4 is the last processor of τ7. ♦
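The assignment in Example 4.2 can be reproduced with a direct transcription of Algorithm 1. The Python sketch below is a minimal illustration rather than a reference implementation: the function name `edf_sh_assign` is hypothetical, indices are 0-based (so processor s1 is column 0), ties in the worst-fit selection are broken toward the lowest-indexed processor (an assumption, since the pseudo-code leaves the tie-break open), and exact rationals are used so that the test σp = sp is meaningful.

```python
from fractions import Fraction


def edf_sh_assign(utilizations, speeds):
    """Return (psi, sigma): psi[i][p] is the share of task i on processor p,
    sigma[p] is the total share allocated on processor p."""
    u = sorted((Fraction(x) for x in utilizations), reverse=True)
    s = sorted((Fraction(x) for x in speeds), reverse=True)
    n, m = len(u), len(s)
    psi = [[Fraction(0)] * m for _ in range(n)]
    sigma = [Fraction(0)] * m
    p = 0  # first candidate processor for the next migrating task
    for i in range(n):
        # Worst-fit: processor with the largest remaining capacity.
        k = max(range(m), key=lambda q: s[q] - sigma[q])
        if s[k] - sigma[k] >= u[i]:
            psi[i][k] = u[i]          # task i is fixed on processor k
            sigma[k] += u[i]
        else:
            remaining = u[i]          # task i must migrate
            while remaining > 0:
                if p >= m:
                    raise ValueError("total utilization exceeds total speed")
                share = min(remaining, s[p] - sigma[p])
                psi[i][p] = share
                sigma[p] += share
                remaining -= share
                if sigma[p] == s[p]:  # capacity exhausted: move on
                    p += 1
    return psi, sigma


if __name__ == "__main__":
    F = Fraction
    tasks = [F(3, 1), F(11, 6), F(5, 3), F(4, 3), F(1, 2), F(2, 6), F(1, 3)]
    psi, sigma = edf_sh_assign(tasks, [4, 2, 2, 1])
    for i, row in enumerate(psi, start=1):
        print(f"tau_{i}:", [str(x) for x in row])
```

Under these assumptions, the run on the system of Example 4.2 yields τ4 with shares on s1, s2, and s3, and τ7 with shares on s3 and s4, consistent with the example.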
The assignment phase of EDF-sh ensures the following properties.
[Figure: share of each task on each processor; vertical axis: processor capacity (0% to 100%), horizontal axis: processor speed (s1 = 4, s2 = 2, s3 = 2, s4 = 1).]
Figure 4.2: EDF-sh task assignment for Example 4.2. The width of each column indicates the processor speed.
Property 4.1. There are no more than two migrating tasks on sp. If there are exactly two migrating tasks on
sp, then sp is the last processor for exactly one of them.
This property follows from the assignment procedure, and it can be proved by induction.
By Property 4.1, we know that a processor sp will have at most two migrating tasks, and if exactly two,
then they must be of different priorities. Therefore, if there is only one migrating task on a given processor
sp, then we use τl to denote that task; if there are two, then we let τl (τh) denote the migrating task with lower
(higher) priority.
Property 4.2. For any processor $s_p$, $\sigma_p^f + \psi_{h,p} + \psi_{l,p} \le s_p$. (If, for $s_p$, $\tau_h$ and/or $\tau_l$ do not exist, then we just consider $\psi_{h,p}$ and/or $\psi_{l,p}$ to be zero.)
This property holds because in our assignment phase, we do not overutilize any processor, i.e., we always
maintain σp ≤ sp.
Property 4.3. A task has non-zero shares only on processors that have a speed at least its utilization.
Proof. This property clearly holds for fixed tasks. We show that it holds for migrating tasks as well by
contradiction.
Suppose there is some migrating task that violates this property. Let τa be the first migrating task to do
so, and let sq be the first processor such that sq < ua where τa is assigned a non-zero share. In Algorithm 1,
we do not assign a migrating task a non-zero share on a slower processor unless the capacity of every faster
one has been exhausted. By the definition of τa, Property 4.3 holds for all previously assigned migrating
tasks and hence all previously assigned tasks, since it trivially holds for any fixed task. Moreover, because
tasks are considered in non-increasing-utilization order, every such prior task has a utilization at least ua and
therefore larger than sq. These facts imply that all prior tasks have been assigned shares only on the first q−1
processors, and including τa, the total allocated shares of the first a tasks exceeds the capacity of the first
$q-1$ processors. Thus, we have
\[
\sum_{i=1}^{a} u_i > S_{q-1}. \tag{4.6}
\]
Since the processors are indexed in non-increasing-speed order,
\[
S_{q-1} \ge \sum_{s_p > s_q} s_p. \tag{4.7}
\]
Since $u_a > s_q$ and the tasks are indexed in non-increasing-utilization order, $u_i \ge u_a > s_q$ holds for all $i \le a$. That is,
\[
\sum_{u_i > s_q} u_i \ge \sum_{i=1}^{a} u_i. \tag{4.8}
\]
By (4.6), (4.7), and (4.8), we have
\[
\sum_{u_i > s_q} u_i > \sum_{s_p > s_q} s_p,
\]
which contradicts (4.4). Thus, no such τa exists and therefore Property 4.3 holds.
Property 4.4. When we assign a migrating task a non-zero share on a processor, there must be at least one
fixed task on that processor.
Proof. Suppose this property is violated for the first time when migrating task τi is assigned a non-zero
share on processor sp, i.e., there is no fixed task on sp. Since τi is the first migrating task that violates
Property 4.4, no other migrating task is assigned a non-zero share on sp either. Because no prior task (fixed
or migrating) is assigned a non-zero share on sp and sp ≥ ui (by Property 4.3), τi would be assigned as fixed
on sp, which contradicts our assumption that it is a migrating task. Thus, no such τi exists and hence this
property holds.
Property 4.5. If there are exactly two migrating tasks on sp, i.e., ψh,p > 0 and ψl,p > 0, then ψh,p+ul < sp.
Proof. By Property 4.4, there must be at least one fixed task $\tau_a$ that was assigned to $s_p$ before the two migrating ones were assigned shares on $s_p$. Since the tasks are considered in non-increasing-utilization order, we have $u_l \le u_a$. Also, by the definition of $\sigma_p^f$, $\sigma_p^f \ge u_a$, so $u_l \le \sigma_p^f$. Therefore,
\[
\begin{aligned}
u_l + \psi_{h,p} + \psi_{l,p} &\le \sigma_p^f + \psi_{h,p} + \psi_{l,p} \\
&\le s_p. && \text{\{by Property 4.2\}}
\end{aligned}
\]
Because $\psi_{l,p} > 0$ here, we get $\psi_{h,p} + u_l < s_p$.
In the execution phase, every fixed task will only release jobs on the processor to which it is assigned,
whereas jobs of migrating tasks will be assigned to processors in the same way as in EDF-os (Anderson et al.,
2016). Therefore, the following property from EDF-os holds for EDF-sh as well.
Property 4.6. For any k consecutive jobs of a migrating task τi, at most ϕi,p · k+2 of them are assigned to
processor sp.
Furthermore, the following scheduling rules are applied on each processor in the execution phase.
• Jobs of migrating tasks are statically prioritized over those of fixed tasks.
• Jobs of fixed tasks are prioritized against each other on an EDF basis.
• On a migrating task’s last processor, its priority is lower than other migrating tasks, but still higher
than fixed ones.
4.2.2 Tardiness Bounds
In this section, we prove tardiness bounds for EDF-sh. We consider migrating tasks and fixed tasks separately in Sections 4.2.2.1 and 4.2.2.2. Moreover, for migrating tasks, rather than tardiness, we upper bound lateness for each task.
4.2.2.1 Migrating Tasks
In this subsection, we derive lateness bounds for migrating tasks. Since migrating tasks are statically
prioritized over fixed ones, we can ignore all fixed tasks when considering migrating ones.
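Lemma 4.1. If the lateness of jobs of task $\tau_i$ is upper bounded by $\Delta_i$, then in the time interval $[t_0, t_c)$, the demand from $\tau_i$ on processor $s_p$ is less than
\[
\psi_{i,p} \cdot (t_c - t_0) + \psi_{i,p} \cdot (2T_i + \Delta_i) + 2C_i.
\]
Proof. Because lateness is upper bounded by $\Delta_i$, the jobs of $\tau_i$ released before $t_0 - (T_i + \Delta_i)$ complete their execution by $t_0$. Therefore, in the time interval $[t_0, t_c)$, the demand from $\tau_i$ can only come from its jobs released in $[t_0 - T_i - \Delta_i, t_c)$. $\tau_i$ can release at most $\left\lceil \frac{t_c - (t_0 - T_i - \Delta_i)}{T_i} \right\rceil$ jobs in $[t_0 - T_i - \Delta_i, t_c)$. By Property 4.6, at most $\varphi_{i,p} \cdot \left\lceil \frac{t_c - (t_0 - T_i - \Delta_i)}{T_i} \right\rceil + 2$ of them are assigned to processor $s_p$. Thus, in the time interval $[t_0, t_c)$, the demand from $\tau_i$ on processor $s_p$ is at most
\[
\begin{aligned}
&\left( \varphi_{i,p} \cdot \left\lceil \frac{t_c - (t_0 - T_i - \Delta_i)}{T_i} \right\rceil + 2 \right) \cdot C_i \\
&\quad< \left( \varphi_{i,p} \cdot \left( \frac{t_c - (t_0 - T_i - \Delta_i)}{T_i} + 1 \right) + 2 \right) \cdot C_i && \text{\{since $\lceil x \rceil < x + 1$\}} \\
&\quad= \left( \varphi_{i,p} \cdot \frac{t_c - t_0 + 2T_i + \Delta_i}{T_i} + 2 \right) \cdot C_i && \text{\{simplifying\}} \\
&\quad= \psi_{i,p} \cdot (t_c - t_0) + \psi_{i,p} \cdot (2T_i + \Delta_i) + 2C_i. && \text{\{by (4.1) and (4.5)\}}
\end{aligned}
\]
The lemma follows.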
According to the following lemma, if we assume that all the jobs of a migrating task are moved to its
last processor, then the lateness of these jobs under this assumption upper bounds their lateness in the actual
schedule. This analysis device was first used by Anderson et al. (2016), but assuming such jobs are moved to
the task’s first processor. We must instead consider the last processor, because in our case processors may
have different speeds. Thus, we must conservatively assume such moves are with respect to a task’s slowest
processor.
Lemma 4.2. If we execute all jobs of a migrating task on its last processor rather than the processors where
these jobs are actually assigned, then no job of this task will complete its execution earlier. Moreover, such
job moves do not impact the other migrating task on this processor (if one exists).
Proof. A migrating task has the highest priority on any processor that is not its last processor. Also, on
any non-last processor, a migrating task is executed at a speed at least that of its last processor, since the
processors are indexed from fastest to slowest. Thus, the first part of Lemma 4.2 holds.
The second part of Lemma 4.2 follows because on its last processor, a migrating task has statically lower
priority than any other migrating task.
Property 4.3 and Property 4.5 ensure that, with respect to migrating tasks, such job moves will not
overutilize the last processor of the considered task.
We now compute a lateness bound for each migrating task assuming its jobs are moved as described
above. If a task is the only migrating task on its last processor, then its lateness bound can be computed
directly; if a migrating task τl shares its last processor with another migrating task τh, then its lateness bound
depends on the lateness bound of τh, which can be computed inductively by the formula in Theorem 4.1.
Lemma 4.3 below ensures that the base case exists.
Lemma 4.3. The migrating task with the largest index does not share its last processor with any other
migrating task.
Proof. Follows directly from Algorithm 1.
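Theorem 4.1. Consider a migrating task $\tau_l$ that has $s_p$ as its last processor. If it shares $s_p$ with some other migrating task $\tau_h$, then $\tau_h$ is the only such task (by Property 4.1). Let $\Delta_h$ be the lateness bound of $\tau_h$. Then $\tau_l$ has lateness at most
\[
\Delta_l \overset{\text{def}}{=}
\begin{cases}
\dfrac{\psi_{h,p} \cdot (2T_h + \Delta_h) + 2C_h + C_l}{s_p - \psi_{h,p}} - T_l & \text{if } \tau_h \text{ exists}, \\[1.5ex]
\dfrac{C_l}{s_p} - T_l & \text{otherwise.}
\end{cases}
\tag{4.9}
\]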
Proof. By Lemma 4.2, we can upper bound the lateness of all jobs of τl by assuming that all such jobs
execute on sp. We make that assumption here.
If $\tau_l$ does not share its last processor $s_p$ with any other migrating task, i.e., $\tau_h$ does not exist, then $\tau_l$ has the highest priority on $s_p$, and by Property 4.3, $s_p \ge u_l$. Therefore, every job of $\tau_l$ completes its execution within $C_l / s_p$ time units of its release. Thus, lateness is upper bounded by $\frac{C_l}{s_p} - T_l$.
In the remainder of the proof, we consider the case where τh does exist. In this case, we prove Theorem 4.1
by contradiction.
Interval $[t_0, t_c)$. Let $\tau_{l,j}$ be the first job of $\tau_l$ that has lateness exceeding $\Delta_l$ and define $t_c \overset{\text{def}}{=} d_{l,j} + \Delta_l$. Let $t_0$ be the latest time instant before $t_c$ such that $s_p$ is idle for migrating tasks, i.e., all jobs of $\tau_l$ or $\tau_h$ released before $t_0$ have completed execution by $t_0$ and a job of $\tau_l$ and/or $\tau_h$ is released at $t_0$. $t_0$ is well-defined because if no such time instant exists within $(0, t_c)$, then time 0 must be such a time instant.
Demand from τh. By Lemma 4.1, in the time interval [t0, tc), the demand from τh on sp is less than
ψh,p · (tc− t0)+ψh,p · (2Th+∆h)+2Ch. (4.10)
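Demand from $\tau_l$. The demand from $\tau_l$ on $s_p$ comes from jobs of $\tau_l$ released before $r_{l,j}$ and $\tau_{l,j}$ itself. By the definition of $t_0$, the number of such jobs released before $r_{l,j}$ is at most $\left\lfloor \frac{r_{l,j} - t_0}{T_l} \right\rfloor$. Including $\tau_{l,j}$ itself, at most $\left\lfloor \frac{r_{l,j} - t_0}{T_l} \right\rfloor + 1$ jobs of $\tau_l$ create demand in the interval. Thus, in the time interval $[t_0, t_c)$, the demand due to $\tau_l$ on processor $s_p$ is at most
\[
\begin{aligned}
\left( \left\lfloor \frac{r_{l,j} - t_0}{T_l} \right\rfloor + 1 \right) C_l &\le \left( \frac{r_{l,j} - t_0}{T_l} + 1 \right) C_l \\
&= u_l (r_{l,j} - t_0) + C_l. && \text{\{by (4.1)\}}
\end{aligned}
\tag{4.11}
\]
For the purpose of minimizing redundancy in expressions, we define
\[
K = \psi_{h,p} \cdot (2T_h + \Delta_h) + 2C_h + C_l. \tag{4.12}
\]
Using this definition, by (4.10), (4.11), and (4.12), the total demand within $[t_0, t_c)$ due to migrating tasks is less than
\[
\begin{aligned}
&K + \psi_{h,p} \cdot (t_c - t_0) + u_l (r_{l,j} - t_0) \\
&\quad= K + \psi_{h,p} \cdot (t_c - r_{l,j}) + (\psi_{h,p} + u_l)(r_{l,j} - t_0) && \text{\{rearranging\}} \\
&\quad< K + \psi_{h,p} \cdot (t_c - r_{l,j}) + s_p (r_{l,j} - t_0). && \text{\{by Property 4.5\}}
\end{aligned}
\]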
Since for contradiction, we assumed that τl, j has lateness exceeding ∆l , i.e., τl, j completes execution after tc,
the total demand in the time interval [t0, tc) is greater than the total supply in this interval, which is sp(tc− t0).
This implies
K+ψh,p · (tc− rl, j)+ sp(rl, j− t0)> sp(tc− t0). (4.13)
By simplifying (4.13), we have
K > (tc− rl, j)(sp−ψh,p). (4.14)
By Property 4.5, we have ψh,p+ul < sp. Because ul > 0, this implies ψh,p < sp, i.e.,
sp−ψh,p > 0. (4.15)
By (4.14) and (4.15),
\[
t_c - r_{l,j} < \frac{K}{s_p - \psi_{h,p}}. \tag{4.16}
\]
Replacing the right-hand side of (4.16) by the definition of $K$ in (4.12) and the definition of $\Delta_l$ in (4.9), we have
\[
t_c - r_{l,j} < \Delta_l + T_l. \tag{4.17}
\]
Since $T_l = d_{l,j} - r_{l,j}$, (4.17) implies $t_c < d_{l,j} + \Delta_l$, which contradicts the definition of $t_c$. Thus, such an assumed job $\tau_{l,j}$ with lateness exceeding $\Delta_l$ does not exist. Hence, Theorem 4.1 holds.
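The bound in (4.9) is evaluated inductively, starting from a migrating task that does not share its last processor (Lemma 4.3). The Python sketch below illustrates this with a hypothetical helper `lateness_bound`; it is not taken from the original work. The sample numbers are worked out here, for illustration only, from the system of Example 4.2 under the share assignment produced by Algorithm 1 (ψ7,3 = 1/6, with τ7 as the higher-priority migrating task on s3, the last processor of τ4).

```python
from fractions import Fraction


def lateness_bound(c_l, t_l, s_p, higher=None):
    """Lateness bound (4.9) for a migrating task tau_l whose last processor has speed s_p.

    `higher` is None if tau_l is the only migrating task on that processor;
    otherwise it is a tuple (psi_hp, c_h, t_h, delta_h) describing tau_h.
    """
    c_l, t_l, s_p = Fraction(c_l), Fraction(t_l), Fraction(s_p)
    if higher is None:
        return c_l / s_p - t_l
    psi_hp, c_h, t_h, delta_h = (Fraction(x) for x in higher)
    k = psi_hp * (2 * t_h + delta_h) + 2 * c_h + c_l   # K, as in (4.12)
    return k / (s_p - psi_hp) - t_l


if __name__ == "__main__":
    F = Fraction
    # tau_7 = (1, 3): last processor s4 = 1, no other migrating task there.
    delta_7 = lateness_bound(1, 3, 1)
    # tau_4 = (4, 3): last processor s3 = 2, shared with tau_7 (share 1/6 on s3).
    delta_4 = lateness_bound(4, 3, 2, higher=(F(1, 6), 1, 3, delta_7))
    print(delta_7, delta_4)
```

A negative result, as for τ7 here, simply means that every job of that task is guaranteed to finish before its deadline.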
4.2.2.2 Fixed Tasks
In this subsection, instead of lateness, we consider tardiness directly.
To begin with, note that if no migrating task assigns jobs to a processor, then all of the fixed tasks on
that processor have a tardiness bound of zero, since EDF is optimal for uniprocessor scheduling and by
Property 4.2 we do not overutilize any single processor.
Theorem 4.2 below provides a tardiness bound for a fixed task that executes on a processor where
migrating task(s) also execute. In this case, the tardiness bound for the fixed task depends on the lateness
bound(s) for the migrating task(s) on the same processor, which can be computed by Theorem 4.1. By
Property 4.1, at most two migrating tasks have non-zero shares on a processor.
Theorem 4.2. Suppose that one or two migrating tasks have non-zero shares on processor sp. If two, let τl
(τh) be the one with lower (higher) priority; if only one, let τl denote that task and consider τh to be a “null”
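task with $C_h = 0$, $\psi_{h,p} = 0$, and $T_h = 1$. Then, a fixed task $\tau_i$ on $s_p$ has tardiness at most
\[
\Delta_i = \frac{\psi_{l,p} \cdot (2T_l + \Delta_l) + 2C_l + \psi_{h,p} \cdot (2T_h + \Delta_h) + 2C_h}{s_p - \psi_{l,p} - \psi_{h,p}}. \tag{4.18}
\]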
Proof. This proof is similar to that of Theorem 4.1.
Interval $[t_0, t_c)$. Let $\tau_{i,j}$ be the first job of any fixed task on $s_p$ that has tardiness exceeding the bound in (4.18) and define $t_c \overset{\text{def}}{=} d_{i,j} + \Delta_i$. Let $t_0$ be the latest idle time instant before $t_c$, i.e., all jobs that were released before $t_0$ and with a priority at least $\tau_{i,j}$’s priority have completed execution by $t_0$ and at least one job with a priority at least $\tau_{i,j}$’s priority is released at $t_0$. $t_0$ is well-defined because if no such time instant exists within $(0, t_c)$, then time 0 must be such a time instant.
Demand from migrating tasks. By Lemma 4.1, in $[t_0, t_c)$, the demand from $\tau_l$ on $s_p$ is less than
\[
\psi_{l,p} \cdot (t_c - t_0) + \psi_{l,p} \cdot (2T_l + \Delta_l) + 2C_l, \tag{4.19}
\]
and the demand from $\tau_h$ on $s_p$ is less than
\[
\psi_{h,p} \cdot (t_c - t_0) + \psi_{h,p} \cdot (2T_h + \Delta_h) + 2C_h. \tag{4.20}
\]
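Demand from fixed tasks. A fixed task $\tau_k$ can release at most $\left\lfloor \frac{d_{i,j} - t_0}{T_k} \right\rfloor$ jobs with a priority at least $\tau_{i,j}$’s priority in the interval $[t_0, t_c)$. Thus, the demand from fixed tasks in $[t_0, t_c)$ is at most
\[
\begin{aligned}
\sum_{\tau_k \in \tau_p^f} \left\lfloor \frac{d_{i,j} - t_0}{T_k} \right\rfloor \cdot C_k &\le (d_{i,j} - t_0) \cdot \sum_{\tau_k \in \tau_p^f} \frac{C_k}{T_k} \\
&= (d_{i,j} - t_0) \cdot \sigma_p^f && \text{\{by the definition of $\sigma_p^f$\}} \\
&\le (d_{i,j} - t_0)(s_p - \psi_{l,p} - \psi_{h,p}). && \text{\{by Property 4.2\}}
\end{aligned}
\tag{4.21}
\]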
For the purpose of minimizing redundancy in expressions, we define
K = ψl,p · (2Tl +∆l)+2Cl +ψh,p · (2Th+∆h)+2Ch. (4.22)
Using this definition, by (4.19), (4.20), (4.21), and (4.22), the total demand within [t0, tc) is at most
K+(ψl,p+ψh,p)(tc− t0)+(sp−ψl,p−ψh,p)(di, j− t0).
Since for the purpose of contradiction, we assume τi, j has tardiness exceeding ∆i, i.e., τi, j completes execution
after tc, the total demand in the time interval [t0, tc) is greater than the total supply in the interval which is
sp(tc− t0). That is,
K+(ψl,p+ψh,p)(tc− t0)+(sp−ψl,p−ψh,p)(di, j− t0)> sp(tc− t0). (4.23)
By simplifying (4.23), we have
K > (tc−di, j)(sp−ψl,p−ψh,p). (4.24)
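By Property 4.2, we have $\sigma_p^f + \psi_{h,p} + \psi_{l,p} \le s_p$. Because $\sigma_p^f > 0$, this implies
\[
s_p - \psi_{l,p} - \psi_{h,p} \ge \sigma_p^f > 0. \tag{4.25}
\]
By (4.24) and (4.25),
\[
t_c - d_{i,j} < \frac{K}{s_p - \psi_{l,p} - \psi_{h,p}}. \tag{4.26}
\]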
Replacing the right-hand side of (4.26) by the definition of K in (4.22) and the definition of ∆i in (4.18), we
have
tc−di, j < ∆i. (4.27)
(4.27) implies tc < di, j +∆i, which contradicts the definition of tc. Thus, such an assumed job τi, j with
tardiness exceeding ∆i does not exist. Hence, Theorem 4.2 holds.
4.2.3 Evaluation
To evaluate how restrictive the assumed per-task utilization constraint is and the effectiveness of EDF-sh,
we conducted experiments to assess schedulability and tardiness bounds for EDF-sh.
When conducting such experiments for identical multiprocessors, the assumed platform is implicitly
determined by an assumed total processor capacity, or the number of processors. However, for uniform
multiprocessors, processor speeds must be defined. Given a total processor capacity, there are an infinite
number of speed choices from which to select. Because only selected choices can be considered, no evaluation
can be exhaustive. In our experiments, we considered systems of eight processors with a total processor
capacity of 36. We considered four such platforms, with speeds as follows: π1 = {6,6,6,6,3,3,3,3},
π2 = {8,8,4,4,4,4,2,2}, π3 = {8,7,6,5,4,3,2,1}, π4 = {15,3,3,3,3,3,3,3}.
The process of randomly generating feasible task systems for the considered platforms also varies from
that for identical ones. For feasibly scheduling tasks on an identical multiprocessor, the per-task utilization
bound is fixed for every task to be 1.0. However, per-task utilization bounds for feasibly scheduling tasks on
a uniform multiprocessor must instead follow (4.3). As such, before generating a new task, we calculated a
per-task utilization cap for it by (4.3), considering previously generated tasks. We then selected the utilization
of that task uniformly at random between zero and the computed cap. This generation process terminates
when the total utilization of all generated tasks exceeds or equals a pre-set total utilization limit. The
utilization of the last generated task is then adjusted so that the total generated utilization matches the pre-set
limit.
We require the number of tasks n to be at least the number of processors m. To ensure this, whenever
n < m held for a generated system, a task was chosen at random and replaced by two tasks with half the
utilization of the original one (this process was repeated as necessary). Given a platform and a total task
system utilization, having fewer (more) tasks means having higher (lower) expected per-task utilizations. To
reflect these two extremes, we defined the minimum number of the tasks to be either eight (fewer but heavier
tasks) or 32 (more but lighter tasks) for every considered platform. Also, we selected each task’s execution
requirement uniformly from [5,25] and calculated its period from its utilization and execution requirement.
In all experiments in this section, we varied total utilization within [0,36] by increments of 0.5, and for each
total utilization, we generated 10,000 feasible task sets.
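Two steps of the generation procedure above lend themselves to a short sketch: splitting tasks until a minimum task count is reached, and deriving a period from a utilization and an execution requirement. The code below is only an illustration under stated assumptions. The function names, the starting utilizations, and the fixed seed are hypothetical, the execution requirement is drawn as a continuous value (the text does not say whether it is integral), and the per-task cap derived from (4.3) is not reproduced here.

```python
import random
from fractions import Fraction


def split_until(utils, min_tasks, rng):
    """Replace randomly chosen tasks by two half-utilization tasks until
    at least `min_tasks` utilizations are present (utils must be non-empty)."""
    utils = list(utils)
    while len(utils) < min_tasks:
        idx = rng.randrange(len(utils))
        half = utils.pop(idx) / 2
        utils.extend([half, half])
    return utils


def draw_parameters(u, rng):
    """Draw an execution requirement C uniformly from [5, 25] and derive
    the period as T = C / u."""
    c = rng.uniform(5, 25)
    return c, c / float(u)


rng = random.Random(0)  # hypothetical seed, for repeatability only
# Hypothetical starting utilizations; a real run would draw them under the cap from (4.3).
utils = split_until([Fraction(6), Fraction(4), Fraction(2)], 8, rng)
tasks = [draw_parameters(u, rng) for u in utils]
```

Splitting a task into two halves leaves the total utilization unchanged, which is why the step can be repeated as often as needed.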
We compare the results with EDF-ms (Leontyev and Anderson, 2007a), which also supports processors
of different speeds as reviewed in Section 2.3.3.
Schedulability.
EDF-sh has the same utilization restrictions (i.e., (4.2) and (4.4)) as EDF-ms, but EDF-sh can support
platforms in which speed groups exist with only one processor, while EDF-ms requires each such group to
have at least two processors. For this reason, EDF-sh dominates EDF-ms in terms of SRT schedulability.
Given this provable dominance over EDF-ms, our assessment of schedulability under EDF-sh focuses
on determining the fraction of randomly generated feasible systems (as defined by (4.2) and (4.3)) it can
successfully schedule for every given total utilization and every platform. Figure 4.3 shows the results of
these experiments. More than 87% of the generated systems were SRT-schedulable under EDF-sh. In general,
the smaller the difference among processor speeds, the better the schedulability. This makes sense, since for
identical multiprocessors, EDF-sh is SRT-optimal, like EDF-os. Furthermore, when there are many lighter
tasks instead of few heavier ones, schedulability is quite close to optimal.
Tardiness Bounds.
We also compared tardiness bounds under EDF-sh to those under EDF-ms. Since EDF-ms requires each
speed group to have at least two processors, i.e., it does not apply to π3 and π4, we only computed tardiness
bounds for π1 and π2. We compared EDF-sh and EDF-ms in terms of both maximum absolute tardiness
bounds and maximum relative tardiness bounds, where the latter is defined as the ratio of a task’s tardiness
bound to its period. Figure 4.4 shows absolute tardiness bounds and Figure 4.5 shows relative tardiness
bounds.
In most cases, EDF-sh exhibits significantly lower maximum tardiness bounds than EDF-ms. The only
exception to this is when total utilization is close to overutilizing the platform, and even then, EDF-sh is
never substantially worse.
[Figure 4.3 (plot omitted): schedulability (0% to 100%) versus task system total utilization (0 to 36). Curves:
[1] π1, fewer but heavier tasks; [2] π1, more but lighter tasks; [3] π2, fewer but heavier tasks; [4] π2, more
but lighter tasks; [5] π3, fewer but heavier tasks; [6] π3, more but lighter tasks; [7] π4, fewer but heavier
tasks; [8] π4, more but lighter tasks.]
Figure 4.3: Schedulability under EDF-sh.
[Figure 4.4 (plot omitted): average maximum absolute tardiness bound (0 to 60) versus task system total
utilization (0 to 36). Curves: [1] EDF-ms, π1, fewer but heavier tasks; [2] EDF-ms, π1, more but lighter tasks;
[3] EDF-ms, π2, fewer but heavier tasks; [4] EDF-ms, π2, more but lighter tasks; [5] EDF-sh, π1, fewer but
heavier tasks; [6] EDF-sh, π1, more but lighter tasks; [7] EDF-sh, π2, fewer but heavier tasks; [8] EDF-sh, π2,
more but lighter tasks.]
Figure 4.4: Absolute tardiness bounds of EDF-ms and EDF-sh.
[Figure 4.5 (plot omitted): average maximum relative tardiness bound (0 to 20) versus task system total
utilization (0 to 36). Curves [1] to [8] are labeled as in Figure 4.4.]
Figure 4.5: Relative tardiness bounds of EDF-ms and EDF-sh.
4.3 EDF-tu
In this section, we present the second algorithm, EDF-tu. In contrast to EDF-sh, EDF-tu is SRT-optimal
and may be HRT-optimal under certain settings. We first discuss feasible task assignments, which are
an important concept underlying these optimality results (Section 4.3.1). Next, we present algorithm EDF-tu
by describing its assignment phase and execution phase (Section 4.3.2). Then, we prove both the HRT- and
SRT-optimality of EDF-tu (Section 4.3.3), show the tightness of our task assignment scheme (Section 4.3.4),
and present an evaluation (Section 4.3.5).
4.3.1 Feasible Assignments
Most semi-partitioned scheduling algorithms are defined by specifying separate assignment and execution
phases. In the former, per-processor shares are defined offline for each task, and fixed tasks are distinguished
from migrating ones. In the latter, an actual schedule is produced at runtime, based on the task share
assignments. In this section, we explore the problem of obtaining share assignments. We show that issues
arise in the case of uniform platforms that have not been considered before.
In addressing such issues, we will need to examine situations where some number of tasks in the
assignment process have been assigned as fixed. We let σ^f_i denote the sum of the utilizations of the fixed
tasks on processor si, i.e.,
\[ \sigma^f_i = \sum_{\substack{\tau_k \text{ is a fixed task} \\ \text{on processor } s_i}} u_k. \tag{4.28} \]
We define the residual capacity (i.e., the currently available capacity) of processor si as si − σ^f_i.
In most prior work on semi-partitioned scheduling on identical platforms, a greedy assignment method is
used wherein the currently considered task is assigned as fixed if possible. Consider the following example.
Example 4.3. Three tasks {τ1 = (2,3), τ2 = (2,3), τ3 = (2,3)} are to be scheduled on two unit-speed
processors. We greedily assign the first two tasks as fixed, and then require the remaining one to migrate. This
results in the share assignment shown in Figure 4.6 (a). As seen in Figure 4.6 (b), we can easily determine a
schedule corresponding to this assignment such that all deadlines are met. ♦
[Figure 4.6 (diagram omitted): (a) assignment, in which τ1 and τ2 are fixed on separate unit-capacity processors
with a share of 2/3 each and τ3 receives a share of 1/3 on each processor; (b) schedule on Processor 1 and
Processor 2 over the time interval [0, 3), after which the schedule repeats.]
Figure 4.6: Assignment and schedule for Example 4.3.
As the next example shows, a greedy assignment strategy can be problematic on a uniform platform.
Example 4.4. Consider scheduling the two tasks {τ1 = (2,1), τ2 = (2,1)} on the two-processor uniform
platform π = {s1 = 3, s2 = 1}. When we first consider assigning τ1, processor s1 has enough capacity for it,
and if we assign τ1 there, the residual capacity of the system matches the utilization of τ2. The task share
allocations must be as shown in Figure 4.7 (a). The allocation to τ2 implies that it must execute in parallel as
shown in Figure 4.7 (b), so this assignment is infeasible. However, the original system is feasible, as seen in
insets (c) and (d) of Figure 4.7. ♦
We now determine conditions for ensuring that a task assignment is feasible.
After some tasks have been fixed, let {zi} denote the residual capacities of the processors on platform
π, indexed in non-increasing order. Note that the indexing of {zi} may differ from that of {si}, i.e., zi does
not necessarily correspond to the residual capacity on si. Let p(i) denote the index of the processor with the
remaining capacity zi, i.e., zi is the remaining capacity of the processor of speed s_{p(i)}: zi + σ^f_{p(i)} = s_{p(i)}.
Also, let Zk = ∑_{i=1}^{k} zi.
Theorem 4.3. Any task set that is feasible on the fully available platform π′ = {s′1 = z1, s′2 = z2, . . . , s′m = zm}
can also be correctly scheduled using the residual capacities {zi} of platform π. (In a correct schedule, all
deadlines are met and all requirements of the sporadic model are respected.)
Proof. We prove this theorem by transforming an arbitrary schedule S′ on π′ to a corresponding schedule S
on π such that if S′ is correct, then S is also correct. Moreover, in S, only a capacity of zi is utilized on s_{p(i)}
for each i.
[Figure 4.7 (diagram omitted): four insets for the tasks τ1 = (2,1) and τ2 = (2,1) on processors S1 = 3 and
S2 = 1: (a) infeasible assignment; (b) illegal schedule, annotated "Task executing in parallel is not allowed!";
(c) feasible assignment; (d) correct schedule. Each schedule is shown over one unit of time and then repeats.]
Figure 4.7: Assignment and schedule for Example 4.4. The width of each rectangle represents the speed of its
corresponding processor.
We split the time line of S′ into slices of width ∆ such that, on any processor of π′, all preemptions,
migrations, job releases, and job deadlines occur on slice boundaries. This requirement can be met by
choosing ∆ small enough. We construct S on a per-slice basis: within each slice, we schedule in S exactly
the same jobs as scheduled in S′ and on the corresponding processors (s′i in S′ corresponds to s_{p(i)} in S).
However, in S, the job (if any) that executes on processor s′i in S′ is scheduled within the first
(zi/s_{p(i)}) · ∆ time units of the slice. Because s_{p(i)} · (zi/s_{p(i)}) · ∆ = zi · ∆, that job receives the same
amount of execution in the slice in S as it does in S′. It is easy to see that
the resulting schedule S is correct if S′ is. In particular, all deadlines will be met and all requirements of the
sporadic model are respected (including no intra-task parallelism). Also, it is straightforward to see that only
a capacity of zi is utilized on s_{p(i)} for each i.
By Theorem 4.3, after some tasks have been assigned as fixed, the platform defined by the resulting
residual capacities can be viewed as a fully available platform as far as the feasibility of the remaining,
unassigned tasks is concerned.
In the rest of this section, let τ denote the set of the remaining, unassigned tasks, and let π denote the
platform defined by the residual processor capacities. Also, let {ui} denote the utilizations of the remaining
tasks, and assume they are indexed in non-increasing order. Let Uk = ∑_{i=1}^{k} ui. Define the total utilization of
the remaining tasks as Uτ and the total residual capacity of the platform as Zπ.
Theorem 4.4. τ is feasible on π if the following conditions hold.
\[ U_\tau \le Z_\pi \tag{4.29} \]
\[ U_k \le Z_k \quad \text{for } k = 1, 2, \ldots, m-1 \tag{4.30} \]
Proof. Follows from Theorem 4.3 and the feasibility condition for uniform platforms, (4.2) and (4.3).
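The test in Theorem 4.4 is mechanical: sort both lists in non-increasing order and compare prefix sums. The sketch below is a minimal illustration; the function name is hypothetical, and it assumes that when fewer than m tasks remain, the missing utilizations count as zero.

```python
from fractions import Fraction
from itertools import accumulate


def satisfies_4_29_and_4_30(utils, residuals):
    """Check conditions (4.29) and (4.30) for the remaining task utilizations
    and the residual processor capacities."""
    u = sorted(utils, reverse=True)
    z = sorted(residuals, reverse=True)
    m = len(z)
    if sum(u) > sum(z):                       # condition (4.29)
        return False
    u = u + [0] * max(0, m - len(u))          # assumption: absent tasks contribute 0
    U = list(accumulate(u))
    Z = list(accumulate(z))
    # condition (4.30) for k = 1, ..., m-1 (indices are 0-based here)
    return all(U[k] <= Z[k] for k in range(m - 1))


# Example 4.4 before any task is fixed: both conditions hold.
before = satisfies_4_29_and_4_30([2, 2], [3, 1])
# Example 4.4 after greedily fixing tau_1 on s1: residuals are {1, 1}.
after = satisfies_4_29_and_4_30([2], [1, 1])
```

Here `before` is true and `after` is false. A false result only means that feasibility is no longer guaranteed by the theorem, since the conditions are sufficient rather than necessary.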
Note that just “if” is stated in Theorem 4.4 and “only if” cannot be asserted, i.e., (4.29) and (4.30) are
only a sufficient condition for feasibility. This is because (4.30) is not necessary for feasibility, although the
similar condition (4.3) is. We show this by the following counterexample.
Example 4.5. Consider scheduling a single task τ1 = (2,1) on two unit-speed processors. This system is
clearly infeasible since u1 = 2> 1 holds, which violates (4.3).
Now, consider scheduling the same task τ1 on two other processors, where both processors have a residual
capacity of 1, i.e., z1 = z2 = 1. Then, this system violates (4.30), but it could be feasible. For example,
assuming the two processors both have an initial speed of 2, Figure 4.8 is a correct schedule for this system.
♦
[Figure 4.8 (diagram omitted): a schedule over the time interval [0, 1), with a mark at 1/2, on s1 = 2 (residual
capacity z1 = 1) and s2 = 2 (residual capacity z2 = 1). τ1 executes on both processors, and the rest of each
processor is marked as reserved capacity. The schedule then repeats.]
Figure 4.8: A correct schedule for Example 4.5.
Nonetheless, by Theorem 4.4, we have a sufficient test to check if a task assignment is guaranteed to
preserve the feasibility of the system. Next, we show that the following scheme can guarantee that any
feasible system can have at most m migrating tasks and still be feasible.
• Consider tasks from lightest to heaviest by utilization.
• Use the best-fit bin-packing heuristic to assign as many tasks as possible as fixed.
The guarantee mentioned above follows from the following lemma. (Note that the task τn mentioned in
the lemma definitely can be assigned as fixed if (4.29) and (4.30) hold.)
Lemma 4.4. Let n denote the number of tasks in τ and assume n≥ m+1. If τ and the residual processor
capacities pi satisfy (4.29) and (4.30), then after using the best-fit heuristic to assign the lightest task in τ
(i.e., τn) as fixed, the remaining task set τ ′ and residual processor capacities pi ′ must satisfy (4.29) and (4.30)
as well.
Proof. We prove this lemma by contradiction by showing that any violation of (4.29) or (4.30) for τ ′ and pi ′
implies a violation of (4.29) or (4.30) for τ and pi .
Since we consider tasks from lightest to heaviest, the order of tasks in τ′ is the same as that in τ except
for the absence of the lightest one, τn. Thus,
\[ u'_i = u_i \quad \text{for } 1 \le i \le n-1. \tag{4.31} \]
Case 1: τ′ and π′ violate (4.29). Since the only change is that τn is assigned, the total remaining utilization
decreases by un and the total residual capacity decreases by un too, i.e., U′_{τ′} = Uτ − un and Z′_{π′} = Zπ − un.
Thus, τ′ and π′ violating (4.29) implies that τ and π violate (4.29) as well.
Case 2: τ′ and π′ violate (4.30). In this case, let zγ be the residual capacity of the processor on which τn is
fixed. Since {zi} is indexed non-increasingly, without loss of generality, we can assume that the best-fit
bin-packing heuristic always chooses the highest-indexed zi among those with equal values (if it does not, we
can re-index them, which will not change either {zi} or the resulting {z′i}), i.e., zγ > z_{γ+1} if γ < m. Thus, as
a result of the best-fit heuristic, we have
\[ u_n > z_i \quad \text{for any } i > \gamma. \tag{4.32} \]
Moreover, the assignment of τn will not alter the indices of the largest γ−1 capacities in π, i.e.,
\[ z'_i = z_i \quad \text{for any } i \le \gamma - 1. \tag{4.33} \]
Also, other than zγ, the relative ordering in {zi} is preserved in {z′i} as well. That is, letting φ denote the new
index of zγ in π′, i.e., z′φ = zγ − un, where γ ≤ φ ≤ m, {z′i} is
\[ \{z_1, \ldots, z_{\gamma-1}, z_{\gamma+1}, \ldots, z_{\phi}, z_{\gamma} - u_n, z_{\phi+1}, \ldots, z_m\}. \tag{4.34} \]
Note that, if φ = γ, then the sequence z_{γ+1}, . . . , zφ is empty; similarly, φ = m implies that the sequence
z_{φ+1}, . . . , zm is empty.
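From (4.34),
\[ z'_i = \begin{cases}
z_i & \text{if } 1 \le i \le \gamma-1 \text{ or } i \ge \phi+1, \\
z_{i+1} & \text{if } \gamma \le i \le \phi-1, \\
z_\gamma - u_n & \text{if } i = \phi.
\end{cases} \tag{4.35} \]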
Case 2.1: τ′ and π′ violate (4.30) at k such that k ≤ γ−1. By (4.31) and (4.35), U′k = Uk and Z′k = Zk. Thus,
τ and π violate (4.30) at k as well.
Case 2.2: τ′ and π′ violate (4.30) at k such that k ≥ γ. That is,
\[ U'_k > Z'_k. \tag{4.36} \]
First, we show the following inequality holds in Case 2.2 by considering two sub-cases.
\[ Z'_k \ge \left( \sum_{i=1}^{k} z_i \right) - u_n. \tag{4.37} \]
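Case 2.2.1: γ ≤ k ≤ φ−1.
\begin{align*}
Z'_k &= \left( \sum_{i=1}^{\gamma-1} z'_i \right) + \left( \sum_{i=\gamma}^{k-1} z'_i \right) + z'_k \\
&\ge \{\text{since } k \le \phi-1 \text{ and } \{z'_i\} \text{ is in non-increasing order}\} \\
&\quad \left( \sum_{i=1}^{\gamma-1} z'_i \right) + \left( \sum_{i=\gamma}^{k-1} z'_i \right) + z'_\phi \\
&= \{\text{by (4.35)}\} \\
&\quad \left( \sum_{i=1}^{\gamma-1} z_i \right) + \left( \sum_{i=\gamma}^{k-1} z_{i+1} \right) + (z_\gamma - u_n) \\
&= \{\text{simplifying}\} \\
&\quad \left( \sum_{i=1}^{k} z_i \right) - u_n.
\end{align*}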
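Case 2.2.2: k ≥ φ.
\begin{align*}
Z'_k &= \left( \sum_{i=1}^{\gamma-1} z'_i \right) + \left( \sum_{i=\gamma}^{\phi-1} z'_i \right) + z'_\phi + \left( \sum_{i=\phi+1}^{k} z'_i \right) \\
&= \{\text{by (4.35)}\} \\
&\quad \left( \sum_{i=1}^{\gamma-1} z_i \right) + \left( \sum_{i=\gamma}^{\phi-1} z_{i+1} \right) + (z_\gamma - u_n) + \left( \sum_{i=\phi+1}^{k} z_i \right) \\
&= \{\text{simplifying}\} \\
&\quad \left( \sum_{i=1}^{k} z_i \right) - u_n.
\end{align*}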
From these sub-cases, we can conclude that (4.37) holds. By (4.36) and (4.37), we have
\[ U'_k + u_n > \sum_{i=1}^{k} z_i. \tag{4.38} \]
By the condition of Case 2.2, k ≥ γ, and by (4.32), we have un > zi for any i ≥ k+1 > γ, which implies
\[ (m-k)\,u_n > \sum_{i=k+1}^{m} z_i. \tag{4.39} \]
By (4.38), (4.39), and the definition of Zπ,
\[ U'_k + (m-k+1)\,u_n > Z_\pi. \tag{4.40} \]
Finally, we have
\begin{align*}
U_\tau &= U_k + \sum_{i=k+1}^{n} u_i \\
&= \{\text{by (4.31)}\} \quad U'_k + \sum_{i=k+1}^{n} u_i \\
&\ge \{\text{since } \{u_i\} \text{ is in non-increasing order}\} \quad U'_k + (n-k)\,u_n \\
&\ge \{\text{since } n \ge m+1\} \quad U'_k + (m-k+1)\,u_n. \tag{4.41}
\end{align*}
By (4.40) and (4.41), Uτ > Zπ holds, i.e., τ and π violate (4.29).
In the remainder of this section for EDF-tu, we say that an assignment of a task as fixed to a processor is
legal if and only if (4.29) and (4.30) hold for the remaining, unassigned tasks.
Theorem 4.5. For any feasible task system, if we continue to assign tasks as fixed as long as legal assignments
can be made using the best-fit heuristic, with tasks considered from lightest to heaviest by utilization, then at
most m tasks will remain unassigned.
Proof. By Theorem 4.4 and Lemma 4.4, we can continue to make legal assignments at least until the number
of unassigned tasks is m.
4.3.2 Algorithm EDF-tu
We now describe our new scheduling algorithm EDF-tu by considering its assignment and execution
phases separately.
Assignment phase. The assignment phase must not only distinguish fixed tasks from migrating ones, but
also determine the per-processor share allocations for each migrating task. As for determining which tasks
should be fixed, Theorem 4.5 suggests the way forward: we simply consider tasks from lightest to heaviest
by utilization, and keep assigning tasks as fixed via the best-fit heuristic until all of them are assigned or we
encounter a task that cannot be so assigned legally. The remaining m′ unassigned tasks will be migrating
tasks. By Theorem 4.5, m′ ≤ m. Also, by Theorems 4.3 and 4.5, the set of migrating tasks is feasible on the
resulting platform as defined by the residual processor capacities. In fact, this set of tasks is feasible on the
sub-platform comprised of the m′ processors with the largest residual capacities.
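The fixed-task part of the assignment phase can be sketched in a few lines: consider tasks from lightest to heaviest, choose a best-fit processor, and stop at the first assignment that is not legal. This is an illustrative sketch rather than the dissertation's implementation. The function names are hypothetical, ties in the best-fit choice are broken by lowest processor index, and legality is checked at every step (by Lemma 4.4 the check cannot fail while more than m tasks remain).

```python
from fractions import Fraction


def legal(utils, residuals):
    """Conditions (4.29) and (4.30) for the unassigned tasks."""
    u = sorted(utils, reverse=True)
    z = sorted(residuals, reverse=True)
    if sum(u) > sum(z):
        return False
    u += [0] * max(0, len(z) - len(u))   # absent tasks contribute 0
    return all(sum(u[:k]) <= sum(z[:k]) for k in range(1, len(z)))


def assign_fixed(utils, speeds):
    """Return (fixed, migrating, residual): fixed maps task index to processor
    index, migrating lists the task indices left unassigned."""
    residual = list(speeds)
    order = sorted(range(len(utils)), key=lambda i: utils[i])  # lightest first
    fixed = {}
    for pos, i in enumerate(order):
        fits = [p for p in range(len(residual)) if residual[p] >= utils[i]]
        if not fits:
            break
        p = min(fits, key=lambda q: residual[q])               # best fit
        trial = residual[:p] + [residual[p] - utils[i]] + residual[p + 1:]
        remaining = [utils[j] for j in order[pos + 1:]]
        if not legal(remaining, trial):
            break                                              # keep last legal state
        residual = trial
        fixed[i] = p
    migrating = [i for i in order if i not in fixed]
    return fixed, migrating, residual


# Example 4.4: no task can be fixed legally, so both tasks migrate.
fixed_44, migrating_44, _ = assign_fixed([2, 2], [3, 1])
# Example 4.3: one task is fixed legally, and the other two migrate.
u = Fraction(2, 3)
fixed_43, migrating_43, residual_43 = assign_fixed([u, u, u], [1, 1])
```

For Example 4.3 the sketch fixes only one task, whereas the greedy assignment in that example fixes two. Both are consistent with the text, because (4.29) and (4.30) are sufficient but not necessary.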
In order to determine per-processor share allocations for migrating tasks, and how such tasks are
scheduled alongside fixed ones, we construct a processor allocation table. This table indicates which task
may execute on which processor within an interval of time, or frame, of length F . As shown later in
Section 4.3.3, if HRT-schedulability is the goal, then the frame size F must meet a certain constraint, but this
constraint is not required if only SRT-schedulability is required.
We construct the processor allocation table A via a two-step process (which is illustrated via an example
below). In the first step, we construct a processor allocation table A′ for the m′ migrating tasks on a
hypothetical platform π′ = {s′1 = z1, s′2 = z2, . . . , s′m′ = zm′} by applying the Level Algorithm to schedule
the job set J with execution costs {u1 · F, u2 · F, . . . , um′ · F} on π′. We obtain the table A′ by allocating
processor s′i to task τk in each sub-interval where the corresponding job of cost uk · F executes on processor
s′i. The Level Algorithm ensures that the schedule for J is free of intra-job parallelism. This implies that
task allocations in the table A′ are free of intra-task parallelism. Also, by Theorems 2.1, 4.4, and 4.5, the
makespan of the schedule for J is at most F. This implies that A′ gives task allocations over an interval of
length at most F as well. The total allocation recorded for each migrating task τk in A′ is uk · F.
In the second step, we obtain the final table A by integrating allocations for fixed tasks into A′. Examining
the task allocations recorded in A′, we say that the sub-interval [t1, t2) is a maximal non-preemptive
sub-interval on processor s′i if s′i is allocated to the same migrating task throughout [t1, t2) and s′i is not
allocated to that task either immediately before t1 or at t2. We construct the processor allocation table A for
the real physical platform π from A′ by examining all such maximal non-preemptive sub-intervals. In
particular, if the migrating task τk is allocated in A′ to processor s′i throughout the maximal non-preemptive
sub-interval [t1, t2), then we allocate processor s_{p(i)} to τk in A throughout the first zi/s_{p(i)} fraction of the
maximal non-preemptive sub-interval, i.e., [t1, t1 + (zi/s_{p(i)}) · (t2 − t1)). We allocate the remainder of the
maximal non-preemptive sub-interval, i.e., [t1 + (zi/s_{p(i)}) · (t2 − t1), t2), to fixed tasks on s_{p(i)}. We denote
this in the table by indicating that the sub-interval [t1 + (zi/s_{p(i)}) · (t2 − t1), t2) is allocated to σ^f_{p(i)}. If
m′ < m, then A is extended to incorporate all processors by fully allocating the processors with residual
capacities z_{m′+1}, . . . , zm to the fixed tasks assigned to those processors.
The pseudo-code for the assignment process is given in Algorithm 2. The following example provides an
illustration.
Example 4.6. Suppose that after all fixed tasks have been identified, we are left with four migrating tasks
with utilizations {3, 3, 2.125, 1.875} to be scheduled on four processors with residual capacities {4, 3,
2, 1}. Further, suppose we are using a frame size of F = 4. Then the schedule resulting from the Level
Algorithm within a frame is identical to Example 2.1. Figure 4.9 (a) shows the resulting table A′, which
provides allocations only for migrating tasks. To obtain the final table A for these four processors, we must
integrate fixed tasks. To illustrate this, suppose that the processor with residual capacity z4 = 1 corresponds
to a physical processor with speed sp(4) = 2, i.e., half of this processor’s capacity is reserved for fixed tasks.
Then the allocations on this processor within each frame will be as depicted in Figure 4.9 (b). ♦
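The second step of the table construction reduces to splitting each maximal non-preemptive sub-interval in proportion to z/s. The sketch below illustrates that split for the processor in Example 4.6 with z4 = 1 and sp(4) = 2. The function name and the sub-interval [0, 1) are hypothetical, since the exact sub-interval boundaries of the example are given only in the figure.

```python
from fractions import Fraction


def split_subinterval(t1, t2, z, s):
    """Split [t1, t2) on a physical processor of speed s and residual capacity z.
    Returns (migrating_part, fixed_part) as half-open intervals."""
    cut = t1 + Fraction(z) / Fraction(s) * (t2 - t1)
    return (t1, cut), (cut, t2)


# Processor with residual capacity 1 and physical speed 2, sub-interval [0, 1).
migrating_part, fixed_part = split_subinterval(0, 1, z=1, s=2)
# Work done for the migrating task at physical speed 2 during its part:
work = 2 * (migrating_part[1] - migrating_part[0])
```

The migrating task occupies the first half of the sub-interval, and at speed 2 it completes the same amount of work that a speed-1 processor would complete over the whole sub-interval.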
Algorithm 2 EDF-tu assignment phase
    initially σ^f_p := 0 for all p;
    index tasks in non-increasing-utilization order;
    index processors in non-increasing-speed order;
    /* The first (n − m) lightest tasks are guaranteed to be fixed via the best-fit
       heuristic. If n = m, then this for loop is skipped. */
    for i := n downto m+1 do
        select k such that s_k − σ^f_k is minimal while at least u_i;
        σ^f_k := σ^f_k + u_i;
    end
    /* Try to continue fixing tasks until the fixing step is not legal or all tasks
       are fixed. */
    m′ := m;
    isLegal := 1;
    repeat
        select k such that s_k − σ^f_k is minimal while at least u_{m′};
        last_σ^f_k := σ^f_k;
        σ^f_k := σ^f_k + u_{m′};
        for j := 1 to m′ do
            if ∑ j largest (s_p − σ^f_p) < ∑ j largest u_i then
                isLegal := 0;
            end
        end
        last_m′ := m′;
        m′ := m′ − 1;
    until isLegal = 0 or m′ = 0;
    if isLegal = 0 then
        /* If the last fixing step is not legal, restore to last feasible assignment. */
        m′ := last_m′;
        σ^f_k := last_σ^f_k;
        /* Now, we have m′ migrating tasks to be scheduled on m′ processors using a
           frame-based schedule. */
        for j := 1 to m′ do
            z_j := the jth largest (s_p − σ^f_p);
        end
        Use the Level Algorithm to construct the processor allocation table for a frame;
    else
        In this case, there is no migrating task, and a valid partitioned schedule can be generated by
        applying the uniprocessor EDF scheduler on each processor;
    end
[Figure 4.9 (diagram omitted): (a) example for deriving A′, showing the allocations of τ1 to τ4 on processors
with z1 = 4, z2 = 3, z3 = 2, and z4 = 1 over one frame, with time marks at 0, (1/4)F, (1/2)F, (3/4)F, and F;
(b) example on one processor (sp(4) = 2, z4 = 1) for constructing A from A′, in which the sub-intervals
allocated to migrating tasks alternate with sub-intervals allocated to σ^f_{p(4)}.]
Figure 4.9: EDF-tu execution phase illustration for Example 4.6.
Execution phase. In the execution phase, the processor allocation table A is consulted on a frame-by-frame
basis at runtime to determine which task may execute on which processor at any given time. In particular, the
following rules are applied at any time t ∈ [k · F, (k+1) · F), where k ∈ Z+.
• If processor s′i is allocated to the migrating task τk at time t mod F in A, and if τk has an unfinished job
at time t, then the earliest-released such job is scheduled on processor s′i.
• If no such job exists, or if processor s′i is either unallocated or allocated to σ^f_i at time t mod F in
A, then an unfinished job of a fixed task on s′i is scheduled on processor s′i at time t if one exists. If
multiple such jobs exist, then the one with the earliest deadline is selected. If no such job exists, then
processor s′i is idled.
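The two execution-phase rules amount to a small per-processor dispatch decision. The sketch below is one possible rendering under assumed data structures, none of which come from the text: the table lookup at t mod F is taken as already done and passed in as `slot_owner`, unfinished migrating jobs are kept per task as (release, deadline) pairs, and unfinished fixed jobs on this processor are kept as (deadline, job id) pairs.

```python
def pick_job(slot_owner, pending_migrating, pending_fixed):
    """Decide what one processor runs at the current instant.

    slot_owner: id of the migrating task the table allocates now, or None if the
        slot is unallocated or allocated to the fixed tasks.
    pending_migrating: dict mapping task id to a list of (release, deadline) pairs.
    pending_fixed: list of (deadline, job_id) pairs for this processor.
    """
    # Rule 1: the allocated migrating task runs its earliest-released unfinished job.
    if slot_owner is not None and pending_migrating.get(slot_owner):
        release, _deadline = min(pending_migrating[slot_owner])
        return ("migrating", slot_owner, release)
    # Rule 2: otherwise fixed tasks run, earliest deadline first.
    if pending_fixed:
        deadline, job_id = min(pending_fixed)
        return ("fixed", job_id, deadline)
    # No eligible job: the processor idles.
    return None


choice_a = pick_job("t1", {"t1": [(4, 7), (1, 4)]}, [(9, "a")])
choice_b = pick_job("t1", {"t1": []}, [(9, "a"), (5, "b")])
choice_c = pick_job(None, {}, [])
```

In the second call the slot belongs to a migrating task with no unfinished job, so the slot falls through to the fixed task with the earliest deadline.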
According to the sporadic task model, it is possible for a task to release a job within a frame, i.e., at some
time k ·F + r, where k ∈ Z+ and 0< r < F . Such a job will receive exactly the same allocation over the next
F time units as it would receive had it been released at a frame boundary. As a result, EDF-tu guarantees the
following two key properties.
Property 4.7. Within any time interval of length F, the processor supply guaranteed to a migrating task τi is
ui · F. Therefore, within any time interval of length L, the processor supply guaranteed to a migrating task τi
is at least ⌊L/F⌋ · ui · F.
Property 4.8. Within any time interval of length F, the supply guaranteed to the set of fixed tasks on
processor sp is σ^f_p · F. Therefore, within any time interval of length L, the supply guaranteed to the set of all
fixed tasks on processor sp is at least ⌊L/F⌋ · σ^f_p · F.
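Both properties share the same lower bound, ⌊L/F⌋ · rate · F, where the rate is ui for a migrating task or σ^f_p for the fixed tasks on sp. The sketch below evaluates that bound; the function name and the numeric values are hypothetical.

```python
from fractions import Fraction
from math import floor


def guaranteed_supply(length, frame, rate):
    """Lower bound on supply within any interval of the given length:
    floor(length / frame) * rate * frame."""
    return floor(Fraction(length) / Fraction(frame)) * rate * frame


# Interval of length 10 with F = 4 and rate 3/4: two whole frames fit.
supply_partial = guaranteed_supply(10, 4, Fraction(3, 4))
# Interval of length 12 = 3 * F: the bound equals rate * length exactly.
supply_aligned = guaranteed_supply(12, 4, Fraction(3, 4))
```

When the interval length is a multiple of F, the bound is tight at rate times length, which is the situation exploited when F divides every period.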
4.3.3 Optimality
We now show that EDF-tu is HRT-optimal, provided the frame size, F , meets a certain requirement. We
also show that EDF-tu is SRT-optimal for any choice of F , with the tardiness of any job being at most F . As
F decreases, preemption frequencies increase, so the choice of F is a tradeoff between temporal guarantees
and run-time overheads.
4.3.3.1 HRT Optimality
HRT optimality is dealt with in the following theorem.
Theorem 4.6. If the frame size F divides the periods of all tasks, then all deadlines will be met.
Proof. Since the frame size F divides the periods of all tasks, we can represent task periods as
Ti = ki ·F, ki ∈ Z+. (4.42)
Migrating tasks. By Property 4.7, within any interval of length Ti, a migrating task τi is guaranteed supply
of at least ⌊Ti/F⌋ · ui · F = ki · ui · F = Ti · ui = Ci. This implies that no job of any migrating task will miss a
deadline.
Fixed tasks. The proof of this case utilizes the following claim.
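Claim 4.1. For real numbers a, b > 0 and x ∈ Z+, ⌊a/(x·b)⌋ ≤ (1/x) · ⌊a/b⌋.
Proof. Letting y = ⌊a/b⌋ and c = a − y·b, we have a = y·b + c, where y ∈ Z and 0 ≤ c < b. The
latter implies 0 ≤ c/b < 1. Hence, because x, y ∈ Z, ⌊(y + c/b)/x⌋ = ⌊y/x⌋. Thus, we have
\[ \frac{1}{x} \left\lfloor \frac{a}{b} \right\rfloor = \frac{y}{x} \ge \left\lfloor \frac{y}{x} \right\rfloor
   = \left\lfloor \frac{y + c/b}{x} \right\rfloor = \left\lfloor \frac{y \cdot b + c}{x \cdot b} \right\rfloor
   = \left\lfloor \frac{a}{x \cdot b} \right\rfloor. \]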
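Claim 4.1 is easy to sanity-check numerically with exact rational arithmetic. The sketch below is not a proof and not part of the original argument; the sampling ranges and the seed are arbitrary assumptions.

```python
from fractions import Fraction
from math import floor
import random


def claim_holds(a, b, x):
    """Check floor(a / (x*b)) <= (1/x) * floor(a / b) for a, b > 0 and integer x >= 1."""
    return floor(a / (x * b)) <= Fraction(floor(a / b), x)


rng = random.Random(1)  # arbitrary seed, for repeatability only
samples = [
    (Fraction(rng.randint(1, 500), rng.randint(1, 20)),   # a > 0
     Fraction(rng.randint(1, 50), rng.randint(1, 20)),    # b > 0
     rng.randint(1, 12))                                  # x in Z+
    for _ in range(10_000)
]
all_ok = all(claim_holds(a, b, x) for a, b, x in samples)
```

Using `Fraction` avoids floating-point rounding near integer boundaries, where a floor comparison would otherwise be unreliable.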
We now dispense with the case of fixed tasks by contradiction. Let td be the first time a job of any fixed
task on processor sp misses its deadline, and let t0 be the latest time instant before td that is idle for fixed
tasks on processor sp, i.e., all jobs of fixed tasks on processor sp released earlier than t0 have completed by t0
and such a job is released at t0. Within the time interval [t0, td), the demand due to the set of fixed tasks on
processor sp is at most
∑
τi is a fixed task
on processor sp
⌊
td− t0
Ti
⌋
·Ci
= {by (4.1)}
∑
τi is a fixed task
on processor sp
⌊
td− t0
Ti
⌋
·Ti ·ui
= {by (4.42)}
∑
τi is a fixed task
on processor sp
⌊
td− t0
ki ·F
⌋
· ki ·F ·ui
≤ {by Claim 4.1}
∑
τi is a fixed task
on processor sp
1
ki
·
⌊
td− t0
F
⌋
· ki ·F ·ui
= {simplifying}⌊
td− t0
F
⌋
·F · ∑
τi is a fixed task
on processor sp
ui
87
= {by (4.28)}⌊
td− t0
F
⌋
·σ fp ·F.
By Property 4.8, within the time interval [t0, td), the supply guaranteed to the set of fixed tasks on processor
sp is at least ⌊
td− t0
F
⌋
·σ fp ·F.
This implies that a deadline is not missed at time td as assumed.
By Theorem 4.6, to guarantee HRT optimality, the frame size cannot exceed the greatest common divisor
(gcd) of all task periods. The gcd could be quite small for some systems (e.g., if at least two periods are
relatively prime), yielding high run-time overheads. However, for some systems (e.g., harmonic ones), the
frame could be of a reasonable size, yielding acceptable overheads.
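The effect of the period set on the largest admissible frame size is easy to see numerically. The sketch below assumes integer periods, and the period values and function name are hypothetical.

```python
from functools import reduce
from math import gcd


def max_hrt_frame(periods):
    """Largest frame size that divides every (integer) period."""
    return reduce(gcd, periods)


harmonic = max_hrt_frame([10, 20, 40])   # harmonic periods
coprime = max_hrt_frame([7, 10, 30])     # two periods are relatively prime
```

The harmonic set admits a frame as large as its smallest period, while the second set forces the frame down to a single time unit.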
4.3.3.2 SRT Optimality
SRT optimality is dealt with in the following theorem.
Theorem 4.7. Given any frame size F > 0, no job will have tardiness exceeding F .
Proof. As before, we consider migrating and fixed tasks separately.
Migrating tasks. Consider the jth job of the migrating task τi, denoted τi, j. Let td be the deadline of τi, j and
let t0 be the latest idle instant for task τi at or before the release of τi, j. Also, let tF be the first time instant at
or after td such that tF − t0 is a multiple of F . Then, we have tF − td < F .
The number of jobs with deadlines at or before time td that τi can release at or after time t0 is at most
⌊(td − t0)/Ti⌋ ≤ ⌊(tF − t0)/Ti⌋. The resulting demand is at most ⌊(tF − t0)/Ti⌋ · Ci ≤ (tF − t0) · ui. Because
(tF − t0) is a multiple of F, by Property 4.7, τi is guaranteed a supply of (tF − t0) · ui within [t0, tF). This
implies that τi,j completes by time tF. Thus, no job of a migrating task will have tardiness exceeding F.
Fixed tasks. Let td be the deadline of the jth job τi, j of the fixed task τi and t0 be the latest idle instant for
fixed tasks on processor sp at or before the release of τi, j. Also, let tF be the first time instant at or after td
such that tF − t0 is a multiple of F . Then, we have tF − td < F .
The number of jobs with deadlines at or before time td that a fixed task τk on processor sp can release at
or after time t0 is at most ⌊(td − t0)/Tk⌋ ≤ ⌊(tF − t0)/Tk⌋, so the total demand due to such jobs is at most
\[ \sum_{\substack{\tau_k \text{ is a fixed task} \\ \text{on processor } s_p}} \left\lfloor \frac{t_F - t_0}{T_k} \right\rfloor \cdot C_k
   \le (t_F - t_0) \cdot \sum_{\substack{\tau_k \text{ is a fixed task} \\ \text{on processor } s_p}} u_k
   = (t_F - t_0) \cdot \sigma^f_p. \]
Because (tF − t0) is a multiple of F, by Property 4.8, the fixed tasks on processor sp are guaranteed a supply
of (tF − t0) · σ^f_p within [t0, tF). This implies that τi,j completes by tF. Thus, no job of a fixed task will have
tardiness exceeding F.
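Both halves of the proof rely on the instant tF, the first time at or after td for which tF − t0 is a multiple of F. The sketch below computes it for hypothetical values of t0, td, and F; the function name is also an assumption.

```python
from fractions import Fraction
from math import ceil


def frame_aligned_instant(t0, td, frame):
    """First instant t_F >= td such that (t_F - t0) is a multiple of `frame`."""
    t0, td, frame = Fraction(t0), Fraction(td), Fraction(frame)
    return t0 + ceil((td - t0) / frame) * frame


# t0 = 3, deadline td = 10, F = 4: t_F = 11, so the job is at most 1 time unit late.
t_f = frame_aligned_instant(3, 10, 4)
# If td - t0 is already a multiple of F, then t_F = td.
t_f_aligned = frame_aligned_instant(3, 11, 4)
```

In both cases tF − td is strictly less than F, which is the gap that bounds the tardiness.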
4.3.4 Alternate Assignment Strategies
Given that at most m tasks are migrating under EDF-tu, and these tasks are the heaviest by utilization,
two natural questions arise.
• Q1: Can we guarantee that fewer than m tasks are migrating?
• Q2: Can we require lighter tasks, instead of heavier ones, to migrate?
In this section, we provide counterexamples that show that the answer to each question is no.
Question Q1. We show that the answer to Question Q1 is no by showing that, for any semi-partitioned
scheduler, if it is guaranteed that there will be at most k migrating tasks for any feasible system, then k cannot
be less than m. This result follows from the following counterexample, which consists of m tasks, all of
which must migrate.
Example 4.7. Consider a system of m tasks, each with parameters τi = (1+ ε,1), where ε < 1/m, to be
scheduled on m uniform processors, where s1 = 1+m · ε and si = 1 for 2 ≤ i ≤ m. Conditions (4.2) and
(4.3) imply that this system is feasible. Now, if we attempt to assign any single task as fixed, then it must
be assigned to processor s1. However, if we do so, in order to receive enough supply, the remaining m−1
tasks must fully use the remaining residual capacities, i.e., they must fully utilize m−1 processors (s2 to sm)
and meanwhile also utilize the residual capacity on s1. Because intra-task parallelism is forbidden, this is
infeasible. ♦
The above counterexample shows that we cannot generally guarantee that fewer than m tasks will migrate.
However, if we examine a specific task system, then it may indeed be possible to require fewer than m tasks
to migrate. In fact, in systems that can be fully partitioned, no task will migrate. Unfortunately, determining
the minimum number of migrating tasks for a specific, concrete task system is NP-hard in the strong sense.
This can be shown by transforming from the variable-sized bin-packing problem (Funk, 2004).
Question Q2. The following counterexample shows that the answer to Question Q2 is no as well.
Example 4.8. Consider n tasks to be scheduled on m uniform processors, where s1 = 1+(m+1) · ε , where
ε < (m−1)/(m+1), and si = 1 for 2≤ i≤ m. The n tasks include m heavy ones with parameters (1+ ε,1)
and n−m light ones with parameters (ε,n−m). Conditions (4.2) and (4.3) imply that this system is feasible.
Now, if any one of the m heavy tasks is assigned as fixed, then it must be fixed on processor s1. However, if
we do so, the remaining m−1 heavy tasks cannot all be allocated shares that match their utilizations without
introducing intra-task parallelism. Thus, the remaining system is infeasible. The following is a more formal
reasoning for this. ♦
Formal Reasoning for Example 4.8. Let ψi,p denote the capacity allocated to τi on processor sp, where
0≤ ψi,p ≤ sp. Then, the portion of sp that is allocated to τi is
$$\eta_{i,p} = \frac{\psi_{i,p}}{s_p}, \tag{4.43}$$
where 0≤ ηi,p ≤ 1. The portion ηi,p is the needed percentage of CPU time on sp for τi to receive processor
supply on sp that matches its allocated capacity ψi,p on sp. Thus, if intra-task parallelism is forbidden, then
the following condition must hold for the system to be feasible.
$$\sum_{p=1}^{m} \eta_{i,p} \le 1, \quad \text{for any } i. \tag{4.44}$$
For illustration, consider a task that needs to utilize 70% of the CPU time on one processor and 80% of the
CPU time on another processor. This is clearly infeasible if intra-task parallelism is not allowed.
Now, let us examine the specific system in Example 4.8. As shown in Example 4.8, if any one of the m
heavy tasks is assigned as fixed, then it must be fixed on processor s1. Without loss of generality, assume that
the heavy task τ1 is fixed on s1 and the remaining m−1 heavy tasks are {τ2, τ3, . . . , τm}. Since τ1 is fixed on
s1, the other heavy tasks cannot be allocated shares on s1 exceeding its residual capacity. Thus,
$$\sum_{i=2}^{m} \psi_{i,1} \le s_1 - u_1. \tag{4.45}$$
(4.43) and (4.45) imply
$$\sum_{i=2}^{m} \eta_{i,1} \le 1 - \frac{u_1}{s_1}. \tag{4.46}$$
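Moreover, by (4.44), $\sum_{i=2}^{m}\sum_{p=1}^{m}\eta_{i,p} \le m-1$, i.e., $\sum_{i=2}^{m}\eta_{i,1} + \sum_{i=2}^{m}\sum_{p=2}^{m}\eta_{i,p} \le m-1$. Therefore,
$$\sum_{i=2}^{m}\sum_{p=2}^{m} \eta_{i,p} \le (m-1) - \sum_{i=2}^{m} \eta_{i,1}. \tag{4.47}$$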
Thus, the allocated shares for the remaining m−1 heavy tasks satisfy
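$$\begin{aligned}
& \sum_{i=2}^{m}\sum_{p=1}^{m} \psi_{i,p} \\
={}& \{\text{by (4.43)}\} \\
& \sum_{i=2}^{m}\sum_{p=1}^{m} \eta_{i,p} \cdot s_p \\
={}& \{\text{rearranging and by } s_p = 1 \text{ for } 2 \le p \le m \text{ as in Example 4.8}\} \\
& \sum_{i=2}^{m} \eta_{i,1} \cdot s_1 + \sum_{i=2}^{m}\sum_{p=2}^{m} \eta_{i,p} \cdot 1 \\
\le{}& \{\text{by (4.47)}\} \\
& \sum_{i=2}^{m} \eta_{i,1} \cdot s_1 + (m-1) - \sum_{i=2}^{m} \eta_{i,1} \\
={}& \{\text{rearranging}\} \\
& (m-1) + \sum_{i=2}^{m} \eta_{i,1} \cdot (s_1 - 1) \\
\le{}& \{\text{by (4.46)}\} \\
& (m-1) + \left(1 - \frac{u_1}{s_1}\right) \cdot (s_1 - 1) \\
={}& \{\text{by the definitions of } s_1 \text{ and } u_1 \text{ in Example 4.8}\} \\
& (m-1) + \left(1 - \frac{1+\varepsilon}{1+(m+1)\cdot\varepsilon}\right) \cdot \bigl(1 + (m+1)\cdot\varepsilon - 1\bigr) \\
={}& \{\text{simplifying}\} \\
& (m-1) + \frac{m\cdot\varepsilon}{1+(m+1)\cdot\varepsilon} \cdot (m+1)\cdot\varepsilon \\
={}& \{\text{simplifying}\} \\
& (m-1) + \frac{m\cdot(m+1)\cdot\varepsilon}{1+(m+1)\cdot\varepsilon} \cdot \varepsilon \\
={}& \{\text{simplifying}\} \\
& (m-1) + \frac{m}{\frac{1}{(m+1)\cdot\varepsilon} + 1} \cdot \varepsilon \\
<{}& \{\text{as stated in Example 4.8, } \varepsilon < (m-1)/(m+1)\} \\
& (m-1) + \frac{m}{\frac{1}{(m+1)\cdot(m-1)/(m+1)} + 1} \cdot \varepsilon \\
={}& \{\text{simplifying}\} \\
& (m-1) + \frac{m}{\frac{1}{m-1} + 1} \cdot \varepsilon \\
={}& \{\text{simplifying}\} \\
& (m-1) + (m-1)\cdot\varepsilon,
\end{aligned}$$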
which is the needed total share allocation of the remaining m−1 heavy tasks. Thus, the remaining system is
not feasible if intra-task parallelism is forbidden.
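As a sanity check on the chain of inequalities above, the following Python sketch evaluates both sides with exact rational arithmetic. It is a minimal illustration only: the function names and the sample values of m and ε are assumptions made here, not part of the original argument.

```python
from fractions import Fraction

def max_total_share(m, eps):
    """Upper bound, from the step that uses (4.46), on the total share that the
    remaining m-1 heavy tasks can receive when one heavy task is fixed on s_1."""
    eps = Fraction(eps)
    s1 = 1 + (m + 1) * eps   # speed of the fastest processor in Example 4.8
    u1 = 1 + eps             # utilization of a heavy task
    return (m - 1) + (1 - u1 / s1) * (s1 - 1)

def needed_total_share(m, eps):
    """Total utilization of the remaining m-1 heavy tasks."""
    return (m - 1) * (1 + Fraction(eps))

# Hypothetical sample: m = 4 and eps = 1/10, which satisfies eps < (m-1)/(m+1).
available = max_total_share(4, Fraction(1, 10))
needed = needed_total_share(4, Fraction(1, 10))
```

For this sample the available share is strictly smaller than the needed share. At the boundary value ε = (m−1)/(m+1) the two quantities coincide, which is consistent with the strict inequality required in Example 4.8.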
4.3.5 Evaluation
The frame size F used in EDF-tu is a tunable parameter. For any feasible task system, tardiness will
always be at most F , and if F is set low enough, tardiness will be zero. Given this, it would not be very
interesting to experimentally examine issues related to schedulability. However, for a given task system, the
Level-Algorithm-induced preemption pattern within a frame is the same regardless of its size, and preemption
frequencies over time are higher when F is smaller. Thus, it is interesting to experimentally evaluate the
number of tasks that are required to migrate and the number of preemptions experienced by such tasks, as it
is these tasks that give rise to preemptions induced by the Level Algorithm. In this section, we briefly discuss
an experimental evaluation that focuses on these two metrics.
Experimental setup. We assessed the impact of both metrics by randomly generating feasible task systems
and determining for each generated system the number of migrating tasks and the number of preemptions
experienced by such tasks per frame. In experimental studies that focus on identical platforms, choosing an
overall utilization cap implicitly defines the considered multiprocessor platform. However, in the uniform
case, processor speeds must be selected, and the number of such speed settings is unbounded for a given total
utilization. To reasonably constrain our experiments, we considered systems of eight processors with a total
processor capacity of 36. We considered four such platforms, with speeds as follows: π1 = {6, 6, 6, 6, 3, 3, 3,
3}, π2 = {8, 8, 4, 4, 4, 4, 2, 2}, π3 = {8, 7, 6, 5, 4, 3, 2, 1}, and π4 = {15, 3, 3, 3, 3, 3, 3, 3}. We used the
same framework as described in Section 4.2.3 to randomly generate feasible task systems. When using this
framework, two categories of task systems are generated: systems that have fewer tasks but heavier tasks (by
utilization), and systems that have more tasks but lighter tasks.

[Figure 4.10: Number of migrating tasks. The plot shows the average number of migrating tasks (y-axis) against task system total utilization (x-axis, 0 to 36), with one curve for each combination of platform (π1 to π4) and task category (fewer but heavier tasks; more but lighter tasks).]
For each platform and task generating pattern, we varied total utilization within [0,36] by increments of
0.5, and for each total utilization, we generated 1,000 feasible task systems. Figure 4.10 plots the average
number of migrating tasks required for each such set of 1,000 task systems. For every generated task system,
we also simulated EDF-tu and recorded the maximum number of preemptions per frame of any migrating
task. Figure 4.11 plots the average of these maximum values for each set of 1,000 task systems.
Results. As seen in Figures 4.10 and 4.11, the number of migrating tasks is often modest, and these tasks
often experience only a moderate number of preemptions per frame. With total utilization as high as 30, the
number of migrating tasks (on average) typically is at most four, and the number of preemptions per frame
(on average) is at most five. Even in the extreme case in which the total utilization equals the total speed of the
platform, the number of preemptions per frame (on average) is still less than 25. While 25 preemptions per
frame may seem somewhat high, recall that in an SRT system, we can define the frame size F to be quite large
at the cost of increasing the tardiness bound.
[Figure 4.11: Maximum number of preemptions of migrating tasks per frame. The plot shows the average maximum number of preemptions (y-axis) against task system total utilization (x-axis, 0 to 36), with one curve for each combination of platform (π1 to π4) and task category (fewer but heavier tasks; more but lighter tasks).]
4.4 Chapter Summary
In this chapter, we have presented two EDF-based semi-partitioned scheduling algorithms for uniform
multiprocessors. The first one, EDF-sh, provides SRT guarantees only and it is not SRT-optimal, i.e.,
some feasible systems are not SRT-schedulable under EDF-sh. Nonetheless, the total utilization of an SRT-
schedulable system under EDF-sh can be as high as the total platform capacity. Furthermore, EDF-sh restricts
task migrations to occur at job boundaries only, i.e., no job, even of a migrating task, migrates under EDF-sh.
In contrast, the second algorithm, EDF-tu, is SRT-optimal and includes a tunable parameter, called frame size.
For any positive value of the frame size, tardiness is guaranteed to be at most this value. Furthermore, if the
frame size divides all task periods, EDF-tu becomes HRT-optimal and ensures zero tardiness.
For scheduling n tasks on m uniform processors, EDF-sh allows at most m−1 tasks to migrate while
EDF-tu allows m. Interestingly, we have given a feasible system where at least m tasks must migrate, or
unbounded tardiness may be inevitable. This shows that m, the maximum number of migrating tasks under
EDF-tu, is tight for any HRT- or SRT-optimal scheduler. This also explains the lack of optimality for EDF-sh.
CHAPTER 5: ALLOWING INTRA-TASK PARALLELISM ON UNIFORM PLATFORMS1
In this chapter, we continue to focus on the uniform multiprocessor model. However, in contrast to
the prior two chapters, which focused on the conventional sporadic task model, we shift our attention to
the npc-sporadic task model here. The conventional sporadic task model restricts each task to execute
sequentially, which might compromise the actual parallelism potential of the system. For example, many
detection algorithms in computer vision, such as the Histogram of Oriented Gradients (HOG) algorithm for
recognizing pedestrians (Dalal and Triggs, 2005), may be performed on each frame of a video independently.
Modeling computation like that by the conventional sporadic task model results in implicitly ruling out the
possibility of processing consecutive frames simultaneously, which could leave a multiprocessor platform
unnecessarily under-utilized. Thus, it could be beneficial to model such a workload by a less-restrictive task
model—the npc-sporadic task model, which allows consecutive jobs of the same task to execute in parallel.
We consider the scheduling of npc-sporadic task systems on a uniform multiprocessor platform under
both preemptive and non-preemptive G-EDF. We show that both of these algorithms guarantee bounded job
response times for any feasible npc-sporadic task system. Interestingly, the SRT-feasibility condition we
establish for npc-sporadic tasks differs from that for sporadic tasks in Section 3.2. Of the two algorithms we
consider, preemptive G-EDF is more greedy in executing jobs on faster processors, and therefore ensures a
better response-time bound; in contrast, non-preemptive G-EDF only ensures a looser response-time bound,
but does not migrate jobs away from slower processors when faster processors become available; this
characteristic could lead to energy savings.
Organization. In the rest of this chapter, we describe the considered platforms and task systems more
carefully (Section 5.1), provide proof-specific preliminaries (Section 5.2), prove our response-time bounds for
preemptive (Section 5.3) and non-preemptive (Section 5.4) G-EDF, and present an experimental evaluation
(Section 5.5).
1 Contents of this chapter previously appeared in preliminary form in the following paper:
Yang, K. and Anderson, J. (2014a). Optimal GEDF-based schedulers that allow intra-task parallelism on heterogeneous multiprocessors.
In Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-Time Multimedia, pages 30–39.
5.1 System Model
In this chapter, we consider a uniform multiprocessor platform consisting of m processors. Processor i,
where 1≤ i≤ m, has an associated speed denoted by a real number si, which represents the amount of work
that can be done on this processor within one time unit. We also identify each processor by its speed and
assume the processors are decreasingly ordered by their speeds, i.e., we denote a uniform multiprocessor
platform by π = {s1, s2, . . . , sm}, where si ≥ si+1 for i = 1, 2, . . . , m−1. The cumulative speed of the i fastest
processors is $S_i = \sum_{k=1}^{i} s_k$. Also, we assume time is continuous.
A task is a sequential piece of code and a job is an instance (or an invocation) of a task. We let τ = {τ1,
τ2, . . . , τn} denote the task set to be scheduled. Each task τi is characterized by (Ci,Ti,Di), where Ci is τi’s
worst-case execution time (WCET) on a unit-speed processor, Ti is its period (the minimum separation of any
two jobs), and Di is its relative deadline. τi, j is the jth job of τi. The release time of τi, j is ri, j and its absolute
deadline is di, j, where di, j = ri, j +Di.
On a uniform multiprocessor, Ci, called the worst-case execution requirement of τi, is defined relative to a
processor of speed 1.0. Thus, the WCET of τi when entirely executing on processor sp is $C_i / s_p$ ($1 \le p \le m$). We
let $C_{max} = \max\{C_i \mid 1 \le i \le n\}$. Note that the execution requirement of a job may differ from its execution
time if a non-unit-speed processor exists. In prior work on identical multiprocessors, the two are the same,
since speeds are usually normalized to 1.0. The utilization of a task τi is $u_i = C_i / T_i$. The total utilization of τ is
$U_\tau = \sum_{i=1}^{n} u_i$. We also use τ to refer to the set of all jobs generated by tasks in τ .
In this chapter, we consider npc-sporadic tasks, which have no intra-task precedence constraints. The
main difference between the conventional sporadic task model and the npc-sporadic task model is that the
former requires successive jobs of each task to execute in sequence while the latter allows them to execute in
parallel. That is, in the conventional sporadic task model, job τi, j+1 cannot commence execution until its
predecessor τi, j completes, even if ri, j+1, the release time of τi, j+1, has elapsed; in contrast, in the npc-sporadic
task model, any job can execute as soon as it is released. Additionally, in the npc-sporadic model, a task is
allowed to have a utilization greater than the fastest processor’s speed. Note that, although we allow intra-task
parallelism, each individual job still must execute sequentially.
Feasibility conditions.
On uniform multiprocessors, Funk et al. (2001) derived the following feasibility condition for conven-
tional HRT implicit-deadline (i.e., Di = Ti) periodic task systems. Let Ui denote the sum of the largest
i utilizations in τ and assume the number of tasks n is at least the number of processors m. Then, an
implicit-deadline periodic task set τ is HRT-feasible on a uniform multiprocessor π if and only if
Uτ ≤ Sm, (5.1)
and
Ui ≤ Si, for i = 1,2, · · · ,m−1. (5.2)
As shown in Section 3.2, this is also a necessary and sufficient feasibility condition for conventional
implicit-deadline sporadic task systems in both HRT and SRT senses. The former requires all deadlines to be
met while the latter only requires response times to be bounded.
In the npc-sporadic task model, where intra-task precedence constraints are eliminated, it is clear that
(5.1) and (5.2) are also a necessary and sufficient feasibility condition for HRT implicit-deadline task systems,
since in such systems, every job has to complete before its successor is released and hence there is no
difference between conventional sporadic task systems and npc-sporadic ones. However, for SRT systems,
(5.2) is not required, as we will show that response-time bounds do not rely on (5.2). Furthermore, for
arbitrary-deadline tasks, our analysis applies as well. On the other hand, (5.1) is always required. A violation
of (5.1) means the system is overutilized, and therefore response times will increase without bound. To
summarize, an npc-sporadic task system τ is SRT-feasible on π if and only if (5.1) holds.
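The two feasibility tests discussed above can be written down directly. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, task utilizations and processor speeds are passed as plain lists, and a task set with fewer tasks than processors is padded with zero utilizations (the text assumes n is at least m).

```python
from fractions import Fraction

def prefix_sums(values):
    """Return [v1, v1+v2, ...] using exact rational arithmetic."""
    out, total = [], Fraction(0)
    for v in values:
        total += Fraction(v)
        out.append(total)
    return out

def hrt_feasible(utils, speeds):
    """Conditions (5.1) and (5.2) for implicit-deadline task systems."""
    s = sorted((Fraction(v) for v in speeds), reverse=True)
    u = sorted((Fraction(v) for v in utils), reverse=True)
    m = len(s)
    u_padded = u + [Fraction(0)] * max(0, m - len(u))  # assumption: pad when n < m
    S, U = prefix_sums(s), prefix_sums(u_padded)
    total_ok = sum(u) <= S[-1]                               # (5.1)
    prefix_ok = all(U[i] <= S[i] for i in range(m - 1))      # (5.2)
    return total_ok and prefix_ok

def srt_feasible_npc(utils, speeds):
    """Condition (5.1) alone, as stated for npc-sporadic SRT systems."""
    return sum(Fraction(v) for v in utils) <= sum(Fraction(v) for v in speeds)

# Hypothetical example on a platform with speeds {8, 7, 6, 5, 4, 3, 2, 1}.
platform = [8, 7, 6, 5, 4, 3, 2, 1]
heavy_system = [9, 1, 1]   # one task heavier than the fastest processor
```

In this example the system violates (5.2) because its largest utilization exceeds the fastest speed, yet it satisfies (5.1), so it is SRT-feasible only when its tasks are treated as npc-sporadic.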
5.2 Preliminaries
Definition 5.1. At a time instant t, a job τi, j is unreleased if t < ri, j, pending if t ≥ ri, j and τi, j has not
completed execution by t, and complete if τi, j has completed by t.
Definition 5.2. We let A(S,τi, j, t1, t2) denote the cumulative processor capacity allocation to job τi, j in an
arbitrary schedule S within the time interval [t1, t2].
Also, we let A(S,J , t1, t2) denote the cumulative processor capacity allocation to the jobs in job set J in
an arbitrary schedule S within the time interval [t1, t2], i.e.,
$$A(S, J, t_1, t_2) = \sum_{\tau_{i,j} \in J} A(S, \tau_{i,j}, t_1, t_2). \tag{5.3}$$
Ideal schedule. We let $\pi_{\mathrm{IDEAL}}$ = {u1, u2, . . . , un} denote an ideal multiprocessor for the task set τ , where $\pi_{\mathrm{IDEAL}}$
consists of n processors with speeds that exactly match the utilizations of the n tasks in τ , respectively. Let I
be the partitioned schedule for τ on $\pi_{\mathrm{IDEAL}}$, where each task τi in τ is assigned to the processor of speed ui.
Then, in I , every job in τ commences execution at its release time and completes execution within one period
(it exactly executes for one period if and only if its actual execution requirement matches its worst-case
execution requirement).
Thus, we have, A(I,τi, j, t1, t2)≤ ui · (t2− t1), and for an arbitrary job set J ⊆ τ ,
A(I,J , t1, t2)≤Uτ · (t2− t1). (5.4)
This is similar to the processor sharing (PS) schedule considered in prior work with respect to identical
multiprocessors (Devi and Anderson, 2008). However, the above notion of an ideal schedule is preferable
with respect to uniform multiprocessors, because it clearly prevents a single job from executing in parallel
with itself.
Note that, in the ideal schedule, a constrained-deadline task τi (i.e., Di < Ti) may not complete execution
at its deadline. The following definition gives an upper bound on the amount of work that completes later
than its deadline in the ideal schedule.
Definition 5.3. Let $L_i = u_i \cdot \max\{0, T_i - D_i\}$, and let $L_\tau = \sum_{i=1}^{n} L_i$. Then, in I, at any time instant t, the
amount of incomplete work with deadline at or before t is at most Lτ .
Definition 5.4. We denote the difference between the allocation to a job τi, j in I and in a schedule S within
[0, t] as
lag(τi, j, t,S) = A(I,τi, j,0, t)−A(S,τi, j,0, t), (5.5)
and such an allocation difference for an arbitrary job set J is
$$\mathrm{LAG}(J, t, S) = \sum_{\tau_{i,j} \in J} \mathrm{lag}(\tau_{i,j}, t, S). \tag{5.6}$$
By (5.5) and Definition 5.2, for any time interval [t1, t2] we have
lag(τi, j, t2,S) = lag(τi, j, t1,S)+A(I,τi, j, t1, t2)−A(S,τi, j, t1, t2); (5.7)
and by (5.3), (5.6), and (5.7),
LAG(J , t2,S) = LAG(J , t1,S)+A(I,J , t1, t2)−A(S,J , t1, t2). (5.8)
Lemma 5.1. If a job τi, j is unreleased or complete at t, then lag(τi, j, t,S) ≤ 0; if τi, j is pending at t, then
lag(τi, j, t,S)≤Ci.
Proof. Follows immediately from Definitions 5.1 and 5.4.
Job of interest. To derive response-time bounds, we consider an arbitrary job τk,l in τ , and upper bound its
response time. Let td be the absolute deadline of τk,l , i.e., td = dk,l .
Definition 5.5. In the rest of this chapter, we let Ψ be the job set consisting of all jobs with deadlines at or
before td . The jobs in Ψ are called competing jobs for τk,l . At time instant t, the total incomplete work due to
all jobs in Ψ is called competing work at t.
Definition 5.6. If at a time instant t, all of the m processors are executing jobs in Ψ, then t is a busy instant
for Ψ; otherwise t is a non-busy instant for Ψ. If in the time interval [t1, t2] every time instant is a busy instant
for Ψ, then [t1, t2] is a busy interval for Ψ.
Lemma 5.2. If in S, [t1, t2] is a busy interval for Ψ, then LAG(Ψ, t1,S)≥ LAG(Ψ, t2,S).
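Proof. Within a busy interval [t1, t2], all m processors execute jobs in Ψ, so A(S,Ψ, t1, t2) = Sm · (t2 − t1). By (5.4) and (5.1), A(I,Ψ, t1, t2) ≤ Uτ · (t2 − t1) ≤ Sm · (t2 − t1). The lemma then follows from (5.8).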
Definition 5.7. For any time instant t, we let t+ denote the time instant (t+ ε) and we let t− denote the time
instant (t− ε), where ε → 0+.
5.3 Response-Time Bounds under Preemptive G-EDF
In this section, we consider the preemptive G-EDF scheduler that works as follows.
• At any time instant, if there are at most m pending jobs, then all of them are scheduled; if there are
more than m pending jobs, then the m such jobs with the earliest deadlines are scheduled. Deadline ties
are broken arbitrarily.
• For any two jobs τi, j scheduled on processor sp and τa,b scheduled on processor sq, where p< q, we
have di, j ≤ da,b (note that, given how we order processors, sp ≥ sq).
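The two rules above can be sketched as a single assignment step for one scheduling instant. The sketch below is only an illustration: the function name, the (deadline, job_id) tuple layout, and the use of job identifiers to break deadline ties are assumptions made here (the rules permit any tie-breaking).

```python
def gedf_assign(pending, speeds):
    """Return {job_id: (processor_rank, speed)} for one scheduling instant.

    pending: iterable of (absolute_deadline, job_id) pairs for pending jobs.
    speeds:  processor speeds in any order.
    """
    procs = sorted(speeds, reverse=True)  # s_1 >= s_2 >= ... >= s_m
    # Earliest deadlines first; ties fall back to job_id, which is one
    # arbitrary but deterministic way to break them.
    chosen = sorted(pending)[:len(procs)]
    # The job with the earliest deadline gets the fastest processor, and so on.
    return {job: (rank + 1, procs[rank])
            for rank, (_deadline, job) in enumerate(chosen)}

# Hypothetical instant: three pending jobs on a two-processor platform.
pending = [(10, "a"), (5, "b"), (7, "c")]
assignment = gedf_assign(pending, speeds=[1, 3])
```

Here job "a" has the latest deadline and is left unscheduled, while "b" and "c" occupy the faster and slower processor, respectively.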
Moreover, in this section, we let S denote a preemptive G-EDF schedule of τ on π .
5.3.1 Basic Bounds
We first present a proof to derive basic response-time bounds. This proof is more similar to the SRT
analysis framework for G-EDF by Devi and Anderson (2008), and therefore is easier to understand. We will
improve the basic bounds proved here in Section 5.3.2. Recall that τk,l is the analyzed job.
Lemma 5.3. At any non-busy instant t at or before td , LAG(Ψ, t,S)≤ (m−1) ·Cmax.
Proof. We decompose Ψ into three disjoint subsets: Ψ1, Ψ2, and Ψ3 consisting of jobs that are unreleased,
pending, and complete, respectively. Since in the npc-sporadic task model intra-task precedence constraints
are removed, under preemptive G-EDF, there can be at most (m− 1) jobs in Ψ that are pending at t, i.e.,
|Ψ2| ≤ m−1. Thus,
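$$\begin{aligned}
& \mathrm{LAG}(\Psi, t, S) \\
={}& \{\text{by the definition of } \Psi_1, \Psi_2, \text{ and } \Psi_3\} \\
& \mathrm{LAG}(\Psi_1, t, S) + \mathrm{LAG}(\Psi_2, t, S) + \mathrm{LAG}(\Psi_3, t, S) \\
={}& \{\text{by (5.6)}\} \\
& \sum_{\tau_{i,j} \in \Psi_1} \mathrm{lag}(\tau_{i,j}, t, S) + \sum_{\tau_{i,j} \in \Psi_2} \mathrm{lag}(\tau_{i,j}, t, S) + \sum_{\tau_{i,j} \in \Psi_3} \mathrm{lag}(\tau_{i,j}, t, S) \\
\le{}& \{\text{by Lemma 5.1}\} \\
& \sum_{\tau_{i,j} \in \Psi_1} 0 + \sum_{\tau_{i,j} \in \Psi_2} C_i + \sum_{\tau_{i,j} \in \Psi_3} 0 \\
\le{}& |\Psi_2| \cdot C_{max} \\
\le{}& (m-1) \cdot C_{max}.
\end{aligned}$$
The lemma follows.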
Lemma 5.4. After td , once τk,l executes, it will continuously execute until it completes.
Proof. After td , no job with deadline earlier than td can be released, i.e., no job that can preempt τk,l can
be released. Thus, once τk,l executes, it will continually execute until it completes, though it could migrate
among processors.
Lemma 5.5. In S, the competing work for τk,l at td is at most Lτ +(m−1) ·Cmax.
Proof. By Definitions 5.3 and 5.4, the competing work pending at td in S is at most Lτ + LAG(Ψ, td ,S).
Let t ′ be the latest non-busy instant at or before td (or time 0 if no such non-busy instant exists). Then,
by Lemma 5.2, LAG(Ψ, t ′,S)≥ LAG(Ψ, td ,S). Also, by Lemma 5.3, LAG(Ψ, t ′,S)≤ (m−1) ·Cmax. Thus,
LAG(Ψ, td ,S)≤ (m−1) ·Cmax and therefore the lemma follows.
Lemma 5.6. Let W be the competing work for τk,l at td . Then the job of interest, τk,l , will complete execution
no later than time
$$t_d + \frac{W - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. Suppose that $\tau_{k,l}$ is not complete at or before $t_d$. Let $\delta$ be the amount of work of $\tau_{k,l}$ that has been
completed by $t_d$ and $e_{k,l}$ be the real execution requirement of $\tau_{k,l}$. Then the remaining execution work of $\tau_{k,l}$
at $t_d$ is $e_{k,l} - \delta$. Let $t^* = t_d + \frac{W - (e_{k,l} - \delta)}{S_m}$. If $\tau_{k,l}$ does not execute within $[t_d, t^*)$, then $[t_d, t^*)$ must be a busy
interval for Ψ. In this case, the competing work that is completed within $[t_d, t^*)$ is $W - (e_{k,l} - \delta)$
(since within a busy interval, all processors execute competing work and the total speed is $S_m$), and the
remaining competing work at $t^*$ is $e_{k,l} - \delta$, which must be totally due to $\tau_{k,l}$. Therefore, $\tau_{k,l}$ will
execute at time $t^*$. Thus, if $\tau_{k,l}$ is not complete at or before $t_d$, then the latest time when $\tau_{k,l}$
commences execution after $t_d$ is $t^*$. By Lemma 5.4, $\tau_{k,l}$ will not be preempted once it executes
after $t_d$. Also, since the minimum execution speed is $s_m$, $\tau_{k,l}$ will complete within $\frac{e_{k,l} - \delta}{s_m}$ time units. Therefore,
$\tau_{k,l}$ will complete by
$$\begin{aligned}
& t_d + \frac{W - (e_{k,l} - \delta)}{S_m} + \frac{e_{k,l} - \delta}{s_m} \\
={}& \{\text{rearranging}\} \\
& t_d + \frac{W}{S_m} - \frac{e_{k,l}}{S_m} + \frac{e_{k,l}}{s_m} + \left(\frac{\delta}{S_m} - \frac{\delta}{s_m}\right) \\
\le{}& \{\text{since } \delta \ge 0 \text{ and } S_m \ge s_m\} \\
& t_d + \frac{W}{S_m} - \frac{e_{k,l}}{S_m} + \frac{e_{k,l}}{s_m} \\
\le{}& \{\text{since } e_{k,l} \le C_k \text{ and } S_m \ge s_m\} \\
& t_d + \frac{W}{S_m} - \frac{C_k}{S_m} + \frac{C_k}{s_m} \\
={}& t_d + \frac{W - C_k}{S_m} + \frac{C_k}{s_m}.
\end{aligned}$$
The lemma follows.
Theorem 5.1. The response time of an arbitrary job τk,l in τ under preemptive G-EDF scheduling on π is at
most
$$D_k + \frac{L_\tau + (m-1) \cdot C_{max} - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. Follows from Lemmas 5.5 and 5.6.
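To make the bound of Theorem 5.1 concrete, the following Python sketch evaluates it for a small task system. The function name, the (C, T, D) tuple layout, and the sample task system are assumptions introduced here for illustration only.

```python
from fractions import Fraction

def basic_bound_preemptive(tasks, speeds, k):
    """Evaluate D_k + (L_tau + (m-1)*C_max - C_k)/S_m + C_k/s_m.

    tasks:  list of (C, T, D) triples, one per task.
    speeds: processor speeds in any order.
    k:      zero-based index of the task whose jobs are analyzed.
    """
    params = [(Fraction(C), Fraction(T), Fraction(D)) for C, T, D in tasks]
    s = [Fraction(v) for v in speeds]
    m, S_m, s_min = len(s), sum(s), min(s)
    C_max = max(C for C, _, _ in params)
    # L_tau from Definition 5.3: sum of u_i * max(0, T_i - D_i).
    L_tau = sum((C / T) * max(Fraction(0), T - D) for C, T, D in params)
    C_k, _, D_k = params[k]
    return D_k + (L_tau + (m - 1) * C_max - C_k) / S_m + C_k / s_min

# Hypothetical system: two implicit-deadline tasks on speeds {2, 1}.
tasks = [(2, 4, 4), (3, 3, 3)]
speeds = [2, 1]
bound_task0 = basic_bound_preemptive(tasks, speeds, 0)
bound_task1 = basic_bound_preemptive(tasks, speeds, 1)
```

The last term, C_k / s_m, dominates in this example, which reflects the pessimistic assumption that the analyzed job runs entirely on the slowest processor.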
5.3.2 Improved Bounds
We now show that the response-time bound above can be improved in several ways.
First, we can derive a better bound on the LAG at td or even an arbitrary time instant t by considering
LAG non-increasing intervals instead of busy intervals.
Definition 5.8. We introduce an integer Λ such that $S_{\Lambda-1} < U_\tau$ and $S_\Lambda \ge U_\tau$ ($1 \le \Lambda \le m$). If at time instant
t, at least Λ processors are executing jobs in Ψ, then t is a LAG non-increasing instant. If in the time interval
[t1, t2] every time instant is a LAG non-increasing instant, then [t1, t2] is a LAG non-increasing interval for Ψ.
By the rules of preemptive G-EDF, it is clear that at any time instant, if p processors (1≤ p≤m) execute
jobs in Ψ, then they must be the p fastest ones. This property ensures that LAG for Ψ cannot increase within
a LAG non-increasing interval. Then, we can derive the following lemma.
Lemma 5.7. For any time instant t, LAG(Ψ, t,S)≤ (Λ−1) ·Cmax.
Proof. This proof is similar to Lemma 5.5. We instead consider the latest time instant that is not a LAG
non-increasing instant at or before t. The definition of LAG non-increasing instant and the property in the
prior paragraph ensure counterparts for Lemmas 5.3 and 5.2, respectively. Thus, the lemma follows.
Furthermore, in Section 5.3.1, we only considered the execution of τk,l after td , and we pessimistically
assumed τk,l is executed at the minimum speed sm. Actually, we can consider the execution of τk,l as early as
it is released, and derive several linear constraints, and solve a corresponding linear program. The following
lemma shows this.
Definition 5.9. As defined in prior work (Funk, 2004), the identicalness of the multiprocessor platform π is
$$\lambda = \max_{1 \le i \le m-1} \left\{ \frac{s_{i+1} + s_{i+2} + \cdots + s_m}{s_i} \right\}.$$
That is,
$$\lambda = \max_{1 \le i \le m-1} \left\{ \frac{S_m - S_i}{s_i} \right\}. \tag{5.9}$$
Note that λ ≤ m−1. Also, λ = m−1 if and only if π is an identical multiprocessor.
Lemma 5.8. Suppose the competing work for τk,l at rk,l is W . Then the response time of τk,l is upper
bounded by
$$\frac{W}{S_m} + \frac{\lambda}{S_m} C_k.$$
Proof. In the time interval between $r_{k,l}$ and $\tau_{k,l}$'s completion, let $x_0$ denote the cumulative time in which $\tau_{k,l}$
is not executing, and let $x_i$ ($1 \le i \le m$) denote the cumulative time in which $\tau_{k,l}$ is executing on processor $s_i$.
Then, the response time of $\tau_{k,l}$ is $\sum_{i=0}^{m} x_i$.
Since we cannot execute $\tau_{k,l}$ for more than its worst-case execution requirement, we have the linear
constraint
$$\sum_{i=1}^{m} s_i x_i \le C_k.$$
By the rules of preemptive G-EDF, after $r_{k,l}$, when $\tau_{k,l}$ is not complete and is not currently executing, all of
the m processors must execute jobs in Ψ; and when $\tau_{k,l}$ is executing on processor $s_i$, the fastest i processors,
i.e., $s_1$ to $s_i$, must execute jobs in Ψ. Since W is the competing work at $r_{k,l}$, the execution of jobs in Ψ after
$r_{k,l}$ cannot exceed W . Therefore, we have the linear constraint
$$S_m x_0 + \sum_{i=1}^{m} S_i x_i \le W.$$
We now manually solve this linear programming problem by the Simplex Algorithm (Dantzig, 1998),
assuming that $C_k$, $W$, $s_i$, and $S_i$ ($1 \le i \le m$) are constants and each $x_i$ ($0 \le i \le m$) is a variable. To do so,
we introduce two auxiliary variables, $x_{m+1}$ and $x_{m+2}$, to rewrite this problem in slack form. Specifically, we
maximize
$$z = \sum_{i=0}^{m} x_i,$$
subject to
$$\begin{cases}
x_{m+1} = C_k - \sum_{i=1}^{m} s_i x_i, \\
x_{m+2} = W - S_m x_0 - \sum_{i=1}^{m} S_i x_i, \\
x_0, x_1, x_2, \cdots, x_{m+2} \ge 0.
\end{cases}$$
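First, we pivot $x_0$ with $x_{m+2}$. Then, in the resulting program, we pivot $x_h$, where $h$ satisfies $\frac{S_m - S_h}{s_h} = \max_{1 \le i \le m} \left\{ \frac{S_m - S_i}{s_i} \right\} = \lambda$, with $x_{m+1}$. The final program is to maximize
$$z = \sum_{1 \le i \le m \,\wedge\, i \ne h} \left( \frac{S_m - S_i}{s_i} - \frac{S_m - S_h}{s_h} \right) \frac{s_i}{S_m} x_i - \left( 1 - \frac{S_h}{S_m} \right) \frac{x_{m+1}}{s_h} - \frac{x_{m+2}}{S_m} + \frac{W}{S_m} + \left( 1 - \frac{S_h}{S_m} \right) \frac{C_k}{s_h}, \tag{5.10}$$
subject to
$$\begin{cases}
x_h = \dfrac{C_k}{s_h} - \sum_{1 \le i \le m \,\wedge\, i \ne h} \dfrac{s_i}{s_h} x_i - \dfrac{x_{m+1}}{s_h}, \\[2ex]
x_0 = \dfrac{W}{S_m} - \sum_{1 \le i \le m \,\wedge\, i \ne h} \dfrac{S_i}{S_m} x_i - \dfrac{x_{m+2}}{S_m} - \dfrac{S_h}{S_m} \left( \dfrac{C_k}{s_h} - \sum_{1 \le i \le m \,\wedge\, i \ne h} \dfrac{s_i}{s_h} x_i - \dfrac{x_{m+1}}{s_h} \right), \\[2ex]
x_0, x_1, x_2, \cdots, x_{m+2} \ge 0.
\end{cases}$$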
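By the definition of $h$, all the coefficients of the $x$ terms of $z$ in (5.10) are negative or zero. Therefore, when
$x_h = \frac{C_k}{s_h}$, $x_0 = \left( \frac{W}{S_m} - \frac{S_h}{S_m} \cdot \frac{C_k}{s_h} \right)$, and $x_i = 0$ (for all $i \ne 0$ and $i \ne h$), $z$ has its maximum value, which is
$$\begin{aligned}
z_{max} ={}& \frac{W}{S_m} + \frac{S_m - S_h}{S_m s_h} C_k \\
={}& \{\text{by (5.9) and by the definition of } h\} \\
& \frac{W}{S_m} + \frac{\lambda}{S_m} C_k.
\end{aligned}$$
The lemma follows.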
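The closed-form optimum derived above can be cross-checked numerically. The sketch below does not implement the Simplex Algorithm used in the proof; as a substitute it enumerates all basic feasible solutions of the two-constraint linear program with exact rational arithmetic, which is sufficient because the program is feasible and bounded. The function names and the sample numbers are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def lp_max_response(speeds, C_k, W):
    """Maximize x_0 + ... + x_m subject to
         sum_i s_i x_i <= C_k,   S_m x_0 + sum_i S_i x_i <= W,   x >= 0,
    by enumerating bases of size two (slack variables included)."""
    s = sorted((Fraction(v) for v in speeds), reverse=True)
    S, total = [], Fraction(0)
    for v in s:
        total += v
        S.append(total)
    # One column per variable: x_0, x_1..x_m, then the two slack variables.
    cols = [(Fraction(0), S[-1])] + list(zip(s, S))
    cols += [(Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]
    obj = [Fraction(1)] * (len(s) + 1) + [Fraction(0), Fraction(0)]
    b1, b2 = Fraction(C_k), Fraction(W)
    best = Fraction(0)  # the all-slack basis (x = 0) is always feasible
    for a, c in combinations(range(len(cols)), 2):
        (a1, a2), (c1, c2) = cols[a], cols[c]
        det = a1 * c2 - c1 * a2
        if det == 0:
            continue  # columns are linearly dependent, not a basis
        xa = (b1 * c2 - c1 * b2) / det  # Cramer's rule
        xc = (a1 * b2 - a2 * b1) / det
        if xa >= 0 and xc >= 0:
            best = max(best, obj[a] * xa + obj[c] * xc)
    return best

def closed_form_bound(speeds, C_k, W):
    """W / S_m + (lambda / S_m) * C_k with lambda as in (5.9)."""
    s = sorted((Fraction(v) for v in speeds), reverse=True)
    S_m, partial, lam = sum(s), Fraction(0), Fraction(0)
    for v in s[:-1]:
        partial += v
        lam = max(lam, (S_m - partial) / v)
    return Fraction(W) / S_m + lam * Fraction(C_k) / S_m

# Hypothetical instance: speeds {4, 2, 1}, C_k = 2, W = 10.
lp_value = lp_max_response([4, 2, 1], 2, 10)
formula_value = closed_form_bound([4, 2, 1], 2, 10)
```

For this instance the enumerated optimum coincides with the closed form. When W is small enough that the value of x_0 in the final program would be negative, the enumerated optimum is smaller, and the closed form still serves as an upper bound.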
Next, we upper bound the competing work for τk,l at rk,l by the following lemma.
Lemma 5.9. The competing work for τk,l at rk,l is at most
Uτ ·Dk +Lτ +(Λ−1) ·Cmax.
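Proof. By Definitions 5.3, 5.4, and 5.5, the competing work for $\tau_{k,l}$ at $r_{k,l}$ is at most
$$\begin{aligned}
& L_\tau + A(I, \Psi, 0, t_d) - A(S, \Psi, 0, r_{k,l}) \\
={}& \{\text{by Definition 5.2}\} \\
& L_\tau + A(I, \Psi, 0, r_{k,l}) + A(I, \Psi, r_{k,l}, t_d) - A(S, \Psi, 0, r_{k,l}) \\
={}& \{\text{by (5.5) and (5.6)}\} \\
& A(I, \Psi, r_{k,l}, t_d) + L_\tau + \mathrm{LAG}(\Psi, r_{k,l}, S) \\
\le{}& \{\text{by (5.4)}\} \\
& U_\tau \cdot (t_d - r_{k,l}) + L_\tau + \mathrm{LAG}(\Psi, r_{k,l}, S) \\
\le{}& \{\text{since } t_d = d_{k,l} = r_{k,l} + D_k \text{ and by Lemma 5.7}\} \\
& U_\tau \cdot D_k + L_\tau + (\Lambda - 1) \cdot C_{max}.
\end{aligned}$$
The lemma follows.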
Theorem 5.2. The response time of an arbitrary job τk,l in τ under preemptive G-EDF scheduling on π is at
most
$$\frac{U_\tau}{S_m} \cdot D_k + \frac{1}{S_m} \cdot L_\tau + \frac{(\Lambda - 1)}{S_m} \cdot C_{max} + \frac{\lambda}{S_m} C_k.$$
Proof. Follows from Lemmas 5.8 and 5.9.
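The improved bound of Theorem 5.2 depends on the two platform-dependent quantities Λ (Definition 5.8) and λ (Definition 5.9). The following Python sketch computes both and evaluates the bound. The function name, the (C, T, D) tuple layout, the sample system, and the choice of λ = 0 on a single-processor platform (where the maximum in (5.9) ranges over an empty set) are assumptions made here for illustration.

```python
from fractions import Fraction

def improved_bound_preemptive(tasks, speeds, k):
    """Evaluate (U_tau*D_k + L_tau + (Lambda-1)*C_max + lambda*C_k) / S_m."""
    s = sorted((Fraction(v) for v in speeds), reverse=True)
    m = len(s)
    S, total = [], Fraction(0)
    for v in s:
        total += v
        S.append(total)
    S_m = S[-1]
    params = [(Fraction(C), Fraction(T), Fraction(D)) for C, T, D in tasks]
    U = sum(C / T for C, T, _ in params)
    if U > S_m:
        raise ValueError("condition (5.1) is violated; no bound exists")
    C_max = max(C for C, _, _ in params)
    L_tau = sum((C / T) * max(Fraction(0), T - D) for C, T, D in params)
    # Lambda: the smallest number of fastest processors whose cumulative
    # speed reaches the total utilization (Definition 5.8).
    Lam = next(i + 1 for i in range(m) if S[i] >= U)
    # lambda: identicalness of the platform (5.9); 0 is assumed when m = 1.
    lam = max(((S_m - S[i]) / s[i] for i in range(m - 1)), default=Fraction(0))
    C_k, _, D_k = params[k]
    return (U * D_k + L_tau + (Lam - 1) * C_max + lam * C_k) / S_m

# Hypothetical system: two implicit-deadline tasks on speeds {2, 1}.
tasks = [(2, 4, 4), (3, 3, 3)]
speeds = [2, 1]
improved_task0 = improved_bound_preemptive(tasks, speeds, 0)
improved_task1 = improved_bound_preemptive(tasks, speeds, 1)
```

For this sample system, the expression of Theorem 5.1 evaluates to 19/3 and 6 for the two tasks, so the improved bound is considerably tighter in this case.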
5.4 Response-Time Bounds under Non-Preemptive G-EDF
The non-preemptive G-EDF scheduler is similar to the preemptive G-EDF scheduler, except that once a
job is selected for execution, it runs to completion without preemption or migration. Also, we do not require
faster processors to be favored when scheduling jobs; instead, we can always favor slower ones for energy
efficiency (if desired). In this section, we let S be the non-preemptive G-EDF schedule of τ on π .
Definition 5.10. Suppose time instant t is a non-busy time instant for Ψ. If every pending job in Ψ is
currently executing, then t is a non-blocking non-busy instant for Ψ; otherwise (i.e., some pending job in Ψ is
blocked by jobs that are not in Ψ), t is a blocking non-busy instant for Ψ. If in a time interval [t1, t2] every
time instant is a blocking non-busy instant for Ψ, then [t1, t2] is a blocking non-busy interval for Ψ.
Definition 5.11. The set of jobs that are not in Ψ but currently executing in schedule S at time instant t is
denoted $\mathcal{B}(t)$, and the incomplete work of jobs in $\mathcal{B}(t)$ is denoted $B(t)$, called blocking work.
It is clear that $|\mathcal{B}(t)| \le m$ for all t, and therefore $B(t) \le m \cdot C_{max}$ for all t.
5.4.1 Basic Bounds
As before, we first derive basic response-time bounds. We will improve the basic bounds in Section 5.4.2.
Lemma 5.10. At any non-blocking non-busy instant t, LAG(Ψ, t,S)+B(t)≤ m ·Cmax.
Proof. Let $b = |\mathcal{B}(t)|$ be the number of blocking jobs at t (0≤ b≤ m). Then B(t)≤ b ·Cmax, and by the definition of a non-blocking non-busy
instant, the number of pending jobs in Ψ is at most m−b. Similarly to Lemma 5.3, we have LAG(Ψ, t,S)≤
(m−b) ·Cmax. Thus, LAG(Ψ, t,S)+B(t)≤ (m−b) ·Cmax+b ·Cmax = m ·Cmax.
Lemma 5.11. If [t1, t2] is a busy interval for Ψ in S, then LAG(Ψ, t1,S)+B(t1)≥ LAG(Ψ, t2,S)+B(t2).
Proof. Since t1, t2 ∈ [t1, t2] are busy instants for Ψ, the sets of blocking jobs satisfy $\mathcal{B}(t_1) = \mathcal{B}(t_2) = \emptyset$ and therefore the blocking work satisfies B(t1) = B(t2) = 0. Also,
by Lemma 5.2, LAG(Ψ, t1,S)≥ LAG(Ψ, t2,S). Thus, LAG(Ψ, t1,S)+B(t1)≥ LAG(Ψ, t2,S)+B(t2).
Lemma 5.12. If [t1, t2] is a blocking non-busy interval for Ψ in S , then any blocking job (i.e., any job that is
not in Ψ but executing at some time instant t in [t1, t2]), must execute continuously in $[t_1^-, t]$.
Proof. Because [t1, t2] is a blocking non-busy interval for Ψ, at any time instant within [t1, t2], there is at least
one job in Ψ that is pending but not executing. Also, since any job in Ψ has an earlier deadline, or higher
priority, than any job not in Ψ, no job that is not in Ψ and not executing at $t_1^-$ can execute in [t1, t2]. Thus, the
lemma follows.
Lemma 5.13. If [t1, t2] is a blocking non-busy interval for Ψ in S, then LAG(Ψ, t1,S)+B(t1) ≥ LAG(Ψ, t2,S)+B(t2).
Proof. Let [t, t′] be a subinterval of [t1, t2] such that $|\mathcal{B}(t)| = |\mathcal{B}(t')|$. By Lemma 5.12, the blocking jobs at
every time instant in [t, t′] are exactly the jobs in $\mathcal{B}(t)$. Let P denote the set of processors on which those
blocking jobs execute in [t, t′]. Then,
$$B(t') = B(t) - \sum_{s_i \in P} s_i \cdot (t' - t).$$
Since [t, t′] ⊆ [t1, t2] is a blocking non-busy interval, the processors not in P must execute jobs in Ψ; otherwise,
it would be a non-blocking non-busy interval. Therefore,
$$\begin{aligned}
\mathrm{LAG}(\Psi, t', S) &= \mathrm{LAG}(\Psi, t, S) + A(I, \Psi, t, t') - A(S, \Psi, t, t') \\
&\le \mathrm{LAG}(\Psi, t, S) + U_\tau \cdot (t' - t) - \sum_{s_i \notin P} s_i \cdot (t' - t).
\end{aligned}$$
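Thus,
$$\begin{aligned}
\mathrm{LAG}(\Psi, t', S) + B(t')
&\le \mathrm{LAG}(\Psi, t, S) + U_\tau \cdot (t' - t) - \sum_{s_i \notin P} s_i \cdot (t' - t) + B(t) - \sum_{s_i \in P} s_i \cdot (t' - t) \\
&= \mathrm{LAG}(\Psi, t, S) + B(t) + U_\tau \cdot (t' - t) - \Big(\sum_{s_i \notin P} s_i + \sum_{s_i \in P} s_i\Big) \cdot (t' - t) \\
&= \mathrm{LAG}(\Psi, t, S) + B(t) + U_\tau \cdot (t' - t) - S_m \cdot (t' - t) \\
&\le \mathrm{LAG}(\Psi, t, S) + B(t) && \text{\{since } U_\tau \le S_m\text{\}}.
\end{aligned}$$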
That is, for every such subinterval [t, t ′]⊆ [t1, t2], we have
LAG(Ψ, t,S)+B(t)≥ LAG(Ψ, t ′,S)+B(t ′).
By induction, the lemma follows.
Lemma 5.14. In S, the competing work for τk,l plus the blocking work at td is at most Lτ +m ·Cmax.
Proof. Similarly to Lemma 5.5, the competing work pending at td is at most Lτ + LAG(Ψ,S, td). Also,
the blocking work at td is B(td), so the competing work for τk,l plus the blocking work at td is at most
Lτ + LAG(Ψ,S, td) + B(td).
Let t ′ be the latest non-blocking non-busy instant at or before td (or time 0 if no such non-blocking
non-busy instant exists). Then by Lemma 5.10,
LAG(Ψ,S, t ′)+B(t ′)≤ m ·Cmax.
Moreover, by the definition of t′, $[{t'}^{+}, t_d]$ consists of busy intervals and/or blocking non-busy intervals.
Therefore, by Lemmas 5.11 and 5.13,
LAG(Ψ,S, t ′)+B(t ′)≥ LAG(Ψ,S, td)+B(td).
Thus, LAG(Ψ,S, td)+B(td)≤ m ·Cmax and the lemma follows.
Lemma 5.15. Let W be the competing work plus the blocking work for τk,l at td. Then the job of interest,
τk,l, will complete execution no later than time
$$t_d + \frac{W - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. The proof of this lemma is exactly the same as that of Lemma 5.6.
Theorem 5.3. The response time of an arbitrary job τk,l in τ under non-preemptive G-EDF scheduling on π
is at most
$$D_k + \frac{L_\tau + m \cdot C_{\max} - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. Follows from Lemmas 5.14 and 5.15.
5.4.2 Improved Bounds
We now show that the response-time bound for non-preemptive G-EDF can also be improved. However,
in contrast to the situation in Section 5.3.2, we still have to pessimistically assume the job of interest executes
entirely at the minimum speed sm, since in non-preemptive G-EDF, a scheduled job cannot migrate among
processors. Nevertheless, we can still consider the execution of τk,l before td .
The following three lemmas are similar to Lemmas 5.4, 5.14, and 5.15.
Lemma 5.16. Under non-preemptive G-EDF, once τk,l executes, it will continuously execute until it completes.
Proof. Follows from the non-preemptive property.
Lemma 5.17. For any time instant t at or before td , LAG(Ψ,S, t)+B(t)≤ m ·Cmax.
Proof. This proof is exactly the same as that for upper bounding LAG(Ψ,S, td)+B(td) in Lemma 5.14.
Lemma 5.18. Let W be the competing work plus the blocking work for τk,l at rk,l. Then the job of interest,
τk,l, will complete execution no later than time
$$r_{k,l} + \frac{W - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. Given Lemma 5.16, the proof is exactly the same as that of Lemma 5.15, with rk,l in place of td.
We now upper bound the competing work plus the blocking work for τk,l at rk,l by the following lemma.
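Lemma 5.19. The competing work plus the blocking work for τk,l at rk,l is at most
$$U_\tau \cdot D_k + L_\tau + m \cdot C_{\max}.$$
Proof. By Definitions 5.3, 5.4, 5.5, and 5.11, the competing work plus the blocking work for τk,l at rk,l is at
most
$$\begin{aligned}
& L_\tau + A(I, \Psi, 0, t_d) - A(S, \Psi, 0, r_{k,l}) + B(r_{k,l}) \\
={}& L_\tau + A(I, \Psi, 0, r_{k,l}) + A(I, \Psi, r_{k,l}, t_d) - A(S, \Psi, 0, r_{k,l}) + B(r_{k,l}) && \text{\{by Definition 5.2\}} \\
={}& L_\tau + \mathrm{LAG}(\Psi, S, r_{k,l}) + A(I, \Psi, r_{k,l}, t_d) + B(r_{k,l}) && \text{\{by (5.5) and (5.6)\}} \\
\le{}& U_\tau \cdot (t_d - r_{k,l}) + L_\tau + \mathrm{LAG}(\Psi, S, r_{k,l}) + B(r_{k,l}) && \text{\{by (5.4)\}} \\
\le{}& U_\tau \cdot D_k + L_\tau + m \cdot C_{\max} && \text{\{since } t_d = d_{k,l} = r_{k,l} + D_k \text{ and by Lemma 5.17\}}.
\end{aligned}$$
The lemma follows.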
Theorem 5.4. The response time of an arbitrary job τk,l in τ under non-preemptive G-EDF scheduling on π
is at most
$$\frac{U_\tau}{S_m} \cdot D_k + \frac{L_\tau + m \cdot C_{\max} - C_k}{S_m} + \frac{C_k}{s_m}.$$
Proof. Follows from Lemmas 5.18 and 5.19.
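To make the two non-preemptive bounds concrete, the following minimal Python sketch evaluates Theorems 5.3 and 5.4 for a single task. It is an illustration only: the function name and argument names are hypothetical, Lτ and Cmax are taken as given inputs (they are defined earlier in the chapter), and Sm and sm are assumed to be the total speed and the slowest speed of the platform, as their use in the proofs above suggests.

```python
from typing import Sequence, Tuple


def np_gedf_bounds(speeds: Sequence[float], U: float, L: float,
                   C_max: float, C_k: float, D_k: float) -> Tuple[float, float]:
    """Return (Theorem 5.3 bound, Theorem 5.4 bound) for a task with WCET C_k
    and relative deadline D_k on a uniform platform with the given speeds."""
    m = len(speeds)
    S_m = sum(speeds)  # assumed: total speed of all m processors
    s_m = min(speeds)  # assumed: speed of the slowest processor
    if U > S_m:
        raise ValueError("platform is overutilized (U > S_m)")
    # Term shared by both theorems.
    common = (L + m * C_max - C_k) / S_m + C_k / s_m
    basic = D_k + common                  # Theorem 5.3
    improved = (U / S_m) * D_k + common   # Theorem 5.4
    return basic, improved


# Illustrative platform with speeds {4, 4, 2, 2}; all task numbers are made up.
basic, improved = np_gedf_bounds([4, 4, 2, 2], U=6.0, L=0.0,
                                 C_max=10.0, C_k=5.0, D_k=20.0)
```

Because Uτ ≤ Sm, the second value never exceeds the first; the improvement grows as the platform becomes more lightly utilized.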
5.5 Evaluation
We evaluated the proposed algorithms and derived response-time bounds by randomly generating task
sets and then calculating response-time bounds for each task on certain selected platforms.
In the case of identical multiprocessors, the considered platform is implicitly determined by the number of
processors. However, for uniform multiprocessors, even for a given number of processors, there are infinitely
many speed combinations to consider. Therefore, there is no way to systematically choose platforms
to evaluate by varying the number of processors. Thus, we chose the following four multiprocessors as
representatives of different uniform multiprocessors in terms of both processor number and speed combination:
π1 = {2, 2, 2, 2, 1, 1, 1, 1}, π2 = {3, 3, 2, 2, 1, 1}, π3 = {3, 3, 1.5, 1.5, 1.5, 1.5}, and π4 = {4, 4, 2, 2}. We
could also normalize the slowest processor of the latter two platforms to be 1.0; however, we chose instead
to scale all four platforms to have the same total processor capacity to enable comparisons among different
platforms.
In our experiments, we assumed implicit deadlines for simplicity, i.e., relative deadlines are equal to
periods (Di = Ti). For a given total utilization cap, we generated a task set by first randomly selecting its
task count uniformly over [1,20]. Note that we allow the number of tasks to be less than the number of
processors. Next, we randomly assigned a relative utilization or weight, by uniformly generating a number
in (0,1], for each task. We then scaled the relative utilizations to obtain real utilizations by letting the total
utilization match the pre-set cap. Note that, by the scaling step, we may generate tasks with a utilization
greater than the fastest processor’s speed; that is allowed in our task model and analysis. Since all of the four
considered platforms have a total capacity of 12, we varied task-set utilization caps in [0,12] by increments
of 0.2. For each given total utilization cap, we generated 10,000 task sets. The period for each task was
selected uniformly within [10ms,100ms]; its worst-case execution requirement was then determined based
on its utilization and period.
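The generation procedure just described can be sketched in Python as follows. This is a reconstruction from the prose above, not the code used for the experiments: the function name, the dictionary layout, the seed, and the choice of a continuous uniform distribution for periods are all assumptions.

```python
import random


def generate_task_set(util_cap, rng, max_tasks=20):
    """Generate one implicit-deadline task set whose total utilization is util_cap."""
    n = rng.randint(1, max_tasks)  # task count, uniform over [1, 20]
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    weights = [1.0 - rng.random() for _ in range(n)]
    scale = util_cap / sum(weights)  # scale weights so utilizations sum to the cap
    tasks = []
    for w in weights:
        u = w * scale
        T = rng.uniform(10.0, 100.0)  # period in ms
        tasks.append({"u": u, "T": T, "D": T, "C": u * T})  # D = T, C = u * T
    return tasks


rng = random.Random(0)  # fixed seed, chosen here only for repeatability
task_set = generate_task_set(6.0, rng)
```

As noted above, the scaling step may produce a task whose utilization exceeds the fastest processor's speed; the sketch does not filter such tasks out.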
Results. We evaluated response-time bounds in terms of both absolute values and relative values. The
former are directly computed by Theorems 5.2 and 5.4; the latter are defined by the ratio of a task’s absolute
response-time bound to its period. We focus here on the maximum response-time bound for each task set.
Figure 5.1 shows absolute response-time bound results while Figure 5.2 shows the relative response-time
bound results. Each data point in each figure is the average of the maximum response-time bounds (absolute
or relative) of the 10,000 generated task sets for a given total utilization. The resulting absolute and relative
response-time bounds are under 450 ms and 8 periods, respectively. Generally, the lower the total utilization
cap, the better the bounds; given a total processor capacity, having fewer processors with faster speeds yields
better bounds.
[Figure: average maximum absolute response-time bound (ms, 0 to 450) versus task system total utilization (0 to 12). Curves: [1] Preemptive G-EDF on π1; [2] Non-preemptive G-EDF on π1; [3] Preemptive G-EDF on π2; [4] Non-preemptive G-EDF on π2; [5] Preemptive G-EDF on π3; [6] Non-preemptive G-EDF on π3; [7] Preemptive G-EDF on π4; [8] Non-preemptive G-EDF on π4.]
Figure 5.1: Average maximum absolute response-time bounds.
[Figure: average maximum relative response-time bound (multiple of period, 0 to 8) versus task system total utilization, for the same eight curves as in Figure 5.1.]
Figure 5.2: Average maximum relative response-time bounds.
5.6 Chapter Summary
In this chapter, we have considered both preemptive and non-preemptive G-EDF scheduling on uniform
multiprocessors in the absence of intra-task precedence constraints. Both ensure bounded job response times
for npc-sporadic tasks as long as the underlying multiprocessor platform is not overutilized.
It follows from our work that, on such platforms, different feasibility conditions apply for HRT and SRT
npc-sporadic systems. This stands in contrast to the conventional sporadic task model. For the npc-sporadic
model, the HRT-feasibility condition is the same as that for the conventional sporadic task model; however,
the SRT-feasibility condition for the npc-sporadic model merely requires that the system is not overutilized, in
contrast to the more complicated condition for the sporadic case that requires (5.2). Note that both preemptive
and non-preemptive G-EDF are SRT-optimal for scheduling npc-sporadic task systems.
Preemptive G-EDF is more greedy in executing jobs on faster processors and hence has a better response-
time bound, at the expense of potentially greater preemption and migration frequencies. On the other hand,
non-preemptive G-EDF does not preempt or migrate jobs, but its guaranteed response-time bounds are
relatively higher. Our analysis for non-preemptive G-EDF applies even under the scheduling rule that, when
multiple processors are available to a job, the slowest one is chosen to execute that job. This may yield
benefits from an energy point of view.
CHAPTER 6: DAG-BASED TASK SYSTEMS ON UNRELATED HETEROGENEOUS PLATFORMS¹
The multicore revolution is currently undergoing a second wave of innovation in the form of heterogeneous
hardware platforms. In the domain of real-time embedded systems, such platforms may be desirable to
use for a variety of reasons. For example, ARM’s big.LITTLE multicore architecture (ARM, 2018) enables
performance and energy concerns to be balanced by providing a mix of relatively slower, low-power cores
and faster, high-power ones. Unfortunately, the move towards greater heterogeneity is further complicating
software design processes that were already being challenged on account of the significant parallelism that
exists in “conventional” multicore platforms with identical processors. Such complications are impeding
advancements in the embedded computing industry today.
Problem considered herein. In this chapter, we report on our efforts towards solving a particular real-time
analysis problem concerning heterogeneity motivated by an industrial collaboration. This problem pertains to
the processing done by cellular base stations in wireless networks. We refrain from delving into specifics
regarding this particular application domain, opting instead for a more abstract treatment of the problem at
hand.
This problem involves the scheduling of real-time dataflows on heterogeneous computational elements
(CEs), such as CPUs, digital signal processors (DSPs), or one of many types of hardware accelerators. Each
dataflow is represented by a directed acyclic graph (DAG), the nodes (resp., edges) of which represent
tasks (resp., producer/consumer relationships). A given task is restricted to run on a specific CE type. Task
preemption may be impossible for some CEs and should in any case be discouraged. Each DAG has a single
source task that is invoked periodically. Intra-task parallelism is allowed in the sense that consecutive jobs
(i.e., invocations) of the same task can execute in parallel (but each job executes sequentially). In fact, a
later job can finish earlier due to variations in running times.² The DAGs to be supported are defined using
a relatively small number of “templates,” i.e., many DAGs may exist that are structurally identical. The
challenge is to devise a multi-resource, real-time scheduler for supporting dataflows as described here with
accompanying per-dataflow end-to-end response-time analysis. That is, an upper bound on the response time
of a single invocation of each DAG in a system comprised of such DAGs is required.
¹ Contents of this chapter previously appeared in preliminary form in the following paper:
Yang, K., Yang, M., and Anderson, J. (2016). Reducing response-time bounds for DAG-based task systems on heterogeneous
multicore platforms. In Proceedings of the 24th International Conference on Real-Time Networks and Systems, pages 349–358.
² In the considered application domain, the data produced by subsequent jobs can be buffered until all prior jobs of the same task
have completed.
In this chapter, we formalize the problem described above and then address it by proposing a scheduling
approach and associated end-to-end response-time analysis. In the first part of the chapter, we attack the
problem by presenting a transformation process whereby successive task models are introduced such that: (i)
the first task model directly formalizes the problem above; (ii) prior analysis can be applied to the last model
to obtain response-time bounds under earliest-deadline-first (EDF) scheduling; and (iii) each successive
model is a refinement of the prior one in the sense that all DAG-based precedence constraints are preserved.
Such a transformation approach was previously used by Liu and Anderson (2010) in work on DAG-based
systems, but that work focused on identical multiprocessors. Moreover, our work differs from theirs in that
we allow intra-task parallelism. This enables much smaller end-to-end response-time bounds to be derived.
After presenting this transformation process, we discuss two techniques that can reduce the response-time
bounds enabled by this process. The first technique exploits the fact that some leeway exists in setting tasks’
relative deadlines. By setting more aggressive deadlines for tasks along “long” paths in a DAG, the overall
end-to-end response-time bound of that DAG can be reduced. We show that such deadline adjustments can
be made by solving a linear program.
The second technique exploits the fact that, in the considered context, DAGs are defined using relatively
few templates and typically have quite low utilizations. These facts enable us to reduce response-time bounds
by combining many DAGs into one of larger utilization. As a very simple example, two DAGs with a period
of 10 time units might be combined into one with a period of 5 time units. A response-time-bound reduction
is enabled because these bounds tend to be proportional to periods. In the considered application domain, the
extent of combining can be much more extensive: upwards of 40 DAGs may be combinable.
We evaluate our proposed techniques via case-study and schedulability experiments. These experiments
show that our techniques can significantly reduce response-time bounds. Furthermore, our analysis supports
“early releasing” (Devi, 2006) (see Section 6.6) to improve observed end-to-end response times. We
experimentally demonstrate the efficacy of this as well.
Organization. In the following sections, we formalize the considered problem (Section 6.1), present the
refinements mentioned above that enable the use of prior analysis (Sections 6.2 and 6.3), show that the
bounds arising from this analysis can be improved via linear programming (Section 6.4) and DAG combining
(Section 6.5), discuss early releasing (Section 6.6), and present our case-study (Section 6.7) and schedulability
(Section 6.8) experiments.
[Figure: a DAG with four nodes, $\tau_1^1$, $\tau_1^2$, $\tau_1^3$, and $\tau_1^4$.]
Figure 6.1: A DAG G1.
6.1 System Model
In this section, we formalize the dataflow-scheduling problem described earlier and introduce relevant
terminology. Each dataflow is represented by a DAG, as discussed earlier.
We specifically consider a system $G = \{G_1, G_2, \ldots, G_N\}$ comprised of N DAGs. The DAG $G_i$ consists
of $n_i$ nodes, which correspond to $n_i$ tasks, denoted $\tau_i^1, \tau_i^2, \ldots, \tau_i^{n_i}$. Each task $\tau_i^v$ releases a (potentially infinite)
sequence of jobs $\tau_{i,1}^v, \tau_{i,2}^v, \ldots$. The edges in $G_i$ reflect producer/consumer relationships. A particular task $\tau_i^v$’s
producers are those tasks with outgoing edges directed to $\tau_i^v$, and its consumers are those with incoming
edges directed from $\tau_i^v$. The jth job of task $\tau_i^v$, denoted $\tau_{i,j}^v$, cannot commence execution until the jth jobs of all of its
producers have completed; this ensures that its necessary input data is available. Such job dependencies only
exist with respect to the same invocation of a DAG, and not across different invocations. That is, while each job
must execute sequentially, intra-task parallelism is allowed.
Example 6.1. Figure 6.1 shows an example DAG, G1. Task $\tau_1^4$’s producers are tasks $\tau_1^2$ and $\tau_1^3$; thus, for any
j, $\tau_{1,j}^4$ needs input data from each of $\tau_{1,j}^2$ and $\tau_{1,j}^3$, so it must wait until those jobs complete. Because intra-task
parallelism is allowed, $\tau_{1,j}^4$ and $\tau_{1,j+1}^4$ could potentially execute in parallel. ♦
To simplify analysis, we assume that each DAG $G_i$ has exactly one source task $\tau_i^1$, which has only
outgoing edges, and one sink task $\tau_i^{n_i}$, which has only incoming edges. Multi-source/multi-sink DAGs can be
supported with the addition of singular “virtual” sources and sinks that connect multiple sources and sinks,
respectively. Virtual sources and sinks have a worst-case execution time (WCET) of zero.
We consider the scheduling of DAGs as just described on a heterogeneous hardware platform consisting
of different types of CEs. A given CE might be a CPU, DSP, or some specialized hardware accelerator (HAC).
The CEs are organized in M CE pools, where each CE pool $\pi_k$ consists of $m_k$ identical CEs. Each task $\tau_i^v$ has
a parameter $P_i^v$ that denotes the particular CE pool on which it must run, i.e., $P_i^v = \pi_k$ means that each job of
$\tau_i^v$ must be scheduled on a CE in the CE pool $\pi_k$. The WCET of task $\tau_i^v$ is denoted $C_i^v$.
Although the problem description at the beginning of this chapter indicated that source tasks are released
periodically, we generalize this to allow sporadic releases, i.e., for the DAG $G_i$, the job releases of $\tau_i^1$ have a
minimum separation time, denoted $T_i$. A non-source task $\tau_i^v$ ($v > 1$) releases its jth job $\tau_{i,j}^v$ when the jth jobs
of all its producer tasks in $G_i$ have completed. That is, letting $r_{i,j}^v$ and $f_{i,j}^v$ denote the release and finish times
of $\tau_{i,j}^v$, respectively,
$$r_{i,j}^v = \max\{ f_{i,j}^w \mid \tau_i^w \text{ is a producer of } \tau_i^v \}. \tag{6.1}$$
The response time of job $\tau_{i,j}^v$ is defined as $f_{i,j}^v - r_{i,j}^v$, and the end-to-end response time of the DAG $G_i$ as
$f_{i,j}^{n_i} - r_{i,j}^1$.
Example 6.2. Figure 6.2 depicts an example schedule for the DAG G1 in Figure 6.1, assuming task $\tau_1^2$ is
required to execute on a DSP and the other tasks are required to execute on a CPU. The first (resp., second)
job of each task has a lighter (resp., darker) shading to make them easier to distinguish. Tasks $\tau_1^2$ and $\tau_1^3$ have
only one producer, $\tau_1^1$, so when task $\tau_1^1$ finishes a job at times 3 and 8, $\tau_1^2$ and $\tau_1^3$ release a job immediately. In
contrast, task $\tau_1^4$ has two producers, $\tau_1^2$ and $\tau_1^3$. Therefore, $\tau_1^4$ cannot release a job at time 5 when $\tau_1^3$ finishes
a job, but rather must wait until time 6 when $\tau_1^2$ also finishes a job. Note that consecutive jobs of the same
task might execute in parallel (e.g., $\tau_{1,1}^4$ and $\tau_{1,2}^4$ execute in parallel during [11,12)). Furthermore, for a given
task, a later-released job (e.g., $\tau_{1,2}^4$) may even finish earlier than an earlier-released one (e.g., $\tau_{1,1}^4$) due to
execution-time variations. ♦
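The release rule (6.1) can be checked against the first invocation in Example 6.2 with a few lines of Python. This is a sketch only: the function and variable names are hypothetical, tasks are identified by their superscripts, and the source release time of 0 is an assumption, since the example does not state it.

```python
def release_times(producers, finish, source_release):
    """Apply (6.1) to the jth jobs of one DAG.

    producers maps each task to the list of its producers; finish maps a task
    to the finish time of its jth job.
    """
    releases = {}
    for task, prods in producers.items():
        if not prods:
            # The source task is released externally (sporadically).
            releases[task] = source_release
        else:
            # A non-source job is released when its last producer job finishes.
            releases[task] = max(finish[p] for p in prods)
    return releases


# DAG G1: task 1 feeds tasks 2 and 3, which both feed task 4.
producers = {1: [], 2: [1], 3: [1], 4: [2, 3]}
# Finish times of the first jobs of tasks 1, 2, and 3, as stated in Example 6.2.
finish = {1: 3, 2: 6, 3: 5}
releases = release_times(producers, finish, source_release=0)  # 0 is assumed
```

The result reproduces the example: tasks 2 and 3 are released at time 3, and task 4 must wait until time 6.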
[Figure: schedule of the first and second jobs of tasks $\tau_1^1$ through $\tau_1^4$ over the time axis 0 to 20, with the end-to-end response time of each invocation marked. Legend: job release, job deadline, job completion, CPU execution, DSP execution. Depicted jobs are assumed to be scheduled alongside other jobs, which are not shown.]
Figure 6.2: Example schedule for the DAG G1 in Figure 6.1.
Scheduling. Since many CEs are non-preemptible, we use the non-preemptive G-EDF scheduling algorithm
within each CE pool. The deadline of job $\tau_{i,j}^v$ is given by
$$d_{i,j}^v = r_{i,j}^v + D_i^v, \tag{6.2}$$
where $D_i^v$ is the relative deadline of task $\tau_i^v$. For example, in the example schedule in Figure 6.2, relative
deadlines of $D_1^1 = 7$, $D_1^2 = 4$, $D_1^3 = 6$, and $D_1^4 = 8$ are assumed.
In the context of this chapter, deadlines mainly serve the purpose of determining jobs’ priorities rather
than strict timing constraints for individual jobs. Therefore, deadline misses are acceptable as long as the
end-to-end response time of each DAG can be reasonably bounded.
Utilization. We denote the utilization of task $\tau_i^v$ by
$$u_i^v = \frac{C_i^v}{T_i}. \tag{6.3}$$
We also use $\Gamma_k$ to denote the set of tasks that are required to execute on the CE pool $\pi_k$, i.e.,
$$\Gamma_k = \{\tau_i^v \mid P_i^v = \pi_k\}. \tag{6.4}$$
The overutilization of a CE pool could cause unbounded response times, so we require for each k,
$$\sum_{\tau_i^v \in \Gamma_k} u_i^v \le m_k. \tag{6.5}$$
6.2 Offset-Based Independent Tasks
In this section, we present a second task model, which is a refinement of that just presented, as will be
shown in Section 6.3. The prior model is somewhat problematic because of difficult-to-analyze dependencies
among jobs. In particular, by (6.1), the release times of jobs of non-source tasks depend on the finish times of
other jobs, and hence on their execution times. By (6.2), deadlines (and hence priorities) of jobs are affected
by similar dependencies.
In order to ease analysis difficulties associated with such job dependencies, we introduce here the
offset-based independent task (obi-task) model. Under this model, tasks are partitioned into groups. The ith
such group consists of tasks denoted $\tau_i^1, \tau_i^2, \ldots, \tau_i^{n_i}$, where $\tau_i^1$ is a designated source task that releases jobs
sporadically with a minimum separation of $T_i$. That is, for any positive integer j,
$$r_{i,j+1}^1 - r_{i,j}^1 \ge T_i. \tag{6.6}$$
Job releases of each non-source task $\tau_i^v$ are governed by a new parameter $\Phi_i^v$, called the offset of $\tau_i^v$.
Specifically, $\tau_i^v$ releases its jth job exactly $\Phi_i^v$ time units after the release time of the jth job of the source task
$\tau_i^1$ of its group. That is,
$$r_{i,j}^v = r_{i,j}^1 + \Phi_i^v. \tag{6.7}$$
For consistency, we define
$$\Phi_i^1 = 0. \tag{6.8}$$
Under the obi-task model, a job of a task $\tau_i^v$ can be scheduled at any time after its release independently of
the execution of any other jobs, even jobs of the same task $\tau_i^v$.
The definitions so far have dealt with job releases. Additionally, the two per-task parameters $C_i^v$ and $P_i^v$
from Section 6.1 are retained with the same definitions.
The following property shows that every obi-task $\tau_i^v$ has a minimum job-release separation of $T_i$.
Property 6.1. For any obi-task $\tau_i^v$, $r_{i,j+1}^v - r_{i,j}^v \ge T_i$.
Proof.
$$\begin{aligned}
r_{i,j+1}^v - r_{i,j}^v &= (r_{i,j+1}^1 + \Phi_i^v) - (r_{i,j}^1 + \Phi_i^v) && \text{\{by (6.7)\}} \\
&\ge T_i && \text{\{by (6.6)\}}.
\end{aligned}$$
6.3 Response-Time Bounds
In this section, we establish two results that enable prior work to be leveraged to establish response-time
bounds for DAG-based task systems. First, we show that, under the obi-task model with arbitrary
offset settings, per-task response-time bounds can be derived by exploiting prior work pertaining to npc-sporadic
tasks, which was summarized in Chapter 5. Second, we show that, by properly setting offsets, any
DAG-based task system can be transformed to a corresponding obi-task system.
6.3.1 Response-Time Bounds for Obi-Tasks
An npc-sporadic task τi is specified by (Ci,Ti,Di), where Ci is its WCET, Ti is the minimum separation
time between consecutive job releases of τi, and Di is its relative deadline. As before, τi’s utilization is
ui =Ci/Ti.
The main difference between the conventional sporadic task model and the npc-sporadic task model
is that the former requires successive jobs of each task to execute in sequence while the latter allows them
to execute in parallel. That is, under the conventional sporadic task model, job τi, j+1 cannot commence
execution until its predecessor τi, j completes, even if ri, j+1, the release time of τi, j+1, has elapsed. In contrast,
under the npc-sporadic task model, any job can execute as soon as it is released. Note that, although we allow
intra-task parallelism, each individual job still must execute sequentially.
In Chapter 5, we investigated the G-EDF scheduling of npc-sporadic tasks on uniform multiprocessor
platforms where different processors may have different speeds. By setting each processor’s speed to be 1.0,
the following theorem follows from Theorem 5.4.
Theorem 6.1. (Follows from Theorem 5.4) Consider the scheduling of a set of npc-sporadic tasks τ
on m identical processors. Under non-preemptive G-EDF, each npc-sporadic task τi ∈ τ has the following
response-time bound, provided $\sum_{\tau_l \in \tau} u_l \le m$:
$$\frac{1}{m}\left( D_i \cdot \sum_{\tau_l \in \tau} u_l + \sum_{\tau_l \in \tau} \left( u_l \cdot \max\{0, T_l - D_l\} \right) \right) + \max_{\tau_l \in \tau}\{C_l\} + \frac{m-1}{m} C_i.$$
We now show that Theorem 6.1 can be applied to obtain per-task response-time bounds for any obi-task
set.
Concrete vs. non-concrete. A concrete sequence of job releases that satisfies a task’s specification (under
either the obi- or npc-sporadic task model) is called an instantiation of that task. An instantiation of a task set
is defined similarly. In contrast, a task or a task set that can have multiple (potentially infinite) instantiations
satisfying its specification (e.g., minimum release separation) is called non-concrete.
By Property 6.1, any instantiation of an obi-task $\tau_i^v$ is an instantiation of the npc-sporadic task $\tau_i^v = (C_i^v, T_i, D_i^v)$.
Hence, any instantiation of an obi-task set $\{\tau_i^v \mid P_i^v = \pi_k\}$ is an instantiation of the npc-sporadic
task set $\{\tau_i^v \mid P_i^v = \pi_k\}$. Also, since obi-tasks execute independently of one another, obi-tasks executing in
different CE pools cannot affect each other. Since each CE pool $\pi_k$ has $m_k$ identical processors, the problem
we must consider is that of scheduling an instantiation of the npc-sporadic task set $\{\tau_i^v \mid P_i^v = \pi_k\}$ on $m_k$
identical processors. Since Theorem 6.1 applies to a non-concrete npc-sporadic task set, it applies to every
concrete instantiation of such a task set. Thus, we have the following response-time bound for each obi-task
$\tau_i^v$:
$$R_i^v = \frac{1}{m_k}\left( D_i^v \cdot \sum_{\tau_l^w \in \Gamma_k} u_l^w + \sum_{\tau_l^w \in \Gamma_k} \left( u_l^w \cdot \max\{0, T_l - D_l^w\} \right) \right) + \max_{\tau_l^w \in \Gamma_k}\{C_l^w\} + \frac{m_k - 1}{m_k} C_i^v, \tag{6.9}$$
where $\Gamma_k = \{\tau_l^w \mid P_l^w = \pi_k\}$.
Note that (6.9) is applicable as long as all relative deadlines are non-negative, and applies assuming any
arbitrary offset setting.
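The bound (6.9) is straightforward to evaluate per CE pool. The Python sketch below is illustrative only: the `ObiTask` class, its field names, the pool names, and all numeric values are hypothetical, and the check against (6.5) is included because (6.9) presumes that the pool is not overutilized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObiTask:
    C: float   # WCET
    T: float   # minimum release separation of the task's group
    D: float   # relative deadline
    pool: str  # name of the CE pool the task must run on

    @property
    def u(self) -> float:
        return self.C / self.T  # utilization, as in (6.3)


def response_time_bound(task, tasks, pool_sizes):
    """Evaluate (6.9) for `task`, given all tasks and the size m_k of each pool."""
    m_k = pool_sizes[task.pool]
    gamma_k = [t for t in tasks if t.pool == task.pool]  # the set in (6.4)
    total_u = sum(t.u for t in gamma_k)
    if total_u > m_k:
        raise ValueError("CE pool is overutilized; (6.5) is violated")
    deadline_term = sum(t.u * max(0.0, t.T - t.D) for t in gamma_k)
    c_max = max(t.C for t in gamma_k)
    return ((task.D * total_u + deadline_term) / m_k
            + c_max + (m_k - 1) / m_k * task.C)


# Made-up task set: two CPU tasks sharing a 2-CE pool and one DSP task.
tasks = [ObiTask(2, 10, 10, "cpu"), ObiTask(3, 10, 5, "cpu"), ObiTask(4, 20, 20, "dsp")]
pool_sizes = {"cpu": 2, "dsp": 1}
bounds = [response_time_bound(t, tasks, pool_sizes) for t in tasks]
```

Note that a task's bound depends on its own relative deadline through the first term and on every deadline in its pool through the second term, which is what makes the deadline settings worth optimizing.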
6.3.2 From DAG-Based Task Sets to Obi-Task Sets
We now show that, by properly setting offsets, any DAG-based task set can be transformed to an obi-task
set, and per-DAG end-to-end response-time bounds can be derived by leveraging the obi-task response-time
bounds just stated.
Any DAG-based task set can be implemented by an obi-task set in an obvious way: each DAG becomes
an obi-task group with the same task designated as its source, and all $T_i$, $C_i^v$, and $P_i^v$ parameters are retained
without modification. What is less obvious is how to define task offsets under the obi-task model. This is
done by setting each $\Phi_i^v$ ($v \ne 1$) parameter to be a constant such that
$$\Phi_i^v \ge \max_{\tau_i^k \in \mathrm{prod}(\tau_i^v)} \{\Phi_i^k + R_i^k\}, \tag{6.10}$$
where $\mathrm{prod}(\tau_i^v)$ denotes the set of obi-tasks corresponding to the DAG-based tasks that are the producers of
the DAG-based task $\tau_i^v$ in $G_i$, and $R_i^k$ denotes a response-time bound for the obi-task $\tau_i^k$. For now, we assume
that $R_i^k$ is known, but later, we will show how to compute it.
Example 6.3. Consider again the DAG G1 in Figure 6.1. Assume that, after applying the above transformation,
the obi-tasks have response-time bounds of $R_1^1 = 9$, $R_1^2 = 5$, $R_1^3 = 7$, and $R_1^4 = 9$, respectively. Then,
we can set $\Phi_1^1 = 0$, $\Phi_1^2 = 9$, $\Phi_1^3 = 9$, and $\Phi_1^4 = 16$, respectively, and satisfy (6.10). With these response-time
bounds, the end-to-end response-time bound that can be guaranteed is determined by $R_1^1$, $R_1^3$, and $R_1^4$ and is
given by $R_1 = 25$. Figure 6.3 depicts a possible schedule for these obi-tasks and illustrates the transformation.
As in Figure 6.2, the first (resp., second) job of each task has a lighter (resp., darker) shading, and intra-task
parallelism is possible (e.g., $\tau_{1,1}^4$ and $\tau_{1,2}^4$ in time interval [23,24)). ♦
[Figure: schedule of the obi-tasks $\tau_1^1$ through $\tau_1^4$ over the time axis 0 to 30, annotated with the offsets $\Phi_1^1 = 0$, $\Phi_1^2$, $\Phi_1^3$, and $\Phi_1^4$, the per-task bounds $R_1^1$ through $R_1^4$, and $R_1$, the end-to-end response-time bound for G1. Legend: job release, job deadline, job completion, CPU execution, DSP execution. Depicted jobs are assumed to be scheduled alongside other jobs, which are not shown.]
Figure 6.3: Example schedule of the obi-tasks corresponding to the DAG-based tasks in G1 in Figure 6.1.
The following properties follow from this transformation process. According to Property 6.2, a DAG-based
task set can be implemented by a corresponding set of obi-tasks, and all producer/consumer constraints
in the DAG-based specification will be implicitly guaranteed, provided the offsets of the obi-tasks are properly
set (i.e., satisfy (6.10)).
Property 6.2. If $\tau_i^k$ is a producer of $\tau_i^v$ in the DAG-based task system, then for the jth jobs of the corresponding
two obi-tasks, $f_{i,j}^k \le r_{i,j}^v$.
Proof. By (6.7), $r_{i,j}^k = r_{i,j}^1 + \Phi_i^k$, and by the definition of $R_i^k$, $f_{i,j}^k \le r_{i,j}^k + R_i^k$. Thus,
$$f_{i,j}^k \le r_{i,j}^1 + \Phi_i^k + R_i^k. \tag{6.11}$$
By (6.7), $r_{i,j}^v = r_{i,j}^1 + \Phi_i^v$, and by (6.10), $\Phi_i^v \ge \Phi_i^k + R_i^k$. Thus,
$$r_{i,j}^v \ge r_{i,j}^1 + \Phi_i^k + R_i^k. \tag{6.12}$$
By (6.11) and (6.12), $f_{i,j}^k \le r_{i,j}^v$.
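Property 6.3 shows how to compute an end-to-end response-time bound $R_i$.
Property 6.3. In the obi-task system, for each j, all jobs $\tau_{i,j}^1, \tau_{i,j}^2, \ldots, \tau_{i,j}^{n_i}$ finish their execution within $R_i$ time
units after $r_{i,j}^1$, where
$$R_i = \Phi_i^{n_i} + R_i^{n_i}. \tag{6.13}$$
Proof. By (6.7) and the definition of $R_i^v$, $\tau_{i,j}^v$ finishes by time $r_{i,j}^1 + \Phi_i^v + R_i^v$. Thus, $\tau_{i,j}^{n_i}$ in particular finishes
within $\Phi_i^{n_i} + R_i^{n_i} = R_i$ time units after $r_{i,j}^1$. Also, for any $v \ne n_i$, applying (6.10) along a path from $\tau_i^v$ to $\tau_i^{n_i}$
yields $\Phi_i^v + R_i^v \le \Phi_i^{n_i}$, since $\tau_i^{n_i}$ is the single sink in $G_i$ and response-time bounds are non-negative.
Because $\Phi_i^{n_i} \le R_i$, this implies that, for any v, $\tau_{i,j}^v$ finishes within $R_i$ time units after $r_{i,j}^1$.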
Thus, a DAG-based task set can be transformed to an obi-task set with the same per-task parameters.
Given these per-task parameters, a response-time bound for each obi-task can be computed by (6.9) for
any arbitrary offset setting. Then, we can properly set the offsets for each obi-task according to (6.10) by
considering the corresponding tasks in each DAG in topological order (Cormen et al., 2001), starting with
$\Phi_i^1 = 0$ for each source task $\tau_i^1$, by (6.8). By Property 6.2, the resulting obi-task set satisfies all requirements
of the original DAG-based task system, and by Property 6.3, an end-to-end response-time bound $R_i$ can be
computed for each DAG $G_i$.
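[Figure: a five-node DAG. Each node is labeled with a response-time bound of 10 and, in parentheses, a modified bound: 8 for four of the nodes and 15 for the remaining one.]
Figure 6.4: More highly prioritizing the right-side path in this DAG decreases its end-to-end response-time bound.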
Note that the response-time bound for a virtual source/sink is not computed by (6.9), but is zero by
definition, since its WCET is zero. Any job of such a task completes in zero time as soon as it is released.
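The offset-setting procedure of this section can be sketched in Python as follows, using the numbers of Example 6.3. The function names are hypothetical, tasks are identified by their superscripts, and each offset is set to the smallest value permitted by (6.10); any larger constant would also be valid. The sketch relies on `graphlib`, which is part of the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def set_offsets(producers, R):
    """Set each offset to the smallest value allowed by (6.10).

    producers maps each task to the list of its producers; R maps each task
    to its response-time bound. The source receives offset 0, as in (6.8).
    """
    offsets = {}
    # static_order() yields every task after all of its producers.
    for task in TopologicalSorter(producers).static_order():
        prods = producers.get(task, ())
        offsets[task] = max((offsets[p] + R[p] for p in prods), default=0)
    return offsets


def end_to_end_bound(offsets, R, sink):
    """Evaluate (6.13) for the DAG whose sink task is `sink`."""
    return offsets[sink] + R[sink]


# DAG G1 and the response-time bounds assumed in Example 6.3.
producers = {1: [], 2: [1], 3: [1], 4: [2, 3]}
R = {1: 9, 2: 5, 3: 7, 4: 9}
offsets = set_offsets(producers, R)
R_1 = end_to_end_bound(offsets, R, sink=4)
```

With these inputs the sketch reproduces the offsets 0, 9, 9, and 16 and the end-to-end bound of 25 given in Example 6.3.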
6.4 Setting Relative Deadlines
In the prior sections, we showed that, by applying our proposed transformation techniques, an end-to-end
response-time bound for each DAG can be established, given arbitrary but fixed relative-deadline settings.
That is, given $D_i^v \ge 0$ for any i, v, we can compute corresponding end-to-end response-time bounds (i.e., $R_i$
for each i) by (6.9), (6.8), (6.10), and (6.13).
Similar DAG transformation approaches have been presented previously (Elliott et al., 2014; Liu and
Anderson, 2010), but under the assumption that intra-task precedence constraints exist (i.e., jobs of the same
task must execute in sequence). Moreover, in this prior work, per-task relative deadlines have been defined in
a DAG-oblivious way. By considering the actual structure of such a DAG, it may be possible to reduce its
end-to-end response-time bound by setting its tasks’ relative deadlines so as to favor certain critical paths.
Consider, for example, the DAG illustrated in Figure 6.4. Suppose that the prior analysis yields a
response-time bound of 10 for each task, as depicted within each node. The corresponding end-to-end
response-time bound would then be 40 and is obtained by considering the right-side path. Now, suppose that
we alter the tasks’ relative-deadline settings to favor the tasks along this path at the possible expense of the
remaining task on the left. Further, suppose this modification changes the per-task response-time bounds
to be as depicted in parentheses. Then, this modification would have the impact of reducing the end-to-end
bound to 32.
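The arithmetic in this example amounts to a longest-path computation over the DAG. The sketch below is illustrative only: the edge structure (a source, a sink, two intermediate nodes on the right-side path, and one node on the left) and the node names are assumptions, since the text gives only the per-node bounds and the two resulting totals.

```python
from graphlib import TopologicalSorter


def end_to_end_bound(preds, bound):
    """Longest path through a DAG whose nodes carry per-task response-time bounds.

    preds maps each node to the list of its predecessors.
    """
    finish = {}
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[node]), default=0)
        finish[node] = start + bound[node]
    return max(finish.values())


# Hypothetical structure consistent with the description of Figure 6.4.
PREDS = {
    "src": [],
    "left": ["src"],
    "right1": ["src"],
    "right2": ["right1"],
    "sink": ["left", "right2"],
}
before = {node: 10 for node in PREDS}   # bound of 10 for every task
after = {node: 8 for node in PREDS}     # right-side path favored ...
after["left"] = 15                      # ... at the expense of the left task

print(end_to_end_bound(PREDS, before))  # 40
print(end_to_end_bound(PREDS, after))   # 32
```

Under these assumed edges, the left-side path totals 31 after the change, so the right-side path (32) still determines the end-to-end bound.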
In this section, we show that the problem of determining the “best” relative-deadline settings can be cast
as a linear-programming problem, which can be solved in polynomial time. The proposed linear program
(LP) is developed in the next two subsections.
6.4.1 Linear Program
In our LP, there are three variables per task τvi : Dvi , Φvi , and Rvi . The parameters Ti, Cvi , and mk are viewed
as constants. Thus, there are 3|V | variables in total, where |V | is the total number of tasks (i.e., nodes) across
all DAGs in the system. Before stating the required constraints, we first establish the following theorem,
which shows that a relative-deadline setting of Dyx > Tx is pointless to consider.
Theorem 6.2. If Dyx > Tx, then by setting Dyx = Tx, Ryx, the response-time bound of task τyx , will decrease, and
each other task’s response-time bound will remain the same.
Proof. To begin, note that, by (6.9), $\tau_x^y$ does not impact the response-time bounds of those tasks executing on CE pools other than $P_x^y$. Therefore, the response-time bounds $\{R_i^v \mid P_i^v \neq P_x^y\}$ for such tasks are not altered by any change to $D_x^y$.
In the remainder of the proof, we consider a task $\tau_i^v$ such that $P_i^v = P_x^y$. Let $R_i^v$ and $R_i^{\prime v}$ denote the response-time bounds for $\tau_i^v$ before and after, respectively, reducing $D_x^y$ to $T_x$. If $i = x$ and $v = y$, then by (6.9),
\[
R_x^y - R_x^{\prime y}
  = \frac{D_x^y - T_x}{m_k} \cdot \sum_{\tau_l^w \in \tau_k} u_l^w
    + \frac{u_x^y}{m_k} \left( \max\{0, T_x - D_x^y\} - \max\{0, T_x - T_x\} \right)
  > 0 \quad \{\text{since } D_x^y > T_x\}.
\]
Alternatively, if $i \neq x$ or $v \neq y$, then by (6.9),
\[
R_i^v - R_i^{\prime v}
  = \frac{u_x^y}{m_k} \left( \max\{0, T_x - D_x^y\} - \max\{0, T_x - T_x\} \right)
  = 0 \quad \{\text{since } D_x^y > T_x\}.
\]
Thus, the theorem follows.
By Theorem 6.2, the reduction of Dyx mentioned in the theorem does not increase the response-time
bound for any task. By (6.10), this implies that none of the offsets, {Φvi }, needs to be increased. Therefore,
by Property 6.3, no end-to-end response-time bound increases. These properties motivate our first set of
linear constraints.
Constraint Set (i): For each task τvi ,
0≤ Dvi ≤ Ti.
2|V | individual linear inequalities arise from this constraint set, where |V | is the total number of tasks.
Another issue we must address is that of ensuring that the offset settings, given by (6.10), are encoded in
our LP. This gives rise to the next constraint set.
Constraint Set (ii): For each edge from τwi to τvi in a DAG Gi,
Φvi ≥Φwi +Rwi .
There are |E| distinct constraints in this set, where |E| is the total number of edges in all DAGs in this system.
Finally, we have a set of constraints that are linear equality constraints.
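Constraint Set (iii): With Constraint Set (i), each term $\max\{0, T_l - D_l^w\}$ in (6.9) equals $T_l - D_l^w$, so we can rewrite (6.9) as follows, for each task $\tau_i^v$,
\[
R_i^v = \frac{1}{m_k} \left( D_i^v \cdot \sum_{\tau_l^w \in \Gamma_k} u_l^w
        + \sum_{\tau_l^w \in \Gamma_k} \left( u_l^w \cdot (T_l - D_l^w) \right) \right)
        + \max_{\tau_l^w \in \Gamma_k} \{ C_l^w \}
        + \frac{m_k - 1}{m_k} C_i^v, \tag{6.14}
\]
where $\Gamma_k = \{\tau_l^w \mid P_l^w = \pi_k\}$. Moreover, by (6.8), for each DAG $G_i$,
\[
\Phi_i^1 = 0.
\]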
Constraint Set (iii) yields |V |+ |G| linear equations, where |G| denotes the number of DAGs.
Constraint Sets (i), (ii), and (iii) fully specify our LP, with the exception of the objective function. In this
LP, there are 3|V | variables, 2|V |+ |E| inequality constraints, and |V |+ |G| linear equality constraints.
6.4.2 Objective Function
Different objective functions can be specified for our LP that optimize end-to-end response-time bounds
in different senses. Here, we consider a few examples.
Single-DAG systems. For systems where only a single DAG exists, the optimization criterion is rather clear.
In order to optimize the end-to-end response-time bound of the single DAG, the objective function should
minimize the end-to-end response-time bound of the only DAG, G1. That is, the desired LP is as follows.
    minimize    $\Phi_1^{n_1} + R_1^{n_1}$
    subject to  Constraint Sets (i), (ii), and (iii)
Multiple-DAG systems. For systems containing multiple DAGs, choices exist as to the optimization criteria
to consider. We list three here.
Minimizing the average end-to-end response-time bound:
    minimize    $\sum_i \left( \Phi_i^{n_i} + R_i^{n_i} \right)$
    subject to  Constraint Sets (i), (ii), and (iii)
Minimizing the maximum end-to-end response-time bound:
    minimize    $Y$
    subject to  $\forall i : \Phi_i^{n_i} + R_i^{n_i} \le Y$
                Constraint Sets (i), (ii), and (iii)
Minimizing the maximum proportional end-to-end response-time bound:
    minimize    $Y$
    subject to  $\forall i : (\Phi_i^{n_i} + R_i^{n_i}) / T_i \le Y$
                Constraint Sets (i), (ii), and (iii)
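[Figure 6.5 graphic omitted: timelines of DAG releases for G1 and G2, each with period T, and for the combined graph G[1,2] with period T/2, whose releases are identical DAGs corresponding to invocations of G1 and G2.]
Figure 6.5: Illustration of DAG combining.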
6.5 DAG Combining
In the application domain that motivates our work, the DAGs to be scheduled are typically of quite low
utilizations and are defined based on a relatively small number of templates that define various computational
patterns. Two DAGs defined using the same template are structurally identical: they are defined by graphs
that are isomorphic, corresponding nodes from the two graphs perform identical computations, the source
nodes are released at the same time, etc. Such structurally identical graphs can be combined into one graph
with a reduced period and larger utilization, as long as any overutilization of the underlying hardware platform
is avoided. Such combining can be a very effective technique, because as the experiments presented later
show, our response-time bounds tend to be proportional to periods.
We illustrate this idea with a simple example. Consider two DAGs G1 and G2 with a common period
of T that are structurally identical. A schedule of these two DAGs is illustrated abstractly at the top of
Figure 6.5. As illustrated at the bottom of the figure, if these two DAGs are combined, then they are replaced
by a structurally identical graph, denoted here as G[1,2], with a period of T/2. With this change, the provided
response-time bounds have to be slightly adjusted. For example, if G[1,2] has a response-time bound of
R[1,2], then this would also be a response-time bound for G1, but that for G2 would be R[1,2] + T/2, because in
combining the two graphs, the releases of G2 are effectively shifted forward by T/2 time units. While this
graph-combining idea is really quite simple, the experiments presented later suggest that it can have a profound
impact in the considered application domain. In particular, in that domain, per-DAG utilizations are low
enough that upwards of 40 DAGs can be combined into one. Thus, the period is not merely halved; it can be
reduced to as little as 1/40 of its original value.
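The adjustment for two DAGs extends naturally to larger groups. The sketch below assumes, as an extrapolation of the two-DAG case described above rather than something stated in the text, that the j-th of k combined DAGs has its releases shifted by j · T/k. The function name and the sample bound are hypothetical.

```python
from fractions import Fraction


def combined_bounds(r_combined, period, k):
    """Per-DAG response-time bounds after combining k structurally identical DAGs.

    Assumption: the j-th DAG (j = 0..k-1) has its releases shifted by
    j * period / k, extrapolating the two-DAG case in the text.
    """
    new_period = Fraction(period) / k
    return [r_combined + j * new_period for j in range(k)]


# Two DAGs with a common period T = 1 and a hypothetical combined bound of 3/10.
print(combined_bounds(Fraction(3, 10), 1, 2))  # [Fraction(3, 10), Fraction(4, 5)]
```

With k = 2 this reproduces the bounds R[1,2] and R[1,2] + T/2 given above.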
6.6 Early Releasing
Transforming a DAG-based task system to a corresponding obi-task system enabled us to derive an
end-to-end response-time bound for each DAG. However, such a transformation may actually cause observed
end-to-end response times at runtime to increase, because the offsets introduced in the transformation may
prevent a job from executing even if all of its producers have already finished. For example, in Figure 6.3,
$\tau_{1,1}^3$ cannot execute until time 9, even though its corresponding producer job, $\tau_{1,1}^1$, has finished by time 3.
Observed response times can be improved under deadline-based scheduling without altering analytical
response-time bounds by using a technique called early releasing (Devi, 2006). When early releasing is
allowed, a job is eligible for execution as soon as all of its corresponding producer jobs have finished, even if
this condition is satisfied before its actual release time.
Early releasing does not affect the response-time analysis for npc-sporadic tasks because that analysis is
based on the total demand for processing time due to jobs with deadlines at or before a particular time instant.
Early releasing does not change upper bounds on such demand, because every job’s actual release time and
hence deadline are unaltered by early releasing. Thus, the response-time bounds and therefore the end-to-end
response-time bounds previously established without early releasing still hold with early releasing.
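The eligibility rule can be stated as a small predicate. This is a minimal sketch with hypothetical names; it models only when a job may start and deliberately leaves release times and deadlines untouched, which is why the analysis above is unaffected.

```python
def eligible(now, release, producer_finish_times, early_releasing):
    """Return True if a job may start executing at time `now`.

    producer_finish_times holds the completion time of each producer job,
    or None for a producer that has not finished yet.
    """
    producers_done = all(
        finish is not None and finish <= now for finish in producer_finish_times
    )
    if early_releasing:
        return producers_done              # the release time is not a barrier
    return producers_done and now >= release


# A job released at time 9 whose only producer finished at time 3.
print(eligible(5, 9, [3], early_releasing=False))  # False
print(eligible(5, 9, [3], early_releasing=True))   # True
```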
Example 6.4. Considering G1 in Figure 6.1 again, Figure 6.3 is a possible schedule, without early releasing,
for the obi-tasks that implement G1, as discussed earlier. When we allow early releasing, we do not change
any release times or deadlines, but simply allow a job to become eligible for execution before its release time
provided its producers have finished. Figure 6.6 depicts a possible schedule where early releasing is allowed,
assuming the same releases and deadlines as in Figure 6.3. Several jobs (e.g., $\tau_{1,1}^2$, $\tau_{1,2}^2$, $\tau_{1,2}^3$, $\tau_{1,1}^4$, and $\tau_{1,2}^4$)
now commence execution before their release times. As a result, observed end-to-end response times are
reduced, while still retaining all response-time bounds (per-task and end-to-end). ♦
[Figure 6.6 graphic omitted: a schedule over the time interval 0 to 30 for the obi-tasks $\tau_1^1$ through $\tau_1^4$, annotated with their offsets $\Phi_1^1 = 0$, $\Phi_1^2$, $\Phi_1^3$, $\Phi_1^4$, their response-time bounds $R_1^1$ through $R_1^4$, the observed end-to-end response times, and $R_1$, the end-to-end response-time bound for G1. Legend: job release, job deadline, job completion, CPU execution, DSP execution. The depicted jobs are assumed to be scheduled alongside other jobs, which are not shown.]
Figure 6.6: Example schedule of the obi-tasks corresponding to the DAG-based tasks in G1 in Figure 6.1, when early
releasing is allowed.
6.7 Case Study
To illustrate the computational details of our analysis, we consider here a case-study system consisting of
three DAGs, G1, G2, and G3, which are specified in Figure 6.7. π1 is a CE pool consisting of two identical
CPUs, and π2 is a CE pool consisting of two identical DSPs. Thus, m1 = m2 = 2. These three DAGs have
fewer nodes and higher utilizations than typically found in our considered application domain. However,
one can imagine that these graphs were obtained from combining many identical graphs of lower utilization.
While it would have been desirable to consider larger graphs with more nodes, graphs from our chosen
domain typically have tens of nodes, and this makes them rather unwieldy to discuss. Still, the general
conclusions we draw here are applicable to larger graphs.
Utilization check. First, we must calculate the total utilization of all tasks assigned to each CE pool to
make sure that neither is overutilized. We have $\sum_{\tau_i^v \in \Gamma_1} u_i^v = (200+100+300)/500 + (133+78+197+73+5)/1000 = 1.686 < 2$, and $\sum_{\tau_i^v \in \Gamma_2} u_i^v = 380/500 + (16+83+242)/1000 = 1.101 < 2$.
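The utilization check is easy to script. The sketch below uses the WCETs, periods, and pool assignments listed in Figure 6.7; the data layout and function name are illustrative choices, and exact rational arithmetic is used only to avoid rounding noise.

```python
from fractions import Fraction

# (WCET, period, pool index) for every task, taken from Figure 6.7.
TASKS = [
    (200, 500, 1), (380, 500, 2), (100, 500, 1), (300, 500, 1),      # G1
    (133, 1000, 1), (16, 1000, 2), (83, 1000, 2),                    # G2
    (197, 1000, 1), (78, 1000, 1),                                   # G2 (cont.)
    (73, 1000, 1), (242, 1000, 2), (5, 1000, 1),                     # G3
]
CAPACITY = {1: 2, 2: 2}  # m_1 = m_2 = 2


def pool_utilization(tasks, pool):
    """Total utilization of the tasks assigned to the given CE pool."""
    return sum(Fraction(wcet, period) for wcet, period, p in tasks if p == pool)


for pool, m in CAPACITY.items():
    u = pool_utilization(TASKS, pool)
    print(f"pool {pool}: utilization {float(u)} (not overutilized: {u <= m})")
```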
Virtual source/sink. Note that all DAGs have a single source and sink, except for G2, which has two sinks.
For it, we connect its two sinks to a single virtual sink $\tau_2^6$, which has a WCET of 0 and a response-time bound
of 0. We call the resulting DAG G′2, which is shown in Figure 6.8.
Implicit deadlines. We now show how to compute response-time bounds assuming implicit deadlines, i.e.,
$D_1^v = 500$ for $1 \le v \le 4$, $D_2^v = 1000$ for $1 \le v \le 5$ (the relative deadline of the virtual sink is irrelevant), and
$D_3^v = 1000$ for $1 \le v \le 3$. In order to derive an end-to-end response-time bound, we first transform the original
DAG-based tasks into obi-tasks as described in Section 6.2. Next, we calculate a response-time bound $R_i^v$ for
each obi-task $\tau_i^v$ by (6.9). The resulting task response-time bounds, $\{R_i^v\}$, are listed in Table 6.1. Note that
the response-time bound for the virtual sink $\tau_2^6$ does not need to be computed by (6.9), but is 0
by definition. By (6.8) and (6.10), the offsets of the obi-tasks can now be computed in topological order with
respect to each DAG. The resulting offsets, $\{\Phi_i^v\}$, are also shown in Table 6.1. Finally, by Property 6.3, we
have an end-to-end response-time bound for each DAG: $R_1 = \Phi_1^4 + R_1^4 = 2538.25$, $R_2 = \Phi_2^6 + R_2^6 = 4361.5$,
and $R_3 = \Phi_3^3 + R_3^3 = 3376.5$.
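The implicit-deadline numbers can be reproduced with a short script. The sketch below evaluates the bound in the form (6.14) and then assigns each offset as the largest value of Φ + R over a task's predecessors. Two things are assumptions rather than facts from the text: the edge lists, which are inferred from the offsets in Table 6.1 because the graphs themselves are given only pictorially, and the reading of (6.10) as choosing the smallest offsets that satisfy Constraint Set (ii).

```python
from fractions import Fraction
from graphlib import TopologicalSorter

M = {1: 2, 2: 2}                        # CEs per pool
PERIOD = {1: 500, 2: 1000, 3: 1000}     # T_i
# (DAG i, task v) -> (WCET, pool); the virtual sink (2, 6) is handled separately.
TASKS = {
    (1, 1): (200, 1), (1, 2): (380, 2), (1, 3): (100, 1), (1, 4): (300, 1),
    (2, 1): (133, 1), (2, 2): (16, 2), (2, 3): (83, 2), (2, 4): (197, 1),
    (2, 5): (78, 1),
    (3, 1): (73, 1), (3, 2): (242, 2), (3, 3): (5, 1),
}
# ASSUMED predecessor lists, inferred from the offsets in Table 6.1.
PREDS = {
    (1, 1): [], (1, 2): [(1, 1)], (1, 3): [(1, 1)], (1, 4): [(1, 2), (1, 3)],
    (2, 1): [], (2, 2): [(2, 1)], (2, 3): [(2, 2)], (2, 4): [(2, 3)],
    (2, 5): [(2, 2)], (2, 6): [(2, 4), (2, 5)],
    (3, 1): [], (3, 2): [(3, 1)], (3, 3): [(3, 2)],
}


def response_bounds(deadline):
    """Per-task bounds by (6.14); `deadline` maps task -> D with 0 <= D <= T."""
    bounds = {(2, 6): Fraction(0)}      # virtual sink: zero by definition
    for (i, v), (wcet, k) in TASKS.items():
        pool = [(key, c) for key, (c, p) in TASKS.items() if p == k]
        total_u = sum(Fraction(c, PERIOD[key[0]]) for key, c in pool)
        slack = sum(Fraction(c, PERIOD[key[0]]) * (PERIOD[key[0]] - deadline[key])
                    for key, c in pool)
        bounds[(i, v)] = ((deadline[(i, v)] * total_u + slack) / M[k]
                          + max(c for _, c in pool)
                          + Fraction(M[k] - 1, M[k]) * wcet)
    return bounds


def offsets(bounds):
    """Offsets in topological order: 0 for sources, else max over predecessors."""
    phi = {}
    for task in TopologicalSorter(PREDS).static_order():
        phi[task] = max((phi[p] + bounds[p] for p in PREDS[task]),
                        default=Fraction(0))
    return phi


implicit = {key: PERIOD[key[0]] for key in TASKS}
R = response_bounds(implicit)
PHI = offsets(R)
for sink in [(1, 4), (2, 6), (3, 3)]:
    print(f"R_{sink[0]} = {float(PHI[sink] + R[sink])}")
```

Under these assumptions the script prints 2538.25, 4361.5, and 3376.5, matching the end-to-end bounds stated above.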
LP-based deadline settings. If we use LP techniques to optimize end-to-end response-time bounds, then
choices exist regarding the objective function, because our system has multiple DAGs. We consider three
choices here.
[Figure 6.7 graphic omitted. The node labels give the task parameters below; the edges of each DAG appear only in the graphic.]
(a) G1 (T1 = 500)
    v           1      2      3      4
    $C_1^v$     200    380    100    300
    $P_1^v$     π1     π2     π1     π1
(b) G2 (T2 = 1000)
    v           1      2      3      4      5
    $C_2^v$     133    16     83     197    78
    $P_2^v$     π1     π2     π2     π1     π1
(c) G3 (T3 = 1000)
    v           1      2      3
    $C_3^v$     73     242    5
    $P_3^v$     π1     π2     π1
Figure 6.7: DAGs in the case-study system. G2 has two sinks, so to analyze it, a virtual sink $\tau_2^6$ must be added that has
a WCET of 0 and a response-time bound of 0. We show the resulting graph in Figure 6.8.
[Figure 6.8 graphic omitted. The graph is G2 (T2 = 1000) with the task parameters of Figure 6.7(b), plus the virtual sink $\tau_2^6$ with $C_2^6 = 0$, $P_2^6$ = N/A, and $R_2^6 = 0$.]
Figure 6.8: G′2, where a virtual sink is created for G2.
    Task          R          Φ
    $\tau_1^1$    821.5      0
    $\tau_1^2$    845.25     821.5
    $\tau_1^3$    771.5      821.5
    $\tau_1^4$    871.5      1666.75
    $\tau_2^1$    1209.5     0
    $\tau_2^2$    938.5      1209.5
    $\tau_2^3$    972        2148
    $\tau_2^4$    1241.5     3120
    $\tau_2^5$    1182       2148
    $\tau_2^6$    0          4361.5
    $\tau_3^1$    1179.5     0
    $\tau_3^2$    1051.5     1179.5
    $\tau_3^3$    1145.5     2231
Table 6.1: Case-study task response-time bounds and obi-task offsets assuming implicit deadlines. Bold entries denote
sinks.
Minimizing the average end-to-end response-time bound. For this choice, relative-deadline settings,
obi-task response-time bounds, and obi-task offsets are as shown in Table 6.2 (a). The resulting end-to-end
response-time bounds are $R_1 = \Phi_1^4 + R_1^4 = 3134.5$, $R_2 = \Phi_2^6 + R_2^6 = 2341.2$, and $R_3 = \Phi_3^3 + R_3^3 = 1736.2$.
Minimizing the maximum end-to-end response-time bound. For this choice, relative-deadline settings,
obi-task response-time bounds, and obi-task offsets are as shown in Table 6.2 (b). The resulting end-to-end
response-time bounds are $R_1 = \Phi_1^4 + R_1^4 = 2650.4$, $R_2 = \Phi_2^6 + R_2^6 = 2650.4$, and $R_3 = \Phi_3^3 + R_3^3 = 2650.4$.
Minimizing the maximum proportional end-to-end response-time bound. For this choice, relative-deadline
settings, obi-task response-time bounds, and obi-task offsets are as shown in Table 6.2 (c). The resulting
end-to-end response-time bounds are $R_1 = \Phi_1^4 + R_1^4 = 2208.9$, $R_2 = \Phi_2^6 + R_2^6 = 4417.8$, and $R_3 = \Phi_3^3 + R_3^3 = 4261.0$.
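As a sanity check on these results, one can evaluate all three objective functions on each reported solution. The sketch below uses only the end-to-end bounds and periods quoted in this case study; the dictionary keys and the function name are illustrative.

```python
PERIOD = [500, 1000, 1000]  # T_1, T_2, T_3

# End-to-end bounds (R_1, R_2, R_3) reported in the case study.
SOLUTIONS = {
    "implicit deadlines": [2538.25, 4361.5, 3376.5],
    "min average": [3134.5, 2341.2, 1736.2],
    "min maximum": [2650.4, 2650.4, 2650.4],
    "min max proportional": [2208.9, 4417.8, 4261.0],
}


def objectives(bounds):
    """Return (sum, maximum, maximum bound-to-period ratio) of the bounds."""
    ratios = [r / t for r, t in zip(bounds, PERIOD)]
    return sum(bounds), max(bounds), max(ratios)


for name, bounds in SOLUTIONS.items():
    total, worst, worst_ratio = objectives(bounds)
    print(f"{name:22s} sum={total:9.2f}  max={worst:8.2f}  max ratio={worst_ratio:.4f}")
```

Each LP-based solution is the best of the four on its own criterion. In solution (c), the ratios for G1 and G2 coincide (2208.9/500 = 4417.8/1000), which is what one would expect when the maximum proportional bound is being minimized.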
                  (a)                              (b)                              (c)
    Task          D        R        Φ              D        R        Φ              D        R        Φ
    $\tau_1^1$    500      1034.4   0              0        642.06   0              0        679.95   0
    $\tau_1^2$    500      1015.7   1034.4         500      894.75   642.06         0        798.99   679.95
    $\tau_1^3$    500      984.36   1049.2         359.06   894.75   642.06         200.53   798.99   679.95
    $\tau_1^4$    500      1084.4   2050.1         500      1113.6   1536.8         0        729.95   1478.9
    $\tau_2^1$    0        579.36   0              0        608.56   0              1000     1489.4   0
    $\tau_2^2$    0        558.5    579.36         0        437.5    608.56         0        616.99   1489.4
    $\tau_2^3$    0        592      1137.9         0        471      1046.1         253.21   789.89   2106.4
    $\tau_2^4$    0        611.36   1729.9         584.52   1133.3   1517.1         1000     1521.4   2896.3
    $\tau_2^5$    772.84   1203.4   1137.9         1000     1424.1   1128.8         1000     1461.9   2552.7
    $\tau_2^6$    0        0        2341.2         0        0        2650.4         0        0        4417.8
    $\tau_3^1$    0        549.36   0              505.63   1004.8   0              1000     1459.4   0
    $\tau_3^2$    0        671.5    549.36         1000     1101     1004.8         1000     1280.5   1508.3
    $\tau_3^3$    0        515.36   1220.9         0        544.56   2105.8         1000     1425.4   2835.6
Table 6.2: Case-study relative-deadline settings, obi-task response-time bounds, and obi-task offsets when using
linear programming to (a) minimize average end-to-end response-time bounds, (b) minimize maximum end-to-end
response-time bounds, and (c) minimize maximum proportional end-to-end response-time bounds. Bold entries denote
sinks.
Early releasing. As discussed in Section 6.6, early releasing can improve observed response times without
compromising response-time bounds. The value of allowing early releasing can be seen in the results reported
in Table 6.3. This table gives the largest observed end-to-end response time of each DAG in Figure 6.7,
assuming implicit deadlines with and without early releasing, in a schedule that was simulated for 50,000
time units. Analytical bounds are shown as well.
                          G1         G2         G3
    Early releasing       1006       897        453
    No early releasing    1966.75    3536.25    2586.0
    Bounds                2538.25    4361.5     3376.5
Table 6.3: Observed end-to-end response times with/without early releasing and analytical end-to-end response-time
bounds for the implicit-deadline setting.
6.8 Schedulability Studies
In this section, we expand upon the specific case study just described by considering general schedulability
trends seen in experiments involving randomly generated task systems.
6.8.1 Improvements Enabled by Basic Techniques
We first consider the improvements enabled by the basic techniques covered in Sections 6.3 and 6.4
that underlie our work: allowing intra-task parallelism as provided by the npc-sporadic task model, and
determining relative-deadline settings by solving an LP.
Random system generation. In our experiments, we considered a heterogeneous platform comprised of
three CE pools, each consisting of eight identical CEs. Each pool was assumed to have the same total
utilization. We considered all choices of total per-pool utilizations in the range [1,8] in increments of 0.5.
We generated DAG-based task systems using a method similar to that used by others (Baruah, 2014; Li
et al., 2013). These systems were generated by first specifying the number of DAGs in the system, N, and
the number of tasks per DAG, n. For each considered pair N and n, we randomly generated 50 task-system
structures, each comprised of N DAGs with n nodes. Each node in such a structure was randomly assigned
to one of the CE pools, and for each DAG in the structure, one node was designated as its source, and one
as its sink. Further, each pair of internal nodes (not a source or a sink) was connected by an edge with
probability edgeProb, a settable parameter. Such an edge was directed from the lower-indexed node to the
higher-indexed node, to preclude cycles. Finally, an edge was added from the source to each internal node
with no incoming edges, and to the sink from each internal node with no outgoing edges.
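The structure-generation procedure just described can be sketched as follows. This is an illustrative reconstruction, not the generator used in the experiments: the function names, the choice of node 0 as the source and node n-1 as the sink, and the use of Python's random module are all assumptions.

```python
import random


def random_dag(n, edge_prob, rng):
    """Generate one DAG structure with n >= 3 nodes as a set of edges (u, v).

    Node 0 is the source, node n-1 is the sink, and nodes 1..n-2 are internal.
    """
    internal = range(1, n - 1)
    # Connect each pair of internal nodes with probability edge_prob, always
    # from the lower-indexed to the higher-indexed node, which precludes cycles.
    edges = {(u, v) for u in internal for v in internal
             if u < v and rng.random() < edge_prob}
    has_incoming = {v for _, v in edges}
    has_outgoing = {u for u, _ in edges}
    for node in internal:
        if node not in has_incoming:
            edges.add((0, node))          # edge from the source
        if node not in has_outgoing:
            edges.add((node, n - 1))      # edge to the sink
    return edges


def assign_pools(n, num_pools, rng):
    """Assign each node to one of the CE pools uniformly at random."""
    return [rng.randrange(num_pools) for _ in range(n)]


rng = random.Random(2018)
edges = random_dag(20, 0.5, rng)
pools = assign_pools(20, 3, rng)
print(len(edges), pools[:5])
```

Because every internal node ends up with at least one incoming and one outgoing edge, each generated DAG has exactly one source and one sink.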
For each considered per-pool utilization and each generated task-system structure, we randomly generated
50 actual task systems by generating task utilizations using the MATLAB function randfixedsum() (Stafford,
2006). According to the application domain that motivates this work,³ we defined each DAG’s
period to be 1 ms. (A task’s WCET is determined by its utilization and period.) For each considered
value of N, n, and total per-pool utilization (one point in one of our graphs), we considered 50 (task-system
structures) × 50 (utilizations) = 2,500 task sets.
Comparison setup. We compared three strategies: (i) transforming to a conventional sporadic task system
and using implicit relative deadlines, which is a strategy used in prior work on identical platforms (Liu and
Anderson, 2010); (ii) transforming to an npc-sporadic task system and using implicit relative deadlines; and
(iii) transforming to an npc-sporadic task system and using LP-based relative deadlines. When applying
our LP techniques, we chose the objective function that minimizes the maximum end-to-end response-time
bound. Although an identical platform was assumed in (Liu and Anderson, 2010), the techniques from that
paper can be extended to heterogeneous platforms in a similar way to this chapter.
Results. In all cases that we considered, the two evaluated techniques improved end-to-end response-time
bounds, often significantly. Similar trends were observed in all experiments we conducted, and we present
here only the case where N = 5, n = 20, and edgeProb= 0.5 for demonstration. For each generated task
set, we recorded the maximum end-to-end response-time bound among its five DAGs. For each given total
per-pool utilization point, we report here the average of the maximum end-to-end response-time bounds
among the 2,500 task sets generated for that point. We call this metric the average maximum end-to-end
response-time bound (AMERB). Figure 6.9 plots AMERBs as a function of total per-CE-pool utilization. As
seen, the application of both techniques reduced AMERBs by 39.42% to 81.65%.
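The AMERB metric and the reported percentage reductions are computed as sketched below. The bounds in the example are made-up placeholder values chosen only to exercise the functions; they are not data from the experiments.

```python
from statistics import mean


def amerb(task_sets):
    """Average, over task sets, of each task set's maximum end-to-end bound."""
    return mean(max(bounds) for bounds in task_sets)


def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical end-to-end bounds (ms) for three task sets of five DAGs each.
baseline = [[12.0, 9.5, 14.0, 8.0, 10.0],
            [20.0, 18.5, 16.0, 11.0, 13.0],
            [7.0, 9.0, 8.5, 6.0, 5.0]]
improved = [[4.0, 3.5, 5.0, 3.0, 4.5],
            [6.0, 7.5, 5.0, 4.0, 6.5],
            [3.0, 2.5, 3.5, 2.0, 1.5]]

print(amerb(baseline), amerb(improved))
print(reduction_percent(amerb(baseline), amerb(improved)))
```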
6.8.2 Improvements Enabled by DAG Combining
As mentioned in Section 6.5, in the application domain that motivates our work, DAGs are usually defined
using several well-defined computational templates, and as a result, many identical DAGs will exist. We
proposed the technique of DAG combining in Section 6.5 to exploit this fact to further reduce response-time
bounds. We now discuss schedulability experiments that we conducted to evaluate this technique.
³ In applications usually considered in the real-time-systems community, much larger periods are the norm. The considered domain is
quite different.
[Figure 6.9 graphic omitted. Horizontal axis: total utilization in each CE pool (1 to 8). Vertical axis: average maximum end-to-end response-time bounds (ms), 0 to 40. Curves: conventional sporadic tasks, implicit deadlines; npc-sporadic tasks, implicit deadlines; npc-sporadic tasks, LP-based deadlines.]
Figure 6.9: AMERBs as a function of total utilization in each CE pool in the case where each task set has five DAGs,
20 tasks per DAG, and edgeProb = 0.5.
Random system generation. We employed a process of randomly generating systems that is similar to that
discussed in Section 6.8.1, except that, instead of generating task-system structures comprised of N DAGs,
we generated structures comprised of N templates. Additionally, we introduced a new parameter K that
indicates the number of identical DAGs per template. A period of 1 ms was still associated with each DAG.
Comparison setup. We compared two strategies: (i) do no combining, and compute end-to-end response-
time bounds assuming N ·K independent DAGs; (ii) combine identical DAGs, and compute end-to-end
response-time bounds assuming N DAGs, making adjustments as discussed in Section 6.5 to obtain actual
response-time bounds for the DAGs that were combined. Under both strategies, the general techniques
evaluated in Section 6.8.1 were applied.
Results. In all cases that we considered, the DAG combining technique improved end-to-end response-time
bounds significantly. Similar trends were observed in all experiments we conducted, and we present here
only the case where each system has five templates, each of which has 20 nodes, and edgeProb = 0.5 for
demonstration. For this case, Figure 6.10 plots AMERBs as a function of total per-pool utilization, when
the number of identical DAGs per template is fixed to 40 (this number is close to what would be expected
in the application domain that motivates this work). Note that the AMERBs in Figure 6.10 are much lower
than those in Figure 6.9, even before applying the DAG combining technique. This is because the systems
considered in Figure 6.10 have far more DAGs than those in Figure 6.9. As a result, for each given total
per-pool utilization, the systems in Figure 6.10 have much lower per-DAG and per-task utilizations. As
also seen in Figure 6.10, when DAG combining is applied, the AMERBs are not very much influenced by
the total per-CE-pool utilization. That is because DAG combining resulted in quite small response-time
bounds by (6.9), so total end-to-end bounds were mainly impacted by the introduced shifting, rather than total
per-CE-pool utilization. Also, Figure 6.11 plots AMERBs as a function of the number of identical DAGs per
template, when every CE pool is fully utilized (i.e., the total utilization of each pool is eight). In this case, the
AMERB metric was calculated over all task sets that have the same number of identical DAGs per template.
According to our industry partners, in the considered application domain, a DAG’s end-to-end response-
time bound should typically be at most 2.35 ms. As observed in Figure 6.10, in the absence of DAG
combining, AMERBs in this experiment were as high as 8.2 ms. However, the introduction of DAG
combining enabled a drop to less than 2.0 ms, even when the platform was fully utilized. This demonstrates
that DAG combining—as simple as it may seem—can have a powerful impact in the targeted domain.
[Figure 6.10 graphic omitted. Horizontal axis: total utilization in each CE pool (1 to 8). Vertical axis: average maximum end-to-end response-time bounds (ms), 0 to 10. Curves: do not combine identical DAGs; combine identical DAGs.]
Figure 6.10: AMERBs as a function of total utilization in each CE pool in the case where the number of identical
DAGs per template is fixed to 40.
[Figure 6.11 graphic omitted. Horizontal axis: number of identical DAGs per template (5 to 50). Vertical axis: average maximum end-to-end response-time bounds (ms), 0 to 12. Curves: do not combine identical DAGs; combine identical DAGs.]
Figure 6.11: AMERBs as a function of the number of identical DAGs per template in the case where total utilization
in each CE pool is fixed to eight.
6.9 Chapter Summary
In this chapter, we presented task-transformation techniques to provide end-to-end response-time bounds
for DAG-based tasks implemented on heterogeneous multiprocessor platforms where intra-task parallelism is
allowed. We also presented an LP-based method for setting relative deadlines and a DAG combining technique
that can be applied to improve these bounds. We evaluated the efficacy of these results by considering a
case-study task system and by conducting schedulability studies.
CHAPTER 7: MINIMUM-PARALLELISM MULTIPROCESSOR SUPPLY ON IDENTICAL PLATFORMS¹
Open-systems (Deng and Liu, 1997) frameworks allow separate software components to execute together
on a common hardware platform, with each component having the “illusion” of executing on a dedicated
virtual platform. Providing such an illusion can ease software-development efforts, not only when mixing
different applications, but also when integrating separately developed components of the same application.
In domains where real-time constraints exist, temporal isolation among components should be ensured,
i.e., it should be possible to validate the timing constraints of each component independently. Therefore, a
specification of the computing capacity allocated to a component is needed.
In early work in this direction pertaining to uniprocessor platforms, Shin and Lee (2003) proposed a
virtual processor (VP) model called the periodic resource (PR) model, which allows the considerable body of
work on periodic task scheduling (Liu and Layland, 1973) to be exploited in reasoning about the allocation
of processor time to components. In the PR model, a VP is specified by the parameters (Π,Θ), with the
interpretation that Θ time units of processor time is guaranteed to the supported component every Π time
units.
While this simple model sufficed in the uniprocessor case, it is inadequate in the multiprocessor case,
because the important issue of parallelism is ignored. To deal with this issue, Shin et al. (2008) proposed
extending the PR model by adding an additional parameter. Specifically, under their multiprocessor periodic
resource (MPR) model, the supply allocated to a component is specified by (Π,Θ,m′), with the interpretation
that Θ time units of processor time is guaranteed to the component every Π time units with at most m′ VPs
providing allocation in parallel. That is, the new parameter m′ specifies the maximum degree of parallelism.
In the MPR model, all VPs allocated to a component are required to have a common period Π that is strictly
synchronized.
¹ Contents of this chapter previously appeared in preliminary form in the following paper:
Yang, K. and Anderson, J. (2016a). On the dominance of minimum-parallelism multiprocessor supply. In Proceedings of the 37th
IEEE Real-Time Systems Symposium, pages 215–226.
A key characteristic of the MPR model is its flexibility. For example, consider a component that is to be
allocated 80% of the capacity of a quad-core machine. The supply interface for that component could be
defined as (100,320,4), meaning that every 100 time units, the component receives 320 units of processing
time on up to four processors. Such a specification does not indicate the precise manner in which processing
time is allocated. For example, the component could be allocated 80% of the capacity of each processor, or
100% of three processors and 20% of the fourth, among other choices. Which choice is best?
MP form. In the example just discussed, the second-listed choice is known as minimum-parallelism (MP)
form. Under MP form, each component is allocated at most one partially available processor, with all other
processors allocated to it being fully available. MP form was first proposed by Leontyev and Anderson (2009)
to support SRT container hierarchies, which allow components to include sub-components, which in turn can
include their own sub-components, etc. Assuming MP form, they showed that container hierarchies with an
unlimited number of levels can be supported with bounded deadline tardiness and no utilization loss. In work
directed at HRT systems, Xu et al. (2015) observed that, by enforcing MP form in the context of the MPR
model, per-component schedulability can be improved.
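For the quad-core example above, the MP-form allocation follows directly from the total bandwidth. The sketch below is a minimal illustration with a hypothetical function name; it splits a bandwidth into fully available processors plus at most one partially available processor and contrasts this with the even split.

```python
from fractions import Fraction
from math import floor


def mp_form(bandwidth):
    """Split a total bandwidth into (p, w): p fully available processors plus
    one partially available processor of bandwidth w, where 0 <= w < 1."""
    p = floor(bandwidth)
    return p, bandwidth - p


# Supply interface (100, 320, 4): 320 units of processor time every 100 time units.
bw = Fraction(320, 100)
print(mp_form(bw))   # (3, Fraction(1, 5))

# Non-MP alternative from the text: 80% of each of the four processors.
even_split = [bw / 4] * 4
print(even_split)    # [Fraction(4, 5), Fraction(4, 5), Fraction(4, 5), Fraction(4, 5)]
```

Both allocations have the same total bandwidth of 3.2; they differ only in how that bandwidth is spread across processors.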
Because this improvement in schedulability was considered in the context of the MPR model, a common,
synchronized allocation period was assumed to be used on all processors allocated to a component. In
practice, however, situations exist in which such an assumption may be problematic. A good example of this
can be seen in recent work of Durrieu et al. (2014), who considered a flight management system implemented
on a multicore platform wherein clocks on different processors “do not drift [but] have unpredictable initial
offsets.” In the future, the assumption of tight synchrony may become even more problematic, as manycore
platforms evolve in which core counts soar into the hundreds if not thousands. Similar observations have
been made by Lipari and Bini (2010) and Bini et al. (2009b), who suggested generalizing the MPR model so
that the VPs allocated to a single component may have different periods with different initial phasings. Does
MP form still retain its advantages over other supply forms in the HRT case under this more general notion of
VP allocation?
In this chapter, we answer this question in the affirmative by showing that MP form dominates all other
supply forms in the context of these cases: VPs are synchronous, concrete asynchronous, or non-concrete
asynchronous (these terms are defined in Section 7.1). In each of these cases, we consider two sub-cases:
requiring a common period for all VPs, and allowing such periods to differ. The prior work noted above by
Xu et al. (2015) on the MPR model implies that MP form dominates all other forms in the case of synchronous
VPs with a common period. For each other case, we show that an arbitrary component is always dominated
by an MP-form component of the same bandwidth (i.e., total processor capacity; see Section 7.1), provided
its period is defined properly. These results follow from the theorems listed in Table 7.1. Additionally, in all
six cases, we show that an MP-form component can never be dominated by a non-MP-form component of
the same bandwidth, regardless of how periods are defined. The issue of MP dominance under the considered
cases is not as straightforward as one might think at first glance. Indeed, many subtleties arise.
                                 Common Period    Different Periods
    Synchronous                  Theorem 7.5      Theorem 7.6
    Concrete Asynchronous        Theorem 7.6      Theorem 7.6
    Non-Concrete Asynchronous    Theorem 7.2      Theorems 7.3 and 7.4
Table 7.1: Summary of theorems applying to different VP synchronization assumptions.
Organization. In the following sections, we introduce our system model (Section 7.1), provide some
preliminary properties and theorems (Section 7.2), show the dominance of MP form for non-concrete
asynchronous VPs (Section 7.3) and synchronous and concrete asynchronous VPs (Section 7.4), and show
that MP form cannot be dominated by any other form (Section 7.5).
7.1 System Model
We consider a compositional system executing upon a physical multiprocessor platform with identical
processors. Each component is provided processor time by a set of VPs, each defined according to the PR
model, as discussed next.
7.1.1 Periodic Resource Model
Under the PR model (Shin and Lee, 2003), a VP Γi is characterized by two parameters (Πi,Θi), which
indicate that Γi supplies Θi units of processor time every Πi time units, where 0<Θi ≤Πi. In this chapter,
we assume continuous time, thus Πi and Θi are real numbers. The bandwidth of the VP Γi is given by
wi = Θi/Πi. Note that, for any Πi, Γi = (Πi,Πi) defines a VP corresponding to a dedicated physical processor
that is always available.
Figure 7.1: Worst-case supply of Γi (adapted from (Shin and Lee, 2003)).
The supply bound function (SBF) of the VP Γi, denoted Z(t,Γi), indicates the minimum processor time Γi
can supply during any time interval of length t. Shin and Lee (2003) have shown that Z(t,Γi)
can be defined as
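\[
Z(t,\Gamma_i) =
\begin{cases}
0 & \text{if } t'_{\Gamma_i} < 0, \\
\left\lfloor \dfrac{t'_{\Gamma_i}}{\Pi_i} \right\rfloor \cdot \Theta_i + \varepsilon_{\Gamma_i} & \text{if } t'_{\Gamma_i} \ge 0,
\end{cases}
\tag{7.1}
\]
where
\[ t'_{\Gamma_i} = t - (\Pi_i - \Theta_i), \tag{7.2} \]
\[ \varepsilon_{\Gamma_i} = \max\left( t'_{\Gamma_i} - \Pi_i \left\lfloor \frac{t'_{\Gamma_i}}{\Pi_i} \right\rfloor - (\Pi_i - \Theta_i),\; 0 \right). \tag{7.3} \]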
This definition reflects the worst-case scenario illustrated in Figure 7.1.
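As a rough illustration, the SBF in (7.1)–(7.3) can be evaluated directly. The sketch below is not part of the original analysis; the function name `sbf` and the sample VP (5,3) are assumptions chosen for illustration, and exact rational arithmetic is used to avoid rounding at period boundaries.

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) for a VP Γ = (period, budget), following (7.1)-(7.3)."""
    t_shift = t - (period - budget)            # t' in (7.2)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)                # whole periods contained in t'
    eps = max(t_shift - period * k - (period - budget), 0)   # (7.3)
    return k * budget + eps

# Hypothetical VP Γ = (5, 3): no supply is guaranteed up to t = 2(Π − Θ) = 4.
for t in (4, 6, 7):
    print(t, sbf(Fraction(t), Fraction(5), Fraction(3)))
```

For a dedicated processor, i.e., budget equal to period, the same function returns t, which agrees with the remark that (Πi,Πi) is always available.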
7.1.2 VPs in a Component
We consider a component C that consists of a set of VPs, denoted C = {Γi}, where Γi = (Πi,Θi) for
1≤ i≤ |C|. The supply of a component is the sum of the supply of all VPs in this component.
Since Γi = (Πi,Πi) indicates a dedicated processor regardless of the value of Πi, we let p denote the
number of such dedicated processors and do not bother to specify their periods. Thus, we alternatively denote
the component C by C = (p,T ), where T = {Γi | Γi ∈ C ∧0< wi < 1}. It is clear that
|C|= p+ |T |. (7.4)
We define the bandwidth of component C as
\[ bw(C) = \sum_{\Gamma_i \in C} w_i. \tag{7.5} \]
The bandwidth bw(C) indicates the total processor share allocation to which C is entitled. Minimum-
parallelism (MP) form is defined as follows.
Definition 7.1. A component C = (p,T ) is in MP form if and only if |T | ≤ 1.
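To make the (p,T) notation and Definition 7.1 concrete, the following sketch computes bw(C) and builds an MP-form component of the same bandwidth. The helper names (`bandwidth`, `is_mp_form`, `to_mp_form`) are illustrative assumptions, and the period passed to `to_mp_form` is a free parameter here; which periods actually yield dominance is the subject of the theorems later in this chapter.

```python
from fractions import Fraction
from math import floor

def bandwidth(p, vps):
    """bw(C) for C = (p, T); vps lists the non-dedicated VPs as (period, budget)."""
    return p + sum(Fraction(budget) / Fraction(period) for period, budget in vps)

def is_mp_form(vps):
    """Definition 7.1: at most one non-dedicated VP."""
    return len(vps) <= 1

def to_mp_form(p, vps, period):
    """An MP-form component with the same bandwidth and the given (assumed) period."""
    bw = bandwidth(p, vps)
    p_star = floor(bw)                  # dedicated processors
    w_star = bw - p_star                # bandwidth left for the lone VP, if any
    t_star = [(Fraction(period), w_star * period)] if w_star > 0 else []
    return p_star, t_star

# Hypothetical component: no dedicated processors, two VPs with period 10.
p_star, t_star = to_mp_form(0, [(10, 6), (10, 7)], 10)
print(p_star, t_star, is_mp_form(t_star))
```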
Concrete vs. non-Concrete. We consider the possibility that the VPs in a component are asynchronous,
meaning that they can have different phases—a VP Γi with a phase of φi is initialized to begin at time φi, i.e.,
its first allocation of Θi time units occurs within the interval [φi,φi+Πi), its second within [φi+Πi,φi+2Πi),
and so on. As it turns out, the results we obtain depend on whether phases are known or unknown prior to
runtime. In the first case, we say that the VPs are concrete asynchronous, and only a particular phase for each
VP needs to be considered in schedulability (supply) analysis. In the second case, we say that the VPs are
non-concrete asynchronous, and the worst case among all possible phases must be considered in schedulability
(supply) analysis. Synchronous VPs can be considered as a special case of concrete asynchronous VPs where
all phases are required to be zero. In this chapter, we consider all three phasing assumptions regarding
VPs: they can be synchronous, concrete asynchronous, or non-concrete asynchronous.
7.1.3 Parallel Supply Function
The SBF definition in (7.1) for the PR model hinges only on considering uniprocessor supply allocations.
In the multiprocessor case, however, SBFs must also address the important issue of parallelism. Various
multiprocessor SBFs have been proposed. The most expressive of these considered to date is the parallel
supply function (PSF), proposed by Bini et al. (2009a). The PSF describes the supply of a component C by a
set of functions, {psf j(t,C) | j ∈ Z+}, where each function psf j(t,C) is defined as follows.
Definition 7.2. psf j(t,C) denotes the minimum supply of C during any time interval of length t with a degree
of parallelism at most j.
Figure 7.2: Example illustrating parallel supply (adapted from (Lipari and Bini, 2010)).
We illustrate the above definition with the following example, and refer readers to the work of Bini et al.
(2009a) for a more formal treatment.
Example 7.1. (Adapted from Lipari and Bini (2010).) Let Γ1, Γ2, and Γ3 be three VPs that compose C.
Assume that the processor time they make available within the time interval [0,11) is shown in Figure 7.2,
where the gray boxes represent available processor time. Suppose that all three VPs are fully available at or
after time 11. Then, [0,11) is the interval of length 11 that provides the minimum supply at every degree of
parallelism. In this case, psf1(11,C) = 10 because there are 10 time units in [0,11) during which at least one
VP provides available processor time. psf2(11,C) = 16 because all three VPs provide available processor time
simultaneously only in [4,5), so psf2(11,C) is one less than the total available processor time in [0,11). This
total available time is given by psf3(11,C) = 17. ♦
In this chapter, we use PSF functions to describe exact lower bounds on supply in order to compare the
supply of different components exactly. That is, for any j and t ≥ 0, there exists a possible scenario in which,
over some interval of length t, the supply provided by C with a degree of parallelism at most j is exactly
psf j(t,C).
By Def. 7.2, we have the following property.
(∀C,∀ j ≥ 1,∀t ≥ 0 :: psf j(t,C)≤ jt) (7.6)
Also, by Lemma 1 in (Bini et al., 2009a), the following properties hold.
(∀C,∀ j ≥ 1,∀t ≥ 0 :: psf j(t,C)≤ psf j+1(t,C)) (7.7)
(∀C,∀ j ≥ |C|,∀t ≥ 0 :: psf j(t,C) = psf j+1(t,C)) (7.8)
In accordance with Def. 7.2, psf∞(t,C) represents the minimum supply that C is guaranteed to provide
during any time interval of length t with no constraint on the degree of parallelism. By Def. 7.2, psf∞(t,C) =
psf |C|(t,C), because there are at most |C| dedicated or non-dedicated resources that can provide supply in
parallel in C.
7.2 Preliminaries
In this section, we provide a condition for establishing the superiority of MP form. This condition will
allow us to conclude that MP form dominates other forms. Dominance is defined with respect to component
supply based on PSF:
Definition 7.3. A component C′ dominates another component C if and only if (∀ j≥ 1,∀t ≥ 0 :: psf j(t,C)≤
psf j(t,C′)) holds.
By Def. 7.3, in order to show the dominance of an arbitrary component C′ over another arbitrary
component C, we must consider all relevant PSF functions. However, the following theorem shows that it
suffices to consider only two specific PSF functions.
Theorem 7.1. Let C be an arbitrary component, and let C∗ be a component in MP form. If (∀t :: psf∞(t,C)≤
psf∞(t,C∗)) holds, then C∗ dominates C.
Proof. Let C = (p,T ) and C∗ = (p∗,T ∗). Because C∗ has p∗ dedicated processors,
(∀1≤ j ≤ p∗,∀t ≥ 0 :: psf j(t,C∗) = jt). (7.9)
On the other hand, for C, by (7.6), we have
(∀1≤ j ≤ p∗,∀t ≥ 0 :: psf j(t,C)≤ jt). (7.10)
By (7.9) and (7.10),
(∀1≤ j ≤ p∗,∀t ≥ 0 :: psf j(t,C)≤ psf j(t,C∗)). (7.11)
Figure 7.3: The graph of Z(t,Γi), as an illustration of Properties 7.1, 7.2, and 7.3.
Because C∗ is in MP form, |T ∗| ≤ 1, and by (7.4), |C∗|= p∗+ |T ∗| ≤ p∗+1. Therefore, by (7.8),
(∀ j ≥ p∗+1,∀t ≥ 0 :: psf j(t,C∗) = psf∞(t,C∗)). (7.12)
On the other hand, for C, by (7.7),
(∀ j ≥ p∗+1,∀t ≥ 0 :: psf j(t,C)≤ psf∞(t,C)). (7.13)
Now, by (7.12), (7.13), and psf∞(t,C)≤ psf∞(t,C∗) (from the statement of the theorem), we have
(∀ j ≥ p∗+1,∀t ≥ 0 :: psf j(t,C)≤ psf j(t,C∗)). (7.14)
By (7.11), (7.14), and Def. 7.3, C∗ dominates C.
Before endeavoring to use Theorem 7.1 to establish the dominance of MP form, we first provide several
useful properties concerning the supply function Z(t,Γi) of an arbitrary VP Γi. Property 7.1 directly follows
from the definition of Z(t,Γi) as given by (7.1)–(7.3). Property 7.2 is established in Lemma 1 in (Shin and
Lee, 2003), and Property 7.3 is established in (Easwaran et al., 2007). The intuition behind these properties is
illustrated by the graph of Z(t,Γi) shown in Figure 7.3.
Property 7.1. Z(t,Γi) = 0 for 0≤ t ≤ 2(Πi−Θi).
Property 7.2. Z(t,Γi)≥max{(t−2(Πi−Θi))wi,0}.
Property 7.3. Z(t,Γi)≤max{(t− (Πi−Θi))wi,0}.
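Properties 7.1–7.3 can be spot-checked numerically. The sketch below is only a sanity check on a sampled grid for one hypothetical VP, not a proof; `sbf` re-implements (7.1)–(7.3), and `linear_bounds_hold`, the grid step, and the sample parameters are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) per (7.1)-(7.3); arguments are expected to be Fractions."""
    t_shift = t - (period - budget)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)
    return k * budget + max(t_shift - period * k - (period - budget), 0)

def linear_bounds_hold(period, budget, horizon, step=Fraction(1, 8)):
    """Check Properties 7.1-7.3 at every grid point t in [0, horizon]."""
    w = budget / period
    gap = period - budget                       # Π − Θ
    t = Fraction(0)
    while t <= horizon:
        z = sbf(t, period, budget)
        if t <= 2 * gap and z != 0:             # Property 7.1
            return False
        if z < max((t - 2 * gap) * w, 0):       # Property 7.2 (lower bound)
            return False
        if z > max((t - gap) * w, 0):           # Property 7.3 (upper bound)
            return False
        t += step
    return True

print(linear_bounds_hold(Fraction(5), Fraction(3), Fraction(40)))
```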
We state two more properties below, in which an alternate definition of Z(t,Γi) is indirectly considered
that is based on the following function f :
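\[ f(x,\Gamma_i) = \left\lfloor \frac{x}{\Pi_i} \right\rfloor \cdot \Theta_i + \max\left( x - \Pi_i \left\lfloor \frac{x}{\Pi_i} \right\rfloor - (\Pi_i - \Theta_i),\; 0 \right). \tag{7.15} \]
Note that, by (7.1), (7.2), and (7.3),
\[ Z(t,\Gamma_i) = f(t'_{\Gamma_i},\Gamma_i), \quad \text{if } t'_{\Gamma_i} \ge 0. \tag{7.16} \]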
When Γi is fixed, i.e., Πi and Θi are constants, the following properties apply to f (x,Γi). These properties
can be seen intuitively by considering the graph of f (x,Γi), which is similar to that of Z(t,Γi) as illustrated
in Figure 7.3. Property 7.5 can be seen by observing that the slope of the line through any two points on the
graph of f (x,Γi) is at most one.
Property 7.4. f (x,Γi) is monotonically increasing for non-negative x, i.e., f (x1,Γi)≤ f (x2,Γi) if 0≤ x1 ≤
x2.
Property 7.5. For any x,y ≥ 0, f (x+ y,Γi) ≤ f (x,Γi)+ y, which also implies f (x− y,Γi) ≥ f (x,Γi)− y,
provided that x− y≥ 0 holds.
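The two properties of f can likewise be checked on a grid. This is a minimal sketch under stated assumptions: `f` follows (7.15), while `f_properties_hold`, the grid step, and the sample VP are hypothetical choices for illustration only.

```python
from fractions import Fraction
from math import floor

def f(x, period, budget):
    """f(x, Γ) from (7.15) for non-negative x."""
    k = floor(x / period)
    return k * budget + max(x - period * k - (period - budget), 0)

def f_properties_hold(period, budget, horizon, step=Fraction(1, 4)):
    """Check Property 7.4 (monotone) and Property 7.5 (slope at most one) on a grid."""
    xs = [i * step for i in range(int(horizon / step) + 1)]
    monotone = all(f(a, period, budget) <= f(b, period, budget)
                   for a, b in zip(xs, xs[1:]))
    slope_at_most_one = all(f(x + y, period, budget) <= f(x, period, budget) + y
                            for x in xs for y in xs)
    return monotone and slope_at_most_one

print(f_properties_hold(Fraction(5), Fraction(3), Fraction(20)))
```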
We also utilize the two straightforward claims below.
Claim 7.1. The supply of a VP Γi can be zero within any time interval of length Πi−Θi, regardless of how
the interval aligns with the VP’s periods of allocation.
This claim is different from Property 7.1. In order to have a supply of zero within a time interval of
length up to 2(Πi−Θi), as stated in Property 7.1, the interval must have a specific alignment with respect to
the periods of allocation of Γi as shown in Figure 7.1. However, according to this claim, the supply within
any time interval of length Πi−Θi can be zero. Figure 7.4 shows the only two possibilities that can occur:
the considered interval is either included within a single period of allocation, or spans two such periods. In
either situation, supply within the interval can be zero.
Figure 7.4: Illustration of Claim 7.1.
Claim 7.2. Let C∗ = (p∗,T ∗) be a component in MP form. If |T ∗|= 0, then psf∞(t,C∗) = t · p∗. If |T ∗|= 1,
then letting Γ∗ denote the lone VP in T ∗, psf∞(t,C∗) = t · p∗+Z(t,Γ∗).
This claim follows directly from the definitions above.
7.3 Non-Concrete Asynchronous
In this section, we consider the case of non-concrete asynchronous VPs. In order to apply Theorem 7.1
in this case to establish the dominance of MP form, we begin by providing an exact calculation of psf∞(t,C).
For any time interval of length t, a dedicated resource supplies t time units of processor time, and by
(7.1), a non-dedicated resource Γ supplies at least Z(t,Γ) time units. Therefore, with the degree of parallelism
unconstrained, a component C = (p,T ) provides a supply of at least t p+∑Γi∈T Z(t,Γi). Moreover, this
minimum does indeed happen, as shown in Figure 7.5. (Note that the alignment shown in the figure can
happen because we are assuming for now that VPs are non-concrete asynchronous.) Thus, for any component
C = (p,T ),
\[ \mathrm{psf}_\infty(t,C) = t\,p + \sum_{\Gamma_i \in T} Z(t,\Gamma_i). \tag{7.17} \]
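Equation (7.17) translates directly into code. In the sketch below, `psf_inf` and the sample component are illustrative assumptions; `sbf` re-implements (7.1)–(7.3), and the formula applies only under the non-concrete asynchronous assumption of this section.

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) per (7.1)-(7.3)."""
    t_shift = t - (period - budget)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)
    return k * budget + max(t_shift - period * k - (period - budget), 0)

def psf_inf(t, p, vps):
    """psf_inf(t, C) for C = (p, T) with non-concrete asynchronous VPs, per (7.17)."""
    return t * p + sum(sbf(t, period, budget) for period, budget in vps)

# Hypothetical component: one dedicated processor plus VPs (5, 3) and (10, 1).
vps = [(Fraction(5), Fraction(3)), (Fraction(10), Fraction(1))]
print(psf_inf(Fraction(22), 1, vps))
```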
In the next two subsections, we establish the dominance of MP form in two steps. First, we consider the
case in which all VPs in C share a common period. Second, we build upon this result by considering the case
in which the VPs in C may have different periods.
7.3.1 A Common Period
We first consider the case in which the VPs in C share a common period Π, i.e., (∀Γi = (Πi,Θi) ∈ C ::
Πi =Π) holds. We establish our key proof obligation in Theorem 7.2 below. The following lemma is used in
its proof. Specifically, we use it to show how to combine two VPs “locally” in a way that is in accordance
Figure 7.5: Illustration of the worst case of psf∞(t,C) for non-concrete asynchronous VPs.
with MP form. Figure 7.6 illustrates the three cases of the lemma. A rigorous proof is rather tedious and
mechanical. Readers who are not interested in these mathematical details may skip ahead to Theorem 7.2.
Lemma 7.1. Let Γi = (Π,Θi) and Γ j = (Π,Θ j) be two VPs that are not dedicated processors, and without
loss of generality, assume Θi ≤Θ j, i.e., 0< wi ≤ w j < 1. Then, we have the following three exhaustive cases
for wi+w j and corresponding conclusions.
1. If 0< wi+w j < 1, then Z(t,Γi)+Z(t,Γ j)≤ Z(t,Γk), where Γk = (Π,Θk) and Θk =Θi+Θ j.
2. If wi+w j = 1, then Z(t,Γi)+Z(t,Γ j)≤ t.
3. If 1< wi+w j < 2, then Z(t,Γi)+Z(t,Γ j)≤ t+Z(t,Γk), where Γk = (Π,Θk) and Θk =Θi+Θ j−Π.
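Before working through the proof, the three cases of Lemma 7.1 can be sanity-checked numerically. The sketch below samples t on a grid for hypothetical budgets; it illustrates the statement and does not replace the proof. The names `lemma_7_1_holds` and the sample parameters are assumptions.

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) per (7.1)-(7.3)."""
    t_shift = t - (period - budget)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)
    return k * budget + max(t_shift - period * k - (period - budget), 0)

def lemma_7_1_holds(period, budget_i, budget_j, horizon, step=Fraction(1, 4)):
    """Check the applicable case of Lemma 7.1 at every grid point in [0, horizon]."""
    total = budget_i + budget_j
    t = Fraction(0)
    while t <= horizon:
        lhs = sbf(t, period, budget_i) + sbf(t, period, budget_j)
        if total < period:                       # Case 1: merge into one VP
            rhs = sbf(t, period, total)
        elif total == period:                    # Case 2: one dedicated processor
            rhs = t
        else:                                    # Case 3: dedicated processor plus one VP
            rhs = t + sbf(t, period, total - period)
        if lhs > rhs:
            return False
        t += step
    return True

P = Fraction(10)
print(lemma_7_1_holds(P, Fraction(2), Fraction(3), Fraction(60)),
      lemma_7_1_holds(P, Fraction(4), Fraction(6), Fraction(60)),
      lemma_7_1_holds(P, Fraction(6), Fraction(7), Fraction(60)))
```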
Proof. We consider the three cases of the lemma individually.
Case 1: In this case,
Θk = Θi+Θ j, (7.18)
so
Θi ≤Θ j <Θk, (7.19)
Figure 7.6: Illustration for the cases in Lemma 7.1: (a) Case 1, (b) Case 2, (c) Case 3.
because Θi ≤Θ j is assumed by the statement of the lemma. By (7.2), (7.18), and (7.19),
t′Γi ≤ t′Γ j < t′Γk. (7.20)
In the next paragraph, we dispense with all possibilities that occur when at least one of t′Γi and t′Γ j is negative.
First, if t′Γi ≤ t′Γ j < 0, then by (7.1), Z(t,Γi)+Z(t,Γ j) = 0 ≤ Z(t,Γk). Second, if t′Γi < 0 ≤ t′Γ j, then
by (7.1), Z(t,Γi)+Z(t,Γ j) = 0+Z(t,Γ j) ≤ Z(t,Γk). Therefore, in the rest of the proof for Case 1, we focus
on the remaining possibility, 0 ≤ t′Γi ≤ t′Γ j, which by (7.20), implies
0 ≤ t′Γi ≤ t′Γ j < t′Γk. (7.21)
Applying (7.16) to Z(t,Γi), Z(t,Γ j), and Z(t,Γk), respectively, we have the following.
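\begin{align}
Z(t,\Gamma_i) &= f(t'_{\Gamma_i},\Gamma_i) && \{\text{by (7.16)}\} \notag\\
&\le f(t'_{\Gamma_k},\Gamma_i) && \{\text{by (7.21) and Property 7.4}\} \notag\\
&= \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor \Theta_i + \max\left( t'_{\Gamma_k} - \Pi \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor - (\Pi-\Theta_i),\; 0 \right). && \{\text{by (7.15)}\} \tag{7.22}
\end{align}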
Similarly, for the same reasons,
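\[ Z(t,\Gamma_j) \le \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor \Theta_j + \max\left( t'_{\Gamma_k} - \Pi \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor - (\Pi-\Theta_j),\; 0 \right). \tag{7.23} \]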
By (7.15) and (7.16),
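\[ Z(t,\Gamma_k) = \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor \Theta_k + \max\left( t'_{\Gamma_k} - \Pi \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor - (\Pi-\Theta_k),\; 0 \right). \tag{7.24} \]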
For notational simplicity, we introduce the two terms below.
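\[ \Phi = t'_{\Gamma_k} - \Pi \left\lfloor t'_{\Gamma_k}/\Pi \right\rfloor \tag{7.25} \]
\[ \Delta = \max(\Phi - (\Pi-\Theta_i),\,0) + \max(\Phi - (\Pi-\Theta_j),\,0) - \max(\Phi - (\Pi-\Theta_k),\,0) \tag{7.26} \]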
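Now, by (7.22), (7.23), and (7.24), we have
\begin{align}
Z(t,\Gamma_i)+Z(t,\Gamma_j)-Z(t,\Gamma_k) &\le \left\lfloor \frac{t'_{\Gamma_k}}{\Pi} \right\rfloor (\Theta_i+\Theta_j-\Theta_k) + \Delta \notag\\
&= \Delta. && \{\text{by (7.18)}\} \tag{7.27}
\end{align}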
Given the derivation above, we can complete the proof by showing ∆≤ 0. This result is implied by the
following claim.
Claim 7.3. In Case 1, ∆≤ 0.
Proof. By (7.25), 0≤Φ<Π. Also, by (7.19), Π−Θk <Π−Θ j ≤Π−Θi. Given these ranges,
the following cases are exhaustive.
Case 1.1: Φ ∈ [0,Π−Θk), which implies ∆= 0.
Case 1.2: Φ ∈ [Π−Θk,Π−Θ j), which implies ∆ = 0− (Φ− (Π−Θk)) ≤ 0, because Φ ≥
Π−Θk holds in this case.
Case 1.3: Φ ∈ [Π−Θ j,Π−Θi), which implies ∆= (Φ− (Π−Θ j))− (Φ− (Π−Θk)) =Θ j−
Θk < 0, by (7.19).
Case 1.4: Φ ∈ [Π−Θi,Π), which implies ∆= (Φ− (Π−Θi))+(Φ− (Π−Θ j))− (Φ− (Π−
Θk)) =Φ−Π+Θi+Θ j−Θk < 0, by (7.18) and the fact that Φ<Π holds in this case.
Claim 7.3 and (7.27) together imply Z(t,Γi)+Z(t,Γ j)≤ Z(t,Γk), as required.
Case 2: In this case, wi +w j = 1. By Property 7.3, Z(t,Γi) ≤ max{(t− (Π−Θi))wi,0} ≤ twi. Similarly,
Z(t,Γ j)≤ tw j. Thus, Z(t,Γi)+Z(t,Γ j)≤ t(wi+w j) = t.
Case 3: In this case,
Θk =Θi+Θ j−Π, (7.28)
so
Θk <Θi ≤Θ j, (7.29)
since Θ j < Π holds and Θi ≤ Θ j is assumed by the statement of the lemma. By (7.2), (7.28), and (7.29),
t′Γk < t′Γi ≤ t′Γ j. In the next paragraph, we dispense with all possibilities that occur when at least one of t′Γk, t′Γi,
and t′Γ j is negative.
First, if t′Γk < t′Γi ≤ t′Γ j < 0, then by (7.1), Z(t,Γi)+Z(t,Γ j) = 0 ≤ t+0 = t+Z(t,Γk). Second, if
t′Γk < t′Γi < 0 ≤ t′Γ j, then by (7.1), Z(t,Γi)+Z(t,Γ j) = 0+Z(t,Γ j) ≤ 0+ t = t+0 = t+Z(t,Γk). Third, if
t′Γk < 0 ≤ t′Γi ≤ t′Γ j, then we have t′Γk < 0, which by (7.2), implies t− (Π−Θk) < 0, and hence, t < Π−Θk.
By the statement of the lemma (and in particular, Case 3), Π−Θk = 2Π−Θi−Θ j ≤ 2(Π−Θi). Thus, we
have t < 2(Π−Θi), which by Property 7.1, implies Z(t,Γi) = 0. Therefore, by (7.1), Z(t,Γi)+Z(t,Γ j) =
Z(t,Γ j) ≤ t = t+Z(t,Γk).
Next, we focus on the remaining possibility in Case 3, namely, 0≤ t ′Γk < t ′Γi ≤ t ′Γ j . Applying (7.16) to
Z(t,Γi), Z(t,Γ j), and Z(t,Γk), respectively, we have the following.
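\begin{align}
Z(t,\Gamma_i) &= f(t'_{\Gamma_i},\Gamma_i) \notag\\
&= \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor \cdot \Theta_i + \max\left( t'_{\Gamma_i} - \Pi \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor - (\Pi-\Theta_i),\; 0 \right) && \{\text{by (7.15) and } \Pi_i = \Pi\} \tag{7.30}
\end{align}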
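\begin{align}
Z(t,\Gamma_j) &= f(t'_{\Gamma_j},\Gamma_j) \notag\\
&= f(t'_{\Gamma_i} + (t'_{\Gamma_j} - t'_{\Gamma_i}),\Gamma_j) && \{\text{rearranging}\} \notag\\
&\le f(t'_{\Gamma_i},\Gamma_j) + (t'_{\Gamma_j} - t'_{\Gamma_i}) && \{\text{by Property 7.5; note that } t'_{\Gamma_j} - t'_{\Gamma_i} \ge 0\} \notag\\
&= f(t'_{\Gamma_i},\Gamma_j) + \Theta_j - \Theta_i && \{\text{by (7.2) and } \Pi_j = \Pi_i = \Pi\} \notag\\
&= \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor \cdot \Theta_j + \Theta_j - \Theta_i + \max\left( t'_{\Gamma_i} - \Pi \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor - (\Pi-\Theta_j),\; 0 \right) && \{\text{by (7.15) and } \Pi_j = \Pi\} \tag{7.31}
\end{align}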
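\begin{align}
Z(t,\Gamma_k) &= f(t'_{\Gamma_k},\Gamma_k) \notag\\
&= f(t'_{\Gamma_i} - (t'_{\Gamma_i} - t'_{\Gamma_k}),\Gamma_k) && \{\text{rearranging}\} \notag\\
&\ge f(t'_{\Gamma_i},\Gamma_k) + (t'_{\Gamma_i} - t'_{\Gamma_k}) && \{\text{by Property 7.5; note that } t'_{\Gamma_i} - t'_{\Gamma_k} \ge 0\} \notag\\
&= f(t'_{\Gamma_i},\Gamma_k) + \Theta_i - \Theta_k && \{\text{by (7.2) and } \Pi_i = \Pi_k = \Pi\} \notag\\
&= f(t'_{\Gamma_i},\Gamma_k) + \Pi - \Theta_j && \{\text{by (7.28)}\} \notag\\
&= \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor \cdot \Theta_k + \Pi - \Theta_j + \max\left( t'_{\Gamma_i} - \Pi \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor - (\Pi-\Theta_k),\; 0 \right) && \{\text{by (7.15) and } \Pi_k = \Pi\} \tag{7.32}
\end{align}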
For notational simplicity, we introduce the two terms below.
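\[ \Phi' = t'_{\Gamma_i} - \Pi \left\lfloor t'_{\Gamma_i}/\Pi \right\rfloor \tag{7.33} \]
\[ \Delta' = \max(\Phi' - (\Pi-\Theta_i),\,0) + \max(\Phi' - (\Pi-\Theta_j),\,0) - \max(\Phi' - (\Pi-\Theta_k),\,0) \tag{7.34} \]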
Now, by (7.30), (7.31) and (7.32), we have
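\begin{align}
& (Z(t,\Gamma_i)+Z(t,\Gamma_j)) - (t+Z(t,\Gamma_k)) \notag\\
&\le \left\lfloor \frac{t'_{\Gamma_i}}{\Pi} \right\rfloor (\Theta_i+\Theta_j-\Theta_k) + \Theta_j - \Theta_i - \Pi + \Theta_j - t + \Delta' \notag\\
&= \left\lfloor t'_{\Gamma_i}/\Pi \right\rfloor \cdot \Pi + 2\Theta_j - \Theta_i - \Pi + \Delta' - (t'_{\Gamma_i} + \Pi - \Theta_i) && \{\text{rearranging and by (7.2) and (7.28)}\} \notag\\
&= \left\lfloor t'_{\Gamma_i}/\Pi \right\rfloor \cdot \Pi - t'_{\Gamma_i} - 2(\Pi-\Theta_j) + \Delta' && \{\text{rearranging}\} \notag\\
&= \Delta' - \Phi' - 2(\Pi-\Theta_j). && \{\text{by (7.33)}\} \tag{7.35}
\end{align}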
Given the derivation above, we can complete the proof by showing ∆′−Φ′−2(Π−Θ j)≤ 0. This result
is implied by the following claim.
Claim 7.4. In Case 3, ∆′−Φ′−2(Π−Θ j)< 0.
Proof. By (7.33), 0≤Φ′ <Π. Also, by (7.29), Π−Θ j ≤Π−Θi <Π−Θk. Given these ranges,
the following cases are exhaustive.
Case 3.1: Φ′ ∈ [0,Π−Θ j), which implies ∆′−Φ′−2(Π−Θ j) =−Φ′−2(Π−Θ j)< 0.
Case 3.2: Φ′ ∈ [Π−Θ j,Π−Θi), which implies ∆′−Φ′−2(Π−Θ j) =−3(Π−Θ j)< 0.
Case 3.3: Φ′ ∈ [Π−Θi,Π−Θk), which implies
∆′−Φ′−2(Π−Θ j)
={by (7.34)}
Φ′−3(Π−Θ j)− (Π−Θi)
<{in this case, Φ′ <Π−Θk holds}
(Π−Θk)−3(Π−Θ j)− (Π−Θi)
={rearranging}
−2Π+2Θ j +(Θi+Θ j−Π−Θk)
={by (7.28)}
−2Π+2Θ j
<{w j < 1 in the lemma statement implies Θ j <Π}
0.
Case 3.4: Φ′ ∈ [Π−Θk,Π), which implies
∆′−Φ′−2(Π−Θ j)
={by (7.34)}
−3Π+Θi+3Θ j−Θk
={rearranging}
−2Π+2Θ j +(Θi+Θ j−Π−Θk)
={by (7.28)}
−2Π+2Θ j
<{w j < 1 in the lemma statement implies Θ j <Π}
0.
Claim 7.4 and (7.35) together imply Z(t,Γi)+Z(t,Γ j)≤ t+Z(t,Γk), as required.
Based on Lemma 7.1, we prove the following theorem by induction.
Theorem 7.2. Given an arbitrary component C = (p,T ) such that (∀Γi ∈ T :: Πi =Π), C is dominated by
the MP-form component C∗ = (p∗,T ∗) such that bw(C∗) = bw(C) and (∀Γi ∈ T ∗ :: Πi =Π).
Proof. We prove the theorem by induction on |T |.
Base Case: |T | ≤ 1. In this case, C and C∗ are identical, because bw(C∗) = bw(C) and (∀Γi ∈ T ∗ ::Πi =Π).
Therefore, by Definition 7.3, C∗ dominates C.
Inductive Step. Suppose the theorem holds for any component C such that |T | ≤ k where k ≥ 1. We prove
that it also holds for any component C such that |T |= k+1.
Because k ≥ 1, |T |= k+1≥ 2. Therefore, T has at least two VPs that are not dedicated processors. Let
Γi and Γ j be two arbitrary such VPs. Without loss of generality, assume 0< wi ≤ w j < 1.
To complete the proof, we show the existence of a component C′ = (p′,T ′) such that C′ has the same
bandwidth and period as C, but fewer VPs that are not dedicated processors, and psf∞(t,C)≤ psf∞(t,C′). C′
is constructed via three cases that hinge on the value of wi+w j.
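Case 1: If 0 < wi +w j < 1, then let p′ = p and T ′ = (T \{Γi,Γ j})∪{Γ′k}, where Γ′k is a new VP such that
Π′k = Π and Θ′k = Θi+Θ j. Clearly, bw(C) = bw(C′). Also,
\begin{align*}
\mathrm{psf}_\infty(t,C) - \mathrm{psf}_\infty(t,C') &= (p-p')\,t + \sum_{\Gamma_l \in T} Z(t,\Gamma_l) - \sum_{\Gamma_l \in T'} Z(t,\Gamma_l) && \{\text{by (7.17)}\} \\
&= Z(t,\Gamma_i) + Z(t,\Gamma_j) - Z(t,\Gamma'_k) \\
&\le 0. && \{\text{by Lemma 7.1}\}
\end{align*}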
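Case 2: If wi+w j = 1, then let p′ = p+1 and T ′ = T \{Γi,Γ j}. Clearly, bw(C) = bw(C′). Also,
\begin{align*}
\mathrm{psf}_\infty(t,C) - \mathrm{psf}_\infty(t,C') &= (p-p')\,t + \sum_{\Gamma_l \in T} Z(t,\Gamma_l) - \sum_{\Gamma_l \in T'} Z(t,\Gamma_l) && \{\text{by (7.17)}\} \\
&= -t + Z(t,\Gamma_i) + Z(t,\Gamma_j) \\
&\le 0. && \{\text{by Lemma 7.1}\}
\end{align*}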
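Case 3: If 1 < wi+w j < 2, then let p′ = p+1 and T ′ = (T \{Γi,Γ j})∪{Γ′k}, where Γ′k is a new VP such that
Π′k = Π and Θ′k = Θi+Θ j−Π. Clearly, bw(C) = bw(C′). Also,
\begin{align*}
\mathrm{psf}_\infty(t,C) - \mathrm{psf}_\infty(t,C') &= (p-p')\,t + \sum_{\Gamma_l \in T} Z(t,\Gamma_l) - \sum_{\Gamma_l \in T'} Z(t,\Gamma_l) && \{\text{by (7.17)}\} \\
&= -t + Z(t,\Gamma_i) + Z(t,\Gamma_j) - Z(t,\Gamma'_k) \\
&\le 0. && \{\text{by Lemma 7.1}\}
\end{align*}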
In all three cases, the following two expressions hold.
bw(C′) = bw(C) = bw(C∗) (7.36)
psf∞(t,C)≤ psf∞(t,C′) (7.37)
Also, in Cases 1 and 3, we have |T ′| = |T |− 1, while in Case 2, we have |T ′| = |T |− 2, so |T ′| ≤
|T |− 1 = (k+ 1)− 1 = k. Therefore, by (7.36) and by the inductive hypothesis, C′ is dominated by C∗.
Hence, by Definition 7.3,
psf∞(t,C′)≤ psf∞(t,C∗). (7.38)
By (7.37) and (7.38), psf∞(t,C)≤ psf∞(t,C∗). Also, since C∗ is in MP form, by Theorem 7.1, C∗ dominates
C.
The above theorem shows that, given a bandwidth and a common period shared by a set of asynchronous
VPs, a component’s supply is maximized when it is in MP form.
7.3.2 Different Periods
We now shift our focus by considering components that consist of a set of asynchronous VPs that may
have different periods. Specifically, we consider a component C = (p,T ), where for any two VPs Γi,Γ j in T ,
Πi ≠ Π j may hold. We investigate whether such a component C is dominated by a component in MP form with the
same bandwidth.
Towards this end, let C∗ be a component in MP form such that bw(C) = bw(C∗). To begin, note that
if bw(C) is an integer, then C∗ clearly dominates C, because C∗ has only dedicated processors that provide
Figure 7.7: Illustration of the counterexample in Section 7.3.
supply constantly. In the rest of this section, we consider the more interesting case wherein bw(C) is not an
integer. In this case, because C∗ is in MP form, |T ∗|= 1. Let Γ∗ = (Π∗,Θ∗) denote the lone VP in T ∗.
It is easy to see that, if C∗ is to dominate C, then the period Π∗ generally will be dependent on the periods
of the VPs in C. In particular, if Π∗ is selected to be very large in comparison to the periods of the VPs in
C, then Γ∗ may be unable to guarantee any supply over relatively long intervals in which the VPs in C do.
One obvious conjecture is that C∗ will dominate C as long as Π∗ ≤min{Πi |Γi ∈ T } holds. However, the
following counterexample shows that this conjecture is not true.
Counterexample. Consider a component C with these two VPs: Γ1 =(11,1.1(1+ϑ)) and Γ2 =(10,10−ϑ),
where ϑ is an arbitrary small positive real number, i.e., ϑ → 0+. An MP-form component C∗ with the
same bandwidth also has two VPs: a dedicated processor and Γ∗ = (10,1). Note that, in this setting,
Π∗≤min{Πi |Γi ∈T } holds. As illustrated in Figure 7.7, psf∞(22,C) = 1.1(1+ϑ)+22−4ϑ = 23.1−2.9ϑ ,
while psf∞(22,C∗) = 1+22 = 23. Because ϑ → 0+, 23.1−2.9ϑ > 23. That is, psf∞(22,C)> psf∞(22,C∗),
which implies that C∗ does not dominate C.
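The arithmetic in this counterexample can be reproduced with exact rationals. The sketch below fixes ϑ = 1/100 as one arbitrary small value (an assumption; the text only requires ϑ → 0+), and the helper names `sbf` and `psf_inf` are illustrative implementations of (7.1)–(7.3) and (7.17).

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) per (7.1)-(7.3)."""
    t_shift = t - (period - budget)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)
    return k * budget + max(t_shift - period * k - (period - budget), 0)

def psf_inf(t, p, vps):
    """psf_inf(t, C) per (7.17), non-concrete asynchronous VPs."""
    return t * p + sum(sbf(t, period, budget) for period, budget in vps)

theta = Fraction(1, 100)                                   # assumed small value of ϑ
c_vps = [(Fraction(11), Fraction(11, 10) * (1 + theta)),   # Γ1 = (11, 1.1(1 + ϑ))
         (Fraction(10), 10 - theta)]                       # Γ2 = (10, 10 − ϑ)
mp_vps = [(Fraction(10), Fraction(1))]                     # Γ* = (10, 1)

supply_c = psf_inf(Fraction(22), 0, c_vps)                 # 23.1 − 2.9ϑ
supply_mp = psf_inf(Fraction(22), 1, mp_vps)               # 22 + 1
print(float(supply_c), float(supply_mp), supply_c > supply_mp)
```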
Despite the negative implications of this counterexample, we show next that C∗ does indeed dominate C
if Π∗ is further restricted.
Theorem 7.3. C is dominated by the MP-form component C∗ as defined above as long as
Π∗ ≤ (1/2)·min{Πi | Γi ∈ T }.
Proof. Given C, we first construct a new component C′ such that p′ = p and |T ′|= |T |. Each Γ′i = (Π′i,Θ′i) ∈
T ′ is constructed from the VP Γi = (Πi,Θi) ∈ T by defining Π′i = Π∗ and Θ′i = ΘiΠ∗/Πi. These definitions
imply
w′i = Θ′i/Π′i = Θi/Πi = wi. (7.39)
By Property 7.2,
Z(t,Γ′i) ≥ max{(t−2(Π′i−Θ′i))w′i,0}
= {by (7.39) and because Π′i = Π∗}
max{(t−2Π∗(1−wi))wi,0}
≥ {because Π∗ ≤ (1/2)·min{Πi | Γi ∈ T }}
max{(t−Πi(1−wi))wi,0}.
On the other hand, by Property 7.3,
Z(t,Γi) ≤ max{(t− (Πi−Θi))wi,0}
= max{(t−Πi(1−wi))wi,0},
from which we can conclude the following.
(∀i : 1≤ i≤ |T |= |T ′| :: Z(t,Γi)≤ Z(t,Γ′i)) (7.40)
Also, p = p′, and therefore, by (7.17)
psf∞(t,C)≤ psf∞(t,C′). (7.41)
Because (∀Γ′i :: Π′i =Π∗) holds, and by (7.39), bw(C′) = p′+∑Γ′i∈T ′ w′i = p+∑Γi∈T wi = bw(C) = bw(C∗)
holds, C′ is a component in which all VPs share the same period, and its bandwidth equals the bandwidth of
the MP-form component C∗. Therefore, by Theorem 7.2, C∗ dominates C′. By Definition 7.3, this implies
psf∞(t,C′)≤ psf∞(t,C∗). (7.42)
By (7.41) and (7.42), psf∞(t,C)≤ psf∞(t,C∗) holds, so by Theorem 7.1, the MP-form component C∗ domi-
nates C.
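As a numerical illustration of Theorem 7.3, the sketch below reuses the VPs from the counterexample above and compares psf∞ on a sampled grid for two choices of Π∗. By Theorem 7.1, comparing psf∞ suffices when C∗ is in MP form. The value ϑ = 1/100, the grid, and the helper names are assumptions; a grid check illustrates the theorem but does not prove it.

```python
from fractions import Fraction
from math import floor

def sbf(t, period, budget):
    """Z(t, Γ) per (7.1)-(7.3)."""
    t_shift = t - (period - budget)
    if t_shift < 0:
        return Fraction(0)
    k = floor(t_shift / period)
    return k * budget + max(t_shift - period * k - (period - budget), 0)

def psf_inf(t, p, vps):
    """psf_inf(t, C) per (7.17)."""
    return t * p + sum(sbf(t, period, budget) for period, budget in vps)

def mp_form(p, vps, period_star):
    """MP-form component with the same bandwidth and lone-VP period period_star."""
    bw = p + sum(budget / period for period, budget in vps)
    p_star = floor(bw)
    w_star = bw - p_star
    return p_star, ([(period_star, w_star * period_star)] if w_star > 0 else [])

def supply_dominated(p, vps, p_star, vps_star, horizon, step=Fraction(1, 4)):
    """True if psf_inf(t, C) <= psf_inf(t, C*) at every grid point in [0, horizon]."""
    steps = int(horizon / step)
    return all(psf_inf(i * step, p, vps) <= psf_inf(i * step, p_star, vps_star)
               for i in range(steps + 1))

theta = Fraction(1, 100)
vps = [(Fraction(11), Fraction(11, 10) * (1 + theta)), (Fraction(10), 10 - theta)]
pi_min = min(period for period, _ in vps)

ok_half = supply_dominated(0, vps, *mp_form(0, vps, pi_min / 2), Fraction(60))
ok_min = supply_dominated(0, vps, *mp_form(0, vps, pi_min), Fraction(60))
print(ok_half, ok_min)
```

With Π∗ equal to half the minimum period the grid check passes, whereas Π∗ equal to the minimum period fails at t = 22, matching the counterexample.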
In some cases, the dominance of C∗ over C can be established with a weaker restriction on the period
Π∗. The following theorem gives such a case; note that harmonic and loose-harmonic2 periods satisfy the
condition given in this theorem.
Theorem 7.4. For the component C = (p,T ) defined above, let Πmin = min{Πi |Γi ∈ T }. If the condition
(∀Γi ∈ C ::Πi =Πmin∨Πi ≥ 2Πmin) holds, then C is dominated by the MP-form component C∗ as defined
above if Π∗ is set equal to Πmin.
Proof. We construct C′ in the same way as in the proof of Theorem 7.3 such that Π′i = Π∗ = Πmin and
Θ′i = ΘiΠ∗/Πi. Given the statement of Theorem 7.4, we have for each i, Πi = Π∗ = Π′i or Πi ≥ 2Πmin = 2Π∗ = 2Π′i.
In the former case, Z(t,Γi) = Z(t,Γ′i) holds; in the latter case, Z(t,Γi)≤ Z(t,Γ′i) can be shown to follow from
Properties 7.2 and 7.3 using the same reasoning as in the proof of Theorem 7.3. Thus, we can establish
(7.40) in the context of this new theorem, and then show exactly as done in the proof of Theorem 7.3 that C∗
dominates C.
7.4 Synchronous and Concrete Asynchronous
In this section, we consider components consisting of VPs with specified phases, i.e., both concrete
asynchronous and synchronous VPs. These VPs may have either a common period or different periods.
The case of synchronous VPs and a common period is highly related to the MPR model (Shin and Lee,
2003), as that model enforces both of these requirements. The following theorem is easily implied by prior
work on the MPR model (Xu et al., 2015) that shows that, by enforcing MP form, a component abstracted by
the MPR model achieves its maximum supply.
Theorem 7.5. (Follows from (Xu et al., 2015)) If C = (p,T ) is a synchronous component and (∀Γi ∈ T ::
Πi =Π) holds, then it is dominated by the MP-form component C∗ = (p∗,T ∗), where bw(C∗) = bw(C) and
(∀Γi ∈ T ∗ :: Πi =Π).
Because synchronous VPs are a special case of concrete asynchronous VPs where all VP phases happen
to be zero, one might expect that Theorem 7.5 can be extended to concrete asynchronous VPs, and speculate
(Footnote 2: The smallest period divides any larger period.)
Figure 7.8: Illustration of the counterexample in Section 7.4.
that an arbitrary concrete asynchronous component with a common period is dominated by the MP-form
component of the same bandwidth and period. However, this is unfortunately not true.
Counterexample. Consider a non-MP-form component C that has two VPs, Γ1 = (Π,ϑ) and Γ2 = (Π,ϑ),
with a common period and arbitrarily small budget, i.e., ϑ → 0+. Suppose these two VPs have different
phases, φ1 = 0 and φ2 = Π/2, as shown in Figure 7.8. Observe that any time interval of length (3/2)Π must
include at least one complete period of allocation of Γ1 or Γ2. Therefore, psf1((3/2)Π,C) ≥ ϑ. In contrast, consider the
MP-form counterpart of C: C∗ = {Γ∗}, where Γ∗ = (Π,2ϑ). By the worst case illustrated in Figure 7.8,
psf1(t,C∗) = 0 holds for any t ≤ 2(Π− 2ϑ). Because ϑ → 0+, we have (3/2)Π < 2(Π− 2ϑ). Therefore,
psf1((3/2)Π,C∗) = 0 < ϑ ≤ psf1((3/2)Π,C), which implies that C∗ does not dominate C.
Nonetheless, we provide next a theorem that shows that a component in non-MP-form will still be
dominated by an MP-form component of the same bandwidth, provided the period of the latter is properly
selected. Furthermore, the required period selection is valid not only for concrete asynchronous VPs with a
common period, but also for synchronous or concrete asynchronous VPs with different periods. The theorem
is stated assuming concrete asynchronous VPs, a category that subsumes these other possibilities.
Theorem 7.6. If C = (p,T ) is a concrete asynchronous component, then it is dominated by the MP-form
component C∗ = (p∗,T ∗), where bw(C∗) = bw(C), provided the following condition holds: if |T ∗|= 1, then
Figure 7.9: A possible scenario for any concrete phases.
the period of the lone VP in T ∗ must satisfy
\[ \Pi^* \le \frac{\Pi_l\,(1-w_l)\,w_l}{2\,(1-w^*)\,w^*}, \tag{7.43} \]
where l is defined by
Πl−Θl = min{Πi−Θi | Γi ∈ T }. (7.44)
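The period bound in (7.43) is straightforward to compute once l is chosen according to (7.44). The following sketch is an illustration under stated assumptions: the function name `mp_period_bound` is hypothetical, ties in (7.44) are broken by taking the first minimizer, and the sample VPs instantiate the counterexample of this section with Π = 10 and ϑ = 1.

```python
from fractions import Fraction

def mp_period_bound(vps, w_star):
    """Right-hand side of (7.43); vps lists the non-dedicated VPs of C as (period, budget)."""
    # (7.44): Γ_l minimizes Π_i − Θ_i (first minimizer on ties, an assumption).
    period_l, budget_l = min(vps, key=lambda vp: vp[0] - vp[1])
    w_l = Fraction(budget_l) / Fraction(period_l)
    return (period_l * (1 - w_l) * w_l) / (2 * (1 - w_star) * w_star)

# Two VPs (10, 1) give bw(C) = 1/5, so the lone MP-form VP has w* = 1/5.
bound = mp_period_bound([(Fraction(10), Fraction(1)), (Fraction(10), Fraction(1))],
                        Fraction(1, 5))
print(bound, float(bound))
```

In this instance the bound is well below the common period 10, which is consistent with the counterexample: keeping Π∗ = Π is not sufficient for dominance under concrete phases.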
Proof. Because C∗ is in MP form, |T ∗| ≤ 1 holds. If |T ∗|= 0 holds, then C∗ has dedicated processors only.
Because bw(C∗) = bw(C) is assumed, this clearly implies that C∗ dominates C. In the rest of the proof, we
focus on the more interesting case wherein |T ∗| = 1 holds. In this case, bw(C∗) is not integral, so bw(C)
is also not integral. This implies that |T | > 0 holds. We now show that psf∞(t,C) ≤ psf∞(t,C∗) holds by
considering two cases.
Case 1: t ≤ Πl −Θl . By Claim 7.1 and (7.44), any VP Γi in T can provide zero supply within any time
interval of length t, where t ≤Πl−Θl . Within any such time interval, the p dedicated processors of C provide
supply continually. Because C∗ is in MP form, p≤ p∗ holds. Therefore, psf∞(t,C) = t · p≤ t · p∗≤ psf∞(t,C∗).
Case 2: t > Πl −Θl . Let Γl be a VP such that Πl −Θl = min{Πi−Θi | Γi ∈ T }. Then, the allocations
described next and illustrated in Figure 7.9 are possible for any concrete VP phases (i.e., synchronous or
concrete asynchronous). Let t0 be a time instant such that Γl gets its minimal supply Z(t,Γl) within the time
interval [t0, t0+ t). For any other VP Γ j, where j ≠ l, let ψ j denote the distance from t0 to the start of its next
allocation period, i.e., the next allocation period of Γ j at or after time t0 starts at time t0 +ψ j. (Note that
the value of ψ j will depend on the phase of Γ j.) In this possible allocation sequence, if Γ j has an allocation
period that includes t0 (as depicted), then assume that it provides a supply of (Π j−ψ j) ·w j time units within
that allocation period before t0, i.e., in [t0− (Π j−ψ j), t0). Regardless of whether Γ j has an allocation period
that includes t0, assume that it provides supply as late as possible in each of its allocation periods beyond
time t0. It is easy to show that, in this situation, each Γ j provides a supply of at most t ·w j time units during
[t0, t0+ t). By Definition 7.2, the PSF functions capture the minimum allocation that can occur, which is upper
bounded by that demonstrated in the possible allocation sequence just discussed. Therefore, we have
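\begin{align}
\mathrm{psf}_\infty(t,C) &\le t \cdot p + Z(t,\Gamma_l) + \sum_{\Gamma_j \in T \wedge j \ne l} t \cdot w_j \notag\\
&\le t \cdot p + \max\{ w_l \cdot (t - (\Pi_l - \Theta_l)),\, 0 \} + \sum_{\Gamma_j \in T \wedge j \ne l} t \cdot w_j && \{\text{by Property 7.3}\} \notag\\
&\le t \cdot p + w_l \cdot (t - (\Pi_l - \Theta_l)) + \sum_{\Gamma_j \in T \wedge j \ne l} t \cdot w_j && \{\text{by our assumption in Case 2 that } t > \Pi_l - \Theta_l\} \notag\\
&= t \cdot \Big( p + \sum_{\Gamma_i \in T} w_i \Big) - w_l \cdot (\Pi_l - \Theta_l) && \{\text{rearranging}\} \notag\\
&= t \cdot bw(C) - \Pi_l (1 - w_l) w_l. && \{\text{by (7.5) and the definition of } w_l\} \tag{7.45}
\end{align}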
By Claim 7.2 and our assumption that |T ∗|= 1 holds, we have
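\begin{align}
\mathrm{psf}_\infty(t,C^*) &= t \cdot p^* + Z(t,\Gamma^*) \notag\\
&\ge t \cdot p^* + \max\{ w^* \cdot (t - 2(\Pi^* - \Theta^*)),\, 0 \} && \{\text{by Property 7.2}\} \notag\\
&\ge t \cdot p^* + w^* \cdot (t - 2(\Pi^* - \Theta^*)) && \{\text{because } \max\{x,y\} \ge x\} \notag\\
&= t \cdot (p^* + w^*) - 2\Pi^* (1 - w^*) w^* && \{\text{rearranging and using the definition of } w^*\} \notag\\
&= t \cdot bw(C^*) - 2\Pi^* (1 - w^*) w^*. && \{\text{by (7.5)}\} \tag{7.46}
\end{align}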
By (7.45) and (7.46),
psf∞(t,C)−psf∞(t,C∗)
≤{because bw(C) = bw(C∗)}
2Π∗(1−w∗)w∗−Πl(1−wl)wl
≤{by (7.43)}
0.
That is, psf∞(t,C)≤ psf∞(t,C∗) for t >Πl−Θl .
Combining Cases 1 and 2, we have psf∞(t,C)≤ psf∞(t,C∗) for any t ≥ 0. Also, C∗ is in MP form. Thus,
by Theorem 7.1, C∗ dominates C.
7.5 Indomitability of MP Form
Although we have shown that an arbitrary component can always be dominated by a component in MP
form with the same bandwidth, this result requires restrictions on the period of the MP-form component in
some cases. This raises the question of whether the dominance is really due to the definition of MP form or
just a side effect of the period restrictions. In this section, we address this question. We show that an MP-form
component can never be dominated by a non-MP-form component of the same bandwidth, regardless of any
restrictions that may be applied to the non-MP-form component.
The following theorem holds, regardless of whether the VPs are synchronous, concrete asynchronous, or
non-concrete asynchronous.
Theorem 7.7. Given an MP-form component C∗ and an arbitrary non-MP-form component C such that
bw(C∗) = bw(C) holds, C does not dominate C∗, no matter how {Πi | Γi ∈ C} is defined.
Proof. Let p and p∗ denote the number of dedicated processors in C and C∗, respectively. Because C∗ is
in MP form and bw(C) = bw(C∗) holds, we have p ≤ p∗. We consider the two cases p < p∗ and p = p∗
separately below.
Case 1: p < p∗. By Claim 7.1, regardless of the VPs’ phases, the supply of each VP Γi ∈ T can be zero
for any time interval of length t such that 0< t ≤Πi−Θi, so psf∞(t,C) = t · p for any t such that 0< t ≤ ts,
where ts = min{Πi − Θi |Γi ∈ T }. On the other hand, psf∞(t,C∗)≥ t · p∗ for any t > 0. Thus, for any t such
that 0 < t ≤ ts, we have psf∞(t,C) = t · p < t · p∗ ≤ psf∞(t,C∗), i.e., psf∞(t,C) < psf∞(t,C∗). Note that the
stated range for t is not vacuous. This is because C is not in MP form, which implies that |T | > 0 holds,
and hence that ts > 0 holds as well. Because psf∞(t,C) < psf∞(t,C∗) holds, by Definition 7.3, C does not
dominate C∗.
Case 2: p = p∗. In this case, we have |T ∗|= 1, because if |T ∗|= 0 holds, then either C is also in MP form
or bw(C)> bw(C∗), neither of which is allowed by the statement of the theorem. Let Γ∗ denote the lone VP
in C∗ and let w∗ denote its bandwidth. Then, w∗ = ∑Γi∈T wi, since bw(C∗) = bw(C). Also, because C is not
in MP form, by Definition 7.1, both |T | ≥ 2 and (∀Γi ∈ T :: wi > 0) hold. Therefore, (∀Γi ∈ T :: wi < w∗).
Letting wmax = max{wi | Γi ∈ T }, this implies
wmax < w∗. (7.47)
Let δ be the greatest common divisor of the values in {Πi | Γi ∈ T }. Then, the processor-time allocation
illustrated in Figure 7.10, where every VP provides δ ·wi time units of processor time at the end of every
aligned time window of δ time units, is possible regardless of any assumptions regarding the VPs’ phases.
This is because, in this schedule, each VP Γi is allocated Θi time units within any time interval of length Πi.
Such an allocation satisfies the specification of Γi regardless of how phases are defined. Under this allocation
pattern, each VP other than the one with the maximum bandwidth wmax provides all of its supply in parallel
with that maximum-bandwidth VP. Furthermore, with the depicted allocations, the minimum supply during
any time interval of length t with a degree of parallelism of one is
\[
\left\lfloor \frac{t}{\delta} \right\rfloor \delta\, w_{\max}
+ \max\left\{ t - \left\lfloor \frac{t}{\delta} \right\rfloor \delta - (1-w_{\max})\,\delta,\; 0 \right\}
\le t \cdot w_{\max}.
\]
Figure 7.10: Illustration of Case 2 of Theorem 7.7. (The example shows three VPs Γ1, Γ2, Γ3, where
Γi = (Πi, Πi · wi), with periods Π1 = 4δ, Π2 = 3δ, Π3 = 5δ and bandwidths w3 < w2 < w1 = wmax.)
Because the PSF functions, by Definition 7.2, capture the worst case among all possible allocation
scenarios,
psf1(t,T )≤ t ·wmax. (7.48)
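The inequality behind (7.48) can be spot-checked numerically. The following is a minimal sketch, not part of the proof: it evaluates the minimum-supply expression stated above for the allocation pattern of Figure 7.10 and compares it with t · wmax. The function names and the sample values of δ and wmax are hypothetical and chosen only for illustration.

```python
import math


def min_supply_parallelism_one(t, delta, w_max):
    """Minimum supply with parallelism one in any window of length t under the
    Figure 7.10 allocation pattern (the expression stated in the text)."""
    full_windows = math.floor(t / delta)
    leftover = t - full_windows * delta
    # Each full window of length delta contributes delta * w_max; a partial
    # window contributes only once it exceeds the idle part (1 - w_max) * delta.
    return full_windows * delta * w_max + max(leftover - (1 - w_max) * delta, 0.0)


def bound_7_48_holds(t, delta, w_max, eps=1e-9):
    """True if the expression above is at most t * w_max (up to rounding)."""
    return min_supply_parallelism_one(t, delta, w_max) <= t * w_max + eps


if __name__ == "__main__":
    delta, w_max = 1.0, 0.6  # hypothetical values, not taken from the text
    for t in (0.3, 0.9, 1.0, 2.5, 7.75):
        print(t, min_supply_parallelism_one(t, delta, w_max), t * w_max)
```

A check over sample points does not replace the analytical argument; it only illustrates that the two sides coincide at multiples of δ and that the left-hand side falls below t · wmax in between.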
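Therefore, given that C has p dedicated processors,
\[
\mathrm{psf}_{p+1}(t,\mathcal{C}) = t\,p + \mathrm{psf}_{1}(t,\mathcal{T}) \le t\,(p+w_{\max}).
\tag{7.49}
\]
On the other hand, for C∗ and any t ≥ 2(Π∗−Θ∗) = 2Π∗(1−w∗), we have
\[
\begin{aligned}
\mathrm{psf}_{p^{*}+1}(t,\mathcal{C}^{*})
&= \mathrm{psf}_{\infty}(t,\mathcal{C}^{*})
  && \text{\{by (7.8) and because } \mathcal{C}^{*} \text{ is in MP form\}} \\
&= t\,p^{*} + Z(t,\Gamma^{*})
  && \text{\{by (7.17)\}} \\
&\ge t\,p^{*} + w^{*}\,(t-2(\Pi^{*}-\Theta^{*}))
  && \text{\{by Property 7.2, and since } t \ge 2(\Pi^{*}-\Theta^{*}) \text{\}} \\
&= t\,(p^{*}+w^{*}) - 2\Pi^{*} w^{*} (1-w^{*}).
  && \text{\{rearranging and using } w^{*} = \Theta^{*}/\Pi^{*} \text{\}}
\end{aligned}
\]
Because p = p∗ holds in Case 2,
\[
\mathrm{psf}_{p+1}(t,\mathcal{C}^{*}) \ge t\,(p+w^{*}) - 2\Pi^{*} w^{*} (1-w^{*}).
\tag{7.50}
\]
By (7.49) and (7.50), for any t ≥ 2Π∗(1−w∗),
\[
\mathrm{psf}_{p+1}(t,\mathcal{C}^{*}) - \mathrm{psf}_{p+1}(t,\mathcal{C})
\ge t\,(w^{*}-w_{\max}) - 2\Pi^{*} w^{*} (1-w^{*}).
\]
Hence, by (7.47),
\[
\mathrm{psf}_{p+1}(t,\mathcal{C}) < \mathrm{psf}_{p+1}(t,\mathcal{C}^{*})
\quad \text{for any } t > \frac{2\Pi^{*} w^{*} (1-w^{*})}{w^{*}-w_{\max}} > 2\Pi^{*}(1-w^{*}).
\]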
Thus, by Definition 7.3, C does not dominate C∗. Note that the above argument is valid regardless of the
definition of {Πi | Γi ∈ C}.
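To make the final step of Case 2 concrete, the following minimal sketch computes the crossover time 2Π∗w∗(1−w∗)/(w∗−wmax) and evaluates the right-hand sides of (7.49) and (7.50) on either side of it. The function names and the numeric example (p = 1, Π∗ = 10, w∗ = 0.8, wmax = 0.5) are hypothetical and serve only as an illustration of the two bounds.

```python
def crossover_time(period_star, w_star, w_max):
    """Time beyond which (7.49) and (7.50) together force
    psf_{p+1}(t, C) < psf_{p+1}(t, C*). Requires 0 < w_max < w_star < 1."""
    if not (0 < w_max < w_star < 1):
        raise ValueError("expected 0 < w_max < w_star < 1")
    return 2 * period_star * w_star * (1 - w_star) / (w_star - w_max)


def upper_bound_c(t, p, w_max):
    """Right-hand side of (7.49): an upper bound on psf_{p+1}(t, C)."""
    return t * (p + w_max)


def lower_bound_c_star(t, p, period_star, w_star):
    """Right-hand side of (7.50): a lower bound on psf_{p+1}(t, C*),
    valid for t >= 2 * period_star * (1 - w_star)."""
    return t * (p + w_star) - 2 * period_star * w_star * (1 - w_star)


if __name__ == "__main__":
    p, period_star, w_star, w_max = 1, 10, 0.8, 0.5  # hypothetical example
    t_x = crossover_time(period_star, w_star, w_max)
    for t in (5.0, t_x, 12.0):
        print(t, upper_bound_c(t, p, w_max), lower_bound_c_star(t, p, period_star, w_star))
```

In this example, the two bounds do not yet separate at t = 5, they meet at the crossover time, and at t = 12 the upper bound for C lies strictly below the lower bound for C∗, which is the situation the proof exploits.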
Theorem 7.7 shows that, no matter how the periods of a non-MP-form component are defined, it cannot
dominate any component in MP form with the same total bandwidth.
7.6 Chapter Summary
In this chapter, we studied processor allocations to components comprised of multiple VPs, which
may be synchronous, concrete asynchronous, or non-concrete asynchronous. We showed that any arbitrary
component is always dominated by an MP-form component of the same bandwidth, provided the period used
in defining the MP-form component meets certain requirements. We also showed that a component in MP
form can never be dominated by any non-MP-form component of the same bandwidth, regardless of how
periods are defined.
CHAPTER 8: CONCLUSION
Real-time scheduling analysis is crucial for time-critical systems, in which provable timing guarantees
are more important than observed raw performance. Although significant work has been done with respect to
real-time scheduling on symmetric multiprocessor platforms, the same is not true for asymmetric ones. The
main objective of the research presented in this dissertation was to provide fundamental results and analysis
techniques to mitigate this insufficiency in the literature. Towards this goal, we designed and analyzed a
few real-time scheduling algorithms under several system models, addressing asymmetric multiprocessor
platforms due to differing processor speeds, processor functionalities, or virtualization, respectively. In
the following, we summarize the results presented in this dissertation (Section 8.1), briefly describe other
work by the author that was done in parallel with, but is beyond the scope of, this dissertation
(Section 8.2), and discuss future work (Section 8.3).
8.1 Summary of Results
Focusing on real-time scheduling on asymmetric multiprocessor platforms, the results presented in this
dissertation can be summarized as follows.
Results regarding the SRT-optimality of G-EDF on uniform multiprocessors. In Chapter 3, we answered
the question: is G-EDF SRT-optimal on uniform multiprocessors? The answer was different from that
for the similar question with respect to identical multiprocessors, where both preemptive and non-preemptive
G-EDF are SRT-optimal (Devi and Anderson, 2008). By providing a counterexample, we showed that
non-preemptive G-EDF is not SRT-optimal for uniform multiprocessors. On the other hand, by developing a
new framework to bound tardiness, we proved that preemptive G-EDF is indeed SRT-optimal for uniform
multiprocessors. Both results in fact apply to a broader range of problems beyond the G-EDF scheduling of
sporadic tasks. The negative result applies to any work-conserving non-preemptive scheduling algorithm,
whereas the positive result applies to the VPP task model, which is a more general task model than the sporadic
task model.
Two semi-partitioned scheduling algorithms for uniform multiprocessors. In Chapter 4, we presented
two semi-partitioned scheduling algorithms designed for uniform multiprocessors, namely EDF-sh and
EDF-tu. Both algorithms limit the number of migrating tasks: for scheduling n tasks on m uniform processors,
EDF-sh allows at most m−1 tasks to migrate while EDF-tu allows m. EDF-sh further requires these migrating
tasks to migrate at job boundaries only, at the cost of supporting SRT tasks only and not being SRT-optimal.
In contrast, EDF-tu is SRT-optimal and includes a tunable parameter, the frame size. For any positive value
of the frame size, tardiness is guaranteed to be at most this value. Furthermore, if the frame size divides all
task periods, EDF-tu becomes HRT-optimal and ensures zero tardiness.
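The two frame-size guarantees summarized above can be captured in a small helper. This is a minimal sketch that encodes only the guarantees stated in this summary (tardiness at most the frame size, and zero tardiness when the frame size divides every task period); it does not implement the scheduler from Chapter 4. The function name is hypothetical, and exact rational arithmetic is an assumption made to avoid rounding issues in the divisibility test.

```python
from fractions import Fraction


def frame_tardiness_guarantee(frame_size, periods):
    """Tardiness bound implied by the summary for a given frame size.

    Pass integers, Fractions, or decimal strings such as "2.5" so that the
    divisibility test is exact.
    """
    frame = Fraction(frame_size)
    if frame <= 0:
        raise ValueError("the frame size must be positive")
    # A period is divisible by the frame size if period / frame is an integer.
    divides_all = all((Fraction(period) / frame).denominator == 1 for period in periods)
    return Fraction(0) if divides_all else frame


if __name__ == "__main__":
    print(frame_tardiness_guarantee(5, [10, 20, 35]))   # frame divides all periods
    print(frame_tardiness_guarantee(4, [10, 20]))       # frame does not divide 10
```

The helper makes the trade-off visible: a smaller frame size tightens the tardiness bound, and choosing a common divisor of the periods removes tardiness altogether.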
Allowing intra-task parallelism on uniform multiprocessors. In Chapter 5, we introduced a new task
model, called the npc-sporadic task model. In contrast to the conventional sporadic task model where jobs
of the same task must execute in sequence, such jobs are allowed to execute in parallel in the npc-sporadic
task model. For scheduling npc-sporadic tasks on a uniform multiprocessor, the HRT-feasibility condition
is the same as that for the conventional sporadic task model; however, the SRT-feasibility condition for the
npc-sporadic model merely requires that the system is not overutilized while the rather complicated per-task
utilization constraint for the conventional sporadic task model can be eliminated. We further showed that
both preemptive and non-preemptive G-EDF are SRT-optimal on uniform multiprocessors for scheduling
npc-sporadic task systems. Preemptive G-EDF is more greedy in executing jobs on faster processors and
hence has a better response-time bound, at the expense of potentially greater preemption and migration
frequencies. On the other hand, non-preemptive G-EDF does not preempt or migrate jobs, but its guaranteed
response-time bounds are relatively higher.
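The difference between the two SRT-feasibility conditions can be illustrated with a short sketch. It rests on two assumptions that are not spelled out in this summary: that "not overutilized" means the total utilization is at most the total processor speed, and that the per-task utilization constraint for sequential tasks takes the prefix-sum form commonly used for uniform multiprocessors (the k largest utilizations must not exceed the k fastest speeds). The function names and the example task system are hypothetical.

```python
def npc_sporadic_srt_feasible(utilizations, speeds, eps=1e-12):
    """Assumed npc-sporadic condition: total utilization <= total speed."""
    return sum(utilizations) <= sum(speeds) + eps


def sequential_sporadic_feasible(utilizations, speeds, eps=1e-12):
    """Assumed conventional condition: the total-utilization check plus a
    prefix-sum check of the largest utilizations against the fastest speeds."""
    u = sorted(utilizations, reverse=True)
    s = sorted(speeds, reverse=True)
    if sum(u) > sum(s) + eps:
        return False
    # For k = 1, ..., m-1: the k largest utilizations must fit on the k fastest
    # processors, because jobs of one task cannot run in parallel.
    for k in range(1, len(s)):
        if sum(u[:k]) > sum(s[:k]) + eps:
            return False
    return True


if __name__ == "__main__":
    utilizations, speeds = [1.5, 0.3], [1.0, 1.0]  # hypothetical task system
    print(npc_sporadic_srt_feasible(utilizations, speeds))
    print(sequential_sporadic_feasible(utilizations, speeds))
```

In the example, a task with utilization 1.5 cannot be served by any single unit-speed processor when its jobs must run in sequence, yet the system is not overutilized, so it passes the assumed npc-sporadic check.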
Techniques for deriving and improving end-to-end response-time bounds for DAG-based task systems
on unrelated heterogeneous platforms. In Chapter 6, we addressed the problem of scheduling real-time
dataflows on heterogeneous CEs. We formalized this problem by representing each dataflow by a DAG, the
nodes (resp., edges) of which represent tasks (resp., producer/consumer relationships). We then presented
task-transformation techniques to provide end-to-end response-time bounds for such DAG-based task systems
implemented on heterogeneous multiprocessor platforms. We further presented an LP-based method for
setting relative deadlines that can be applied to improve these bounds. In addition, the early-releasing
technique was shown to be able to improve observed end-to-end response times while not compromising
their analytical bounds. For systems where multiple DAGs are structurally identical, which is the case in our
targeted application domain, we also presented a DAG combining technique that was shown to be able to
further improve the end-to-end response-time bounds for DAGs.
Dominance of MP-form supply on virtual multiprocessor platforms. In Chapter 7, we focused on VP
allocation schemes for constituting a virtual multiprocessor platform, or a component, on a symmetric
physical multiprocessor platform. Even for a designated component capacity and a given physical platform,
there are an infinite number of VP allocation schemes to apply and a choice must be made. Furthermore, the
allocation periods of VPs may be synchronous, concrete asynchronous, or non-concrete asynchronous, and
the sizes of such periods may be the same for all VPs or may vary from one VP to another. In each of these
cases, a VP allocation scheme, called MP form, is shown to dominate any other scheme. Under MP form,
each component is allocated at most one partially available processor, with all other processors allocated to
it being fully available. Specifically, we showed that any arbitrary component is always dominated by an
MP-form component of the same bandwidth, provided the period used in defining the MP-form component
meets certain requirements. We also showed that a component in MP form can never be dominated by any
non-MP-form component of the same bandwidth, regardless of how periods are defined.
8.2 Other Publications
The following is a brief summary of other work done by the author during his doctoral studies.
Supporting real-time computer-vision workloads using OpenVX.[1] For computer-vision algorithms,
graphics processing units (GPUs) are a particularly compelling accelerator to consider, as GPUs are well suited
for efficiently performing the matrix-oriented computations inherent in many computer-vision applications.
To ease the development of such applications on heterogeneous platforms such as those in which GPUs
are employed, and to enable system-level optimization (Rainey et al., 2014), a standard, called OpenVX,
has been created and ratified (Khronos Group, 2014). In OpenVX, a set of basic operations, or primitives,[2]
commonly used in computer-vision algorithms is provided to programmers. OpenVX also defines a set
of data objects,[3] and has a graph-based execution model. The programmer constructs a computer-vision
algorithm by instantiating primitives as nodes and data objects as parameters and binding parameters to node
inputs and outputs. Node dependencies (i.e., edges) are not explicitly declared. Rather, the structure of a
graph is derived from how parameters are bound to nodes. A video frame is processed by executing such an
OpenVX graph end-to-end. Since each node may use a mix of the processing elements of a heterogeneous
platform, OpenVX enables a single computer-vision algorithm to execute across CPUs, GPUs, DSPs, etc.
[1] Details of this contribution have been published in the following papers:
Elliott, G., Yang, K., and Anderson, J. (2015). Supporting real-time computer vision workloads using OpenVX on multicore+GPU
platforms. In Proceedings of the 36th IEEE Real-Time Systems Symposium, pages 273–284.
Yang, K., Elliott, G., and Anderson, J. (2015). Analysis for supporting real-time computer vision workloads using OpenVX on
multicore+GPU platforms. In Proceedings of the 23rd International Conference on Real-Time Networks and Systems, pages 77–86.
[2] In OpenVX, these basic operations are called “kernels.”
However, the original OpenVX specification does not fit any existing real-time task model, and
therefore no real-time analysis framework applies directly. In (Elliott et al., 2015; Yang et al., 2015), we
modified NVIDIA's variant of OpenVX, called VisionWorks®, to enable modeling each node in an
OpenVX graph as a sporadic task. We then showed that, for acyclic OpenVX graphs, real-time scheduling and
analysis techniques for DAG-based systems (Liu and Anderson, 2010) can be applied to derive an end-to-end
response-time bound for each DAG. However, in some OpenVX graphs, cycles may exist due to delay edges,
which specify data dependencies on the processing of prior frames. To address this problem, we presented
techniques to refine a cyclic OpenVX graph to be acyclic while preserving all precedence constraints among
tasks. Finally, in contrast to the original OpenVX specification that requires a graph to execute end-to-end
before it may be executed again, our modification enabled graph pipelining. To support this, we resolved a
data-object overwriting problem by replicating data objects and we derived an upper bound on the number of
needed object replicas.
Multiprocessor real-time locking protocols for replicated resources.[4] In a real-time system, processors
may not be the only resource tasks share. Many other resources in the system, such as memory objects and
I/O devices, may also be shared among tasks. Algorithms designed for managing such non-processor shared
resources are called real-time locking protocols.
Most real-time locking protocols support only non-replicated resources, i.e., mutual exclusion is assumed
for each lock. A few k-exclusion real-time locking protocols were designed for sharing replicated
resources (Brandenburg and Anderson, 2011; Elliott and Anderson, 2011; Yang et al., 2013). However, all
of them assume that each task may request only one replica at a time. In (Nemitz et al., 2016), we devised
multiprocessor real-time locking protocols for allocating replicated resources where individual tasks may
request multiple replicas. We identified an allocation problem and an assignment problem related to the
design of such a protocol and provided algorithms to solve each problem, respectively. Specifically, we
presented a wait-free algorithm that can be applied to any replica-allocation protocol to solve the assignment
problem. To solve the allocation problem, we presented both overhead-optimized and blocking-optimized
replica-allocation protocols. For the overhead-optimized protocol, we provided a holistic blocking analysis
technique that mitigates some of the pessimism in conventional blocking analysis; for the blocking-optimized
protocol, we employed a cutting-ahead mechanism for which blocking is asymptotically optimal.
[3] Types of data objects include simple data structures such as scalars, arrays, matrices, and images as well as higher-level data objects
common to computer-vision algorithms such as histograms, image pyramids, and lookup tables.
[4] Details of this contribution have been published in the following paper:
Nemitz, C., Yang, K., Yang, M., Ekberg, P., and Anderson, J. (2016). Multiprocessor real-time locking protocols for replicated
resources. In Proceedings of the 28th Euromicro Conference on Real-Time Systems, pages 50–60.
Uniprocessor mixed-criticality scheduling with permitted failure probability.[5] While the WCET abstraction
plays an important role in the analysis of real-time systems, its estimation can be extremely difficult.
As a result, even for a single piece of code, a broad range of estimates can be made for its WCET with
different degrees of confidence. The higher the confidence, the greater the pessimism usually assumed
in analysis, resulting in a larger WCET estimate. For some real-time systems, provisioning such WCETs
by relatively pessimistic estimates may cause the underlying platform to be significantly under-utilized; on
the other hand, provisioning such WCETs by relatively optimistic estimates may not cover the actual worst
case and therefore violate safety requirements. As a solution, mixed-criticality (MC) scheduling has been
proposed (Vestal, 2007).
In much work regarding MC scheduling (see (Burns and Davis, 2018) for a survey), tasks are categorized
into two criticality levels, HI and LO. Each HI-task has two provisioned WCET estimates, a more pessimistic
one and a more optimistic one. In contrast, each LO-task has one provisioned optimistic WCET only. The
goal of MC scheduling is to allow both HI- and LO-tasks to complete in normal cases, while sacrificing
LO-tasks for HI-tasks to complete in extreme cases. Thus, many MC scheduling algorithms drop all LO-jobs
when any single HI-job overruns its optimistic WCET, so that every HI-job thereafter can be guaranteed to
execute for up to its pessimistic WCET before its deadline. However, most such work was based on the
assumption that all HI-tasks may have an active job overrunning its optimistic WCET simultaneously. Such a
scenario may have an extremely low probability and need not be covered in the system design. In (Guo et al.,
2015), we addressed this issue by introducing a new parameter, called failure probability, for each HI-task.
We then designed an EDF-based uniprocessor MC scheduling algorithm, under which the probability of
missing any deadline of a HI-job is no greater than a specified permitted system failure probability for any
system deemed schedulable by our presented schedulability tests.
[5] Details of this contribution have been published in the following paper:
Guo, Z., Santinelli, L., and Yang, K. (2015). EDF schedulability analysis on mixed-criticality systems with permitted failure
probability. In Proceedings of the 21st IEEE International Conference on Embedded and Real-Time Computing Systems and
Applications, pages 187–196.
8.3 Future Work
In this last section, we discuss several promising directions for future work following this dissertation.
Improving the tardiness bound under G-EDF. In Chapter 3, we proved a tardiness bound under preemptive
G-EDF for scheduling a set of sporadic tasks on a uniform multiprocessor. However, neither this
bound nor those derived by Devi (2006) and Erickson (2014) with respect to G-EDF scheduling on identical
multiprocessors is known to be tight. That is, we were not able to construct a system in which observed
maximum tardiness in a simulated schedule matches our analytical tardiness bound. In fact, we believe
none of these bounds is actually tight. Future work could be directed at developing new frameworks and/or
techniques to improve these bounds.
Tardiness bounds under global fixed-priority scheduling for npc-sporadic tasks. In most work regarding
tardiness bounds, only dynamic-priority scheduling algorithms, such as G-EDF, are considered, perhaps
because tardiness can easily grow without bound for global fixed-priority scheduling (Devi, 2006). However,
this is true only if the conventional sporadic task model is assumed. In Chapter 5, we showed that, for the
same task parameters, some task systems that are infeasible under the conventional sporadic task model can
become feasible if the npc-sporadic task model is applied. Thus, we conjecture that tardiness is bounded under any
fixed-priority scheduler for any feasible npc-sporadic task system. This conjecture needs future work to
verify, and such work could be conducted for identical multiprocessors first and then extended to uniform
multiprocessors.
Dynamic asymmetric multiprocessor platforms. As computing hardware continually evolves, it is not
hard to foresee that processing platforms for many future real-time systems might be not only asymmetric
but also dynamically varying during runtime. In fact, dynamic voltage and frequency scaling (DVFS)
technology can allow processor speeds to vary during runtime. Furthermore, many embedded real-time
systems are built with field programmable gate arrays (FPGAs), which are integrated circuits that can
be configured to perform different functionalities. Some modern FPGAs are capable of dynamic partial
reconfiguration, which allows one portion of an FPGA chip to be reconfigured while the rest keeps running.
This may result in a multiprocessor platform where processors can dynamically change their functionalities
during runtime. As for virtualization, dynamically reallocating hardware resources for VPs is commonly
supported by modern hypervisors. Future work could be directed at these scenarios.
Real-time locking protocols for asymmetric multiprocessors. Most existing real-time locking protocols
were designed and analyzed for uniprocessors and symmetric multiprocessors. Such protocols and analyses
are either inapplicable or overly pessimistic on asymmetric platforms. For example, considering that
processors may have different speeds, the length of a critical section can be either scaled by the speeds (e.g.,
regular CPU computation using a shared data object) or not scaled by the speeds (e.g., accessing an external
I/O device). While these two kinds of critical sections are considered the same in most existing real-time
locking protocols and analyses, it intuitively may be better to treat them separately when different speeds are
considered: scaled critical sections should tend to be scheduled on fast processors in order to reduce the time
duration for which they may block other tasks; non-scaled critical sections should tend to be scheduled on
slow processors in order to reserve fast processors for other tasks in need. Using this intuition to develop new
real-time multiprocessor locking protocols and blocking analyses could be another direction for future work.
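The intuition about scaled and non-scaled critical sections can be sketched as follows. This is only an illustration of the stated intuition under a simple assumed model, in which a scaled critical section of length L takes L divided by the processor speed and a non-scaled one always takes L; it is not a locking protocol, and all names and values are hypothetical.

```python
def critical_section_duration(length, speed, scales_with_speed):
    """Wall-clock duration of a critical section on a processor of a given speed."""
    if speed <= 0:
        raise ValueError("processor speed must be positive")
    # Scaled sections (e.g., CPU computation) shrink on faster processors;
    # non-scaled sections (e.g., external I/O access) take the same time everywhere.
    return length / speed if scales_with_speed else length


def preferred_processor(speeds, scales_with_speed):
    """Index of the processor favored by the stated intuition: the fastest one
    for scaled sections, the slowest one for non-scaled sections."""
    indices = range(len(speeds))
    if scales_with_speed:
        return max(indices, key=lambda i: speeds[i])
    return min(indices, key=lambda i: speeds[i])


if __name__ == "__main__":
    speeds = [2.0, 0.5, 1.0]  # hypothetical uniform platform
    for scaled in (True, False):
        durations = [critical_section_duration(4.0, s, scaled) for s in speeds]
        print(scaled, durations, preferred_processor(speeds, scaled))
```

Under this model, moving a scaled critical section from the slowest to the fastest processor shortens the time it may block other tasks, whereas a non-scaled critical section gains nothing from a fast processor and only occupies it.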
BIBLIOGRAPHY
Abeni, L. and Buttazzo, G. (1998). Integrating multimedia applications in hard real-time systems. In
Proceedings of the 19th IEEE Real-Time Systems Symposium, pages 4–13.
Anderson, J., Bud, V., and Devi, U. (2005). An EDF-based scheduling algorithm for multiprocessor soft
real-time systems. In Proceedings of the 17th Euromicro Conference on Real-Time Systems, pages
199–208.
Anderson, J., Erickson, J., Devi, U., and Casses, B. (2016). Optimal semi-partitioned scheduling in soft
real-time systems. Journal of Signal Processing Systems, 84(1):3–23.
Andersson, B., Bletsas, K., and Baruah, S. (2008). Scheduling arbitrary-deadline sporadic task systems on
multiprocessors. In Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 385–394.
Andersson, B. and Tovar, E. (2006). Multiprocessor scheduling with few preemptions. In Proceedings of the
12th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications,
pages 322–334.
ARM (2018). Technologies—big.LITTLE. http://www.arm.com/products/processors/
technologies/biglittleprocessing.php.
Bajaj, R. and Agrawal, D. (2004). Scheduling multiple task graphs in heterogeneous distributed real-time
systems by exploiting schedule holes with bin packing techniques. IEEE Transactions on Parallel and
Distributed Systems, 15(2):107–118.
Baker, T. and Baruah, S. (2009). An analysis of global EDF schedulability for arbitrary-deadline sporadic
task systems. Real-Time Systems, 43(1):3–24.
Baruah, S. (2014). Improved multiprocessor global schedulability analysis of sporadic DAG task systems. In
Proceedings of the 26th Euromicro Conference on Real-Time Systems, pages 97–105.
Baruah, S. (2015a). The federated scheduling of constrained-deadline sporadic DAG task systems. In
Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition, pages
1323–1328.
Baruah, S. (2015b). Federated scheduling of sporadic DAG task systems. In Proceedings of the 29th IEEE
International Parallel and Distributed Processing Symposium, pages 179–186.
Baruah, S., Bonifaci, V., Marchetti-Spaccamela, A., Stougie, L., and Wiese, A. (2012). A generalized parallel
task model for recurrent real-time processes. In Proceedings of the 33rd IEEE Real-Time Systems
Symposium, pages 63–72.
Baruah, S. and Carpenter, J. (2005). Multiprocessor fixed-priority scheduling with restricted interprocessor
migrations. Journal of Embedded Computing, 1(2):169–178.
Baruah, S., Cohen, N., Plaxton, C., and Varvel, D. (1996). Proportionate progress: A notion of fairness in
resource allocation. Algorithmica, 15(6):600–625.
Baruah, S., Funk, S., and Goossens, J. (2003). Robustness results concerning EDF scheduling upon uniform
multiprocessors. IEEE Transactions on Computers, 52(9):1185–1195.
Bastoni, A., Brandenburg, B., and Anderson, J. (2011). Is semi-partitioned scheduling practical? In
Proceedings of the 23rd Euromicro Conference on Real-Time Systems, pages 125–135.
Bhatti, M., Belleudy, C., and Auguin, M. (2012). A semi-partitioned real-time scheduling approach for
periodic task systems on multicore platforms. In Proceedings of the 27th Annual ACM Symposium on
Applied Computing, pages 1594–1601.
Bini, E., Bertogna, M., and Baruah, S. (2009a). Virtual multiprocessor platforms: Specification and use. In
Proceedings of the 30th IEEE Real-Time Systems Symposium, pages 437–446.
Bini, E., Buttazzo, G., and Bertogna, M. (2009b). The multi supply function abstraction for multiprocessors.
In Proceedings of the 15th IEEE International Conference on Embedded and Real-Time Computing
Systems and Applications, pages 294–302.
Bletsas, K. and Andersson, B. (2009). Notional processors: An approach for multiprocessor scheduling. In
Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium, pages
3–12.
Bletsas, K. and Andersson, B. (2011). Preemption-light multiprocessor scheduling of sporadic tasks with
high utilisation bound. Real-Time Systems, 47(4):319–355.
Bonifaci, V., Marchetti-Spaccamela, A., Stiller, S., and Wiese, A. (2013). Feasibility analysis in the sporadic
DAG task model. In Proceedings of the 25th Euromicro Conference on Real-Time Systems, pages
225–233.
Brandenburg, B. and Anderson, J. (2011). Real-time resource-sharing under clustered scheduling: Mutex,
reader-writer, and k-exclusion locks. In Proceedings of the ACM International Conference on Embedded
Software, pages 69–78.
Brandenburg, B. and Gül, M. (2016). Global scheduling not required: Simple, near-optimal multiprocessor
real-time scheduling with semi-partitioned reservations. In Proceedings of the 37th IEEE Real-Time
Systems Symposium, pages 99–110.
Burmyakov, A., Bini, E., and Tovar, E. (2014). Compositional multiprocessor scheduling: the GMPR
interface. Real-Time Systems, 50(3):342–376.
Burns, A. and Davis, R. (2018). Mixed criticality systems—a review. https://www-users.cs.york.ac.uk/
burns/review.pdf.
Burns, A., Davis, R., Wang, P., and Zhang, F. (2012). Partitioned EDF scheduling for multiprocessors using a
C=D scheme. Real-Time Systems, 48(1):3–33.
Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2001). Introduction to Algorithms. McGraw-Hill Higher
Education, 2nd edition.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages
886–893.
Dantzig, G. (1998). Linear Programming and Extensions. Princeton University Press, 11th edition.
Deng, Z. and Liu, J. (1997). Scheduling real-time applications in an open environment. In Proceedings of the
18th IEEE Real-Time Systems Symposium, pages 308–319.
Devi, U. (2006). Soft Real-Time Scheduling on Multiprocessors. PhD thesis, University of North Carolina,
Chapel Hill, NC.
Devi, U. and Anderson, J. (2008). Tardiness bounds under global EDF scheduling on a multiprocessor.
Real-Time Systems, 38(2):133–189.
Dong, Z., Liu, C., Gatherer, A., McFearin, L., Yan, P., and Anderson, J. (2017). Optimal dataflow scheduling
on a heterogeneous multiprocessor with reduced response time bounds. In Proceedings of the 29th
Euromicro Conference on Real-Time Systems, pages 15:1–15:22.
Durrieu, G., Faugère, M., Girbal, S., Pérez, D., Pagetti, C., and Puffitsch, W. (2014). Predictable flight
management system implementation on a multicore processor. In Embedded Real Time Software and
Systems.
Easwaran, A., Anand, M., and Lee, I. (2007). Compositional analysis framework using EDP resource models.
In Proceedings of the 28th IEEE Real-Time Systems Symposium, pages 129–138.
Easwaran, A., Shin, I., and Lee, I. (2009). Optimal virtual cluster-based multiprocessor scheduling. Real-Time
Systems, 43(1):25–59.
Elliott, G. and Anderson, J. (2011). An optimal k-exclusion real-time locking protocol motivated by multi-GPU
systems. In Proceedings of the 19th International Conference on Real-Time and Network Systems, pages
15–24.
Elliott, G., Kim, N., Erickson, J., Liu, C., and Anderson, J. (2014). Minimizing response times of automotive
dataflows on multicore. In Proceedings of the 20th IEEE International Conference on Embedded and
Real-Time Computing Systems and Applications, pages 1–10.
Elliott, G., Yang, K., and Anderson, J. (2015). Supporting real-time computer vision workloads using
OpenVX on multicore+GPU platforms. In Proceedings of the 36th IEEE Real-Time Systems Symposium,
pages 273–284.
Erickson, J. (2014). Managing Tardiness Bounds and Overload in Soft Real-Time Systems. PhD thesis,
University of North Carolina, Chapel Hill, NC.
Erickson, J. and Anderson, J. (2011). Response time bounds for G-EDF without intra-task precedence
constraints. In Proceedings of the 15th International Conference On Principles Of Distributed Systems,
pages 128–142.
Erickson, J., Anderson, J., and Ward, B. (2014). Fair lateness scheduling: Reducing maximum lateness in
G-EDF-like scheduling. Real-Time Systems, 50(1):5–47.
Erickson, J., Devi, U., and Baruah, S. (2010a). Improved tardiness bounds for global EDF. In Proceedings of
the 22nd Euromicro Conference on Real-Time Systems, pages 14–23.
Erickson, J., Guan, N., and Baruah, S. (2010b). Tardiness bounds for global EDF with deadlines different
from periods. In Proceedings of the 14th International Conference On Principles Of Distributed Systems,
pages 286–301.
Fan, M. and Quan, G. (2012). Harmonic semi-partitioned scheduling for fixed-priority real-time tasks on
multi-core platform. In Proceedings of the Conference on Design, Automation and Test in Europe, pages
503–508.
Funk, S. (2004). EDF Scheduling on Heterogeneous Multiprocessors. PhD thesis, University of North
Carolina, Chapel Hill, NC.
Funk, S. and Baruah, S. (2003). Characteristics of EDF schedulability on uniform multiprocessors. In
Proceedings of the 15th Euromicro Conference on Real-Time Systems, pages 211–218.
Funk, S. and Baruah, S. (2005a). Restricting EDF migration on uniform heterogeneous multiprocessors.
Technique et Science Informatiques, 24(8):917–938.
Funk, S. and Baruah, S. (2005b). Task assignment on uniform heterogeneous multiprocessors. In Proceedings
of the 17th Euromicro Conference on Real-Time Systems, pages 219–226.
Funk, S., Goossens, J., and Baruah, S. (2001). On-line scheduling on uniform multiprocessors. In Proceedings
of the 22nd IEEE Real-Time Systems Symposium, pages 183–192.
Grandpierre, T., Lavarenne, C., and Sorel, Y. (1999). Optimized rapid prototyping for real-time embedded
heterogeneous multiprocessors. In Proceedings of the 7th International Workshop on Hardware/Software
Codesign, pages 74–78.
Guan, N., Stigge, M., Yi, W., and Yu, G. (2010). Fixed-priority multiprocessor scheduling with Liu and
Layland’s utilization bound. In Proceedings of the 16th IEEE Real-Time and Embedded Technology and
Applications Symposium, pages 165–174.
Guo, Z., Santinelli, L., and Yang, K. (2015). EDF schedulability analysis on mixed-criticality systems with
permitted failure probability. In Proceedings of the 21st IEEE International Conference on Embedded
and Real-Time Computing Systems and Applications, pages 187–196.
Horvath, E., Lam, S., and Sethi, R. (1977). A level algorithm for preemptive scheduling. Journal of the ACM,
24(1):32–43.
Jiang, X., Guan, N., Long, X., and Yi, W. (2017). Semi-federated scheduling of parallel real-time tasks on
multiprocessors. In Proceedings of the 38th IEEE Real-Time Systems Symposium, pages 80–91.
Jiang, X., Long, X., Guan, N., and Wan, H. (2016). On the decomposition-based global EDF scheduling of
parallel real-time tasks. In Proceedings of the 37th IEEE Real-Time Systems Symposium, pages 237–246.
Kato, S. and Yamasaki, N. (2007). Real-time scheduling with task splitting on multiprocessors. In Proceedings
of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and
Applications, pages 441–450.
Kato, S. and Yamasaki, N. (2008). Portioned EDF-based scheduling on multiprocessors. In Proceedings of
the 8th ACM International Conference on Embedded Software, pages 139–148.
Kato, S. and Yamasaki, N. (2009). Semi-partitioned fixed-priority scheduling on multiprocessors. In
Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium, pages
23–32.
Khronos Group (2014). The OpenVX™ specification. Version 1.0, Revision r28647,
https://www.khronos.org/registry/OpenVX/specs/1.0/OpenVX_Specification_1_0.pdf.
Leoncini, M., Montangero, M., and Valente, P. (2017). A branch-and-bound algorithm to compute a tighter
bound to tardiness for preemptive global EDF scheduler. In Proceedings of the 25th International
Conference on Real-Time Networks and Systems, pages 128–137.
Leontyev, H. and Anderson, J. (2007a). Tardiness bounds for EDF scheduling on multi-speed multicore
platforms. In Proceedings of the 13th IEEE International Conference on Embedded and Real-Time
Computing Systems and Applications, pages 103–111.
Leontyev, H. and Anderson, J. (2007b). Tardiness bounds for FIFO scheduling on multiprocessors. In
Proceedings of the 19th Euromicro Conference on Real-Time Systems, pages 71–80.
Leontyev, H. and Anderson, J. (2009). A hierarchical multiprocessor bandwidth reservation scheme with
timing guarantees. Real-Time Systems, 43(1):60–92.
Leontyev, H. and Anderson, J. (2010). Generalized tardiness bounds for global multiprocessor scheduling.
Real-Time Systems, 44(1):26–71.
Li, J., Agrawal, K., Lu, C., and Gill, C. (2013). Analysis of global EDF for parallel tasks. In Proceedings of
the 25th Euromicro Conference on Real-Time Systems, pages 3–13.
Li, J., Ferry, D., Ahuja, S., Agrawal, K., Gill, C., and Lu, C. (2017). Mixed-criticality federated scheduling
for parallel real-time tasks. Real-Time Systems, 53(5):760–811.
Li, J., Saifullah, A., Agrawal, K., Gill, C., and Lu, C. (2014). Analysis of federated and global scheduling for
parallel real-time tasks. In Proceedings of the 26th Euromicro Conference on Real-Time Systems, pages
85–96.
Lipari, G. and Baruah, S. (2001). A hierarchical extension to the constant bandwidth server framework. In
Proceedings of the 7th IEEE Real-Time Technology and Applications Symposium, pages 26–35.
Lipari, G. and Bini, E. (2003). Resource partitioning among real-time applications. In Proceedings of the
15th Euromicro Conference on Real-Time Systems, pages 151–158.
Lipari, G. and Bini, E. (2010). A framework for hierarchical scheduling on multiprocessors: from application
requirements to run-time allocation. In Proceedings of the 31st IEEE Real-Time Systems Symposium,
pages 249–258.
Liu, C. and Anderson, J. (2010). Supporting soft real-time DAG-based systems on multiprocessors with no
utilization loss. In Proceedings of the 31st IEEE Real-Time Systems Symposium, pages 3–13.
Liu, C. and Layland, J. (1973). Scheduling algorithms for multiprogramming in a hard real-time environment.
Journal of the ACM, 20(1):46–61.
Mercer, C., Savage, S., and Tokuda, H. (1994). Processor capacity reserves: Operating system support for
multimedia applications. In Proceedings of the IEEE International Conference on Multimedia Computing
and Systems, pages 90–99.
Mok, A., Feng, X., and Chen, D. (2001). Resource partition for real-time systems. In Proceedings of the 7th
IEEE Real-Time Technology and Applications Symposium, pages 75–84.
Nemitz, C., Yang, K., Yang, M., Ekberg, P., and Anderson, J. (2016). Multiprocessor real-time locking
protocols for replicated resources. In Proceedings of the 28th Euromicro Conference on Real-Time
Systems, pages 50–60.
Parri, A., Biondi, A., and Marinoni, M. (2015). Response time analysis for G-EDF and G-DM scheduling of
sporadic DAG-tasks with arbitrary deadline. In Proceedings of the 23rd International Conference on
Real-Time Networks and Systems, pages 205–214.
Patterson, J. and Chantem, T. (2016). EDF-hv: An energy-efficient semi-partitioned approach for hard
real-time systems. In Proceedings of the 24th International Conference on Real-Time Networks and
Systems, pages 267–276.
Pinedo, M. (1995). Scheduling: Theory, Algorithms, and Systems. Prentice Hall.
Rainey, E., Villarreal, J., Dedeoglu, G., Pulli, K., Lepley, T., and Brill, F. (2014). Addressing system-level
optimization with OpenVX graphs. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pages 658–663.
Saifullah, A., Agrawal, K., Lu, C., and Gill, C. (2013). Multi-core real-time scheduling for generalized
parallel task models. Real-Time Systems, 49(4):404–435.
Shin, I., Easwaran, A., and Lee, I. (2008). Hierarchical scheduling framework for virtual clustering of
multiprocessors. In Proceedings of the 20th Euromicro Conference on Real-Time Systems, pages
181–190.
Shin, I. and Lee, I. (2003). Periodic resource model for compositional real-time guarantees. In Proceedings
of the 24th IEEE Real-Time Systems Symposium, pages 1–12.
Sousa, P., Souto, P., Tovar, E., and Bletsas, K. (2013). The carousel-EDF scheduling algorithm for
multiprocessor systems. In Proceedings of the 19th IEEE International Conference on Embedded and Real-Time
Computing Systems and Applications, pages 12–21.
Stafford, R. (2006). Random vectors with fixed sum.
http://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum.
Stavrinides, G. and Karatza, H. (2011). Scheduling multiple task graphs in heterogeneous distributed real-time
systems by exploiting schedule holes with bin packing techniques. Simulation Modelling Practice and
Theory, 19(1):540–552.
Tong, G. and Liu, C. (2016). Supporting soft real-time sporadic task systems on heterogeneous
multiprocessors with no utilization loss. IEEE Transactions on Parallel and Distributed Systems, 27(9):2740–2752.
Valente, P. (2016). Using a lag-balance property to tighten tardiness bounds for global EDF. Real-Time
Systems, 52(4):486–561.
Vestal, S. (2007). Preemptive scheduling of multi-criticality systems with varying degrees of execution time
assurance. In Proceedings of the 28th IEEE Real-Time Systems Symposium, pages 239–243.
Xu, M., Phan, L., Sokolsky, O., Xi, S., Lu, C., Gill, C., and Lee, I. (2015). Cache-aware compositional
analysis of real-time multicore virtualization platforms. Real-Time Systems, 51(6):675–723.
Yang, K. and Anderson, J. (2014a). Optimal GEDF-based schedulers that allow intra-task parallelism on
heterogeneous multiprocessors. In Proceedings of the 12th IEEE Symposium on Embedded Systems for
Real-Time Multimedia, pages 30–39.
Yang, K. and Anderson, J. (2014b). Soft real-time semi-partitioned scheduling with restricted migrations
on uniform heterogeneous multiprocessors. In Proceedings of the 22nd International Conference on
Real-Time Networks and Systems, pages 215–224.
Yang, K. and Anderson, J. (2015a). On the soft real-time optimality of global EDF on multiprocessors:
From identical to uniform heterogeneous. In Proceedings of the 21st IEEE International Conference on
Embedded and Real-Time Computing Systems and Applications, pages 1–10.
Yang, K. and Anderson, J. (2015b). An optimal semi-partitioned scheduler for uniform heterogeneous
multiprocessors. In Proceedings of the 27th Euromicro Conference on Real-Time Systems, pages
199–210.
Yang, K. and Anderson, J. (2016a). On the dominance of minimum-parallelism multiprocessor supply. In
Proceedings of the 37th IEEE Real-Time Systems Symposium, pages 215–226.
Yang, K. and Anderson, J. (2016b). Tardiness bounds for global EDF scheduling on a uniform multiprocessor.
In Proceedings of the 7th International Real-Time Scheduling Open Problems Seminar, pages 3–4.
Yang, K. and Anderson, J. (2017). On the soft real-time optimality of global EDF on uniform multiprocessors.
In Proceedings of the 38th IEEE Real-Time Systems Symposium, pages 319–330.
Yang, K., Elliott, G., and Anderson, J. (2015). Analysis for supporting real-time computer vision workloads
using OpenVX on multicore+GPU platforms. In Proceedings of the 23rd International Conference on
Real-Time Networks and Systems, pages 77–86.
Yang, K., Yang, M., and Anderson, J. (2016). Reducing response-time bounds for DAG-based task systems on
heterogeneous multicore platforms. In Proceedings of the 24th International Conference on Real-Time
Networks and Systems, pages 349–358.
Yang, M., Lei, H., Liao, Y., and Rabee, F. (2013). PK-OMLP: An OMLP based k-exclusion real-time
locking protocol for multi-GPU sharing under partitioned scheduling. In Proceedings of the 11th IEEE
International Conference on Dependable, Autonomic and Secure Computing, pages 207–214.
