Effective And Efficient Preemption Placement For Cache Overhead Minimization In Hard Real-Time Systems by Cavicchio, John
Wayne State University 
Wayne State University Dissertations 
January 2019 
Effective And Efficient Preemption Placement For Cache 
Overhead Minimization In Hard Real-Time Systems 
John Cavicchio 
Wayne State University, cavicchio@comcast.net 
Follow this and additional works at: https://digitalcommons.wayne.edu/oa_dissertations 
 Part of the Computer Sciences Commons 
Recommended Citation 
Cavicchio, John, "Effective And Efficient Preemption Placement For Cache Overhead Minimization In Hard 
Real-Time Systems" (2019). Wayne State University Dissertations. 2150. 
https://digitalcommons.wayne.edu/oa_dissertations/2150 
This Open Access Dissertation is brought to you for free and open access by DigitalCommons@WayneState. It has 
been accepted for inclusion in Wayne State University Dissertations by an authorized administrator of 
DigitalCommons@WayneState. 
EFFECTIVE AND EFFICIENT PREEMPTION PLACEMENT FOR
CACHE OVERHEAD MINIMIZATION IN HARD REAL-TIME
SYSTEMS
by
JOHN C. CAVICCHIO
DISSERTATION
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
2019
MAJOR: COMPUTER SCIENCE
Approved By:
Nathan Fisher, Ph.D.
Daniel Grosu, Ph.D.
Loren Schwiebert, Ph.D.
Hongwei Zhang, Ph.D.
Song Jiang, Ph.D.
ACKNOWLEDGEMENTS
First and foremost, Dr. Fisher is everything a professor should be and so much more. Dr. Fisher’s
consummate professional demeanor and enduring patience are just a few of the many fine qualities that
I will try to take with me as I hope to transition from industry to academia over the next several years.
Dr. Fisher is exceptional at providing students with the space to explore and grow technically while
respectfully reeling us back in when needed with insightful feedback that fosters continual learning. He
has been quite generous in sharing his experiences, frequently offering glimpses into the life of a professor
with the right touch of humor that still brings a smile to my face. I have always dreamed of being a
computer science professor, ever since that first high school data processing class with punch cards some
years ago. I find myself nearing graduation with a doctoral degree that I hardly thought possible, largely
in part due to Dr. Fisher’s gentle guidance that helped set me on the path to success. Words cannot express
the impact you have had on my academic achievements and future career.
I also wish to express my profound gratitude to the other members of my dissertation committee, Dr.
Loren Schwiebert, and Dr. Daniel Grosu of the Computer Science Department at Wayne State University,
along with Dr. Hongwei Zhang of the Department of Electrical and Computer Engineering at Iowa State
University and Dr. Song Jiang of the Department of Computer Science and Engineering at the University of
Texas Arlington for their example and continued support during my doctoral journey. I want to also thank
the remaining professors and staff of the Wayne State University Computer Science Department for their
hard work and dedication, and for their many contributions to my Computer Science career preparation.
I would like to especially give a most heartfelt thank you to my extended family for their love, support,
and encouragement; whose sacrifices during this long and arduous endeavor will always be remembered
for their part in this great achievement.
Lastly, the research contained in this dissertation was supported in part by the US National Science
Foundation (CNS Grant Nos. 0953585 and 1618185).
ii
TABLE OF CONTENTS
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
CHAPTER 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Real-time Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Worst Case Execution Time (WCET) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Task Preemption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Cache Related Preemption Delay (CRPD) . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Schedulability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Motivation Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
CHAPTER 2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
CRPD Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Limited Preemption Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
CHAPTER 3 Models and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Real-Time Systems Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Jobs and Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Periodic Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Sporadic Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Control Flow Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Online and Offline Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Cache Memory Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Cache Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Direct Mapped Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Set Associative Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
iii
Fully Associative Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Cache Replacement Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Belady’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Least Recently Used (LRU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Least Recently Used K (LRU-K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Most Recently Used (MRU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Random Replacement (RR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
First-In First-Out (FIFO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Least Frequently Used (LFU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Least Recently/Frequently Used (LRFU) . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Clock/ClockPro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Segmented LRU (SLRU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Low Inter-Reference Recency Set (LIRS) . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Adaptive Replacement Cache (ARC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Clock with Adaptive Replacement (CAR) . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Clock with Adaptive Replacement with Temporal Filtering (CART) . . . . . . . . . . . . 32
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
CHAPTER 4 CRPD Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Interdependent CRPD Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Example LCB Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Example of LCB Interdependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
CHAPTER 5 Minimizing Cache Overhead via Loaded Cache Blocks and Preemption Placement . . 41
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Integrated WCET/CRPD Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Preemption Point Placement Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Preemption Cost Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
iv
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Breakdown Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
CHAPTER 6 Realizing Improved Preemption Placement in Real-Time Program Code with Interde-
pendent Cache Related Preemption Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Informal Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Schedulability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Real-Time Conditional Flow Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Grammar Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Grammar Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Preemption Point Placement Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
High-Level Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Recursive Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Non-Unrolled Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Inline Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Interdependent CRPD Solution Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Preemption Cost Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Breakdown Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
CHAPTER 7 Integrating Preemption Thresholds with Limited Preemption Scheduling . . . . . . . 110
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Schedulability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Limited Preemption FP Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Limited Preemption Threshold Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Limited Preemption Scheduling Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 114
v
Preemption Placement Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Response Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Limited Preemption FP Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Limited Preemption FP Threshold Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 116
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
High-Level Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Preemption Placement and OTA Integration . . . . . . . . . . . . . . . . . . . . . . . . . 117
Integrated PP/OTA Algorithm Proof of Correctness . . . . . . . . . . . . . . . . . . . . . 118
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Task Set Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
System Utilization Schedulability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
CHAPTER 8 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Interdependent CRPD Computation for Set Associative Caches with Preemption Point
Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Real Time Systems Computational Framework for CRPD, Schedulability Analysis, and
Preemption Point Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Real Time Systems Computational Framework Case Study Evaluation . . . . . . . . . . . 126
Non-Inline Function Support for Preemption Placement in Real-Time Program Code . . . 127
Goto Statement Support for Preemption Placement in Real-Time Program Code . . . . . . 129
Interdependent Probabilistic CRPD with Preemption Point Placement . . . . . . . . . . . 130
Real Time Systems Computational Framework . . . . . . . . . . . . . . . . . . . . . . . 130
Multiprocessor Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Multi-Level Cache Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
CHAPTER 9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
CHAPTER 10 List of Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
vi
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Autobiographical Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
vii
LIST OF TABLES
Table 1: Chapter Contribution Summary · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 10
Table 2: System Model Terminology · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 18
Table 3: CRPD Terminology· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 38
Table 4: Schedulability Terminology · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 60
Table 5: Preemption Placement Terminology · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 101
Table 6: Evaluation Terminology · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 108
viii
LIST OF FIGURES
Figure 1: Adaptive Cruise Control (ACC) High Level Block Diagram in ”Efficient Real-Time
Support for Automotive Applications: A Case Study”, by Goud et al., Proc. of the
Embedded Real-Time Computing Systems and Applications, 2006. · · · · · · · · · · · · · 1
Figure 2: Adaptive Cruise Control (ACC) Taskset in ”Efficient Real-Time Support for Au-
tomotive Applications: A Case Study”, by Goud et al., Proc. of the Embedded
Real-Time Computing Systems and Applications, 2006. · · · · · · · · · · · · · · · · · · · · · 2
Figure 3: Fly by Wire Hard-Real-Time Taskset in ”Real-Time Systems”, by Jane W.S. Liu,
2000. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 3
Figure 4: qurt Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 7
Figure 5: ndes Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 8
Figure 6: Real-Time Job Execution. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 14
Figure 7: Conditional Control Flow Graph. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 15
Figure 8: Linear CFG Structure. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 16
Figure 9: Key Cache Memory Design Components in ”The Cache Memory Book: The Aut-
horitative Reference on Cache Design”, Second Edition, by J. Handy, 1998. · · · · · · · 19
Figure 10: Cache Memory Address Decomposition. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 20
Figure 11: Direct Mapped Cache Hit Determination Logic. · · · · · · · · · · · · · · · · · · · · · · · · · · · 20
Figure 12: Set Associative Cache Memory Design Components. · · · · · · · · · · · · · · · · · · · · · · · 21
Figure 13: Set Associative Cache Hit Determination Logic. · · · · · · · · · · · · · · · · · · · · · · · · · · · 22
Figure 14: Fully Associative Cache Memory Address Decomposition. · · · · · · · · · · · · · · · · · · · 22
Figure 15: Fully Associative Cache Hit Determination Logic. · · · · · · · · · · · · · · · · · · · · · · · · · 23
Figure 16: LRU Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 24
Figure 17: LRU-K Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 25
Figure 18: MRU Cache Replacement Algorithm.· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 25
Figure 19: FIFO Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 26
Figure 20: LFU Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 27
Figure 21: Clock Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 28
Figure 22: ClockPro Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 28
ix
Figure 23: Segmented LRU Cache Replacement Algorithm.· · · · · · · · · · · · · · · · · · · · · · · · · · · 29
Figure 24: ARC Cache Replacement Algorithm. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 31
Figure 25: Taskset ECBs and UCBs.· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 38
Figure 26: Taskset AUCBs. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 38
Figure 27: LCB Interdependence · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 39
Figure 28: Linear CFG Algorithm Example. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 46
Figure 29: Algorithm Example. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 46
Figure 30: Combined WCET and CPRD Costs.· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 47
Figure 31: Algorithm Results. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 47
Figure 32: Recursion Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 50
Figure 33: LMS Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 51
Figure 34: ADPCM Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 51
Figure 35: Breakdown Utilization Comparison. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 53
Figure 36: Graph Creation. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 60
Figure 37: Series Composition. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 60
Figure 38: Parallel Composition. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 60
Figure 39: Cyclic Composition. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 61
Figure 40: Motivating Example. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 64
Figure 41: Independent CRPD Costs. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 64
Figure 42: Interdependent CRPD Costs. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 64
Figure 43: Subgraph Solution Interface. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 65
Figure 44: Equivalent Subgraph Solution Interface. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 65
Figure 45: Visible Predecessor Preemption Points. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 66
Figure 46: Visible Successor Preemption Points. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 66
Figure 47: Production Rule P1. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 67
Figure 48: Production Rule P2. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 68
x
Figure 49: Production Rule P3. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 72
Figure 50: Production Rule P4. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 72
Figure 51: Production Rule P5. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 74
Figure 52: Production Rule P6. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 76
Figure 53: Production Rule P7. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 79
Figure 54: Production Rules P1− P4. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 83
Figure 55: Production Rules P5− P7. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 83
Figure 56: Production Rule P8. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 84
Figure 57: Production Rule P9. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 85
Figure 58: Production Rule P10. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 87
Figure 59: Production Rules P8− P10. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 89
Figure 60: Production Rule P11. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 90
Figure 61: Production Rule P12. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 92
Figure 62: Production Rule P13. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 94
Figure 63: Production Rule P14. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 95
Figure 64: Production Rules P11− P14. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 98
Figure 65: FFT Instruction Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 103
Figure 66: Recursion Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 104
Figure 67: LMS Instruction Cache.· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 104
Figure 68: LMS Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 105
Figure 69: Cover Instruction Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 105
Figure 70: ADPCM Data Cache. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 106
Figure 71: Breakdown Utilization Comparison. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 108
Figure 72: FPTS Breakdown Utilization Comparison. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 109
Figure 73: Breakdown Utilization Comparison. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 122
Figure 74: Average Preemption Cost Comparison.· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 122
xi
Figure 75: Real-Time Systems Computational Framework. · · · · · · · · · · · · · · · · · · · · · · · · · · · 125
Figure 76: Function Definition Preemption Placement Problem · · · · · · · · · · · · · · · · · · · · · · · · 127
Figure 77: Compute Function Definition Preemption Problem · · · · · · · · · · · · · · · · · · · · · · · · · 127
Figure 78: Parse Tree Function Invocations · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 128
Figure 79: Function Call Parsing Progress· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 128
Figure 80: Function Call Preemption Solutions Computed · · · · · · · · · · · · · · · · · · · · · · · · · · · · 129
Figure 81: Graph Parsing Completed · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 129
xii
1
CHAPTER 1 INTRODUCTION
Motivation
Real-time Systems
Real-time systems are hardware and software systems where the utility of computational results ex-
ists within well defined time deadlines [43]. For example, in automotive engine control applications, it
is imperative that the controller provide the engine with the proper air/fuel/spark mixture at the proper
instant in time, otherwise engine and emissions performance will degrade significantly. Another example
in automotive adaptive cruise control (ACC) applications where the controller must dynamically adjust
the vehicle’s speed to avoid a potential collision using hard-real-time data generated by several distinct
sources each processed by a distinct hard-real-time software task. Figure 1 illustrates a high-level block
diagram of an adaptive cruise control system [32]. In the ACC system, the wheel sensors provide vehicle
Figure 1: Adaptive Cruise Control (ACC) High Level Block Diagram in ”Efficient Real-Time Support for
Automotive Applications: A Case Study”, by Goud et al., Proc. of the Embedded Real-Time Computing
Systems and Applications, 2006.
speed data while the radar software component detects and tracks other vehicles and/or objects that exist
along the current direction of travel. Once the raw data has been obtained, the control unit must execute
sophisticated software algorithms in a timely fashion each partitioned into disparate real-time software
tasks. The tasks subsequently perform the necessary computations to implement the complex adaptive
cruise control feature in real-time. Figure 2 illustrates the diverse task set and the corresponding data set
used in the complex computations involved in adaptive cruise control systems [32]. The ACC tasks are
shown as circles with the corresponding input and output data for each task. The diagram illustrates the
2
Figure 2: Adaptive Cruise Control (ACC) Taskset in ”Efficient Real-Time Support for Automotive Ap-
plications: A Case Study”, by Goud et al., Proc. of the Embedded Real-Time Computing Systems and
Applications, 2006.
dependencies of each task on the correct and timely output of predecessor tasks. Thus, it is clear that ACC
tasks must meet their respective deadlines in order for the vehicle to promptly and automatically adjust
the vehicle speed to avoid a potential collision. Thus, the real-time systems computation is dependent on
both the generation of correct logical results along with the time when these results are available. Three
prominent categories or classifications of real-time systems are hard-real-time systems, firm-real-time sy-
stems, and soft-real-time systems. Soft-real-time systems are systems where an occasional deadline miss
results in diminished utility or reduced quality of service (QoS). Firm-real-time systems are systems that
can tolerate or are resilient to an occasional deadline miss. Here, each deadline miss results in a useless
or unusable computation thereby adversely impacting system performance. In hard-real-time systems,
however, deadline misses result in catastrophic operational effects, adversely impacting critical system
functions. Thus, these systems are constrained to meet stringent deadlines regardless of computational
load on the system, otherwise the system is considered to have completely failed. Today, real-time com-
puting systems have become ubiquitous building blocks in many engineering applications comprising the
most critical element in achieving required system performance. For example, in airplane flight applica-
tions, the modern fly by wire flight control system is categorized as a hard-real-time system. The system
is comprised of a high-performance microprocessor and associated electro-mechanical actuators, where
the timeliness and accuracy of the frequent in-flight control actions in response to various environmental
and weather-related conditions is paramount for the safety of the pilot, crew, and passengers. Due to the
complexity of real-time application requirements, the requisite functionality is implemented via a set of
tasks, each responsible for performing some suitable subset of system computation. Figure 3 illustrates a
3
Figure 3: Fly by Wire Hard-Real-Time Taskset in ”Real-Time Systems”, by Jane W.S. Liu, 2000.
sample hard-real-time taskset utilized by a fly-by-wire avionics flight control system [21, 44]. Raw data
from each sensor shown on the left is filtered and processed using signal conditioning algorithms running
at the proscribed rate. The conditioned data is used by several motion control algorithms each running at
the required rates for proper motion control. Lastly, the motion control decisions result in the actuation of
the individual ailerons to carry out proper flight control. Like the ACC system example, it can be readily
seen that the output of each real-time task in the fly-by-wire flight control system is critical to correct sy-
stem operation. Therefore, the real-time system designer must assure the correct and timely functionality
of each real-time computing task and the associated electro-mechanical parts for all the operating conditi-
ons of the system. Verifying that all tasks in hard-real-time system tasksets meet their respective deadlines
is no doubt a non-trivial computational task.
Worst Case Execution Time (WCET)
Worst Case Execution Time (WCET) is the maximum execution time of a real-time computing task
to produce correct computational results [59]. One of the notable challenges of computing WCET is
the set of inputs that result in the maximum execution time is not always known [71]. WCET is used
primarily by real-time system designers to assess or characterize the worst-case timing behavior of tasks
in a partitioned task scheduled implementation for various input and output scenarios. The resulting WCET
is then compared to pre-established timing budgets or used as input to real-time schedulability analysis.
Real-time schedulability analysis is the mathematical and functional verification of the scheduling software
component along with the various computing algorithms comprising each distinct task. A real-time system
4
implemented via multiple tasks consists of key software components such as a task scheduler, a system
clock or tick, and the processing algorithms contained in software tasks. One of the characteristics that
real-time systems possess over non-real-time systems is the utility of computations heavily depends on the
schedulability of its processes or tasks. In other words, the ability of each task to meet is specified timing
deadline. A deadline is defined as the maximum time allotted for each task to complete its computation
or processing for a given set of inputs. In a real-time system, tasks are accepted by the scheduler software
component, then scheduled to use the CPU in accordance with their assigned priority as dictated by the
chosen scheduling algorithm, subject to the constraint that its processing must be completed as specified by
the task deadline. The ability of a given task to meet its deadline is highly dependent on the performance
characteristics of the chosen scheduling algorithm in conjunction with the performance characteristics of
the task specific algorithms. The precise modeling and evaluation of real-time system scheduling behavior
is the rigorous mathematical analysis of the scheduling algorithm’s ability to ensure that all real-time
system tasks meet their specified deadlines is the subject of schedulability analysis. A real-time system
meets the definition of schedulable if the scheduling algorithm can guarantee that all tasks can meet their
assigned deadlines, otherwise the system is said to be un-schedulable. The accuracy of the schedulability
assessment of real-time system performance is critically dependent on the accuracy of the various task
WCET values.
Task Preemption
Real-time scheduling algorithms are responsible for selecting tasks to use the CPU at any given mo-
ment from a list of tasks that are ready to execute. Most scheduling algorithms accomplish this via a
process known as task preemption. Task preemption results in the task currently using the CPU to be
momentarily suspended or interrupted by the real-time system scheduler, itself a privileged task. The
scheduler then selects the highest priority task to run, restores its execution context, thereby allowing the
new task to continue executing. This type of task scheduling change in real-time systems is known as a
context switch. A context switch is the process of storing and restoring the execution context of a running
task (typically a thread) so that execution can be resumed from the point at which the task was interrup-
ted some time later. In effect, the preemptive scheduler is a key real-time system component enabling
multiple tasks to share the CPU. The rules governing when task preemption occurs is a function of the
task preemption model and scheduling algorithm. The scheduling algorithms used in scheduling real-time
system tasks are categorized as fully preemptive, limited preemption or non-preemptive. A fully preemp-
tive scheduling algorithm allows tasks to be preempted at any code location during their execution. Tasks
run to completion in a non-preemptive scheduling algorithm before another task may use the CPU. Hence
5
non-preemptive scheduling does not permit tasks to be preempted, introducing undesirable blocking on
higher-priority tasks. In a limited preemption scheduling algorithm, tasks may be preempted only at a
limited number of locations. Limited preemption scheduling is also known as deferred preemption sche-
duling. Preemptions caused by the execution of higher priority tasks are deferred until a later point in
time. The amount of preemption deferment is based on taskset execution characteristics, and the amount
of available execution slack during system operation. The task preemption model along with the chosen
scheduling algorithm utilized in any given real-time system can facilitate or inhibit the ability of tasks to
meet their required deadlines.
Cache Related Preemption Delay (CRPD)
Cache memory is basically high-speed memory inserted between the CPU and the much slower main
memory (RAM) acting much like a buffer. The concept of cache memories relies on two fundamental
characteristics exhibited by most software programs, namely spatial locality and temporal locality. Spatial
locality refers to the property where software programs repeatedly executes the same instructions and/or
manipulates data stored in the same location. Temporal locality is the property where software programs
repetitively access the same memory locations around the same time frame as opposed to being spread
evenly over the execution life of the program. The objective of cache memory designs is to limit the
number of accesses to the slower main memory by storing frequently accessed data in the faster cache
memory, thereby increasing the effective memory speed and thus allowing the CPU to operate at a faster
rate. For real-time systems that employ instruction and/or data caches, task preemption has the effect of
introducing additional delays in task execution resulting from cache related preemption delay. Cache rela-
ted preemption delay (CRPD) occurs when a task is preempted by higher priority tasks, whose subsequent
execution causes memory blocks stored in cache memory to be evicted [39]. When the interrupted task
resumes execution, the evicted cache memory blocks must be reloaded, introducing additional delay to the
overall task execution time. The additional delay resulting from CRPD can significantly impact the ability
of tasks in a real-time system to meet their deadlines thereby degrading system operation or causing total
system failure. Therefore, it is imperative that CRPD effects be accounted for when determining whether
hard-real-time tasksets can reliably meet their deadlines.
Schedulability Analysis
Today, high-performance microprocessors with advanced capabilities are commonplace in real-time
systems. These processors operate at higher clock frequency and often contain multiple cores in a single
processor [45]. Also, many processor manufacturers offer complete embedded computers, which include a
powerful processor, several I/O interfaces, and serial communication interfaces in a single compact printed
6
circuit board (PCB) [46]. These modern embedded computers are ideal for contemporary hard-real-time
system development as they have higher computing capabilities, exhibit higher reliability, and are more
affordable than their previous counterparts. To realize the necessary performance gains, cache memory
designs are employed to enable the CPU to operate at the fastest rate possible. Cache memory, despite its
advantages, introduces schedulability challenges that real-time system designers must address during sy-
stem design to ensure that real-time tasks consistently meet their deadlines during runtime. Schedulability
analysis of the chosen scheduling algorithm and task preemption model in conjunction with the real-time
taskset relies heavily on accurate WCET and CRPD estimates. It is imperative that the WCET and CRPD
estimates be safe, meaning greater than or equal to the actual task performance in order for the schedu-
lability analysis results to be meaningful to real-time system designers. The fundamental problem with
existing CRPD calculation techniques is a significant level of pessimism or inaccuracy. The effect of this
inaccuracy in the context of schedulability analysis is that some tasksets may be declared un-schedulable
when in fact they are indeed schedulable, leading to over provisioning of CPU resources.
Until the last decade, real-time systems were mainly constrained by the available CPU processing
power, where the CPU processing power was the scarcest resource. Earlier, the cost of real-time hardware
systems was substantially high, compared to the combined cost of software design and testing [62]. In
recent times, the average cost per tera flop has been reduced exponentially [38]. Today, modern real-time
system manufacturers face different challenges. For real-time computing applications, the software design,
development, and testing costs now represent the highest portion of product development costs due to
increasing size and complexity of software programs as witnessed in a majority of engineering application
domains. Lack of proper preemption point placement and schedulability analysis frameworks along with
the difficultly involved in streamlining the real-time design process are prominent contributors to the higher
development costs. It is clear that real-time system designers need to focus more attention on the software
design and associated implementation issues. With the ever-increasing complexity exhibited by the real-
time system physical domain along with variable hardware constraints, the number, size, and complexity
of the real-time tasks developed to meet these diverse requirements is rapidly increasing. Thus, designing
a reliable system with efficient resource provisioning where real-time tasks are guaranteed to meet their
deadlines has become a central issue in achieving guaranteed robust real-time system performance. With
the increasing size and complexity of real-time software tasks, accurate WCET and CRPD estimates are
paramount to realizing efficient provisioning of CPU resources.
7
Figure 4: qurt Data Cache.
Motivation Summary
There are many important previous research results on cache related preemption delay (CRPD), sche-
dulability analysis, and limited preemption point placement in real-time system designs (see Chapter 2).
However, from a CRPD perspective, previous work has characterized CRPD as a single valued function
dependent only on task preemption at each program location during execution. Figures 4 and 5 illus-
trate that actual CRPD values are inherently an interdependent function based on the previous and next
preemption point within the context of a limited preemption model. The horizontal axis captures program
locations corresponding to execution in a linear control flow graph. The vertical axis measures the CRPD
at each program location. Two curves are shown, one denoting the minimum CRPD, and the other deno-
ting the maximum CRPD assuming preemption occurs at that program location. What is readily apparent
from the two figures is the CRPD values are program location dependent. To see the potential benefits of
the interdependent CRPD function, the CRPD values computed in current start of the art methods would
correspond to the maximum lines shown in the two figures. It is clear that no state of the art methods have
incorporated this inherent interdependence of CRPD on the actual locations of preemptions in the task
code. The benefits of integrating an interdependent CRPD metric into automated preemption point place-
ment in a limited preemption model are increased accuracy, increased taskset schedulability, and improved
CPU resource provisioning. The resulting accuracy improvements our newly proposed CRPD metric as
integrated into our proposed preemption point placement algorithms have resulted in substantial gains in
real-time taskset schedulability as expected. Therefore, in this thesis, we fill this vacancy by suggesting a
8
Figure 5: ndes Data Cache.
suitable schedulability analysis general framework for limited preemption scheduling.
In the rest of this chapter, we first discuss the objectives as well as the contributions of our work. Then
we will outline the rest of the thesis in the end of this chapter.
Objectives
In this thesis, we introduce a new metric called loaded cache blocks (LCB) which more accurately
characterizes the CRPD a real-time task may be subjected to due to the preemptive execution of higher
priority tasks. Our LCB metric is integrated into our newly developed algorithms that automatically place
preemption points supporting both linear and arbitrary control flow graphs (CFGs). Our proposed preemp-
tion point placement algorithms are shown to achieve optimal runtime performance for a given taskset.
Our schedulability framework offers a reliable and formal process, thus saving time and costs incurred
during the real-time system design cycle.
Thesis
The thesis of this document is:
Cache related preemption delay cost is inherently dependent on task preemption locations
in contrast to existing research. The CRPD interdependence can be leveraged to optimize
preemption point placement supporting limited preemption scheduling in real time systems.
The benefit of this approach is increased schedulability of hard-real-time tasksets by effecti-
vely minimizing the preemption overhead, and can be evaluated and verified empirically over
real-time benchmarks and synthetic test sets.
9
Summary of Contributions
The main contributions of this thesis are listed as follows:
1. We propose a new CRPD metric, called loaded cache blocks (LCB) which accurately characterizes
the CRPD a real-time task may be subjected to due to the preemptive execution of higher priority
tasks (see Chapter 4).
2. We show how to integrate our new LCB metric into our newly developed algorithms that automati-
cally place preemption points supporting linear control flow graphs (CFGs) for limited preemption
scheduling applications (see Chapter 5).
3. We extend the derivation of loaded cache blocks (LCB), that was proposed for linear control flow
graphs (CFGs) to conditional CFGs for structured programs. (see Chapter 6).
4. We demonstrate how to integrate our revised LCB metric into our newly developed algorithms that
automatically place preemption points supporting conditional control flow graphs (CFGs) for limited
preemption scheduling applications. (see Chapter 6).
5. We show how to integrate the revised LCB metric into a revised algorithm that automatically place
preemption points supporting linear and conditional control flow graphs (CFGs) for limited preemp-
tion scheduling applications. (see Chapter 6).
6. We show how to integrate preemption placement with preemption threshold scheduling supporting
linear and conditional control flow graphs (CFGs) for limited preemption scheduling applications.
(see Chapter 7).
Organization
The following table gives the details of each chapter:
10
Table 1: Chapter Contribution Summary
Chapter # Contribution
Chapter 2 Related work
Chapter 3 Models and definitions used in the dissertation
Chapter 4 Loaded cache block calculation for linear and
conditional control flow graphs
Chapter 5 Preemption point placement details for linear control flow graphs
Chapter 6 Preemption point placement details for conditional control flow graphs
Chapter 7 Integrating Preemption Thresholds with Limited Preemption Scheduling
Chapter 8 Future work
Chapter 9 Conclusion of this dissertation
Chapter 10 List of Publications
11
CHAPTER 2 RELATED WORK
In this chapter, we present prior research on cache related preemption delay and limited preemption
point placement techniques. Our proposed linear and conditional PPP algorithms leverage elements from
these two prominent areas of real-time theory. We now briefly summarize the prior work in each of these
areas and describe how it relates to our proposed algorithm. For clarity, the descriptions of the related
works are divided into two separate subsections, namely, CRPD calculation, and limited preemption sche-
duling.
CRPD Calculation
The concept and algorithm for computing the set of useful cache blocks (UCBs) was proposed by Lee
et al. [39] analyzing the preempted task. Similarly, the set of evicting cache blocks (ECBs) was realized
by Tomiyamay and Dutt [69] analyzing the preempting task. More formal definitions of UCBs and ECBs
by Altmeyer and Burguiere [5] were subsequently refined and presented. By convention, the preempting
task’s memory accesses (ECBs) will evict the preempted task’s UCBs thereby resulting in non-negligible
CRPD.
Tighter bounds on the CRPD computation via the intersection of the ECB and UCB sets were achie-
ved by Negi et al. [52] and Tan and Mooney [67]. A cache state reduction technique by Staschulat and
Ernst [66] was employed trading off CRPD accuracy or tightness for a reduction in computational com-
plexity. CRPD analysis using memory access patterns was proposed by Ramaprasad and Mueller [60].
The over-approximation of cache misses in WCET analysis tools by introducing definitely-cached
useful cache block (DC-UCB) as proposed by Altmeyer and Burguiere [5]. DC-UCBs are cache blocks
that must reside in cache memory.
The ECB Union approach and a combined UCB Union and ECB Union approach by Altmeyer et
al. [6] were later introduced and shown to dominate earlier CRPD methods. The UCB Union and ECB
Union multiset approaches were subsequently introduced to reduce the pessimism in CRPD analysis also
proposed by Altmeyer et al. [7].
The above approaches assume that the CRPD cost of a preemption is computed independently to
obtain a safe, conservative bound. Conversely, an interdependent CRPD model using loaded cache blocks
(LCBs) was proposed. Recent work to improve the CRPD accuracy in PPP algorithms by Cavicchio et
al. [25] leading to tangible schedulability gains. Our research leverages the interdependent CRPD model
and extends preemption placement to conditional real-time code structures.
12
Limited Preemption Scheduling
Research in limited preemption scheduling attempts to address well known limitations of the non-
preemptive and fully preemptive scheduling paradigms. Two limited preemption scheduling models are
the deferred preemption scheduling model and the preemption threshold scheduling model.
First, the fixed preemption point model proposed by Burns [20] and the floating-point preemption
model proposed by Baruah [9] are two distinct sub-categories of deferred preemption scheduling. In the
floating preemption point model [9], the currently executing task continues executing for a minimum of
Qi time units or runs to completion if the remaining execution time is less than Qi. The parameter Qi
is computed via task set schedulability analysis [9] representing the maximum amount of blocking time
that task τi may impose on higher priority tasks. Since the start of the non-preemptive execution region
coincides with the arrival of a higher priority task, region locations are nondeterministic or floating. In
contrast, the fixed-point preemption model [20] differs from the floating preemption point model in that
the task code preemptions are confined to pre-defined fixed locations and computed using an offline PPP
algorithm.
Second, preemption threshold scheduling proposed by Wang and Saksena [70] employs a modified
priority-based scheme to determine when tasks may preempted. Each task in the preemption threshold
scheduling [70] approach is assigned two distinct priority values, each for a different purpose. These pri-
orities are known as the nominal static priority pi and a preemption threshold Πi priority. The preempting
task τk may only preempt the currently executing task τi if τk has a nominal priority pk exceeding the
assigned preemption threshold Πi.
Fixed PPP algorithms sought to minimize preemption overhead in fixed priority task sets in early work
by Simonson and Patel [64] and Lee et al. [39]. For tasks executing via preemption triggered floating non-
preemptive regions, Marinho et al. [48] successfully computed an upper-bound on task CRPD. Lastly, work
in fixed priority (FP) preemption threshold schedulability (PTS) analysis by Bril et al [19], supporting task
sets with arbitrary deadlines, successfully assigned optimal priority thresholds by minimizing independent
CRPD costs. This work was later complimented by combining the optimal threshold assignment algorithm
with a simulated annealing algorithm that optimizes task layout [18]. The effectiveness of these heuristic
solutions is limited by the employed cost model.
An optimal PPP algorithm was realized by Bertogna et al. [13, 14] with linear time complexity for
strictly linear CFG structures. A pseudo-polynomial preemption point placement algorithm was realized
[58] supporting well-structured series/parallel conditional CFG structures. The limitations of these soluti-
13
ons stem from utilizing the less accurate independent CRPD cost model as compared to the interdependent
cost model accounting for the dependency between selected preemption points. Recently, a quadratic PPP
algorithm was proposed by Cavicchio et al. [25] using the interdependent CRPD cost model for linear
CFG structures.
In our work, we extend the dynamic programming algorithm proposed by Peng et al. [58] to incorpo-
rate the more accurate interdependent CRPD cost model supporting separate instruction and data direct-
mapped caches. Our conditional PPP algorithm realizes an enhanced minimized safe upper bound preemp-
tion point placement solution resulting in substantial improvements over previous PPP methods commen-
surate with the accuracy improvements using interdependent CRPD costs. Furthermore, the handling of
inline function calls is refined into function definition and function invocation components providing the
basis for future non-inline function support. Our comprehensive simulation approach utilizes real-time
code benchmark examples as a basis for comparison against several current state of the art scheduling
methods. Lastly, our evaluation demonstrates the superiority of the interdependent CRPD model coupled
with the limited preemption scheduling model resulting in a new state of the art.
14
CHAPTER 3 MODELS AND DEFINITIONS
Our work is cross-disciplinary research that leverages the concepts from both computer science and
electrical engineering. Therefore, in this chapter, we will introduce a basic overview on notations, con-
cepts, and models, to make readers more comfortable with the main concepts of this document. Also, we
only provide a very concise yet complete overview on the concepts and theories, details can be found in
relevant textbooks and research papers.
Real-Time Systems Model
In next couple of sub sections, real-time models, notations, and definitions are presented.
Jobs and Tasks
Any real-time system can be considered as a set of concurrent tasks [44]. Each task generates an
infinite number of jobs. The jobs from a same task obey sequential order of execution.
Definition 1 (Job). A real-time job j = (A,E,D) is characterized by three parameters, an arrival time
A, an execution time E, and a deadline D, with the requirement that this job must receive E units of
execution over the interval [A,D).
Figure 6 illustrates the definition of a real time job.
Figure 6: Real-Time Job Execution.
Periodic Task Model
In the periodic task model [10], a task Ti is completely characterized by a 4-tuple (ai, ei, di, pi), where
1. the offset ai denotes the instant at which the first job generated by this task becomes available for
execution.
2. the execution requirement ei specifies an upper limit on the execution requirement of each job gene-
rated by this task.
3. the relative deadline di denotes the temporal separation between each job’s arrival time and deadline.
A job generated by this task arriving at time-instant t has a deadline at time-instant (t+ di).
15
4. the period pi specifies the separation of or the inter-arrival times of successive jobs generated by the
task.
That is, Ti = (ai, ei, di, pi) generates a potentially infinite succession of jobs, each with execution-
requirement ei, at each instant (ai + k ·pi) for all integer k ≥ 0, and the job generated at instant (ai+k ·pi)
has a deadline at instant (ai + k · pi + di) [10].
Sporadic Task Model
The sporadic task model differs from the periodic task model because of its inability to express job
arrival time until the run-time moment. A sporadic task system contains a set of sporadic tasks [10].
Definition 2 (Sporadic Task). A sporadic task τi is completely described by a 3-tuple τi = (ei, di, pi)
where ei is the worst-case execution time, di is the relative deadline, and pi is the minimum inter-
arrival time between successive jobs. By definition, sporadic tasks generate an infinite sequence of jobs.
Successive job arrivals will be separated by a minimum time denoted by pi. A sporadic task system
τ
def
= {τ1, . . . , τn} is a collection of n sporadic tasks.
The utilization indicates the amount of time that system becomes busy due to tasks in a task model.
Formally, the utilization Ui of a periodic or sporadic task τi is defined to be the ratio of its execution
requirement to its period: Ui = eipi . The utilization U(τ) of a periodic or sporadic task system τ is defined
to be the sum of the utilizations of all tasks in τ : U =
∑
Ti∈τ Ui. For any task system under any known
uniprocessor scheduling algorithm, the utilization should not exceed 1 and cannot schedulable otherwise;
however, this does not necessarily say that a system is schedulable if the utilization is under 1.
Control Flow Graphs
Each task τi contains real-time code represented by a series/parallel control flow graph (CFG), denoted
Gi. We introduce the basic block notation δ
j
i where i is the task identifier and j is the basic block identifier.
A dummy basic block δ0i with zero WCET is added at the beginning of each task to capture the preemption
that occurs prior to task execution. In our model, a basic block is a set of one or more instructions that
execute non-preemptively. Basic blocks are essentially the vertices Vi of a conditional control flow graph
(CFG) connected by directed edgesEi representing the execution sequence of one or more job instructions.
Figure 7 shown below illustrates the conditional basic block connection structure.
 
 𝑖
 𝑠
  𝑖
𝑒 
 𝑖
 0
  𝑖
𝑁𝑖 G𝑖  
 𝑖
 𝑥
  𝑖
 𝑦
 
  e𝑖
𝑥,𝑦
 
V𝑖 
E𝑖 
Figure 7: Conditional Control Flow Graph.
16
Linear CFGs represents a special case or subset of the problem for conditional CFGs as shown in Figure 8. 
 𝑖
 𝑠
  𝑖
𝑒 
 𝑖
 0
  𝑖
𝑁𝑖 G𝑖  
 𝑖
 𝑥
  𝑖
 𝑦
 
  e𝑖
𝑥,𝑦
 
V𝑖 
E𝑖 
Figure 8: Linear CFG Structure.
The CFG for task τi denoted Gi is a tuple (Vi,Ei,δsi ,δ
e
i ) representing the real-time task code structure.
Flow graphs are directed graphs describing the task execution sequence from a starting basic block δsi to
an ending basic block δei through one or more execution paths p ∈ Pi(Gi, δsi , δei ), where Pi(·) denotes all
possible execution paths starting with δsi and ending at δ
e
i in Gi. An execution path p through Gi is an
ordered sequence of basic blocks from some starting instruction δsi to an ending instruction δ
e
i . A directed
edge ex,yi = (δ
x
i , δ
y
i ) where e
x,y
i ∈ Ei connecting basic blocks δxi and δ
y
i represents the execution of basic
block δxi immediately preceding the execution of basic block δ
y
i . We introduce an operator p describing
a more general execution precedence relation, where δxi p δ
y
i represents the execution of basic block
δxi preceding the execution of basic block δ
y
i by zero or more instructions along some path p. Similarly,
operator ≺p represents the execution of basic block δxi preceding the execution of basic block δ
y
i by one
or more instructions along some path p. We further introduce notation describing a subgraph in task τi
as Gai with the tuple (V
a
i ,E
a
i ,δ
sa
i ,δ
ea
i ) where a is the subgraph identifier, δ
sa
i is the starting basic block,
and δe
a
i is the ending basic block. Similar notation used to describe the paths in subgraph a is given as
P ai (G
a
i , δ
sa
i , δ
ea
i ). Preemptions are permitted at the edges e
x,y
i between basic blocks. We introduce the
non-preemptive basic block execution time notation bji where i is the task identifier and j is the basic
block identifier, hence using this convention we have:
CNPi = max
p∈Pi(Gi,δsi ,δei )
[Σ
δji∈p
bji ]. (1)
Tasks may be preempted by multiple higher-priority tasks as determined by the employed scheduling
algorithm. In this paper, we support fixed priority (FP), Deadline-Monotonic (DM) and Earliest-Deadline-
First (EDF) scheduling. The processor utilization Ui of task τi is given by:
Ui = C
NP
i /Ti. (2)
In a limited preemption approach, each task is permitted to execute non-preemptively for a maximum
amount of time denoted by Qi. Previous research on limited preemption scheduling by Baruah et al. [9]
has used the above information to determine the value of Qi for each task. The determination of Qi is
dependent on the placement of preemption points. Preemptions occur after the last basic block instruction
17
and before the first instruction of the next basic block. The details supporting the computation of Qi are
presented in the schedulability analysis section of Chapter 4.
Online and Offline Algorithms
In offline scheduling algorithms, all scheduling decisions are made before the system begins executing.
These scheduling algorithms select jobs to execute by referencing a table describing the pre-determined
schedule. Usually, offline schedules are repeated after a least common multiple (LCM) period. The offline
schedulers require the full knowledge of the job before the they execute.
In online scheduling algorithms, scheduling decisions are made without specific knowledge of jobs
that have not yet arrived. These scheduling algorithms select jobs to execute by examining properties of
active jobs. Online algorithms can be more flexible than offline algorithms since they can schedule jobs
whose behavior cannot be predicted ahead of time.
Scheduling Algorithms
Real-time jobs are selected for execution by the scheduler component. The scheduler uses a scheduling
algorithm to perform job selection. Typical examples of scheduling algorithms are earliest deadline first
(EDF), least laxity first (LLF), rate monotonic (RM), deadline monotonic (DM), and fixed priority (FP)
to name a few [22]. At each time instance t scheduling algorithms select an active job j whose priority
is the highest. In essence, scheduling algorithms tend to differ in the manner and/or task metric used
to determine the highest priority job. The EDF algorithm selects the job with the earliest deadline [10].
Similarly, the LLF algorithm selects the job with the smallest laxity as the highest priority job. One very
well-known fixed priority scheduling algorithm is the RM algorithm. The RM algorithm, selects jobs with
shorter periods as the highest priority job [44]. Finally, the DM algorithm selects the job with the smallest
or shorter deadline as the highest priority job [44].
Depending on the scheduling requirements, a hard-real-time system might employ online or off-line
scheduling algorithms. For example, if a hard-real-time system has a well-defined taskset whose charac-
teristics are known a priori, and there are no runtime task modifications expected, then off-line scheduling
algorithms are often deemed suitable. The off-line scheduling algorithms are more suitable for systems
with sufficient resources to handle the known taskset. However, an off-line scheduling algorithm is gene-
rally unsuitable to manage continuously varying and somewhat dynamic task/job characteristics that are
unknown ahead of time. Online scheduling techniques which account for dynamic task characteristics are
often employed in these instances [44]. It is customary that online scheduling algorithms used in dynamic
environments may use heuristic methods optimized for the specific taskset.
The primary intention of this section is to provide a review of previous work that summarizes the
18
real-time system model used in our work. As such, research on scheduling algorithms and associated
techniques are in scope, as our algorithms must be compatible with the underlying scheduling algorithm.
For convenience, Table 2 summarizes the terminology presented in this section.
Term Description
bji Basic block j WCET in task τi CFG
CNPi Non-preemptive task τi execution time
Di Relative deadline of task τi
δji Basic block j in task τi CFG
δ0i Dummy basic block with zero WCET added to each task CFG
δe
a
i Ending basic block in task τi subgraph a CFG
δs
a
i Starting basic block in task τi subgraph a CFG
Ei The set of directed edges in task τi CFG
Eai The set of directed edges in task τi subgraph
ex,yi A directed edge connecting basic blocks δ
x
i and δ
y
i
Gi Task τi control flow graph (CFG)
Gai Task τi subgraph a CFG
i Index variable denoting the preempted task
Ni + 1 Number of basic blocks in task τi CFG
Pi Unique set of paths through task τi CFG
P ai Unique set of paths through task τi subgraph a CFG
P ai P
a
i (G
a
i , δ
sa
i , δ
ea
i ) using graph boundaries δ
sa
i and δ
ea
i
p Current path through task τi CFG
pi Fixed priority of task τi
Qi Maximum non-preemptive execution region for task τi
Ti Task τi inter-arrival time or period
τ Taskset containing n tasks τ1, τ2, . . . τn
τi Task τi in taskset τ
Ui The processor utilization of task τi
Vi The set of vertices in task τi CFG
V ai The set of vertices in task τi subgraph a CFG
Table 2: System Model Terminology
In the next subsections, we give details on the system hardware cache models employed on uniproces-
sor systems.
Cache Memory Models
Cache Memory
Cache memory is basically a high-speed memory array which essentially buffers the slower main
memory from the CPU. The goal of cache memory is to minimize the number of occurrences where
instructions and data must be fetched from main memory. With the ever-increasing gap between CPU
processing speed and main memory access speed, real time system designers are attempting to maximize
CPU throughput via cache memory. Cache memory designs consist of cache data memory, cache directory,
19
cache controller, address buffers, and data buffers. Figure 9 illustrates the high-level block diagram of key
cache memory design components [33].
Figure 9: Key Cache Memory Design Components in ”The Cache Memory Book: The Authoritative
Reference on Cache Design”, Second Edition, by J. Handy, 1998.
The cache data memory is the high-speed memory array where replicas of recently used main memory
data are stored. The cache directory acts as storage for the main memory addresses whose data is currently
stored in the cache data memory. The cache controller contains hardware logic responsible for controlling
the cache memory operation. The operational characteristics of hardware logic are generally referred to as
cache policies. The cache directory is examined by the cache controller when main memory is accessed
to determine whether the data is already held in the cache data memory. The cache directory stores the
address tag bits along with a valid bit indicating if that cache memory location has valid data. Alternative
terms used for cache directory are cache-tag and content addressable memory (CAM). In the electronics
domain, the data buffer is a hardware device used to store data temporarily while it is being moved from
main memory to the cache data memory. The address buffer stores the most recently memory address
accessed and is used by the cache controller. The data and address buffers also serve to amplify the current
in order to facilitate the fan out needed by the cache memory design.
Direct Mapped Caches
Initially, we consider a single processor system with direct mapped instruction and data caches. Direct
mapped cache memory possesses the property where only one cache memory location exists for each main
memory data entry. Hence there is no need for a cache replacement policy as only a single unique cache
memory location is replaced with each subsequent main memory access. Let a be the number of address
20
Figure 10: Cache Memory Address Decomposition.
Figure 11: Direct Mapped Cache Hit Determination Logic.
bits used in a CPU that employs a direct mapped cache memory design. Further, let b the number of
bits representing the cache block size and let k be the number of bits representing the number of cache
lines in the cache data memory. The number of bits used for the cache directory or cache-tag is given by
t = a− k − b. Figure 10 illustrates a typical address decomposition for direct mapped caches [33].
Direct mapped caches have a distinct advantage of simplicity, as the tag bits only have to be checked
for a single slot to determine a match or hit. One of the primary disadvantages of direct mapped caches
is that the scheme suffers from many addresses, 2t−k specifically, that end up colliding to the same cache
data memory slot. This causes each cache line to be potentially frequently evicted, thereby imposing a
cache reload delay or penalty on a running task. Figure 11 illustrates the cache hit determination logic for
direct mapped caches [33].
The k bits comprising the slot field of the address are used to index or select the cache data memory
and cache tag memory location. On a CPU memory access request, the t bits comprising the tag field of
21
Figure 12: Set Associative Cache Memory Design Components.
the address are compared to the tag bits stored in the tag memory. If they are equal, then a cache hit occurs,
and the data is buffered for use by the CPU.
Set Associative Caches
To overcome the frequent eviction of cache lines, cache designers may employ the use of multiple
sets for each cache data memory location. Let the parameter n denote the number of cache sets. Cache
memory designs that have multiple sets for each cache data memory location are known as n-way set
associative caches. Figure 12 illustrates a high-level block diagram of key set associative cache memory
design components [33].
Due to having multiple sets for each cache data memory location, a mechanism to determine which
specific set to replace is required. This mechanism is known as a replacement policy or replacement
algorithm. Figure 13 illustrates the cache hit determination logic for set associative caches [33].
Similar to direct mapped caches, the k bits comprising the slot field of the address are used to index
or select the cache data memory and cache tag memory location. On a CPU memory access request, the
t bits comprising the tag field of the address are compared to the tag bits stored in the tag memory. The
primary difference for set associative caches is the tag memory has to be examined simultaneously for all
n sets. If one set contains a match, then a cache hit occurs, and the data is buffered for use by the CPU.
Fully Associative Caches
A fully associative cache implements a cache policy allowing any main memory block to be mapped
to any cache line. The hardware for determining whether the desired memory block is in cache memory
requires comparing the memory block address tag bits to the stored tag bits of every cache block (line,slot)
22
Figure 13: Set Associative Cache Hit Determination Logic.
Figure 14: Fully Associative Cache Memory Address Decomposition.
in parallel. A fully associative cache must use content addressable memory to translate processor addresses
into cache data RAM addresses.
Let a be the number of address bits used in a CPU that employs a fully associative cache memory
design. Further, let b the number of bits representing the cache block size and let k be the number of bits
representing the number of cache lines in the cache data memory. The number of bits used for the cache
directory or cache-tag is given by t = a − b. Since the memory block can occupy any location in cache
data memory, there is no need for including the k cache slot bits. Figure 14 shown below illustrates a
typical address decomposition for fully associative caches [33].
Due to having all cache blocks available for each cache data memory access, a mechanism to determine
which specific set to replace is required. This mechanism is known as a replacement policy or replacement
algorithm. Figure 15 illustrates the cache hit determination logic for fully associative caches [33]. When
a cache miss occurs, the cache controller must select a cache block that is empty or not valid to place
23
Figure 15: Fully Associative Cache Hit Determination Logic.
the main memory data currently being accessed. If no empty cache blocks exist, the cache controller must
choose a cache block to evict based on the cache eviction policy design. If the evicted cache block contains
dirty data (i.e. the data is different from main memory contents), then the data in the cache block must be
copied back to main memory before the new data may be loaded. One of the primary advantages of fully
associative caches is their excellent temporal locality properties. In this model, the flexible memory block
placement allows the design and implementation of replacement policies to minimize the replacement of
cached memory blocks that will be referenced again. One of the disadvantages is its high cost for large
cache memory designs due to the need for a tag comparator for each cache block.
In the following subsections, we give details on the various cache replacement policies used on uni-
processor systems. Since our future work will likely implement the CRPD metric and limited preemption
point placement for various cache replacement policies or algorithms, the supporting summaries of popular
algorithms are provided for reference.
Cache Replacement Policies
Cache replacement policies [65], also known as cache replacement algorithms, are responsible for the
selection of the cache line to place the newly referenced data in when a read miss occurs. If empty cache
lines are available, then they are prioritized over occupied or valid cache lines. If there are no empty cache
lines, then an occupied cache line must be chosen for removal. The rules or algorithms that choose an
occupied cache line for removal are designed to select cache lines that are least likely to be reused in the
future. In other words, cache lines that are most likely to be reused should be avoided where possible.
24
Lastly, before the new data may be loaded, the data in the selected cache line must be written back to main
memory if it has been modified.
Belady’s Algorithm
Belady’s cache replacement algorithm [11,49], also known as the optimal algorithm or the clairvoyant
algorithm, always selects the cache block whose information is no longer needed or will not be needed
for the longest period of time. In order to accomplish this, the system must examine all future memory
references. For obvious reasons, it isn’t practical to determine how far in the future cached data will be
required using online replacement algorithms. As such, this replacement algorithm is strictly used as a
comparator to determine the relative effectiveness as compared to more practical approaches.
Least Recently Used (LRU)
The least recently used (LRU) replacement algorithm selects the cache block that has the largest age.
In other words, the cache block that was used least recently. Figure 16 illustrates the operation of the algo-
rithm. It is often convenient to visualize LRU algorithm operation by using a FIFO list data structure. The
Figure 16: LRU Cache Replacement Algorithm.
LRU algorithm requires aging information to be maintained in order to determine which block is the least
recently used. For set associative and fully associative caches, aging information is maintained for each
cache set. For example, an 8-way set associative cache has 8! or 40320 different block orderings requiring
sixteen bits to represent aging information. For fully associative caches, its customary to represent the
aging information by using counters or lists. When a cache hit occurs, the ages of all other cache blocks
are incremented. Memory access workloads where LRU performs well have two important characteristics:
(1) a preponderance of memory references with smaller reuse distances [29]; and (2) the vast majority of
these memory references have reuse distances that are smaller than the number of available cache memory
blocks. Workloads with these two characteristics exhibit strong locality properties, resulting in a higher
number of cache hits. On the other hand, LRU is known to perform poorly with workloads exhibiting
weak locality properties. The primary reason for this is that LRU inherently operates under the assump-
tion that cache blocks that haven’t been accessed for the longest time are expected to be next accessed in
the longest time frame. Common applications of the LRU algorithm include virtual memory management,
25
file buffering, and database buffering. Cost wise, LRU is infeasible and unaffordable, as it must maintain
the LRU ordering for each memory access. In practice, more practical cost-effective implementations of
LRU are used whose algorithms are discussed in subsequent sections.
Least Recently Used K (LRU-K)
The least recently used K (LRU-K) replacement algorithm [53, 54], a more practical implementation
of the LRU algorithm, measures the time intervals using counters tracking successive memory references
whose primary goal is to retain cache blocks with smaller reference or inter-arrival counts. Figure 17
illustrates the operation of the algorithm [53, 54]. The inter-arrival counts are known as the backward
Figure 17: LRU-K Cache Replacement Algorithm.
K-distance metric, denoted bt(p,K), where bt(p,K) is the backward distance to the Kth most recent
reference to cache block p, otherwise, if there have been less than K references, bt(p,K) is assigned the
value of ∞. LRU-K selects the cache block whose backward K-distance metric is the maximum value
from the last K memory references. In other words, cache blocks selected for replacement are those with
the largest K reference length. In cases where multiple cache blocks have the same maximum backward
K-distance value, a secondary selection algorithm must be employed to break ties. In practice, the LRU-K
replacement algorithm is widely used in database buffering applications.
Most Recently Used (MRU)
The most recently used (MRU) replacement algorithm selects the cache block that was used most
recently. The MRU algorithm has been found useful for applications using repeated sequential scanning
of file data. Figure 18 illustrates the operation of the algorithm [28]. It is most suitable for memory access
Figure 18: MRU Cache Replacement Algorithm.
26
patterns where the older data items are more likely to be accessed [28]. MRU is not widely used as several
new cache replacement algorithms have been recently developed that have successfully addressed cache
performance during sequential scans of file data.
Random Replacement (RR)
The random replacement (RR) algorithm randomly selects a cache blocks discarding the contents to
make room as needed. The RR algorithm requires no bookkeeping information about past memory access
history. The primary advantage of the random replacement approach is its simplicity. This replacement
policy has been employed on ARM Cortex-R series processors.
First-In First-Out (FIFO)
The first-in first-out (FIFO) algorithm, known as one of the simplest cache replacement algorithms,
stores cached memory blocks in a queue data structure. The most recently accessed memory blocks are
inserted at the back end and the least recently used or oldest memory blocks are located at the front
end of the queue. FIFO selects the oldest memory block at the front end for replacement. Figure 19
illustrates the operation of the algorithm [63]. Advantages of the FIFO approach is its simplicity and
Figure 19: FIFO Cache Replacement Algorithm.
associated low overhead. By itself, FIFO performs poorly in practical applications, hence it is rarely
used unmodified. Historically, the FIFO algorithm has been used in the VAX/VMS operating system,
with modifications [63]. The Second Chance (SC) algorithm [68], is widely known as one of the more
commonly used FIFO algorithm variants.
Least Frequently Used (LFU)
The least frequently used (LFU) replacement algorithm selects the cache block that has been referen-
ced the least amount. The LFU algorithm maintains usage counters that increment every time a cache
hit occurs for a particular cache block. Figure 20 illustrates the operation of the algorithm [4, 26]. A
frequency-based memory access workload characterization known as the Independent Reference Model
(IRM) has been extensively studied. Under IRM, memory access patterns occur in a randomly independent
manner according to some fixed probability distribution over all blocks in main memory. In essence, the
IRM model favors frequency-based memory access patterns or workloads. In accordance with the IRM
27
Figure 20: LFU Cache Replacement Algorithm.
model, the LFU cache replacement algorithm has been shown to be optimal [4, 26]. On the other hand,
disadvantages of LFU include logarithmic complexity, ignoring recent memory accesses, and performing
poorly as memory access patterns change, as its frequency bias can often result in stale cache contents that
may not be referenced again.
Least Recently/Frequently Used (LRFU)
The Least Recently/Frequently Used (LRFU) replacement algorithm [40, 41], utilizes a weighted re-
cency and frequency (CRF) metric that effectively combines and subsumes the LRU and LFU replacement
algorithms. A self-adaptive tuning parameter λ acts as the weighting factor. When λ approaches a value
of 1, the CRF value will favor access recency causing LRFU to behave like LRU. When λ approaches a
value of 0, the CRF value will favor access frequency causing LRFU to behave like LFU. One of the major
disadvantages of the LRFU approach is its high computational overhead.
Clock/ClockPro
The Clock replacement algorithm [2, 27], was designed to approximate the LRU algorithm. Advanta-
ges include implementation simplicity, minimal hardware support required, and it outperforms the FIFO
algorithm. Disadvantages include a relatively weak estimation of LRU algorithm, poor performance for
memory access patterns with weak locality properties, and a failure to account for frequency of access.
Clock is appropriately named after the means by which pages are selected for replacement. Figure 21
illustrates the operation of the algorithm [2, 27]. Each cache block maintains a bit known as the clock bit
representing recent usage. Each cache set requires a pointer called the clock hand responsible for tracking
the next cache block to examine. As each cache block is examined, its usage bit is extracted and inserted
into the MSB of a k-bit shift register as the previous register contents are shifted right by a single bit
position. If the usage bit is set, the cache block remains cached, otherwise, it is evicted. The usage bit is
subsequently cleared with the clock hand pointing to the current circular list entry is advanced. A revised
version of the algorithm, ClockPro [35], extended the basic Clock algorithm to consider both access re-
cency and access frequency. Figure 22 illustrates the operation of the algorithm [35]. Access frequency
28
Figure 21: Clock Cache Replacement Algorithm.
Figure 22: ClockPro Cache Replacement Algorithm.
information is added where each cache block is categorized as a hot block (frequently accessed) or a cold
block (not frequently accessed). This is implemented by three clock hands called hothand, coldhand, and
testhand. The coldhand pointer points to the least recently used cold cache block and is used to identify
candidate cold cache blocks to replace. The hothand pointer points to the hot cache block least recently
used and is used to modify its hot status to cold if not recently accessed. The testhand pointer is used to
test cold cache blocks as a means to identify resident cold blocks that are replacement candidates. Each
cache block maintains three status bits, namely, clock bit, test, and hot. The hot bit categorizes the block
as either hot or cold. The test bit is used for cold cache blocks to indicate its usage is being evalua-
ted or tested for replacement during the test period. If the cold cache block is not accessed during the
test period once terminated by the hothand, then it is evicted from cache memory. Advantages include
consideration of both frequency and recency data, implementation simplicity compared to LRU, minimal
additional hardware support in the form of three bits per cache block, three address pointers for each cache
set, and tangible performance improvements over the Clock algorithm. Disadvantages include increased
29
implementation complexity compared to Clock. Clock and ClockPro are primarily used in virtual memory
(VM) applications.
Segmented LRU (SLRU)
The segmented least recently used (SLRU) replacement algorithm uses an approach whereby the cache
is divided into two distinct segments. These segments are called the probationary segment and the pro-
tected segment. The cache contents of each segment are stored in order of most recently accessed to least
recently accessed. Data from cache hits are moved to the protected segment and placed in the cache block
or slot corresponding to the most recently accessed information. The data movement to the protected seg-
ment may displace a cache block already stored which is moved to the most recently used location of the
probationary segment. If the probationary segment is full, data must be discarded from the cache, where
cache blocks stored in the least recently used location are chosen for replacement. Figure 23 illustrates
the operation of the algorithm. Likewise, data from cache misses are moved to the probationary segment
Figure 23: Segmented LRU Cache Replacement Algorithm.
and placed in the cache block or slot corresponding to the most recently accessed information. One can
readily see that cache blocks stored in the protected cache segment have been accessed a minimum of two
times. The sizes of the probationary and the protected segments is a key SLRU algorithm parameter set in
accordance with expected memory access patterns.
Low Inter-Reference Recency Set (LIRS)
The Low Inter-Reference Recency Set (LIRS) replacement algorithm [36], like the SLRU approach, di-
vides the cache into two distinct partitions or sets. These sets are called the High Inter-Reference Recency
(HIR) block set and Low Inter-Reference Recency (LIR) block set. The LIR set comprises the majority
of the cache (size LLIRS) while the HIR set comprises a minority (size LHIRS). LIRS records history
30
information on each block called the recent Inter-Reference Recency (IRR) metric. In the LIRS approach,
the IRR metric measures the number of other memory blocks that are accessed between two consecutive
references to a specific cache block. In cache literature, the term reuse distance [29] is frequently used.
LIRS operates under the assumption that cache blocks with large IRR values are likely to continue to be
large on the next IRR metric update. In accordance with this assumption, cache blocks with large IRR
values are selected for replacement. The reasoning behind this assumption is cache blocks with large IRR
values are more likely to be later evicted before being re-referenced again. In other words, LIRS attempts
to keep cache blocks in the LIR set in the cache and selects blocks from the HIR set for replacement.
Cache blocks move between the LIR and HIR sets when the updated IRR value of a cached HIR block is
smaller than the maximum recency values of all LIR cache blocks. In this scenario, the LIR/HIR statuses
of the HIR block and the LIR block with the maximum recency value are then switched. In practice, a
maximum recency threshold RMAX is used to trigger LIR/HIR status switching. When an HIR cache
block has an updated IRR value smaller than the threshold, the associated cache block re-categorization
occurs. In essence, the threshold controls the relative difficulty or ease that an HIR cache block becomes
an LIR cache block.
Adaptive Replacement Cache (ARC)
The Adaptive Replacement Cache (ARC) replacement [50], is an adaptive online algorithm that ana-
lyzes recent memory access patterns to dynamically prioritize between recency favorable workloads and
frequency favorable workloads. Like the SLRU and LIRS approaches, ARC divides the cache into two
distinct partitions or sets, denoted as L1 and L2. Set L1 contains cache blocks that have been accessed
once, whereas set L2 contains cache blocks that have been accessed at least twice. L1 emphasizes access
recency whereas L2 emphasizes access frequency. Figure 24 illustrates the operation of the ARC algo-
rithm. The ARC sets L1 and L2 can be thought of as a cache directory with twice the number of cache
blocks as the cache memory contains. If we let c denote the number of memory blocks that are resident
in the cache memory, then the size of the cache directory sets L1 and L2 is 2c. ARC has been shown to
be what is known as scan-resistant, where ”one access only memory” requests remain in the L1 partition
of the ARC cache without evicting cache blocks with higher temporal locality stored in the L2 partition.
L1 and L2 are further subdivided into top and bottom subsets denoted as T1, B1, T2, and B2. Memory
blocks resident in the cache are stored in subsets T1 and T2 whose aggregate size is always c. The size
of set T2, denoted by p, is the set with higher locality memory blocks. The parameter p is known as the
ARC algorithms adaptive parameter. The size of set T1 is therefore c−p. Sets B1 and B2 contain memory
blocks not stored in the cache whose memory accesses conform to membership in sets L1 and L2 respecti-
31
Figure 24: ARC Cache Replacement Algorithm.
vely. The online nature of ARC results in continuous updates to the p parameter according to memory
access patterns. At a high level, pseudo cache hits in the B1 subset cause p to increase, whereas pseudo
cache hits in the B2 subset cause p to decrease. The size of the increase or decrease, denoted by learning
rate variables δ1 and δ2, is a function of the relative sizes of subsets B1 and B2. Similar in some respects
to the SLRU approach, ARC differs from SLRU in its dynamic adjustment of the sizes of the protected
segment (L2) and the probationary segment (L1). ARC attempts to predict the best possible partition of
cache memory by analyzing recently evicted cache blocks. In fiduciary terms, ARC invests in the cache
directory subset exhibiting the best performance in the recent past. Basically, ARC attempts to strike a
balance between the LRU and LFU approaches.
Clock with Adaptive Replacement (CAR)
The Clock with Adaptive Replacement (CAR) replacement algorithm [8], combines the adaptive re-
placement (ARC) and the Clock replacement algorithms. Like ARC, CAR maintains two ordered sets:
L1 stores recently accessed memory blocks and L2 stores frequently accessed memory blocks. L1 and
L2 are cache directories partitioned into subsets of resident cache blocks (T1 and T2) and non-resident
cache blocks (B1 and B2). The sizes of the resident cache block sets T1 and T2 are dynamically adjusted
in accordance with memory access patterns as per the parameter p. Further, as in ARC, the parameter
p is updated as pseudo cache hits occur to blocks in the non-resident cache block sets B1 and B2. The
difference with the CAR approach is the subsets T1 and T2 are managed using the Clock algorithm. The
benefit of using the Clock algorithm to manage T1 and T2 is a simpler hardware implementation. As the
32
sizes of T1 and T2 vary, the new entries are added to the list tail.
Clock with Adaptive Replacement with Temporal Filtering (CART)
The Clock with Adaptive Replacement with Temporal filtering (CART) replacement algorithm [8], was
designed to prevent correlated memory references from polluting the T2 cache subset. Correlated memory
references [61] are basically successive short-term cache hits that may not correspond to the long-term
utility that T2 embodies. The result of such cache pollution is a reduction in cache hit performance. This
phenomenon results from the fact that two successive memory references form the criteria by which ARC
and CAR use to characterize potential long-term utility of cache blocks. The improvement offered by
CART to address this limitation is the creation of what is termed a temporal filter whose objective is to
incorporate a temporal characteristic into the long-term utility test. The mechanism behind this temporary
filter is the use of a temporal locality window to distinguish between short term and long-term successive
memory references. Short term successive memory references fall inside the temporal locality window
whereas long term successive memory references fall outside. In CART, the temporal locality window is
an adaptive parameter updated in accordance with memory access patterns. Application of the temporal
filter assigns a utility status to each cache block, S for short term utility and L for long term utility. The
utility status is updated when cache misses occur. With respect to the ARC sets, the following conventions
are used: 1) cache blocks in T2 and B2 are assigned a long term utility status L; 2) cache blocks in B1 are
assigned a short term utility status L; and 3) cache blocks in T1 may be assigned either short term utility
status L or long term utility status L. Cache blocks at the head of T1 are replaced if the reference bit is
unset and the filter status is short term L, otherwise if the filter status is long term L, the cache block is
moved to the tail of T2. Cache blocks at the head of T1 are moved to the tail of T1 if the reference bit is
set and the filter status is short term L. Cache blocks at the head of T2 are replaced if the reference bit is
unset. Cache blocks at the head of T2 are moved to the tail of T1 if the reference bit is set. All cache block
movement results in the reference bit being cleared. CAR/CART outperforms LRU and CLOCK, while
offering comparable performance to ARC.
Summary
In this chapter, we briefly built the essential technical background required by the other chapters.
Since our work is cross-disciplinary, we carefully introduced the basic overview on notations, concepts,
and models, to make readers more comfortable with the main concepts covered in this document. While
we only provided a concise yet complete overview on related concepts and theory, additional details can
be found in relevant textbooks and research papers. In the next chapter, we will extend our discussion to
present a new approach for minimizing cache overhead in a limited preemption model. We achieve this via
33
an interdependent approach for computing CRPD and integrating it into a new algorithm for preemption
point placement in a linear control flow graph executing on a uniprocessor system.
34
CHAPTER 4 CRPD COMPUTATION
In the previous chapter, we introduced various cache models for uniprocessor systems to provide the
reader with the necessary background information. In this chapter, we present our research regarding a
new approach for computing CRPD that emphasizes the interdependence of preemption points along with
accounting for the cache memory blocks that are reloaded thereby realizing significant accuracy gains.
This chapter presents a methodology for analyzing cache related preemption delay in uniprocessor
hard-real-time systems. The first section presents brief introduction and overview of this research. The
second section provides an overview of the hardware, real-time, and cache models used throughout the
chapter. The enhanced CRPD computation approach supporting both linear and conditional control flow
graphs is detailed in the third section. The fourth section illustrates method of computing loaded cache
blocks using a concrete example. The concept of LCB interdependence is discussed in the fifth section.
Finally, the sixth section gives a chapter summary.
Introduction
The importance of CRPD in schedulability analysis stems from it comprising the majority of preemp-
tion overhead. CRPD occurs when a task denoted τi is preempted by one or more higher priority tasks
denoted τk. The execution of high priority tasks results in the eviction of cache memory blocks that must
be subsequently reloaded when task τi resumes execution. Two primary models of CRPD computation
exist, 1) the independent CRPD cost model, and 2) the interdependent CRPD cost model. The vast majo-
rity of CRPD research falls under the independent CRPD model. Here, costs are solely a function of the
preemption location under consideration. Since the next preemption may occur at any forward point in the
task code, independent CRPD methods must conservatively utilize the next code location corresponding to
the maximum CRPD cost. The interdependent CRPD cost model, however, overcomes this limitation by
considering and computing costs between each pair of task code locations thereby achieving more accu-
racy. A key factor in preemption location decisions, CRPD cost accuracy is of paramount importance to
schedulability and preemption placement algorithms.
This chapter discusses the following important contribution:
• We propose a new CRPD metric, called loaded cache blocks (LCB) which accurately characterizes
the CRPD a real-time task may be subjected to due to the preemptive execution of higher priority
tasks. Our objective is to account for the cache blocks that are reloaded between each set of potential
preemption points.
35
Interdependent CRPD Computation
While existing research has focused on computing the upper bounds on cache related preemption delay
(CRPD), our approach achieves higher accuracy by computing the loaded cache blocks (LCBs), as defined
below, due to higher priority task preemption by using the set of potential/chosen preemption locations.
The sets of various cache blocks are represented as sets of integers. We employ standard definitions of
CRPD terms UCB and ECB as described by [5, 39].
Definition 1. Useful Cache Block (UCB): A memory block m is called a useful cache block at program
point δj1i , if
(a) m may be cached at δj1i and
(b) m may be reused at program point δj2i that may be reached from δ
j1
i without eviction of m on this path.
More formally, cache blockm ∈ UCB(τi) if and only if τi hasm as a useful cache block in some cache-set
s. Note this definition of UCB embodies a task level view. Cache block m ∈ UCBout(δji ) if and only if δ
j
i
has m as a useful cache block in some cache-set s where
UCB(τi) =
⋃
δji∈τi
UCBout(δ
j
i ). (3)
The notation UCBout(δ
j
i ) is the set of useful cache blocks cached in task τi post-execution of basic block
δji . Similarly, the notation UCBin(δ
j
i ) is the set of useful cache blocks cached in task τi pre-execution of
basic block δji .
Definition 2. Evicting cache block (ECB): A memory blockm of the preempting task is called an evicting
cache block, if it may be accessed during the execution of the preempting task.
Cache block m ∈ ECB(τk) if and only if τk may evict m in some cache-set s. Note this definition of
ECB also embodies a task level view. Cache block m ∈ ECB(δji ) if and only if δ
j
k may evict m in some
cache-set s where
ECB(τk) =
⋃
δjk∈τk
ECB(δjk). (4)
The notation ECB(δjk) is the set of evicting cache blocks accessed in task τk during execution of basic block
δjk. In order to determine which cache blocks may be reloaded once preemption occurs, we introduce the
notion of an accessed useful cache block (AUCB).
Definition 3. Accessed useful cache block (AUCB): A memory block of the preempted task is called an
accessed useful cache block if it may be accessed during the execution of a basic block δji for task τi.
36
The term AUCBout(δ
j
i ) represents the useful cache blocks (UCBs) accessed by task τi during execution of
basic block at location δji . The definition of AUCB is introduced to capture the set of task accessed memory
at a specific basic block location subsequently used in the calculation of blocks that must be reloaded when
task preemptions occur. We compute the set of accessed useful cache blocks (AUCBs) as follows:
AUCBout(δ
j
i ) = UCBout(δ
j
i ) ∩ ECB(δ
j
i ). (5)
where UCBout(δ
j
i ) is the set of useful cache blocks for task τi post basic block δ
j
i execution; and ECB(δ
j
i )
denotes the set of cache blocks accessed in task τi during execution of basic block δ
j
i . It is important to note
that only cache block evictions due to preemption are considered, as intrinsic cache misses are captured
as part of WCET analysis in the term CNPi . Using the previously defined terms, we may now define and
explicitly compute the cache blocks that are re-loaded due to preemption which are called loaded cache
blocks (LCBs).
Definition 4. Loaded cache block (LCB): A memory block of the preempted task is called a loaded cache
block, if it may be re-loaded during the non-preemptive region (i.e., within a series of basic blocks with no
preemptions) immediately following a preemption.
LCB(δcurri , δ
next
i ) denotes the set of cache blocks re-loaded during execution of the non-preemptive region
between basic block δcurri and basic block δ
next
i , resulting from preemption of task τi, where basic block
location δcurri and δ
next
i are the potential/selected preemption point and next potential/selected preemption
point respectively. In our model, a preemption point located at basic block δcurri occurs at the edge between
δcurri and δ
curr+1
i .
The definition of LCB is introduced to capture the set of reloaded cache memory at a specific basic
block as a function of the current and next selected preemption points. Here, we account for the over-
head within a non-preemptive region for reloading UCBs that could have potentially been evicted by the
preemption occurring immediately after δcurri and used by some basic block prior to the preemption occur-
ring immediately after δnexti .
LCB(δcurri , δ
next
i ) = [UCB(δ
curr
i ) ∩ [∪ν∈λAUCB(δνi )]] ∩ [∪τk∈hp(i)ECB(τk)] (6)
where λ def= {ν|ν ∈ p; p ∈ Pi(Gi, δcurri , δnexti )}
The linear flowgraph structure is a special case of the conditional flowgraph structure as shown in Fi-
gure 8 where the equation for computing loaded cache blocks (LCBs) exhibits a simpler form as shown in
37
Equation 7:
LCB(δcurri , δ
next
i ) = [UCBout(δ
curr
i ) ∩ [∪ν∈λAUCBout(δνi )]] ∩ [∪τk∈hp(i)ECB(τk)] (7)
where λ def= {ν|ν ∈ [curr+1, curr+2, . . . , next]} and δcurri represents the current selected preemption
point where δcurri ∈ ρi and δnexti represents the next selected preemption point where δnexti ∈ ρi, and
ρi ⊆ {δ0i , δ1i , . . . , δ
Ni
i } is an ordered set by ascending index of selected preemption points for task τi:
ρi
def
= {δmi |δmi is a selected preemption point of task τi ∧m ∈ [0, 1, 2, . . . , Ni]}
This formula for LCB(δcurri , δ
next
i ) results in the accounting of loaded cache blocks where the preemption
occurs. Once we have the set of cache blocks that must be re-loaded due to preemption, the CRPD related
preemption overhead may be computed as shown below.
γi(δ
curr
i , δ
next
i ) = |LCB(δcurri , δnexti )| ·BRT. (8)
where BRT is the cache block reload time; and LCB(δcurri , δ
next
i ) represents the loaded cache blocks or
memory accessed by the preempted task τi at basic block δcurri caused by higher priority preempting tasks.
The modified preemption cost as a function of the current and next preemption points is given by:
ξi(δ
curr
i , δ
next
i ) = γi(δ
curr
i , δ
next
i ) + π + σ + η(γi(δ
curr
i , δ
next
i )). (9)
where π is the processor pipeline cost, σ is the scheduler processing cost, and η() is the front side bus
contention resulting from the cache reload interference as described in [55–57]. For convenience, Table 3
summarizes the terminology presented in this chapter.
To illustrate our approach for computing LCBs, two supporting examples are presented. The first
example demonstrates how LCBs are computed. The second example illustrates the interdependence of
LCBs.
Example LCB Calculation
To illustrate our approach for computing LCBs, consider the following example. Assume we have
two tasks, τ1 and τ2 with UCBs and ECBs for each listed in Figure 25. Please note that we have not
shown the ECBs, UCBs, or AUCBs in the following figures for δ0i as it is a dummy basic block with
no elements for each of the aforementioned sets. The computation for LCBs uses the accessed useful
cache blocks (AUCBs) since the cache blocks that are re-loaded during execution of the non-preemptive
region between preemption points is a function of the memory that is explicitly accessed by the preempted
38
Term Description
AUCBout(δ
j
i ) Accessed UCBs post basic block j execution
BRT Cache memory block reload time
δcurri Basic block curr denoting the current preemption
δnexti Basic block next denoting the next preemption
ECB(δjk) Accessed cache memory blocks during task τk basic block j execution
ECB(τk) Accessed cache memory blocks during task τk execution
ξi(δ
curr
i , δ
next
i ) Preemption cost for preemptions at basic blocks δ
curr
i and δ
next
i
LCB(δcurri , δ
next
i ) Loaded Cache Blocks for preemptions at δ
curr
i and δ
next
i
η(γi) The front side bus contention resulting from the cache reload interference for
preemptions at δcurri and δ
next
i
ρi The set of preemption points selected for task τi
π The processor pipeline cost
σ The scheduler processing cost
UCBin(δ
j
i ) The set of Useful Cache Blocks before task τi basic block j execution
UCBout(δ
j
i ) The set of Useful Cache Blocks after task τi basic block j execution
UCB(τi) The set of Useful Cache Blocks during task τi execution
γi(δ
curr
i , δ
next
i ) LCB cardinality for preemptions at basic blocks δ
curr
i and δ
next
i
Table 3: CRPD Terminology
task. The computed AUCBs for each task is shown in Figure 26. In accordance with Equation 7 one
can readily see that the AUCBs are simply the intersection of the UCBs and ECBs for each basic block.
In our example, assume preemptions are taken at basic blocks δ21 and δ
4
1 for task τ1. For simplicity, we
Task 
ID 
Evicting Cache Blocks ECB(
𝐢
𝐣) 
𝐢
𝟏
 𝐢
𝟐
 𝐢
𝟑
 𝐢
𝟒
 𝐢
𝟓
 
τ1 {1,2} {3,4,8} {4,5,6,8} {1,2,7,8} {1,2,7,8} 
τ2 {1,9} {3,10} {11,12} {5,7,13} {1,3,7,8} 
 
Task 
ID 
Useful Cache Blocks UCBout(𝐢
𝐣) 
𝐢
𝟏
 𝐢
𝟐
 𝐢
𝟑
 𝐢
𝟒
 𝐢
𝟓
 
τ1 {1,2} {1,2,4,8} {1,2,8} {1,2,7,8} {1,2,7,8} 
τ2 {1} {1,3} {1,3} {1,3,7} {1,3,7,8} 
 
Figure 25: Taskset ECBs and UCBs.
Task 
ID 
Accessed Useful Cache Blocks AUCBout(𝐢
𝐣) 
𝐢
𝟏
 𝐢
𝟐
 𝐢
𝟑
 𝐢
𝟒
 𝐢
𝟓
 
τ1 {1,2} {4,8} {8} {1,2,7,8} {1,2,7,8} 
τ2 {1} {3} {1,3} {7} {1,3,7,8} 
 
Figure 26: Taskset AUCBs.
calculate the LCBs associated with these two preemption points. For LCB(δ21 , δ
4
1), we have UCB(δ
2
1) =
{1, 2, 4, 8}. The second expression is the set of memory that is accessed in basic blocks δ31 and δ41 , namely
{8} ∪ {1, 2, 7, 8} = {1, 2, 7, 8} comprising the set of AUCBs. The third expression is the set of
ECBs for task τ2 where ECB(τ2) = {1, 3, 5, 7, 8, 9, 10, 11, 12, 13}. Thus, LCB(δ21 , δ41) is given by the
39
intersection of the three sets:
LCB(δ21 , δ
4
1) = {1, 2, 4, 8} ∩ {1, 2, 7, 8} ∩ {1, 3, 5, 7, 8, 9, 10, 11, 12, 13} = {1, 8}
The preemption cost γ(δ21 , δ
4
1) for a BRT = 390µs is given by:
γ(δ21 , δ
4
1) = |{1, 8}| · 390 = 780µs
Using the same method, LCB(δ41 , δ
5
1) is given by:
LCB(δ41 , δ
5
1) = {1, 2, 7, 8} ∩ {1, 2, 7, 8} ∩ {1, 3, 5, 7, 8, 9, 10, 11, 12, 13} = {1, 7, 8}
The preemption cost γ(δ41 , δ
5
1) for a BRT = 390µs is given by:
γ(δ41 , δ
5
1) = |{1, 7, 8}| · 390 = 1170µs
Using the method illustrated here, the preemption cost matrix entries for each pair of basic blocks are
computed in a similar fashion and used as input to our preemption point placement algorithm.
Example of LCB Interdependence
To further exemplify the interdependence of preemption points, consider the example shown below. In
order to account for all re-loaded cache blocks (LCBs), preemptions are always included at the first basic
block δ0i and the last basic block δ
Ni
i as shown in Figure 27. This is commensurate with the preemptions
that occur before and after the task executes. Assume we have two tasks where τ2 contains four basic
blocks which may be preempted by task τ1. For simplification, assume that the ECBs of task τ1 evicts all
UCBs of task τ2. Let us further assume that we have ρ2 = {δ02 , δ12 , δ22 , δ42}. Using our LCB computation
approach, only the re-loaded lines as captured in the terms LCB(δ12 , δ
2
2) and LCB(δ
2
2 , δ
4
2) are included in
the C2 computation. LCB(δ02 , δ
1
2) = 0 as no LCBs have been cached until after execution of basic block
δ12 .
Figure 27: LCB Interdependence
40
Summary
In this chapter, we presented a new method for computing a substantially improved interdependent
CRPD metric. Our novel new CRPD metric is utilized in the following chapters supporting our work in
preemption placement algorithms for linear and conditional CFGs.
41
CHAPTER 5 MINIMIZING CACHE OVERHEAD VIA LOADED
CACHE BLOCKS AND PREEMPTION PLACEMENT
In the previous chapter, we introduced various cache models for uniprocessor systems to provide the
reader with the necessary background information. In this chapter, we present our research for minimizing
cache overhead via loaded cache block calculation and preemption point placement for linear control flow
graphs [25].
Uniprocessor hard real-time systems have been widely using in an increasing number of real-time and
embedded systems. These systems need to operate under strict timing and application design constraints.
In this chapter, we discuss a theoretic framework to ensure hard-real-time deadlines on a uniprocessor
platform by minimizing the cache overhead when selecting task preemption points in a limited preemption
model. We use a new approach for computing CRPD that emphasizes the interdependence of preemption
points along with accounting for the cache memory blocks that are reloaded thereby realizing significant
accuracy gains. Also, we show how the real-time system designer can use our approach to minimize cache
overhead via the optimal selection of preemption points to ensure taskset schedulability constraints are
met.
This chapter presents a methodology for designing and analyzing linear control flow graphs in unipro-
cessor hard-real-time systems. The first section presents brief introduction and overview of this research.
The second section provides an overview of the hardware, real-time, and cache models used throughout
the chapter. The third section integrates our enhanced CRPD computation into taskset schedulability ana-
lysis for real-time systems scheduled with the Earliest Deadline First (EDF) algorithm. The fourth section
describes our innovative preemption point placement algorithm leveraging the integrated WCET/CRPD
computation to select a set of preemption points that minimizes cache overhead while ensuring taskset
schedulability. The fifth section describes the results of our simulations and related experiments. Finally,
the sixth section gives a chapter summary.
Introduction
One of the noticeable contributors to preemption overhead is due to cache related preemption delay
(CRPD). CRPD occurs when a task denoted τi is preempted by one or more higher priority tasks denoted
τk whose execution results in the eviction of cache memory blocks that must be subsequently reloaded
when task τi resumes execution. Limited preemption approaches have the advantage of reduced blocking
with a limited number of allowed preemptions while having the advantage of sections of non-preemptive
regions (NPRs) reducing the preemption overhead. One promising approach to implementing a limited
preemption approach is selecting preemption points for each task subject to the constraint on maximum
42
non-preemptive region execution time Qi. A paper by Bertogna et al. [14] proposed and realized a linear
time algorithm for selecting optimal preemption points for a sequential basic block structure. Basic blocks
are the vertices V of a control flow graph (CFG) connected in a sequence by edges E representing the
execution sequence of one or more job instructions. A sequential basic block structure implies that con-
ditional logic and branches are fully contained within basic block boundaries. Existing research utilizes
pessimistic CRPD costs that effectively limit the effectiveness of preemption point placement algorithms.
The primary contributions outlined in this paper include improved accuracy for computing CRPD cost ta-
king into account where preemptions actually occur, and providing an optimal preemption point placement
algorithm implemented via dynamic programming using the more accurate CRPD cost. Furthermore, we
demonstrate using a case study improved task set schedulability as compared to state-of-the-art methods.
This chapter discusses the following important contributions:
• We propose a new CRPD metric, called loaded cache blocks (LCB) which accurately characterizes
the CRPD a real-time task may be subjected to due to the preemptive execution of higher priority
tasks. Our objective is to account for the cache blocks that are reloaded between each set of potential
preemption points.
• We show how to integrate our new LCB metric into our newly developed algorithms that automati-
cally place preemption points supporting linear control flow graphs (CFGs) for limited preemption
scheduling applications.
• We empirically evaluate the breakdown utilization of tasks utilizing our preemption point placement
algorithm in a limited preemption scheduling environment. We demonstrate the superiority of our
approach versus several state of the art methods.
Our enhanced CRPD computation method was presented in Chapter 4.
Integrated WCET/CRPD Calculation
The modified preemption cost as a function of the current and next preemption points is given by:
ξi(δ
curr
i , δ
next
i ) = γi(δ
curr
i , δ
next
i ) + π + σ + η(γi(δ
curr
i , δ
next
i )). (10)
where π is the pipeline cost, σ is the scheduler processing cost, and η() is the front side bus contention
resulting from the cache reload interference as described in [55–57]. The output of our algorithm is
an optimal set of preemption points, with respect to our computed preemption cost ξi(), subject to the
maximum allowable non-preemption region Qi. The optimal set of preemption points obtained using
the enhanced accuracy of our preemption cost computation is used to calculate each task’s WCET with
preemption overhead given by:
43
Ci = Bi(ρi) = C
NP
i +
|ρi|−1∑
m=1
[ξi(ρ
m
i , ρ
m+1
i )] (11)
where ρmi is them
th selected preemption point for task τi. To further clarify what we mean by preemption,
we say that δmi is a “preemption point” if the preemption between basic blocks δ
m
i and δ
m+1
i is enabled,
meaning the scheduler may preempt between these two basic blocks. Commensurate with our preemption
point placement algorithm discussed later, preemptions are always taken at basic blocks δ0i and δ
Ni
i hence
δ0i , δ
Ni
i ∈ ρi for any feasible set of preemption points ρi. Supplemental clarification of this requirement
is shown in the example of LCB interdependence in the appendix. The complete problem formulation with
constraints for finding the minimum WCET with preemption overhead cost Bi(ρi) for task τi is given by:
Bi(ρi) = min
ρi∈τi
{[ |ρi|−1∑
m=1
ξi(ρ
m
i , ρ
m+1
i ) +
Ni∑
s=1
bsi
]
| Ψi(ρi) = True
}
(12)
The selection of optimal preemption points is subject to the constraint Ψi(ρi) that no non-preemptive
region in task τi exceeds the maximum allowable non-preemption region parameter Qi:
Ψi(ρi) =
 True, if qmi (ρi) ≤ Qi for m ∈ [1, |ρi| − 1]False, otherwise
 (13)
where qmi (ρi) represents the m
th non-preemptive-region (NPR) time for task τi, capturing the cost of the
preemption ρmi given that ρ
m+1
i is the next selected preemption point, plus the basic block cost of all
blocks between ρmi and ρ
m+1
i :
qmi (ρi) =
[
ξi(ρ
m
i , ρ
m+1
i ) +
∑
s:δsi∈{ρmi ,...,ρ
m+1
i }
bsi
]
(14)
Our approach employs the results of schedulability analysis and the aforementioned WCET + CRPD
calculation with the maximum allowable non-preemption region parameter Qi computed for each task
τi. The objective is to select a subset of preemption points that minimizes each tasks WCET + CRPD
parameter Ci. We introduce slight variations of terms Ψi, qi, and Bi, used in solving intermediate sub-
problems of our proposed algorithm. The selection of optimal preemption points is subject to the constraint
that no non-preemptive region in task τi exceeds the maximum allowable non-preemption region parameter
Qi:
Ψi(δ
j
i , δ
k
i ) =
 True, if qi(δ
j
i , δ
k
i ) ≤ Qi for δ
j
i , δ
k
i ∈ ρi
False, otherwise
 (15)
44
where qi(δ
j
i , δ
k
i ) represents a possible candidate optimal non-preemptive-region (NPR) time with succes-
sive preemption points at basic block locations δji and δ
k
i for task τi:
qi(δ
j
i , δ
k
i ) =
[
ξi(δ
j
i , δ
k
i ) +
k∑
n=j+1
bni
]
(16)
The next expression gives a recursive solution to the CRPD+WCET minimization problem for the subpro-
blem of the first i basic blocks. We compute the WCET including the preemption point δki using the term
Bi(δ
k
i ) for task τi as given by:
Bi(δ
k
i ) = min
m∈{0,1,...,k−1}
{[
Bi(δ
m
i ) + qi(δ
m
i , δ
k
i )
]
|Ψi(δmi , δki ) = True
}
(17)
when k ∈ {1, 2, . . . , Ni}. For the base case where k = 0, we have Bi(δ0i ) = 0. This recursive formu-
lation is necessary for developing the dynamic programming solution presented in the next section. The
following theorem shows our problem has optimal substructure, and thus the formulation of Equation 17
represents an optimal solution.
Theorem 1. The WCET + CRPD cost variable Bi(δki ) utilized in Equation 17 exhibits optimal substruc-
ture.
Proof. Let ∆ki be the subproblem of the sequential control flowgraph containing basic blocks {δ0i , δ1i , . . . , δki }.
To prove optimal substructure, we show that we can obtain the optimal set of preemption points for mini-
mizing the WCET+CRPD for any ∆ki by using the optimal solutions to subproblems ∆
0
i ,∆
1
i , . . . ,∆
k−1
i .
Let Bi(δ0i ), Bi(δ
1
i ), . . . , Bi(δ
k−1
i ) represent the cost of the optimal solution to these subproblems. We
need to show that Equation 17 represents the optimal cost to ∆ki .
By way of contradiction, assume there is a better feasible solution ρ′i for the sub-problem of deter-
mining the optimal limited preemption execution costs from basic block δ0i to basic block δ
k
i ; that is,
Bi(ρ
′
i) is strictly smaller than the solution to ∆
k
i obtained in Equation 17 (i.e., Bi(δ
k
i )). Let δ
`
i (where
` ∈ {0, 1, . . . , k−1}) be the last preemption point prior to δki in the set ρ′i, and let ρ′′i be the set of preemp-
tion points obtained from ρ′i by removing the preemption point after δ
k
i (i.e., ρ
′′
i is a solution to ∆
`
i). Thus,
we can represent the cost of the solution ρ′i (i.e., Bi(ρ
′
i) by the left-hand-side of the following inequality:
Bi(ρ
′′
i ) + qi(δ
`
i , δ
k
i ) < Bi(δ
k
i ). (18)
Since ρ′i is a feasible solution to ∆
k
i , it must be that Ψi(δ
`
i , δ
k
i ) is true. Hence, from Equation 17 and
the min operation, we can obtain an upper bound on Bi(δki ) by considering the solution to subproblem
45
∆`i :
Bi(δ
k
i ) ≤ Bi(δ`i ) + qi(δ`i , δki ). (19)
Combining the inequalities of Equations 18 and 19, we finally obtain Bi(ρ′′i ) < Bi(δ
`
i ). However, this
contradicts our assumption at the beginning of the proof that Bi(δ`i ) represented an optimal solution to
subproblem ∆`i . Thus, a solution ρ
′
i with smaller cost than Bi(δ
k
i ) cannot exist, and Equation 17 computes
the minimum obtainable cost for the problem ∆ki .
Preemption Point Placement Algorithm
Implementing a recursive algorithm directly from Equation 17 would lead to a computationally intrac-
table implementation. Instead, we now propose anO(N2i ) dynamic programming algorithm for computing
the optimal preemption points. Our dynamic programming preemption point placement algorithm is sum-
marized in Algorithm 1.
Algorithm 1 D.P. Optimal Preemption Point Placement
1: function Select Optimal PPP (Ni,bi,Qi,ξi)
2: Bi ← ∞ ρprev ← {δ0i };
3: if bki > Qi for some k ∈ {1, . . . , Ni} then
4: return INFEASIBLE;
5: end if
6: Bi(δ0i ) ← 0;
7: for k : 0 ≤ k ≤ Ni do
8: CNPi (δ
k
i ,δ
k
i )← 0;
9: qi(δki , δ
k
i )← 0;
10: for j : k − 1 ≥ j ≥ 0 do
11: CNPi (δ
j
i , δ
k
i )← b
j+1
i + C
NP
i (δ
j+1
i , δ
k
i );
12: qi(δ
j
i , δ
k
i )← ξi(δ
j
i , δ
k
i ) + C
NP
i (δ
j
i , δ
k
i );
13: if qi(δji , δki ) ≤ Qi then
14: Pcost ← Bi(δji ) + qi(δ
j
i , δ
k
i );
15: if Pcost < Bi(δki ) then
16: Bi(δki )← Pcost;
17: ρprev(δki )← δ
j
i ;
18: end if
19: end if
20: end for
21: end for
22: ρi ← Compute PPSet(Ni, ρprev);
23: return FEASIBLE;
24: end function
25:
26:
27: function Compute PPSet(Ni,ρprev)
28: Computes ρi from ρprev (Details omitted)
29: end function
For each task τi, we are given the following parameters: 1) the number of basic blocks Ni , 2) the non-
preemptive execution time of each basic block bi, 3) the maximum allowable non-preemptive region Qi,
46
and 4) the preemption cost matrix ξi. The preemption cost matrix ξi is organized for each basic block
δji and contains the preemption cost for all successor basic blocks of the task’s control flow graph. The
minimum preemption cost up to all basic blocks is computed and stored in an array denotedBi. Each entry
of the Bi array is initialized to infinity. As we consider whether each basic block δki is in the set of optimal
preemption points, the location of the previous basic block δji with minimal preemption cost is stored in an
array denoted ρprev. The algorithm examines each basic block from δ1i to δ
Ni
i to minimize the preemption
cost by traversing backwards from the current basic block δki under consideration in order to find the basic
block δji with minimal preemption cost subject to the constraint qi(δ
j
i , δ
k
i ) ≤ Qi. While each basic block
will have a predecessor with minimum preemption cost, the list of selected preemption points is obtained
by starting with basic block δNii and hopping to the predecessor basic block stored at ρprev(δ
Ni
i ), denoted
δmi . Basic blocks δ
Ni
i and δ
m
i are added to the optimal preemption point set ρi. This basic block hopping
process continues until basic block δ0i is reached. The set ρi contains the complete list of selected optimal
preemption points. To exemplify how our algorithm works, consider the following example.
 
i
1
 
 
H 
0 3 2 2 3 3 3 
i
2
 
 
i
3
 
 
i
4
 
 
i
5
 
 
i
6
 
 
i
0
 
 
Figure 28: Linear CFG Algorithm Example.
0 -  3 - 2 - 2 - 3 - 3 - 3 - 
i
0
 
i
0
  i
1
 
i
1
 i
2
 
i
2
 i
3
 
i
3
 i
4
 
i
4
 i
5
 
i
5
 i
6
 
i
6
 
 

i
 i
0
 i
1
 i
2
 i
3
 i
4
 i
5
 i
6
 
i
0
 0 1 2 4 4 3 2 
i
1
 - 0 3 5 6 4 3 
i
2
 - - 0 8 7 5 4 
i
3
 - - - 0 8 7 6 
i
4
 - - - - 0 6 7 
i
5
 - - - - - 0 8 
i
6
 - - - - - - 0 
 
Figure 29: Algorithm Example.
Let Ni = 6 and Qi = 12 for the following basic block structure with WCET and preemption costs
shown in Figure 29. The algorithm computes and stores the cumulative non-preemptive execution costs
for starting and ending basic block pairs in a matrix denoted CNPi (δ
j
i ,δ
k
i ). For example, C
NP
i (δ
1
i ,δ
3
i ) =
b2i + b
3
i = 2 + 2 = 4. We do not include b
1
i since the preemption occurs after execution of basic block
47
δ1i . Using this information, the combined WCET and CRPD costs for each basic block pair is computed
and stored in a matrix denoted qi. For example, qi(δ1i ,δ
3
i ) = C
NP
i (δ
1
i ,δ
3
i ) + ξi(δ
1
i ,δ
3
i ) = 4 + 5 = 9.
The remaining qi matrix entries are computed in a similar fashion. These matrices are shown in Figure 30.
The shaded cells in the qi matrix represent cases where the combined WCET and CRPD costs for these
basic block pairs exceed the maximum allowable non-preemptive region parameter Qi. During execution
of the algorithm, the minimum combined WCET + CRPD costs are computed for each basic block and
stored in an array denoted Bi. Basic block pairs with preemptive costs that are less than or equal to Qi
are candidates for selection. When a lower cost is determined for a given basic block, the predecessor
preemption point is updated in the ρprev array which keeps track of the selected predecessor preemption
points thereby forming a daisy chain containing the entire set of selected preemption points. The final
results are illustrated in Figures 30 and 31 below.
qi i
0
 i
1
 i
2
 i
3
 i
4
 i
5
 i
6
 
i
0
 0 4 7 11 14 17 19 
i
1
 - 0 5 9 13 14 16 
i
2
 - - 0 10 12 13 15 
i
3
 - - - 0 11 13 15 
i
4
 - - - - 0 9 13 
i
5
 - - - - - 0 11 
i
6
 - - - - - - 0 
 
Figure 30: Combined WCET and CPRD Costs.
Bi i
0
 i
1
 i
2
 i
3
 i
4
 i
5
 i
6
 
i
0
 - 4 7 11 19 28 39 
 

𝑃𝑅𝐸𝑉
 i
0
 i
1
 i
2
 i
3
 i
4
 i
5
 i
6
 
 𝐢
𝟎
 𝐢
𝟎
 𝐢
𝟎
 𝐢
𝟎
 𝐢
𝟐
 𝐢
𝟒
 𝐢
𝟓
 
 

𝑖
 i
0
 i
1
 i
2
 i
3
 i
4
 i
5
 i
6
 
 𝐢
𝟎
  𝐢
𝟐
  𝐢
𝟒
 𝐢
𝟓
 𝐢
𝟔
 
 
0 2 3 0 2 7 2 0 3 6 3 8 3 0 
i
0
 
i
0
 i
1
 
i
1
 i
2
 
i
2
 i
3
 
i
3
 i
4
 
i
4
 i
5
 
i
5
 i
6
 
i
6
 
 
Figure 31: Algorithm Results.
The maximum blocking time Qi that each task may tolerate utilizes the computed task WCET para-
meter Ci. The method for obtaining the maximum blocking time is eloquently summarized by Bertogna
et al. for the Earliest Deadline First (EDF) and Fixed Priority (FP) scheduling algorithms [14] [13]. The
48
circular dependency between the maximum blocking time Qi and task WCET Ci parameters suggests an
iterative approach to allow the two parameters to convergence to a steady state. One such iterative appro-
ach contains the following steps as given in Algorithm 2. Convergence of Algorithm 2 results primarily
from the fact that the optimal preemption points do not change unless the parameter Qi changes. Since
Qi represents the execution time slack available to a task τi, as expected Qi has been observed to exhibit
non-increasing values during each subsequent algorithm iteration. The need to subsume higher level pro-
gramming constructs being the prominent assumption of the linear basic block structure can potentially
diminish the utility of our approach if the non-preemptive execution time of any basic block violates the
constraint CNPi > Qi. For this case, we need to “break-up” the basic block by permitting preemption
every Qi time units.
Algorithm 2 Iterative Schedulability and Preemption Point Placement Algorithm
1: Assume the CRPD of the task system is initially zero.
2: repeat
3: Run the Baruah algorithm to obtain the maximum non-preemptive region Qi for each task.
4: Select optimal preemption points using our dynamic programming algorithm and CRPD calculation.
5: Compute the WCET + CRPD Ci from the selected preemption points.
6: until the selected preemption points do not change or the system is not feasible for the computed
WCET + CRPD.
7: The breakdown utilization is given by U.
Evaluation
The evaluation of our preemption point placement algorithm will embody two methods: 1) charac-
terization and measurement of preemption costs using real-time application code, and 2) a breakdown
utilization schedulability comparison for various CRPD computational approaches.
Preemption Cost Characterization
To characterize the behavior and estimate the benefit of the approach proposed in this paper, a case
study of representative tasks was performed. The ten tasks were randomly selected from the Malarda-
len University (MRTC) WCET benchmark suite [51]. Each task was built using Gaisler’s Bare-C Cross
Compiler [30] for the GRSIM LEON3 [31] simulated target.
Tasks were first analyzed using AbsInt’s a3 WCET [1] to determine the set of basic blocks. Next, the
basic blocks {δ0i , δ1i , δ2i , ..., δ
Ni
i }were serialized by recording their order during execution. Program points
were identified as the address of the final instruction of each basic block δji for j ∈ [0, Ni] to match the
sequential basic block structure used by our preemption point placement algorithm. Each program point
served as a breakpoint when running the task on the simulator. In accordance with the sequential basic
49
block structure used in our algorithm, only the data cache is used to compute CRPD as the linear basic
block structure offers only limited opportunities for instruction cache blocks to be revisited.
During the execution of each basic block, the data cache states were saved as ΥD(δ
j
i ) at each break-
point. The cache snapshots were selected as the final visit of each program point where the data cache
contents ΥD(δ
j
i ) were captured and recorded. From the final ΥD(δ
j
i ) snapshots, a conservative estimate
of LCBs shared between program points was determined.
Shared LCBs were calculated by intersecting the cache state snapshots from δji to δ
k
i , except δ
k
i . A
cache line that remains unchanged after the execution of {δj+1i , δ
j+2
i , ..., δ
k
i } will be present in the cache
before execution of the basic block that δki represents. It is only from these unchanged cache lines that the
shared LCBs between δji and δ
k
i can be selected. The complete set of unchanged cache lines serves as a
safe upper-bound on the LCBs shared between each basic block pair. The equation below formalizes this
idea, using the data cache snapshots ΥD(δ
j
i ).
LCB(δji , δ
k
i ) ⊆
k⋂
m=j+1
ΥD(δ
m
i ) (20)
Availability
This method may be verified and reproduced using the same tools and data. Gaisler’s compiler and
simulator are freely available. AbsInt’s a3 tool is available for educational and evaluation purposes. The
programs written, and data used in our work can be found on GitHub [24] thereby permitting the research
community to reproduce and leverage our work as needed.
Results
The results are presented as an illustration of the potential benefit of our proposed method, utilizing
pairs of preemptions to determine costs, over methods that consider only the maximum CRPD at a par-
ticular preemption point (e.g., Bertogna et al. [14]). In terms of LCB computation this implies that the
maximum LCB value over all subsequent program points must be used as the CRPD cost:
max{LCB(δji , δ
k
i ) | j < k} (21)
In the following graphs, each point in the graph represents two points in the program. The first point
of the program δji is fixed by the x-axis. The y-axis indicates the shared LCB count with a later program
point. The first graph shown in Figure 32 is for the recursion program of the MRTC benchmark suite. The
lines represent the minimum and maximum shared LCBs between various program points. At each point
on the graph, the index of the next preemption point associated with the respective minimum or maximum
50
graph value at that location is shown. At program point δ4i , the minimum CRPD value is coupled with
program point δ7i having a shared LCB count of 14 whereas the single-valued CRPD computation method
finds 24 shared LCBs coupled at program point δ5i . To compare the potential utility of a method that uses
 0
 5
 10
 15
 20
 25
 30
 0  1  2  3  4  5  6  7  8  9  10  11
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
03
07
10
07 07 09 09 10
10
Minimum
02
03
04
05 06
07
08
09
10
Maximum
Figure 32: Recursion Data Cache.
at each preemption a single-valued CRPD and our method which determines CRPD based on a pair of
adjacent preemption points, consider any program point on the horizontal access of Figures 32, 33, or 34.
For any program preemption point δki , a single-valued approach would, to be safe, use the value reported
in the larger-valued dashed line. However, an approach that considered pairs of preemptions, as in our
approach, can reduce the value and potentially obtain a CRPD reported on the solid line. The difference
between the performance of these two preemption point placement algorithms is an example of the benefit
provided by considering location aware CRPD cost.
The second graph represents the lms benchmark task data cache shown in Figure 33, and the third graph
represents the adpcm benchmark task data cache shown in Figure 34. Similar to the recursion benchmark
data cache, the variability in the minimum and maximum shared LCBs for each program point further
exemplifies the benefit of using location aware CRPD cost in preemption point placement. Maximum
and minimum data cache costs for the other seven tasks show similar variability but are not shown here
due to space limitations. In summary, the CRPD cost was consistently reduced or maintained compared
to the single-valued approach. A maximum of 68% reduction of contributing cache lines in the bsort
benchmark along with an average of 18.6% decrease over all benchmarks was notably observed. The
ability of our method to leverage this reduction leads to the schedulability improvements observed in the
next subsection. In examining the graphs for all tasks and data caches a few common traits were noted.
51
 0
 10
 20
 30
 40
 50
 60
 70
 80
 0  20  40  60  80  100  120  140  160  180
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Minimum
Maximum
Figure 33: LMS Data Cache.
 0
 10
 20
 30
 40
 50
 60
 70
 80
 0  50  100  150  200  250
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Maximum
Minimum
Figure 34: ADPCM Data Cache.
The minimum shared LCBs sharply increases at the end of each task’s execution. This is due to the nature
of the conservative estimation of shared LCBs and the tasks, and the number of instructions in the basic
block between the final program points is relatively small thereby limiting the number of data cache lines
that could be eliminated by the intersection.
Drastic spikes downwards in the shared LCB counts for the minimum and maximum curves coincide
with function call boundaries, or large conditional blocks. At these boundaries, the maximum and mini-
mum LCB counts are approximately the same. There is a sharp upward spike in the early program points
for the maximum curves. This trend is due to the early initialization blocks built into tasks. In our analysis,
52
the minimum curves show a clear benefit of selecting preemption points outside of the early initialization
section.
Breakdown Utilization
The previous analysis illustrates the benefit of using a more precise CRPD cost in preemption point
placement thereby reducing the preemption overhead for an individual task, however, it does not address
the benefits to task set schedulability. To evaluate task set schedulability benefits a second case study was
performed focusing on the breakdown utilization of the MRTC WCET benchmark suite [51] for various
algorithms. Our study compares the Lunniss et al. UCB only approach for EDF [47], the Bertogna et al.
Explicit Preemption Point Placement algorithm [14], and our improved Explicit Preemption Point Place-
ment algorithm. For notational convenience the UCB Only approach for EDF will be referred to as UOE,
the Bertogna Explicit Preemption Point Placement algorithm as BEPP , and our Explicit Preemption
Point Placement algorithm as EPP .
The appropriate schedulability test for UOE is comprised of three parts: γucbt,j , U
∗
j , and U
∗ each repre-
senting the maximum CRPD for task τj , the utilization of task τj including CRPD, and the utilization of
the task set respectively as documented in Lunniss et al. [47]. A task set is schedulable when U∗ ≤ 1.
Borrowing the breakdown utilization evaluation technique from [47], each task has its deadline and
period set to Ti = Pi = u · Ci where u is a constant. The constant, u, begins at the number of tasks (ten)
and is increased in steps of 0.25 until the task set becomes schedulable. Incremental negative adjustments
are then made to determine when the set becomes un-schedulable, indicating the final breakdown utiliza-
tion. For each task, the set of shared LCBs are calculated at each program point. Taking the maximum
shared LCB count for any task is safe and appropriate for calculating U∗.
For BEPP , the maximum shared LCB counts obtained in the earlier evaluation serve as input. Las-
tly, for EPP , the shared LCB counts obtained in the earlier evaluation serve as input for our enhanced
preemption point placement algorithm. The last input variables required for both approaches are Ci and
BRT . Ci was captured as the total number of cycles required to complete the task without preemptions.
The breakdown utilization study sweeps the BRT parameter from 10 µs to 390 µs, representing values
across several different types of processors.
The breakdown utilization determination leverages our iterative schedulability and preemption point
placement algorithm as outlined in the following steps below as given in Algorithm 3. Using ten tasks, the
breakdown utilization comparison between UOE, BEPP , and EPP are summarized in Figure 35.
53
Algorithm 3 Breakdown Utilization Evaluation Algorithm
1: Start with a task system that may or may not be feasible.
2: Assume the CRPD of the task system is initially zero.
3: Run the Iterative Schedulability and Preemption Point Placement Algorithm 2
4: if the task system is feasible/schedulable then
5: Increase the system utilization by decreasing the periods via a binary search.
6: else
7: Decrease the system utilization by increasing the periods via a binary search.
8: end if
9: The breakdown utilization is given by U.
Figure 35: Breakdown Utilization Comparison.
The breakdown utilization results indicate thatBEPP dominates the UOE algorithm primarily due to the
limited preemption model utilizing CRPD cost at the basic block level. EPP dominates BEPP resulting
from the more accurate location aware CRPD cost used in our preemption point placement algorithm. As
expected, the breakdown utilization converges for all three methods for small BRT values as the cache-
overhead becomes negligible.
Summary
In this work, we presented an enhanced approach for calculating the CRPD taking into account the
selected preemption points resulting in greater accuracy. Using a more precise CRPD calculation, we
also presented an improved algorithm for selecting a limited number of preemption points for each task
subject to schedulability constraints. Our improved preemption placement algorithm was demonstrated
to minimize the overall preemption cost, an important result in achieving schedulability in real-time sys-
tems. We highlighted the iterative nature of considering schedulability constraints in our preemption point
54
placement algorithm and proposed an algorithm combining schedulability analysis with limited preemp-
tion point placement. This approach effectively illustrates how the individual tasks non-preemption region
parameters Qi and the optimal selected preemption points will eventually converge. Furthermore, our
enhanced algorithm was demonstrated to be optimal in that if a feasible schedule is not found, then no
feasible schedule exists by any known method utilizing a static Qi value. Our algorithm was shown to run
in quadratic time complexity. Potential preemption points can be defined automatically using Gaisler’s
compiler and simulator along with AbsInt’s a3 tool or defined manually by the programmer during design
and implementation. Our experiments demonstrated the effectiveness of the enhanced CRPD calculation
by illustrating the benefits using the task set from the MRTC WCET benchmark suite [51]. We also de-
monstrated the benefits of our enhanced limited optimal preemption point placement algorithm and its
increased system schedulability as compared to other algorithms. While our task model is defined using a
linear sequence of basic blocks, it was deemed a highly suitable model to introduce our revised methods
for enhanced CRPD calculation and optimal limited preemption point placement.
In future work, we plan to 1) extend the techniques described here to set-associative caches, 2) perform
a schedulability comparison of synthetic task set for various preemption models, and 3) remove the linear
basic block restriction thereby permitting arbitrary control flow graphs.
55
CHAPTER 6 REALIZING IMPROVED PREEMPTION PLACEMENT
IN REAL-TIME PROGRAM CODE WITH INTERDEPENDENT
CACHE RELATED PREEMPTION DELAY
In the previous chapter, we introduced innovative methods for computing accurate CRPD, integrating
the enhanced CRPD with EDF schedulability analysis, and placing optimal preemption points to ensure
taskset schedulability for linear control flow graphs in uniprocessor systems. In this chapter, we present
our research for minimizing cache overhead via loaded cache block calculation and preemption point
placement for conditional control flow graphs.
This chapter presents a methodology for designing and analyzing conditional control flow graphs in
uniprocessor hard-real-time systems. The first section presents brief introduction and overview of this
research. The second section provides an overview of the hardware, real-time, and cache models used
throughout the chapter. The third section presents an informal problem statement outlining the various
real-time disciplines that collectively comprise our work. The fourth section summarizes the fixed priority
(FP) and earliest deadline first (EDF) schedulability analysis constraints for limited preemption schedu-
ling. The real-time conditional flow graph model is detailed in the fifth section. The sixth section presents
a formal problem statement describing the objective function for selecting preemptions in conditional flo-
wgraphs. The seventh section discusses our enhanced pseudo quartic time interdependent CRPD conditio-
nal preemption point placement algorithm. A case study using MRTC benchmarks demonstrates improved
task set schedulability in the eighth section. Finally, the nineth section gives a chapter summary.
Introduction
The utility of real-time system computations depends on two important properties, correctness and
timeliness. The timeliness property (the subject of schedulability analysis) is concerned with ensuring
real-time task computations are completed within required deadlines. Designers of real-time systems must
choose the scheduling paradigm that will ultimately determine if the real-time task set will meet its time-
liness objectives. The available choices are 1) non-preemptive scheduling, 2) fully preemptive scheduling,
and 3) limited preemption scheduling. Non-preemptive scheduling suffers from blocking of high priority
tasks and fully preemptive scheduling suffers from substantial preemption overhead (up to 44% [55–57]
of a tasks WCET) each approach degrading task set schedulability. Limited preemption scheduling at-
tempts to 1) reduce blocking by limiting the number of allowed preemptions, maximizing non-preemptive
task execution and 2) reduce preemption overhead via non-preemptive regions. Regardless of the chosen
scheduling paradigm, effective schedulability analysis of real-time task sets mandates accurate WCET and
CRPD estimates. In this paper, the recognized benefits of limited preemption scheduling motivate our
56
work on PPP algorithms.
PPP algorithms select preemption points for each task to 1) minimize the contribution of CRPD to
the task’s overall WCET, and 2) ensure the execution time between adjacent preemptions is limited by
the maximum non-preemptive region execution time. The maximum non-preemptive region execution
time, denoted Qi, is determined via task set schedulability analysis. The motivation behind our work is
the utility of existing PPP algorithms are limited either by the less accurate CRPD costs or by assuming
a linear code structure (i.e., no branches or loops are permitted) [13, 14, 25, 58]. Our approach removes
this linear code assumption and combines the interdependent CRPD cost model with an improved PPP
algorithm thereby reducing overall task WCET. The benefits of our approach will be illustrated in a case
study employing real-time tasks from the MRTC benchmark suite. This chapter discusses the following
important contributions:
• We revise the interdependent CRPD metric, called loaded cache blocks (LCB) to consider the com-
plexity of conditional control flow graphs.
• We show how to integrate our new LCB metric into our newly developed algorithms that auto-
matically place preemption points supporting conditional control flow graphs (CFGs) for limited
preemption scheduling applications.
• Our breakdown utilization methodology evaluates the effectiveness of our conditional preemption
placement algorithm using real-time code via the Malardalen MRTC benchmarks.
• We empirically evaluate the breakdown utilization of tasks utilizing our preemption point placement
algorithm in a limited preemption scheduling environment. We demonstrate the superiority of our
approach versus several state of the art methods.
Informal Problem Statement
In this section, we will summarize the subsequent sections and their use in our work using an informal
problem statement to set the proper context.
The problem we solve in this paper is to minimize the WCET+CRPD of each real-time task τi in a
task set τ whose code structure is represented by a real-time conditional flow graph Gi such that all tasks
meet their prescribed deadlines in accordance with the employed scheduling algorithm. To achieve this
objective, we propose a conditional preemption placement algorithm that computes CRPD costs using an
interdependent CRPD cost model. Task preemptions are subject to the constraint that all non-preemptive
regions must be less than or equal to the maximum allowable non-preemptive region parameter Qi deter-
mined via task characteristics and the scheduling algorithm.
Our work embodies several diverse real-time disciplines, namely, graph grammars, CRPD analysis,
57
preemption placement, and schedulability analysis. Each discipline is discussed in a separate section for
clarity. The foruth section details the fixed priority (FP) and earliest deadline first (EDF) schedulability
analysis constraints for limited preemption scheduling. These scheduling constraints limit preemption
placement in such a way that task set schedulability is achieved. Real-time conditional flow graphs along
with an overview of graph grammars are discussed in the fifth section. Graph grammars are used to parse
the real-time task code during preemption placement. Once these fundamental topics are covered, the
sixth section presents a formal problem statement describing the objective function for selecting preempti-
ons in conditional flowgraphs. Our conditional preemption placement algorithm parses the real-time task
code using a set of production rules presented in the seventh section. As each production rule is applied,
the WCET+CRPD cost and the associated preemption points are computed and stored for processing in
subsequent production rules. Our enhanced CRPD computation method was presented in Chapter 4.
Schedulability Analysis
In this section, analysis of EDF [9, 12] and FP [72] limited preemption scheduling, summarizes the
computation of the maximum non-preemptive region parameter Qi supporting our conditional and linear
preemption point placement algorithms.
The goal of schedulability analysis is to determine whether a taskset is schedulable under the worst-
case task activation pattern. The worst-case activation pattern for task τi, known as the critical instant,
results in the maximum response time. Earlier work [43] proved the critical instant for each task coincides
with the synchronous activation of the task with all higher priority tasks where all jobs released immedia-
tely in accordance with the minimum inter arrival time. For FP scheduling the set of higher priority tasks
is represented by:
k ∈ hp(i) = {k | k < i} (22)
For EDF scheduling, the set of higher priority tasks is represented by:
k ∈ hp(i) = {k | Dk < Di} (23)
To facilitate the schedulability analysis for FP scheduling, the request bound function [42]RBF (t) is used
to examine the maximum cumulative execution request in an interval of length t generated by jobs of τi.
RBFi(t) =
⌈ t
Ti
⌉
(CNPi + γi) (24)
58
where γi denotes the preemption cost due to preemption by higher priority tasks hp(i) during execution of
task τi. To facilitate the schedulability analysis for EDF scheduling, the demand bound function DBF (t)
is used to examine the maximum cumulative execution request in an interval of length t generated by jobs
of τi.
DBFi(t) =
(
1 +
⌈ t−Di
Ti
⌉)
(CNPi + γi) (25)
where γi denotes the preemption cost due to preemption by higher priority tasks hp(i) during execution of
task τi. Starting with the critical instant, the cumulative execution request in an interval t for task τi and
all higher priority tasks hp(i) is given by:
Wi(t) =
∑
j∈i,hp(i)
RBFj(t) (26)
Characterized by the non-preemptive regions inherent to limited preemption scheduling, the analysis must
take into account the longest NP region in the lower priority tasks. The maximum NP region in task τi is
given by:
qmaxi ≤ Qi = min
h∈hp(i)
βh (27)
For FP scheduling, the blocking factor Bi each task τi experiences is given by:
Bi = max
l∈lp(i)
{qmaxl } (28)
where the set of lower priority tasks lp(i) for FP scheduling is given by:
l ∈ lp(i) = {l | l > i} (29)
For EDF scheduling, the set of lower priority tasks is represented by:
l ∈ lp(i) = {l | Dl > Di} (30)
Since the lowest priority task τm is subjected to no blocking, by convention we have Bm = 0. We
can use the schedulability analysis derived for floating non-preemptive regions [15] to establish taskset
schedulability:
Wi(t) +Bi ≤ t, ∀ t ∈ mTi, m ∈ N (31)
59
Another way to look at this FP limited preemption schedulability constraint is to characterize the amount
of blocking tolerance βi that a task τi can withstand while meeting its deadlines.
βi = max
t∈A|t<Di
{
t−
∑
j∈hp(i),i
RBFj(t)
}
(32)
where A = {mTj ,m ∈ N, 1 ≤ j < n}. Similarly, for EDF, we have:
βi = min
t∈A|Di≤t<Di+1
{
t−
∑
τj∈τ
DBFj(t)
}
(33)
whereA = {mTj + Dj , m ∈ N, 1 ≤ j ≤ n}. With suitable expressions for βi introduced, we can express
the taskset limited preemption schedulability constraint for both FP and EDF scheduling as summarized
in Theorem 2 by Bertogna et al. [14].
Theorem 2. A task set τ is schedulable with limited preemption scheduling if, for all i | 1 ≤ i ≤ n,
Bi ≤ βi (34)
We can restate the taskset schedulability constraint in terms of the maximum non-preemptive region para-
meter Qi via Theorem 3 by Bertogna et al. [14].
Theorem 3. A task set τ is schedulable with limited preemption scheduling if, for all i | 1 < i ≤ n,
qmaxi ≤ Qi = min
h∈hp(i)
βh (35)
In summary, each task is permitted to execute non-preemptively for a maximum amount of time denoted
by Qi thereby ensuring taskset schedulability as long as preemption points are placed such that all non-
preemptive regions are less than or equal to Qi. For convenience, Table 4 summarizes the terminology
presented in this section.
Real-Time Conditional Flow Graph
The types of real-time conditional flow graphs used in our work belong to the class of graphs known
as series-parallel flow graphs [37]. We use the series/parallel terminology to describe the supported graph
composition steps. Series-parallel flow graphs can be created using varying sequences of four basic ope-
rations, namely graph creation, series composition, parallel composition and cyclic composition. Creation
of graph GAi consists of two basic blocks as vertices, δ
s
i and δ
e
i , containing a connecting directed edge
es,ei = (δ
s
i , δ
e
i ) as shown in Figure 36.
60
Term Description
Bi Task τi blocking factor
βi Task τi blocking tolerance
DBF i(t) Demand bound function specifying the maximum cumulative execution request
in an interval of length t generated by jobs of task τi
hp(i) The set of task τi higher priority tasks
k Index variable denoting the preempting task
l Index variable denoting τi blocking tasks
lp(i) The set of task τi lower priority tasks
qmaxj The maximum NP execution region in task τj
RBF i(t) The resource bound function specifying the maximum cumulative execution request in
an interval of length t generated by jobs of task τi
t Cumulative task set execution time
Wi(t) The cumulative execution request for task τi
γi The preemption cost during task τi execution
Table 4: Schedulability Terminology
 
+ = 
G𝑖
𝐴
 G𝑖
𝐵
 G𝑖
𝐶
 
 𝑖
 𝑠
  𝑖
 𝑒
  𝑖
 𝑠
  𝑖
 𝑒
 
Figure 36: Graph Creation.
Series composition takes two disjoint graphs, GAi and G
B
i creating a new graph G
C
i with a connecting
directed edge ee
a,sb
i = (δ
ea
i , δ
sb
i ) where δ
ea
i represents the sink node of graph G
A
i and δ
sb
i represents the
source node of graph GBi as shown in Figure 37.
 
+ = 
G𝑖
𝐴
 G𝑖
𝐵
 G𝑖
𝐶
 
 𝑖
 𝑠𝑏
  𝑖
 𝑒𝑏
  𝑖
 𝑠𝑎
  𝑖
 𝑒𝑎
  𝑖
 𝑠𝑐
  𝑖
 𝑒𝑐
 e𝑖
𝑒𝑎,𝑠𝑏
 
Figure 37: Series Composition.
Parallel composition takes four disjoint graphs, GAi , G
B
i , G
C
i , and G
D
i , creating a new graph G
E
i with
edges es,s
b
i = (δ
s
i , δ
sb
i ), e
s,sc
i = (δ
s
i , δ
sc
i ), e
eb,e
i = (δ
eb
i , δ
e
i ), and e
ec,e
i = (δ
ec
i , δ
e
i ) where δ
sb
i and δ
sc
i
represent the source nodes, and δe
b
i and δ
ec
i represent the sink nodes of graphs G
B
i and G
C
i respectively as
shown in Figure 38.
 
G𝑖
𝐴
 G𝑖
𝐵
 G𝑖
𝐶
 G𝑖
𝐷
 
G𝑖
𝐸
 
+ + + = 
 𝑖
 𝑠
  𝑖
 𝑠𝑏
  𝑖
 𝑒𝑏
  𝑖
 𝑠𝑐
 𝑖
 𝑒𝑐
  𝑖
 𝑒
  𝑖
 𝑠
  𝑖
𝑒 
𝑖
𝑠𝑏
  𝑖
𝑒𝑏 
 𝑖
 𝑠𝑐
  𝑖
𝑒𝑐 
 e𝑖
𝑠,𝑠𝑏    
 e𝑖
𝑠,𝑠𝑐
 
  e𝑖
𝑒𝑏,𝑒
 
  e𝑖
𝑒𝑐,𝑒
 
Figure 38: Parallel Composition.
61
Cyclic composition takes three disjoint graphs, GAi , G
B
i , and G
C
i , creating a new graph G
D
i with edges
es,s
b
i = (δ
s
i , δ
sb
i ), e
eb,e
i = (δ
eb
i , δ
e
i ), and e
e,s
i = (δ
e
i , δ
s
i ) where δ
sb
i represent the source node, and δ
eb
i
represent the sink node of graph GBi as shown in Figure 39.
 
G𝑖
𝐴
 
G𝑖
𝐵
 
G𝑖
𝐶
 
G𝑖
𝐷
 
+ + = 
 𝑖
 𝑠
  𝑖
 𝑠𝑏
  𝑖
 𝑒𝑏
  𝑖
 𝑒
  𝑖
 𝑠
  𝑖
 𝑒
  𝑖
 𝑠𝑏
  𝑖
 𝑒𝑏
 
 e𝑖
𝑠,𝑠𝑏
   e𝑖
𝑒𝑏,𝑒
 
Figure 39: Cyclic Composition.
Our previous work was limited to linear control flow graphs constructed using graph creation and
series composition operations only. To demonstrate the applicability of series-parallel graphs to modern
real-time applications, well known and commonly used real-time structured programming language con-
structs such as ordered linear statement sequences, if-then statements, if-then-else statements, switch sta-
tements, bounded unrolled loops, and inline functions [3] comprise the supported software artifacts. One
can readily observe that series-parallel graphs are partitioned into series and parallel connected linear code
sections of basic blocks each having single entry and exit points. Due to the constrained resources of most
real-time embedded processors, well written programs must employ a safe subset of programming lan-
guage constructs such as conditional statements, bounded loops, and efficient functions with no recursion
guaranteed to terminate within a bounded execution time in order to support cooperative tasking [71]. As
a result, the real-time task code represented via series-parallel graphs can be efficiently implemented by
most structured programming languages such as C. In the following sections, we present a context free
graph grammar describing the series-parallel graphs supported in our work. The use of a context free
grammar is compulsory for real-time conditional code preemption placement algorithms.
Grammar Background
In this section, we present a brief overview of context-free graph grammars. Historically, graph gram-
mars have been used to facilitate the code optimization phase of program compilation [37]. In our work,
we use a graph grammar to 1) recognize and construct control flow graphs conforming to the series-parallel
graph structure previously described, and 2) generate intermediate structured programmatic constructs that
can be efficiently solved as smaller subproblems and subsequently combined together to solve larger sub-
problems realizing our real-time conditional code-based PPP algorithm.
Formally, a grammar G defines a textual language L(G) that is parsed and recognized via a set of
production rules. Production rules are of the form LHS ← RHS where the left-hand side (LHS) is a
62
non-terminal string and the right-hand side (RHS) contains non-terminal and/or terminal strings. A non-
terminal string is a symbolic syntactic variable denoting some valid language construct. A terminal string
represents some abstract or symbolic construct that is part of a textual based language L(G). The ap-
plication of a production rule means the non-terminal string on the left-hand side is substituted for the
non-terminal and/or terminal strings on the right-hand side. The process of rewriting or substituting lan-
guage strings in this manner is called a derivation. In the compiler domain, the set of valid language
symbols are also known as tokens. Typically, these language symbols consist of numerical strings, key-
words, identifiers, or symbols, comprising a program. A grammar G whose production rules contain only
non-terminal strings on the left-hand side of each production is called a context free grammar. A context
free grammar G where each valid string S ∈ L(G) has a unique derivation sequence that recognizes S is
called unambiguous.
Like their textual counterparts, context free graph grammars consist of production rules containing
both non-terminal and terminal strings. However, in a graph grammar, a terminal string represents a single
vertex or basic block and a non-terminal string represents a set of vertices or basic blocks and the directed
edges connecting them. Therefore, we can think of a graph grammar as a set of production rules describing
how basic blocks are connected thereby representing a valid control flow graph G ∈ L(G). Formally, a
graph G is in the language L(G), if there exists a sequence of derivations, starting from a specified non-
terminal node, that uses the productions of G and results in graph G [58].
In the following subsection, we present a context free graph grammar specification describing the real-
time conditional CFGs that are addressed in our work. Real time code snippets exemplifying each grammar
production rule are presented along with each grammar production rule.
Grammar Specification
Our real-time conditional graph grammar production rules are specified in this paper in Backus-Naur
form (BNF) presented in the seventh section. Non-terminals are textually denoted between angle brackets
〈CB〉 and graphically denoted by an enclosing box. Terminals or basic blocks are textually denoted by
(δji , b
j
i ) where δ
j
i is the basic block identifier and b
j
i is the WCET, and graphically by a filled circle. The
limitations of series-parallel graphs eliminate the use of goto statements and return statements preceding
the end of a function. It is well known that real-time structured programs can be exclusively comprised of
sequential statements, conditional statements, functions, and loop statements only [17].
Problem Statement
Using our series-parallel graph structure, the objective is to select a set of effective preemption points
ρi that minimizes the WCET+CRPD of task τi whose real-time condition code is given by graph Gi, sub-
63
ject to the constraint that all non-preemptive regions must be less than or equal to the maximum allowable
non-preemptive region parameter Qi. The problem we solve in this paper is given by:
Problem Statement:
Given a real-time conditional flow graphGi ∈ L(G), an interdependent CRPD cost function ξi(δxi , δ
y
i ) and
WCET bji for each basic block, find a set of Effective Preemption Points (EPPs) ρi ⊆ E that minimizes
the cost function:
Φi(Gi, ρi)
def
= max
p∈Pi(Gi,δsi ,δei )
[
∑
δxi ∈p
bxi +
∑
(δx
i
,δ
y
i
)∈p,ρi
δxi ≺pδ
y
i
ξi(δ
x
i , δ
y
i )] (36)
subject to the constraint ∀p ∈ Pi(Gi, δsi , δei ), δwi ∈ ρi,∃e
u,v
i = (δ
u
i , δ
v
i ), e
x,y
i =
(δxi , δ
y
i ) where e
u,v
i , e
x,y
i ∈ ρi ::
[
∑
δw
i
∈p
δui pδwi pδxi
bwi + ξi(δ
u
i , δ
x
i )] ≤ Qi (37)
The cost function Φi(Gi, ρi) evaluates to the maximum cost across all paths p through the task code.
Preemption Point Placement Algorithm
In this section, we present a dynamic-programming algorithm that achieves an improved solution ρi
to the effective PPP problem compared to existing PPP methods. The set of selected feasible preemption
points ρi minimizes our WCET cost objective function Φi(Gi, ρi) in that any other set of preemption
points ρ
′
i would result in a WCET cost Φ
′
i(Gi, ρi) ≥ Φi(Gi, ρi). To sufficiently describe our dynamic-
programming algorithm, we first present a motivating example, then a high-level overview, followed by
a recursive formulation based on our real-time conditional context-free grammar G. The production rules
described in the recursive formulation are applied to the CFG as part of the individual parsing steps in a
bottom-up fashion.
Motivating Example
To present the benefits of preemption point placement using the interdependent CRPD model, consider
the following example as shown in Figure 40. The WCET costs are given for each basic block along with
the independent CRPD costs shown along each edge between adjacent basic blocks and summarized in
Figure 41. The interdependent CRPD cost matrix summarizes the CRPD costs for each pair of connected
basic blocks illustrating the opportunities for cost reduction as shown in Figure 42. Unconnected basic
blocks have −1 CRPD cost entries. The upward pointing arrows denote the minimum independent CRPD
64
 
D3 D2 
Z 
E3 E2 E1 
D1 
C B A H G F 
1 
0 3 4 1 
1 1 3 
2 2 
1 3 2 
0 1 3 
5 
7 
1 6 
5 
5 1 
6 
3 1 0 
Figure 40: Motivating Example.

i
 Z A B C D1 D2 D3 E1 E2 E3 F G H 
 0 1 3 5/7 5 1 6 1 1 3 3 1 0 
 
Figure 41: Independent CRPD Costs.
 

i
 Z A B C D1 D2 D3 E1 E2 E3 F G H 
Z -1 0 0 0 0 0 0 0 0 0 0 0 0 
A -1 -1 1 1 1 1 1 1 1 1 1 1 1 
B -1 -1 -1 3 3 3 3 3 3 2 2 2 1 
𝐶 -1 -1 -1 -1 5 4 1 7 7 1 4 2 1 
𝐷1 -1 -1 -1 -1 -1 5 5 -1 -1 -1 5 3 2 
D2 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 1 1 
D3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 6 4 2 
E1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 
E2 -1 -1 -1 -1 -1 -1 -1 -1 -1 6 6 5 3 
E3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 5 5 2 
F -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 3 2 
G -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 
H -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 
Figure 42: Interdependent CRPD Costs.
cost solution whose WCET and preemption cost is 26. The downward pointing arrows denote the mini-
mum interdependent CRPD cost solution whose WCET and preemption cost is 22 for both paths. The
interdependent PPP algorithm chooses alternate preemption points (i.e. edges eC,D1, eC,E1, eD3,F , and
eE3,F ) in accordance with the reduced preemption cost at those locations thereby illustrating the benefits
of the interdependent CRPD model as highlighted in Figure 42.
High-Level Overview
Dynamic programming algorithms are used to efficiently implement complex algorithms where so-
lutions to smaller subproblems are computed, stored, and subsequently reused in the solutions to larger
subproblems. In our approach, we compute solutions to subsets of the real-time conditional control flow
graph as it is being constructed in accordance with our grammar G production rules. Grammar G is struc-
tured in such a way that solutions are computed in the following order at each level of the parse tree,
namely, 1) basic blocks, 2) linear sections, 3) conditional sections, and 4) aggregate block structures. We
65
use the maximum non-preemptive region parameter Qi as a suitable constraint on the number of compu-
ted solutions stored for each subgraph. We introduce two solution interface parameters, ζpred, and ζsucc,
denoting the non-preemptive execution times that a given solution presents to predecessor and successor
subgraphs when subgraphs are combined to form solutions to larger subgraphs. To illustrate this concept,
consider the following intermediate graph structure GAi with a proposed set of preemption points selected
as denoted by the up and down arrows shown in Figure 43. Alternatively, we can visualize the solution
 

𝑝𝑟𝑒𝑑
 
𝑠𝑢𝑐𝑐
 
Figure 43: Subgraph Solution Interface.
interface parameters ζpred and ζsucc as basic blocks whose execution times are ζpred units and ζsucc units
respectively as shown in Figure 44. In this simplistic model, we note that preemption is not permitted at the
exterior edges as the added basic blocks denote non-preemptive execution. We use infinite weight edges
to enforce non-preemption between some basic blocks. Therefore, for each subgraph, we must compute
 
𝑮𝒊
𝑨
 
  
  

𝒑𝒓𝒆𝒅
 
  

𝒔𝒖𝒄𝒄
   
Figure 44: Equivalent Subgraph Solution Interface.
at most (Qi + 1)2 distinct solutions for each value of ζpred and ζsucc in the range of [0 . . . Qi − 1]. We
can think of the WCET cost and associated preemption point solutions as a set of (Qi + 1) × (Qi + 1)
matrices, denoted as Φi(GAi , ζpred, ζsucc) and ρi(G
A
i , ζpred, ζsucc) respectively. Later, when subgraph G
A
i
is combined with other programmatic constructs in the parse tree to solve a larger subproblem, we use the
ζpred and ζsucc parameters to constrain which solutions from each subgraph can be combined and conside-
red as potential solutions for the larger subgraph. We introduce two functions used to identify the visible
predecessor preemption points, denoted ρpredi and the visible successor preemption points, denoted ρ
succ
i .
The function ρpredi (G
A
i , ζpred, ζsucc) returns the set of visible selected preemption points in the intermedi-
ate solution that may be reached along any path p ∈ Pi(GAi , δs
A
i , δ
eA
i ) starting at the first basic block δ
sA
i
and reaching some basic block δyi ∈ ρ
pred
i by executing non-preemptively subject to the constraint that
ζpred ≤ Qi as shown in Figure 45.
66
 
Figure 45: Visible Predecessor Preemption Points.
Similarly, the function ρsucci (G
A
i , ζpred, ζsucc) returns the set of visible selected preemption points in the
intermediate solution that may be reached along any path p ∈ Pi(GAi , δs
A
i , δ
eA
i ) starting at basic block
δyi ∈ ρsucci and reaching the ending basic block δe
A
i by executing non-preemptively subject to the constraint
that ζsucc ≤ Qi as shown in Figure 46.
 
Figure 46: Visible Successor Preemption Points.
Thus, the sets ρpredi and ρ
succ
i are used to determine the additive preemption cost between two subgraphs
whose solutions are being combined. In the next subsection, we present a recursive formulation that
achieves an effective minimized safe upper bound preemption solution for each larger subgraph as a com-
bination of the minimized safe upper bound preemption solutions to the respective smaller subgraphs.
Recursive Formulation
In accordance with our context-free grammar G, the various subgraphs GAi are created via applying
the production rules on a textual based graph description that conforms to the language L(G) presented
below. As each production rule or derivation is applied, a defined subset of basic blocks and their con-
nection relationships are assigned to subgraphs denoted by their non-terminal symbol. As each production
rule is encountered, the cost function ΦSGi , and the selected preemption points ρ
SG
i are computed over all
possible values of ζpred and ζsucc in the range [0 . . . Qi], where < SG > represents the larger subgraph
being constructed. For instance, the subgraph created by the linear blocks production is denoted with a
suitable abbreviation to establish a proper association between the production rule, the subgraph GSGi , the
WCET cost objective function ΦSGi , and the set of selected preemption points ρ
SG
i .
The time complexity with respect to each production rule relates to the computation of the new cost
and preemption matrices using the associated formulas. The time complexity is shown for the most pro-
found production rules in our grammar G. Moreover, the time complexity of parsing any graph Gi using
our production rules is highly dependent on the structures present in the real-time program code being
67
analyzed. For a graph containing Ni nodes the worst-case number of production rules executed is 2Ni− 1
or O(Ni).
We now present the grammar G production rules focusing on the computation of the WCET cost ob-
jective function Φi(GAi , ζpred, ζsucc) and the associated set of selected preemption points ρi(G
A
i , ζpred, ζsucc)
comprising the set of solutions generated at each level of the parse tree for all values of ζpred and ζsucc
in the range of [0 . . . Qi]. In this section, we present production rules P1 through P7 supporting the con-
ditional CFG and block structures, while the remaining production rules P8 through P13 containing the
other programming constructs such as loops, and functions are presented in the following subsections.
For instruction production rule P1, we have:
<SB>← (δji , b
j
i )
The following graphical example shown in Figure 47 illustrates production rule P1. A basic block is
identified by a label and its corresponding WCET. The label we use is the address of the instruction
contained in the basic block. Recall that by convention, basic blocks contain one or more instructions. In
our approach, each basic block contains a single instruction.
<SB> ← (𝐢
𝐣
, 𝐛𝐢
𝐣
) 
 
Figure 47: Production Rule P1.
The derivation of production rule P1 creates a subgraph GSBi containing a single basic block, δ
j
i . The
associated WCET cost and preemption point functions are given by:
Φ
SB(j)
i (ζpred, ζsucc) =

∞, if bji > Qi
bji , if (b
j
i + ζpred + ζsucc) ≤ Qi
bji , if (b
j
i + ζpred + ζsucc) > Qi
 (38)
ρ
SB(j)
i (ζpred, ζsucc) =

∅, if bji > Qi
∅, if (bji + ζpred + ζsucc) ≤ Qi
δji , if (b
j
i + ζpred + ζsucc) > Qi
 (39)
There are three distinct cases to consider for production rule P1. For case 1, the WCET bji of basic block
δji exceeds the maximum non-preemptive region Qi, hence there is no solution for this graph. For case
2, the sum of basic block δji WCET b
j
i , the preceding non-preemptive region ζpred, and the succeeding
68
non-preemptive region ζsucc is less than Qi, so no preemption is needed (ρi = ∅). For case 3, the sum of
basic block δji WCET b
j
i , the preceding non-preemptive region ζpred, and the succeeding non-preemptive
region ζsucc is greater than Qi, so a preemption is needed (ρi = δ
j
i ).
All real-time code examples presented in this section employ assembly language code for the MIPS pro-
cessor family compiled using the GCC compiler. The following real-time code example exemplifies the
application of production rule P1:
<SB>←

grammar representation : (i40013c, 4)
instruction code : 0x40013c lw v1, 8(sp)
WCET : 4 cycles

For conditional production rule P2, we have:
<CB>← <SB> [ <Blocks> ] [ <Blocks> ]∗ <SB>
The following graphical example shown in Figure 48 illustrates production rule P2. The conditional block
is comprised of two linear section blocks (for simplicity), and two connecting basic blocks, one located at
the start of the conditional, and one located at the end of the conditional.
              <CB>                 ←   <SB>   [ <Blocks> ]   [ <Blocks> ]*  <SB> 
 
 
 
 
Figure 48: Production Rule P2.
The derivation of production rule P2 creates a subgraph GCBi concatenating a single basic block, δ
j
i in
subgraph GSB(j)i , followed by one or more blocks each forming a conditional section in subgraph G
CSa
i
where a ∈ [1, r] and r denotes the number of conditional sections, ending with a single basic block, δki
in subgraph GSB(k)i . Solutions previously computed for the r conditional sections are combined with the
solutions computed for the leading and trailing basic blocks δji and δ
k
i respectively. The computation of
the cost and preemptions associated with the semantics of production rule P2 exhibits time complexity
executing in O(Ni4rQ2i ) time. Each <SB> contains 2 solutions with each of the r <Blocks> structures
69
containing (Qi + 1)2 solutions. The associated WCET1 cost and preemption point functions are given by:
ΦCBi (ζpred, ζsucc) = max
a∈N:1≤a≤r
{min
s,t,u
{ΦSB(j)i (ζpred, ζsuccs) + maxδmi ,δni [ξi(δmi , δni )] +
ΦCSai (ζpredt , ζsucct) + maxδvi ,δwi [ξi(δ
v
i , δ
w
i )] +
Φ
SB(k)
i (ζpredu , ζsucc)}}
(40)
where the variables in the min and max expressions (ζsuccs , ζpredt , ζsucct , ζpredu , δ
m
i , δ
n
i , δ
v
i , and δ
w
i )
represent values where the function ΦCBi (ζpred, ζsucc) is minimized subject to the following constraints:
(ζsuccs + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpredt) ≤ Qi (41)
(ζsucct + maxδvi ,δwi [ξi(δ
v
i , δ
w
i )] + ζpredu) ≤ Qi (42)
δmi ∈ ρsucci (G
SB(j)
i , ζpred, ζsuccs) (43)
δni ∈ ρ
pred
i (G
CSa
i , ζpredt , ζsucct) (44)
δvi ∈ ρsucci (GCSai , ζpredt , ζsucct) (45)
δwi ∈ ρ
pred
i (G
SB(k)
i , ζpredu , ζsucc) (46)
ρCBi (ζpred, ζsucc) = ρ
SB(j)
i (ζpred, ζsuccs)
r⋃
a=1
ρCSai (ζpredt , ζsucct)
⋃
ρ
SB(k)
i (ζpredu , ζsucc)
(47)
The solutions for conditional blocks GCBi are created by considering each of the two possible solutions
stored in subgraph GSB(j)i and subgraph G
SB(k)
i . For each pair of solutions, we iterate through (Qi + 1)
2
solutions stored in each of the r conditional section subgraphsGCSai determining the minimized combined
cost and preemption solutions. While each conditional section GCSai cost is minimized, the overall cost of
the conditional block GCBi is the maximum of all conditional sections considered selected for each value
of ζpred and ζsucc. While combining solutions for each conditional section, the maximum preemption
cost between the solutions for combined subgraphs GSB(j)i and G
CSa
i along with the combined subgraphs
GCSai and G
SB(k)
i are added to the costs of the solutions for each subgraph. The first pair uses the visible
successor preemptions of subgraph GSB(j)i and the visible predecessor preemptions of subgraph G
CSa
i .
Similarly, the second pair uses the visible successor preemptions of subgraph GCSai and the visible prede-
cessor preemptions of subgraph GSB(k)i . The preemptions for each minimized solution contain the current
preemption solutions for subgraphs GSB(j)i and G
SB(k)
i along with the union of each of the r minimized
1ζpred and ζsucc are bolded to denote that they are constants in determining the costs for Equations 40-47; all other ζ and δ
values vary over their respective ranges.
70
conditional section preemption solutions. The basic blocks δmi , δ
n
i , δ
v
i , and δ
w
i represent elements of the
predecessor and successor visible preemption sets used to determine the interdependent preemption cost
of the combined solutions. It is important to note that in order to maintain a safe cost bound, we must use
the cost associated with the worst-case predecessor/successor preemption points for each solution when
smaller solutions are combined into larger solutions. Formally, given the Φi and ρi functions for each
substructure of GAi where each ρ
A
i (ζpred, ζsucc) represents a feasible solution for substructure A given
preemptions ζpred before (resp., ζsucc) after and ΦAi represents a safe upper bound on the total WCET
and preemption cost of that solution. Thus, the solutions selected by our algorithm are minimized in ac-
cordance with our cost functions, however, our use of the maximum preemption cost when combining
solutions potentially destroys global optimality. This concept applies to all presented production rules.
Theorem 4. Given Φi and ρi functions for each substructure ofCB where each ρAi (ζpred, ζsucc) represents
a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a safe upper
bound on the total WCET and preemption cost of that solution. Applying production P2 over a feasible
Gi, GCBi and Qi results in a feasible solution ρ
CB
i and a safe upper bound Φ
CB
i given by Equations 40,
41-46, and 47 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level Qi
constraint is not violated and the cost function ΦCBi (ζpred, ζsucc) results in a safe upper bound. To prove
theQi constraint is not violated, we must show 1) the non-preemptive execution time of the combined solu-
tions does not exceedQi at each solution interface, and 2) the non-preemptive execution time of the combi-
ned solution at the new predecessor and successor interfaces does not exceedQi. Let Φ
SB(j)
i (ζpred, ζsuccs)
with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for subgraph G
SB(j)
i for basic
block δji , with its corresponding set of selected preemption points denoted by ρ
SB(j)
i (ζpred, ζsuccs) be a
limited preemption execution safe upper bound cost solution for basic block δji . We make an identical sta-
tement for subgraph GSB(k)i for basic block δ
k
i , whose cost function is denoted Φ
SB(k)
i (ζpredu , ζsucc), and
whose set of selected preemption points are denoted ρSB(k)i (ζpredu , ζsucc). We make a similar statement
for subgraph GCSai starting at basic block δ
scsa
i and ending at basic block δ
ecsa
i , whose cost function is
denoted ΦCSai (ζpredt , ζsucct), and whose set of selected preemption points are denoted ρ
CSa
i (ζpredt , ζsucct)
where a ∈ [1, r]. Since we have a safe upper bound cost solution for each of the combined subgraphs,
we can conclude that ΦCBi (ζpred, ζsucc) computed in Equation 40 represents a safe upper bound cost solu-
tion for the concatenated series subgraphs GSB(j)i
⋃r
a=1 G
CSr
i ∪ G
SB(k)
i starting at basic block δ
j
i , and
ending at basic block δki with its corresponding selected preemption points denoted by ρ
CB
i (ζpred, ζsucc)
and computed in Equation 47. Condition 1 is met in accordance with Equations 41-46 whose purpose is to
71
ensure the non-preemptive execution time of the combined solutions does not exceed Qi at each solution
interface. Condition 2 is met per the definition of the parameters ζpred, and ζsucc respectively, whose range
is given by [0 . . . Qi]. Thus, the problem finds a feasible safe upper bound cost preemption points solution
when applying production P2.
The following real-time code example exemplifies the application of production rule P2:
<SB>←

grammar representation : (i40010c, 4)
instruction code : 0x40010c beqz v0, 400118
WCET : 4 cycles

<Blocks>←

grammar representation : (i400110, 4)
(i400114, 4)
instruction code : 0x400110 addiu v0, v0,−1
0x400114 j 400120

<Blocks>←

grammar representation : (i400118, 4)
(i40011c, 4)
instruction code : 0x400118 lw v0, 12(sp)
0x40011c addiu v0, v0, 1

<SB>←

grammar representation : (i400120, 4)
instruction code : 0x400120 lw v0, 18(sp)
WCET : 4 cycles

Once the production rules for the four sub-components have been applied, they are subsequently aggregated
into a conditional block <CB> as follows:
<CB>← <SB> [ <Blocks> ] [ <Blocks> ]∗ <SB>
For blocks production rule P3, we have:
<Blocks>← [ <SB> ]
The following graphical example shown in Figure 49 illustrates production rule P3. Here a single basic
72
block is subsumed into an aggregate block structure.
<Blocks> ← <SB>  
 
 
Figure 49: Production Rule P3.
The derivation of production rule P3 creates a subgraph GBLKSi that is equivalent to the subgraph GSBi .
The associated WCET cost function is given by:
ΦBLKSi (ζpred, ζsucc) = Φ
SB
i (ζpred, ζsucc) (48)
The associated set of selected preemption points function is given by:
ρBLKSi (ζpred, ζsucc) = ρ
SB
i (ζpred, ζsucc) (49)
The existing single block cost and preemption solutions are copied to the aggregate blocks structure.
The following real-time code example exemplifies the application of production rule P3:
<SB>←

grammar representation : (i40013c, 4)
instruction code : 0x40013c lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
For blocks production rule P4, we have:
<Blocks>← [ <CB> ]
The following graphical example shown in Figure 50 illustrates production rule P4. Here a single condi-
tional block structure is subsumed into an aggregate block structure.
<Blocks>                           ←                  <CB> 
 
 
 
Figure 50: Production Rule P4.
73
The derivation of production rule P4 creates a subgraph GBLKSi that is equivalent to the subgraph GCBi .
The associated WCET cost function is given by:
ΦBLKSi (ζpred, ζsucc) = Φ
CB
i (ζpred, ζsucc) (50)
The associated set of selected preemption points function is given by:
ρBLKSi (ζpred, ζsucc) = ρ
CB
i (ζpred, ζsucc) (51)
The existing conditional block cost and preemption solutions are copied to the aggregate blocks structure.
The following real-time code example exemplifies the application of production rule P4:
<SB>←

grammar representation : (i40010c, 4)
instruction code : 0x40010c beqz v0, 400118
WCET : 4 cycles

<Blocks>←

grammar representation : (i400110, 4)
(i400114, 4)
instruction code : 0x400110 addiu v0, v0,−1
0x400114 j 400120

<Blocks>←

grammar representation : (i400118, 4)
(i40011c, 4)
instruction code : 0x400118 lw v0, 12(sp)
0x40011c addiu v0, v0, 1

<SB>←

grammar representation : (i400120, 4)
instruction code : 0x400120 lw v0, 18(sp)
WCET : 4 cycles

74
Once the production rules for the four sub-components have been applied, they are subsequently aggregated
into a conditional block as follows:
<CB>← <SB> [ <Blocks> ] [ <Blocks> ]∗ <SB>
Once the production rule for the conditional block sub-component has been applied, it is subsequently
aggregated into a <Blocks> structure as follows:
<Blocks>← [ <CB> ]
For aggregate blocks production rule P5, we have:
<Blocks>← <SB> <Blocks>
The following graphical example shown in Figure 51 illustrates production rule P5. Here a single basic
block is concatenated onto the front of an existing aggregate block structure.
 <Blocks>       ←   <SB> <Blocks> 
 
 
Figure 51: Production Rule P5.
The derivation of production rule P5 creates a subgraph GBLKS
′
i concatenating a previously created ag-
gregate blocks basic block subgraph GSBi in series with a previously created subgraph G
BLKS
i . The
associated WCET cost function is given by:
ΦBLKS
′
i (ζpred, ζsucc) = minr,s{(ΦSBi (ζpred, ζsuccr) + max
δmi ,δ
n
i
[ξi(δ
m
i , δ
n
i )] +
ΦBLKSi (ζpreds , ζsucc)}
(52)
where ζsuccr , and ζpreds represent the values where the function Φ
BLKS
′
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + max
δmi ,δ
n
i
[ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (53)
δmi ∈ ρsucci (GSBi , ζpred, ζsuccr) (54)
δni ∈ ρ
pred
i (G
BLKS
i , ζpreds , ζsucc) (55)
75
The associated preemption point function is given by:
ρBLKS
′
i (ζpred, ζsucc) = ρ
SB
i (ζpred, ζsuccr) ∪ ρBLKSi (ζpreds , ζsucc) (56)
The solutions for aggregate blocks subgraph GBLKS
′
i are created by considering each of the two possible
solutions stored in subgraph GSBi . For each of the subgraph G
SB
i solutions, we iterate through each of the
(Qi + 1)
2 aggregate block GBLKSi subgraph solutions determining the minimized cost and preemption
solutions. While combining each pair of solutions, the minimum preemption cost between the combined
solutions for subgraphsGSBi andG
BLKS
i are selected for each value of ζpred and ζsucc. The algorithm uses
the visible successor preemptions of subgraph GSBi and the visible predecessor preemptions of subgraph
GBLKSi . The preemptions for each minimized solution contain the union of selected preemption solutions
for subgraph GSBi and subgraph G
BLKS
i .
Theorem 5. Given Φi and ρi functions for each substructure of BLKS where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P5 over
a feasible Gi, GBLKSi and Qi results in a feasible solution ρ
BLKS
i and a safe upper bound Φ
BLKS
i given
by Equations 52, 53-55, and 56 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
Qi constraint is not violated and the cost function ΦBLKS
′
i (ζpred, ζsucc) results in a safe upper bound.
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execution
time of the combined solution at the new predecessor and successor interfaces does not exceed Qi. Let
ΦBLKSi (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for subgraph
GBLKSi , with its corresponding set of selected preemption points denoted by ρ
BLKS
i (ζpred, ζsuccs) be a
limited preemption execution safe upper bound cost solution for subgraph GBLKSi . We make an identical
statement for subgraphGSBi , whose cost function is denoted Φ
SB
i (ζpredu , ζsucc), and whose set of selected
preemption points are denoted ρSBi (ζpredu , ζsucc). Since we have a safe upper bound cost solution for
each of the combined subgraphs, we can conclude that ΦBLKS
′
i (ζpred, ζsucc) computed in Equation 52
represents a safe upper bound cost solution for the concatenated series subgraphsGSBi ∪ GBLKSi , with its
corresponding selected preemption points denoted by ρBLKS
′
i (ζpred, ζsucc) and computed in Equation 56.
Condition 1 is met in accordance with Equations 53-55 whose purpose is to ensure the non-preemptive
execution time of the combined solutions does not exceedQi at each solution interface. Condition 2 is met
per the definition of the parameters ζpred, and ζsucc respectively, whose range is given by [0 . . . Qi]. Thus,
76
the problem finds a feasible safe upper bound cost preemption points solution when applying production
P5.
The following real-time code example exemplifies the application of production rule P5:
<SB>←

grammar representation : (i40013c, 4)
instruction code : 0x40013c lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
The next instruction in the sequence will be parsed as a single block <SB> structure as follows:
<SB>←

grammar representation : (i400140, 4)
instruction code : 0x400140 sw v1, 4(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it along with the previous
blocks <Blocks> structure are subsequently aggregated into a <Blocks> structure as follows:
<Blocks>← <SB> <Blocks>
For aggregate blocks production rule P6, we have:
<Blocks>← <CB> <Blocks>
The following graphical example shown in Figure 52 illustrates production rule P6. Here a single condi-
tional block is concatenated onto the front of an existing aggregate block structure.
<Blocks>                                      ←                  <CB>                     <Blocks>           
 
 
 
Figure 52: Production Rule P6.
The derivation of production rule P6 creates a subgraph GBLKS
′
i concatenating a previously created con-
ditional block subgraph GCBi in series with a previously created aggregate blocks subgraph G
BLKS
i . Pro-
77
duction rule P6 exhibits the maximum time complexity for our algorithm executing in O(NiQ4i ) time.
Each <Blocks> and <CB> structure contains (Qi + 1)2 solutions. The associated WCET cost function is
given by:
ΦBLKS
′
i (ζpred, ζsucc) = minr,s{(ΦCBi (ζpred, ζsuccr) + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] +
ΦBLKSi (ζpreds , ζsucc)}
(57)
where ζsuccr , and ζpreds represent the values where the function Φ
BLKS
′
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (58)
δmi ∈ ρsucci (GCBi , ζpred, ζsuccr) (59)
δni ∈ ρ
pred
i (G
BLKS
i , ζpreds , ζsucc) (60)
The associated preemption point function is given by:
ρBLKS
′
i (ζpred, ζsucc) = ρ
CB
i (ζpred, ζsuccr) ∪ ρBLKSi (ζpreds , ζsucc) (61)
The solutions for aggregate blocks subgraph GBLKS
′
i are created by considering each of the (Qi + 1)
2
possible solutions stored in subgraph GCBi . For each of the subgraph G
CB
i solutions, we iterate through
each of the (Qi + 1)2 aggregate block GBLKSi subgraph solutions determining the minimized cost and
preemption solutions. While combining each pair of solutions, the minimum preemption cost between the
combined solutions for subgraphs GCBi and G
BLKS
i are selected for each value of ζpred and ζsucc. The
algorithm uses the visible successor preemptions of subgraph GCBi and the visible predecessor preemp-
tions of subgraph GBLKSi . The preemptions for each minimized solution contain the union of selected
preemption solutions for subgraph GCBi and subgraph G
BLKS
i .
Theorem 6. Given Φi and ρi functions for each substructure of BLKS where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P6 over
a feasible Gi, GBLKSi and Qi results in a feasible solution ρ
BLKS
i and a safe upper bound Φ
BLKS
i given
by Equations 57, 58-60, and 61 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
Qi constraint is not violated and the cost function ΦBLKS
′
i (ζpred, ζsucc) results in a safe upper bound.
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execution
78
time of the combined solution at the new predecessor and successor interfaces does not exceed Qi. Let
ΦBLKSi (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for subgraph
GBLKSi , with its corresponding set of selected preemption points denoted by ρ
BLKS
i (ζpred, ζsuccs) be a
limited preemption execution safe upper bound cost solution for subgraph GBLKSi . We make an identical
statement for subgraphGCBi , whose cost function is denoted Φ
CB
i (ζpredu , ζsucc), and whose set of selected
preemption points are denoted ρCBi (ζpredu , ζsucc). Since we have a safe upper bound cost solution for
each of the combined subgraphs, we can conclude that ΦBLKS
′
i (ζpred, ζsucc) computed in Equation 57
represents a safe upper bound cost solution for the concatenated series subgraphs GCBi ∪ GBLKSi with its
corresponding selected preemption points denoted by ρBLKS
′
i (ζpred, ζsucc) and computed in Equation 61.
Condition 1 is met in accordance with Equations 58-60 whose purpose is to ensure the non-preemptive
execution time of the combined solutions does not exceedQi at each solution interface. Condition 2 is met
per the definition of the parameters ζpred, and ζsucc respectively, whose range is given by [0 . . . Qi]. Thus,
the problem finds a feasible safe upper bound cost preemption points solution when applying production
P6.
The following real-time code example exemplifies the application of production rule P6:
<SB>←

grammar representation : (i400124, 4)
instruction code : 0x400124 lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
The next instruction in the sequence will be parsed as a conditional block <CB> structure as follows:
<SB>←

grammar representation : (i40010c, 4)
instruction code : 0x40010c beqz v0, 400118
WCET : 4 cycles

79
<Blocks>←

grammar representation : (i400110, 4)
(i400114, 4)
instruction code : 0x400110 addiu v0, v0,−1
0x400114 j 400120

<Blocks>←

grammar representation : (i400118, 4)
(i40011c, 4)
instruction code : 0x400118 lw v0, 12(sp)
0x40011c addiu v0, v0, 1

<SB>←

grammar representation : (i400120, 4)
instruction code : 0x400120 lw v0, 18(sp)
WCET : 4 cycles

Once the production rules for the four sub-components have been applied, they are subsequently aggregated
into a conditional block <CB> as follows:
<CB>← <SB> [ <Blocks> ] [ <Blocks> ]∗ <SB>
Once the production rule for the conditional block <CB> sub-component has been applied, it along with
the previous blocks <Blocks> structure are subsequently aggregated into a <Blocks> structure as follows:
<Blocks>← <CB> <Blocks>
For aggregate blocks production rule P7, we have:
<Blocks>← <Blocks> <Blocks>
The following graphical example shown in Figure 53 illustrates production rule P7. Here two aggregate
block structures are concatenated together to create a new aggregate block structure.
<Blocks>                                      ←               <Blocks>              <Blocks>           
 
 
 
Figure 53: Production Rule P7.
80
The derivation of production rule P7 creates a subgraph GBLKS
′
i concatenating a previously created ag-
gregate blocks subgraph GBLKSai in series with a previously created aggregate blocks subgraph G
BLKSb
i .
Production rule P7 exhibits the maximum time complexity for our algorithm executing in O(NiQ4i ) time.
Each <Blocks> contains (Qi + 1) solutions. The associated WCET cost function is given by:
ΦBLKS
′
i (ζpred, ζsucc) = minr,s{(Φ
BLKSa
i (ζpred, ζsuccr) + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] +
ΦBLKSbi (ζpreds , ζsucc)}
(62)
where ζsuccr , and ζpreds represent the values where the function Φ
BLKS
′
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (63)
δmi ∈ ρsucci (G
BLKSa
i , ζpred, ζsuccr) (64)
δni ∈ ρ
pred
i (G
BLKSb
i , ζpreds , ζsucc) (65)
The associated preemption point function is given by:
ρBLKS
′
i (ζpred, ζsucc) = ρ
BLKSa
i (ζpred, ζsuccr) ∪ ρ
BLKSb
i (ζpreds , ζsucc) (66)
The solutions for aggregate blocks subgraph GBLKS
′
i are created by considering each of the (Qi + 1)
2
possible solutions stored in subgraph GBLKSai . For each of the subgraph G
BLKSa
i solutions, we iterate
through each of the (Qi + 1)2 aggregate block G
BLKSb
i subgraph solutions determining the minimized
cost and preemption solutions. While combining each pair of solutions, the minimum preemption cost
between the combined solutions for subgraphs GBLKSai and G
BLKSb
i are selected for each value of ζpred
and ζsucc. The algorithm uses the visible successor preemptions of subgraph GBLKSai and the visible
predecessor preemptions of subgraph GBLKSbi . The preemptions for each minimized solution contain the
union of selected preemption solutions for subgraph GBLKSai and subgraph G
BLKSb
i .
Theorem 7. Given Φi and ρi functions for each substructure of BLKS where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P7 over
a feasible Gi, GBLKSi and Qi results in a feasible solution ρ
BLKS
i and a safe upper bound Φ
BLKS
i given
by Equations 62, 63-65, and 66 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
Qi constraint is not violated and the cost function ΦBLKS
′
i (ζpred, ζsucc) results in a safe upper bound.
81
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execu-
tion time of the combined solution at the new predecessor and successor interfaces does not exceed Qi.
Let ΦBLKSai (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for
subgraphGBLKSai , with its corresponding set of selected preemption points denoted by ρ
BLKSa
i (ζpred, ζsuccs)
be a limited preemption execution safe upper bound cost solution for subgraph GBLKSai . We make an
identical statement for subgraph GBLKSbi , whose cost function is denoted Φ
BLKSb
i (ζpredu , ζsucc), and
whose set of selected preemption points are denoted ρBLKSbi (ζpredu , ζsucc). Since we have a safe upper
bound cost solution for each of the combined subgraphs, we can conclude that ΦBLKS
′
i (ζpred, ζsucc) com-
puted in Equation 62 represents a safe upper bound cost solution for the concatenated series subgraphs
GBLKSai ∪ G
BLKSb
i with its corresponding selected preemption points denoted by ρ
BLKS
′
i (ζpred, ζsucc)
and computed in Equation 66. Condition 1 is met in accordance with Equations 63-65 whose purpose is to
ensure the non-preemptive execution time of the combined solutions does not exceed Qi at each solution
interface. Condition 2 is met per the definition of the parameters ζpred, and ζsucc respectively, whose range
is given by [0 . . . Qi]. Thus, the problem finds a feasible safe upper bound cost preemption points solution
when applying production P7.
The following real-time code example exemplifies the application of production rule P7:
<SB>←

grammar representation : (i400124, 4)
instruction code : 0x400124 lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
The next instruction in the sequence will be parsed as a conditional block <CB> structure as follows:
<SB>←

grammar representation : (i40010c, 4)
instruction code : 0x40010c beqz v0, 400118
WCET : 4 cycles

82
<Blocks>←

grammar representation : (i400110, 4)
(i400114, 4)
instruction code : 0x400110 addiu v0, v0,−1
0x400114 j 400120

<Blocks>←

grammar representation : (i400118, 4)
(i40011c, 4)
instruction code : 0x400118 lw v0, 12(sp)
0x40011c addiu v0, v0, 1

<SB>←

grammar representation : (i400120, 4)
instruction code : 0x400120 lw v0, 18(sp)
WCET : 4 cycles

Once the production rules for the four sub-components have been applied, they are subsequently aggregated
into a conditional block <CB> as follows:
<CB>← <SB> [ <Blocks> ] [ <Blocks> ]∗ <SB>
Once the production rule for the conditional block <CB> sub-component has been applied, it is subse-
quently aggregated into a <Blocks> structure as follows:
<Blocks>← <CB>
Once the production rule for the aggregate block <Blocks> sub-component has been applied, it along with
the previous blocks <Blocks> structure are subsequently aggregated into a <Blocks> structure as follows:
<Blocks>← <Blocks> <Blocks>
For convenience, Figure 54 summarizes production rules P1− P4.
83
Production Ƥ1:  
<SB> ← (𝐢
𝐣
, 𝐛𝐢
𝐣
) 
 
 
Production Ƥ2: 
              <CB>                 ←   <SB>   [ <Blocks> ]   [ <Blocks> ]*  <SB> 
 
 
 
 
Production Ƥ3:  
<Blocks> ← <SB>  
 
 
Production Ƥ4:  
         <Blocks>                  ←                  <CB> 
 
 
 
Figure 54: Production Rules P1− P4.
For convenience, Figure 55 summarizes production rules P5− P7.
Production Ƥ5:  
<Blocks>       ←   <SB> <Blocks> 
 
 
Production Ƥ6:  
<Blocks>                                      ←                  <CB>                     <Blocks>           
 
 
 
Production Ƥ7:  
<Blocks>                                      ←               <Blocks>              <Blocks>           
 
 
 
Figure 55: Production Rules P5− P7.
The grammar we have presented thus far are focused on the production rules for conditional structures.
Non-unrolled loops and functions are structured programming constructs that are also prevalent in real-
time code. We present the production rules supporting these structured programming elements in the
following subsections.
84
Non-Unrolled Loops
For loop production rule P8, we have:
<Loop>← [ <Blocks> <MaxIter> ]
The following graphical example shown in Figure 56 illustrates production rule P8. Here a loop structure
is created from an existing aggregate block structure and the maximum number of loop iterations.
<Loop>                  ←    [ <Blocks>                         <MaxIter> ] 
 
                                                                                     100 
Figure 56: Production Rule P8.
The derivation of production ruleP8 creates a subgraphGLOOPi that is equivalent to the subgraphGBLKSi .
The associated WCET cost function is given by:
ΦLOOPi (ζpred, ζsucc) = Φ
BLKS
i (ζpred, ζsucc) × MaxIter (67)
where valid solution combinations are subject to the following constraints:
(ζsucc + ζpred) ≤ Qi (68)
The associated set of selected preemption points function is given by:
ρLOOPi (ζpred, ζsucc) = ρ
BLKS
i (ζpred, ζsucc) (69)
The existing aggregate blocks cost and preemption solutions are copied to the loop structure.
85
The following real-time code example exemplifies the application of production rule P8:
<Blocks>←

grammar representation : (i400100, 4)
· · ·
(i400110, 4)
(i400114, 4)
10
instruction code : 0x400100 lw v0, 10
· · ·
0x400110 addiu v0, v0,−1
0x400114 bnez v0, 400100
maximum iterations : 10

Once the production rule for the blocks <Blocks> structure has been applied, it along with the maximum
loop iterations <MaxIter> are subsequently aggregated into a <Loop> structure as follows:
<Loop>← [ <Blocks> <MaxIter> ]
For blocks production rule P9, we have:
<Blocks>← [ <LOOP> ]
The following graphical example shown in Figure 57 illustrates production rule P9. Here a single loop
structure is subsumed into an aggregate block structure.
         <Blocks>                ←             [ <Loop> ]  
 
 
 
Figure 57: Production Rule P9.
The derivation of production ruleP9 creates a subgraphGBLKSi that is equivalent to the subgraphGLOOPi .
The associated WCET cost function is given by:
ΦBLKSi (ζpred, ζsucc) = Φ
LOOP
i (ζpred, ζsucc) (70)
86
The associated set of selected preemption points function is given by:
ρBLKSi (ζpred, ζsucc) = ρ
LOOP
i (ζpred, ζsucc) (71)
The existing loop block cost and preemption solutions are copied to the aggregate blocks structure.
The following real-time code example exemplifies the application of production rule P9:
<Blocks>←

grammar representation : (i400100, 4)
· · ·
(i400110, 4)
(i400114, 4)
10
instruction code : 0x400100 lw v0, 10
· · ·
0x400110 addiu v0, v0,−1
0x400114 bnez v0, 400100
maximum iterations : 10

Once the production rule for the blocks <Blocks> structure has been applied, it along with the maximum
loop iterations <MaxIter> are subsequently aggregated into a <Loop> structure as follows:
<Loop>← [ <Blocks> <MaxIter> ]
Once the production rule for the loop <Loop> component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <Loop> ]
For aggregate blocks production rule P10, we have:
<Blocks>← <Loop> <Blocks>
The following graphical example shown in Figure 58 illustrates production rule P10. Here an existing
loop structure is concatenated with an existing aggregate block structure to create a new aggregate block
structure.
87
         <Blocks>                                      ←               <Loop>                <Blocks>           
 
 
 
Figure 58: Production Rule P10.
The derivation of production rule P10 creates a subgraph GBLKS
′
i concatenating a previously created
aggregate blocks subgraph GLOOPi in series with a previously created aggregate blocks subgraph G
BLKS
i .
Production ruleP10 exhibits the maximum time complexity for our algorithm executing inO(NiQ4i ) time.
Each <Loop> and each <Blocks> contains (Qi + 1) solutions. The associated WCET cost function is
given by:
ΦBLKS
′
i (ζpred, ζsucc) = minr,s{(ΦLOOPi (ζpred, ζsuccr) + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] +
ΦBLKSi (ζpreds , ζsucc)}
(72)
where ζsuccr , and ζpreds represent the values where the function Φ
BLKS
′
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (73)
δmi ∈ ρsucci (GLOOPi , ζpred, ζsuccr) (74)
δni ∈ ρ
pred
i (G
BLKS
i , ζpreds , ζsucc) (75)
The associated preemption point function is given by:
ρBLKS
′
i (ζpred, ζsucc) = ρ
LOOP
i (ζpred, ζsuccr) ∪ ρBLKSi (ζpreds , ζsucc) (76)
The solutions for aggregate blocks subgraph GBLKS
′
i are created by considering each of the (Qi + 1)
2
possible solutions stored in subgraph GLOOPi . For each of the subgraph G
LOOP
i solutions, we iterate
through each of the (Qi+1)2 aggregate blockGBLKSi subgraph solutions determining the minimized cost
and preemption solutions. While combining each pair of solutions, the minimum preemption cost between
the combined solutions for subgraphs GLOOPi and G
BLKS
i are selected for each value of ζpred and ζsucc.
The algorithm uses the visible successor preemptions of subgraph GLOOPi and the visible predecessor
preemptions of subgraph GBLKSi . The preemptions for each minimized solution contain the union of
selected preemption solutions for subgraph GLOOPi and subgraph G
BLKS
i .
Theorem 8. Given Φi and ρi functions for each substructure of BLKS where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P10 over
88
a feasible Gi, GBLKSi and Qi results in a feasible solution ρ
BLKS
i and a safe upper bound Φ
BLKS
i given
by Equations 72, 73-75, and 76 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
Qi constraint is not violated and the cost function ΦBLKS
′
i (ζpred, ζsucc) results in a safe upper bound.
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execution
time of the combined solution at the new predecessor and successor interfaces does not exceed Qi. Let
ΦLOOPi (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for subgraph
GLOOPi , with its corresponding set of selected preemption points denoted by ρ
LOOP
i (ζpred, ζsuccs) be a
limited preemption execution safe upper bound cost solution for subgraph GLOOPi . We make an identical
statement for subgraph GBLKSi , whose cost function is denoted Φ
BLKS
i (ζpredu , ζsucc), and whose set of
selected preemption points are denoted ρBLKSi (ζpredu , ζsucc). Since we have a safe upper bound cost
solution for each of the combined subgraphs, we can conclude that ΦBLKS
′
i (ζpred, ζsucc) computed in
Equation 72 represents a safe upper bound cost solution for the concatenated series subgraphs GLOOPi ∪
GBLKSi with its corresponding selected preemption points denoted by ρ
BLKS
′
i (ζpred, ζsucc) and computed
in Equation 76. Condition 1 is met in accordance with Equations 73-75 whose purpose is to ensure the
non-preemptive execution time of the combined solutions does not exceed Qi at each solution interface.
Condition 2 is met per the definition of the parameters ζpred, and ζsucc respectively, whose range is given
by [0 . . . Qi]. Thus, the problem finds a feasible safe upper bound cost preemption points solution when
applying production P10.
The following real-time code example exemplifies the application of production rule P10:
<SB>←

grammar representation : (i400118, 4)
instruction code : 0x400118 lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
The next instruction in the sequence will be parsed as a loop block <Loop> structure as follows:
89
<Blocks>←

grammar representation : (i400100, 4)
· · ·
(i400110, 4)
(i400114, 4)
10
instruction code : 0x400100 lw v0, 10
· · ·
0x400110 addiu v0, v0,−1
0x400114 bnez v0, 400100
maximum iterations : 10

Once the production rule for the blocks <Blocks> structure has been applied, it along with the maximum
loop iterations <MaxIter> are subsequently aggregated into a <Loop> structure as follows:
<Loop>← [ <Blocks> <MaxIter> ]
Once the production rule for the loop block <Loop> sub-component has been applied, it along with the
previous blocks <Blocks> structure are subsequently aggregated into a <Blocks> structure as follows:
<Blocks>← <Loop> <Blocks>
For convenience, Figure 59 summarizes production rules P8− P10.
Production Ƥ8:  
<Loop>                  ←    [ <Blocks>                         <MaxIter> ] 
 
                                                                                                    100 
 
Production Ƥ9: 
         <Blocks>                ←             [ <Loop> ]  
 
 
 
Production Ƥ10:  
<Blocks>                                      ←               <Loop>                <Blocks>           
 
 
Figure 59: Production Rules P8− P10.
90
Inline Functions
Functions are split into two grammar elements, namely, function definition and function invocation.
The conditional PPP algorithm generates solutions for the function definition blocks consistently with the
main task function. The generated function definition preemption solutions are combined with the function
invocation preemption solutions at each graph location where the function is called.
For function definition production rule P11, we have:
<Function>← [ <Blocks> <FunctionName> ]
The following graphical example shown in Figure 60 illustrates production rule P11. Here a function
definition is created from an existing aggregate block structure and identified by its corresponding function
name.
        <Function>                         ←   [ <Blocks>                            <FunctionName> ] 
 
                                                                                                         Initialize 
 
  
Figure 60: Production Rule P11.
The derivation of production rule P11 creates a subgraph GFUNCi that is equivalent to the subgraph
GBLKSi . The associated WCET cost function is given by:
ΦFUNCi (ζpred, ζsucc) = Φ
BLKS
i (ζpred, ζsucc) (77)
The associated set of selected preemption points function is given by:
ρFUNCi (ζpred, ζsucc) = ρ
BLKS
i (ζpred, ζsucc) (78)
The existing aggregate blocks cost and preemption solutions are copied to the function definition block
structure.
91
The following real-time code example exemplifies the application of production rule P11:
<Function>←

grammar representation : (i400044, 4)
(i400048, 4)
(i40004c, 4)
(i400050, 4)
· · ·
(i400258, 4)
(i40025c, 4)
Initialize
instruction code : 0x400044 addiu sp, sp,−8
0x400048 li v0,−1
0x40004c sw v0,−32760(gp)
0x400050 lw v1,−32760(gp)
· · ·
0x400258 addiu sp, sp, 8
0x40025c jr ra
function name : Initialize

Once the production rule for the blocks <Blocks> structure has been applied, it along with the function
name <FunctionName> are subsequently aggregated into a <Function> structure as follows:
<Function>← [ <Blocks> <FunctionName> ]
For function call production rule P12, we have:
<FunctionCall>← [ <Blocks> <FunctionName> ]
The following graphical example shown in Figure 61 illustrates production rule P12. Here a function
invocation is created from an existing aggregate block structure concatenated with the function definition
identified by function name. Here the function definition is represented by a rather simple two basic block
linear structure for simplicity.
92
 <FunctionCall>               ←    [ <Blocks>                             <FunctionName> ] 
                                                                                                            Initialize 
 
 
Figure 61: Production Rule P12.
The derivation of production rule P12 creates a subgraph GFCALLi that is equivalent to the subgraph
GBLKSi . The associated WCET cost function is given by:
ΦFCALLi (ζpred, ζsucc) = minr,s{(ΦBLKSi (ζpred, ζsuccr) + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] +
ΦFUNCi (ζpreds , ζsucc)}
(79)
where ζsuccr , and ζpreds represent the values where the function Φ
FCALL
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (80)
δmi ∈ ρsucci (GBLKSi , ζpred, ζsuccr) (81)
δni ∈ ρ
pred
i (G
FUNC
i , ζpreds , ζsucc) (82)
The associated preemption point function is given by:
ρFCALLi (ζpred, ζsucc) = ρ
BLKS
i (ζpred, ζsuccr) ∪ ρFUNCi (ζpreds , ζsucc) (83)
The solutions for the function call subgraph GFCALLi are created by considering each of the (Qi + 1)
2
possible solutions stored in subgraph GBLKSi . For each of the subgraph G
BLKS
i solutions, we iterate
through each of the (Qi + 1)2 aggregate block GFUNCi subgraph solutions determining the minimized
cost and preemption solutions. While combining each pair of solutions, the minimum preemption cost
between the combined solutions for subgraphs GBLKSi and G
FUNC
i are selected for each value of ζpred
and ζsucc. The algorithm uses the visible successor preemptions of subgraph GBLKSi and the visible
predecessor preemptions of subgraph GFUNCi . The preemptions for each minimized solution contain the
union of selected preemption solutions for subgraph GBLKSi and subgraph G
FUNC
i .
Theorem 9. Given Φi and ρi functions for each substructure of FCALL where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P12 over
a feasible Gi, GFCALLi and Qi results in a feasible solution ρ
FCALL
i and a safe upper bound Φ
FCALL
i
given by Equations 79, 80-82, and 83 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
93
Qi constraint is not violated and the cost function ΦFCALLi (ζpred, ζsucc) results in a safe upper bound.
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execution
time of the combined solution at the new predecessor and successor interfaces does not exceed Qi. Let
ΦBLKSi (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for subgraph
GBLKSi , with its corresponding set of selected preemption points denoted by ρ
BLKS
i (ζpred, ζsuccs) be a
limited preemption execution safe upper bound cost solution for subgraph GBLKSi . We make an identical
statement for subgraph GFUNCi , whose cost function is denoted Φ
FUNC
i (ζpredu , ζsucc), and whose set
of selected preemption points are denoted ρFUNCi (ζpredu , ζsucc). Since we have a safe upper bound cost
solution for each of the combined subgraphs, we can conclude that ΦFCALLi (ζpred, ζsucc) computed in
Equation 79 represents a safe upper bound cost solution for the concatenated series subgraphs GBLKSi ∪
GFUNCi with its corresponding selected preemption points denoted by ρ
FCALL
i (ζpred, ζsucc) and computed
in Equation 83. Condition 1 is met in accordance with Equations 80-82 whose purpose is to ensure the
non-preemptive execution time of the combined solutions does not exceed Qi at each solution interface.
Condition 2 is met per the definition of the parameters ζpred, and ζsucc respectively, whose range is given
by [0 . . . Qi]. Thus, the problem finds a feasible safe upper bound cost preemption points solution when
applying production P12.
The following real-time code example exemplifies the application of production rule P12:
<FunctionCall>←

grammar representation : (i400018, 4)
(i40001c, 4)
(i400020, 4)
Initialize
instruction code : 0x400018 addiu sp, sp,−24
0x40001c sw ra, 20(sp)
0x400020 jal 400044
function name : Initialize

Once the production rule for the blocks <Blocks> structure has been applied, it along with the function
name <FunctionName> are subsequently aggregated into a <FunctionCall> block structure as follows:
<FunctionCall>← [ <Blocks> <FunctionName> ]
94
For blocks production rule P13, we have:
<Blocks>← [ <FunctionCall> ]
The following graphical example shown in Figure 62 illustrates production rule P13. Here a function
invocation is subsumed into an aggregate block structure.
<Blocks>                          ←       [ <FunctionCall> ] 
 
  
Figure 62: Production Rule P13.
The derivation of production rule P13 creates a subgraph GBLKSi that is equivalent to the subgraph
GFCALLi . The associated WCET cost function is given by:
ΦBLKSi (ζpred, ζsucc) = Φ
FCALL
i (ζpred, ζsucc) (84)
The associated set of selected preemption points function is given by:
ρBLKSi (ζpred, ζsucc) = ρ
FCALL
i (ζpred, ζsucc) (85)
The existing aggregate blocks cost and preemption solutions are copied to the function call block structure.
The following real-time code example exemplifies the application of production rule P13:
<FunctionCall>←

grammar representation : (i400018, 4)
(i40001c, 4)
(i400020, 4)
Initialize
instruction code : 0x400018 addiu sp, sp,−24
0x40001c sw ra, 20(sp)
0x400020 jal 400044
function name : Initialize

Once the production rule for the blocks <Blocks> structure has been applied, it along with the function
name <FunctionName> are subsequently aggregated into a <FunctionCall> block structure as follows:
95
<FunctionCall>← [ <Blocks> <FunctionName> ]
Once the production rule for the function call <FunctionCall> component has been applied, it is subse-
quently aggregated into a <Blocks> structure as follows:
<Blocks>← [ <FunctionCall> ]
For aggregate blocks production rule P14, we have:
<Blocks>← <FunctionCall> <Blocks>
The following graphical example shown in Figure 63 illustrates production rule P14. Here an existing
function call structure is concatenated with an existing aggregate block structure to create a new aggregate
block structure.
 <Blocks>                                      ←        < FunctionCall >        <Blocks>           
 
 
Figure 63: Production Rule P14.
The derivation of production rule P14 creates a subgraph GBLKS
′
i concatenating a previously created ag-
gregate blocks subgraph GFCALLi in series with a previously created aggregate blocks subgraph G
BLKS
i .
Production ruleP14 exhibits the maximum time complexity for our algorithm executing inO(NiQ4i ) time.
Each <FunctionCall> and each
<Blocks> contains (Qi + 1) solutions. The associated WCET cost function is given by:
ΦBLKS
′
i (ζpred, ζsucc) = minr,s{(ΦFCALLi (ζpred, ζsuccr) + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] +
ΦBLKSi (ζpreds , ζsucc)}
(86)
where ζsuccr , and ζpreds represent the values where the function Φ
BLKS
′
i (ζpred, ζsucc) is minimized and
valid solution combinations are subject to the following constraints:
(ζsuccr + maxδmi ,δni [ξi(δ
m
i , δ
n
i )] + ζpreds) ≤ Qi (87)
δmi ∈ ρsucci (GFCALLi , ζpred, ζsuccr) (88)
δni ∈ ρ
pred
i (G
BLKS
i , ζpreds , ζsucc) (89)
96
The associated preemption point function is given by:
ρBLKS
′
i (ζpred, ζsucc) = ρ
FCALL
i (ζpred, ζsuccr) ∪ ρBLKSi (ζpreds , ζsucc) (90)
The solutions for the aggregate blocks subgraph GBLKS
′
i are created by considering each of the (Qi +
1)2 possible solutions stored in subgraph GFCALLi . For each of the function call subgraph G
FCALL
i
solutions, we iterate through each of the (Qi+1)2 aggregate blockGBLKSi subgraph solutions determining
the minimized cost and preemption solutions. While combining each pair of solutions, the minimum
preemption cost between the combined solutions for subgraphsGFCALLi andG
BLKS
i are selected for each
value of ζpred and ζsucc. The algorithm uses the visible successor preemptions of subgraph GFCALLi and
the visible predecessor preemptions of subgraph GBLKSi . The preemptions for each minimized solution
contain the union of selected preemption solutions for subgraph GFCALLi and subgraph G
BLKS
i .
Theorem 10. Given Φi and ρi functions for each substructure of BLKS where each ρAi (ζpred, ζsucc)
represents a feasible solution for substructure A given preemptions ζpred before, ζsucc after, and ΦAi is a
safe upper bound on the total WCET and preemption cost of that solution. Applying production P14 over
a feasible Gi, GBLKSi and Qi results in a feasible solution ρ
BLKS
i and a safe upper bound Φ
BLKS
i given
by Equations 86, 87-89, and 90 respectively.
Proof. The proof is by direct argument. We need to prove that our solution ensures that the task level
Qi constraint is not violated and the cost function ΦBLKS
′
i (ζpred, ζsucc) results in a safe upper bound.
To prove the Qi constraint is not violated, we must show 1) the non-preemptive execution time of the
combined solutions does not exceed Qi at each solution interface, and 2) the non-preemptive execu-
tion time of the combined solution at the new predecessor and successor interfaces does not exceed Qi.
Let ΦFCALLi (ζpred, ζsuccs) with ζpred,ζsuccs ∈ [0 . . . Qi] represent a safe upper bound cost solution for
subgraphGLOOPi , with its corresponding set of selected preemption points denoted by ρ
FCALL
i (ζpred, ζsuccs)
be a limited preemption execution safe upper bound cost solution for subgraphGFCALLi . We make an iden-
tical statement for subgraph GBLKSi , whose cost function is denoted Φ
BLKS
i (ζpredu , ζsucc), and whose
set of selected preemption points are denoted ρBLKSi (ζpredu , ζsucc). Since we have a safe upper bound
cost solution for each of the combined subgraphs, we can conclude that ΦBLKS
′
i (ζpred, ζsucc) compu-
ted in Equation 86 represents a safe upper bound cost solution for the concatenated series subgraphs
GFCALLi ∪ GBLKSi with its corresponding selected preemption points denoted by ρBLKS
′
i (ζpred, ζsucc)
and computed in Equation 90. Condition 1 is met in accordance with Equations 87-89 whose purpose is to
ensure the non-preemptive execution time of the combined solutions does not exceed Qi at each solution
interface. Condition 2 is met per the definition of the parameters ζpred, and ζsucc respectively, whose range
97
is given by [0 . . . Qi]. Thus, the problem finds a feasible safe upper bound cost preemption points solution
when applying production P14.
The following real-time code example exemplifies the application of production rule P14:
<SB>←

grammar representation : (i400024, 4)
instruction code : 0x400024 lw v1, 8(sp)
WCET : 4 cycles

Once the production rule for the single block sub-component has been applied, it is subsequently aggregated
into a <Blocks> structure as follows:
<Blocks>← [ <SB> ]
The next instruction in the sequence will be parsed as a function call block <FunctionCall> structure as
follows:
<FunctionCall>←

grammar representation : (i400018, 4)
(i40001c, 4)
(i400020, 4)
Initialize
instruction code : 0x400018 addiu sp, sp,−24
0x40001c sw ra, 20(sp)
0x400020 jal 400044
function name : Initialize

Once the production rule for the blocks <Blocks> structure has been applied, it along with the function
name <FunctionName> are subsequently aggregated into a <FunctionCall> block structure as follows:
<FunctionCall>← [ <Blocks> <FunctionName> ]
Once the production rule for the loop block <Loop> sub-component has been applied, it along with the
previous blocks <Blocks> structure are subsequently aggregated into a <Blocks> structure as follows:
<Blocks>← <FunctionCall> <Blocks>
For convenience, Figure 64 summarizes production rules P11− P14.
98
Production Ƥ11:  
<Function>                       ←   [ <Blocks>                             <FunctionName> ] 
 
                                                                                                         Initialize 
 
 
Production Ƥ12:  
<FunctionCall>               ←    [ <Blocks>                             <FunctionName> ] 
                                                                                                            Initialize 
 
Production Ƥ13:  
<Blocks>                          ←       [ <FunctionCall> ] 
 
 
 
Production Ƥ14:  
<Blocks>                                      ←        < FunctionCall >        <Blocks>           
 
 
 
Figure 64: Production Rules P11− P14.
The complexity of supporting non-inline functions arises from the requirement that a single function defi-
nition preemption solution be computed for all function invocations. This requirement does not mesh well
with the production rule grammar approach used here and requires specialized methods for computing the
preemption solution for each function definition. We use the term inline function to describe that each
function invocation combined with its corresponding function definition potentially results in a different
preemption solution, essentially behaving as if the function were actually inlined. The use of the notion of
inline functions is a limitation in all conditional real-time preemption placement approaches. The problem
of solving non-inline functions is a topic to be addressed in future work.
Interdependent CRPD Solution Handling
One of the primary motivations for our work is the interdependent CRPD cost model, which necessi-
tates a series of modifications to the conditional PPP algorithm for proper solution handling.
99
Algorithm 4 Visible Section Pred. Preemptions
1: function vis pred pps sect(ςcurr, ρ, ρcall)
2: δlefti ← ςcurr.leftmostBB
3: δrighti ← ςcurr.rightmostBB
4: for δcurri ∈ [δ
right
i , δ
left
i ] do
5: if δcurri ∈ ρ then
6: ρcall ← ρcall ∪ δcurri
7: Exit the For Loop.
8: end if
9: end for
10: return ρcall
11: end function
Algorithm 5 Visible Section Successor Preemptions
1: function vis succ pps sect(ςcurr, ρ, ρcall)
2: δlefti ← ςcurr.leftmostBB
3: δrighti ← ςcurr.rightmostBB
4: for δcurri ∈ [δ
left
i , δ
right
i ] do
5: if δcurri ∈ ρ then
6: ρcall ← ρcall ∪ δcurri
7: Exit the For Loop.
8: end if
9: end for
10: return ρcall
11: end function
Algorithm 6 Visible Current Pred. Preemptions
1: function vis curr pred pps(ςcurr, ςstart, ρ, ρprev)
2: αsections ← true
3: ρcall ← ∅
4: ρnext ← ∅
5: ρprev ← vis pred pps sect(ςcurr, ςstart, ρprev)
6: if ρprev 6= ∅ then
7: ρcall ← ρcall ∪ ρprev
8: end if
9: if ςcurr 6= ςstart then
10: for ςnext ∈ ςpred(ςcurr) do
11: ρsect ←
12: vis pred pps sect(ςnext, ςstart, ρ, ρcall)
13: if ρsect = ∅ then
14: αsections ← false
15: else
16: ρnext ← ρnext ∪ ρsect
17: end if
18: end for
19: if αsections = false then
20: ρnext ← ρnext ∪ ρcall
21: end if
22: else
23: ρnext ← ρcall
24: end if
25: return ρnext
26: end function
Algorithm 7 Visible Current Successor Preemptions
1: function vis curr succ pps(ςcurr, ςstart, ρ, ρnext)
2: αsections ← true
3: ρcall ← ∅
4: ρprev ← ∅
5: ρnext ← vis succ pps sect(ςcurr, ςstart, ρnext)
6: if ρnext 6= ∅ then
7: ρcall ← ρcall ∪ ρnext
8: end if
9: if ςcurr 6= ςend then
10: for ςprev ∈ ςsucc(ςcurr) do
11: ρsect ←
12: vis succ pps sect(ςprev, ςend, ρ, ρcall)
13: if ρsect = ∅ then
14: αsections ← false
15: else
16: ρprev ← ρprev ∪ ρsect
17: end if
18: end for
19: if αsections = false then
20: ρprev ← ρprev ∪ ρcall
21: end if
22: else
23: ρprev ← ρcall
24: end if
25: return ρprev
26: end function
One challenge interdependent CRPD presents (that independent CRPD does not) is the preemption cost
cannot be determined when the preemption solutions for basic blocks are processed using production rule
P1 since the successor preemption is not known. Interdependent CRPD costs may only be determined for
preemption pairs in contrast to independent CRPD, a function of a single preemption location only.
100
Algorithm 8 Visible Predecessor Preemptions
1: function vis pred pps(β, ρ)
2: Solution block CFG β, preemption solution ρ
3: ρprev ← ∅
4: ςstart ← β.startSection
5: ςend ← β.endSection
6: ρvis ←vis curr pred pps(ςend, ςstart, ρ, ρprev)
7: return ρvis
8: end function
Algorithm 9 Visible Successor Preemptions
1: function vis succ pps(β, ρ)
2: Solution CFG block β, preemption solution ρ
3: ρnext ← ∅
4: ςstart ← β.startSection
5: ςend ← β.endSection
6: ρvis ←vis curr succ pps(ςend, ςstart, ρ, ρnext)
7: return ρvis
8: end function
This means we have to determine the maximum preemption cost for pairs of solutions that are combi-
ned as higher-level block structures are processed. To accomplish this, we must have a way of determining
the set of preemption points that are visible externally to adjacent solutions in order to compute the max-
imum preemption cost of the combined solutions. A preemption point has external visibility to adjacent
blocks if there exists a path from the starting or ending section block to the preemption point with no
intervening preemption points encountered. Determining the visible preemption points while combining
preemption solutions adds an O(Ni) factor to the algorithm for the current block.
Algorithm 8 illustrates the method of computing the visible predecessor preemption points for an ex-
isting solution. For the visible predecessor preemption points, we start with the ending section of the
block and work our way backwards towards the starting section of the block. As we move backward,
the preemption points encountered replace those in the current set if all sections have preemption points.
If not, then the existing preemption points are still visible and copied to the preemption set for the next
iteration. This is shown in Algorithm 6. For each section block processed, the first successor preemption
point encountered moving from right to left is added to the preemption point set as shown in Algorithm
4. Computing the visible successor preemption points works in an identically symmetric way using three
similar algorithms.
Algorithm 9 illustrates the method of computing the visible successor preemption points for an exis-
ting solution. For the visible successor preemption points, we start with the starting section of the block
and work our way forwards towards the ending section of the block. As we move forward, the preemption
points encountered replace those in the current set if all sections have preemption points. If not, then the
existing preemption points are still visible and copied to the preemption set for the next iteration. This
is shown in Algorithm 7. For each section block processed, the first predecessor preemption point en-
countered moving from left to right is added to the preemption point set as shown in Algorithm 5. For
convenience, Table 5 summarizes the terminology presented in this section.
101
Term Description
a Current conditional section index
αsections All sections containing preemptions
δlefti Left most basic block of the current block
δrighti Right most basic block of the current block
G Production rules grammar
L(G) Textual language recognized by grammar G
ζpred Predecessor non-preemptive execution
ζsucc Successor non-preemptive execution
MaxIter Maximum number of loop iterations
ρi Preemptions for task τi
ρAi (G
A
i , ζpred, ζsucc) Preemptions for single subgraph G
A
i solution
ρcall Visible section preemptions
ρnext Visible successor preemptions
ρpredi Visible predecessor preemptions concept
ρpredi (G
A
i , ζpred, ζsucc) Visible predecessor preemptions for single subgraph G
A
i solution
ρprev Visible predecessor preemptions
ρsect Visible section preemptions
ρsucci Visible successor preemptions concept
ρsucci (G
A
i , ζpred, ζsucc) Visible successor preemptions for single subgraph G
A
i solution
ρvis Visible predecessor/successor preemptions
r Number of blocks in conditional
s Current graph solution index variable
ςcurr Current linear section variable
ςend Current graph ending linear section variable
ςnext Next linear section variable
ςpred(ςcurr) Predecessor sections of section ςcurr
ςstart Current graph starting linear section variable
ςsucc(ςcurr) Successor sections of section ςcurr
t Current graph solution index variable
u Current graph solution index variable
Φi(Gi, ρi) WCET + preemption cost for task τi graph Gi
ΦAi (G
A
i , ζpred, ζsucc) WCET + preemption cost for single subgraph G
A
i preemption solution
Table 5: Preemption Placement Terminology
Evaluation
Our conditional PPP algorithm will be evaluated using two methods: 1) characterization and measure-
ment of preemption costs using real-time application code, and 2) a breakdown utilization schedulability
comparison of various PPP and fully preemptive CRPD algorithms. Each PPP algorithm evaluated either
uses an independent or interdependent CRPD cost model.
Preemption Cost Characterization
Our study will utilize a subset of real-time tasks from the Malardalen University (MRTC) WCET
benchmark suite [51] for comparing various PPP algorithms. The task code was compiled using the GCC
102
MIPS Cross Compiler for MIPS series processors with separate instruction and data 1KB direct-mapped
caches with a line size of 32 bytes with 32 cache blocks. Tasks are scheduled using Earliest Deadline First
(EDF) scheduling. Our method also supports fixed priority scheduling (FPS).
The compiled real-time task code was processed by the Heptane Static WCET analysis tool [34].
Heptane is used to determine the set of section blocks, the section block WCETs, the CFG structure, and
the cache state at each instruction by analyzing the program executable. The analysis results are then
imported into a Java benchmark parser program, designed to generate the task code grammar used by
our conditional PPP algorithm. To accomplish this, high level programming constructs such as loops,
conditionals, functions, and block statements must be recognized from the low-level compilation output.
Once the benchmark CFG structure has been constructed, the results of the Heptane cache state analy-
sis are imported and used to compute the instruction and data UCBs (ΥI(δ
j
i ) and ΥD(δ
j
i ) respectively) at
each program location. These sets are then used to compute the shared LCBs, along with the interdepen-
dent preemption cost matrix.
The intersection of the cache state snapshots from δji to δ
k
i are used to calculate shared LCBs. Shared
LCBs represent the set of cache lines whose contents remain un-evicted after execution of basic blocks
{δj+1i , δ
j+2
i , ..., δ
k
i }. As such, shared LCBs will continue to be present in the cache prior to the execution
of basic block δk+1i . Thus, a safe upper bound on the LCBs shared between each basic block pair can be
represented by the set of unchanged cache lines. The following equation below formalizes this computation
where the instruction cache snapshots are denoted ΥI(δ
j
i ) and the data cache snapshots denoted ΥD(δ
j
i ).
LCB(δji , δ
k
i ) ⊆
k⋂
m=j+1
[
ΥI(δ
m
i ) ∪ΥD(δmi )
]
(91)
Availability
The following tools and data sets may be used to verify and reproduce our work. The MIPS GCC
cross compiler and the Heptane static worst-case execution time tool are freely available. The research
community may reproduce and leverage our work via the developed programs and analyzed data archived
at GitHub [23].
Results
The results are presented as an illustration of the potential benefit of our proposed method, utilizing
pairs of preemptions to determine costs, over methods that consider only the maximum CRPD at a parti-
cular preemption point (e.g., [14] and [58]). In terms of LCB computation this implies that the maximum
103
LCB value over all subsequent program points must be used as the CRPD cost:
max{LCB(δji , δ
k
i ) | δ
j
i  δ
k
i } (92)
The interdependent CRPD approach, representing CRPD cost for pairs of preemptions, is illustrated
in the following graphs. The graph lines shown characterize the minimum and maximum shared LCBs
between program points. The x-axis represents the first program preemption point, denoted δji . The y-axis
measures the shared LCB count with the secondary program point, denoted δki , annotated at each graph
point as shown. The first graph shown in Figure 65 characterizes the instruction cache CRPD costs for
the FFT benchmark program. Each point on both curves plots the minimum or maximum CRPD value
for the first preemption point given by the x-axis, and the next preemption point annotated on the graph.
At program point δ1i , the minimum CRPD value is coupled with program point δ
8
i having a shared LCB
count of 175 whereas the single-valued CRPD computation method finds 250 shared LCBs coupled at pro-
gram point δ5i . Comparison of the dual-valued interdependent CRPD with the single-valued independent
 160
 180
 200
 220
 240
 260
 280
 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
08
08 08 08 08 08
09
13 13
13 13 13 14
Interdependent CRPD
02 03 04 05 06 07 08 09 10 11 12 13 14
Independent CRPD
Figure 65: FFT Instruction Cache.
CRPD methods can be visualized by comparing the vertical distance between the minimum and maximum
CRPD curves of Figures 65 through 70. The maximum CRPD cost at any program location represents a
mandatory safe value for any single-valued independent CRPD approach. In contrast, our interdependent
CRPD method offers the potential minimum value reported on the solid line. The benefit provided in
considering location aware interdependent CRPD cost is captured by the difference between the minimum
104
 0
 5
 10
 15
 20
 25
 30
 0  1  2  3  4  5  6  7  8  9  10  11
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
03
07
10
07 07 09 09 10
10
Minimum
02
03
04
05 06
07
08
09
10
Maximum
Figure 66: Recursion Data Cache.
and maximum CRPD cost curves fueling the improved performance of our conditional PPP algorithm.
The variability in the minimum and maximum CRPD costs further exemplifies the benefits as illustrated
in the second and third graphs, representing the lms benchmark task instruction cache in Figure 67 and
the cover benchmark task instruction cache in Figure 69 respectively. In this paper, we have presented
the variability witnessed in the instruction cache graphs, as the conditional CFG structure emphasizes the
instruction cache effect on CRPD. Maximum and minimum instruction and data cache costs for the other
MRTC tasks exhibit similar variability.
 50
 100
 150
 200
 250
 300
 0  20  40  60  80  100  120  140  160  180
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Interdependent CRPD
Independent CRPD
Figure 67: LMS Instruction Cache.
105
 0
 10
 20
 30
 40
 50
 60
 70
 80
 0  20  40  60  80  100  120  140  160  180
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Minimum
Maximum
Figure 68: LMS Data Cache.
 0
 50
 100
 150
 200
 250
 0  50  100  150  200  250
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Interdependent CRPD
Independent CRPD
Figure 69: Cover Instruction Cache.
Review of the instruction and data cache graphs led to some notable observations. The maximum
and minimum LCBs converged towards the end of each tasks CFG due to the decreasing CFG structure
remaining thereby reducing the LCB count variability. Downward spikes are well aligned with task block
boundaries, such as loops, conditionals, and functions. Early upward trends result from task initialization
code. The separation between the two curves illustrates the accuracy improvement of our interdepen-
dent CRPD method versus independent CRPD methods. The interdependent CRPD costs for each graph
are identical to the independent CRPD costs at the edges which coincide with the end of the task code.
Our proposed PPP algorithm utilizes the more accurate interdependent CRPD cost leading to substantial
106
 0
 10
 20
 30
 40
 50
 60
 70
 80
 0  50  100  150  200  250
U
C
B
s
 S
h
a
re
d
 w
it
h
 S
u
b
s
e
q
u
e
n
t 
P
ro
g
ra
m
 P
o
in
t
Program Point
Maximum
Minimum
Figure 70: ADPCM Data Cache.
schedulability improvements.
Breakdown Utilization
Now that the reduced task preemption overhead benefits of the more precise interdependent CRPD
cost model has been presented, we turn our attention to the benefits in task set schedulability. To evaluate
task set schedulability, breakdown utilization performance was compared for several PPP algorithms on
selected MRTC benchmark [51] tasks. The goal of breakdown utilization analysis [47] is to determine the
utilization at which a task set becomes un-schedulable. The PPP algorithms compared in our study include
the BEPP algorithm [14], our linear PPP (LEPP) algorithm [25], the PEPP conditional algorithm [58], and
our proposed conditional PPP algorithm (CEPP). We compare these PPP algorithms against several fully
preemptive CRPD approaches that have been integrated into task set response time analysis, namely JCR,
ECB Only, UCB Only, ECB Union, UCB Union, ECB Union Multi Set, and UCB Union Multi Set [7].
The detailed steps of our iterative schedulability algorithm integrated with our conditional PPP algo-
rithm are described in Algorithm 10. Task set utilization, given by U , is controlled by setting each tasks
deadline and period to Di = Ti = u · CNPi . The constant, u, is binary search incremented in small steps
until the task set becomes schedulable. Then u is binary search decremented in small steps until the task
set becomes un-schedulable, resulting in the breakdown utilization UB . ForBEPP and PEPP , the max-
imum shared LCB counts previously obtained form the independent CRPD input. Lastly, for LEPP and
CEPP , the explicit shared LCB counts previously obtained comprise the interdependent CRPD input for
our linear and conditional explicit PPP algorithms. The remaining input variables required by the break-
107
down utilization algorithm are CNPi and BRT . C
NP
i was computed as the maximum number of cycles to
complete task execution non-preemptively. During each run of the breakdown utilization study, the BRT
parameter is swept from 1 ns to 640 ns per MIPS processor family performance.
Algorithm 10 Breakdown Utilization Evaluation Algorithm
1: Start with a task system that may or may not be feasible.
2: Assume the CRPD of the task system is initially zero.
3: repeat
4: Run the Iterative Schedulability and PPP Algorithm
5: if the task system is feasible/schedulable then
6: Increase U by decreasing Ti values via binary search.
7: else
8: Decrease U by increasing Ti values via binary search.
9: end if
10: until the utilization change is less than some tolerance.
11: The breakdown utilization is given by U.
Due to compiler optimizations, we had to post-process the MRTC tasks to apply our grammar. Post
processing was required when the compiler generated non-structured assembly code not conforming to
our grammar G. This task consisted of transforming the assembly code into an equivalent structured form.
Ideally, an automated tool would exist for this step; however, such a tool is beyond the scope of this paper
due to the explicit compiler-level detail required. For this paper, we manually post-processed the following
MRTC tasks: simple, bs, fibcall, lcdnum, sqrt, qurt, insertsort, ns, ud, crc, expint, jfdctint, matmult, and
bsort100 [23].
Using the completed MRTC tasks, the breakdown utilization comparison between various PPP met-
hods is summarized in Figure 71. The breakdown utilization results indicate that the CEPP (resp.
LEPP ) algorithm dominates the PEPP (resp. BEPP ) algorithm primarily due to the benefits of
interdependent versus independent CRPD. Both the CEPP and PEPP algorithms dominate LEPP
and BEPP algorithms due to the enhanced granularity of the conditional CFGs, offering more possi-
ble preemptions than the linear CFGs. As expected, the breakdown utilization values converge for each
distinct graph structure (e.g. linear, conditional) as cache-overhead becomes negligible for small BRT
values. The conditional methods converge, and the linear methods converge. Specifically, the size of the
individual blocks in the linear approaches are higher and hence are less schedulable than the finer grained
conditional approaches. The performance of the CRPD response time analysis methods is generally ham-
pered into two respects, namely, 1) they take a coarser view accounting for an entire task of UCBs and
ECBs on every preemption, and 2) the number of higher priority task preemptions in the response time of
108
Figure 71: Breakdown Utilization Comparison.
preempted tasks is not limited in accordance with the inherent slack in the system. The ECB Union Multi
Set and UCB Union Multi Set approaches improve the pessimism in terms of the number of preemptions
as compared to the other CRPD methods, however the system slack is not considered. Improvements are
also realized in the accounting of UCBs and ECBs, however, they are encumbered by the pessimism of the
task view of UCBs and ECBs.
We also compared the various preemption placement algorithms against the corresponding fixed pri-
ority (FP) preemption threshold scheduling (FPTS) methods, namely, ECB Only, UCB Only Multi Set,
ECB Union Multi Set, and UCB Union Multi Set [18, 19] as shown in Figure 72. Here we compare all
CRPD methods directly without any further task layout improvements using simulated annealing. Compa-
red to the traditional CRPD approaches, the corresponding FPTS methods realize substantial improvement
due to 1) reduced preemption exposure to higher priority tasks, and 2) reduced job execution time frame
known as the hold time where higher priority tasks may preempt. However, FPTS shares a similar trait
in its pessimistic task view of UCBs and ECBs thereby limiting performance. For convenience, Table 6
summarizes the terminology presented in this section.
Term Description
ΥD(δ
m
i ) Data cache snapshot at basic block δ
m
i
ΥI(δ
m
i ) Instruction cache snapshot at basic block δ
m
i
Table 6: Evaluation Terminology
109
Figure 72: FPTS Breakdown Utilization Comparison.
Summary
In this work, we presented a conditional PPP algorithm using a more precise interdependent CRPD
metric. By extending the interdependent CRPD cost to conditional CFG structures, further reductions
in task preemption overhead were realized, leading to substantial schedulability improvements. These
improvements were achieved by integrating our conditional interdependent CRPD PPP algorithm with
algorithms from well-established task set schedulability theory. Our iterative schedulability algorithm
demonstrates the convergence of selecting preemptions balanced by the task maximum non-preemptive
execution region constraint Qi. Our experiments demonstrated improved schedulability on real-time code
using interdependent CRPD.
In future work, we plan to 1) extend the breakdown utilization analysis to the remaining MRTC bench-
mark tasks, 2) perform a timing analysis of various PPP algorithms, 3) add support for non-inline functi-
ons, 4) add support for goto statements, 5) extend the CRPD techniques described here to set-associative
caches, and 6) conduct an in-depth case study on commercial real-time code.
110
CHAPTER 7 INTEGRATING PREEMPTION THRESHOLDS WITH
LIMITED PREEMPTION SCHEDULING
In Chapter 6, we introduced innovative methods for computing accurate CRPD, integrating the en-
hanced CRPD with EDF schedulability analysis, and placing optimal preemption points to ensure taskset
schedulability for conditional control flow graphs in uniprocessor systems. In this chapter, we present our
research for integrating limited preemption scheduling with preemption threshold scheduling by combi-
ning preemption placement with optimal threshold assignment.
Introduction
The utility of real-time system computations depends on two important properties, correctness and
timeliness. The timeliness property (the subject of schedulability analysis) is concerned with ensuring
real-time task computations are completed within required deadlines. Designers of real-time systems must
choose the scheduling paradigm that will ultimately determine if the real-time task set will meet its time-
liness objectives. The available choices are 1) non-preemptive scheduling, 2) fully preemptive scheduling,
and 3) limited preemption scheduling. Non-preemptive scheduling suffers from blocking of high priority
tasks and fully preemptive scheduling suffers from substantial preemption overhead (up to 44% [55–57]
of a tasks WCET) each approach degrading task set schedulability. Limited preemption scheduling at-
tempts to 1) reduce blocking by limiting the number of allowed preemptions, maximizing non-preemptive
task execution and 2) reduce preemption overhead via non-preemptive regions. Regardless of the chosen
scheduling paradigm, effective schedulability analysis of real-time task sets mandates accurate WCET and
CRPD estimates. The recognized benefits of limited preemption scheduling have motivated recent work
on PPP algorithms. In a complimentary fashion, preemption threshold scheduling assigns preemption
thresholds allowing tasks to execute non-preemptively where feasible. The two methods take advantage
of inherent task set execution slack to enhance schedulability. The primary difference arises in the non-
preemptive granularity. Limited preemption scheduling divides each task into non-preemptive chunks
whereas preemption threshold scheduling facilitates non-preemptive execution at the task level.
The importance of CRPD in schedulability analysis stems from it comprising the majority of preemp-
tion overhead. CRPD occurs when a task denoted τi is preempted by one or more higher priority tasks
denoted τk. The execution of high priority tasks results in the eviction of cache memory blocks that must
be subsequently reloaded when task τi resumes execution. Two primary models of CRPD computation
exist, 1) the independent CRPD cost model, and 2) the interdependent CRPD cost model. The vast majo-
rity of CRPD research falls under the independent CRPD model. Here, costs are solely a function of the
preemption location under consideration. Since the next preemption may occur at any forward point in the
111
task code, independent CRPD methods must conservatively utilize the next code location corresponding to
the maximum CRPD cost. The interdependent CRPD cost model, however, overcomes this limitation by
considering and computing costs between each pair of task code locations thereby achieving more accu-
racy. A key factor in scheduling decisions, CRPD cost accuracy is of paramount importance to preemption
placement, optimal threshold assignment, and schedulability algorithms.
PPP algorithms select preemption points for each task to 1) minimize the overall task WCET, and 2)
ensure the execution time between adjacent preemptions is limited by the maximum non-preemptive re-
gion execution time. The maximum non-preemptive region execution time, denoted Qi, is determined via
task set schedulability analysis. The motivation behind our work is the potential benefits of preemption
threshold scheduling can be integrated with existing PPP algorithms to further enhance task set sche-
dulability. Our approach effectively integrates limited preemption scheduling via preemption placement
with preemption threshold scheduling via optimal preemption threshold assignment resulting in improved
schedulability. The benefits of our approach will be illustrated in a case study employing synthetically
generated real-time tasks.
Schedulability Analysis
In this section, analysis of FP [72], and PTS [18, 19] limited preemption scheduling, summarizes
the computation of the maximum non-preemptive region parameter Qi supporting conditional and linear
preemption point placement algorithms.
The goal of schedulability analysis is to determine whether a taskset is schedulable under the worst-
case task activation pattern for a particular scheduling paradigm. The worst-case activation pattern for
task τi, known as the critical instant, results in the maximum response time. Earlier work [43] proved the
critical instant for each task coincides with the synchronous activation of the task with all higher priority
tasks where all jobs released immediately in accordance with the minimum inter arrival time.
Limited Preemption FP Scheduling
For limited preemption FP scheduling the set of higher priority tasks is represented by:
k ∈ hp(i) = {k | πk > πi} (93)
where πj represents the assigned nominal priority for task τj . To facilitate the schedulability analysis
for FP limited preemption scheduling, the request bound function [42] RBF (t) is used to examine the
112
maximum cumulative execution request in an interval of length t generated by jobs of τi.
RBFi(t) =
⌈ t
Ti
⌉
(CNPi + γi) (94)
where γi denotes the preemption cost due to preemption by higher priority tasks hp(i) during execution of
task τi. Starting with the critical instant, the cumulative execution request in an interval t for task τi and
all higher priority FP tasks is given by:
Wi(t) =
∑
j∈i,hp(i)
RBFj(t) (95)
One way to look at this limited preemption FP schedulability constraint is to characterize the amount of
blocking tolerance βi that a task τi can withstand while meeting its deadlines.
βi = max
t∈A|t<Di
{
t−
∑
j∈hp(i),i
RBFj(t)
}
(96)
where A = {mTj ,m ∈ N, 1 ≤ j < n}. We can also characterize the blocking factor Bi each task τi
experiences which is given by:
Bi = max
l∈lp(i)
{qmaxl } (97)
where the set of lower priority tasks lp(i) is given by:
l ∈ lp(i) = {l | πl < πi} (98)
Characterized by the non-preemptive regions inherent to limited preemption scheduling, the analysis must
take into account the longest NP region in the lower priority tasks. The maximum NP region in task τi is
given by:
qmaxi ≤ Qi = min
h∈hp(i)
βh (99)
Limited Preemption Threshold Scheduling
For PTS scheduling, the set of higher priority tasks is represented by two different conditions: 1) when
a job completes, and the scheduler must select the next task job to execute, and 2) when a higher priority
task preempts an existing task during its execution at the end of a non-preemptive region (NPR). The first
condition is given by Equation 93. The second condition is given by:
k ∈ ht(i) = {k | θk > πi} (100)
113
In PTS, the tasks that may preempt task τi have a nominal priority θk exceeding the computed preemption
threshold πi. Higher priority tasks that can block task τi are those that cannot be preempted by τi due to
preemption threshold assignments is given by:
bh(i) = hp(i) \ ht(i) (101)
The set of lower priority tasks that may block task τi in cases where they are activated just before task τi
is given by:
bl(i) = lp(i) \ lt(i) (102)
where the set of lower priority tasks task τi can preempt is given by:
l ∈ lt(i) = {l | θl < πi} (103)
To facilitate the schedulability analysis for PTS scheduling, the request bound function [42] RBF (t) is
also used to examine the maximum cumulative execution request in an interval of length t generated by
jobs of τi per Equation 94. Starting with the critical instant, the cumulative execution request in an interval
t for task τi and all higher priority ht(i), higher priority blocking bh(i) and lower priority blocking bl(i)
FPTS tasks is given by:
Wi(t) =
∑
j∈ht(i),bh(i),i
RBFj(t) + max
l∈bl(i)
RBFl(t) (104)
Like FP scheduling, we can look at this limited preemption PTS schedulability constraint by characterizing
the amount of blocking tolerance βi that a task τi can withstand while meeting its deadlines.
βi = max
t∈A|t<Di
{
t−
∑
j∈ht(i),bh(i),i
RBFj(t)− max
l∈bl(i)
RBFl(t)
}
(105)
whereA is identical to the FP scheduling expression. We can also characterize the blocking factorBi each
task τi experiences which is given by:
Bi = max( max
l∈lp(i)
{qmaxl }, max
b∈bl(i)
{RBFb(t)}) (106)
Characterized by the non-preemptive regions inherent to limited preemption scheduling, the analysis must
take into account the longest NP region in the lower priority tasks. The maximum NP region in task τi is
114
given by:
qmaxi ≤ Qi = min
h∈ht(i)
βh (107)
Limited Preemption Scheduling Analysis
Since the lowest priority task τm is subjected to no blocking, by convention we have Bm = 0. We
can use the schedulability analysis derived for floating non-preemptive regions [15] to establish task set
schedulability in each case:
Wi(t) +Bi ≤ t, ∀ t ∈ mTi, m ∈ N (108)
With suitable expressions for βi introduced, we can express the task set limited preemption schedulability
constraint for FP, and PTS scheduling as summarized in Theorem 11 [14].
Theorem 11. A task set τ is schedulable with limited preemption scheduling if, ∀i | 1 ≤ i ≤ n,
Bi ≤ βi (109)
We can restate the taskset schedulability constraint in terms of the maximum non-preemptive region para-
meter Qi via Theorem 12 [14].
Theorem 12. A task set τ is schedulable with limited preemption scheduling if, ∀i | 1 < i ≤ n,
qmaxi ≤ Qi (110)
In summary, each task is permitted to execute non-preemptively for a maximum amount of time denoted
by Qi thereby ensuring taskset schedulability as long as preemption points are placed such that all non-
preemptive regions are less than or equal to Qi.
Preemption Placement Objective
Using the series-parallel graph structure, the objective of any preemption placement algorithm is to
select a single set of effective preemption points ρi that minimizes the WCET+CRPD of each task whose
real-time conditional code is given by graph Gi, subject to the constraint that all non-preemptive regions
must be less than or equal to the maximum allowable non-preemptive region parameter Qi. This objective
is formally stated as:
Preemption Placement Objective:
Given a real-time conditional flow graph Gi ∈ L(G), an independent or interdependent CRPD cost
function ξi(δxi , δ
y
i ) and WCET b
j
i for each basic block, find a set of Effective Preemption Points (EPPs)
115
ρi ⊆ E that minimizes the cost function:
Φi(Gi, ρi)
def
= max
p∈Pi(Gi,δsi ,δei )
[
∑
δxi ∈p
bxi +
∑
(δx
i
,δ
y
i
)∈p,ρi
δxi ≺pδ
y
i
ξi(δ
x
i , δ
y
i )] (111)
subject to the constraint ∀p ∈ Pi(Gi, δsi , δei ), δwi ∈ ρi,∃e
u,v
i = (δ
u
i , δ
v
i ), e
x,y
i = (δ
x
i , δ
y
i )where e
u,v
i , e
x,y
i ∈
ρi ::
[
∑
δw
i
∈p
δui pδwi pδxi
bwi + ξi(δ
u
i , δ
x
i )] ≤ Qi (112)
The cost function Φi(Gi, ρi) evaluates to the maximum cost across all paths p through the task code.
Response Time Analysis
In this section, response time analysis of FP [72], and PTS [18, 19] limited preemption scheduling,
summarizes the computation of the worst-case response time of tasks under each scheduling paradigm.
Response time analysis is used in the PTS optimal threshold assignment algorithm and in checking tas-
kset schedulability as a means of comparing the various scheduling methods in our evaluation. In each
subsection, we begin by computing the time period associated with the worst-case task activation pattern,
known as the level-i active period, for each task τi in the task set τ . For limited preemption FP threshold
scheduling, the worst-case activation pattern occurs when all higher priority tasks are activated coincident
with the activation of task τi with the activation of the blocking task with the largest execution cost just
prior to that of task τi. We subsequently compute the worst-case starting and finishing times in accordance
with the scheduling algorithm. Finally, the worst-case response time is the maximum difference between
the finish time Fi,k any of the k jobs in the level-i active period and the absolute deadline of the kth job.
Limited Preemption FP Scheduling
For limited preemption FP scheduling, the level-i active period is given by:
Li =
∑
j∈hep(i)
Ej(Li) · Φj(Gj , ρj) + max
l∈lp(i)
Ql (113)
where the maxl∈lp(j)Ql represents the worst-case blocking experienced by task τi inherent to limited
preemption scheduling and the term Ej(t) is given by:
Ej(t) = dt/Tje (114)
116
The start time Si,k of the kth job of task τi is given by:
Si,k = k ·
[
Φi(Gi, ρi) + max
l∈lp(i)
Ql
]
+
∑
j∈hp(i)
Ej(Si,k) · Φj(Gj , ρj) (115)
The finish time Fi,k of the kth job of task τi is given by:
Fi,k = Si,k +
[
Φi(Gi, ρi) + max
l∈lp(i)
Ql
]
+
∑
j∈hp(i)
(Ej(Fi,k)− Ej(Si,k)) · Φj(Gj , ρj) (116)
The worst-case response time Ri of task τi is given by:
Ri = max
k∈Ei(Li)
(Fi,k − kTi) (117)
Limited Preemption FP Threshold Scheduling
For limited preemption FP threshold scheduling, the level-i active period is given by:
Li =
∑
j∈hep(i)
Ej(Li) · Φj(Gj , ρj) + max
(
max
l∈lt(i)
Ql, max
b∈bl(i)
Φb(Gb, ρb)
)
(118)
where bl(i) represents the set of tasks that may block task τi which for limited preemption threshold
scheduling is given by Equation 102 and where lt(i) is given by Equation 103. For limited preemption
threshold scheduling, the start time Si,k of the kth job of task τi is given by:
Si,k = k · Φi(Gi, ρi) + max
(
max
l∈lt(i)
Ql, max
b∈bl(i)
Φb(Gb, ρb)
)
+
∑
j∈hp(i)
Ej(Si,k) · Φj(Gj , ρj) (119)
The finish time Fi,k of the kth job of task τi is given by:
Fi,k = Si,k + Φi(Gi, ρi) +
∑
j∈ht(i)
(Ej(Fi,k)− Ej(Si,k)) · Φj(Gj , ρj) (120)
The worst-case response time Ri of task τi is given by Equation 117.
Implementation
In this section, we present the integration of limited preemption FP scheduling with FP preemption
threshold scheduling. The integration consists of an iterative algorithm combining preemption placement
with optimal threshold assignment.
High-Level Overview
Limited preemption scheduling and preemption threshold scheduling are complimentary in that they
use the execution time slack inherent in the task set τ to permit tasks to execute non-preemptively. Limi-
ted preemption scheduling accomplishes this via a task parameter Qi specifying task τi’s maximum non-
117
preemptive execution time within which the task set τ remains schedulable. In essence, limited preemption
scheduling divides each task τi into a sequence of non-preemptive execution regions bounded by Qi. In
preemption threshold scheduling, preemption thresholds are adjusted for circumstances where the execu-
tion slack of task τi can completely subsume the execution time of task τb while still meeting its deadline,
the preemption threshold of task τb is adjusted upwards to prevent it from being preempted by task τi. Each
method attempts to introduce non-preemptive execution into a task set τ commensurate with its real-time
execution characteristics. One of the primary differences in these two methods is the level of granularity
at which the non-preemptive regions are established. Limited preemption scheduling divides each task
into non-preemptive regions or chunks whereas preemption threshold scheduling permits non-preemptive
execution at the task level.
Preemption Placement and OTA Integration
In this section, we present an algorithm that integrates preemption placement with optimal threshold
assignment as summarized in Algorithm 11.
Algorithm 11 Integrated Preemption Placement With OTA
1: Input Task set τ = τ1, ..., τn with {Vi, Ei, CNPi , Ti, Di, πi} ∀τi ∈ τ .
2: Output Task set schedulable, ρi ⊆ Ei, and Θ ∈ Π, ∀τi ∈ τ .
3:
4: for all τi ∈ τ do
5: θi ← π1;
6: Qi ← CNPi ;
7: Ci ← CNPi ;
8: ρi ← ∅;
9: end for
10: for τi ← [τ1, ..., τn] do
11: Compute Li using Equation 121;
12: Compute βi using Equations 122, 123;
13: if βi < 0 then
14: return unschedulable;
15: end if
16: for ∀τl with l ∈ lp(i) do
17: if Ql > βi or Cl > βi then
18: θl ← πi+1;
19: Ql ← min(Ql, βi);
20: ρl ← Perform task τl preemption placement
21: using Ql and the set hep(i) as preempting tasks;
22: Cl ← Φl(Gl, ρl);
23: end if
24: end for
25: end for
26: return schedulable;
118
We use a slightly modified version of Equation 118 to compute the level-i active period term Li given by:
Li =
∑
j∈hep(i)
Ej(Li) · Φj(Gj , ρj) + max
(
max
b∈bl(i)
Cb, max
l∈lt(i)
Ql
)
(121)
We also use a slightly modified version of Equation 106 to compute the task τi blocking tolerance βi term
as given by:
βi =
dLi
Ti
e−1
min
k=0
{βki } (122)
where the term βki is given by:
βki = max
t∈[kTi,kTi+Di]
{
t−
∑
j∈hep(i)
RBFj(t)
}
(123)
The algorithm begins by initializing the variables θi, Qi, Ci, and ρi. We start with setting each task’s
preemption threshold θi to the highest priority with the default state resulting in each task executing non-
preemptively (line 6). Consistent with the non-preemptive execution defaults, we initialize the variables
Qi and Ci to be the task non-preemptive execution time CNPi (lines 7,8). Lastly we initialize the task
preemption solution ρi to be empty set (line 9). At a high level, we process each task in the task set,
computing the level-i active period Li and the blocking tolerance βi (lines 13,14). If the blocking tolerance
is less than zero, we return the task as un-schedulable (lines 16-18). Next we analyze the potential blocking
introduced by all lower priority tasks (line 20). If either the maximum non-preemptive region parameter
Ql or the WCET+CRPD variable Cl exceeds the blocking tolerance of task τi (line 21), then we set the
maximum preemption threshold θl such that task τl cannot block task τi (line 22). We also constrain the
maximum non-preemptive region Ql of task τl to the blocking tolerance βi of task τi (line 23). With the
modified maximum non-preemptive region parameter Ql, we perform preemption point placement using
Ql and Θ with the set hep(i) as the preempting tasks (lines 25,26). Following preemption placement,
we update the WCET+CRPD variable Cl with the computed preemption placement costs (line 28). Once
each task has been considered along with analyzing the potential blocking introduced by lower priority
tasks, we return schedulable (line 33). A proof of correctness and discussion of our integrated preemption
placement and maximum threshold assignment is given in the following subsection.
Integrated PP/OTA Algorithm Proof of Correctness
In this section, we prove the correctness of the integrated preemption placement and maximum preemp-
tion threshold assignment algorithm as detailed in Algorithm 11.
119
Theorem 13. Given a schedulable task set τ , the task set is schedulable with limited preemption scheduling
if ∀i | 1 ≤ i ≤ n,
Ql ≤ βi ∀ l ∈ lp(i) (124)
Proof. The proof is by direct argument. Using the result of Theorem 109, we have:
Bi ≤ βi ∀i | 1 ≤ i ≤ n (125)
Substituting Equation 106 for Bi, we have:
max( max
l∈lp(i)
{qmaxl }, max
b∈bl(i)
{RBFb(t)}) ≤ βi (126)
Substituting Equation 107 for qmaxl , and simplifying the RBF term with Cb, we have:
max( max
l∈lp(i)
{Ql}, max
b∈bl(i)
{Cb}) ≤ βi (127)
The first term represents blocking via limited preemption scheduling with the second term representing
blocking via non-preemptive execution. We can reduce the constraint to the limited preemption scheduling
case since Ql ≤ Cl. Therefore, we have:
Ql ≤ βi ∀l ∈ lp(i) (128)
Thus, proving the schedulability constraint.
We use the schedulability constraint to prove the correctness of Algorithm 11.
Theorem 14. Given a schedulable task set τ , Algorithm 11 assigns the maximum non-preemptive region
set Q and maximum preemption threshold set Θ to tasks achieving schedulability if such an assignment
exists.
Proof. The proof is by induction.
Basis: We start with the highest priority task τ1. Being the highest priority task, τ1 executes completely
non-preemptively and its schedulability can be evaluated using the expression CNPi ≤ Di or βi ≥ 0. If the
condition βi ≤ 0 is true, we return unschedulable (lines 16-18). Otherwise, βi has the computed execution
slack or blocking tolerance with both Q1 and C1 set to the non-preemptive execution time CNPi .
Induction: For each task τi, assuming the previous i − 1 tasks {τ1, τ2, ..., τi−1} are schedulable, each
iteration of the outer loop, results in a schedulable task set {τ1, τ2, ..., τi}. If the task set is schedulable,
then Algorithm 11 assigns Ql and θl ∀l ∈ lep(i) such that {τ1, τ2, ..., τi} is schedulable. For each lower
120
priority task τl, the inner loop evaluates the extent to which τl can execute non-preemptively with respect
to task τi. Two non-preemptive execution possibilities exist: 1) execute completely non-preemptively if
(Cl ≤ βi), or 2) execute in limited preemption fashion (Ql ≤ βi). As required to maintain schedulability,
if either parameter Ql or Cl exceeds task τi blocking tolerance βi, then we must execute using limited
preemption scheduling only. We adjust Ql if needed to ensure task τl blocking imposed on task τi is
bounded by βi (line 23). The θi is correspondingly adjusted to a level where task τi must be able to
preempt task τl. Revisions to Ql and θl are followed by computation of a new preemption solution (lines
25,26). Otherwise if Ql and Cl are already bounded by task τi blocking tolerance βi, then no adjustments
are made and task τl may execute completely non-preemptively with respect to task τi. Thus, with each
iteration of the outer loop, the task set {τ1, τ2, ..., τi} remains schedulable with the assigned maximum
non-preemptive region set Q and maximum preemption threshold set Θ, if such an assignment exists.
Evaluation
Our integrated PPP/OTA algorithm will be evaluated using the following method, namely the average
preemption cost swept across the range of system utilization, and 2) the weighted schedulability ratio as a
function of cache utilization.
Task Set Generation
Our study will utilize synthetically generated task sets of linear CFGs along with task execution para-
meters for comparing various linear PPP algorithms against those augmented with our integrated PPP/OTA
algorithm. Each experiment generates 1000 different task sets. Target task utilization is swept in accor-
dance with the breakdown utilization algorithm. The number of tasks is randomly selected using a uniform
distribution in the interval [8, 15]. The number of basic blocks in each task is randomly selected using a
uniform distribution in the interval [20, 200]. The number of instructions in each basic block is randomly
selected using a uniform distribution in the interval [10, 100]. The WCET of each basic block is randomly
selecting using a gaussian distribution with µ = 200 and σ = 10. The absolute value of all genera-
ted WCET values are used. Our system employs a separate instruction and data direct mapped caches
containing NC = 1024 cache blocks with a total cache utilization UC = 4. The cache block size s is
randomly generated using a uniform distribution selected from the set {8, 16, 32, 64, 128} bytes. The total
cache memory size is given by the expression NC · s. The total number of task set ECBs is given by
NC · UC = 4096. Individual task cache utilization UCi is generated via the UUniFast algorithm [16].
Tasks occupy contiguous memory ordered from highest priority to lowest priority. Individual instruction
ECBs are determined from the generated memory layout. Individual data ECBs are generated via a two-
pronged approach. Within each basic block, a median RAM memory location MA is generated using a
121
uniform distribution over the RAM memory space. The RAM memory accessed by each instruction is
generated using a gaussian distribution with µ = MA and σ = 512 bytes. Task data UCBs have a targeted
overlap of 40% with its data ECBs. Task periods Ti are generated using a uniform distribution in the in-
terval [10, 1000] mS. Arbitrary task deadlines Di are generated using a uniform distribution in the interval
[(Ci+Ti)/2, 4Ti]. Individual task utilization are generated via the UUniFast algorithm [16]. Nominal task
priorities Π are assigned in descending order commensurate with a Deadline Monotonic policy.
Availability
The research community may reproduce and leverage our work via the developed programs and ana-
lyzed data archived at GitHub [23].
Results
The results are presented as an illustration of the potential benefit of our integrated PPP/OTA algorithm
combining limited preemption scheduling with preemption threshold scheduling, in comparison to existing
limited preemption methods.
System Utilization Schedulability
In this section, we study the performance of limited preemption scheduling using preemption pla-
cement only as a baseline, as compared to preemption placement integrated with preemption threshold
scheduling via optimal threshold assignment. We perform a breakdown utilization comparison between
the two approaches along with a comparison of the average preemption cost swept across the system utili-
zation range for schedulable task sets. By comparing these two methods we illustrate the superiority of the
integrated preemption placement/preemption threshold scheduling algorithm versus limited preemption
scheduling via preemption placement only. Figure 73 illustrates the breakdown utilization comparison.
The limited preemption scheduling via preemption placement is denoted as LEPP whereas the integrated
preemption placement/preemption threshold scheduling method is denoted as LEPPOTA.
122
Figure 73: Breakdown Utilization Comparison.
For large block reload times, the integrated preemption placement/preemption threshold scheduling algo-
rithm dominates the preemption placement only algorithm. This trend is also seen in Figure 74.
Figure 74: Average Preemption Cost Comparison.
The ability of the integrated approach to adjust the preemption thresholds to permit tasks to execute non-
preemptively where possible has further benefits in reduced preemption placement costs. As expected,
both methods converge as the block reload time approaches zero.
123
Summary
In this work, we presented an algorithm that integrates limited preemption scheduling via conditi-
onal preemption placement with preemption threshold scheduling via optimal preemption threshold as-
signment. Our experiments demonstrated the schedulability dominance of the integrated approach over
existing preemption placement algorithms. In future work, we plan to evaluate this integrated approach on
a wider range of linear and conditional real-time programs.
124
CHAPTER 8 FUTURE WORK
In this chapter, we discuss the portion of the research that has not been completed. As we explained
in the previous chapters, our goal is to develop a generalized schedulability framework supporting the
hard-real-time system design process for the limited preemption task execution model. This framework
addresses taskset schedulability determination for real-time systems with physical, hardware and timing
challenges. So far, we have completed a subset of our main research and proposed a new CRPD metric,
called loaded cache blocks (LCB) which accurately characterizes the CRPD a real-time task may be sub-
jected to due to the preemptive execution of higher priority tasks. We showed how to integrate our new
LCB metric into our newly developed algorithms that automatically place preemption points supporting
linear control flow graphs (CFGs) for limited preemption scheduling applications. Finally, towards the
end of our defense, we complete the generalized schedulability framework that will ensure that hard-real-
time tasksets will be schedulable at design time. In the next section, we discuss the remaining work to be
completed.
Future Work
While significant progress has been made in preemption placement supporting limited preemption
scheduling in recent years, there are several open problems that remain to be solved in order to permit
limited preemption scheduling using PPP algorithms to migrate into industry applications supporting the
development cyber-physical systems. A few of these open problems are summarized below.
Interdependent CRPD for Set Associative Caches with Preemption Placement
We extend our LCB metric to handle the complexity of N-way set associative caches. In this design,
the chosen cache replacement policy significantly impacts the algorithms needed to compute a safe bound
or the CRPD. While algorithms exist to compute CRPD for set associative caches, we will specifically
address the problem of computing interdependent CRPD for the purposes of eventual preemption point
placement. The challenge this work presents is to design the LCB metric in such a manner as to apply
to the broadest range of cache replacement policies as possible. Cache replacement policies under consi-
deration are the algorithms covered in Chapter 3. Cache replacement policies not fitting into the broader
framework will require custom extensions to provide the requisite data needed by the preemption pla-
cement algorithm. Previous work focused on direct mapped cache designs, thereby engendering a need
to extend the benefits of limited preemption scheduling models to hard-real-time systems employing N-
way set associative cache designs. We will evaluate the applicability or suitability of these algorithms to
the various cache replacement policies in use today. Lastly, we integrate our revised LCB metric with
125
preemption point placement algorithms applicable to both linear and arbitrary control flow graphs.
Real-Time Systems Computational Framework
We implement a real time systems computational framework integrating our CRPD computation algo-
rithms with existing tools supporting WCET analysis. Subcomponents of this framework include WCET
analysis, CRPD analysis, preemption point placement, and schedulability analysis. Figure 75 illustrates a
high-level diagram of the various analysis steps, their input data, and output data to give an idea of what
is envisioned. Blocks with a red accent denote existing tools, while blocks with a blue accent denote our
contributions. WCET analysis tools analyze program structure computing the worst-case execution time
Figure 75: Real-Time Systems Computational Framework.
of each instruction or basic block. CRPD analysis tools analyze program structure computing the set of
useful cache blocks thereby determining the potential impact task preemption has at each basic block.
Preemption point placement algorithms utilize the computed WCET and CRPD values at each basic block
to determine optimal preemption point locations with the goal of minimizing preemption overhead for
hard-real-time tasksets. Schedulability analysis techniques analyze modified taskset characteristics with
the CRPD overhead considered to determine the amount of available scheduling slack key to determining
if a taskset is schedulable. The schedulability analysis serves as the input to preemption point placement
supporting linear and arbitrary control flow graphs. The output of schedulability analysis, the allowa-
ble scheduling slack, also known as the maximum non-preemptive region Qi, is a key parameter used in
our optimal preemption point placement algorithm. Each preemption point placement solution results in
new scheduling slack that affects preemption point placement during subsequent iterations. The algorithm
126
converges when the selected preemption points no longer change or the slack reaches some minimum
threshold. In summary, our method uses an iterative approach to refine the preemption points of tasks
factoring in the updated scheduling slack until the chosen preemption points or scheduling slack no lon-
ger changes. The means to making these various tools usable for the hard-real-time systems designer is
the integration of the iterative schedulability analysis with interdependent CRPD based preemption point
placement. In this context, our work will implement the schedulability analysis for two commonly used
scheduling algorithms. These algorithms are the earliest deadline first (EDF) and fixed priority (FP). The
framework will be designed in such a manner to offer the ability to customize for future scheduling al-
gorithms supporting a flexible, open source software organization. The evaluation will demonstrate the
requisite function on the Malardalen Real Time Code (MRTC) task suite [51], deemed to be of suitable
size and complexity to support an initial tool feasibility assessment.
Real-Time Systems Computational Framework Case Study Evaluation
We utilize our newly developed real time systems computational framework to analyze a real time
systems application of suitable size and complexity. Subcomponents of this framework include WCET
analysis, CRPD analysis, preemption point placement, and schedulability analysis. The overall tool im-
provement focus is essentially a collection of graphical based methods for visualization and automated
back annotation of downstream scheduling design decisions onto real time code, as briefly described be-
low. Visualization of WCET analysis results would include numerical display of basic block worst-case
execution time annotated in the graphical representation of the control flow graph structure. Visualization
of CRPD analysis would take the form of a two-dimensional graph showing the CRPD cost assuming a
given basic block is selected as a preemption point. The hard-real-time systems designer would access
this information by selecting a specific edge in the control flow graph. Visual methods supporting back
annotation of real time code with the selected optimal preemption points will be implemented. Within
our graphical design framework, designers will have the option of selecting required preemption points
to augment the preemption point placement algorithms. The inherent challenge in this work is to design
the requisite information exchange between the various sub-components in such a way as to allow support
for new scheduling algorithms, cache replacement policies, and new taskset characteristics. Breakdown
utilization studies will illustrate for each task where the taskset schedulability breaks down so refactoring
of the design can readily occur in problematic software components. The evaluation of our real time sys-
tems computational framework will focus on a case study involving analysis of a real-time application of
moderate complexity deemed suitable in accordance with analysis time constraints.
127
Non-Inline Function Support for Preemption Placement in Real-Time Program Code
One of the most significant limitations of existing preemption placement algorithms is they only pro-
vide support for inline functions, thereby excluding non-inline functions. One of the primary motivations
for working on this problem is the future commercial viability of limited preemption scheduling via in-
terdependent CRPD cost model, by extending the supported real-time task code constructs to non-inline
functions. To understand the context in which we solve for and obtain a single minimized preemption
solution for each function definition each having kj invocations in the real-time task code where j is the
function definition index consider the aggregate subgraph GFi shown in Figure 76.
 
FCALL FUNCTION BLOCKS 
Solution Interface Solution Interface 

 𝑃𝑅𝐸𝐷
 𝐹𝐶𝐴𝐿𝐿
  𝑆𝑈𝐶𝐶
 𝐹𝐶𝐴𝐿𝐿
 
 𝑃𝑅𝐸𝐷
 𝐹𝑈𝑁𝐶
 
 𝑆𝑈𝐶𝐶
 𝐹𝑈𝑁𝐶
 
 𝑃𝑅𝐸𝐷
 𝐵𝐿𝐾𝑆
 
 𝑆𝑈𝐶𝐶
 𝐵𝐿𝐾𝑆
 
Figure 76: Function Definition Preemption Placement Problem
Each function invocation in the real-time task code consists of a function call subgraph GFCALLi , a
function definition subgraph GFUNCi , and a post function execution subgraph G
BLKS
i . To solve this
problem, we intuitively must compute the minimized function definition at the point where the preemption
solutions are available for each of the kj invocations in the subgraphs GFCALLi , G
FUNC
i , and G
BLKS
i .
Using the brute force approach where all (Qi + 1)2 preemption solutions for each of the k function invo-
cations are simultaneously iterated results in an intractably exponential time complexity O(kNiQ2
k
i ). To
visualize the complexity of computing a single preemption solution for each function definition, and to
demonstrate the feasibility of solving this problem, consider the following graphical depiction as shown in
Figure 77. The blue dots represent the function call GFCALLi subgraphs for each of the k invocations of
 
Top view 
𝑒𝑘−1
𝐹𝐶𝐴𝐿𝐿
 
𝑒𝑘
𝐹𝐶𝐴𝐿𝐿
 
𝑒1
𝐵𝐿𝐾𝑆
 
𝑒2
𝐵𝐿𝐾𝑆
 
𝑒3
𝐵𝐿𝐾𝑆
 
𝑒1
𝐹𝐶𝐴𝐿𝐿
 
𝑒2
𝐹𝐶𝐴𝐿𝐿
 
𝑒3
𝐹𝐶𝐴𝐿𝐿
 
𝑒𝑗
𝐹𝐶𝐴𝐿𝐿
 𝑒𝑗
𝐵𝐿𝐾𝑆
 
𝑒𝑘−1
𝐵𝐿𝐾𝑆
 
𝑒𝑘
𝐵𝐿𝐾𝑆
 
Side view 
0 
1 
2 
3 
4 
𝑸𝒊 
∞ 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
To 
∞ 
∞ 
∞ 
∞ 
𝑸𝒊-1 
Side view 
0 
1 
2 
3 
4 
𝑸𝒊 
∞ 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
0 
1 
𝑸𝒊-1 
𝑸𝒊 
To 
∞ 
∞ 
∞ 
∞ 
𝑸𝒊-1 
Figure 77: Compute Function Definition Preemption Problem
128
function f . Similarly, the yellow dots represent the post function execution GBLKSi subgraphs following
each of the k invocations of function f . Lastly, the red dot denotes one of the (Qi + 1)2 preemption
solutions stored for the function f definition subgraph GFUNCi . The side views depict separate vertices
for each of the (Qi + 1)2 preemption solutions stored in the GFCALLi and G
BLKS
i subgraphs respectively.
Incorporating support for non-inline functions to preemption placement algorithms will serve to extend
limited preemption scheduling to a wider class of real-time program code.
Since function invocations can occur anywhere in the task code, this creates challenges in the context
of the progression of in line production rules in our graph grammar G. With the objective of a single
preemption solution for each function definition subgraph GFUNCi , semantic processing of production
rules must be suspended at each function invocation until the subgraph solutions for all invocations are
available. To illustrate this concept, consider the following abstract parse tree shown in Figure 78.
 
𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑓 𝑐𝑎𝑙𝑙𝑠 
𝑛𝑜𝑛 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑐𝑎𝑙𝑙𝑠 
Figure 78: Parse Tree Function Invocations
Function calls are indicated by the filled red dots and the non-function calls are indicated by the hollow
dots. This example demonstrates the conditions for computing a single function definition preemption
solution does not coincide with the ordering of production rules. Figure 79 illustrates the parsing progress
limited due to the presence of function f calls in the task code.
 
𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑠𝑒𝑚𝑎𝑛𝑡𝑖𝑐𝑠 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
𝑤𝑎𝑖𝑡𝑖𝑛𝑔 to be executed 
𝑠𝑢𝑠𝑝𝑒𝑛𝑑𝑒𝑑, waiting for function f preemption solution 
Figure 79: Function Call Parsing Progress
The green dots denote subgraphs where the preemption solutions have been computed during parsing.
The red dots denote function call subgraphs where preemption solutions cannot be computed until the
corresponding function definition preemption solution has been computed. The yellow dots denote higher
level subgraphs whose preemption solutions are dependent on the function call preemption solutions. Once
129
all the calls to function f have been encountered in the parse tree, we compute the minimized function f
definition preemption solution as shown in Figure 80.
 
𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑠𝑒𝑚𝑎𝑛𝑡𝑖𝑐𝑠 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
𝑤𝑎𝑖𝑡𝑖𝑛𝑔 to be executed 
𝑠𝑢𝑠𝑝𝑒𝑛𝑑𝑒𝑑, waiting for function f preemption solution 
𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑓 𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
Figure 80: Function Call Preemption Solutions Computed
The green dots outlined in red denote the function call subgraphs where the preemption solutions have been
computed from the single function f preemption solution. Once these preemption solutions are available,
the remaining subgraph nodes whose preemption solution computations that were previously suspended
may now be processed as shown in Figure 81.
 
𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
𝑤𝑎𝑖𝑡𝑖𝑛𝑔 to be parsed 
𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑓 𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
𝑠𝑢𝑠𝑝𝑒𝑛𝑑𝑒𝑑 𝑝𝑎𝑟𝑠𝑖𝑛𝑔 𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑒𝑑 
Figure 81: Graph Parsing Completed
To accomplish this modified ordering of parsing semantics, we must store suspended productions in a FIFO
queue Q until the requisite function definition preemption solution dependencies have been resolved.
Goto Statement Support for Preemption Placement in Real-Time Program Code
Similar to the incorporation of non-inline function support in preemption placement algorithms, the
addition of support for goto statements would extend preemption placement to a profoundly broader set of
real-time program code. The complexity of goto statement support far exceeds that of non-inline functions
since the use of goto statements adds preemption placement dependencies between program constructs
in different locations. We envision the primary tenets of the proposed solution to non-inline functions to
be directly applicable to goto statements. Goto statements are ubiquitous in low level assembly code due
to compiler optimizations employed during the code generation phase. Because the grammar supporting
existing preemption placement algorithms is largely comprised of structured programming constructs, it is
necessary to semi-manually post process the generated assembly code to fit the inherent grammar limitati-
130
ons. This post processing step is arduous, time consuming, and error prone regardless of whether manual
or automated methods are applied, thereby limiting the feasibility of preemption placement algorithms to
small applications. Thus, by integrating goto statement support into preemption placement algorithms, the
need for post processing of real-time program code would be eliminated.
Interdependent Probabilistic CRPD with Preemption Point Placement
We extend our LCB metric to handle the complexity of N-way set associative caches with random
cache replacement policy. In this design, the random cache replacement policy requires a probabilistic
approach to the analysis in order to compute a safe bound or the CRPD. In this probabilistic analysis,
we will specifically address the problem of computing interdependent CRPD for the purposes of eventual
preemption point placement. The challenge this work presents is the probabilistic nature the random cache
replacement policy engenders is the inherent non-deterministic behavior prominently affects the methods
for computing CRPD. Unlike previous methods, it is impossible to determine the exact cache contents at
any program location. It will be imperative that our research offers the hard-real-time systems designer
a statistical theoretical framework that will result in safe yet reasonably accurate CRPD metrics based on
sound probabilistic analysis methods. Lastly, we integrate our revised LCB metric with preemption point
placement algorithms applicable to both linear and arbitrary control flow graphs.
Real Time Systems Computational Framework
Our future objective is to make the capabilities of our real time systems computational framework
prototype available to companies developing cyber-physical systems. To accomplish this, it will be im-
perative that a robust, commercial tool implementation with suitable customer support must exist moving
forward. Opportunities to partner with existing WCET analysis tool vendors with the goal of incorporating
our research into their tools will be realized.
Multiprocessor Systems
Multiprocessor or multi-core CPUs are being ubiquitously used in cyber-physical systems to meet the
rapidly increasing computational complexity witnessed in many application domains today. Real-time
systems research pertaining to multiprocessor systems is highly active with the schedulability challenges
posed by task execution concurrency. Opportunities to extend our CRPD metric into a multiprocessor
concurrent execution environment will be explored.
Multi-Level Cache Analysis
Multi-level cache analysis addresses the complexity introduced by cache memory designs containing
more than one or multiple levels. Multi-level cache designs are primarily used in multiprocessor or multi-
131
core CPUs to economically improve cache hit performance. Multi-level caches may employ disparate
inclusion properties that determine how cache block evictions at lower cache levels affect upper cache
levels. Opportunities to extend our CRPD metric into multi-level cache designs will be explored, proposed,
and validated.
Summary
Hard-real-time systems frequently subjected to various physical, hardware and timing constraints are
a growing research area. There are many emerging and growing career opportunities in cyber-physical
systems (CPS) design that require a broad understanding of a diverse set of real-time system concepts and
disciplines.
132
CHAPTER 9 CONCLUSION
In this dissertation, we introduced a new CRPD metric, called loaded cache blocks (LCB) which accu-
rately characterizes the CRPD a real-time task may be subjected to due to the preemptive execution of
higher priority tasks. We showed how to integrate our new LCB metric into our newly developed al-
gorithms that automatically place preemption points supporting linear and conditional CFGs for limited
preemption scheduling applications. Essentially, this framework addresses taskset schedulability determi-
nation for real-time systems with physical, hardware, and timing challenges. Furthermore, our enhanced
algorithm was demonstrated to be optimal for linear CFGs in that if a feasible taskset schedule cannot
be constructed, then no feasible schedule exists by any known method utilizing a static Qi value. For
conditional CFGs, our proposed PPP algorithm was suboptimal due to the method of handling of preemp-
tion costs between successive block structures. Nonetheless, our conditional PPP algorithm demonstrated
superior performance as compared to existing schedulability approaches. Finally, our design and analy-
sis framework may be classified as limited preemption point placement and schedulability analysis for
hard-real-time systems.
As a proof of concept, we demonstrate the effectiveness of our approach via experiments using tasksets
from the MRTC WCET benchmark suite [51]. We show how to solve some of the issues and challenges
of designing predictable real-time systems that must guarantee hard deadlines. In our framework, the
system designer specifies the required deadlines that hard-real-time taskset must meet where the system
automatically adjusts the taskset preemption points to ensure taskset schedulability.
The impact of our work to the hard-real-time systems development community is to facilitate the mi-
gration of limited preemption scheduling from academia to mainstream cyber-physical systems applicati-
ons in industry. With respect to minimizing preemption overhead, limited preemption scheduling has been
shown to exhibit superior performance as compared to fully preemptive and non-preemptive scheduling in
hard-real-time tasksets. The benefit to cyber-physical systems (CPS) is improved utilization of hardware
resources via reduced overhead enabled by increasing the accuracy of CRPD computations and automa-
ting optimal preemption point placement. Furthermore, the future integration of WCET analysis, CRPD
analysis, schedulability analysis, and preemption point placement into a single unified real-time systems
computational framework will serve to reduce the overall system schedulability design time allowing real-
time system designers to focus on the more critical and important tasks of ensuring correct and robust
system operation. As new product features are added, the integrated real-time systems schedulability tools
will permit system designers to quickly redo the previous analysis to ensure the system remains schedula-
ble. Therefore, these improvements in CPU utilization and reduced design time will result in substantial
133
cost reductions in CPS products thereby benefiting both system producers and consumers alike.
134
CHAPTER 10 LIST OF PUBLICATIONS
JOURNAL
Under Review
(a) John Cavicchio and Nathan Fisher. Realizing Improved Preemption Placement in Real-
Time Program Code with Interdependent Cache Related Preemption Delay, Real-Time
Systems Journal (Status: Submitted).
CONFERENCE & WORKSHOP
Published
(a) Brandon Szeliga, John Cavicchio, and Weisong Shi. DIMM: Distributed Metadata Ma-
nagement for Data-Intensive HPC Environments, Proc. of the IEEE Workshop on Data-
Aware Distributed Computing HPDC Symp., Boston, 2008.
(b) John Cavicchio, Nathan Fisher, and Corey Tessler. Minimizing Cache Overhead via Lo-
aded Cache Blocks and Preemption Placement, Proc. of the Euromicro Conference on
Real-Time Systems Symp., Sweden, 2015.
IN PREPARATION
1. John Cavicchio and Nathan Fisher. Integrating Preemption Thresholds with Limited Preemp-
tion Scheduling, will submit to Euromicro Conference on Real-Time Systems Symp., 2019.
135
REFERENCES
[1] AbsInt. AbsInt a3 WCET Tool, 2018.
[2] T. Acharya and M. Ladlow. Cache replacement algorithms in hardware. Technical Report
Swarthmore College, May 2008.
[3] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools Second Edition.
2007.
[4] A.V. Aho, P.J. Denning, and J.D. Ullman. Principles of optimal page replacement. J. ACM, 18(1):80–
93, 1971.
[5] S. Altmeyer and C. Burguiere. Cache-related preemption delay via useful cache blocks: Survey and
redefinition. Journal of Systems Architecture (JSA), Elsevier, 2011.
[6] S. Altmeyer, R. Davis, and C. Maiza. Cache related pre-emption delay aware response time analysis
for fixed priority pre-emptive systems. In RTSS, 2011.
[7] S. Altmeyer, R. Davis, and C. Maiza. Improved cache related pre-emption delay aware response time
analysis for fixed priority pre-emptive systems. Real Time Systems, Springer, 2012.
[8] S. Bansal and D.S. Modha. CAR: Clock with adaptive replacement. In USENIX Conference on File
and Storage Technologies (FAST 04), San Francisco, CA, March 31-April 2, 2004.
[9] S. Baruah. The limited-preemption uniprocessor scheduling of sporadic task systems. In ECRTS,
2005.
[10] S. Baruah and J. Goossens. Scheduling real-time tasks: Algorithms and complexity. In Joseph Y.-T
Leung, editor, Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press
LLC, 2003.
[11] L.A. Belady. A study of replacement algorithms for virtual storage computers. IBM Sys. J., 5(2):78–
101, 1966.
[12] M. Bertogna and S. Baruah. Limited preemption EDF scheduling of sporadic task systems. In RTSS,
2011.
[13] M. Bertogna, G. Buttazzo, M. Marinoni, G. Yao, F. Esposito, and M. Caccamo. Preemption points
placement for sporadic task sets. In ECRTS, 2010.
[14] M. Bertogna, O. Xhani, M. Marinoni, F. Esposito, and G. Buttazzo. Optimal selection of preemption
points to minimize preemption overhead. In ECRTS, 2011.
[15] E. Bini and G. C. Buttazzo. Schedulability analysis of periodic fixed priority systems. In IEEE
Transactions on Computing, 53(11):1462–1473, 2004.
136
[16] E. Bini and G. C. Buttazzo. Measuring the performance of schedulability tests. Real-Time Systems,
30(1):129–154, 2005.
[17] C. Bohm and G. Jacopini. Flow diagrams, turing machines and languages with only two formation
rules. Communications of the ACM, 9(5):366–371, May 1966.
[18] R. Bril, S. Altmeyer, M. van den Heuvel, R. Davis, and M. Behnam. Fixed priority scheduling with
pre-emption thresholds and cache-related pre-emption delays: integrated analysis and evaluation.
Real-Time Systems, 53(4):403–466, Jul 2017.
[19] R. Bril, S. Altmeyer, M. van den Heuvel, R. Davis, and M. Benham. Integrating cache-related pre-
emption delays into analysis of fixed priority scheduling with preemption thresholds. In RTSS, 2014.
[20] A. Burns. Preemptive priority-based scheduling: an appropriate engineering approach. 1995.
[21] G. Buttazzo. Hard Real Time Computing Systems: Predictable Scheduling Algorithms and Applica-
tions. 2005.
[22] G. C. Buttazzo, M. Bertogna, , and G. Yao. Limited preemptive scheduling for real-time systems: A
survey. IEEE Transactions on Industrial Informatics, 9(1), February 2013.
[23] J. Cavicchio and N. Fisher. Paper Programs and Data Repository, 2018.
[24] J. Cavicchio, C. Tessler, and N. Fisher. Paper Programs and Data Repository, 2014.
[25] J. Cavicchio, C. Tessler, and N. Fisher. Minimizing cache overhead via loaded cache blocks and
preemption placement. In ECRTS, 2015.
[26] J.E.G. Coffman and P.J. Denning. Operating Systems Theory. 1973.
[27] F.J. Corbato. A paging experiment with the multics system. MIT Project MAC Report MAC-M-384,
May 1968.
[28] S. Dar, M.J. Franklin, B. Jonsson, D. Srivastava, and M. Tan. Semantic data caching and replacement.
In Proceedings of the International Conference on Very Large Data Bases (VLDB), 1996.
[29] C. Ding and Y. Zhong. Predicting whole-program locality through reuse-distance analysis. Procee-
dings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June,
2003.
[30] Gaisler. Gaisler Bare-C Cross Compiler (BCC), 2018.
[31] Gaisler. GRSIM, 2018.
[32] G. Goud, N. Sharma, K. Ramamritham, and S. Malewar. Efficient real-time support for automotive
applications: A case study. Proc. of the Embedded and Real-Time Computing Systems and Applica-
tions, 2006.
[33] J. Handy. The Cache Memory Book: The Authoritative Reference on Cache Design, Second Edition.
137
1998.
[34] D. Hardy, B. Rouxel, and I. Puaut. The heptane static worst-case execution time estimation tool. In
WCET, 2017.
[35] S. Jiang, X. Zhang, , and F. Chen. Clock-pro: An effective improvement of the clock replacement.
ATEC Proceedings of the Annual Conference on USENIX Anuual Tecchnical Conference, pp.35,
2005.
[36] S. Jiang and X. Zhang. LIRS: An efficient low interreference recency set replacement policy to
improve buffer cache performance. In Proceedings of the ACM SIGMETRICS Conference on Mea-
surement and Modeling of Computer Systems, 2002.
[37] K. Kennedy and L. Zucconi. Applications of a graph grammar for program control flow analysis. In
Proceedings ACM SIGACT/SIGPLAN Symposium on Principles of programming languages, 1977.
[38] H. Kikuchi, R.K. Kalia, A. Nakano, P. Vashishta, F. Shimojo, and S. Saini. Scalability of a low-cost
multi-teraflop linux cluster for high-end classical atomistic and quantum mechanical simulations. In
IPDPS ’03: Proceedings of the 17th International Symposium on Parallel and Distributed Proces-
sing, page 66.2, Washington, DC, USA, 2003. IEEE Computer Society.
[39] C.-G. Lee, J. Hahn, S.L. Min, R. Ha, S. Hong, C.Y. Park, M. Lee, and C.S. Kim. Analysis of cache-
related preemption delay in fixed-priority preemptive scheduling. IEEE Transactions on Computers,
47(6):700–713, 1998.
[40] D. Lee, J. Choi, J.-H. Kim, S.H. Noh, S.L. Min, Y. Cho, and C. S. Kim. On the existence of a
spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU)
policies. In ACM SIGMETRICS Performance Evaluation Review, volume 27, pages 134–143. ACM,
1999.
[41] D. Lee, J. Choi, J.-H. Kim, S.H. Noh, S.L. Min, Y. Cho, and C. S. Kim. LRFU: A spectrum of
policies that subsumes the least recently used and least frequently used policies. IEEE Transactions
on Computers, 50(12):1352–1361, 2001.
[42] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: Exact characterization
and average case behavior. In Proceedings of the Real-Time Systems Symposium - 1989, Santa
Monica, California, USA, IEEE Computer Society Press, pages 166–171, December 1989.
[43] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time envi-
ronment. Journal of the ACM (JACM), 20(1):46–61, 1973.
[44] J. W. S. Liu. Real-Time Systems. Prentice-Hall, Inc., Upper Saddle River, New Jersey 07458, 2000.
[45] Advantech Co. Ltd. Advancech product hardware. http://www.advantech.com/products/, September
138
2013.
[46] Analog Devices Co. Ltd. Linux industrial input-output subsystems.
http://wiki.analog.com/software/linux/docs/iio/iio, July 2012.
[47] W. Lunniss, S. Altmeyer, C. Maiza, and R. Davis. Integrating cache related pre-emption delay ana-
lysis into EDF scheduling. In RTAS, 2013.
[48] J. Manuel Marinho, V. Nélis, S. M. Petters, and I. Puaut. Preemption delay analysis for floating non-
preemptive region scheduling. In Design, Automation & Test in Europe Conference & Exhibition
(DATE), 2012, pages 497–502. IEEE, 2012.
[49] R.L. Mattson, J. Gecsei, D.R. Slutz, and I.L. Traiger. Evaluation techniques for storage hierarchies.
IBM Sys. J., 9(2):78–117, 1970.
[50] N. Megiddo and D.S. Modha. ARC: A self-tuning, low overhead replacement cache. In Proceedings
of the USENIX File and Storage Technologies (FAST), 2003.
[51] MRTC. Malardalen WCET research group MRTC benchmarks, 2018.
[52] H. S. Negi, T. Mitra, and A. Roychoudhury. Accurate estimation of cache related preemption delay.
In CODES, 2003.
[53] E.J. O’neil, P.E. O’neil, and G. Weikum. The LRU-K page replacement algorithm for database disk
buffering. In ACM SIGMOD, 22(2):297–306, 1993.
[54] E.J. O’neil, P.E. O’neil, and G. Weikum. An optimality proof of the LRU-K page replacement
algorithm. Journal of the ACM (JACM), 46(1):92–112, 1999.
[55] R. Pellizzoni, E. Betti, S. Bak, J. Criswell, M. Caccamo, and R. Kegley. A predictable execution
model for COTS-based embedded systems. In RTAS, 2011.
[56] R. Pellizzoni, B.D. Bui, M. Caccamo, and L. Sha. Coscheduling of CPU and I/O transactions in
COTS-based embedded systems. In RTSS, 2008.
[57] R. Pellizzoni and M. Caccamo. Toward the predictable integration of real-time COTS-based systems.
In RTSS, 2007.
[58] B. Peng, N. Fisher, and M. Bertogna. Explicit preemption placement for real-time conditional code.
In ECRTS, 2014.
[59] P. Puschner and Ch. Koza. Calculating the maximum execution time of real time programs. In
Real-Time Systems, 1989.
[60] H. Ramaprasad and F. Mueller. Bounding preemption delay within data cache reference patterns for
real-time tasks. In RTAS, 2006.
[61] J.T. Robinson and M.V. Devarakonda. Data cache management using frequency-based replacement,
139
volume 18. ACM, 1990.
[62] J. Sifakis. Modeling real-time systems - challenges and work directions. In In Proceedings of the
1st International Workshop on Embedded Software (EMSOFT), Lecture Notes in Computer Science,
pages 373–389. Springer Verlag, 2001.
[63] A. Silberschatz, P.B. Galvin, and G. Gagne. Operating Systems Concepts (Seventh Edition). 2005.
[64] J. Simonson and J.H. Patel. Use of preferred preemption points in cache-based real-time systems.
In Computer Performance and Dependability Symposium, 1995. Proceedings., International, pages
316–325. IEEE, 1995.
[65] A.J. Smith. Design of CPU cache memories. In Proceedings of IEEE Region 10 Conference (TEN-
CON), 1987.
[66] J. Staschulat and R. Ernst. Scalable precision cache analysis for real-time software. ACM Transacti-
ons on Embedded Computing Systems (TECS), 6(4), September 2005.
[67] Y. Tan and V. Mooney. Integrated intra- and inter-task cache analysis for preemptive multi-tasking
real-time systems. In SCOPES, 2004.
[68] A.S. Tanenbaum and A.S. Woodhull. Operating Systems, Design and Implementation. 1997.
[69] H. Tomiyamay and N. D. Dutt. Program path analysis to bound cache-related preemption delay in
preemptive real-time systems. In CODES, 2000.
[70] Y. Wang and M. Saksena. Scheduling fixed-priority tasks with preemption threshold. In the Interna-
tional Conference on Real Time Computing Systems and Applications, 1999.
[71] R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, C. Ferdinand,
R. Heckmann, T. Mitra, F. Mueller, I. Puaut, P. Puschner, J. Staschulat, and P. Stenström. The
worst-case execution-time problem–overview of methods and survey of tools. ACM Transactions on
Embedded Computing Systems (TECS), 7(3):36:1–36:53, May 2008.
[72] G. Yao, G. Buttazzo, and M. Bertogna. Bounding the maximum length of non-preemptive regions
under fixed priority scheduling. In RTCSA, 2009.
140
ABSTRACT
EFFECTIVE AND EFFICIENT PREEMPTION PLACEMENT FOR CACHE OVERHEAD
MINIMIZATION IN HARD REAL-TIME SYSTEMS
by
JOHN C. CAVICCHIO
May 2019
Advisor: Dr. Nathan Fisher
Major: Computer Science
Degree: Doctor of Philosophy
Schedulability analysis for real-time systems has been the subject of prominent research over the past
several decades. One of the key foundations of schedulability analysis is an accurate worst case execution
time (WCET) for each task. In preemption based real-time systems, the CRPD can represent a significant
component (up to 44% as documented in research literature) of variability to overall task WCET. Several
methods have been employed to calculate CRPD with significant levels of pessimism that may result in a
task set erroneously declared as non-schedulable. Furthermore, they do not take into account that CRPD
cost is inherently a function of where preemptions actually occur. Our approach for computing CRPD via
loaded cache blocks (LCBs) is more accurate in the sense that cache state reflects which cache blocks and
the specific program locations where they are reloaded.
Limited preemption models attempt to minimize preemption overhead (CRPD) by reducing the num-
ber of allowed preemptions and/or allowing preemption at program locations where the CRPD effect is
minimized. These algorithms rely heavily on accurate CRPD measurements or estimation models in order
to identify an optimal set of preemption points. Our approach improves the effectiveness of limited opti-
mal preemption point placement algorithms by calculating the LCBs for each pair of adjacent preemptions
to more accurately model task WCET and maximize schedulability as compared to existing preemption
point placement approaches. We utilize dynamic programming technique to develop an optimal preemp-
tion point placement algorithm. Lastly, we will demonstrate, using a case study, improved task set sche-
dulability and optimal preemption point placement via our new LCB characterization.
We propose a new CRPD metric, called loaded cache blocks (LCB) which accurately characterizes
the CRPD a real-time task may be subjected to due to the preemptive execution of higher priority tasks.
We show how to integrate our new LCB metric into our newly developed algorithms that automatically
place preemption points supporting linear control flow graphs (CFGs) for limited preemption scheduling
141
applications.
We extend the derivation of loaded cache blocks (LCB), that was proposed for linear control flow
graphs (CFGs) to arbitrary CFGs. We show how to integrate our revised LCB metric into our newly
developed algorithms that automatically place preemption points supporting arbitrary control flow graphs
(CFGs) for limited preemption scheduling applications.
For future work, we will verify the correctness of our framework through other measurable physical
and hardware constraints. Also, we plan to complete our work on developing a generalized framework
that can be seamlessly integrated into real-time schedulability analysis.
142
AUTOBIOGRAPHICAL STATEMENT
JOHN C. CAVICCHIO
John Cavicchio received his Master’s degree in Computer Science in 2013 from Wayne State Uni-
versity, Detroit, Michigan USA. He previously received a Master’s degree in Electrical Engineering from
the University of Michigan and a Bachelor’s degree in Computer Science from Michigan State University.
He currently works for Harman International as a Principal Lead Software Engineer in audio embedded
systems and plans on continuing his research in real-time systems. His research interests include real-time
systems, scheduling theory, distributed systems, algorithm design and analysis, embedded systems, worst-
case execution time (WCET) analysis, cache analysis, cache related preemption delay (CRPD) analysis,
limited preemption scheduling, and real-time system architecture.
