Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning by Hong, Seongsoo
Abstract

Title of Dissertation: Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning

Seongsoo Hong, Doctor of Philosophy, 1994

Dissertation directed by: Assistant Professor Richard Gerber, Department of Computer Science

Developing a real-time system requires finding a balance between the timing constraints and the functional requirements. Achieving this balance often requires last-minute, low-level intervention in the code modules, via intensive hardware-based instrumentation and manual program optimizations. In this dissertation we present an automated, static alternative to this kind of human-intensive work. Our approach is motivated by recent advances in compiler technologies, which we extend to two specific issues in real-time programming: feasibility and schedulability.

A task is infeasible if its execution time stretches over its deadline. To eliminate such faults, we have developed a synthesis method that (1) inspects all infeasible paths, and then (2) moves instructions out of those paths to shorten the execution time.

Schedulability of a task set, on the other hand, denotes the ability to guarantee the deadlines of all tasks in the application. This property is affected by interactions between the tasks, as well as by their individual execution times and deadlines. To address the schedulability problem, we have developed a task transformation method based on program slicing. The method decomposes a task into two subthreads: the IO-handler component that must meet the original deadline, and the state-update component that can be postponed past the deadline. This delayed-deadline approach contributes to the schedulability of the overall application. We also present a new fixed-priority preemptive scheduling strategy, which yields both a feasible priority ordering and a feasible task-slicing metric.
Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning

by Seongsoo Hong

Dissertation submitted to the Faculty of the Graduate School of The University of Maryland in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 1994

Advisory Committee:
Assistant Professor Richard Gerber, Chair/Advisor
Professor Ashok K. Agrawala
Professor John Gannon
Professor Virgil Gligor
Associate Professor William Pugh
Associate Professor Udaya A. Shankar
© Copyright by Seongsoo Hong, 1994
Table of Contents

List of Tables
List of Figures

1 Introduction
  1.1 Motivation
  1.2 Compiler Transformations
    1.2.1 Feasible Code Synthesis
    1.2.2 Real-Time Task Slicing
  1.3 Summary of Contributions
  1.4 Outline of the Dissertation
2 Related Work
  2.1 Formal Methods
  2.2 Real-Time Programming Languages
  2.3 Real-Time Compiler Tools
3 The Language of Time-Constrained Events
  3.1 Design Goals
  3.2 Timing Constructs and Their Semantics
  3.3 Example TCEL Code
4 Basic Notations
  4.1 Flow Graphs
  4.2 Basic Compiler Definitions
5 Transformation 1: Feasible Code Synthesis
  5.1 The Problem and Our Solution
    5.1.1 The Problem of Feasible Code Synthesis
    5.1.2 Solution Strategy
  5.2 Section Decomposition
    5.2.1 Determining Section Boundaries
    5.2.2 Deriving Code-Based Timing Constraints
  5.3 Code Scheduling
    5.3.1 The Top-Level Algorithm
    5.3.2 Subroutine Schedule Section(S,D,DUR(S),Vbar)
  5.4 Summary
6 Transformation 2: Real-Time Task Slicing
  6.1 Background
    6.1.1 Characterization of Discrete Control Software
    6.1.2 Fixed-Priority Preemptive Scheduling
    6.1.3 Scheduling with Compiler Transformations
  6.2 Automatic Task Decomposition by Program Slicing
    6.2.1 The Program Slicing Algorithm
    6.2.2 Assigning Times to Subtasks
  6.3 Scheduling Alternatives and Their Analyses
  6.4 Priority Ordering with Task Slicing
    6.4.1 Feasibility Test
    6.4.2 The Algorithm
    6.4.3 A Larger Example
  6.5 Summary
7 Practical Considerations and Prototype Implementation
  7.1 Practical Considerations
    7.1.1 Limits of Data-Flow Analysis
    7.1.2 Limits of Timing Analysis
    7.1.3 User Interaction
  7.2 TimeWare/SLICE: the Prototype Implementation
    7.2.1 TimeWare/SLICE Tool Screens
    7.2.2 Commands
    7.2.3 Implementation
  7.3 Summary
8 Conclusions
  8.1 Future Directions
List of Tables

5.1 Timing Constraints of S3 and S4
6.1 Example Task Set
6.2 Priority Assignment with Program Slicing
List of Figures

1.1 Software Development Process: Two Alternatives
1.2 Event-Based Specification of Sensor-Controller System
1.3 Two TCEL Programs with the Same Semantics
1.4 Run-Time Behavior of a TCEL Periodic Task
3.1 Flow Graph of do Construct
3.2 Behavior of Periodic Timing Construct
3.3 (A) TCEL Source Code for the Flight Controller and (B) Corresponding Flow Graph
5.1 Software Development Process: Revisited for Feasible Code Synthesis
5.2 Flow Graph of do Construct and its Section Division
5.3 Flight Controller Program: After Section Decomposition
5.4 Top-Level Algorithm for Code Synthesis
5.5 (A) Source Code in TCEL, (B) Corresponding Intermediate Code in GSA Form, (C) Intermediate Code after Bookkeeping-Free Transformations, and (D) Intermediate Code after Transformations with Bookkeeping
5.6 Speculative Code Motion: (A) Original Code, (B) Corresponding GSA Code and (C) Speculatively Transformed Code
5.7 The Section Scheduling Algorithm (Schedule Section)
5.8 The Section Scheduling Algorithm (Sched)
5.9 Flight Controller Program: After Code Scheduling
6.1 Software Development Process: Revisited for Real-Time Task Slicing
6.2 Generalized Slicing Method for Schedulability Tuning
6.3 Task Instance at the kth Period
6.4 Dynamic Behavior of Periodically Implemented Control-Loop
6.5 TCEL Program for Task 2
6.6 Simulated Time Line for the Example Task Set
6.7 Decomposed Task at the kth Period
6.8 Two Decomposed Subtasks of Task 2
6.9 Program Dependence Graph
6.10 Slice with respect to Criterion ⟨L9, cmd⟩
6.11 Slicing a Conditional
6.12 Algorithm for Priority Ordering with Slicing Decision
7.1 Tool Screens of TimeWare/SLICE
7.2 Output of TimeWare/SLICE
8.1 Software Development Process: Two Alternatives
Chapter 1
Introduction

A real-time application is characterized by the existence of two competing factors: its functional specification and its temporal requirements. Functional specifications define valid translations from inputs into outputs. As such they are realized by a set of programs, which consume CPU time. Temporal requirements, on the other hand, place upper and lower bounds between occurrences of events [9, 24]. An example is "the robot arm must receive a next-position update every 10ms." Such a constraint arises from the system's requirements, or from a detailed analysis of the application environment. Temporal requirements implicitly limit the time that can be provided by the system's resources.

Figure 1.1 illustrates the real-time software development process from a high-level requirements specification to a low-level implementation. First, high-level requirements are translated into real-time programs. Then the programs are compiled into real-time tasks, and timing constraints are imposed on these tasks. After the compilation, the programmers must ensure that each of the tasks in the application will meet its timing constraints on the underlying hardware platform. This process is called a feasibility test.

The programmers must also ensure that the whole application will adhere to the timing constraints, even in the presence of direct and indirect interactions among the tasks, such as task blocking and preemption. This process is called a schedulability test. If either test cannot be guaranteed, the implementation must be carefully refined. As shown in Figure 1.1, there are two alternatives for this low-level refinement: system tuning and compiler-based task optimization.

[Figure 1.1: Software Development Process: Two Alternatives. Flowchart: a requirements specification is synthesized into high-level real-time programs, which are compiled into decomposed tasks with timing constraints on tasks. The tasks pass through feasibility tests (intra-task consistency) and schedulability tests (inter-task consistency). On failure, refinement proceeds either by system tuning and re-implementation, or by the compiler-based transformation methods: feasible code synthesis (Chapter 5) and real-time task slicing (Chapter 6).]

Low-level tuning of the application can be a time-consuming process. It often involves using expensive hardware monitors (e.g., in-circuit emulators and logic analyzers), manually counting instruction-cycle times, hand-optimizing the code, and experimenting with various orderings of operations. As a last resort, entire subsystems may have to be re-designed altogether.

The goal of this dissertation research is to provide programmers with a powerful alternative to low-level tuning, built on compiler-based transformation tools. Our approach consists of two inter-related components: a real-time programming language and static compiler transformation methods. The programming language not only provides a means for expressing timing constraints within a source program, but it also lays a foundation for sound compiler transformation methods. As shown in Figure 1.1, our compiler transformation methods include feasible code synthesis and real-time task slicing.
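The two tests described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the analysis developed later in this dissertation: it assumes independent periodic tasks with integer timing parameters, deadlines no larger than periods, no blocking, and the standard response-time recurrence for fixed-priority preemptive scheduling. The names Task, is_feasible, response_time and is_schedulable are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    wcet: int      # worst-case execution time, in integer time units
    period: int    # release period
    deadline: int  # relative deadline, assumed <= period


def is_feasible(task: Task) -> bool:
    """Intra-task test: the task alone fits within its own deadline."""
    return task.wcet <= task.deadline


def response_time(task: Task, higher: List[Task]) -> Optional[int]:
    """Worst-case response time under preemption by higher-priority tasks.

    Iterates r = wcet + sum(ceil(r / T_h) * C_h) to a fixed point.
    Returns None as soon as the deadline is exceeded.
    """
    r = task.wcet
    while True:
        # -(-a // b) is integer ceiling division.
        interference = sum(-(-r // h.period) * h.wcet for h in higher)
        nxt = task.wcet + interference
        if nxt > task.deadline:
            return None
        if nxt == r:
            return r
        r = nxt


def is_schedulable(tasks: List[Task]) -> bool:
    """Inter-task test; `tasks` is ordered from highest to lowest priority."""
    return all(
        response_time(t, tasks[:i]) is not None for i, t in enumerate(tasks)
    )


if __name__ == "__main__":
    t1 = Task("t1", wcet=1, period=4, deadline=4)
    t2 = Task("t2", wcet=2, period=6, deadline=6)
    t3 = Task("t3", wcet=3, period=12, deadline=12)
    print(response_time(t3, [t1, t2]), is_schedulable([t1, t2, t3]))
```

In this example every task is individually feasible, and the lowest-priority task has a worst-case response time of 10 time units, so the set is schedulable. Tightening the deadline of t3 to 9 makes the same set fail the schedulability test even though t3 remains feasible in isolation, which is the distinction between the two tests.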
1.1 Motivation

Our approach is motivated by the success of static compiler transformation technologies in code parallelization and program debugging. It is a natural step to adapt these technologies to real-time domains, where developers lack powerful software engineering tools to apply at the tuning stage. This stage can consume a disproportionate amount of a project's budget.

To carry out this approach, it is imperative to find a real-time programming language that provides high-level timing constructs with unambiguous semantics. This allows us to define a class of semantics-preserving transformations for them. Unfortunately, most existing real-time programming languages do not fulfill this requirement. We regard this as a problem of "code-based specifications."

Consider the experimental real-time programming languages which have been proposed in the literature [23, 27, 30, 33, 40]. They provide high-level real-time constructs such as "within 10ms do B," where the block of code "B" must be executed within a 10 millisecond time frame. These languages, while providing a convenient framework for expressing time in programs, have done little to ease the process of translating a real-time specification into schedulable code.

The reason is straightforward: language constructs such as "within 10ms do B" establish constraints on blocks of code. However, "true" real-time properties establish constraints between the occurrences of events [9, 24]. While language-based constraints are very sensitive to a program's execution time, specification-based constraints must be maintained regardless of the platform's CPU characteristics, memory cycle times, bus arbitration delays, etc.

Our approach is to treat a real-time program as (1) an event-based timing specification, which represents the system's real-time requirements; and (2) a functional implementation, that is, the system's code. We carry out this approach with a real-time syntax quite similar to those found in the above-mentioned languages. However, in our approach the interpretation is quite different. Instead of constraining blocks of code, the timing constructs establish constraints between the observable events within the code.

The language we propose is called Time-Constrained Event Language (TCEL). As an example of TCEL code, consider the following specification fragment, which is rendered pictorially in Figure 1.2:

(1) Every 25ms, an external sensor sends a message to the controller, containing physical world measurement data.
(2) The controller must receive every message.
(3) Using the sensor data and the current state, the controller computes a next-position command and sends it to an actuator.
(4) To achieve steady state, transmission of cmd is made no earlier than 3.5ms after receipt of data.
(5) To guarantee response-time threshold, transmission of cmd is made no later than 4.0ms after receipt of data.
(6) Based on the sensor-input, the controller updates its current state.
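Requirements (1) through (6) constrain only the observable input and output events, so conformance can be checked on a timestamped event trace alone. The following is a minimal sketch in Python, not part of TCEL or of this dissertation's tooling. The trace format, the names check_trace, PERIOD_MS, MIN_DELAY_MS, MAX_DELAY_MS and EPS, and the simplification that consecutive inputs are exactly one period apart are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical trace format: (timestamp in ms, event name).
Event = Tuple[float, str]

PERIOD_MS = 25.0     # requirement (1): one sensor message every 25 ms
MIN_DELAY_MS = 3.5   # requirement (4): earliest output after an input
MAX_DELAY_MS = 4.0   # requirement (5): latest output after an input
EPS = 1e-9           # tolerance for floating-point comparisons


def check_trace(trace: List[Event]) -> List[str]:
    """Return a list of violations; an empty list means the trace conforms."""
    violations = []
    inputs = [t for t, name in trace if name == "input"]
    outputs = [t for t, name in trace if name == "output"]

    # Requirements (1)-(2), simplified: consecutive inputs are one period apart.
    for prev, cur in zip(inputs, inputs[1:]):
        if abs((cur - prev) - PERIOD_MS) > EPS:
            violations.append(f"inputs at {prev} and {cur} are not one period apart")

    # Requirement (3): every input is answered by exactly one output.
    if len(outputs) != len(inputs):
        violations.append("number of outputs differs from number of inputs")

    # Requirements (4)-(5): each output falls inside the [3.5, 4.0] ms window.
    for t_in, t_out in zip(inputs, outputs):
        delay = t_out - t_in
        if delay < MIN_DELAY_MS - EPS or delay > MAX_DELAY_MS + EPS:
            violations.append(f"output at {t_out} is {delay} ms after input at {t_in}")
    return violations


if __name__ == "__main__":
    good = [(0.0, "input"), (3.7, "output"), (25.0, "input"), (28.8, "output")]
    print(check_trace(good))
```

Note that the checker never inspects when cmd or state is computed. Only the input and output timestamps matter, which is the sense in which the specification is event-based rather than code-based.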
[Figure 1.2: Event-Based Specification of Sensor-Controller System. Time line for one 25.0ms period: input(Sensor, &data) arrives from the sensor, output(Actuator, cmd) occurs between 3.5ms and 4.0ms after it, and state = nextState(...) is updated.]

Within our event-based framework, the program fragments in Figure 1.3 realize the specification. The system's only observable events occur instantaneously during the executions of the "output" and "input" operations. The "do" statement establishes timing constraints only between these two operations. On the other hand, the local statement "cmd = nextCmd(state, data)" is only constrained by the program's natural control and data dependences.

    /* Program A */
    every 25ms
    do
        input(Sensor, &data);
    start after 3.5ms finish within 4.0ms
    {
        cmd = nextCmd(state, data);
        state = nextState(state, data);
        output(Actuator, cmd);
    }

    /* Program B */
    every 25ms
    do
    {
        input(Sensor, &data);
        cmd = nextCmd(state, data);
    }
    start after 3.5ms finish within 4.0ms
    {
        state = nextState(state, data);
        output(Actuator, cmd);
    }

Figure 1.3: Two TCEL Programs with the Same Semantics

Armed with this interpretation, our compiler treats both programs as having equivalent semantics. This is quite different from the approaches mentioned above, where timing constructs establish constraints on code. In that interpretation, program A would first receive its data, then delay for 3.5ms, and finally evaluate nextCmd and send the result within the remaining 0.5ms. Program B would receive its data, evaluate nextCmd, then delay for 3.5ms, and finally send the result within 4.0ms of evaluating nextCmd.

Both programs may fail to implement the specification on some hardware platforms. If nextCmd is a CPU-intensive function (and thereby requires over 0.5ms of execution time), program A is inherently unschedulable. On the other hand, program B establishes a constraint between the evaluation of nextCmd and that of nextState, and not between the two specified events. Both programs would have to be rewritten to achieve the desired effect. The necessary corrections would include manually decomposing nextCmd, as well as adjusting the timing constraints. The actual changes would heavily depend on the particular characteristics of the computer, and thus the very reason for using high-level timing constructs would be defeated.

There are several immediate benefits to our semantics for real-time constructs. First, a source program is not hardware-specific, and thus maintains the abstract, "portable" spirit of a high-level language. Since the timing constraints refer only to specification-based events, they need not be platform-specific. Second, this decoupling of timing constraints from code blocks enables a more straightforward implementation of an event-based specification.

Most importantly, some of the arduous, assembly-language level hand-tuning can now be accomplished semi-automatically by compiler optimization techniques. In this dissertation we present two such techniques: one that relies on Trace Scheduling, and the other based on program slicing. Traditionally, Trace Scheduling has been used in instruction scheduling, and program slicing has been used in program analysis and debugging. Here, the objective is different: to achieve guaranteed real-time performance. In doing this we use the observable events as "signposts," which constrain the places where code can be moved. These events, as well as data dependences, establish the limiting constraints for the optimization algorithm.

1.2 Compiler Transformations

Armed with the event-based semantics of TCEL, we have developed two compiler transformation techniques, which we call feasible code synthesis and real-time task slicing.
The objective of feasible code synthesis is to achieve internal consistency between a task's execution time and its timing constraints specified with "do" constructs. On the other hand, the objective of real-time task slicing is to enhance inter-task schedulability. This transformation is applicable to "every" periodic constructs.

1.2.1 Feasible Code Synthesis

We call a code segment infeasible if its execution time stretches over its specified deadline. Feasible code synthesis localizes and corrects such a fault via a two-step process. First, the compiler decomposes a "do" construct into a set of code blocks according to the control structure, and then automatically derives a set of dispatch equations from the language-based timing constraints. This step is necessitated by a gap between the event-based semantics model of source programs and the code-based execution model of real-time schedulers. In fact, most real-time schedulers only accept timing constraints on the start and finish times of tasks. The compiler thereby satisfies the scheduler's requirements by transforming event-driven source programs into constrained blocks of code, and providing the dispatch equations for the scheduler.

Next, the compiler attempts to correct feasibility faults with respect to the derived dispatch equations. This is done by a variant of Trace Scheduling, in which worst-case execution time paths of the infeasible task are selected, and unobservable code is moved to shorten their execution time. As a simple illustration, suppose that nextCmd in Program A of Figure 1.3 executes longer than 0.5ms and hence Program A is infeasible. As a solution, "nextCmd(state, data)" can be inlined, and then decomposed into two parts: one which is flow-dependent on the parameters data and state, and another which is independent of them. This second part can be lifted out of the do statement altogether. We elaborate on this transformation method in Chapter 5.

1.2.2 Real-Time Task Slicing

Real-time task slicing considers a more ambitious goal: inter-task transformations for schedulability. The effect of this transformation is global, though it is individually applied to a small number of tasks selected from an unschedulable application.

The key idea behind this method is based on a simple fact: an application's schedulability improves if we increase the deadlines of its constituent tasks. The same effect is achieved by allowing a task to slide past its deadline, while maintaining the original event-based semantics.
We can realize this benefit by transforming a task so that its time-sensitive component always executes within its frame, while the rest of the task is postponed.

To systematically carry out the transformation, we harness a novel application of program slicing [41, 52, 53]. An unschedulable task is decomposed into two subthreads: one that is "time-critical" and the other "unobservable." The unobservable subthread is then appended to the end of the time-critical thread, with TCEL semantics being maintained. Figure 1.4 pictorially illustrates the net effect of this transformation. The downward (upward) arrow of the kth frame represents the execution of the input (output) event in the same time frame.

[Figure 1.4: Run-Time Behavior of a TCEL Periodic Task. Time line over the (k-1)st, kth, (k+1)st and (k+2)nd periods.]

Since the goal of the transformation comes from the scheduler component of the real-time system, the framework consists of the following ingredients.

(1) An algorithm which uses standard fixed-priority preemptive scheduling analysis to find unschedulable tasks which need task slicing.
(2) A program slicer which decomposes a task and isolates the thread that can be postponed.
(3) An online component of the scheduler which enforces precedence constraints between task interactions.

In Chapter 6 we will discuss in detail the fixed-priority preemptive scheduling paradigm as well as the application of the program slicing method.

1.3 Summary of Contributions

The major contributions of this dissertation are itemized as follows:

• The event-based abstraction is a widely used approach in the formal methods literature. We have extended it into a full-blown real-time programming language called TCEL. The event-based semantics of TCEL makes it straightforward to realize a high-level specification as a real-time program. This semantics allows a clear and unique interpretation of high-level timing constructs, and thus enables us to define safe compiler transformations for TCEL programs. Although TCEL has been implemented on top of the C programming language, it can be used with other programming languages such as Pascal or Fortran. It is a general, flexible mechanism for high-level timing annotations, and is not specific to a particular language.

• We have developed an automatic method of translating a high-level TCEL program into schedulable units of code. When implemented within a TCEL compiler, this method decomposes a TCEL program into tasks, and then derives code-based timing constraints from the language-based timing constraints. This method automatically refines the original timing constraints to account for the effects caused by programs' control structures.

• We have developed an automatic compiler method that is used to transform an infeasible task into a feasible one. This problem is an intractable one, as will be proved. Thus we have devised an approximation approach based on Trace Scheduling.

• We have developed another compiler transformation method that is used to enhance the real-time schedulability of an input task set. This method automatically decomposes a task so that the real-time scheduling method can guarantee timely execution of observable operations, while local operations need not lead to scheduling overrun.

• We have developed a priority ordering algorithm that not only assigns feasible priorities, but also selects a subset of tasks to be transformed. The guiding metric is schedulability.

1.4 Outline of the Dissertation

This dissertation is organized as follows. In Chapter 2 we survey background and related work. In Chapter 3 we introduce TCEL with an example program. In Chapter 4 we introduce basic notations for flow graphs and standard compiler technologies which are used throughout the dissertation. In Chapter 5 we present the first transformation method, feasible code synthesis, along with a motivating example. Then, in Chapter 6 we present the second transformation method, real-time task slicing, and we discuss its underlying theory of fixed-priority, preemptive scheduling. In Chapter 7 we consider the practical limitations of our compiler-based approach, and propose solutions to overcome these problems. We also briefly show TimeWare/SLICE, our prototype implementation of the real-time task slicer, which reflects our philosophy on tool-based system development. Finally, in Chapter 8 we conclude this dissertation with future research directions.
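To make the task decomposition of Section 1.2.2 concrete before it is developed in Chapter 6, the following sketch mimics the sensor-controller task of Section 1.1 in Python. It is a hand-written illustration, not output of the slicer: the bodies of next_cmd and next_state are hypothetical stand-ins, and run_original, io_handler, state_update and run_sliced are names introduced only for this example. The sliced version defers the state update past the point where the output is produced, and applies it before the next period's IO handler reads the state, which models the precedence constraint of ingredient (3).

```python
from typing import List, Optional


def next_cmd(state: int, data: int) -> int:
    """Hypothetical stand-in for nextCmd: the command sent to the actuator."""
    return state + data


def next_state(state: int, data: int) -> int:
    """Hypothetical stand-in for nextState: the controller's state update."""
    return (state * 3 + data) % 7


def run_original(inputs: List[int], state: int = 0) -> List[int]:
    """Unsliced task: command, state update and output all in one frame."""
    outputs = []
    for data in inputs:
        cmd = next_cmd(state, data)
        state = next_state(state, data)
        outputs.append(cmd)          # observable output event
    return outputs


def io_handler(state: int, data: int) -> int:
    """Time-critical slice: everything the output event depends on."""
    return next_cmd(state, data)


def state_update(state: int, data: int) -> int:
    """Unobservable slice: may run after the original deadline."""
    return next_state(state, data)


def run_sliced(inputs: List[int], state: int = 0) -> List[int]:
    """Sliced task: the state update of period k is deferred, but must
    complete before the IO handler of period k+1 reads the state."""
    outputs = []
    pending: Optional[int] = None    # data of a not-yet-applied update
    for data in inputs:
        if pending is not None:      # enforce the precedence constraint
            state = state_update(state, pending)
            pending = None
        outputs.append(io_handler(state, data))
        pending = data
    return outputs


if __name__ == "__main__":
    samples = [1, 2, 3, 4]
    print(run_original(samples) == run_sliced(samples))
```

Both versions produce the same sequence of observable outputs, which is the sense in which the transformation maintains the event-based semantics. The sketch says nothing about timing; it only shows that the postponed work is invisible to an observer of the events.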
Chapter 2
Related Work

In this chapter we survey related work in real-time programming. First, we review some of the formal methods that have laid the foundation for our event-based semantics. We also study real-time programming languages, since they have influenced the design of TCEL's timing constructs. Finally, we survey compiler-based real-time tools, and compare them with our approach.

2.1 Formal Methods

TCEL's semantics was inspired by a principle commonly applied in formal methods. That is, when reasoning about a real-time concurrent system it is often useful to consider only "events of interest," and to abstract away local-state information. Indeed, almost all formal models ease this process by making some distinction between an "event" and a corresponding "action." In this section we survey four such methodologies: Real-Time Logic [24], RTRL [9], Timed I/O Automata [37] and ACSR [29].

RTL. Real-Time Logic possesses an underlying event-action model. It captures the temporal ordering between an application's actions and its events. An action in RTL is the execution of an operation which consumes a certain amount of system resources. The effects of actions are revealed by events that are generated before and after they execute. The occurrence of an event is defined to be instantaneous, while the execution of an action takes non-zero time. Thus a timing constraint is an assertion about temporal relationships between certain events. The conjunction of these timing assertions is "the system specification," and it must imply that key safety properties are maintained.
RTRL. RTRL (Real-Time Requirements Language) is a formal specification language developed for use in modeling telephone switching systems. In RTRL, a real-time system is viewed as a finite-state machine, in which a response at any instant is determined by the system's state and the external inputs. Hence the timing constraints are established between external inputs and responses. Unlike the events in RTL, however, these external inputs and responses denote signals (of a switching system) rather than points in time, and thus have duration times.

Timed I/O Automata. A timed I/O automaton is essentially a state transition system consisting of state variables and events, where each of the events has an enabling condition and an action. A timed I/O automaton has a set of timing assumptions, each of which specifies an event set and an interval during which the event set can be continuously enabled since its last occurrence. Thus the timing assumptions place constraints on the event-firing times [46]. A "behavioral abstraction" of a timed I/O automaton allows reasoning about only the event sequences, since local state information can be ignored in the abstraction.

ACSR. The computation model of ACSR (Algebra of Communicating Shared Resources) addresses two key issues in modeling a real-time system: concurrency and communication. An application specified in ACSR consists of a set of communicating processes that use shared resources for execution and synchronization with one another. A timed action in ACSR takes unit time to execute and consumes a set of resources during that time. Synchronization between actions is supported by events that are instantaneous, and consume no resources.

Impact of Formal Methods. Strongly motivated by the event-action models, we have extended the event-action abstraction to a "full-blown" real-time programming language, in which the "events" correspond to actual IO operations within C code.
A logical consequence of the event-action model is the ability to exploit this looser semantics, and to use compiler transformations tomove unobservable instructions out of over-constrained code blocks.2.2 Real-Time Programming LanguagesMost other real-time languages do not make such a distinction, and instead place constraints onthe boundaries of code blocks. Two paradigms are used in these languages: either constraints areexpressed directly in the program itself (as in [33, 30, 54]), or they are postulated in a separateinterface, and then passed to the scheduler as directives. A common language-based approach (rst11
presented in [30]) is to provide constructs such as \within t do f: : :g," \at t do f: : :g" and \aftert do f: : :g." An alternative, taken in [33], is to set up linear constraint expressions on the the starttimes and deadlines of code blocks. We have borrowed from both approaches: in the TCEL sourcewe use the higher-level constructs, while in our intermediate code we make use of the constraintrepresentation. But in TCEL the semantics is quite dierent, as it establishes constraints betweenthe observable events within the code, and not on the code's textual boundaries.2.3 Real-Time Compiler ToolsThere have been many other compiler-based approaches to real-time programming, most of whichaddress dierent real-time programming problems and rely on dierent techniques. However, theyshare a common goal, namely, enhancing predictability and schedulability of real-time applications.In this section we survey tools, and then show where we can place our tools in the realm of real-timeprogramming.Schedulability Analyzer for Real-Time Euclid. Among the early approaches was the schedu-lability analyzer [49], specically developed for the timing and scheduling analyses of programswritten in Real-Time Euclid [27]. The schedulability analyzer consists of a front-end and a back-end. The front-end is incorporated into the code generator of the Real-Time Euclid compiler, andit produces a segment tree that represents compilation units such as modules, monitors or routines.Using a segment tree, the back-end computes the execution time prole of each segment consid-ering synchronization, phasing, and IO between segments, as well as execution times of segments'instructions. We believe that such a technique is infeasible in a practical sense, because it presentsan intractable circularity between the compiler-based analyzer and the scheduler. The result of theanalysis aects that of scheduling, and vice versa.Compiler-Assisted Adaptive Scheduling. 
Gopinath and Gupta introduced a technique called compiler-assisted adaptive scheduling in [18]. In their work, the compiler indexes a piece of code into four classes on the basis of predictability and monotonicity. Then it rearranges the code to support adaptive run-time scheduling. Unlike our approach, which is entirely static in terms of both program analysis and scheduling, their approach was developed to aid in dynamic runtime scheduling. However, their work still demonstrates that a successful interplay between the compiler and scheduler is possible without introducing a circularity between them.
Partial Evaluation. Partial evaluation may conceptually be understood as compile-time evaluation of constant expressions. Partial evaluation of a source program not only reduces constant expressions in the program into simple values, but also simplifies some control structures such as loops and conditionals. In [39] Nirkhe and Pugh applied the technique of partial evaluation to real-time programming to produce residual code that is both more optimized and more deterministic.

Safe Real-Time Code Optimizations. Conventional compiler optimizations are designed to reduce the expected or average execution time of programs without taking into account the exact timing behavior of programs. On the other hand, optimizations for hard real-time must meet stringent timing constraints. In [36] Marlowe and Masticola examine a large class of conventional code transformations, and then classify them for application in real-time programming by the notion of safe real-time code transformations. To do so, they assume time-critical statements (or events) in the underlying programming language, and interpret deadlines as relationships between the executions of time-critical statements. This approach comes closest to our work, in that the timing behavior of real-time programs is described in terms of events or the executions of time-critical statements. Unlike the semantics for TCEL, however, the execution times of non-time-critical statements are not explicitly decoupled from timing constraints imposed on the events. This is because the start of a process or the start of a critical section is still considered "time-critical." As a result, the applicability of some transformations may be unnecessarily restricted.

Timing Tools. Predicting the worst-case execution time of a program is a fundamental requirement for building a real-time application.
Indeed, the results of our compiler-based methods rely on timing tools, since both the feasibility and schedulability of tasks are a function of the predicted worst-case execution times of the transformed tasks.

The technique reported in [42] is based on a simple source-level timing schema, and it is fairly straightforward to implement in a tool. In [19] another approach for more accurate timing was proposed. Since the resulting tool is able to analyze micro-instruction streams using machine-description rules, it is retargetable to various architectures. On the other hand, neither approach addresses the problem of predicting architecture-specific timing behaviors due to various latencies in the memory hierarchy and pipelines. Recently, several approaches have been developed to account for this timing variance. In [55] Zhang et al. presented a timing analyzer based on a mathematical model of the pipelined Intel 80C188 processor. This analysis method is able to take into account the overlap between instruction execution and fetching, which is an improvement over schemes where instruction executions are treated individually. In [44] Arnold et al. developed a
timing prediction method called static cache simulation to statically analyze memory and cache reference patterns. A similar but more advanced approach was reported in [32]. While the latter approach is able to predict pipeline stalls as well, both approaches essentially rely on attribute grammars [2] to propagate cache hit information backward in a flow graph.

However, no static timing tool is precise enough to be used with complete confidence for developing production-quality software. Moreover, even sophisticated timing analysis methods such as [32, 44] are not appropriate for fine-grained instruction timing. In Chapter 7 we explain how we can effectively use these tools in spite of the limitations, by also taking advantage of software profiling, as well as static timing prediction. Specifically, our slicing technique does not require any static analyzer: it can be used to first transform the program, with the timing carried out later by a runtime profiler.
Chapter 3
The Language of Time-Constrained Events

In this chapter we present TCEL's timing constructs to express timing constraints within a high-level program. These high-level timing constructs make explicit reference to observable events as well as time, so that the specification-level timing constraints can be extracted from the source code, and then conveyed to real-time schedulers.

3.1 Design Goals

The objectives of the TCEL design are (1) to lay a semantic foundation for developing safe real-time compiler transformation techniques; and (2) to provide high-level timing constructs that enable a compiler to automatically translate a high-level real-time specification into executable code. In addition to these, the design of the timing constructs is motivated by the following practical goals:

• To add a minimal set of features to existing languages. To this end, we embedded timing constructs in C, and then extended it as a real-time programming language. Our constructs are syntactic descendants of the temporal scope in [30].

• To keep the definition of an observable event as general as possible. The notion of an observable event is a relative term depending on the application system. We support a generalizable mechanism in that observable events are classified by the programmer, and the resulting event specification is externally provided for a TCEL compiler. Such a specification may include input and output operations, message passing primitives and memory accesses to shared variables. For simplicity, in the sequel we consider all "input" and "output" operations to be observable.




















Figure 3.1: Flow Graph of do Construct

As long as the "while" condition is true, the observable events in the constraint block execute every p time units. Akin to an untimed while-loop, when the condition evaluates to false the statement terminates. However, unlike the untimed counterpart, event operations cannot be part of the condition. In its real-time behavior, the interpretation of the "every" construct is similar to that of "do." For example, assume that the statement is first scheduled at time $t$, and that the "while" condition is true for periods 0 through $i$. The periodic constraints established by this statement are depicted in Figure 3.2, where the time-line shows the first two instances of the statement.

Examining the time-line, we see that the every statement is released at time $t$, and that within the first frame, the first observable event (denoted by an arrow) occurs between $t + t_{min}$ and $t + t_{max1}$. Similarly, the first frame's last event occurs before $t + t_{max2}$. Generalizing, the following constraints are induced for period $i$, where $i \geq 0$:

• start after $t_{min}$: The first event executed in the CB occurs after $t + i \cdot p + t_{min}$.

• start before $t_{max1}$: The first event executed in the CB occurs before $t + i \cdot p + t_{max1}$.

• finish within $t_{max2}$: The last event executed in the CB occurs before $t + i \cdot p + t_{max2}$.
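The three per-period bounds listed above are simple enough to compute directly. The following Python fragment is only an illustrative sketch: the names `EveryConstraint`, `windows` and `satisfies` are hypothetical and not part of TCEL, and the bounds are treated as inclusive, which is an assumption (the text says only "after" and "before").

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EveryConstraint:
    """Parameters of a periodic timing construct (hypothetical container)."""
    p: int      # period
    tmin: int   # "start after" offset
    tmax1: int  # "start before" offset
    tmax2: int  # "finish within" offset

def windows(c, t, i):
    """Return the three absolute bounds for period i of a statement released at t."""
    base = t + i * c.p
    return (base + c.tmin, base + c.tmax1, base + c.tmax2)

def satisfies(c, t, i, event_times):
    """Check one period's observed event times against the induced bounds.

    Assumption: bounds are inclusive.
    """
    earliest_first, latest_first, latest_last = windows(c, t, i)
    first, last = min(event_times), max(event_times)
    return earliest_first <= first <= latest_first and last <= latest_last
```

For instance, with illustrative values `p=20, tmin=1, tmax1=3, tmax2=5` and release time `t=100`, period 2 starts at 140, so the first event must fall in [141, 143] and the last event no later than 145.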
Figure 3.2: Behavior of Periodic Timing Construct (time-line of observable event occurrences over frames starting at t, t + p, t + 2p, ..., t + ip)

3.3 Example TCEL Code

As we have stated, timing constructs may be arbitrarily nested. Consider the program in Figure 3.3(A), which is a (very coarse) 2-dimensional abstraction of an aircraft navigation/control loop. A set of route coordinates is maintained in the array "GOAL," which is managed by another module. These coordinates are accessed using index variable "i," which is initialized before the periodic loop, and updated within the loop body. The TCEL program's role is to (1) sample the aircraft's current coordinates, its (true) heading, roll, and its ground speed; (2) get the next route coordinate to visit; (3) compute the relative attitude between the heading and the coordinate; and (4) adjust the course by updating throttle and roll. Adjustments are made in discrete increments, and are contingent on the current roll and velocity, as well as the amount that the course must be changed.

The timing constraints are induced as follows:

(1) Control updates are made periodically, with rate 50/second.

(2) In order to give the actuators time to get updated (and for the craft to respond accordingly), all updates must be made within the first 5ms of each period.

(3) Velocity (ground speed) is obtained via a "request-response" protocol from an external unit; the response arrives with maximum latency of 0.75ms.

(4) To correlate ground speed with outputs, all throttle and flap updates must be made within 3.1ms of the actual ground speed sample. In the best case this may be made upon issuing the request.

If the specification mandated additional timing constraints, clearly we could employ further
levels of nesting to achieve them. For example, suppose we desired to add a fifth timing requirement to the four listed above:

(5) The final two outputs must be correlated within 0.5ms of each other. (This is not unrealistic, since the two output controls are coordinated to effect the angular adjustment.)

To accomplish this we would replace the sequential composition with an additional nested TCEL statement:

    do
        output(THROT, throttle);
    finish within 0.5ms
        output(FLAP_Cntrl, wflap);

The net runtime effect would simply be a refinement of the potential behaviors; i.e., the time-event relationships exhibited by the altered program would be a subset of those in the original version.





Figure 3.3: (A) TCEL Source Code for the Flight Controller and (B) Corresponding Flow Graph
Chapter 4
Basic Notations

The output of the TCEL compiler's machine-independent pass is the code in an intermediate representation, encapsulated in a flow graph [2]. We extend the format of a flow graph to hold the original timing information. For example, Figure 3.3(B) shows the extended flow graph for our flight controller program in Figure 3.3(A), where for the sake of brevity we have left the code in its original C form. In this chapter we first introduce two types of flow graphs, and then define basic notations of standard compiler technologies. These notations include a data structure called a program dependence graph, and a code representation called static single assignment form. We will use these notations throughout this dissertation.

4.1 Flow Graphs

A flow graph is a natural representation of an intermediate program from which other important information (such as data and control dependences) is extracted by standard static analysis techniques. In order to help compilers deal with nested program structures we define a hierarchical flow graph as well as a standard flow graph.

Flow Graph. A flow graph is a standard, flattened representation of a program, in which a node denotes a fragment of straight-line code and an edge specifies potential flow of control from one node to another.

Definition 4.1 A flow graph $FG(B)$ of code block $B$ is a directed graph
$$(V, E, entry(B), exit(B))$$
where $V$ is the set of basic blocks of $B$ together with the two distinguished nodes $entry(B)$ and $exit(B)$, which denote the unique entry and exit of $B$, respectively. $E$ is a set of edges representing potential control flow.

Hierarchical Flow Graph. It is sometimes necessary to explicitly specify the program's original hierarchical levels of scoping in a flow graph. We call this structure a hierarchical flow graph.

Definition 4.2 A hierarchical flow graph $HFG(B)$ of code block $B$ is a directed graph
$$(V, E, entry(B), exit(B))$$
where $V$ is a set of nodes, and $E$ is a set of edges representing potential control flow between nodes. A vertex $n \in V$, however, may be either a basic block of $B$, an entry node, an exit node, or another hierarchical flow graph $HFG(B')$.

Thus all nested constructs, including loops, are reduced into single nodes in a hierarchical flow graph. In our work we do not lift code out of loops, and thus we treat HFGs as acyclic graphs, since we can ignore all back edges.

Of course our hierarchical structure assumes that the underlying programming language is "perfectly structured." That is, any two statements $S_1$ and $S_2$ in the program are in one of the following forms:

1. $S_1$ is contained in $S_2$;
2. $S_2$ is contained in $S_1$; or
3. $S_1$ and $S_2$ are disjoint.

Since many real-time programming languages allow only "structured" programs without unrestricted gotos¹, we assume that our programs possess this property.

Since nodes in both a flow graph and a hierarchical flow graph have the same interface and structure, all notations defined for a flow graph are used in a hierarchical flow graph. We can easily extend traditional compiler terminology to HFGs as well. In the following sections we define basic compiler notations for $HFG(B) = (V, E, entry(B), exit(B))$.

¹ We disallow the break statement as well, since it is a special instance of goto.
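To make Definition 4.2 concrete, the following Python sketch shows one possible in-memory shape for a hierarchical flow graph, in which a node is either a basic block or a nested HFG. The class and function names (`BasicBlock`, `HFG`, `basic_blocks`, `nesting_depth`) are hypothetical and are not taken from the TCEL compiler; entry and exit nodes are represented only by generated names.

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    name: str

@dataclass
class HFG:
    """Hierarchical flow graph of a code block (illustrative only)."""
    name: str
    nodes: list = field(default_factory=list)  # BasicBlock or nested HFG
    edges: set = field(default_factory=set)    # (source name, target name) pairs

    @property
    def entry(self):
        return f"entry({self.name})"

    @property
    def exit(self):
        return f"exit({self.name})"

def basic_blocks(g):
    """Flatten an HFG: names of all basic blocks at any nesting depth."""
    names = []
    for node in g.nodes:
        if isinstance(node, HFG):
            names.extend(basic_blocks(node))
        else:
            names.append(node.name)
    return names

def nesting_depth(g):
    """1 for a flat graph, plus one for each level of nested HFG nodes."""
    return 1 + max((nesting_depth(n) for n in g.nodes if isinstance(n, HFG)), default=0)

# A block B whose middle node is a loop, collapsed into a single nested HFG node.
loop = HFG("L", [BasicBlock("B2"), BasicBlock("B3")],
           {("entry(L)", "B2"), ("B2", "B3"), ("B3", "exit(L)")})
top = HFG("B", [BasicBlock("B1"), loop, BasicBlock("B4")],
          {("entry(B)", "B1"), ("B1", "L"), ("L", "B4"), ("B4", "exit(B)")})
```

Because the loop appears as the single node `L` at the top level, the top-level edge set contains no back edge, which matches the acyclic treatment of HFGs described above.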
4.2 Basic Compiler Definitions

Paths. A path (or trace) between $n_1, n_2 \in V$ is denoted by "$n_1 \xrightarrow{b} n_2$," where $b$ is the sequence of nodes traversed between (and including) $n_1$ and $n_2$. When $b$ includes a node $m \in V$, we denote this by overloading the set membership operator, i.e., as "$m \in b$."

We also use the path relation, but omit the actual path, to denote the existence of some path between two nodes, i.e.,
$$n_1 \rightarrow n_2 \;\stackrel{def}{=}\; \exists b :: n_1 \xrightarrow{b} n_2$$
We assume that for any node $n \in V$, $n \rightarrow n$.

Dominator and Postdominator. A node $d$ is called a dominator of node $n$, if every path from $entry(B)$ to $n$ goes through $d$. Similarly, a node $p$ is a postdominator of node $n$, if every path from $n$ to $exit(B)$ goes through $p$ [2].

Data Dependence. Let $Def(n)$ and $Use(n)$ be the sets of variables defined and used by node $n$ in $B$, respectively. For instructions $s_1, s_2 \in B$, $s_2$ is data dependent on $s_1$ (denoted "$s_1 \stackrel{d}{\rightharpoonup} s_2$") iff there is a path $b$ such that $s_1 \xrightarrow{b} s_2$ and
$$\exists v \in Def(s_1) \cap Use(s_2) :: (\forall s \in B :: ((s \in b) \wedge v \in Def(s)) \Rightarrow s = s_1)$$
For instructions $s_1$ and $s_2$ in $B$, we say that $s_2$ is transitively data dependent on $s_1$ (denoted "$s_1 \stackrel{d}{\rightharpoonup}^{+} s_2$") iff there is a chain
$$s_1 \stackrel{d}{\rightharpoonup} s_1' \stackrel{d}{\rightharpoonup} s_1'' \stackrel{d}{\rightharpoonup} \cdots \stackrel{d}{\rightharpoonup} s_2$$
We extend the notion of data dependence to nodes of the HFG. For nodes $n_1, n_2 \in V$, $n_2$ is data dependent on $n_1$ (denoted "$n_1 \stackrel{d}{\rightharpoonup} n_2$") iff
$$\exists s_1 \in n_1, s_2 \in n_2 :: s_1 \stackrel{d}{\rightharpoonup} s_2.$$

Control Dependence. For nodes $n_1, n_2 \in V$, $n_2$ is control dependent on $n_1$ (denoted "$n_1 \stackrel{c}{\rightharpoonup} n_2$") if one of the following holds:

1. $n_1$ is an entry vertex and $n_2$ is not nested within any loop or conditional; or
2. $n_1$ represents a control predicate and $n_2$ is immediately nested within the loop or the conditional whose predicate is represented by $n_1$.
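The dominator and postdominator relations defined above can be computed with the textbook iterative data-flow formulation. The Python sketch below is one such formulation, not the dissertation's own implementation; it assumes every node is reachable from the entry, and the function names are hypothetical.

```python
def dominators(nodes, edges, entry):
    """Map each node to the set of nodes that dominate it (including itself)."""
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
    # Start from the most optimistic solution and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def postdominators(nodes, edges, exit_node):
    """Postdominators are dominators of the reversed graph, rooted at the exit."""
    return dominators(nodes, [(v, u) for u, v in edges], exit_node)

# A diamond: node a branches to b or c, which rejoin at d.
NODES = ["entry", "a", "b", "c", "d", "exit"]
EDGES = [("entry", "a"), ("a", "b"), ("a", "c"),
         ("b", "d"), ("c", "d"), ("d", "exit")]
DOM = dominators(NODES, EDGES, "entry")
PDOM = postdominators(NODES, EDGES, "exit")
```

In the diamond, neither `b` nor `c` dominates `d`, because a path exists through the other branch, while `d` postdominates `a`, `b` and `c`.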
Our denition of control dependence is simpler than that found in [11], since it covers a restrictedlanguage possessing only structured program constructs.Reaching Denitions. For a node n 2 V and an expression e in B, we dene reaching denitionsRD(n; e) as a set of nodes m such that9b :: (m!b n ^ 9v 2 Def (m) \Use(e) ::8n0 2 V :: ((n0 2 b) ^ v 2 Def (n0))) n0 = m):In other words, RD(n; e) is a set of nodes whose denitions of some variables in Use(e) are availableat node n.Dependence Closure (Static Backward Program Slice). The dependence closure for noden in the block B, denoted by \DC (n;B)," contains n and all nodes m that reach n via zero ormore control or data dependence edges. It is inductively dened by the following least x-pointoperation: DC(n;B) = fix F (fng); whereF (S) = fm 2 V j 9n0 2 S ::m d* n0 _m c* n0g [ SWhen we are concerned only with data dependences, we make use of data dependence closure,\DDC (n;B)" dened as below. DDC (n;B) = fm 2 V jm d*+ ng [ fngIn fact, a dependence closure and a data dependence closure are equivalent to a static backwardprogram slice and a static data slice, respectively [52].Program Dependence Graph. A program dependence graph is an intermediate program rep-resentation that stores both the data and control dependences for each operation in a program.Formally, the program dependence graph is a directed graph PDG(B) = (U;W ), where The vertex set U is equal to V in HFG(B). The edges W are of two sorts: either a control dependence between n1 and n2 such thatn1 c* n2, or a data dependence between n1 and n2 such that n1 d* n2.In addition, we dene \m) n" to mean that there is a path starting from node n and endingat node q in PDG(B). 24
Static Single Assignment Form. The static single assignment (SSA) form of a program can be considered not only as a sparse representation of flow data dependences, but also as a notation where spurious data dependences such as output and anti-dependences are eliminated [7, 8].

A program is defined to be in SSA form if each use of a variable is reached by exactly one assignment to it [8]. Thus, a program's SSA representation can be obtained by iteratively applying the following process: for each variable in the program, (1) unique names are given to all of its appearances on the left-hand side of an assignment; and (2) all of the uses reached by that assignment are renamed to correspond to the new name. The following examples demonstrate this process. In the straight-line code, each assignment to a variable is given a subscripted name, and all of its uses are then renamed as well.

    v = f();              v1 = f();
    a = v + 1;    ==>     a = v1 + 1;
    v = g();              v2 = g();
    b = v + 2;            b = v2 + 2;

Conditional statements require a bit more work in achieving the SSA form. At confluence points in the CFG, merge functions called φ-functions are introduced. A φ-function for a variable merges its possible values from distinct incoming control flow paths, and takes one argument for each control flow predecessor.

    if cond                   P = cond;
      then v = f();           if P
      else v = g();   ==>       then v1 = f();
    a = v;                      else v2 = g();
                              v3 = φ(v1, v2);
                              a = v3;

In addition to a confluence point immediately following a conditional, another type of confluence point exists at every loop header. For each variable v defined in a loop body, two values of v, namely the one assigned before the loop and the one assigned within the loop, merge at the confluence point of the loop header. Thus a φ-function for v is introduced there.

    v = 10;                   v0 = 10;
    do {                      do {
      v = v - 1;      ==>       v1 = φ(v0, v2);
    } while (v > 0)             v2 = v1 - 1;
                              } while (v2 > 0)
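For straight-line code, the two-step renaming process described above needs no φ-functions and can be sketched in a few lines of Python. This is only an illustration: the function `to_ssa` is hypothetical, statements are modeled as `(target, expression)` string pairs, identifiers are found with a simple regular expression, and function names are assumed never to coincide with variable names. Unlike the example in the text, every assigned variable receives a subscript, including `a` and `b`.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def to_ssa(stmts):
    """Rename a straight-line list of (target, expr) assignments into SSA form."""
    version = {}   # variable name -> latest subscript assigned so far
    out = []
    for target, expr in stmts:
        # Step (2): each use refers to the most recent definition reaching it.
        def rename(match):
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name
        new_expr = IDENT.sub(rename, expr)
        # Step (1): the left-hand side gets a fresh, unique name.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", new_expr))
    return out

SSA = to_ssa([("v", "f()"), ("a", "v + 1"), ("v", "g()"), ("b", "v + 2")])
```

Renaming the uses before bumping the target's version matters for statements such as `v = v - 1`, where the use on the right must refer to the previous definition.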
Here we denote the value initialized before the loop by v0 (subscript 0).

Gated Single Assignment Form. Although SSA form allows for efficient representation of data flows in a program, it possesses a weakness in terms of program transformability. For example, in the above code the φ-function is structurally bound to the conditional "if P," since the φ-function "v3 = φ(v1, v2)" can be executed only after the outcome of "P" is known. Such a coupling can be avoided if we parameterize the φ-function with the associated predicate.

Gated single assignment (GSA) form solves this problem [20]. GSA form is an extension of SSA form with new functions that encode control over value assignment. The following code fragments show the translation from a conditional into code in GSA form.

    if cond                   P = cond;
      then v = f();           if P
      else v = g();   ==>       then v1 = f();
    a = v;                      else v2 = g();
                              v3 = γ(P, v1, v2);
                              a = v3;

The γ-function denotes the extension. The first argument of the γ-function is the predicate associated with the original φ-function, and its remaining arguments are the same as those of the φ-function. Having our intermediate programs in GSA form, we can treat γ-functions in the same way as we deal with any other statement.

In GSA, loops need a special form of pseudo-assignment function other than a γ-function. The reason is that the confluence points at loop headers are not associated with particular conditional predicates. Thus for each variable v defined in a loop body, a definition $v' = \mu(v_{init}, v_{iter})$ is inserted at the loop header, where $v_{init}$ is the initial value of v reaching the header from outside and $v_{iter}$ is the iterative value reaching along the back-edge of the loop [20]. In our example below we denote $v_{init}$ and $v'$ by v0 and v1, respectively.

    v = 10;                   v0 = 10;
    do {                      do {
      v = v - 1;      ==>       v1 = μ(v0, v2);
    } while (v > 0)             v2 = v1 - 1;
                              } while (v2 > 0)
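One way to read the two gating functions operationally is shown in the Python sketch below. It is an interpretation for illustration only, not the formal semantics of [20]: `gamma` selects a value by its predicate argument, and `mu` is modeled with an explicit `first_iteration` flag, which is an assumption of this sketch (the μ-function in the text has no such argument).

```python
def gamma(p, v_then, v_else):
    """Gated merge for a conditional: the predicate is an ordinary argument."""
    return v_then if p else v_else

def mu(first_iteration, v_init, v_iter):
    """Loop-header merge: the initial value on entry, the iterative value afterwards."""
    return v_init if first_iteration else v_iter

def countdown():
    """Execute the GSA loop from the text and record each value of v2."""
    v0 = 10
    v2 = None
    first = True
    trace = []
    while True:                  # models do { ... } while (v2 > 0)
        v1 = mu(first, v0, v2)
        first = False
        v2 = v1 - 1
        trace.append(v2)
        if not v2 > 0:
            break
    return trace
```

Because `gamma` takes the predicate as data, a statement such as `v3 = gamma(P, v1, v2)` depends on `P` through an ordinary data dependence, which is what allows it to be handled like any other statement.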
Chapter 5
Transformation 1: Feasible Code Synthesis

In this chapter we address translating a TCEL program into a low-level representation eligible for real-time scheduling. To give a clear understanding of the problem domain, consider Figure 5.1. Our problem domain at hand (denoted by the components with bold lines) actually contains two problems. The first problem lies in compiling a TCEL program into schedulable units, and the second problem deals with attaining the feasibility of an infeasible task.

We approach the first problem with a compiler-based decomposition method. This method enables a TCEL compiler to automatically translate a TCEL source program into a set of decomposed tasks, and it derives a set of timing equations for them. Later, at runtime, a scheduler benefits from this decomposition method, since it can efficiently dispatch the tasks according to their time schedules (or priorities) based on both their execution times and associated timing equations.

The second problem arises when any task is determined to have an execution time conflicting with the derived timing constraints. In this case, it is impossible for any scheduler to generate a feasible schedule for the task set. We develop a compiler transformation method to automatically correct such feasibility faults based on a variant of Trace Scheduling [12]. We name this method feasible code synthesis. In this method, the compiler picks the worst-case execution time paths of an infeasible task and moves unobservable code to eliminate the overload.

The remainder of this chapter is organized as follows. In Section 5.1 we formally state the feasible code synthesis problem, and show that it is essentially an intractable problem. Then we briefly discuss our solution strategy. In Sections 5.2 and 5.3 we explain the transformation strategy.
Figure 5.1: Software Development Process: Revisited for Feasible Code Synthesis (boxes: Requirements Specification; High-Level Real-Time Programs; Decomposed Tasks, Timing Constraints on Tasks; Feasibility Tests (Intra-Task Consistency); Schedulability Tests (Inter-Task Consistency); Compiler-based Transformation Methods: Feasible Code Synthesis (Chapter 5) and Real-Time Task Slicing (Chapter 6); System-Tuning, Re-implementation)

5.1 The Problem and Our Solution

Compiling a TCEL program into schedulable code raises two fundamental issues, which are endemic to all real-time systems.

1. A program may not be feasible, i.e., a single process's execution time may conflict with its own real-time constraints.

2. While the program may be feasible, it may not be schedulable under any tractable real-time scheduling algorithm.

As for the feasibility problem, consider the following TCEL fragment:
    do
        input(P, &m);
    start after 10ms finish within 20ms
    {
        input(Q, &x);
        S;              [20ms]
        output(R, y);
    }

The code's timing constraints mandate a 10ms latency between the events generated by "input(P, &m)" and "input(Q, &x)," as well as a 20ms deadline between the events generated by "input(P, &m)" and "output(R, y)." Meanwhile, the bracketed "20ms" denotes that the unobservable statement S requires a maximum of 20ms to execute, a bound obtained by a timing analysis tool (e.g., [19, 32, 42, 47, 55]). Consequently, the program possesses an inherent conflict: S may need 20ms to execute, yet it must run within the 10ms window that remains between the earliest start (10ms) and the deadline (20ms).

We address this problem by an approach we call feasible code synthesis. In our example this would involve decomposing S and, if possible, moving instructions not dependent on "x" out of the overloaded section. However, merely achieving feasibility may be of little help, since the transformed code may still not be schedulable under any known methods. For example, when a program contains if-then-else branches, the actual execution paths (and the events executed) are determined dynamically. But since schedulers must provide guarantees, they do not have the flexibility to instantaneously, dynamically reschedule a task set whenever an event is triggered. Indeed, while an event-based semantics makes conceptual sense at the source-program level, most real-time schedulers only accept timing constraints on the start and finish times of tasks.

Since the strategies used to achieve feasibility have a profound effect on the ultimate task structure, we solve these two problems together.
Thus the role of the TCEL compiler is to partition event-driven source programs into time-constrained blocks of code, in which all of the blocks are feasible.

5.1.1 The Problem of Feasible Code Synthesis

Even without the code block partitioning component, achieving feasibility is a nontrivial problem. This is obviously the case if we allowed potentially unbounded loops (or conditional jumps), which would render the problem undecidable. But since real-time programs must be amenable to worst-case timing estimates, we assume that upper bounds can be obtained for execution times. Formally,
we let "$wt(S)$" denote a statement's worst-case execution time, where we assume that $wt(S) \in [0, \infty)$ for any statement $S$. Obviously "$wt$" is implementation dependent, and its tightness is determined by the quality of the analysis tool used to generate the bound.¹

¹ We return to this issue in Chapter 7.

While the feasibility problem may be decidable for our domain, it is not necessarily trivial. Even when a program is structured like our example above, and where S is a basic block, simply deciding whether the program can be made feasible is still NP-hard.

The problem can be stated as follows: given a TCEL timing construct

    do RB start after tmin start before tmax1 finish within tmax2 CB

is it possible to transform RB and CB to meet the following constraints?

(1) On any execution path, the original ordering of observable events is maintained.

(2) The original data and control dependences are not violated between instructions and events.

(3) The code's execution time does not conflict with the timing constraints between the events.

Clauses (1)-(2) imply that the transformed code must be functionally correct: events may not be reordered, and the original relationships between input and output data must be maintained. Clause (3) means that the new code is feasible.

This problem is NP-hard, due to the existence of immovable operations and data dependences.

Theorem 5.1 Feasible code synthesis is NP-hard.

Proof: The proof follows by a straightforward transformation from "Partition [SP12]" [13] to feasible code synthesis. Consider an instance $(A, s)$ of Partition, where $A = \{a_1, a_2, \ldots, a_n\}$ is a set of elements, and where $s : A \mapsto N$ is the cost of each element. Letting $\sum_{i=1}^{n} s(a_i) = 2T$, a partition of $A$ is some $A' \subseteq A$ such that $\sum_{a \in A'} s(a) = T = \sum_{b \in A - A'} s(b)$. Determining whether such an $A'$ exists is equivalent to determining whether the following TCEL program can be made feasible:
    do
        E1: input(P, &x);
    start after T start before T finish within 2T
    {
        E2: output(Q, g(x));
        L1: x1 = f1(x);          [s(a1)]
        L2: x2 = f2(x);          [s(a2)]
        ...
        Ln: xn = fn(x);          [s(an)]
        E3: output(R, h(x1, ..., xn));
    }

Here E1 to E3 generate events, L1 to Ln are unobservable instructions, and each line is considered atomic (that is, a line must be relocated as a single entity). Then by the construction,

(1) E2, L1 to Ln are mutually data independent.

(2) E2, L1 to Ln are data dependent on E1.

(3) E3 is data dependent on L1 to Ln.

If we assume that $wt(L_i) = s(a_i)$ for $1 \leq i \leq n$, and that $wt(E_j) = 0$ for $1 \leq j \leq 3$, then there exists a partition of our original set $A$ if and only if there is a feasible transformation for the program.

As for the "only if" part, assume there is a partition $A' \subseteq A$. Then for all $a_i \in A'$, moving the corresponding instruction $L_i$ between the events E1 and E2 ensures feasibility: exactly $T$ execution time is consumed between E1 and E2, and another $T$ is used between E2 and E3.

As for the "if" part, assume there is a feasible transformation. Then, since $2T$ execution time is used overall, the constraints mandate that exactly $T$ of it be used between E1 and E2, and the rest between E2 and E3. Thus we must move some set of instructions $L \subseteq \{L_1, \ldots, L_n\}$ between E1 and E2, where $\sum_{L_j \in L} wt(L_j) = T$. But then the corresponding elements in $A$ form a partition.

When both the constraint block and reference block consist of straight-line code, the problem is obviously in NP as well. A feasible ordering can always be "guessed" and then verified, which consequently yields the following corollary.

Corollary: Feasible code synthesis is NP-complete for straight-line code.
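The reduction in the proof can be exercised on small instances by brute force. The Python sketch below is illustrative only: the function `lines_to_hoist` is hypothetical, costs are integers standing for $wt(L_i) = s(a_i)$, and the exhaustive search is exponential in the number of lines, which is consistent with the hardness result rather than a practical method.

```python
from itertools import combinations

def lines_to_hoist(costs):
    """Find indices of lines L_i to move between E1 and E2, or None.

    The program in the proof is feasible exactly when some subset of the
    lines has total cost T, where 2T is the sum of all costs.
    """
    total = sum(costs)
    if total % 2:                 # no integer T, hence no partition
        return None
    half = total // 2
    for k in range(len(costs) + 1):
        for subset in combinations(range(len(costs)), k):
            if sum(costs[i] for i in subset) == half:
                return subset
    return None

CHOSEN = lines_to_hoist([3, 1, 1, 2, 2, 1])   # 2T = 10, so T = 5
```

For the costs `[3, 1, 1, 2, 2, 1]` a subset of cost 5 exists, so hoisting those lines above E2 leaves exactly T of work on each side of E2; for `[1, 1, 4]` no subset sums to 3 and the corresponding program cannot be made feasible.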
5.1.2 Solution Strategy

In proving Theorem 5.1 we used the simplest possible TCEL program, which possessed just two basic blocks. In this case a feasible transformation would simply reorder the instructions, while keeping the program's fundamental structure intact.

But the situation gets significantly more complicated when the program possesses branches, and when the events that get executed are determined at runtime. Since attaining feasibility mandates that we statically guarantee the timing constraints along all execution paths, reordering the instructions along a single path may not be sufficient. In the worst of all cases, all paths of a multi-branching program would have to be "expanded," and then each one reordered individually, potentially requiring time and space exponential in the original number of instructions. Clearly this is not an attractive solution approach. Moreover, it would render the problem of schedulable task decomposition (our second objective) next to impossible.

Thus we take the following alternative approach, in which feasible tasks are synthesized in a two-step process: section decomposition (Section 5.2) and code scheduling (Section 5.3).

Section Decomposition. First the code is translated into its gated single assignment (GSA) form [8, 20]. This representation serves two purposes: (1) it yields a compact means of representing the data-dependence relation discussed in Chapter 4; and (2) GSA's convention of uniquely naming each variable assignment is precisely what we require in the code transformation phase.

Next, the timing construct is decomposed into sections, which represent the natural schedulable "task" units. This step involves determining the section boundaries, as well as generating a set of dispatch equations that constrain the start and finish times of each section.

Code Scheduling. In this step the dispatch equations are checked for their consistency with the code's execution time.
If there is an inconsistency, program transformations are used to relocate unobservable code across section boundaries.

The algorithm is a greedy approximation, in that each section is processed locally, with the goal of attaining consistency for the entire program. The following strategy is used: if a given section is determined to be over-constrained, code is moved out of the section and "up" on the HFG. After the section at hand is determined consistent, another section can be processed in reverse topological order. Thus the net result is an "upward migration" of unobservable instructions, which terminates either when the program achieves consistency or when it is determined to be inconsistent.

An analogous strategy is used for programs with nested timing constructs. In this case the HFG
































Figure 5.2: Flow Graph of do Construct and its Section Division

• end_S2: This marker is inserted directly before the unobservable instruction most closely post-dominating $LAST(RB) \cup \{entry(RB)\}$.

• begin_S4: This marker is inserted directly after the unobservable instruction most closely dominating $FIRST(CB) \cup \{exit(CB)\}$.

• end_S4: This marker is inserted directly before the unobservable instruction most closely post-dominating $LAST(CB) \cup \{entry(CB)\}$.

For example, consider the constraint block in Figure 5.2. The unobservable node B9 post-dominates LAST(CB) and the entry node. Thus, its logical place is in the interface section S5, which is not subject to the construct's timing constraints. Hence we need the marker end_S4, which is the unique exit point for the constrained section S4.

Now, let the variable S2.start correspond to the actual time that the marker begin_S2 is "executed" (that is, the dispatch time of section S2), and let S2.finish correspond to the time that the section ends. Similarly, let S4.start and S4.finish represent the start and finish times of section S4. Using these variables we can represent the section decomposition of a TCEL construct in a manner similar to that found in the Flex language [25].

Recall the flight controller program from Figure 3.3. Figure 5.3 illustrates its constituent sections. The constraint-expression for S6 corresponds to the program's outer, periodic loop. As the
program is in GSA form, -functions appear at conuence points where dierent values of thesame variable in the original program merge. For example, \dtheta4=(c3,dtheta2,dtheta3)" isinserted at the place where two dierent values of dtheta merge. As a result, each use of a variableis reached by a unique assignment. Also, -functions for variables gx, gy and i are inserted at theperiodic loop header so that the initialized values are merged with the iterative values. Again, thebracketed numbers denote the maximum execution times of the corresponding operations on thetargeted CPU. On modern architectures, ne-grained operations like simple assignments possessminuscule execution times, and cannot be measured (in isolation) by any timing tool. For the sakeof presentation we assume that such instructions take zero time, and concentrate on larger-grainedfunction calls and the like.25.2.2 Deriving Code-Based Timing ConstraintsAs seen in Figure 5.3, the code-based timing constraints can be expressed as conjunctions of linearinequalities between start-times and nish-times of dierent sections. However, note the dierencebetween the code-based constraints and the TCEL source-level constraints: In Figure 3.3 the \nishwithin" deadline is 3.1ms, while in Figure 5.3 it is tightened to 3.0ms. There is good reason for this{ the new code-based timing constraints must be strong enough to guarantee the original semanticsof the event-based constraints. That is, they must take into account the program's execution-timecharacteristics. In general, consider the TCEL construct such asdo RB start after tmin start before tmax1 nish within tmax2 CBObviously, the original TCEL parameters are not tight enough to guarantee the correctness of thecode-based constraints. For example, if we wish to maintain the \tmax1" requirement, it is notsucient to simply mandate that S4 starts within a maximum delay of tmax1 after S2 ends (thoughthis is certainly necessary). 
We can see in Figure 5.2 that the event actually executed in LAST(S2) may be E1, while the event executed in FIRST(S4) may be E4. Thus the naive strategy fails to factor in the execution times of B3 and B4.

    S6: (S6.start[p] ≥ p × 20ms, S6.finish[p] ≤ p × 20ms + 5ms)
    {
      S1: { gx1 = μ(gx0, gx3);
            gy1 = μ(gy0, gy3);
            i1 = μ(i0, i3);
            input(GPS, &x1, &y1);                              [.1ms]
            input(NAV, &theta1);                               [.1ms]
            input(IMU, &roll1);                                [.1ms]
          }
      S2: { output(Cntrl_out, REQ_VEL);                        [.1ms]
          }
      S3: { /* Null */ }
      S4: (S4.start ≥ S2.finish + 0.75ms, S4.finish ≤ S2.finish + 3.0ms)
          { input(Cntrl_in, &vel1);                            [.1ms]
            c1 = GOAL[i1].passed;
            if (c1) {
              gx2 = GOAL[i1].x;
              gy2 = GOAL[i1].y;
              i2 = (i1+1) % NCOORD;
            }
            gx3 = γ(c1, gx2, gx1);
            gy3 = γ(c1, gy2, gy1);
            i3 = γ(c1, i2, i1);
            rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3);    [.25ms]
            c2 = |rtheta1| < EPS;
            if (c2)
              dtheta1 = 0.0;
            else {
              c3 = vel1 < VHIGH;
              if (c3)
                dtheta2 = rtheta1;
              else
                dtheta3 = safeDtheta(rtheta1, roll1);          [.43ms]
              dtheta4 = γ(c3, dtheta2, dtheta3);
            }
            dtheta5 = γ(c2, dtheta1, dtheta4);
            wflap = compFlapw(roll1, vel1, dtheta5);           [.95ms]
            throttle1 = compThrot(roll1, vel1, dtheta5);       [.89ms]
            output(THROT, throttle1);                          [.1ms]
            output(FLAP_Cntrl, wflap);                         [.1ms]
          }
      S5: { /* null */ }
    }

Figure 5.3: Flight Controller Program: After Section Decomposition

However, the event-based semantics is clear: the time between the executed event in LAST(S2) and the executed event in FIRST(S4) is at most tmax1. To guarantee that this occurs, we must account for all possible execution scenarios. Specifically, we must tighten the constraints, allowing for the maximum amount of time between an event in LAST(S2) and end_S2, as well as the maximum amount of execution time between begin_S4 and an event in FIRST(S4). We must similarly adjust tmax2. To do this, we make the following definitions:

- $\delta_{S2} \stackrel{\mathrm{def}}{=} \max\{\, wt(p) \mid e \in LAST(S2),\ e \xrightarrow{p} end\_S2 \,\}$.³
- $\delta_{S4} \stackrel{\mathrm{def}}{=} \max\{\, wt(p) \mid e \in FIRST(S4),\ begin\_S4 \xrightarrow{p} e \,\}$.

Note that $\delta_{S2}$ and $\delta_{S4}$ are sensitive not only to the code's execution-time characteristics, but also to changes made to some paths between events and markers during program translation. For example, changes to paths between end_S2 and a node in LAST(S2) might require re-evaluation of $\delta_{S2}$.

Now the code-based timing constraints can be postulated as follows:

(1) $S4.start \ge S2.finish + T_{min}$ (where $T_{min} = t_{min}$)
(2) $S4.start \le S2.finish + T_{max1}$ (where $T_{max1} = t_{max1} - \delta_{S2} - \delta_{S4}$)
(3) $S4.finish \le S2.finish + T_{max2}$ (where $T_{max2} = t_{max2} - \delta_{S2}$)

These timing constraints are strong enough to guarantee the original event-based timing constraints. (By convention, if the "start after" constraint is omitted, we consider tmin to be 0. Similarly, when either the "start before" or "finish within" constraints are missing, we consider $t_{max1} = \infty$ or $t_{max2} = \infty$, respectively.) Returning to Figure 5.3, we can see that equation (3) indeed mandates tightening the original 3.1ms to 3.0ms.

Now we wish to determine when (1)-(3) can be met. That is, what do these equations imply about the program's allowable worst-case execution-time behavior? This can easily be derived if we add precedence constraints reflecting the natural flow of the program; i.e., that S4 executes after S3, which executes after S2:

(4) $S2.finish + wt(S3) \le S4.start$
(5) $S4.start + wt(S4) \le S4.finish$

Eliminating S2.finish, S4.start and S4.finish from (1)-(5), we end up with:

(a) $T_{min} \le T_{max1}$
(b) $wt(S3) \le T_{max1}$
(c) $wt(S3) + wt(S4) \le T_{max2}$
(d) $wt(S4) \le T_{max2} - T_{min}$

² We revisit this issue in Chapter 7.
³ We overload the wt-function for paths, since a path can be thought of as a sequential composition of statements.
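The tightening step and conditions (a)-(d) can be sketched in a few lines of Python. This is only an illustrative sketch, not the dissertation's implementation: the function names and the use of integer microseconds are assumptions, and the value 100 µs for the S2 correction term is an assumption chosen because it reproduces the 3.1ms to 3.0ms tightening in Figure 5.3. Since Figure 5.3 has no "start before" bound, the S4 correction term does not affect the result and is set to 0 here.

```python
import math

def code_based_params(tmin=0, tmax1=math.inf, tmax2=math.inf,
                      delta_s2=0, delta_s4=0):
    """Tighten event-based bounds (tmin, tmax1, tmax2) into code-based ones.

    All times are in microseconds; a missing bound defaults to 0 or infinity.
    """
    t_min = tmin
    t_max1 = tmax1 - delta_s2 - delta_s4
    t_max2 = tmax2 - delta_s2
    return t_min, t_max1, t_max2

def feasibility(t_min, t_max1, t_max2, wt_s3, wt_s4):
    """Evaluate conditions (a)-(d) obtained by eliminating start/finish times."""
    return {
        "a": t_min <= t_max1,
        "b": wt_s3 <= t_max1,
        "c": wt_s3 + wt_s4 <= t_max2,
        "d": wt_s4 <= t_max2 - t_min,
    }

# Flight controller values read off Figure 5.3 (delta_s2 = 100 is an assumption).
params = code_based_params(tmin=750, tmax2=3100, delta_s2=100)
# Worst-case path through S4: 100 + 250 + 430 + 950 + 890 + 100 + 100 = 2820.
checks = feasibility(*params, wt_s3=0, wt_s4=2820)
```

With these inputs only condition (d) fails, because 2820 exceeds 3000 - 750 = 2250; this is the situation the code scheduling algorithm is meant to repair.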
    Section    Duration Constraint (DUR(S))
    S3         $\min\{T_{max1},\ T_{max2} - wt(S4)\}$
    S4         $T_{max2} - T_{min}$

Table 5.1: Timing Constraints of S3 and S4

Obviously, (a) had better be true in order for the TCEL construct to make any sense. For the purposes of our algorithm we combine (b) and (c), yielding the following two constraints on execution times:

$wt(S3) \le \min\{T_{max1},\ T_{max2} - wt(S4)\}$
$wt(S4) \le T_{max2} - T_{min}$

These are the necessary and sufficient conditions to achieve feasibility, and they are summarized in Table 5.1.

Returning to our example, we find that section S4 violates its duration constraint (DUR(S4) = 2.25ms),⁴ since wt(S4) by far exceeds it. (Adding up the time annotations yields wt(S4) = 2.82ms along the worst-case execution time path.) In the next section we discuss our code-scheduling techniques which handle cases such as this, in which the duration constraints fail to hold.

5.3 Code Scheduling

The code scheduling algorithm is inspired by a common compiler strategy used for VLIW and superscalar architectures [3, 10, 12, 14, 38, 48]. In such domains, an optimizing compiler exploits a program's inherent fine-grained parallelism, and "packs" its computations into as many functional units as possible. Thus the objective is to keep each unit busy, and to achieve better overall throughput.

Our problem context has an entirely different goal, and it cannot be solved by directly applying well-known techniques such as Trace Scheduling [12] or Percolation Scheduling [3, 10]. We are concerned not with enhancing average-case performance, but instead with ensuring feasibility. In fact, we are willing even to increase the program's overall execution time, as long as the timing constraints are met.

⁴ $T_{max2} - T_{min} = 3.0\,\mathrm{ms} - 0.75\,\mathrm{ms} = 2.25\,\mathrm{ms}$
    algorithm Code_Scheduling(T)    /* T is a timing construct */
    input: the ordered set of sections {S1, S2, ..., S5} in T
    begin
        d = Tmax2 − Tmin;
        /* Schedule code from S4 into S3 */
        call Schedule_Section(S4, S3, d, ∅);
        recompute Tmax1;    /* to reflect the change in δ_S4 */
        if (wt(S3) ≤ Tmin) then exit("No scheduling needed for S3.");
        else d = min{Tmax1, Tmax2 − wt(S4)};
        /* Schedule code from S3 into S1 */
        call Schedule_Section(S3, S1, d, Def(S2));
    end

Figure 5.4: Top-Level Algorithm for Code Synthesis

5.3.1 The Top-Level Algorithm

Our approach to code scheduling is a greedy approximation, and it attempts to attain the desired feasibility of a timing construct in a section-by-section manner. It inspects sections S4 and S3 (in reverse topological order), and checks whether they satisfy their duration constraints. If S4 violates its constraint, the algorithm attempts to reduce its surplus execution time by moving nodes into section S3. In turn it processes section S3, which may now contain newly moved code.

To perform greedy code motion, we have adapted a technique from the approach to Trace Scheduling in [12], and we use it as a component of the code scheduling algorithm. In our approach, nodes lying on paths that exceed their section's duration constraint are considered for code motion. We distinguish such paths as critical traces. Formally, a critical trace p of section S is defined as a path

$entry(S) \xrightarrow{p} exit(S)$

such that $wt(p) > DUR(S)$. The reason we use the trace-based approach is straightforward: optimizing to avoid hard real-time exceptions demands scheduling only the critical traces, and no others.

Figure 5.4 sketches the algorithm. Note that Tmax1 is recomputed after scheduling S4 and before scheduling S3. This is mandatory, since $\delta_{S4}$ may be changed during the scheduling. Also observe that the code of S3 is moved into S1, while that of S4 is moved into S3. We disallow code from moving into S2 because it could potentially change the value of $\delta_{S2}$, which would in turn invalidate our assumptions about DUR(S4). In order to complete the procedure in a single pass, we assume that $\delta_{S2}$ remains constant. In reality this restriction does not seriously limit the approach: from our experience, events in the RB typically lie in straight-line code (and thus S2 contains a single instruction, as in our example).

The top-level algorithm calls subroutine "Schedule_Section," which then schedules the overloaded section at hand. Note that when code is scheduled from S3 into S1, the variables defined in S2 are passed to the subroutine, which ensures the dependences from S2 to S3 are maintained. In the following section we discuss this subroutine, and the strategies it uses to solve the scheduling problem.

5.3.2 Subroutine Schedule_Section(S, D, DUR(S), Vbar)

For the flow graphs of source section S and destination section D, the critical trace scheduling problem is to construct new flow graphs for S and D such that:

(1) The observable nodes of S remain in S and keep their relative order on all paths in S.
(2) $wt(p) \le DUR(S)$ holds for all traces p in S.
(3) Execution ordering established by the code's original data dependences is preserved.

When code is scheduled from S3 into S1 the parameter Vbar, containing the variables defined in S2, is required to maintain property (3).

As we have stated, the trace-based approach is attractive precisely because it allows us to concentrate on paths which violate the duration constraints. However, a direct application of Trace Scheduling induces a severe liability: extra code must be inserted to preserve the program's semantic integrity.
In the parlance of instruction scheduling, this is typically called bookkeeping code.

Consider the GSA program in Figure 5.5(B) to see why Trace Scheduling requires bookkeeping. Since the instruction "z1=F(x)" is free of a data dependence on the variable "y," it may be eligible to be moved into S3. (This transformation, which we develop in the sequel, is called speculative, since it breaks a control dependence.) As shown in Figure 5.5(C), moving the instruction requires no additional code to maintain the program's semantics; GSA's naming conventions maintain the correctness.
(A) Code in TCEL

    do
        input(P, &x);
    start after t1
    finish within t2
    {
        input(Q, &y);
        c1 = p(y);
        if (c1) {
            a = E(y);
            z = F(x);
        } else
            z = G(y);
        c2 = q(y);
        if (c2)
            r = H(z);
        else
            r = K(z);
        s = I(r);
        ...
    }

(B) Code in GSA

    ...
    S2: { input(P, &x); }
    S3: { /* Null */ }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        c1 = p(y);
        if (c1) {
            a1 = E(y);
            z1 = F(x);
        } else
            z2 = G(y);
        a2 = γ(c1, a1, a0);
        z3 = γ(c1, z1, z2);
        c2 = q(y);
        if (c2)
            r1 = H(z3);
        else
            r2 = K(z3);
        r3 = γ(c2, r1, r2);
        s = I(r3);
        ...
    }

(C) Our Approach

    ...
    S2: { input(P, &x); }
    S3: { z1 = F(x); }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        c1 = p(y);
        if (c1)
            a1 = E(y);
        else
            z2 = G(y);
        a2 = γ(c1, a1, a0);
        z3 = γ(c1, z1, z2);
        c2 = q(y);
        if (c2)
            r1 = H(z3);
        else
            r2 = K(z3);
        r3 = γ(c2, r1, r2);
        s = I(r3);
        ...
    }

(D) Bookkeeping Approach

    ...
    S2: { input(P, &x); }
    S3: { z1 = F(x);
          r1 = H(z1);
          r2 = K(z1);
        }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        c1 = p(y);
        if (c1) {
            a1 = E(y);
            c3 = q(y);
            r3 = γ(c3, r1, r2);
        } else {
            z2 = G(y);
            c4 = q(y);
            if (c4)
                r4 = H(z2);
            else
                r5 = K(z2);
            r6 = γ(c4, r4, r5);
        }
        a2 = γ(c1, a1, a0);
        z3 = γ(c1, z1, z2);
        r7 = γ(c1, r3, r6);
        s = I(r7);
        ...
    }

Figure 5.5: (A) Source Code in TCEL, (B) Corresponding Intermediate Code in GSA Form, (C) Intermediate Code after Bookkeeping-Free Transformations, and (D) Intermediate Code after Transformations with Bookkeeping
However, a more aggressive policy could be carried out, which is shown in Figure 5.5(D). Note that additional code may be moved without breaking data dependences; even the variable r may be split into movable parts (i.e., r1 and r2) and the parts that depend on y (i.e., r3 and r5). However, the price we pay is the additional bookkeeping code required to maintain correctness.

One obvious problem with bookkeeping is that it induces a significant amount of extra code; indeed, if carried to extremes, the transformations in Figure 5.5(D) may result in an exponential blow-up. And in our problem context bookkeeping may have an additional, "fatal" effect:

    Scheduling a critical trace may insert bookkeeping code on other, non-critical traces, and thereby increase their execution times. Hence a non-critical trace may become critical.

To avoid this drawback of Trace Scheduling, we use the type of transformation depicted in Figure 5.5(C), and we use it as aggressively as possible. The strategy involves repetitively applying the following three steps: (i) finding a critical trace p; (ii) identifying a node n which can be moved into the destination section D; and (iii) moving n into D, along with n's ancestor nodes required to maintain the program's semantics. Since our objective is to keep the amount of new code to a minimum, step (iii) translates into the following rules for moving n into D:

(1) Node n's data dependence predecessors are moved along with n; i.e., the nodes on which n is transitively data dependent.
(2) The control-dependence predecessors (i.e., the if-then-else's guarding n) are treated as follows:
    (a) If possible they are copied into D, so that they still guard the execution of n.
    (b) Otherwise (as in Figure 5.5(C)), n will now be unguarded in its destination section D.

The end result of code scheduling appears as if large-grained control structures were rearranged, and hence we name the strategy structural code motion.
Yet code scheduling is still trace-based, since it is driven by worst-case paths.

Consider a node n to be moved into the destination section. When all of n's dependence predecessors (both data and control) are moved (or copied) along with n, the new execution ordering is guaranteed to maintain the program's original semantics.

But what are the ramifications of case 2(b) above, i.e., where control dependences are broken? Consider Figure 5.6(A), where we assume that the condition variable "c" is dependent on an input event. Examining the source code, assume that we wish to move the node "x=a+3" above "if c".

[Figure 5.6 (flow graphs; node labels only). (A): "if c", "x=a+2", "x=a+3". (B): "if c", "x1=a+2", "x2=a+3", "x3=γ(c, x1, x2)". (C): "x2=a+3", "if c", "x1=a+2", "x3=γ(c, x1, x2)".]

Figure 5.6: Speculative Code Motion: (A) Original Code, (B) Corresponding GSA Code and (C) Speculatively Transformed Code

The moved instruction is executed regardless of the control-predicate's outcome; hence the name "speculative transformation."

Carrying out speculative transformations raises three critical issues: variable naming, execution time and safety.

Naming: Consider what would happen if the transformation shown in Figure 5.6(C) were performed at the source level. Since x would always end up defined as a+3, one branch of the conditional would result in an incorrect state. Fortunately, the GSA form of the program ensures that multiply defined source variables, and their corresponding $\gamma$-functions, maintain the original semantics, regardless of where assignments are moved. Examining Figure 5.6(C), we see that x1 and x2 are defined sequentially. By GSA's naming conventions, the node "x3=γ(c,x1,x2)" ensures that x3 always carries the assignment corresponding to the original source variable x.

Timing: Figure 5.6 shows how speculative transformations can easily increase the execution time of the destination section D. This is not necessarily harmful, since D may in fact possess sufficient slack for both instructions to execute. (Indeed, D may contain an explicit delay.) But if D itself exceeds its own duration constraint, excessive speculative transformations could make matters worse. Thus we take the following approach: the algorithm performs speculative code motion only when feasibility cannot be achieved with the non-speculative variety.
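The naming argument can be checked on a toy version of Figure 5.6. The sketch below is illustrative only: it assumes that the true branch of "if c" assigns a+2 and the false branch a+3 (as the argument order of the gated merge suggests), and the function names are hypothetical.

```python
def gamma(c, v_true, v_false):
    """Gated merge: select the value defined on the branch actually taken."""
    return v_true if c else v_false

def original(c, a):
    """Source-level code of panel (A): one variable x, two guarded definitions."""
    if c:
        x = a + 2
    else:
        x = a + 3
    return x

def speculative_gsa(c, a):
    """Panel (C): x2 is hoisted above the predicate, x1 stays guarded."""
    x2 = a + 3                  # speculative: executed whatever c turns out to be
    x1 = a + 2 if c else None   # still control dependent on c
    x3 = gamma(c, x1, x2)       # x3 stands for the source variable x
    return x3
```

Because x1 and x2 are distinct names, hoisting the definition of x2 cannot clobber the value needed on the other branch. The cost is the timing effect noted above: when c is true, both assignments execute.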
Safety: Perhaps the most critical issue is the correctness of the transformed program. After all, the source code is written by a human programmer. When an instruction appears within the body of a conditional (but is free of a transitive dependence on it), one should still assume that the programmer had a good reason for putting it there. Often the reason stems from a personal coding style, or perhaps for readability. Also, splitting variable definitions in the style of GSA is a rather unnatural practice at the source level.

Referring back to Figure 5.5(C), we note that the "eager" execution of "F" should be safe if: (1) it contains no observable events, (2) it induces no global side-effects, and (3) it does not cause an exception. We can assume that (1)-(2) hold; otherwise "F" would not have been moved in the first place. However, verifying property (3) may be difficult, since there may be an invariant relationship between "p" and "F" that only the programmer understands.

While this seems to argue against speculative transformation, recall that our objective is to assist programmers in tuning faulty code. And production real-time programmers will find this type of code reordering sadly familiar, since it is usually carried out by hand, and often under the pressure of an approaching release deadline. Our technique can help in this effort, since it helps automate this process by (1) identifying the "good target" instructions to move, (2) transferring them to their "correct" places, and (3) analyzing the results. Nonetheless, we do believe that this should be an interactive process (perhaps driven by a graphical front-end), in which the programmer visually checks each transformation.⁵

For the sake of brevity, however, we present the algorithm in a fully automatic form. Thus we assume that any node n that can be speculatively executed is "pre-checked" and is denoted by the condition spec(n).

Unconditional and Speculative Movability.
The preceding discussion leads to three classes of instructions: those that can be unconditionally moved, those that can be moved to execute speculatively, and those which cannot be moved at all. The following definitions distinguish between these cases.

Definition 5.1 (Unconditional Movability) $M_u(S, Vbar)$ is the set of nodes in S that do not trigger events, and do not use any variables in Vbar; i.e.,

$M_u(S, Vbar) = \{\, m \in S \mid m \text{ is not an event} \wedge Use(m) \cap Vbar = \emptyset \,\}$

⁵ We discuss this issue in Chapter 7.
Then Umove(n) denotes that we can unconditionally schedule node n from S into D:

$Umove(n) \equiv DC(n, S) \subseteq M_u(S, Vbar)$

That is, all of n's data and control dependence ancestors are also unconditionally movable, and when n is moved, they will be moved (or copied) as well.

Definition 5.2 (Speculative Movability) Additional considerations come into play when a node is speculatively executed. Consider the set $M_s(S, Vbar)$:

$M_s(S, Vbar) = \{\, m \in M_u(S, Vbar) \mid spec(m) \,\}$

If a node n is in $M_s(S, Vbar)$ then (1) it uses no variables in Vbar, (2) it triggers no events, and (3) the programmer has checked that it does not cause a local exception. Then Smove(n) denotes that we can speculatively move node n:

$Smove(n) \equiv DDC(n, S) \subseteq M_s(S, Vbar)$

That is, when Smove(n) holds true, all of n's data-dependent ancestors can be moved too, without mandating bookkeeping.

The Algorithm. The code scheduling algorithm is presented in Figures 5.7 and 5.8. It is composed of three stages: pre-processing, marking/deleting and post-processing. In the pre-processing stage, S's flow graph is traversed in topological order, during which the conditions Umove and Smove are evaluated. A topological traversal ensures that whenever a node n is visited, all ancestor nodes in DC(n, S) have already been processed; hence a single traversal is sufficient to evaluate these conditions. Then a "clone" S' of S is created, which is used to hold the part of the flow graph to be transferred to D.

Next the algorithm searches for a critical trace, and if one exists it invokes subroutine "Sched." Sched makes use of array "mark," each of whose entries corresponds to a node in S'. Whenever "mark[m, S'] = true," it means that node m will be "moved" into the destination section. Sched examines the critical trace in topological order, looking for a node n such that Umove(n) is true. If such a node n exists, then the closure DC(n, S) is generated; its non-predicate members are deleted from S, while all corresponding nodes in S' are marked. If no unconditionally movable instruction exists, then a transformation of the speculative variety is attempted. And if no movable node is present, the program is forced to exit.
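Definitions 5.1 and 5.2 can be made concrete with a small dependence-graph model. The following is a sketch under stated assumptions rather than the dissertation's implementation: the `Node` record, the helper names, and the choice to include n itself in its own closure are all hypothetical. The example section is a fragment of S4 from Figure 5.5(B), with the predicate "if (c1)" modeled as its own node.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    is_event: bool = False
    uses: frozenset = frozenset()        # variables read by the node
    spec: bool = False                   # pre-checked as safe to speculate
    data_preds: frozenset = frozenset()  # direct data-dependence predecessors in S
    ctrl_preds: frozenset = frozenset()  # direct control-dependence predecessors in S

def closure(n, nodes, follow_control):
    """Transitive predecessors of n (n included): DC if follow_control, else DDC."""
    seen, stack = set(), [n]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        preds = set(nodes[cur].data_preds)
        if follow_control:
            preds |= nodes[cur].ctrl_preds
        stack.extend(preds)
    return seen

def m_u(nodes, vbar):
    return {m for m, nd in nodes.items() if not nd.is_event and not (nd.uses & vbar)}

def m_s(nodes, vbar):
    return {m for m in m_u(nodes, vbar) if nodes[m].spec}

def umove(n, nodes, vbar):
    return closure(n, nodes, True) <= m_u(nodes, vbar)

def smove(n, nodes, vbar):
    return closure(n, nodes, False) <= m_s(nodes, vbar)

# Fragment of S4 in Figure 5.5(B); x is defined in S2, outside the section.
S4 = {
    "in_Q": Node(is_event=True),
    "c1": Node(uses=frozenset({"y"}), data_preds=frozenset({"in_Q"})),
    "if_c1": Node(uses=frozenset({"c1"}), data_preds=frozenset({"c1"})),
    "a1": Node(uses=frozenset({"y"}), spec=True,
               data_preds=frozenset({"in_Q"}), ctrl_preds=frozenset({"if_c1"})),
    "z1": Node(uses=frozenset({"x"}), spec=True, ctrl_preds=frozenset({"if_c1"})),
}
```

With an empty Vbar (scheduling S4 into S3), `umove("z1", S4, set())` is false because the guarding predicate depends on the input event, while `smove("z1", S4, set())` is true. This matches the observation that "z1=F(x)" can only be moved speculatively.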
    subroutine Schedule_Section(S, D, d, Vbar)
    input: source section S, destination section D, duration constraint d, and
           a set Vbar of variables defined in sections lying between S and D.
    begin
        /* Pre-processing Stage */
        foreach node n in S in topological order do
            evaluate Umove(n) and Smove(n);
        /* Marking/Deleting Stage */
        make a copy S' of S;
        compute p in S such that wt(p) = max{wt(p') | entry(S) →p' exit(S)};
        while (wt(p) > d)
            call Sched(p, S, S');
            recompute p in S such that wt(p) = max{wt(p') | entry(S) →p' exit(S)};
        end
        /* Post-processing Stage */
        delete all unmarked nodes from S';
        delete all predicate nodes guarding null code from S;
        append S' to end of D;
    end

Figure 5.7: The Section Scheduling Algorithm (Schedule_Section)

At the end, if all critical traces were scheduled, the algorithm proceeds to a post-processing stage. If speculative transformation was carried out then S' will, by definition, contain branching structures with empty predicate nodes. In this case, the nodes on the different branches of the predicate node are merged into a single block.

Finally, S' is attached to the end of the destination section D. Cleaning up, the algorithm deletes control-predicates which guard empty nodes in section S; i.e., the "if" nodes whose corresponding bodies and $\gamma$-functions were completely transferred to S'.

Example Revisited. We return to our original flight controller example from Figure 5.3, and subject it to the code scheduling algorithm. The end-result appears in Figure 5.9. In scheduling S4, "Sched" unconditionally moves the function call compRelAtt, as well as the other nodes in its dependence closure. This reduces wt(S4) by 0.25ms, which now stands at 2.57ms. Since DUR(S4) = 2.25ms, further reductions are made by entering the speculative transformation phase; this results in moving one conditional branch and the function call safeDtheta beyond the immovable control-predicate if (c3).

    subroutine Sched(p, S, S')
    input: critical trace p, source section S, and a clone S' of S.
    begin
        if there is some n ∈ p such that Umove(n) holds then
        begin
            Select first such n ∈ p such that Umove(n) holds;
            foreach node m ∈ DC(n, S) do
                mark[m, S'] := true;
                if m is not a control-predicate then Delete m from S;
            end
        end
        else if there is some n ∈ p such that Smove(n) holds then
        begin
            Select first such n ∈ p such that Smove(n) holds;
            foreach node m ∈ DDC(n, S) do
                mark[m, S'] := true;
                if m is not a control-predicate then Delete m from S;
            end
        end
        else exit("Unable to synthesize.");
    end

Figure 5.8: The Section Scheduling Algorithm (Sched)

After the transformation, the implementation satisfies the necessary condition for consistency, since the body of S4 requires 2.14ms in the worst case. Since wt(S3) = 0.68ms is still less than DUR(S3), the code scheduling successfully terminates without further scheduling S3.

In addition to such an instant benefit, the transformation converts possibly wasteful delay into useful computation time, since the new code in S3 can be scheduled within the delay interval between S2 and S4.
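The marking/deleting loop can be replayed on the flight controller numbers with a heavily simplified sketch. The assumptions are loud ones: the section is reduced to its single worst-case trace, the Umove/Smove flags are supplied by hand instead of being computed from dependence closures, only the timed nodes are modeled, and all names are hypothetical. Times are integer microseconds.

```python
from dataclasses import dataclass

@dataclass
class TraceNode:
    name: str
    wt_us: int
    umove: bool = False   # unconditionally movable (assumed, not computed)
    smove: bool = False   # speculatively movable (assumed, not computed)

def schedule_trace(trace, dur_us):
    """Greedy loop in the spirit of Schedule_Section/Sched on a single trace."""
    remaining, moved = list(trace), []
    while sum(n.wt_us for n in remaining) > dur_us:
        # Prefer unconditional motion; fall back to speculative motion.
        candidate = next((n for n in remaining if n.umove), None)
        if candidate is None:
            candidate = next((n for n in remaining if n.smove), None)
        if candidate is None:
            raise RuntimeError("Unable to synthesize.")
        remaining.remove(candidate)
        moved.append(candidate)
    return remaining, moved

# Worst-case trace of S4 in Figure 5.3 (timed nodes only).
s4_trace = [
    TraceNode("input(Cntrl_in)", 100),
    TraceNode("compRelAtt", 250, umove=True),
    TraceNode("safeDtheta", 430, smove=True),
    TraceNode("compFlapw", 950),
    TraceNode("compThrot", 890),
    TraceNode("output(THROT)", 100),
    TraceNode("output(FLAP_Cntrl)", 100),
]
remaining, moved = schedule_trace(s4_trace, dur_us=2250)
```

Under these assumptions the loop first moves compRelAtt (2820 to 2570), then safeDtheta (2570 to 2140), and stops because 2140 does not exceed 2250. The moved nodes add up to 680, matching wt(S3) = 0.68ms in the text.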
    S6: (S6.start[p] ≥ p × 20ms, S6.finish[p] ≤ p × 20ms + 5ms)
    {
      S1: { gx1 = μ(gx0, gx3);
            gy1 = μ(gy0, gy3);
            i1 = μ(i0, i3);
            input(GPS, &x1, &y1);                              [.1ms]
            input(NAV, &theta1);                               [.1ms]
            input(IMU, &roll1);                                [.1ms]
          }
      S2: { output(Cntrl_out, REQ_VEL);                        [.1ms]
          }
      S3: { c1 = GOAL[i1].passed;
            if (c1) {
              gx2 = GOAL[i1].x;
              gy2 = GOAL[i1].y;
              i2 = (i1+1) % NCOORD;
            }
            gx3 = γ(c1, gx2, gx1);
            gy3 = γ(c1, gy2, gy1);
            i3 = γ(c1, i2, i1);
            rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3);    [.25ms]
            c2 = |rtheta1| < EPS;
            if (c2)
              /* null */
            else
              dtheta3 = safeDtheta(rtheta1, roll1);            [.43ms]
          }
      S4: (S4.start ≥ S2.finish + 0.75ms, S4.finish ≤ S2.finish + 3.0ms)
          { input(Cntrl_in, &vel1);                            [.1ms]
            if (c2)
              dtheta1 = 0.0;
            else {
              c3 = vel1 < VHIGH;
              if (c3)
                dtheta2 = rtheta1;
              else
                /* null */
              dtheta4 = γ(c3, dtheta2, dtheta3);
            }
            dtheta5 = γ(c2, dtheta1, dtheta4);
            wflap = compFlapw(roll1, vel1, dtheta5);           [.95ms]
            throttle1 = compThrot(roll1, vel1, dtheta5);       [.89ms]
            output(THROT, throttle1);                          [.1ms]
            output(FLAP_Cntrl, wflap);                         [.1ms]
          }
      S5: { /* null */ }
    }

Figure 5.9: Flight Controller Program: After Code Scheduling
5.4 Summary

In this chapter we have addressed the problem of feasible code synthesis. We approach the problem via a two-step process, in that (1) an event-based TCEL program is decomposed into code-based tasks; and (2) unobservable code is scheduled to avoid feasibility faults. We attack the intractability of the code scheduling problem with a greedy approximation approach, based on Trace Scheduling.
Chapter 6

Transformation 2: Real-Time Task Slicing

After tasks have been successfully synthesized into feasible code, the next challenge lies in achieving schedulability of the entire application. Consider Figure 6.1. The two components at the bottom of the diagram correspond to the major elements involved in the tuning problem. "Schedulability" is, by definition, a key metric that drives our compiler-based task transformation tools. Thus close interactions between the real-time scheduler and the compiler tool are necessary. But this presents two competing demands: (1) it is desirable to maintain the traditional separation of concerns between compilation and scheduling; and (2) schedulability depends on complex task interactions that are often exposed at runtime.

In this chapter we present a compiler transformation method which satisfies these demands. We name the technique real-time task slicing. This transformation is specifically intended for periodic tasks in the domain of periodic, discrete control systems.

Ideally real-time task slicing should be complementary to feasible code synthesis, since they address two distinct system-tuning problems. Each must be solved before we get a correct real-time system. In practice, however, it is possible that their successive applications to the same task will produce a conflicting result. If we were to truly aim for optimal performance, feasible code synthesis would have to be geared for the specific schedulers. However, we have already shown that the synthesis problem is, by itself, highly complex. This would add an additional layer of complexity.

In this thesis we treat the two phases as orthogonal. In doing so, we restrict ourselves to applying the real-time task slicing method only to simple "every" constructs with no nested "do" constructs within their bodies.
Fortunately, many applications from discrete control domains possess this type of periodicity requirement.

[Figure 6.1 (process diagram; box and edge labels only). Boxes: Requirements Specification; High-Level Real-Time Programs; Decomposed Tasks, Timing Constraints on Tasks; Feasibility Tests (Intra-Task Consistency); Schedulability Tests (Inter-Task Consistency); Compiler-based Transformation Methods: Feasible Code Synthesis (Chapter 5), Real-Time Task Slicing (Chapter 6); System-Tuning, Re-implementation. Edge labels: refinement, compilation, synthesis, success, failure.]

Figure 6.1: Software Development Process: Revisited for Real-Time Task Slicing

To take a closer look at the problem of schedulability tuning, we open up the process diagram of Figure 6.1, and re-draw the related components in Figure 6.2. The generalized task transformation method consists of three interrelated components: (1) schedulability tests, (2) a priority ordering algorithm, and (3) a real-time task slicer. In this chapter we explain these components, as well as the interactions. First, in Section 6.1 we discuss the characteristics of control domain software and fixed-priority preemptive scheduling algorithms. We also present the key idea of the generalized slicing method, and discuss the safety of the transformation method. We devote the remaining sections to the technical discussion of these three components.

[Figure 6.2 (block diagram; labels only). Inputs: Task Set, Period Constraints. Blocks: Priority Ordering Algorithm (Schedulability Tests); Task Slicer. Outcomes: success yields Schedulable Priority Order; failure yields Tasks To Be Sliced.]

Figure 6.2: Generalized Slicing Method for Schedulability Tuning

6.1 Background

In this section we study two essential factors related to the real-time task slicing method, namely the characteristics of discrete control software and fixed-priority preemptive scheduling. Since discrete control software possesses many representative properties that can be found in other applications (e.g., multimedia, vision, etc.), our approach can be easily adapted to other types of real-time systems.

6.1.1 Characterization of Discrete Control Software

Many discrete control algorithms possess computations that fit a fixed-rate algorithm paradigm [28], i.e., control-loops which execute repetitively with fixed periods. During each period, the physical world measurement data is sampled, and then actuator commands are computed. Meanwhile, a set of states is updated based on the current state and the sampled data.

The dynamic behavior of discrete control systems can often be expressed by the following equations.

$Output_k = g(State_k, Input_k)$
$State_{k+1} = h(State_k, Input_k)$

In these equations, $Input_k$, $State_k$, and $Output_k$ respectively represent the input, current state, and output of the kth period, while g is an output generation function and h is a state evolution function.

Since control equations are thought of as simultaneous relationships (and not as a computation procedure), there are usually many valid computational orderings. The usual practice is to choose a single ordering, and then to code it up as a cyclic control-loop whose kth iteration is rendered in Figure 6.3.

[Figure 6.3 (block diagram; labels only). A box "Task at the kth period" with inputs $Input_k$ and $State_k$ and outputs $Output_k$ and $State_{k+1}$.]

Figure 6.3: Task Instance at the kth Period

The actual loop structure is often driven by one's personal programming style, or perhaps the availability of generic code modules. But regardless of the choice (unless the underlying control laws are stateless), g and h mandate key precedence constraints, denoted by "$\prec$":

$Input_k \prec Output_k$
$State_k \prec Output_k$
$Input_k \prec State_{k+1}$
$State_k \prec State_{k+1}$

The typical way to enforce these constraints is to use the "code-based" semantics, and ensure that each iteration of the control-loop completes by the end of its period. This means that the (k+1)st iteration starts only after the kth iteration ends. Figure 6.4 illustrates the effect.

[Figure 6.4 (timeline; labels only). Consecutive periods labeled (k−1)st, kth, (k+1)st and (k+2)nd, with $Input_k$ and $Output_k$ marked within the kth period.]

Figure 6.4: Dynamic Behavior of Periodically Implemented Control-Loop
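One valid computational ordering of the control equations can be sketched as a plain loop. This is a minimal sketch with hypothetical names: `example_g` and `example_h` are toy stand-ins for the output generation and state evolution functions, not functions from the dissertation.

```python
def run_control_loop(g, h, initial_state, inputs):
    """Run one iteration per period: emit Output_k, then compute State_{k+1}."""
    state = initial_state
    outputs = []
    for inp in inputs:
        # Output_k depends on State_k and Input_k.
        outputs.append(g(state, inp))
        # State_{k+1} depends on State_k and Input_k; it is not needed
        # until the next period begins.
        state = h(state, inp)
    return outputs, state

def example_g(state, inp):
    return 2 * state + inp

def example_h(state, inp):
    return state + inp

outputs, final_state = run_control_loop(example_g, example_h, 0, [1, 2, 3])
```

The loop body makes the precedence constraints visible: within an iteration the output needs only the current state and input, while the state update is consumed only by the following iteration.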
6.1.2 Fixed-Priority Preemptive Scheduling

In any nontrivial system, there are usually many such tasks that must share the CPU and other resources. Thus they must be scheduled in a way that allows each of them to adhere to its timing constraints. Fixed-priority, preemptive scheduling algorithms are well-suited for control domain applications, not only because such applications possess periodic behavior, but also because efficient schedulability tests can be applied. Rate-monotonic scheduling, originally developed by Liu and Layland, is the first well-known algorithm of this kind. In their seminal paper [34] they proposed a priority assignment algorithm, in which a task with the shorter period is assigned the higher priority (hence the name rate-monotonic scheduling (RMS)). They also showed that such priority assignment is optimal in the sense that whenever it fails to find a feasible priority ordering, neither can any other static priority algorithm. However, their algorithm is applicable only to the periodic task model where tasks have fixed periods, deadlines are equal to periods, and tasks are totally independent of each other.

Recent research has made significant enhancements to this model which relax the original restrictions. In [31] Leung and Merrill showed that deadline monotonic priority assignment is also optimal where deadlines are shorter than periods. In [45] Sha et al. presented two protocols which enable tasks to interact via shared resources, while still guaranteeing the tasks' deadlines. Most recently, a group of researchers at the University of York developed a set of analytical techniques which can provide schedulability tests for broad classes of tasks, including those whose deadlines are greater than their periods [5, 51, 50].

In this dissertation work we choose the York model, mainly because of its generality.
The following notation is used in the remainder of this chapter.

• $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$ denotes a set of $n$ tasks to be scheduled.
• $T_i$ denotes the period of task $\tau_i$.
• $D_i$ denotes the deadline of task $\tau_i$.
• $c_i$ denotes the worst-case execution time of $\tau_i$.

The fixed priority scheduling theory of York is based on response time analysis. The response time of task $\tau_i$ is defined as the time interval between when a request for $\tau_i$ arrives, and when $\tau_i$ finishes its execution servicing the request. If we can confirm that the maximum response time of $\tau_i$ is no greater than $D_i$, we can guarantee that $\tau_i$ will meet its deadline even in the worst case. We first consider the simpler case where $D_i \le T_i$.
Let $R_i$ denote the maximum response time of task $\tau_i$. Then $R_i$ is computed as shown below.

$R_i = c_i + \sum_{j \in hp(i)} \lceil R_i / T_j \rceil \, c_j$   (Eq 6.1)

where $hp(i)$ is the set of tasks with higher priority than $\tau_i$. Observe that $R_i$ is composed of two components, namely execution time $c_i$ and interference. The interference, the second term of Eq 6.1, is the amount of time during which $\tau_i$ is preempted by the higher priority tasks in $hp(i)$ since the arrival of its request. As Eq 6.1 is a recurrence equation on $R_i$, an iterative algorithm computes $R_i$ by initially assigning it $c_i$, and then computing a new value until it converges on a fixpoint.

On the other hand, where there exists a task $\tau_i$ such that $D_i > T_i$, Eq 6.1 is not sufficient. This is because uncompleted iterations of $\tau_i$ can now interfere with the current one. In this case the following general equation is used instead, as discussed in [51].

$R_i = \max_{q = 0, 1, 2, \ldots} \{ r_{i,q} - q \cdot T_i \}$   (Eq 6.2)

where

$r_{i,q} = (q + 1) c_i + \sum_{j \in hp(i)} \lceil r_{i,q} / T_j \rceil \, c_j$

Consider the case of three periodic tasks, where the source of task $\tau_2$ is given in Figure 6.5.

Task       Execution Time   Period         Deadline
$\tau_1$   $c_1 = 400$      $T_1 = 1000$   $D_1 = 1000$
$\tau_2$   $c_2 = 400$      $T_2 = 1600$   $D_2 = 1600$
$\tau_3$   $c_3 = 570$      $T_3 = 2500$   $D_3 = 2500$

Since the periods are equal to the deadlines, rate-monotonic priority assignment is a natural choice. In the above table the row order corresponds to the priority order; i.e., $\tau_1$ is assigned the highest priority. We can carry out the response time analysis for these tasks using Eq 6.1 as follows:

For $\tau_1$: $R_1 = 400 < D_1 = 1000$
For $\tau_2$: $R_2 = 400 + \lceil 800/1000 \rceil 400 = 800 < D_2 = 1600$
For $\tau_3$: $R_3 = 570 + \lceil 2570/1000 \rceil 400 + \lceil 2570/1600 \rceil 400 = 2570 > D_3 = 2500$

We observe that the two high priority tasks $\tau_1$ and $\tau_2$ are schedulable, while $\tau_3$ is not. ($R_3$ is greater than $D_3$ when $\tau_3$ runs at priority 3.) In an effort to make the task set schedulable, we might try some hacking: e.g., by promoting $\tau_3$ to the highest priority level. Although this makes $\tau_3$ schedulable, it does not achieve the desired schedulability, since $\tau_2$ will now be unschedulable.
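The fixpoint iteration for Eq 6.1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analyzer used in this work: the names `ceil_div`, `response_time` and `analyze` are hypothetical, times are integers in the units of the table above, and the iteration stops once the value exceeds the deadline, in which case the returned value is only the first iterate past the deadline.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def response_time(c, higher, deadline):
    """Iterate Eq 6.1 starting from c.

    higher: list of (T_j, c_j) pairs for the tasks in hp(i).
    Returns the fixpoint if it is reached at or below the deadline;
    otherwise returns the first iterate that exceeds the deadline.
    """
    r = c
    while r <= deadline:
        nxt = c + sum(ceil_div(r, T_j) * c_j for (T_j, c_j) in higher)
        if nxt == r:
            return r
        r = nxt
    return r


def analyze(tasks):
    """tasks: list of (name, c, T, D) in decreasing priority order."""
    result = {}
    for i, (name, c, T, D) in enumerate(tasks):
        higher = [(Tj, cj) for (_, cj, Tj, _) in tasks[:i]]
        result[name] = response_time(c, higher, D)
    return result


# The three-task example, first in rate-monotonic order,
# then with t3 promoted to the highest priority.
rm_order = [("t1", 400, 1000, 1000),
            ("t2", 400, 1600, 1600),
            ("t3", 570, 2500, 2500)]
promoted = [rm_order[2], rm_order[0], rm_order[1]]

print(analyze(rm_order))
print(analyze(promoted))
```

Under rate-monotonic order the sketch reproduces the values 400, 800 and 2570 derived above. With $\tau_3$ promoted, $\tau_1$ still meets its deadline, while the iteration for $\tau_2$ passes $D_2 = 1600$, which is the failure described in the text.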




Figure 6.6: Simulated Time Line for the Example Task Set (time line of $\tau_1$, $\tau_2$ and $\tau_3$ over $T_3 = 2500$)

Real-time task slicing is based on two simple observations; that is, (1) the current practice is to assume that an entire control-loop must finish by its deadline, but (2) the high-level TCEL semantics mandates only that the observable event operations be finished within the task's time frame.

The key idea of the task slicing method is as follows. We decompose a task $\tau$ into two subtasks: one containing all observable event operations, and the other all remaining local operations. We call the former the IO-handler and the latter the state-update component, and we denote them by $\tau^{IO}$ and $\tau^{State}$, respectively. Figure 6.7 demonstrates the decomposition of the control-loop task originally shown in Figure 6.3.

Figure 6.7: Decomposed Task at the kth Period (an IO-handler, followed by the state-update component, followed by the next IO-handler; $Input_k$ and $State_k$ yield $Output_k$ and $State_{k+1}$)

After the decomposition, we ensure that the IO-handler subtask will execute within its allowable time frame. On the other hand, we may postpone the execution of the state-update subtask under the worst-case task phasing. Finally, we maintain the precedence constraints between $\tau^{IO}$ and $\tau^{State}$, originally induced by the task's data and control dependences.

The task decomposition itself is carried out by the program slicing technique. As we stressed above, we put the greatest emphasis on preserving the timing behavior of observable events and the precedence constraints derived in Subsection 6.1.1. Before presenting the systematic slicing procedure we describe our approach using our example task set.

Assume that slicing $\tau_2$ yields the greatest benefit in schedulability. We decompose $\tau_2$'s code into IO-handling $\tau^{IO}_2$ and state-update $\tau^{State}_2$, as shown in Figure 6.8. Their computation times are separately calculated as follows:

$c^{IO}_2 = 2.2\,ms, \quad c^{State}_2 = 1.9\,ms$

Note that the sum of the two execution times is slightly greater than the original execution time 4.0 ms of $\tau_2$.

    /* Subtask tau_2^IO */
    input(Sensor, &data);
    c = !null(data);
    if (c) {
        t1 = F1(state);
        t3 = F3(data);
        t4 = F4(data);
        cmd = F6(t1, t3, t4);
        output(Actuator, cmd);
    }

    /* Subtask tau_2^State */
    if (c) {
        t2 = F2(state);
        state = F5(t1, t2, t3);
    }
    status_dump("logfile", cmd, state);

Figure 6.8: Two Decomposed Subtasks of Task $\tau_2$

To enforce the precedence constraints between the subtasks, we rewrite them for $\tau_2$: (1) the $k$th instance of $\tau^{IO}_2$ must finish before the $k$th instance of $\tau^{State}_2$ starts; and (2) the $k$th instance of $\tau^{State}_2$ must finish before the $(k+1)$st instance of $\tau^{IO}_2$ starts. Thus, the scheduling and dispatching subsystems must guarantee the following execution behavior of $\tau_2$.

$\tau^{IO}_2 \rightarrow \tau^{State}_2 \rightarrow \tau^{IO}_2 \rightarrow \tau^{State}_2 \rightarrow \ldots$

The net result of the transformation of $\tau_2$ is as follows: within the worst-case time interval of $2T_2$, we may only see one whole instance of $\tau^{State}_2$.

Remainder of This Chapter. Throughout the remainder of this chapter we discuss the generalized slicing method for schedulability tuning, explaining the details of the three components. In Section 6.2 we present the program slicing algorithm which is the crux of our transformation. In Section 6.3 we discuss new schedulability tests, and then show how they are used, by way of a motivating example. In Section 6.4 we introduce a priority ordering algorithm that is capable of making feasible slicing decisions, as well as finding a feasible priority order for a given task set. To demonstrate the effectiveness of this algorithm we show the result of an experiment we conducted on a task set drawn from an avionics platform.

6.2 Automatic Task Decomposition by Program Slicing

The idea behind the task decomposition is, as discussed in Subsection 6.1.3, to accept a task and then generate its two code components: one that corresponds to a subthread triggering all observable events, and the other that corresponds to a subthread computing the next-state update. Straightforward as it may look, the decomposition can be a very complex compiler problem. Many factors make this the case, among which are intertwined threads of control, nested control structures, complex data dependences between statements, procedure calls in the task code, etc.
To cope with these problems in a systematic manner, we harness a novel application of program slicing [41, 52, 53]. For the sake of brevity, we assume the following:

• Function calls are inlined.
• Loops are unrolled.
• The intermediate code of programs is translated into static single assignment form [8, 21, 16].












Figure 6.10: Slice with respect to Criterion $\langle L9, cmd \rangle$ (dependence-graph nodes shown include cmd = F6(t1, t3, t4) and output(−, cmd))

algorithm is given below:

Algorithm 6.2 Decompose task $\tau$ into $\tau^{IO}$ and $\tau^{State}$:

Step 1 Compute the slice of $\tau$ with respect to $C_{IO}(\tau)$ using Algorithm 6.1. The generated slice $\tau / C_{IO}(\tau)$ becomes $\tau^{IO}$.
Step 2 Delete from $\tau$ all repeated statements of $\tau^{IO}$ except for the conditional statements. The remaining code becomes $\tau^{State}$.

Figure 6.8 shows the two subtasks $\tau^{IO}_2$ and $\tau^{State}_2$ of $\tau_2$ computed by Algorithm 6.2 with slicing criteria $C_{IO}(\tau) = \{ \langle L1, data \rangle, \langle L9, cmd \rangle \}$.

6.2.2 Assigning Times to Subtasks

Program slicing may well increase worst-case execution times of tasks for a number of reasons: (1) control structures are replicated and will be executed twice; (2) splitting a basic block may increase the number of register load and store operations [2]; and (3) worst-case execution time paths of the two resultant subtasks may be incorrectly derived. We take a close look at the last factor, since it tends to take up the greatest portion of the increase, though it is not a genuine cause of the increase, but an artifact of overly-conservative timing prediction.
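The two steps of Algorithm 6.2 can be illustrated on the statements of Figure 6.8. The Python sketch below is a simplified stand-in and not Algorithm 6.1 itself: it treats slicing as backward reachability over a hand-written dependence table. The statement numbering (chosen so that L1 is the input and L9 the output, to match the criteria above), the table `DEPS`, and all function names are assumptions made for illustration; the loop-carried dependence on `state` from the previous iteration is deliberately left out.

```python
# Statements of the example task; numbering is hypothetical.
STATEMENTS = {
    1: "input(Sensor, &data)",
    2: "c = !null(data)",
    3: "if (c)",
    4: "t1 = F1(state)",
    5: "t2 = F2(state)",
    6: "t3 = F3(data)",
    7: "t4 = F4(data)",
    8: "cmd = F6(t1, t3, t4)",
    9: "output(Actuator, cmd)",
    10: "state = F5(t1, t2, t3)",
    11: "status_dump(logfile, cmd, state)",
}

# Hand-written data and control dependences: statement -> statements it depends on.
DEPS = {
    1: [], 2: [1], 3: [2], 4: [3], 5: [3], 6: [1, 3], 7: [1, 3],
    8: [3, 4, 6, 7], 9: [3, 8], 10: [3, 4, 5, 6], 11: [8, 10],
}

CONDITIONALS = {3}


def backward_slice(criteria):
    """All statements that the criterion statements transitively depend on."""
    seen, work = set(), list(criteria)
    while work:
        s = work.pop()
        if s not in seen:
            seen.add(s)
            work.extend(DEPS[s])
    return seen


def decompose(criteria):
    """Step 1: the slice becomes the IO-handler.
    Step 2: drop the repeated statements, but keep the conditionals."""
    io = backward_slice(criteria)
    state = {s for s in STATEMENTS if s not in io} | CONDITIONALS
    return sorted(io), sorted(state)


io_part, state_part = decompose([1, 9])
print("IO-handler:  ", [STATEMENTS[s] for s in io_part])
print("State-update:", [STATEMENTS[s] for s in state_part])
```

With criteria on statements 1 and 9, the sketch yields the same split as Figure 6.8: the conditional appears in both parts, while `t2`, `state` and the status dump fall into the state-update component.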







Figure 6.11: Slicing a Conditional

However, a tighter worst-case execution time can be easily obtained by correlating the conditional predicate of the subtask $\tau^{IO}$ with that of the subtask $\tau^{State}$. For example, in Figure 6.11 if "IO" is executed in subtask $\tau^{IO}$, then we know that the empty left branch will be executed in subtask $\tau^{State}$. Thus $wt(\tau')$ can be refined as follows:

$wt(\tau') = 2\,wt(c) + \max\{ wt(IO), wt(ST) \}$
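A small numeric sketch shows the effect of predicate correlation. The uncorrelated bound used for comparison (each subtask timed in isolation, so each is charged its own non-empty branch) is an assumption about how a conservative timing tool would behave, and the function names and timing values are hypothetical.

```python
def wcet_uncorrelated(wt_c, wt_io, wt_st):
    """Assumed conservative bound: the two subtasks are timed separately,
    so each pays for the predicate and for its own non-empty branch."""
    return (wt_c + wt_io) + (wt_c + wt_st)


def wcet_correlated(wt_c, wt_io, wt_st):
    """Refined bound from the text: the predicate is evaluated twice,
    but only one of the IO and ST branches can run in a given iteration."""
    return 2 * wt_c + max(wt_io, wt_st)


# Hypothetical timings for the predicate c and the two branches.
wt_c, wt_io, wt_st = 10, 200, 150
print(wcet_uncorrelated(wt_c, wt_io, wt_st))
print(wcet_correlated(wt_c, wt_io, wt_st))
```

The gap between the two bounds is the smaller of the two branch times, which is the portion of the increase that the text attributes to overly conservative prediction rather than to slicing itself.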
For the given two subtasks of $\tau_i$, we carry out the following simple steps, which are based on the notion of predicate correlation, to compute tight worst-case execution times of subtasks $\tau^{IO}_i$ and $\tau^{State}_i$.

Step 1 Calculate $c'_i$ by running a timing tool with the code of $\tau'_i$.
Step 2 Calculate $c^{IO}_i$ by running a timing tool with the code of $\tau^{IO}_i$.
Step 3 Calculate $c^{State}_i$ such that $c^{State}_i = c'_i - c^{IO}_i$.

This will serve as a good rough estimate for the transformed task code. Then we can use a profiler to account for the two other factors that incur timing overhead.

6.3 Scheduling Alternatives and Their Analyses

We now sketch a simple, high-level procedure that uses slicing to transform an unschedulable task set into a schedulable one. The input is a set $\Gamma$ of $n$ tasks, processed in order from $\tau_n$ to $\tau_1$. If the task set is found unschedulable, the slicer algorithm is invoked to decompose $\tau_n$ into its two constituent threads, which then replace $\tau_n$ in $\Gamma$. If the updated set is still deemed unschedulable, the procedure goes to work on $\tau_{n-1}$, and so on.

In [15] we present a detailed alternative to this approach, in which tasks are processed from $\tau_1$ to $\tau_n$; i.e., the first task found unschedulable is selected for slicing. One can imagine other alternatives as well.

However, any such scheme is critically dependent on two elements:

(1) A scheduling policy that can exploit our task model; i.e., while the $\tau^{State}$ threads can miss their original deadlines, the precedence constraints between instances of $\tau^{IO}$ and $\tau^{State}$ must be maintained.
(2) An offline schedulability analyzer for the given scheduling policy.

In this section we present an analytic approach that systematically addresses (1) and (2).

A static priority approach: In [15] we present an RMS-based method which enjoys a simple priority assignment rule and analysis test for its dual-priority scheme. This is certainly one of its strengths. Its principal weakness is that the online component lacks the simplicity found in pure, static priority scheduling due to its semi-dynamic dual priority assignment.
Thus the following question arises: when can a set of transformed TCEL tasks be scheduled under a fully preemptive, static priority scheme? Burns [6] provides an answer to this question after identifying a simple, but essential fact about the TCEL task model. That is, whenever we let a task's deadline be greater than its period, this represents a relaxation of the classical rate-monotonic restrictions put forth in [34]. Thus the rate-monotonic priority assignment may not be the optimal one.

Given set $\Gamma'$ of transformed TCEL tasks

$\tau'_1 = \tau^{IO}_1 ; \tau^{State}_1$
$\tau'_2 = \tau^{IO}_2 ; \tau^{State}_2$
$\ldots$
$\tau'_n = \tau^{IO}_n ; \tau^{State}_n$

it turns out the appropriate priority assignment is not only dependent on the deadlines (as in the pure deadline-monotonic model), but also on the respective execution times of each IO-handler and state-update component. In [6] Burns presents a search algorithm to generate the feasible static-priority order, or to detect when no such order exists. Thus the approach includes the following components.

Online Scheduler: This is a simple, preemptive dispatching mechanism, in which priority "ties" are broken in favor of the task dispatched first. Thus, for example, a task's current iteration will finish before the next one starts.

Offline Analyzer: The analyzer is constructive, in that it produces a feasible priority assignment if one exists. If no such assignment exists, the programmer may have to go back to the system design step and rely on more aggressive system-tuning.

For a given task set $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$, Burns' priority assignment algorithm accepts the preprocessed task set $\Gamma' = \{\tau'_1, \tau'_2, \ldots, \tau'_n\}$ as its input, where $\tau'_i$ is a sliced version of $\tau_i$. It then begins looking for a task that can run at the lowest priority (level $n$; for $n$ tasks, $n$ denotes the lowest priority level, and 1 the highest). After such a task, say, $\tau'_k$ is found, the algorithm proceeds to search the new task set $\Gamma' - \{\tau'_k\}$ for the second lowest priority task, and so on.

There is an important fact that leads to the optimality of this algorithm: while a task is being tested for priority level $p$, all $p - 1$ tasks whose priorities have not yet been assigned are assumed to run at a higher priority. In fixed-priority, preemptive scheduling, since a lower priority task can never preempt the higher priority tasks, selections made for priority levels $p$ or below will not affect those above $p$.
During this priority ordering, the schedulability test of $\tau'_i$ ($\tau^{IO}_i ; \tau^{State}_i$) for priority $p$ yields the following two conditions:

(1) Whether $\tau^{IO}_i$ can always run within time $D_i$ at priority $p$, and
(2) Whether $q$ consecutive iterations of $\tau'_i$ can run within $q \cdot T_i + D_i$ at priority $p$ where $q \ge 1$.

Condition (1) is required by the TCEL's semantics; condition (2) accounts for the case where at least one iteration of $\tau^{State}_i$ is delayed. The schedulability test boils down to a check to see if the maximum response time of $\tau^{IO}_i$ is no greater than $D_i$ in either case.

The maximum response time (denoted by $R^{IO}_i$) of $\tau^{IO}_i$ with respect to $hp(i)$ is computed as below:

$r_{i,q} = q (c^{IO}_i + c^{State}_i) + c^{IO}_i + \sum_{j \in hp(i)} \lceil r_{i,q} / T_j \rceil \, c_j$

$R^{IO}_i = \max_{q = 0, 1, 2, \ldots} \{ r_{i,q} - q \cdot T_i \}$   (Eq 6.3)

We must subtract $q \cdot T_i$ from $r_{i,q}$ to obtain the real response time, since $r_{i,q}$ is measured from the start of the $q$th period prior to the current period. Although $q$ is denoted as an unbounded number in Eq 6.3, it can be trivially shown that there exists a bounded response time $R^{IO}_i$ as long as the utilization of the tasks in $hp(i)$ and $\tau_i$ is less than 100% [51]. The value of $q$ is bounded below by $p$ such that $r'_{i,p} < p \cdot T_i + D_i$ where

$r'_{i,p} = (p + 1)(c^{IO}_i + c^{State}_i) + \sum_{j \in hp(i)} \lceil r'_{i,p} / T_j \rceil \, c_j$

The intuition behind this is that the execution pattern of $\tau'_i$ repeats after the entire execution of $\tau'_i$ fits within its time frame.

Now recall the unschedulable task set we showed in Subsection 6.1.2. Suppose that only $\tau_2$ was sliced. This requires priority rearrangement among the tasks, since RMS is no longer optimal in the transformed task model. The result of the new priority ordering, highest priority first, is as follows:

$\tau_3 \succ \tau_1 \succ \tau'_2$

In the next section we show how this ordering is obtained. But given that we have an ordering, we
can check it using Eq 6.3.

For $\tau_3$: $R_3 = 570 < D_3 = 2500$
For $\tau_1$: $R_1 = 400 + \lceil 970/2500 \rceil 570 = 970 < D_1 = 1000$
For $\tau'_2$:
$r'_{2,1} = 2 \cdot 410 + \lceil 2750/2500 \rceil 570 + \lceil 2750/1000 \rceil 400 = 2750 < T_2 + D_2 = 3200$
$r_{2,0} = 220 + \lceil 1590/2500 \rceil 570 + \lceil 1590/1000 \rceil 400 = 1590$
$r_{2,1} = 410 + 220 + \lceil 2400/2500 \rceil 570 + \lceil 2400/1000 \rceil 400 = 2400$
$R^{IO}_2 = \max\{1590, 2400 - 1600\} = 1590 < D_2 = 1600$

As a result, the task set is shown to be schedulable under the new priority assignment.

6.4 Priority Ordering with Task Slicing

As discussed in Section 6.3, Burns' priority assignment algorithm expects that all tasks in $\Gamma$ are sliced before they are submitted for priority assignment. However, it is typically not desirable to slice all tasks in the application due to the execution time overhead incurred by task slicing. As an example, consider a task set whose utilization is 0.96. Suppose that task slicing uniformly increases the worst-case execution times of the tasks by 5%. If we naively slice all the tasks, this will result in a utilization of 1.008 and render the task set permanently unschedulable.

Moreover, since we view slicing as a means of tuning an application, it should selectively be applied to tasks which will realize the greatest benefit.

To address this problem, we present an algorithm that not only finds a feasible task priority ordering, but also picks only a small subset of tasks to slice. For a given ordered list of tasks $\Gamma = [\tau_1, \tau_2, \ldots, \tau_n]$, we make the following definitions.

• $sliced(\tau_i)$: a boolean variable denoting whether or not $\tau_i$ is sliced.
• $c'_i = c^{IO}_i + c^{State}_i$.

We refer to a certain permutation $\Gamma'$ of $\Gamma$ as a configuration, i.e. $\Gamma'$ denotes a priority ordering of the tasks in $\Gamma$ (the first task in the list has the highest priority), and $sliced(\tau_i)$ is defined for all $\tau_i \in \Gamma'$. There are $n!$ different priority orderings, and
$2^n$ possible slicing choices. Thus the algorithm's job is to choose a task configuration among $2^n \cdot n!$ distinct ones in an efficient manner.

Definition 6.1 (Feasibility) For a given $\Gamma$, a configuration $\Gamma'$ is said to be feasible iff all tasks in $\Gamma'$ meet their deadlines under the priority ordering and slicing choice denoted by $\Gamma'$.

6.4.1 Feasibility Test

Since Burns' algorithm accepts a pre-sliced application, it can compute the exact amount of interference from the higher priority tasks when it considers a task for a priority. On the other hand, our problem is to slice for schedulability. (Recall that for a task $\tau_i$, $c_i \ne c^{IO}_i + c^{State}_i$.) Thus it seems inevitable to search the entire solution space of size $2^n \cdot n!$ in order to find a feasible task configuration.

Fortunately, there are cases where we can make a slicing decision without exhaustively exploring the search space. We rely on the response time analysis summarized by equations Eq 6.2 and Eq 6.3 to find these cases. To be specific, we make use of the following schedulability test.

$Feasible(L, \tau_k)$:
    if $\neg sliced(\tau_k)$ then $\max_{q = 0, 1, 2, \ldots} \{ r_{k,q} - q \cdot T_k \} \le D_k$
    else $\max_{q = 0, 1, 2, \ldots} \{ r'_{k,q} - q \cdot T_k \} \le D_k$

where

$S = \{ \tau_i \in L \mid sliced(\tau_i) \}$,
$r_{k,q} = (q + 1) c_k + \sum_{j \in L - S} \lceil r_{k,q} / T_j \rceil \, c_j + \sum_{j \in S} \lceil r_{k,q} / T_j \rceil \, c'_j$, and
$r'_{k,q} = q \cdot c'_k + c^{IO}_k + \sum_{j \in L - S} \lceil r'_{k,q} / T_j \rceil \, c_j + \sum_{j \in S} \lceil r'_{k,q} / T_j \rceil \, c'_j$.

"$\neg sliced(\tau_k) \wedge Feasible(L, \tau_k)$" denotes that the unsliced $\tau_k$ is schedulable with the tasks in $L$ running at higher priorities. Similarly, "$sliced(\tau_k) \wedge Feasible(L, \tau_k)$" means that $\tau_k$, when sliced, is schedulable with the tasks in $L$.

6.4.2 The Algorithm

We now present the priority ordering algorithm in Figure 6.12. We describe below the variables we use in the algorithm.

• $\Gamma = [\tau_1, \tau_2, \ldots, \tau_n]$ is the input task list which is initially ordered in nondecreasing order of the deadlines. Such a deadline monotonic ordering is desirable as a starting point, since most
tasks, except for a small number of tasks to be sliced, will be consistent with the ordering.
• The two parameters $L_1$ and $L_2$ of function Search collectively hold the list of tasks to be priority-ordered. In every invocation, function $Search(L_1, L_2)$ returns either the priority-ordered list of the tasks or false, if it cannot find any feasible ordering among them.

We use operator "@" to denote the list append operation.

In every invocation, function Search attempts to assign the last task in $L_1$ ($\tau$ in Figure 6.12) priority level $|L_1| + |L_2|$. The condition on line (1) denotes that the algorithm has already generated a complete task configuration of $\Gamma$.

The condition on line (2) means that the algorithm has checked all tasks in lists $L_1$ and $L_2$ for priority level $|L_1| + |L_2|$, but it can assign none of them that priority. Thus false is returned.

When the condition on line (3) holds, the algorithm checks if $\tau$ is feasible with the current high priority tasks. Before doing this, the algorithm attempts to find a feasible priority ordering among the higher priority tasks (on line (4)). If it cannot do so, $\tau$ is infeasible at the current priority level, and thus the algorithm tries another task for the current priority level, invoking tail recursion on line (12).

On the other hand, if the higher priority tasks are schedulable, the algorithm checks if $\tau$ is feasible. If so, it returns a new ordering $L@[\tau]$. Otherwise, the algorithm slices $\tau$ and sees if $\tau'$ is feasible. If $\tau'$ is deemed feasible, it returns $L@[\tau']$.

6.4.3 A Larger Example

We constructed a task set consisting of eighteen periodic tasks, based on the avionics application described in [35, 51]. We added tighter deadlines to the original task set, and modified the execution times of some of the tasks.
As a result, the task set had a utilization of 0.836, and it was unschedulable. The resultant timing specification of our task set is given in Table 6.1, where the time unit is 1 microsecond.

We make the following assumptions for the task set, which we have found representative.

1. Only a small portion of a task (no more than 25% of the original task code in terms of the worst-case execution time) can be sliced.
2. Slicing incurs no more than a 5% increase in a task's worst-case execution time.
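The test Feasible of Subsection 6.4.1 and the search described in Subsection 6.4.2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used for the experiment: the names `Task`, `fixpoint`, `feasible`, `search` and `pri_assign` are hypothetical, times are integers, a task with `c_io == 0` is treated as having no slicing data, and the loop over $q$ ends with an assumed stopping rule (the busy period is taken to be over once $q + 1$ whole instances fit within $q + 1$ periods). The data is the three-task example of Subsection 6.1.2, not the eighteen-task set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    name: str
    T: int             # period
    D: int             # deadline
    c: int             # worst-case execution time, unsliced
    c_io: int = 0      # WCET of the IO-handler (0: no slicing data)
    c_state: int = 0   # WCET of the state-update component
    sliced: bool = False

    @property
    def cost(self):
        """Time charged per instance: c'_i if sliced, else c_i."""
        return self.c_io + self.c_state if self.sliced else self.c


def fixpoint(base, higher, cap):
    """Least r with r = base + sum(ceil(r/T_j) * cost_j); None if r > cap."""
    r = base
    while r <= cap:
        nxt = base + sum(-(-r // t.T) * t.cost for t in higher)
        if nxt == r:
            return r
        r = nxt
    return None


def feasible(higher, k, q_max=1000):
    """Check max_q { r_q - q*T_k } <= D_k for task k below the tasks in higher."""
    must_finish = k.c_io if k.sliced else k.c
    for q in range(q_max):
        if fixpoint(q * k.cost + must_finish, higher, q * k.T + k.D) is None:
            return False
        # Assumed stopping rule: q+1 whole instances fit in q+1 periods.
        if fixpoint((q + 1) * k.cost, higher, (q + 1) * k.T) is not None:
            return True
    return False


def search(l1, l2):
    """Returns a priority-ordered list (highest first) or None."""
    if not l1:
        return [] if not l2 else None
    *rest, tau = l1
    higher = search(rest + l2, [])
    if higher is not None:
        if feasible(higher, tau):
            return higher + [tau]
        if tau.c_io > 0:
            tau_sliced = replace(tau, sliced=True)
            if feasible(higher, tau_sliced):
                return higher + [tau_sliced]
    return search(rest, [tau] + l2)


def pri_assign(tasks):
    return search(list(tasks), [])


tasks = [Task("t1", 1000, 1000, 400),
         Task("t2", 1600, 1600, 400, c_io=220, c_state=190),
         Task("t3", 2500, 2500, 570)]
print([(t.name, t.sliced) for t in pri_assign(tasks)])
```

On this input the sketch slices only $\tau_2$ and places it at the lowest priority. The relative order it picks for $\tau_1$ and $\tau_3$ depends on the initial list order, so it need not match the ordering shown in Section 6.3, which also passes `feasible`.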
    algorithm PriAssign(Γ)
    begin
        return(Search(Γ, []));
    end

    list function Search(L1, L2)
    case
    (1)   when L1 = L2 = []: return([]);
    (2)   when L1 = [], L2 ≠ []: return(false);
    (3)   when L1 = L1'@[τ]:
    (4)       L = Search(L1'@L2, []);
    (5)       if L ≠ false then
    (6)           if Feasible(L, τ) then
    (7)               return(L@[τ]);
    (8)           else
    (9)               τ' = Slice(τ);
    (10)              if Feasible(L, τ') then
    (11)                  return(L@[τ']);
                  end
              end
    (12)      return(Search(L1', [τ]@L2));
    end

Figure 6.12: Algorithm for Priority Ordering with Slicing Decision

    Task        T        D     c   c^IO  c^State
       1     1000     1000    51     51        0
       2    25000     5000  2000   1600      500
       3    25000     5000  1000    800      250
       4    40000     5000  2000   1600      500
       5    50000    20000  3000   2400      750
       6   200000    20000  3000   2400      750
       7    50000    25000  5000   4000     1250
       8    59000    25000  8000   6400     2000
       9    80000    80000  9000   7200     2250
      10    80000    80000  2000   1600      500
      11   100000    80000  8000   6400     2000
      12   100000   100000  5000   4000     1250
      13   200000   100000  3000   2400      750
      14   200000   100000  1000    800      250
      15   200000   120000  1000    800      250
      16   200000   140000  2000   1600      500
      17  1000000  1000000  1000    800      250
      18  1000000  1000000  1000    800      250

Table 6.1: Example Task Set

When we ran the priority ordering algorithm with the task set in Table 6.1, it chose to slice tasks $\tau_4$, $\tau_7$ and $\tau_{16}$, and made the task set schedulable. The utilization grew slightly to 0.844. We show the result in Table 6.2, where "R" denotes the maximum response time of an unsliced task, and $R^{IO}$ and $R^{State}$ respectively represent the maximum response times of the two components of a sliced task. Rows are listed in the resulting priority order.

    Task        T        D     c    c'       R    R^IO  R^State  Sliced?
       1     1000     1000    51    51      51       0        0  n
       2    25000     5000  2000  2100    2153       0        0  n
       3    25000     5000  1000  1050    3204       0        0  n
       4    40000     5000  2000  2100       0    4855     5406  y
       6   200000    20000  3000  3150    8559       0        0  n
       5    50000    20000  3000  3150   11712       0        0  n
       8    59000    25000  8000  8400   20171       0        0  n
       7    50000    25000  5000  5250       0   24375    28829  y
       9    80000    80000  9000  9450   38339       0        0  n
      10    80000    80000  2000  2100   42643       0        0  n
      11   100000    80000  8000  8400   71372       0        0  n
      12   100000   100000  5000  5250   79780       0        0  n
      13   200000   100000  3000  3150   96747       0        0  n
      14   200000   100000  1000  1050   97798       0        0  n
      15   200000   120000  1000  1050   98849       0        0  n
      16   200000   140000  2000  2100       0  139890   140441  y
      17  1000000  1000000  1000  1050  141492       0        0  n
      18  1000000  1000000  1000  1050  142543       0        0  n

Table 6.2: Priority Assignment with Program Slicing

6.5 Summary

In this chapter we addressed the schedulability tuning problem for real-time applications in discrete control domains. We solved this problem with a two-tier approach. At one end was a semantics-based compiler transformation method that safely segregates two subthreads from given task code; at the other end was an extended schedulability analysis test, and its associated feasible priority ordering algorithm. While the compiler-based tool had to be carefully guided by the scheduling component in our approach, the conventional separation of concerns between compilation and scheduling could still be maintained.
Chapter 7
Practical Considerations and Prototype Implementation

The TCEL paradigm helps incorporate a higher level of abstraction into real-time domains. As we have shown, TCEL's event-based semantics constrains only those operations that are critical to real-time operation; i.e., the events denoted in the specification or those derived from it. As such, a source program is an appropriate representation of the designer's intentions, and it need not overburden the system with unnecessary constraints. Moreover, the event-based semantics enables our compiler tools to transform the program, and helps resolve conflicts between the timing constraints and the code's actual execution time. Since this is exactly the type of dirty work that compilers do best, a human programmer's time is probably better spent elsewhere.

On the other hand, the success of our compiler tools is contingent upon the limitations of static program analysis techniques. In this chapter we consider such limitations, and show how we can work around the problems associated with them.

We faced these problems in our implementation of the real-time slicer tool, which uses an existing data and control dependence analyzer. We finish this chapter with a discussion of the prototype implementation, which we call TimeWare/SLICE.

7.1 Practical Considerations

For the sake of brevity we presented the code scheduler and the program slicer in a rather idealized form, abstracting out some implementation-related considerations. For example, we assume that a given program is in its perfect GSA form so that all spurious dependences are effectively removed. We also assume that available timing tools provide fairly tight execution time bounds for small code
segments. During our research these implementation-related factors revealed themselves via three sources: (1) experiences in building our prototypes, (2) experiences in dealing with large programs, and (3) discussions with colleagues who design and build production-quality real-time systems. In the following subsections we briefly summarize several of these considerations.

7.1.1 Limits of Data-Flow Analysis

Both the code scheduler and the program slicer rely heavily on contemporary compiler methods, including intra- and inter-procedural data and control analysis. And as with all program transformation algorithms, the limitations of this enabling technology become a constraining factor of our approach. For example, current static data-flow analysis is incapable of disambiguating all pointer aliases (which at worst is an undecidable problem). Thus we cannot always translate the TCEL source into its corresponding "perfect" GSA form. We partially assuage the problem by adopting techniques such as (1) inlining procedures to avoid inter-procedural aliases; (2) rendering in GSA form only those assignments that contain statically analyzable variables; and (3) unrolling loop bodies. Of course these and similar methods will degrade the code scheduler's performance, either by increasing the amount of code, or by decreasing its efficacy. However, dependence analyzers are improving at a rapid rate, and our algorithm will improve along with them. For example, if we incorporate the recent advances in loop dependence analyses such as those in the Omega Test [43], we may not have to unroll loops to slice a real-time task. We can obtain better slices for loops using techniques like loop distribution.

7.1.2 Limits of Timing Analysis

Another limiting factor is the difficulty of achieving accurate, static timing analysis in the face of more complicated architectures.
Quite simply, it has become incredibly difficult to use vendor-supplied benchmarks, and to model the interplay between pipelines, hierarchical caches, shared memories, register windows, etc. Thus with an approach like ours, it seems meaningless to predict the execution time of a single instruction (or even a small block). First, the CPU time will probably be too small to make a difference in achieving feasibility, and second, the "noise" in the prediction will be too large.

Thus we have adopted a hierarchical abstraction approach to deal with time predictions. For example, in the program of Figure 5.3 we accounted only for the CPU-intensive function calls that performed complex operations, while ignoring the execution time of finer-grained instructions. The same approach can be used on larger-grained structures. Our experience shows the compiler should
usually hunt for the "big-game targets," and forget about the smaller ones.

However, after code scheduling or program slicing is completed, it becomes imperative to verify the result with a more sophisticated timing tool; for example, a good profiler. Performing such re-timing is especially important in a cached memory structure, where code scheduling will always change the instruction alignment. We note that all modern RISC compilers re-order instructions to some degree; thus the efficacy of any source-level timing analysis is diminishing.

7.1.3 User Interaction

The above two factors argue against fully automated code synthesis and program slicing tools. There is also a third factor, which we discussed in reference to speculative code motion. That is, programmers of production-quality, real-time systems will simply not accept a compiler technology that "outsmarts" them, and possibly "disobeys" their intentions. They will, however, accept a tool that helps tune their systems, but not at the price of sacrificing traceability to their original programs. A simple example illustrates the importance of this. Consider what might happen if an instruction that interacts with the environment fails to be annotated as an event (which could easily happen with memory-mapped IO). If the instruction is relocated outside of its source section, debugging the transformed program could become a nightmare. Even worse, a fatal timing fault may occur at runtime, since the program slicer may place the instruction in the state thread, which can be delayed past the original deadline.

All of these considerations argue for a front-end that permits the programmer to interact with the tool during system-tuning. With our transformation engine as its foundation, a graphical interface allows a programmer to selectively apply the transformations, and also remain informed of the results. We have implemented such a tool for program slicing.
In this tool, programmers are asked to pick a slicing criterion from the tool's program source screen, rather than the program slicer doing it automatically. In this way, programmers can selectively control the application of program slicing.

7.2 TimeWare/SLICE: the Prototype Implementation

As a proof of concept, we have implemented a program slicer tool we named TimeWare/SLICE. The key features of the tool are as follows:

• It computes a program slice with respect to a given slicing criterion.
• It works on source code.
• It allows users to save a computed slice, and to carry out operations between the current and the saved slices. These operations include intersection between slices, union of slices and subtracting one slice from another. These commands are used to segregate special threads from given task code.
• It provides a graphical user interface.
• It generates the transformed task code.

In this section we discuss the implementation of TimeWare/SLICE.

7.2.1 TimeWare/SLICE Tool Screens

A source program is displayed on the two tool windows: one called the primary window and the other the secondary window. Figure 7.1 demonstrates a possible layout on the tool screen of a SUN Sparc station equipped with a 17-inch display.

The primary window provides users with a work place where they can pick a slicing criterion and get the slicing result. The result is shown as a set of highlighted source lines on the window. On the other hand, the secondary window provides a buffer space where a user can temporarily store a pre-computed slice. When a user carries out operations between two slices (one on the primary window and the other on the secondary window), the primary window works as if it were an accumulator. That is, the primary window provides the first operand and gets the result. We refer to the slice on the primary window as the current slice, and the one on the secondary window as the saved slice.

7.2.2 Commands

Figure 7.2 shows a screen dump of the primary window of TimeWare/SLICE. The highlighted source lines correspond to a computed slice. Now we describe the command buttons that are located below the text window.

Picking a Slicing Criterion. A slicing criterion consists of a source line and a variable name. By pressing the leftmost button of the mouse and then dragging the pointer on the primary window, the user can select a variable name.
The selected string is then highlighted.

Similarly, by moving the pointer and clicking the middle mouse button, the user can mark a source line. At that point, an arrow sign appears at the beginning of the line.
Figure 7.1: Tool Screens of TimeWare/SLICE

Realtime Slice Command. The Realtime Slice function computes a static program slice of the function that contains the source line picked as part of the slicing criterion. Figure 7.2 shows the slice with respect to the slicing criterion ⟨line 29, state⟩.

Add and Delete Line Commands. When Add Line (or Del Line) is pressed, TimeWare/SLICE adds the current line to the current slice (or deletes it from the slice). These commands are useful for manually editing a slice. If the current slice is empty, then Del Line has no effect.

Subtract Command. The Subtract function subtracts the saved slice from the current slice, and then displays the result on the primary window. As a result, all source lines common to both slices are removed from the current slice.

Add Command. The Add function adds the saved slice to the current slice, and then displays the result on the primary window.
Figure 7.2: Output of TimeWare/SLICE
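
As a rough illustration of the accumulator model, the slice commands of this subsection can be read as set operations on source line numbers. The Python sketch below is not taken from the TimeWare/SLICE sources; the class SliceWorkspace and its method names are hypothetical. In particular, the isolate method encodes one assumed reading of the Isolate command: lines that also occur in the saved slice are dropped from the current slice unless they are control predicates of the current slice.

```python
from dataclasses import dataclass, field


@dataclass
class SliceWorkspace:
    """Toy model (hypothetical) of the current and saved slices."""
    current: set = field(default_factory=set)  # lines highlighted on the primary window
    saved: set = field(default_factory=set)    # lines held on the secondary window

    def save(self):
        self.saved = set(self.current)

    def load(self):
        self.current = set(self.saved)

    def swap(self):
        self.current, self.saved = self.saved, self.current

    # The primary window acts as an accumulator: it supplies the first
    # operand and receives the result.
    def subtract(self):
        self.current = self.current - self.saved

    def add(self):
        self.current = self.current | self.saved

    def intersect(self):
        self.current = self.current & self.saved

    def isolate(self, predicates):
        # Assumed reading of Isolate: remove lines shared with the saved
        # slice, but keep the control predicates of the current slice.
        self.current = (self.current - self.saved) | (self.current & predicates)


ws = SliceWorkspace(current={10, 12, 15, 20})   # e.g. an IO slice
ws.save()
ws.current = {10, 12, 15, 20, 22, 25}           # a larger slice of the same task
ws.isolate(predicates={10})                     # line 10 is a control predicate
print(sorted(ws.current))                       # [10, 22, 25]
```

Under this reading, saving the IO slice and then isolating a larger slice leaves the remaining lines together with the predicates that still guard them.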
Intersect Command. The Intersect function computes the intersection of the saved slice with the current slice, and then displays the result on the primary window.

Isolate Command. The Isolate function deletes all repeated lines from the current slice, but leaves the control predicates of the current slice undeleted. The Isolate command is used to isolate the state-update slice from the IO slice.

Transform Command. The Transform command produces transformed code by attaching the current slice to a user-specified target line. The result is copied into a file, but the original code remains intact.

Save, Load and Swap Commands. These buttons are used to save the current slice onto the secondary window, load the saved slice onto the primary window, and swap the two slices.

7.2.3 Implementation

The prototype implementation of TimeWare/SLICE is based on SPYDER, a dynamic program slicing tool developed at Purdue University [1]. SPYDER was originally a program debugging tool relying on dynamic slicing, and it consists of two components: a modified version of GCC (GNU C compiler) and GDB (GNU symbolic debugger). The role of the modified GCC is to produce the program dependence graph for an input program as well as the object code. SPYDER traverses the graph to compute a static program slice.

We had to tailor the implementation of SPYDER due to the following limitations.

(1) It does not allow users to pick a general slicing criterion. Instead, it limits the criterion to a variable name.
(2) It is a program analysis tool, whereas our implementation actually transforms the program.
(3) Its static slicer is not complete, which results in incorrect slices being produced. For example, the static data flow analyzer of the modified GCC does not detect redefinitions of a global variable within a function, and SPYDER does not take this limitation into account.
We had to retool TimeWare/SLICE to overcome this problem.

We have added several features to the TimeWare/SLICE implementation, as described in the previous subsection. We made the strong assumption that every function call may redefine every global variable. This is due to SPYDER's limited interprocedural analysis. While our assumption may result in too large a slice, users of TimeWare/SLICE can still modify the automatically generated slice using the editing facilities. In such a case it is always better to be safe than sorry.

The implementation of TimeWare/SLICE enjoys all the benefits of GNU-based software. For instance, it can easily be compiled and run on various hardware platforms.

7.3 Summary

In this chapter we considered two fundamental technologies that enabled our tool-based methodology. Unfortunately, these technologies still impose practical limitations, and they impede the development of fully automated compiler tools. Thus it is desirable that programmers be able to closely guide and trace tool applications. Keeping this philosophy as a principle, we developed a program slicer tool that allows user interaction and traceability of transformations through a graphical front-end.
Chapter 8

Conclusions

Real-time computing, once the realm of small, hard-coded embedded systems, has evolved into a major computing discipline. Many practical techniques and formal theories have been developed for real-time computing, and they are now widely available for various computer applications. The goal of our work is to provide engineers with software support where we believe they need it the most: in system-tuning. We identified three problems in this area and then presented a comprehensive solution to them.

The first problem arises well before system-tuning, but it results in many of the inconsistencies that only tuning can remedy. That is, a high-level real-time program must be translated into both scheduler-oriented tasks and associated timing constraints on the tasks. The second problem is in reconciling the difference between task execution characteristics and the timing constraints. The last problem is attaining the desired real-time schedulability of the task set.

Our alternative to system-tuning is based on semi-automatic, compiler-based transformation methods, as shown in Figure 8.1. Our approach consists of three ingredients: a new real-time programming language and two novel compiler transformation methods.

The TCEL Language. We designed the TCEL language, which possesses high-level timing constructs for deadlines, periodicity, etc. Unlike other languages that came before it, TCEL has a semantics based on time-constrained relationships between observable events. The semantics yields a clear interpretation of timing behavior.

Feasible Code Synthesis. Armed with the event-based semantics, we developed a program transformation called feasible code synthesis, to help synthesize feasible task code from a TCEL program. The objective is to correct intra-task feasibility errors.
[Figure 8.1 diagram. Recoverable labels: Requirements Specification; High-Level Real-Time Programs; Decomposed Tasks, Timing Constraints on Tasks; Feasibility Tests (Intra-Task Consistency); Schedulability Tests (Inter-Task Consistency); Compiler-based Transformation Methods: Feasible Code Synthesis (Chapter 5), Real-Time Task Slicing (Chapter 6); System-Tuning, Re-implementation. Edge labels: compilation, synthesis, refinement, success, failure.]

Figure 8.1: Software Development Process: Two Alternatives

Real-Time Task Slicing. After the feasibility of each task is achieved, the entire task set may still be unschedulable. We developed a transformation method to address this problem, and we call it real-time task slicing. While the optimization is completely local, the improvement is realized globally, for the entire task set.

Alan Burns at the University of York independently developed a fixed-priority scheduling scheme to support the TCEL model. His analysis, in turn, led us to make a further improvement in the slicing tool; specifically, we developed a priority ordering algorithm that derives only a small subset of tasks to be sliced for schedulability.
8.1 Future Directions

We originally envisioned TCEL as a stand-alone real-time programming language. However, over time the emphasis of this work shifted from "programming languages" to "tuning." Eventually we realized that this tuning approach was applicable to many different languages, where the minimal requirement is an annotation mechanism that can both distinguish observable events in a real-time program and establish timing relationships between the events. Indeed, a graphical language may prove better suited to this purpose. Though their application domain is different from ours, several recent multimedia tools have been developed that allow users to specify temporal relationships between several continuous media [26]. We plan to borrow some of the ideas used in these languages to graphically describe timing relationships in a real-time program.

When a real-time application is written in such a graphical language, the compiler has more freedom to interleave operations because code blocks are only constrained by the specified precedences and data dependences. We are investigating this research direction, since it accords with our thrust toward a graphical user interface. Moreover, we believe a wider spectrum of compiler transformations will be applicable to the program.

There is another interesting research direction, which is closely related to our tool-based approach to real-time programming. The compiler-based approach we presented in this dissertation can help developers tune applications. However, when the design is flawed, no amount of tuning can help achieve either feasibility or schedulability. Flaws are often the result of translating a high-level specification into a set of schedulable "tasks." For example, real-time programmers manually break a complex real-time design into tasks, while trying to maintain functional correctness, so that the real-time schedulers can tractably guarantee the system's requirements.
Consequently, this decomposition introduces a new class of intermediate timing constraints, which are artifacts used to realize the original high-level requirements. If our tuning tools cannot help the programmers achieve an implementation consistent with the intermediate constraints, then the whole manual decomposition may have to be repeated.

An important research direction is to automate this process with a comprehensive design methodology. Although our compiler-based approach provides a partial solution by means of a new programming language, we plan to push it to a higher level in the design hierarchy. We recently laid the foundation to achieve this goal, via an automated design methodology that works hand-in-hand with real-time design tools [17]. We call this end-to-end design.

The methodology links two more phases in the design hierarchy, i.e., a high-level component derives intermediate constraints using the low-level tuning tools. The result is a way to aid programmers in decomposing tasks and deriving the intermediate rate constraints.

We believe that this type of strategy can significantly streamline the design process, since it supports a variety of low-level resource-specific considerations early on in the life-cycle. We are currently investigating a full-scale version of this method in order to incorporate network communication, as well as the load on each processor.
Bibliography

[1] H. Agrawal, R. DeMillo, and E. Spafford. Debugging with dynamic slicing and backtracking. Software Practice and Experience, 23(6):590-616, June 1993.
[2] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley Publishing Company, 1986.
[3] A. Aiken and A. Nicolau. A development environment for horizontal microcode. IEEE Transactions on Software Engineering, pages 584-594, May 1988.
[4] F. Allen, B. Rosen, and K. Zadeck. Optimization in Compilers (forthcoming). Addison Wesley Publishing Company, 1992.
[5] N. Audsley. Optimal priority assignment and feasibility of static priority tasks with arbitrary start times. Technical Report YCS 164, Department of Computer Science, University of York, England, December 1991.
[6] A. Burns. Fixed priority scheduling with deadlines prior to completion. Technical Report YCS 212 (1993), Department of Computer Science, University of York, England, October 1993.
[7] J.-D. Choi, R. Cytron, and J. Ferrante. On the efficient engineering of ambitious program analysis. IEEE Transactions on Software Engineering, 20(2):105-114, February 1994.
[8] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13:451-490, October 1991.
[9] B. Dasarathy. Timing constraints of real-time systems: Constructs for expressing them, method for validating them. IEEE Transactions on Software Engineering, 11(1):80-86, January 1985.
[10] K. Ebcioglu and A. Nicolau. A global resource-constrained parallelization technique. In International Conference on Supercomputing, pages 154-163. ACM Press, June 1989.
[11] J. Ferrante and K. Ottenstein. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9:319-345, July 1987.
[12] J. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, 30:478-490, July 1981.
[13] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[14] F. Gasperoni. Compilation techniques for VLIW architectures. Technical Report RC 14915 (#66741), IBM T. J. Watson Research Center, September 1989.
[15] R. Gerber and S. Hong. Semantics-based compiler transformations for enhanced schedulability. In Proceedings of IEEE Real-Time Systems Symposium, pages 232-242. IEEE Computer Society Press, December 1993.
[16] R. Gerber and S. Hong. Compiling real-time programs with timing constraint refinement and structural code motion. IEEE Transactions on Software Engineering, 1995. To appear.
[17] R. Gerber, S. Hong, and M. Saksena. Guaranteeing end-to-end timing constraints by calibrating intermediate processes. In Proceedings of IEEE Real-Time Systems Symposium, pages 192-203. IEEE Computer Society Press, December 1994. Also to appear in IEEE Transactions on Software Engineering.
[18] P. Gopinath and R. Gupta. Applying compiler techniques to scheduling in real-time systems. In Proceedings of IEEE Real-Time Systems Symposium, pages 247-256. IEEE Computer Society Press, December 1990.
[19] M. Harmon, T. Baker, and D. Whalley. A retargetable technique for predicting execution time. In Proceedings of IEEE Real-Time Systems Symposium, pages 68-77. IEEE Computer Society Press, December 1992.
[20] P. Havlak. Interprocedural Symbolic Analysis. PhD thesis, Department of Computer Science, Rice University, May 1994.
[21] S. Hong and R. Gerber. Compiling real-time programs into schedulable code. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation. ACM Press, June 1993. SIGPLAN Notices, 28(6):166-176.
[22] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12:26-60, January 1990.
[23] Y. Ishikawa, H. Tokuda, and C. Mercer. Object-oriented real-time language design: Constructs for timing constraints. In Proceedings of OOPSLA-90, pages 289-298, October 1990.
[24] F. Jahanian and A. Mok. Safety analysis of timing properties in real-time systems. IEEE Transactions on Software Engineering, 12(9):890-904, September 1986.
[25] K. Kenny and K. Lin. Building flexible real-time systems using the Flex language. IEEE Computer, pages 70-78, May 1991.
[26] M. Kim. Hyperstories: Combining time, space and asynchrony in multimedia documents. Technical Report RC 19277 (#83726), IBM T. J. Watson Research Center, October 1993.
[27] E. Kligerman and A. Stoyenko. Real-Time Euclid: A language for reliable real-time systems. IEEE Transactions on Software Engineering, 12:941-949, September 1986.
[28] J. Krause. GN&C domain modeling: Functionality requirements for fixed rate algorithms. Technical Report (DRAFT) version 0.2, Honeywell Systems and Research Center, December 1991.
[29] I. Lee, P. Bremond-Gregoire, and R. Gerber. A process algebraic approach to the specification and analysis of resource-bound real-time systems. IEEE Proceedings, 82(1), January 1994.
[30] I. Lee and V. Gehlot. Language constructs for real-time programming. In Proceedings of IEEE Real-Time Systems Symposium, pages 57-66. IEEE Computer Society Press, 1985.
[31] J. Leung and M. Merrill. A note on the preemptive scheduling of periodic, real-time tasks. Information Processing Letters, 11(3):115-118, November 1980.
[32] S. Lim, Y. Bae, C. Jang, B. Rhee, S. Min, C. Park, H. Shin, K. Park, and C. Kim. An accurate worst case timing analysis for RISC processors. In Proceedings of IEEE Real-Time Systems Symposium, pages 97-108. IEEE Computer Society Press, December 1994.
[33] K. Lin and S. Natarajan. Expressing and maintaining timing constraints in FLEX. In Proceedings of IEEE Real-Time Systems Symposium. IEEE Computer Society Press, December 1988.
[34] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1):46-61, January 1973.
[35] C. Locke, D. Vogel, and T. Mesler. Building a predictable avionics platform in Ada: A case study. In Proceedings of IEEE Real-Time Systems Symposium, pages 181-189. IEEE Computer Society Press, December 1991.
[36] T. Marlowe and S. Masticola. Safe optimization for hard real-time programming. In Second International Conference on Systems Integration, pages 438-446, June 1992.
[37] M. Merritt, F. Modugno, and M. Tuttle. Time-Constrained Automata. In CONCUR '91, August 1991.
[38] A. Nicolau. Parallelism, Memory Anti-aliasing and Correctness Issues for a Trace Scheduling Compiler. PhD thesis, Yale University, June 1984.
[39] V. Nirkhe. Application of Partial Evaluation to Hard Real-Time Programming. PhD thesis, Department of Computer Science, University of Maryland at College Park, May 1992.
[40] V. Nirkhe and W. Pugh. Partial evaluator for the Maruti hard real-time system. In Proceedings of IEEE Real-Time Systems Symposium, pages 64-73. IEEE Computer Society Press, December 1991.
[41] K. Ottenstein and L. Ottenstein. The program dependence graph in a software development environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177-184, May 1984.
[42] C. Park and A. Shaw. Experimenting with a program timing tool based on source-level timing schema. In Proceedings of IEEE Real-Time Systems Symposium, pages 72-81. IEEE Computer Society Press, December 1990.
[43] W. Pugh and D. Wonnacott. Eliminating false data dependences using the Omega test. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation. ACM Press, June 1992.
[44] R. Arnold, F. Mueller, and D. Whalley. Bounding worst-case instruction cache performance. In Proceedings of IEEE Real-Time Systems Symposium, pages 172-181. IEEE Computer Society Press, December 1994.
[45] L. Sha, R. Rajkumar, and J. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. IEEE Transactions on Computers, 39:1175-1185, September 1990.
[46] U. Shankar. A simple assertional proof system for real-time systems. In Proceedings of IEEE Real-Time Systems Symposium, pages 167-176. IEEE Computer Society Press, December 1992.
[47] A. Shaw. Reasoning about time in higher level language software. IEEE Transactions on Software Engineering, pages 875-889, July 1989.
[48] M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248-259. ACM Press, October 1992.
[49] A. Stoyenko. A schedulability analyzer for Real-Time Euclid. In Proceedings of IEEE Real-Time Systems Symposium, pages 218-227. IEEE Computer Society Press, December 1987.
[50] K. Tindell. Using offset information to analyse static priority pre-emptively scheduled task sets. Technical Report YCS 182 (1992), Department of Computer Science, University of York, England, August 1992.
[51] K. Tindell, A. Burns, and A. Wellings. An extendible approach for analysing fixed priority hard real-time tasks. The Journal of Real-Time Systems, 6(2):133-152, March 1994.
[52] G. Venkatesh. The semantic approach to program slicing. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, June 1991.
[53] M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10:352-357, July 1984.
[54] V. Wolfe, S. Davidson, and I. Lee. RTC: Language support for real-time concurrency. In Proceedings of IEEE Real-Time Systems Symposium, pages 43-52. IEEE Computer Society Press, December 1991.
[55] N. Zhang, A. Burns, and M. Nicholson. Pipelined processors and worst case execution times. The Journal of Real-Time Systems, 5(4), October 1993.
