Compiling Real-Time Programs with Timing Constraint Refinement and 
Structural Code Motion by Gerber, Richard & Hong, Seongsoo
Richard Gerber and Seongsoo Hong
Department of Computer Science
University of Maryland
College Park, MD 20742
(301) 405-2710
rich@cs.umd.edu, sshong@cs.umd.edu

University of Maryland Technical Report UMD CS-TR-3323, UMIACS-TR-94-90
July 1994

Abstract

We present a programming language called TCEL (Time-Constrained Event Language), whose semantics is based on time-constrained relationships between observable events. Such a semantics induces only those timing constraints necessary to achieve real-time correctness, without over-constraining the system. Moreover, an optimizing compiler can exploit this looser semantics to help tune the code, so that its worst-case execution time is consistent with its real-time requirements.

In this paper we describe such a transformation system, which works in two phases. First the TCEL source code is translated into an intermediate representation. Then an instruction-scheduling algorithm rearranges selected unobservable operations, and synthesizes tasks guaranteed to respect the original event-based constraints.

Keywords: Real-time, programming languages, compiler optimization, code scheduling, static single assignment, timing analysis, trace scheduling, code motion.

This research is supported in part by ONR grant N00014-94-10228, NSF grant CCR-9209333, and an NSF Young Investigator Award CCR-9357850. A preliminary abstract of this material appeared in the Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation (June 1993).
1 Introduction

Developing a real-time system involves balancing a delicate equation. On one side are the functional requirements, which define valid translations from inputs to outputs. As such they are realized by programs, or consumers of time. On the other side are the temporal requirements, which place upper and lower time bounds between the inputs and outputs. Thus the requirements constrain time. When this equation fails to balance, the result is often a costly and arduous process of system tuning. This typically involves multiple, painful phases of instrumentation and hand-optimization. Additional measures may include re-coding key subsystems in assembly language, off-loading functions to programmable logic, or perhaps redesigning the system altogether.

Several programming languages help manage the requirements side of the design equation; examples are [13, 17, 19, 23, 27]. These languages provide programmers with a convenient means of postulating timing constraints within a program's text. The constraints are, in turn, conveyed to the real-time scheduler as a directive, or perhaps replaced by kernel calls to be invoked at runtime.

In this paper we present an automated methodology to help balance the other side of the design equation. This is done via two interrelated components: a programming language and a compiler transformation engine. The real-time language is called TCEL (Time-Constrained Event Language), which contains timing constructs not unlike those in the abovementioned languages. However it differs significantly, in that its semantics is based on the time-constrained relationships between observable events. Since this imposes a "loose" interpretation of the constraints, a compiler can help rearrange the unobservable code to aid in the tuning process.

The distinction between events and local operations is not found in most real-time programming methodologies.
Indeed, the standard approach is to establish timing constraints on blocks of code, without discriminating between the instructions within the code itself.

We have found this distinction to be quite useful. Since any relevant instruction can be annotated as an event, the approach can be extended to most notions of "communication." For example, an event can be a message-passing operation, an access to memory-mapped I/O, an instruction that induces side-effects on other tasks, or for that matter, a reference to any designated function call or variable. (For simplicity, in the sequel we assume that only "send" and "receive" operations are observable.)

Consider the TCEL program fragment below, which receives sensor data, delays, and receives more data; it then transforms the data into a command and sends out the result. The final send must take place within 4.0 ms of receiving the original message.
    L1: do
    L2:     receive(p, &obj_coords1);
    L3: start after 3.5 ms finish within 4.0 ms
    L4: {
    L5:     receive(p, &obj_coords2);
    L6:     r1 = F(obj_coords1);
    L7:     r2 = G(obj_coords2);
    L8:     next_cmd = H(r1, r2);
    L9:     send(q, next_cmd);
        }

In the usual interpretation of this program, statements L5-L9 would always execute after the delay; i.e., the after construct would be "executed" like a "sleep" command in Unix. TCEL's semantics, on the other hand, induces timing constraints only between the three event-triggering instructions: L2, L5 and L9. As for the unobservable statements L6-L8, their execution is bound only by natural control and data dependences.

This looser semantics yields an immediate benefit: if the execution times of L6-L8 conflict with the 4.0 ms deadline, the unobservable code can be moved to help tune the program to its hardware environment. Indeed, perhaps all (or part) of L6 can be executed while the program delays. It may be possible to specialize parts of L7 and L8 to do the same; perhaps some pre-computations can even be executed before L1. In performing these transformations, the observable events act as "semantic markers," denoting the places where code can be moved.

In this paper we provide a means to carry out such transformations in a semi-automated fashion. Specifically, we are concerned with correcting feasibility faults, in which execution requirements conflict with the real-time constraints. While this objective is straightforward, achieving it is quite a difficult problem. Indeed, we show that even in the case of a basic block, determining the transformation strategy is NP-hard. And the situation gets much more complicated when the program possesses a branching structure, i.e., when actual execution paths are determined at runtime.

Thus we take a greedy approximation approach, which works in several phases. First the TCEL source is translated into a static single assignment (SSA) representation [3], whose naming conventions help isolate the "worst-case" execution paths.
Next the code is decomposed into several blocks, and equations are generated to constrain their start and finish times. Finally, a variant of trace scheduling [6] is used to relocate the unobservable code, and hopefully attain feasibility.

The remainder of this paper is organized as follows. In the following section we survey related work in programming languages, semantics and optimization methods for real-time systems. Then, in Section 3 we present the syntax and semantics of TCEL, and follow it by Section 4, in which we introduce some preliminary notations. In Section 5 we give an overview of the scheduling problem, and briefly sketch our solution strategy. In Section 6 we discuss the decomposition phase, and the way we derive our code-based timing constraints. In Section 7 we present the code scheduling
algorithm, which synthesizes feasible code from infeasible sections. We conclude in Section 8 with remarks on the enabling technologies we utilize, and the way we approach some of their limitations.

2 Related Work

TCEL's semantics was inspired by a principle commonly applied in formal methods. That is, when reasoning about a real-time concurrent system it is often useful to consider only "events of interest," and to abstract away local-state information. Indeed, almost all formal models ease this process by making some distinction between an "event" and a corresponding "action." For example, in Real-Time Logic [14], events are instantaneous (and require no resources) while actions consume nonzero time. Similar distinctions exist in RTRL [4], Timed I/O Automata [21], ACSR [16], and in almost every formal approach to real-time.

It therefore seemed natural to extend this common technique to a "full-blown" real-time programming language, in which the "events" correspond to actual I/O operations within C code. A logical consequence was the ability to exploit this looser semantics, and to use compiler transformations to move unobservable instructions out of over-constrained code blocks.

Most other real-time languages do not make such a distinction, and instead place constraints on the boundaries of code blocks. Two paradigms are used in these languages: either constraints are expressed directly in the program itself (as in [19, 17, 27]), or they are postulated in a separate interface, and then passed to the scheduler as directives. A common language-based approach (first presented in [17]) is to provide constructs such as "within t do {...}," "at t do {...}" and "after t do {...}." An alternative, taken in [19], is to set up linear constraint expressions on the start times and deadlines of code blocks. We have borrowed from both approaches: in the TCEL source we use the higher-level constructs, while in our intermediate code we make use of the constraint representation.
But in TCEL the semantics is quite different, as it establishes constraints between the observable events within the code, and not on the code's textual boundaries.

There have been other compiler-based approaches to real-time programming [9, 10, 12, 23, 20]. These approaches, while addressing different problems associated with real-time programming, share a common goal, namely enhancing the predictability and schedulability of programs. In [10] a compiler classifies an application program on the basis of its predictability and monotonicity, and creates partitions which have a higher degree of adaptability. The objective is to produce a transformed program possessing a smaller variance in its execution time. In [23] a partial evaluator is applied to a source program, which produces residual code that is both more optimized and more deterministic. In [28] an approach to speculative execution is postulated for distributed real-time systems. This is tangentially similar to our application of "speculative transformations," since both break control dependences that are predicated on inputs. The principal objective in [28], however, is quite different, in that "shadow threads" are forked off to execute on available resources.




















Figure 1: Typical Flow Graph.

such events executed either "first" or "last." For example, consider the fragment from a typical flow graph in Figure 1.

Depending on the path taken, the last event executed in the reference block may be either E1 or E2. Similarly, the first event in the constraint block will be E3 or E4, while the last event will be either E4 or E5. To denote such possibilities, we introduce two mappings FIRST and LAST from code blocks to sets of events. That is, $LAST(RB) = \{E1, E2\}$, $FIRST(CB) = \{E3, E4\}$ and $LAST(CB) = \{E4, E5\}$. Thus, the "do" construct introduces two potential constraints between an executed event from LAST(RB) and another from FIRST(CB), as well as one constraint between an executed event from LAST(RB) and another from LAST(CB).

The second real-time construct denotes a statement with cyclic behavior of a positive period:

    every p [ while ⟨condition⟩ ]
        [ start after t_min ] [ start before t_max1 ] [ finish within t_max2 ]
        ⟨constraint block⟩

As long as the "while" condition is true, the observable events in the constraint block execute every p time units. Akin to an untimed while-loop, when the condition evaluates to false the statement terminates. However, unlike the untimed counterpart, event operations cannot be part of the condition. In its real-time behavior, the interpretation of the "every" construct is similar to that of "do." For example, assume that the statement is first scheduled at time t, and that the "while" condition is true for periods 0 through i. The periodic constraints established by this statement are depicted in Figure 2, where the time-line shows the first two instances of the statement.
Figure 2: Behavior of Periodic Timing Construct. (Time-line with frames released at $t$, $t + p$, $t + 2p$, ..., $t + ip$; within each frame the offsets $t_{min}$, $t_{max1}$ and $t_{max2}$ are marked, and arrows denote observable event occurrences.)

Examining the time-line, we see that the every statement is released at time $t$, and that within the first frame, the first observable event (denoted by an arrow) occurs between $t + t_{min}$ and $t + t_{max1}$. Similarly, the first frame's last event occurs before $t + t_{max2}$. Generalizing, the following constraints are induced for period $i$:

• start after $t_{min}$: The first event executed in the CB occurs after $t + (i - 1)p + t_{min}$.
• start before $t_{max1}$: The first event executed in the CB occurs before $t + (i - 1)p + t_{max1}$.
• finish within $t_{max2}$: The last event executed in the CB occurs before $t + (i - 1)p + t_{max2}$.

As we have stated, timing constructs may be arbitrarily nested. Consider the program in Figure 3(A), which is a (very coarse) 2-dimensional abstraction of an aircraft navigation/control loop. A set of route coordinates is kept in the array "GOAL," which is maintained by another module. The TCEL program's role is to (1) sample the aircraft's current coordinates, its (true) heading, roll, and its ground speed; (2) get the next route coordinate to visit; (3) compute the relative attitude between the heading and the coordinate; and (4) adjust the course by updating throttle and roll.
Adjustments are made in discrete increments, and are contingent on the current roll and velocity, as well as the amount that the course must be changed.

The timing constraints are induced as follows:

(1) Control updates are made periodically, with rate 50/second.
(2) In order to give the actuators time to get updated (and for the craft to respond accordingly), all updates must be made within the first 5 ms of each period.
(3) Velocity (ground speed) is obtained via a "request-response" protocol from an external unit; the response arrives with maximum latency of 0.75 ms.
(4) To correlate ground speed with outputs, all throttle and flap updates must be made within 3.1 ms of the actual ground speed sample. In the best case the sample is taken upon issuing the request.
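The per-period windows induced by the "every" construct can be sketched as follows, instantiated with requirements (1) and (2) above (a 20 ms period and a 5 ms finish bound). This is only an illustration: the `Every` record and the `event_windows` helper are hypothetical names that do not appear in TCEL, and omitted clauses are assumed to default to 0 and to an unbounded value.

```python
from dataclasses import dataclass
from typing import Tuple

INF = float("inf")


@dataclass(frozen=True)
class Every:
    """Hypothetical record of an `every` construct; all values in milliseconds."""
    period: float
    t_min: float = 0.0    # "start after" (assumed 0 when omitted)
    t_max1: float = INF   # "start before" (assumed unbounded when omitted)
    t_max2: float = INF   # "finish within" (assumed unbounded when omitted)


def event_windows(c: Every, t: float, i: int) -> Tuple[Tuple[float, float], float]:
    """Windows for period i (i >= 1) of a statement first released at time t.

    Returns ((earliest, latest) time of the first event, latest time of the last event).
    """
    base = t + (i - 1) * c.period
    return (base + c.t_min, base + c.t_max1), base + c.t_max2


# Flight controller: 50 updates per second (p = 20 ms), outputs within 5 ms.
control_loop = Every(period=20.0, t_max2=5.0)
for i in (1, 2, 3):
    print(i, event_windows(control_loop, t=0.0, i=i))
```

Each printed tuple gives the window for the first event of the period and the deadline for its last event, so the third period must complete its outputs by 45 ms after the initial release.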
If the specication mandated additional timing constraints, clearly we could employ furtherlevels of nesting to achieve them. For example, suppose we desired to add a fth timing requirementto the four listed above:(5) The nal two outputs must be correlated within 0.5 ms of each other. (This is not unrealistic,since the two outputs control are coordinated to eect the angular adjustment.)To accomplish this we would replace the sequential composition with an additional nested TCELstatement: do output(THROT, throttle);nish within 0.5 msoutput(FLAP Cntrl, wap);The net runtime eect would simply be a renement of the potential behaviors; i.e., the time-event relationships exhibited by the altered program would be a subset of those in the originalversion.4 Basic NotationsThe output of TCEL compiler's machine-independent pass is a ow graph, which contains theoriginal timing information. For example, Figure 3(B) shows the ow graph for our ight controlprogram in Figure 3(A), where for the sake of brevity we have left the code in its original C form.We call this structure a hierarchical ow graph, since it maintains the program's original hier-archical levels of scoping.Denition 4.1 A hierarchical ow graph HFG(B) of code block B is a directed graph(V;E; entry(B); exit(B))where V is a set of nodes, E is a set of edges representing control ow, and where entry(B) 2 V ,exit(B) 2 V denote the unique entry and exit of B, respectively.A vertex n 2 V may be either a basic block of B, an entry node, an exit node, or anotherhierarchical ow graph HFG(B').Thus all nested constructs, including loops, are reduced into single nodes. 
Moreover, since we do not move code out of loops (but only within their bodies), we can treat an HFG as an acyclic graph, and ignore all back edges.

Of course our hierarchical structure assumes that the program's (flat) flow graph is reducible [1]. But since "structured" programs without unrestricted gotos lead to reducible flow graphs, without loss of generality we assume that our programs possess this property.
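One way to picture Definition 4.1 is the small sketch below, in which a nested HFG is stored as a single node of its parent and back edges are assumed to have been dropped already. The class names, the `wcet_ms` field and the `wt` helper are illustrative assumptions, not the compiler's actual data structures; loop bounds are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class BasicBlock:
    name: str
    wcet_ms: float = 0.0  # assumed worst-case execution time of the block


@dataclass
class HFG:
    """HFG(B) = (V, E, entry(B), exit(B)); back edges are assumed already removed."""
    name: str
    entry: str
    exit: str
    nodes: Dict[str, Union[BasicBlock, "HFG"]] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, node: Union[BasicBlock, "HFG"]) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def connect(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)


def wt(node: Union[BasicBlock, HFG]) -> float:
    """Longest path starting at the entry; a nested HFG counts as one node."""
    if isinstance(node, BasicBlock):
        return node.wcet_ms
    memo: Dict[str, float] = {}

    def longest_from(name: str) -> float:
        if name not in memo:
            tail = max((longest_from(s) for s in node.edges[name]), default=0.0)
            memo[name] = wt(node.nodes[name]) + tail
        return memo[name]

    return longest_from(node.entry)


# A conditional, nested as a single node inside an enclosing block.
cond = HFG("if_c2", entry="test", exit="join")
for b in (BasicBlock("test"), BasicBlock("then"), BasicBlock("else", 0.43), BasicBlock("join")):
    cond.add(b)
for src, dst in (("test", "then"), ("test", "else"), ("then", "join"), ("else", "join")):
    cond.connect(src, dst)

outer = HFG("block", entry="in", exit="out")
for n in (BasicBlock("in", 0.25), cond, BasicBlock("out", 0.95)):
    outer.add(n)
outer.connect("in", "if_c2")
outer.connect("if_c2", "out")

print(wt(outer))
```

Because the graph is acyclic once back edges are ignored, a memoized longest-path traversal suffices; in a well-formed HFG every path from the entry ends at the exit node.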


































Figure 3: (A) Source Code for Flight Controller, and (B) Flow Graph.
For a given block B, let $HFG(B) = (V, E, entry(B), exit(B))$. We can easily extend traditional compiler terminology to HFG(B) as follows.

Paths. A path (or trace) between $n_1, n_2 \in V$ is denoted by "$n_1 \rightarrow^{b} n_2$," where $b$ is the sequence of nodes traversed between (and including) $n_1$ and $n_2$. When $b$ includes a node $m \in V$, we denote this by overloading the set membership operator, i.e., as "$m \in b$."

We also use the path relation, but omit the actual path, to denote the existence of some path between two nodes, i.e.,

    $n_1 \rightarrow n_2 \;\stackrel{def}{=}\; \exists b : n_1 \rightarrow^{b} n_2$

Data dependence. For a node $n \in V$, we define $Def(n)$ and $Use(n)$ to be the sets of variables defined by $n$ and used in $n$, respectively. Thus, for nodes $n_1, n_2 \in V$, $n_2$ is data dependent on $n_1$ (denoted "$n_1 \stackrel{d}{\rightharpoonup} n_2$") iff there is a path $b$ such that $n_1 \rightarrow^{b} n_2$ and

    $\exists v \in Def(n_1) \cap Use(n_2) :: (\forall n \in V :: ((n \in b) \wedge v \in Def(n)) \Rightarrow n = n_1)$

For nodes $n_1, n_2 \in V$, we say that $n_2$ is transitively data dependent on $n_1$ (denoted "$n_1 \stackrel{d}{\rightharpoonup}^{+} n_2$") iff there is a chain

    $n_1 \stackrel{d}{\rightharpoonup} n_1' \stackrel{d}{\rightharpoonup} n_1'' \stackrel{d}{\rightharpoonup} \ldots \stackrel{d}{\rightharpoonup} n_2$

Control dependence. For nodes $n_1, n_2 \in V$, $n_2$ is control dependent on $n_1$ (denoted "$n_1 \stackrel{c}{\rightharpoonup} n_2$") iff $n_1$ represents a control predicate and $n_2$ is immediately nested within the loop or conditional whose predicate is represented by $n_1$.

Dependence closure. The dependence closure for node $n$ in the block B, denoted by "$DC(n, B)$," contains $n$ and all nodes $m$ that reach $n$ via zero or more control or data dependence edges. It is inductively defined by the following least fix-point operation:

    $DC(n, B) = fix\, F(\{n\}) \cup \{n\}$, where
    $F(S) = \{ m \in V \mid \exists n' \in S : m \stackrel{d}{\rightharpoonup} n' \vee m \stackrel{c}{\rightharpoonup} n' \}$

When we are concerned only with data dependences, we make use of the data dependence closure "$DDC(n, B)$," defined as below.

    $DDC(n, B) = \{ m \in V \mid m \stackrel{d}{\rightharpoonup}^{+} n \} \cup \{n\}$
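The two closures can be computed with an ordinary worklist iteration, as in the sketch below. It assumes the direct dependence edges have already been extracted as pairs (m, n') meaning "n' depends on m"; the function names and the edge encoding are illustrative choices, and the example edges are a hand-derived reading of the introductory fragment (statements L2 through L9 of that listing).

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (m, n'): node n' directly depends on node m


def closure(n: str, edges: Iterable[Edge]) -> Set[str]:
    """Least set containing n and closed under 'm has a dependence edge into the set'."""
    preds: Dict[str, Set[str]] = {}
    for m, dependent in edges:
        preds.setdefault(dependent, set()).add(m)
    result: Set[str] = {n}
    work: List[str] = [n]
    while work:
        cur = work.pop()
        for m in preds.get(cur, ()):
            if m not in result:
                result.add(m)
                work.append(m)
    return result


def dc(n: str, data_edges: Iterable[Edge], ctrl_edges: Iterable[Edge]) -> Set[str]:
    """DC(n, B): closure over both data and control dependence edges."""
    return closure(n, list(data_edges) + list(ctrl_edges))


def ddc(n: str, data_edges: Iterable[Edge]) -> Set[str]:
    """DDC(n, B): closure over data dependence edges only."""
    return closure(n, data_edges)


# Data dependences read off the introductory fragment (an assumed encoding).
data = [("L2", "L6"), ("L5", "L7"), ("L6", "L8"), ("L7", "L8"), ("L8", "L9")]

print(sorted(ddc("L6", data)))  # statements L6 transitively depends on, plus L6
print(sorted(ddc("L8", data)))
```

In this reading the closure of L6 contains only L2 and L6 itself, which is why L6 is a candidate for execution during the delay, whereas L8 depends on the second receive at L5.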
5 The Problem and Our Solution

TCEL provides a powerful framework for describing real-time programs at the source-code level. The timing constructs allow a programmer to express the application's real-time constraints in a straightforward manner; moreover, the event semantics provides an unambiguous interpretation of the constructs. However, two fundamental issues remain, which are endemic to all real-time systems.

1. A program may not be feasible, i.e., a single process's execution time may conflict with its own real-time constraints.
2. While the program may be feasible, it may not be schedulable under any tractable real-time scheduling algorithm.

As for the feasibility problem, consider the following TCEL fragment:

    do
        input(P, &m);
    start after 10ms finish within 20ms
    {
        input(Q, &x);
        S;                 [20ms]
        output(R, y);
    }

The code's timing constraints mandate a 10ms latency between the events generated by "input(P, &m)" and "input(Q, &x)," as well as a 20ms deadline between the events generated by "input(P, &m)" and "output(R, y)." Meanwhile, the bracketed "20ms" denotes that the unobservable statement S requires a maximum of 20ms to execute, a bound obtained by a timing analysis tool (e.g., [11, 18, 24, 25, 29]). Consequently, the program possesses an inherent conflict, since S requires 20ms to execute while it is only allowed 10ms.

We address this problem by an approach we call feasible code synthesis. In our example this would involve decomposing S and, if possible, moving instructions not dependent on "x" out of the overloaded section. However merely achieving feasibility may be of little help, since the transformed code may still not be schedulable under any known methods. For example, when a program contains if-then-else branches, then the actual execution paths (and the events executed) are determined dynamically.
But since schedulers must provide guarantees, they do not have the flexibility to instantaneously, dynamically reschedule a task set whenever an event is triggered. Indeed, while an event-based semantics makes conceptual sense at the source-program level, most real-time schedulers only accept timing constraints on the start and finish times of tasks.

Since the strategies used to achieve feasibility have a profound effect on the ultimate task structure, we solve these two problems together. Thus the role of the TCEL compiler is to partition event-driven source programs into time-constrained blocks of code, in which all of the blocks are feasible.

5.1 The Problem of Feasible Code Synthesis

Even without the task partitioning component, achieving feasibility is a nontrivial problem. This is obviously the case if we allowed potentially unbounded loops (or conditional gotos), which would render the problem undecidable. But since real-time programs must be amenable to worst-case timing estimates, we assume that upper bounds can be obtained for execution times. Formally, we let "$wt(S)$" denote a statement's worst-case execution time, where we assume that $wt(S) \in [0, \infty)$ for any statement S. Obviously $wt$ is implementation dependent, and its tightness is determined by the quality of the analysis tool used to generate the bound.¹

¹ We return to this issue in the Conclusion.

While the feasibility problem may be decidable for our domain, it is not necessarily trivial. Even when a program is structured like our example above, and where S is a basic block, simply deciding whether the program can be made feasible is still NP-hard.

The problem can be stated as follows: given a TCEL timing construct

    do RB start after $t_{min}$ start before $t_{max1}$ finish within $t_{max2}$ CB

is it possible to transform RB and CB to meet the following constraints?

(1) On any execution path, the original ordering of observable events is maintained.
(2) The original data and control dependences are preserved between instructions and events.
(3) The code's execution time does not conflict with the timing constraints between the events.

Clauses (1)-(2) imply that the transformed code must be functionally correct: events may not be reordered, and the original relationships between input and output data must be maintained. Clause (3) means that the new code is feasible.

This problem is NP-hard, due to the existence of immovable operations and data dependences.

Theorem 5.1 Feasible code synthesis is NP-hard.

Proof: The proof follows by a straightforward transformation from "Partition [SP12]" [7] to feasible code synthesis. Consider an instance $(A, s)$ of Partition, where $A = \{a_1, a_2, \ldots, a_n\}$ is a set of elements, and where $s : A \mapsto N$ is the cost of each element. Letting $\sum_{i=1}^{n} s(a_i) = 2T$, a partition of $A$ is some $A' \subseteq A$ such that $\sum_{a \in A'} s(a) = T = \sum_{b \in A - A'} s(b)$. Determining whether such an $A'$ exists is equivalent to determining whether the following TCEL program can be made feasible:

    do
    E1:     input(P, &x);
    start after T finish within 2T
    {
    E2:     output(Q, g(x));
    L1:     x1 = f1(x);        [s(a1)]
    L2:     x2 = f2(x);        [s(a2)]
    ...     ...
    Ln:     xn = fn(x);        [s(an)]
    E3:     output(R, h(x1, ..., xn));
    }

Here E1-E3 generate events, L1-Ln are unobservable instructions, and each line is considered atomic (that is, a line must be relocated as a single entity). Then by the construction,

(1) E2, L1-Ln are mutually data independent.
(2) E2, L1-Ln are data dependent on E1.
(3) E3 is data dependent on L1-Ln.

If we assume that $wt(L_i) = s(a_i)$ for $1 \le i \le n$, and that $wt(E_j) = 0$ for $1 \le j \le 3$, then there exists a partition of our original set $A$ if and only if there is a feasible transformation for the program.

As for the "if" part, assume there is a partition $A' \subseteq A$. Then for all $a_i \in A'$, moving the corresponding instruction $L_i$ between the events E1 and E2 ensures feasibility: exactly $T$ execution time is consumed between E1 and E2, and another $T$ is used between E2 and E3.

As for the "only if" part, assume there is a feasible transformation. Then, since $2T$ execution time is used overall, the constraints mandate that at most $T$ of it be used between E1 and E2, and the rest between E2 and E3. Thus we must move some set of instructions $L \subseteq \{L_1, \ldots, L_n\}$ between E1 and E2, where $\sum_{L_j \in L} wt(L_j) = T$. But then the corresponding elements in $A$ form a partition.

When both the constraint block and reference block consist of straight-line code, the problem is obviously in NP as well. A feasible ordering can always be "guessed" and then verified, which consequently yields the following corollary.

Corollary: Feasible code synthesis is NP-complete for straight-line code.

5.2 Solution Strategy

In proving Theorem 5.1 we used the simplest possible TCEL program, which possesses just two basic blocks.
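As a small illustration of the instance used in the proof of Theorem 5.1, the sketch below searches for the subset A' and reports which of the lines L1 through Ln would be moved between E1 and E2 under the proof's accounting. The function name and the dynamic-programming search are illustrative choices and not part of the TCEL compiler; the search is pseudo-polynomial in the costs, so it does not contradict the hardness result.

```python
from typing import Dict, List, Optional


def find_partition(costs: List[int]) -> Optional[List[int]]:
    """Return indices of a subset summing to half of the total cost, or None."""
    total = sum(costs)
    if total % 2:
        return None
    target = total // 2
    # reach[s] holds one list of indices whose costs sum to s.
    reach: Dict[int, List[int]] = {0: []}
    for i, c in enumerate(costs):
        # Iterate over a snapshot so that each element is used at most once.
        for s, idxs in list(reach.items()):
            if s + c <= target and s + c not in reach:
                reach[s + c] = idxs + [i]
    return reach.get(target)


# Costs s(a_1), ..., s(a_n), i.e. the assumed execution times of L1, ..., Ln.
costs = [3, 1, 1, 2, 2, 1]
subset = find_partition(costs)
if subset is None:
    print("no partition: the proof's program cannot be made feasible")
else:
    moved = [f"L{i + 1}" for i in subset]
    print("move between E1 and E2:", moved, "using", sum(costs[i] for i in subset))
```

With these costs the total is 10, so T = 5; any reported subset accounts for exactly T units of execution time before E2 and leaves T units between E2 and E3.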
In this case a feasible transformation would simply reorder the instructions, while keeping the program's fundamental structure intact.

But the situation gets significantly more complicated when the program possesses branches, and when the events that get executed are determined at runtime. Since attaining feasibility mandates that we statically guarantee the timing constraints along all execution paths, reordering the
































Figure 4: The Flow Graph of a Timing Construct and its Section Division.

6 Section Generation

Section generation is the process of decomposing the program into a set of "tasks" (or sections). The input is the original HFG (e.g., that portrayed in Figure 3(B)), with the output being a slightly different HFG, which is more amenable to real-time dispatching. This involves dividing a timing construct into five code sections, as portrayed in Figure 4. As can be seen, the reference block is decomposed into three sub-blocks. The unobservable code before the first observable statement becomes an interface section (S1). The code containing the observable statements becomes the reference section (S2). The unobservable code after the observable statements becomes the first part of the delay section (S3). Similarly, the topmost unobservable code of the constraint block becomes the second part of S3, and so on.

6.1 Determining Section Boundaries

Recall the discussion of the FIRST and LAST functions in Section 3. Since a code block may contain complicated control structures, we require a convenient means of defining the boundaries of S2 and S4 (the sections that contain observable events). We accomplish this by inserting "markers" in the flow graph, which consume no time and are not visible. The following marker definitions guarantee that there are unique boundaries into and out of the sections containing observable events.

• begin_S2: This marker is inserted directly after the unobservable instruction most closely dominating $LAST(RB) \cup \{exit(RB)\}$.
• end_S2: This marker is inserted directly before the unobservable instruction most closely post-dominating $LAST(RB) \cup \{entry(RB)\}$.
• begin_S4: This marker is inserted directly after the unobservable instruction most closely dominating $FIRST(CB) \cup \{exit(CB)\}$.
• end_S4: This marker is inserted directly before the unobservable instruction most closely post-dominating $LAST(CB) \cup \{entry(CB)\}$.

For example, consider the constraint block in Figure 4. The unobservable node B9 post-dominates LAST(CB) and the entry node. Thus, its logical place is in the interface section S5, which is not subject to the construct's timing constraints. Hence the need for the marker end_S4, which is the unique exit point for the constrained section S4.

Now, let the variable S2.start correspond to the actual time that the marker begin_S2 is "executed" (that is, the dispatch time of section S2), and let S2.finish correspond to the time that the section ends. Similarly, let S4.start and S4.finish represent the start and finish times of section S4. Using these variables we can represent the section decomposition of a TCEL construct in a manner similar to that found in the Flex language [15].

Recall the flight controller program from Figure 3. Figure 5 illustrates its constituent sections. The constraint expression for S6 corresponds to the program's outer, periodic loop. As the program is in SSA form, $\phi$-functions appear at confluence points where different values of the same variable in the original program merge. For example, "dtheta4 = φ(dtheta1, dtheta2, dtheta3)" is inserted at the place where three different values of dtheta merge. As a result, each use of a variable is reached by a unique assignment. Again, the bracketed numbers denote the maximum execution times of the corresponding operations on the targeted CPU. On modern architectures, fine-grained operations like simple assignments possess minuscule execution times, and cannot be measured (in isolation) by any timing tool.
For the sake of presentation we assume that such instructions takezero time, and concentrate on larger-grained function calls and the like. 26.2 Deriving Code-Based Timing ConstraintsAs seen in Figure 5, the code-based timing constraints can be expressed as conjunctions of linearinequalities between start-times and nish-times of dierent sections. However, note the dierencebetween the code-based constraints and the TCEL source-level constraints: In Figure 3 the \nishwithin" deadline is 3.1ms, while in Figure 5 it is tightened to 3.0ms. There is good reason for this{ the new code-based timing constraints must be strong enough to guarantee the original semanticsof the event-based constraints. That is, they must take into account the program's execution-timecharacteristics. In general, consider the TCEL construct such as2We revisit this issue in the Conclusion. 15
S6: (S6.start[p]p20ms, S6.nish[p]p20ms+5ms)fS1: f gx1 = (gx3, gx0);gy1 = (gy3, gy0);i1 = (i3, i0);input(GPS, &x1, &y1); [.1ms]input(NAV, &theta1); [.1ms]input(IMU, &roll1); [.1ms]gS2: f output(Cntrl out, REQ VEL); [.1ms]gS3: f /* Null */ gS4: (S4.startS2.nish+0.75ms, S4.nishS2.nish+3.0ms)f input(Cntrl in, &vel1); [.1ms]c1 = GOAL[i1].passedif (c1) fgx2 = GOAL[i1].x;gy2 = GOAL[i1].y;i2 = (i1+1) % NCOORD;ggx3 = (gx2, gx1);gy3 = (gy2, gy1);i3 = (i2, i1);rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3); [.25ms]c2 = jrtheta1j < EPS;if (c2)dtheta1 = 0.0;elsefc3 = vel1 < VHIGH;if (c3)dtheta2 = rtheta1;elsedtheta3 = safeDtheta(rtheta1, roll1); [.43ms]gdtheta4 = (dtheta1, dtheta2, dtheta3);wap = compFlapw(roll1, vel1, dtheta4); [.95ms]throttle1 = compThrot(roll1, vel1, dtheta4); [.89ms]output(THROT, throttle1); [.1ms]output(FLAP Cntrl, wap); [.1ms]gS5: f / null / ggFigure 5: Flight Control Program: After Section Generation.16
    do RB start after tmin start before tmax1 finish within tmax2 CB

Obviously, the TCEL parameters are not tight enough to guarantee the correctness of the code-based constraints. For example, if we wish to maintain the "tmax1" requirement, it is not sufficient to simply mandate that S4 starts within a maximum delay of $t_{max1}$ after S2 ends (though this is certainly necessary). We can see in Figure 4 that the event actually executed in LAST(S2) may be E1, while the event executed in FIRST(S4) may be E4. Thus the naive strategy fails to factor in the execution times of B3 and B4.

However, the event-based semantics is clear: the time between the executed event in LAST(S2) and the executed event in FIRST(S4) is at most $t_{max1}$. To guarantee that this occurs, we must account for all possible execution scenarios. Specifically, we must tighten the constraints, allowing for the maximum amount of time between an event in LAST(S2) and end S2, as well as the maximum amount of execution time between begin S4 and an event in FIRST(S4). We must similarly adjust $t_{max2}$. To do this, we make the following definitions:

$$\delta_{S2} \stackrel{def}{=} \max\{\, wt(p) \mid e \in LAST(S2),\ e \xrightarrow{p} end\ S2 \,\}$$
$$\delta_{S4} \stackrel{def}{=} \max\{\, wt(p) \mid e \in FIRST(S4),\ begin\ S4 \xrightarrow{p} e \,\}$$

Note that $\delta_{S2}$ and $\delta_{S4}$ are sensitive not only to the code's execution time characteristics, but also to changes made to paths between events and markers during program translation. For example, changes to paths between end S2 and a node in LAST(S2) might require re-evaluation of $\delta_{S2}$.

Now the code-based timing constraints can be postulated as follows:

(1) $S4.start \ge S2.finish + T_{min}$ (where $T_{min} = t_{min}$)
(2) $S4.start \le S2.finish + T_{max1}$ (where $T_{max1} = t_{max1} - \delta_{S2} - \delta_{S4}$)
(3) $S4.finish \le S2.finish + T_{max2}$ (where $T_{max2} = t_{max2} - \delta_{S2}$)

These timing constraints are strong enough to guarantee the original event-based timing constraints. (By convention, if the "start after" constraint is omitted, we consider $t_{min}$ to be 0. Similarly, when either the "start before" or "finish within" constraints are missing, we consider $t_{max1} = \infty$ or $t_{max2} = \infty$, respectively.) Returning to Figure 5, we can see that equation (3) indeed mandates tightening the original 3.1ms to 3.0ms.

Now we wish to determine when (1)-(3) can be met. That is, what do these equations imply about the program's allowable worst-case execution-time behavior? This can easily be derived if we add precedence constraints reflecting the natural flow of the program; i.e., that S4 executes after S3, which executes after S2:

(4) $S2.finish + wt(S3) \le S4.start$
(5) $S4.start + wt(S4) \le S4.finish$

Eliminating S2.finish, S4.start and S4.finish from (1)-(5), we end up with:
(a) $T_{min} \le T_{max1}$
(b) $wt(S3) \le T_{max1}$
(c) $wt(S3) + wt(S4) \le T_{max2}$
(d) $wt(S4) \le T_{max2} - T_{min}$

Obviously, (a) had better be true in order for the TCEL construct to make any sense. For the purposes of our algorithm we combine (b) and (c), yielding the following two constraints on execution times:

($\alpha$) $wt(S3) \le \min\{T_{max1},\ T_{max2} - wt(S4)\}$
($\beta$) $wt(S4) \le T_{max2} - T_{min}$

These are the necessary and sufficient conditions to achieve feasibility, and they are summarized in Table 1.

Table 1: Timing Constraints of S3 and S4.

    Section    Duration Constraint (DUR(S))
    S3         $\min\{T_{max1},\ T_{max2} - wt(S4)\}$
    S4         $T_{max2} - T_{min}$

Returning to our example, we find that section S4 violates its duration constraint, DUR(S4) = $T_{max2} - T_{min}$ = 3.0ms - 0.75ms = 2.25ms, since wt(S4) exceeds it by 0.57ms. (Adding up the time annotations yields wt(S4) = 2.82ms along the worst-case execution path.) In the next section we discuss our code-scheduling techniques, which handle cases such as this, in which the duration constraints fail to hold.

7 Code Scheduling

The code scheduling algorithm is inspired by a common compiler strategy used for VLIW and superscalar architectures [2, 5, 6, 8, 22, 26]. In such domains, an optimizing compiler exploits a program's inherent fine-grained parallelism, and "packs" its computations into as many functional units as possible. Thus the objective is to keep each unit busy, and to achieve better overall throughput.

Our problem context has an entirely different goal, and it cannot be solved by directly applying well-known techniques such as Trace Scheduling [6] or Percolation Scheduling [2, 5]. We are concerned not with enhancing average-case performance, but instead with ensuring feasibility. In fact, we will be satisfied even with increasing the program's overall execution time, as long as the timing constraints are met.
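The refinement in (1)-(3) and the duration constraints of Table 1 amount to a few lines of arithmetic, which the following Python sketch replays on the flight controller example. The function names are illustrative and not part of TCEL. The inputs are read off Figure 5 and the surrounding text: tmin = 0.75ms, tmax2 = 3.1ms and a 0.1ms gap between the output event and the end of S2. Two values are assumptions: tmax1 is taken as infinite because Figure 5 shows no "start before" bound, and the 0.1ms used for the S4 adjustment is a placeholder that has no effect when tmax1 is infinite.

```python
import math

def refine(tmin, tmax1, tmax2, delta_s2, delta_s4):
    """Map event-based TCEL bounds to the code-based bounds of (1)-(3)."""
    t_min = tmin
    t_max1 = tmax1 - delta_s2 - delta_s4
    t_max2 = tmax2 - delta_s2
    return t_min, t_max1, t_max2

def durations(t_min, t_max1, t_max2, wt_s4):
    """Duration constraints of Table 1, returned as (DUR(S3), DUR(S4))."""
    return min(t_max1, t_max2 - wt_s4), t_max2 - t_min

def feasible(t_min, t_max1, t_max2, wt_s3, wt_s4):
    """Check (a) together with the two combined execution-time conditions."""
    dur_s3, dur_s4 = durations(t_min, t_max1, t_max2, wt_s4)
    return t_min <= t_max1 and wt_s3 <= dur_s3 and wt_s4 <= dur_s4

# All times in ms. tmax1 = inf and delta_s4 = 0.1 are assumptions (see lead-in).
T_MIN, T_MAX1, T_MAX2 = refine(0.75, math.inf, 3.1, 0.1, 0.1)
WT_S4 = 0.1 + 0.25 + 0.43 + 0.95 + 0.89 + 0.1 + 0.1   # time annotations in S4
DUR_S3, DUR_S4 = durations(T_MIN, T_MAX1, T_MAX2, WT_S4)
OK_BEFORE = feasible(T_MIN, T_MAX1, T_MAX2, 0.0, WT_S4)  # S3 is empty in Figure 5

print(f"Tmax2 = {T_MAX2:.2f}ms, DUR(S4) = {DUR_S4:.2f}ms, wt(S4) = {WT_S4:.2f}ms")
print("feasible before scheduling:", OK_BEFORE)
```

With these inputs the check fails only on the S4 condition, which is the situation the code scheduler is meant to repair.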
7.1 The Top-Level Algorithm

Our approach to code scheduling is a greedy approximation, and it attempts to attain the desired consistency of a timing construct in a section-by-section manner. It inspects sections S4 and S3 (in reverse topological order), and checks whether they satisfy their duration constraints. If S4 violates its constraint, the algorithm attempts to reduce its surplus execution time by moving nodes into section S3. In turn it processes section S3, which may now contain newly moved code.

To perform greedy code motion, we have adapted a technique from the approach to Trace Scheduling in [6], and we use it as a component of the code scheduling algorithm. In our approach, nodes lying on paths that exceed their section's duration constraint are considered for code motion. We distinguish such paths as critical traces. Formally, a critical trace t of section S is defined as a path

$$entry(S) \xrightarrow{t} exit(S)$$

such that $wt(t) > DUR(S)$. The reason we use the trace-based approach is straightforward: optimizing to avoid hard real-time exceptions demands scheduling only the critical traces, and no others.

    algorithm Code_Scheduling(T)    /* T is a timing construct */
    input: the ordered set of sections {S1, S2, ..., S5} in T
    begin
        d = Tmax2 - Tmin;
        /* Schedule code from S4 into S3 */
        call Schedule_Section(S4, S3, d, ∅);
        recompute Tmax1;    /* to reflect the change in δS4 */
        if (wt(S3) ≤ Tmin) then exit("No scheduling needed for S3.");
        else d = min{Tmax1, Tmax2 - wt(S4)};
        /* Schedule code from S3 into S1 */
        call Schedule_Section(S3, S1, d, Def(S2));
    end

Figure 6: Top-level Algorithm for Code Synthesis.

Figure 6 sketches the algorithm. Note that $T_{max1}$ is recomputed after scheduling S4 and before scheduling S3. This is mandatory, since $\delta_{S4}$ may be changed during the scheduling. Also observe that the code of S3 is moved into S1, while that of S4 is moved into S3. We disallow code from moving into S2 because it could potentially change the value of $\delta_{S2}$, which would in turn invalidate our assumptions about DUR(S4). In order to complete the procedure in a single pass, we assume that $\delta_{S2}$ remains constant. In reality this restriction does not seriously limit the approach: from our experience, events in the RB typically lie in straight-line code (and thus S2 contains a single instruction, as in our example).
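The critical-trace definition of Section 7.1 can be illustrated as a heaviest-path search over an acyclic flow graph. The Python sketch below is an illustration only, not the authors' implementation: the graph is a hand-built, coarse model of section S4 from Figure 5 in which only the annotated operations carry weight, the goal-update conditional is folded away because it has no annotated cost, and all node names are hypothetical labels.

```python
from graphlib import TopologicalSorter

def worst_case_trace(weights, succs, entry, exit_):
    """Return (wt(t), t) for the heaviest entry->exit path t of an acyclic graph."""
    preds = {n: set() for n in weights}
    for n, targets in succs.items():
        for s in targets:
            preds[s].add(n)
    best = {entry: (weights[entry], None)}  # node -> (heaviest weight, predecessor)
    for n in TopologicalSorter(preds).static_order():
        reached = [(best[p][0], p) for p in preds[n] if p in best]
        if reached:
            w, p = max(reached)
            best[n] = (w + weights[n], p)
    trace, n = [], exit_
    while n is not None:                    # walk predecessors back to the entry
        trace.append(n)
        n = best[n][1]
    return best[exit_][0], trace[::-1]

# Coarse model of S4 in Figure 5 (weights in ms; unannotated nodes weigh 0).
WEIGHTS = {"input_vel": 0.1, "compRelAtt": 0.25, "c2": 0.0, "dtheta1": 0.0,
           "c3": 0.0, "dtheta2": 0.0, "safeDtheta": 0.43, "phi_dtheta4": 0.0,
           "compFlapw": 0.95, "compThrot": 0.89, "out_throt": 0.1, "out_flap": 0.1}
SUCCS = {"input_vel": ["compRelAtt"], "compRelAtt": ["c2"],
         "c2": ["dtheta1", "c3"], "c3": ["dtheta2", "safeDtheta"],
         "dtheta1": ["phi_dtheta4"], "dtheta2": ["phi_dtheta4"],
         "safeDtheta": ["phi_dtheta4"], "phi_dtheta4": ["compFlapw"],
         "compFlapw": ["compThrot"], "compThrot": ["out_throt"],
         "out_throt": ["out_flap"]}
DUR_S4 = 2.25

wt, trace = worst_case_trace(WEIGHTS, SUCCS, "input_vel", "out_flap")
is_critical = wt > DUR_S4   # a trace is critical when wt(t) > DUR(S)
print(f"wt(t) = {wt:.2f}ms, critical: {is_critical}")
print(" -> ".join(trace))
```

The heaviest path runs through the safeDtheta branch, so that branch is where the scheduler has to look for movable nodes.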
The top-level algorithm calls subroutine "Schedule_Section," which then schedules the overloaded section at hand. Note that when code is scheduled from S3 into S1, the variables defined in S2 are passed to the subroutine, which ensures the dependences from S2 to S3 are maintained. In the following subsection we discuss in detail the innards of this subroutine, and the strategies it uses to solve the scheduling problem.

7.2 Subroutine Schedule_Section(S, D, DUR(S), Vbar)

For the flow graphs of source section S and destination section D, the critical trace scheduling problem is to construct new flow graphs for S and D such that:

(1) The observable nodes of S remain in S and keep their relative order on any paths in S.
(2) $wt(t) \le DUR(S)$ holds for all traces t in S.
(3) Execution ordering established by the code's original data dependences is preserved.

When code is scheduled from S3 into S1 the parameter Vbar, containing the variables defined in S2, is required to help maintain property (3).

As we have stated, the trace-based approach is attractive precisely because it allows us to concentrate on paths which violate the duration constraints. However, a direct application of Trace Scheduling induces a severe liability: extra code must be inserted to preserve the program's semantic integrity. In the parlance of instruction scheduling, this is typically called bookkeeping code.

Consider the SSA program in Figure 7(B). Since the instruction "z1=F(x)" is free of a data dependence on the variable "y," it may be eligible to be moved into S3. (This transformation, which we develop in the sequel, is called speculative, since it breaks a control dependence.) As shown in Figure 7(C), moving the instruction requires no additional code to maintain the program's semantics; SSA's naming conventions maintain the correctness. However a more aggressive policy could be carried out, which is shown in Figure 7(D).
Note that additional code may be moved without breaking data dependences; even the variable r may be split into movable parts (i.e., r1 and r2) and the parts that depend on y (i.e., r3 and r5). However, the price we pay is the additional bookkeeping code required to maintain correctness.

(A) TCEL Source

    do
        input(P, &x);
    start after t1
    finish within t2
    {
        input(Q, &y);
        if p(y) {
            a = E(y);
            z = F(x);
        } else
            z = G(y);
        if q(y)
            r = H(z);
        else
            r = K(z);
        s = I(r);
        ...
    }

(B) SSA Form

    ...
    S2: { input(P, &x); }
    S3: { /* Null */ }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        if p(y) {
            a1 = E(y);
            z1 = F(x);
        } else
            z2 = G(y);
        a = φ(a1, a0);
        z = φ(z1, z2);
        if q(y)
            r1 = H(z);
        else
            r2 = K(z);
        r = φ(r1, r2);
        s = I(r);
        ...
    }

(C) Our Approach

    ...
    S2: { input(P, &x); }
    S3: { z1 = F(x); }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        if p(y)
            a1 = E(y);
        else
            z2 = G(y);
        a = φ(a1, a0);
        z = φ(z1, z2);
        if q(y)
            r1 = H(z);
        else
            r2 = K(z);
        r = φ(r1, r2);
        s = I(r);
        ...
    }

(D) Bookkeeping Approach

    ...
    S2: { input(P, &x); }
    S3: {
        z1 = F(x);
        r1 = H(z1);
        r2 = K(z1);
    }
    S4: (S4.start ..., S4.finish ...)
    {
        input(Q, &y);
        if p(y) {
            a1 = E(y);
            if q(y) { }
            else { }
            r3 = φ(r1, r2);
        } else {
            z2 = G(y);
            if q(y)
                r4 = H(z2);
            else
                r5 = K(z2);
            r6 = φ(r4, r5);
        }
        a = φ(a1, a0);
        z = φ(z1, z2);
        r = φ(r3, r6);
        s = I(r);
        ...
    }

Figure 7: (A) Source, (B) SSA Form, (C) Bookkeeping-free Transformations and (D) Bookkeeping

One obvious problem with bookkeeping is that it induces a significant amount of extra code; indeed, if carried to extremes, the transformations in Figure 7(D) may result in an exponential blow-up. And in our problem context bookkeeping may have an additional, "fatal" effect:

    Scheduling a critical trace may insert bookkeeping code on other, non-critical traces, and thereby increase their execution times. Hence a non-critical trace may become critical.

To avoid this drawback of Trace Scheduling, we use the type of transformation depicted in Figure 7(C), and we use it as aggressively as possible. The strategy involves repetitively applying the following three steps: (i) finding a critical trace t; (ii) identifying a node n which can be moved into the destination section D; and (iii) moving n into D, along with n's ancestor nodes required to maintain the program's semantics. Since our objective is to keep the amount of new code to a minimum, step (iii) translates into the following rules for moving n into D:

(1) n's data dependence predecessors are moved along with n; i.e., the nodes on which n is transitively data dependent.
(2) The control-dependence predecessors (i.e., the if-then-else's guarding n) are treated as follows:
    (a) If possible they are copied into D, so that they still guard the execution of n.
    (b) Otherwise (as in Figure 7(C)), n will now be unguarded in its destination section D.

Thus the end result of code scheduling appears as if large-grained control structures were rearranged, and hence we name the strategy structural code motion. Yet code scheduling is still trace-based, since it is driven by worst-case paths.

Consider a node n to be moved into the destination section. When all of n's dependence predecessors (both data and control) are moved (or copied) along with n, the new execution ordering is guaranteed to maintain the program's original semantics.

Figure 8: Speculative Code Motion: (A) Original Code, (B) SSA Code and (C) Transformed Code.
[Flow graphs: (A) contains the nodes "if c", "x=a+2" and "x=a+3"; (B) and (C) contain "if c", "x1=a+2", "x2=a+3" and "x3=φ(x1,x2)".]

But what are the ramifications of case 2(b) above, i.e., where control dependences are broken? Consider Figure 8(A), where we assume that the condition variable "c" is dependent on an input event. Examining the source code, assume that we wish to move the node "x=a+3" above the node "if c". The moved instruction is executed regardless of the control-predicate's outcome; hence the name "speculative transformation."
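The effect of SSA renaming on a speculative move can be checked with a small Python sketch modeled on Figure 8. It is an illustration under stated assumptions: the figure does not survive extraction, so the choice of which branch holds a+2 is arbitrary, both assignments are hoisted so that x1 and x2 end up defined one after the other, and the helper `phi` is a hypothetical stand-in that selects a value according to the path taken.

```python
def phi(took_then, v_then, v_else):
    """Model of a phi-function: select the value belonging to the path taken."""
    return v_then if took_then else v_else

def original(c, a):
    # Source form: both branches assign the same variable x.
    if c:
        x = a + 2
    else:
        x = a + 3
    return x

def ssa(c, a):
    # SSA form: every assignment gets its own name, merged by phi.
    x1 = x2 = None
    if c:
        x1 = a + 2
    else:
        x2 = a + 3
    return phi(c, x1, x2)

def speculative_ssa(c, a):
    # Both assignments hoisted above the test and executed unconditionally;
    # the now-empty conditional only decides which value phi selects.
    x1 = a + 2
    x2 = a + 3
    return phi(c, x1, x2)

def speculative_source(c, a):
    # The same hoist without renaming: the second write clobbers the first.
    x = a + 2
    x = a + 3
    return x

for c in (True, False):
    for a in range(3):
        same = original(c, a) == ssa(c, a) == speculative_ssa(c, a)
        broken = speculative_source(c, a) != original(c, a)
        print(c, a, "SSA move preserves x:", same, "| source-level move breaks x:", broken)
```

The SSA versions agree with the source program for both outcomes of c, while the unrenamed hoist is wrong exactly on the branch whose assignment gets overwritten.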
Carrying out speculative transformations raises three critical issues: variable naming, execution time and safety.

Naming: Consider what would happen if the transformation shown in Figure 8(C) were performed at the source level. Since x would always end up defined as a+3, one branch of the conditional would result in an incorrect state. Fortunately, the SSA form of the program ensures that multiply defined source variables, and their corresponding φ-functions, maintain the original semantics, regardless of where assignments are moved. Examining Figure 8(C), we see that x1 and x2 are defined sequentially. By SSA's naming conventions, the node x3=φ(x1,x2) ensures that x3 always carries the assignment corresponding to the original source variable x.

Timing: Figure 8 shows how speculative transformations can easily increase the execution time of the destination section D. This is not necessarily harmful, since D may in fact possess sufficient slack for both instructions to execute. (Indeed, D may contain an explicit delay.) But if D itself exceeds its own duration constraint, excessive speculative transformations could make matters worse. Thus we take the following approach: the algorithm performs speculative code motion only when feasibility cannot be achieved with the non-speculative variety.

Safety: Perhaps the most critical issue is the correctness of the transformed program. After all, the source code is written by a human programmer. When an instruction appears within the body of a conditional (but is free of a transitive dependence on it), one should still assume that the programmer had a good reason for putting it there. Often the reason stems from a personal coding style, or perhaps from a concern for readability. Also, splitting variable definitions in the style of SSA is a rather unnatural practice at the source level.

Referring back to Figure 7(C), we note that the "eager" execution of "F" should be safe if: (1) it contains no observable events, (2) it induces no global side-effects, and (3) it does not cause an exception. We can assume that (1)-(2) hold; otherwise "F" would not have been moved in the first place. However, verifying property (3) may be difficult, since there may be an invariant relationship between "p" and "F" that only the programmer understands.

While this seems to argue against speculative transformation, recall that our objective is to assist programmers in tuning faulty code. And production real-time programmers will find this type of code reordering sadly familiar, since it is usually carried out by hand, and often under the pressure of an approaching release deadline. Our technique can help in this effort, since it helps automate this process by (1) identifying the "good target" instructions to move, (2) transferring them to their "correct" places, and (3) analyzing the results. Nonetheless, we do believe that this should be an interactive process (perhaps driven by a graphical front-end), in which the programmer visually checks each transformation. (We discuss this issue in the Conclusion.)
For the sake of brevity, however, we present the algorithm in a fully automatic form. Thus we assume that any node n that can be speculatively executed is "pre-checked," and is denoted by the condition spec(n).

Unconditional and Speculative Movability. The preceding discussion leads to three classes of instructions: those that can be unconditionally moved, those that can be moved to execute speculatively, and those which cannot be moved at all. The following definitions distinguish between these cases.

Definition 7.1 (Unconditional Movability) $M_u(S, V_{bar})$ is the set of nodes in S that do not trigger events, and do not use any variables in $V_{bar}$; i.e.,

$$M_u(S, V_{bar}) = \{\, m \in S \mid m \text{ is not an event} \,\wedge\, Use(m) \cap V_{bar} = \emptyset \,\}$$

Then Umove(n) denotes that we can unconditionally schedule node n from S into D:

$$Umove(n) \equiv DC(n, S) \subseteq M_u(S, V_{bar})$$

That is, all of n's data and control dependence ancestors are also unconditionally movable, and when n is moved, they will be moved (or copied) as well.

Definition 7.2 (Speculative Movability) Additional considerations come into play when a node is speculatively executed. Consider the set $M_s(S, V_{bar})$:

$$M_s(S, V_{bar}) = \{\, m \in M_u(S, V_{bar}) \mid spec(m) \,\wedge\, \text{if } m \text{ is a } \phi\text{-function then } Umove(m) \text{ holds} \,\}$$

If a node n is in $M_s(S, V_{bar})$ then (1) it uses no variables in $V_{bar}$, (2) it triggers no events, (3) the programmer has checked that it doesn't cause a local exception, and (4) if it is a φ-function, it can be unconditionally moved (along with its ancestors), which obviates the need for bookkeeping. Then Smove(n) denotes that we can speculatively move node n:

$$Smove(n) \equiv DDC(n, S) \subseteq M_s(S, V_{bar})$$

That is, when Smove(n) holds true, all of n's data-dependent ancestors can be moved too, without mandating bookkeeping.

The Algorithm. The code scheduling algorithm is presented in Figure 9. It is composed of three stages: pre-processing, marking/deleting and post-processing. In the pre-processing stage, S's flow graph is traversed in topological order, during which the conditions Umove and Smove are evaluated. A topological traversal ensures that whenever a node n is visited, all ancestor nodes in DC(n, S) have already been processed; hence a single traversal is sufficient to evaluate these conditions. Then a "clone" S' of S is created, which is used to hold the part of the flow graph to be transferred to D.

    subroutine Schedule_Section(S, D, d, Vbar)
    input: source section S, destination section D, duration constraint d
    begin
        foreach node n in S in topological order do
            evaluate Umove(n) and Smove(n);
        make a copy S' of S;
        compute t in S such that wt(t) = max{wt(t') | entry(S) -t'-> exit(S)};
        while (wt(t) > d)
            call Sched(t, S, S');
            recompute t in S such that wt(t) = max{wt(t') | entry(S) -t'-> exit(S)};
        end
        delete all unmarked nodes from S';
        delete all predicate nodes guarding null code from S;
        append S' to end of D;
    end

    subroutine Sched(t, S, S')
    begin
        if there is some n ∈ t such that Umove(n) holds then
        begin
            Select first such n ∈ t such that Umove(n) holds;
            foreach node m ∈ DC(n, S) do
                mark[m,S'] := true;
                if m is not a control-predicate then Delete m from S;
            end
        end
        else if there is some n ∈ t such that Smove(n) holds then
        begin
            Select first such n ∈ t such that Smove(n) holds;
            foreach node m ∈ DDC(n, S) do
                mark[m,S'] := true;
                if m is not a control-predicate then Delete m from S;
            end
        end
        else exit("Unable to synthesize.");
    end

Figure 9: The Section Scheduling Algorithm.

Next the algorithm searches for a critical trace, and if one exists it invokes subroutine "Sched." Sched makes use of the array "mark," each of whose entries corresponds to a node in S'. Whenever "mark[m,S'] = true," it means that node m will be "moved" into the destination section.

Sched examines the critical trace in topological order, looking for a node n such that Umove(n) is true. If such a node n exists, then the closure DC(n, S) is generated; its non-predicate members are deleted from S, while all corresponding nodes in S' are marked. If no unconditionally movable instruction exists, then a transformation of the speculative variety is attempted. And if no movable node is present, the program is forced to exit.

At the end, if all critical traces were scheduled, the algorithm proceeds to a post-processing stage. If speculative transformation was carried out then S' will, by definition, contain branching structures with empty predicate nodes. In this case, the nodes on the different branches of the predicate node are merged into a single block.

Finally, S' is attached to the end of the destination section D. Cleaning up, the algorithm deletes control-predicates which guard empty nodes in section S; i.e., the "if" nodes whose corresponding bodies and φ-functions were completely transferred to S'.

Example, Revisited. We return to our original flight controller example from Figure 5, and subject it to the code scheduling algorithm. The end-result appears in Figure 10. In scheduling S4, "Sched" unconditionally moves the function call compRelAtt, as well as the other nodes in its dependence closure. This reduces wt(S4) by .25ms, which now stands at 2.57ms.
Since DUR(S4) = 2.25ms, further reductions are made by entering the speculative transformation phase; this results in moving one conditional branch and the function call safeDtheta beyond the immovable control-predicate if (c3).

After the transformation, the implementation satisfies the necessary condition for consistency, since the body of S4 requires 2.14ms in the worst case. Now that wt(S3) = 0.68ms is less than DUR(S3), the code scheduling successfully terminates without further scheduling S3.

In addition to such an instant benefit, the transformation converts possibly wasteful delay into useful computation time, since the new code in S3 can be scheduled within the delay interval between S2 and S4.
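The movability tests of Definitions 7.1 and 7.2 and the greedy choice made by Sched can be replayed on the example with a short Python sketch. This is a simplified model under several assumptions, not the authors' implementation: it works on a single fixed trace instead of recomputing the worst-case trace in a flow graph, it folds the three goal φ-functions into one hypothetical node `gxy3`, it treats the computation of a predicate such as c2 as an ordinary node, and the dependence sets are hand-derived from Figure 5.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    wt: float = 0.0                          # execution time in ms
    event: bool = False                      # observable event?
    phi: bool = False                        # phi-function?
    spec: bool = True                        # spec(n): pre-checked for speculation
    data: set = field(default_factory=set)   # data-dependence predecessors in S
    ctrl: set = field(default_factory=set)   # control-dependence predecessors in S
    uses: set = field(default_factory=set)   # variables read, checked against Vbar

def closure(nodes, n, with_ctrl):
    """DC(n, S) if with_ctrl is true, otherwise DDC(n, S); both include n."""
    seen, stack = set(), [n]
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(nodes[m].data | (nodes[m].ctrl if with_ctrl else set()))
    return seen

def movability(nodes, vbar):
    """Return the sets of nodes for which Umove and Smove hold."""
    mu = {m for m, nd in nodes.items() if not nd.event and not (nd.uses & vbar)}
    umove = {n for n in nodes if closure(nodes, n, True) <= mu}
    ms = {m for m in mu if nodes[m].spec and (not nodes[m].phi or m in umove)}
    smove = {n for n in nodes if closure(nodes, n, False) <= ms}
    return umove, smove

def schedule_trace(nodes, trace, d, vbar=frozenset()):
    """Greedily move nodes off one trace until its weight is at most d."""
    umove, smove = movability(nodes, vbar)
    remaining, moved = list(trace), []
    while sum(nodes[n].wt for n in remaining) > d:
        pick = next((n for n in remaining if n in umove), None)
        with_ctrl = pick is not None         # unconditional moves take DC, else DDC
        if pick is None:
            pick = next((n for n in remaining if n in smove), None)
        if pick is None:
            raise RuntimeError("Unable to synthesize.")
        clo = closure(nodes, pick, with_ctrl)
        for m in [m for m in remaining if m in clo]:
            remaining.remove(m)
            moved.append(m)
    return remaining, moved

# Worst-case trace of S4 in Figure 5, listed in topological order.
S4 = {
    "input_vel": Node(wt=0.1, event=True),
    "c1":        Node(),
    "gxy3":      Node(phi=True, ctrl={"c1"}),
    "rtheta1":   Node(wt=0.25, data={"gxy3"}),
    "c2":        Node(data={"rtheta1"}),
    "c3":        Node(data={"input_vel"}),
    "dtheta3":   Node(wt=0.43, data={"rtheta1"}, ctrl={"c2", "c3"}),
    "dtheta4":   Node(phi=True, data={"dtheta3"}, ctrl={"c2", "c3"}),
    "wflap":     Node(wt=0.95, data={"input_vel", "dtheta4"}),
    "throttle1": Node(wt=0.89, data={"input_vel", "dtheta4"}),
    "out_throt": Node(wt=0.1, event=True, data={"throttle1"}),
    "out_flap":  Node(wt=0.1, event=True, data={"wflap"}),
}
remaining, moved = schedule_trace(S4, list(S4), d=2.25)
wt_s4 = sum(S4[n].wt for n in remaining)
wt_s3 = sum(S4[n].wt for n in moved)
print("moved into S3:", moved)
print(f"wt(S4) = {wt_s4:.2f}ms, wt(S3) = {wt_s3:.2f}ms")
```

In this model the nodes up to c2 are moved unconditionally, and dtheta3 (the safeDtheta call) is moved speculatively because its control closure reaches the input event through c3.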
S6: (S6.start[p] ≥ p·20ms, S6.finish[p] ≤ p·20ms + 5ms)
{
    S1: {
        gx1 = φ(gx3, gx0);
        gy1 = φ(gy3, gy0);
        i1 = φ(i3, i0);
        input(GPS, &x1, &y1);                               [.1ms]
        input(NAV, &theta1);                                [.1ms]
        input(IMU, &roll1);                                 [.1ms]
    }
    S2: { output(Cntrl_out, REQ_VEL);                       [.1ms] }
    S3: {
        c1 = GOAL[i1].passed;
        if (c1) {
            gx2 = GOAL[i1].x;
            gy2 = GOAL[i1].y;
            i2 = (i1+1) % NCOORD;
        }
        gx3 = φ(gx2, gx1);
        gy3 = φ(gy2, gy1);
        i3 = φ(i2, i1);
        rtheta1 = compRelAtt(theta1, x1, y1, gx3, gy3);     [.25ms]
        c2 = |rtheta1| < EPS;
        if (c2)
            /* null */
        else
            dtheta3 = safeDtheta(rtheta1, roll1);           [.43ms]
    }
    S4: (S4.start ≥ S2.finish + 0.75ms, S4.finish ≤ S2.finish + 3.0ms)
    {
        input(Cntrl_in, &vel1);                             [.1ms]
        if (c2)
            dtheta1 = 0.0;
        else {
            c3 = vel1 < VHIGH;
            if (c3)
                dtheta2 = rtheta1;
            else
                /* null */
        }
        dtheta4 = φ(dtheta1, dtheta2, dtheta3);
        wflap = compFlapw(roll1, vel1, dtheta4);            [.95ms]
        throttle1 = compThrot(roll1, vel1, dtheta4);        [.89ms]
        output(THROT, throttle1);                           [.1ms]
        output(FLAP_Cntrl, wflap);                          [.1ms]
    }
    S5: { /* null */ }
}
Figure 10: Flight Control Program: After Code Scheduling.
8 Concluding Remarks

The TCEL paradigm helps incorporate a higher level of abstraction into real-time domains. As we have shown, TCEL's event-based semantics constrains only those operations that are critical to real-time operation; i.e., the events denoted in the specification or those derived from it. As such, a source program is an appropriate representation of the designer's intentions, and it need not over-burden the system with unnecessary constraints. Moreover, the event-based semantics enables our scheduling tool to transform the program, and help to resolve conflicts between the timing constraints and the code's actual execution time. Since this is exactly the type of dirty work that compilers do best, a human programmer's time is probably better spent elsewhere.

Practical Considerations. For the sake of brevity we presented the code scheduler in a rather idealized form, abstracting out some of the implementation-related considerations. During our research these factors revealed themselves via three sources: (1) experiences in building a prototype implementation, (2) experiences in dealing with programs significantly larger than the examples in this paper, and (3) discussions with colleagues who design and build production-quality real-time systems. In the following paragraphs we briefly summarize several of these considerations.

Limits of Data-Flow Analysis. The code scheduler heavily relies on contemporary compiler methods, including intra- and inter-procedural data and control analysis. And as with all program transformation algorithms, the limitations of this enabling technology become a constraining factor of our approach. For example, current static data-flow analysis is incapable of disambiguating all pointer aliases (which at worst is an undecidable problem). Thus we cannot always translate the TCEL source into its corresponding "perfect" SSA form. We partially assuage the problem by adopting techniques such as (1) inlining procedures to avoid inter-procedural aliases; (2) rendering in SSA form only those assignments that contain statically analyzable variables; and (3) unrolling loop bodies. Of course these and similar methods will degrade the code scheduler's performance, either by increasing the amount of code, or by decreasing its efficacy. The good news, however, is that dependence analyzers are improving at a rapid rate, and our algorithm will improve along with them.

Limits of Timing Analysis. Another limiting factor is the difficulty of achieving accurate, static timing analysis in the face of more complicated architectures. Quite simply, it has become incredibly difficult to use vendor-supplied benchmarks, and model the interplay between pipelines, hierarchical caches, shared memories, register windows, etc. Thus with an approach like ours, it seems meaningless to predict the execution time of a single instruction (or even a small block). First, the CPU time will probably be too small to make a difference in achieving feasibility, and second, the "noise" in the prediction will be too large.

Thus we have adopted a hierarchical abstraction approach to deal with time predictions. For example, in our flight controller program we accounted only for the CPU-intensive function calls that performed complex operations, while ignoring the execution time of finer-grained instructions. The same approach can be used on larger-grained structures within the HFG. Our experience shows the transform engine should usually hunt for the "big-game targets," and forget about the smaller ones.

However after code scheduling is completed, it becomes imperative to verify the result with a more sophisticated timing tool; for example, a good profiler. Performing such re-timing is especially important in a cached memory structure, where code scheduling will always change the instruction alignment. (We note that all modern RISC compilers re-order instructions to some degree; thus the efficacy of any source-level timing analysis is diminishing.)

User Interaction. The above two factors argue against a fully automated code synthesis tool. There is also a third factor, which we discussed in reference to speculative code motion. That is, programmers of production-quality, real-time systems will simply not accept a compiler technology that "outsmarts" them, and possibly "disobeys" their intentions. They will, however, enthusiastically embrace a tool that helps tune their systems, but not at the price of sacrificing traceability to their original programs. A simple example illustrates the importance of this. Consider what might happen if an instruction that interacts with the environment fails to be annotated as an event (which could easily happen with memory-mapped IO). If the instruction is relocated outside of its source section, debugging the transformed program could become a nightmare.

All of these considerations argue for a front-end that permits the programmer to interact with the tool during code scheduling. With our scheduling engine as its foundation, a graphical interface would allow a programmer to selectively apply the transformations, and also remain informed of the results.

Pushing Forward. We have recently turned our attention to a more aggressive goal: inter-task transformations to achieve schedulability. In [9] we explore a technique that helps auto-tune an unschedulable task set into a schedulable one, by isolating the time-critical threads, and then ensuring that they can be run under a fixed-priority dispatcher.

Finally, we note that the tradition in real-time programs has been to treat all instructions uniformly, as "code" subject to various timing constraints. We have shown that "opening up" a program to consider its event-based semantics can be a great help in achieving feasibility. This approach can potentially be used as a first-line defense in the tuning process, and is usually preferable to measures like hand-optimizing the code, redesigning subsystems, or re-implementing components in silicon.
A Appendix: The Static Single Assignment Form

A program is defined to be in SSA form if each use of a variable is reached by exactly one assignment to it [3]. Thus, a program's SSA representation can be obtained by iteratively applying the following process: For each variable in the program, (1) unique names are given to all of its appearances on the left-hand side of an assignment; and (2) all of the uses reached by that assignment are renamed to correspond to the new name. The following examples demonstrate this process. In the straight-line code, each assignment to a variable is given a subscripted name, and all of its uses are then renamed as well.

    v = f();                    v1 = f();
    a = v + 1;        =>        a = v1 + 1;
    v = g();                    v2 = g();
    b = v + 2;                  b = v2 + 2;

Conditional statements require a bit more work in achieving the SSA form. At confluence points in the CFG, merge functions called φ-functions are introduced. A φ-function for a variable merges its possible values from distinct incoming control flow paths, and has one argument for each control flow predecessor.

    if cond                     if cond
    then v = f();               then v1 = f();
    else v = g();     =>        else v2 = g();
    a = v;                      v3 = φ(v1, v2);
                                a = v3;

Since a φ-function is not an actual function to be generated, it is implemented with multiple assignments, as shown below.

    if cond
    then v1 = f(); v3 = v1;
    else v2 = g(); v3 = v2;
    a = v3;

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley Publishing Company, 1986.
[2] A. Aiken and A. Nicolau. A development environment for horizontal microcode. IEEE Transactions on Software Engineering, pages 584-594, May 1988.
[3] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 9:319-345, July 1987.
[4] B. Dasarathy. Timing constraints of real-time systems: Constructs for expressing them, method for validating them. IEEE Transactions on Software Engineering, 11(1):80-86, January 1985.
[5] K. Ebcioglu and A. Nicolau. A global resource-constrained parallelization technique. In International Conference on Supercomputing, pages 154-163. ACM Press, June 1989.
[6] J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, 30:478-490, July 1981.
[7] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[8] F. Gasperoni. Compilation techniques for VLIW architectures. Technical Report RC14915 (#66741), IBM T. J. Watson Research Center, September 1989.
[9] R. Gerber and S. Hong. Semantics-based compiler transformations for enhanced schedulability. In Proceedings IEEE Real-Time Systems Symposium, pages 232-242. IEEE Computer Society Press, December 1993.
[10] P. Gopinath and R. Gupta. Applying compiler techniques to scheduling in real-time systems. In Proceedings IEEE Real-Time Systems Symposium, pages 247-256. IEEE Computer Society Press, December 1990.
[11] M. G. Harmon, T. P. Baker, and D. B. Whalley. A retargetable technique for predicting execution time. In Proceedings IEEE Real-Time Systems Symposium, pages 68-77. IEEE Computer Society Press, December 1992.
[12] S. Hong and R. Gerber. Scheduling with compiler transformations: the TCEL approach. In Proceedings IEEE Workshop on Real-Time Operating Systems and Software, pages 80-84. IEEE Computer Society Press, May 1993. IEEE RTTC Real-Time Newsletter, 9(1/2):80-84.
[13] Y. Ishikawa, H. Tokuda, and C. W. Mercer. Object-oriented real-time language design: Constructs for timing constraints. In Proceedings of OOPSLA-90, pages 289-298, October 1990.
[14] F. Jahanian and Al Mok. Safety analysis of timing properties in real-time systems. IEEE Transactions on Software Engineering, 12(9):890-904, September 1986.
[15] K. B. Kenny and K.-J. Lin. Building flexible real-time systems using the Flex language. IEEE Computer, pages 70-78, May 1991.
[16] I. Lee, P. Bremond-Gregoire, and R. Gerber. A Process Algebraic Approach to the Specification and Analysis of Resource-Bound Real-Time Systems. IEEE Proceedings, 82(1), January 1994.
[17] I. Lee and V. Gehlot. Language constructs for real-time programming. In Proceedings IEEE Real-Time Systems Symposium, pages 57-66. IEEE Computer Society Press, 1985.
[18] S. Lim, Y. Bae, C. Jang, B. Rhee, S. Min, C. Park, H. Shin, K. Park, and C. Kim. An accurate worst case timing analysis for RISC processors. In Proceedings IEEE Real-Time Systems Symposium. IEEE Computer Society Press, December 1994. To appear.
[19] K. J. Lin and S. Natarajan. Expressing and maintaining timing constraints in FLEX. In Proceedings IEEE Real-Time Systems Symposium. IEEE Computer Society Press, December 1988.
[20] T. Marlowe and S. Masticola. Safe optimization for hard real-time programming. In Second International Conference on Systems Integration, pages 438-446, June 1992.
[21] M. Merritt, F. Modugno, and M. Tuttle. Time-Constrained Automata. In CONCUR '91, August 1991.
[22] A. Nicolau. Parallelism, Memory Anti-aliasing and Correctness Issues for a Trace Scheduling Compiler. PhD thesis, Yale University, June 1984.
[23] V. Nirkhe. Application of Partial Evaluation to Hard Real-Time Programming. PhD thesis, Department of Computer Science, University of Maryland at College Park, May 1992.
[24] C. Park and A. C. Shaw. Experimenting with a program timing tool based on source-level timing schema. In Proceedings IEEE Real-Time Systems Symposium, pages 72-81. IEEE Computer Society Press, December 1990.
[25] A. C. Shaw. Reasoning about time in higher level language software. IEEE Transactions on Software Engineering, pages 875-889, July 1989.
[26] M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248-259. ACM Press, October 1992.
[27] V. Wolfe, S. Davidson, and I. Lee. RTC: Language support for real-time concurrency. In Proceedings IEEE Real-Time Systems Symposium, pages 43-52. IEEE Computer Society Press, December 1991.
[28] M. Younis, T. Marlowe, and A. Stoyenko. Compiler transformations for speculative execution in a real-time system. In Proceedings IEEE Real-Time Systems Symposium, 1994. To appear.
[29] N. Zhang, A. Burns, and M. Nicholson. Pipelined processors and worst case execution times. The Journal of Real-Time Systems, 5(4), October 1993.
