EquiMax. A New Formulation of Acyclic Scheduling Problem for ILP Processors by Touati, Sid
HAL Id: hal-00646739
https://hal.inria.fr/hal-00646739
Submitted on 23 Dec 2011
To cite this version:
Sid Touati. EquiMax. A New Formulation of Acyclic Scheduling Problem for ILP Processors.
Gyungho and Pen-Chung Yew. Interaction between Compilers and Computer Architecture, Kluwer
Academic Publishers, 2001, 0-7923-7370-7. hal-00646739
EquiMax: A New Formulation of Acyclic Scheduling Problem for ILP Processors

Sid-Ahmed-Ali Touati

November 9, 2000

Abstract

In this paper, we give a new formulation of the acyclic scheduling problem under registers and resources constraints in multiple instruction issuing processors (VLIW or superscalar). Given a directed acyclic data dependence graph $G = (V, E)$, the complexity of our integer linear programming model is bounded by $O(|V|^2)$ variables and $O(|E| + |V|^2)$ constraints according to a target architecture description. This complexity is better than that of the existing techniques, which includes a worst total schedule time factor.

1 Introduction

To sustain increases in processor performance, current compilers try to benefit from the instruction level parallelism (ILP) present in today's processors. Multiple operations are issued in the same clock cycle to increase the throughput of executed operations per cycle. Completing a computation as soon as possible is a scheduling problem constrained by many factors. The most important ones are the data dependencies, the availability of hardware features and the availability of registers. The data dependencies define the code semantics and the intrinsic ILP available in the code. The resources constraints limit the number of instructions issued during the same clock cycle because of the lack of free functional units (FUs). Also, the architectural characteristics of current processors reveal heterogeneous and complex pipelined FUs, where an operation can use a group of FUs in different clock cycles during its presence in the pipeline. Finally, since accessing a register has a null latency, we need to keep as many values in the registers as possible.

Unfortunately, theoretical studies on scheduling reveal that scheduling under resources constraints [3] and scheduling under registers constraints [13] are both NP-complete problems. Combining scheduling under both registers and resources constraints is also NP-complete [4].
General compilers use many heuristics to get an optimized schedule in polynomial time. However, embedded applications or real-time systems may need an optimal (best) schedule. For this purpose, we need a "good" formulation of the problem. Much work has been done using integer linear programming (intLP) models. In our work, we present a new formulation of acyclic scheduling in basic blocks (BB) such that the complexity of the generated model is lower than that of the current ones, as we explain at the end of this paper. Our formulation should reduce the resolution time since we considerably reduce the number of variables and constraints in the generated intLP model.

This paper is organized as follows. We first present the model of the targeted processors in Sect. 2 and the acyclic data dependence graph (DDG) to be scheduled in Sect. 3: in our study, we assume heterogeneous FUs, more than one register type, and delayed latencies of writing into and reading from registers. The problem of acyclic scheduling is briefly recalled in Sect. 4. Then, we define some intLP modeling techniques in Sect. 5 to show how to linearize some logical operators (disjunction and equivalence) and how to compute the maximum of a set of integers. We then use these techniques to write our EquiMax (Equivalence-Maximum) intLP formulation in Sect. 6. We present related work in this field in Sect. 7 and conclude with our remarks and perspectives in Sect. 8.

2 Machine Description

An ILP processor [14] benefits from the inherent parallelism in the instruction flow and issues multiple operations per clock cycle thanks to pipelined execution and the presence of multiple functional units (FUs). An operation can be executed on one or more FUs. We model the complex behavior of the execution of operations on FUs by reservation tables [15]. We attach to each instruction a reservation table (RT) that describes at which clock cycles a FU is busy due to the execution of that instruction on it. A RT is a two-dimensional table where the number of lines is the latency of the operation and the columns are the set of FUs.
Given the RT of an instruction $u$, $RT_u[c, q] = 1$ means that $u$ executes on the FU $q$ during the clock cycle $c$ after its issue. The number of columns in a RT is bounded by the number of FUs, and the number of lines is bounded by the depth of the pipeline.

The target machine $M$ is described by the set of its hardware resources, its register types, and the set of operations which execute on these resources:

1. the set of register types in the target architecture is $T$. For instance, the target architecture of the code in Fig. 2 has $T = \{int, float\}$;
2. the machine resources are represented by the couple $(Q, \vec{N}_Q)$:
   - $Q = \{q_1, \dots, q_M\}$ is the set of the different FUs;
   - $\vec{N}_Q = [N_{q_1}, \dots, N_{q_M}]$ where $N_q$ is the number of copies of $q$.
3. the set of instructions is represented by a couple $(IS, \vec{RT})$:
   - $IS = \{u\}$ is the instruction set which can be executed on $M$;
[Figure 2: DDG model. (1) code before scheduling and register allocation; (2) the DDG G]

Operations appearing in Fig. 2:

(a) fload [i1], fRa
(b) fload [i2], fRb
(c) fload [i3], fRc
(d) fmult fRa, fRb, fRd
(g) ftoint fRc, iRg
(i) iadd iRg, 4, iRi
(h) fdiv fRd, iRe, fRh
(e) imultadd fRa, fRb, fRc, iRe
(j) fadd fRj, 1, fRj
(k) fsub fRk, 1, fRk
(f) fmultadd fRb, iRi, fRc, fRf

be kept in registers. We consider then that there is a flow arc from these values to $\bot$ (like the flow arc $(k, \bot) \in E_{R,float}$).

Finally, we consider that reading from and writing into a register may be delayed from the beginning of the schedule time (VLIW case). We define the two delay functions $\delta_{r,t}$ and $\delta_{w,t}$ such that:

$$\delta_{w,t} : V_{R,t} \to \mathbb{N}, \quad u \mapsto \delta_{w,t}(u) \ \text{ with } \ 0 \le \delta_{w,t}(u) < lat(u)$$

the write cycle of $u^t$ into a register of type $t$ is $\sigma(u) + \delta_{w,t}(u)$;

$$\delta_{r,t} : V \to \mathbb{N}, \quad u \mapsto \delta_{r,t}(u) \ \text{ with } \ 0 \le \delta_{r,t}(u) \le \delta_{w,t}(u) < lat(u)$$

the read cycle of $u^t$ from a register of type $t$ is $\sigma(u) + \delta_{r,t}(u)$.

4 Scheduling Problem

As explained above, a valid schedule $\sigma$ of $G$ is first constrained by the inherent data dependency relations between operations or any other serial constraints. The target architecture adds other constraints, which are the registers and resources constraints.

4.1 Registers Constraints

Given a DDG $G = (V, E, \delta, \delta_{w,t}, \delta_{r,t})$, a value $u^t \in V_{R,t}$ is alive from the first step after the writing of $u^t$ until its last reading (consumption). The set of consumers of a value $u^t \in V_{R,t}$ is the set of operations that read it:

$$Cons(u^t) = \{ v \mid \exists e = (u, v) \in E_{R,t} \}$$
For instance, $Cons(b^{float}) = \{d, e, f\}$ and $Cons(k^{float}) = \{\bot\}$ in Fig. 2. The last consumption of a value is called the killing date and is noted:

$$\forall u^t \in V_{R,t} \quad kill(u^t) = \max_{v \in Cons(u^t)} \big( \sigma(v) + \delta_{r,t}(v) \big)$$

We assume that a value written at a clock cycle $c$ in a register is available one step later. That is to say, if operation $u$ reads from a register at a clock cycle $c$ while operation $v$ is writing into it at the same clock cycle, $u$ does not get $v$'s result but gets the value that was previously stored in that register. Then, the lifetime interval $LT_{u^t}$ of the value $u^t$ is $]\sigma(u) + \delta_{w,t}(u),\ kill(u^t)]$.

Having all the values' lifetime intervals, the number of registers of type $t$ needed to store all defined values is the maximum number of values of type $t$ that are simultaneously alive. We call this number the register need and we note it:

$$RN_t(G) = \max_{0 \le c \le \sigma(\bot)} |vsa_t(c)|$$

where $vsa_t(c) = \{ u^t \in V_{R,t} \mid c \in LT_{u^t} \}$ is the set of values of type $t$ alive at clock cycle $c$.

To compute the register need of type $t$, we build the undirected interference graph $H_t = (V_{R,t}, \mathcal{E})$, such that $u^t$ and $v^t$ are adjacent iff they are simultaneously alive. The register need $RN_t(G)$ is then the cardinality of the maximal clique (complete subgraph) of $H_t$.

Since the number $R_t$ of available registers of type $t$ is limited in the target machine, we need to find a schedule which does not need more than $R_t$ registers for each register type $t$:

$$\forall t \in T \quad RN_t(G) \le R_t$$

If we cannot find such a schedule, spill code has to be generated, i.e. we must store some values in memory rather than in registers. Spilling increases the total schedule time because it inserts new operations in the BB and the spilled data may cause cache misses.

4.2 Resources Constraints

Resources constraints are simply the fact that two operations must not execute simultaneously on the same FU, i.e. the total number of operations which execute on a FU $q$ during a clock cycle $c$ must not exceed the number of FU copies $N_q$.
By using the reservation table defined above, an operation $u$ executes on a FU $q$ during a clock cycle $c$ iff $RT_u[c - \sigma(u), q] = 1$. Formally, the resources constraints are written:

$$\forall\, 0 \le c \le \sigma(\bot),\ \forall q \in Q \quad \sum_{u \in V} RT_u[c - \sigma(u), q] \le N_q$$
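The two checks of this section (register need in Sect. 4.1, FU usage in Sect. 4.2) can be evaluated directly on an already computed schedule. The following Python sketch is only an illustration under stated assumptions: the function names, the dictionary encoding of reservation tables and the example offsets are hypothetical and are not taken from the paper. Lifetimes are given as (write cycle, killing date) pairs and are treated as left-open intervals, as in Sect. 4.1.

```python
from collections import Counter

def register_need(lifetimes):
    """Maximum number of values alive at the same cycle.

    lifetimes: list of (write_cycle, kill_cycle) pairs, each standing for
    the left-open interval ]write_cycle, kill_cycle].
    """
    alive = Counter()
    for write, kill in lifetimes:
        for c in range(write + 1, kill + 1):  # cycles write+1 .. kill
            alive[c] += 1
    return max(alive.values(), default=0)

def resources_ok(schedule, tables, copies):
    """Check that no FU is used by more operations than it has copies.

    schedule: operation -> issue cycle (sigma)
    tables:   operation -> {fu: set of offsets c with RT[c, fu] == 1}
    copies:   fu -> number of copies N_q
    """
    usage = Counter()
    for op, start in schedule.items():
        for fu, offsets in tables[op].items():
            for off in offsets:
                usage[(start + off, fu)] += 1
    return all(n <= copies[fu] for (_, fu), n in usage.items())

# Hypothetical example: 'a' holds the ALU for two cycles, 'i' for one.
tables = {"a": {"ALU": {0, 1}}, "i": {"ALU": {0}}}
print(register_need([(0, 3), (1, 2), (3, 5)]))
print(resources_ok({"a": 0, "i": 2}, tables, {"ALU": 1}))
```

Note that the intervals ]0, 3] and ]3, 5] do not overlap here, because a value written at cycle 3 only becomes alive at cycle 4.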
5 Integer Linear Programming Techniques

An integer linear programming problem (intLP) [7] is to solve:

$$\begin{cases} \text{maximize (or minimize) } cx \\ \text{subject to } Ax = b \end{cases}$$

with $c, x \in \mathbb{N}^n$, $x \ge 0$, and $A$ an $(m \times n)$ constraints matrix. This is the standard formulation. In fact, we can use any other linear constraints ($\le, \ge, <, >, =$).

5.1 Writing Logical Operators with Linear Constraints

Intrinsically, an intLP problem defines the two boolean operators $\wedge$ and $\neg$:

- having two constraints matrices $A$ and $A'$, saying that $x$ must be a solution for both $Ax \ge b$ and $A'x \ge b'$ is modeled by:

$$\begin{pmatrix} A \\ A' \end{pmatrix} x \ge \begin{pmatrix} b \\ b' \end{pmatrix}$$

- having a constraints matrix $A$ with $m$ lines ($m$ linear constraints $f_1, f_2, \dots, f_m$), forcing $x$ not to verify $Ax \ge b$ is modeled by:

$$f_1(x) < b_1 \vee f_2(x) < b_2 \vee \dots \vee f_m(x) < b_m$$

In [7], the authors showed how to model the disjunctive operator $\vee$. Consider the problem:

$$\begin{cases} \text{maximize (or minimize) } f(x) \\ \text{subject to: } g(x) \ge 0 \vee h(x) \ge 0 \end{cases}$$

By introducing a binary variable $\alpha \in \{0, 1\}$, this disjunction is equivalent to:

$$\begin{cases} g(x) \ge \alpha\, \underline{g} \\ h(x) \ge (1 - \alpha)\, \underline{h} \end{cases}$$

where $\underline{g}$ and $\underline{h}$ are two known non-null finite lower bounds for $g$ and $h$ respectively: with $\alpha = 0$ the first constraint enforces $g(x) \ge 0$ and the second one is trivially satisfied, and conversely with $\alpha = 1$.

We can also generalize to an arbitrary number of constraints in a disjunctive formula $\vee_n$:

$$\vee_n(f_1, \dots, f_n) = \big( f_1(x) \ge 0 \vee f_2(x) \ge 0 \vee \dots \vee f_n(x) \ge 0 \big)$$

Since the disjunction operator $\vee$ is associative, we group the constraints two by two using a binary tree. We can either express $\vee_n$ by grouping the constraints using a perfect binary tree as shown in Fig. 3.(a), or using a left associative binary tree as shown in Fig. 3.(b). With both techniques, there are $(n - 1)$ internal $\vee$ operators, which need $(n - 1)$ boolean variables $(h_1, \dots, h_{n-1})$ to be defined.

[Figure 3: Expressing an n-Disjunction with Linear Constraints. (a) Perfect Binary Tree; (b) Left Associative Binary Tree]

The final constraints system to express $\vee_n$ has $O(n)$ constraints $(f_1, \dots, f_n)$ and $O(n - 1)$ boolean binary variables $(h_1, \dots, h_{n-1})$. The non-null lower bounds of the linear functions are always finite. They can always be computed statically and propagated up in the binary tree [16].

From the above, we can deduce the linear constraints of any other logical operator:

1. $g(x) \ge 0 \Longrightarrow h(x) \ge 0$ can be written $g(x) < 0 \vee h(x) \ge 0$
2. $g(x) \ge 0 \Longleftrightarrow h(x) \ge 0$ can be written $\big( g(x) \ge 0 \wedge h(x) \ge 0 \big) \vee \big( h(x) < 0 \wedge g(x) < 0 \big)$

5.2 Computing the Maximum with Linear Constraints

In our intLP formulation, we need to compute the function $z = \max(x, y)$, which can be formulated by considering the following constraints:

$$\begin{cases} z \ge x \\ z \ge y \\ z \le y + (1 - \alpha)\, \overline{x} \\ z \le x + \alpha\, \overline{y} \\ \alpha \in \{0, 1\} \end{cases}$$

where $(\overline{x}, \overline{y})$ are two finite non-null upper bounds for $x, y$ respectively. We can also express the $\max_n$ function with an arbitrary number of parameters, $z = \max_n(x_1, x_2, \dots, x_n)$. Since $\max$ is associative, we use a binary tree as explained for the n-disjunction operator in Fig. 3. The number of internal nodes including the root is equal to $n - 1$, so we need to define $n - 2$ intermediate variables (that hold intermediate maximums) and $(n - 1)$ systems to compute the "max" operators. It leads to a complexity of $O(n - 2) = O(n)$ intermediate variables and $O(4 \times (n - 1)) = O(n)$ linear constraints (each "max" operator needs 4 linear constraints to be defined)
and $O(n - 1) = O(n)$ binary variables (each max operator needs 1 boolean). The non-null upper bounds of the linear functions are always finite if the domain sets of the variables $x_i$ are bounded [16].

6 EquiMax Integer Programming Formulation

In this section, we define a new formulation of the scheduling problem using integer linear programming (intLP). We named it EquiMax because it uses the linear constraints which express the equivalence relation ($\Longleftrightarrow$) and the function $\max_n$.

6.1 Scheduling Variables and Objective Function

For all operations $u \in V$, we define the integer variable $\sigma_u$ that computes the schedule time. The objective function of our model is to minimize the total schedule time, i.e. minimize $\sigma_\bot$.

The first linear constraints are those which describe the precedence relations, so we write in the model:

$$\forall e = (u, v) \in E \quad \sigma_v - \sigma_u \ge \delta(e)$$

There are $O(|V|)$ scheduling variables and $O(|E|)$ linear constraints. To make the domain sets of our variables bounded, we assume $T$ as the worst possible schedule time. We choose $T$ sufficiently large; for instance, $T = \sum_{u \in V} lat(u)$, which is the case where no ILP is exploited, is a suitable worst total schedule time. Then, we write the following constraint:

$$\sigma_\bot \le T$$

As a consequence, we deduce for any $u \in V$:

- $\sigma_u \ge \underline{\sigma}_u = LongestPathTo(u)$ is the "as soon as possible" schedule time;
- $\sigma_u \le \overline{\sigma}_u = T - LongestPathFrom(u)$ is the "as late as possible" schedule time according to the worst total schedule time $T$.

6.2 Registers Constraints

6.2.1 Interference Graph

The lifetime interval of a value $u^t$ of type $t$ is

$$LT_{u^t} = \Big] \sigma_u + \delta_{w,t}(u),\ \max_{v \in Cons(u^t)} \big( \sigma_v + \delta_{r,t}(v) \big) \Big]$$

We define for each value $u^t$ the variable $k_{u^t}$ that computes its killing date. The number of such defined variables is $O(|T| \times |V_{R,t}|)$. Since the domain of our variables is bounded, we know that $k_{u^t}$ is bounded by the two following finite schedule times:

$$\forall t \in T\ \ \forall u^t \in V_{R,t} \quad \underline{k}_{u^t} < k_{u^t} \le \overline{k}_{u^t}$$

where
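- $\underline{k}_{u^t} = \underline{\sigma}_u + \delta_{w,t}(u)$ is the first possible definition date of $u^t$;
- $\overline{k}_{u^t} = \max_{v \in Cons(u^t)} \big( \overline{\sigma}_v + \delta_{r,t}(v) \big)$ is the latest possible killing date of $u^t$.

We use the $\max_n$ linear constraints to compute $k_{u^t}$ as explained in Sect. 5.2: we need to define for each $k_{u^t}$ $O(|Cons(u^t)|)$ variables and $O(4 \times |Cons(u^t)|)$ linear constraints to compute it. The total complexity to define all killing dates for all register types is bounded by $O(|V|^2)$ variables and $O(|V|^2)$ constraints.

Now, we can consider $H_t$, the undirected interference graph of $G$ for the register type $t$. For any couple of values of the same type $u^t, v^t \in V_{R,t}$, we define a binary variable $s^t_{u,v} \in \{0, 1\}$ such that it is set to 1 if the two values' lifetime intervals interfere:

$$\forall t \in T,\ \forall \text{ couple } u^t, v^t \in V_{R,t} \quad s^t_{u,v} = \begin{cases} 1 & \text{if } LT_{u^t} \cap LT_{v^t} \ne \emptyset \\ 0 & \text{otherwise} \end{cases}$$

For any register type $t \in T$, the number of variables $s^t_{u,v}$ is the number of combinations of 2 values among $|V_{R,t}|$, i.e. $|V_{R,t}| \times (|V_{R,t}| - 1) / 2$.

$LT_{u^t} \cap LT_{v^t} = \emptyset$ means that one of the two lifetime intervals is "before" the other, i.e. $LT_{u^t} \prec LT_{v^t} \vee LT_{v^t} \prec LT_{u^t}$, where $\prec$ denotes the precedence operator ("before") in interval algebra [12]. Then, we have to express:

$$s^t_{u,v} = 1 \Longleftrightarrow \neg \big( LT_{u^t} \prec LT_{v^t} \vee LT_{v^t} \prec LT_{u^t} \big)$$

Since $s^t_{u,v} \in \{0, 1\}$, these constraints are equivalent to:

$$s^t_{u,v} \ge 1 \Longleftrightarrow \begin{cases} k_{u^t} - \sigma_v - \delta_{w,t}(v) - 1 \ge 0 \\ k_{v^t} - \sigma_u - \delta_{w,t}(u) - 1 \ge 0 \end{cases}$$

Given three logical expressions $(P, Q, S)$, $(P \Longleftrightarrow (Q \wedge S))$ is equivalent to $(P \wedge Q \wedge S) \vee (\neg P \wedge \neg Q) \vee (\neg P \wedge \neg S)$. We write these two disjunctions with linear constraints by introducing two binary variables $h, h' \in \{0, 1\}$ (see Sect. 5) and computing the finite non-null lower bounds of the linear functions. This leads to write in the model, $\forall t \in T$, $\forall$ couple $u^t, v^t \in V_{R,t}$:

$$\begin{cases}
s^t_{u,v} + h + h' - 1 \ge 0 \\
k_{u^t} - \sigma_v - \delta_{w,t}(v) - \min\big(-1,\ \underline{k}_{u^t} - \overline{\sigma}_v - \delta_{w,t}(v) - 1\big) \times (h + h') - 1 \ge 0 \\
k_{v^t} - \sigma_u - \delta_{w,t}(u) - \min\big(-1,\ \underline{k}_{v^t} - \overline{\sigma}_u - \delta_{w,t}(u) - 1\big) \times (h + h') - 1 \ge 0 \\
-s^t_{u,v} - h + h' + 1 \ge 0 \\
-k_{u^t} + \sigma_v + \delta_{w,t}(v) + \min\big(-1,\ -\overline{k}_{u^t} + \underline{\sigma}_v + \delta_{w,t}(v)\big) \times (h - h' - 1) \ge 0 \\
-s^t_{u,v} - h' + 1 \ge 0 \\
-k_{v^t} + \sigma_u + \delta_{w,t}(u) + \min\big(-1,\ -\overline{k}_{v^t} + \underline{\sigma}_u + \delta_{w,t}(u)\big) \times (h' - 1) \ge 0 \\
h, h' \in \{0, 1\}
\end{cases}$$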
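As a sanity check of the pair of inequalities that $s^t_{u,v}$ is made equivalent to, the following Python sketch compares them with a direct enumeration of the cycles covered by two left-open lifetime intervals. It is an illustration only: the function names are hypothetical, `write_*` stands for $\sigma + \delta_{w,t}$, `kill_*` stands for the killing date $k$, and every lifetime is assumed to be non-empty (kill strictly greater than write).

```python
def interferes(write_u, kill_u, write_v, kill_v):
    """s = 1 iff both inequalities hold (integer cycles)."""
    return kill_u - write_v - 1 >= 0 and kill_v - write_u - 1 >= 0

def share_a_cycle(write_u, kill_u, write_v, kill_v):
    """Reference: intersect the cycle sets of ]write_u, kill_u] and ]write_v, kill_v]."""
    cycles_u = set(range(write_u + 1, kill_u + 1))
    cycles_v = set(range(write_v + 1, kill_v + 1))
    return bool(cycles_u & cycles_v)

# Exhaustive comparison on small non-empty intervals.
agree = all(
    interferes(wu, ku, wv, kv) == share_a_cycle(wu, ku, wv, kv)
    for wu in range(6) for ku in range(wu + 1, 7)
    for wv in range(6) for kv in range(wv + 1, 7)
)
print(agree)
```

The "$- 1$" in each inequality is what encodes the left-open bound: a value killed at the cycle where another one is written does not interfere with it.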
The complexity of computing all the $s^t_{u,v}$ variables is $O(|V_{R,t}| \times (|V_{R,t}| - 1))$ binary variables (two booleans for each couple of values $(u^t, v^t)$) and $O(7/2 \times |V_{R,t}| \times (|V_{R,t}| - 1))$ linear constraints (7 linear constraints for each couple of values). The total complexity of considering the interference graph $H_t$ is then bounded by $O(|V_{R,t}|^2)$ variables and $O(|V_{R,t}|^2)$ constraints.

6.2.2 Maximal Clique in the Interference Graph

The maximum number of values of type $t$ simultaneously alive corresponds to a maximal clique in $H_t = (V_{R,t}, \mathcal{E}_t)$, where $(u^t, v^t) \in \mathcal{E}_t$ iff their lifetime intervals interfere ($s^t_{u,v} = 1$). For simplicity, rather than handling the interference graph itself, we prefer considering its complementary graph $H'_t = (V_{R,t}, \mathcal{E}'_t)$, where $(u^t, v^t) \in \mathcal{E}'_t$ iff their lifetime intervals do not interfere ($s^t_{u,v} = 0$). Then, the maximum number of values of type $t$ simultaneously alive corresponds to a maximal independent set in $H'_t$ (an independent set is a subgraph in which no two nodes are adjacent).

To write the constraints which describe the independent sets (IS), we define a binary variable $x_{u^t} \in \{0, 1\}$ for each value $u^t \in V_{R,t}$ such that $x_{u^t} = 1$ iff $u^t$ belongs to an IS of $H'_t$. We must express in the model the following linear constraints:

$$\forall t \in T\ \ \forall \text{ couple } u^t, v^t \in V_{R,t} \quad x_{u^t} + x_{v^t} \le 1 \Longleftrightarrow s^t_{u,v} = 0$$

Since $s^t_{u,v} \in \{0, 1\}$ and by using the linear expressions of the equivalence ($\Longleftrightarrow$), we introduce a boolean $h \in \{0, 1\}$ (see Sect. 5). The IS are defined in the intLP model by considering:

$$\begin{cases}
-x_{u^t} - x_{v^t} + h + 1 \ge 0 \\
-s^t_{u,v} + h \ge 0 \\
x_{u^t} + x_{v^t} - 2h \ge 0 \\
s^t_{u,v} - h \ge 0 \\
h \in \{0, 1\}
\end{cases}$$

The number of variables $x_{u^t}$ is $O(|V_{R,t}|)$. The number of binary variables introduced to express the equivalences is $O(2 \times |V_{R,t}| \times (|V_{R,t}| - 1))$. The number of linear constraints to define the IS is $O(2 \times |V_{R,t}| \times (|V_{R,t}| - 1))$.

The registers constraints are the fact that the number of values of register type $t$ simultaneously alive must not exceed the number of available registers $R_t$. The maximal IS in $H'_t$ is the maximal $\sum_{u^t \in V_{R,t}} x_{u^t}$. Thereby, we write in the model:

$$\forall t \in T \quad \sum_{u^t \in V_{R,t}} x_{u^t} \le R_t$$

There are $O(|T|) = O(1)$ such constraints. The total complexity of computing the maximal independent sets in $H'_t$ (maximal cliques in $H_t$) is then bounded by $O(|V_{R,t}|^2)$ variables and $O(|V_{R,t}|^2)$ constraints.
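To make the link between cliques of the interference graph and the register need concrete, here is a small exhaustive Python sketch. It is not the intLP model itself and is only usable on tiny inputs; the function names and the example lifetimes are assumptions introduced for illustration. Lifetimes are (write cycle, killing date) pairs read as left-open intervals.

```python
from itertools import combinations

def interferes(lt_a, lt_b):
    """Two left-open lifetimes ]w, k] overlap iff each is killed after the other is written."""
    (wa, ka), (wb, kb) = lt_a, lt_b
    return ka - wb - 1 >= 0 and kb - wa - 1 >= 0

def max_clique_size(lifetimes):
    """Size of the largest set of values that pairwise interfere (brute force)."""
    names = list(lifetimes)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(interferes(lifetimes[a], lifetimes[b])
                   for a, b in combinations(subset, 2)):
                return size
    return 0

def registers_suffice(lifetimes, available):
    """Registers constraint for one register type: register need <= R_t."""
    return max_clique_size(lifetimes) <= available

# Hypothetical lifetimes of four values of the same register type.
lifetimes = {"a": (0, 3), "b": (1, 2), "c": (3, 5), "d": (1, 4)}
print(max_clique_size(lifetimes))       # a, b and d are all alive at cycle 2
print(registers_suffice(lifetimes, 2))
```

A brute-force search is used here only because it is easy to check by hand; the formulation above instead lets the intLP solver pick the set through the binary variables $x_{u^t}$.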
6.3 Resources Constraints

6.3.1 Conflicting Graph

The resources constraints are handled by considering for each FU an undirected graph $F_q = (V, \mathcal{E}_q)$ which represents the conflicts between the instructions on a FU $q \in Q$. For any couple of operations, $(u, v) \in \mathcal{E}_q$ iff $u$ and $v$ are in conflict on $q$. Any clique in $F_q$ represents a set of operations that use $q$ at the same clock cycle. So, the size of any clique must not exceed $N_q$, the number of FU copies.

We define a binary variable $f^q_{u,v} \in \{0, 1\}$ such that $f^q_{u,v} = 1$ iff there is a conflict between $u$ and $v$ on the FU $q$. For each FU, there are $O(1/2 \times |V| \times (|V| - 1))$ binary variables $f^q$. To compute them, we use the reservation tables explained in Sect. 2. Having the RTs of two operations $u$ and $v$, we can deduce when a structural hazard occurs on a FU $q$. For example, the operations $a$ and $i$ described in Fig. 2 have the RT of Fig. 1. These two operations are in conflict on the ALU iff $\sigma_a = \sigma_i \vee \sigma_a + 1 = \sigma_i$. The general formulation of the conflicting variables is the disjunction of all the cases where a conflict on the FU may occur.

Let $U_{u,q}$ be the set of clock cycles in the reservation table of $u$ where the FU $q$ is used by $u$:

$$\forall u \in V\ \ \forall q \in Q \quad U_{u,q} = \{ c \in \mathbb{N} \mid RT_u[c, q] = 1 \}$$

The set of all cases where two operations conflict on a FU $q$ is described by the cartesian product $U_{u,q} \times U_{v,q}$. The general formula of the binary conflicting variables is then:

$$\forall q \in Q\ \ \forall \text{ couple } u, v \in V \quad f^q_{u,v} = 1 \Longleftrightarrow \bigvee_{(c_1, c_2) \in U_{u,q} \times U_{v,q}} \sigma_u + c_1 = \sigma_v + c_2$$

We use the linear constraints of equivalences and disjunctions defined in Sect. 5 to write the linear description of this formula in the model. The number of terms in this disjunction depends on $|U_{u,q} \times U_{v,q}|$, which is a function of the target architecture characteristics (reservation tables and instruction set), and thereby it is a constant for any input DDG. We can write the linear constraints of the conflicting cases of all the couples of instructions in $IS$ only once for the target architecture, and then instantiate them for any DDG.
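The conflict condition can be evaluated directly from two reservation tables, as in the following Python sketch. The encoding of a reservation table as a list of rows (one dictionary per cycle), the function names and the example tables are assumptions made for illustration; the example offsets are merely chosen to reproduce the conflict condition stated above for the operations $a$ and $i$.

```python
from itertools import product

def usage_cycles(table, fu):
    """U_{u,q}: offsets c such that RT_u[c, q] == 1.

    table: list of rows, row c being a dict {fu_name: 0 or 1}.
    """
    return {c for c, row in enumerate(table) if row.get(fu, 0) == 1}

def conflict(sigma_u, table_u, sigma_v, table_v, fu):
    """f^q_{u,v}: True iff some (c1, c2) gives sigma_u + c1 == sigma_v + c2."""
    pairs = product(usage_cycles(table_u, fu), usage_cycles(table_v, fu))
    return any(sigma_u + c1 == sigma_v + c2 for c1, c2 in pairs)

# Hypothetical tables: 'a' uses the ALU during two cycles, 'i' during one.
RT_A = [{"ALU": 1}, {"ALU": 1}]
RT_I = [{"ALU": 1}]
print(conflict(0, RT_A, 0, RT_I, "ALU"))  # sigma_a == sigma_i
print(conflict(0, RT_A, 1, RT_I, "ALU"))  # sigma_a + 1 == sigma_i
print(conflict(0, RT_A, 2, RT_I, "ALU"))  # no shared cycle
```

Since the sets $U_{u,q}$ depend only on the instruction types, they can be precomputed once per target architecture, in the same spirit as the remark above about instantiating the constraints for any DDG.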
The total complexity of computing the conflicting variables $f^q$ is bounded by $O(|V|^2)$ variables and $O(|V|^2)$ constraints.

6.3.2 Maximal Clique in the Conflicting Graph

For simplicity, rather than considering the conflict graph $F_q$ itself, we use its complementary graph $F'_q = (V, \mathcal{E}'_q)$ such that $(u, v) \in \mathcal{E}'_q$ iff $u$ and $v$ are not in conflict on $q$ ($f^q_{u,v} = 0$). Then, a clique in $F_q$ becomes an independent set in $F'_q$.

We define a binary variable $y^q_u \in \{0, 1\}$ for each operation $u$ such that $y^q_u = 1$ iff $u$ belongs to an IS of $F'_q$. We write in the intLP model the linear constraints of the IS:

$$\forall q \in Q\ \ \forall \text{ couple } u, v \in V \quad y^q_u + y^q_v \le 1 \Longleftrightarrow f^q_{u,v} = 0$$
Since $f^q_{u,v} \in \{0, 1\}$ and by using the linear constraints of the equivalence (Sect. 5), we introduce a binary variable $h \in \{0, 1\}$. These constraints become:

$$\begin{cases}
-y^q_u - y^q_v + h + 1 \ge 0 \\
-f^q_{u,v} + h \ge 0 \\
y^q_u + y^q_v - 2h \ge 0 \\
f^q_{u,v} - h \ge 0 \\
h \in \{0, 1\}
\end{cases}$$

There are $O(1/2 \times |V| \times (|V| - 1))$ binary variables $h$ for each FU (one for each couple of operations) and $O(2 \times |V| \times (|V| - 1))$ linear constraints to describe the IS. The resources constraints are the fact that the cardinality of any independent set in $F'_q$ must not exceed $N_q$. We write in the model:

$$\forall q \in Q \quad \sum_{u \in V} y^q_u \le N_q$$

There are $O(|Q|) = O(1)$ such linear constraints.

6.4 Summary

Our integer LP model has a total complexity bounded by $O(|V|^2)$ variables and $O(|E| + |V|^2)$ constraints:

1. the objective function: minimize $\sigma_\bot$
2. the total number of integer variables is bounded by $O(|V|^2)$:
   (a) $O(|V|)$ scheduling variables: $\sigma_u$ for each node $u \in V$;
   (b) $O\big((|V_{R,t}| \times (|V_{R,t}| - 1))/2\big)$ interference binary variables for each register type $t$: $s^t_{u,v} \in \{0, 1\}$ for all couples $u^t, v^t \in V_{R,t}$;
   (c) $O(|V_{R,t}|)$ binary independent set variables for the complementary interference graph $H'_t$ of the register type $t$: $x_{u^t} \in \{0, 1\}$ for each value $u^t \in V_{R,t}$;
   (d) $O\big((|V| \times (|V| - 1))/2\big)$ conflict binary variables for each FU $q$: $f^q_{u,v} \in \{0, 1\}$ for all couples $u, v \in V$;
   (e) $O(|V|)$ binary independent set variables for the complementary conflict graph $F'_q$ of each FU $q$: $y^q_u \in \{0, 1\}$ for each operation $u \in V$;
   (f) the total number of intermediate and binary variables needed to write $\max_n$, n-disjunctions and equivalences with linear constraints is bounded by $O(|V|^2)$;
3. the total number of linear constraints is bounded by $O(|E| + |V|^2)$:
   (a) $O(|E|)$ scheduling constraints:
       $$\forall e = (u, v) \in E \quad \sigma_v - \sigma_u \ge \delta(e)$$
   (b) the total number of lifetime interval interference constraints is bounded by $O(|V_{R,t}|^2)$ for each register type $t$:
       $$\forall t \in T \quad s^t_{u,v} = 1 \Longleftrightarrow \neg \big( LT_{u^t} \prec LT_{v^t} \vee LT_{v^t} \prec LT_{u^t} \big)$$
   (c) the total number of independent set constraints for the complementary interference graph $H'_t$ is bounded by $O(|V_{R,t}|^2)$ for each register type $t$:
       $$\forall t \in T \quad x_{u^t} + x_{v^t} \le 1 \Longleftrightarrow s^t_{u,v} = 0$$
   (d) the number of registers constraints is $O(|T|) = O(1)$:
       $$\forall t \in T \quad \sum_{u^t \in V_{R,t}} x_{u^t} \le R_t$$
   (e) the total number of conflicting constraints is bounded by $O(|V|^2)$ for each FU $q$:
       $$\forall q \in Q \quad f^q_{u,v} = 1 \Longleftrightarrow \bigvee_{(c_1, c_2) \in U_{u,q} \times U_{v,q}} \sigma_u + c_1 = \sigma_v + c_2$$
   (f) the total number of independent set constraints for the complementary conflict graph $F'_q$ is bounded by $O(|V|^2)$:
       $$\forall q \in Q \quad y^q_u + y^q_v \le 1 \Longleftrightarrow f^q_{u,v} = 0$$
   (g) the number of resources constraints is $O(|Q|) = O(1)$:
       $$\forall q \in Q \quad \sum_{u \in V} y^q_u \le N_q$$
   (h) the total number of linear constraints needed to express $\max_n$, n-disjunctions and equivalences is bounded by $O(|V|^2)$.

We can optimize the length of our model by considering the following:

- a precedence constraint $e = (u, v)$ is redundant and can be removed from the model iff $lp(u, v) > \delta(e)$, where $lp(u, v)$ denotes the longest path from $u$ to $v$;
- two values $(u^t, v^t) \in V_{R,t}$ can never be simultaneously alive iff, for all possible schedules, one value is always defined after the killing date of the other. This is the case if any of the two following conditions is verified:
  $$\forall v' \in Cons(v^t) \quad lp(v', u) \ge \delta_r(v') - \delta_w(u)$$
  $$\forall u' \in Cons(u^t) \quad lp(u', v) \ge \delta_r(u') - \delta_w(v)$$
  such that if no path exists between two nodes, we consider it as $-\infty$;
- two operations $u, v \in V$ can never conflict on a FU $q$ iff they can never use $q$ at the same clock cycle. This is the case if any of the two following conditions is verified:
  $$\forall c \in U_{u,q}\ \ \forall c' \in U_{v,q} \quad lp(u, v) > c - c'$$
  $$\forall c' \in U_{v,q}\ \ \forall c \in U_{u,q} \quad lp(v, u) > c' - c$$
  such that if no path exists between two nodes, we consider it as $-\infty$.

7 Related Work

Acyclic scheduling under registers and resources constraints is a classical problem on which a lot of work has been done. An intLP formulation (SILP) was defined in [17] to compute an optimal schedule with register allocation under resources constraints. The complexity of this model is bounded by $O(|V|^2)$ variables and $O(|V|^2)$ constraints. However, this formulation does not introduce registers constraints, i.e. it does not limit the number of values simultaneously alive. Moreover, the resource usage patterns which it uses are simple and do not formalize the structural hazards that are present in most current ILP processors. A formulation called OASIC, which introduces registers constraints, was given in [8, 9]. Its number of variables is $O(|V|^2)$, but its number of linear constraints grows exponentially because of the registers constraints. An extension of the OASIC formulation was written in [11] to take into account non-regular register sets (some registers must not be used by some operations) and some other special constraints on ILP which are specific to the characteristics of the target processor. The registers constraints were formulated but not integrated in that model because of the exponential number of constraints to be generated.

A lot of work has also been done on the cyclic scheduling problem (software pipelining) under registers and resources constraints. It is easy to rewrite these intLP models to solve acyclic scheduling problems. Hanen has written an original formulation to linearize disjunctive resources constraints in [10]. The drawback of her formulation is that it treats only simple resources, i.e. an operation can execute only on a single FU.
Feautrier in [6] has extended the latter to take into account multiple copies of one FU. However, his formulation has the same drawback as in [17] and does not treat complex and heterogeneous FUs. Cyclic scheduling under both registers and resources constraints has been formulated in [1, 4, 5]. All these formulations have a complexity which depends on a worst total schedule time $T$. Indeed, they define a binary variable for each operation $u$ and for each execution step $c$ during the whole execution interval $[0, T]$. This variable is set to 1 iff the operation $u$ is scheduled at the clock cycle $c$. The complexity of their models is clearly bounded by $O(T \times |V|)$ variables and $O(|E| + T \times |V|)$ constraints. The factor $T$ can be very large since it depends on the input data itself (critical paths and specified operation latencies), and not on the amount of input data. For instance, if we are sure statically that the memory access performed by the operation $a$ in Fig. 2 is a cache miss, then we would specify that its latency is that of a memory access ($\approx 100$) rather than that of a cache access, in order to better exploit free slots during scheduling. In this case, the number of variables and constraints in the intLP model is multiplied by a factor of one hundred.

The coefficients introduced by our formulation in the final constraints matrix are all bounded by $T$ and $-T$, which is also the case for the coefficients in the models defined in [1, 4, 5]. If $T$ is very large, solving an EquiMax model or any of the previous formulations may cause computational overflows: in fact, searching for an exact solution of an intLP model requires computing some determinants of the constraints matrix, which can be very large if the coefficients are sufficiently large [2]. Since EquiMax reduces the size of the constraints matrix, computing these determinants should be less critical with our formulation than with the previous techniques.

8 Conclusion

In this work, we give an intLP formulation of scheduling under resources and registers constraints. The FUs can have complex usage patterns and are modeled by reservation tables. We handle multiple register types and delayed reads from and writes into the registers. The complexity of our model depends only on the number of operations to be scheduled and on the number of serial constraints. Theoretically, our formulation should considerably reduce the time needed to find the exact solution. In the future, we will extend our formulation to cyclic scheduling (software pipelining), where the values' lifetime intervals and the resource usage patterns become cyclic.

References

[1] Eric Altman. Optimal Software Pipelining with Functional Units and Registers. PhD thesis, McGill University, Montreal, October 1995.

[2] William Cook, William H. Cunningham, William R. Pulleyblank, and Alexander Schrijver. Combinatorial optimization. J. Wiley and sons, 1998.

[3] Alain Darte, Yves Robert, and Frederic Vivien. Scheduling and Automatic Parallelization.
Birkhauser Boston, 2000.

[4] Christine Eisenbeis, Franco Gasperoni, and Uwe Schwiegelshohn. Allocating Registers in Multiple Instruction-Issuing Processors. In Lubomir Bic, Wim Bohm, Paraskevas Evripidou, and Jean-Luc Gaudiot, editors, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT'95, pages 290-293, Limassol, Cyprus, June 27-29, 1995. ACM Press.

[5] Christine Eisenbeis and Antoine Sawaya. Optimal Loop Parallelization under Register Constraints. In Sixth Workshop on Compilers for Parallel Computers CPC'96, pages 245-259, Aachen, Germany, December 1996.

[6] Paul Feautrier. Fine-Grain Scheduling under Resource Constraints. In Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, pages 1-15. Springer-Verlag, August 1994.

[7] Robert S. Garfinkel and George L. Nemhauser. Integer Programming. John Wiley & Sons, New York, 1972. Series in Decision and Control.

[8] C. H. Gebotys. Optimal Scheduling and Allocation of Embedded VLSI Chips. In Proceedings of the 29th Conference on Design Automation, pages 116-119, Los Alamitos, CA, USA, June 1992. IEEE Computer Society Press.

[9] C. H. Gebotys and M. I. Elmasry. A Global Optimization Approach for Architectural Synthesis. In Proceedings of the IEEE International Conference on Computer-Aided Design, pages 258-261, Santa Clara, CA, November 1990. IEEE Computer Society Press.

[10] Claire Hanen. Study of NP-hard Cyclic Scheduling problem: The periodic recurrent job-shop. In International Workshop on Compiler for Parallel Computers. Ecole des Mines de Paris, December 1990.

[11] D. Kaestner and M. Langenbach. Code Optimization by Integer Linear Programming. Lecture Notes in Computer Science, 1575:122-136, 1999.

[12] Martin Charles Golumbic and Ron Shamir. Interval Graphs, Interval Orders and the Consistency of Temporal Events. In Proceedings of Theory of Computing and Systems (ISTCS'92), volume 601 of LNCS, pages 32-42, Berlin, Germany, May 1992. Springer.

[13] R. Sethi. Complete register allocation problems. SIAM Journal on Computing, 4(3):226-248, 1975.

[14] Jurij Silc, Borut Bobic, and Theo Ungerer. Processor Architecture: from Dataflow to Superscalar and Beyond. Springer, first edition, 1999.

[15] M. Tokoro, E. Tamura, and T. Takizuka. Optimization of Microprograms. IEEE Trans. on Computers, C-30(7):491-504, 1981.

[16] Sid-Ahmed-Ali Touati. Optimal Register Saturation in Superscalar and VLIW Codes. Research Report, INRIA, October 2000. ftp.inria.fr/INRIA/Projects/a3/touati/optiRS.ps.gz.

[17] L. Zhang. SILP: Scheduling and Register Allocation with Integer Linear Programming. PhD thesis, University of Saarlands, 1996.
