Introduction
A cyclic scheduling problem is specified by a set of generic operations that are processed an infinite number of times and a set of constraints linked to precedence relations between the operations or to the usage of limited resources. The objective is generally to maximize the throughput of the schedule. In this paper, we focus on the resource-constrained modulo scheduling problem (RCMSP), a general periodic scheduling problem abstracted from the problem solved by compilers when optimizing inner loops at the instruction level for VLIW parallel processors.
We briefly describe the context of the RCMSP in terms of parallel computing and very long instruction word (VLIW) architectures. We refer to [dD04, SMD04, RS02, ED97, EDA94, DAA08] for more details. In VLIW architectures, the instruction scheduling problem is a major compilation issue. To increase the throughput, software pipelining is widely used. Software pipelining is a technique that interleaves the successive operations of a loop. Among software pipelining techniques, this paper focuses on the modulo scheduling technique, which implements software pipelining by generating periodic schedules. This is a suboptimal policy compared to k-periodic scheduling [HM94], but it is used in practice for ease of implementation. The goal is to find a valid schedule for the instructions of a loop (the local schedule) that can be overlapped with itself infinitely. In the modulo scheduling framework, the interval between two local schedules is called the initiation interval (or period) and is the main indicator of the schedule quality. The modulo scheduling algorithm must take into account the constraints of the target processor, that is, latencies of operations, resources, and size of the register files. It should also consider secondary goals such as minimizing the schedule length of a loop iteration or minimizing the register requirements of the resulting modulo schedule. Algorithms based on optimal solvers have been proposed, and are referred to as optimal modulo schedulers. In this study, register constraints and objectives are ignored. In this context, the RCMSP can be informally defined as a periodic scheduling problem that consists in minimizing the period while satisfying precedence and resource constraints.
Since solving the instruction scheduling problem at the compilation phase is less time-critical than real-time scheduling, integer linear programming (ILP) is a relevant technique for the RCMSP [dD04, ED97, EDA94, DAA08]. Hence, different ILP formulations for the RCMSP have been proposed and used in practice. These formulations can be presented as generalizations of the classical non-preemptive time-indexed formulation of Pritsker et al. [PWW69] and the tighter variant presented by Christofides et al. [CAVT87] for the (non-periodic) resource-constrained project scheduling problem (RCPSP). However, because of the periodic nature of the problem, the time-indexed formulation of the RCPSP can be extended in two different ways, yielding two categories of ILP formulations. Eichenberger and Davidson [ED97] proposed a first extension comprising both binary and integer variables. We call this category of formulations the decomposed formulations. Dupont de Dinechin [dD04, Dup07] proposed a second extension comprising only binary variables. We call the latter category the direct formulations. Ayala and Artigues [AA10] performed a series of computational experiments on a set of industrial and randomly generated instances. They show that, in terms of exact solving, the ILP formulations perform reasonably well, although high computational times may be required to reach optimality for moderate-size instances. For some hard instances, the optimum could not be found even after hours of computing.
Heuristic solving schemes have been proposed for many years to solve this problem, among which is the decomposed software pipelining technique [Rau93, GS94, BdDH06, CDR98a]. This approach decomposes the problem into two subproblems solved consecutively. First, a cyclic scheduling problem ignoring resource constraints is considered and a so-called legal retiming of the operations is issued. Second, a standard acyclic problem, taking this retiming as input, is solved through list scheduling techniques. An extension of decomposed software pipelining has been proposed in [BH09a] for the resource-constrained cyclic scheduling problem with precedence delays. This approach performs very well in terms of running time and distance to the optimum. However, even for the best variant of this approach, the optimum is not always reached, especially when the operations have non-unit resource demands, which corresponds to the general case.
In this paper, we propose a hybrid approach, which uses the decomposed software pipelining method to obtain a good retiming. The obtained retiming is then used to build an ILP formulation using the Eichenberger and Davidson [ED97] scheme. The formulation has a reduced size, which allows it to be solved exactly. Experimental results show that many more problems are solved with this new approach. The gap to the optimal solution is small (0 or 1%) on all the tested problem instances.
Section 2 defines the considered resource-constrained modulo scheduling problem and presents the data instances used for the experiments. Section 3 presents the ILP formulations proposed by Eichenberger and Davidson [ED97] and Dupont de Dinechin [dD04, Dup07] and reports experimental results in terms of exact solving of the considered problem instances. In Section 4, the approximation algorithms based on decomposed software pipelining are presented, as well as their results on the same problem instances. Section 5 details the proposed hybrid method and gives its results in comparison with the two previously presented methods. Concluding remarks are drawn in Section 6.
The Resource-Constrained Modulo Scheduling Problem (RCMSP)
Instruction schedules, produced by the compiler for modern processors such as VLIW (Very Long Instruction Word) architectures, are a performance-critical optimization that has a direct impact on the overall system cost and energy consumption. High-quality instruction schedules make it possible to reduce the operating frequency given real-time processing requirements. This combinatorial problem is also known as software pipelining [AJLA95] and can be expressed as a cyclic scheduling problem with resource constraints. Among the different cyclic scheduling frameworks, modulo scheduling [Rau94] focuses on finding a periodic schedule with the minimal period λ, and it is the most successful in production compilers. In order to model the software pipelining problem, [DAA08] proposes an extension of the classic modulo scheduling problem to the Resource-Constrained Modulo Scheduling Problem (RCMSP), where the resources are adapted from the renewable resources of the Resource-Constrained Project Scheduling Problem (RCPSP) [BDM+99].
Problem formulation
Modulo scheduling is equivalent to periodic scheduling with an integral period $\lambda \geq 1$. We consider a set $V = \{O_1, \ldots, O_n\}$ of $n$ generic operations of unit duration. A schedule is defined by a mapping $\sigma : V \times \mathbb{N} \to \mathbb{N}$, where $\sigma(O_i, q)$ represents the start time of the $q$-th instance $(O_i, q)$ of generic operation $O_i$. For ease of notation, $\sigma(O_i, q)$ will be denoted by $\sigma_i^q$ in the remainder of the paper. In addition, a periodic schedule satisfies:

$$\sigma_i^q = \sigma_i^0 + q\lambda, \quad \forall i \in \{1, \ldots, n\},\ \forall q \in \mathbb{N}. \qquad (1)$$

Let $E$ be the set that represents dependencies among operations. Each dependence constraint $(O_i, O_j) \in E$ can be written:

$$\sigma_i^q + \theta_i^j \leq \sigma_j^{q + \omega_i^j}, \quad \forall q \in \mathbb{N},$$

where $\theta_i^j$ is the latency and represents the dependence length, while $\omega_i^j$ is the distance, representing the number of iterations separating the two operations involved.

Introducing $\sigma_i$ as a shortcut for $\sigma_i^0$, the start time of the operation at the first iteration (defining the local schedule), and using (1), we obtain the following dependence constraints:

$$\sigma_i + \theta_i^j \leq \sigma_j + \lambda\,\omega_i^j, \quad \forall (O_i, O_j) \in E. \qquad (2)$$

Each operation $i \in \{1, \ldots, n\}$ requires $b_i^s \in \mathbb{N}$ units of each resource $s \in \{1, \ldots, k\}$. Each resource $s$ has a limited availability $m_s \in \mathbb{N}$. Note that since $\lambda \geq 1$, (1) implies that several instances of the same operation cannot be scheduled in parallel. Consequently, the set of operation instances in process at a time step $t \in \mathbb{N}$ is the set $A(t) = \{i \in \{1, \ldots, n\} \mid \exists q \in \mathbb{N},\ \sigma_i^q = t\}$. The resource constraint can be written as follows:

$$\sum_{i \in A(t)} b_i^s \leq m_s, \quad \forall t \in \mathbb{N},\ \forall s \in \{1, \ldots, k\}.$$

Consider now a "generic" time step $\tau \in \{0, \ldots, \lambda - 1\}$ and the set $B(\tau) = \{i \in \{1, \ldots, n\} \mid \exists q \in \mathbb{N},\ \sigma_i = \tau + q\lambda\}$ of operations having their start time $\sigma_i$ equal to $\tau$ modulo $\lambda$. Thanks to the schedule periodicity, the resource constraint can also be simplified, as it can be shown that there always exists $T \in \mathbb{N}$ such that for $t \geq T$ we have $A(t) = B(\tau)$, and for each $t < T$ we have $A(t) \subseteq B(\tau)$, where $t = \tau + q\lambda$. The resource constraints can then be replaced by the following "modulo" resource constraints:

$$\sum_{i \in B(\tau)} b_i^s \leq m_s, \quad \forall \tau \in \{0, \ldots, \lambda - 1\},\ \forall s \in \{1, \ldots, k\}. \qquad (3)$$

Finally, the RCMSP aims at finding a schedule $\sigma \in \mathbb{N}^n$ that satisfies constraints (2)-(3) and minimizes the period $\lambda$.
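As an illustration, the two families of constraints can be checked mechanically for a candidate local schedule: every dependence must satisfy start_i + latency <= start_j + period * distance, and for every residue class modulo the period the summed demands must fit the availabilities. The following is a minimal sketch under these assumptions; the function name and the data layout (0-based operation indices, arcs as tuples) are hypothetical choices made for the example.

```python
def is_feasible_modulo_schedule(sigma, lam, arcs, demand, capacity):
    """Check the dependence and modulo resource constraints.

    sigma:    start times of the first iteration (one per operation)
    lam:      candidate period (integer >= 1)
    arcs:     list of (i, j, theta, omega) dependences
    demand:   demand[i][s] = units of resource s required by operation i
    capacity: capacity[s] = availability of resource s
    """
    # Dependence constraints: sigma_i + theta <= sigma_j + lam * omega
    for i, j, theta, omega in arcs:
        if sigma[i] + theta > sigma[j] + lam * omega:
            return False
    # Modulo resource constraints: accumulate demands per residue class
    k = len(capacity)
    usage = [[0] * k for _ in range(lam)]
    for i, start in enumerate(sigma):
        for s in range(k):
            usage[start % lam][s] += demand[i][s]
    return all(usage[tau][s] <= capacity[s]
               for tau in range(lam) for s in range(k))


# Toy instance (hypothetical): a 3-operation circuit sharing one unit resource.
arcs = [(0, 1, 1, 0), (1, 2, 1, 0), (2, 0, 1, 1)]
demand = [[1], [1], [1]]
print(is_feasible_modulo_schedule([0, 1, 2], 3, arcs, demand, [1]))
print(is_feasible_modulo_schedule([0, 1, 2], 2, arcs, demand, [1]))
```

On this toy instance the first call accepts period 3, while the second rejects period 2 because the loop-carried dependence from operation 2 back to operation 0 is violated.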
Example
To illustrate the resource-constrained modulo scheduling problem that occurs in software pipelining, a sample C program and the corresponding ST200 VLIW processor operations are given in Figure 1. The dependence graph corresponding to the example in Figure 1 is displayed in Figure 2. Each node represents a generic operation and each arc represents a dependence. Pairs of values $(\theta_i^j, \omega_i^j)$ are displayed close to their arc. A dummy node 0 represents the start of the schedule and a dummy node $n + 1 = 8$ represents the end of the schedule. In Table 1 we display the resource availabilities and the resource requirements of each operation class.
Lower bounds on the optimal period
Without resource constraints, the modulo scheduling problem is polynomially solvable. However, finding a modulo schedule of minimal period $\lambda$ is known to be NP-hard when resources are limited [HM94].
If we assume infinite availability of resources, the optimal period of a schedule is defined by

$$\lambda_\infty = \max_{C \text{ circuit of } G} \left\lceil \frac{L(C)}{H(C)} \right\rceil,$$

where $L(C)$ and $H(C)$ are respectively the sum of latencies and the sum of distances of arcs along the circuit $C$. It is well known (see [HM94]) that $\lambda_{opt} \geq \lambda_\infty$. This bound is due to precedence constraints only and can be computed in polynomial time using a critical circuit algorithm. On the other hand, a bound due to resource constraints only can be defined by

$$\lambda_{res} = \max_{s \in \{1, \ldots, k\}} \left\lceil \frac{\sum_{i=1}^{n} b_i^s}{m_s} \right\rceil.$$

This is the minimum period such that the renewable resources are not over-subscribed, and it can easily be proven that $\lambda_{opt} \geq \lambda_{res}$.

In the rest of the paper we shall denote by $\lambda_{\min} = \max(\lambda_{res}, \lambda_\infty)$ the largest of the two above-defined lower bounds.
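The two lower bounds can be sketched in a few lines. The code below is an illustration, not the critical circuit algorithm referred to above: it assumes an integral period, so the resource bound is rounded up, and it finds the precedence bound by increasing the period until the graph with arc weights latency - period * distance has no positive circuit (detected with Bellman-Ford style relaxations). Function names and the data layout are hypothetical.

```python
def resource_bound(demand, capacity):
    """Smallest integer period for which no resource is over-subscribed."""
    k = len(capacity)
    # -(-a // b) is an integer ceiling division
    return max(1, max(-(-sum(row[s] for row in demand) // capacity[s])
                      for s in range(k)))


def has_positive_circuit(n, arcs, lam):
    """Longest-path relaxations with arc weights theta - lam * omega."""
    dist = [0] * n
    for _ in range(n):
        changed = False
        for i, j, theta, omega in arcs:
            w = theta - lam * omega
            if dist[i] + w > dist[j]:
                dist[j] = dist[i] + w
                changed = True
        if not changed:
            return False
    return True  # still improving after n passes: a positive circuit exists


def precedence_bound(n, arcs):
    """Smallest integer period satisfying every circuit of the graph."""
    upper = max(1, sum(theta for _, _, theta, _ in arcs))
    for lam in range(1, upper + 1):
        if not has_positive_circuit(n, arcs, lam):
            return lam
    raise ValueError("some circuit has zero total distance")


# Toy instance (hypothetical): one circuit of latency 3 and distance 1.
arcs = [(0, 1, 1, 0), (1, 2, 1, 0), (2, 0, 1, 1)]
lam_min = max(resource_bound([[1], [1], [1]], [2]), precedence_bound(3, arcs))
print(lam_min)
```

The incremental search is chosen for readability; a binary search over the period or a dedicated critical circuit algorithm would be preferable on large graphs.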
Architecture and data
In this paper, we ran our experiments on a real benchmark of graphs taken from the ST200 compiler with real resource requirements, where the smallest instance "gsm-st231.10.rcms" has 10 operations and 42 dependence edges, and the biggest one "gsm-st231.18.rcms" has 214 operations and 1063 edges.
First, we ran our experiments on a real VLIW architecture (ST200 of STMicroelectronics) with 6 functional units (resources) whose availabilities and operation demands are described in Table 1. Then, we considered the instances presented in [AA10], obtained from the above-defined instances by setting the availability of each resource to 10 and, for each operation, by randomly generating a resource demand vector whose components are chosen in the interval $\{1, \ldots, 10\}$. Table 2 presents, for each original instance from the ST200 compiler, the number of operations #operations, the number of dependence constraints #prec, the precedence-based lower bound $\lambda_{prec}$, the resource-based lower bound for the original instances $\lambda_{res}$(ind) and the resource-based lower bound for the modified instances $\lambda_{res}$(mod). One observes that the resource constraints are much tighter for the modified instances.
The tests were performed on an INTEL(R) Core(TM) 2 DUO 1.99GHz RAM.
In our experimental framework,
• two ILP formulations, presented in section 3, are tested: the decomposed formulation of Eichenberger and Davidson [ED97] and the direct formulation (FDI) of Dupont de Dinechin [dD04]. We applied CPLEX 11 to solve the different ILP formulations.
• different DSP algorithms are introduced in section 4 and tested considering several retiming policies (Gasperoni and Schwiegelshohn's approach [GS94], longest path in the pattern minimization [CDR98b], zero weighted edges minimization, and combined longest path and zero weighted edges minimizations [DH00a]) and different acyclic scheduling strategies (as early as possible scheduling, a list with critical paths as priorities, a list with Zinder-Roper [ZR98] priorities).
• a new algorithm, which combines the two previous approaches, is introduced in section 5.
We note that this is the first study that compares and combines ILP formulations and DSP approximation algorithms for the RCMSP.

For the non-periodic resource-constrained project scheduling problem (RCPSP), Pritsker et al. [PWW69] proposed an ILP formulation based on time-indexed binary variables $z_i^t$ such that $z_i^t = 1$ if and only if the start time of operation $i$ is equal to $t$. Owing to the periodic nature of the RCMSP, there are two ways of extending this model, yielding the two ILP models described in this section. To obtain linear constraints easily, both formulations suppose that $\lambda$ is fixed. Hence the minimum $\lambda$ for which the corresponding ILP is feasible is the desired optimum period. In our experiments, we simply perform a linear search by iteratively solving the ILP, starting with $\lambda = \lambda_{\min}$. Note that this solving scheme makes it easy to integrate a secondary objective. Although only period minimization is considered in this paper, the ILPs are in fact solved with an objective function set to the weighted sum of the operation start times. If $w_i$ denotes the weight of an operation $i$, the objective is $\min \sum_{i=1}^{n} w_i \sigma_i$. As a preamble, Ayala and Artigues [AA10] have shown that, with this objective, the two formulations presented below are equivalent in the sense that they give the same LP relaxation lower bound.
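The linear search over the period can be sketched as a small driver loop. The sketch below assumes a hypothetical callable `solve_for_period` that stands in for one ILP solve at a fixed period and returns a schedule or `None` when the model is infeasible; the upper limit `lam_max` is likewise an assumption added so that the loop always terminates.

```python
def minimal_period(lam_min, lam_max, solve_for_period):
    """Return (period, schedule) for the first feasible period, else None.

    solve_for_period(lam) is a placeholder for solving the fixed-period
    ILP; it must return a schedule, or None if no schedule exists.
    """
    for lam in range(lam_min, lam_max + 1):
        schedule = solve_for_period(lam)
        if schedule is not None:
            return lam, schedule
    return None


# Toy oracle (hypothetical): pretend the model is feasible from period 5 on.
def toy_oracle(lam):
    return [0, 1, 2] if lam >= 5 else None


print(minimal_period(3, 10, toy_oracle))
```

Because the search starts at the lower bound and stops at the first feasible period, the returned period is optimal whenever each call solves the fixed-period model exactly.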
Direct formulation [dD04, Dup07]
Dupont de Dinechin [dD04, Dup07] proposed a time-indexed formulation based on a direct discretization of the start times $\sigma_i$. This formulation is based on binary variables $x_i^t$ such that $x_i^t = 1$ if and only if $\sigma_i = t$, and we have $\sigma_i = \sum_{t=0}^{T-1} t\,x_i^t$, where $T$ is any upper bound on the makespan that allows the optimum $\lambda$ to be achieved. For ease of notation, we suppose there is an integer $Q$ such that $T = Q\lambda$ (we can always increase $T$ as needed to obtain this property). This formulation (direct) is expressed as follows:
As already mentioned, objective (4) is the minimization of the weighted sum of the operation start times. Constraints (5) state that each generic operation has to be started exactly once in $\{0, \ldots, T - 1\}$. Constraints (7) ensure that the usage of a resource never exceeds its availability. Here, set $B(\tau)$ is the set of operations such that $x_i^t = 1$ for some $t$ of the form $t = \tau + q\lambda$ with $q \in \{0, \ldots, T/\lambda - 1 = Q - 1\}$. Inspired by the results of Christofides et al. for the RCPSP [CAVT87], Dupont de Dinechin [dD04, Dup07] introduced the following so-called "disaggregated" precedence constraints:
As for the RCPSP, replacing constraints (6) by constraints (9) yields a tighter formulation (direct+). The proof can be extended from the results obtained for the RCPSP (see e.g. [SBJ99]).
Decomposed formulation [ED97]
The start time of the generic operation $i$ can be decomposed according to the division by $\lambda$. We have $\sigma_i = \tau_i + \alpha_i \lambda$, where $\tau_i \in \{0, \ldots, \lambda - 1\}$ is the remainder of the division of $\sigma_i$ by $\lambda$ and $\alpha_i$ is its quotient.
Following this decomposition, Eichenberger and Davidson [ED97] introduce integer variables $\alpha_i$ and binary variables $y_i^\tau$ such that $y_i^\tau = 1$ if and only if $\tau_i = \tau$, which also yields $\tau_i = \sum_{\tau=0}^{\lambda-1} \tau\,y_i^\tau$. The formulation (decomp) is expressed as follows:
As for the direct formulation, by replacing $\sigma_i$ by $\sum_{\tau=0}^{\lambda-1} \tau\,y_i^\tau + \alpha_i \lambda$, objective (10) represents the weighted sum of operation start times. Constraints (12) are the precedence constraints (2). Constraints (11) state that each generic operation has to be started exactly once in the period or, equivalently, that the remainder of the division of $\sigma_i$ by $\lambda$ lies in $\{0, \ldots, \lambda - 1\}$. With this decomposition, set $B(\tau)$ is precisely the set of operations such that $y_i^\tau = 1$, which directly gives resource constraints (13) from the original modulo resource constraints (3).
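The decomposition of a start time into a row (remainder) and a column (quotient) with respect to the period can be illustrated directly with integer division. This is only a sketch of the change of variables, with hypothetical function names; it does not build or solve the ILP.

```python
def decompose(sigma, lam):
    """Split each start time into (tau, alpha) with sigma = tau + alpha * lam."""
    pairs = [divmod(s, lam) for s in sigma]  # divmod returns (quotient, remainder)
    alpha = [q for q, _ in pairs]
    tau = [r for _, r in pairs]
    return tau, alpha


def y_variables(tau, lam):
    """Binary indicators: y[i][t] == 1 exactly when tau[i] == t."""
    return [[1 if t == tau_i else 0 for t in range(lam)] for tau_i in tau]


sigma, lam = [0, 4, 7], 3
tau, alpha = decompose(sigma, lam)
y = y_variables(tau, lam)
# Recover the start times from the indicators and the quotients
rebuilt = [sum(t * y[i][t] for t in range(lam)) + alpha[i] * lam
           for i in range(len(sigma))]
print(tau, alpha, rebuilt)
```

Each row of `y` contains a single 1, which mirrors the requirement that every generic operation starts exactly once within the period.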
Based on results obtained by Chaudhuri et al. [CWM94], Eichenberger and Davidson [ED97] propose a new precedence constraint, which they call the "structured" precedence constraint.
Replacing constraints (12) with constraints (16) yields a tighter formulation (decomp+) (see [ED97] for the proof).
Experimental study
We now evaluate the performance of the standard (direct+) and (decomp+) formulations on the industrial and modified instances for obtaining optimal solutions. First, note that a lower bound on the optimal period can be obtained by finding the minimal $\lambda$ for which the LP relaxation is feasible. Theoretical results from [AA10] established that the (direct) and (decomp) ILPs yield the same lower bound, as do (direct+) and (decomp+). Unfortunately, the computational experiments carried out in [AA10] also showed that, for the same industrial and modified instances as the ones considered in the present paper, the lower bound obtained this way never exceeds $\lambda_{\min}$, the trivial lower bound. A better lower bound can be obtained by the new formulation proposed in [AA10], but this formulation has an exponential number of variables and cannot be used for direct ILP solving.
For integer solving, we use here the (direct+) and (decomp+) formulations. Starting with the trivial lower bound $\lambda_{\min}$ on the period, the branch-and-bound of the ILP solver is used, incrementing the period until a feasible solution is found, which yields the optimal period. Tables 13 and 15 show the optimal or best known integer solutions obtained by (direct+) and (decomp+).
Tables 3, 4, 5 and 6 show a classification of instances according to the execution time and the size of the instance. Table 13 shows that all industrial ST200 instances except one (gsm-st231.18) can be solved to optimality. The (decomp+) formulation is faster than the (direct+) formulation. However, while some instances are solved very quickly, more than 161 hours are necessary to solve instance adpcm-st231.2. For most of the solved industrial instances, the optimal period is equal to the trivial lower bound $\lambda_{\min}$. In just one case (instance "adpcm-st231.2") the optimal period is not equal to the trivial lower bound. Table 15 shows the optimal or best known integer solutions obtained by (direct+) and (decomp+) on the modified instances. It appears that the modified instances are harder to solve, as fewer optimal solutions are found. There are 6 instances whose optimal solution is not found after three weeks of running time. For two instances, a feasible but not optimal solution is found after three weeks of running time (with gaps of 1.75% and 8%). The hardness of the modified instances, compared to the industrial instances, is also reflected by the fact that, for the optimally solved modified instances, the minimal period is now larger than the trivial lower bound.
Approximation algorithms
Among the software pipelining algorithms, a guaranteed approach, called Decomposed Software Pipelining (DSP), has been proposed by Gasperoni and Schwiegelshohn [GS94], followed by the retiming method of Calland, Darte and Robert [CDR98b], to solve the problem for parallel processors and ordinary precedence. The main idea of DSP is to decouple dependence constraints and resource constraints so as to decompose the problem into two subproblems: a cyclic scheduling problem ignoring resource constraints (loop shifting), and a standard acyclic scheduling problem for which efficient techniques are known (loop compaction). Initially defined on identical parallel processors, this approach has been extended to pipelined processors with latencies in [DH00a]. The worst-case analysis of the extended algorithm is given in [BH09b], which provides a performance guarantee when an arbitrary list schedule is used for loop compaction in the acyclic stage.
Loop shifting
The idea behind DSP is to define a shift for each operation so as to extract an acyclic subgraph of the initial uniform precedence graph and to schedule it using any non-cyclic scheduling algorithm. The acyclic schedule provided by the algorithm is then turned, using the initial shift, into a periodic schedule whose period is the makespan of the acyclic schedule. The authors of [CDR98b] propose to use a retiming as a shift, and they define a legal retiming $R : V \to \mathbb{Z}$ by:

$$R_j + \omega_i^j - R_i \geq 0, \quad \forall (O_i, O_j) \in E.$$
The intuition behind retiming is that (O i , q), which corresponds to the (q + 1) th execution of the first instance (O i , 0), can also be interpreted as the
Changing the definition of the first occurrences of operations makes it possible to interleave operations of different iterations into the new first iteration. Precedence relations also move:
Hence, for these new generic operations $(O'_i)_{1 \leq i \leq n}$, the first iteration fulfils the precedence relations given by a graph called $G_R$, computed from $G$ by keeping only the arcs for which $R_j + \omega_i^j - R_i = 0$. Notice that $G_R$ is acyclic since $G$ has no zero-height circuit.
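Extracting the retimed graph amounts to a single pass over the arcs. The sketch below assumes that a retiming is legal when the shifted distance R_j + omega - R_i is non-negative on every arc, and keeps exactly the arcs whose shifted distance is zero; the function name and the tuple layout are hypothetical.

```python
def retimed_graph(arcs, R):
    """Return the arcs (i, j, theta) kept in the acyclic graph for retiming R.

    arcs: list of (i, j, theta, omega) dependences of the cyclic graph
    R:    list of integer shifts, one per operation
    """
    kept = []
    for i, j, theta, omega in arcs:
        shifted = R[j] + omega - R[i]
        if shifted < 0:
            raise ValueError("retiming is not legal on arc (%d, %d)" % (i, j))
        if shifted == 0:
            kept.append((i, j, theta))
    return kept


# Toy circuit (hypothetical): 0 -> 1 -> 2 -> 0, last arc carried by one iteration.
arcs = [(0, 1, 1, 0), (1, 2, 1, 0), (2, 0, 1, 1)]
print(retimed_graph(arcs, [0, 0, 0]))
print(retimed_graph(arcs, [0, 0, 1]))
```

With the zero retiming the loop-carried arc is dropped; shifting operation 2 by one iteration instead drops the arc from operation 1 to operation 2, which changes the acyclic problem handed to the compaction step.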
Several ideas have been investigated to find a legal retiming. The approach of Gasperoni and Schwiegelshohn [GS94] can be reformulated as finding a legal retiming associated with a periodic schedule assuming unlimited resources. In [CDR98b], where the use of retiming for loop shifting is formalized, the authors consider two optimizations:
• minimizing the length of the longest path in $G_R$;
• minimizing the number of edges in $G_R$, so as to reduce the number of precedence constraints for loop compaction.
For the first objective, such a retiming can be computed in polynomial time using the retiming algorithm due to Leiserson and Saxe [LS88] for clock-period minimization. Huard and Darte [DH00a] extended this algorithm to precedence latency problems.
For the second optimization, an ILP formulation was proposed in [CDR98b] to solve this problem. Then, [DH00a] defined a polynomial, purely graph-based algorithm inspired by a minimal cost flow algorithm proposed by Fulkerson, known as the out-of-kilter method [GM85]. An algorithm that computes a retiming combining longest path and zero weighted edges minimizations was also proposed to solve this problem.
Loop compaction
The idea behind the DSP approach is to choose a particular retiming $R$, use a guaranteed algorithm to get a schedule $\pi$ of $G_R$, and then extend the guarantee to the induced periodic schedule.
List scheduling algorithms are the most widely used heuristics for scheduling with precedence and resource constraints. A simple list scheduling algorithm consists in scheduling operations as early as possible while meeting resource constraints. To each operation, we can give a label that reflects the scheduling priority of that operation. Then, a list algorithm with priorities builds a solution by scheduling at each time step the highest-priority operation among a set of concurrent operations ready to be issued. An intuitive priority is the longest path in the acyclic graph. Zinder and Roper [ZR98] propose to compute priorities that also take into account the resource constraints. The main idea is to compute the priority of each operation using the schedule of its successors that meets the precedence and resource constraints. This algorithm was initially introduced for parallel processor systems and extended to pipelined processors in [BH10].
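A list algorithm with longest-path priorities can be sketched as follows. This is a minimal illustration and not the implementation used in the experiments: it assumes unit-duration operations that occupy their resources only during their start step, an acyclic input graph, and a priority equal to the longest latency path from the operation to a sink. The function name and data layout are hypothetical.

```python
def list_schedule(n, arcs, demand, capacity):
    """Greedy list scheduling of an acyclic graph with latencies.

    arcs:     list of (i, j, theta): j may start theta steps after i starts
    demand:   demand[i][s] = units of resource s used by operation i
    capacity: capacity[s] = availability of resource s at each time step
    """
    k = len(capacity)
    if any(demand[i][s] > capacity[s] for i in range(n) for s in range(k)):
        raise ValueError("an operation exceeds a resource availability")
    succs = [[] for _ in range(n)]
    preds = [[] for _ in range(n)]
    for i, j, theta in arcs:
        succs[i].append((j, theta))
        preds[j].append((i, theta))

    priority = {}

    def tail(i):
        # Longest latency path from i to a sink (memoized; graph must be acyclic)
        if i not in priority:
            priority[i] = max((theta + tail(j) for j, theta in succs[i]), default=0)
        return priority[i]

    start = [None] * n
    t = 0
    while any(s is None for s in start):
        # Operations whose predecessors are all scheduled and far enough in the past
        ready = [i for i in range(n) if start[i] is None
                 and all(start[p] is not None and start[p] + theta <= t
                         for p, theta in preds[i])]
        ready.sort(key=tail, reverse=True)  # highest priority first
        used = [0] * k
        for i in ready:
            if all(used[s] + demand[i][s] <= capacity[s] for s in range(k)):
                start[i] = t
                for s in range(k):
                    used[s] += demand[i][s]
        t += 1
    return start


print(list_schedule(3, [(0, 2, 2)], [[1], [1], [1]], [1]))
```

Replacing `tail` by another priority function (for instance one that also accounts for resource usage) changes the heuristic without touching the scheduling loop.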
DSP algorithm
To formalize the DSP approach, let $R$ be a legal retiming of $G$ and $\pi$ be any (non-cyclic) schedule of $G_R$ that fulfills the resource constraints as well as the precedences induced by $G_R$. We denote by $\pi_i$ the start time of operation $O_i$ in this schedule. In fact, a slightly lower value of $\lambda_R$ can be computed as follows:

Then, for any integer $q \in \mathbb{N}$, setting

$$\sigma_i^q(R) = \pi_i + (q + R_i)\,\lambda_R,$$

we have the following result:

Lemma 1 For any legal retiming $R$, $\sigma(R)$ is a feasible periodic schedule of $G$ with period $\lambda_R$.

Proof. First, we prove that, at any time slot $t$, the operations scheduled at $t$ in $\sigma(R)$ meet the resource constraints. We denote by $F_t = \{O_i \mid \exists q \in \mathbb{N},\ \sigma_i^q(R) = t\}$ the set of operations whose occurrences are scheduled at $t$. Let $O_i$ and $O_j$ be two operations in $F_t$. We denote by $q$ and $q'$ their corresponding occurrences issued in time slot $t$. Hence,

$$\pi_i + (R_i + q)\,\lambda_R = \pi_j + (R_j + q')\,\lambda_R.$$

Since $\pi_i < C^R_{\max} \leq \lambda_R$ and similarly $\pi_j < \lambda_R$, we have $-\lambda_R < \pi_i - \pi_j < \lambda_R$, and then $\pi_i = \pi_j$ and $R_i + q = R_j + q'$.

Hence, $O_i$ and $O_j$ are performed in the same time slot $\pi_i$ in the acyclic schedule $\pi$. Thus, $O_j \in F_{\pi_i}$ and then $F_t \subseteq F_{\pi_i}$. Since $\pi$ fulfills the resource constraints induced by $G_R$, $F_{\pi_i}$ (and then $F_t$) meets the resource constraints.

For precedence constraints, we need to prove that, for any arc $(O_i, O_j) \in E$ and any integer $q \in \mathbb{N}$:

$$\sigma_i^q(R) + \theta_i^j \leq \sigma_j^{q + \omega_i^j}(R).$$

Hence, we have to verify that the following inequality is satisfied for each arc $(O_i, O_j) \in E$:

$$\pi_i + \theta_i^j \leq \pi_j + \lambda_R\,(R_j - R_i + \omega_i^j).$$

We have two cases: either $(O_i, O_j)$ is kept in $G_R$ or not.
• In the first case, $R_j - R_i + \omega_i^j = 0$. Since $\pi$ fulfills the precedence constraints induced by $G_R$, we have $\pi_i + \theta_i^j \leq \pi_j$. Then the inequality is satisfied.
• In the second case, $R_j - R_i + \omega_i^j > 0$ and by definition,
Thus the inequality holds in this case too, which completes the proof.

Now, to illustrate the DSP approach, we consider the following generic algorithm, which uses a list algorithm to produce $\pi$.
Algorithm 1: Extended DSP
1. Find a legal retiming $R$ for $G$;
3. Perform a list scheduling on $G_R$ coping with both precedence and resource constraints. Compute $\pi_i$, the start time of operation $O_i$ in this schedule, and $\lambda_R$;
4. Define the cyclic schedule $\sigma(R)$ by: $\sigma_i^q(R) = \pi_i + (q + R_i)\,\lambda_R$.
A legal retiming for G of Figure 2 is given in A schedule built by Algorithm 1 with period $\lambda_R = 3$ is depicted in Figure 4. The two arcs of $G_R$ are shown. We remark that, in this cyclic schedule, there are some idle cycles.

In [BH09b] the following worst-case bound is established when one of the retimings described in section 4.1 is applied for loop shifting and a list algorithm is considered for loop compaction:
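The last step of a DSP-style algorithm, turning an acyclic schedule and a retiming into a periodic schedule, can be sketched as follows. The choice of period here is an assumption of the sketch: it takes the smallest integer that is at least the makespan of the acyclic schedule (unit durations) and that satisfies start_i + latency <= start_j + period * distance on every arc dropped by the retiming. The retiming is also shifted so that its smallest value is 0, which keeps all start times non-negative. Names are hypothetical.

```python
def periodic_schedule(pi, R, arcs):
    """Build (period, first-iteration start times) from pi and a legal retiming R.

    pi:   start times of the acyclic schedule of the retimed graph
    R:    legal retiming (one integer per operation)
    arcs: list of (i, j, theta, omega) dependences of the cyclic graph
    """
    base = min(R)
    R = [r - base for r in R]       # shifting all values keeps the retiming legal
    lam = max(pi) + 1               # makespan of the unit-duration acyclic schedule
    for i, j, theta, omega in arcs:
        d = R[j] - R[i] + omega     # iterations separating the two operations
        if d > 0:
            need = pi[i] + theta - pi[j]
            lam = max(lam, -(-need // d))   # integer ceiling of need / d
    sigma = [pi[i] + R[i] * lam for i in range(len(pi))]
    return lam, sigma


# Toy circuit (hypothetical): 0 -> 1 -> 2 -> 0, last arc carried by one iteration.
arcs = [(0, 1, 1, 0), (1, 2, 1, 0), (2, 0, 1, 1)]
print(periodic_schedule([0, 1, 2], [0, 0, 0], arcs))
print(periodic_schedule([1, 2, 0], [0, 0, 1], arcs))
```

Because the period is at least the makespan, two operations share a residue class modulo the period only if they share a time slot in the acyclic schedule, so the resource feasibility of the acyclic schedule carries over to the periodic one.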
Theorem 1
where $B_{\max}$ and $\theta_{\max}$ denote respectively the maximum capacity of each resource and the maximum precedence latency.
Notice that [DH00a] studied the worst case behavior only on identical parallel processors and this is the first worst case performance bound produced for pipelined processors.
This theoretical worst-case ratio is quite large, and could discourage the use of such an approach in industrial applications. Thus, in the next section, we study the efficiency of different DSP algorithms, extending those given in the literature, from an experimental point of view.
Experimental study
The first experimental observations for shifting algorithms are reported in Table 8.

Table 8: Average distances to the best period given by DSP algorithms.

For the ST200 architecture, the first and the second retimings optimise the critical part of the acyclic graph and give the best period for more than 80% of instances. However, the third and the fourth retimings are more efficient for random resource demands. This observation confirms the conclusion of [DH00b] that zero weighted edges minimization is efficient mainly when resources are very limited, while the heuristic of Gasperoni and Schwiegelshohn gives good results in the opposite case.
For the acyclic scheduling level, using priorities such as longest path or Zinder-Roper priorities gives significantly better results than a simple as early as possible algorithm. We compared compaction using these two algorithms without any shifting, and their results were almost similar (see Table 9), with a slight advantage for longest path priorities, whose running time is smaller.
Model            ST200                RCCSP
Compaction       ε avg    ε = 0       ε avg    ε = 0
list 0           0.72     75%         0.78     64%
list 1           0.14     95%         0.22     78%
list 2           0.06     95%         0.14     89%

Table 9: Comparison of list algorithms for loop compaction.
As these algorithms are relatively fast, one can properly design a platform implementing them all and choosing the minimal period among the computed values. Now we consider the best period given by approximation algorithms, considering only the instances solved by ILP formulations, in order to compare the two approaches. An extract of the corresponding results is given in Table 10.

Model                                ST200     RCCSP
Tests reaching λ opt                 30/36     20/28
Average ratio to λ opt               1.03      1.04
Maximal ratio to λ opt               1.33      1.21
Average distance to λ opt            0.25      0.82
Maximal distance to λ opt            3         5

Table 10: Comparison of DSP solutions to optimal periods computed by ILP.
We first evaluated the ratio between the period given by DSP algorithms and the optimal period given by the ILP formulation. With a maximal value of 1.33, it appears that the practical worst-case ratio is much lower than the theoretical one. Also, for the random resource allocations, the average ratio is below 1.05. Moreover, we can see from these statistics that the optimal period is reached for more than 82% of instances. For the other instances, except some critical ones, the distance to the optimal period is equal to one. This shows that decomposed software pipelining is still an interesting technique even with complex resource and precedence constraints.
But like any heuristic, DSP algorithms can fall into traps and, for some of our instances, they give a period up to five time slots larger than the optimal period.
In conclusion, we note that ILP formulations may induce very long execution times and could not solve some instances of the problem in a reasonable time. On the other hand, the DSP approach is very efficient in general, but can have a relatively large deviation from the optimum for critical instances. However, there are opportunities for improvement, and in the next section we present a new algorithm to overcome the weaknesses of these two approaches.
In this section, we propose to combine exact method and approximation algorithms to exploit the good properties by applying them to problems they can efficiently solve and to avoid as much as possible their respective defects.
Hybrid algorithm
The main idea is to use an optimal algorithm for loop compaction instead of an approximation algorithm. We therefore propose to use the results of the first level of the DSP approach and then to apply the Eichenberger and Davidson formulation described in section 3.2 for the second level. We get the following algorithm:
Algorithm 2: The hybrid approach
1. Find a legal retiming $R$ for $G$;
2. Compute $\lambda_R$ by solving the following system
3. Define the cyclic schedule $\sigma(R)$ by:
Algorithm 2 replaces the third step of Algorithm 1 by the computation of an exact solution by ILP instead of a list algorithm. Thus, it computes for each operation $O_i$ the corresponding retiming $R_i$ by one of the DSP algorithms. Then, these values are injected into the Eichenberger and Davidson formulation to replace the variables $\alpha_i$ (the period in which the first occurrence of operation $O_i$ is placed in the schedule). Notice that, unlike the ILP approach described in section 3, once a legal retiming is given, the minimization of the period $\lambda$ can be solved directly by integer linear programming, without a secondary criterion and without a search for the minimal $\lambda$.
Experimental study
This algorithm was run on the architectures described above, and the corresponding statistics are given in Table 11.

Model                 ST200                            RCCSP
CPU time              ms    sec   min   > 1h   No      ms    sec   min   > 1h   No
#Oper   #Instances
Total   36            35    1     0     0      0       29    0     3     2      2

Table 11: Distribution of instances depending on their sizes and computing times by the hybrid approach.

Compared to the ILP formulation results given in Tables 13 and 15, we notice a significant decrease in computation time, and more instances are solved. For the ST200 architecture, all tests reach the optimal solution, even for the critical instance that is not solved by the ILP formulation, as the period given by the hybrid approach is equal to the resource bound.
For the random resource allocation architecture (modified instances), the improvement is seen mainly in the computation time and in the number of instances solved by this approach compared to ILP formulations. However, there is almost no difference compared to the list algorithm results given in Table 10. The hybrid approach improves the DSP period by one time slot for 4 instances, and for the remaining ones, the period is equal to that given by list algorithms. This shows that list algorithms give very efficient results for the acyclic level of DSP. On the other hand, the limited efficiency of this approach in this case may be due to the fact that the retiming does not take into account the criticality of resources, which is more important for this architecture. This points to a weakness of the DSP approach, which is based on the separation into two levels: retime, then compact. So, even if we rely on an optimal algorithm in the second level, the choice of the retiming may not be the best one for loop compaction.
Concluding remarks
This paper reports an experimental study of integer linear programming (ILP) formulations and Decomposed Software Pipelining (DSP) algorithms to solve the resource-constrained modulo scheduling problem. The experiments were run on a set of real instances as well as randomly modified instances to obtain more variability in resource usage. Although experiments had been made separately on both techniques, it is the first time that the results of DSP algorithms are compared to the real optimum. The experiments indicate that the ILP formulations cannot solve all the problem instances in reasonable execution time, and that DSP may fail in some cases to find a solution close to the optimum, although the performance of this approach is very good in terms of execution time and mean deviation from the optimum. To overcome the drawbacks of the two algorithms, we proposed a new hybrid algorithm based on a combination of the retiming approach and of an ILP formulation. The method quickly solves the set of industrial instances to optimality and obtains good solutions on the set of harder instances with random resource demands. The approach shows, however, some limits, which suggest that the integration of resource constraints during the retiming phase, as well as the design of a dedicated branch and bound replacing ILP for the second phase, are promising research directions.
.0020 1 0
gsm-st231.5    11   0.0001    11   0.0020     1   0
gsm-st231.6    7    0.0000    7    0.0020     1   0
gsm-st231.7    11   0.0001    11   0.0020     1   0
gsm-st231.8    8    0.0000    8    0.0020     1   0
gsm-st231.9    28   0.0001    28   0.0020     1   0
gsm-st231.10   4    0.0000    4    0.0020     1   0
gsm-st231.11   20   0.0001    20   0.0020     1   0
gsm-st231.12   8    0.0001    8    0.0020     1   0
gsm-st231.13   19   0.0001    19   0.0020     1   0
gsm-st231.14   10   0.0001    10   0.0020     1   0
gsm-st231.15   8    0.0000    8    0.0020     1   0
gsm-st231.16   16   0.0001    16   0.0020     1   0
gsm-st231.17   9    0.0001    12   0.0020     1   0
gsm-st231.18   53   45.0000   53   900.0000   1   0
gsm-st231.19   8    0.0000    8    0.0020     1   0
gsm-st231.20   6    0.0000    6    0.0020     1   0
gsm-st231.21   18   0.0001    18   0.0020     1   0
gsm-st231.22   18   0.0001    18   0.0020     1   0
gsm-st231.25   16   0.0002    16   0.0020     1   0
gsm-st231.29   11   0.0001    11   0.0020     1   0
gsm-st231.30   7    0.0000    7    0.0020     1   0
gsm-st231.31   11   0.0000    11   0.0020     1   0
gsm-st231.32   15   0.0000    15   0.0020     1   0
gsm-st231.33   15   0.0000    15   0.0020     1   0
gsm-st231.34   4    0.0000    5    0.0020     1   0
gsm-st231.35   6    0.0000    6    0.0020     1   0
gsm-st231.36   10   0.0000    10   0.0020     1   0
gsm-st231.39   9    0.0000    8    0.0020     1   0
gsm-st231.40   10   0.0000    10   0.0020     1   0
gsm-st231.41   18   0.0000    18   0.0020     1   0
gsm-st231.42   6    0.0000    6    0.0020     1   0
gsm-st231.43   9    0.0000    11   0.0020     1   0
