A method based on software pipelining has recently been proposed to optimize mono-phase clocked sequential circuits. The resulting circuits are multi-phase clocked sequential circuits, where all clocks have the same period. To preserve the functionality of the original circuit, registers must be placed according to a correct schedule. This schedule also ensures the maximum throughput. In that method, two questions arise: (1) how to determine a schedule that requires the minimum number of registers, and (2) how to place these registers optimally. In this paper, problems (1) and (2) are tackled simultaneously. More precisely, we deal with the problem of determining schedules with the minimum register requirements, where the optimal register placement is done during the schedule determination. To solve that problem optimally, we provide a mixed integer linear program that we use to derive a linear program, which is polynomial-time solvable. Experimental results confirm the effectiveness of the approach, and show that significant reductions in the number of registers can be obtained.
Introduction
Software pipelining is a powerful technique for increasing instruction-level parallelism on parallel processors. This method overlaps the execution of successive iterations. It has recently been used to develop a method for optimizing mono-phase clocked sequential circuits [2]. The resulting circuit is a multi-phase clocked circuit, where all clocks have the same period. That method may be described as follows. First, the optimal clock period is determined, and a schedule of all the functional elements of the circuit is computed. Second, in order to preserve the behavior of the original circuit, registers are placed, independently of their initial placement, according to that schedule. Finally, once the registers are placed, the phases are determined.
With this method, two questions arise: (1) how to determine a schedule that produces the minimum number of required registers, and (2) how to place the minimum number of registers even if that schedule is already determined. Solving (1) and (2) is of great interest, since reducing the number of registers reduces the number of control signals, the area of the circuit, and the power consumption.
In [4], the authors provided two polynomial-time solvable methods to determine schedules that reduce the register requirements and the number of required phases. Compared to the original method [2], these methods proved very efficient in reducing the number of registers and the number of required phases.
Nevertheless, the problem of how to efficiently place registers in the circuit is not addressed.
In this paper, we focus on simultaneously solving problems (1) and (2) outlined above. More precisely, we tackle the problem of determining schedules that yield the minimum number of registers, where the optimal register placement is done during the schedule determination. To solve that problem optimally, we provide a mixed integer linear program (MILP), which we use to derive a linear program (LP) that is polynomial-time solvable. To test the effectiveness of the approach, we evaluate the MILP and the LP on well-known benchmarks, and we show the superiority of that approach over the original method [2].
This paper is organized as follows. The next section gives some notation and definitions used in this article. Section 3 briefly reviews the register placement step in the method based on software pipelining, which was outlined above. It also shows that the algorithm used to place registers is greedy. The problem we tackle and its optimal solution are presented in Section 4, and a linear program for that problem is given in Section 5. Section 6 provides experimental results and Section 7 concludes the paper.
Preliminaries

The cyclic graph model
In order to minimize the clock period of a synchronous sequential circuit, it is modeled (as in [2]) as a directed cyclic graph $G = (V, E, d, w)$, where $V$ is the set of functional elements in the circuit, and $E$ is the set of edges, which represent interconnections between vertices. Each vertex $v$ in $V$ has a non-negative integer propagation delay $d(v) \in \mathbb{N}$, which is assumed to be fixed. Each edge $e_{u,v}$, from $u$ to $v$, in $E$ is weighted with a register count $w(e_{u,v}) \in \mathbb{N}$, representing the number of registers on the wire between $u$ and $v$. Figure 1 presents an example of a circuit and its directed cyclic graph model. In this figure, large rectangles represent functional elements, and small rectangles represent registers. Wires are oriented to show the propagation direction of the signals. The propagation delay of each functional element of this circuit is specified as a label on the left of each large rectangle. This example is used throughout this paper, and serves to illustrate the initial specification of the problem to optimize. The initial specification is in general a synchronous circuit with a single-phase clock. The minimum clock period of the circuit in Figure 1 as specified is 7.
Figure 1. Circuit and its cyclic graph model.
Maximum throughput of synchronous sequential circuits
The throughput, $T$, of a synchronous sequential circuit is bounded by the inverse of the length, $P$, of the critical paths in the circuit. Based on data dependency constraints only, the maximum throughput is [1]:
$$T = \min_{C \in G} \frac{\sum_{e_{u,v} \in C} w(e_{u,v})}{\sum_{e_{u,v} \in C} d(u)}$$
where $C$ ranges over the cycles of $G$, $e_{u,v} \in E$, and $P = 1/T$. A binary search may be used to find the minimal value of $P$ for which there is no positive cycle in the graph $G_P$ [1], which is obtained from $G$ by giving each edge $e_{u,v}$ the weight
$$w_P(e_{u,v}) = d(u) - P \cdot w(e_{u,v}) \qquad (4)$$
Without loss of generality, we assume that $P$ is greater than or equal to the execution delay of each computational element in the circuit.
For the example in Figure 1, we have $P = 6$. This value corresponds to the cycle defined by vertices $v_1$, $v_2$, $v_4$, and $v_5$.
We define a schedule $s$ as a function that assigns a time $s_k(v)$ to the $k$-th instance of each vertex $v \in V$.
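The binary search for the minimal $P$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `has_positive_cycle` and `minimal_period`, the edge-list representation, and the four-vertex graph are assumptions made here (the graph is hypothetical and is not the circuit of Figure 1). Each edge of $G_P$ is weighted with $d(u) - P \cdot w(e_{u,v})$. The sketch assumes that every cycle carries at least one register, so that $P$ equal to the sum of all delays is always feasible, and it uses exact rational arithmetic to avoid rounding issues.

```python
from fractions import Fraction

def has_positive_cycle(n, edges, d, P):
    """Return True if G_P has a positive-weight cycle.

    edges holds (u, v, w) triples, where w is the register count on the
    edge u -> v; the weight of that edge in G_P is d[u] - P * w.
    """
    dist = [Fraction(0)] * n  # as if a virtual source reached every vertex
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + d[u] - P * w
            if cand > dist[v]:
                dist[v] = cand
                changed = True
        if not changed:
            return False  # longest paths settled: no positive cycle
    return True  # still improving after n passes: positive cycle

def minimal_period(n, edges, d, tol=Fraction(1, 1000)):
    """Binary search for the smallest P (within tol) with no positive cycle."""
    lo, hi = Fraction(max(d)), Fraction(sum(d))
    if not has_positive_cycle(n, edges, d, lo):
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_positive_cycle(n, edges, d, mid):
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical example: delays and (u, v, register count) edges.
delays = [2, 3, 1, 1]
edges = [(0, 1, 0), (1, 2, 0), (2, 0, 1), (0, 3, 0), (3, 2, 0)]
period = minimal_period(4, edges, delays)
```

For this hypothetical graph, the critical cycle 0, 1, 2 has a total delay of 6 and one register, so the search returns 6.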
Schedule for a given throughput
From equation (1) and inequality (2), we have that:
$$\forall e_{u,v} \in E,\quad s_0(v) - s_0(u) \ge d(u) - P \cdot w(e_{u,v}) \qquad (5)$$
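The step from (1) and (2) to (5) can be spelled out as follows. The forms of (1) and (2) used here are assumptions: (1) is taken to be the periodicity condition and (2) the data-dependency constraint, written so as to agree with (5) and (7); $s_k(v)$ denotes the time assigned to the $k$-th instance of $v$.

```latex
\begin{align*}
s_k(v) &= s_0(v) + k \cdot P && \text{(assumed form of (1))} \\
s_{k + w(e_{u,v})}(v) - s_k(u) &\ge d(u) && \text{(assumed form of (2))} \\
s_0(v) + \bigl(k + w(e_{u,v})\bigr) \cdot P - s_0(u) - k \cdot P &\ge d(u) \\
s_0(v) - s_0(u) &\ge d(u) - P \cdot w(e_{u,v}) && \text{(5)}
\end{align*}
```

The right-hand side of the last line is the edge weight $w_P(e_{u,v})$ of equation (4), which is why longest paths in $G_P$ yield solutions to the system.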
In the case of periodic schedules, determining a valid schedule of all the instances of each vertex $v$ in $V$ is equivalent to determining $s_0(v)$ for each $v$ in $V$, which is in turn equivalent to determining solutions to the system of inequalities described by (5). To solve this system, the graph $G_P$, previously described, may be used. To find an ASAP schedule, Bellman-Ford's algorithm [5] for longest paths, from a chosen vertex $v_x$ to the others, may be applied to the graph $G_P$. An ALAP schedule may be found as follows.
In step 1, a graph $G'$ is derived from $G_P$ by inverting the direction of each edge in $G_P$. In step 2, Bellman-Ford's algorithm for longest paths, from the vertex $v_x$ to the others, is applied to the graph $G'$, where the weights of its edges are defined by equation (4). Finally, in step 3, the ALAP schedule is obtained by multiplying each result of step 2 by -1. Relative to $v_x = v_1$, the ASAP schedules of vertices $v_1$, $v_2$, $v_3$, $v_4$, $v_5$, and $v_6$ of the circuit in Figure 1 are 0, -3, 3, -1, -4, and -3, respectively. Their ALAP schedules are 0, -3, 4, -1, -4, and 1, respectively.
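The ASAP and ALAP computations described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `longest_paths` and `asap_alap`, the edge-list representation, and the four-vertex graph are hypothetical (the graph is not the circuit of Figure 1). Each edge of $G_P$ carries the weight $d(u) - P \cdot w(e_{u,v})$, and $P$ is assumed to be large enough that $G_P$ has no positive cycle.

```python
def longest_paths(n, edges, src):
    """Bellman-Ford for longest paths from src; edges are (u, v, weight)."""
    neg_inf = float("-inf")
    dist = [neg_inf] * n
    dist[src] = 0
    for _ in range(n - 1):
        for u, v, wt in edges:
            if dist[u] != neg_inf and dist[u] + wt > dist[v]:
                dist[v] = dist[u] + wt
    # One more pass: any further improvement reveals a positive cycle.
    for u, v, wt in edges:
        if dist[u] != neg_inf and dist[u] + wt > dist[v]:
            raise ValueError("positive cycle: P is too small")
    return dist

def asap_alap(n, edges, d, P, src):
    """edges are (u, v, w) with w the register count on the edge u -> v."""
    g_p = [(u, v, d[u] - P * w) for u, v, w in edges]
    asap = longest_paths(n, g_p, src)
    # ALAP: reverse every edge of G_P, keep its weight, negate the result.
    reversed_g_p = [(v, u, wt) for u, v, wt in g_p]
    alap = [-x for x in longest_paths(n, reversed_g_p, src)]
    return asap, alap

# Hypothetical example: delays and (u, v, register count) edges, with P = 3.
delays = [2, 3, 1, 1]
edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2), (0, 3, 0), (3, 2, 0)]
asap, alap = asap_alap(4, edges, delays, 3, 0)
```

For this hypothetical graph, the sketch gives ASAP times [0, 2, 5, 2] and ALAP times [0, 2, 5, 4]: vertex 3 is off the critical cycle and can be delayed by two time units without violating any constraint.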
Schedule graph
(6)
Because $s$ is periodic with period $P$, equation (6) may be written as follows:
$$\forall e_{u,v} \in E,\quad w_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v}) \qquad (7)$$
The graph $G_s$ is consistent if and only if, for each edge $e_{u,v}$ in $E$, $w_s(e_{u,v}) \ge d(u)$. This is derived from equation (2). Figure 2 shows a consistent schedule graph, where edges are labeled with $w_s$ values, for the circuit in Figure 1, using the ASAP schedule determined in Section 2.4.
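Equation (7) and the consistency condition translate directly into code. The sketch below is illustrative only: the function names, the edge-list representation, and the four-vertex graph with its schedule are hypothetical and do not come from Figure 1. It also assumes there are no parallel edges, since edges are used as dictionary keys.

```python
def schedule_graph_weights(edges, s0, P):
    """Equation (7): w_s(e_uv) = s0[v] - s0[u] + P * w(e_uv)."""
    return {(u, v): s0[v] - s0[u] + P * w for u, v, w in edges}

def is_consistent(edges, d, s0, P):
    """G_s is consistent iff w_s(e_uv) >= d[u] on every edge."""
    w_s = schedule_graph_weights(edges, s0, P)
    return all(w_s[(u, v)] >= d[u] for u, v, _ in edges)

# Hypothetical example: delays, (u, v, register count) edges, a schedule s0.
delays = [2, 3, 1, 1]
edges = [(0, 1, 0), (1, 2, 0), (2, 0, 2), (0, 3, 0), (3, 2, 0)]
s0 = [0, 2, 5, 2]
w_s = schedule_graph_weights(edges, s0, 3)
consistent = is_consistent(edges, delays, s0, 3)
```

With this schedule, every edge weight is at least the delay of its source vertex, so the schedule graph is consistent; starting vertex 1 one time unit earlier would violate the condition on the edge from vertex 0 to vertex 1.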
Register placement
In the method proposed in [2], which was outlined in Section 1, a register placement step is needed in order to preserve the behavior of the original circuit. The placement of registers is derived from a schedule graph $G_s$ by breaking every path in $G_s$ that is longer than the optimal clock period $P$. For paths having a length less than $P$, no register is required because operation chaining is assumed.
For the circuit in Figure 1 
Problem formulation and optimal solution
As mentioned in Section 1, two problems arise in the method based on software pipelining proposed in [2]: the first is how to determine a schedule that yields the minimum number of required registers, and the second is how to place the minimum number of required registers even if that schedule is already determined. Our focus in this paper is to solve these two problems simultaneously. More precisely, the problem (Prob) we tackle is to determine a schedule with the minimum register requirements, where the register placement is done during the schedule determination. To solve Prob optimally, we provide a mixed integer linear program (MILP), and use it to derive a linear program which is polynomial-time solvable.
Before presenting that MILP, let us first give some requirements. Figure 4 gives a portion of the cyclic graph modeling the circuit, where $i$ and $j$ are two computational elements. $x_{i,j}$ denotes the number of registers that must be placed on the arc $e_{i,j}$ to guarantee that the length, $l_{i,j}$, of every path that goes to $j$ via $i$ is less than or equal to the optimal clock period $P$; $l_{i,j}$ is defined below. Note that, as in [2], operation chaining is assumed, and hence no register is required if $l_{i,j} \le P$. Suppose that the paths that go to $j$ via $i$ have already been examined in order to determine whether some registers must be placed on them. Let $m_i$ be a non-negative real number greater than or equal to each remainder obtained by dividing the length of each of those paths by $P$. The length $l_{i,j}$ of every path that goes to $j$ via $i$ is the sum of $m_i$ and $w_s(e_{i,j})$, where $w_s(e_{i,j})$ is defined by equation (7). $y_{i,j}$ is the remainder of the division of $l_{i,j}$ by $P$. We require that $m_i \le P - d(i)$, which guarantees that if a register $R$ is on the output of computational element $i$, then its schedule will be after $i$ finishes its execution.
Figure 5 presents a mathematical formulation of Prob. The objective function expresses the number of registers to be placed in the circuit. Equations (8), (9), and (10) are equivalent to the definitions of $x_{i,j}$, $y_{i,j}$, and $m_i$, respectively. Inequality (11) is equivalent to (5). Constraint (13) is required, since the number of registers must be an integer. In this formulation, the variables are $x_{i,j}$, $y_{i,j}$, $m_i$, and the schedule $s_0(u)$ for each computational element $u$.
The formulation in Figure 5 can be linearized as follows. Using the fact that $\lfloor x \rfloor \le x < \lfloor x \rfloor + 1$, and that no register is required if the length of a path is less than or equal to $P$, equation (8) can be replaced by (14). Equations (9) and (10) together can be replaced by
$$\forall e_{k,i} \in E,\quad m_i \ge \bigl(s_0(i) - s_0(k) + P \cdot w(e_{k,i}) + m_k\bigr) - P \cdot x_{k,i} \qquad (15)$$
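As a worked step, the floor can be removed as shown below. This assumes that (8) defines $x_{i,j}$ as the integer part of $l_{i,j}/P$, with $l_{i,j} = m_i + w_s(e_{i,j})$ and $P > 0$; that assumption is consistent with the floor property quoted above, but the inequalities actually used by the authors may differ in the boundary case $l_{i,j} = P$, where no register is required.

```latex
\begin{align*}
x_{i,j} = \left\lfloor \frac{l_{i,j}}{P} \right\rfloor
  &\iff x_{i,j} \le \frac{l_{i,j}}{P} < x_{i,j} + 1,
        \quad x_{i,j} \in \mathbb{Z} \\
  &\iff P \cdot x_{i,j} \le l_{i,j} < P \cdot x_{i,j} + P,
        \quad x_{i,j} \in \mathbb{Z} \\
y_{i,j} = l_{i,j} - P \cdot x_{i,j}
  &\implies 0 \le y_{i,j} < P
\end{align*}
```

Under the same assumption, the right-hand side of the inequality on $m_i$ above is exactly the remainder $y_{k,i}$ written with $l_{k,i} = m_k + w_s(e_{k,i})$ and $w_s$ expanded by equation (7), so the inequality states that $m_i$ bounds the remainder of every path entering $i$.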
After linearizing the formulation in Figure 5, we obtain the MILP to optimally solve Prob, as presented in Figure 6. In this figure, equations (16) and (17) are equivalent to (14), and (18) is equivalent to (15). Constraints (19), (20), and (21) are equivalent to (12), (11), and (13), respectively. The variables are non-negative.
$$\forall e_{i,j} \in E,\quad x_{i,j} \text{ is integer} \qquad (21)$$
Figure 6. A MILP to optimally solve Prob.
A linear program for solving Prob
Linear programs are polynomial-time solvable [7, 8]. A linear program for solving Prob can be obtained by deleting the constraint that $x_{i,j}$ is integer in Figure 6. In this case, once the linear program is solved, the number of registers to be placed on the arc $e_{i,j}$ is 
Experimental results
To test the effectiveness of our approach, the MILP in Figure 6 and the corresponding linear program (LP), obtained by ignoring the integrality constraint (21), were evaluated on well-known benchmarks. Circuits from the ISCAS89 benchmark suite are used to test the efficiency of the LP in terms of run-time and of the reduction in the number of registers inserted in the circuit. The mathematical formulations for each circuit are automatically generated by a module we coded in C++ and integrated in a tool we developed in [4]. We did not implement the cited polynomial-time algorithms for linear programs; instead, the Lp-Solve tool [11] (in the public domain) is used to solve the generated mathematical formulations. The results obtained are given in Tables 1 and 2, where the first column gives the name of the circuit and the second column presents the number, $N_1$, of registers placed using the algorithm in Table 2, the fifth column gives the run-time in seconds on an UltraSparc 10 with 1 GB RAM. As Table 1 reports, significant reductions in the number of required registers are obtained. Substantial reductions are also obtained using the LP. Indeed, as summarized by 
Conclusions
A method based on software pipelining has recently been proposed to optimize mono-phase clocked sequential circuits. The resulting circuit is a multi-phase clocked circuit, where all clocks have the same period. To preserve the behavior of the original circuit, registers are placed according to a schedule, which has the maximum throughput.
In that method, two problems arise: how to determine schedules that lead to minimal register requirements, and how to place the minimum number of required registers even if these schedules are already determined.
In this paper, we have tackled these two problems simultaneously. We have provided a mixed integer linear program and used it to derive a linear program, which is polynomial-time solvable. Experimental results on well-known benchmarks confirmed the effectiveness of the approach we propose. Indeed, significant reductions in the number of required registers have been obtained with very short run-times.
