Introduction
As the feature size of integrated circuits (ICs) shrinks to the deep sub-micron or nanometer level, interconnect delay is becoming increasingly important in determining the total delay of a circuit [1, 2]. In the traditional IC design flow, high-level synthesis (HLS) performs scheduling and allocation first, and floorplanning then determines the actual positions of the modules during physical design. Because little information about interconnect delays is available in the high-level synthesis phase, the interconnect wire delay is optimized only in the layout phases. However, changing the topological structure of the circuit in the synthesis phase has a significant effect on the interconnect delay.
Some researchers have already addressed the problem of incorporating physical design information into high-level synthesis. The 3-D scheduling algorithm [3] considers the floorplan during high-level synthesis: it decides the shape and position of each functional unit by floorplanning concurrently as operations are scheduled and functional units are allocated. The grid-based (GB) algorithm [4] does not consider the scheduling problem. Instead, it combines binding with a one-dimensional floorplan and translates the problem into a two-dimensional grid placement problem to minimize the interconnect wire length. However, the applicability of a one-dimensional floorplan is very limited. Choi and Levitan [5] improved the method by using a more accurate estimation of interconnect wire area and delay, but the method still uses a one-dimensional floorplan. Other approaches are possible. BINET [6] is a binding algorithm that performs incremental binding and floorplanning on a previously scheduled result, based on a network flow model. Shantanu and Miriam [7] incorporated the floorplan into the high-level synthesis formulation using a data-transfer model. Prabhakaran and Banerjee [8] also combine high-level synthesis with floorplanning.
In all the algorithms discussed above, high-level synthesis and floorplanning are integrated. However, simply combining them dramatically enlarges the search space of the problem, and the precision of the floorplan is often sacrificed to accelerate the search. To overcome this problem, a reallocation and rescheduling procedure based on a good floorplan result can be used. Such a procedure can be very useful for optimizing the interconnect delay of a circuit. In a general method, however, the floorplan is destroyed during the re-synthesis phase.
In this paper, a force-balance-based re-synthesis algorithm is presented. With this method, the floorplan result is kept as an invariant. The algorithm can be used either in getting a new synthesis solution based on the floorplan result when these two phases are integrated together, or in optimizing the interconnect delay after the floorplan phase is finished.
Re-Synthesis After Floorplan
The inputs to the algorithm include the following: a schedule that assigns operations to control steps, a binding that assigns operations to functional units, and a floorplan that assigns functional units to positions. The output of the algorithm is a new schedule and a new binding with an unaltered floorplan.
For a synchronized circuit, the total delay of the circuit can be calculated by
$$T = t s \qquad (1)$$
where $T$ is the total delay of the circuit, $t$ is the delay of each control step, and $s$ is the number of cycles.
As shown in Eq. (1), when the number of cycles is fixed, the total delay $T$ depends only on $t$. The main objective of our approach is to reduce $t$ in Eq. (1) by reducing the interconnect delay of each control step. $t$ is calculated by
$$t = \max(t_i), \quad 0 \le i \le \mathit{max\_steps} \qquad (2)$$
where $t_i$ is the exact delay of step $i$.
The exact delay of step $i$ is calculated by
$$t_i = \max(t_{ij}), \quad 0 \le j \le \mathit{max\_unit\_number} \qquad (3)$$
where $t_{ij}$ is the exact delay of the operation scheduled to begin at step $i$ and allocated to unit $j$.
Assume that operation op is scheduled to execute from step $i$ to step $i+k-1$; the exact delay of op is calculated by
$$t_{ij} = (t_f + t_{in} + t_{out}) / k \qquad (4)$$
where $t_f$ is the delay of the functional unit, $t_{in}$ is the delay of the input wires, and $t_{out}$ is the delay of the output wires.
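To make Eqs. (1) to (4) concrete, the following minimal Python sketch evaluates them for a small hand-made schedule. The data layout (a dictionary keyed by step and unit) and all function names are assumptions chosen for illustration; they are not taken from the implementation described in this paper.

```python
from typing import Dict, Tuple

# (start step i, unit j) -> (t_f, t_in, t_out, k); layout is illustrative only.
OpDelays = Dict[Tuple[int, int], Tuple[float, float, float, int]]

def op_delay(t_f: float, t_in: float, t_out: float, k: int) -> float:
    """Eq. (4): per-step delay of an operation spread over k control steps."""
    return (t_f + t_in + t_out) / k

def step_length(ops: OpDelays) -> float:
    """Eqs. (2)-(3): the control-step length t is the largest t_ij."""
    return max(op_delay(*d) for d in ops.values())

def total_delay(ops: OpDelays, num_steps: int) -> float:
    """Eq. (1): T = t * s."""
    return step_length(ops) * num_steps

# Delays in ns; the second operation is spread over k = 2 steps.
ops = {
    (0, 0): (2.0, 0.3, 0.2, 1),
    (0, 1): (4.0, 0.4, 0.4, 2),
    (1, 0): (2.0, 0.1, 0.1, 1),
}
print(step_length(ops), total_delay(ops, num_steps=3))
```

In this toy schedule the single-step operation at (0, 0) determines the control-step length, so shortening its wires is what would reduce the total delay.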
$t_{in}$ and $t_{out}$ in Eq. (4) are calculated from the delays of the individual wires, where $t_i^{(in/out)}$ is the delay of the $i$-th input/output wire of the operation. In our approach, since no routing is performed, the length of the interconnect wire between two functional units is simply estimated by the half perimeter of the minimum rectangle that contains the two functional units. The Elmore delay model is used to calculate the delay of the interconnect wires, as shown by
$$t_i^{(in/out)} = \frac{1}{2} r c l^2 \qquad (6)$$
where $r$ is the resistance of the wire per unit length, $c$ is the capacitance of the wire per unit length, and $l$ is the length of the wire.
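The wire-length estimate and the delay model can be sketched in a few lines of Python. The sketch assumes the common distributed-RC form of the Elmore delay, $\frac{1}{2} r c l^2$, and represents each functional unit by an axis-aligned rectangle; the `Rect` type and the function names are hypothetical.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: float  # lower-left corner, in um
    y: float
    w: float  # width, in um
    h: float  # height, in um

def half_perimeter(a: Rect, b: Rect) -> float:
    """Half perimeter of the smallest rectangle enclosing both units."""
    x_lo = min(a.x, b.x)
    y_lo = min(a.y, b.y)
    x_hi = max(a.x + a.w, b.x + b.w)
    y_hi = max(a.y + a.h, b.y + b.h)
    return (x_hi - x_lo) + (y_hi - y_lo)

def elmore_delay(length: float, r: float, c: float) -> float:
    """Assumed distributed-RC Elmore delay: 0.5 * r * c * l^2."""
    return 0.5 * r * c * length ** 2

alu = Rect(0, 0, 100, 100)
reg = Rect(300, 200, 50, 50)
length = half_perimeter(alu, reg)  # bounding box is 350 um by 250 um
delay = elmore_delay(length, r=0.075, c=0.118e-15)  # ohm/um, F/um -> seconds
print(length, delay)
```

Because the delay grows with the square of the length, moving an operation to a unit closer to its registers pays off most for the longest wires.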
In the high-level synthesis phase before floorplanning, no information about the interconnect delay is available, so the scheduling and allocation task can only proceed based on a delay-less interconnect model. Re-synthesis after floorplanning can achieve an optimization based on an accurate estimation of the interconnect delay. To avoid the usual problem of destroying the floorplan performance in the re-synthesis phase, a force-balance-based interconnect delay driven re-synthesis algorithm (FIDER) is used, as described in the following.
Basic idea of FIDER
In this paper, the behavior of the circuit is represented by a control-data flow graph (CDFG) [9]. A two-dimensional grid, first introduced by Jang and Barry [4], is used to represent the scheduling and allocation result, as shown in Fig. 1. The rows of the grid represent the control steps, while the columns represent the functional units used in the circuit. The result of scheduling and allocation can thus be considered as a placement on the two-dimensional grid (Fig. 1). In Fig. 1a, Operation B is placed in Row 2 and Column 2, which means that Operation B is scheduled to Step 2 and allocated to Unit 2. If Operation B is moved to Row 2 and Column 1, as shown in Fig. 1b, then Operation B is reallocated to Unit 1.
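One simple way to hold such a grid in a program is a mapping from each operation to its (control step, functional unit) cell. The Python sketch below uses this assumed representation to replay the move of Operation B from Unit 2 to Unit 1; the `reallocate` helper is hypothetical and only checks that the target cell is empty.

```python
from typing import Dict, Tuple

Grid = Dict[str, Tuple[int, int]]  # op name -> (control step, functional unit)

def reallocate(grid: Grid, op: str, new_unit: int) -> Grid:
    """Return a copy of the grid with op moved to new_unit in the same step."""
    step, _ = grid[op]
    occupied = any(cell == (step, new_unit)
                   for name, cell in grid.items() if name != op)
    if occupied:
        raise ValueError(f"unit {new_unit} is busy in step {step}")
    moved = dict(grid)
    moved[op] = (step, new_unit)
    return moved

grid = {"A": (1, 1), "B": (2, 2), "C": (3, 1)}
after = reallocate(grid, "B", 1)  # B stays in step 2, moves from unit 2 to unit 1
print(after["B"])
```

Returning a copy rather than editing in place keeps the earlier solution available, which is convenient when a move has to be undone.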
As described before, the main objective of our approach is to reduce the maximum delay of each control step. The main idea of the re-synthesis procedure is presented in Fig. 2. The initial result of the synthesis procedure is shown in Fig. 2a, and the floorplan generated from this result is shown in Fig. 2f. From Fig. 2f, it can be seen that Operation A is allocated to functional unit $f_2$, which is too far from reg4 to satisfy the delay constraint (assume that $f_3$ is the best position for Operation A). The re-synthesis then proceeds as follows.
(1) The operation allocated to $f_3$ (Operation C in this case) is taken out of the grid. Then, Operation A is reallocated to $f_3$, as shown in Fig. 2b.
(2) Operation C is allocated to another functional unit. If it is possible to allow Operation C to be executed in more control steps, then it is rescheduled, as shown in Fig. 2c.
(3) The grid is perturbed by rescheduling some operations under some constraints, as shown in Fig. 2d.
(4) These procedures are repeated to find a better solution.
In this way, we can find an improved solution without changing the floorplan result.
Reallocation after floorplan
For each op in the grid, the local path set $S$ of this operation can be defined as
$$S = \langle f, R \rangle \qquad (7)$$
In Eq. (7), $f$ is the functional unit to which op is allocated, and $R$ is the set of registers which have a direct data flow with op, as shown in Fig. 3. A tuple $\langle f, r \rangle$ denotes a wire between functional unit $f$ and register $r$. The best functional unit for an operation is obviously the one for which the interconnect wire length is minimized. However, minimizing the interconnect wire length of one operation may increase the wire length of other interconnects. The main objective of reallocation is, therefore, to reduce the length of the interconnect wires of all operations impartially.
For each op in the grid, assume that the operation is allocated to functional unit $f$. A virtual force acting on op is then calculated for this allocation. The best position for op on the chip is where the virtual force acting on op is minimized, and the best situation for the circuit is that all its operations are located at their best or nearly best positions. In the grid representation, the best column for op in the grid means the best functional unit for op on the chip. The reallocation algorithm therefore tries to move each operation to the column where this force is smallest.
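The exact force formula is not reproduced in this text, so the following Python sketch substitutes a simple assumed model: every register in the local path set pulls the operation with a force proportional to its displacement from the candidate functional unit, and the unit with the smallest net force is preferred. The function names, the coordinates, and the force model itself are illustrative assumptions rather than the definition used by FIDER.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def virtual_force(unit_pos: Point, reg_pos: List[Point]) -> float:
    """Magnitude of the summed pull of the connected registers (assumed model)."""
    fx = sum(rx - unit_pos[0] for rx, _ in reg_pos)
    fy = sum(ry - unit_pos[1] for _, ry in reg_pos)
    return math.hypot(fx, fy)

def best_unit(units: Dict[str, Point], reg_pos: List[Point]) -> str:
    """Pick the functional unit whose position minimizes the virtual force."""
    return min(units, key=lambda name: virtual_force(units[name], reg_pos))

# Fixed floorplan positions (hypothetical) and the registers connected to op.
units = {"f1": (0.0, 0.0), "f2": (10.0, 0.0), "f3": (4.0, 3.0)}
regs = [(3.0, 2.0), (5.0, 4.0)]
print(best_unit(units, regs))
```

Under this model the force vanishes at the centroid of the connected registers, so the unit nearest to that balance point is chosen. The floorplan positions are only read, never changed, which matches the requirement that the floorplan stays invariant.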
Rescheduling after reallocation
The lifetime of variables must be considered when a rescheduling procedure is carried out. In our approach, another two-dimensional grid is used to represent the scheduling and allocation of variables. As shown in Fig. 4, the columns of the grid stand for the registers, while the rows of the grid stand for the lifetime of the variables. Variable b is scheduled in Steps 2 and 3, which means that the variable is produced at the end of Step 1, must be stored in a register at the very beginning of Step 2, and must be held until Step 3 is finished.
Fig. 4 A two-dimensional grid representing the allocation of registers
To distinguish these two grids in the text that follows, the grid shown in Fig. 1 is called a module-grid, and the grid shown in Fig. 4 is called a register-grid.
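A register-grid can be kept as a mapping from each variable to its register and lifetime, which makes it easy to ask whether a register is free in a given step. The Python sketch below is a minimal illustration under that assumed layout; the variable names and the `register_free` helper are hypothetical.

```python
from typing import Dict, Tuple

# variable -> (register, first step held, last step held); illustrative layout
RegisterGrid = Dict[str, Tuple[int, int, int]]

def register_free(grid: RegisterGrid, reg: int, step: int) -> bool:
    """True if no variable occupies register `reg` during `step`."""
    return not any(r == reg and first <= step <= last
                   for r, first, last in grid.values())

# Variable b lives in register 2 during steps 2 and 3.
register_grid = {"a": (1, 1, 2), "b": (2, 2, 3)}
print(register_free(register_grid, reg=2, step=3))
print(register_free(register_grid, reg=2, step=4))
```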
As shown in Eq. (4), increasing k can reduce the average delay of each step. For example, if an operation o is scheduled to begin execution at step i and finish in the same step, then rescheduling this operation to begin at step i and finish in step i+1 will be helpful to reduce t i in Eq. (3). This rescheduling is not certain to reduce the maximum delay of each step, but will increase the probability of reducing the maximum delay. However, the inputs of each operation should not be changed until the operation is finished. The constraint corresponding to this rescheduling approach can be described as follows.
Assume an operation o is scheduled to execute from step i to step j.
(1) The operation o can be rescheduled to execute from step i−1 to step j if all its preceding operations are finished before step i−1 (it is assumed that the inputs of each operation are stored in registers as soon as they are produced and are held until the operation is finished).
(2) The operation o can be rescheduled to execute from step i to step j+1 if (i) all its succeeding operations are scheduled to begin after step j+1, and (ii) no register in its local step set is occupied in step j+1, i.e., the input registers of this operation must not be occupied by other operations in step j+1.
This kind of rescheduling is called an expansion of operations. The reallocation and expansion procedures should be called alternately to find a good solution. Note that whenever an expansion procedure is called, the solution before expansion should be backed up and then restored before another reallocation procedure is called. Otherwise, the expanded operation will be stuck in its expanded form and will not be able to contribute further to the reduction of the total circuit delay.
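The two expansion rules can be read as legality checks on a single operation. The Python sketch below encodes them under a few assumptions: steps are numbered from 0, register occupancy is given as a set of (register, step) pairs taken by other variables, and the `Op` record and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Op:
    start: int  # first control step (i)
    end: int    # last control step (j)
    preds: List[str] = field(default_factory=list)
    succs: List[str] = field(default_factory=list)
    in_regs: List[int] = field(default_factory=list)

def can_expand_up(ops: Dict[str, Op], name: str) -> bool:
    """Rule (1): start at i-1 if every predecessor finishes before step i-1."""
    op = ops[name]
    return op.start > 0 and all(ops[p].end < op.start - 1 for p in op.preds)

def can_expand_down(ops: Dict[str, Op], name: str,
                    busy: Set[Tuple[int, int]], last_step: int) -> bool:
    """Rule (2): end at j+1 if successors start later and input registers are free."""
    op = ops[name]
    nxt = op.end + 1
    return (nxt <= last_step
            and all(ops[s].start > nxt for s in op.succs)
            and all((r, nxt) not in busy for r in op.in_regs))

ops = {
    "p": Op(0, 0, succs=["o"]),
    "o": Op(2, 2, preds=["p"], succs=["q"], in_regs=[1]),
    "q": Op(4, 4, preds=["o"]),
}
print(can_expand_up(ops, "o"), can_expand_down(ops, "o", set(), last_step=4))
```

Since the checks do not modify `ops`, a caller can copy the schedule (for example with `copy.deepcopy`) before applying an expansion and restore the copy before the next reallocation pass.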
Improvement Procedure
The improvement procedure contains two phases, grid perturbation and a simulated annealing approach.
Perturbation of the module-grid
The simulated annealing algorithm is widely used in circuit optimization. However, the expansion procedure described previously is not suitable for the simulated annealing method. Moreover, adapting the reallocation procedure to a simulated annealing approach alone would limit the effect of the optimization. To solve this problem, we perturb the module-grid at various times by rescheduling some operations to different steps. In this way, the search space of the simulated annealing approach is greatly expanded.
However, the perturbations must be carried out under some constraints. The approach is very similar to the expansion of operations described above. The perturbations can be divided into two kinds.
Assume an operation op is scheduled to execute from step i to step j.
(1) The operation op can be rescheduled to execute from step i−1 to step j−1 if all its preceding operations are finished before step i−1.
(2) The operation op can be rescheduled to execute from step i+1 to step j+1 if (i) all its succeeding operations are scheduled to begin after step j+1 and (ii) no register in its local step set is occupied in step j+1.
The disturbance algorithm applies these two kinds of moves to operations in the module-grid whenever the corresponding constraint holds.
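As a rough illustration of the two shift rules, the Python sketch below collects every legal one-step shift and applies one of them at random. It is a simplified stand-in, not the disturbance algorithm of this paper: it assumes steps are numbered from 0, takes register occupancy as a set of (register, step) pairs, and ignores conflicts on functional units. All names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Op:
    start: int
    end: int
    preds: List[str] = field(default_factory=list)
    succs: List[str] = field(default_factory=list)
    in_regs: List[int] = field(default_factory=list)

def legal_shifts(ops: Dict[str, Op], name: str,
                 busy: Set[Tuple[int, int]], last_step: int) -> List[int]:
    """Return the subset of [-1, +1] by which operation `name` may be shifted."""
    op = ops[name]
    shifts = []
    # Rule (1): shift one step earlier.
    if op.start > 0 and all(ops[p].end < op.start - 1 for p in op.preds):
        shifts.append(-1)
    # Rule (2): shift one step later.
    nxt = op.end + 1
    if (nxt <= last_step
            and all(ops[s].start > nxt for s in op.succs)
            and all((r, nxt) not in busy for r in op.in_regs)):
        shifts.append(+1)
    return shifts

def perturb(ops: Dict[str, Op], busy: Set[Tuple[int, int]],
            last_step: int, rng: random.Random) -> bool:
    """Shift one randomly chosen movable operation by one step, in place."""
    moves = [(n, d) for n in ops for d in legal_shifts(ops, n, busy, last_step)]
    if not moves:
        return False
    name, delta = rng.choice(moves)
    ops[name].start += delta
    ops[name].end += delta
    return True

ops = {
    "p": Op(0, 0, succs=["o"]),
    "o": Op(2, 2, preds=["p"], succs=["q"]),
    "q": Op(4, 4, preds=["o"]),
}
print(legal_shifts(ops, "o", set(), last_step=4))
```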
Simulated annealing approach of FIDER
A simulated annealing approach is used to call the reallocation and rescheduling procedures iteratively. The main objective of this simulated annealing approach is to minimize t in Eq. (2). 
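A generic simulated annealing loop of the kind referred to here can be sketched as follows. This is a textbook skeleton, not the FIDER procedure itself: the cooling schedule, the parameter values, and the function names are assumptions. In the setting of this paper, `cost` would return $t$ from Eq. (2) and `neighbor` would apply a reallocation, expansion, or perturbation move.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")

def anneal(state: S,
           cost: Callable[[S], float],
           neighbor: Callable[[S, random.Random], S],
           t_start: float = 1.0, t_end: float = 1e-3, alpha: float = 0.95,
           moves_per_temp: int = 50, seed: int = 0) -> S:
    """Return the best state seen while cooling from t_start to t_end."""
    rng = random.Random(seed)
    cur, cur_cost = state, cost(state)
    best, best_cost = cur, cur_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            cand = neighbor(cur, rng)
            delta = cost(cand) - cur_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cur, cur_cost = cand, cur_cost + delta
                if cur_cost < best_cost:
                    best, best_cost = cur, cur_cost
        temp *= alpha  # geometric cooling (assumed schedule)
    return best

# Toy stand-in for a schedule: an integer whose cost is minimized at 7.
def toy_cost(x: int) -> float:
    return float((x - 7) ** 2)

def toy_neighbor(x: int, rng: random.Random) -> int:
    return x + rng.choice([-1, 1])

result = anneal(20, toy_cost, toy_neighbor)
print(result, toy_cost(result))
```

Because the best state is tracked separately from the current one, the returned solution is never worse than the initial synthesis result.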
Experimental Results
We have implemented the reallocation and rescheduling algorithms in the C++ programming language and executed the program on a SUN UltraSPARC V880 workstation. The parameters used are: the wire resistance per unit length, r, is 0.075 Ω/µm; the wire capacitance per unit length, c, is 0.118 fF/µm.
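To give a sense of scale for these parameters, the following worked example evaluates the wire delay for two hypothetical wire lengths. It assumes the common distributed-RC form of the Elmore delay, $\frac{1}{2} r c l^2$; the lengths of 1 mm and 10 mm are chosen purely for illustration and do not come from the benchmarks.

$$
\begin{aligned}
l = 1000\ \mu\text{m}: \quad & \tfrac{1}{2} \times 0.075\ \tfrac{\Omega}{\mu\text{m}} \times 0.118\ \tfrac{\text{fF}}{\mu\text{m}} \times (1000\ \mu\text{m})^2 = 4425\ \Omega\cdot\text{fF} \approx 4.4\ \text{ps} \\
l = 10000\ \mu\text{m}: \quad & \tfrac{1}{2} \times 0.075 \times 0.118 \times (10^4)^2\ \Omega\cdot\text{fF} = 442.5\ \text{ps}
\end{aligned}
$$

Under this assumption a tenfold increase in length raises the delay a hundredfold, so only the longest wires contribute a noticeable share of a control step that lasts 1 to 2 ns.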
We test the algorithm under two fabrication technologies (FT). The delays of the functional units for the different fabrication technologies are given in Table 1. The initial length of each control step is based on the delay of the adders. As shown in Table 1, for FT = 0.25 µm the adder delay is 2 ns, so the initial length of each control step is 2 ns. Similarly, the initial length of each control step is 1 ns when FT = 0.18 µm.
We use behavioral descriptions of the Fir11, Iir7, and Ellipf filters in VHDL to check the algorithms. The input VHDL files are first compiled into a data flow graph, and then a list procedure is used to produce the initial scheduling and allocation solution. An interconnect-aware register allocation algorithm is used to allocate variables to the registers. A corner block list (CBL)-based floorplan algorithm [10] is used to obtain the floorplan. The constraints in the experiments are the maximum delay and the maximum area of the chip. The maximum number of each kind of functional unit can be specified. If it is not specified for a certain kind of functional unit, the system uses the largest number available under the maximum allowed area.
The basic information for these benchmarks is presented in Table 2. The experimental results are shown in Table 3. In the best case, the maximum delay of each control step is improved by up to 22.19%. The resulting floorplans of Ellipf in the different fabrication technologies are shown in Fig. 5.
The experimental results show that the effect of reallocation and rescheduling strongly depends on the initial synthesis and floorplan, and that a significant benefit can be obtained by reducing the interconnect delay of the circuit.
Conclusions
In this paper, a force-balance-based re-synthesis algorithm is presented. The main objective is to reduce the total delay of a circuit by reducing its interconnect delay. The algorithm can be used to optimize the result of allocation and scheduling after the floorplan is made, or to quickly adjust the synthesis results based on a temporary floorplan when synthesis and floorplanning are integrated. However, at present the proposed algorithm only deals with the optimization of the data-path interconnect. We plan to integrate interconnect delay optimization of the controller into this algorithm as well. Additionally, the optimization of pipeline architectures should be integrated into the algorithm, and a more accurate delay model should be used to test it.
