An incremental timing-driven placement algorithm is presented. We introduce a fast path-based analytical approach for timing improvement. Our method achieves timing optimization by reducing the enclosing bounding boxes of selected nets on critical paths. Furthermore, this technique tries to minimize modifications to the initial placement while improving the delay of the circuit incrementally. The two contributions of this work are 1) efficient conversion of a path-based timing minimization problem to a geometric net-constraint problem and 2) minimal modification of a placement to improve timing. Our technique can take an initial placement from any algorithm and improve timing iteratively. The experiments show that the proposed approach is very efficient.
Introduction
Timing-driven placement is an important step in the physical design of integrated circuits. In high-speed circuits, a significant portion of timing minimization is performed in the placement stage in the form of wire length minimization, followed by further optimization techniques such as buffer insertion either during placement or routing. In today's complex designs, various qualities such as delay, power consumption, heat profile and area are optimized in an intricate process that must strike a balance between these metrics. Multiple iterations might be needed for timing closure. Given the complexity of the designs and tight time-to-market constraints, it is highly desirable to have a placement algorithm that 1) needs fewer iterations of the whole optimization cycle, and 2) in minimizing a given metric such as delay, disturbs the current placement minimally so that other metrics such as the heat profile of the design do not change dramatically. For these reasons, incremental placement methods that focus on the most critical paths in the design would be very useful for design convergence.
Timing optimization during placement has been an important research topic, especially since we entered the deep sub-micron era. Research efforts on this problem can be classified into two categories: path-based and net-based (edge-based) approaches. Path-based techniques enumerate some critical paths and try to optimize them, hence decreasing the overall delay of the circuit. Edge-based methods assign each edge a "criticality" number that indicates how much the edge contributes to the delay of the circuit, based on the paths that pass through it. Path-based methods can suffer from large runtimes because of the potentially exponential number of paths in a circuit. On the other hand, edge-based methods lose the big picture: they no longer see paths, but only edges. Hence, their efforts to minimize the wire lengths of critical edges might not be as well coordinated as those of path-based methods.
Among works on timing-driven placement, [1] puts higher weights on timing-critical nets in the framework of force-directed placement. The difficulty with this approach is determining how much the weights should be increased and which particular nets along the timing path should be weighted more, i.e., the slack assignment problem. Delay from interconnection can be estimated by the size of the net bounding box during placement.
Direct control of the size of the bounding box was presented in [2][3]. These works used linear programming to ensure that all nets meet their bounding box constraints. The work in [4] shows a post-routing timing optimization technique with long runtimes. We present a timing optimization technique applied during placement that uses "net constraints" to reduce circuit delay.
"Net constraint" is the maximum size of the bounding box of a net.
The net constraint generation process should consider the path. If a critical path consists of only a few net segments, a tighter constraint should be put on each individual net on the path than in a case in which the path consists of many edges. In the latter case, small contractions of the nets can add up to a significant delay decrease, a luxury that the former case lacks. In the past, heuristic methods (or user input) have been used to decide what constraints to put on different nets. The quality of net constraint generation is crucial to the effectiveness of net-constraint-based timing-driven placement [3]. Here, we propose analytic formulations that generate net constraints to reduce circuit delay in such a way that placement disturbance is minimal.
Problem Formulation
The delay of a net in the placement step is usually estimated by the half perimeter of the net's bounding box, which is the smallest rectangle enclosing all pins connected by the net. The maximum delay on any path from a primary input pin (or memory element output) to a primary output pin (or a memory element input) determines the delay of the circuit. If the delay is more than the target delay of the design, the placement is invalid and should be further optimized.
If the initial placement misses the target delay goal by a small amount, it has a high chance of meeting the timing goal through incremental changes. Figure 1 shows a simple illustration of our proposed flow. The input to our flow is the placement generated by any placement method (e.g., Dragon). Then critical paths are identified as in part (a); next, the bounding boxes are contracted as in part (b), and overlaps are removed as in part (c). Finally, the new placement with better timing is obtained as in part (d). Figure 1 illustrates one iteration (a-d) of our iterative algorithm.
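The half-perimeter estimate can be made concrete with a few lines of Python. This is a minimal sketch, not the delay model used in the experiments: the function names and the `delay_per_unit` constant are hypothetical, gate delays are ignored, and wire delay is assumed to grow linearly with the half perimeter.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def bounding_box(pins: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the smallest rectangle enclosing all pins."""
    pins = list(pins)
    if not pins:
        raise ValueError("a net needs at least one pin")
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return min(xs), min(ys), max(xs), max(ys)


def half_perimeter(pins: Iterable[Point]) -> float:
    """Half perimeter (width + height) of the net's bounding box."""
    x_min, y_min, x_max, y_max = bounding_box(pins)
    return (x_max - x_min) + (y_max - y_min)


def path_delay(nets_on_path: List[List[Point]], delay_per_unit: float = 1.0) -> float:
    """Estimated wire delay of a path: sum of per-net half perimeters.

    Assumption: delay is linear in the half perimeter and gate delays are ignored.
    """
    return delay_per_unit * sum(half_perimeter(pins) for pins in nets_on_path)
```

For example, a net with pins at (0, 0), (3, 4) and (1, 1) has a bounding box of width 3 and height 4, so its half perimeter is 7.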
To improve timing, bounding boxes on the critical paths should be reduced. The number of paths could increase exponentially as the circuit size grows. So, to keep the problem size manageable, we use an intermediate target timing, consider a certain number of critical paths, and optimize the placement for these paths. The target timing can be achieved by iteratively reducing the incremental target timing. Another benefit of this iterative approach is that we can handle critical paths dynamically. If only the initially critical paths (i.e., those currently violating the delay bound) are considered during the whole optimization process, other non-critical paths tend to become critical as we move cells around and change the bounding boxes of other nets. An iteratively incremental approach can handle those new critical paths in the following iterations.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICCAD'03, November 11-13, 2003, San Jose, California, USA.
Two different approaches to timing optimization are given here. The first approach uses net bounding boxes as intermediate variables in the timing optimization process: path negative slacks are translated to net contractions in one linear programming formulation, followed by another problem that achieves the net contractions by moving cells. The second approach uses a single formulation to move cells directly in order to meet timing constraints while minimizing cell movements. Even though the second approach is more intuitive, it results in far worse runtimes and is presented in the Appendix. Sections 3 and 4 describe the first approach.
The details of our approach are presented in subsequent sections in the following order. Section 3 shows how to identify nets whose bounding boxes are to be reduced. In Section 4, we show how cells are moved to other places with minimum impact on the initial placement. The final step of the proposed technique is legalization and is explained in Section 5. Experimental results are in Section 6, followed by concluding remarks in Section 7.
Net Constraint Generation
In this section, we describe our incremental timing optimization technique. Our method is two-tier. In the first step, we try to distribute "net constraints" on net bounding boxes to reduce delay on critical paths. The next section describes how we translate bounding box changes into individual cell movements. Distributing the negative slack of paths to individual nets has to be done carefully. If many critical paths share some nets, timing on those critical paths can be improved efficiently by reducing the bounding boxes of the shared nets. This strategy aims to minimize the perturbation to the placement. The amount of reduction of the bounding boxes can be determined analytically using linear programming.
For a placement with delay $d_{initial}$ and an incremental target timing $d_{tgt}$ ($= d_{initial} - \Delta t$), all paths that violate the incremental target timing are identified. If too many paths violate $d_{tgt}$, only K paths are used and $d_{tgt}$ is adjusted accordingly (in the experiment, K is set to 2000). For a path $path_i$ with path delay $d_i$, the amount of delay that should be removed along $path_i$ is $(d_i - d_{tgt})$. The objective of the linear program is to minimize the net changes in order to minimize the impact on the placement. In the following expressions, $\Delta d_{ij}$ ($\ge 0$) is the decrease of wire delay on the jth net on $path_i$. Consider the k ($\le K$) most critical paths, whose delays are more than $d_{tgt}$. To achieve the target timing, the constraints in (3) should be met. In (4), we construct the objective function in such a way that each critical net appears only once, which is indicated with (*). As a result, this objective function gives higher priority to reducing the bounding boxes of the shared nets, because they appear in more of the inequalities in (3).
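Read literally, the description above suggests a linear program of roughly the following shape. This is a sketch under assumptions, not necessarily the exact form of (3) and (4): $d_i$ denotes the delay of $path_i$, $d_{tgt}$ the incremental target timing, $N_c$ (an assumed symbol) the set of distinct nets on the k selected paths, and $\Delta d_n$ the delay decrease of net $n$, shared by every path that contains $n$.

```latex
% Sketch only: N_c and \Delta d_n are assumed notation, not taken from the paper.
\begin{align*}
  \text{minimize}   \quad & \sum_{n \in N_c} \Delta d_n
      && \text{(each critical net counted once)} \\
  \text{subject to} \quad & \sum_{n \in path_i} \Delta d_n \;\ge\; d_i - d_{tgt},
      && i = 1, \dots, k, \\
                          & \Delta d_n \ge 0,
      && n \in N_c .
\end{align*}
```

Under this reading, a net shared by two violating paths helps satisfy two inequalities while being charged only once in the objective. If both paths must each lose one unit of delay, shrinking the shared net by one unit costs 1, whereas shrinking one private net on each path costs 2, so the solver prefers the shared net.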
To minimize the changes to the placement, each $\Delta B_x$ and $\Delta B_y$ is bounded to small values, as follows.
$\Delta B_x \le p \cdot B_x, \quad \Delta B_y \le p \cdot B_y$ (5)
$B_x$ and $B_y$ are the dimensions of the bounding box of net j of $path_i$ in the current placement. To decide the minimum p in (5), we try to solve the linear program using (3), (4) and (5) with a small p. If no feasible solution is obtained, we increase p by a small amount (in the experiment, 5%). Intuitively, large $\Delta B_x$ and $\Delta B_y$ values are likely to change the current placement significantly and may cause more non-critical paths to become critical. Hence we opt for a small p. Because all $\Delta$'s are forced to be non-negative reductions, no net bounding box on a critical path will be expanded. The output of the linear program is the set of $\Delta B_x$ and $\Delta B_y$ values that have to be enforced to meet the incremental timing goal.
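The search for a small feasible p can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: all names are hypothetical, delay reduction is assumed to be proportional to bounding box reduction (`delay_per_unit`), one size per net stands in for both dimensions, and the 5% increment is read as an additive step of 0.05. The sketch only tests feasibility; because every reduction enters the path constraints with a non-negative coefficient, the constraints can be met for a given p exactly when they are met with every net reduced by its maximum $p \cdot B$. The linear program itself would still be needed to choose the actual reductions.

```python
from typing import Dict, List, Optional


def feasible_for(paths: List[List[str]], box_size: Dict[str, float],
                 required: List[float], p: float,
                 delay_per_unit: float = 1.0) -> bool:
    """Check whether every path can shed its required delay when each net
    may shrink by at most p * box_size[net].

    With 0 <= reduction <= p * size and constraints of the form
    sum(reductions on path) >= required, the system is feasible iff it is
    satisfied when all reductions sit at their upper bounds.
    """
    for nets, need in zip(paths, required):
        best_case = delay_per_unit * sum(p * box_size[n] for n in nets)
        if best_case < need:
            return False
    return True


def smallest_feasible_p(paths: List[List[str]], box_size: Dict[str, float],
                        required: List[float], p_start: float = 0.05,
                        p_step: float = 0.05, p_max: float = 1.0) -> Optional[float]:
    """Start from a small p and raise it step by step until the bounds allow
    a feasible solution; return None if even p_max is not enough."""
    steps = 0
    while True:
        p = p_start + steps * p_step
        if p > p_max + 1e-12:  # small tolerance for accumulated rounding
            return None
        if feasible_for(paths, box_size, required, p):
            return p
        steps += 1
```

With two paths `["a", "b"]` and `["b", "c"]`, box sizes 10, 20 and 10, and required reductions of 4.0 and 2.0, the first path can shed at most 1.5 at p = 0.05 and 3.0 at p = 0.10, so the search stops at p = 0.15.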
Contraction of the Critical Nets with Minimal Placement Changes
Using these tighter bounding boxes from the linear program in Section 3, cells are moved to other positions to meet the net constraints (i.e., bounding box changes are translated into cell movements). The cells should be moved with minimum perturbation to the current placement. This can be modeled as an optimization problem whose objective is minimum placement perturbation, subject to the constraint that every pair of bounding boxes connected through shared cells remains linked. To illustrate this idea, consider the example of Figure 2, in which three nets (net1, net2 and net3) share some of their cells; one net connects three cells and the other two connect four cells each. Assume that the solution of the linear program from Section 3 requires the horizontal length of the bounding box of one critical net to be reduced by 30%, with no reductions on the other two nets. The solution would be something like part (b) in Figure 2. In this example, the critical net overlaps each of the other two nets. The solution would require only two cells to move toward the center of the contracted net. But if the critical net is to be reduced by 60% in the horizontal direction, the net contraction will result in part (c) in Figure 2. In this case the centers of the other two nets will also move, as a result of displacing three cells.
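The effect of a contraction on a single net can be illustrated in one dimension. The sketch below is a simplified stand-in, not the perturbation-minimizing formulation described above: it shrinks the horizontal extent of one net about its center and pulls in only the cells that fall outside the new extent, ignoring the other nets those cells belong to. The function name and the cell names in the usage note are hypothetical.

```python
from typing import Dict


def contract_horizontally(xs: Dict[str, float], fraction: float) -> Dict[str, float]:
    """Shrink a net's horizontal extent by `fraction` about its center.

    Cells outside the new interval are clamped to its nearest boundary;
    cells already inside keep their position, so only boundary cells move.
    """
    if not xs:
        raise ValueError("a net needs at least one cell")
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    lo, hi = min(xs.values()), max(xs.values())
    center = (lo + hi) / 2.0
    half_width = (hi - lo) * (1.0 - fraction) / 2.0
    new_lo, new_hi = center - half_width, center + half_width
    return {cell: min(max(x, new_lo), new_hi) for cell, x in xs.items()}
```

For a net whose cells sit at x = 0, 4 and 10, a 30% contraction narrows the extent from 10 to 7 around the center x = 5, so the two outer cells move to 1.5 and 8.5 while the inner cell stays at 4. A larger contraction would start moving the inner cell as well, which is when neighboring nets that share it are disturbed.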
This technique reduces those critical nets at the expense of other non-critical nets. Once cells are moved to other places using the above technique, some overlaps are created. Hence, the net contraction phase should be followed by an overlap-resolving step.
Overlap removal
The overlap removal method is fast because the only computation involved is sorting the cells by their coordinates. Sorted by their horizontal (vertical) coordinates, cells are assigned to two child bins, which are created by a vertical (horizontal) cut of the placement area. This is followed by the same bisection placement in alternating horizontal/vertical directions until all overlaps are removed. In our experiment, white space was actively utilized during this process to keep the total wire length increase small. If the initial placement has uniformly distributed white space of 10%, we relax the minimum white space of each child region to 5%.
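The sort-and-bisect idea can be sketched as a short recursion. This is a simplified illustration and not the legalizer used in the experiments: cells are treated as points of equal size, rows and white space budgets are ignored, each cut splits the region in proportion to the number of cells on each side, and every cell ends at the center of its final bin. The names `spread`, `Point` and `Region` are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def spread(cells: Dict[str, Point], region: Region,
           vertical_cut: bool = True) -> Dict[str, Point]:
    """Recursively bisect `region`, alternating cut directions, until each
    bin holds one cell; return the new position of every cell."""
    if not cells:
        return {}
    x0, y0, x1, y1 = region
    if len(cells) == 1:
        name = next(iter(cells))
        return {name: ((x0 + x1) / 2.0, (y0 + y1) / 2.0)}

    # A vertical cut separates cells by x; a horizontal cut separates by y.
    axis = 0 if vertical_cut else 1
    order = sorted(cells, key=lambda n: (cells[n][axis], n))  # name breaks ties
    half = len(order) // 2
    first, second = order[:half], order[half:]
    share = len(first) / len(order)  # area share given to the first bin

    if vertical_cut:
        cut = x0 + (x1 - x0) * share
        r_first, r_second = (x0, y0, cut, y1), (cut, y0, x1, y1)
    else:
        cut = y0 + (y1 - y0) * share
        r_first, r_second = (x0, y0, x1, cut), (x0, cut, x1, y1)

    placed = spread({n: cells[n] for n in first}, r_first, not vertical_cut)
    placed.update(spread({n: cells[n] for n in second}, r_second, not vertical_cut))
    return placed
```

Starting from four cells in a 4 x 4 region, two of which sit on the same spot, the first vertical cut separates the left pair from the right pair and the following horizontal cuts separate each pair, so all four cells end at distinct bin centers. The only non-trivial work at each level is the sort, which matches the observation above about runtime.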
Experimental Results
We have implemented the proposed analytic approach for timing optimization and experimented on a Linux machine with an Intel Pentium 930 MHz processor and 512 MB of memory. We used lp_solve [5] to solve the linear program and OOQP [6] to solve the quadratic program. We ran our implementation on 4 test circuits from [7] and 4 from [8]. The same wire delay models as in [7] and [8] were used, respectively.
As input to our approach, we obtained an initial placement by running the placers of [7] and [8], followed by iterations of our proposed algorithm (either 5 iterations or until the wire length increase was more than 10%). We set the incremental target timing as follows, and the number of critical paths used was 2000.
$t_{tgt} = (t_{current} - t_{lower\,bound}) \times 0.8 + t_{lower\,bound}$
The results are shown in Table 1. As we continue iterating the proposed optimization, we observe a trade-off curve, i.e., the total wire length generally increases while circuit timing improves.
An example of the trade-off curve is shown in Figure 3. In the case of placements generated by Dragon, our approach resulted in more than a 10% increase in wire length in its first iteration, so only one iteration was used. Runtimes reported in the "our runtime" column are the total runtimes of all iterations.
Conclusion
In this paper, we proposed an analytic incremental placement technique to improve circuit timing. Our approach uses an optimization technique to minimize the perturbation on the placement. Experimental results show that the changes are small while achieving timing improvement in very short runtimes.
Experiments with the single-formulation approach of the Appendix showed that it gives slightly worse quality (for the same delay, the wire length increase is worse) and a very long runtime (2 minutes vs. 4 hours). This result is shown in Figure 3.
*: using a placer based on [7] on a 1.5 GHz Pentium CPU with 2 GB of memory

