In this paper we present a placement-intelligent resynthesis methodology and optimization algorithms to meet post-layout timing constraints while at the same time reducing the interconnect congestion. We begin with the synthesized design netlist after initial placement and make incremental modifications, taking placement into account, to generate a final netlist and placement that meets the delay constraints after place and route. The algorithms described have been implemented as part of a tool for placement-based resynthesis. The tool has been used with a number of pre-optimized designs from industry to obtain improvements in post-placement delays ranging from 13 to 22% with improved routability.
Introduction
The goal of the ASIC design process is to realize a design as a gate array or standard cell array which satisfies a set of user/technology defined constraints such as area, delay, maximum load/maximum fanout, and hold time. Conventional ASIC design methodologies typically consist of a synthesis phase wherein the design is synthesized, optimized, and mapped to a target library, followed by a placement and routing phase.
During traditional logic synthesis, circuits are optimized to meet user-defined timing and area constraints. The timing measures used during this stage of the process consist mainly of gate delays and rough approximations for interconnect delay, usually a fanout-based value, with the wiring parasitic capacitance related to the fanout of the net using a piecewise linear model. As illustrated in Fig. 1, such an approximation can be grossly inadequate to predict the real interconnect delay after layout. Cells X and Y both drive nets with 3 fanouts. However, it is easy to see that the actual loads are very different, since cell X drives a much longer net.
Work done while at Cadence Design Systems, Inc., 2655 Seely Road, San Jose, CA 95134
With today's submicron design processes the wiring parasitic becomes a dominant factor in the total delay of a timing path [1]. It is predicted that for a 0.5 micron process, the wiring can contribute as much as 60% of the total delay. When the wiring effect is dominant, traditional synthesis tools that use a fanout-based model may be optimizing a timing value which is significantly different from the actual post-layout value.
Another problem with traditional synthesis is in area estimation. Typically, the tools try to optimize the total gate area, and the interconnect area and the routability of the chip are not taken into account. As a result, although the total gate area of the synthesized netlist is quite small, it may not fit into the assigned die area after layout.
The problem of fusing layout considerations into the logic synthesis process has been addressed in a few papers in recent years [2], [3], [4], [5]. Abouzeid et al. [2] describe an algebraic factorization method based on lexicographic expressions of boolean functions to reduce both gate and wiring areas. It attempts to express the set of logic functions in a form that will result in layered structured cones with ordered input injection after technology mapping. Pedram and Bhat [3] describe a method in which a fast global placement is done of the logic network (decomposed into a canonical graph of 2-input NAND gates and inverters), and this placement information is used to guide the wiring estimation as well as the technology mapping. The placement information is updated dynamically so that the nodes processed later can use more accurate information.
This approach is extended in [4] to include the logic restructuring and the technology decomposition phases of logic synthesis. This is further extended to the post-mapping phase in [5]. Here, a fanout optimization algorithm based on alphabetic trees is presented that generates fanout trees free of internal edge crossings, thus improving routing area.
31st ACM/IEEE Design Automation Conference. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying it is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 0-89791-653-0/94/0006 $3.50
However, all these methods suffer from the fact that they work on layout measures that can be far removed from the actual layout. It is extremely difficult to accurately predict the post-layout interconnect lengths during the synthesis process. Realistic estimates of interconnect can be obtained only after actual placement is done. To obtain good results, synthesis tools must work on the post-layout results. Traditionally the approach has been to extract the interconnect parasitics after place and route and backannotate them to the synthesis tool. The synthesis tool then makes changes to the netlist and passes the changes to the layout tool. If these changes are large scale, major changes in the placement will result, rendering the backannotated values meaningless. Hence this approach will give rise to multiple iterations, with no guarantee that it will converge (see Fig. 2a). Partly due to this problem, permissible modifications to the netlist have so far been very restricted (usually limited to resizing of buffers). As wiring delays become more significant with higher congestion, the resynthesis process can alleviate many of the post-layout problems if it can take placement information into account.
Our Approach
In this paper we present a placement-intelligent resynthesis process that can improve post-layout timing while improving the interconnect congestion. Fig. 2 contrasts the conventional ASIC design flow with our proposed approach using placement-based synthesis.
[Figure 2: Conventional ASIC design flow (a) and the proposed flow using placement-based resynthesis (b)]
In our approach (Fig. 2b), after synthesis of the design is done, a placement is obtained. The synthesized netlist, along with this initial placement, becomes the input to the placement-based resynthesis tool. A placement-based timing analysis of the design is used to direct the resynthesis process, which attempts to improve delay, meet all design constraints, and alleviate congestion. The output from the resynthesis phase is a modified netlist and suggested locations for the newly added components. The incremental place and route tool derives a placement that conforms as much as possible to the suggested locations. Because the changes made by the resynthesis process are incremental in nature, the chances of converging to a placement close to the suggested locations are very high.
The techniques and algorithms described have been incorporated into a system for placement-based synthesis, and the results have been very encouraging. Our experiments show a 13-22% improvement in the post-layout delays as well as improved routability.
The rest of the paper is organized as follows. In Section 2 we present a scheme to do placement-based timing analysis along with an efficient method to compute the wiring delays. In Section 3 we describe some transformations and algorithms to do post-placement delay optimization and to improve routability. In Section 4 we describe how these algorithms have been incorporated into a complete system for doing placement-based resynthesis. We present some experimental results in Section 5 and concluding remarks in Section 6.
Placement-based Timing Analysis
The placement-based timing analyser computes the gate and wiring (interconnect) delay, calculates the minimum and maximum arrival/required times at all points in the network and determines the critical paths. The information from timing analysis is used to generate delay reports as well as to direct the optimization algorithms.
Our delay calculations are based on the following equation for the delay of a gate [1]:
$$D = D_I + D_L + D_W + D_S$$
Here $D_I$ is the intrinsic gate delay. $D_L$ is the load delay due to the loading at the output pin of the current gate, given by $D_L = R_{drive}(C_{pins} + C_{wire})$, where $R_{drive}$ is the drive resistance of the gate, $C_{pins}$ is the total pin capacitance, $C_{pins} = C_{out} + \sum_i C_i$, and $C_{wire}$ is the estimated wiring capacitance. $D_W$ is the interconnect or wiring RC delay. $D_S$ is the input slew delay due to slowly changing input signals, given by $D_S = S \cdot D_{Lprevious}$, where $S$ is the slew sensitivity factor of the current stage and $D_{Lprevious}$ is the load delay of the previous stage.
The estimates for the wiring parasitics (C wire ) are computed from the placement locations of the components as described in the next section.
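The delay model above can be illustrated with a minimal Python sketch. It assumes the gate delay is the plain sum of the intrinsic, load, wiring, and slew terms; the class name `Gate`, the field names, and the function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    intrinsic_delay: float    # intrinsic gate delay
    drive_resistance: float   # R_drive
    slew_sensitivity: float   # S, slew sensitivity factor of this stage


def load_delay(gate, c_out, fanin_caps, c_wire):
    """D_L = R_drive * (C_pins + C_wire), with C_pins = C_out + sum_i C_i."""
    c_pins = c_out + sum(fanin_caps)
    return gate.drive_resistance * (c_pins + c_wire)


def gate_delay(gate, c_out, fanin_caps, c_wire, wire_rc_delay, prev_load_delay):
    """Total stage delay, assumed here to be the sum of the four components."""
    d_l = load_delay(gate, c_out, fanin_caps, c_wire)
    d_s = gate.slew_sensitivity * prev_load_delay  # D_S = S * D_Lprevious
    return gate.intrinsic_delay + d_l + wire_rc_delay + d_s
```

In this sketch the only placement-dependent inputs are `c_wire` and `wire_rc_delay`, which is where placement-derived parasitics would enter the calculation.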
Wire load computation
In order to compute the parasitics for a net, it is assumed that the terminals are connected as a spanning tree.
Let $S$ be a set of terminals that are connected by a net. Let the coordinates of terminal $i$ be denoted by $(x_i, y_i)$. Let $d$ be the driving terminal of the net ($d \in S$) and let $C_i$ be the input capacitance of terminal $i$. $N$ and $P$ are two sets of terminals. $C_h$ and $C_v$ are capacitances per unit length for the horizontal and vertical routing layers respectively.
The procedures compute-wire-load and compute-node-load are used to perform the load computation. The downstream capacitance of each node of the spanning tree, which is the capacitance of the subtree of the spanning tree rooted at the node, is also computed and stored. In fact the downstream capacitance of a node is the total wire capacitance of a net connecting that node and all its children. These downstream capacitances will be used extensively during placement-based delay optimization as described in later sections.
The procedure compute-wire-load (see Fig. 3) also updates the wire load at each node in the spanning tree, and this is used by the subsequent procedure compute-node-load. We can then compute the total load $C_{Tv}$ (including the input capacitance of the node) for every node $v$ using the procedure compute-node-load (see Fig. 3). Here $C_{wire}(v)$ is the wire load at $v$ computed from compute-wire-load, and $C_{in}(v)$ is the input pin capacitance at $v$.
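The two-pass load computation can be sketched as follows. This is not the procedure of Fig. 3, which is not reproduced here; it is a minimal illustration under stated assumptions: the spanning tree is given as a hypothetical `tree` dictionary mapping each node to its children, each tree edge is routed as one horizontal and one vertical segment, and the total load at a node is taken to be the wire capacitance of its subtree plus the pin capacitances of all terminals in that subtree.

```python
def compute_wire_load(tree, coords, root, c_h, c_v):
    """Return {node: wire capacitance of the subtree rooted at node}."""
    wire = {}

    def visit(v):
        total = 0.0
        for child in tree.get(v, []):
            dx = abs(coords[v][0] - coords[child][0])
            dy = abs(coords[v][1] - coords[child][1])
            # Edge capacitance: horizontal run plus vertical run (assumed routing).
            total += c_h * dx + c_v * dy + visit(child)
        wire[v] = total
        return total

    visit(root)
    return wire


def compute_node_load(tree, root, wire, c_in):
    """Return {node: total downstream load, including the node's own pin}."""
    load = {}

    def pin_sum(v):
        # Sum of input pin capacitances in the subtree rooted at v.
        pins = c_in.get(v, 0.0) + sum(pin_sum(c) for c in tree.get(v, []))
        load[v] = wire[v] + pins
        return pins

    pin_sum(root)
    return load
```

Storing both dictionaries means that the load of any subnet rooted at an internal node can later be looked up in constant time, which is the property the optimization steps rely on.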
Placement-based delay optimization
Placement-based delay optimization is done by applying a number of transformations on the critical path of the circuit to reduce the critical path delay. A number of techniques have been described in the literature [6], [7], [8]. However, none of these methods explicitly take the placement into account. In our work we choose delay-reducing transformations which take the placement into account and behave in a manner that will only slightly perturb the initial placement while considerably improving the routability.
As a preliminary step to applying the transformations we strip off all the buffers and merge all redundant inverters introduced by the initial synthesis run. This will allow us to do a placement-based buffer insertion from scratch, taking true wire parasitics into account. One might wonder, if we are going to strip off all the buffers during resynthesis, why introduce them in the first place? We have found that by including buffers in the initial placement, we can minimize changes in the net area of the components in the netlist. This improves chances that the incremental placer will find a solution close to the suggested placement.
The transformations are described below in detail along with their placement and routing implications.
Fanout buffering
In this technique, the output of a cell driving a large number of fanouts is selected, and a buffer is inserted to buffer a portion of the fanouts to minimize the delay. The approach we use is based on the fast heuristic algorithm described in [8]. Fig. 4 illustrates the basic transformation. The fanouts moved over to the output of the inserted buffer are typically non-critical ones.
Figure 4: The fanout buffering transformation
As an additional requirement, a solution is desired that would improve the routability of the circuit. Keeping in mind that the wiring delay can be quite significant when compared to the gate delays, we formulate the solution to the problem as follows.
Consider a net consisting of a source node $p_0$ and a set of $n$ destination terminals $\{p_1, p_2, \ldots, p_n\}$. Let the location of each terminal $p_i$ be given by $(x_i, y_i)$. Let the required arrival time at terminal $p_i$ be given by $r_i$ and the actual arrival time be given by $a_i$. The problem is to find a buffer configuration for the net, with locations for the buffers, such that $\max_{i \in \{1, \ldots, n\}} \{a_i - r_i\}$ is minimized. Fig. 5a shows a net composed of a source terminal $p_0$ and destination terminals $p_1, p_2, \ldots, p_5$. The buffer configuration that meets the timing constraints may look like the one given in Fig. 5b. In this case two additional buffers are introduced near the locations $(x_3, y_3)$ and $(x_0, y_0)$.
Figure 5: Buffering the net to improve delay
We assume the terminals in the net are connected as a minimal spanning tree (MST). We restrict the locations of the candidate buffers to the internal vertices of the minimal spanning tree. Since any subtree of a minimal spanning tree is also a minimal spanning tree [9], any subnets generated by the fanout buffering will constitute a subtree of the original MST. The loads of such subnets can be precomputed and stored at the nodes as described in Section 2.1.
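The formulation above can be made concrete with a short sketch that builds a spanning tree over the terminals and evaluates the objective. Two assumptions are made that the text does not state: edge lengths are measured as Manhattan distances, and the tree is built with Prim's algorithm starting from the source terminal. The function names are hypothetical.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def prim_mst(coords, source=0):
    """Return {terminal: parent} for an MST over coords, rooted at source."""
    parent = {source: None}
    # best[v] = (distance to the tree, closest tree vertex) for v not yet in the tree
    best = {v: (manhattan(coords[source], coords[v]), source)
            for v in range(len(coords)) if v != source}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, p = best.pop(v)
        parent[v] = p
        for u in best:
            d = manhattan(coords[v], coords[u])
            if d < best[u][0]:
                best[u] = (d, v)
    return parent


def worst_violation(arrival, required):
    """Objective to minimize: max over destination terminals of a_i - r_i."""
    return max(a - r for a, r in zip(arrival, required))
```

For example, `prim_mst([(0, 0), (1, 0), (1, 2), (5, 0)])` attaches terminals 2 and 3 to terminal 1, so terminal 1 is the internal vertex that would be a candidate buffer location under the restriction described above.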
These assumptions make it possible to have a computationally efficient solution to the problem. As an added advantage, this approach does not introduce any additional routing for the net. In other words, the routing wire length is almost the same as that of a single minimum spanning net connecting the terminals together. Thus by the act of removing all the buffers initially and reinserting them in a near optimal fashion, we achieve considerable improvements in routability. This can be clearly seen in Fig. 11, which shows the buffered net before and after post-placement resynthesis.
The procedure (fanout-buffering-trans) to do the fanout buffering transformation is given in Fig. 6. A set of terminals, $M$, is partitioned into two groups $G$ and $H$, where $H$ is buffered and $G$ is not (see Fig. 4). $L$ is the component library and $buf$ ($\in L$) is the buffer that is inserted. Let $i$ be a terminal ($i \in M$) and let $s_i$ be the slack ($r_i - a_i$) at $i$. $T(i)$ is the minimal spanning subtree rooted at $i$ and $C_{Ti}$ is the load at $i$ (see Section 2.1). $R_d$ and $R_{buf}$ are the drive resistances of the driver term and the output term of the inserted buffer respectively. $C_{buf}$ is the input capacitance of the buffer and $D_{buf}$ the intrinsic delay through the buffer.
Figure 6: Procedure to do the fanout buffering transformation
The place and route tool will resolve the overlaps introduced by the addition of the cell with only minor perturbation of the cells in the local region.
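To make the roles of the quantities in the fanout buffering transformation concrete, here is a simplified sketch of how a partition into an unbuffered group and a buffered group might be scored. It is not the procedure of Fig. 6: it assumes a lumped load per terminal, ignores wiring RC delay and the spanning-subtree restriction, and tries only prefixes of the terminals ordered from least to most critical. All function and parameter names are hypothetical.

```python
def evaluate_split(slacks, loads, buffered, r_d, r_buf, c_buf, d_buf):
    """Worst slack after moving the terminals in `buffered` behind one buffer."""
    c_g = sum(c for t, c in loads.items() if t not in buffered)
    c_h = sum(loads[t] for t in buffered)
    old_delay = r_d * (c_g + c_h)          # driver sees every terminal
    new_g = r_d * (c_g + c_buf)            # driver sees G plus the buffer input
    new_h = new_g + d_buf + r_buf * c_h    # H terminals also pay for the buffer
    worst = float("inf")
    for t, s in slacks.items():
        delta = (new_h if t in buffered else new_g) - old_delay
        worst = min(worst, s - delta)      # later arrival reduces slack
    return worst


def best_buffer_split(slacks, loads, r_d, r_buf, c_buf, d_buf):
    """Greedy search: buffer the k least critical terminals, keep the best k."""
    order = sorted(slacks, key=slacks.get, reverse=True)
    best_h, best_worst = frozenset(), min(slacks.values())
    for k in range(1, len(order)):         # keep at least one terminal unbuffered
        h = frozenset(order[:k])
        w = evaluate_split(slacks, loads, h, r_d, r_buf, c_buf, d_buf)
        if w > best_worst:
            best_h, best_worst = h, w
    return best_h, best_worst
```

The design choice to move the least critical terminals first mirrors the observation that the fanouts placed behind the inserted buffer are typically non-critical ones; a placement-aware version would instead choose the buffered group as a subtree of the spanning tree and read its load from the stored downstream capacitances.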
Gate Resizing
In this technique, a cell is replaced with a cell in the library that is equivalent in functionality but having a better drive strength, intrinsic delay, load or other characteristic that makes it more appropriate [6] . This substitution is basically done in place, and any resulting overlaps would be minor and resolvable by the place and route tool.
Let P be the set of all the input terminals that are driven by the fanout of the driver gate d, and let Q be the set of all the input terminals that are driven by the fanins to the driver gate (see Fig. 7 ).
In the example shown, $P = \{p_1, p_2, \ldots, p_m\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$. It can be easily seen that replacing the gate $d$ with another gate will only affect the slacks at these two sets of terminals, and hence the slacks need only be recomputed at these terminals. The procedure for doing the gate resizing transformation is shown in Fig. 8. $G_d$ is the set of gates in the library that are equivalent in functionality to $d$ but with different delay characteristics.
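The idea of recomputing slacks only at the two affected terminal sets can be sketched as below. This is an illustrative simplification rather than the procedure of Fig. 8: it assumes a single fanin driver with resistance `r_fanin`, a fixed output load `c_load`, and a linear delay model; the `Cell` class and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    intrinsic: float   # intrinsic delay
    r_drive: float     # output drive resistance
    c_in: float        # input pin capacitance


def resize_gate(current, candidates, c_load, r_fanin, slacks_p, slacks_q):
    """Pick the functionally equivalent cell that maximizes the worst slack."""

    def worst_slack(cell):
        # Change in the delay through the gate itself.
        dp = (cell.intrinsic + cell.r_drive * c_load) - \
             (current.intrinsic + current.r_drive * c_load)
        # Change in the fanin stage delay caused by the new input capacitance.
        dq = r_fanin * (cell.c_in - current.c_in)
        # Fanout terminals see both changes; fanin-side terminals only dq.
        return min([s - dp - dq for s in slacks_p] +
                   [s - dq for s in slacks_q])

    best = max([current] + list(candidates), key=worst_slack)
    return best, worst_slack(best)
```

The sketch shows the trade-off that makes resizing non-trivial: a stronger cell speeds up its own output but usually presents a larger input capacitance, which slows every terminal driven by the fanin gates.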
The overall procedure to do the delay optimization is shown in Fig. 9. $N$ is the network, and $C$ is a list of terminals that constitute the critical path. The two transformations used are the fanout buffering transformation and the gate resizing transformation, $R$. The transformations are tried in turn, and whenever one is successfully applied, the timing trace is redone and the procedure works on the new critical path. The order in which the transformations are applied does affect the quality of the result, but only slightly. The procedure terminates when no transformation can be successfully applied or when a certain number of iterations (specified by the user) are completed.
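The control flow described above can be sketched as a short loop. The sketch assumes each transformation is a callable that returns `True` when it was successfully applied and that `trace_critical_path` redoes the timing trace; these interfaces are hypothetical and are not taken from Fig. 9.

```python
def optimize_delay(network, transformations, trace_critical_path, max_iterations):
    """Apply transformations to the critical path until none succeeds.

    Returns the number of successful iterations performed.
    """
    iterations = 0
    while iterations < max_iterations:
        critical_path = trace_critical_path(network)   # redo the timing trace
        applied = False
        for transform in transformations:              # tried in turn
            if transform(network, critical_path):
                applied = True
                break                                  # re-trace before continuing
        if not applied:
            break                                      # nothing helps any more
        iterations += 1
    return iterations
```

Passing the transformations as an ordered list makes the dependence on application order explicit, so different orders can be compared by simply permuting the list.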
A Placement Based Synthesis System
The algorithms developed in this paper have been incorporated into a system for doing placement-based resynthesis. Fig. 10 shows the steps in the placement-based resynthesis process. In addition to doing delay optimization with improved routability, it also attempts to satisfy a number of other design/technology constraints.
We start by stripping off all the buffers and removing all redundant inverter cascades introduced by the initial synthesis run. This will allow us to do a placement-based buffer insertion taking the true wire parasitics into account.
The next step is to do the placement-based delay optimization using fanout buffering and gate resizing transformations. Transformations to satisfy maximum load/maximum fanout constraints are then done. Most ASIC cell libraries have a maximum load/maximum fanout constraint on the output pins of the cells that specifies the limiting load or the maximum number of fanouts that it can drive. In order to satisfy these constraints, we need to satisfy the inequality $Maxload \geq \sum C_{in} + \sum C_{out} + C_{wire}$. Here again conventional synthesis tools may not be able to accurately estimate the wiring load, $C_{wire}$, so we use a set of placement-based transformations to meet these constraints.
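A check of the maximum load constraint can be written in a few lines. The sketch below assumes the constraint requires the maximum load to be at least the sum of the input pin capacitances, output pin capacitances, and wire capacitance on the net; the function name and argument layout are hypothetical.

```python
def max_load_violation(max_load, c_in, c_out, c_wire):
    """Return how far the net exceeds max_load (0.0 if the constraint holds)."""
    total = sum(c_in) + sum(c_out) + c_wire
    return max(0.0, total - max_load)
```

With a placement-derived `c_wire`, a net that passes a fanout-based check can still report a positive violation here, which is the case the placement-based transformations are meant to catch.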
Next, we correct for hold-time violations. These are violations caused when the clock arrives too late at the clock pin of a flip-flop. This might be caused by the clock line being too long when compared to the data path. Again this would be known only when we consider the actual placement and accurately estimate the wiring delays. This is corrected by adding additional delays in the data path.
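The amount of delay to add in the data path can be estimated with a small calculation. The sketch assumes the standard hold check, in which the earliest data arrival must not precede the clock arrival plus the hold time, and it assumes the correction is made with identical delay cells; the function names are hypothetical, and the effect of the added delay on setup timing is not modelled.

```python
import math


def hold_padding(clock_arrival, hold_time, earliest_data_arrival):
    """Extra data-path delay needed to remove a hold violation (0.0 if none)."""
    return max(0.0, clock_arrival + hold_time - earliest_data_arrival)


def delay_cells_needed(padding, cell_delay):
    """Number of identical delay cells whose total delay covers the padding."""
    return math.ceil(padding / cell_delay)
```

Because `clock_arrival` depends on the wiring delay of the clock line, the padding can only be computed reliably once placement-based wire estimates are available.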
Finally the output netlist is generated along with suggested placement locations for the added components. 
Experimental Results
The placement-based resynthesis algorithms have been tried on a number of designs obtained from industry. Both standard cell and gate array designs have been run through with libraries using a 0.8 micron process technology. Table 1 summarizes some of these results. Most of these designs started as Verilog/VHDL descriptions and were synthesized using commercial synthesis tools. They were then placed and routed using commercial tools. Some of these designs had actually gone through multiple iterations of running placement, backannotating the delays and then resynthesizing. The designs range in size from about 580 cells (1269 equivalent gates) to 5000+ cells (25K equivalent gates).
[Table 1: Placement-based Delay Opt. Results. Columns: ckt; Size (cells, gates); Delay (w.m., pre, post); % red.]
We notice that the wire models sometimes tend to overestimate and sometimes underestimate the actual delays and so are not very reliable for the sub-micron devices (a dash indicates that no wire model was available for the library). We see an improvement in virtually all the examples, with the delay reductions ranging from 13 to 22%.
Another result we noticed was the improvement in the routability, as illustrated in Fig. 11. The layout on the left is the buffered net highlighted in the original design. The layout on the right is the same buffered net highlighted after it had been resynthesized using the tool. It is evident that the placement configuration of the buffers on the right results in much less congestion and therefore in a more routable design.
Conclusion
In this paper we have proposed a methodology and described algorithms implementing the methodology to resynthesize a design to meet the delay constraints after placement. It starts with the design after initial placement is done and makes incremental modifications to the network, taking placement into account. A final netlist and placement are then generated that meet the delay constraints after placement. We have developed a set of algorithms and implemented them as part of a system for placement-based synthesis. The results have been very encouraging. The tool has been used with a number of preoptimized, placed designs from industry and has resulted in improvements of post-placement delays ranging from 13 to 22% with improved routability.
