Traditionally, logic synthesis and layout tools optimize designs without interaction between them. Lack of communication between the two tools often results in inferior post-layout circuit implementations. This paper presents three aspects of coupling synthesis with layout to minimize post-layout area and delay of circuits. First, it presents two new techniques for computing net-weights based on timing slacks, and shows how performance improvement can be achieved with little overhead in area. Second, it presents a novel idea of exploiting logic equivalence information in circuits to minimize circuit area and delay during layout. An algorithm for computing logic equivalence classes and performing net swapping using the equivalence classes during layout is described. Last, it shows the sensitivity of post-layout delays of circuits to the wiring models used in synthesis, and demonstrates how resynthesis techniques can be effectively used to generate good post-layout implementations. Significant reductions in post-layout area and delay on several industrial designs have been observed.
INTRODUCTION
In the quest for faster and denser circuits, designers attempt to optimize the speed and area of a circuit at several levels of the design process. Two such levels are:
1. The logic level, where the design is represented as an interconnection of logic gates.
2. The layout level, where the design has a geometric representation.
To perform optimization at the logic level, designers have been increasingly using logic synthesis tools.
This work was performed while the authors were with Hewlett-Packard.
this relationship can be exploited to obtain faster and smaller circuits. Designers use logic synthesis tools to generate logic implementations from the specifications of a design, subject to a set of constraints. The constraints are generally timing and/or area constraints. Synthesis tools may succeed in producing an implementation that satisfies the constraints. However, synthesis tools optimize the circuit at this stage with only rough estimates of layout effects such as interconnect capacitance. The optimization at this level (called the pre-layout level) may not lead to optimal post-layout implementations. Often, the post-layout timing of these implementations fails to meet the specified timing constraints. Designers attempt to overcome timing violations by rerouting signals along critical paths to reduce the wiring capacitance, or by modifying the logic along the critical path. Without the aid of automatic tools, these tasks are tedious and time consuming.
Net-weighting has been used as a mechanism to speed up the critical paths in a circuit [1], [2], [3], [4]. The idea behind net-weighting is to assign heavy weights to nets on critical paths, and use the net-weights to guide the routing to reduce the wire lengths of nets. Dunlop et al. [1] use net-weighting to improve the speed of synchronous logic by assigning net-weights to signals in the clock path based on the clock skew. The weights on the combinational logic paths are determined by the deviation of the maximum frequency at which the circuit can operate from the desired frequency of operation. Tsay et al. [2] use an analytic technique for net-weighting using the zero-slack algorithm [5]. However, in all these papers, net-weighting is considered during the layout phase only. More recently, Pedram [6] has shown a method of coupling synthesis with layout using layout-driven technology mapping. In this technique, during the technology mapping step, the tool looks ahead to potential layout effects, and generates a mapped circuit which considers the layout area and wire delay.
The techniques presented here are different from the ones presented in the previous papers. We still use the synthesis tools in the conventional way for generating circuits, but interchange information in both the forward and backward paths between synthesis and layout to obtain good post-layout implementations. In this paper, we demonstrate two ways of coupling logic synthesis with layout tools: net-weighting and logic equivalence. We forward annotate the netlist from the synthesis tool with net-weights and logic equivalence classes. We approach the net-weighting technique from a different angle. We link net-weighting with the logic synthesis process, and let the synthesis tool generate a set of net-weights to start the place and route process, based on the critical paths observed in the synthesis tool. We propose two new strategies for automatic net-weighting of circuits. We demonstrate the effectiveness of net-weighting in reducing the delay of circuits over several points in the design space. Further, we show that net-weighting does not always increase the area of the layout as speculated in previous work [1].
We also show that logic equivalence inherent in many circuits can be effectively used to reduce the area and increase the performance. During the logic synthesis phase, the synthesis tool can recognize and mark signals that are logically equivalent. Such signals can be interchanged during the placement stage to achieve a significant reduction in the circuit area, and improvement in performance. This technique does not rely on the technology mapper generating netlists that potentially reduce wire lengths as LILY [6] does. It can be effectively used with any of the technology mappers in commercial synthesis tools.
Finally, we show how the synthesis-layout coupling can be used for resynthesis of circuits to obtain more efficient implementations. We show the potential sensitivity of the quality of circuits produced to the wiring models used in synthesis.
The synthesis-layout loop discussed in this paper has been fully automated. The results presented in this paper are on real industrial designs, and show significant improvements in area and delay with fairly small run times.
THE SYNTHESIS-LAYOUT COUPLING
Designers use logic synthesis tools to generate netlist implementations of designs under a set of timing and/or area constraints [7], [8]. The circuit produced by the synthesis tool is optimized at the pre-layout level. At this level, the synthesis and optimization process cannot accurately incorporate the effect of the layout process on the area and delay of the circuit. The layout tools start with the netlist generated by the logic synthesis tools, and attempt to optimize the layout to meet the area and speed constraints. The real goal of the designer is to start with a functional specification of the design and produce a layout which meets area and timing constraints. The disjoint optimization steps, and the lack of communication between the synthesis and layout tools, often cause suboptimal post-layout circuits. To meet the real objective of producing better post-layout circuits, coupling between logic synthesis and layout tools is necessary.
The coupling between the logic synthesis tool and the layout tool is bi-directional. In the forward direction, the synthesis tool provides the layout tool with timing and functional information, which can be effectively used during the layout process.
Net-weights
In this section we describe an algorithm used to compute the net-weights. This algorithm is used iteratively in the loop shown in Fig. 2. In the first iteration, estimated wiring capacitances are used for determining the timing slacks. In the subsequent iterations, capacitances extracted from the layout are back-annotated onto the circuit before computing the net-weights.
Initially all nets are assigned the same net-weight. For our experiments this value was or 100. In steps 2 and 3, static timing analysis and slack computation are done. In step 4, nets are sorted in increasing order of normalized slack so that nets with the smallest slacks, i.e., nets that are most critical, are selected first for net-weighting. We believe that at most 50% of the nets can be improved with net-weighting, so the net-weights of the top 50% of the nets are computed in steps 5 and 6. The weight of a net in the current iteration is determined by the weighting function and also by the ratio of the wire delay to the intrinsic delay of the gate driving the net. This is done to ensure that the nets in which wire delay dominates are assigned larger weights. In step 6, the weight of the net in the current iteration is added to the current weight of the net so that the effect of net-weights in the previous iterations is carried forward.
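The weight update described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Net` record, the `linear_weight` function, and its `scale` parameter are hypothetical stand-ins (the weighting functions actually used are not reproduced here), and the slacks and delays are assumed to come from a separate static timing analysis.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    slack: float            # normalized timing slack; smaller means more critical
    wire_delay: float       # delay attributed to the wire
    intrinsic_delay: float  # intrinsic delay of the driving gate (assumed > 0)
    weight: float = 100.0   # uniform initial net-weight


def linear_weight(rank, count, scale=10.0):
    """Hypothetical weighting function: largest for the most critical net."""
    return scale * (count - rank) / count


def update_net_weights(nets, fraction=0.5, weighting=linear_weight):
    """One iteration: re-weight the most critical `fraction` of the nets."""
    ranked = sorted(nets, key=lambda n: n.slack)   # increasing slack
    count = int(len(ranked) * fraction)
    for rank, net in enumerate(ranked[:count]):
        # Nets dominated by wire delay receive a proportionally larger weight.
        ratio = net.wire_delay / net.intrinsic_delay
        # Accumulate so that earlier iterations are carried forward.
        net.weight += weighting(rank, count) * ratio
    return nets


example_nets = [
    Net("n1", slack=-2.0, wire_delay=2.0, intrinsic_delay=1.0),
    Net("n2", slack=-1.0, wire_delay=1.0, intrinsic_delay=1.0),
    Net("n3", slack=3.0, wire_delay=1.0, intrinsic_delay=1.0),
    Net("n4", slack=5.0, wire_delay=1.0, intrinsic_delay=1.0),
]
update_net_weights(example_nets)
```

Calling `update_net_weights` again after back-annotating new slacks and delays adds to the weights already stored on each net, which mirrors the accumulation in step 6.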
Usage of Net-weights During Layout
The placement algorithm used in LLAMA is based on the force-directed placement technique [10]. Here, however, the forces on the nets (used to determine the relative positions of cells in the placement) are proportional to the net-weights. Thus, nets with large weights exert a large force on the cells they connect to, pulling those cells closer together. LLAMA does not use net-weights for determining the priority order for routing. Rather, it uses the net-weights in attempting to minimize the weighted wire length.
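To make the role of the weights concrete, the following Python sketch computes a weighted wire length and the zero-force target location of a cell, as in generic force-directed placement. It is not LLAMA's implementation: the half-perimeter wire length estimate, the function names, and the data layout are assumptions made for illustration.

```python
def weighted_wire_length(positions, nets):
    """Sum of weight * half-perimeter bounding box over all nets.

    positions: {cell: (x, y)}; nets: list of (weight, [cells]).
    The half-perimeter estimate is an assumption of this sketch.
    """
    total = 0.0
    for weight, cells in nets:
        xs = [positions[c][0] for c in cells]
        ys = [positions[c][1] for c in cells]
        total += weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total


def zero_force_target(cell, positions, nets):
    """Weighted centroid of the cells that share a net with `cell`."""
    sum_x = sum_y = sum_w = 0.0
    for weight, cells in nets:
        if cell not in cells:
            continue
        for other in cells:
            if other != cell:
                sum_x += weight * positions[other][0]
                sum_y += weight * positions[other][1]
                sum_w += weight
    if sum_w == 0.0:
        return positions[cell]   # unconnected cell: no force acts on it
    return (sum_x / sum_w, sum_y / sum_w)


example_positions = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (5.0, 5.0)}
example_nets = [(1.0, ["c", "a"]), (3.0, ["c", "b"])]
```

In this example the net to `b` carries three times the weight of the net to `a`, so the target location of `c` lies three quarters of the way from `a` toward `b`.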
Experimental Results with Net-weighting
In this section, we describe some experiments performed to study the effect of applying the net-weighting strategies on the post-layout delay and area of circuits. For these experiments, we use two designs, CKT1 and CKT2. Several implementations of these designs were synthesized with several different timing constraints using PASS. A CMOS standard-cell library was used as the target technology. These implementations vary in complexity from about 1000 gates to 4000 gates.
The iteration loop used for net-weighting is shown in Fig. 2. A netlist is synthesized from the design specifications. Its net-weights are computed using one of the net-weighting strategies. The netlist is placed and routed. The circuit is back-annotated with the wiring capacitances extracted from the layout.
EXPLOITING LOGIC EQUIVALENCE IN CIRCUITS
In this section, we introduce the idea of logic equivalence in circuits, and show how logic equivalence can be effectively used to reduce the post-layout area and delay of circuits. This technique is general enough to be used on circuits synthesized using any technology mapper, including layout-driven technology mappers such as LILY. Using this technique, the synthesis tool can generate and forward-annotate the netlist with the logic equivalence information. This process is one of marking certain signals as equivalent. This information can be used by the place and route tool to interchange equivalent signals, which results in a significant reduction in circuit area and an improvement in circuit performance.
Existence of Logic Equivalence
Large capacitive loads cause two problems: (i) large delays, and (ii) large current densities, which can give rise to electromigration problems. Synthesis systems often attempt to limit the fanouts of gates to reasonably small values by replicating logic. Fig. 6(a) shows a situation in which gate G1 is required to drive 15 other gates. Fig. 6(b) is a modified implementation in which three copies of the gate are created, limiting the fanout of each copy to 5. The outputs of the gates G11, G12, and G13 are output equivalent.
Output equivalence can also be found in buffer trees. In many large circuits, some primary inputs are heavily loaded, i.e., their fanouts are very large. To limit the fanout of each primary input to a reasonable value, a tree of inverters rooted at the primary input is constructed. One such tree is shown in Fig. 7. It can be seen that the outputs of G1, G2, G7, G8, G9, and G10 are output equivalent. Equivalence in buffer trees has been exploited independently by Brasen [9].
[Figure caption: Example of output-equivalence.]
Existence of input equivalence in a circuit is illustrated with the example shown in Fig. 8 tool is preserved. This is important for timing-driven designs, but if minimization of the area is the only criterion this restriction could be relaxed.
Algorithm to Find Logic Equivalence Classes
The logic equivalence algorithm implemented in PASS finds a useful subset of the output equivalence relationships in a combinational circuit. This subset includes the logic equivalence arising from logic replication or buffering in circuits generated by PASS. The function of a logic gate in PASS is represented symbolically with nested lists in prefix notation, using the operators ! for NOT, * for AND, and + for OR. Thus an AOI33 cell (which is a 3-wide and-or-invert cell) would have the symbolic function:
(!(+(* a b c) (* d e f)))
All occurrences of gates with the same logic function have the same symbolic functions, regardless of drive strength.
Each equivalence class has associated with it a numerical index and a symbolic function, which expresses the logic function of the driving gate in terms of the equivalence classes of its input nets. Each net is annotated with the index of its equivalence class as the latter is computed. Thus the equivalence classes are derived indirectly, consisting of all nets with the same index. Permutable terms of the symbolic function are sorted to put all structurally equivalent functions into a canonical form. For example, for an instance of the AOI33 above, if the input pins a through f belong to equivalence classes 17, 5, 12, 8, 3, and 7, respectively, the symbolic function after substitution and sorting would be:
(!(+(* 3 7 8)(* 5 12 17)))
The sorting is performed "inside-out": operands of inner lists are sorted first, then operands of outer lists are sorted in lexicographical order. Figure 10 shows a simple circuit with five equiva- Table IV. The symbolic function for primary input nets is a singleton list consisting of the index of the equivalence class of that input (in our implementation every input belongs to a unique equivalence class, but this could be easily generalized to cover logically equivalent primary inputs).
An outline of the algorithm follows. The algorithm partitions the nets of the circuit into equivalence classes. Initially, each input net is placed in its own equivalence class. These nets are the initial members of set S. The set T consists of nets which have not yet been placed in an equivalence class. Initially T consists of all nets (except primary inputs) in the circuit. The algorithm proceeds via a topological sort of the circuit, moving nets from set T (those nets whose equivalence class has not yet been determined) to set S as the equivalence classes are found.
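The substitution, inside-out sorting, and class assignment described above can be sketched in Python as follows. This is a minimal illustration, not the PASS implementation: the tuple encoding of symbolic functions, the gate description `(output_net, template, {pin: input_net})`, and all function names are assumptions of this sketch.

```python
def sort_key(expr):
    """Total order on expressions: class indices first, then sub-lists."""
    if isinstance(expr, int):
        return (0, expr)
    op, *args = expr
    return (1, op, tuple(sort_key(a) for a in args))


def canonical(expr):
    """Sort operands of AND (*) and OR (+) inside-out."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    args = [canonical(a) for a in args]      # inner lists first
    if op in ("*", "+"):
        args.sort(key=sort_key)              # then the enclosing list
    return (op, *args)


def substitute(template, pin_class):
    """Replace pin names with the class indices of the connected nets."""
    if isinstance(template, str):
        return pin_class[template]
    op, *args = template
    return (op, *(substitute(a, pin_class) for a in args))


def equivalence_classes(primary_inputs, gates):
    """Return {net: class index} for a combinational circuit."""
    net_class = {net: i for i, net in enumerate(primary_inputs)}  # set S
    class_of_function = {}
    pending = list(gates)                                         # set T
    while pending:
        remaining = []
        for out, template, pins in pending:
            if all(net in net_class for net in pins.values()):
                pin_class = {p: net_class[n] for p, n in pins.items()}
                func = canonical(substitute(template, pin_class))
                index = class_of_function.setdefault(
                    func, len(net_class) + len(class_of_function))
                net_class[out] = index
            else:
                remaining.append((out, template, pins))
        if len(remaining) == len(pending):
            raise ValueError("combinational loop or undriven net")
        pending = remaining
    return net_class


AOI33 = ("!", ("+", ("*", "a", "b", "c"), ("*", "d", "e", "f")))
AND2 = ("*", "a", "b")
INV = ("!", "a")
```

Class indices in this sketch are drawn from a range above the net count so that they never collide with the indices given to primary inputs; any unique numbering would serve equally well. Two replicated gates whose inputs are permuted, such as `AND2` instances with pins swapped, land in the same class because sorting removes the pin order.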
Algorithm logic-equivalence(circuit); { Get the symbolic function of cell C;
The logic equivalence classes are annotated on the netlist supplied to LLAMA. Each element in an equivalence class is essentially an output of a gate or a primary input that drives other gate inputs or primary outputs of the circuit. For a 2-pin net there is one driver and one driven signal (load). It is a straightforward extension of the algorithm to handle multiple loads by replicating the driver as many times as it has loads. Recall from the description in Section 4.2 that the wire length is reduced by swapping the driven signals between drivers in the same equivalence class.
LLAMA uses a recursive slicing algorithm described below. The algorithm performs pair-wise interchange between two loads driven by nets in the same equivalence class to reduce the wire length.
Algorithm reduce-wire-length (equiv-class, N) {
/* Let N be the number of 2-pin nets in the equivalence class. Then there are N loads and N drivers; */
Form a second sub problem by taking the remaining drivers and loads;
8. Recursively call reduce-wire-length on each of the sub problems.
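The pair-wise interchange at the heart of net swapping can be illustrated with the short Python sketch below. It is deliberately simpler than the recursive slicing algorithm used in LLAMA, whose steps are not fully reproduced here: it performs greedy pairwise swaps over one equivalence class of 2-pin nets, and the Manhattan distance metric, function names, and coordinate representation are assumptions of this sketch.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def swap_loads(drivers, loads):
    """Greedy pairwise interchange within one equivalence class.

    drivers[i] and loads[i] are the (x, y) pin locations of the i-th
    2-pin net. Returns `assign`, where drivers[i] ends up driving
    loads[assign[i]]. Each accepted swap strictly reduces the total
    length, so the loop terminates.
    """
    n = len(drivers)
    assign = list(range(n))
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                before = (manhattan(drivers[i], loads[assign[i]])
                          + manhattan(drivers[j], loads[assign[j]]))
                after = (manhattan(drivers[i], loads[assign[j]])
                         + manhattan(drivers[j], loads[assign[i]]))
                if after < before:
                    assign[i], assign[j] = assign[j], assign[i]
                    improved = True
    return assign


def total_length(drivers, loads, assign):
    """Total Manhattan wire length for a given driver-to-load assignment."""
    return sum(manhattan(d, loads[a]) for d, a in zip(drivers, assign))
```

Because every driver in the class computes the same logic function, any permutation of loads among the drivers preserves the circuit's behavior; only the wire length changes.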
Results with Logic Equivalence
In this section, we show the effectiveness of logic equivalence in reducing the circuit area and delay. A CMOS standard-cell library is used to generate circuit implementations for a number of designs using PASS. These are large control blocks used in real chips. The circuits have various complexities. The number of inputs varies up to 316, the number of outputs up to 404. The circuits were synthesized using PASS. The output equivalence classes for the circuits are also determined by PASS, and the netlist is annotated with these classes. LLAMA places the netlist and swaps the signals in each equivalence class. After the layout, the wiring capacitance is extracted, and static timing analysis is done using PASS. Table V shows the post-layout area and delay of each circuit with and without using logic equivalence. In this table, the column Design lists one implementation each of seven different designs. The second and third columns are post-layout area and delay of the circuit without exploiting the logic equivalence information in the circuit. The fourth and fifth columns are the post-layout area and delay of the circuit after performing net-swapping based on the equivalence classes. The last two columns show the percentage change in the area and delay by using logic equivalence. The negative entries mean that the corresponding area or delay decreased. It can be seen that both the area and delay decrease significantly for all the implementations considered as a result of using the logic equivalence information inherent in the circuit during layout. Fig. 11 shows two layouts with and without using logic equivalence for net-swapping.
RESYNTHESIS TECHNIQUES
Automatic coupling of logic synthesis with layout provides a mechanism for guiding the synthesis process to obtain efficient circuits. There are two ways of approaching resynthesis: (i) relaxing timing constraints, and (ii) improving the wiring model used for synthesis. These are illustrated in this section using CKT2 as an example. While the resynthesis process built into the PASS-HARP/LLAMA loop is general, the following strategy is recommended. For initial synthesis, the estimated wiring capacitance is set to zero. The process is carried out all the way through routing and capacitance extraction. Based on the values of the extracted capacitance, the wiring model is revised, and the circuit is resynthesized using this model.
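One simple way to revise a wiring model from extracted data is sketched below in Python. The form of the model is an assumption made for illustration only: it tabulates the mean extracted capacitance per fanout count, whereas the form of the wiring model used in PASS is not specified here. The function name and the input format are likewise hypothetical.

```python
from collections import defaultdict


def calibrate_wire_model(extracted):
    """Build a fanout-indexed capacitance table from layout extraction.

    extracted: iterable of (fanout, capacitance) pairs, one per net.
    Returns {fanout: mean capacitance}; units follow the input data.
    """
    totals = defaultdict(lambda: [0.0, 0])   # fanout -> [sum, count]
    for fanout, capacitance in extracted:
        totals[fanout][0] += capacitance
        totals[fanout][1] += 1
    return {fanout: s / n for fanout, (s, n) in sorted(totals.items())}
```

A table of this kind could replace the zero-capacitance estimate of the initial synthesis run, so that the second synthesis pass sees loads closer to those observed in the layout.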
Relaxation of Timing Constraints

Sensitivity to Wiring Model Used for Synthesis
The pre-layout wiring model used by synthesis tools is a crude approximation for the actual wiring capacitance in the circuit. This is necessarily so because the first time a circuit is synthesized, there is no layout
CONCLUSIONS
We have introduced the concept of logic equivalence in netlists, and developed an algorithm that identifies a useful subset of the output equivalence as defined in this paper. The algorithm has been incorporated into the logic synthesis system PASS, which synthesizes netlists, and annotates them with the logic equivalence classes. We have shown that by interchanging nets in the equivalence classes, it is possible to reduce the wire length, which reduces the area and delay of circuits. The reduction in area is about 20%, and the average improvement in delay is about 9%. The amount of improvement in delay and area that can be obtained depends on the number of equivalence classes in the circuit, and on the number of elements in each class. The smaller improvements in area and delay in circuits CKT6 and CKT7 are attributable to the smaller size of the equivalence classes.
While net swapping using logic equivalence reduces the total wire length, which generally reduces both area and delay, there may be situations in which an individual wire length may increase. If such a wire happens to be on the critical path, then the delay may actually increase, but it is straightforward to prevent it from happening. We can simply annotate some nets (e.g., critical nets) to prevent them from being swapped even if they are equivalent to some other nets, or alternatively, we can delete such nets from the equivalence classes. It is also possible that net swapping may reduce the wire lengths on the critical paths also, but the delay may increase because now there may be much lower-drive cells on the critical path. Rather than preventing net swapping in such a case, it is more appropriate to overcome this problem using a post-layout cell substitution technique, where low-drive cells are traded for high drive cells as needed. Determination of the input equivalence classes of the type shown in Fig. 8 is similar to the problem of determining symmetric (or partially symmetric) boolean functions. We are looking into extending our algorithm to capture the equivalence classes of this type also.
In our experiments we used a two-layer routing model. Because of this, wire length reductions could result in significant area reductions, since they directly reduce the amount of space devoted to routing between the cells. In technologies with three or more layers of routing, the area reduction is likely to be less dramatic. However, the technique described in this paper should still be useful to reduce the wiring capacitance, and hence power and delay.
We have demonstrated that over-constraining a design, either by specifying very aggressive timing constraints or by using overly conservative wiring models for synthesis, can result in implementations with inferior post-layout area and delay. In fact, since net-weighting and logic equivalence can provide some improvement in delay, the designer can specify looser timing constraints for synthesis, which generally results in smaller circuits. We have also shown that the post-layout area and delay can be very sensitive to the wiring model used for synthesis. Resynthesis by calibrating the wiring model in the synthesis-layout loop can generate smaller and faster circuits.
The synthesis-layout coupling using PASS and HARP/LLAMA has been fully automated to carry out synthesis, layout and resynthesis with or without using net-weighting and logic equivalence. Since both net-weighting and logic equivalence are aimed at reducing the wire length, we expect some interaction between these techniques when used together. We have done some tests in which logic equivalence is followed by net-weighting, and observed that the overall delay reduction is better than using either technique alone, and that the layout area is smaller than the area using un-weighted implementations.
Another application for net swapping using logic equivalence is clock tree synthesis. Structurally, clock trees are similar to the buffer trees generated during logic synthesis, as in Fig. 7.
