We present new concepts to integrate logic synthesis and physical design. Our methodology uses general Boolean transformations as known from technology-independent synthesis, and a recursive bi-partitioning placement algorithm. In each partitioning step, the precision of the layout data increases. This allows effective guidance of the logic synthesis operations for cycle time optimization. An additional advantage of our approach is that no complicated layout corrections are needed when the netlist is changed.
INTRODUCTION
Cycle time optimization has become one of the most important issues for the design of highly integrated circuits. A lot of performance optimization techniques exist at all stages of the design flow. However, during the last couple of years it has become apparent that the exchange of technical data between the different design levels is insufficient in conventional design flows. Optimization algorithms often lack important information and therefore fail to exploit the full optimization potential of the circuit.
During logic synthesis the interconnect delay is only approximated by rough net models. Layout data is not available at this stage. However, with the advent of deep submicron technologies the appropriate consideration of geometrical circuit data to estimate interconnect delay has become key in circuit design. This is why performance optimization at the logic synthesis stage cannot be successful for deep submicron circuits. Errors in the approximation could result in a design far off from an optimal one. In order to better exploit the global optimization potential of a circuit it is unavoidable to improve the interaction between the logical and physical design stages.
Simple feedback loops between the stages of logic synthesis and layout generation are not sufficient. They do not guarantee convergence and tend to be very time-consuming. Only a more intricate combination of suitable algorithms for logic synthesis and layout generation can lead to effective solutions.
In this paper, we propose a new technique to directly integrate general Boolean netlist transformations into a timing-driven placement algorithm. By close interaction between logic synthesis and placement we obtain accurate data for the wire delays to select netlist transformations. Since logic transformations are performed while the placement is generated and not after its completion, we can alter the logic circuit structure without additional layout correction steps.
University of Frankfurt
Electronic Design Automation Group 60054 Frankfurt am Main, Germany
PREVIOUS WORK
Recently, several approaches to improve the interaction between logic synthesis and physical design have been published [2, 9, 11, ...]. We can roughly divide these approaches into two classes. The first class of approaches is anchored in logic synthesis. No placement has yet been generated. The interconnect delay is estimated based on the netlist structure [16]. This approach permits full flexibility of the logic transformations and can exploit the complete spectrum of logic synthesis techniques. However, without any layout data it is not easy to get good approximations for the interconnect delays.
The second class of approaches starts with a physical design so that an accurate post-layout delay model is obtained. These approaches attempt to optimize the circuit by a restricted set of logical netlist transformations. These transformations can be applied before the routing process [15]. In this case the wire lengths are estimated from the placement. The other possibility is to apply the transformations in a routed design with all wire lengths known [9].
Performing logical transformations in a completed physical design is a delicate issue. Most approaches only use local transformations to keep tight control on the changes in the netlist. Transformations are often restricted to buffer insertion [14] or resynthesis of small parts of the circuit. The method described in [3] uses local decomposition and remapping as netlist transformation. Some approaches also use layout transformations like gate sizing [4] or wire sizing [5] that do not change the netlist. It is also possible to combine optimizing transformations with technology mapping [11, 13].
Jiang et al. [9] proposed a method for post-layout performance improvement. They start with a complete circuit layout. For the optimization algorithm they employ a redundancy addition and removal technique in combination with an engineering change order (ECO) layout tool.
In [15] Stenz et al. use global netlist transformations (signal substitutions) together with an iterative placement algorithm. They also start with a netlist after placement. For performance optimization they propose a two-phase algorithm. The first phase restructures the netlist by signal substitutions. In the second phase they legalize the perturbed placement by an iterative placement improvement algorithm.
The main disadvantage of all these methods starting with a complete layout description is that it is difficult to exploit the full Boolean optimization potential. If the circuit is changed greatly, the layout also changes significantly and convergence cannot be guaranteed. Additionally, after each transformation a legalization step is necessary to correct the layout. For this legalization step ECO algorithms [9] or other placement improvement algorithms [15] are used.
THE PROPOSED APPROACH
To avoid the disadvantages of the described approaches, we propose a new method to merge logic synthesis and physical design. Instead of performing logic transformations before or after placement we perform them during placement generation. We integrate netlist transformations in a recursive partitioning-based placement algorithm. All layout information is used at the moment it becomes available. During the first iterations of the recursive partitioning process our approach resembles the first class of methods described in the previous section. Only a global placement exists and the wire length estimations are still very rough. On the other hand, there is almost unlimited freedom to apply Boolean circuit transformations. As the procedure continues the circuit partitions become smaller and the performance estimation becomes more accurate. Boolean transformations are used incrementally to make corrections according to the refined timing model. In this way we attempt to combine the advantages of the approaches described in the previous section while avoiding some of their disadvantages.
In our previous work [12], we have already experimented with this paradigm and obtained promising results for speed optimization. However, in [12] cell replication was the only logical transformation being considered. Recently, in [6] it has been shown that a large variety of local timing optimizations such as cloning, remapping, gate sizing, buffer insertion and clock tree optimization can also be integrated into such a framework. Our goal is to demonstrate that general Boolean transformations as originally developed for technology-independent logic synthesis can also be performed during placement. We show that not only concepts for local timing correction are suitable for this framework but that it is beneficial to merge a general Boolean optimization phase into placement generation.
Outline
The overall algorithm is shown in Fig. 1. The algorithm starts with the netlist and a description of the standard cell library. In each level of the algorithm two steps are performed. In the first step all regions containing more than one cell are bi-partitioned into two child regions using the well-known Fiduccia-Mattheyses algorithm [7]. As the child regions become smaller in each level the locations of the cells are determined with increasing precision and better estimations of the wire lengths can be obtained. The bi-partitioning algorithm is described in more detail in section 3.2.
In the second step, appropriate netlist transformations are performed based on the current estimation of wire lengths. These transformations are described in section 3.3.
As long as there exist regions containing more than one cell the algorithm proceeds to the next level. Finally the cells are arranged in rows. This changes the locations of the cells only marginally.
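The level-by-level structure described above can be sketched as follows. This is only a minimal illustration under stated assumptions: a region is reduced to a list of cell names, and the function names `place_with_transformations` and `split_in_half` as well as the two callbacks are hypothetical. The placeholder partitioner simply cuts the cell list in the middle and does not implement the min-cut algorithm of [7].

```python
from typing import Callable, List

Region = List[str]  # assumption: a region is modelled only as a list of cell names


def place_with_transformations(
    cells: List[str],
    bipartition: Callable[[Region], tuple],
    transform: Callable[[List[Region]], None],
) -> List[Region]:
    """Alternate bi-partitioning and netlist transformation, level by level."""
    regions: List[Region] = [list(cells)]
    # Proceed to the next level as long as some region holds more than one cell.
    while any(len(r) > 1 for r in regions):
        next_regions: List[Region] = []
        for region in regions:
            if len(region) > 1:
                left, right = bipartition(region)  # step 1: split into two child regions
                next_regions += [left, right]
            else:
                next_regions.append(region)
        regions = next_regions
        transform(regions)  # step 2: netlist transformations on the refined layout data
    return regions


def split_in_half(region: Region) -> tuple:
    """Placeholder partitioner (not min-cut): cut the cell list in the middle."""
    mid = len(region) // 2
    return region[:mid], region[mid:]
```

Calling `place_with_transformations(list("abcde"), split_in_half, lambda regions: None)` runs three levels and ends with five single-cell regions. The final arrangement of the cells in rows is not part of this sketch.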
Partitioning-Based Timing-Driven Placement
In this section we briefly describe the timing-driven placement algorithm and the delay cost function which we use in our approach.
The placement algorithm starts with a netlist generated by a logic synthesis tool and the chip area that is available for placement. For the placement process we use a recursive bi-partitioning algorithm [1]. The algorithm iteratively bi-partitions so-called circuit regions. A region is a set of cells together with the chip area allocated for placing the cells. In each recursion level of the algorithm all current regions are bi-partitioned into two child regions [7].
Additionally we use a connectivity clustering algorithm [SI for improving both run time and quality of result. Before each bipartitioning step, the algorithm selects cells to be merged into clusters according to their connectivity.
With every recursion level, the sizes of the regions decrease, so that in each step the individual cell positions can be determined with increasing accuracy. Recursive partitioning is performed until every region consists of a single cell. For the final placement the cells are arranged in rows, which changes the locations of the cells only slightly.
For partitioning, the circuit is represented as a weighted hypergraph. The weights of the hyper-edges represent the circuit delay and are calculated as follows:
Immediately before each bi-partitioning step, wire lengths are estimated based on the current cell locations. The position of a cell within a region is approximated by the region center point. As the region sizes decrease with each recursion level, the approximation becomes better. Using the approximate cell positions, wire lengths are estimated for calculating the wire capacitances.
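The estimation step can be illustrated with a short sketch. The text only states that cell positions are approximated by region center points. The half-perimeter bounding-box measure and the linear capacitance model used here are assumptions made for illustration, and all function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) of a region


def region_center(x0: float, y0: float, x1: float, y1: float) -> Tuple[float, float]:
    """Center point of a region; used as the position of every cell inside it."""
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def estimate_wire_length(
    net_cells: Iterable[str],
    cell_region: Dict[str, str],
    region_box: Dict[str, Box],
) -> float:
    """Assumed estimate: half perimeter of the bounding box of the cell positions."""
    centers = [region_center(*region_box[cell_region[c]]) for c in net_cells]
    xs, ys = zip(*centers)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wire_capacitance(length: float, cap_per_unit: float = 1.0) -> float:
    """Assumed linear model: capacitance proportional to the wire length."""
    return cap_per_unit * length
```

With two regions `(0, 0, 2, 4)` and `(2, 0, 4, 4)`, a net connecting one cell in each region gets an estimated length of 2.0, while a net whose cells share a region gets 0.0. This shows why the estimate is coarse at early levels and improves as regions shrink.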
Next, the arrival time and the required time are calculated for each signal using a static timing analysis. The arrival time for all primary inputs is set to 0. The maximum path delay is the largest arrival time among the primary outputs. It is used as the required time for every primary output. The slack of a signal is the difference between its required time and its arrival time. From the slack of a signal, an upper bound for the length of the corresponding wire can be calculated.
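A minimal sketch of such a timing analysis on a combinational netlist is given below. It assumes that each non-input signal carries a single lumped delay (gate plus wire) and that the netlist is given as a fan-in map. The function name and data layout are hypothetical. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def static_timing(
    fanin: Dict[str, List[str]],
    delay: Dict[str, float],
    primary_outputs: Iterable[str],
):
    """Return (arrival, required, slack, max_path_delay) for all signals.

    fanin maps each driven signal to the signals feeding it; signals that
    never appear as a key are treated as primary inputs.
    """
    order = list(TopologicalSorter(fanin).static_order())  # drivers come first

    # Forward pass: arrival time 0 at primary inputs, worst case elsewhere.
    arrival: Dict[str, float] = {}
    for s in order:
        preds = fanin.get(s, [])
        arrival[s] = max(arrival[p] for p in preds) + delay[s] if preds else 0.0

    outputs = list(primary_outputs)
    max_path_delay = max(arrival[o] for o in outputs)

    # Backward pass: every primary output is required at the maximum path delay.
    required = {s: float("inf") for s in order}
    for o in outputs:
        required[o] = max_path_delay
    for s in reversed(order):
        for p in fanin.get(s, []):
            required[p] = min(required[p], required[s] - delay[s])

    slack = {s: required[s] - arrival[s] for s in order}
    return arrival, required, slack, max_path_delay
```

Signals on the longest path end up with slack 0. Signals with positive slack are the ones whose wires may be lengthened without increasing the cycle time.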
The ratio between the maximally allowable wire length and the minimally achievable wire length determines the edge weight used in the cost function of the min-cut bi-partitioning algorithm.
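How slack might translate into an edge weight can be sketched as follows. The text only says that the ratio of the two lengths determines the weight. The linear delay-per-length model, the direction of the ratio (a tighter net gets a larger weight) and both function names are assumptions made for this illustration.

```python
def max_allowable_length(
    current_length: float, slack: float, delay_per_unit_length: float
) -> float:
    """Assumed linear wire delay: slack can be spent on extra wire length."""
    return max(0.0, current_length + slack / delay_per_unit_length)


def edge_weight(min_length: float, max_length: float, eps: float = 1e-9) -> float:
    """Assumed mapping: weight grows as the allowable length approaches the minimum."""
    return min_length / max(max_length, eps)
```

Under these assumptions a net with no length margin gets weight 1.0 and a net that may double its length gets weight 0.5, so the min-cut partitioner is discouraged from cutting timing-critical nets.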
Cycle Time Optimization by Netlist Transformations
In order to be able to integrate general Boolean optimization into a placement algorithm, the logic transformations to be used must fulfill some important requirements.
1. The optimization framework should not be restricted to technology-independent circuit descriptions, but should facilitate optimization of mapped netlists.
2. The transformations should make maximum use of the existing optimization potential, i.e., they should not be restricted to local netlist transformations.
3. Although the scope of the optimizations must not be local, the optimization process as a whole should be decomposable into a series of individual optimizing operations that each affect only a limited number of gates.
In order to meet these requirements, we use so-called implicant-based circuit transformations [10]. Each transformation consists of the following two steps:
1. Calculation and insertion of implicants for a network function using an AND/OR reasoning technique called recursive learning [10].
2. Identification and removal of redundancy using ATPG.
With these two steps, the structure of the circuitry implementing an internal network function is modified; the logic function, however, is not changed. Each transformation only affects a limited number of gates. However, as can be proved [10], arbitrary circuit transformations, including conventional synthesis techniques such as functional decomposition, kerneling, and transduction, can be described using this two-step methodology. Implicant-based network transformations have already been applied very effectively in technology-independent multi-level logic optimization.
Fig. 2 shows an example of how the delay of the critical path (shown as bold lines) can be reduced by implicant-based network transformations. Fig. 2(a) shows the original circuit. In the first step, recursive learning finds the implication (y = 0) → (f = 0). The implicant f is added to the function at y using an additional AND gate (Fig. 2(b)). The logic function of the circuit does not change by this modification. In the second step, two gates are identified as redundant using ATPG and removed (labeled 'X' in Fig. 2(b)). The resulting circuit is shown in Fig. 2(c). As can be seen, this transformation has made the critical path significantly shorter.
In our approach, these implicant-based logic circuit transformations are tightly integrated into the placement algorithm (see Fig. 1). Fig. 3 shows the flowchart for the netlist transformation algorithm.
First, for every signal in the circuit a set of implicants is calculated and saved. Then, for each implicant, a circuit transformation following the above-mentioned two-step methodology is performed. By inserting the implicant, additional redundancy may be introduced which can occur anywhere in the circuit, not only in the immediate vicinity of the implicant. All redundant connections and gates are removed by ATPG-based redundancy elimination. It is this step by which a delay improvement may be achieved. If redundancy elimination removes gates or inputs to gates on the critical path, the cycle time of the circuit may be improved. For this reason, only transformations which yield redundancy are considered further. They are called transformation candidates.
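The two-step idea can be illustrated on a small hypothetical function. It is not the circuit of Fig. 2. The sketch uses exhaustive simulation as a stand-in for recursive learning and ATPG, which is feasible only for tiny examples. The functions `y_orig`, `f`, `y_with_f` and `y_reduced` are invented for this illustration, and the implicant is added with an OR, which is the polarity that matches the implication (y = 0) → (f = 0) in this example.

```python
from itertools import product


def equivalent(g, h, n_inputs: int) -> bool:
    """Exhaustive equivalence check of two Boolean functions."""
    return all(g(*v) == h(*v) for v in product((0, 1), repeat=n_inputs))


def zero_implies_zero(y, f, n_inputs: int) -> bool:
    """True if (y = 0) implies (f = 0) for every input assignment."""
    return all(f(*v) == 0 for v in product((0, 1), repeat=n_inputs) if y(*v) == 0)


def y_orig(a, b, c):
    # original implementation: two three-input product terms
    return (a & b & c) | (a & b & (1 - c))


def f(a, b, c):
    # candidate implicant of y
    return a & b


def y_with_f(a, b, c):
    # step 1: insert the implicant; the function stays the same
    return f(a, b, c) | (a & b & c) | (a & b & (1 - c))


def y_reduced(a, b, c):
    # step 2: both original product terms have become redundant
    return f(a, b, c)
```

All three implementations of y are equivalent, but the reduced one needs a single two-input gate. In the actual method, the second step is where a shorter critical path may appear.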
Inserting an implicant either results in adding a single wire or in adding a new cell with additional wires. In the former case, the current placement is not changed. In the latter case, the new cell is assigned to a region such that the corresponding wires have minimal length.
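Assigning a new cell to a region might look like the sketch below. It assumes that the cost of a region is the sum of Manhattan distances from the region center to the positions of the cells the new cell connects to. The text does not specify the wire length measure, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def best_region(region_centers: Dict[str, Point], neighbor_positions: Iterable[Point]) -> str:
    """Pick the region whose center minimizes the assumed total wire length."""
    neighbors = list(neighbor_positions)

    def cost(center: Point) -> float:
        cx, cy = center
        return sum(abs(cx - x) + abs(cy - y) for x, y in neighbors)

    return min(region_centers, key=lambda name: cost(region_centers[name]))
```

For four regions centered at (1, 1), (3, 1), (1, 3) and (3, 3), a new cell connected to cells at (3, 1), (3, 3) and (3, 3) is assigned to the region centered at (3, 3).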
In redundancy removal, wires as well as cells may be removed. Again, removal of a wire does not change the placement. However, if one or more cells have to be removed, they are also deleted from their partitions.
After the insertion or removal of cells, the cell area allocated for a partition no longer corresponds to the sum of the cell sizes in the partition. Therefore, after a transformation, the new region sizes have to be calculated. Note that this is the only layout correction step needed in our algorithm.
For each transformation candidate, a static timing analysis is performed and the maximum path delay is calculated. If a transformation reduces the maximum path delay, it is saved in the candidate set. From all transformation candidates, the one which yields the greatest improvement in cycle time is performed and removed from the set.
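The greedy selection, including a re-check of the remaining candidates after each applied transformation, can be sketched as a simple loop. The callbacks `is_valid`, `gain` and `apply` are hypothetical stand-ins for the validity check, the cycle time improvement obtained from static timing analysis, and the netlist update.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def greedy_optimize(
    candidates: List[T],
    is_valid: Callable[[T], bool],
    gain: Callable[[T], float],
    apply: Callable[[T], None],
) -> List[T]:
    """Repeatedly apply the candidate with the largest cycle time improvement."""
    applied: List[T] = []
    pool = list(candidates)
    while True:
        # Drop candidates that are no longer valid in the current netlist.
        pool = [c for c in pool if is_valid(c)]
        # Re-evaluate the improvement and keep only candidates that still help.
        scored = [(gain(c), c) for c in pool]
        scored = [(g, c) for g, c in scored if g > 0]
        if not scored:
            break
        _, best = max(scored, key=lambda gc: gc[0])
        apply(best)
        applied.append(best)
        pool = [c for _, c in scored if c is not best]
    return applied
```

Re-evaluating `gain` in every iteration is the simplest way to keep the improvements consistent with the modified netlist. An implementation concerned with run time would more likely update the timing data incrementally.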
After a transformation has been performed, other transformations in the candidate set may become invalid, because the corresponding implications no longer exist or the corresponding gates have been removed from the circuit. Also, the cycle time may have changed. Therefore, after a transformation, all remaining candidates are checked for validity and are removed if invalid. Then, using static timing analysis, the cycle time improvement is recalculated for the valid candidates. Transformation candidates yielding no improvement are also removed from the set.
The results show that our new approach can reduce the cycle time for all benchmark circuits. On average, a reduction of 14% is achieved. With the second experiment we demonstrated that the cycle time improvement is much smaller if we perform the Boolean transformations before the layout process without using any layout data. For some circuits the cycle time even increases in this experiment because wrong decisions are made during logic synthesis without the correct delay data from the layout.

CONCLUSIONS
We have shown that general Boolean transformations can also be performed for optimization during placement. By this interaction between logic synthesis and placement we can use the layout data at the moment it becomes available. Compared to the performance-driven placement of the original netlist, we can reduce the cycle time of the benchmark circuits by 14% on average. In our experiments, the same logic transformations without any layout data could not improve the circuit cycle time considerably. This shows that a close interaction between logic synthesis and layout generation is indeed necessary.
