In today's high speed VLSI circuit design, frequency has a major impact on the number of repeaters that need to be inserted. A microprocessor operating at less than 200 MHz might require several hundred repeaters, while one operating at greater than 500 MHz may require a number in the thousands. This paper describes an efficient and simple way to automatically determine buffer placement based on maintaining equal transition time for all gate input signals across the net. A maximum allowable transition time is determined (limited by the frequency of the circuit) and correlated with the interconnect Elmore delay. A Spice RC model having nodes with physical locations (X, Y coordinates) can be obtained by extraction tools providing standard parasitic format (SPF). This can then be used with the results of the algorithm for repeater placement to determine the exact physical location desired for each repeater.
Introduction
In a high performance VLSI design, signal propagation over on-chip metal interconnect is significant due to RC delay. In general, the interconnect delay increases with the square of the length of the line. Delay can be reduced by inserting inverting or non-inverting buffers. A large scale integration design has certain objectives when it comes to inserting repeaters: (a) to minimize interconnect delay and (b) to limit transition times, while limiting the impact on (c) area and (d) power.
Algorithms for post-layout buffer insertion in interconnect wiring were studied by van Ginneken [4] with further contributions by Kannan, Suaris and Fang [3] and by Lillis, Cheng and Lin [4, 6]. Given required arrival times at cell inputs and analyzing Elmore delay on a globally routed net, van Ginneken presented a dynamic programming algorithm that positioned and inserted buffers such that the latest possible arrival time at the interconnect driving point could be achieved. Buffers were then connected during a detail route stage. Work by Lillis et al. [1, 2] employed a similar approach in providing polynomial-time algorithms for simultaneous buffer insertion and wire sizing and, later, algorithms for power minimization and multiple-source nets.
In practical application there are additional considerations that must be applied in a large, high performance design. One such consideration is the necessity for signals propagated through wiring with RC degradation to achieve acceptable voltage levels for recognition of logic low and high. This translates into a limitation on signal transition or rise and fall times at gate inputs. In this paper we present a simple and efficient method using transition time limitation for post-layout buffer insertion and show how the solution achieved is near optimal, providing a practical trade-off among timing, power and area.
Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 98, San Francisco, California. © 1998 ACM 0-89791-964-5/98/06...$5.00
Our repeater insertion method starts with a Spice analysis to find the optimum delay and to limit the transition times while not losing track of area and power. The goal here is to find the wire sizing of each metal layer and the buffer size that best fit the above mentioned requirements (delay, area, and power). Once the buffer and wire sizes have been selected, a correlation between transition time (measured in Spice) and the Elmore delay is performed (Section 2). The Elmore delay corresponding to the transition time (limited by the frequency of the design) can then be determined and used as a limiting value in the repeater insertion algorithm. When inserting repeaters, equal transition times are maintained at the input of each gate in the net. The placement of repeaters can be easily obtained given that the extraction tools provide standard parasitic format (SPF) with X and Y coordinates for each node in the net (Section 3). The implementation of this method is described in Section 4, followed by the conclusions in Section 5.
Transition Time and Elmore Delay Correlation

Background
The Elmore delay, as described in [6], is an efficient way of calculating the delay for an RC tree. A first order approximation of the RC delay at any node i of an RC tree with n nodes is given by the Elmore time constant
$$T_i = \sum_{k=1}^{n} R_{ki} C_k \tag{1}$$
where $R_{ki}$ is the resistance of the portion of the (unique) path between the input and node i that is common with the (unique) path between the input and node k, and $C_k$ is the capacitance at node k.
The first order approximation of the waveform at node i is
$$V_i(t) = V_{DD}\left(1 - e^{-t/T_i}\right) \tag{2}$$
where $T_i$ is the Elmore time constant given by (1). The time at which the voltage at node i reaches any value $V_x$ is determined by
$$t_x = T_i \ln\frac{V_{DD}}{V_{DD} - V_x} \tag{3}$$
A quick approximation of the 20-80% (or 10-90%) transition time can be calculated by substituting $V_x$ with the voltages at 20% and 80% of $V_{DD}$:
$$t_{20\text{-}80} = T_i \ln\frac{0.8}{0.2} = T_i \ln 4 \tag{4}$$
The ratio coefficient of the transition time (20-80%) based on the first order approximation is therefore about 1.39.
[Table 1. Repeater size (24 and 32 x minimum size), delay per mm, and optimum length between repeaters. Table body not recoverable; surviving entry: 130/136.]
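The relations (1) through (4) can be checked numerically. The following is a minimal Python sketch, not the authors' tool: the function names and the parent-array representation of the RC tree are assumptions made for illustration. It computes the Elmore time constant of every node by accumulating downstream capacitance, which is equivalent to the sum in (1), and evaluates the first order transition time.

```python
import math

def elmore_delays(parent, resistance, capacitance):
    """Elmore time constant T_i for every node of an RC tree (hypothetical helper).

    parent[i]      : index of the parent of node i, or -1 if fed directly by the input
    resistance[i]  : resistance of the branch entering node i
    capacitance[i] : capacitance to ground at node i
    Nodes must be numbered so that parent[i] < i.
    """
    n = len(parent)
    # Capacitance seen through each branch: the node's own plus everything below it.
    downstream = list(capacitance)
    for i in range(n - 1, -1, -1):
        if parent[i] >= 0:
            downstream[parent[i]] += downstream[i]
    # T_i = T_parent + R_i * C_downstream(i), the same value as sum_k R_ki * C_k.
    delay = [0.0] * n
    for i in range(n):
        upstream = delay[parent[i]] if parent[i] >= 0 else 0.0
        delay[i] = upstream + resistance[i] * downstream[i]
    return delay

def crossing_time(t_elmore, fraction):
    """Time at which the single-pole waveform reaches fraction * VDD."""
    return t_elmore * math.log(1.0 / (1.0 - fraction))

def transition_time(t_elmore, low=0.2, high=0.8):
    """First order low-to-high transition time, 20-80% by default."""
    return crossing_time(t_elmore, high) - crossing_time(t_elmore, low)

# Small branching tree in arbitrary units: node 0 feeds nodes 1 and 2.
delays = elmore_delays(parent=[-1, 0, 0],
                       resistance=[1.0, 2.0, 4.0],
                       capacitance=[1.0, 3.0, 5.0])
print(delays)                 # Elmore time constants per node
print(transition_time(1.0))   # ln(4), about 1.39, for a unit time constant
```

For the three-node tree the sum in (1) gives 15 for node 1 (1·1 + 3·3 + 1·5) and 29 for node 2, which is what the accumulation returns.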
Transition Time vs Elmore Delay
The ratio of transition time to Elmore delay given by equation (4) is not accurate enough for this repeater insertion method. The reason is that the optimum distance between repeaters, and the size of the repeater itself, that satisfy the above design requirements have been determined by running Spice simulations, so a correlation between Elmore delay and transition time based only on the first order approximation would be inconsistent with them. The idea is instead to determine a limiting value for the Elmore delay that corresponds to the maximum transition time obtained from the Spice simulations.
Once the optimum wire lengths/widths, buffer sizes and maximum transition time allowed for a design have been determined, the Spice waveform can be compared with the first order waveform approximation (2). This is done by finding the ratio coefficient between the transition time (measured in Spice for the optimum wire length/width and buffer size) and the Elmore delay.
To find the optimum delay in our design without affecting area and power, a set of Spice simulations was run for different buffer and wire sizes. Since area and power were strict limitations for our processor, the buffer and wire sizes were chosen accordingly. As a limit for metal 1-4 wire width/space, double the minimum width (2w) and triple the minimum space (3s) were enough to satisfy our area and power goals. A limit on buffer sizes (32 x minimum size) was also necessary because a number of the buffers were placed in the channels.
Placing large buffers in the channels and making wires unnecessarily wide would present significant blockage to the routing. This would have a significant impact on the area, due to wire congestion. Figure 1 shows the results of the Spice simulations.
Figure 1. Delay per mm for 1.5w wire width, 1/1.5/2w space, and 32/24/16 x minimum buffer size
As an observation, when the width or space between wires decreases, the delay/mm increases. Different wire widths and spaces were chosen for our design based on needs. For instance, for critical nets that required optimum delays, wide widths and spaces were selected. In all the other nets, more conservative solutions were chosen such that area and power would not be affected. Table 1 shows some measurements of delay/mm and the optimum length between repeaters of 24 and 32 x minimum size.
As an example, it was found that, for a metal 1-4 wire with 1.5w width, 2w space and a 24 x minimum size buffer, the minimum delay per mm (including repeater delay) occurs at 2.5-3.0 mm between the chosen sized repeaters (see figure 1). Looking at the curve (figure 1), the delay per mm does not change much between 2.5 mm and 3.5 mm. Since our main goal is to insert as few repeaters as possible while still being close to the optimum delay, 3.5 mm looks like a better option than 2.5 or 3.0 mm.
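The selection rule used here, taking the longest repeater spacing whose delay per mm stays close to the minimum, can be sketched as follows. The curve values are made-up placeholders, not measurements read from Figure 1, and the 3% tolerance and the function names are assumptions chosen only for illustration.

```python
def longest_near_optimal(lengths_mm, delay_per_mm, tolerance=0.03):
    """Longest spacing whose delay/mm is within `tolerance` of the minimum (hypothetical rule)."""
    best = min(delay_per_mm)
    acceptable = [length for length, d in zip(lengths_mm, delay_per_mm)
                  if d <= best * (1.0 + tolerance)]
    return max(acceptable)

def repeater_reduction(old_spacing_mm, new_spacing_mm):
    """Fractional drop in repeater count, assuming the count scales as 1/spacing."""
    return 1.0 - old_spacing_mm / new_spacing_mm

# Placeholder curve in arbitrary delay units; NOT data from the paper.
lengths = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
curve = [70.0, 64.0, 60.0, 60.0, 61.5, 66.0]

spacing = longest_near_optimal(lengths, curve)
print(spacing, round(repeater_reduction(2.5, spacing), 3))
```

If the repeater count on long wires is taken to scale inversely with spacing, moving from 2.5 mm to 3.5 mm gives 1 - 2.5/3.5, or about 29% fewer repeaters. This agrees with the reduction reported in the text, although the source does not state how that figure was obtained.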
[Table 1, continued. Surviving entry: 176/184.]
Using this method, we have reduced the number of repeaters by ~29%. As a result, area and power are reduced considerably. But there is a limit to how long a wire can be between two repeaters at a certain frequency. For our example, as shown in figure 2, the waveform dies out due to RC degradation as the length is increased. Making the wire longer than 3.5 mm will affect the transition time to the point where the waveform does not reach the voltage rails (VDD and VSS). The length that we have chosen initially (3.5 mm) looks safe. The voltage waveform corresponding to 3.5 mm is captured and shown in figure 3. The corresponding first order approximation (see equation (2)) of the same wire is overlaid on the same plot for comparison. The Elmore time constant Ti in (2) was calculated for the 3.5 mm wire and used in the first order approximation, so equation (2) becomes a simple exponential function of time. As can be observed, the waveforms almost match at the 50% of VDD point, but not at the 20% and 80% points.
Figure 3. Spice waveform and first order approximation
To get the Elmore time constant that corresponds to the transition time at the load, the ratio of the Elmore delay to the transition time was plotted with respect to the wire length (see figure 4). The ratio corresponding to 3.0 mm of wire is 0.98, and for 3.5 mm it is 1.01.
Figure 4. Ratio of Elmore delay and transition time vs length
It is interesting to note that the ratio does not change much as the wire length between repeaters increases. The ratio seems to be in the range of ~0.9-1.1 for lengths greater than 1 mm.
While we have described the Elmore delay and transition time correlation only for point-to-point wires, this procedure can also be applied to more complex branching nets. Experiments showed that, for branching nets with total wire lengths similar to the ones described above, the ratio does not differ significantly from the above mentioned range (0.9-1.1).
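The way the measured ratio turns a transition time budget into an Elmore delay limit can be written out in a few lines. This is a minimal sketch under stated assumptions: the 400 ps budget is a hypothetical value, the function names are illustrative, and the comparison with the first order model assumes the Spice transition time is also measured between 20% and 80%.

```python
import math

# Elmore delay / transition time predicted by the single-pole model of equation (4).
FIRST_ORDER_RATIO = 1.0 / math.log(4.0)   # about 0.72

def ratio_from_spice(elmore_delay, spice_transition_time):
    """Ratio coefficient as plotted in Figure 4: Elmore delay over measured transition time."""
    return elmore_delay / spice_transition_time

def elmore_limit(max_transition_time, ratio):
    """Largest Elmore delay that still meets the transition time budget."""
    return ratio * max_transition_time

# Hypothetical budget of 400 ps; 1.01 is the ratio quoted in the text for 3.5 mm.
budget = 400e-12
print(elmore_limit(budget, 1.01))               # limit from the Spice-based ratio
print(elmore_limit(budget, FIRST_ORDER_RATIO))  # limit the first order model alone would give
```

Under these assumptions the first order model alone would set the Elmore limit nearly 30% lower than the Spice-based ratio does, which would lead to more repeaters than the transition time budget actually requires.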
Insertion Method
After the maximum transition time for a particular circuit performance has been determined, and the correlation with the interconnect Elmore delay has been performed, a maximum allowable RC delay is set as input to our algorithm. In addition, an optimum repeater is determined by Spice simulations and a timing model is created for it. To obtain RC data from the physical design, an RC extraction tool can be used that provides an SPF (Standard Parasitic Format) output to be read by the insertion tool. Standard Parasitic Format is a way to represent RC data, and it is compatible with the Spice format. Block timing models are also needed to provide driver and load information.
In SPF (Standard Parasitic Format) data, the RC nets are represented as lumped RC models (see figure 5). To be able to insert repeaters, the lumped RC model must be transformed into a distributed model.
Figure 5. Lumped to distributed RC model transformation
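The text does not spell out the transformation shown in Figure 5. One common way to approximate a distributed line, assumed here purely for illustration, is to split each lumped segment into several equal series sections so that a repeater can be considered at any intermediate point. The sketch below uses hypothetical function names and shows how the Elmore delay of such a ladder moves from RC for a single lumped section toward RC/2 as the number of sections grows.

```python
def distribute(r_total, c_total, sections):
    """Split one lumped (R, C) segment into equal series pieces (assumed transformation)."""
    return [(r_total / sections, c_total / sections)] * sections

def ladder_elmore(pieces, r_drv=0.0, c_load=0.0):
    """Elmore delay at the far end of a chain of (R, C) pieces.

    Each capacitance is placed at the far end of its resistance; r_drv is the
    driver output resistance and c_load the capacitance at the end of the chain.
    """
    downstream = c_load + sum(c for _, c in pieces)
    delay = r_drv * downstream
    for r, c in pieces:
        delay += r * downstream   # this resistance charges everything beyond it
        downstream -= c           # the capacitance just passed is now upstream
    return delay

# Arbitrary units: R = 2, C = 4, so RC = 8 and RC/2 = 4.
for n in (1, 2, 10, 1000):
    print(n, ladder_elmore(distribute(2.0, 4.0, n)))
```

With this placement of the capacitances the result equals RC(n+1)/(2n) for n sections, so a few sections per segment already come close to the distributed value.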
Once the distributed capacitance has been calculated, the Elmore delay from the driver to each of the loads in the net is calculated (figure 6). Repeaters are inserted starting from the load with the highest Elmore delay and working towards the driver. While parsing through the RC net segment by segment, the Elmore delay is calculated considering a repeater as the driver of the segment. If the delay exceeds the maximum, a repeater is inserted in the segment, forming a new subnet. The process then starts again from the load with the highest Elmore delay. In the case of branching, the assumed branching node is moved from an assumed repeater input to a repeater output. This continues until the maximum delay is exceeded, in which case a repeater is inserted with its output at the branching node (figure 7). The process continues until all the paths in the net have been explored. To optimize the total delay of the net with the repeaters inserted, there are some special cases where a repeater could be removed if it is too close to the driver. This is presented as step (10).
A Spice RC model having nodes with physical locations (X, Y coordinates) can be obtained by extraction tools providing standard parasitic format (SPF) data. This can then be used with the results of the above algorithm for repeater placement to determine the exact physical location desired for each repeater. This is done by calculating the X, Y coordinates of the point at which the repeater is to be inserted along a segment extending between two nodes whose coordinates are given.
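For a single point-to-point wire, the load-to-driver walk and the coordinate calculation described above can be sketched as follows. This is a simplified illustration, not the algorithm as implemented by the authors: it assumes a uniform wire with positive resistance and capacitance per unit length, ignores branching and the repeater removal step, treats the net driver as being as strong as a repeater, and uses hypothetical function and parameter names.

```python
import math

def segment_elmore(length, r, c, r_drv, c_load):
    """Elmore delay of a repeater (output resistance r_drv) driving `length` of
    uniform wire (r and c per unit length) into a load capacitance c_load."""
    return r_drv * (c * length + c_load) + r * length * (c * length / 2.0 + c_load)

def max_length(limit, r, c, r_drv, c_load):
    """Longest wire for which segment_elmore stays at or below `limit` (r, c > 0)."""
    # Solve (r*c/2) L^2 + (r_drv*c + r*c_load) L + (r_drv*c_load - limit) = 0 for L.
    a = r * c / 2.0
    b = r_drv * c + r * c_load
    k = r_drv * c_load - limit
    if k >= 0.0:
        raise ValueError("limit cannot be met with any positive wire length")
    return (-b + math.sqrt(b * b - 4.0 * a * k)) / (2.0 * a)

def place_repeaters(total_length, limit, r, c, r_drv, c_in, c_load):
    """Distances from the driver at which repeaters are inserted, found by
    walking from the load back towards the driver."""
    positions = []
    remaining = total_length
    load = c_load
    while True:
        reach = max_length(limit, r, c, r_drv, load)
        if reach >= remaining:       # the driver can handle what is left
            break
        remaining -= reach
        positions.append(remaining)  # repeater output sits at this point
        load = c_in                  # the next stage upstream drives a repeater input
    return sorted(positions)

def point_on_segment(p0, p1, fraction):
    """X, Y coordinates at `fraction` of the way from node p0 to node p1."""
    (x0, y0), (x1, y1) = p0, p1
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))

# Arbitrary units: each stage may drive 2 units of wire, the net is 5 units long.
print(place_repeaters(total_length=5.0, limit=2.0, r=1.0, c=1.0,
                      r_drv=0.0, c_in=0.0, c_load=0.0))
print(point_on_segment((0.0, 0.0), (10.0, 20.0), 0.25))
```

Because every stage is sized against the same Elmore limit, each repeater input and the final load see approximately the same transition time, which is the property the method relies on. In a segment-based netlist the `fraction` passed to `point_on_segment` would be the position of the insertion point along the extracted segment between its two SPF nodes.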
Implementation
The above method has been implemented in the design of the Ultra Sparc III microprocessor at Sun Microsystems. A repeater map that shows the distribution of repeaters throughout the chip is shown in figure 9. It is the result of running the repeater insertion program on SPF data with node coordinates (a similar map can be obtained which indicates the number of repeaters per square mm). This is not to be considered the final picture of where the repeaters are physically located in the chip, but rather the optimum locations from the timing point of view. Since placing the repeaters at those exact locations was not physically possible, a more realistic approach was to place blocks of repeaters in the most heavily populated areas, shown in the map by the darker color indicating a large number of repeaters. The rest of the repeaters can be placed in the neighboring blocks. After blocks of repeaters are placed in the floorplan at the designated locations (indicated by the dark color areas or by the number of repeaters per square mm) and the remaining ones in the logic blocks, nets are rerouted with a routing tool. Timing and distance constraints can be put on certain nets to enforce optimum routing, but even that will not give the needed results on the first pass. Several iterations are required through full-chip RTL, routing, RC extraction, and full-chip timing analysis (mainly identifying the critical paths and long wires) and back, until a satisfying result is obtained in terms of area, power and timing.
Conclusion
This paper presents a practical repeater insertion method that satisfies the three main goals in designing a high speed microprocessor: delay, area and power. We have shown, step by step, how the method was developed: (a) identify the maximum wire length between repeaters and the transition time by running Spice simulations, (b) correlate the transition time with the Elmore delay based on the chosen wire lengths and repeaters, resulting in a ratio coefficient, (c) use this coefficient to find the maximum Elmore delay to be used as an input to our repeater insertion program, (d) develop a simple and applicable algorithm that ensures equal transition time (limited by the frequency of the design) for all the gate input signals across the net, and (e) obtain a map of repeater distribution given standard parasitic format (SPF) data with node coordinates specified. The repeater insertion method has been implemented in the design of the Ultra Sparc III microprocessor at Sun Microsystems.
Acknowledgments
As a closing statement, we would like to thank Fabian Klass, Farzad Chehrazi, and Tony Todesco from our physical design group for their help and support in the areas of circuit simulation and power analysis.
