Abstract
Introduction
As process technology advances into the deep-submicron era, interconnect plays a dominant role in determining circuit performance and signal integrity [9]. Crosstalk-induced noise significantly affects delay and degrades signal integrity as technology scales, wire spacing diminishes, and coupling capacitance/inductance increases. As a result, crosstalk has become a design metric comparable in importance to timing, area, and power in modern VLSI design.
Buffer insertion is one of the most effective and popular techniques to reduce interconnect delay and decouple coupling effects. Traditionally, buffer insertion/sizing is performed during the post-layout stage. As the SIA technology roadmap predicts, the number of buffers inserted for performance optimization will grow dramatically [9]. It is infeasible to insert hundreds of thousands of buffers during the post-layout stage, when most of the routing region is already occupied by wires. Therefore, it is desirable to incorporate buffer planning into floorplanning to ensure timing closure and design convergence.
Alpert and Devgan [1] presented formulae for computing the optimal number and locations of buffers for timing optimization. Cong, Kong, and Pan [3, 4] and Sarkar, Sundararaman, and Koh [6] presented pioneering works on buffer block planning for interconnect-driven floorplanning. Given a slicing floorplan, they computed the feasible regions for buffer insertion, clustered uniform-sized buffers into blocks, and placed these buffer blocks into the dead spaces and channels in the floorplan for timing optimization. Both [3, 4] and [6] considered an independent feasible region (IFR) for inserting a buffer, where the region is determined with the locations of the other buffers fixed. It has been pointed out in [6] that, for two-dimensional nets, IFRs are not completely independent, since the assignment of buffers to their IFRs must ensure a monotonic path from the source to the sink. Further, these works do not consider crosstalk noise.
For noise analysis and avoidance, there is much work in the literature. However, most of the previous works are intended for either routing or post-layout optimization. As mentioned earlier, due to the increasing design complexity, it may not be feasible to insert a large number of buffers for noise avoidance during the routing and post-layout stages. Therefore, there is a need to consider buffer planning earlier at the floorplanning stage when there is more flexibility to allocate silicon space for buffers to ensure timing closure and design convergence.
In this paper, we derive formulae of buffer insertion for both timing and noise optimization, and apply the formulae to noise-aware buffer planning for interconnect-driven floorplanning. Our algorithm is based on a three-stage approach:
Stage 1 computes the optimal number and locations of buffers to minimize delay for a wire segment, and then computes the feasible regions that meet the timing constraints.
Stage 2 computes the feasible regions that meet the noise constraints.
Stage 3 finds the intersection of the feasible regions for meeting both the timing and noise constraints, allocates buffers to their feasible regions in dead space or channels, and expands a channel if there is not enough dead space for buffer insertion.
Experimental results show that our approach achieves an average success rate of 80.8% (78.2%) of nets meeting timing constraints alone (both timing and noise constraints) and consumes an average extra area of only 0.49% (0.66%) over the given floorplan, compared with the average success rate of 75.6% of nets meeting timing constraints alone and an extra area of 1.33% by [3, 4].
Preliminaries
In this section, we first give some notation, then introduce the delay and noise models considering coupling capacitance.
$Noise(u, v)$: the noise on the wire segment between two neighboring buffers $u$ and $v$.
$NM(v)$: the noise margin for a buffer or a sink $v$, i.e., the maximum allowable noise that does not cause a logic error. We use 0.8 V, the same as [2].
$k$: the number of buffers that minimizes the delay of a wire.
$D_{opt}$: the minimum delay from source to sink with the optimal buffers inserted.
$x$: the optimal length of the wire segment between the source and the first buffer to minimize delay.
$y$: the optimal length of the wire segment between every pair of neighboring buffers to minimize delay.
$\Delta_i$: the maximum distance that the $i$-th buffer can be moved away from its optimum location without violating the timing constraint.
$\lambda$: the fixed ratio of coupling to total wire capacitance. We use the same value as [2].

Throughout the rest of this paper, we use the notation and parameters listed in Table 1 to facilitate our technical discussions. The set of parameters is based on the 0.18 $\mu$m technology given in the NTRS'97 roadmap [8].
We use a switch-level RC model with an intrinsic buffer delay for a buffer, the $\pi$-model for a wire segment, and the Elmore model [5] with fringing and coupling capacitance considerations to compute the delay. See Figure 1 for an illustration. Figure 2 shows our noise model, which considers the coupling capacitance $c_c$. The coupling capacitance is proportional to the fringing capacitance ($c_f$) and the coupling length ($l_c$), and it is inversely proportional to the distance ($d$) between the aggressor and the victim nets, i.e., $c_c \propto c_f l_c / d$.
We adopt a conservative approach and set $d$ to be the minimum wire distance $d_{min}$ specified by the design rule. Furthermore, we set the coupling length ($l_c$) to be twice the victim net's length ($l_v$), which is the most pessimistic situation, in which the victim net is fully coupled on both sides by two aggressor nets. As a result, $c_c \propto 2 c_f l_v / d_{min}$.
Further, we adopt $2c_c$ in our model to express the worst case, in which all the aggressor nets switch in the opposite direction to the victim net, so that the Miller effect doubles the coupling effect. It will be clear in Section 5 that, even with this most pessimistic estimation of crosstalk, our algorithms still outperform previous works. Note that our algorithms readily apply to other models with less pessimistic estimation.
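The delay model described above can be illustrated with a minimal Python sketch. It assumes the standard Elmore expression for one stage (a driver resistance, a $\pi$-model wire, and a lumped load) and uses the worst-case unit-length capacitance $c_0 + 2c_c$. The function name `stage_delay` and its parameter names are hypothetical and are not taken from the paper.

```python
def stage_delay(length, r0, c0, cc, r_drv, c_load, t_intrinsic=0.0):
    """Elmore delay of one driver + wire + load stage (illustrative sketch).

    length      : wire length
    r0, c0      : unit-length wire resistance and capacitance
    cc          : unit-length coupling capacitance
    r_drv       : output resistance of the driving buffer (assumed parameter)
    c_load      : input capacitance of the next buffer or sink (assumed parameter)
    t_intrinsic : intrinsic buffer delay
    """
    # Worst case: every aggressor switches opposite to the victim,
    # so the coupling capacitance is counted twice (Miller effect).
    c_unit = c0 + 2.0 * cc
    c_wire = c_unit * length
    r_wire = r0 * length
    # The driver sees the whole wire plus the load; the pi-model wire
    # resistance sees half of the wire capacitance plus the load.
    return (t_intrinsic
            + r_drv * (c_wire + c_load)
            + r_wire * (c_wire / 2.0 + c_load))
```

Because the wire term grows quadratically with `length`, splitting a long wire into buffered stages reduces the total delay, which is what motivates computing an optimal number of buffers in the first place.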
Consider a wire $e = (u, v)$, where $u$ and $v$ are two nodes in a buffered tree. Let the length of the wire segment be $l_e$, and let $T(v)$ be the subtree rooted at $v$. $I_{T(v)}$ is the total downstream current seen at $v$, i.e., the current induced by aggressor nets on the wires downstream of $v$. The current induced by aggressor nets on a unit-length wire is $i_0 = \lambda p c_0$ [2], where $c_0$ is the unit-length wire capacitance, and the coupling capacitance is modeled as a fraction $\lambda$ of the unit-length wire capacitance of the victim net. The resulting noise $Noise(u, v)$ induced by the coupling current is the voltage pulse coupled from the aggressor nets into the victim net on the wire segment $(u, v)$. We have the following noise constraint:
The feasible region for buffers that satisfies the noise constraint [2] is given by
Based on the delay and noise models, we first compute the respective feasible regions $\Phi_d$ and $\Phi_n$ for inserting buffer $b_i$ to satisfy the delay and noise constraints, and then find the intersection of $\Phi_d$ and $\Phi_n$ to derive the feasible region for buffer $b_i$ that meets both the delay and noise constraints. See Figure 3 for an illustration.

Noise-Aware Floorplanning
Feasible Region for Delay
The feasible region for delay, $\Phi_d$, is the maximum region satisfying the timing constraint $T_{req}$ [3] for buffer $b_i$. Consider a single wire with the optimal number of buffers $k$ inserted [1], and let the optimal delay be $D_{opt}$. Suppose that the $i$-th buffer ($1 \le i \le k$) is moved away from its optimum position by a distance $\Delta_i$, where $\Delta_i$ is positive if the buffer is moved forward and negative otherwise. Considering the worst-case coupling capacitance, in which the unit-length wire capacitance becomes $c_0 + 2c_c$, and placing buffers at equal distances (i.e., $x = y = z$, as shown in Figure 4), the following theorem specifies a sufficient condition for placing a buffer in a region that meets the timing constraint.
With this simplification, it is easy to see that the linear terms in both Equation (5) and Equation (6)
Placing buffers at equal distances, i.e., $x = y = z$ as shown in Figure 4, we have:
The theorem thus follows.
To achieve the maximal feasible region, we derive the following two corollaries, which demonstrate how to apply the concept of degree of freedom. If all buffers can be shifted only in the same direction, the constant multiple changes from $(2k - 1)$ to $k$. We have the following corollaries.
Corollary 1.1 If a buffer can shift either forwards (positive sign) or backwards (negative sign), the maximal feasible region $\Delta_i$ for a buffer is given by:
$$|\Delta_i| \le \sqrt{\frac{T_{req} - D_{opt}}{(2k - 1)\, r_0\, (c_0 + 2c_c)}} \qquad (9)$$
Corollary 1.2 If all buffers can be shifted in only the same direction, the maximal feasible region $\Delta_i$ for a buffer is given by:
$$|\Delta_i| \le \sqrt{\frac{T_{req} - D_{opt}}{k\, r_0\, (c_0 + 2c_c)}} \qquad (10)$$
It should be noted that a larger feasible region can be obtained under more restricted conditions. For example, if only $r$ out of the $k$ buffers are moved, we may replace $k$ with $r$ in the above two corollaries and get a larger feasible region. When $r = 1$, that is, when only one buffer is moved, both corollaries lead to the solution:
$$|\Delta_i| \le \sqrt{\frac{T_{req} - D_{opt}}{r_0\, (c_0 + 2c_c)}} \qquad (11)$$
The same result can be obtained if, in Corollary 1.2, all buffers are moved by the same distance in the same direction.
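As a rough illustration of how the two corollaries might be evaluated, the following Python sketch computes the allowable shift of a buffer. It assumes the reading of Equation (9) in which the bound is the square root of the timing slack divided by a constant multiple times $r_0 (c_0 + 2c_c)$, with the multiple being $(2k - 1)$ for free movement and $k$ for same-direction movement. The function name `max_shift` and all parameter names are hypothetical.

```python
import math


def max_shift(t_req, d_opt, k, r0, c0, cc, same_direction=False, moved=None):
    """Largest shift of a buffer that keeps the wire within its timing budget.

    t_req, d_opt   : required delay and optimal (minimum) delay of the wire
    k              : optimal number of buffers on the wire
    r0, c0, cc     : unit-length resistance, capacitance, coupling capacitance
    same_direction : True for Corollary 1.2, False for Corollary 1.1
    moved          : if given, only this many of the k buffers are moved
    """
    slack = t_req - d_opt
    if slack < 0:
        raise ValueError("timing constraint is tighter than the optimal delay")
    n = k if moved is None else moved
    if not 1 <= n <= k:
        raise ValueError("moved must be between 1 and k")
    # Constant multiple: (2n - 1) for free movement, n for same-direction movement.
    factor = n if same_direction else 2 * n - 1
    return math.sqrt(slack / (factor * r0 * (c0 + 2.0 * cc)))
```

With `moved=1` the two modes give the same value, which matches the observation above that both corollaries coincide when only one buffer is moved.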
Feasible Region for Noise
The feasible region for noise, $\Phi_n$, is the maximum allowable length in each net that satisfies the noise margins before buffer insertion. To estimate $\Phi_n$, we apply the noise formulae of [2], as listed below.
The induced noise current on the wire segment $e = (u, v)$ is computed by $I_e = i_0 l_e$. To satisfy the noise constraint, a buffer can be inserted at $u$ in Equation (2), where the range of the feasible region $W(\Phi_n)$ for buffers satisfying the noise constraint is computed from Equation (3).
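To make the noise-length relation concrete, here is a hedged Python sketch. It does not reproduce the paper's Equations (2) and (3); instead it assumes the Devgan-style noise metric commonly associated with [2], in which the noise at the far end of a wire of length $l$ driven through a resistance $R_b$ is $R_b (i_0 l + I_{T(v)}) + r_0 l (i_0 l / 2 + I_{T(v)})$, and solves for the largest $l$ that keeps this within the noise margin. The function name `max_noise_length` and the parameter `r_drv` are illustrative assumptions.

```python
import math


def max_noise_length(noise_margin, r_drv, r0, i0, i_down=0.0):
    """Longest wire whose far-end noise stays within the noise margin.

    Assumed noise model (not quoted from the paper):
        noise(l) = r_drv * (i0 * l + i_down) + r0 * l * (i0 * l / 2 + i_down)

    noise_margin : allowable noise at the receiving buffer or sink
    r_drv        : output resistance of the driving buffer
    r0           : unit-length wire resistance
    i0           : aggressor-induced current per unit length
    i_down       : downstream current seen at the far end (0 for a 2-terminal net)
    """
    # Quadratic a*l^2 + b*l + c <= 0 in the wire length l.
    a = r0 * i0 / 2.0
    b = r_drv * i0 + r0 * i_down
    c = r_drv * i_down - noise_margin
    if c >= 0:
        return 0.0  # the margin is already used up by the downstream current
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

For a 2-terminal net, `i_down` is zero and the bound reduces to a function of the driver resistance, the unit-length wire resistance, the induced current, and the noise margin only.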
We adopt 2-terminal nets as inputs, and the approach is to scan from the sink $s_i$ with the given noise margin $NM(s_i)$ back to the source $s_0$. Since the accumulated crosstalk-induced current $I_{T(v)}$ is zero for 2-terminal nets, the noise formula is modified to:
According to the above formulae (12) and (13) for $W(\Phi_n)$, we have the feasible region for noise $\Phi_n$:
In Equation (14), $\Phi_n$ is the maximum length from the next buffer $b_{i+1}$ back to $b_i$ that does not result in a logic error. Figure 3 shows the intersection of $\Phi_d$ and $\Phi_n$. Placing each buffer within this intersection guarantees that both the timing constraint and the noise bound are met.

Circuit   #Modules  #Nets  #Pads  #2-pin nets
a9c3      147       1202   22     1613
ac3       27        212    75     446
ami33     33        123    43     363
ami49     49        408    22     545
apte      9         97     73     172
hc7       77        449    51     1450
hp        11        83     45     226
playout   62        2506   192    2150
xc5       50        1005   2      2275
xerox     10        203    2      455
Table 2: Statistics of the MCNC benchmark circuits.
Table 3: Experimental results on the MCNC93 benchmark circuits (both $NBF_d$ and $NBF_{d\&n}$ ran on a 400 MHz SUN Ultra 60 workstation, while BBP ran on a 500 MHz Intel Pentium-III PC).

The NBF algorithm shown in Figure 5 is a three-stage approach:
Stage 1: As shown in Figure 4 (a), we compute $\Phi_d$. For each net, we compute the optimal number of buffers $k$ and the distances $x$, $y$, and $z$. We also obtain $D_{opt}$ from the source $s_0$ to the sink. For each buffer inserted in this net, we compute the buffer position and its feasible region for timing, $\Phi_d$.
Stage 2: As illustrated in Figure 6, we use both buffer movement and buffer insertion to meet the noise constraint. For each buffer location $i$, from $k$ back to $1$, we find $\Phi_d$, $\Phi_n$, and the intersection ($\Phi$) of $\Phi_d$ and $\Phi_n$. There are three possible relations between $\Phi_d$ and $\Phi_n$, as follows.
-Case 1 (with buffer insertion): There is no intersection between $\Phi_n$ and $\Phi_d$. We apply buffer insertion to meet the noise constraint (see Figure 6 (a)).
Stage 3: The third stage allocates buffers to $\Phi$. We allocate buffers to their feasible regions in dead space or channels within the routing region. If there is not enough capacity in dead space and channels, we expand a channel. In this case, the height and/or the width of the whole chip is expanded, and the extra area is computed accordingly. Note that we must recompute the timing and noise for all nets involved in the expanded channel to check whether both constraints are still satisfied.
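The decision made for each buffer in Stage 2 can be sketched in a few lines of Python. This is a simplified one-dimensional illustration, not the NBF implementation: feasible regions are modeled as closed intervals of positions along the wire, Case 1 (no intersection) triggers buffer insertion within the noise region, and every other relation is treated uniformly as placing the buffer inside the intersection. The names `intersect` and `plan_buffer` and the returned tags are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (low, high) positions along the wire


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two closed intervals, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def plan_buffer(phi_d: Interval, phi_n: Interval) -> Tuple[str, Interval]:
    """Decide how to satisfy both constraints for one buffer.

    phi_d : feasible region for delay
    phi_n : feasible region for noise
    """
    region = intersect(phi_d, phi_n)
    if region is None:
        # Case 1: the regions are disjoint, so an extra buffer is inserted
        # within the noise-feasible region (simplifying assumption).
        return ("insert", phi_n)
    # Otherwise the existing buffer can be moved into the common region.
    return ("place", region)
```

For example, `plan_buffer((0.0, 4.0), (3.0, 5.0))` selects the common region between positions 3.0 and 4.0, while `plan_buffer((0.0, 2.0), (3.0, 5.0))` falls into Case 1 and requests an insertion.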
Experimental Results
The NBF algorithm was implemented in the C language on a 400 MHz Sun Ultra 60 workstation and tested on the MCNC benchmark circuits used in [3, 4, 6] (see Table 2 for the circuits and their statistics). We compared $NBF_{d\&n}$ (the NBF algorithm with both delay and noise considerations) and $NBF_d$ (the NBF algorithm with delay optimization alone) with the buffer block planning algorithm BBP presented in [3, 4], based on the same data, initial floorplans, delay budgets, and parameters generated in [3, 4].
The parameters for interconnects and buffers are given in Table 1. All parameters used here are the same as in [2, 3, 4]. As in [2], we adopted a rise time of 0.25 ns for an aggressor net, a power supply voltage of 1.8 V, and a noise margin of 0.8 V. As in [3, 4], all nets were 2-pin connections, and the power/ground nets were excluded. Also, we used the delay constraints provided by the authors of [3, 4], which were originally randomly generated in $[1.05\,T_{opt}, 1.20\,T_{opt}]$. (Note that we did not compare with [6] because it used different constraints.) Table 3 gives the results obtained from the BBP, $NBF_d$, and $NBF_{d\&n}$ algorithms. The columns "#Net%Net Meet" give the number/percentage of the nets that meet the delay constraints alone for BBP and $NBF_d$, and both the delay and noise constraints for $NBF_{d\&n}$. The columns "#Buffers Inserted" list the total numbers of buffers inserted to satisfy the given constraints. The columns "Area Expansion Ratio" report the percentages by which the areas increased after buffer insertion, computed by (the new chip area − the original chip area) / the original chip area. The columns "CPU Time" give the running times in seconds for the algorithms.
As shown in Table 3, $NBF_d$ ($NBF_{d\&n}$) achieves an average success rate of 80.8% (78.2%) of nets meeting the timing (both timing and noise) constraints, uses only 761 (966) buffers in total, and consumes an average extra area of only 0.49% (0.66%) over the given floorplan, compared with an average success rate of 75.6% of nets meeting the timing constraints, 1782 buffers inserted, and an extra area of 1.33% for BBP. The experimental results show that $NBF_d$ and $NBF_{d\&n}$ perform much better than BBP, and the overhead for $NBF_{d\&n}$ to meet the additional noise constraints is quite small (2.6% fewer nets meeting the constraints and a 0.17% larger area). Further, our algorithms tend to obtain larger gains for larger circuits; e.g., for the largest circuit xc5, $NBF_d$ ($NBF_{d\&n}$) achieves a success rate of 93.7% (92.0%) of nets meeting the timing (both timing and noise) constraints, while BBP obtains a success rate of only 73.2% of nets meeting the timing constraint alone. The results reveal that our feasible region formulae for both delay and noise are very effective in improving multi-metric performance while consuming very few resources. The running times for our algorithms are reasonable. (Note that the CPU times reported in Table 3 are based on different machines.)
