Abstract
Introduction
Performance optimization has always been a critical step in the design of integrated circuits. Process technology scaling has made interconnect performance more dominant than transistor and logic performance. With the continued scaling of process technology, the interconnect resistance per unit length continues to increase, the capacitance per unit length remains roughly constant and logic delay continues to decrease. These trends have caused interconnect delay to become more dominant than logic delay. Process technology options, such as copper wires, can only provide temporary relief. The trend of increasing interconnect dominance is expected to continue.
Interconnect-driven timing optimization techniques, such as wire sizing, buffer insertion and gate sizing, have gained widespread acceptance in deep submicron design [7]. In particular, buffer insertion techniques have been successful in reducing interconnect delay. To the first order, interconnect delay is proportional to the square of the length of the wire. Inserting buffers effectively divides the wire into smaller segments, which makes the interconnect delay almost linear in terms of length (plus the buffer delays).
Additional advantages of buffer insertion will make this optimization even more pervasive as the ratio of device to interconnect delay continues to decrease.
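As a rough illustration of this scaling, the sketch below evaluates the Elmore delay of a uniform wire and of the same wire cut into equal segments by buffers. The function names, the per-unit-length parameters `r` and `c`, and the linear buffer model (intrinsic delay plus output resistance times load) are assumptions made for this example and are not taken from the paper.

```python
def wire_delay(r, c, length, load_cap):
    """Elmore delay of a uniform wire: R_e * (C_e / 2 + C_load)."""
    wire_res = r * length
    wire_cap = c * length
    return wire_res * (wire_cap / 2.0 + load_cap)


def buffered_delay(r, c, length, n_buffers, buf_res, buf_cap, buf_delay, sink_cap):
    """Delay of a wire cut into n_buffers + 1 equal segments.

    Assumed (hypothetical) buffer model: delay = buf_delay + buf_res * load.
    The source driver is not modeled.
    """
    segments = n_buffers + 1
    seg_len = length / segments
    total = 0.0
    for k in range(segments):
        # The last segment drives the sink; every other one drives a buffer input.
        load = sink_cap if k == segments - 1 else buf_cap
        total += wire_delay(r, c, seg_len, load)
        if k > 0:
            # Segment k is driven by a buffer, which sees the segment plus its load.
            total += buf_delay + buf_res * (c * seg_len + load)
    return total


if __name__ == "__main__":
    # Idealized buffers (zero resistance, capacitance and delay) isolate the wire term.
    for n in (0, 1, 3):
        print(n, buffered_delay(1.0, 1.0, 10.0, n, 0.0, 0.0, 0.0, 0.0))
```

With idealized buffers the wire term falls as 1/(n + 1), while doubling the unbuffered length quadruples the delay. With realistic buffer parameters each stage adds its own delay, so the number of buffers that minimizes total delay is finite.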
Several works study delay-driven buffer insertion. Closed-form solutions for 2-pin nets are proposed in [1], [4], [5] and [9]. In [16], Van Ginneken develops a dynamic programming algorithm which finds the optimal buffer placement under the Elmore delay model [10]. In [12], Lillis et al. extend this algorithm to simultaneously perform wire sizing, while also minimizing the total number of buffers.
Finally, Alpert and Devgan [1] propose a wire segmenting pre-processing algorithm to handle the one buffer per wire limitation of Van Ginneken's algorithm, which results in a smooth trade-off between solution quality and run time.
Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 98, San Francisco, California. ©1998 ACM 0-89791-964-5/98/06 $5.00
Although timing optimization has always been critical in the design process, present-day design techniques and process technologies are making noise analysis and avoidance equally important. The shrinking of the minimum distance between adjacent wires has caused an increase in their coupling capacitance. Furthermore, as the ratio of wire thickness to width continues to increase, so will the ratio of coupling to total capacitance. Coupling capacitance can cause a switching net to induce noise onto a neighboring net, resulting in an incorrect functional response. Further, the widespread use of dynamic logic circuits has made noise avoidance even more critical since these logic families are more susceptible to noise failure. It is no longer sufficient or even acceptable to optimize only for delay. Noise avoidance techniques must become an integral part of the performance optimization environment. Buffer insertion provides a suitable platform for optimizing both timing and noise.
Figure 1(a) shows the noise effect that an aggressor net (top) can have on a victim net (bottom). The coupling capacitance may cause an input signal on the aggressor net to induce a noise pulse on the victim net. If the resulting noise is greater than the tolerable noise margin of the sink, then an electrical fault results. Figure 1(b) shows how inserting a buffer can distribute the capacitive coupling between the two newly created wires, resulting in smaller noise pulses on the input of the inserted buffer and on the sink. If the amplitudes of these noise pulses are less than the noise margins for the sink and the buffer, then the circuit will function correctly.
Noise analysis is typically performed through detailed circuit simulation or through reduced order interconnect analysis (e.g., AWE [13] and RICE [15]). Although the latter is more efficient, it is still too slow to be used within an optimization tool. Instead, we adopt the noise metric of [8].
The rest of the paper is organized as follows. Section 2 presents notation and definitions. In Section 3, we derive a formula for the maximum wire length such that no noise violation is induced and also present two optimal algorithms for noise avoidance. Section 4 presents a third algorithm for minimizing delay such that all noise constraints are satisfied. Finally, Section 5 presents experimental results.
Preliminaries
A routing tree T = ( V , E ) contains a set of n - 
Delay Optimization
As in [1] (1)
The Elmore delay for a wire $e = (u, v)$ is given by $Delay(e) = R_e (C_e/2 + C_{T(v)})$. The delay through a gate
Each sink $s_i$ has a given required arrival time $RAT(s_i)$, and we assume that the input signal arrives at the source node at time zero. The timing condition is $\forall s_i \in SI$, $Delay(so \to s_i) \le RAT(s_i)$.
A non-binary tree can be converted into a binary tree by inserting wires with zero resistance and capacitance where appropriate. A buffer placed on an internal node with degree $d$ is interpreted as having one input, one output, and $d - 1$ fanouts.
Noise Avoidance
In [8], Devgan proposes a coupled noise estimation metric which is an upper bound for RC and overdamped RLC circuits. The metric depends on the resistance of the victim net, the resistance of the gate driving the victim net, coupling capacitances to the aggressor nets, and the rise times and the slopes of the signals on the aggressor nets. For example, consider the three aggressor nets and the single 2-pin victim net in Figure 2. The wire in the victim net is segmented into seven new wires such that each new wire is completely coupled to either 0, 1 or 2 of the aggressor nets.
The coupling capacitance from an aggressor net can be modeled as some fraction of the wire capacitance of the victim net. Given $t$ simultaneously switching aggressor nets $1, \ldots, t$ near wire $e$, let $\lambda_1, \ldots, \lambda_t$ be the ratios of coupling to wire capacitance from the aggressor nets to $e$, and let $\mu_1, \ldots, \mu_t$ be the slopes (i.e., power supply voltage over input rise time) of the aggressor net signals. The total current $I_e$ induced by the aggressor nets on $e$ is $I_e = C_e \sum_{j=1}^{t} \lambda_j \mu_j$.
Often, information about neighboring aggressor nets is incomplete, especially when buffer insertion is performed before routing. When performing buffer insertion in estimation mode, one might assume that (i) each wire is coupled to exactly one aggressor net, (ii) the slope of all aggressor nets is $\mu$, and (iii) some fixed ratio $\lambda$ of the total capacitance of each wire is due to coupling capacitance. Under these assumptions $I_e = \lambda \mu C_e$ for each wire $e$.
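The estimation-mode current can be checked against the multi-aggressor case with a few lines of Python. This is a minimal sketch: the function names are hypothetical, `lam` and `mu` stand for the coupling ratio and the aggressor slope, and the general form (wire capacitance times the sum of ratio-slope products) is assumed here because it reduces to the single-aggressor expression stated above.

```python
def induced_current(wire_cap, ratios, slopes):
    """Current induced on a wire by several aggressors (assumed general form)."""
    if len(ratios) != len(slopes):
        raise ValueError("need one slope per coupling ratio")
    return wire_cap * sum(lam * mu for lam, mu in zip(ratios, slopes))


def estimated_current(wire_cap, lam, mu):
    """Estimation mode: one aggressor, fixed coupling ratio lam and slope mu."""
    return lam * mu * wire_cap


if __name__ == "__main__":
    # One aggressor coupling to half of the wire capacitance.
    print(induced_current(2.0, [0.5], [3.0]), estimated_current(2.0, 0.5, 3.0))
```

A wire with no neighboring aggressors is represented by empty lists and contributes zero current.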
Let $I_{T(v)}$ be the total downstream current seen at $v$, i.e.,
Each wire adds to the noise induced on the victim net.
$NS(v) = \min_{s_i \in SI_{T(v)}} \left( NM(s_i) - Noise(v \to s_i) \right)$.
Noise slack serves equivalently as a noise margin for internal nodes. Noise constraints for downstream sinks in $T(v)$ are satisfied if and only if $NS(v)$ is greater than the noise seen at $v$.
if there are no noise violations.
Problem Formulations
We study two different buffer insertion problems. The first problem seeks to fix all noise violations with the fewest possible buffers. Delay is not considered 
This problem may be useful for non-critical nets, for which delay optimization is unnecessary. For timing-critical nets, we must consider both noise and delay at the same time.
Problem 2: Given a tree $T = (\{so\} \cup SI \cup IN, E)$, a buffer library $B$, and noise margins $NM(v)$
A third formulation can seek to minimize the total number of buffers inserted while satisfying both noise and timing constraints. Algorithm 3, which is used to solve Problem 2, can also be applied to address this third formulation using
Noise Constrained Buffer Insertion
We begin with the simplest case of a wire with uniform width and neighboring coupling capacitance, as shown in 
Figure 5 (fragment), Step 5: $NS(so) = NS(so) - R_{so} I_{T(so)}$; if $NS(so) < 0$, create internal node $w$ at $so$ and set $M(w) = b$.
Algorithm 1, Noise Avoidance for Single-Sink Trees, is presented in Figure 5. The algorithm accepts a routing tree and a single buffer type $b$.
Step 1 initializes the current and noise slack of the sink node, then Steps 2-4 climb up the tree, visiting each node in turn.
Step 3 examines whether or not a buffer needs to be inserted on the current wire $e = (u, v)$ by computing the noise from placing a buffer at $u$. If this noise is less than the noise slack, no buffer needs to be inserted, so the algorithm computes the downstream current and noise slack for node $u$, then moves to the next wire. But if the noise is larger than the noise slack, then a buffer must be inserted.
Step 4 computes the maximum distance from $v$ at which this buffer may be inserted, and inserts it there at a new internal node $w$. Finally, Step 5 computes the noise slack at the driver and inserts a buffer right after the driver if there is a noise violation (which can only occur if $R_{so} > R_b$). Optimality follows from the fact that buffers are always inserted their maximal distance up the tree, according to Theorem 1.
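The steps above can be sketched in Python for a single source-to-sink path. The noise model is an assumption of this sketch, since the definitions are not fully legible here: by analogy with the Elmore delay, a wire of length l with resistance r and induced current i per unit length is taken to contribute noise r·l·(i·l/2 + I), where I is the downstream current, and a driving gate with resistance R contributes R times its total downstream current. Under that model, requiring R_b(I + i·l) + r·l(i·l/2 + I) ≤ NS and solving the quadratic for l gives the maximum length used in `max_length`. All names and the data layout are hypothetical.

```python
import math


def max_length(buf_res, r, i, down_current, noise_slack):
    """Largest l with buf_res*(I + i*l) + r*l*(i*l/2 + I) <= noise_slack (assumes r, i > 0)."""
    a = buf_res / r
    b = down_current / i
    return -a - b + math.sqrt(a * a + b * b + 2.0 * noise_slack / (i * r))


def wire_noise(r, i, length, down_current):
    """Assumed noise contribution of a uniform wire segment."""
    return r * length * (i * length / 2.0 + down_current)


def single_sink_noise_avoidance(wires, sink_margin, buf_res, buf_margin, driver_res):
    """wires: list of (length, r, i) ordered from the sink toward the source.

    Returns buffer positions as distances from the sink along the path.
    """
    if sink_margin <= 0 or buf_margin <= 0:
        raise ValueError("noise margins must be positive")
    buffers = []
    current, slack, offset = 0.0, sink_margin, 0.0   # Step 1: initialize at the sink
    for length, r, i in wires:                       # Steps 2-4: climb toward the source
        remaining = length
        while True:
            # Step 3: noise if a buffer were placed at the upstream end of this wire.
            noise = buf_res * (current + i * remaining) + wire_noise(r, i, remaining, current)
            if noise <= slack:
                slack -= wire_noise(r, i, remaining, current)
                current += i * remaining
                break
            # Step 4: insert the buffer as far upstream as the noise slack allows.
            step = max(max_length(buf_res, r, i, current, slack), 0.0)
            remaining -= step
            buffers.append(offset + length - remaining)
            current, slack = 0.0, buf_margin
        offset += length
    if driver_res * current > slack:                 # Step 5: check the driver itself
        buffers.append(offset)
    return buffers


if __name__ == "__main__":
    print(single_sink_noise_avoidance([(2.5, 1.0, 1.0)], 1.5, 1.0, 1.5, 1.0))
```

The sketch handles one buffer type and one sink only; it does not cover branching trees.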
Noise Avoidance for Multi-Sink Trees
Algorithm 2 pseudocode (fragment): $S_l$ = Algorithm 2($T.left(v)$); $S_r$ = Algorithm 2($T.right(v)$); set $i = 1$ and $j = 1$; insert nodes $w_l$, $w_r$ just after $v$ on the left and right branches.
If there is a violation, then two new solutions, one with a new buffer on the left branch and one with a new buffer on the right branch, are generated and inserted into the current list of candidates. When the algorithm terminates, the solution in $S$ with the fewest buffers is chosen.
Algorithm 2 pseudocode (fragment, partially illegible): $M(w) = b$; $\alpha = (\ldots, NM(b) - \ldots, M)$
The algorithm returns an optimal solution to Problem 1 in time quadratic in $|V|$. As for Algorithm 1, one can obtain an optimal solution for a buffer library with multiple buffer types by selecting the buffer type with least resistance.
Optimizing Noise and Delay
To address Problem 2, we modify the approach of Van Ginneken [16] to include noise avoidance. Whenever Van Ginneken's algorithm considers inserting a buffer, we check the noise constraints; if they have been violated, the buffer is not inserted. Hence, our algorithm generates fewer solutions than Van Ginneken's algorithm since it prunes solutions which have noise violations. In each candidate, $NS(v)$ is the noise slack at $v$, and $M$ is the current solution. Figure 7 illustrates Algorithm 3, which is the same as Van Ginneken's except for the modifications in boldface.
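The noise check at the buffer-insertion step can be sketched as follows. This is an illustration under stated assumptions, not the paper's Figure 7 or Figure 8: the `Buffer` and `Candidate` types, the linear buffer delay model, and the violation test (buffer resistance times downstream current exceeding the noise slack) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Buffer:
    name: str
    res: float      # output resistance
    cap: float      # input capacitance
    delay: float    # intrinsic delay
    margin: float   # noise margin at the buffer input


@dataclass(frozen=True)
class Candidate:
    cap: float           # downstream load
    slack: float         # timing slack
    current: float       # downstream induced current
    noise_slack: float
    solution: Tuple[str, ...] = ()


def add_buffered_candidates(cands: List[Candidate], library: List[Buffer],
                            node: str) -> List[Candidate]:
    """Keep the unbuffered candidates and add, per buffer type, the best legal buffered one."""
    result = list(cands)
    for buf in library:
        best: Optional[Candidate] = None
        for cand in cands:
            # Assumed noise check: skip the buffer if it would cause a violation.
            if buf.res * cand.current > cand.noise_slack:
                continue
            slack = cand.slack - buf.delay - buf.res * cand.cap
            if best is None or slack > best.slack:
                # A buffer decouples the subtree: current resets, slack becomes its margin.
                best = Candidate(buf.cap, slack, 0.0, buf.margin,
                                 cand.solution + (f"{buf.name}@{node}",))
        if best is not None:
            result.append(best)
    return result


if __name__ == "__main__":
    cands = [Candidate(2.0, 10.0, 1.0, 0.5), Candidate(1.0, 8.0, 0.2, 0.5)]
    lib = [Buffer("b1", 1.0, 0.1, 1.0, 0.8)]
    for c in add_buffered_candidates(cands, lib, "v"):
        print(c)
```

In the usage example, the first candidate has the better timing slack but is rejected for buffering because it would violate the noise constraint, so the buffered candidate is built from the second one.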
Figure 7, Algorithm 3 (fragment): $S$ = Find-Cands($so$); for each candidate $\alpha \in S$ do
Step 1 of Figure 7 segments the wires to generate sufficient possible buffer locations. Step 2 calls Find-Cands, which returns a list of candidate solutions. Step 3 adds the driver delay and computes the noise slack, then the candidate with the best timing slack, such that noise constraints are satisfied, is returned in Step 4.
The algorithm can be modified to handle inverting buffers [12].
The Find-Cands procedure is shown in Figure 8. It takes the node $v$ as input, recursively computes the lists of possible candidates for all the nodes in $T(v)$, and then returns the candidates for $v$. Find-Cands consists of four main parts:
Steps 1-4 construct candidates for the children of $v$ and merge them to form $S$, the set of candidates for $v$.
Step 5 considers each buffer type in the library and adds the buffer which yields the highest slack such that noise constraints are satisfied. A buffer will not be inserted if there is a noise violation. This step is the fundamental difference between Algorithm 3 and Van Ginneken's algorithm [16].
Step 6 computes the new load, slack, current and noise slack for each candidate induced by the parent wire of $v$.
BuffOpt eliminated all noise problems in the design. DOpt could not fix all noise problems, and the average delay penalty from using BuffOpt instead of DOpt was less than 2%.
BuffOpt Successfully Avoids Noise
We ran BuffOpt on the 500 nets, and observed that BuffOpt identified 423 noise violations and successfully inserted buffers to fix all of them. To verify BuffOpt's identification of noise violations, we ran 3dnoise on the nets before running BuffOpt. The accurate analysis of 3dnoise identified 386 nets with noise violations, all of which were also identified by BuffOpt. BuffOpt identified 423 - 386 = 37 more nets with violations, which shows that the noise metric [8] is slightly conservative. We also ran 3dnoise on the nets after running BuffOpt, and 3dnoise identified no noise violations. This data is summarized in Table 2.
Optimizing Delay Alone is Insufficient
We now compare BuffOpt to DOpt (optimal delay-driven buffer insertion) in terms of noise avoidance. Since BuffOpt never inserted more than four buffers on any net, we ran DOpt four times, allowing no solution to have more than k buffers, where k ranged from 1 to 4. We denote one such run of DOpt by DOpt(k).
Finally, we compare DOpt to BuffOpt in terms of total delay. We first ran BuffOpt on the 500 nets and recorded the buffers inserted for each net. We then ran DOpt for the same number of buffers as BuffOpt inserted in order to make an apples-to-apples comparison. We computed the reduction in total delay for each net and averaged the results by the number of buffers inserted. The cumulative results are presented in Table 4. Observe from the last column that the overall average delay penalty is only 6 ps, or equivalently 1.99%, from avoiding noise. Thus, BuffOpt is able to integrate noise into a delay-driven algorithm with virtually no loss in total delay.
Recall that Algorithm 3 is optimal when the buffer library contains a single buffer type, but we could not guarantee optimality for a larger buffer library. The DOpt results in Table 4 form an upper bound on the optimal solution to Problem 2 since DOpt is optimal for delay alone. Observe that even with a buffer library of size 11, BuffOpt solutions are virtually optimal since they are on average within 2% of an upper bound.
