Chemical-mechanical planarization (CMP) and other manufacturing steps in very deep-submicron VLSI have varying effects on device and interconnect features, depending on the local layout density. To improve manufacturability and performance predictability, area fill features are inserted into the layout to improve uniformity with respect to density criteria. However, the performance impact of area fill insertion is not considered by any fill method in the literature. In this paper, we first review and develop estimates for the capacitance and timing overhead of area fill insertion. We then give the first formulations of the Performance Impact Limited Fill (PIL-Fill) problem, with the objective of either minimizing total delay impact (MDFC) or maximizing the minimum slack of all nets (MSFC), subject to inserting a prescribed amount of fill. For the MDFC PIL-Fill problem, we describe three practical solution approaches: two based on Integer Linear Programming (ILP-I and ILP-II) and one Greedy method. For the MSFC PIL-Fill problem, we describe an iterated greedy method that integrates calls to an industry static timing analysis tool. We test our methods on layout testcases obtained from industry. Compared with the normal fill method [3], our ILP-II method for the MDFC PIL-Fill problem achieves between 25% and 90% reduction in total weighted edge delay impact (roughly, a measure of the sum of node slacks) while maintaining identical quality of layout density control; our iterated greedy method for the MSFC PIL-Fill problem also shows a significant advantage with respect to the minimum slack of nets in the post-fill layout.
INTRODUCTION
* This research was supported by a grant from Cadence Design Systems, Inc., and by the MARCO/DARPA Gigascale Silicon Research Center.
Chemical-mechanical planarization (CMP) and other manufacturing steps in nanometer-scale VLSI processes have varying effects on device and interconnect features, depending on local attributes of the layout. To improve manufacturability and performance predictability, foundry rules require that a layout be made uniform with respect to prescribed density criteria, through insertion of area fill ("dummy") geometries.
All existing methods for synthesis of area fill are based on discretization [3, 4]: the layout is partitioned into tiles, and filling constraints or objectives (e.g., minimizing the maximum variation in feature area content) are enforced for square windows that each consist of r × r tiles. In practice, then, layout density control is achieved by enforcing density bounds in a finite set of windows. Invoking terminology from previous literature, we say that the foundry rules and EDA tools (physical verification and layout) attempt to enforce density bounds within $r^2$ overlapping fixed dissections, where r determines the "phase shift" w/r by which the dissections are offset from each other. The resulting fixed r-dissection (see Figure 1) partitions the n × n layout into tiles $T_{ij}$.
While area fill feature insertion can significantly reduce layout density variation, it can also change interconnect signal delay and crosstalk by changing coupling capacitance. These changes can be harmful to timing closure flows, especially since fill is typically added as a physical verification step or even as a post-GDSII step at the foundry. Therefore, in addition to satisfying density requirements, dummy fill insertion should also minimize performance impact. However, the issues associated with capacitance and area fill are complex, and there is no existing published work on performance-driven area fill synthesis.¹ Our present work assumes that area fill consists of squares of floating fill; we seek a fill placement with minimum delay impact of fill insertion.
In the next section, we review related works in the PIL-Fill domain. In Section 3, we briefly review interconnect capacitance estimation models, and describe our simplified capacitance impact and delay impact model for floating fill. Section 4 formulates the PIL-Fill problem with two different objectives, and solution approaches are given in Section 6 and Section 7. Section 8 gives experimental results and we conclude in Section 9.
¹ Although this concept has been recently mentioned on some startup web sites [10, 13, 14], no details of functionality are given. Currently, metrological methodologies are used to determine the "best" choice of buffer distance, dummy fill type (grounded versus floating), and dummy fill pattern.
RELATED WORK
According to Stine et al. [11], to minimize the increase in interconnect capacitance that results from area fill, (i) the total amount of added fill should be minimized, (ii) the linewidth of the fill pattern should be minimized, (iii) the spacing between fill lines should be maximized, and (iv) the buffer distance should be maximized. Unfortunately, these guidelines are rather generic. We observe that restricting the amount of dummy fill and increasing the buffer distance have the unwanted effect of limiting the possible improvements in uniformity achieved by fill insertion. Furthermore, such guidelines are not precisely matched to the relevant underlying criteria, e.g., capacitance minimization does not comprehend the delay and timing slack impact of added capacitance. While no work has (in our opinion) yet addressed the PIL-Fill problem, two related works are of interest.
Work at Motorola by Grobman et al. [8] points out that the main parameters influencing the change in interconnect capacitance due to fill insertion are feature ("block") sizes and proximity to interconnect lines. The larger the block, the larger the consequent interaction between interconnect lines. Similarly, the closer blocks are to interconnect lines, the stronger their interaction will be. When interconnect lines are more sparsely situated, floating fill has greater performance impact.
Work at MIT Microsystems Technology Laboratories [11] proposes a rule-based area fill methodology. To minimize the added interconnect capacitance resulting from fill, a dummy fill design rule is found by modeling the effects on interconnect capacitance of different design rules (which are consistent with the fill pattern density requirement).
CAPACITANCE AND DELAY MODELS
To estimate area fill impact on active line delay, we focus on the capacitance increment in the active line due to the fill. In Figure 2(A), the total capacitance of an active line before area fill is inserted can be written as
$$C_B \cdot l = \frac{\varepsilon_0 \varepsilon_r a}{d} \qquad (1)$$
where $C_B$ is the per-unit length capacitance between the active line and its neighboring active line, $l$ is the overlap length of the two active lines, $\varepsilon_0$ is the permittivity of free space, $\varepsilon_r$ is the relative permittivity of the material between the two conductors, $a$ is the overlapping area between them, and $d$ is the distance between them.
For the general case (with two rows of dummy fills) in Figure 2(A), the total capacitance between two active lines is given by Equation (2), where $C_A$ is the capacitance between the dummy feature and the active line, and $C_C$ is the capacitance between the dummy features. In this equation, $w$ is the dummy feature width, $s$ is the space between dummy features, and $k$ is the number of dummy features between the two active lines. We assume that the floating dummy features have no effect on $C_B$ due to their small size. To simplify the estimation, we use a simple parallel plate capacitance model. We can then approximate the impact of two rows of dummy features by making one combined row of dummy features, as shown in Figure 2(B).
Generalizing to m rows of dummy features, we obtain the following estimate of per-unit coupling capacitance between two active lines separated by a column of m dummy features:
$$C(m, d) = \frac{\varepsilon_0 \varepsilon_r a}{d - m \cdot w} \qquad (3)$$
When $w \ll d$, we can further simplify the calculation to a linear one,
$$C(m, d) \approx \frac{\varepsilon_0 \varepsilon_r a}{d} + \frac{\varepsilon_0 \varepsilon_r a \cdot m \cdot w}{d^2} \qquad (4)$$
where $\varepsilon_0 \varepsilon_r a \cdot m \cdot w / d^2$ is the incremental capacitance due to dummy feature insertion. Then, the total capacitance between two active lines can be estimated by Equation (5), in which the overlap length not occupied by the k dummy feature columns, $(l - w \cdot k)$, retains the original per-unit length coupling $C_B$.
With respect to interconnect delay, our discussion below will use the Elmore delay model to estimate total delay increase due to area fill. Elmore delay [7] of a cascaded N-stage RC chain is
$$\tau_N = \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j = \sum_{j=1}^{N} C_j \sum_{i=1}^{j} R_i \qquad (6)$$
Each node j on the chain contributes to $\tau_N$ the product of the capacitance at node j and the total resistance between j and the source node. If the capacitance at node i increases by $\Delta C_i$, the increment of Elmore delay at any node k below node i is
$$\Delta \tau_k = \Delta C_i \sum_{j=1}^{i} R_j \qquad (7)$$
Equation (6) implies that Elmore delay is additive with respect to capacitance along any source-sink path. That is, if we add the coupling capacitance $C_x$ at position x, the delays at all nodes below position x will increase by $C_x \cdot R_x$. Here, $R_x$ is a constant, equal to the total resistance between the source and position x (we will refer to this as the entry resistance, i.e., an "upstream" resistance).
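As an illustration of the additivity property, the following Python sketch evaluates Elmore delay on a simple RC chain and checks the delay increment caused by extra capacitance at one node. The function names `elmore_delays` and `delay_increment`, the zero-based indexing, and the numeric values are assumptions made for this example only.

```python
from typing import List


def elmore_delays(R: List[float], C: List[float]) -> List[float]:
    """Elmore delay at every node of an N-stage RC chain.

    R[i] is the resistance of the stage feeding node i and C[i] is the
    capacitance at node i (node 0 is closest to the source).
    """
    n = len(R)
    # downstream[i] = capacitance at node i plus everything below it
    downstream = [0.0] * n
    acc = 0.0
    for i in range(n - 1, -1, -1):
        acc += C[i]
        downstream[i] = acc
    delays = []
    t = 0.0
    for i in range(n):
        # each stage resistance charges all capacitance at or below node i
        t += R[i] * downstream[i]
        delays.append(t)
    return delays


def delay_increment(R: List[float], i: int, dC: float) -> float:
    """Delay increase at any node at or below node i when C[i] grows by dC:
    dC times the entry (upstream) resistance of node i."""
    return dC * sum(R[: i + 1])


R = [1.0, 2.0, 3.0]
before = elmore_delays(R, [1.0, 1.0, 1.0])
after = elmore_delays(R, [1.0, 1.5, 1.0])   # 0.5 extra capacitance at node 1
```

In this example the delay at the last node grows by exactly `delay_increment(R, 1, 0.5)`, independent of how far below node 1 the observation point lies.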
PROBLEM FORMULATIONS
Performance-impact limited area fill synthesis has two objectives:
• minimizing the layout density variation due to CMP planarization; and
• minimizing the dummy features' impact on circuit performance (e.g., signal delay and timing slack).
It is difficult to satisfy the two objectives simultaneously. Practical approaches will tend to optimize one objective while transforming the other into constraints. In this section, we propose two performance-impact limited area fill problem formulations (PIL-Fill) in which the objectives are to minimize performance impact, subject to a constraint of prescribed amounts of fill in every tile.
Min-Delay-Fill-Constrained Objective
Our minimum delay with fill constraint, or MDFC, formulation² can be stated as follows. Given a fixed-dissection routed layout and the design rule for floating square fill features, insert a prescribed amount of fill in each tile such that the performance impact (i.e., the total increase in wire segment delay) is minimized.
Since each tile in the fixed-dissection layout can be considered independently, we may reformulate the MDFC PIL-Fill problem on a per-tile basis. In other words, for each tile the following optimization is separately performed.
Given tile T, a prescribed total area of fill features to be added into T, a size for each fill feature, a set of slack sites (i.e., sites available for fill insertion) in T per the design rules for floating square fill, and the direction of current flow and the per-unit length resistance for each interconnect segment in T, insert fill features into T such that total impact on delay is minimized.
For this per-tile MDFC PIL-Fill problem, we use the above capacitance approximations (essentially the same as those in [11]) and the Elmore delay model. Under the Elmore delay model, the impact of each wire segment delay on the total sink delay of the routing net is found by multiplying by the number of downstream sinks. Thus, we define the weight of an active line $l$ as $W_l$ = the number of downstream sinks, which allows us to directly minimize total sink delay impact over all nets in a given tile.³
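The weight of each wire segment can be obtained with one traversal of the routing tree. The sketch below is illustrative only: it assumes the net is given as a hypothetical `children` adjacency map rooted at the source, and `downstream_sink_counts` is a name introduced here, not part of any tool flow described in this paper.

```python
from typing import Dict, List, Set, Tuple


def downstream_sink_counts(
    children: Dict[str, List[str]], source: str, sinks: Set[str]
) -> Dict[Tuple[str, str], int]:
    """Return, for every tree edge (parent, child), the number of sinks
    reachable through that edge, i.e. the weight W_l of the segment."""
    weights: Dict[Tuple[str, str], int] = {}

    def count(node: str) -> int:
        total = 1 if node in sinks else 0
        for child in children.get(node, []):
            below = count(child)
            weights[(node, child)] = below
            total += below
        return total

    count(source)
    return weights


# Small net: source s drives a; a branches to sink t1 and to b; b drives t2 and t3.
tree = {"s": ["a"], "a": ["t1", "b"], "b": ["t2", "t3"]}
weights = downstream_sink_counts(tree, "s", {"t1", "t2", "t3"})
```

Here the trunk segment `("s", "a")` receives weight 3, so added capacitance on it is counted three times in the total sink delay, while each leaf segment receives weight 1.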
Max-MinSlack-Fill-Constrained Objective
A weakness of the MDFC PIL-Fill formulation is that we minimize the total delay impact independently in each tile. That is, the impact due to fill features on the signal delay of complete timing paths is not directly considered. Thus, we also propose to maximize the minimum slack of all nets, still subject to a constraint of prescribed amounts of fill in every tile region of the layout. We call this a maximum min-slack with fill constraint, or MSFC, formulation.
Given a fixed-dissection routed layout and the design rule for floating square fill features, insert a prescribed amount of fill in each tile such that the minimum slack over all nets in the layout is maximized.
We use a commercial Static Timing Analysis tool (Cadence Pearl) to extract slacks at all pins of each net in the layout.
² We have also studied a minimum variation with delay constraint formulation, but it is less tractable to optimization heuristics and we do not discuss it here.
³ This objective, which is correlated with total impact on sink actual arrival times, brings us closer to the ideal of being timing-slack driven.
GEOMETRY COMPUTATION
The key computational geometry task in solving PIL-Fill problems is to find all pairs of parallel active line segments, as well as the slack space (i.e., empty sites where fill geometries can be inserted) between each such pair. Without loss of generality, we assume that the routing direction is horizontal on the selected layer.
We define a slack site column as a column of available sites for fill features between two active lines or between an active line and a layout boundary. A slack block is a maximal contiguous set of slack site columns having equal height (and, due to the fill site grid, equal width). Figure 3 shows seven such slack blocks in a tile. As an example, the fill features located in the slack block C in Figure 3 will affect the coupling capacitance on active lines 1 and 6. We also define the size of a slack site column as the number of empty sites in the column available for fill insertion.
To find such slack columns in the layout, we first obtain the position of each active line. After sorting the active lines according to y-coordinates (for horizontal routing direction) or x-coordinates (for vertical routing direction), we scan the whole layout from the bottom boundary (for horizontal routing direction) or from the left boundary (for vertical routing direction) to find the slack columns between active lines or between boundary and active line.
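The scan can be sketched on a simplified site grid. The following Python fragment is a minimal illustration, assuming horizontal routing, one active line per grid row segment, and no buffer distance; the `Line` and `SlackColumn` types and the function `slack_columns` are hypothetical names introduced for this example.

```python
from typing import List, NamedTuple, Optional


class Line(NamedTuple):
    name: str
    y: int    # grid row occupied by the horizontal active line
    x0: int   # first covered site column (inclusive)
    x1: int   # last covered site column (inclusive)


class SlackColumn(NamedTuple):
    x: int                 # site column index
    y_bottom: int          # first empty row of the gap
    size: int              # number of empty sites in the gap
    below: Optional[str]   # line under the gap (None = layout boundary)
    above: Optional[str]   # line over the gap (None = layout boundary)


def slack_columns(lines: List[Line], width: int, height: int) -> List[SlackColumn]:
    """Scan each site column from the bottom boundary upward and report the
    gaps between consecutive active lines (or a line and a boundary)."""
    ordered = sorted(lines, key=lambda ln: ln.y)   # sort by y-coordinate
    result: List[SlackColumn] = []
    for x in range(width):
        prev_y, prev_name = -1, None               # bottom boundary
        for ln in ordered:
            if ln.x0 <= x <= ln.x1:
                gap = ln.y - prev_y - 1
                if gap > 0:
                    result.append(SlackColumn(x, prev_y + 1, gap, prev_name, ln.name))
                prev_y, prev_name = ln.y, ln.name
        gap = height - prev_y - 1                  # gap up to the top boundary
        if gap > 0:
            result.append(SlackColumn(x, prev_y + 1, gap, prev_name, None))
    return result


cols = slack_columns([Line("A", 1, 0, 2), Line("B", 4, 1, 3)], width=4, height=6)
```

Adjacent slack site columns with the same bounding lines and the same size could then be merged into slack blocks in a second pass.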
APPROACHES FOR MDFC PIL-FILL

Integer Linear Programming Approach I
In our flow, we calculate post-routing interconnect delay after obtaining routing information from a DEF file. From the analysis in Section 3, we know that the columns of dummy features have the additivity property with respect to coupling capacitance, and we can approximate the coupling capacitance of m dummy features in one column by the linear function in Equation (4). Without loss of generality, we assume the routing direction on the layer is horizontal, and we also ignore any wrong-direction routing. The MDFC PIL-Fill problem is then captured by an Integer Linear Programming formulation. We first make the following definitions. The objective function (8) implies that we minimize the weighted incremental Elmore delay caused by dummy feature insertions. L is the total number of active lines in the tile. Constraints (9) ensure that the total number of covered (i.e., used) slack sites is equal to the number of dummy features. Constraints (10) are used to capture the incremental capacitance caused by $m_k$ dummy features in column k between each pair of active lines. Here, $a$ is the overlapping area between two active lines per slack column, $d_k$ is the distance between them, and $w$ is the dummy feature width. Equations (11) capture the total Elmore delay increment due to dummy feature insertions in all slack columns along the active line $l$ in the tile. $(R_l + \sum_{i=p}^{k} r_l)$ is the total resistance between the source and the position k on the active line $l$ in the tile. $p$ is the x-coordinate of the leftmost point of the active line in the tile; k is overloaded to also denote the x-coordinate of slack column k.
Constraints (12) ensure that the number of covered slack sites in any column is no greater than the column size (capacity).
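To make the structure of the linearized objective concrete, the sketch below computes a per-feature delay cost for each slack column and fills columns in order of increasing cost. This is an illustration under stated assumptions, not the CPLEX-based implementation used in the experiments: the types `Neighbor` and `Column`, the functions `unit_cost` and `solve_linear_fill`, and all numeric values are hypothetical. It relies on the observation that, once capacitance is linear in the number of features, the objective is linear in each column count, so with only a cardinality constraint and per-column capacities a sorted fill attains the optimum of this simplified form.

```python
from dataclasses import dataclass
from typing import List

EPS0 = 8.854e-12  # permittivity of free space (F/m)


@dataclass
class Neighbor:
    weight: float     # W_l: number of downstream sinks of the active line
    entry_res: float  # R_l: resistance from the source to the tile entry
    unit_res: float   # r_l: per-unit length resistance of the line
    x_start: float    # p: leftmost x-coordinate of the line in the tile


@dataclass
class Column:
    x: float                   # x-coordinate of the slack column
    d: float                   # distance between the two bounding lines
    capacity: int              # number of free slack sites
    neighbors: List[Neighbor]  # active lines coupled through this column


def unit_cost(col: Column, w: float, area: float, eps_r: float) -> float:
    """Weighted Elmore delay added by ONE fill feature (linear model)."""
    d_cap = EPS0 * eps_r * area * w / col.d ** 2
    return sum(
        n.weight * (n.entry_res + (col.x - n.x_start) * n.unit_res) * d_cap
        for n in col.neighbors
    )


def solve_linear_fill(cols: List[Column], required: int,
                      w: float, area: float, eps_r: float) -> List[int]:
    """Place `required` features, cheapest columns first."""
    if required > sum(c.capacity for c in cols):
        raise ValueError("not enough slack sites in the tile")
    order = sorted(range(len(cols)), key=lambda k: unit_cost(cols[k], w, area, eps_r))
    counts = [0] * len(cols)
    remaining = required
    for k in order:
        take = min(cols[k].capacity, remaining)
        counts[k] = take
        remaining -= take
        if remaining == 0:
            break
    return counts


line = Neighbor(weight=2, entry_res=10.0, unit_res=1.0, x_start=0.0)
tile = [Column(5.0, 2.0, 3, [line]), Column(5.0, 4.0, 3, [line])]
placement = solve_linear_fill(tile, 4, w=1.0, area=1.0, eps_r=3.9)
```

In this toy tile the wider gap (d = 4) is cheaper per feature, so it is filled to capacity before the narrow gap receives the remaining feature.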
Integer Linear Programming Approach II
In the previous subsection, we used the linear approximation for coupling capacitance between two active lines after dummy fill insertion. This is not accurate when the dummy feature width is not substantially less than the distance between the two active lines. Since (i) all dummy features have the same shape, (ii) the potential number of dummy fill features (and their positions, given the fixed-dissection layout) in each slack column is limited, (iii) the size of any slack column is also limited, and (iv) the other parameters ($\varepsilon_0$, $\varepsilon_r$, and $d$) in Equation (3) are constant for each pair of active lines, we can pre-build a lookup table $f(n, d)$ that gives the capacitance increment for inserting n fill features between any pair of active lines that are separated by distance d. Based on the lookup table, a more accurate ILP formulation can be given. We add the following definition to our terminology.
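A lookup table of this kind can be sketched in a few lines. The Python fragment below assumes the parallel-plate form in which n floating features of width w shorten the effective dielectric gap from d to d - n·w; the functions `build_increment_table`, `linear_increment`, and `solve_table_fill` are hypothetical names. The small dynamic program stands in for the ILP solver purely for illustration: it chooses one table entry per column so that the counts sum to the required fill, which is the same selection problem on a toy scale.

```python
from typing import List, Tuple

EPS0 = 8.854e-12  # permittivity of free space (F/m)


def build_increment_table(max_n: int, d: float, w: float,
                          area: float, eps_r: float) -> List[float]:
    """table[n] = capacitance increment for n features in a gap of width d."""
    if max_n * w >= d:
        raise ValueError("features do not fit between the two lines")
    base = EPS0 * eps_r * area / d
    return [EPS0 * eps_r * area / (d - n * w) - base for n in range(max_n + 1)]


def linear_increment(n: int, d: float, w: float, area: float, eps_r: float) -> float:
    """First-order approximation of the same increment (valid for n*w << d)."""
    return EPS0 * eps_r * area * n * w / d ** 2


def solve_table_fill(tables: List[List[float]], scales: List[float],
                     required: int) -> Tuple[float, List[int]]:
    """Pick a count per column (index into its table) summing to `required`
    and minimizing sum(scale_k * table_k[n_k]); scale_k is the weighted
    upstream resistance seen by column k."""
    inf = float("inf")
    best = [0.0] + [inf] * required
    choices: List[List[int]] = []
    for table, scale in zip(tables, scales):
        new = [inf] * (required + 1)
        pick = [0] * (required + 1)
        for used in range(required + 1):
            if best[used] == inf:
                continue
            for n in range(min(len(table) - 1, required - used) + 1):
                cost = best[used] + scale * table[n]
                if cost < new[used + n]:
                    new[used + n] = cost
                    pick[used + n] = n
        best = new
        choices.append(pick)
    if best[required] == inf:
        raise ValueError("not enough slack sites")
    counts, left = [], required
    for pick in reversed(choices):     # backtrack from the last column
        counts.append(pick[left])
        left -= pick[left]
    counts.reverse()
    return best[required], counts


table = build_increment_table(3, d=10.0, w=1.0, area=1.0, eps_r=1.0)
cost, counts = solve_table_fill([table, table], [1.0, 1.0], required=4)
```

Because the exact increment grows faster than linearly in n, two identical columns share the four features evenly, a distinction the linear model cannot make.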
Greedy Method
From Equation (11), the impact on delay due to the dummy features depends on the total resistance between the source and the current node. Our final algorithmic approach for the MDFC PIL-Fill problem is to greedily insert dummy features along active line segments where the incremental delay is minimum. This greedy approach is described in Figure 4.⁴
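One way to realize such a greedy insertion is with a priority queue keyed on the delay added by the next feature in each column. The sketch below is a simplified illustration and not the algorithm of Figure 4: `greedy_fill` is a hypothetical name, each column is described by a precomputed cost table (`tables[k][n]` is the capacitance increment for n features) and a resistance-based scale factor, and no bound on added net delay is enforced.

```python
import heapq
from typing import List, Tuple


def greedy_fill(tables: List[List[float]], scales: List[float],
                required: int) -> Tuple[float, List[int]]:
    """Insert features one at a time, always into the column whose next
    feature adds the least weighted delay."""
    counts = [0] * len(tables)
    # heap entries: (delay added by the next feature, column index)
    heap = [(scales[k] * (t[1] - t[0]), k) for k, t in enumerate(tables) if len(t) > 1]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(required):
        if not heap:
            raise ValueError("not enough slack sites")
        step, k = heapq.heappop(heap)
        counts[k] += 1
        total += step
        n = counts[k]
        if n + 1 < len(tables[k]):     # column still has a free site
            heapq.heappush(heap, (scales[k] * (tables[k][n + 1] - tables[k][n]), k))
    return total, counts


total, counts = greedy_fill([[0, 1, 3, 6], [0, 2, 4, 6]], [1.0, 1.0], required=3)
```

With these toy tables the first two features go to column 0 and the third to column 1, for a total cost of 5.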
ITERATED APPROACHES FOR MSFC PIL-FILL
To maximize minimum slack over all nets in the post-fill layout, we propose an Iterated Greedy approach based on iterations between the static timing analysis (STA) tool and the area fill synthesis. Performance impact due to fill feature insertion during area fill synthesis is written in Reduced Standard Parasitic Format (RSPF) as a file input to the STA tool. This approach uses the same capacitance and delay models as in the MDFC PIL-Fill approaches. After obtaining the density requirements from normal area fill synthesis and slack site columns from the scan-line algorithm, we run the industry STA tool to get the slack values of all input pins in the layout and set the slack of each active line as the minimum slack of its downstream input pins. We consider the slack value of a given slack site column to be the minimum slack of its neighboring active lines. Then, all slack site columns are sorted according to their slack values. Among them, the slack site column with maximum slack value is chosen for fill feature insertion. For each tile intersecting with this slack site column, the number of fill features actually inserted in the column depends on the number of required fill features of the tile, the overlapping size of the slack site column, and the column's slack value. Once a feasible number of fill features has been inserted into the tile, the number of required fill features of the tile and the size of the affected slack site column are updated. The added delay is estimated based on our capacitance and delay models, and the slack value of the slack site column is updated accordingly. These steps are repeated until fill requirements for all tiles in the layout are met.
⁴ As presented, the Greedy algorithm will tend to insert fill close to the active line with minimum resistance. This may lead to worsening of critical path delay and hence cycle time in some pathological cases, compared to random fill insertion. This can be circumvented by placing an upper bound on the added net delay.
[Algorithm listing (figure), partially recoverable steps: 5. Calculate entry resistances R(p,q) of net N_i in its intersected tiles. 6. Find signal directions of N_i in its intersected tiles. 7. Run scan-line algorithm to get slack site columns in layout. 8. For each tile T_i, do ... 12. Calculate induced coupling capacitances of column k. ... 18. Delete the slack column.]
To prevent the greedy method from quickly reaching a local minimum, we introduce two variables that enable iterations between STA and area fill synthesis.
• LBslack gives a lower bound on the slack value of slack site columns. Once the largest slack value of any slack site column is less than LBslack, the filling loop is stopped and a new iteration between STA and area fill synthesis is initiated with a smaller LBslack.
• UBdelay gives an upper bound on the total added delay in the layout. Once the newly added delay during an iteration exceeds UBdelay, the filling loop is stopped and a new iteration between STA and area fill synthesis is initiated.
Our algorithm is described in detail in Figure 5, where the following definitions are used.
• $RF$: total number of required fill features in the given layout.
• $RF_{ij}$: the number of required fill features for tile $T_{ij}$.
We also track the number of inserted fill features in column k in tile $T_{ij}$.
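The interplay of the filling loop with the two bounds can be sketched as follows. This Python fragment is a simplified illustration under several assumptions: each slack site column lies in a single tile, one feature is inserted per step, every feature in a column adds a fixed estimated delay to its neighboring nets, and the STA tool is replaced by a caller-supplied function `run_sta`. The names `FillColumn`, `iterated_greedy_fill`, `lb_step`, and the mock timing data are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class FillColumn(NamedTuple):
    tile: int                 # tile containing the slack site column
    capacity: int             # free slack sites in the column
    nets: List[str]           # nets of the neighboring active lines
    delay_per_feature: float  # estimated delay added per inserted feature


def iterated_greedy_fill(columns: List[FillColumn], required: Dict[int, int],
                         run_sta: Callable[[Dict[str, float]], Dict[str, float]],
                         lb_slack: float, lb_step: float,
                         ub_delay: float) -> List[int]:
    if lb_step <= 0 or ub_delay < 0:
        raise ValueError("lb_step must be positive and ub_delay non-negative")
    need = dict(required)
    counts = [0] * len(columns)
    added = {net: 0.0 for col in columns for net in col.nets}
    while any(v > 0 for v in need.values()):
        slack = run_sta(added)                      # fresh slacks from "STA"
        col_slack = [min((slack[n] for n in c.nets), default=float("inf"))
                     for c in columns]
        new_delay = 0.0
        hit_lower_bound = False
        while any(v > 0 for v in need.values()):
            open_cols = [k for k, c in enumerate(columns)
                         if counts[k] < c.capacity and need[c.tile] > 0]
            if not open_cols:
                raise ValueError("fill requirement exceeds available slack sites")
            k = max(open_cols, key=lambda i: col_slack[i])  # largest slack first
            if col_slack[k] < lb_slack:
                hit_lower_bound = True
                break
            if new_delay > ub_delay:
                break
            col = columns[k]
            counts[k] += 1
            need[col.tile] -= 1
            for net in col.nets:
                added[net] += col.delay_per_feature
            col_slack[k] -= col.delay_per_feature
            new_delay += col.delay_per_feature
        if hit_lower_bound:
            lb_slack -= lb_step                     # relax the bound and iterate
    return counts


base_slack = {"a": 100.0, "b": 10.0}
mock_sta = lambda added: {n: base_slack[n] - added.get(n, 0.0) for n in base_slack}
cols = [FillColumn(0, 5, ["a"], 1.0), FillColumn(0, 5, ["b"], 1.0)]
result = iterated_greedy_fill(cols, {0: 3}, mock_sta, 0.0, 5.0, 100.0)
```

In the mock example all three required features land next to the net with ample slack, leaving the critical net untouched; a real flow would replace `mock_sta` with an RSPF update followed by an STA run.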
COMPUTATIONAL EXPERIENCE
We have tested our proposed algorithms using five layout test cases, denoted T1, T2, T3, T4 and T5, obtained from industry sources. Each of the test cases was obtained in LEF/DEF format.
[Figure 5 (iterated greedy MSFC PIL-Fill algorithm), surviving step: Update RSPF file with the capacitance increase.]
Our experimental testbed integrates GDSII Stream and internally developed geometric processing engines, coded in C++ under Solaris 2.8. We use CPLEX version 7.8 as the integer linear programming solver. All runtimes are CPU seconds on a 300 MHz Sun Ultra-10 with 1 GB of RAM.
MDFC PIL-Fill Experiments
Table 1 reports the total delay increase over all wire segments due to the "normal" fill method [3], and due to our three performance-impact limited fill methods. As shown in the table, all total delay increases from the PIL-Fill methods are better than the total delay increase resulting from the normal fill method [3]. Among the PIL-Fill methods, the ILP-II method has the smallest delay increase (e.g., up to 90% reduction in non-weighted total delay increase for case T1/32/2, compared to the normal fill result) and its runtime is reasonable. The Greedy method is better than the ILP-I method, but not nearly as good as the ILP-II method. The linear approximation used in the ILP-I method apparently suffers from excessive loss of accuracy. For example, for cases T1/32/8, T1/20/2, and T1/20/4, the results from the ILP-I method are even worse than the normal fill results. Our experiments also show that the improvement in total delay impact depends on dissection size. As explained above, when the dissection becomes too fine-grained, it becomes harder to consider the total impact of a slack site column, since we handle the overlapping tiles separately.
Table 2 shows the results from the weighted performance-impact limited fill methods. Similar to the non-weighted PIL-Fill results, the ILP-II method gives the best solution quality (e.g., up to 93% reduction in weighted total delay increase for case T1/32/2, compared to the normal fill method) and retains its practicality.
MSFC PIL-Fill Experiments
In Table 3, we compare the minimum slack of all nets after the "normal" fill method and after our performance-impact limited fill method, where the density requirement is specified as a post-fill minimum window density. Our experiments show that the fill results from the "normal" fill method may be unacceptable with respect to the minimum slack of nets, since these slack values become close to 0 or negative. In contrast, our iterated greedy method for MSFC PIL-Fill performs much better, and all post-fill minimum slack values are much larger than 0. The differences between the minimum slack values of the "normal" fill result and the MSFC PIL-Fill result show the substantial advantages of our approach.
Table 3 columns: Testcase; Orig Layout; DenConstr; Normal; MSFC-PIL. Notation: MaxDen: maximum window density on layout; MinDen: minimum window density on layout; DenConstr: density requirement specified as a minimum post-fill window density; MSFC-PIL: results of MSFC PIL-Fill method; minSlack: minimum slack over all nets (ps).
CONCLUSIONS AND FUTURE RESEARCH
In this paper, we have developed approximations for the capacitance impact of area fill insertion, and given the first formulations for the Performance Impact Limited Fill (PIL-Fill) problem. We have presented two Integer Linear Programming based approaches and a Greedy method for the MDFC PIL-Fill problem, as well as an iterated greedy method for the MSFC PIL-Fill problem. Experiments on industry layouts indicate that our PIL-Fill methods can reduce the total delay impact of fill, or the impact on minimum slack, by very significant percentages.
Our ongoing research is focused on budgeting slacks along segments so that computationally expensive iteration with STA can be avoided in the optimization procedure. Other research addresses alternative PIL-Fill formulations, e.g., wherein an upper bound on timing impact constrains the minimization of layout density variation.
