Chemical-mechanical planarization (CMP) and other manufacturing steps in very deep-submicron VLSI have varying effects on device and interconnect features, depending on the local layout density. To improve manufacturability and performance predictability, area fill features are inserted into the layout to improve uniformity with respect to density criteria. However, the performance impact of area fill insertion is not considered by any fill method in the literature. In this paper, we first review and develop estimates for capacitance and timing overhead of area fill insertion. We then give the first formulation of the Performance Impact Limited Fill (PIL-Fill) problem, and describe three practical solution approaches based on Integer Linear Programming (ILP-I and ILP-II) and the Greedy method. We test our methods on two layout testcases obtained from industry. Compared with the normal fill method [3], our ILP-II method achieves between 25% and 90% reduction in terms of total weighted edge delay (roughly, a measure of sum of node slacks) impact, while maintaining identical quality of the layout density control.
INTRODUCTION
Chemical-mechanical planarization (CMP) and other manufacturing steps in nanometer-scale VLSI processes have varying effects on device and interconnect features, depending on local attributes of the layout. To improve manufacturability and performance predictability, foundry rules require that a layout be made uniform with respect to prescribed density criteria, through insertion of area fill ("dummy") geometries.
All existing methods for synthesis of area fill are based on discretization [3, 4]: the layout is partitioned into tiles, and filling constraints or objectives (e.g., minimizing the maximum variation in feature area content) are enforced for square windows that each consist of r × r tiles. In practice, then, layout density control is achieved by enforcing density bounds in a finite set of windows. Invoking terminology from previous literature, we say that the foundry rules and EDA tools (physical verification and layout) attempt to enforce density bounds within r^2 overlapping fixed dissections, where r determines the "phase shift" w/r by which the dissections are offset from each other. The resulting fixed r-dissection (see Figure 1) partitions the n × n layout into tiles T_ij, then covers the layout by w × w windows W_ij, i, j = 1, ..., nr/w − 1, such that each window W_ij consists of r^2 tiles T_kl, k = i, ..., i + r − 1, l = j, ..., j + r − 1.

While area fill feature insertion can significantly reduce layout density variation, it can also change interconnect signal delay and crosstalk by changing coupling capacitance. These changes can be harmful to timing closure flows, especially since fill is typically added as a physical verification or even post-GDSII (at the foundry) step. Therefore, in addition to satisfying density requirements, dummy fill insertion should also minimize the performance impact associated with metal fill. However, the issues associated with capacitance and area fill are complex. Currently, metrological methodologies are used to find out the "best" choice of buffer distance, dummy fill types (grounded versus floating), and dummy fill patterns. There is no existing published work on performance-driven area fill synthesis.
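The fixed r-dissection bookkeeping described above can be illustrated with a minimal Python sketch. It assumes that per-tile feature areas are already available as a 2-D list; the function name `window_densities` and the sample numbers are hypothetical and not taken from the paper.

```python
def window_densities(tile_area, tile_size, r):
    """Feature density of every window made of r x r tiles.

    tile_area[i][j] is the feature area inside tile T_ij. Each tile is a
    square of side tile_size, so a window has side w = r * tile_size.
    """
    rows, cols = len(tile_area), len(tile_area[0])
    window_area = (r * tile_size) ** 2
    densities = {}
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            # Sum the feature area of the r^2 tiles covered by window W_ij.
            covered = sum(tile_area[k][l]
                          for k in range(i, i + r)
                          for l in range(j, j + r))
            densities[(i, j)] = covered / window_area
    return densities


# Hypothetical 3 x 3 tile grid, tiles of side 2, windows of 2 x 2 tiles.
tiles = [[2.0, 0.0, 1.0],
         [0.0, 4.0, 1.0],
         [1.0, 1.0, 2.0]]
d = window_densities(tiles, tile_size=2.0, r=2)
variation = max(d.values()) - min(d.values())
```

Because neighboring windows share tiles, adding fill to one tile changes the density of up to r^2 windows at once, which is why fill amounts are prescribed per tile.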
In this paper, our contributions include: (i) a new PIL-Fill problem formulation for performance-impact limited area fill synthesis; (ii) practical integer linear programming formulations, along with a greedy method, for the Minimum Delay, Fill-Constrained variant of the PIL-Fill problem; and (iii) experimental comparisons of our three approaches, confirming the advantages of our work over previous methods (up to 90% reductions in delay impact compared with the normal fill methods [3]), and identifying at least one practical method for deployment. In this work, we assume that area fill consists of squares of floating fill; we seek a fill placement with minimum delay impact of fill insertion. In the next section, we review related works in the PIL-Fill domain. In Section 3, we briefly review interconnect capacitance estimation models, and describe our simplified capacitance impact and delay impact model for floating fill. Section 4 formulates the PIL-Fill problem, and solution approaches are given in Section 5. Section 6 gives experimental results and we conclude in Section 7.
RELATED WORK
According to Stine et al. [11], to minimize the increase in interconnect capacitance that results from area fill, (i) the total amount of added fill should be minimized, (ii) the linewidth of the fill pattern should be minimized, (iii) the spacing between fill lines should be maximized, and (iv) the buffer distance should be maximized. Unfortunately, these guidelines are rather generic. We observe that restricting the amount of dummy fill and increasing the buffer distance has the unwanted effect of limiting the possible improvements in uniformity achieved by fill insertion. Furthermore, such guidelines are not precisely matched to the relevant underlying criteria: e.g., the capacitance minimization objective is oblivious to the delay and timing slack impact of the added capacitance. While no work has (in our opinion) yet addressed the PIL-Fill problem, in the remainder of this section we review two related works.
Work at Motorola by Grobman et al. [8] points out that the main parameters that influence the change in interconnect capacitance due to fill insertion are feature ("block") sizes and proximity to interconnect lines. The larger the size of the block, the larger the consequent interaction between interconnect lines. Similarly, the closer blocks are to interconnect lines, the stronger their interaction will be.
Grobman et al. [8] consider several structures that are expected to represent the most profound effects. In one limiting case, dense lines effectively do not suffer much from floating fill placement above and below, since their capacitance is dominated by coupling to neighbors. On the other hand, when interconnect lines are more sparsely situated, floating fill has greater performance impact. The importance of dummy fill size is also examined: large block shapes more effectively transmit local effects to their extent, and hence if filling is to be performed over critical paths, use of smaller fill blocks with the same filling density helps limit the increase of interconnect capacitance.

† Although this concept has been recently mentioned in some startup web sites [10, 13, 14], no details of functionality are given.
Work at MIT Microsystems Technology Laboratories [11] proposes a rule-based area fill methodology. To minimize the added interconnect capacitance resulting from fill, a dummy fill design rule is designed by modeling the effects on interconnect capacitance of different design rules (while satisfying the density requirement). Three canonical parameters are considered in design rules: buffer distance (buf), block width (w), and block space (s). Effects of fill on interconnect capacitance are calculated from the canonical parameters as well as line width and spacing. The calculation result is coupled with the minimum pattern density goals to obtain an optimized dummy fill design rule. It is important to note that the MIT methodology yields only a rule: the fill insertion is not driven by any context (e.g., per-net or per-wire segment delay or slack considerations).
CAPACITANCE AND DELAY MODELS
We now review basic interconnect capacitance models and provide simplified expressions to model capacitance and delay impact of fill insertion. Works on multilayer interconnect capacitance extraction include 1-D, 2-D, 2.5-D and 3-D analytic models [1, 2, 5, 6, 12]. For example, in [1] the interconnect capacitance at each circuit node is calculated via a model consisting of three conducting layers over the substrate, which is treated as a (ground potential) reference plane. In general, the capacitance of interest at any node consists of three components: (i) overlap (area) capacitance, C_a, formed by the surface overlap (in two dimensions) of two conductors; (ii) lateral coupling capacitance, C_lt, between two parallel conductors on the same plane; and (iii) fringe capacitance, C_fr, that represents coupling between two conductors on different planes. In other words, the interconnect capacitance at any node is given by

$$C_t = C_a + C_{lt} + C_{fr} \qquad (1)$$

Overlap and fringing capacitance of active (switching) lines are not significantly affected by the insertion of small floating dummy features [1]; we hence mainly consider the impact of area fill on the lateral coupling capacitance between active lines.
A typical fill insertion approach is to grid the layout into sites according to the fill feature size and design rules, then insert the fill features into the slack sites to satisfy the density requirements. To make the following discussion clear, we group the fill features between two parallel active lines as follows: a row of dummy features is a line of dummy features that is parallel to the active lines, and a column of dummy features is a line of dummy features that is perpendicular to the active lines.
To estimate area fill impact on active line delay, we focus on the capacitance increment in the active line due to the fill. In Figure 2(A), the total capacitance of an active line before area fill is inserted can be written as

$$C_{orig} = C_B \cdot l \qquad (2)$$

$$C_B = \frac{\epsilon_0 \epsilon_r a}{d} \qquad (3)$$

where C_B is the per-unit length capacitance between the active line and its neighboring active line, l is the overlap length of the two active lines, ε_0 is the permittivity of free space, ε_r is the relative permittivity of the material between the two conductors, a is the overlapping area between them, and d is the distance between them. [...] where C_A is the capacitance between the dummy feature and the active line, and C_C is the capacitance between the dummy features. In this equation, w is the dummy feature width, s is the space between dummy features, and k is the number of dummy features between the two active lines. Note that we assume that the floating dummy features have no effect on C_B due to their small size.
To simplify the estimation, we use a simple parallel plate capacitance model. We can then approximate the impact of two rows of dummy features by making one combined row of dummy features, as shown in Figure 2 [...] is the incremental capacitance due to dummy feature insertion. So, the total capacitance between two active lines can be estimated as:

$$C_{fill} = C'_A \cdot w \cdot k + C_B \cdot (l - w \cdot k) \qquad (7)$$
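As a worked illustration of Equations (2) and (7), the following Python sketch evaluates the coupling capacitance before and after fill. It takes C_B and C'_A as given per-unit-length values and reads k as the number of fill columns of width w along the overlap, which is an interpretation rather than something the text states outright. The function names and all numbers are hypothetical.

```python
def coupling_before_fill(c_b, l):
    """Equation (2): C_orig = C_B * l."""
    return c_b * l


def coupling_after_fill(c_a_prime, c_b, l, w, k):
    """Equation (7): C_fill = C'_A * w * k + C_B * (l - w * k)."""
    if w * k > l:
        raise ValueError("fill cannot cover more than the overlap length")
    return c_a_prime * w * k + c_b * (l - w * k)


# Hypothetical values in arbitrary, mutually consistent units.
c_b, c_a_prime = 2.0, 3.0
l, w, k = 10.0, 0.5, 4

c_orig = coupling_before_fill(c_b, l)
c_fill = coupling_after_fill(c_a_prime, c_b, l, w, k)
delta = c_fill - c_orig  # equals (C'_A - C_B) * w * k
```

Under this reading, the capacitance increment depends only on the filled length w * k and on the gap between C'_A and C_B, so it grows linearly with the number of filled columns.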
With respect to interconnect delay, our discussion below will use the Elmore delay model to estimate total delay increase due to area fill. From the Elmore delay expression, Equation (8), we know that Elmore delay enjoys an additivity property with respect to capacitance along any source-sink path. That is, if we add the coupling capacitance C_x at position x, the delay of the nodes after position x will increase by C_x R_x. Here, R_x is constant, and equal to the total resistance between the source and the position x (below, we call this an entry resistance, i.e., an "upstream" resistance).
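The additivity property can be checked numerically on a small RC chain. The sketch below uses the standard Elmore delay of a single unbranched line (each node capacitance multiplied by the resistance shared between the source-to-node path and the source-to-sink path); the three-segment line and its values are hypothetical.

```python
from itertools import accumulate


def elmore_delays(res, cap):
    """Elmore delay at each node of an unbranched RC line.

    res[i] is the resistance of the segment ending at node i and
    cap[i] is the capacitance attached to node i.
    """
    upstream = list(accumulate(res))  # upstream[i]: resistance source -> node i
    n = len(res)
    # Node j contributes cap[j] times the resistance common to both paths.
    return [sum(cap[j] * upstream[min(i, j)] for j in range(n))
            for i in range(n)]


res = [1.0, 2.0, 3.0]
cap = [1.0, 1.0, 1.0]
base = elmore_delays(res, cap)

# Add a coupling capacitance C_x = 0.5 at node x = 1, where R_x = 1 + 2 = 3.
c_x, x = 0.5, 1
cap_filled = list(cap)
cap_filled[x] += c_x
filled = elmore_delays(res, cap_filled)
increase = [after - before for after, before in zip(filled, base)]
```

Nodes at or beyond position x see their delay grow by exactly C_x R_x, while nodes before x see a smaller increase because they share less resistance with the added capacitance.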
PROBLEM FORMULATIONS
Performance-impact limited area fill synthesis has two objectives: minimizing the layout density variation due to CMP planarization, and minimizing the dummy features' impact on circuit performance (e.g., signal delay and timing slack).
It is difficult to satisfy the two objectives simultaneously. Practical approaches will tend to optimize one objective while transforming the other into constraints. In this section, we propose a performance-impact limited area fill problem formulation (PIL-Fill) in which the objective is to minimize delay impact, subject to a constraint of prescribed amounts of fill in every tile region of the layout. We call this a minimum delay with fill constraint, or MDFC, formulation.‡

The MDFC PIL-Fill problem can be stated as follows. Given a fixed-dissection routed layout and the design rule for floating square fill features, insert a prescribed amount of fill in each tile such that the performance impact (i.e., the total increase in wire segment delay) is minimized. Since each tile can be considered independently in the fixed-dissection layout, we may reformulate the MDFC PIL-Fill problem on a per-tile basis. In other words, for each tile the following optimization is separately performed.
Given tile T, a prescribed total area of fill features to be added into T, a size for each fill feature, a set of slack sites (i.e., sites available for fill insertion) in T per the design rules for floating square fill, and the direction of current flow and the per-unit length resistance for each interconnect segment in T, insert fill features into T such that total impact on delay is minimized.
A weakness of this formulation is that we minimize the total delay impact independently in each tile. We are not able at this point to ensure, e.g., that total impact on every timing path is less than available positive timing slack. (Possible methodologies, including the use of budgeted capacitances that are typically available during timing-driven synthesis, place and route, are noted below.) For now, we will focus on the MDFC PIL-Fill formulation per tile, using the capacitance approximations given above§ and the Elmore delay model.

‡ We have also studied a minimum variation with delay constraint formulation, but it is less tractable to optimization heuristics and we do not discuss it here.

§ These are essentially the same as those used in the MIT approach.
APPROACHES FOR MDFC PIL-FILL
We now present three methods for the MDFC PIL-Fill problem, after developing necessary preliminary material on slack site columns and the scan-line approach.
Slack Site Columns and Scan-Line Algorithm
The key computational geometry task in solving MDFC PIL-Fill is to find all pairs of parallel active line segments, and the slack site columns (i.e., columns of available sites for fill features) between each pair of active lines. We define the size of a slack site column as the number of sites in the column. Without loss of generality, we assume that the routing direction is horizontal on the selected layer. The column index k of the sites in the slack site column can be used as the index of the column between two active lines. We discuss three distinct definitions of slack site columns within a tile; these definitions are ordered by increasing accuracy in how they capture the capacitance impact of fill on active lines.
SlackColumn-I: The simplest definition captures only slack site columns between active lines within the tile. Figure 4 shows two slack blocks A and B between the active lines in the given tile. To find such slack columns, only the active lines intersecting with the tile need to be scanned for each tile. However, with this definition, the remaining slack space in the tile cannot be used during fill insertion, which causes problems when the total size of slack columns is less than the required number of fill features.
SlackColumn-II: A more accurate definition captures slack site columns between the active lines, between the active line and tile boundary, and between tile boundaries. Figure 5 shows six such slack blocks in the tile. Among them, slack columns in block B have no associated active lines in the tile.
Obviously, the drawback of this slack site column definition is that fill features will be inserted into the slack columns without consideration of associated active lines (e.g., outside the tile, with respect to block B), and this causes inaccuracy with respect to the minimum delay impact objective.
¶ This objective, which is correlated with total impact on sink actual arrival times, moves us closer to the ideal of being timing-slack driven.

[Figure 5 (caption partially recovered): ... and tile boundary, and tile boundaries within tile.]

SlackColumn-III: Finally, we can define slack site columns to capture the cases of slack sites between the active lines in adjacent tiles as well as between the active lines and layout boundary. Figure 6 shows seven such slack blocks in the tile; the columns in each block will be accounted for in estimating capacitance increase in the tile. This is the most accurate definition of slack column, since the possible impact of fill features at any position on the layout will be considered. For example, the fill features located in slack block C in Figure 6 will affect the coupling capacitance on active lines 1 and 6, while those in slack block B in Figure 5 do not have their impacts on any active lines captured.
To find the SlackColumn-III columns in the layout, we first obtain the position of each active line. After sorting according to y-coordinates (for horizontal routing direction) or x-coordinates (for vertical routing direction), we scan the whole layout from the bottom boundary (for horizontal routing direction) or from the left boundary (for vertical routing direction) to find the slack columns between active lines or between boundary and active line. Finally, we calculate the overlapping area of each slack column in each tile intersecting with it, as well as the list of active lines intersecting with each tile. The algorithm is described in detail in Figure 7.

     [...]
     7.  For each slack column intersecting with the active line Do
     8.      End the slack column at the active line
     9.      If the size of slack column is larger than 0 Then
    10.          Add the slack column into Cols
    11.      Else Ignore the slack column
    12.      Create a new one starting from the active line
     [...]
    20.  Calculate the number of sites in slack column within the tile T_ij
    21.  Add the slack column into Cols_ij
    22.  Add the correlated active line(s) into AL_ij

Figure 7: Scan-line algorithm to find fill slack columns on given layer (assuming horizontal routing direction).
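The scan described above can be sketched in Python on a simplified site grid. This is not the Figure 7 algorithm itself: it assumes horizontal routing, active lines that each occupy one row of sites over an inclusive range of x positions, and it omits the per-tile bookkeeping of the last steps. The function name, the dictionary layout and the sample lines are hypothetical.

```python
def find_slack_columns(height, width, active_lines):
    """Scan each x position from the bottom boundary upward.

    active_lines maps a line name to (y, x_start, x_end), with x_end inclusive.
    Each slack column records its x position, lowest free site, size, and the
    lines bounding it below and above (None stands for the layout boundary).
    """
    columns = []
    for x in range(width):
        # Active lines crossing this x position, sorted by y-coordinate.
        crossing = sorted((y, name)
                          for name, (y, x0, x1) in active_lines.items()
                          if x0 <= x <= x1)
        start, below = 0, None
        for y, name in crossing:
            size = y - start
            if size > 0:  # zero-size slack columns are ignored
                columns.append({"x": x, "y0": start, "size": size,
                                "below": below, "above": name})
            # A new slack column starts just above the active line.
            start, below = y + 1, name
        if height - start > 0:
            columns.append({"x": x, "y0": start, "size": height - start,
                            "below": below, "above": None})
    return columns


lines = {"a": (1, 0, 2), "b": (3, 1, 2)}
cols = find_slack_columns(height=5, width=3, active_lines=lines)
between_a_and_b = [c for c in cols if c["below"] == "a" and c["above"] == "b"]
total_sites = sum(c["size"] for c in cols)
```

Recording the bounding lines of every column, including the boundary case, corresponds to the SlackColumn-III view in which no slack site is left without its associated active lines.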
Integer Linear Programming Approach I
In our flow, we calculate post-routing interconnect delay after obtaining routing information from a DEF file. From the analysis in Section 3, we know that the columns of dummy features have the additivity property with respect to coupling capacitance, and we can approximate the coupling capacitance of m dummy features in one column by a linear function (Equation (6)). Without loss of generality, we assume the routing direction on the layer is horizontal, and we also ignore any wrong-direction routing. The PIL-Fill problem is then captured by an Integer Linear Programming formulation. We first make the following definitions.
W_l: weight of active line l
C_k: size (capacity) of feasible slack site column k for dummy features within the tile
m_k: number of dummy features inserted in column C_k
Cap_k: incremental capacitance caused by the m_k dummy features in column C_k, calculated according to Equation (6)
Δτ_l: total delay increment on active line l due to the insertion of dummy features along it in the tile
R_l: total (upstream) resistance of the path from the source node to the entry point of active line l into the tile
r_l: per-unit resistance of active line l
The objective function (10) implies that we minimize the weighted incremental Elmore delay caused by dummy feature insertions. L is the total number of active lines in the tile.
Constraints (11) imply that the total number of covered slack sites is equal to the number of dummy features.
Constraints (12) are used to capture the incremental capacitance caused by m_k dummy features in column k between each pair of active lines. Here, a is the overlapping area between two active lines, d_k is the distance between them, and w is the dummy feature width.
Constraints (13) are used to capture the total Elmore delay increment due to dummy feature insertions in all slack columns along the active line l in the tile. Here, (R_l + Σ_{s=p}^{k} r_l) is the total resistance between the source and the position k on the active line l in the tile, where p is the x-coordinate of the left-most point of the active line in the tile and k is the x-coordinate of slack column k.
Constraints (14) imply that the number of covered (i.e., used) slack sites in any column cannot exceed the column size (capacity).
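To make the structure of this per-tile problem concrete, here is a small Python sketch that evaluates the weighted delay objective and finds the best assignment by exhaustive enumeration, standing in for an ILP solver on a toy instance. Several things are assumptions for illustration only: the linearized capacitance per feature (`cap_per_feature`) is supplied as an input rather than derived from Equation (6), the resistance from the entry point p to column position x is taken as r_l * (x - p), and all names and numbers are hypothetical.

```python
from itertools import product


def solve_tile_exhaustively(columns, lines, fill_required):
    """Minimize the weighted Elmore delay increase with a fixed fill count.

    columns: list of dicts with keys capacity, x, cap_per_feature, lines.
    lines: dict mapping a line name to dict(W=..., R=..., r=..., p=...).
    Returns (best_cost, assignment) or None if the tile has too few sites.
    """
    def unit_cost(col):
        # Weighted delay added by one dummy feature in this column.
        return sum(lines[n]["W"] * col["cap_per_feature"]
                   * (lines[n]["R"] + lines[n]["r"] * (col["x"] - lines[n]["p"]))
                   for n in col["lines"])

    costs = [unit_cost(c) for c in columns]
    best = None
    # Enumerate every m_k with 0 <= m_k <= capacity of column k.
    for m in product(*(range(c["capacity"] + 1) for c in columns)):
        if sum(m) != fill_required:  # prescribed amount of fill in the tile
            continue
        total = sum(mk * ck for mk, ck in zip(m, costs))
        if best is None or total < best[0]:
            best = (total, m)
    return best


lines = {"n1": {"W": 1.0, "R": 10.0, "r": 1.0, "p": 0},
         "n2": {"W": 2.0, "R": 2.0, "r": 1.0, "p": 0}}
columns = [
    {"capacity": 2, "x": 1, "cap_per_feature": 0.5, "lines": ("n1", "n2")},
    {"capacity": 3, "x": 4, "cap_per_feature": 0.5, "lines": ("n2",)},
    {"capacity": 2, "x": 2, "cap_per_feature": 0.5, "lines": ()},
]
best = solve_tile_exhaustively(columns, lines, fill_required=4)
```

In this toy instance the column with no associated active lines is filled first, and the column next to the line with large upstream resistance is avoided. Enumeration is only workable for a handful of columns; a real tile would need an ILP solver.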
Integer Linear Programming Approach II
In the previous subsection, we used the linear approximation for coupling capacitance between two active lines after dummy fill insertion. This is not accurate when the dummy feature width is not substantially less than the distance between the two active lines. Since (i) all dummy features have the same shape, (ii) the potential number of dummy fill features (and their positions, given the fixed-dissection layout) in each slack column is limited, (iii) the size of any slack column is also limited, and (iv) the other parameters (ε_0, ε_r, and d) in Equation (5) [...] (R_l + Σ_{s=p}^{k} r_l) is the total resistance between the source and the position k on the active line l in the tile, where p is the x-coordinate of the left-most point of the active line in the tile and k is the x-coordinate of slack column k.
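The idea of replacing the linear approximation by precomputed per-column values can be sketched as follows. Two assumptions are made loudly here. First, the capacitance model in `cap_table` (floating metal of total width m * w shortening a parallel-plate gap of width d) is a stand-in chosen for illustration, not a confirmed copy of Equation (5). Second, the optimization uses a dynamic program over columns instead of an ILP solver; it solves the same separable problem of choosing m_k per column with a fixed total. All names and numbers are hypothetical.

```python
def cap_table(eps, a, d, w, capacity):
    """Lookup table of incremental capacitance for m = 0..capacity features.

    Assumed model: C(m) = eps * a / (d - m * w), so the increment over the
    unfilled gap is C(m) - C(0). Requires capacity * w < d.
    """
    base = eps * a / d
    return [eps * a / (d - m * w) - base for m in range(capacity + 1)]


def min_delay_fill(cost_tables, fill_required):
    """cost_tables[k][m] is the delay cost of m features in column k.

    Returns (cost, assignment) with sum(assignment) == fill_required,
    or None if the columns cannot hold that many features.
    """
    best = {0: (0.0, ())}  # features placed so far -> (cost, assignment)
    for table in cost_tables:
        nxt = {}
        for placed, (cost, assign) in best.items():
            for m, c in enumerate(table):
                total = placed + m
                if total > fill_required:
                    break
                if total not in nxt or cost + c < nxt[total][0]:
                    nxt[total] = (cost + c, assign + (m,))
        best = nxt
    return best.get(fill_required)


table = cap_table(eps=1.0, a=1.0, d=4.0, w=1.0, capacity=3)
# Two columns with hypothetical weighted upstream resistances 1.0 and 2.0.
costs_a = [1.0 * c for c in table]
costs_b = [2.0 * c for c in table]
result = min_delay_fill([costs_a, costs_b], fill_required=3)
```

Because the tabulated cost grows faster than linearly as a column fills up, the best assignment spreads features across both columns rather than packing the cheaper one, which a constant per-feature cost would not capture.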
Greedy Method
From Equation (13), the impact on delay due to the dummy features is dependent on the total resistance between the source and the current node. Our final algorithmic approach is to greedily insert dummy features along active line segments where the incremental delay is minimum. This greedy approach is described in Figure 8.
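One plausible reading of "greedily insert where the incremental delay is minimum" is sketched below in Python. It is not a reproduction of Figure 8: it assumes that each slack column comes with a table of cumulative delay costs (index m gives the cost of m features) and repeatedly takes the single insertion with the smallest marginal cost, using a heap. The function name and the sample tables are hypothetical.

```python
import heapq


def greedy_fill(cost_tables, fill_required):
    """Insert fill one feature at a time at the cheapest marginal delay.

    cost_tables[k][m] is the cumulative delay cost of m features in column k,
    so len(cost_tables[k]) - 1 is the capacity of column k.
    """
    placed = [0] * len(cost_tables)
    # Heap of (marginal cost of the next feature, column index).
    heap = [(t[1] - t[0], k) for k, t in enumerate(cost_tables) if len(t) > 1]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(fill_required):
        if not heap:
            raise ValueError("not enough slack sites in the tile")
        inc, k = heapq.heappop(heap)
        placed[k] += 1
        total += inc
        table = cost_tables[k]
        if placed[k] + 1 < len(table):  # column still has a free site
            heapq.heappush(heap, (table[placed[k] + 1] - table[placed[k]], k))
    return total, tuple(placed)


tables = [[0.0, 1.0, 3.0],
          [0.0, 2.0, 4.0],
          [0.0, 0.5]]
greedy_result = greedy_fill(tables, fill_required=3)
```

A scheme of this kind looks only at the next insertion, so it can concentrate fill next to low-resistance lines; a bound on the added delay per net would be one way to temper that.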
COMPUTATIONAL EXPERIENCE
We have tested our proposed algorithms using two layout test cases, denoted T1 and T2, obtained from industry sources. Each of the test cases was obtained in LEF/DEF format, and "Normal" fill was synthesized using the normal fill method [3] according to the parameters shown in the leftmost column of Table 1. In Table 1, we measure the total delay increase on all wire segments due to the "normal" fill method [3] and due to our three performance-impact limited fill methods. As shown in the table, the total delay increases from the PIL-Fill methods are in most cases smaller than the total delay increase resulting from the normal fill method [3]. Among the PIL-Fill methods, the ILP-II method has the smallest delay increase (e.g., up to 90% reduction in non-weighted total delay increase for case T1/32/2, compared to the normal fill result) and its run time is reasonable. The Greedy method is better than the ILP-I method, but not nearly as good as the ILP-II method. The linear approximation used in the ILP-I method apparently loses too much accuracy: for example, for cases T1/32/8, T1/20/2, and T1/20/4, the results from the ILP-I method are even worse than the normal fill results. Our experiments also show that the improvement in total delay impact depends on dissection size. As explained above, when the dissection becomes too fine-grained, it becomes harder to consider the total impact of a slack site column, since we handle the overlapping tiles separately.

Table 2 shows the results from the weighted performance-impact limited fill methods. Similar to the non-weighted PIL-Fill results, the ILP-II method gives the best solution quality (e.g., up to 93% reduction in weighted total delay increase for case T1/32/2, compared to the normal fill method) and retains its practicality.
‖ As presented, the Greedy algorithm will tend to insert fill close to the active line with minimum resistance. This may lead to worsening of critical path delay and hence cycle time in some pathological cases, compared to random fill insertion. This can be circumvented by placing an upper bound on the added net delay.
Our experimental testbed integrates GDSII Stream and internally-developed geometric processing engines, coded in C++ under Solaris 2.8. We use CPLEX version 7.0 as the integer linear programming solver. All runtimes are CPU seconds on a 300 MHz Sun Ultra-10 with 1 GB of RAM.

[Algorithm listing, partially recovered]

     [...]
         Find its intersection with each tile T_ij
     5.  Calculate entry resistances R_l(p, q) of N_i in its intersected tiles
     6.  Find signal directions of N_i in its intersected tiles
     7.  Run scan-line algorithm (Fig. 7) to get slack site columns in layout
     8.  For each tile T_ij Do
     9.      For each slack site column k Do
    10.          Find overlapping area of column k in tile T_ij
    11.          Calculate cumulative resistance r_k at position k on two neighboring active lines l and l'
                 Calculate induced coupling capacitances Cap_k of column k as in Equation (5)
     [...]
CONCLUSIONS AND FUTURE RESEARCH
In this paper, we have developed approximations for capacitance impact of area fill insertion, and given the first formulation for the Performance Impact Limited Fill (PIL-Fill) problem. We have presented two Integer Linear Programming based approaches and a Greedy method; experiments on industry layouts indicate that our PIL-Fill methods can reduce the total delay impact of fill by very significant percentages. The lookup-table based ILP method (ILP-II) has the best performance with respect to the total delay increase (both weighted and non-weighted) and has practical runtimes. Our ongoing research is focused on performance-impact limited fill synthesis with given capacitance budgets for each net. This corresponds to the availability of budgeted slacks (translated to budgeted capacitances), which are typically available within synthesis, place and route tools driven by an incremental static timing engine. Budgeted slacks (capacitances) eliminate the obstacle of handling full timing paths, and will allow us to evaluate our methods within an integrated layout-manufacturing timing closure flow. Other research addresses alternative PIL-Fill formulations, e.g., wherein an upper bound on timing impact constrains the minimization of layout density variation.
