In very deep-submicron VLSI, manufacturing steps involving chemical-mechanical polishing (CMP) have varying e ects on device and interconnect features, depending on local characteristics of the layout. To reduce manufacturing variation due to CMP and to improve performance predictability and yield, layout must be made uniform with respect to certain density criteria, by inserting \ ll" geometries into the layout. To date, only foundries and special mask data processing tools perform layout post-processing for density control. In the future, better convergence of performance veri cation ows will depend on such layout manipulations being embedded within the layout synthesis (place-and-route) ow. In this paper, we give the rst realistic formulation of the lling problem that arises in layout optimization for manufacturability. Our formulation seeks to add features to a given process layer, such that (i) feature area densities satisfy prescribed upper and lower bounds in all windows of given size, and (ii) the maximum variation of such densities over all possible window positions in the layout is minimized. We present e cient algorithms for density analysis, notably a multilevel approach that a ords user-tunable accuracy. We also develop exact solutions to the problem of ll synthesis, based on a linear programming approach. These include an LP formulation for the xed-dissection regime (where density bounds are imposed on a predetermined set of in the layout) and an LP formulation that is automatically generated by our multilevel density analysis. We brie y review criteria for ll pattern synthesis, and the paper then concludes with computational results and directions for future research.
Introduction
As CMOS technology advances according to the Semiconductor Industry Association National Technology Roadmap for Semiconductors 24] and moves into the 180nm generation and beyond, foundry amortization becomes a dominant business concern, and manufacturing cost increasingly drives design 17]. To maximize yield, process engineers must achieve predictability and uniformity of manufactured device and interconnect attributes, e.g., dopant concentrations, channel lengths, interconnect dimensions, contact shapes and parasitics, and interlayer dielectric thicknesses. A total variability budget for the design is distributed among such attributes. In very deep submicron technologies, large process windows and uniform manufacturing is di cult 5] 10] 22] 15] 17] 7], and the manufacturing process has an increasingly constraining e ect on physical layout design and veri cation. Many physical design methods have been proposed to address various manufacturing issues such as registration errors, photolithographic random e ects, etc.; see such works as 16] 7] for reviews. In this paper, we address the problem of controlling the manufacturing variation that is due to chemical-mechanical polishing (CMP) 15] 19] 29]. CMP is the procedure by which wafers are polished using a rotating pad and slurry to achieve the planarized surfaces on which succeeding processing steps can build. The key observations are:
The polishing environment involves large pad downforce 1 and a signi cant variability due to pad wear. Hence, control of polish depth (i.e., nal thickness of the layer being polished) is extremely di cult. The elasticity of the polishing pad compounds the variability problem. Notably, in oxide polishing of interlayer dielectrics (oxide CMP), the pad conforms to local topography and overpolishes empty oxide areas that have no underlying metal features (a phenomenon called dishing); on the other hand, areas with dense underlying metal features are underpolished. A large fraction of the die's variability budget is used up by the oxide thickness variation 8] 26]. Interlayer dielectric thickness variation of 4000 angstroms is common, and this can severely a ect estimates of electrical performance 13] 26] 27]. The problem of CMP variation is rapidly worsening today, as industry moves to shallow-trench isolation (STI) sub-0:25 processes, where CMP is used to planarize glass 6] 18] 25]. For such processes, as well as for new inlaid-metal (e.g., damascene copper) processes 4], CMP variation must be even more tightly controlled.
Recent work in the eld of statistical metrology shows that fundamentally, CMP variation is controlled if the local feature density is controlled 20] 28]. Figure 1 illustrates the local dependency of oxide thickness on feature density, which is roughly monotone. By reducing the variation of local feature density over the die, the variation of oxide thickness can be reduced. The de nition of \local" is determined by the length scale at which feature density impacts oxide thickness, and corresponds to the \window size" within which feature density must be controlled. For oxide CMP, this length scale has been estimated to be on the order of 1-3mm, depending on CMP pad material, slurry composition, etc. 9] 20] 28].
To minimize the impact of CMP variation on device yield, foundries have imposed density rules for features on active and metal layers, typically starting with mature 0.35 m process generations.
The purpose of the density rules is, of course, to make the layout more uniform. Many process layers, including di usion and thin-ox, can have associated density rules. As examples, in 0.35 m and below, one major foundry requires overall feature area density on di usion layers to be between 0.25 and 0.40, and overall density on any metal layer to be between 0.40 and 0.70; another major foundry requires overall metal layer area density to be at least 0.35. Density rules and layout post-processing approaches may di er in various contexts (e.g., ASIC vs. high-end microprocessor design) due to tradeo s between device performance and predictability 1] 30].
To satisfy density rules, a post-processing step adds ll geometries into the original layout. Traditionally, only foundries or specialized mask data processing tools have performed the post-processing of layout needed to achieve this uniformity. Today, with more customer-owned tool ows, and with the need for early and accurate performance veri cation, physical veri cation tools at the back end of the IC design ow are becoming aware of density-driven layout rules. 2 We observe that the state of the art in density control for CMP leaves much to be desired.
Many foundry density rules still constrain only the average overall feature density on a given layer; the issue of local variation in feature density is ignored. Current approaches to analysis of layout density do not actually nd the true extremal window densities in the layout. Rather, they nd the extremal window densities over a xed set of window positions using the \ xed r-dissection" approach that we discuss below. This can result in substantial error. Current methods for inserting ll geometries into the layout do not actually minimize the maximum variation in layout density between windows of the layout. Rather, simple Boolean layer processing techniques are applied to insert ll patterns into any empty region that is su ciently large.
These weaknesses can perhaps be attributed to the genealogy of today's software tools for density control. Such tools are typically evolved from physical veri cation tools and mask processing tools, where the mindset is chie y concerned with veri cation rather than data modi cation, with local rules rather than global rules, and with Boolean rules rather than context-dependent rules. By contrast, our work addresses all of the above weaknesses: (i) our formulation of the lling problem seeks to minimize density variation over all possible windows, (ii) we develop a multilevel density analysis approach that is more accurate and faster than the \ xed-dissection" approach, and (iii) we develop a linear programming approach that considers and optimizes globally the amounts of ll to be added into each region of the layout.
Notation and Problem Formulation
The following notation and de nitions are used.
The input is a layout consisting of rectangular geometries, with all sides having length a multiple of c (the minimum feature width or spacing). 3 The value of c is typically 25 to 50 times the manufacturing unit. n side of the layout region. If the layout region is the entire die, n might typically be about 50; 000 c. Note that c does not imply that n c is \the size of the grid": the only grid that is guaranteed is the manufacturing grid, which is typically 25 to 50 times smaller than c. w xed window size. The window is the moving square area over which the layout density rules apply. A typical window size would be w = 10; 000 c. k the complexity of the original layout, i.e., the total number of rectangles in the input. U area (or perimeter) density upper bound 4 , expressed as a real number 0 < U < 1. Each w w region of the layout must contain total area of features U w 2 . L area (or perimeter) density lower bound, expressed as a real number 0 < L < U < 1. Each w w region of the layout must contain total area of features L w 2 . B bu er distance. Fill geometries cannot be introduced within distance B of any layout feature. slack(W) slack of a given w w window W, i.e. the maximum amount of ll area that can be introduced into W. 5 An extremal-density window is a window with either maximum density or minimum density over all windows in the layout. If an algorithm applies to either maximum-density or minimum-density analysis, we generically refer to extremal-density analysis.
Given the parameters above, we de ne the Filling Problem as follows: 6 The Filling Problem. Given a design rule-correct layout geometry of k disjoint rectilinear rectangles in an n n layout region, minimum feature size c, window size w < n, bu er distance B, and area (or perimeter) density lower bound L and upper bound U, add ll geometries to create a lled layout that satis es the following conditions:
(1) circuit function and design rule-correctness are preserved; (2) no ll geometry is within distance B of any layout feature; (3) no ll is added into any window that has density U in the original layout; (4) for any window that has density < U in the original layout, the lled layout density is L and U; and (5) the minimum window density in the lled layout is maximized.
Condition (5) corresponds to what we call the Min-Variation Formulation, since it minimizes the di erence between minimum and maximum window density in the lled layout. Condition (3) implies that, without loss of generality, no window in the original layout has density > U (otherwise, such a window would have its contents xed, so that it could not be changed by the lling process). 4 It turns out that most of the results and algorithms of this paper easily apply to either the area density or perimeter density regimes. Thus, we will generically indicate both the area and perimeter density upper bounds with U, and lower bounds with L. In practice, the maximum density is attained in memory cores, and the bound U is set with respect to this maximum density. To our understanding, no foundry yet imposes both area density and perimeter density bounds simultaneously on a given layer 30]. However, such simultaneous constraints may be required in the future (e.g., for reverse-active area masks), and we analyze ll pattern synthesis for such a situation in Section 4. 5 The value of slack(W) will depend on the maximum possible ll pattern density. That is, total empty area outside the bu er distance B from any feature should be scaled by the maximum possible ll density to yield the slack of the window. 6 Note that this is not merely a satis cing formulation where we seek only a feasible solution, but rather an optimization formulation where we seek a best solution, as dictated by the particular underlying VLSI technology.
Organization
Our paper is organized to re ect three major functions in density control for CMP: (1) window density analysis, (2) determining the optimal ll amount to be inserted in various regions of the layout, and (3) actual insertion of the appropriate ll pattern.
Section 2 describes several ways of nding extremal window density within the layout. We rst describe the xed-dissection approach, where a dissection partitions the layout into disjoint windows, and windows of only a nite number of (overlapping) xed dissections are taken into account. To our understanding, this is the type of analysis a orded by today's physical veri cation tools. We give a tight analysis of the error inherent in xed-dissection extremal-density analysis, i.e., it yields only an approximation of the global extremal window density. We then describe and analyze an exact method for nding the extremal global window density. Finally, we develop a multilevel density analysis approach with user-tunable accuracy; this method is more accurate, and faster, than the xed-dissection approach.
Section 3 gives new linear programming based methods for determining the optimal ll area to be inserted in the layout. We rst address the xed-dissection regime. We then incorporate results of the multilevel density analysis into a variant LP formulation. Finally, an LP formulation is proposed which minimizes an estimate of global window density variation.
Section 4 describes several approaches to lling pattern synthesis. For example, we note the possibilities of rectangle-based and basket-weave patterns on given pitches, explore the regime where both area and perimeter density bounds must be satis ed, and suggest appropriate lling patterns for such situations. Section 5 describes implementation and computational experience, and Section 6 concludes with directions for future research.
Extremal Density Analysis
We rst develop algorithms for density analysis (with respect to either area or perimeter). Given a xed layout and window size, we seek to determine a maximum-density and a minimum-density window (i.e., the algorithms will return extremal-density window(s)). 7 The density analysis problem is stated as follows:
Extremal-Density Window Analysis: Given a xed window size w and a set of k disjoint rectangles in an n n layout region, nd an extremal-density w w window in the layout.
This section presents a series of algorithms for the Extremal-Density Window Analysis problem. We rst consider a xed-dissection approach when windows from several xed dissections of the layout are taken into account. To our understanding, this approximate method is the type of analysis provided by commercial veri cation tools. Several algorithms are then proposed for optimal solution (i.e., over all possible windows) of the Extremal-Density Window Analysis problem.
Fixed-Dissection Density Analyses
In practice, feature density bounds are enforced only within a xed set of w w windows corresponding to a dissection of the layout region into ( n w )
determines the \phase shift" w=r by which the dissections are o set from each other. In other words, density bounds are enforced only for windows from the xed r-dissection de ned below.
De nition: A xed r-dissection of the layout is the set of w w windows having bottom-left corners at points (i w r ; j w r ), for i; j = 0; 1; : : :; r( n w ? 1), where r is an integer divisor of w. Figure 2) . In other words, each w w window in a xed r-dissection consists of r 2 non-overlapping tiles. For instance, the bottom-left w w-window corresponds to an r r grid of tiles whose origins are at grid node coordinates (i w r ; j w r ), i; j = 0; : : : ; r. Only, nodes (i w r ; j w r ), i; j = 0; : : : ; r ? 1 determine di erent dissections e.g., the node ( w r ; r w r ) determines the same dissection as the node (0; r w r ) (see Figure 2) . In practice, a density upper bound for arbitrarily located windows is sought by enforcing density upper bounds on all windows in a xed r-dissection. 8 Unfortunately, it turns out that a xed-dissection scheme for small r cannot guarantee any nontrivial density bounds over all w w windows (as opposed to only the xed tiles in the dissection). For r = 1, even if the area density of each tile in the xed r-dissection is guaranteed to be at least 75%, a 8 To the best of our knowledge, commercial tools (Avant! Hercules, Mentor Calibre, Cadence Dracula/Vampire) provide only layout density checking with respect to xed r-dissections. E.g., the Cadence Dracula COVERAGE command 3] allows checking of feature area density upper and lower bounds in w w windows that occur at a xed o set from each other (e.g., an o set of 100 m with w = 500 m corresponds to r = 5).
completely empty w w tile can exist. Conversely, if the area density of each window in the xed r-dissection is guaranteed to be at most 25%, a completely full w w window can exist.
On the other hand, we believe that the analysis of xed r-dissections can be done much faster than the analysis of all eligible w w windows. First we initialize an array of n w n w counters associated with all of the xed r-dissection windows, and then for each rectangle R, we increment the counters of the windows intersecting R by the area of the intersection. In case of r > 1, the above procedure is repeated r 2 times in order to check all (r n w ) 2 windows. The rest of this subsection seeks ways in which density bounds for arbitrarily located windows can be enforced by density bounds on xed r-dissection windows. We compare two ways of applying simple local rules to windows having bottom-left corners at points (i w r ; j w r ), i; j = 0; 1; : : :; n w for some r > 1 such that w r is an integer. First, we consider what happens when the upper and lower density bounds are enforced in each individual w r w r tile of the xed r-dissection (Theorem 1), and then we derive upper/lower bounds in the case when we enforce density bounds for standard w w windows (Theorem 2). For example, if the area density is enforced to be at least 25% (i.e. L = 0:25), then (for r = 5) the rst rule guarantees 16% area density while the standard method can guarantee only 6%. The bounds from Theorems 1 and 2 can help to choose appropriate combinations of xed r-dissections and design rules corresponding to speci ed area density lower/upper bounds. We now compute the upper bound on lled area in the window W. Without loss of generality, we assume that each xed r-dissection tile is U-lled. Clearly, the lled area in W cannot be more than the total lled area in all (r + 1) 2 tiles; this is at most To prove that the upper and lower bounds are tight we need to present an instance for which these bounds hold. It is easy to check that this happens when the bottom-left corner of W is at the center of a xed-dissection tile. 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 
Optimal Extremal-Density Window Analysis
This subsection is devoted to optimal extremal density analysis, i.e., covering all possible w w windows in the layout region. We rst present a density analysis algorithm with time complexity O(n 2 ) that is strictly a function of the layout size. We then develop a di erent algorithm with time complexity O(k 2 ) that is strictly a function of the number of rectangles. Finally, we propose an algorithm with even faster expected runtime. Note that the O(n This formula uses the principle of inclusion-exclusion: the fourth term is added in the formula above since it is implicitly subtracted twice by the middle two terms. The technique is analogous to e cient range tally queries in computational geometry 21].
The density of all O(n 2 ) windows of xed size w w can be determined in O(1) time per window, i.e., a total of O(n 2 ) time.
Properties of Extremal-Density Windows
To obtain an algorithm with time complexity that is strictly a function of k (as opposed to a function of n), we rst prove a result that is analogous to Hanan's Theorem for the rectilinear Steiner minimal tree problem 12]. The Hanan grid over a given layout is formed by creating vertical and horizontal lines that pass through all the sides of all the rectangles (Figure 4 ). Proof: If neither the left nor right edge of a maximum-density window W touches the boundary of any of the k rectangles, then we can continuously slide W horizontally either to the left or to the right without decreasing its density, until it touches one of the rectangles on either its left or right side (see Figure 5 ). Similarly, if neither the top nor the bottom edge of W touches the boundary of any of the rectangles, then W can be slid vertically either up or down until it touches one of the rectangles with either its top or bottom edge, without decreasing W's density.
Note that the same arguments hold even if some of the rectangles intersect W's boundary before the sliding operations commence. Since we assumed that W was a maximum-density window, W must remain a maximum-density window after these two sliding operations have been performed. It follows that there exists a maximum-density window (i.e., W in its new position after the two sliding operations) that abuts one or more rectangles of the layout on two of its adjacent sides. Thus, there exists a maximum-density window with one of its corners coinciding with a Hanan grid point.
(a) (b) (c) Figure 5 : A maximum-density window may be slid horizontally (a) until it touches one of the rectangles; the window may then be slid vertically (b) until it touches one of the rectangles. After the sliding operations have been performed, the window will abut one or more rectangles of the layout on two adjacent sides (c).
Theorem 3 actually establishes a stronger result than coinciding a vertex of the maximum window with a Hanan grid point: it shows that there always exists a maximum-density window that touches rectangles of the layout with at least two of its sides (these sides might touch the same layout rectangle).
This observation helps us to design an e cient algorithm for density analysis, since it limits the feasible locations of a maximum-density window (i.e., as abutting either one or two of the layout rectangles). The argument used to prove Theorem 3 can also be used to establish an analogous result for minimumdensity windows.
Corollary 4 Given a layout of k rectilinearly-oriented rectangles in the n n grid and a xed window size w, there exists a w w window with extremal area density that abuts layout rectangles with at least two of its sides.
Notice that a type of geometric symmetry/duality exists here, in that layout rectangles abut the interior of maximum-density windows, and abut the exterior of minimum-density windows. Finally, a similar argument establishes analogous results for windows having maximum or minimum perimeter density.
Corollary 5 Given a layout of k rectilinearly-oriented rectangles in the n n grid and a xed window size w, there exists a w w window with extremal perimeter density that abuts layout rectangles with at least two of its sides.
ALG2: O(k 2 ) Density Analysis
Recall that Theorem 3 establishes that an extremal-density window must touch rectangles of the layout with at least two of its sides. Since there are only O(k) sides of rectangles, the extremal density analysis can be achieved by (i) de ning a window for each of these O(k) rectangle sides, and (ii) computing in O(k) time the window's intersections with all rectangles as it slides along the rectangle side. A careful implementation of this scheme yields an algorithm with overall worst-case time complexity of O(k 2 ) as follows (see Figure 7) .
We preprocess the rectangles by sorting all left and right edges of the k rectangles by their x coordinates into a single sorted list L (having up to 2k elements), within O(k log k) time. In the main loop (line (2) in Figure 7 ), for each \pivot" rectangle R, we create a w w window W that abuts R on the top and right (i.e., so that their top-right corners coincide -see Figure 6 (a)). We then compute the density of W in O(k) time by intersecting W with all k rectangles of the layout (line (3) in Figure  7 ).
In the inner loop (4), we slide the window W horizontally to the right ( Figure 6 (a-f)) until it leaves R, updating the density of W each time its left or right edge intersects an edge in the list L. Note that the perimeter and area density of the window W increase or decrease monotonically between such intersection events. 10 We update the value of area density, or the two values of perimeter density, for W in constant time per intersection event by keeping track of the total \cross section" length of the current intersections between the rectangles and the left and right edges of W. We add new intersections that enter the window W as it advances horizontally, and we subtract from the total the areas of rectangles that exit the window W on the left during the sliding process. Finally, we repeat lines 3 through 5 of algorithm ALG3 (Figure 7 ) for all other O(1) starting orientations of W with respect to the pivot rectangle R (Figure 6(g-i) ). The overall time complexity of this algorithm is dominated by the O(k) scans which require O(k) time each, to a total of O(k 2 ) time.
ALG3: Fast Expected Time Density Analysis
Charging O(k) time for each scan in the ALG2 analysis is pessimistic, since each sliding window is expected to intersect only a small fraction of the total number of rectangles (the window size is typically Figure 6 : ALG2 starts a window abutting a pivot rectangle (a) and slides the window to the right, stopping at each edge that intersects its perimeter (b), until the pivot abuts the opposite side of the window, on the outside (f). Other combinations of the pivot-window orientations are then explored (g-i). This process is repeated for every rectangle, using each as a pivot in turn.
small compared with the overall layout area). For each pivot rectangle, it would be advantageous to scan through only the few rectangles that actually intersect its associated sliding window (as opposed to scanning all k rectangles).
We implement this speedup via a new xed-dissection preprocessing step, modifying the algorithm from given w w query window W by examining four lists of rectangles (corresponding to the four squares that together cover W).
Theorem 6 Given k non-overlapping rectangles with positions uniformly distributed in the n n grid, the algorithm in Figure 7 nds the maximum-density w w window in time O(k E), after applying a xed-dissection preprocessing phase with runtime O(k E + ( n w )
, where E is the expected number of rectangles that intersect an arbitrary w w window.
Proof: Let E be the expected number of rectangles that intersect an arbitrary w w window, under a uniform random distribution model. Although we will use E as an indeterminate variable here, the actual value of E (as a function of k, w, and n) will be determined later in Theorem 7 below.
To prove the present theorem, we follow the same overall strategy as in the O(k 2 ) algorithm described in Section 2.2.3: for each of the k rectangles, we slide a w w window W over the pivot rectangle and compute the intersections of the various rectangles with that sliding window. These sliding phases can be performed in time linear in the number of intersecting rectangles, assuming that we can compute this set e ciently. The time for each one of these O(k) scanning phases is therefore dominated by the time to obtain and scan a sorted list of the left and right coordinates of the E rectangles that are expected to intersect each sliding window; as we will see below, this can be accomplished within time O(E) per window, given appropriate preprocessing.
The remaining issue here is how to e ciently nd all rectangles that intersect a given xed-size window as it slides over a pivot rectangle. This is accomplished as follows.
1. Partition the layout area into n w n w squares of size w w each, and create and initialize an n w n w array corresponding to this tiling. 2. Iterate over all rectangles and mark all the tiles that intersect with each rectangle, thus creating for each tile a list of rectangles that intersect it. Then, sort these lists and put into each array position a pointer to the sorted list containing all rectangles that intersect with the corresponding tile. The sum S of the lengths of all these lists is equal to the number of tiles ( n w ) 2 times the expected number E of rectangles that intersect each tile, so S = ( n w ) 2 E. The time to create the preprocessed data structure is therefore the sum of the array creation time plus the total time to sort all the lists, which brings the total to O(( n w ) 2 + S log S). The total space required by this data structure is O(( n w ) 2 + S).
3. Given the preprocessing above, we can nd all rectangles that intersect a given w (2w) query window as follows. First, nd all the tiles that intersect the query window: there can be at most 6 of these. Then, merge the corresponding 6 (presorted) rectangle lists into a single sorted list of rectangles that intersects the query window. The size of each of the sublists is O(E), and there are O(1) of them , so the overall work involved in this step is O(E).
The overall time complexity of the algorithm is therefore the preprocessing time of O(( n w ) 2 + ( n w ) 2 E log(( n w ) 2 E)) plus the time to process each of the O(k) pivots and its associated list of intersected rectangles, i.e., O(k E), where E is the expected number of rectangles that intersect an arbitrary w w window. We call this improved-preprocessing algorithm ALG3, and now show that the expected number of rectangles that intersect a given xed-size window is indeed quite small. We de ne a \random rectangle" as a rectangle uniquly determined by a pair of opposite corners chosen independently at random from a uniform distribution.
Theorem 7 Given k random pairwise-disjoint rectangles distributed uniformly in the n n layout region, the expected number E of rectangles that intersect a given w w window is bounded by E 3k w 2 n 2 + 3.
Proof: Consider the following two types of rectangles that can intersect a given window:
1. Rectangles having at least one of their corners contained inside the window; and 2. Rectangles having none of their corners contained inside the window (yet who still intersect it).
In order to simplify the probabilistic analysis which follows, we allow overlaps to occur among the rectangles. Note that this can only increase the expected number of rectangles that intersect a window, because if the non-overlap constraint is enforced, rectangles which intersect a window preclude some other rectangles from intersecting it due to the non-overlap requirement, thereby reducing the expected number of rectangles that may intersect a given window. We analyze separately the expected number of rectangles of each type.
Type 1: rectangles having at least one of their corners contained inside the window. The probability that a type-1 random rectangle 11 will have at least one of its corners inside a xed w w window is equal to 1 minus the probablity that neither of the rectangle's two opposite corners are inside the window, i.e., 1 ? ( n 2 ?w 2 n 2 ) 2 = 2 w 2 n 2 ? w 4 n 4 2 w 2 n 2 . Thus the expected number of type-1 rectangles that intersect a window is E 1 2k w 2 n 2 . Next, we account for type-2 rectangles, i.e., those having none of their corners contained inside the window, yet whose area still intersects the window's area. There are three subcases here:
Type 2a: rectangles of type 2 where one of the rectangle's edges intersect the perimeter of the window (with the other edge being entirely outside the window's area). Rectangles of this can occur at most twice per window, on opposite edges (by applying the non-overlapping constraint to such rectangles). Thus the expected number of type-2a rectangles that intersect a window is E 2a 2.
Type 2b: rectangles of type 2 where two of the rectangle's edges intersect the perimeter of the window. Rectangles of this type have an occurrence probability less than ( w n ) 2 , since both opposite corners of the rectangle in question must independently fall inside the strip of size w n containing the window. Note that this is actually an over-estimate, since this probability includes rectangles inside the w n strip but strictly outside the window itself; however, this is not a problem since this over-estimate still upper-bounds the actual expectation for case 2b. Thus the expected number of type-2b rectangles that intersect a window is E 2b k ( w n ) 2 .
Type 2c: rectangles of type 2 that completely contain the window. Rectangles of this type can occur at most once per window (since by applying the non-overlapping constraint, such a rectangle will preclude any other rectangles from intersecting the window). Thus the expected number of type-2c rectangles that intersect a window is E 2c 1. Thus, the expected number E of rectangles intersecting a window of size w w is upper-bounded by the sum of the expectations for case 1, 2a, 2b, and 2c:
E E 1 + E 2a + E 2b + E 2c 2k w ). The same algorithm and expected time bounds will hold for nding minimum-density windows, as well as for the extremal-perimeter density criteria.
Multilevel Density Analysis
The algorithms described in the previous two subsections have two drawbacks: (i) the fast analysis in the xed-dissection regime may signi cantly underestimate the maximum density among all w wwindows in the worst case (Theorem 1), while (ii) the optimal density analysis is too slow when the number of rectangles is large (Corollary 8). We now develop a new multilevel approach that attempts to overcome both drawbacks simultaneously. It is based on the following simple fact (see Fig. 8 ).
Lemma 9 Given a xed r-dissection, any arbitrary w w window will contain some shrunk w(1?1=r) w(1?1=r) window of the xed r-dissection, and will be contained in some bloated w(1+1=r) w(1+1=r) window of the xed r-dissection.
We suggest the following ideas. Lemma 9 suggests that the possible error of the xed-dissection approximation can be estimated more accurately than in Theorem 1. Our rst idea is that if we nd the area of not only standard windows (i.e., xed r-dissection windows consisting of r r tiles) but also bloated windows (i.e., xed r-dissection windows consisting of (r+1) (r+1) tiles), then the maximum area of a oating window (i.e., arbitrary w w-window) can be bounded by the maximum area of a bloated window (see Figure 8 ). Our second idea is to use zooming to make xed-dissection density analysis for any given r = r 0 even faster. The main points of this approach are: (i) starting with one xed r-dissection (r = 1), omit all tiles which do not belong to any bloated window that can possibly contain high-density oating windows, and (ii) recursively subdivide the remaining tiles into 4 subtiles (i.e., multiply r by 2) until the necessary r = r 0 is reached. In the gure, a standard r r xed-dissection window is shown with thick border. A oating window is shown in light gray. The white window is the bloated xed-dissection window, and the dark gray window is the shrunk xed-dissection window.
Our third idea is that the recursive subdivision may be continued until the number of rectangles left in tiles is su ciently small to run the optimal density analysis algorithm ALG3. Alternatively, the subdivision can be terminated at the moment when some user-de ned accuracy, say 2%, is reached.
The following algorithm is a formal implementation of the above ideas. We use > 0 to denote the user-de ned accuracy that is required in nding the maximum window density. The lists TILES and WINDOWS are byproducts of the analysis, which will be used in Section 3.2 below to nd the optimal amounts of ll geometries to add into the corresponding tiles.
Since any oating w w-window W is contained in some bloated window, the lled area in W ranges between Max (maximum w w-window lled area found so far) and BloatMax (maximum bloated window lled area found so far). The algorithm terminates when the relative gap between Max and BloatMax is at most 2 , and then outputs the middle of the range (Max; BloatMax).
The runtime of multilevel density analysis depends on . At each iteration of the main loop (3) the di erence in area between the bloated and standard window is reduced by half. The loop (3) terminates when the original area di erence 3w performance of multilevel density analysis for actual VLSI layouts (see Subsection 5.2). 12 
Computing the Optimal Fill Amount
To solve the Filling Problem, it is necessary to compute the proper ll amount that should be added in each particular tile. In the next subsection, we develop an optimal linear program solution for the xed-dissection regime. Then, two modi cations of the LP formulation are described in the following subsections. The rst modi cation is applied to the output of multilevel density analysis; the second modi cation uses window area bounds from Lemma 9 to minimize an estimate of the maximum deviation among arbitrary ( oating) windows.
Minimizing Density Variation in the Fixed-Dissection Regime
This subsection develops exact solutions to the Filling Problem in the xed-dissection regime. Recall that Theorem 2 indicates that if r = 10 and all windows of a xed r-dissection have feature area density at most 75% (i.e., U = 0:75), then the density of any w w window in the layout is at most 85%. Theorem 2 thus allows us to consider the Filling Problem for only a xed r-dissection of the layout, i.e., we will analyze density with respect to each w w-window W that covers exactly r 2 tiles. Desired accuracy of the result is achieved by increasing r.
For any given tile T = T ij ; i; j = 1; : : : ; nr w , denote the total feature area inside T as area(T). We de ne the slack of T, slack(T), as the maximum ll amount that can be introduced using a given ll pattern into T without violating the density upper bound U in any window containing T. In other words, the total layout feature area inside T can be increased up to any value between area(T) and area(T) + slack(T), using ll geometries. The slack of T is determined by the total area of metal features inside T and its neighbor tiles. The slack of a window W is the sum of the slacks of the tiles that form W (e cient algorithms for slack computation are discussed in Section 3.1.2 below). Using the concept of slack, the Filling Problem for the xed-dissection regime can be formulated as follows.
The Filling Problem for a xed r-dissection. Suppose we are given a xed r-dissection of the layout into tiles of size w r w r , as well as an area(T) and slack(T) for each tile in the dissection. Then, for each tile T ij , the total ll pattern area p ij = p(T ij ) to be added to T ij must satisfy is the area of the (i; j)-th window, and ij = 0 if area ij > U w 2 and 1 otherwise. Also, the patterndependent coe cient pattern denotes the maximum pattern area which can embedded in an empty unit square.
The constraints (2) imply that features can only be added, and cannot be deleted from any tile. The slack constraints (3) are computed for each tile. The pattern-dependent coe cient pattern denotes the maximum pattern area which can embedded in an empty unit square. If a tile T ij is originally over lled, then we set slack(T ij ) = 0. From the linear programming solution, the values of p ij indicate the ll amount to be inserted in each tile T ij . The constraint (4) says that no window can have density more than U after lling unless it was over lled initially, i.e., such a window cannot increase its density. The number of variables and the number of constraints in the linear program are both O (( nr  w ) practice, even for a large die and a user requirement of high accuracy, we might have n = 15000, w = 3000, r = 10, which yields a linear program of tractable size. Equation (5) implies that the auxiliary variable M is the lower bound on all window densities. The linear programming seeks to maximize M, thus achieving the min-variation objective. Solving the above LP formulation will give the optimal ll amounts to be added to each tile in the xed r-dissection, as dictated by the Min Variation objective. However, as shown in Figure 3 , the LP solution may distribute the ll unevenly among the tiles of a given window. If this is unsatisfactory, various simple xes can be applied (e.g., partial pre-lling of all tiles, binary search on an upper bound of ll added into each individual tile, etc.) so that the result is more balanced while still being optimal. (Our current implementation sets an upper bound U t on the tile density in order to achieve a balanced ll pattern.)
Slack Computation
This subsection discusses how to e ciently compute slack values for the linear programming formulation described in the previous subsection. To compute slack, i.e. to determine the total area of k possibly overlapping rectangles, we adopt the \measure of union of rectangles" sweep-line -based technique described in 21]. We begin by sorting all the left and right edges of the k rectilinear rectangles according to their x coordinates. Next, we sweep horizontally across these 2k edges from left to right, while using a segment tree 2] to keep track of the total length of the sweep line intersected by any of the k rectangles (see Figure 10 ). The time complexity of the sorting step is O(k log k). Insertions and deletions from the segment tree require O(log k) time each, and the total time to process all 2k segments is therefore O(k log k). The total time complexity to determine the area of the union of k possibly overlapping rectangles is therefore O(k log k).
A simple implementation which avoids the usage of segment trees altogether can still have reasonably fast expected time as follows. We still use the sweep line technique as before, but rather than using a segment tree to store the intersected rectangle, we instead use a simple linked list to store those segments, and then apply the one-dimensional \measure of union of intervals" technique of 14]. The time complexity of this practical implementation is O(k 2 ) in the worst-case, and the expected time is O(k l) where l is the average length of this list (i.e., the expected number of rectangles intersected by the sweep line). For random uniform distributions, we would expect l = O( p k), thus on average this method will run in time O(k p k) in practice.
Multi-Level Computation of the Fill Amount
If the multilevel density analysis approach has been used, we can use data obtained during that computation to compute ll amounts. Recall that during the multilevel density analysis, we keep track of active tiles (i.e. tiles which can possibly belong to a maximum density window) and check the area of some windows in order to update the maximum window density if necessary. The multilevel computation of ll amounts attempts to decrease the number of tiles and windows, i.e., variables and constraints participating in the LP formulation. Let r max = 2 lmax be the highest r reached in the multilevel density analysis algorithm; this corresponds to the user-de ned accuracy parameter . Instead of considering all w rmax w rmax -tiles and all w w-windows consisting of such tiles, we propose to consider only tiles w 2 l w 2 l -tiles, l l max , and windows consisting of such tiles which were tried during the multilevel density analysis.
The multilevel ll amount computation is implemented as follows. During multilevel density analysis, we save a tile in TILES at the moment when the tile is deactivated (cannot belong to a window of maximum density) or the size of the tile becomes w rmax w rmax . We also record the area and slack of each such tile. On the other hand, each time when we nd the area of a w w-window W, we put W in the list of windows WINDOWS. In the LP formulation for multilevel ll amount computation, for each window W from WINDOWS there are two constraints: (i) the rst constraint upper-bounds the lled area (i.e., the area after ll geometries are added to the original layout) of W, and (ii) the second constraint forces an auxiliary variable M to be less than or equal to the lled area in W. Each lled window area is expressed as a sum of lled tile areas. In addition, tile ll amount constraints ensure that each tile ll amount is nonnegative, and at most the corresponding tile slack pattern.
Minimizing Density Variation of Arbitrary (Floating) w w-Windows
Finally, we suggest a third LP formulation that may better re ect the quality of the ll amount computation. Again, this is because the linear program for the xed-dissection regime will be susceptible to density deviations in oating windows. Consider two di erent LP solutions in xed-dissection regime with di erent number of xed dissections: the rst has r 2 dissections and the second has (2r) 2 dissections. It is obvious that the more dissections we take in account the better result we should have. On the other hand, more dissections imply more constraints in the LP and, therefore, worse (bigger) deviation achieved (i.e., smaller value of target variable M). A fair comparison of results with di erent number of xed dissections entails nding the oating deviation, i.e., the di erence between the minimum and maximum oating window density. However, since the number of oating windows is too large, we suggest comparing worst-case estimates of the oating deviation, which can be derived from Lemma 9.
Moreover, instead of comparing LP solutions according to the above estimate of oating deviation, we suggest using such an estimate as an objective in a new LP formulation. Speci cally, we constrain the area of each bloated w(1 + 1=r) w(1 + 1=r)-window by the user-de ned density upper bound U, and we maximize the auxiliary variable M which is the lower bound for the area of any w(1 ? 1=r) w(1 ? 1=r)-window. We refer to this LP formulation as the oating deviation LP. The oating deviation LP formulation optimally decreases the estimate of the density range between the maximumand minimum-density oating windows.
Synthesis of Filling Patterns
Given the layout geometry along with the parameters of the Filling Problem, we apply the methods of previous sections to analyze density violations, and determine the necessary amounts of ll to be added in each region of the layout. We now discuss criteria for, and actual synthesis of, the ll geometries added into the layout.
Uniform Coupling to Long Conductors
Fill patterns should be devised such that all long conductors on adjacent layers have identical coupling capacitance to the inserted ll. 13 There are several practical ways of achieving this, of which one is to \basket-weave" the ll 30]. In other words, the ll pattern should not consist of a regular grid geometries, but instead have some internal o sets that \skew" the pattern. Figure 11 illustrates this concept. 
Grounded vs. Floating Fill
Grounded ll can be required for predictable extracted parasitic values. Structured-custom (microprocessor) designs have strong requirements for predictability, due to aggressive timing tolerances. For such designs, it is better to have larger, but exactly known, coupling capacitances to grounded ll geometries, rather than indeterminate capacitances to oating ll. On the other hand, for ASIC designs where timing is not being pushed too hard, designers seek the simplest ll construction that meets feature density requirements. A secondary reason for studying grounded-ll constructions is that modern parasitic extraction tools do not handle oating capacitors well. If ll synthesis should be performed earlier so as to achieve an accurate performance veri cation ow during the layout phase, it may be necessary to use grounded ll.
We seek a grounded ll pattern that requires relatively few edges to specify. For example, a metal ll pattern consisting mostly of long parallel stripes is preferable to a checker-board pattern, since the number of lines required to fully specify the latter is considerably smaller than the former. Thus, we propose a grounded metal ll pattern that spans the area to be lled as follows. We start by striping the empty areas in the layout using horizontal lines (see Figure 12b) . Then, we span the horizontal stripes using vertical lines (see Figure 12c) . The width and pitch of the horizontal stripes, and the number of vertical segments, can be easily determined in terms of the required pattern density. Connections to an existing ground distribution network can be made using standard special-net routers. Figure 12 : Given a layout (a), we create a grounded ll pattern by rst (b) creating horizontal stripes, and then (c) spanning these stripes using a small number of vertical lines.
Simultaneous Area and Perimeter Constraints
In this subsection we characterize combinations of area and perimeter densities (D a ; D p ) that can be simultaneously satis ed by the same lling pattern. As discussed in Section 1, all geometries must satisfy minimum length and minimum separation rules. In particular, no ll feature dimensions, nor any distance between features, can be less than c. In practice, the distance between lling geometries and nearest layout feature is constrained to be greater than c 0 > c. However we can still view regions eligible for lling as c-polyominoes, i.e., polyominoes 11] with sides a multiple of c that are in distance c 0 from the layout features. The ll pattern should also consist of polyominoes in the c-grid, i.e., the minimum separation rule implies that a pair of lled cells which share exactly one corner should have one common lled neighboring cell.
First, we will describe lling patterns for a rectangular region R which have maximum perimeter, and either the minimum or maximum allowable area density. The pattern P min with the minimum area density lls all cells which have top-left corner coordinates (a + 2ci; b + 2cj), where (a; b) is one of the corners of R (see Figure 13(a) ). This pattern has area slightly more than 1 4 area(R), because it lls approximately every fourth cell of R. The pattern P max with maximum area density lls R completely, leaving empty only cells with coordinates (a + c + 2ci; b + c + 2cj) (see Figure 13(b) ). The area of this pattern is slightly larger than 3 4 area(R) because it leaves empty approximately every fourth cell of R.
Two more patterns are necessary for completing the description of all possible patterns. These are simply the empty pattern P 0 with zero perimeter and area, and the completely-lled pattern P 1 having both perimeter and area equal to those of R. In Figure 14 , the x-axis represents area and the y-axis represents perimeter. The highlighted region with vertices P 0 , P min , P max , and P 1 represents the combinations of area and perimeter densities for which there exist lling patterns. Notice that a square has the minimum perimeter with a given area. Let S be the area of a maximum square which can be embedded in R. Before the pattern area reaches S, the minimum perimeter grows quadratically; past S, the minimum perimeter grows linearly.
The algorithm for nding a pattern with a given area and perimeter is straightforward: it starts with the minimum area pattern that has the given perimeter, and sequentially adds square cells with side c until the necessary area is achieved.
? Figure 14 : The x-axis represents the area and the y-axis represents the perimeter of the lling pattern. The highlighted region with vertices P 0 , P min , P max , and P 1 represents the combinations of area and perimeter for which there exist lling patterns. The pattern with the area S minimum perimeter is the largest square which can be embedded in R. Before the pattern area reaches S, the minimum perimeter grows quadratically; when the area exceeds S, the minimum perimeter grows linearly.
Implementation and Computational Results

Implementation
Our current experimental testbed integrates GDSII Stream input, conversion to CIF format, and internally-developed geometric processing engines. For density analysis, the user speci es the parameters w, r, B, U, L, etc., and receives output indicating extremal window densities, average window density, and lists of violating windows. For density improvement, the user speci es additional parameters such as the maximum possible ll pattern density (used to compute available slacks in each tile). The program outputs whether the density lower bound L can be achieved (and if so, the maximum achievable density lower bound L 0 ), and the amounts of ll p ij that should be introduced into each tile. Finally, for pattern insertion the user speci es further parameters, including the type of ll pattern desired (rectangular grid or basket-weave oating ll), and a parametric speci cation of the pattern. The program outputs the nal layout, including the added ll geometries, in CIF format. Our experiments have been run using three metal layers extracted from industry standard-cell layouts. Benchmark L1 is the M2 layer from an 8131-cell design; Benchmark L2 is the M3 layer from a 20577-cell design; and Benchmark L3 is the M2 layer from the same 20577-cell design. The layout dimension, number of rectangles, and window size (w always chosen to equal 1.5mm) for each test case are shown in Table 1 . Table 2 reports the maximum window density found by the multilevel density analysis with accuracy parameter set to either 2% or 3%. We report the maximum density of a standard window, rather than the midpoint between the maximum density standard and bloated windows, in order to enable comparison with the xed-dissection analysis results below. CPU time corresponds to seconds on a 140MHz Sun Ultra-1 with 256MB RAM. In practice, we nd that the multilevel analysis is preferable to the exact method of ALG3, since the latter has runtimes on the order of tens of CPU minutes for the same test cases (see 13] for ALG3 runtimes on slightly variant layouts of the same standard-cell designs). Table 3 shows the analogous results for the xed-dissection approach, which we understand to be used in industry. The two tables show that large values of r, and correspondingly large runtimes, will be required for the xed-dissection approach to nd the window densities that the multilevel analysis can nd in only a few seconds.
Computational Experience
Last, Table 4 depicts the performance of our software for LP generation and solution to obtain optimal ll amounts, along with ll insertion into the output CIF le. The xed-dissection LP achieves lower bounds M on density that are reasonably close to the best possible values. 15 Larger r values may be used to reduce the potential variation from uniform density. For the multilevel density analysis, LP and ll generation, we observe somewhat lower values of M, and one large runtime for the L2 benchmark with r = 8. The oating deviation LP, which allows the user to bound the density variation between minimum-and maximum-density oating windows, shows steady improvement in solution quality with increasing r, just as we expect. In practice, we believe that the oating deviation LP is attractive for its control over arbitrary windows; the multilevel LP is also attractive for its data ow directly from multilevel density analysis.
Conclusions and Future Directions
In conclusion, we have addressed an increasingly critical problem in the interface between process, physical layout design and performance veri cation. We have given the rst statement of the lling problem (with min-variation objective), which arises in layout post-processing for CMP uniformity. We have developed e ective algorithms for density analysis as well as for lling synthesis. Our current experimental testbed integrates GDSII Stream input, conversion to CIF format, and internally-developed geometric processing engines. Runtimes show that the proposed techniques are practically useful. We are currently seeking more test cases and density rules from industry to further re ne the proposed approaches and implementations. 16 Ongoing work addresses such issues as: developing even more e cient, general and provably-good lling algorithms (e.g., for simultaneous 16 Interesting test cases for lling will not simply be place-and-route test cases: the vast majority of P&R instances are for cell-based implementation of random (control or glue) logic. The majority of the chip { embedded memory cores, high-performance datapaths, global clock and power distribution, analog or mixed-signal blocks, etc. { is what makes the lling problem challenging, but at the same time such layouts are not typically seen by a place-and-route tool. As a result, test cases for the problem that we address are currently quite di cult to obtain. Table 4 : Experimental results showing CPU times for three variant LP approaches to computing optimal ll amounts; CPU times for ll generation are also shown.
perimeter-and area-density based criteria); nding improved heuristics or exact algorithms for the min-variation formulation; maintaining knowledge of min/max density/perimeter windows under dynamic feature insertion/deletion in time o(n) or o(k); calibrating our proposed methods against data and density control requirements from industry partners; and extending the present infrastructure to address, in a uni ed way, requirements for both slotting (metal stress relief) and lling at other length scales (micro-loading, iso-dense) in combination with the current requirements.
Acknowledgments
We thank the anonymous reviewers for their valuable comments on the previous draft of this work. We thank Tom Laidig and Kurt Wampler of MicroUnity Systems Engineering, Inc. for many illuminating discussions and patient readings of early drafts during 1997. Juan Rey of Cadence has also been generous with his time. We thank Larry Camilletti and Duane Boning for enlightening discussions. We appreciate the help of Huijuan Wang with the Raphael simulations, we thank Nayantara Gupta for help with implementing the user interface, and we thank Tongtong Zhang and Chris Helvig for their help with proofreading. We gratefully acknowledge a software donation from Artwork Conversions, Inc.
Appendix: Fill Impact on Extraction Table 5 shows capacitance extraction results obtained with the Raphael 3-D eld solver from TMA / Avant!, for an isolated conductor (i) with or without ll insertion in empty regions of adjacent layers, and (ii) with or without same-layer neighbor conductors. 17 The simulation shows that ignoring the possibility of metal ll can result in underestimation of total line capacitance by more than 50%. This can in turn lead to inaccurate RCX, delay calculation, and timing analysis results. We conclude that the presence or absence of ll geometries must be modeled during performance-driven layout optimization. Such modeling must be e cient and \transparent"; since there are many iterations through the layout optimization loop, we must be careful with the time complexity of ll insertion and the increases in data volume. Tables 6 and 7 give TMA/Avant! Raphael capacitance extraction results for multi-layer interconnect structures involving ll geometries, as follows. Three 20 1 victim conductors A, B, and C (with B in the middle), with spacing 1 between them, are placed on a victim layer i. All conductor thicknesses = 1.5; dielectric height between layers = 1.5. Dielectric permittivity was set at either 3.9 or 2.7.
A 40 40 bottom ground plane is placed at layer i ? 2 Table 5 .
Two types of ll geometry patterns were considered for layer i ? 1 (see Figure 15 ): (a) 1 1 squares with (x; y) origins of form (2i; 2j), i and j integers, resulting in an overall pattern area density (for an in nite layout region) of 0.25 (see Figure 15 (a)), and (b) 10 1 (tall and thin) rectangles with (x; y) origins of form (4i; 14j) or (4i ? 2; 14j ? 7) , i and j integers, resulting in an overall pattern area density (for an in nite layout region) of 0.357 (see Figure 15(b) ). An o set is optionally introduced. When the ll geometries are o set, they lie directly under the spaces between the victim conductors. When there is no o set, the ll geometries lie directly under the victim conductors. Table 6 shows that the total capacitance values for the middle conductor (B) uctuate by less than 1 percent over all four combinations of ll pattern and o set. The critical factor is that the ll is present in the rst place. Similarly, Table 7 shows that the total capacitance values for each of the outside conductors (A and C) also uctuate by less than one percent. We conclude that the lling can, subject to constraints involving feature dependencies between layers, be viewed as a \single-layer problem". 18 18 There are several notable conditions under which the single-layer assumption is not quite correct. For example, poly ll geometry in regions with underlying active di usion can create spurious transistors, and such regions must be marked as inviolate on the poly layer. This is a simple preprocessing step before slack calculation and ll insertion.
