Self-aligned multiple patterning (SAMP), due to its low overlay error, has emerged as the leading option for 1D gridded back-end-of-line (BEOL) in sub-14nm nodes. To form actual routing patterns from a uniform "sea of wires", a cut mask is needed for line-end cutting or realization of space between routing segments. Constraints on cut shapes and minimum cut spacing result in end-of-line (EOL) extensions and non-functional (i.e. dummy fill) patterns; the resulting capacitance and timing changes must be consistent with signoff performance analyses and their impacts should be minimized.
INTRODUCTION
Self-aligned multiple patterning (SAMP), due to its low overlay error, has emerged as the leading option for the 1D gridded BEOL on "1×" or "Mx" layers in sub-14nm nodes. Figure 1(a) shows part of a target layout, which is finalized at the post-routing design stage. In the first step of fabrication, a uniform "sea of wires" is generated, as shown in Figure 1(b). In the next step, to form actual routing patterns from the "sea of wires", we make cuts on the wire segments by using cut masks, as shown in Figure 1(c). Figure 1(d) shows the final layout. In addition to the target layout, the final layout includes end-of-line (EOL) extensions, which are attached to the routing segments of the target layout, and dummy fills, which are floating.
There are several ways to print cuts, such as 193i immersion lithography and electron-beam (e-beam) technology. E-beam is costly due to the intrinsically low throughput of "writing" as opposed to "printing". Conventional 193i patterning remains a viable alternative. However, multiple 193i cut masks typically must be used to increase granularity. In other words, for a set of cut shapes to be printed by the same cut mask, the spacing between any two cuts must be at least the minimum cut spacing. Printing cut shapes with closer spacing requires more cut masks (colors). Further, complex cut shapes (e.g., non-rectangular shapes) may cause pattern fidelity loss and risk of yield loss. Thus, cuts must be assigned to different cut masks (color assignment) and well distributed with simple (i.e., rectangular) cut shapes. These constraints on cut mask shapes and colorability result in EOL extensions beyond what is originally seen in the layout tool; the resulting capacitance and timing changes must be consistent with signoff performance analyses. Furthermore, cut mask shapes determine the amount of non-functional (i.e., dummy fill) patterns that remain from the original "sea of wires"; this must be consistent with area density bounds and timing constraints.
In this work, we address the co-optimization of cut mask layout, dummy fill, and design timing for sub-14nm BEOL design. Our central contribution is an optimizer based on integer linear programming (ILP) that minimizes the timing impact due to EOL extensions, with consideration of (i) minimum cut spacing arising in sub-14nm nodes; (ii) cut assignment to different cut masks (color assignment); and (iii) the eligibility to merge multiple unit-size cuts into a larger cut.
We minimize timing impacts by assigning a timing slack-dependent weight to each wire segment. To enable our optimization to apply at full-chip scale, a partitioning-based method is used to achieve linear scaling of runtime with layout area. Finally, beyond finding optimal locations of line-end cuts, we develop a heuristic to remove dummy fills in an effort to improve timing performance of critical paths, subject to minimum metal density and cut mask density constraints. Our contributions are summarized as follows.
• To our knowledge, ours is the first work that considers timing, metal density, and cut mask density simultaneously.
• We formulate an ILP-based optimization of unit-size cut locations together with cut mask assignment (color assignment). Our ILP-based optimizer minimizes the timing impacts (i.e., slack degradation on critical timing paths) due to EOL extensions of wire segments.
• We develop a post-ILP optimization flow that further optimizes timing by enlarging and/or inserting cuts to remove dummy fills around timing-critical segments, while satisfying prescribed minimum metal density constraints and considering cut mask density uniformity.
• Our experiments across different numbers of cut masks, minimum metal densities and minimum cut spacings give insight into a significant performance-cost tradeoff that can be afforded by cut mask patterning technology options.
The remainder of this paper is organized as follows. Section 2 gives a review of related works. Section 3 describes our cut mask optimization approach, which consists of (i) ILP-based optimization of unit-size cut locations, (ii) a scalable partitioning-based method to handle larger designs, and (iii) a heuristic to remove dummy fills according to various cut mask layout rules. Section 4 presents our experimental results, and Section 5 concludes the paper.
RELATED WORKS
In this section, we first introduce previous works to support the use of SAMP and 193i line-end cuts towards sub-14nm nodes. We then list a couple of related works for cut mask optimization.
Using 193i and line-end cuts towards sub-14nm nodes. Owa et al. 8 investigate the possibility of extending 193i patterning to sub-10nm nodes. They provide experimental data for SAMP and Litho-Etch (LE) cuts down to the 5nm node. Notably, they apply self-aligned quadruple patterning (SAQP), a type of SAMP process, with 11.8nm half-pitch to support the use of unidirectional patterning with multiple LE cuts at the 7nm node. A cost model of SAMP is evaluated, assuming that the cost (and number of repetitions) of litho-etch processes is simply proportional to transistor density. This assumption ensures printability but is pessimistic in that it increases the node-to-node per-transistor cost scaling factor from 0.7× to 0.86×, making it a less cost-effective option. Gillijns et al. 4 study 193i patterning for N10 and N7 BEOL, contrasting the use of cut masks against the removal of all excess metal fill shapes. They show that when moving to the N7 node, a line-end cut option affords better process window with fewer cut masks, at the cost of increased wire length, capacitance and power. These two works provide motivating context for our present study. In our present work, we focus on achievable tradeoffs between IC performance and cut mask cost on 1× layers. In particular, we demonstrate an effective timing optimization that simultaneously keeps mask cost down by using fewer cut masks.
Shortest path-based approach. Zhang et al. 9 use a shortest path-based method to improve the printability of cuts. The authors categorize cuts into two groups based on their printability. One type is regular, which is a cut adjacent to a routing segment. The other type is critical, which is a cut adjacent to the line-end of a routing segment. The authors investigate the tradeoff between performance and printability. However, their model is not timing-aware, and it does not consider the usage of multiple cut masks. Also, since regular cuts may be printed without printability issues, there may be a guardband to be optimized.
Integer linear programming-based approaches. Du et al. 3 propose a hybrid optimization of cut masks with e-beam by using integer linear programming (ILP). In their work, they generate minimum spacing rules within the same and across tracks according to a lithography simulation. They propose an ILP model to handle these constraints. The objective is to minimize the usage of throughput-constrained e-beam technology. Compared to Zhang et al., 9 Du et al. 3 use more realistic design rules derived from lithography simulation. However, their solver takes up to a day to obtain an optimal solution for larger designs. Ding et al. 2 improve Du et al.'s ILP formulation to reduce solver runtime; their updated ILP formulation has fewer binary variables and introduces an extension limit for each wire segment which can reduce ILP solver runtime. However, rather than performing a design-specific timing optimization, Ding et al. simply minimize the sum of EOL extensions without consideration of possible tradeoffs involving timing-critical wire segments. Furthermore, their ILP formulation does not support cut assignments to multiple cut masks.
OUR APPROACH
In this section, Subsection 3.1 describes our ILP-based optimization of cut locations, and cut assignments to different cut masks, to minimize the impact of end-of-line extensions on timing. Subsection 3.2 then proposes a timing- and density-aware post-ILP optimization to minimize the impact of dummy fills considering metal and mask densities. Subsection 3.3 explains our overall flow.
ILP-based Cut Mask Optimization
For a given 1D routed layout, we seek cut locations and assignments of cuts to different cut masks so as to minimize the impact of line-end extensions on critical-path timing. We assume a horizontal routing layer in the following discussion. For any horizontal wire segment w, there are exactly two unit-size (i.e., minimum horizontal half-pitch × minimum vertical half-pitch size) cuts, one at each of its left and right line-ends. Given minimum cut spacing min s , any two cuts that are located within min s of each other cannot be printed with a single cut mask. To address this printability problem, we can relocate one or both of the cuts so that their separation becomes at least min s , or so that they are merged and form a larger cut. Another solution is to assign the cuts to different cut masks. If we use a unique color to represent each cut mask, then assigning each cut to a cut mask is equivalent to assigning a color to each cut (we refer to this as color assignment).
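To make the color-assignment view concrete, the following is a minimal sketch, not the ILP used in this work. It assumes cut positions are already fixed, measures spacing as center-to-center Euclidean distance (as in the experimental setup), and uses a simple greedy pass. The names `conflict_pairs` and `greedy_color` are hypothetical.

```python
from itertools import combinations
from math import hypot

def conflict_pairs(cuts, min_spacing):
    """Index pairs of cuts whose center-to-center distance is below min_spacing."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(cuts), 2)
            if hypot(a[0] - b[0], a[1] - b[1]) < min_spacing]

def greedy_color(cuts, min_spacing, num_masks):
    """Give each cut a mask index in range(num_masks); None if the greedy pass gets stuck."""
    neighbors = {i: set() for i in range(len(cuts))}
    for i, j in conflict_pairs(cuts, min_spacing):
        neighbors[i].add(j)
        neighbors[j].add(i)
    colors = {}
    for i in range(len(cuts)):
        used = {colors[j] for j in neighbors[i] if j in colors}
        free = [k for k in range(num_masks) if k not in used]
        if not free:
            return None  # greedy failure; does not prove that no coloring exists
        colors[i] = free[0]
    return colors

cuts = [(0, 0), (2, 0), (10, 0)]      # (x, y) cut centers in grid units
print(greedy_color(cuts, 4, 2))
```

Unlike this sketch, the formulation described next may also relocate or merge cuts, which is why it is posed as an ILP rather than as pure graph coloring.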
In our formulation, cuts are located on grid points, with each grid point corresponding to the intersection of perpendicular tracks of adjacent metal layers. We define the relocation range for each cut as the set of grid points to which the cut can be relocated. Relocation ranges may be subject to maximum EOL extension limits for each wire segment, and cannot overlap with any existing routing segments. There is an obvious tradeoff between timing and cost: relocation of cuts leads to EOL extensions which affect the timing of paths going through the extended wire segment. On the other hand, use of additional cut masks, while helping to control line-end extensions, adds to process cost and, potentially, process variability as well.
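As an illustration of the relocation range, the sketch below enumerates the grid points a line-end cut may occupy on its own track. It assumes integer grid coordinates, a per-segment extension limit expressed in grid points, and a set of grid points blocked by other routing segments. The function name and arguments are hypothetical.

```python
def relocation_range(cut_x, direction, blocked, max_extension):
    """Grid points a line-end cut may occupy on its track.

    cut_x:         original grid x-coordinate of the cut
    direction:     +1 for a right-end cut, -1 for a left-end cut
    blocked:       grid x-coordinates occupied by other routing segments
    max_extension: maximum EOL extension, in grid points
    """
    points = []
    for step in range(max_extension + 1):
        x = cut_x + direction * step
        if x in blocked:
            break  # the range must stay contiguous and cannot enter another segment
        points.append(x)
    return points

# A right-end cut at x = 5 with a neighboring segment occupying x = 8..10.
print(relocation_range(5, +1, {8, 9, 10}, 5))
```

Under these assumptions, the EOL extension caused by placing the cut at grid point x is |x - cut_x|.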
We now describe our ILP formulation for the cut mask optimization problem. The variables used in our formulation are summarized in Table 1 .
The objective is to minimize the weighted sum of EOL extensions. The weight a w for each wire segment w is assigned based on the timing slack of the net. Constraints:
(i) Constraints for cut mask assignment.
Constraint (1) forces each cut to be assigned to exactly one of |K| cut masks.
(ii) Constraints for cut pairs on the same track.
Two cuts c i and c j form a cut pair in set S 1 if (i) they are on the same track and (ii) the possible cut locations of c i and c j are within the minimum cut spacing of each other, considering their relocation ranges. The relocation range for c i is the maximal contiguous set of grid points where c i can be located. To achieve a legal solution for a cut pair, the two cuts should be (i) kept at least the minimum cut spacing apart from each other, as shown in Figure 2(a); (ii) merged into one cut, as shown in Figure 2(b); or (iii) assigned to different cut masks, as shown in Figure 2(c). For a cut pair c i , c j , a valid merging requires the two cuts to overlap or abut. Without loss of generality, we assume x i > x j . Given two neighboring wire segments on the same track, if cut c j is the right-end cut of the left wire segment and cut c i is the left-end cut of the right wire segment, Constraint (2) keeps their relative cut locations in order, as shown in Figure 2(a). Here, G is a large positive constant, and m i, j is a 0-1 variable indicating whether the two cuts are merged into a larger cut. When m i, j = 0, Constraint (3) ensures that the two cuts are either separated by at least the minimum cut spacing (see Figure 2(a)) or assigned to different cut masks (see Figure 2(c)). If the two cuts are merged, Constraints (4) and (5) ensure that they are assigned to the same cut mask, as shown in Figure 2(b).
(iii) Constraints for cut pairs on different tracks. Two cuts c i and c j form a cut pair in set S 2 if they are on different tracks and their possible cut locations (i.e., relocation ranges) are within the minimum cut spacing. To legalize a given cut pair, the two cuts should be (i) kept at least the minimum cut spacing apart as shown in Figure 3 
Indicator d i, j is a 0-1 variable indicating whether cut c i is on the left side of cut c j . Specifically, d i, j = 1 indicates cut c i is to the left of cut c j . Indicator m i, j is a 0-1 variable indicating whether the two cuts c i and c j are vertically aligned and merged into a larger cut. Since we do not know the cut location in advance, for two vertically overlapped relocation ranges, either cut may be on the left side of the other. Similar to Constraint (3), when m i, j = 0, Constraints (6) and (7) force the two cuts to be separated by at least the minimum cut spacing or assigned to different cut masks. Again, G is a large positive constant. If m i, j = 1, Constraints (8) and (9) align the cuts c i and c j when they are assigned to the same cut mask. The vertical alignment requires that all aligned cuts share the same x-coordinate on contiguous tracks. Special consideration must be taken for the vertical alignment of cuts on multiple (i.e., ≥ 3) tracks. For two cuts c i and c j on two non-adjacent tracks, and a cut c l on the track between the tracks of c i and c j , Constraints (10) -(13) ensure the vertical alignment between c i and c l if they are on the same cut mask. We enforce similar constraints between c j and c l . Figure 3(b) shows the result when three cuts are vertically aligned. Note that we do not allow vertical alignment if there is no available intersection with relocation ranges on intervening tracks.
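The legality rules for cut pairs can be checked by exhaustive enumeration on a tiny instance. The sketch below is not the ILP itself. It assumes unit-size cuts on an integer grid and Euclidean center-to-center spacing, treats same-track cuts as mergeable when they overlap or abut, and treats cuts on adjacent tracks as mergeable when they share an x-coordinate. Merging across three or more tracks and the ordering rule of Constraint (2) are not modelled. All names are hypothetical.

```python
from itertools import product
from math import hypot

def pair_is_legal(ci, cj, min_spacing):
    """ci, cj: dicts with 'x' (grid column), 'track' (grid row) and 'mask'."""
    dx, dy = ci["x"] - cj["x"], ci["track"] - cj["track"]
    if ci["mask"] != cj["mask"]:
        return True                      # assigned to different cut masks
    if hypot(dx, dy) >= min_spacing:
        return True                      # far enough apart on the same mask
    if dy == 0:
        return abs(dx) <= 1              # same track: overlapped or abutted (merged)
    return dx == 0 and abs(dy) == 1      # adjacent tracks: vertically aligned (merged)

def solve_tiny(cuts, num_masks, min_spacing):
    """cuts: dicts with 'track', 'origin', 'range' (allowed x values) and 'weight'.
    Returns (cost, choice) minimizing sum(weight * |x - origin|), or None if infeasible."""
    best = None
    options = [[(x, k) for x in c["range"] for k in range(num_masks)] for c in cuts]
    for choice in product(*options):
        placed = [{"x": x, "track": c["track"], "mask": k}
                  for c, (x, k) in zip(cuts, choice)]
        legal = all(pair_is_legal(placed[i], placed[j], min_spacing)
                    for i in range(len(placed)) for j in range(i + 1, len(placed)))
        if legal:
            cost = sum(c["weight"] * abs(x - c["origin"]) for c, (x, _) in zip(cuts, choice))
            if best is None or cost < best[0]:
                best = (cost, choice)
    return best

cuts = [{"track": 0, "origin": 0, "range": [0, 1, 2, 3], "weight": 2},
        {"track": 1, "origin": 1, "range": [1, 0], "weight": 1}]
print(solve_tiny(cuts, 1, 4), solve_tiny(cuts, 2, 4))
```

In this example a single mask forces the lower-weight cut to move so that the two cuts align and merge, while a second mask removes the need for any extension. This is the performance-cost tradeoff the formulation is meant to capture.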
Determining weights for EOL extensions of routing segments. For our optimization to be timing-aware, we must capture the timing impact of EOL extensions. A small amount of EOL extension on the most timing-critical path may degrade timing and result in an increase of the design's clock period (thus reducing the maximum clock frequency of the design). On the other hand, even a large amount of EOL extension on a non-critical path may not cause any degradation of the clock period. We model the timing criticality of possible EOL extensions by assigning weights that are derived from timing slacks computed by static timing analysis, e.g., using the Synopsys PrimeTime tool. Timing slack of a given net is used to determine the criticality of every wire segment of that net. For each net, we first obtain the timing slack of the most critical path passing through the net. Since the timing slack is defined per timing path, we distribute the timing slack among nets of the path based on stage delays (a stage consists of a logic gate or primary input of the design, along with its driven net). For example, given a timing path of two stages with a path timing slack of +50ps, if the first and second stage delays are 200ps and 300ps, respectively, we assign +20ps (i.e., 50 × 200/(200 + 300)) and +30ps of timing slack to the nets of the first and second stages, respectively.
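The proportional slack distribution in the example above can be written as a short sketch. The function name is hypothetical, and slack and delays are assumed to be in picoseconds.

```python
def distribute_slack(path_slack_ps, stage_delays_ps):
    """Split a path's timing slack among its stages in proportion to stage delay."""
    total_delay = sum(stage_delays_ps)
    return [path_slack_ps * delay / total_delay for delay in stage_delays_ps]

# Two-stage path with +50ps slack and stage delays of 200ps and 300ps.
print(distribute_slack(50, [200, 300]))  # [20.0, 30.0]
```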
We then classify all wire segments into two groups, based on the calculated net slack values. The first group includes all wire segments of clock nets and of nets that have negative net slacks. All other wire segments are included in the second group. We assign a higher weight to segments in the first group. ‡ By minimizing the weighted sum of EOL extensions, our optimization will avoid EOL extensions on wire segments with higher weights (i.e., on timing-critical nets).
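A minimal sketch of the two-group weighting and the resulting objective value follows. It uses the weights given in the footnote (2 for the first group, 1 for the second). The dictionary keys and function names are hypothetical.

```python
def segment_weight(net_slack_ps, is_clock_net):
    """Weight 2 for clock nets and negative-slack nets, weight 1 otherwise."""
    return 2 if is_clock_net or net_slack_ps < 0 else 1

def weighted_extension(segments):
    """Weighted sum of EOL extensions, i.e., the quantity the ILP minimizes."""
    return sum(segment_weight(s["slack"], s["clock"]) * s["extension"] for s in segments)

segments = [
    {"slack": -12.0, "clock": False, "extension": 1},  # critical net
    {"slack": 35.0, "clock": False, "extension": 3},   # non-critical net
    {"slack": 80.0, "clock": True, "extension": 0},    # clock net, left unextended
]
print(weighted_extension(segments))  # 2*1 + 1*3 + 2*0 = 5
```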
Analysis of the number of variables and constraints. Given a set of cuts C for wire segments W , and |K| cut masks, we obtain sets of cut pairs S 1 , S 2 . The number of variables and constraints are as follows.
• The number of variables m is |S 1 | + |S 2 |.
• The number of variables d is |S 2 |.
• The number of variables n is |C| · |K|.
• The number of variables x is |C|.
• The number of Constraints (1) is |C|.
• The number of Constraints (2) - (5) is |S 1 |.
• The number of Constraints (6) - (9) is |S 2 |.
• The number of Constraints (10) - (13) is F · |S 2 |, where F is a constant.
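The counts listed above can be tabulated for a given clip as in the following sketch. It copies the stated expressions directly. The function name and the example numbers are hypothetical.

```python
def ilp_size(num_cuts, num_masks, num_s1, num_s2, f):
    """Variable and constraint counts, following the expressions listed above."""
    variables = {
        "m": num_s1 + num_s2,
        "d": num_s2,
        "n": num_cuts * num_masks,
        "x": num_cuts,
    }
    constraints = {
        "(1)": num_cuts,
        "(2)-(5)": num_s1,
        "(6)-(9)": num_s2,
        "(10)-(13)": f * num_s2,
    }
    return variables, constraints

variables, constraints = ilp_size(num_cuts=100, num_masks=3, num_s1=40, num_s2=120, f=2)
print(sum(variables.values()), sum(constraints.values()))
```

Because every term is linear in |C|, |S 1 | or |S 2 |, the model size of a clip grows with the number of cuts and nearby cut pairs it contains.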
Timing- and Density-Aware Post-ILP Optimization
We now explain our timing- and density-aware post-ILP optimization flow, which starts from the ILP solution obtained as described in the preceding subsection. Given layer t with all cuts assigned to cut masks, we iteratively consider regions above and below given routing segments, removing dummy fills by enlarging or inserting cuts using all available cut masks, until the total metal density of layer t reaches the target minimum metal density. To maintain awareness of timing, we process all routing segments in ascending order of their net slack, as discussed above. Our flow also attempts to maintain mask density uniformity across all cut masks.
Algorithm 1 describes the details of our post-ILP optimization flow. Our flow optimizes layer by layer, starting from the output solution of the ILP-based optimization. The inputs are the output layer t from ILP-based optimization, target minimum metal density ρ min , set of cut masks (colors) K and minimum cut spacing min s . The output is the optimized layer t opt with dummy fills. Lines 1-2 calculate the current metal density ρ m of layer t and mask density P k for cut mask k of layer t. Lines 3-4 collect all routing segments W in layer t and sort all routing segments w ∈ W in ascending order of their net slack. We then set ∆, which is used to determine the target region to apply cuts, to one (Line 5). Lines 6-16 iteratively add cuts on all available cut masks until ρ m ≤ ρ min . For each routing segment w (Line 7), we check the upper (left) and lower (right) horizontal (vertical) tracks track cur (Line 8), which are exactly ∆ tracks apart from the horizontal (vertical) track of the routing segment w.
Algorithm 1 (Lines 7-12)
 7: for all w ∈ W do
 8:   for all track cur ∈ {track r + ∆, ..., track r − ∆} do
 9:     v ← defineTargetRegion(w, track cur );
10:     Q ← enumCandidateCuts(v, t, min s );
11:     t ← selectCuts(t, Q, P k );
12:     updateDensity(ρ m , P k , t);
The function defineTargetRegion(w, track) defines a target region v to be cut, as shown in Figure 4(a). For this target region v, the function enumCandidateCuts() then enumerates all possible sets of candidate cuts on each cut mask k. Minimum cut spacing min s (Line 10) is considered in this step. We only allow rectangular shapes on cut masks for better fidelity of metals. To avoid forming non-rectangular shapes, we consider existing cuts on the neighboring tracks of the target region v. Figure 4(b) shows an example of all candidate cuts on each cut mask for the target region. Neighboring regions of the target region v are checked so that rectangular shapes are always preserved when enlarging or inserting cuts.
‡ We set a weight of 2 for segments in the first group and a weight of 1 for segments in the second group.
After obtaining the set of candidate cuts Q on each cut mask k, to account for the mask density uniformity, we first select the set on a cut mask with the least mask density. We then pick the set that has the minimum mask density among the remaining cut masks, and cover the target region which is not covered by previous cuts. Figure 4(c) shows an example solution with the assumption of ρ 3 ≤ ρ 2 ≤ ρ 1 . Finally, based on the optimized cut mask solution, we obtain the actual layout pattern with EOL extensions and dummy fills for layer t (Line 17).
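The density-driven selection step can be sketched as follows. This is a simplified illustration, assuming the candidate cuts for each mask are given as sets of grid points in the target region that the mask may legally cut. Rectangularity checks are assumed to have been handled when the candidates were enumerated. The function name and data layout are hypothetical.

```python
def select_cuts(target_region, candidates, mask_density):
    """Cover the target region using masks in ascending order of mask density.

    target_region: set of grid points to be cut
    candidates:    {mask: set of grid points that mask may cut}
    mask_density:  {mask: current mask density}
    Returns {mask: grid points newly assigned to that mask}.
    """
    uncovered = set(target_region)
    chosen = {}
    for mask in sorted(candidates, key=lambda k: mask_density[k]):
        take = candidates[mask] & uncovered
        if take:
            chosen[mask] = take
            uncovered -= take   # later masks only cover what is still uncovered
    return chosen

region = {0, 1, 2, 3, 4, 5}
candidates = {1: {0, 1}, 2: {0, 1, 2, 3}, 3: {2, 3, 4, 5}}
density = {1: 0.30, 2: 0.20, 3: 0.10}
print(select_cuts(region, candidates, density))
```

With mask 3 the sparsest, it receives its full candidate set first, mask 2 covers the remainder, and mask 1 is left untouched, which nudges the mask densities towards uniformity.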
Overall Flow
Our overall flow is shown in Figure 5 . In the flow, we perform two steps: (i) ILP-based cut mask optimization and (ii) timing/density-aware post-ILP optimization. To achieve a scalable optimization for full-chip layouts, we use a partitioning-based, distributable optimization strategy. Namely, to overcome the poor scalability of ILP, we split the layout into many clips and run the ILP-based optimization for each clip in parallel. (A clip is simply a rectangular piece of the chip layout.) The typical clip size is 3µm by 3µm. § Our second step of post-ILP optimization to improve critical path performance removes dummy fills with consideration of metal density and mask density constraints. This step is achieved using an efficient heuristic, with no need for distributed implementation.
Handling cuts at boundaries with multiple iterations. One drawback of partitioning-based optimization is that clips can interfere with each other, so that their solutions may not be compatible when stitched together within the entire chip. To avoid such situations, we perform several iterations of the optimization so that all cuts are processed without conflict. Figure 6 shows how we partition the layout in each of the three iterations that comprise our partitioning-based optimization.
§ We use a foundry N28 BEOL stack with 2.5× scaled N7 library cells. Therefore, the clip size in N7 will be 1.2µm by 1.2µm.
In the first iteration, we optimize cuts within each clip, without considering boundaries between clips, as shown in Figure 6(a). In the second iteration, we optimize and resolve conflicts near the horizontal boundaries between vertically adjacent clips of Figure 6(a), as shown in Figure 6(b). In this second iteration, we adjust the height of a clip to be four times the minimum cut spacing. In each clip, we only optimize for regions within the minimum cut spacing of horizontal boundaries and keep the solutions obtained in the first iteration for the other regions in the clip. In this way, we can resolve all conflicts on horizontal boundaries. Similarly, in the third iteration, we optimize clips that cover vertical boundaries between pairs of horizontally adjacent clips in Figure 6(a). The width of clips in this iteration is determined similarly to the height of clips in the second iteration; see Figure 6(c). After completing the three iterations, we will have covered all cuts without inducing conflict between the clip solutions.
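The three partitioning passes can be sketched as a generator of clip windows. The sketch assumes integer layout coordinates, boundary clips centered on the boundary they cover, and boundary clips that keep the first-iteration clip size in the other dimension. Only the height (width) of four times the minimum cut spacing is stated above. The function name is hypothetical.

```python
def partition(width, height, clip, min_spacing):
    """Return clip windows (x0, y0, x1, y1) for the three iterations."""
    xs = list(range(0, width, clip))
    ys = list(range(0, height, clip))
    # Iteration 1: regular tiling of the layout.
    first = [(x, y, min(x + clip, width), min(y + clip, height)) for x in xs for y in ys]
    half = 2 * min_spacing  # boundary clips span four times the minimum cut spacing
    # Iteration 2: clips straddling horizontal boundaries between vertically adjacent clips.
    second = [(x, y - half, min(x + clip, width), y + half) for x in xs for y in ys[1:]]
    # Iteration 3: clips straddling vertical boundaries between horizontally adjacent clips.
    third = [(x - half, y, x + half, min(y + clip, height)) for x in xs[1:] for y in ys]
    return first, second, third

first, second, third = partition(6000, 6000, 3000, 144)
print(len(first), second[0], third[0])
```

Within each boundary clip, only cuts within the minimum cut spacing of the boundary would be re-optimized; the rest keep their first-iteration solution.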
EXPERIMENTAL SETUP AND RESULTS
In this section, we present our experimental setup and results. We experiment on the impact of number of cut masks, minimum metal density, minimum cut spacing and EOL extensions. These experiments show the effectiveness of our optimization and tradeoffs between performance and cost.
Experimental Setup
The program is written in C++ with OpenAccess 2.6 13 API to support DEF/LEF 12 and handle routing segments. We use IBM CPLEX 11 as the ILP solver. Parallel optimization is enabled by OpenMP 15 API. We perform experiments with 40 threads on a 2.6GHz Intel Xeon E5-2690 dual-CPU server. Reported runtimes are "wall clock" time between start and termination of each given experiment. We evaluate our approach using an encryption core (AES) and a media processing core (JPEG) from OpenCores, 14 as well as an ARM Cortex M0 design without memories. We synthesize the designs with Synopsys Design Compiler H-2013.03-SP3 16 from RTL netlists. We then perform placement and routing with Cadence Encounter Digital Implementation System v14.1, 10 using an abstracted N7 library from a leading IP provider. Since our N7 technology is missing detailed BEOL stack information which is necessary for design enablement, we scale up the N7 library cells' dimensions to use an N28 BEOL stack, following the methodology described in Han et al. 5 The methodology described by Chan et al. 1 is used to derive the missing resistance (R) and capacitance (C) information for N7 BEOL from original N28 wire RC values. Here, R (C) is defined as per unit-length resistance (capacitance) in a specific foundry node. We scale the N28 wire R by 13× to derive the N7 wire R, accounting for the rapid increase of resistivity in advanced nodes. N28 wire C is scaled by 0.4× to derive the N7 wire C considering geometric scaling. We also project to N5 foundry technology by scaling wire R and C further (e.g., 22× and 0.28× for R and C from N28 BEOL, respectively) and derating standard cells' delay based on 2015 ITRS models. 6 The delay and transition time of standard cells are scaled by 0.75× according to the ratio of I/CV parameters of N7 and N5. The gate capacitances of standard cells are scaled by 0.86×. Table 2 summarizes key parameters of our testcases.
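The wire RC scaling factors quoted above can be applied as in the following sketch. The factors are taken from the text. The N28 input values in the example are placeholders rather than data from this work, and the function names are hypothetical.

```python
# Per-unit-length scaling factors relative to the N28 BEOL, as stated in the text.
SCALE = {"N7": {"R": 13.0, "C": 0.4}, "N5": {"R": 22.0, "C": 0.28}}

def scaled_rc(r_n28, c_n28, node):
    """Derive per-unit-length wire R and C for the given node from N28 values."""
    factors = SCALE[node]
    return r_n28 * factors["R"], c_n28 * factors["C"]

def rc_product_factor(node):
    """Scaling of the per-unit-length RC product relative to N28."""
    return SCALE[node]["R"] * SCALE[node]["C"]

r7, c7 = scaled_rc(1.0, 1.0, "N7")   # placeholder N28 values of 1.0
print(r7, c7, rc_product_factor("N7"), rc_product_factor("N5"))
```

Under these factors the per-unit-length RC product grows by about 5.2× at N7 and about 6.16× at N5 relative to N28, so the increase in R outweighs the decrease in C.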
In all our experiments, we derive minimum cut spacings for N7 and N5 foundry nodes according to the 2015 ITRS Lithography Chapter. 6 Based on the ITRS discussion, the 2D lithography pitch cliff is approximately 110nm, which corresponds to cut pitch. M2 pitches of foundry N7 and N5 nodes are 36nm and 24nm, respectively. Since our enablement uses a foundry N28 BEOL stack, we use multiples of M2 pitch as minimum cut spacing, which are derived from metal and cut pitch numbers. We use four M2 pitches and five M2 pitches as the minimum cut spacing values in the N7 and N5 nodes, respectively. Minimum cut spacing is checked based on center-to-center Euclidean distance between cuts. Figure 7 illustrates forbidden locations caused by an existing cut on the cut mask. Our BEOL stack consists of six layers (i.e., M1 - M6) of 1× minimum M2 pitch and two layers (i.e., M7 - M8) of 2× minimum M2 pitch. We assume that SAMP will only be applied to the six layers with 1× minimum M2 pitch. Therefore, we vary the number of cut masks, minimum cut spacing and target minimum metal density only on the layers M2 - M6 to study the impact of each parameter. We report worst negative slack (WNS) and total wire capacitance using Cadence Encounter Digital Implementation System v14.1 for each of the N7 and N5 implementations. (Note that smaller, i.e., more negative, values of WNS are worse because "slack" corresponds to "timing safety".) As a calibration, the typical FO4 buffer delays in N7 and N5 are 23ps and 18ps, respectively. Our analysis is performed with coupling capacitance and signal integrity (i.e., crosstalk-induced delay impact analysis) options in the timing analysis.
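The spacing values above are consistent with rounding the 110nm pitch cliff up to a whole number of M2 pitches. This is one reading of the numbers, not a derivation given in the text. The sketch below applies that reading and enumerates the grid offsets forbidden by an existing cut under the Euclidean center-to-center check, assuming a square grid with one M2 pitch per step in both directions. Function names are hypothetical.

```python
from math import ceil, hypot

def pitches_for_cliff(cliff_nm, m2_pitch_nm):
    """Smallest whole number of M2 pitches that is at least the pitch cliff."""
    return ceil(cliff_nm / m2_pitch_nm)

def forbidden_offsets(min_spacing_pitches):
    """Grid offsets (dx, dy), in M2 pitches, too close to a same-mask cut at the origin."""
    r = min_spacing_pitches
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if (dx, dy) != (0, 0) and hypot(dx, dy) < r]

print(pitches_for_cliff(110, 36), pitches_for_cliff(110, 24))  # 4 5
print(len(forbidden_offsets(4)))
```

For example, with a spacing of four pitches the offset (3, 2) is forbidden for another cut on the same mask, while (3, 3) and (4, 0) are allowed.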
Experimental Results
We now report our experimental results, including the impact of number of cut masks, minimum metal density, minimum cut spacing and EOL extensions. Our experiments demonstrate the effectiveness of our optimization as well as substantial available tradeoffs between performance and cost. Figure 8 visualizes a fragment of layout of the M2 layer with optimized EOL extensions and dummy fills, for the N7 Cortex M0 testcase, using four cut masks.
Impact of number of cut masks. Table 3 shows results with various numbers of cut masks for each layer (i.e., options C1 - C12). We use a minimum cut spacing of four M2 pitches with N7 technology and a target minimum metal density of 40%. We assume that in the SAMP process, the width and spacing of metal are both equal to the half-pitch, so that maximum track occupancy gives 50% metal density. For each option, we report the WNS, total wire capacitance, sum of EOL extensions, the number (percentage) of infeasible clips and runtime. An infeasible clip means that the ILP solver cannot find a feasible solution for the ILP instance corresponding to the clip for the given numbers of cut masks and minimum cut spacing. Regardless of the testcase, infeasible clips exist for all options C1 to C5, implying that option C6 is the set of minimum numbers of cut masks that ensures solution feasibility in all three of our testcases. Further, by comparing options C6, C11 and C12, we observe that using two more than the minimum number of cut masks in each of the layers has little effect on timing. We also note that the Cortex M0 testcase has a larger WNS variation than the AES testcase among different options, even though AES always has larger EOL extensions than Cortex M0. This is because Cortex M0 has more stages on its critical timing path than AES, and so the cumulative timing impact (over the entire critical path) seen in Cortex M0 is larger.
Runtimes for our optimization are larger for options with numbers of cut masks similar to those in option C6 (the set of minimum numbers of cut masks). Also, among the three testcases, runtime increases roughly linearly for each option according to the number of segments (see Table 2 ).
Impact of minimum metal density. Table 4 shows the results with various minimum metal density constraints (i.e., 40%, 42.5%, 45%). We set minimum cut spacing as four M2 pitches and use option C6 with the minimum number of cut masks for a feasible solution. The WNS improvement is up to 14ps by decreasing the target metal density from 45% to 40% among three testcases. We also observe that runtime does not change among different target metal densities, which means that runtime of our post-ILP optimization is negligible compared to that of the ILP-based cut mask optimization step.
Impact of minimum cut spacing. Table 5 shows results with different minimum cut spacings for N7 and N5. A minimum metal density of 40% is enforced. For each design in each node, we first find the option that has the minimum number of cut masks for each layer. We then investigate the impacts on timing, total wire capacitance and total EOL extensions when we add one or two more masks for each layer. When we compare the results for N7 and N5, we observe that N5 is more sensitive to the number of cut masks. For example, AES for N7 shows 1ps difference in WNS between the options (3,2,2,2,2) and (5,4,4,4,4), but AES for N5 shows 11ps difference. This is because wire delay is more dominant than gate delay in N5 compared to N7. Also, going from N7 to N5, the increase of per unit-length wire resistance is greater than the decrease in per unit-length wire capacitance.
Impact of EOL extensions. Our next experiment performs ILP-based cut mask optimization by itself, without any post-ILP optimization, to highlight the impact of EOL extensions. We compare the best unit-size cut solution against the (near-) worst unit-size cut solution. To find the (near-) worst solution, we maximize the weighted sum of extensions, instead of minimizing this weighted sum. The maximum length of each relocation range is restricted to 30 M2 pitches according to our clip size. Therefore, the (near-) worst solution from our ILP-based cut mask optimization may not be the worst solution over the solution space since a wire segment cannot be extended beyond its clip boundaries. However, this (near-) worst solution is bad enough to demonstrate a strong impact of EOL extensions. We conduct experiments in both N7 and N5 nodes. For N7, we use four M2 pitches for minimum cut spacing, option (3,2,2,2,2) and 40% for target metal density.
To isolate the timing impact of EOL extensions from dummy fills, the best (BEST) and the worst (WORST) solutions have only EOL extensions without dummy fills. We compare BEST and WORST with the original target layout (ORIG) to see the pure impact of EOL extensions. We also add a comparison to our final layout (BEST + POST-ILP), which accounts for the impact of both EOL extensions and dummy fills. Table 6 shows the results for WORST, BEST, ORIG, and BEST + POST-ILP. For N5, we use five M2 pitches for minimum cut spacing, and the option with minimum number of cut masks for each design as determined in previous experiments (see Table 5 ). The target minimum metal density is set to be 40%. Table 7 shows the results for N5. By comparing BEST and WORST timing to ORIG timing, we observe that among three testcases, WORST EOL extensions can degrade WNS by up to 228ps. For all three designs, the average gap between BEST and WORST timing in N5 is 90ps larger than in N7. Compared to ORIG, our BEST + POST-ILP optimization achieves an average timing degradation of only 50ps and 67ps for N7 and N5, respectively, including the impact of dummy fills. When we compare ORIG, BEST and BEST + POST-ILP solutions, it is apparent that most of the WNS degradation is caused by dummy fills.
CONCLUSION
In this work, we have studied the co-optimization of cut mask layout, dummy fill, and design timing for sub-14nm BEOL design. We propose an ILP-based cut mask optimizer and a heuristic for post-ILP optimization. Our experiments with the cut mask optimization flow in varying contexts (e.g., number of cut masks, target minimum metal density, minimum cut spacing, EOL extensions) indicate that there can be significant tradeoffs between performance and cost. Our ongoing work addresses such topics as: (i) improved timing-aware weight assignment in ILP; (ii) implementation of an ECO routing flow for infeasible routing clips to reduce the mask cost; (iii) comprehension of the difference between coupling to floating dummy fills and coupling to EOL extensions; and (iv) co-optimization of detailed routing and cut mask solutions.
