Optimization for advanced lithography by Du, Yuelin
© 2014 Yuelin Du
OPTIMIZATION FOR ADVANCED LITHOGRAPHY
BY
YUELIN DU
DISSERTATION
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2014
Urbana, Illinois
Doctoral Committee:
Professor Martin D. F. Wong, Chair
Professor Rob A. Rutenbar
Associate Professor Deming Chen
Associate Professor Xiuling Li
Dr. Rasit O. Topaloglu, IBM
ABSTRACT
Lithography has always been the most critical process in integrated circuit
(IC) fabrication. Below the 28 nm technology node, conventional 193 nm im-
mersion lithography (193i) with single exposure has reached its printability
limit. To keep up with Moore’s law, many advanced lithography
and process techniques have emerged in the past decade, such as fin-based
multiple-gate field-effect transistors (FinFETs), electron beam lithography
(EBL), self-aligned double patterning (SADP) lithography, directed
self-assembly (DSA), and extreme ultraviolet lithography (EUVL). Each of
the advanced lithography techniques has its own advantages over others, but
also faces great challenges due to different process limitations. In order to
adopt the advanced lithography technologies in IC fabrication, their bottle-
necks must be overcome first. Due to the physical limitations, it is extremely
difficult to break the bottlenecks by merely improving the fabrication pro-
cesses. Instead, design-technology co-optimization (DTCO) via electronic
design automation (EDA) software is a more effective way. Targeting the
most promising lithography and process techniques for advanced technology
nodes below 20 nm, in this thesis we study their major challenges and pro-
pose potential DTCO solutions to break the bottlenecks and improve the
manufacturability.
First, in the sub-20 nm technology nodes, fin based multiple-gate field-
effect (FinFET) transistors show great advantages over traditional planar
MOSFET transistors in high performance and low power applications. Edge
device degradation is among the major challenges for the FinFET process.
To avoid such degradation, dummy gates are needed on device edges, and the
dummy gates have to be tied to power rails in order to avoid unconnected
parasitic transistors. This requires that each dummy gate must abut at least
one source node after standard cell placement. If the drain nodes at two
adjacent cell boundaries abut each other, additional source nodes must be
inserted in between for dummy gate power tying, which takes more placement
area. We propose a detailed placement optimization strategy for the standard
cell based designs. By flipping a subset of cells in a standard cell row and
switching pairs of adjacent cells, the number of drain to drain abutments
between adjacent cell boundaries can be optimally minimized, which saves
additional source node insertion and reduces the length of the standard cell
row.
Second, to make logic devices manufacturable for the 16 nm technology
node and beyond, designers are moving towards 1D gridded design style,
where a target 1D layer can be printed by the combination of a dense line
layer and a cut layer. A cut layer consists of a number of identical cut
patterns, each located at the line-end of a target wire. The randomness of
logic circuits mainly affects the cut pattern distribution and introduces
a major challenge in fabricating 1D gridded designs. With hybrid
lithography, different types of processes can be applied to manufacture a
single layer, so that the advantages of different technologies are
combined to further benefit manufacturing. Targeting cut printing
difficulties and hybrid lithography with EBL and 193i processes, we propose
a novel algorithm to optimally assign cuts to 193i or EBL processes with
proper modifications on cut distribution, in order to maximize the overall
throughput.
Third, SADP lithography is a leading candidate for 10 nm node lower-
metal layer fabrication. Spacer-is-dielectric (SID) is the most popular flavor
of SADP with higher flexibility in design. In the SID process, due to uni-
form spacer deposition, the spacer shape gets rounded at convex mandrel
corners, and disregarding the corner rounding issue during SID decompo-
sition may result in severe residue artifacts on device patterns. Targeting
residue artifact removal, we propose an enhanced SID decomposition flow
with model-based verification. However, sometimes it is impossible to re-
move all artifacts through SID decomposition only. Besides that, full chip
SID decomposition has been proved to be an NP-complete problem. There-
fore, addressing artifact issues in the design phase (e.g., detailed routing) and
producing SADP-friendly layouts become much more effective. We make a
careful study on the challenges for SID-compliant detailed routing and pro-
pose a graph model to capture the decomposition violations and SID intrinsic
residue issues. Then a negotiated congestion based scheme is adopted to solve
the overall routing problem.
Fourth, at the 7 nm technology node, DSA technology is the most promis-
ing candidate for the contact/via layer fabrication. To pattern contact holes
with DSA process, guiding templates are usually printed first with conven-
tional lithography (e.g., 193i) that has a coarser pitch resolution. Then the
guiding templates will determine the DSA patterns inside and these patterns
have a finer resolution than the templates. The overlay accuracy of the con-
tact holes as well as the printability of templates may vary among different
templates, and in consequence, the cost differs considerably from one guiding
template shape to another. We first discuss the DSA-aware contact layer
optimization problem at the standard cell level. Given a standard cell library,
we simultaneously optimize the layouts of every cell, such that the contact
layer of any cell in the library can be fully patterned by a set of guiding
templates, and the total cost of the templates is minimal. Then, at the
full-chip level, we propose a DSA-aware detailed routing algorithm that takes
into account the constraints on feasible templates for the DSA process.
We guarantee that the via layers produced by our router can be successfully
patterned using feasible templates only.
Finally, EUV lithography is a leading candidate beyond the 7 nm tech-
nology node. One of the challenges in EUV lithography is how to utilize
defective blanks to produce valid EUV masks. One effective defect mitigation
approach is to cover the defects with device patterns so that mask
defects do not affect printing on the wafer. We first present an efficient
layout shifting algorithm that finds an optimal location to place a single layout
onto a blank such that all defects are simultaneously covered. However, in
many cases it is impossible to completely mitigate all defect impact if multiple
dies are tied and moved together; hence we further explore the flexibility of
individual die shifting. Even then, a 100% success rate in complete defect
mitigation can never be guaranteed, since success also depends on the designs
and defect maps. Targeting imperfect defect mitigation between one pair
of design and blank, we finally develop an optimal design-blank matching
strategy to match multiple designs and defective blanks simultaneously.
To my parents, my wife and my son.
ACKNOWLEDGMENTS
I would like to express my deepest gratitude to my adviser Prof. Martin D.
F. Wong, who has been a tremendous mentor for me in both research and
life. His knowledge and wisdom always inspired me and led me forward. This
dissertation would not have been possible without his patient guidance.
Besides my adviser, I want to thank the rest of my Doctoral Committee:
Prof. Deming Chen, Prof. Xiuling Li, Prof. Rob A. Rutenbar and Dr. Rasit
Topaloglu, for their insightful comments and constructive suggestions.
I am also grateful to Prof. H.-S. Philip Wong and Ms. He Yi for their
great help in the research of DSA related topics. I would also like to thank
Dr. Hua Song for valuable guidance during my internship at Synopsys
Inc. in 2012. My special thanks also go to Mr. James Hutchinson and Ms.
Janice L. Progen for their useful instruction, which improved my academic
writing skills.
Many thanks to all the members of Prof. Wong’s research team, who made
my life at UIUC very enjoyable. Thanks to Dr. Hongbo Zhang for helping
me establish my research topics. Thanks to Dr. Hui Kong, Dr. Tan Yan, Dr.
Lijuan Luo, Dr. Qiang Ma, Dr. Ting Yu, Ms. Pei-Ci Wu, Ms. Leslie Hwang,
Mr. Zigang Xiao, Mr. Haitong Tian, Mr. Tsung-Wei Huang, Mr. Choden
Konigsmark and Mr. Daifeng Guo for all the stimulating discussions and the
seamless collaborations we had, as well as their help in my study and life.
Last but not least, I want to thank my parents for raising me and
always providing me with endless love and support. I also owe my deepest
thanks to my wife, Yang Luo. I sincerely appreciate her coming into my life, and
I appreciate every moment we have together, not only in the past few years,
but also in the future. Finally and specially, I would like to thank my baby,
who arrived at the end of my Ph.D. period and has made my life more
meaningful.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Background and Motivation
1.2 Overview of this Dissertation
CHAPTER 2 PLACEMENT OPTIMIZATION FOR FINFET PROCESS
2.1 Introduction
2.2 Problem Definition
2.3 Problem Solution
2.4 Experimental Results
2.5 Problem Expanding and Discussions
2.6 Conclusion
CHAPTER 3 HYBRID LITHOGRAPHY OPTIMIZATION FOR CUT LAYER PRINTING
3.1 Introduction
3.2 Process Overview
3.3 Lithography Simulation
3.4 Problem Formulation
3.5 Experimental Results
3.6 Conclusion
CHAPTER 4 SADP DECOMPOSITION AND SADP-AWARE DETAILED ROUTING
4.1 Introduction
4.2 Spacer Corner Rounding by Uniform Deposition
4.3 Residue Artifacts by Spacer Corner Rounding
4.4 Enhancements in SID Decomposition for Residue Artifact Removal
4.5 Enhanced SID Decomposition with Model-Based Verification
4.6 Challenges in SID-Compliant Detailed Routing
4.7 SID-Compliant Detailed Routing Problem
4.8 Solution to SID-Compliant Detailed Routing
4.9 Experimental Results
4.10 Conclusions
CHAPTER 5 CONTACT/VIA LAYER OPTIMIZATION FOR DSA LITHOGRAPHY
5.1 Introduction
5.2 Background: Contact Patterning with DSA
5.3 DSA-Aware Standard Cell Library Optimization
5.4 Cell Letter Determination
5.5 Alphabet Optimization
5.6 Feasible Letters for Via Patterning
5.7 DSA-Aware Detailed Routing Problem
5.8 Detailed Routing Scheme
5.9 Experimental Results
5.10 Conclusions
CHAPTER 6 BLANK DEFECT MITIGATION FOR EUV LITHOGRAPHY
6.1 Introduction
6.2 Layout Shifting Considering Inspection Inaccuracy
6.3 Defect Mitigation through Multi-Die Placement
6.4 Design-Blank Matching
6.5 Conclusions
CHAPTER 7 CONCLUSIONS
REFERENCES
LIST OF TABLES
2.1 Experimental Results
3.1 Safe Distances for Two Cuts with Different Vertical Distances
3.2 Algorithm Comparison
3.3 Experimental Results on Large Layout
4.1 Comparison of Automatic/Random Pin Location Determination in SID-compliant Detailed Routing
5.1 Experimental Results for AOPAPX
5.2 Comparison between Conventional Detailed Routing and DSA-aware Detailed Routing
6.1 Impact of Defect Size on Defect Mitigation Results
6.2 Impact of Defect Number on Defect Mitigation Results
6.3 Impact of Shift Margin on Defect Mitigation Results
6.4 Experimental Results
6.5 Saturation Capacity and Freedom for Different Die Sizes
LIST OF FIGURES
1.1 The size gap between feature size and lithography wavelength. [1]
2.1 The edge device degradation induced by dummy gate removal.
2.2 A parasitic transistor is introduced by two FinFETs abutting each other.
2.3 A dummy gate must abut a source node in order to be tied to power rails.
2.4 D2D abutments are removed via placement optimization. Note that only the diffusion layers and the cell boundaries are displayed.
2.5 Solving the CFP problem by constructing the CFP graph model and applying the shortest path algorithm on it.
2.6 Solving the CSP problem by constructing the CSP graph model and applying the shortest path algorithm on it.
2.7 An overall graph model constructed based on the placement of five standard cells. Note that only a subset of edges in the last four rows are shown in (a), and the interactive edges between the first two rows and the last four rows are illustrated in (b). Cost values are not displayed.
2.8 The design rules for minimum jog widths.
2.9 The updated CFP graph model built on two adjacent cells.
3.1 The 1D design is fabricated by a combination of dense lines and cuts. (a) shows the original design; (b) shows the default cut positions where 4 cuts need EBL to print; (c) shows the cut positions after redistribution, where only 1 cut needs EBL to print.
3.2 The flow of proposed manufacturing process. The original design is shown in blue, dummy wires in yellow. The black lines denote the dense lines; the red rectangles denote 193i cuts; the green rectangles denote E-Beam cuts.
3.3 Lithography simulation to find forbidden patterns. The green rectangles show target cuts and the red contours show printed shapes. (a) shows the simulation result at the best focus, and (b) shows the simulation result with 50 nm defocus. The forbidden patterns at different focus are grouped inside the shadow regions.
3.4 The lithography simulation results before (a) and after (b) cut redistribution. The real wires are shown in blue, dummy wires in yellow. The green rectangles show the ideal cut shape and the red contours show the printed images.
4.1 An example of SID decomposition.
4.2 The spacer shape varies differently at the two types of corners.
4.3 Type 1 artifact.
4.4 Type 2 artifact.
4.5 Type 3 artifact.
4.6 Enhanced SID decomposition to minimize type 1 artifacts.
4.7 Enhanced SID decomposition to remove type 2 artifacts. The overlay-tolerance capability is sacrificed at the circled edges.
4.8 Enhanced SID decomposition to remove type 3 artifacts. The overlay-tolerance capability is sacrificed at the circled edges.
4.9 Enhanced SID decomposition flow.
4.10 Type 2 and type 3 artifact inspection with simplified decomposition verification model. The artifacts are circled in red.
4.11 Enhanced SID decomposition and contour simulation result.
4.12 The comparison of the simulation results performed on primary and final masks. The black contours show the inner and outer edges of the spacer deposition.
4.13 Allowed spacing values in SID process [2].
4.14 Splitting a single wire introduces a gap in the final pattern. [2]
4.15 Design rule for anti-parallel line-ends.
4.16 The definitions of vertices and edges in the expanded routing graph model.
4.17 The graph model for a pin with two candidate locations.
4.18 The wire crossing and spacing conflict cost assigned to an expanded routing graph with two pre-routed wires.
4.19 Three scenarios of anti-parallel line-ends conflicts on a routing grid.
4.20 Graph model modification to avoid prohibited anti-parallel line-ends.
4.21 The graph model that disallows inside-box detour.
4.22 The strategy of automatic line-end extension to avoid anti-parallel line-ends conflicts.
4.23 Comparison of the simulation results with and without sm-jog penalty.
5.1 Comparison of DSA contact hole patterning between the 22 nm technology node and the 7 nm technology node with a half adder. In (c) and (d), the dark gray areas denote guiding templates and the black areas denote DSA contact holes. Scale bar: 200 nm.
5.2 The layout and guiding templates for a standard cell before and after wire permutation.
5.3 The overlay accuracy of a diagonal pair of DSA contacts guided by the “peanut-shaped” template is worse than a rectangular template. Scale bar: 200 nm.
5.4 A notional table to illustrate the dependence between cell layouts and letters.
5.5 Each layout is considered as a binary matrix.
5.6 A letter can be saved by 8 or 4 strings of 0s and 1s.
5.7 Feasible letters for via patterning.
5.8 All the feasible via patterns that can be printed using DSA guiding templates.
5.9 Three vias in L-shape can be patterned with a four-hole letter by inserting a dummy via.
5.10 The via cost of a subset of vertices is updated once vias are inserted by a net routed.
5.11 The shortest path algorithm cannot handle via pattern conflicts introduced by the current path.
5.12 The experimental results co-optimizing three standard cells.
5.13 The comparison between conventional detailed routing and DSA-aware detailed routing performed on a toy netlist.
6.1 Cover blank defects with device patterns to mitigate their impact. The impacts of defects A and B are not mitigated. The impacts of defects C and D can both be mitigated but with different tolerance to inspection inaccuracy.
6.2 EUV mask: the blank and the layout.
6.3 The comparison of GET between two different solutions. The first solution in (b) has 10 nm GET which is better than the second solution in (c) with 0 GET.
6.4 Illustration of the impact region.
6.5 The shrinking of defect, IR and device patterns.
6.6 The layout is partitioned into SIRs by IR and device pattern shrinking.
6.7 Find all feasible regions by SIR intersection.
6.8 Invalid tiles are discarded for the intersection operation.
6.9 The toy test of our algorithm. Before layout shifting, all the 4 defects impact printing in (a), while after shifting the layout by (115, 130), all defect impact is completely mitigated by device pattern coverage in (b).
6.10 The relationship between the defect size and the percentage of valid device patterns in our design.
6.11 Defect A lies within the die area and must be covered by features. Defect B is outside any die area and does not need to be considered.
6.12 A toy example to illustrate the feasible regions to place the bottom left corner of the die for defect mitigation. Note that the die area can never move out of the exposure field.
6.13 The valid region in the exposure field to place the bottom left corner of the die is partitioned into 8 blank regions based on the impact range of each defect.
6.14 The strategy to explore all feasible regions for a blank region with single effective defect.
6.15 The overall flow of the algorithm.
6.16 Feasible region bloating.
6.17 Approximation analysis of the placement algorithm. The die in solid green line is the placement result of our algorithm; the dies in dashed green line show the result of one optimal placement solution; the dies in dashed red lines are demos of invalid placement.
6.18 Definition of the placement freedom.
6.19 The testing result of the algorithm.
6.20 The number of successfully placed dies with respect to the die size and the feasible region density.
6.21 The design-blank matching problem for EUV mask vendors.
6.22 Comparison between sequential matching and simultaneous matching.
6.23 The max-flow formulation of the design-blank matching problem.
6.24 The design-blank matching result provided by the max-flow algorithm. The flow values on each edge are shown in blue, and the valid pairings are shown in red.
6.25 The min-cost flow formulation of the compensation cost minimization problem.
6.26 The precalculation results provided by the layout relocation algorithm.
6.27 Blank utilization comparison between simultaneous matching and sequential matching.
6.28 Compensation cost comparison between simultaneous matching and sequential matching.
LIST OF ABBREVIATIONS
AOP Alphabet Optimization Problem
CD Critical Dimension
CEBL Complementary Electron Beam Lithography
CFP Cell Flipping Problem
CNF Conjunctive Normal Form
CPU Central Processing Unit
CSP Cell Switching Problem
DPL Double Patterning Lithography
DSA Directed Self-Assembly
DTCO Design-Technology Co-Optimization
D2D Drain-to-Drain
EBL Electron Beam Lithography
EDA Electronic Design Automation
ET Error Tolerance
EUVL Extreme Ultraviolet Lithography
FET Field-Effect Transistor
GB Gigabytes
GET Global Error Tolerance
GHz Gigahertz
HVP High Volume Production
IC Integrated Circuit
ILP Integer Linear Programming
IR Impact Region
LELE Litho-Etch-Litho-Etch
LI Local Interconnect
LWR Line Width Roughness
ML Multi-Layer
MPL Multiple Patterning Lithography
NGL Next Generation Lithography
NP Non-deterministic Polynomial
OPC Optical Proximity Correction
RAM Random-Access Memory
RET Resolution Enhancement Techniques
SADP Self-Aligned Double Patterning
SAT Satisfiability
SCP Set Covering Problem
SID Spacer-Is-Dielectric
SIR Shrunk Impact Region
SMO Source-Mask Optimization
SRAF Sub-Resolution Assist Features
TPL Triple Patterning Lithography
1D One Dimensional
2D Two Dimensional
193i 193 nm ArF Immersion Lithography
CHAPTER 1
INTRODUCTION
1.1 Background and Motivation
As the size of device patterns keeps shrinking following Moore’s law, the
lithography process continues to be the backbone of integrated circuit (IC)
fabrication. The 193 nm ArF light has been used for many years, and the gap
between lithography wavelength and minimum feature size becomes bigger
and bigger, as illustrated in Fig. 1.1.
Figure 1.1: The size gap between feature size and lithography
wavelength. [1]
Since the 90 nm technology node, resolution enhancement techniques (RET)
such as optical proximity correction (OPC) and sub-resolution assist features
(SRAF) have been widely used in order to enable finer feature printing with
larger wavelength. However, 193 nm immersion lithography (193i) with sin-
gle exposure has finally reached its printability limit at the 28 nm technology
node. Below the 20 nm technology node, merely relying on RET is far from
enough to print IC designs with reasonable yield. Consequently, many
advanced lithography and process techniques have emerged to keep
up with Moore’s law, such as fin-based multiple-gate field-effect transistors
(FinFETs), electron beam lithography (EBL), self-aligned double patterning
(SADP) lithography, directed self-assembly (DSA), and extreme ultraviolet
lithography (EUVL). Each of the advanced lithography techniques has
its own advantages over others, but also faces great challenges due to dif-
ferent process limitations. FinFET transistors show great advantages over
traditional planar MOSFET transistors in high performance and low power
applications, but they also suffer from various challenges, such as high para-
sitic capacitance, high parasitic resistance and edge device degradation. EBL
is able to print extremely complicated and small features, but throughput
is a challenging problem. SADP has a great advantage in overlay tolerance
and offers much lower line-width roughness (LWR). However, uniform spacer
deposition introduces artifacts at feature corners and edges, and compared to
litho-etch-litho-etch (LELE) double patterning lithography (DPL), the layout
decomposition for SADP is much more complicated. DSA lithography is
capable of patterning very small features with low cost and high throughput,
but the pitch size and line edge roughness (LER) are still very difficult to
control, and the design rules may be more restricted. EUVL is a very promising
candidate for the 7 nm technology node and beyond, with a wavelength as
short as 13.5 nm; however, it still faces several critical challenges, such as
the need for a powerful light source and the control of blank defects. In order to adopt the
advanced lithography technologies in IC fabrication, their bottlenecks must
be overcome first. Due to the physical limitations, it is extremely difficult to
break the bottlenecks by merely improving the fabrication processes. Instead,
design-technology co-optimization (DTCO) via electronic design automation
(EDA) software is a more effective way.
1.2 Overview of this Dissertation
In this dissertation, we study the most promising advanced lithography
techniques and their major challenges in the sub-20 nm technology nodes. Then,
to resolve the process bottlenecks, we propose various DTCO strategies
and develop efficient EDA algorithms to verify their effectiveness.
In Chapter 2, we look into the edge device degradation problem for the
FinFET process in the 16 nm technology node. Dummy gates on device
edges help to release the fin stress, but introduce parasitic edge devices that
may potentially increase leakage power or even cause logic failures if not dealt
with carefully. Thanks to the local interconnect (LI) layer, a dummy gate
can be easily tied to power rails as long as it abuts a source node. Otherwise,
if two adjacent cells have a drain-to-drain (D2D) abutment, additional source
nodes must be inserted, which takes more placement area. To avoid this, we
propose a placement optimization algorithm that properly flips a subset of
cells in a standard cell row and switches pairs of adjacent cells, such that
the number of D2D abutments is minimal. The proposed strategy saves
unnecessary source nodes for dummy gate power tying and minimizes the
placement area for the 16 nm FinFET technology. As far as we know, this
is the first work on detailed placement optimization for the FinFET process.
Our algorithm is able to handle cell flipping and cell switching simultaneously,
and optimal solutions can always be obtained in O(n log n) time, where n
denotes the number of cells in a standard cell row. In addition, the proposed
graph model can be easily updated to minimize cell flippings/switchings and
consider more complicated design rules.
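To make the cell flipping idea concrete, the following is a minimal sketch and not the graph model developed in Chapter 2. It covers flipping only (no cell switching) and assumes a hypothetical encoding in which each cell is described by the node types at its left and right boundaries, 'S' for source and 'D' for drain. A dynamic program over the two orientations of each cell, equivalent to a shortest path in a layered graph with two vertices per cell, returns the minimum number of D2D abutments for a fixed cell order. The function name and data layout are assumptions made for illustration.

```python
from typing import List, Tuple

def min_d2d_abutments(cells: List[Tuple[str, str]]) -> Tuple[int, List[bool]]:
    """cells[i] = (left, right) boundary node types ('S' or 'D') of cell i
    in its unflipped orientation (hypothetical encoding).
    Returns (minimum number of D2D abutments, flip flag per cell)."""
    n = len(cells)
    if n == 0:
        return 0, []

    def boundary(i: int, flipped: int) -> Tuple[str, str]:
        left, right = cells[i]
        return (right, left) if flipped else (left, right)

    inf = float("inf")
    # cost[i][f]: fewest D2D abutments among cells 0..i with cell i in state f
    cost = [[inf, inf] for _ in range(n)]
    prev = [[0, 0] for _ in range(n)]
    cost[0] = [0, 0]
    for i in range(1, n):
        for f in (0, 1):
            left_i, _ = boundary(i, f)
            for g in (0, 1):
                _, right_prev = boundary(i - 1, g)
                penalty = 1 if (right_prev == "D" and left_i == "D") else 0
                c = cost[i - 1][g] + penalty
                if c < cost[i][f]:
                    cost[i][f] = c
                    prev[i][f] = g

    # Backtrack from the cheaper final state to recover the flip decisions.
    f = 0 if cost[n - 1][0] <= cost[n - 1][1] else 1
    best = cost[n - 1][f]
    flips = [False] * n
    for i in range(n - 1, -1, -1):
        flips[i] = bool(f)
        f = prev[i][f]
    return best, flips
```

For example, for the row `[("S", "D"), ("D", "S"), ("D", "D")]` the unflipped placement has one D2D abutment between the first two cells, and flipping the first cell removes it. This sketch runs in linear time for a fixed order; handling cell switching as well requires the richer model described in Chapter 2.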
In order to achieve reasonable process window and yield for the advanced
technology nodes, the design style has gone through a complete change from
the complex 2D style to the extremely regular geometry – 1D style [3,4]. To
fabricate 1D gridded designs, usually unidirectional parallel lines are printed
first. Then a cut/trim step will be further applied to cut off the metal track to
achieve the circuit pattern and functionality. Since the dense line printing can
be fully optimized due to its regularity, the randomness of logic circuits will
mainly affect the cut pattern distribution and introduce additional difficulties
in the 1D gridded design. In the 16 nm node, the cut density is too high
to be printed with 193i single patterning or even double patterning. EBL
is capable of printing the very random and complex cut patterns, but the
throughput is too low. Targeting the challenges from cut pattern printing in
the 16 nm technology node, Chapter 3 proposes a hybrid approach that uses
both 193i and EBL for single cut layer printing. By properly redistributing
the cuts, we let 193i print the majority of the cuts, and leave the remaining
to be printed by EBL. By this means, the throughput of the entire process
can be improved. Furthermore, for some sparse layers, the EBL process can
be eliminated entirely, which dramatically reduces the fabrication cost.
SADP lithography is a leading candidate for the 10 nm node lower-metal
layer fabrication, and Spacer-Is-Dielectric (SID) is one popular flavor of
SADP with higher flexibility in design. During the process of spacer de-
position, the spacer shape always gets rounded at convex mandrel corners,
which may lead to residue artifacts if not dealt with properly. In Chapter 4,
we study the problem of residue artifacts in conventional SID decomposition,
and propose an enhanced SID decomposition flow with model-based verifica-
tion. Decomposition results are verified with a practical lithography model on
an industry design. On the other hand, compared to litho-etch-litho-etch (LELE)
double patterning lithography (DPL), the layout decomposition for SID is much
more complicated, and full-chip SID decomposition has been proven to be
NP-complete [5], suggesting that SID-compliant design is a prerequisite for
successful decomposition. Chapter 4 also studies the SID-compliant detailed
routing problem, and proposes
a graph model that correctly captures the decomposition violations and SID
intrinsic residue issues. Then a negotiated congestion based routing scheme
is developed to resolve all conflicts. All conflict-free routing layers produced
by our detailed router have been verified as 100% SID decomposable.
In the 7 nm technology node, the contact/via pitch can be as small as 40
nm. How to print the highly dense contacts/vias is one of the major chal-
lenges for the 7 nm technology node IC fabrication. Recent research progress
on DSA has shown this technique’s strong potential for the contact/via layer
manufacturing [6–11]. To pattern contact holes with DSA process, guid-
ing templates are usually printed first with conventional lithography (e.g.,
193i) that has a coarser pitch resolution. Then the guiding templates will
determine the DSA patterns inside and these patterns have a finer resolu-
tion than the templates. The overlay accuracy of the contact holes as well
as the printability of templates may vary among different templates, and in
consequence, the cost of each guiding template shape is very different from
others. In Chapter 5, we study the DSA-aware contact/via layer optimization
problems at both the standard cell level and the full chip level. At the standard
cell level, given an arbitrary standard cell library, we simultaneously optimize
the layouts of every cell, such that the contact layer of any cell in the
library can be fully patterned by a set of guiding templates, and the total
cost of the templates is minimal. This optimization problem is first proved
to be NP-hard and formulated as a Weighted Partial Maximum Satisfiability
problem, which can be optimally solved with a public SAT solver. Then we
propose a bounded approximation algorithm that solves the problem much
more efficiently. At the full chip level, we propose a DSA-aware detailed
routing algorithm to optimize the via layers such that only feasible templates
are needed for via layer patterning. In addition, among all the feasible tem-
plates, the one with better overlay accuracy has higher priority to be picked
up by the router for via patterning, which further improves the yield. By en-
abling DSA process for via layer patterning in the 7 nm technology node, the
proposed detailed routing strategy tremendously reduces the manufacturing
cost and improves the throughput for IC fabrication.
EUVL is a leading candidate beyond the 7 nm technology node. However,
the technology is facing several challenges before the mass production. Be-
sides the difficulties in exotic light source setup and the tuning of the resist
for line edge roughness and sensitivity, the chip fabrication with defective
blanks remains a huge challenge. Although the defect density and size are
being reduced year by year, the current progress is far from sufficient [12,13].
Meeting the stringent defect-free blank requirement involves the coordination
of many aspects, such as high quality blank substrate material (low thermal
expansion material) fabrication, substrate polishing, substrate cleaning,
blank handling, multi-layer (ML) deposition, and high sensitivity substrate
and blank defect inspection, which may greatly increase the EUVL cost of
ownership [14]. Instead, it is much more cost-efficient to allow a certain number
of printable defects on the blank and mitigate their impact by covering
them with device patterns in later mask fabrication process. The device pat-
terns block the out-of-phase light from the defect such that the mask defect
will not impact the printing on wafer [15]. In Chapter 6, we first present
an efficient layout shifting algorithm that finds an optimal location to place
a single layout on a defective blank such that all defects are simultaneously
covered. However, in many cases, it is impossible to completely mitigate all
defect impact if multiple dies are tied and moved together; hence we further
explore the flexibility of individual die shifting. Even then, a 100% success
rate in complete defect mitigation can never be guaranteed since this
also depends on the designs and defect maps. Targeting imperfect defect
mitigation between one pair of design and blank, we finally develop an opti-
mal design-blank matching strategy to match multiple designs and defective
blanks simultaneously.
CHAPTER 2
PLACEMENT OPTIMIZATION FOR
FINFET PROCESS
2.1 Introduction
In the sub-20 nm technology nodes, fin based multiple-gate field-effect tran-
sistors (FinFET) show great advantages over traditional planar MOSFET
transistors in high performance and low power applications [16, 17]. Unlike
a planar MOSFET, the FinFET employs a vertical fin-like structure pro-
truding from the substrate with the gate wrapping around the sides and top
of the fin, thereby producing transistors with low leakage currents and fast
switching performance. Major foundries are adopting FinFET technology
for advanced node fabrication [18].
Despite the excellent control of short channel effects [19,20], FinFETs also
suffer from various challenges, such as high parasitic capacitance, high par-
asitic resistance and edge device degradation [21]. Edge device degradation
was already observed with the planar process [22], and is even more severe
with the 3D fin structure. As illustrated in Fig. 2.1, the fin stress increased by
dummy gate removal leads to defect formation [23], and such defects may in-
duce high resistance or capacitance, which degrades the device performance.
In contrast, as long as the dummy gates are in place, the fin stress becomes
fairly uniform [24], indicating the necessity of keeping the dummy gates.
However, dummy gates introduce parasitic edge devices, which may po-
tentially increase leakage power or even cause logic failures if not dealt with
carefully. Figure 2.2 shows an example of two FinFET transistors abutting
each other. The parasitic transistor introduced by the shared dummy gate
and its schematic view are illustrated in Fig. 2.2(a) and Fig. 2.2(b) respec-
tively. If the dummy gate inside the red circle is left unconnected, there will
be large leakage between the drain node of the left transistor and the source
node of the right transistor. In the worst case, the left drain is directly
connected to the right source, resulting in logic failures. One straightforward
solution to this is tying such dummy gates to power rails; i.e., the dummy
gates of a PFET should be tied up to power supply and the dummy gates of
an NFET should be tied down to ground.
(a) A FinFET transistor with dummy gate on device edge.
(b) Dummy gate removal induced fin degradation.
Legend: Diffusion, Gate, Dummy Gate, M0, Fin.
Figure 2.1: The edge device degradation induced by dummy gate removal.
(a) The layout view of the parasitic device.
(b) The schematic view of the parasitic device.
Figure 2.2: A parasitic transistor is introduced by two FinFETs abutting
each other.
In the 16 nm technology node circuit design, the local interconnect (LI)
layer (or metal 0 layer) is used to connect active nodes (i.e., source and drain),
and the direction of the LI patterns is perpendicular to the fins. Thanks to
the LI layer, a dummy gate can be easily tied to power rails as long as it
abuts a source node, as illustrated in Fig. 2.3(a). However, it is difficult to
route a dummy gate to a non-adjacent source node due to limited cell level
routing resources. As a result, during the standard cell placement, whenever
two drain nodes are placed abutting each other, the dummy gates of the two
individual cells cannot merge into one, and additional source nodes must be
inserted to tie the dummy gates to power rails, as illustrated in Fig. 2.3(b).
(a) The dummy gate is tied to a power rail by connecting it to an adjacent source node.
(b) Additional source nodes (shown in the red circle) must be inserted for drain-to-drain abutment.
Figure 2.3: A dummy gate must abut a source node in order to be tied to
power rails.
In the standard cell based design, all cells in a standard cell library are of
the same height and each cell is considered as a small block for higher level
placement and routing, where only the input/output pins are visible to the
placer and router. By performing global placement and legalization, all cells
are packed into standard cell rows, each with thousands of cells. Based on the
pin locations and an input netlist, the global placer tries to optimize certain
performance objectives, such as timing, net congestion, etc. [25]. Since the
detailed layout information (e.g., the active node types at cell boundaries)
is usually hidden from the global placer, it is very challenging to consider
the source/drain abutment constraint during the global placement. Usually,
detailed placement is performed after global placement and legalization
are completed, where there is some flexibility in flipping a cell horizontally or
switching the positions of two adjacent cells, which has little impact on
either timing or net congestion. However, by properly flipping a subset
of cells on each standard cell row and switching pairs of adjacent cells, the
number of drain-to-drain (D2D) abutments can be minimized, which saves
the area of additional source nodes inserted for the purpose of dummy gate
power tying. Note that a D2D abutment exists between two adjacent cells
if either the P-diffusions or the N-diffusions have a D2D abutment situation.
Figure 2.4 shows a demo of the placement optimization. In Fig. 2.4(a), ad-
ditional columns of source nodes are needed between both pairs of adjacent
cells for dummy gate power tying. However, by horizontally flipping cell C
and switching cell B and cell C, no D2D abutment exists any more, and
consequently, no additional source nodes are needed for the optimized place-
ment shown in Fig. 2.4(b). By this means, the total length of the standard
cell row can be minimized.
(a) The original placement has D2D abutments between both pairs of adjacent cells.
(b) No D2D abutment exists in the optimized placement.
Legend: Cell Boundary, P Diffusion, N Diffusion.
Figure 2.4: D2D abutments are removed via placement optimization. Note
that only the diffusion layers and the cell boundaries are displayed.
In this chapter, we propose a detailed placement optimization algorithm to
minimize the number of D2D abutments in a standard cell row, which saves
unnecessary source nodes for dummy gate power tying and minimizes the
placement area for the 16 nm FinFET technology. As far as we know, this
is the first work on detailed placement optimization for the FinFET process.
Our algorithm is able to handle cell flipping and cell switching simultaneously,
and optimal solutions can always be obtained in O(n log n) time, where
n denotes the number of cells in a standard cell row. The experimental
results show that every test case is completed within 0.1 second, verifying
the efficiency of our algorithm. In addition, the proposed graph model can
be easily updated to minimize cell flippings/switchings and consider more
complicated design rules.
The rest of the chapter is organized as follows. The D2D abutment min-
imization problem is defined in Section 2.2. Section 2.3 solves the overall
optimization problem by solving its subproblems and combining the sub-
problem solutions. Then the experimental results are reported in Section 2.4.
Section 2.5 extends the proposed graph model and adapts it to other
considerations. Finally, Section 2.6 concludes the chapter. This work is published
in [26].
2.2 Problem Definition
In this section, we define the detailed placement optimization problem, where
we only consider cell flipping and adjacent cell switching as feasible operations
for detailed placement.
Definition 2.1. D2D Abutment Minimization Problem
Given a row of standard cells and the boundary node types (i.e., source
or drain) of the diffusion regions (i.e., N-diffusion and P-diffusion) in
each cell, horizontally flip a subset of cells and select pairs of adjacent
cells to switch their positions in the row, such that the total number
of D2D abutments between adjacent cells is minimized.
2.3 Problem Solution
In this section, we divide the D2D abutment minimization problem into two
subproblems. In the first subproblem, only cell flipping is allowed, and in the
second one, only adjacent cell switching is allowed. The graph models target-
ing each subproblem are introduced in Subsection 2.3.1 and Subsection 2.3.2
respectively. Then Subsection 2.3.3 integrates the two graph models into a
complete one to solve the overall problem.
2.3.1 Cell Flipping Problem
Definition 2.2. Cell Flipping Problem (CFP)
Given a row of standard cells and the boundary node types (i.e., source
or drain) of the diffusion regions (i.e., N-diffusion and P-diffusion) in
each cell, horizontally flip a subset of cells, such that the total number
of D2D abutments between adjacent cells is minimized.
In CFP, each cell has two candidate orientations in the horizontal direction.
If we exhaustively enumerate all possible combinations, the time complexity
will be O(2^n), where n denotes the number of cells in the row. In a standard
cell design, there may be thousands of cells in each standard cell row, and
hence the exponential runtime is unacceptable
in practice. In fact, the orientation of each cell only impacts the abutment
conditions with adjacent cells. As a result, we only need to consider the
abutment combinations between each pair of adjacent cells. Based on the
above analysis, we construct a graph model to formulate the problem and
then solve the problem by performing the shortest path algorithm. An ex-
ample of five consecutive cells, the corresponding CFP graph model and the
optimization result are illustrated in Fig. 2.5.
The graph model is constructed as follows. For each cell c_i, two nodes
are introduced in the graph, namely o_i and f_i, corresponding to the original
orientation and flipped orientation of c_i respectively. For any pair of adjacent
cells c_i and c_{i+1}, four directed edges are introduced connecting from o_i and f_i
to o_{i+1} and f_{i+1}, each assigned a cost value. When the orientations of
two adjacent cells introduce a D2D abutment, the corresponding edge cost
is 1. Otherwise the edge cost is 0. For example, in Fig. 2.5(a), the original
c_1 and the flipped c_2 introduce a D2D abutment, so in Fig. 2.5(b) the cost of
the edge connecting from o_1 to f_2 is 1. Finally, an additional source node s
is introduced and connected to o_1 and f_1, and an additional target node t is
introduced and connected from o_n and f_n. After the graph model is constructed,
the shortest path from s to t automatically picks up the optimal orientations
for every cell in the row. In Fig. 2.5(b), the shortest path is marked in red, and
nodes f_1, o_2, o_3, o_4 and f_5 are picked up by the path. Correspondingly, the
optimal orientations for c_1, c_2, c_3, c_4 and c_5 are ‘flipped’, ‘original’, ‘original’,
‘original’ and ‘flipped’ respectively, as illustrated in Fig. 2.5(c). In addition,
the total cost of the shortest path shown in Fig. 2.5(b) is 1, and consequently,
there is only 1 D2D abutment in the optimal solution, as marked by the red
cross in Fig. 2.5(c).
(a) The original placement of five cells.
(b) The CFP graph model and a shortest path from s to t.
(c) The optimized placement via cell flipping.
Figure 2.5: Solving the CFP problem by constructing the CFP graph model
and applying the shortest path algorithm on it.
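The CFP construction above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the cell encoding (a pair of two-character strings for the left and right boundaries) and the names `flip`, `d2d` and `solve_cfp` are assumptions made for illustration. Because the CFP graph is layered, the sketch relaxes it one layer at a time, which yields the same s-to-t shortest path as a general shortest path algorithm on this particular graph.

```python
from typing import List, Tuple

# Assumed encoding: a cell is (left, right); each side is a two-character
# string "PN" giving the boundary node type ('S' or 'D') of the P-diffusion
# and the N-diffusion, e.g. ("SD", "DD").
Cell = Tuple[str, str]


def flip(cell: Cell) -> Cell:
    """Horizontal flip: the left and right boundaries trade places."""
    return (cell[1], cell[0])


def d2d(a: Cell, b: Cell) -> int:
    """1 if cell a placed directly left of cell b gives a D2D abutment, else 0."""
    return int(any(x == "D" and y == "D" for x, y in zip(a[1], b[0])))


def solve_cfp(cells: List[Cell]) -> Tuple[int, List[bool]]:
    """Return (minimum number of D2D abutments, flipped flag per cell)."""
    n = len(cells)
    if n == 0:
        return 0, []
    variants = [(c, flip(c)) for c in cells]  # index 0 is o_i, index 1 is f_i
    cost = [0, 0]                             # edges leaving s cost nothing
    back: List[List[int]] = []                # back[i-1][k]: best predecessor of node k in layer i
    for i in range(1, n):
        new_cost, choice = [], []
        for k in (0, 1):
            cands = [cost[j] + d2d(variants[i - 1][j], variants[i][k]) for j in (0, 1)]
            j = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[j])
            choice.append(j)
        cost = new_cost
        back.append(choice)
    k = 0 if cost[0] <= cost[1] else 1        # edges entering t cost nothing
    best = cost[k]
    flips = [False] * n
    for i in range(n - 1, -1, -1):            # walk the back-pointers from t to s
        flips[i] = bool(k)
        if i > 0:
            k = back[i - 1][k]
    return best, flips
```

For instance, `solve_cfp([("SS", "DD"), ("DD", "SS")])` starts from a row with one D2D abutment and returns a cost of 0 by flipping one of the two cells.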
2.3.2 Cell Switching Problem
Definition 2.3. Cell Switching Problem (CSP)
Given a row of standard cells and the boundary node types (i.e., source
or drain) of the diffusion regions (i.e., N-diffusion and P-diffusion) in
each cell, select pairs of adjacent cells to switch their positions in the
row, such that the total number of D2D abutments between adjacent
cells is minimized.
(a) The CSP graph model built on the same example shown in Fig. 2.5(a).
(b) The optimized placement via cell switching.
Figure 2.6: Solving the CSP problem by constructing the CSP graph model
and applying the shortest path algorithm on it.
Fig. 2.6(a) demonstrates the CSP graph model for the same example shown
in Fig. 2.5(a), where both nodes and edges are assigned cost values.
The graph model is constructed as follows. In the top row, a zero cost
node is introduced for each standard cell, denoted by c_1 to c_5 in Fig. 2.6(a).
Next, since each cell is allowed to switch position with its adjacent cells,
one additional node is introduced for each pair of switched cells, denoted by
c_{2,1}, c_{3,2}, c_{4,3} and c_{5,4} in Fig. 2.6(a). Node c_{i+1,i} denotes that the positions
of cells c_i and c_{i+1} are switched during the detailed placement. If such a
switching introduces a D2D abutment between c_i and c_{i+1}, node c_{i+1,i} will
be assigned cost 1. Otherwise it has 0 cost. For example, switching c_2
and c_3 introduces a D2D abutment between them, so the cost of node c_{3,2} is
1. The edges and their cost assignments are defined as follows.
• Each node c_i (1 ≤ i ≤ n − 1) is connected to its adjacent node c_{i+1} by a
directed edge. If a D2D abutment exists between c_i and c_{i+1}, the edge
cost is 1. Otherwise the edge cost is 0.
• Each node c_i (1 ≤ i ≤ n − 2) is connected to node c_{i+2,i+1} by a directed
edge. If a D2D abutment exists between c_i and c_{i+2}, the edge cost is 1.
Otherwise the edge cost is 0.
• Each node c_{i+1,i} (1 ≤ i ≤ n − 2) is connected to node c_{i+2} by a directed
edge. If a D2D abutment exists between c_i and c_{i+2}, the edge cost is 1.
Otherwise the edge cost is 0.
• Each node c_{i+1,i} (1 ≤ i ≤ n − 3) is connected to node c_{i+3,i+2} by a
directed edge. If a D2D abutment exists between c_i and c_{i+3}, the edge
cost is 1. Otherwise the edge cost is 0.
Finally, an additional source node s is introduced and connected to c_1 and
c_{2,1}, and an additional target node t is introduced and connected from c_n and
c_{n,n−1}. Both s and t and the edges connecting them have 0 cost. As
in CFP, along the shortest path from s to t, the subscripts of the selected
nodes provide the optimal sequence of the cells. For example, in Fig. 2.6(a),
the path in red is the shortest path between s and t. Correspondingly, the
optimal cell sequence is {c_2, c_1, c_4, c_3, c_5}. The optimized placement result is
illustrated in Fig. 2.6(b). Again, the number of D2D abutments in Fig. 2.6(b)
is 1, which equals the total cost of the shortest path.
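The CSP model can be sketched in the same spirit. The following Python fragment is an illustrative assumption rather than the thesis code: `solve_csp` and the cell encoding are hypothetical, cells are indexed from 0, and the shortest path is found by a left-to-right dynamic program over the two node types (a cell left in place, or a switched pair) instead of a heap-based shortest path routine.

```python
from typing import List, Tuple

Cell = Tuple[str, str]  # assumed encoding: (left, right), each side "PN" over {'S', 'D'}


def d2d(a: Cell, b: Cell) -> int:
    """1 if cell a placed directly left of cell b gives a D2D abutment, else 0."""
    return int(any(x == "D" and y == "D" for x, y in zip(a[1], b[0])))


def solve_csp(cells: List[Cell]) -> Tuple[int, List[int]]:
    """Return (minimum D2D count, cell indices from left to right)."""
    n = len(cells)
    inf = float("inf")
    # dp[i][t]: cheapest way to place the first i cells, where the last block is
    # t = 0: cell i-1 left in place, or
    # t = 1: cells i-2 and i-1 switched, so cell i-2 ends up rightmost.
    dp = [[inf, inf] for _ in range(n + 1)]
    prev = [[0, 0] for _ in range(n + 1)]
    dp[0][0] = 0  # plays the role of the source s

    def rightmost(i: int, t: int) -> int:
        return i - 1 if t == 0 else i - 2

    for i in range(1, n + 1):
        for t, size in ((0, 1), (1, 2)):
            if i < size:
                continue
            # Node cost: a switched pair places cell i-1 directly left of cell i-2.
            inner = d2d(cells[i - 1], cells[i - 2]) if t == 1 else 0
            j = i - size
            for u in (0, 1):
                if dp[j][u] == inf:
                    continue
                # Edge cost: in both block types the leftmost new cell is cell i-1.
                link = d2d(cells[rightmost(j, u)], cells[i - 1]) if j > 0 else 0
                total = dp[j][u] + link + inner
                if total < dp[i][t]:
                    dp[i][t], prev[i][t] = total, u
    t = 0 if dp[n][0] <= dp[n][1] else 1
    best = int(dp[n][t])
    order: List[int] = []
    i = n
    while i > 0:                              # walk the back-pointers from t to s
        u = prev[i][t]
        if t == 0:
            order.append(i - 1)
            i -= 1
        else:
            order.extend([i - 2, i - 1])      # reversed below to (i-1, i-2)
            i -= 2
        t = u
    order.reverse()
    return best, order
```

As in the graph model, each cell takes part in at most one switch, so the returned order differs from the input only by swaps of disjoint adjacent pairs.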
2.3.3 Overall Problem Solution
In this subsection, the overall graph model for the D2D abutment minimiza-
tion problem is constructed by integrating the CFP graph model and the CSP
graph model. Fig. 2.7 demonstrates the overall graph model construction for
the example shown in Fig. 2.5(a).
In the first two rows of Fig. 2.7(a), node o_i denotes the original orientation
of cell c_i, and node f_i denotes the flipped orientation of c_i. Each node in the
first two rows has 0 cost. Then in the following four rows, each node denotes
a pair of switched cells with certain orientations. Node o_{i+1}o_i denotes that
both c_{i+1} and c_i are in the original orientation; node o_{i+1}f_i denotes that
only c_i is flipped; node f_{i+1}o_i denotes that only c_{i+1} is flipped; node f_{i+1}f_i
denotes that both cells are flipped. The node cost assignments for the last
four rows are similar to those in CSP. Whenever a D2D abutment is introduced
between switched cells, the corresponding node cost is 1. Otherwise the node
cost is 0. The first two rows in Fig. 2.7 can be considered as split from the
first row in Fig. 2.6, and the last four rows in Fig. 2.7 as split from the second
row in Fig. 2.6. When a node is split, each edge connecting from/to it is
also split into multiple ones, which compose the edge set in Fig. 2.7. As
in CFP and CSP, whenever two nodes connected by an edge introduce a
D2D abutment, the corresponding edge cost is 1. Otherwise the edge cost
is 0. Note that some D2D abutments in the original placement may have
empty space between them, such that inserting dummy source nodes will not
introduce area penalty. In that situation, the corresponding edge cost is 0
instead of 1. Again, an additional source node s and an additional target
node t are introduced in the overall graph model, and the shortest path
between them provides the optimal sequence and orientations of all cells. In
this example, the shortest path and the corresponding optimal placement
solution are shown in Fig. 2.7(c). By flipping c_1 and c_5 and switching c_3 and
c_4, no D2D abutment exists in the optimal solution.
(a) The complete graph model.
(b) The interactive edges between the first two rows and the last four rows.
(c) The optimal solution to place the five cells.
Figure 2.7: An overall graph model constructed based on the placement of
five standard cells. Note that only a subset of edges in the last four rows
are shown in (a), and the interactive edges between the first two rows and
the last four rows are illustrated in (b). Cost values are not displayed.
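A compact way to see how the two models combine is the following Python sketch, which allows flipping and adjacent switching at once. It is an illustration under assumptions, not the thesis implementation: the cell encoding and the name `solve_overall` are hypothetical, the empty-space exception mentioned above is ignored, and the shortest path is computed by dynamic programming over blocks (a single oriented cell, or a switched pair with both orientations chosen), merging states that share the same rightmost oriented cell.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str]    # assumed encoding: (left, right), each side "PN" over {'S', 'D'}
Placed = Tuple[int, int]  # (cell index, 1 if flipped else 0)


def flip(cell: Cell) -> Cell:
    return (cell[1], cell[0])


def d2d(a: Cell, b: Cell) -> int:
    """1 if cell a placed directly left of cell b gives a D2D abutment, else 0."""
    return int(any(x == "D" and y == "D" for x, y in zip(a[1], b[0])))


def solve_overall(cells: List[Cell]) -> Tuple[int, List[Placed]]:
    """Return (minimum D2D count, left-to-right list of (cell index, flipped))."""
    n = len(cells)

    def oriented(p: Placed) -> Cell:
        idx, flipped = p
        return flip(cells[idx]) if flipped else cells[idx]

    # dp[i] maps the rightmost placed cell to (cost, chain); chain is a linked
    # list (previous chain, block) used to recover the placement afterwards.
    dp: List[Dict[Optional[Placed], tuple]] = [dict() for _ in range(n + 1)]
    dp[0][None] = (0, None)  # plays the role of the source s
    for i in range(1, n + 1):
        # Blocks ending at position i: cell i-1 alone (two orientations) ...
        blocks = [[(i - 1, f)] for f in (0, 1)]
        if i >= 2:
            # ... or cells i-2 and i-1 switched (four orientation pairs).
            blocks += [[(i - 1, fa), (i - 2, fb)] for fa in (0, 1) for fb in (0, 1)]
        for block in blocks:
            inner = d2d(oriented(block[0]), oriented(block[1])) if len(block) == 2 else 0
            for right, (cost, chain) in dp[i - len(block)].items():
                link = 0 if right is None else d2d(oriented(right), oriented(block[0]))
                total = cost + link + inner
                key = block[-1]
                if key not in dp[i] or total < dp[i][key][0]:
                    dp[i][key] = (total, (chain, block))
    best, chain = min(dp[n].values(), key=lambda v: v[0])
    blocks_rev = []
    while chain is not None:
        chain, block = chain
        blocks_rev.append(block)
    return best, [p for block in reversed(blocks_rev) for p in block]
```

Every state has a constant number of successors, so the sketch runs in time linear in the number of cells for a fixed cell library.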
2.3.4 Timing Analysis
Let n denote the number of cells in a standard cell row. Then the number of
nodes in the overall graph model is 6n−4, which is linear in n. Similarly, the
number of edges in the overall graph model is also linear in n. In the imple-
mentation, the graph model is constructed in linear time, and the shortest
path algorithm is implemented using Fibonacci heaps [27]. Therefore, the
overall time complexity of our algorithm is O(n log n).
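The bound can be checked by a short count. The node total restates the figure given above; the out-degree bound of 6 (each node leads to at most two single-cell nodes and four switched-pair nodes) is our reading of the construction in Section 2.3.3 rather than a number stated in the text, and the last line uses the standard running time of a heap-based shortest path algorithm with Fibonacci heaps.

```latex
\begin{align*}
|V| &= \underbrace{2n}_{o_i,\ f_i} + \underbrace{4(n-1)}_{\text{switched pairs}} + \underbrace{2}_{s,\ t}
     = (6n - 4) + 2 = 6n - 2, \\
|E| &\le 6\,|V| = O(n), \\
T(n) &= O\bigl(|E| + |V| \log |V|\bigr) = O(n \log n).
\end{align*}
```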
2.4 Experimental Results
We implement our algorithm in C++ on a Unix machine with a 1.7 GHz CPU
and 4 GB of RAM. Then we design a standard cell library with 42 cells for the
16 nm FinFET process. The benchmarks are generated by randomly placing
the standard cells in rows. We show the benefits of the proposed algorithm
by comparing lengths of the standard cell rows before and after placement
optimization. The experimental results are displayed in Table 2.1.
The first column of Table 2.1 shows the number of cells in a standard cell
row for each test case. The feasible placement operations are shown in the
second column. The following three columns illustrate the original cell row
length, the length after placement optimization and the saved length by the
optimization respectively. Finally, the last column shows the runtime of the
optimization algorithm.
Table 2.1: Experimental Results
# Cells  Operations   Org. Len. (mm)  Opt. Len. (mm)  Saved Len. (mm)  Runtime (ms)
10k      flip only    9.70            9.53            0.17             2
10k      flip&switch  9.70            9.24            0.46             7
20k      flip only    19.43           19.07           0.36             8
20k      flip&switch  19.43           18.50           0.93             20
40k      flip only    38.67           37.96           0.71             12
40k      flip&switch  38.67           36.81           1.86             28
60k      flip only    58.28           57.21           1.07             14
60k      flip&switch  58.28           55.51           2.77             44
80k      flip only    78.02           76.58           1.44             20
80k      flip&switch  78.02           74.27           3.75             59
100k     flip only    96.88           95.12           1.76             23
100k     flip&switch  96.88           92.24           4.64             75
As illustrated in the second column, for each test case, we compare two
types of optimizations with different feasible placement operations. In the
first one, only cell flipping is allowed, and in the second one, adjacent cell
switching is allowed as well. The comparison of the two sets of experimental
results shows that adjacent cell switching makes a great contribution to
area saving. At least twice as much area can be saved by allowing cell switching
as by allowing cell flipping only. In total, around 5% of the chip area can be
saved by the proposed detailed placement optimization strategy. The last
column shows that every test case is completed within 0.1 second, verifying
the efficiency of our algorithm.
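The two summary claims can be re-derived from the table with a few lines of Python. The snippet below is only an arithmetic check on the published numbers; the names `TABLE_2_1` and `summarize` are ours. By this arithmetic, flip&switch saves roughly 4.7% to 4.8% of the row length in every test case, about 2.6 to 2.7 times the flip-only saving.

```python
# (row size, original length, flip-only length, flip&switch length) in mm,
# copied from Table 2.1.
TABLE_2_1 = [
    ("10k", 9.70, 9.53, 9.24),
    ("20k", 19.43, 19.07, 18.50),
    ("40k", 38.67, 37.96, 36.81),
    ("60k", 58.28, 57.21, 55.51),
    ("80k", 78.02, 76.58, 74.27),
    ("100k", 96.88, 95.12, 92.24),
]


def summarize(rows):
    """Return (size, flip&switch saving in % of row length, ratio to flip-only saving)."""
    out = []
    for size, org, flip_only, flip_switch in rows:
        saved_flip = org - flip_only
        saved_both = org - flip_switch
        out.append((size, 100.0 * saved_both / org, saved_both / saved_flip))
    return out


if __name__ == "__main__":
    for size, pct, ratio in summarize(TABLE_2_1):
        print(f"{size:>5}: {pct:.2f}% of row length saved, {ratio:.2f}x the flip-only saving")
```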
2.5 Problem Expanding and Discussions
Sometimes the proposed detailed placement optimization strategy may introduce
other problems such as net congestion and timing variations due to
too many cell flipping and switching operations. Designers may be willing
to pay a certain area cost in order to resolve those net congestion and timing
issues. In other words, to reduce the impact on the global placement
result, the number of cell flipping and switching operations should be minimized
during the detailed placement optimization. On the other hand, for
the 16 nm FinFET process, more complicated design rules may need to be
taken into consideration in practice. In this section, we demonstrate that the
proposed graph model can be easily modified and adapted to an expanded
placement optimization problem.
2.5.1 Minimal Cell Flippings and Switchings
In the overall graph model shown in Fig. 2.7, there may be multiple shortest
paths between s and t with the same cost. However, one path may have fewer
cell flippings and switchings than another. As we have mentioned previously,
cell flipping and switching may impact circuit performance and introduce
net congestion. Thus, the shortest path with the minimal cell flippings and
switchings is preferred to others. On the other hand, sometimes too many
cells have to be flipped or switched in order to save very little area. Designers
may not want to make such a sacrifice and prefer to pay the small area cost
instead. In order to balance the number of cell flippings/switchings and the
area saving, we update our graph model by introducing more cost terms: c_d,
c_f and c_s, which denote the cost of a D2D abutment, a cell flipping and a cell
switching respectively. The three cost values capture the relative importance
among the cell operations and area saving. In the original graph model
shown in Fig. 2.7, whenever a node or edge introduces a D2D abutment, the
corresponding cost is 1. To update the graph model, we first replace each 1
value with c_d. Next, if a node has one flipped cell (e.g., node f_i, f_{i+1}o_i and
o_{i+1}f_i), the node cost is increased by c_f. If a node has two flipped cells (e.g.,
node f_{i+1}f_i), the node cost is increased by 2 × c_f. Finally, each node in the
last four rows has its cost increased by c_s since it denotes a cell switching. On
the updated graph model, the shortest path from s to t provides a balanced
solution with customized c_d, c_f and c_s values.
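Read as a formula, the updated model assigns every path P from s to t the total cost below. The counting symbols N_D2D, N_flip and N_switch (the numbers of penalized D2D abutments, flipped cells and switched pairs along P) are our own shorthand and do not appear in the original text.

```latex
\[
\mathrm{cost}(P) \;=\; c_d \, N_{\mathrm{D2D}}(P) \;+\; c_f \, N_{\mathrm{flip}}(P) \;+\; c_s \, N_{\mathrm{switch}}(P)
\]
```

Setting c_d = 1 and c_f = c_s = 0 recovers the original model. If the nonnegative weights are instead chosen so that n c_f + ⌊n/2⌋ c_s < c_d, the operation terms can never outweigh a single D2D abutment, since at most n cells are flipped and at most ⌊n/2⌋ pairs are switched; the shortest path then still minimizes the D2D count and uses the flip and switch counts only to break ties.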
2.5.2 Other Design Rule Considerations
In practice, the design rules of standard cell abutments for the 16 nm FinFET
process can be much more complicated than merely active node type (e.g.,
D2D) considerations. For example, usually there are certain minimum width
requirements for ‘U-shape’ and ‘stair-shape’ jogs on the diffusion layers, as
illustrated in Fig. 2.8(a) and Fig. 2.8(b) respectively. When designing a stan-
dard cell, such rules may not apply if its diffusion region does not have those
jogs. However, when abutting two standard cells during detailed placement,
the ‘U-shape’ and ‘stair-shape’ jogs are very likely to show up if adjacent
diffusion regions have different widths. Whenever the minimum width rules
are violated due to such cell abutment, a dummy diffusion region has to be
inserted in between, as illustrated in Fig. 2.8(c), where w_j is less than w_u. In
this situation, the dummy gates on device edges can be tied to power rails
through the dummy diffusion.
(a) Minimum width of a U-shape jog.
(b) Minimum width of a stair-shape jog.
(c) A dummy diffusion must be inserted if w_j < w_u.
Figure 2.8: The design rules for minimum jog widths.
Such dummy diffusions also result in area cost. To capture this in our graph
model, we take the length of the dummy diffusion as a cost term, namely l_d.
Depending on the shape of the diffusion regions and the minimum jog width
requirements, the value of l_d may vary among different cell abutments. At
the same time, the length of the inserted source nodes for a D2D abutment is
denoted by l_s. Then an updated CFP graph model is illustrated in Fig. 2.9.
(a) The placement of two adjacent cells.
(b) The updated CFP graph model.
Legend: Cell Boundary, P Diffusion, N Diffusion.
Figure 2.9: The updated CFP graph model built on two adjacent cells.
As illustrated in Fig. 2.9(a), even though the original orientations of
cell 1 and cell 2 have source to drain abutments in both the P-diffusion and
the N-diffusion, the ‘U-shape’ jog introduced by the cell abutment violates
the minimum jog rule. To resolve the violation, dummy diffusion has to
be inserted in between, and consequently, cost l_d is assigned to the edge
connecting from o_1 to o_2. Similarly, cost l_d is assigned to the edge connecting
o_1 and f_2 as well. On the other hand, abutting f_1 and o_2 introduces a
D2D abutment, and hence cost l_s is assigned to the edge connecting them.
On the updated graph model, the shortest path from s to t provides the
optimal placement solution considering both the D2D abutment penalty and
the penalty of minimum jog width rule violations. Similarly, other design
rules involving area penalty may also be formulated in the proposed graph
model as additional cost terms.
2.6 Conclusion
This chapter proposes a standard cell based detailed placement optimization
strategy for the 16 nm FinFET process. By flipping a subset of cells in a
standard cell row and switching pairs of adjacent cells, the number of D2D
abutments between adjacent cell boundaries is optimally minimized, which
saves additional source node insertion and minimizes the placement area.
The benefits and the efficiency of the proposed algorithm are verified by the
experimental results. In the end, we also discussed the flexibility of updating
the proposed graph model to minimize the cell flipping/switching operations
and expanding it to consider practically more complicated design rules for
the 16 nm FinFET process.
CHAPTER 3
HYBRID LITHOGRAPHY OPTIMIZATION
FOR CUT LAYER PRINTING
3.1 Introduction
As the integrated circuit (IC) industry continues to shrink the technology
node into sub-20 nm, the conventional 193 nm ArF immersion optical lithog-
raphy (193i) with single exposure has reached its printability limit. Beyond
the evolution of the lithography process, the design style has gone through
a complete change from the complex 2D style to the extremely regular ge-
ometry – 1D style [3, 4]. Different technologies have shown the capability
of fabricating unidirectional parallel lines such as dense line printing with
source-mask optimization (SMO), interference [28], double patterning lithography
(DPL), and directed self-assembly (DSA) [29]. After the dense lines are
completed, a cut/trim step will need to be further applied to cut off the
metal track to achieve the circuit pattern and functionality. Therefore, since
the dense line printing can be fully optimized due to its regularity, the ran-
domness of logic circuits will mainly affect the cut pattern distribution and
introduce additional difficulties in the 1D gridded design.
Hybrid lithography is a recently introduced concept in which more than one
type of lithography process is used to manufacture a single layer. The
candidate processes can come from a large pool (193i, EUV, EBL, DPL, DSA,
etc.). Targeting the key factors in lithography, such as throughput, cost,
yield and CD uniformity, the success of hybrid lithography mainly relies on
a proper assignment of patterns to the different processes, so as to fully
utilize the advantages and avoid the disadvantages of each process. For
instance, 193i with SMO is suitable for highly regular patterns within its
resolution limit, while EBL has lower throughput but is good for a small
number of random, complex patterns.
Facing the challenges of cut pattern manufacturing, through hybrid
lithography we can redistribute the cuts, let the 193i single exposure process
print most of the cuts, and leave the remaining ones to be manufactured by EBL.
Since the number of E-Beam shots equals the number of cuts left for EBL,
maximizing the cuts that can be printed by 193i, and thereby minimizing the
cuts for EBL to increase the throughput, becomes a challenging problem.
Fig. 3.1 demonstrates how the hybrid process for cut printing can be improved
by cut redistribution. As shown in Fig. 3.1(b), when the cut patterns are too
close to each other, only some of them can be printed by 193i; after
well-defined cut redistribution, as in Fig. 3.1(c), most of them can be
printed by 193i, and the number of cuts left for EBL is minimized.
[Figure 3.1, panels (a)-(c). Legend: Real Wires, Dummy Wires, 193i Cuts, EBeam Cuts.]
Figure 3.1: 1D designs are fabricated by a combination of dense lines
and cuts. (a) shows the original design; (b) shows the default cut positions,
where 4 cuts need EBL to print; (c) shows the cut positions after
redistribution, where only 1 cut needs EBL to print.
In this chapter, targeting 193i and EBL hybrid lithography for cut pattern-
ing, we optimally redistribute the cuts for 1D gridded designs to maximize the
cuts for 193i. To the best of our knowledge, this work is the first study on the
hybrid lithography process optimization for cut patterning from the design
perspective. By studying the limitation of the 193i process, we first define the
forbidden cut patterns that cannot be printed by 193i single exposure. Then,
we develop a cut redistribution algorithm to redistribute the cuts, such that
the number of cuts incompatible with 193i is minimal, and such cuts have to
be printed by EBL. By minimizing the number of cuts printed by EBL, we
can thus maximize the throughput of the overall process. Our experimental
results show that the throughput can be greatly improved after cut
redistribution. For some sparse layers, the EBL process can be eliminated
entirely, which dramatically reduces the fabrication cost. Our cut
redistribution algorithm handles large layouts efficiently and guarantees an
optimal solution for sparse layers.
The rest of the chapter is organized as follows. Section 3.2 provides an
overview of the proposed fabrication process. In Section 3.3, lithography
simulation is performed to find the safe isolation distances for any two cuts.
Based on these distances, Section 3.4 formulates the problem as an integer
linear programming (ILP) problem and introduces an efficient iterative
algorithm to solve large-scale problems. Then Section 3.5 shows the
experimental results. Finally, Section 3.6
concludes this chapter. This work is published in [30].
3.2 Process Overview
The proposed processing flow is shown in Fig. 3.2. The first
step is cut redistribution. As shown in Fig. 3.1, any given 1D gridded design
can be realized by dense lines, cuts printed by 193i (defined as 193i
cuts), and a number of cuts printed by EBL (defined as E-Beam cuts). In
this step, the cuts are redistributed by our algorithm such that no forbidden
patterns exist among the 193i cuts and the number of E-Beam cuts is minimal.
The second step, dense line fabrication, can be accomplished by a variety
of lithography techniques, such as LELE, SADP or DSA. Since the dense
line pattern is extremely regular, it can be created with high quality
without much difficulty. In the third step, 193i cut printing, the 193i cuts
are printed to cut the dense lines. In this step, the maximum number of
cuts is printed at once with a single 193i exposure, which greatly improves
the throughput. In the last step, E-Beam cut printing, the small portion of
remaining cuts is printed by E-Beam shots onto the dense lines. Multiple
E-Beam techniques [31–34] can be applied here to improve the throughput
even further. For some sparse layers, if all forbidden patterns are successfully
removed by cut redistribution, the EBL step can be skipped.
[Figure 3.2, flow diagram. Boxes: Original 1D Gridded Design, Cut Redistribution, Dense Line Creation, 193i Cut Printing, E-Beam Cut Printing. Legend: Real Wires, Dummy Wires, Dense Lines, 193i Cuts, Ebeam Cuts.]
Figure 3.2: The flow of the proposed manufacturing process. The original
design is shown in blue, dummy wires in yellow. The black lines denote the
dense lines; the red rectangles denote 193i cuts; the green rectangles denote
E-Beam cuts.
In summary, instead of using EBL only to print the cuts as in conventional
Complementary E-Beam Lithography (CEBL) [31, 32], we introduce one additional
193i lithography process before the EBL process. With optimal cut
redistribution, the majority of the cuts are printed by 193i instead of EBL.
Even though the proposed processing flow has a higher cost because of
the additional litho step, considering the great improvement in throughput,
we believe the additional 193i process is beneficial overall.
3.3 Lithography Simulation
The lithography simulation is performed on the cuts for 16 nm 1D gridded
designs. The width of the cuts is 32 nm and the height of the cuts is set to be
the pitch size, which is 48 nm for the 16 nm technology node [35]. Accord-
ing to [36], an annular optical source is selected to perform the lithography
simulation, with parameters tuned up for best printing.
In order to find the forbidden patterns, we simulate pairs of cuts with
different vertical and horizontal distances. In our simulation, the layout is
gridded in both vertical and horizontal directions. The vertical grid size is
equal to the cut height and the horizontal grid size is equal to the cut width.
The distance between two cuts is defined as the grid distance between their
bottom left corners. The test matrix of cut pairs is shown in Fig. 3.3.
[Figure 3.3, panels: (a) Defocus = 0 nm; (b) Defocus = 50 nm.]
Figure 3.3: Lithography simulation to find forbidden patterns. The green
rectangles show target cuts and the red contours show printed shapes. (a)
shows the simulation result at the best focus, and (b) shows the simulation
result with 50 nm defocus. The forbidden patterns at each focus setting are
grouped inside the shaded regions.
Note that in the actual simulation layout, any two pairs of cuts are placed
far enough apart that each forbidden pattern involves only one pair of cuts.
In Fig. 3.3, we group them together after the lithography simulation
is performed for better illustration. The pairs of cuts in each column have
the same vertical distance, while those in the same row have the same
horizontal distance. In Fig. 3.3(a), the layout is printed at the best focus and
in Fig. 3.3(b) it is printed with 50 nm defocus. According to the regions of
forbidden patterns in each matrix, two cuts with different vertical distances
require different minimum horizontal distances to avoid forbidden patterns.
We define the minimum required horizontal distance as the safe distance.
Unless two cuts connect to each other either vertically or horizontally to
create a bigger cut, they must be at least a safe distance apart in order to
print correctly. Based on the simulation results, the safe distances for any
pair of cuts are shown in Table 3.1. If the vertical distance between two cuts
is greater than or equal to 3, they will not affect each other when printed.
Note that two cuts whose vertical distance is 2 are allowed to be lined
up vertically only if there exists another cut between them with the same
horizontal coordinate, such that the three cuts are connected vertically to
make a bigger one. Otherwise they have to be at least a safe distance apart.
Table 3.1: Safe Distances for Two Cuts with Different Vertical Distances
Vertical Distance (vertical grids)    Safe Distance (horizontal grids)
0                                     4
1                                     3
2                                     2
≥ 3                                   0
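Table 3.1, together with the connection exceptions described above, can be read as a simple pairwise check. The following is a minimal sketch, not the implementation used in this work. The names `SAFE_DISTANCE`, `is_forbidden_pair` and the `bridged` flag are hypothetical, distances are assumed to be in grid units as defined in this section, and treating two cuts on the same track at horizontal distance 1 as abutting (and therefore connected) is an assumption of the sketch.

```python
# Safe distances from Table 3.1, keyed by vertical grid distance.
SAFE_DISTANCE = {0: 4, 1: 3, 2: 2}


def is_forbidden_pair(dv, dh, bridged=False):
    """Return True if two cuts at grid offset (dv, dh) form a forbidden pattern.

    dv, dh  : vertical / horizontal grid distance between the two cuts.
    bridged : for dv == 2 and dh == 0 only, True if a third cut with the same
              horizontal coordinate sits on the track in between (assumption:
              the caller determines this from the layout).
    """
    dv, dh = abs(dv), abs(dh)
    if dv >= 3:
        return False                      # cuts do not interact
    if dh >= SAFE_DISTANCE[dv]:
        return False                      # at least a safe distance apart
    if dv == 0:
        return dh >= 2                    # dh of 0 or 1: merged or abutting
    if dv == 1:
        return dh != 0                    # dh == 0: vertically connected
    return not (dh == 0 and bridged)      # dv == 2 needs a bridging cut
```

For example, `is_forbidden_pair(1, 2)` is true, while `is_forbidden_pair(1, 3)` and `is_forbidden_pair(1, 0)` are both false.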
3.4 Problem Formulation
Based on the lithography simulation, for any two cuts whose vertical distance
is less than 3, there exists a safe distance to avoid forbidden patterns.
Without any further restrictions in the 1D gridded design rules, forbidden
patterns might appear anywhere on the layout. If the cuts’ positions are not
allowed to change, many of them will need EBL to print. However, for most 1D
designs, proper wire end extension is allowed since it does not change the
circuit’s logic connections and has an almost negligible impact on
performance [37]. By proper wire end extension, the cuts can be redistributed
to remove most forbidden patterns.
In our problem formulation, wire end extension is allowed as long as the
logic connections are not changed. If no forbidden patterns exist after wire
end extension, all the cuts can be fabricated by 193i only and the EBL
process is completely saved. Otherwise, a minimum number of cuts should
be removed from the 193i cut mask in order to avoid forbidden patterns.
And those removed cuts will require EBL to print after the 193i process is
performed. If multiple solutions with the same number of E-Beam cuts exist,
the one with less wire extension is preferred because wire extension might
still affect the circuit performance slightly.
Definition 3.1. E-Beam Cut Minimization Problem
Given a 1D layout with n gaps, where each gap is defined by the two cuts
at its two ends, cl and cr, and each cut can move within the gap
it defines, remove the minimum number of cuts
from the layout (to be printed by EBL) and find the target locations for the
remaining ones (to be printed by 193i) with the least total wire extension,
such that no forbidden pattern exists.
In this chapter, we first formulate the problem as an integer linear pro-
gramming (ILP) problem. Then we develop an iterative algorithm which is
able to solve large layouts efficiently without losing much optimality.
3.4.1 ILP Formulation
From the problem definition we can see that the primary objective is to
minimize the number of cuts that must be printed by EBL. The secondary
objective is to minimize the total amount of wire extension, which is equivalent
to the total moving distance of the cuts from the original wire end positions.
The basic constraint is that all cuts should be located within their moving
regions, which are the gaps they define. For two cuts whose vertical distance
is less than 3, neighborhood constraints should be enforced in order to avoid
forbidden patterns. However, it is not necessary to consider every pair of
cuts within 3 adjacent tracks. If two gaps are already far apart from each
other, the cuts on them will never form forbidden patterns no matter how they
are moved. Hence when applying neighborhood constraints, we only need to
consider the overlapping gaps that might produce forbidden patterns. The
overlapping gaps are defined as follows.
Definition 3.2. Overlapping gaps:
For two gaps p ([pl, pr]) and q ([ql, qr]) within 3 adjacent vertical grids,
suppose the safe distance for the cuts on p and q is ds, then p and q
are overlapping gaps if and only if interval [pl, pr] vertically overlaps
with interval [ql − (ds − 1), qr + (ds − 1)].
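As an illustration of Definition 3.2, the overlap test reduces to an interval intersection after widening one gap by (ds − 1) on each side. This is a minimal sketch only: the function name `gaps_overlap` and the tuple representation of a gap are hypothetical, and the intervals are assumed to be closed.

```python
def gaps_overlap(p, q, ds):
    """Return True if gaps p = (pl, pr) and q = (ql, qr) are overlapping gaps.

    Follows Definition 3.2: p must intersect q widened by (ds - 1) on both
    sides, where ds is the safe distance for cuts on the two gaps.
    Intervals are treated as closed (an assumption of this sketch).
    """
    pl, pr = p
    ql, qr = q
    return pl <= qr + (ds - 1) and ql - (ds - 1) <= pr
```

With `ds = 4`, the gaps `(0, 5)` and `(8, 10)` are overlapping gaps, since the second gap widens to `[5, 13]`; with `ds = 3` they are not.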
According to Definition 3.2, any two cuts belonging to two overlapping
gaps should be assigned neighborhood constraints. Two cuts on the
same track that belong to the same gap are allowed to be adjacent or
merged into one single cut. Otherwise, they must be at least a safe distance
apart. Two cuts with vertical distance equal to 1 should be either
aligned to make a bigger cut vertically, or at least a safe distance apart
from each other. Two cuts with vertical distance equal to 2 are
allowed to be aligned only if there exists another cut on the track between
them with the same horizontal coordinate, which connects the two cuts to
make a bigger cut. Otherwise they have to be at least a safe distance apart
in the horizontal direction.
The objective and the basic constraints can be easily converted to ILP
format with binary slack variables. The neighborhood constraints, however,
involve logic conditions that cannot be written directly as linear
constraints. Since the safe distances are all small integers, the logic
constraints can first be converted to several dis-equality constraints, and
then each dis-equality constraint can be converted to ILP constraints with the
help of binary variables [38]. The problem with dis-equality constraints is
formulated as follows.
Objective:
\[
\text{Minimize} \quad M \sum_{i=1}^{4n} S_i + \sum_{i=1}^{n} X_{2i} - \sum_{i=1}^{n} X_{2i-1}
\]
where $n$ is the total number of gaps, $M$ is a big number, $S_i$ are binary
slack variables, and $X_{2i}$ and $X_{2i-1}$ denote the x coordinates for the
left and right cuts of gap $i$ respectively.
Basic constraints:
\[
\begin{aligned}
X_{2i} - S_{4i} M &< r_i, & (1 \le i \le n) \\
X_{2i} + S_{4i-1} M &\ge l_i, & (1 \le i \le n) \\
X_{2i-1} - S_{4i-2} M &< r_i, & (1 \le i \le n) \\
X_{2i-1} + S_{4i-3} M &\ge l_i, & (1 \le i \le n)
\end{aligned}
\]
where $l_i$ and $r_i$ denote the x coordinates for the left and right ends of
gap $i$ respectively.
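To make the roles of the slack variables and the big coefficient M concrete, the sketch below evaluates the objective for a candidate placement by choosing, for each cut, the smallest binary slacks that satisfy the basic constraints. It is an illustration rather than the solver formulation itself: the names `BIG_M`, `required_slacks` and `objective`, the value chosen for M, and the toy gap coordinates are all assumptions.

```python
BIG_M = 10**6  # assumed value; must exceed the layout width


def required_slacks(x, left, right):
    """Smallest binary slacks (lower, upper) satisfying the basic constraints.

    x + lower * M >= left   and   x - upper * M < right
    """
    lower = 0 if x >= left else 1
    upper = 0 if x < right else 1
    return lower, upper


def objective(gaps, left_cuts, right_cuts, big_m=BIG_M):
    """Objective value: M * (sum of slacks) + sum(left x) - sum(right x).

    gaps       : list of (l_i, r_i) gap end coordinates.
    left_cuts  : x coordinate of the left cut of each gap.
    right_cuts : x coordinate of the right cut of each gap.
    """
    total_slack = 0
    for (l, r), x_left, x_right in zip(gaps, left_cuts, right_cuts):
        total_slack += sum(required_slacks(x_left, l, r))
        total_slack += sum(required_slacks(x_right, l, r))
    return big_m * total_slack + sum(left_cuts) - sum(right_cuts)


gaps = [(2, 6), (10, 15)]
at_wire_ends = objective(gaps, [2, 10], [5, 14])   # no wire extension
extended = objective(gaps, [3, 10], [5, 14])       # one left cut moved right
outside = objective(gaps, [1, 10], [5, 14])        # one cut leaves its gap
```

In this toy instance, moving a left cut into its gap raises the objective by the distance moved, while a cut that leaves its gap costs a full M, so the number of out-of-gap cuts always dominates the wire extension term.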
Neighborhood constraints on overlapping gaps can be written in the
format of dis-equality constraints:
N1. For two cuts $C_i$ and $C_j$ on the same track, where $1 \le i, j \le 2n$:
\[ X_i \ne X_j \pm 2, \qquad X_i \ne X_j \pm 3 \]
N2. For two cuts $C_i$ and $C_j$ whose vertical distance is 1, where $1 \le i, j \le 2n$:
\[ X_i \ne X_j \pm 1, \qquad X_i \ne X_j \pm 2 \]
N3. For two cuts $C_i$ and $C_j$ whose vertical distance is 2 and another cut $C_k$
that is vertically between $C_i$ and $C_j$, where $1 \le i, j, k \le 2n$:
\[ X_i \ne X_j \pm 1, \qquad X_i \ne X_k \rightarrow X_i \ne X_j \]
According to the objective and basic constraints, we introduce two binary
slack variables for each cut c. If c moves out of its gap region, one of the
slack variables is set to 1 and the cut requires EBL to print; otherwise both
of them are 0. Minimizing the number of E-Beam cuts is therefore equivalent to
minimizing the summation of all slack variables. The secondary objective
is to minimize the total moving distance. In the objective function, it is
expressed as minimizing the summation of the x coordinates for all left cuts
while maximizing the summation of the x coordinates for all right cuts. A big
coefficient M is assigned to the primary objective to ensure that the number
of E-Beam cuts is minimized first. The neighborhood constraints in
dis-equality format can be converted to ILP format in the following way.
The dis-equality constraint $X \ne Y$ can be converted to the following three
ILP constraints:
\[
\begin{aligned}
X + B_1 M - Y &\le M - 1 \\
Y + B_2 M - X &\le M - 1 \\
B_1 + B_2 &= 1
\end{aligned}
\]
where $B_1$ and $B_2$ are additional binary variables and $M$ is a big number.
The conditional dis-equality constraint $X \ne Z \rightarrow X \ne Y$ can be
converted to the following five ILP constraints:
\[
\begin{aligned}
X + B_1 M - Y &\le M - 1 \\
Y + B_2 M - X &\le M - 1 \\
X - Z + B_3 M &\le M \\
Z - X + B_4 M &\le M \\
B_1 + B_2 + B_3 + B_4 &= 2
\end{aligned}
\]
where $B_1$, $B_2$, $B_3$ and $B_4$ are additional binary variables and $M$ is
a big number.
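The two conversions can be checked by brute force on small integers: a set of big-M constraints encodes a logic condition exactly when some assignment of the binary variables satisfies it if and only if the condition holds. The sketch below performs this check. The function names, the value of M and the tested range are assumptions made for illustration; the constraints themselves are the ones listed above.

```python
from itertools import product

M = 1000  # assumed big number, much larger than any coordinate difference tested


def diseq_feasible(x, y, big_m=M):
    """True if some binary (B1, B2) satisfies the three constraints for X != Y."""
    for b1, b2 in product((0, 1), repeat=2):
        if (b1 + b2 == 1
                and x + b1 * big_m - y <= big_m - 1
                and y + b2 * big_m - x <= big_m - 1):
            return True
    return False


def cond_diseq_feasible(x, y, z, big_m=M):
    """True if some binary (B1..B4) satisfies the five constraints
    for the conditional constraint X != Z -> X != Y."""
    for b1, b2, b3, b4 in product((0, 1), repeat=4):
        if (b1 + b2 + b3 + b4 == 2
                and x + b1 * big_m - y <= big_m - 1
                and y + b2 * big_m - x <= big_m - 1
                and x - z + b3 * big_m <= big_m
                and z - x + b4 * big_m <= big_m):
            return True
    return False


values = range(-3, 4)
diseq_ok = all(diseq_feasible(x, y) == (x != y)
               for x, y in product(values, repeat=2))
# X != Z -> X != Y is logically the same as (X == Z) or (X != Y).
cond_ok = all(cond_diseq_feasible(x, y, z) == ((x == z) or (x != y))
              for x, y, z in product(values, repeat=3))
```

Both `diseq_ok` and `cond_ok` evaluate to true on this range, which is consistent with the encodings being exact whenever M exceeds the largest coordinate difference.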
At this point, all the neighborhood constraints have been converted to ILP
constraints and thus the problem has been formulated as an ILP problem. After
the problem formulation, we call a commercial solver [39] to solve this
problem. The solver is able to return an optimal solution with tolerance less
than $10^{-4}$. However, the runtime increases exponentially with the layout
scale and hence the solver is not able to handle very large layouts. In the
next subsection, we propose an efficient iterative algorithm which is able to
solve layouts of any scale efficiently without losing much optimality.
3.4.2 Iterative Algorithm
The main idea of the iterative algorithm is to decompose a large layout into
smaller pieces and find the optimal solution for each piece in one iteration.
Then traverse through the entire layout iteratively until the solutions con-
verge. In our algorithm, a few tracks are selected as target tracks in each
iteration and the remaining tracks are kept unchanged, then the ILP solver
is called to optimally find the cuts’ locations on the target tracks. After that,
another few tracks are selected as target tracks and solved optimally by the
solver. We keep doing this iteratively until the entire layout is processed.
Suppose the layout has n tracks in total, m tracks are selected at each
iteration as the tracks where cuts are allowed to move, and the maximum
allowed number of iteration rounds is r. Then in each round, the first target
track of successive iterations moves from track 0 to track n−m. When dealing
with the target tracks in each iteration, the two tracks above and the two
tracks below the target tracks should also be taken into consideration because
they might also produce forbidden patterns together with the target tracks.
Therefore in each iteration, the
ILP constraints are built on at most (m+ 4) tracks, and only the forbidden
patterns involving the m target tracks are taken into consideration. Based
on this analysis, the maximum runtime can be computed by the following
equation:
\[ t_{total} = t_{iter} \cdot (n - m + 1) \cdot r \]
where $t_{iter}$ is the time required for the solver to find the optimal
solution for one iteration, which can be considered as a constant since only
a few tracks are selected at each iteration. From this equation we can see
that the total runtime has a linear relationship with the number of tracks.
The detailed algorithm is shown in Algorithm 1.
Algorithm 1: Iterative ILP Algorithm
Initialization: lastEBeamShots ← ∞
for curRound ← 0 to maxRound do
    for curTrack ← 0 to n − m do
        startTrack ← max(0, curTrack − 2)
        endTrack ← min(n − 1, curTrack + m + 1)
        cutSet ← all cuts between startTrack and endTrack
        for each cut ci ∈ cutSet do
            if ci is on target tracks then
                Put ci into objective function
                Build basic constraints for ci
            end if
        end for
        for each two cuts ci, cj ∈ cutSet do
            if ci and cj can form a forbidden pattern then
                if ci is on target tracks OR cj is on target tracks then
                    Build neighborhood constraints for ci and cj
                end if
            end if
        end for
        Call ILP solver
    end for
    curEBeamShots ← Count E-Beam shots in current round
    if curEBeamShots = lastEBeamShots then
        break
    else
        lastEBeamShots ← curEBeamShots
    end if
end for
Done
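The control flow of Algorithm 1 can be sketched in Python as follows. This is a skeleton under stated assumptions, not the implementation used in this work: `solve_window` and `count_ebeam_shots` are hypothetical callbacks standing in for the ILP construction/solver call and the shot counter, and the outer loop is assumed to run exactly `max_rounds` rounds so that the number of solver calls matches the runtime bound (n − m + 1) · r.

```python
def track_windows(n, m):
    """Yield (target_tracks, context_tracks) for one round of Algorithm 1.

    target_tracks  : the m tracks whose cuts may move in this iteration.
    context_tracks : target tracks plus up to two fixed tracks on each side.
    """
    for cur_track in range(0, n - m + 1):
        start_track = max(0, cur_track - 2)
        end_track = min(n - 1, cur_track + m + 1)
        yield range(cur_track, cur_track + m), range(start_track, end_track + 1)


def run_rounds(n, m, max_rounds, solve_window, count_ebeam_shots):
    """Sweep the layout until the E-Beam shot count stops changing.

    solve_window(target, context) : hypothetical callback that builds and
        solves the ILP for one window, updating cut positions in place.
    count_ebeam_shots()           : hypothetical callback returning the
        current number of E-Beam shots.
    Returns the number of solver calls made.
    """
    last_shots = float("inf")
    calls = 0
    for _ in range(max_rounds):
        for target, context in track_windows(n, m):
            solve_window(target, context)
            calls += 1
        cur_shots = count_ebeam_shots()
        if cur_shots == last_shots:
            break                      # converged: no change since last round
        last_shots = cur_shots
    return calls
```

For a layout with n = 6 tracks and m = 2, each round visits 5 windows, the first being target tracks 0 to 1 with context tracks 0 to 3.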
3.5 Experimental Results
In 1D standard cell design, the cells in the cell library have the same
height and are placed along cell tracks in the layout. Since the cell tracks are
isolated from each other by the power rails in between, in the experiment
we only focus on finding solutions for each cell track. In our 1D cell library,
the height of each cell is 14 grids, meaning that there are 14 locations in the
vertical direction to place cuts. The cells in the library are placed adjacent
to each other in the horizontal direction to make cell tracks. In the cell layout,
wire tracks on the Poly and Metal 1 layers are perpendicular to the cell track
direction, and Metal 2 wires are parallel to the cell tracks. Since on each
cell track, the maximum wire track number for Metal 2 is limited to 7 and
the gaps on Metal 2 are distributed very sparsely, the Metal 2 layer can
be solved efficiently by the optimal ILP algorithm. However, for the Poly and
Metal 1 layers, there can be thousands of wire tracks on one cell track, and
in each wire track, there are 14 locations to place cuts. Thus, the iterative
algorithm is needed to find a solution for the Poly and Metal 1 layers of long
cell tracks. In this section, we first compare the two algorithms by simulating
the Metal 1 layer of a small layout; we then apply the optimal ILP algorithm
and the iterative algorithm to find solutions for the Metal 2 and Metal 1
layers respectively on a large layout.
The comparison results for the two algorithms are shown in Table 3.2.
Table 3.2: Algorithm Comparison
                         ILP                      Itr.
Track #   Org. Shot #    Shot #   Runtime (s)     Shot #   Runtime (s)
50        54             3        28              5        60
100       98             10       538             11       71
150       271            10       2233            14       268
200       211            16       1939            19       228
250       299            18       19460           18       281
300       305            17       32260           18       257
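The trade-off recorded in Table 3.2 can be summarized with a few lines of arithmetic. The sketch below only post-processes the numbers in the table; the names `TABLE_3_2` and `compare` are hypothetical.

```python
# Rows of Table 3.2:
# (tracks, original shots, ILP shots, ILP runtime s, Itr. shots, Itr. runtime s)
TABLE_3_2 = [
    (50, 54, 3, 28, 5, 60),
    (100, 98, 10, 538, 11, 71),
    (150, 271, 10, 2233, 14, 268),
    (200, 211, 16, 1939, 19, 228),
    (250, 299, 18, 19460, 18, 281),
    (300, 305, 17, 32260, 18, 257),
]


def compare(rows):
    """Return (tracks, extra shots of Itr. over ILP, ILP/Itr. runtime ratio)."""
    summary = []
    for tracks, _org, ilp_shots, ilp_time, itr_shots, itr_time in rows:
        summary.append((tracks, itr_shots - ilp_shots, ilp_time / itr_time))
    return summary


summary = compare(TABLE_3_2)
extra_shots = [extra for _tracks, extra, _ratio in summary]
```

On these rows the iterative algorithm uses between 0 and 4 more shots than the ILP solution, and its runtime advantage grows from below 1 at 50 tracks to more than 100 at 300 tracks.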
In the first row, where there are only 50 tracks on the layout, the ILP
algorithm takes less time than the iterative algorithm. This is because the
layout is so small that the ILP algorithm terminates within a very
short time even though its runtime is exponential in the number of tracks,
while the iterative algorithm might still take several rounds to converge.
However, for all other cases, the runtime of the iterative algorithm is much
less than that of the optimal ILP algorithm. At the same time, the iterative
algorithm does not lose much in optimality (at most 4 additional shots in
Table 3.2). After this comparison, the two algorithms
are applied to a large layout with a size of 100 µm by 100 µm. For the
iterative algorithm, two parameters (the maximum allowed iteration rounds r
and the number of tracks in each iteration m) can be tuned to balance
optimality and runtime. Table 3.3 shows the experimental results.
Table 3.3: Experimental Results on Large Layout
Layer   Method   Max Rounds   Tracks per Itr.   Org. Shot #   Final Shot #   Runtime (s)
M1      Itr.     1            3                 236967        69547          17983
M1      Itr.     3            3                 236967        43068          45690
M1      Itr.     1            7                 236967        24717          23096
M1      Itr.     3            7                 236967        15839          51034
M1      Itr.     1            10                236967        17841          31253
M1      Itr.     3            10                236967        14985          69790
M2      ILP      NA           NA                14157         0              8437
For dense layers such as Metal 1, the iterative algorithm is required to
solve large layouts. With more wire tracks at each iteration and more
iteration rounds, the optimality is improved, but the runtime increases at the
same time. In order to achieve a proper tradeoff between optimality and
efficiency, the parameters should be set carefully. With proper tuning, the
iterative algorithm is able to achieve a roughly 15X reduction in the number
of E-Beam shots (from 236967 to about 15000) with reasonable runtime, which
dramatically improves the throughput.
For sparser layers such as Metal 2, the ILP algorithm is able to find the
optimal solution efficiently with zero E-Beam shots. Thus, the EBL process
is completely eliminated for Metal 2 fabrication, which greatly reduces the
fabrication cost.
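The reduction factors behind the 15X figure follow directly from Table 3.3. The sketch below recomputes them for each Metal 1 configuration; the names `ORIGINAL_SHOTS`, `M1_RESULTS` and `reduction_factors` are hypothetical, and the data are copied from the table.

```python
ORIGINAL_SHOTS = 236967

# Metal 1 rows of Table 3.3:
# (max rounds, tracks per iteration) -> (final shots, runtime in seconds)
M1_RESULTS = {
    (1, 3): (69547, 17983),
    (3, 3): (43068, 45690),
    (1, 7): (24717, 23096),
    (3, 7): (15839, 51034),
    (1, 10): (17841, 31253),
    (3, 10): (14985, 69790),
}


def reduction_factors(results, original=ORIGINAL_SHOTS):
    """Map each configuration to original shots / final shots."""
    return {cfg: original / final for cfg, (final, _runtime) in results.items()}


factors = reduction_factors(M1_RESULTS)
best_config = max(factors, key=factors.get)
```

The factor ranges from about 3.4 for 1 round with 3 tracks per iteration to about 15.8 for 3 rounds with 10 tracks, with the 3-round, 7-track setting reaching about 15.0 at a lower runtime.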
In addition, both algorithms can be parallelized by dummy track insertion.
As observed from lithography simulation, if the cuts are more than two wire
tracks away from each other, they will not produce a forbidden pattern.
Thus, by piecewise inserting two dummy wire tracks, the large cell track can
be divided into isolated pieces, and all pieces can be optimized in parallel
which could further reduce the runtime.
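A minimal sketch of the dummy track insertion idea is given below. It only computes the new track indices of each isolated piece; the function name `insert_dummy_tracks`, the fixed piece size and the choice of where to split are assumptions for illustration. Each piece could then be passed to either algorithm independently, for example from separate worker processes.

```python
def insert_dummy_tracks(n_tracks, piece_size, n_dummy=2):
    """Split n_tracks wire tracks into pieces separated by dummy tracks.

    Returns a list of ranges giving the new track indices of each piece.
    With n_dummy = 2, tracks in different pieces are at least 3 tracks
    apart, so their cuts cannot form a forbidden pattern (Table 3.1).
    """
    pieces = []
    offset = 0
    for start in range(0, n_tracks, piece_size):
        size = min(piece_size, n_tracks - start)
        pieces.append(range(offset, offset + size))
        offset += size + n_dummy       # skip over the inserted dummy tracks
    return pieces


pieces = insert_dummy_tracks(10, 4)
```

Here 10 tracks split into pieces of 4 occupy new indices 0 to 3, 6 to 9 and 12 to 13, at the cost of 4 extra dummy tracks.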
After the cuts are redistributed by our algorithm, we perform lithography
simulation on the layout to verify the imaging quality improvement by cut
redistribution. The simulation results on the cuts before and after
redistribution are shown in Fig. 3.4. By comparing the lithography simulation
results we can see that the printing quality is greatly improved by our cut
redistribution algorithm.
[Figure 3.4, panels (a) and (b).]
Figure 3.4: The lithography simulation results before (a) and after (b) cut
redistribution. The real wires are shown in blue, dummy wires in yellow.
The green rectangles show the ideal cut shape and the red contours show
the printed images.
3.6 Conclusion
This chapter proposes a hybrid lithography process for advanced 1D grid-
ded design, which involves 193i and EBL processes. By applying our cut
redistribution algorithm to the original layout, the number of E-Beam shots
is minimized, which greatly improves the throughput. The ILP formulation
finds optimal solutions for sparse layers, and the iterative algorithm is able
to solve dense layers of any size efficiently. The experimental results show
that for dense layers such as Metal 1, our methodology is able to achieve
a 15X improvement in the throughput of EBL, and for sparser layers such as
Metal 2, the EBL process can be eliminated entirely, which greatly reduces
the manufacturing cost.
CHAPTER 4
SADP DECOMPOSITION AND
SADP-AWARE DETAILED ROUTING
4.1 Introduction
Self-aligned double patterning (SADP) lithography is a leading candidate
for the 10 nm node lower-metal layer fabrication. Compared to Litho-Etch-
Litho-Etch (LELE) double patterning lithography (DPL), SADP has a great
advantage in overlay tolerance and much lower line-width roughness (LWR).
Spacer-Is-Dielectric (SID) is one popular flavor of SADP with higher
flexibility in design. Much work has been done on decomposition
for the SID process [40–43]. Similar to other DPL, in SID the target layout is
also decomposed into two masks: a mandrel mask and a trim mask. In the first
lithography step of the SID process, one group of target patterns together
with some assistant patterns are printed by the mandrel mask. Then spacer
is uniformly deposited surrounding each mandrel pattern. Next, mandrel
patterns are removed and only spacer is left on wafer. Finally, the trim mask
is applied in the second lithography step to define the final patterns. The
Boolean equation (4.1) characterizes the SID decomposition from a geometric
perspective [41].
Target = Trim ∧ (¬Spacer) (4.1)
Equation (4.1) shows that the final patterns on wafer are generated in the
area which is covered by the trim mask but not covered by spacer. Fig. 4.1
shows a valid SID decomposition of a toy layout.
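Equation (4.1) can be illustrated on a grid abstraction in which each mask is a set of occupied cells. This is a sketch under the rectilinear assumption only; the function name `printed_pattern`, the set-of-cells representation and the toy one-row layout are all assumptions.

```python
def printed_pattern(trim, spacer):
    """Cells printed on wafer per Eq. (4.1): covered by trim, not by spacer."""
    return trim - spacer


# Toy one-row layout on cells x = 0..6 (hypothetical example).
trim = {(x, 0) for x in range(7)}      # trim mask covers the whole row
spacer = {(2, 0), (4, 0)}              # spacer left on wafer after mandrel removal
target = printed_pattern(trim, spacer)
```

In this toy row the printed cells are x = 0, 1, 3, 5 and 6, so the spacer cells separate three final patterns.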
From Fig. 4.1 we can observe that the patterns in the mandrel mask or trim
mask do not always come directly from the target layout. The mandrels taken
directly from the target patterns are defined as main mandrels. The remaining
patterns on the mandrel mask are defined as additional mandrels. The
target patterns which do not exist on the mandrel mask are defined as
sub-metals.
[Figure 4.1, panels: (a) Target layout. (b) Decomposition result. Legend: Target Pattern, Main Mandrel, Sub-Metal, Additional Mandrel, Spacer, Trim Mask.]
Figure 4.1: An example of SID decomposition.
Lacking a proper verification model, all the previous works have a common
assumption that the printed contour of the mandrel/trim mask and the spacer
shape are rectilinear, and the decomposition results are merely verified by
Boolean operations on rectilinear polygons. In fact, even with perfect optical
proximity correction (OPC) for the mandrel mask, the spacer shape still gets
rounded at convex mandrel corners due to uniform deposition. The intrinsic
corner rounding property of spacer may result in severe residue artifacts on
device patterns, which is not identifiable without a proper verification model.
In this chapter, we first discuss the potential residue artifacts generated by
the conventional SID decomposition algorithms and classify them into three
types. Those artifacts severely impact the critical dimension (CD) uniformity
of the device patterns and may lead to problems in circuit performance. Then
three enhancement strategies in SID decomposition are introduced, in order
to remove the three types of artifacts respectively. Based on the enhancement
strategies, we propose the complete SID decomposition flow with model-based
verification, and then verify the final decomposition results with a real
lithography model on an industrial design. Simulation results show that the
residue artifacts are removed effectively by the enhanced SID decomposition
flow.
However, it is extremely difficult to merely rely on decomposition for com-
plete residue artifact avoidance, and full chip SID decomposition has been
proved to be an NP-complete problem [5]. In addition, Zhang [41] reports
that only a limited number of cells from a large cell library can be directly
decomposed for SID, suggesting that SID-compliant design is a pre-requisite
for successful decomposition. [2] examines the challenges for SID-compliant
design, such as forbidden spacing, anti-parallel line-ends and residue issues
arising from contour simulation. [44] is the first work adopting SADP-based
guidelines in detailed routing, but the decomposability of the routing layers
cannot be guaranteed. [45] claims that the routing and decomposition are
solved simultaneously by the proposed detailed routing algorithm. However,
it fails to consider some major challenges faced by the SID process, such
as anti-parallel line-ends conflicts and residue problems, and decomposition
violations also exist on conflict-free layouts produced by the router.
This chapter also studies the SID-compliant detailed routing problem, and
proposes a graph model that correctly captures the decomposition violations
and SID intrinsic residue issues. Then a negotiated congestion based routing
scheme is developed to resolve all conflicts. Since the SID process is better
suited to largely unidirectional routing layers, we assume that SID-compliant
detailed routing is applied to Metal 2 or higher layers where each
layer has a preferred direction. Depending on the design of Metal 1, one
input/output pin may have multiple available locations, providing more flex-
ibility for detailed routing on higher layers. Such flexibility is also considered
in the proposed routing scheme.
The rest of this chapter is organized as follows. The issue of spacer corner
rounding is illustrated in Section 4.2. Then Section 4.3 shows the residue
artifacts caused by spacer corner rounding, and the enhancements in SID
decomposition for residue artifact removal are demonstrated in Section 4.4.
Then the enhanced SID decomposition flow is proposed in Section 4.5. Next,
Section 4.6 examines the challenges for SID-compliant detailed routing. The
SID-compliant detailed routing problem is formulated in Section 4.7. Sec-
tion 4.8 presents the graph model and our negotiated congestion based rout-
ing scheme to solve the problem. Section 4.9 shows the experimental results,
and finally Section 4.10 concludes this chapter. The related work is published
in [46] and [47].
4.2 Spacer Corner Rounding by Uniform Deposition
As mentioned in Section 4.1, in the SID process, after the mandrel mask
is applied in the first lithography step, spacer is uniformly deposited
surrounding each mandrel pattern. Mathematically, the uniform deposition is
equivalent to biasing the mandrel contour outward by a constant value. Suppose
OPC and the process conditions for the mandrel mask are perfect such
that the printed contour on wafer is exactly identical to the design on mask.
Then through constant biasing, the horizontal or vertical edges of the spacer
contour can be simply determined by shifting the mandrel edges outward.
However, the corners behave differently under constant
biasing. The corners on the mandrel mask can be classified into two types:
convex corners and concave corners. The spacer shape changes differently
under the constant biasing operation at the two types of corners, as
illustrated in Fig. 4.2.
[Figure 4.2, panels: (a) Spacer gets rounded at convex mandrel corners. (b) Spacer keeps the rectilinear shape at concave mandrel corners. (c) Spacer gets sharpened at concave mandrel corners.]
Figure 4.2: The spacer shape varies differently at the two types of corners.
Figure 4.2(a) shows that the spacer shape is rounded at the convex mandrel
corners even with perfect OPC and process for the mandrel mask. In reality,
the mandrel contour on wafer can never be perfectly rectilinear at the
corners. Instead, the corners will be rounded due to the limited bandwidth
of the lithography process, which makes the spacer corner rounding even
more severe. In contrast, the spacer contour is able to keep the rectilinear
shape at concave mandrel corners, as illustrated in Fig. 4.2(b). This is
because the horizontal and vertical edges at concave corners cross each
other during constant biasing, generating a perfect right angle on the
spacer contour. In fact, even though the mandrel corners cannot be printed
perfectly due to the limitation of the lithography process, the uniform
spacer deposition automatically sharpens the concave mandrel corners and
generates rectilinear corners on the inner spacer contour, as illustrated in
Fig. 4.2(c). The spacer's intrinsic corner rounding property at convex
mandrel corners may lead to residue artifacts on device patterns, which
should be considered during SID decomposition.
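The constant-bias view of spacer deposition can be checked numerically. The
following is a minimal sketch, assuming an ideal L-shaped mandrel built from
two rectangles and a spacer width of 24 units; the function names and the
layout coordinates are hypothetical and chosen only for illustration. A point
belongs to the spacer if it lies outside the mandrel but within the bias
distance of it.

```python
import math

def dist_to_rect(px, py, rect):
    """Euclidean distance from (px, py) to an axis-aligned rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    return math.hypot(dx, dy)

def in_spacer(px, py, mandrel_rects, w):
    """True if the point is outside the mandrel and within bias distance w of it."""
    d = min(dist_to_rect(px, py, r) for r in mandrel_rects)
    return 0.0 < d <= w

W = 24.0  # assumed spacer width
# Hypothetical L-shaped mandrel: convex corner at (100, 50), concave corner at (50, 50).
L_SHAPE = [(0.0, 0.0, 100.0, 50.0), (0.0, 50.0, 50.0, 100.0)]

# Beside a straight edge the spacer reaches exactly W from the mandrel.
beside_edge = in_spacer(100.0 + W, 25.0, L_SHAPE, W)
# Diagonally off the convex corner, the point (W, W) away is sqrt(2)*W from the
# corner, so it is NOT covered: the spacer contour is rounded there.
convex_diag = in_spacer(100.0 + W, 50.0 + W, L_SHAPE, W)
# Diagonally off the concave corner, the point (W, W) away is exactly W from
# both edges, so it IS covered: the spacer contour keeps a right angle.
concave_diag = in_spacer(50.0 + W, 50.0 + W, L_SHAPE, W)
```

In this sketch the uncovered point off the convex corner is exactly the kind of
location where the residue artifacts discussed next can appear.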
4.3 Residue Artifacts by Spacer Corner Rounding
In this section, we will introduce the main residue artifacts caused by the
intrinsic spacer corner rounding property, and classify them into three types.
4.3.1 Type 1 residue artifacts at concave sub-metal corners
In the SID process, a subset of the device patterns are defined by mandrel,
the remaining ones by sub-metal. If a 2D pattern is defined by sub-metal,
its concave corners are usually adjacent to convex mandrel corners. Ideally,
the space between the corners should be filled with spacer in order to com-
pletely separate the mandrel pattern from the sub-metal pattern. However,
as mentioned in Section 4.2, the spacer shape gets rounded at convex man-
drel corners, leaving some empty space unfilled with spacer. According to
equation (4.1), the empty space will be printed on wafer if covered by a trim
mask, leaving residue artifacts on the sub-metal patterns.
Figure 4.3: Type 1 artifact. (a) Target patterns A and B. (b) Decomposition
result (mandrel and trim). (c) Contour simulation.
Figure 4.3(a) shows a pair of target patterns with spacing equal to the
spacer width, and Fig. 4.3(b) gives one valid decomposition solution where
pattern B is defined by mandrel and pattern A is defined by sub-metal. From
the contour simulation in Fig. 4.3(c) we can see that undesired residue is left
at the concave corner of pattern A. Such residue at the concave sub-metal
corners is classified as type 1 artifacts.
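As a rough size estimate under the ideal constant-biasing model of Section 4.2
(an assumption; real contours are further affected by lithography and by the
trim mask), let w denote the spacer width, a symbol introduced here for
illustration. At a perfectly sharp convex mandrel corner, a rectilinear spacer
would fill a w × w square diagonal to the corner, whereas the rounded spacer
fills only a quarter disc of radius w. The unfilled area is therefore

```latex
A_{\text{gap}} \;=\; w^{2} - \frac{\pi w^{2}}{4}
             \;=\; \left(1 - \frac{\pi}{4}\right) w^{2}
             \;\approx\; 0.215\, w^{2}.
```

For example, with w = 24 nm this gives roughly 0.215 × 576 nm² ≈ 124 nm² of
unfilled space per convex mandrel corner in the idealized model.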
4.3.2 Type 2 residue artifacts at long sub-metal edges
Figure 4.4 demonstrates the type 2 artifacts. Figure 4.4(a) shows three 1D
target patterns with different wire lengths. The spacing between two adjacent
patterns is equal to the spacer width. Figure 4.4(b) gives a valid decomposi-
tion solution where patterns A and C are defined by mandrel and pattern B is
defined by sub-metal. As illustrated by the contour simulation in Fig. 4.4(c),
some “spur” shaped residue is left at the long edges of pattern B, due to the
spacer corner rounding of patterns A and C. Such residue at long sub-metal
edges is classified as type 2 artifacts.
Figure 4.4: Type 2 artifact. (a) Target patterns A, B and C. (b) Decomposition
result (mandrel and trim). (c) Contour simulation.
4.3.3 Type 3 residue artifacts due to spacer merging
When the spacing between two adjacent mandrel patterns is less than or
equal to twice the spacer width, the spacer surrounding the two mandrel
patterns will merge together. However, due to the corner rounding property,
the spacer at convex mandrel corners cannot merge seamlessly, leaving some
empty space which becomes residue artifacts after the trim mask is applied,
as illustrated in Fig. 4.5.
Figure 4.5: Type 3 artifact. (a) Target patterns A, B and C. (b) Decomposition
result (mandrel and trim). (c) Contour simulation.
Figure 4.5(a) shows the target pattern, and Fig. 4.5(b) gives a valid de-
composition solution where patterns A and B are defined by mandrel and
pattern C is defined by sub-metal. The spur-shaped residue is illustrated by
the contour simulation in Fig. 4.5(c). In this example, the spacing between
pattern A and pattern B is exactly twice the spacer width. The spacer at
the lower-right corner of pattern A and the upper-right corner of pattern B
does not merge seamlessly, leaving some empty space that is not filled with
spacer. Since the trim mask covers the entire patterns as well as the space
between them, the empty space becomes a residue artifact printed on pattern
C. Such residue generated by spacer merging is classified as type 3 artifacts.
4.4 Enhancements in SID Decomposition for Residue
Artifact Removal
In this section, we will introduce three enhancement strategies in SID decom-
position, in order to remove the three types of residue artifacts mentioned in
the previous section respectively. The type 1 artifacts should be considered
during the step of decomposition, while the type 2 and type 3 artifacts can
be removed by post-processing after decomposition is done.
4.4.1 Minimize type 1 artifacts by mandrel reassignment
Similar to LELE decomposition, the SID decomposition can be formulated
as a two-coloring problem [48], and there are usually multiple valid color-
ing solutions for an SID decomposable layout. From Fig. 4.3 we can see
that whenever a 2D pattern is defined by sub-metal, there will be undesired
residue left at its concave corners. Therefore, in order to minimize the type 1
artifacts, the target patterns with more concave corners should have higher
priority to be defined by mandrel during the step of coloring.
Figure 4.6 shows an alternative decomposition solution for the layout in
Fig. 4.3(a), which is obviously valid as well. As illustrated in Fig. 4.6(b),
pattern A is defined by mandrel and pattern B is defined by sub-metal in
the enhanced decomposition solution. Comparing Fig. 4.6(c) with Fig. 4.3(c)
we can easily observe that the previous type 1 artifact at the concave corner of
pattern A has been successfully removed by the enhanced SID decomposition.
Figure 4.6: Enhanced SID decomposition to minimize type 1 artifacts.
(a) Target patterns A and B. (b) Enhanced decomposition result (mandrel and
trim). (c) Contour simulation.
In the meantime, the adjacent convex corner at pattern B is automatically
sharpened with the help of the uniform spacer deposition, as explained in
Fig. 4.2(c).
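The priority rule above can be sketched as a weighted two-coloring. This is a
minimal illustration, not the modified algorithm of [41]: it assumes the
layout is given as a conflict graph (an edge joins two patterns that must
receive different colors) and that each pattern carries a count of its concave
corners. Within each connected component only two colorings are valid, so the
sketch picks the one that places fewer concave corners on sub-metal. All names
are hypothetical.

```python
from collections import deque

def assign_mandrel(num_patterns, conflicts, concave_corners):
    """Two-color the conflict graph, preferring mandrel for corner-heavy patterns.

    Returns a list of "mandrel"/"sub-metal" labels, or None if an odd cycle
    makes the layout non-two-colorable.
    """
    adj = [[] for _ in range(num_patterns)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    side = [None] * num_patterns
    result = [None] * num_patterns
    for start in range(num_patterns):
        if side[start] is not None:
            continue
        # Breadth-first two-coloring of one connected component.
        side[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]
                    component.append(v)
                    queue.append(v)
                elif side[v] == side[u]:
                    return None  # odd cycle: no valid decomposition
        # Put the side with more concave corners on mandrel.
        weight = [0, 0]
        for u in component:
            weight[side[u]] += concave_corners[u]
        mandrel_side = 0 if weight[0] >= weight[1] else 1
        for u in component:
            result[u] = "mandrel" if side[u] == mandrel_side else "sub-metal"
    return result

# Layout in the spirit of Fig. 4.3(a): pattern 0 (A) has one concave corner,
# pattern 1 (B) is assumed to have none.
labels = assign_mandrel(2, [(0, 1)], [1, 0])
```

Here `labels` assigns pattern A to mandrel and pattern B to sub-metal, which
matches the enhanced decomposition of Fig. 4.6(b).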
4.4.2 Remove type 2 artifacts by mandrel extension
The type 2 artifacts are produced because the mandrel pattern is shorter
than the adjacent sub-metal pattern. Intuitively, this type of artifact
can be removed by simply extending the mandrel patterns, as illustrated in
Fig. 4.7.
Figure 4.7: Enhanced SID decomposition to remove type 2 artifacts. The
overlay-tolerance capability is sacrificed at the circled edges. (a) Target
patterns A, B and C. (b) Enhanced decomposition result (mandrel and trim).
(c) Contour simulation.
Compared to Fig. 4.4(c), Fig. 4.7(c) shows a much clearer contour of the
target patterns with the enhanced SID decomposition. However, there is a
tradeoff between type 2 artifact removal and overlay-tolerance capability. In
Fig. 4.4(b), the line ends of the mandrel patterns are protected by spacer,
but in Fig. 4.7(b), they are directly defined by the trim mask. Therefore, the
overlay-tolerance at those edges has been sacrificed in order to remove the
type 2 artifacts.
4.4.3 Remove type 3 artifacts by mandrel merging
The type 3 artifacts are generated because the spacer at adjacent mandrel
corners cannot merge seamlessly due to the corner rounding property. One
effective way to remove such artifacts is to merge the adjacent mandrel pat-
terns directly into a single pattern, such that there is no spacer at adjacent
mandrel corners merging with each other, as illustrated in Fig. 4.8. Note that
the additional mandrel introduced for the merging purpose can never be cov-
ered by the trim mask. Otherwise an undesired pattern will be generated
according to equation (4.1), which connects two adjacent wires and causes
logic failure. Therefore, the trim mask has to be modified accordingly, as il-
lustrated in Fig. 4.8(b). Similar to the enhancement strategy targeting type
2 artifact removal, the overlay-tolerance is sacrificed at the circled edges in
Fig. 4.8(b), in order to remove the type 3 artifacts. Compared to Fig. 4.5(c),
Fig. 4.8(c) shows that the residue artifacts have been successfully removed
by the enhanced SID decomposition.
Figure 4.8: Enhanced SID decomposition to remove type 3 artifacts. The
overlay-tolerance capability is sacrificed at the circled edges. (a) Target
patterns A, B and C. (b) Enhanced decomposition result (mandrel and trim).
(c) Contour simulation.
4.5 Enhanced SID Decomposition with Model-Based
Verification
In this section, we propose an enhanced SID decomposition flow and verify its
effectiveness in residue artifact removal. The decomposition flow is proposed
in subsection 4.5.1. Then in subsection 4.5.2, the final output of the entire
flow is verified with a real model on an industry design.
4.5.1 Enhanced SID decomposition flow
The three enhancement strategies for SID decomposition are performed during
different stages. The type 1 artifacts should be considered during the
decomposition step, while the type 2 and type 3 artifacts can be removed by
post-processing after decomposition is done. In order to remove the type 2
and type 3 artifacts, the overlay-tolerance capability at certain edges has to
be sacrificed. This is a tradeoff, however: sometimes the overlay-tolerance
at certain edges is more critical, and it is then not worthwhile to make such
a sacrifice. In the proposed SID decomposition flow, both residue artifact
removal and overlay-tolerance capability are simultaneously considered.
Figure 4.9: Enhanced SID decomposition flow. Flowchart boxes: Input Layout;
SID Decomposition (Weighted Mandrel Assignment to Minimize Type 1 Artifacts);
Primary Mandrel Mask & Trim Mask; SNPS PLRC Verification (Constant Biasing
Model for Spacer Deposition); Type 2 & Type 3 Artifacts; Type 2 Artifacts
More Critical Than Overlay Tolerance: Mandrel Extension & Trim Modification;
Type 3 Artifacts More Critical Than Overlay Tolerance: Mandrel Merging & Trim
Modification; Overlay Tolerance More Critical; Final Mandrel Mask & Trim Mask.
Figure 4.9 shows the enhanced SID decomposition flow with model-based
verification. Given a target layout to decompose, we first modify an exist-
ing SID decomposition algorithm [41] by introducing an additional weighting
term on mandrel assignment. Note that other valid SID decomposition algo-
rithms can also be adopted in this step with minor modifications. Then the
modified algorithm is applied to decompose the design into a primary mandrel
mask and a primary trim mask, where the type 1 artifacts have been mini-
mized. In the following step, the Synopsys PLRC verification tool is adopted
to inspect the type 2 and type 3 artifacts. The PLRC tool has the capabil-
ity of real-model full-chip verification, where the sizes of the residue can be
precisely measured. However, it may also report other artifacts caused by
incorrect OPC or wafer printing failures, and it is extremely time-consuming
if post-OPC verification is performed in this step. In fact, no other artifacts
except for those caused by SID decomposition should be reported, and to
decide how severe each artifact can be, we only need to estimate their sizes
instead of knowing the detailed shape information. Therefore, we assume
ideal lithography and processing conditions for the decomposition verifica-
tion, and modify the recipe to directly bias the target patterns with a simple
constant biasing model, in order to simulate the spacer deposition. Then
the full-chip contour simulation can be done very efficiently. By comparing
the simulated contour with the target patterns, the hot spots with residue
artifacts can be effectively reported, as illustrated in Fig. 4.10.
Figure 4.10: Type 2 and type 3 artifact inspection with simplified
decomposition verification model. The artifacts are circled in red.
(a) Target patterns. (b) Decomposition result. (c) Contour simulation.
Legend: target, mandrel mask, trim mask, contour.
Figure 4.10(a) shows a target layout, and Fig. 4.10(b) gives a valid
decomposition result using Zhang's algorithm [41]. When verified merely by
Boolean operations on the decomposed layers, the printed patterns are
identical to the targets, and hence no artifacts can be identified. In
contrast, Fig. 4.10(c) illustrates the verification result with the
simplified model. Through efficient contour simulation, the residue artifacts
are successfully reported as
hot spots. Then in the next step of the decomposition flow, if the residue
artifacts are more critical than overlay-tolerance, appropriate enhancement
strategies (mandrel extension or mandrel merging) are adopted to remove
the critical residue. Otherwise the residue remains untreated and will be
printed on the final patterns. Then, the final mandrel mask and trim mask
are obtained. Figure 4.11 illustrates an alternative valid SID decomposition
for the same layout shown in Fig. 4.10(a), where the enhancement strategies
are fully adopted. From the contour simulation in Fig. 4.11(c) we can see
that with certain sacrifice in overlay-tolerance, the printed contour becomes
much clearer than that shown in Fig. 4.10(c), and most residue artifacts have
been successfully removed.
Figure 4.11: Enhanced SID decomposition and contour simulation result.
(a) Target patterns. (b) Enhanced decomposition. (c) Contour simulation.
Legend: target, mandrel mask, trim mask, contour.
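The simplified verification can be illustrated on a small raster. The sketch
below is not the PLRC recipe; it assumes a unit grid, models the spacer as all
cells whose centres lie within the bias distance of a mandrel rectangle, and
assumes (as our reading of equation (4.1), which is defined outside this
section) that the printed pattern is the trim region minus the spacer. Residue
is then whatever is printed but not in the target. The layout, the trim
shapes, and all names are hypothetical.

```python
import math

def dist_to_rect(px, py, rect):
    """Euclidean distance from (px, py) to rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    return math.hypot(dx, dy)

def cells(rects):
    """Set of unit grid cells covered by a list of integer rectangles."""
    return {(x, y) for (x0, y0, x1, y1) in rects
            for x in range(x0, x1) for y in range(y0, y1)}

def find_residue(targets, mandrels, trims, w):
    """Cells printed (inside trim, not spacer) that do not belong to any target."""
    target_cells = cells(targets)
    residue = set()
    for (x, y) in cells(trims):
        d = min(dist_to_rect(x + 0.5, y + 0.5, m) for m in mandrels)
        is_spacer = 0.0 < d <= w  # constant-bias spacer model
        if not is_spacer and (x, y) not in target_cells:
            residue.add((x, y))
    return residue

W = 4  # assumed spacer width in grid units
# Three 1D wires in the spirit of Fig. 4.4: A and C are short, B is long.
A, B, C = (0, 0, 20, 4), (0, 8, 32, 12), (0, 16, 20, 20)

# Primary masks: A and C on mandrel, trim covers the spacer past their line-ends.
primary = find_residue([A, B, C], [A, C], [(0, 0, 24, 20), B], W)

# Mandrel extension: A and C are stretched to B's length and the trim is cut
# back so that their line-ends are defined by the trim mask.
extended = find_residue([A, B, C],
                        [(0, 0, 32, 4), (0, 16, 32, 20)],
                        [(0, 0, 20, 20), B], W)
```

In this toy layout `primary` contains a few cells hugging the long edges of B
next to the rounded spacer corners of A and C, while `extended` is empty.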
4.5.2 Decomposition verification with a real model
In order to verify the effectiveness of the proposed decomposition flow in
residue artifact removal, we perform lithography simulation with a real lithog-
raphy model on both primary masks and final masks. The comparison is
illustrated in Fig. 4.12.
In Fig. 4.12(a) [2], spur-shaped residue is generated because the spacer
surrounding the mandrel patterns cannot merge seamlessly. This spur-shaped
residue can be easily inspected as a type 3 artifact with the simplified
decomposition verification in the proposed SID decomposition flow.
Accordingly, mandrel merging and trim modification are performed on the
primary mandrel and trim masks, which generates the final masks shown in
Fig. 4.12(b). Then OPC and contour simulation are performed on the final
masks with a real lithography model. From the simulated contour in
Fig. 4.12(b) we can clearly see that the spur-shaped residue has been
completely removed.
Figure 4.12: The comparison of the simulation results performed on
primary and final masks. The black contours show the inner and outer
edges of the spacer deposition. (a) Contour simulation of the primary masks,
where spur-shaped residue is inspected (circled in blue). (b) Contour
simulation of the final masks. The spur-shaped residue is removed through
mandrel merging. Legend: mandrel, sub-metal.
4.6 Challenges in SID-Compliant Detailed Routing
In SID decomposition, the patterns are not independent of each other with
regard to color assignment. Any two patterns with a spacing conflict have to
be assigned different colors. Therefore, even with weighted mandrel assign-
ment, it is still impossible to guarantee that all 2D patterns are defined by
main mandrel during SID decomposition. Hence it is extremely difficult to
completely remove the type 1 artifacts through decomposition enhancement
only. As illustrated in Fig. 4.11(c), there is still type 1 residue left at the
concave pattern corners. In order to remove such artifacts more effectively,
we have to take them into consideration in the design phase. However, due
to the special step of spacer deposition, SID process faces several intrinsic
challenges in the design aspect. In this section, we will discuss the main
challenges in SID-compliant detailed routing.
4.6.1 Avoid Forbidden Spacing
In the SID process, the spacing between two adjacent wires must be either
equal to the spacer width or large enough to satisfy the minimum spacing
requirement of the trim mask [2]. Any value in between is strictly
forbidden. As illustrated in Fig. 4.13, when the wire spacing is exactly the
spacer width, the two adjacent wires can be defined within the same pattern
on the trim mask and separated from each other by spacer. Otherwise, if
two wires are far enough away from each other, they can be defined by two
separate trim patterns.
Figure 4.13: Allowed spacing values in SID process [2]. (a) Target wires.
(b) Decomposition result. Labels: spacings S1 and S2; main mandrel,
sub-metal, main mandrel.
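The spacing rule can be written as a one-line predicate. This is a minimal
sketch; the function name and the numeric values (a 24 nm spacer width and a
hypothetical 72 nm minimum trim spacing) are assumptions used only for
illustration.

```python
def spacing_allowed(spacing, spacer_width, min_trim_space):
    """A wire-to-wire spacing is legal only if it equals the spacer width
    or is at least the minimum spacing of the trim mask."""
    return spacing == spacer_width or spacing >= min_trim_space

SPACER_WIDTH = 24    # nm, assumed
MIN_TRIM_SPACE = 72  # nm, hypothetical value

checks = {s: spacing_allowed(s, SPACER_WIDTH, MIN_TRIM_SPACE)
          for s in (24, 48, 72, 96)}
```

In this example a spacing of 48 nm falls in the forbidden band between the two
legal cases.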
4.6.2 Avoid Odd Cycles
Similarly to LELE DPL, the SID decomposition can be formulated as a
two-coloring problem [48]; hence odd cycles will introduce problems in de-
composition. In LELE, odd cycles can be resolved by introducing stitches.
However, in SID, since all mandrels are surrounded by spacer, splitting a
single wire into two will introduce a gap in the final pattern, which breaks
the wire and causes logic failures, as illustrated by Fig. 4.14. Therefore, no
stitches are allowed in SID decomposition, and in consequence, odd cycles
must be avoided in SID decomposable layouts.
Figure 4.14: Splitting a single wire introduces a gap in the final pattern. [2]
(a) A single target wire. (b) Split as main mandrel and sub metal.
(c) Spacer is formed surrounding main mandrel. (d) A gap is introduced at
final pattern.
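Because stitches are unavailable, a layout is decomposable only if its
conflict graph contains no odd cycle. One common way to test this
incrementally is a union-find structure that tracks the color parity of each
pattern relative to its set representative. The sketch below illustrates that
generic technique; it is not taken from the cited algorithms, and the names
are hypothetical.

```python
def has_odd_cycle(num_patterns, conflicts):
    """Return True if the conflict graph contains an odd cycle."""
    parent = list(range(num_patterns))
    parity = [0] * num_patterns  # color of x relative to parent[x]

    def find(x):
        # Returns (root, parity of x relative to root), compressing the path.
        if parent[x] != x:
            root, p = find(parent[x])
            parent[x] = root
            parity[x] ^= p
        return parent[x], parity[x]

    for a, b in conflicts:
        ra, pa = find(a)
        rb, pb = find(b)
        if ra == rb:
            if pa == pb:
                return True  # a and b are forced to share a color
        else:
            # Merge the two sets so that a and b end up with opposite colors.
            parent[ra] = rb
            parity[ra] = pa ^ pb ^ 1
    return False

triangle = has_odd_cycle(3, [(0, 1), (1, 2), (2, 0)])
square = has_odd_cycle(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

A three-pattern conflict ring is rejected, while a four-pattern ring is
two-colorable.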
4.6.3 Prohibited Anti-Parallel Line-Ends
When the space between two wires is equal to the spacer width, one of them
has to be defined by main mandrel, the other by sub-metal. The trim mask
will cover both wires as well as the spacer between them. If the main mandrel
and the sub-metal on adjacent tracks form a pair of line-ends in opposite
directions, as illustrated in Fig. 4.15(a) [2], the end-by-end overlapping must
be larger than a certain length in order to fulfill the minimum trim width
requirement. Therefore, two wires on adjacent tracks with anti-parallel
line-ends should either have enough end-by-end overlapping (i.e. ≈
minimum trim width) or enough end-to-end distance (i.e. ≈ minimum trim
space) such that they can be defined by two separate patterns on the trim
mask, as shown by Fig. 4.15(b). If either line-end falls into the prohibited
region, SID decomposition fails.
Figure 4.15: Design rule for anti-parallel line-ends. (a) Enough end-by-end
overlapping to fulfill the minimum trim width requirement. (b) Prohibited
regions for anti-parallel line ends. L1 = minimum end-by-end overlapping;
L2 = minimum end-to-end distance. Legend: sub metal, main mandrel.
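The line-end rule reduces to a comparison along the track direction. In the
following sketch, one wire ends at coordinate `right_end` and the wire on the
adjacent track begins at `left_end`; the names and the values of L1 and L2 are
hypothetical, chosen only to exercise the three cases.

```python
def antiparallel_ok(right_end, left_end, l1, l2):
    """Check a pair of anti-parallel line-ends on adjacent tracks.

    right_end: coordinate where one wire ends.
    left_end:  coordinate where the neighbouring wire begins.
    l1: minimum end-by-end overlapping; l2: minimum end-to-end distance.
    """
    overlap = right_end - left_end
    if overlap >= l1:
        return True   # enough overlap for the minimum trim width
    if -overlap >= l2:
        return True   # far enough apart for two separate trim patterns
    return False      # a line-end falls into the prohibited region

L1, L2 = 48, 72  # nm, hypothetical design-rule values

enough_overlap = antiparallel_ok(200, 140, L1, L2)   # overlap of 60
too_close = antiparallel_ok(200, 180, L1, L2)        # overlap of 20
far_apart = antiparallel_ok(200, 280, L1, L2)        # distance of 80
```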
4.6.4 Sub-Metal Residue Artifacts
We have mentioned that in order to completely avoid the type 1 residue
artifacts, all 2D patterns in the design must be defined by main mandrels
during SID decomposition, which is extremely difficult to achieve. Therefore,
these artifacts have to be considered in the design phase. We define the jogs
on sub-metals as sm-jogs. The number of sm-jogs should then be minimized
during detailed routing.
4.7 SID-Compliant Detailed Routing Problem
Since the SID process is better suited to largely unidirectional routing
layers, we assume the SID-compliant detailed routing algorithm is applied to
Metal 2 or higher layers, where each layer has a preferred direction and the
direction perpendicular to it is defined as the non-preferred direction of
the layer. Depending on the design of Metal 1, an input/output pin may have
more than one candidate location, providing more flexibility for detailed
routing on higher layers. With these assumptions, we formulate the
SID-compliant detailed routing problem in this section, considering the
challenges mentioned in the previous section.
In order to efficiently avoid odd cycles, in the non-preferred direction of
each layer, the routing tracks are assigned as main mandrel tracks and
sub-metal tracks alternately, with track spacing equal to the spacer width,
and odd-track jogs on the same layer are strictly forbidden. In this way, all
wires on main mandrel tracks become main mandrels during the step of
decomposition, the rest being sub-metals. This scheme of simultaneous color
assignment provides valuable information to guide the subsequent procedure
of SID decomposition. Furthermore, the alternating track assignment
automatically avoids spacing conflicts in the non-preferred direction. We
only need to guarantee that the minimum spacing rules are not violated along
each routing track and no prohibited anti-parallel line-ends occur, in order
to produce decomposable routing layers.
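The track coloring and the jog restriction can be sketched in a few lines.
This illustration assumes tracks are indexed by consecutive integers and reads
an "odd-track jog" as a jog that spans an odd number of tracks, which would
move a wire between a main mandrel track and a sub-metal track. The function
names are hypothetical.

```python
def track_type(track, mandrel_on_even=True):
    """Color of a routing track under the alternating assignment."""
    is_even = track % 2 == 0
    return "main mandrel" if is_even == mandrel_on_even else "sub-metal"

def jog_allowed(from_track, to_track):
    """A same-layer jog is allowed only if it spans an even number of tracks,
    so that the wire stays on tracks of one color."""
    return (to_track - from_track) % 2 == 0

colors = [track_type(t) for t in range(4)]
even_jog = jog_allowed(1, 3)   # sub-metal track to sub-metal track
odd_jog = jog_allowed(1, 2)    # would change color mid-wire
```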
In each layer, there are two options for main mandrel/sub-metal track
assignment (either odd/even track is assigned as main mandrel/sub-metal
track or vice versa). Since the total number of routing layers fabricated by the
SID process is very limited, we can always enumerate the possible assignment
combinations for all layers in order to achieve the optimal routing result.
Therefore in our problem formulation, we assume the track assignments for
each layer have been fixed and define the SID-compliant detailed routing
problem as follows.
Definition 4.1. SID-Compliant Detailed Routing
Given a netlist with candidate source/target pin locations for every
net, a routing grid, a main mandrel/sub-metal track assignment strat-
egy and the minimum spacing requirement, detailed routing with si-
multaneous pin location determination is performed such that no odd-
track jog, spacing violation or prohibited anti-parallel line-ends occur,
and the number of sm-jogs is minimized.
4.8 Solution to SID-Compliant Detailed Routing
A negotiated congestion based scheme [49] is adopted in our SID-compliant
detailed routing algorithm. In order to reduce the adverse effect of improper
net ordering, wire crossing and wire spacing conflicts are initially allowed, and
then resolved over iterations of rip-up and reroute. The key subproblem of the
negotiated congestion based routing scheme is how to perform maze routing
for a single net on the routing graph in the presence of a set of previously
routed nets. In this section, we first define the subproblem of the SID-
compliant detailed routing problem and propose a graph model where the
subproblem can be optimally solved; then we present the overall negotiated
congestion based routing scheme.
4.8.1 Subproblem Definition
When routing a net, it is desired to compute a path p which produces the
minimum number of crossing conflicts, spacing conflicts and sm-jogs. Of
course, the wire length, as a conventional metric, also needs to be minimized.
Note that the unit length wire segment in the non-preferred direction of a
layer should cost more than that in the preferred direction in order to preserve
the unidirectional property. Therefore, the weighted sum
$l_w^p + \alpha \times v_c^p + \beta \times v_s^p + \gamma \times j_s^p$
is a good cost metric to minimize when routing a single net, where
$l_w^p$, $v_c^p$, $v_s^p$ and $j_s^p$ denote the weighted wire length, the
number of vertices with crossing conflicts, the number of vertices with
spacing conflicts and the number of sm-jogs produced by the computed path
$p$, respectively, and $\alpha$, $\beta$ and $\gamma$ are user-defined
parameters that specify the relative importance between them. We define the
SID-compliant maze routing problem below.
Definition 4.2. SID-Compliant Maze Routing
Given a set of previously routed nets as well as the candidate
source/target pin locations of a net, the objective is to determine the
optimal source/target pin locations and compute a path between the
two pins such that the weighted sum
$l_w^p + \alpha \times v_c^p + \beta \times v_s^p + \gamma \times j_s^p$
is minimized.
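The objective can be evaluated for a candidate path as follows. This is a
small sketch under stated assumptions: the weighted wire length is taken as
the preferred-direction length plus the wrong-way length scaled by a
per-unit wrong-way cost, and all parameter values are hypothetical.

```python
def path_cost(pref_len, wrong_way_len, crossings, spacings, sm_jogs,
              c_w, alpha, beta, gamma):
    """Weighted sum: wire length + alpha*crossings + beta*spacings + gamma*sm-jogs."""
    weighted_len = pref_len + c_w * wrong_way_len
    return weighted_len + alpha * crossings + beta * spacings + gamma * sm_jogs

# Hypothetical weights: wrong-way wire costs 3 per unit, conflicts dominate.
params = dict(c_w=3, alpha=10, beta=10, gamma=5)

straight = path_cost(12, 0, 1, 0, 0, **params)   # short but crosses another net
detour = path_cost(14, 2, 0, 0, 1, **params)     # longer, one sm-jog, no conflict
```

With these example weights the straight path costs 22 and the detour 25, so
the router would still prefer the conflicting path at this stage.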
4.8.2 Subproblem Solution
In this subsection, we propose a graph model that correctly captures the cost
of crossing, spacing conflicts and sm-jogs, and show that the SID-compliant
maze routing problem can be optimally solved by performing the shortest
path algorithm on the proposed graph model. We then demonstrate that our
algorithm is able to automatically extend line-ends and remove anti-parallel
line-ends conflicts through simple post-processing.
Expanded Routing Graph Model
Suppose we are given a routing grid G with preferred direction and main
mandrel/sub-metal track assignment for each layer, which can be viewed
as a routing graph if we regard every segment intersection as a vertex and
segments between vertices as edges. In order to capture the cost of sm-jogs,
we split each vertex v of G into 4 vertices and construct an expanded routing
graph model on them, as illustrated in Fig. 4.16(a). Fig. 4.16(b) shows four
types of edges to capture the cost of sm-jogs and wrong-way wires. The
detailed construction is described as follows.
• Each split vertex works as a switch box with four vertices connecting
each other.
• All edges are categorized into four types, namely $e_s$, $e_0$, $e_1$ and
$e_w$. Inside a switch box located at a sub-metal track, two vertices in the
diagonal direction are connected by a type $e_s$ edge with cost γ (the
cost of an sm-jog). The rest of the edges inside switch boxes, as shown
in yellow in Fig. 4.16(b), are classified as type $e_0$ edges with 0 cost.
This means that inside a switch box, a wire can travel freely on a main
mandrel track; however, on a sub-metal track, it has to pay the sm-jog
cost in order to travel diagonally, because such travel introduces an
undesired sm-jog. Outside the switch boxes, along the preferred
direction, a pair of vertices at neighboring switch boxes are connected
by a type $e_1$ edge with cost 1, as shown in blue in Fig. 4.16(b). In
order to forbid odd-track jogs, in the non-preferred direction, only the
vertices located at every other track are connected by type $e_w$ edges
with twice the wrong-way wire length cost, $2 \times c_w$, as shown in red in
Fig. 4.16(b).
• On top of the switch box model, an extra vertex is added to represent a
pin. Fig. 4.17 illustrates the graph model for a pin with two candidate
locations. For each candidate pin location, four extra edges are added
connecting the pin to the vertices inside the switch box.
• The cost of crossing and spacing conflicts is captured by assigning
congestion cost to the vertices and edges. Vertices from the same switch
box always share the same congestion cost. Before the routing starts,
all vertices are initialized with 0 cost. Then after a vertex v is occupied
by a routed net, the cost of all vertices in the same switch box as well as
the wrong-way edge passing that switch box is increased by α, which is
the cost of a crossing conflict. Along the same track as v, the vertices
within its spacing conflict region but not occupied by the current
net have their cost increased by β, the cost of a spacing conflict. Fig. 4.18
shows the congestion cost assignment after two wires of different colors
have been routed. Note that the wire length cost is not displayed.
Figure 4.16: The definitions of vertices and edges in the expanded routing
graph model. (a) Vertex split. (b) Edge definition. Labels: sub-metal and
main mandrel tracks, preferred direction, edge types $e_s$, $e_w$, $e_0$
and $e_1$.
Figure 4.17: The graph model for a pin with two candidate locations.
Figure 4.18: The wire crossing and spacing conflict cost assigned to an
expanded routing graph with two pre-routed wires.
So far we have constructed an expanded routing graph G′ from the original
routing grid G. When routing one net, we simply apply Dijkstra’s shortest
path algorithm on G′ to find the optimal path p′ between two pins on G′,
which corresponds to a path p on G. In addition, it is obvious that the short-
est path on G′ passes exactly one candidate switch box for the source pin,
and one for the target pin. So the optimal pin locations can be determined at
the same time. Therefore, we conclude that the SID-compliant maze routing
problem can be solved optimally by performing the shortest path algorithm.
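The way pin-location selection falls out of the shortest-path search can be
shown on a toy graph. The sketch below is a generic Dijkstra implementation,
not the authors' router: each pin is a virtual vertex joined by zero-cost
edges to its candidate locations, so the returned path reveals which candidate
was chosen. The graph, its costs and the vertex names are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra on a dict {vertex: [(neighbor, cost), ...]} with non-negative costs.

    Returns (cost, path); cost is infinity and path is empty if unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Virtual pin "S" has two candidate locations s1 and s2; pin "T" has one, t1.
toy_graph = {
    "S": [("s1", 0), ("s2", 0)],
    "s1": [("a", 1)],
    "s2": [("b", 1)],
    "a": [("t1", 5)],   # e.g. a congested region
    "b": [("t1", 2)],
    "t1": [("T", 0)],
}
cost, path = shortest_path(toy_graph, "S", "T")
```

Here the search returns a path through `s2`, so the second candidate location
is selected for the source pin without any separate enumeration.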
Forbidding Prohibited Anti-Parallel Line-Ends
As mentioned previously, prohibited anti-parallel line-ends violate SADP de-
sign rules and hence are forbidden in a decomposable layout. Fig. 4.19 il-
lustrates three scenarios of anti-parallel line-ends conflicts on a routing grid
G. In order to avoid these conflicts during single net routing, we modify our
expanded routing graph to disallow the routing scenarios shown in Fig. 4.19.
Fig. 4.20 gives an example of the detailed modification.
Figure 4.19: Three scenarios of anti-parallel line-ends conflicts on a routing
grid. (a) Scenario 1. (b) Scenario 2. (c) Scenario 3.
In Fig. 4.20(a), suppose the blue wire is a pre-routed wire in G and the
rectangular regions show the prohibited regions for anti-parallel line-ends.
Then for any switch box located within the prohibited region, two types of
edges are blocked, as illustrated in Fig. 4.20(b). The first type of edge (in
blue) connects a vertex inside the switch box to a pin or a via, generating
the first or the second routing scenario in Fig. 4.19. The second type of edge
(in green) connects two vertices inside the switch box, generating the third
scenario in Fig. 4.19. However, only blocking these problematic edges may
not work correctly. For example, the right vertex in the second switch box
of Fig. 4.20(b) may still reach the bottom vertex through a detour inside
the box (shown in black) even if the green edge is blocked. To avoid such
inside-box detours, we further split a vertex v in the graph model into two
vertices $v_{in}$ and $v_{out}$, and make the edges directed, as illustrated
in Fig. 4.21. The input vertex on one boundary is connected to the three
output vertices located at different boundaries. Note that only the
connections from/to the left boundary are displayed in Fig. 4.21.
Figure 4.20: Graph model modification to avoid prohibited anti-parallel
line-ends. (a) A routing grid with a pre-routed wire and the prohibited
regions for anti-parallel line ends. (b) Two types of edges are blocked in
the switch boxes (blocked edges are marked with x).
Figure 4.21: The graph model that disallows inside-box detour. Labels:
$v_{in}$, $v_{out}$.
By blocking certain edges in the new graph model, prohibited anti-parallel
line-ends will not show up while routing a net; instead, a looped walk in
G may be obtained by the shortest path algorithm as a routing path, as
illustrated in Fig. 4.22(a), where nets 1, 2 and 3 are pre-routed on G. When
routing net 4, since the candidate location of its target pin lies within the
prohibited region of wire 3, in the corresponding switch box, the edge
connecting the input vertex on the right boundary to the target pin has been
blocked. Therefore, the shortest path will go through a loop and connect
to the target pin from the input vertex on the left boundary. When tracing
the path on G, we simply remove all wire segments inside the loop and
obtain a routed wire with extended line-end, as shown in Fig. 4.22(b), which
automatically avoids the anti-parallel line-ends conflict.
Figure 4.22: The strategy of automatic line-end extension to avoid
anti-parallel line-ends conflicts. (a) A looped walk on G is obtained as a
routing path. (b) Remove all wire segments in the loop when tracing the path.
Wire labels: nets 1, 2, 3 and 4.
4.8.3 Overall Routing Scheme
In this subsection, we present the overall negotiated congestion based routing
scheme for the SID-compliant detailed routing problem, where crossing and
spacing conflicts are resolved over iterations of rip-up and reroute. We let the
nets negotiate for routing resources by adding history costs to vertices and
wrong-way edges in G′. The cost of a vertex v is computed by the following
formula:
$$\mathrm{cost}(v) = \alpha \times h_c \times n_c + \beta \times h_s \times n_s, \tag{4.2}$$
where $n_c$/$n_s$ denotes the number of pre-routed nets having crossing/spacing
conflict with the current vertex, and $h_c$/$h_s$ denotes the history cost for
crossing/spacing conflicts. All history costs are initialized to 1. Similarly,
the cost of a wrong-way edge e is computed by the following formula:
$$\mathrm{cost}(e) = 2 \times c_w + \alpha \times h_c \times n_c, \tag{4.3}$$
where the first term describes its weighted wire length cost, and the second
term describes the crossing conflict cost of the vertex it passes.
The scheme works as follows. We first route all nets sequentially in a ran-
dom order on G′. Then as long as conflicts exist, iterations of rip-up and
reroute will be performed. When a net i is ripped-up and rerouted, we first
remove its current route, unblock the edges causing anti-parallel line-ends
conflict with the current route, and update the cost values on the vertices
and wrong-way edges it has impact on. Then the shortest path algorithm
is performed to compute a new path for net i, and newly impacted vertices
and edges are updated accordingly. In addition, if the new path causes any
conflicts with previously routed nets, the history cost (h_c or h_s) on the
corresponding vertices and edges is incremented by 1. In this way, the conflicting
vertices and edges grow more expensive over iterations, and those nets with
more options will tend to choose alternative routes in subsequent iterations,
so that the conflicts can potentially be resolved. In our implementation, this
procedure will terminate when either no conflict exists or enough iterations
have been performed.
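The rip-up and reroute loop described above can be sketched on a toy model. This is a simplified illustration under stated assumptions, not the router itself: each net chooses among a few precomputed candidate paths instead of running a shortest path search on G′, a conflict is any vertex shared by two nets, nets are processed in dictionary order rather than random order, and all names (`negotiate`, `candidates`) are hypothetical.

```python
def negotiate(candidates, max_iters=50):
    """candidates: {net: [path, ...]}, each path a tuple of vertex ids."""
    history = {}   # vertex -> history cost, 1 until a conflict is seen there
    users = {}     # vertex -> set of nets currently routed through it
    route = {}     # net -> currently chosen path

    def path_cost(net, path):
        # Unit wire length per vertex plus history cost times other users.
        return sum(1 + history.get(v, 1) * len(users.get(v, set()) - {net})
                   for v in path)

    def place(net):
        best = min(candidates[net], key=lambda p: path_cost(net, p))
        route[net] = best
        for v in best:
            users.setdefault(v, set()).add(net)
            if len(users[v]) > 1:  # the new path conflicts on this vertex
                history[v] = history.get(v, 1) + 1

    def rip_up(net):
        for v in route.pop(net):
            users[v].discard(net)

    for net in candidates:          # initial sequential routing
        place(net)
    for _ in range(max_iters):      # iterations of rip-up and reroute
        conflicting = [n for n, p in route.items()
                       if any(len(users[v]) > 1 for v in p)]
        if not conflicting:
            break
        for net in conflicting:
            rip_up(net)
            place(net)
    return route


# Net "a" has an alternative path, net "b" does not.
print(negotiate({"a": [("x",), ("y", "z")], "b": [("x",)]}))
```

In this toy run the contested vertex becomes more expensive after the first conflict, so the net with more options moves away while the constrained net keeps its only path.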
4.9 Experimental Results
We implement our algorithm in C++ on a Linux machine with a 3.0 GHz
CPU and 16 GB RAM. Experiments are performed with the 10 nm node
benchmarks where both wire width and spacer width are 24 nm. All the
conflict-free routing layers produced by our detailed router are verified by
Synopsys Proteus as 100% decomposable.
4.9.1 Advantages in Residue Removal
Fig. 4.23 compares the simulation results on the routing layers with and
without sm-jog penalty. From Fig. 4.23(a) we can easily see that without
sm-jog minimization, a lot of residue is left at concave sub-metal corners.
However, the sm-jog penalty introduced in our SID-compliant routing scheme
helps to clean up such residue effectively, as shown in Fig. 4.23(b). The
remaining ‘spur’ shaped residue can be simply removed by post-processing
such as mandrel extension and mandrel merging [46].
[Legend: simulation contour, main mandrel, sub-metal, additional mandrel, sub-metal residue.]
(a) Routing layer without sm-jog penalty.
(b) Routing layer with proper sm-jog penalty.
Figure 4.23: Comparison of the simulation results with and without sm-jog
penalty.
4.9.2 Benefits of Simultaneous Pin Location Determination
We then perform the experiments on a set of benchmarks with different scales
and show the advantage of our automatic pin location determination strategy
in routability, sm-jog number, via number, wire length and runtime. We first
randomly choose one of the candidate locations for each pin and run the
router with the fixed pin locations. Then we free all candidate locations
and let the router decide where to place the pins. In the experiments, each
pin has 3 candidate locations on average, and the maximum number of iterations is 50.
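Table 4.1 shows the comparison results. With automatic pin location
determination, the router resolves all conflicts on every benchmark, whereas
random selection leaves between 17 and 189 conflicts. Automatic determination
also yields fewer sm-jogs, fewer vias, shorter wire length and shorter runtime
on every benchmark.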
Table 4.1: Comparison of Automatic/Random Pin Location Determination
in SID-compliant Detailed Routing
# Net  Size (µm²)  # Conflict      # SM-Jogs       # Via           Wire Length (µm)  Runtime (s)
                   Auto.   Rand.   Auto.   Rand.   Auto.   Rand.   Auto.    Rand.    Auto.   Rand.
1k     66.6        0       68      199     227     854     1682    381.6    513.4    78      298
2k     132.7       0       189     374     482     2104    3388    735.6    1021.6   465     866
4k     368.6       0       17      683     894     3254    5894    1392.5   1894.8   1984    3037
8k     829.4       0       33      1329    1728    6706    11686   2766.4   3777.3   7081    12506
16k    1866.2      0       60      2593    3467    13594   23008   5535.2   7475.9   38958   54136
4.10 Conclusions
Spacer corner rounding is inevitable in the SID process, which potentially
leads to residue artifacts on sub-metal patterns. Targeting artifact removal,
this chapter first proposes an enhanced SID decomposition flow with
model-based verification. The simplified lithography model introduced in the step
of decomposition verification tremendously improves the efficiency of the
entire flow, and the simulation results with a real lithography model verify
that the enhanced SID decomposition flow is capable of removing residue
artifacts effectively. Next, this chapter also analyzes the necessity to consider
residue artifacts in the design phase and proposes an expanded graph model
to solve the SID-compliant detailed routing problem. The challenges faced by
the SID process such as forbidden spacing, odd cycles, anti-parallel line-ends
conflicts and sub-metal residue issues have been considered in the proposed
graph model. In addition, color assignment and pin locations can be
simultaneously determined during the detailed routing. An overall negotiated
congestion based routing scheme is developed to resolve wire crossing and
design rule conflicts over iterations of rip-up and reroute, and all conflict-free
routing layers produced by our detailed router have been verified as 100%
SID decomposable.
CHAPTER 5
CONTACT/VIA LAYER OPTIMIZATION
FOR DSA LITHOGRAPHY
5.1 Introduction
Due to the limitations of 2D layout printing for the advanced technology
nodes, integrated circuit (IC) designers are moving towards a highly regular
1D gridded design style [50]. Taking the 1D detailed routing as an exam-
ple, only one routing direction is allowed in each routing layer. Whenever
direction switching (vertical to horizontal or vice versa) is needed, a via has
to be inserted in order to connect between different routing layers. While
1D routing is less flexible than 2D routing, the routing layers with strict 1D
wires offer the advantages of larger process window and higher yield [3,4,51].
Besides that, many advanced technologies have demonstrated their capabil-
ities in printing 1D wires with very narrow pitch size, such as self-aligned
double patterning (SADP) lithography [52], self-aligned quadruple patterning
(SAQP) lithography [53] and directed self-assembly (DSA) [54]. On the
other hand, 1D detailed routing introduces a large number of contacts/vias
that are randomly distributed. In the 7 nm technology node, the contact/via
pitch can be as small as 40 nm. Printing such highly dense contacts/vias
becomes a major challenge for IC fabrication at the 7 nm technology node.
The conventional 193 nm immersion (193i) lithography with single expo-
sure has already reached its printability limit at the 28 nm technology node.
In consequence, next generation lithography techniques have to be adopted
for contact/via layer printing in the 7 nm technology node and beyond, such
as extreme ultraviolet lithography (EUV), electron beam lithography (EBL),
multiple patterning lithography (MPL) and block copolymer directed self-
assembly (DSA). Despite years of study, EUV is still far from practical im-
plementation, and EBL can only be adopted for small volume production
with low throughput. Double patterning lithography (DPL) is reaching its
limit at the 22 nm node, and triple patterning lithography (TPL) is a natural
extension of DPL targeting sub-14 nm node fabrication. However, with more
masks involved, the manufacturing cost may become too high to be accepted.
Recent research progress on DSA has shown this technique’s strong potential
for the contact/via layer manufacturing [6–11].
(a) Layout of a half adder.
(b) The contact layer.
(c) DSA contact patterning for the 22 nm technology node.
(d) DSA contact patterning for the 7 nm technology node.
Figure 5.1: Comparison of DSA contact hole patterning between the 22 nm
technology node and the 7 nm technology node with a half adder. In (c)
and (d), the dark gray areas denote guiding templates and the black areas
denote DSA contact holes. Scale bar: 200 nm.
As shown in Fig. 5.1, to use DSA for patterning irregularly positioned
contacts, topographical guiding templates are needed for block copolymer
to form irregular patterns. At the 22 nm node (Fig. 5.1(c)), for each con-
tact we can build a single-hole guiding template so that a smaller DSA hole
will form inside. With a smaller contact pitch at the 7 nm technology node
(Fig. 5.1(d)), a larger template could be used to guide the formation of
multiple DSA holes inside the template, corresponding to closely positioned
contacts in the layout. Different DSA patterns could be controlled by ad-
justing the shape, size and density of guiding templates [55]. We conjecture
that there exists a set of guiding templates analogous to the letters in an
alphabet which could cover and compose the desired full chip contact layer.
However, the overlay accuracy of the contact holes as well as the printability
of templates may vary among different templates, and consequently, the
cost differs from one guiding template shape to another. Therefore,
it becomes extremely important to optimize contact distribution for overall
DSA process cost minimization. In this chapter, we refer to a set of guiding
templates as an alphabet, and to each guiding template in the set as a
letter.
Inside a 1D standard cell, vertical connections are mostly realized on the local
interconnect (LI) layer [56], so that we can permute the Metal 1 wires to adjust
the guiding templates in use without changing the logic of the original
design, as shown in Fig. 5.2. Based on the freedom of intra-cell Metal 1 wire
permutation, we propose an optimization scheme to ensure that the contact
layer of every cell in a 1D standard cell library can be fully patterned by an
alphabet, and the total cost of the alphabet is minimal. First, since the size
of a single cell is very limited, the letters to pattern the candidate layouts of
one cell can be enumerated efficiently. Then, we prove that the problem of
determining the optimal layout for each cell and the optimal alphabet with
the minimum total cost is NP-hard. To solve this problem, we first formulate
it as a Weighted Partial Maximum Satisfiability (MAX-SAT) problem, and
obtain the optimal solution using a public SAT solver [57]. Then we propose
a bounded approximation algorithm that solves it more efficiently.
[Legend: Active, Poly, Contact, Metal 1, LI, Guiding Template.]
(a) Before permutation. (b) After permutation.
Figure 5.2: The layout and guiding templates for a standard cell before and
after wire permutation.
At the full chip level, conventional detailed routing usually randomly in-
serts vias, which are very likely to compose infeasible letters, and any in-
feasible letter may cause the entire via layer to be incompatible with the
DSA process. Due to the very limited freedom in 1D detailed routing, it is
extremely difficult to remove all infeasible letters through post-routing pro-
cessing. A more effective mode of via layer optimization is to consider the
DSA template constraints at the beginning of the detailed routing. Targeting
full-chip level via layer optimization, we propose a DSA-aware detailed rout-
ing algorithm that takes into consideration the constraints on feasible letters
for the DSA process. We guarantee that the via layers produced by our router
can be successfully patterned using feasible letters only. In addition, among
all the feasible letters, the proposed routing algorithm preferentially picks
lower-cost ones, which further improves the yield. For single net routing, we
run Dijkstra’s shortest path algorithm on the routing graph to obtain
the optimal path. Overall, a negotiated congestion based routing scheme is
developed, where the nets are routed sequentially, and wire crossing conflicts
and infeasible letters are resolved over iterations of rip-up and reroute.
The rest of the chapter is organized as follows. Section 5.2 introduces the
background of DSA contact hole patterning and defines the cost function for
DSA templates. In Section 5.3, the intra-cell contact layer optimization prob-
lem is formulated and divided into two subproblems. The first subproblem is
solved in Section 5.4. Then in Section 5.5, the second subproblem is proved to
be NP-hard and formulated as a Weighted Partial Max-Sat problem; we then
present an approximation algorithm to solve it efficiently. Next, Section 5.6
summarizes the feasible via patterns for DSA-aware detailed routing, and
Section 5.7 defines the full-chip DSA-aware detailed routing problem. The
problem solution is proposed in Section 5.8. Then experimental results are
reported in Section 5.9, and finally, Section 5.10 concludes the chapter. The
related work is published in [10] and [58].
5.2 Background: Contact Patterning with DSA
To pattern contact holes with the DSA process, guiding templates are usually
printed first with conventional lithography, e.g. 193i. Then the guiding
templates will determine the DSA patterns inside. At the 22 nm node
(Fig. 5.1(c)), the contact pitch is big enough such that each contact can
be surrounded by a single hole guiding template. However, in the 7 nm node
standard cell design, the contacts can be very dense, and if each contact is
guided by an individual template, the template pitch will be too small to
be printed by 193i single exposure. Instead, we could use a larger template
to guide a group of contacts together, as illustrated in Fig. 5.1(d). It is
important to note that for a cylinder-forming block copolymer, the pitch be-
tween two DSA holes could be varied within a certain range by adjusting the
template size, thereby providing certain flexibility and constraints for guid-
ing template design. If the distance between two contacts falls within the
range of the DSA hole pitch, then it is applicable to use a larger template to
group these two contacts together. Otherwise we could only use single-hole
templates [59].
Realizing that the overlay accuracy of DSA contact holes may vary among
different letters, and some letters might be more challenging to print using
conventional lithography than others, here we try to capture and define the
cost of each letter. Usually it is more difficult to control the overlay accuracy
for a larger letter; e.g., the triple hole template has larger overlay variations
than the double hole template [55]. Another major contributor to overlay
inaccuracy is the shape of the letter. When a pair of contacts lies on the
same row or column, the contact pitch can be less than the maximum pitch
between DSA holes, so that they could be confined very well with a simple
rectangular template. In contrast, the pitch of a diagonal pair of contacts
can be larger than the maximum pitch of DSA holes, so we cannot put them
in the same rectangular template. However, it is also impossible to print two
separate but very close templates using conventional lithography; therefore,
we end up with a special ‘peanut-shaped’ template for these two contacts, as
illustrated in Fig. 5.3(a).
Such ‘peanut-shaped’ templates are difficult to print precisely by 193i,
which directly influences the DSA results. In consequence, a diagonal pair
of contacts usually has larger overlay variations (see Fig. 5.3(b)). Therefore,
according to the overlay variation of the contact holes, the letters in the
alphabet incur different ‘costs’ that are defined by the following equation:
\[ c_i = k_1 \times s_i + k_2 \times p_i \tag{5.1} \]
(a) A diagonal pair of contacts can be guided by a ‘peanut-shaped’ template.
(b) Peanut-shaped templates may lead to overlay accuracy variations.
Figure 5.3: The overlay accuracy of a diagonal pair of DSA contacts guided
by the ‘peanut-shaped’ template is worse than a rectangular template.
Scale bar: 200 nm.
where c_i denotes the cost of the i-th letter, s_i denotes the letter size and p_i
denotes the number of ‘peanut-shaped’ pairs in the letter. k_1 and k_2 are
constant values specifying the relative importance between s_i and p_i. Note
that the cost of a letter may depend on other factors as well, such as the
frequency with which the letter occurs in a full chip design, the local pattern
density where the letter is located, etc. A full chip placement optimization
in the design flow should take these factors into consideration. Since this
work targets standard cell level optimization, we only consider si and pi in
the cost function.
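As a small worked instance of equation (5.1), the sketch below evaluates the letter cost for a few letters. The weights k_1 = 1 and k_2 = 2 and the example letters are assumptions chosen only for illustration; the function name `letter_cost` is hypothetical.

```python
def letter_cost(size, peanut_pairs, k1=1.0, k2=2.0):
    """Eq. (5.1): c_i = k1 * s_i + k2 * p_i (k1, k2 are assumed values)."""
    return k1 * size + k2 * peanut_pairs


# (letter size s_i, number of 'peanut-shaped' pairs p_i), illustrative only.
examples = {
    "single hole": (1, 0),
    "rectangular double hole": (2, 0),
    "diagonal double hole": (2, 1),
}
for name, (s, p) in examples.items():
    print(name, letter_cost(s, p))
```

With these assumed weights a diagonal pair costs more than a rectangular pair of the same size, which matches the qualitative ordering argued in the text.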
5.3 DSA-Aware Standard Cell Library Optimization
We define the DSA-aware standard cell library optimization problem below.
Definition 5.1. DSA-Aware Standard Cell Library Optimization
Given a 1D standard cell library where each cell has certain flexibility
of Metal 1 wire permutation, find the optimal layout for every cell and
the corresponding alphabet that is capable of patterning the contact
layer of any cell in the library, such that the total cost of the alphabet
is minimal.
This problem can be divided into two subproblems, and the objective will
be achieved by solving the subproblems sequentially. The first subproblem
is defined below.
Definition 5.2. Cell Letter Determination Problem
Given a 1D standard cell library where each cell has certain flexibility
of Metal 1 wire permutation, the objective is to find all candidate
layouts for every cell, and for each candidate layout, find all letters
that are needed to pattern its contact layer.
The solution of the first subproblem can be expressed in a dependency
table as illustrated in Fig. 5.4, where X means the letter in the corresponding
column is needed to pattern the layout in the corresponding row. In other
words, the layout depends on that letter.
Cells  Layouts  Letters / Cost
                t1 / c(t1)  t2 / c(t2)  t3 / c(t3)  t4 / c(t4)  …
A      A1                   X           X                       …
       A2                               X           X           …
       A3       X           X                                   …
B      B1                                           X           …
       B2                   X           X                       …
…      …        …           …           …           …           …
Figure 5.4: A notional table to illustrate the dependence between cell
layouts and letters.
Based on the dependency table, we define the second subproblem below.
Definition 5.3. Alphabet Optimization Problem (AOP)
Given a dependency table between cell layouts and different letters, as
well as the cost of each letter, pick one candidate layout for each
cell and a subset of letters to form an alphabet, such that the contact
layer of any cell in the table can be patterned by the alphabet, and the
total cost of the alphabet is minimal.
In the following two sections, we will propose our solution to each sub-
problem respectively. The solution to AOP is exactly the objective of the
problem defined in Definition 5.1.
5.4 Cell Letter Determination
In this section, we propose the solution targeting the cell letter determination
problem that is defined in Definition 5.2. First, we try to find all candidate
layouts for every cell. For a standard cell library, the cell height is usually
fixed, and the number of functional Metal 1 wires in a single cell is also
very limited. Therefore, the number of valid Metal 1 wire permutations within a
single cell is bounded. Hence, to find all candidate layouts for one cell,
we enumerate all valid Metal 1 wire permutations, subject to the following
constraints. First, any two wires never overlap on the same track. Second,
every contact is located at the same LI or Poly pattern before and after wire
permutation. Third, no contacts are placed in the gate region.
For each cell in the standard cell library, all the candidate layouts can
be obtained through the enumeration procedure. Then, in order to find
all the letters that are needed to pattern one candidate layout, we build a
binary matrix to represent the layout, where the locations of the contacts
are filled with 1s, and all other locations are filled with 0s, as illustrated in
Fig. 5.5. Since the locations of the contacts are known information based
on the permutation result of Metal 1 wires, the binary matrix can be built
up instantly. As mentioned in Section 5.1, at the 7 nm technology node,
only contacts located at adjacent horizontal and vertical tracks have to be
confined in the same letter. So in the next step, we pick a value 1 in an
arbitrary sequence in the matrix and look at its 8 neighbors on adjacent
horizontal and vertical tracks. If there are other 1s in the neighborhood, we
cluster them together in a single letter. We repeat the procedure on all the
neighboring 1s sequentially until no more 1s can be clustered in the same
letter. Then a complete letter shape is obtained.
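The clustering step above amounts to finding 8-connected groups of 1s in the binary matrix. The following is a minimal sketch of that idea, assuming the layout is given as a list of rows; the names `extract_letters` and `crop` are hypothetical, and each letter is returned as a tight binary sub-matrix.

```python
def extract_letters(matrix):
    """Group the 1s of a binary layout matrix into 8-connected letters."""
    rows, cols = len(matrix), len(matrix[0])
    seen, letters = set(), []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or (r, c) in seen:
                continue
            seen.add((r, c))
            stack, cluster = [(r, c)], []
            while stack:
                y, x = stack.pop()
                cluster.append((y, x))
                # Visit the 8 neighbors on adjacent tracks.
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and matrix[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            letters.append(crop(cluster))
    return letters


def crop(cluster):
    """Return the cluster as a tight binary sub-matrix (tuple of row tuples)."""
    top = min(y for y, _ in cluster)
    left = min(x for _, x in cluster)
    height = max(y for y, _ in cluster) - top + 1
    width = max(x for _, x in cluster) - left + 1
    cells = set(cluster)
    return tuple(tuple(1 if (top + i, left + j) in cells else 0
                       for j in range(width)) for i in range(height))


layout = [[1, 0, 0, 0],
          [1, 1, 0, 1],
          [0, 0, 0, 0],
          [0, 0, 1, 0]]
print(extract_letters(layout))
```

Each contact is pushed onto the stack at most once, which is consistent with the linear-time claim made later in this section.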
We save the obtained letter into a universal alphabet. If the alphabet is
empty, we directly save the newly obtained letter in it. Otherwise, we try
to match the shape of the new letter with the letters in the alphabet first
to check if it is saved already. Through row by row scanning, each letter
can be saved by a set of strings of 0s and 1s, as illustrated in Fig. 5.6, where
each string represents one valid orientation (flipping or rotation) of the letter.
To save computational effort, when saving a letter, we always let the letter
height be less than or equal to the letter width. So depending on whether
the letter height and width are equal or not, one letter may have 8 or 4 valid
0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 1
0 0 0 0 0 1 0 1 0 0 0
0 1 0 1 0 0 0 0 0 0 0
1 0 0 0 1 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 1
Figure 5.5: Each layout is considered as a binary matrix.
orientations, as illustrated in Fig. 5.6(a) and Fig. 5.6(b) respectively.
(a) width = height
0 1 0
1 1 0
1 1 1
‘0 1 0 1 1 0 1 1 1’
‘1 1 0 1 1 1 1 0 0’
‘1 1 1 0 1 1 0 1 0’
‘0 0 1 1 1 1 0 1 1’
‘0 1 0 0 1 1 1 1 1’
‘1 0 0 1 1 1 1 1 0’
‘1 1 1 1 1 0 0 1 0’
‘0 1 1 1 1 1 0 0 1’
(b) height < width
1 0 0
1 1 1
‘1 0 0 1 1 1’
‘1 1 1 0 0 1’
‘0 0 1 1 1 1’
‘1 1 1 1 0 0’
Figure 5.6: A letter can be saved by 8 or 4 strings of 0s and 1s.
To match two letters, we first compare their height and width. If both
match, we then compare the string representation of the new letter with
each string representation of the letter in the alphabet. If the new letter has
already been saved, we only record that the current layout depends on
the corresponding letter in the alphabet. Otherwise we save it first and then
record the dependency relationship.
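The orientation strings and the matching step can be sketched as follows. This is an illustrative variant rather than the exact data structure used in the implementation: instead of storing all strings per letter, it derives a single canonical key (the smallest orientation tuple) that could serve as a hash-table key. The names `orientations` and `canonical_key` are hypothetical.

```python
def orientations(letter):
    """All (height, width, bit string) forms of a letter with height <= width."""
    def rotate(m):   # 90 degrees clockwise
        return tuple(zip(*m[::-1]))

    def mirror(m):   # left-right flip
        return tuple(row[::-1] for row in m)

    m = tuple(tuple(row) for row in letter)
    keys = set()
    for _ in range(4):
        for v in (m, mirror(m)):
            height, width = len(v), len(v[0])
            if height <= width:
                # Row-by-row scan into a string of 0s and 1s.
                bits = "".join(str(b) for row in v for b in row)
                keys.add((height, width, bits))
        m = rotate(m)
    return keys


def canonical_key(letter):
    """Two letters have the same shape iff their canonical keys are equal."""
    return min(orientations(letter))


wide = [[1, 0, 0], [1, 1, 1]]
print(sorted(bits for _, _, bits in orientations(wide)))
print(canonical_key([[1, 1], [1, 0], [1, 0]]) == canonical_key(wide))
```

For the two letters of Fig. 5.6 this enumeration yields 4 and 8 strings respectively, and a letter given in a tall orientation maps to the same key as its wide counterpart.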
In our algorithm, we process all candidate layouts for every cell sequen-
tially, and maintain a universal alphabet storing all the letters that have ever
been needed. The universal alphabet, together with the letter cost, forms
the letter row in Fig. 5.4, and the cells and their candidate layouts form the
cell column and layout column, respectively. As mentioned previously, the
dependency relationships between the layouts and the letters are memorized
while each candidate layout is being processed. In our implementation, the
universal alphabet is saved in a hash table with the hash function defined
by the letter size and its string representations, such that it takes constant
time to perform letter matching. Then in order to build up the dependency
table, each contact in each cell layout is visited only once. Therefore, the
time complexity to solve the cell letter determination problem is linear with
respect to the total number of contacts in all candidate layouts.
5.5 Alphabet Optimization
In this section, we first prove that AOP is NP-hard. We then formulate the
problem as a Weighted Partial MAX-SAT problem and obtain the optimal
solution using a public SAT solver [57]. Following that, we propose a bounded
approximation algorithm that solves the problem more efficiently.
5.5.1 Proof of NP-Hardness
Theorem 5.1. AOP is NP-hard.
Proof: We prove that AOP is NP-hard by reduction from the Set
Covering Problem (SCP), which is a classical NP-hard problem [60].
SCP
Given a universal set U and a number of subsets whose union equals
U , the objective is to identify the smallest number of subsets whose
union still contains all elements in U .
Let ⟨U, S⟩ be an instance of the SCP problem. Assume U = {e_1, e_2, ..., e_m}
and S = {s_1, s_2, ..., s_n}, where each element in S is a subset of U. Then the
original SCP problem is to identify the minimum number of elements in S,
denoted by {s_{i_1}, s_{i_2}, ..., s_{i_k}}, such that s_{i_1} ∪ s_{i_2} ∪ ... ∪ s_{i_k} = U.
Given an SCP instance, we construct the AOP instance as follows. We
construct a standard cell library L with m cells (L = {c1, c2, ..., cm}), one
cell for each element in U . Then we construct an alphabet A with n letters
(A = {t1, t2, ..., tn}) of identical cost, one letter for each element in S. Next,
iff ei ∈ sj, we create a candidate layout for cell ci, which can be patterned
by letter tj only.
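The construction can be made concrete with a short sketch. It builds the dependency table of an AOP instance from an SCP instance, under the assumption that the table is represented as a mapping from each cell to its list of candidate layouts, each layout being the set of letters it depends on. The function name `scp_to_aop` and the labels `c1`, `t1`, ... are hypothetical.

```python
def scp_to_aop(universe, subsets):
    """Build an AOP instance (cells, letter costs) from an SCP instance."""
    # One letter per subset, all with identical cost.
    cost = {f"t{j}": 1 for j in range(1, len(subsets) + 1)}
    cells = {}
    for i, element in enumerate(universe, start=1):
        # One single-letter candidate layout for each subset containing e_i.
        cells[f"c{i}"] = [frozenset({f"t{j}"})
                          for j, s in enumerate(subsets, start=1)
                          if element in s]
    return cells, cost


cells, cost = scp_to_aop([1, 2, 3], [{1, 2}, {2, 3}, {3}])
for cell, layouts in cells.items():
    print(cell, [sorted(layout) for layout in layouts])
```

The double loop over elements and subsets reflects the O(mn) construction time used in the NP-hardness argument.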
Lemma 5.1. There are k elements in S, denoted by C = {s_{i_1}, s_{i_2}, ..., s_{i_k}},
whose union equals U, iff there are k letters in A, denoted by T = {t_{i_1}, t_{i_2}, ..., t_{i_k}},
capable of patterning all cells in L.
Proof:
• Set Cover ⇒ Alphabet: Since the union of the k elements in C
equals U, any element in U must be covered by at least one element
in C. W.l.o.g., suppose e_j ∈ s_{i_m} (1 ≤ m ≤ k). Then according to the
construction, cell c_j has a candidate layout that can be patterned by
letter t_{i_m} only. Hence, t_{i_m} is able to pattern cell c_j. Similarly, every cell
can be patterned by at least one letter in T. Therefore, T is capable of
patterning all cells in L.
• Alphabet ⇒ Set Cover: We know that the k letters in T can pattern
all cells in L, and according to the construction, each cell only needs
one letter to be patterned. W.l.o.g., suppose c_j can be patterned by
t_{i_m} (1 ≤ m ≤ k). Then c_j must have a candidate layout that can be
patterned by t_{i_m}. Since the candidate layout is created iff e_j ∈ s_{i_m}, the
corresponding element e_j in U must be covered by at least one element in
C. Therefore, the union of all elements in C equals U.
Lemma 5.1 implies that the minimum number of elements in S comprising
U can be obtained by identifying the minimum number of letters capable
of patterning L. Since all letters have the same cost, the alphabet with the
minimum number of letters has the lowest total cost. Thus, the SCP problem
can be solved by solving the AOP problem. Since the reduction from SCP to
AOP takes polynomial time (O(mn)), we conclude that AOP is NP-hard.
5.5.2 Weighted Partial Max-SAT Formulation
In this subsection, we formulate AOP defined in Definition 5.3 as a Weighted
Partial Max-Sat problem.
Weighted Partial Max-Sat
Given a satisfiability formula in conjunctive normal form (CNF),
where the clauses are divided into those that must be satisfied (hard)
and those that may or may not be satisfied (soft), weights are assigned
to each soft clause as the penalty to falsify the clause. The objective
is to find an assignment that satisfies all hard clauses such that the sum
of the weights of the falsified soft clauses is minimal.
Suppose there are n letters in the universal alphabet and m cells in the
standard cell library. We introduce n Boolean variables, denoted by {b_1, b_2, ..., b_n},
one for each letter. Setting the variable b_i to true indicates that the i-th letter
is included in the final alphabet. As mentioned previously, each cell layout
depends on a subset of letters, and in order to pattern a cell, the Boolean
conditions for at least one of its candidate layouts must be satisfied. Such a
dependency relationship can be expressed by a satisfiability formula. For
example, the notional table in Fig. 5.4 can be expressed by equation (5.2),
which can be easily converted to CNF by applying DeMorgan’s law.
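\[ f = \underbrace{\bigl(\underbrace{(b_2 \wedge b_3)}_{\text{Layout } A_1} \vee \underbrace{(b_3 \wedge b_4)}_{\text{Layout } A_2} \vee \underbrace{(b_1 \wedge b_2)}_{\text{Layout } A_3}\bigr)}_{\text{Cell } A} \wedge \underbrace{\bigl(\underbrace{b_4}_{\text{Layout } B_1} \vee \underbrace{(b_2 \wedge b_3)}_{\text{Layout } B_2}\bigr)}_{\text{Cell } B} \tag{5.2} \]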
There is always a feasible assignment for equation (5.2) because f is
obviously true when all the Boolean variables are set to true, which
corresponds to the trivial situation where the universal alphabet is equal to the
final alphabet such that all candidate layouts can be patterned. In order
to minimize the total cost of the final alphabet, we introduce n additional
clauses, one negated variable per letter, and assign each the cost of the
corresponding letter. The updated Boolean expression
is shown in equation (5.3), where f denotes the right hand expression
in equation (5.2).
\[ g = f \wedge \bar{b}_1 \wedge \bar{b}_2 \wedge \bar{b}_3 \wedge \bar{b}_4 \tag{5.3} \]
The clauses in f must be satisfied because the final alphabet must be
capable of patterning every cell in the library. So we consider every clause in
f as a hard clause. The n additional clauses are considered as soft clauses,
and the weight of each soft clause is exactly the same as the cost of the
corresponding letter.
Now we have formulated AOP as a Weighted Partial Max-Sat problem.
According to its definition, in the optimal solution, the sum of the weights
of the falsified clauses is minimal. Based on equation (5.3), a soft clause
is falsified iff the Boolean variable in that clause is set to be true, meaning
that the corresponding letter is included in the final alphabet. Since the
weight values are identical to the cost values, the optimal solution obtained
by solving the Weighted Partial Max-Sat problem is exactly the optimal
solution of AOP. Note that there is always a feasible solution where every
soft clause is falsified, corresponding to every letter being included in the
final alphabet.
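To make the formulation concrete, the sketch below solves the notional instance of Fig. 5.4 by exhaustive enumeration. It is not a SAT solver and is only meant to show what the hard and soft clauses express; the letter costs are assumed values, and the name `solve_aop_bruteforce` is hypothetical.

```python
from itertools import product


def solve_aop_bruteforce(table, cost):
    """Minimum-cost alphabet by trying every assignment of the b_i variables."""
    letters = sorted(cost)
    best = None
    for bits in product([False, True], repeat=len(letters)):
        chosen = {t for t, b in zip(letters, bits) if b}
        # Hard clauses: every cell needs one layout whose letters are all chosen.
        if not all(any(layout <= chosen for layout in layouts)
                   for layouts in table.values()):
            continue
        # Each chosen letter falsifies its soft clause (not b_i) and pays c(t_i).
        total = sum(cost[t] for t in chosen)
        if best is None or total < best[0]:
            best = (total, chosen)
    return best


# Dependency table of Fig. 5.4; the costs are illustrative assumptions.
table = {
    "A": [{"t2", "t3"}, {"t3", "t4"}, {"t1", "t2"}],
    "B": [{"t4"}, {"t2", "t3"}],
}
cost = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}
print(solve_aop_bruteforce(table, cost))
```

The enumeration is exponential in the number of letters, which is why a dedicated solver or the approximation algorithm of the next subsection is needed for realistic libraries.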
5.5.3 Approximation Algorithm
Solving the Weighted Partial Max-Sat problem is also NP-hard [60], and the
runtime may increase exponentially with the problem scale. In this subsec-
tion, we propose an ln (n)-approximation algorithm [61] (called AOPAPX) to
solve AOP efficiently, where n denotes the number of cells in the standard cell
library. We first introduce some notations in order to facilitate introducing
the algorithm.
• Let U = {t_1, t_2, ..., t_m} denote the universal alphabet with m letters.
• Let c(t_i) denote the cost of t_i (1 ≤ i ≤ m).
• Let A_k denote the alphabet obtained in the k-th step, where A_k ⊆ U.
• Let L = {e_1, e_2, ..., e_n} denote the standard cell library with n cells.
• Let C_k denote the subset of cells that can be patterned by A_k, where
C_k ⊆ L.
• Let OPT denote the cost of the optimal alphabet.
• Let ALG denote the cost of the alphabet obtained by AOPAPX.
Now AOPAPX can be described as follows.
First, if a letter is required by all candidate layouts of a cell, the letter
must be included in the final alphabet. So in the first step, all such letters are
saved in A0. Note that whenever a letter is included in A0, the dependency
is updated such that no cell layout depends on that letter any longer. A
candidate layout of a cell depending on no letters implies that it can be
patterned by A0. So the cell is saved in C0, and the corresponding layout is
utilized for that cell.
Second, for each layout l of the remaining cells in L \ C_{k−1}, we compute its
price p(l) by equation (5.4), where N_k denotes the number of cells that can be
simultaneously patterned if l is utilized next. Example 5.1 illustrates how
the price of each layout is computed.
\[ p(l) = \frac{\sum_{j} c(t_j)}{N_k}, \quad t_j \in U \setminus A_{k-1} \text{ and } l \text{ depends on } t_j \tag{5.4} \]
Example 5.1. Suppose in the k-th step, three cells have not been patterned yet,
each with a single candidate layout, namely l_1, l_2 and l_3. Layout l_1 depends on
t_1, t_2 and t_3; l_2 depends on t_2 and t_3; l_3 depends on t_3 ({t_1, t_2, t_3} ⊆ U \ A_{k−1}).
Then in order to pattern l_1, all three letters (t_1, t_2 and t_3) have to be saved
in A_k. Therefore, the price for l_1 in the k-th step is (c(t_1) + c(t_2) + c(t_3))/3. Similarly,
the prices for l_2 and l_3 are (c(t_2) + c(t_3))/2 and c(t_3)/1, respectively.
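The prices in Example 5.1 can be reproduced with a few lines of code. The sketch below assumes that layouts are represented as sets of letters and that the letter costs are 1, 2 and 3; both the representation and the name `layout_price` are illustrative assumptions.

```python
def layout_price(layout, pending, cost, alphabet):
    """Eq. (5.4): cost of the new letters divided by the cells they pattern."""
    new_letters = layout - alphabet
    after = alphabet | layout
    # N_k: pending cells with at least one layout fully covered afterwards.
    n_k = sum(1 for layouts in pending.values()
              if any(l <= after for l in layouts))
    return sum(cost[t] for t in new_letters) / n_k


# Example 5.1 with assumed costs c(t1) = 1, c(t2) = 2, c(t3) = 3.
l1, l2, l3 = {"t1", "t2", "t3"}, {"t2", "t3"}, {"t3"}
pending = {"e1": [l1], "e2": [l2], "e3": [l3]}
cost = {"t1": 1, "t2": 2, "t3": 3}
for name, layout in (("l1", l1), ("l2", l2), ("l3", l3)):
    print(name, layout_price(layout, pending, cost, alphabet=set()))
```

Utilizing l_1 patterns all three cells at once, which is why its denominator is 3 even though it needs the most letters.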
Third, the layout with the lowest price (called l_min) is utilized, and
consequently, the letters that l_min depends on are all saved in A_k. Then, the
dependency is updated according to the newly added letters. If a cell in
L \ C_{k−1} has a layout that no longer depends on any other letters, the layout
is considered to have the same price as l_min and the cell is saved to C_k
immediately. In Example 5.1, suppose c(t_1) < c(t_2) < c(t_3), implying that l_1 has the
lowest price and is utilized next. Consequently, t_1, t_2 and t_3 are all saved in A_k.
After that, l_2 and l_3 can be patterned by A_k as well, so they are considered
to have the same price as l_1, and all three cells are saved to C_k immediately.
Fourth, repeat from the second step until Ck = L.
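The four steps can be sketched in runnable form as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this chapter: the dependency table is a mapping from each cell to a non-empty list of candidate layouts (sets of letters), ties are broken by iteration order, and the name `aopapx` and the example costs are hypothetical.

```python
def aopapx(table, cost):
    """Greedy alphabet selection; returns (alphabet, chosen layout per cell)."""
    alphabet, chosen = set(), {}

    def absorb():
        # Save every cell that already has a layout covered by the alphabet.
        for cell, layouts in table.items():
            if cell not in chosen:
                for layout in layouts:
                    if layout <= alphabet:
                        chosen[cell] = layout
                        break

    # Step 1: letters required by all candidate layouts of a cell are mandatory.
    for layouts in table.values():
        alphabet |= set.intersection(*map(set, layouts))
    absorb()

    # Steps 2 to 4: repeatedly utilize the layout with the lowest price.
    while len(chosen) < len(table):
        best = None
        for cell, layouts in table.items():
            if cell in chosen:
                continue
            for layout in layouts:
                after = alphabet | layout
                n_k = sum(1 for c, ls in table.items()
                          if c not in chosen and any(x <= after for x in ls))
                price = sum(cost[t] for t in layout - alphabet) / n_k
                if best is None or price < best[0]:
                    best = (price, layout)
        alphabet |= best[1]
        absorb()
    return alphabet, chosen


# Notional table of Fig. 5.4 with assumed letter costs.
table = {
    "A": [{"t2", "t3"}, {"t3", "t4"}, {"t1", "t2"}],
    "B": [{"t4"}, {"t2", "t3"}],
}
cost = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}
print(aopapx(table, cost))
```

On this small instance the greedy choice coincides with the optimum, since layout A1 has the lowest price (5/2) and also patterns cell B through B2; in general only the logarithmic bound proved below is guaranteed.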
The pseudocode is shown in Algorithm 2.
Theorem 5.2. AOPAPX is an ln (n)-approximation algorithm.
Proof: Because the first step of AOPAPX (line 3 to line 12 in Algorithm 2)
is an optimal approach, we only need to prove that the following steps are
ln (n)-approximation.
In the final solution, one layout is utilized for each cell. Let us order
the layouts in the order that they became patternable, and denote the ith
patternable layout by li.
Algorithm 2: AOPAPX Algorithm
Data: U = {t1, t2, ..., tm}, L = {e1, e2, ..., en}
Result: Optimal alphabet A
 1  A ⇐ ∅;
 2  C ⇐ ∅;
 3  foreach t ∈ U do
 4      foreach e ∈ L do
 5          if t is a must for e then
 6              A ⇐ A ∪ {t};
                // Update dependency
 7              update L;
                // Save patternable cells in C
 8              update C;
 9              break;
10          end
11      end
12  end
13  while C ≠ L do
14      forall e ∈ L \ C do
15          compute layout prices;
16      end
17      l ⇐ layout with lowest price;
18      T ⇐ letters required by l;
19      A ⇐ A ∪ T;
        // Update dependency
20      update L;
        // Save patternable cells in C
21      update C;
22  end

Lemma 5.2.
\[ ALG = \sum_{i=1}^{n} p(l_i) \]
Proof: Suppose in the kth step, Nk cells are simultaneously patternable
by Lk letters, denoted by {t1, t2, ..., tLk}. Let ALGk denote the incremental
cost in the kth step, and we have
\[ ALG_k = \sum_{i=1}^{L_k} c(t_i) = N_k \times \frac{\sum_{i=1}^{L_k} c(t_i)}{N_k} \tag{5.5} \]
Since in each step, the simultaneously patternable layouts, one for each
cell, have the same lowest price, according to equation (5.4) we have
\[ N_k \times \frac{\sum_{i=1}^{L_k} c(t_i)}{N_k} = \sum_{j} p(l_j) \quad (l_j \text{ patternable in the } k\text{th step}) \tag{5.6} \]
Together we have
\[ ALG = \sum_{k} ALG_k = \sum_{k} \sum_{j} p(l_j) = \sum_{i=1}^{n} p(l_i) \tag{5.7} \]
Lemma 5.3.
\[ \sum_{i=1}^{n} p(l_i) \le (1 + \ln n)\, OPT \]
Proof: Just before the ith layout li became patternable, at most i − 1 cells
had been saved in Ck, so at least n − i + 1 cells remained. Since the optimal
alphabet is able to pattern all the remaining cells at a cost of at most OPT,
the "per-cell" price is at most OPT/(n − i + 1).
Because the layout with the lowest price is picked at each step, we have
\[ p(l_i) \le \frac{OPT}{n - i + 1} \tag{5.8} \]
Thus, the total cost of all utilized layouts is at most
\[ \sum_{i=1}^{n} p(l_i) \le \sum_{i=1}^{n} \frac{OPT}{n - i + 1} = \left(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right) OPT \le (1 + \ln n)\, OPT \tag{5.9} \]
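For completeness, the last inequality in (5.9) is the standard integral bound on the harmonic sum. A short derivation, using only the fact that 1/i ≤ 1/x for x in [i − 1, i], is:

```latex
\[
\sum_{i=1}^{n} \frac{1}{i}
  = 1 + \sum_{i=2}^{n} \frac{1}{i}
  \le 1 + \sum_{i=2}^{n} \int_{i-1}^{i} \frac{dx}{x}
  = 1 + \int_{1}^{n} \frac{dx}{x}
  = 1 + \ln n .
\]
```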
Combining Lemma 5.2 and Lemma 5.3 we have
\[ ALG \le (1 + \ln n)\, OPT \tag{5.10} \]
Thus, AOPAPX is an ln (n)-approximation algorithm.
5.6 Feasible Letters for Via Patterning
As mentioned in Section 5.2, some letters might be infeasible to print using
conventional lithography. So far, we consider only the five letters shown
in Fig. 5.7 as feasible letters for via patterning, based on the reports in [8]
and [62]. Note that as the DSA process develops, overlay can be controlled
more accurately and some other letters may become feasible as well. Our
proposed detailed routing algorithm can be easily extended to consider more
feasible letters.
Figure 5.7: Feasible letters for via patterning: (a) single-hole letter;
(b) regular two-hole letter; (c) diagonal two-hole letter; (d) three-hole
letter; (e) four-hole letter.
In addition, even among the feasible letters shown in Fig. 5.7, the overlay
control of the inside vias still differs, e.g., the single-hole letter has
better overlay accuracy than the other letters. We assign each letter a cost
value that reflects its overlay accuracy: a letter with better overlay
accuracy has a lower cost. Based on the feasible letters, we summarize all
the feasible via patterns for the DSA process in Fig. 5.8, where the cost of
each pattern is equal to the cost of the corresponding letter used to print
it. Note that three vias in an L-shape can be patterned by a four-hole letter
by inserting a dummy via, as shown in Fig. 5.9. Therefore, the cost of an
L-shaped three-via pattern is equal to the cost of a four-hole letter.
Figure 5.8: All the feasible via patterns that can be printed using DSA
guiding templates.
Figure 5.9: Three vias in L-shape can be patterned with a four-hole letter
by inserting a dummy via. (Legend: functional via, dummy via, guiding
template.)
5.7 DSA-Aware Detailed Routing Problem
In the previous section, we presented all feasible via patterns in
Fig. 5.8. If any other pattern exists after detailed routing is completed,
infeasible letters will be needed and the entire via layer is no longer
compatible with the DSA process. Therefore, our primary objective is to
compose only feasible patterns while inserting vias. In addition, as mentioned
previously, each feasible pattern is assigned a cost value that describes
its overlay accuracy. Since lower-cost patterns have better overlay
accuracy, which leads to higher yield, it is preferable to compose lower-cost
patterns whenever possible. For example, suppose a total of three vias need
to be inserted. It is preferable to pattern each via individually using three
single-hole letters instead of patterning all vias together with a three-hole
letter. So our secondary objective is to preferentially compose lower-cost
patterns. Based on the above analysis, we define the DSA-aware detailed
routing problem as follows.
Definition 5.4. DSA-Aware Detailed Routing Problem
Given a set of feasible via patterns S, where each pattern has a cost
value, detailed routing with via insertion is performed such that each
produced via layer can be partitioned into a subset of patterns in S,
subject to the constraint that vias on adjacent tracks (horizontally,
vertically or diagonally) belong to the same pattern, and lower-cost
patterns have higher priorities than higher-cost patterns to be com-
posed by the inserted vias.
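The partitioning constraint in Definition 5.4 can be illustrated with a small Python sketch that groups vias on adjacent tracks (horizontally, vertically or diagonally) into one pattern and then maps each group to a letter of Fig. 5.7. The function names are hypothetical, and the shape rules in `classify` are assumptions inferred from the text (a three-hole letter is taken to be three collinear vias, and a four-hole letter a 2 × 2 block that also serves an L-shaped triple); the actual set of feasible patterns is the one given in Fig. 5.8.

```python
from collections import deque


def group_vias(vias):
    """Split via grid positions into clusters of 8-connected vias."""
    remaining = set(vias)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters


def classify(cluster):
    """Return the letter of Fig. 5.7 for a cluster, or "infeasible".

    Assumed shapes (not taken from Fig. 5.8 itself):
    a = single via, b = two vias side by side, c = two vias on a diagonal,
    d = three collinear vias, e = 2 x 2 block (also used for an L-shape).
    """
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    n = len(cluster)
    if n == 1:
        return "a"
    if n == 2:
        return "b" if 1 in (w, h) else "c"
    if n == 3 and sorted((w, h)) == [1, 3]:
        return "d"
    if n in (3, 4) and (w, h) == (2, 2):
        return "e"
    return "infeasible"
```

A via layer satisfies the primary objective, under these assumptions, when no cluster returned by `group_vias` is classified as infeasible.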
5.8 Detailed Routing Scheme
A negotiated congestion based scheme [49] is adopted in our DSA-aware
detailed routing algorithm. In order to reduce the adverse effect of improper
net ordering, wire crossing conflicts and infeasible via patterns are initially
allowed, and then resolved over iterations of rip-up and reroute. The key
subproblem of the negotiated congestion based routing scheme is how to
perform maze routing for a single net on the routing graph in the presence
of a set of previously routed nets. In this section, we first solve the key
subproblem by adopting the shortest path algorithm on the routing graph;
then we present the overall negotiated congestion based routing scheme.
5.8.1 Subproblem: Single Net Routing
First, we convert the original routing grid into a routing graph G = (V,E) by
regarding every segment intersection as a vertex in V and segments between
vertices as edges in E, and each vertex is assigned a cost value. We
update the cost for a subset of vertices in V whenever a net is successfully
routed. Based on Fig. 5.7, we denote the cost of a single-hole template
(Fig. 5.7 (a)), a regular two-hole template (Fig. 5.7 (b)), a diagonal two-hole
template (Fig. 5.7 (c)), a three-hole template (Fig. 5.7 (d)) and a four-hole
template (Fig. 5.7 (e)) by ca, cb, cc, cd and ce respectively, and let ci denote
the cost of an infeasible template. Initially, since there are no vias existing
in the layout, every vertex in the via layers has via cost ca, as shown in
Fig. 5.10 (a). When a via is inserted by the previous net routed, its adjacent
via cost is updated accordingly. As illustrated in Fig. 5.10 (b), the grid filled
in red denotes an inserted via by the previous net routed. Then if routing the
current net results in a new via inserted at any one of the green grids, a regular
two-hole via pattern will be composed. Otherwise if a via is inserted at any
blue grid, a diagonal two-hole pattern will be composed. In consequence,
the via cost of the corresponding vertices is updated accordingly. Similarly,
in Fig. 5.10 (c) where two vias have been inserted, a new via inserted at
the yellow grid introduces a three-hole pattern, and inserting another via
at either of the black grids introduces an infeasible pattern. Note that in
Fig. 5.10 (d) where three vias already exist, another via inserted at either of
the purple grids introduces an L-shaped three-hole via pattern that must be
printed using a four-hole template. In consequence, the via cost of these two
vertices is equivalent to a four-hole template cost.
Figure 5.10: The via cost of a subset of vertices is updated once vias are
inserted by a net routed. (a) A blank via layer. (b) Via cost update after
one via is inserted. (c) Via cost update after two vias are inserted.
(d) Via cost update after three vias are inserted. (Legend: inserted via;
via cost = ca, cb, cc, cd, ce, ci.)
From Figure 5.10 we can see that whenever a new via is inserted, it impacts
the via cost of its neighboring vertices. Therefore, we update the cost values
of all such vertices once a net is routed and vias are introduced. Then on
the updated graph, the weighted sum cp = lp + α × vp is a good cost metric
to minimize when routing a new net, where lp and vp denote the wire length
cost and the total via cost of the computed path p, respectively, and
α is a user-defined parameter that specifies their relative importance.
To minimize cp, we run Dijkstra's shortest path algorithm to
find the optimal path p that connects the source and the target pins.
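As an illustration of minimizing cp = lp + α × vp, the following Python sketch runs Dijkstra's algorithm on a small two-layer grid. It is a simplified model under stated assumptions, not the router implemented in this thesis: the function name `route_net`, the unit wire cost per grid step, the convention that layer 0 carries horizontal wires and layer 1 vertical wires, and the default via cost of 1.0 for positions missing from `via_cost` are all hypothetical.

```python
import heapq


def route_net(width, height, via_cost, source, target, alpha=1.0):
    """Dijkstra over states (x, y, layer); returns (c_p, path) or None.

    via_cost maps (x, y) to the current via cost of that vertex
    (for example c_a, c_b, ... after the updates of Fig. 5.10).
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:          # trace the path back to the source
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist[node]:
            continue                     # stale heap entry
        x, y, layer = node
        if layer == 0:                   # horizontal wires only
            steps = [((x - 1, y, 0), 1.0), ((x + 1, y, 0), 1.0)]
        else:                            # vertical wires only
            steps = [((x, y - 1, 1), 1.0), ((x, y + 1, 1), 1.0)]
        # Changing layers inserts a via, weighted by alpha.
        steps.append(((x, y, 1 - layer), alpha * via_cost.get((x, y), 1.0)))
        for nxt, w in steps:
            nx, ny, _ = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None
```

Raising the via cost of a vertex (for instance to ci where a via would create an infeasible pattern) steers the path toward other via locations, at the price of extra vias or wire length.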
5.8.2 Overall Routing Scheme
In this subsection, we present the overall negotiated congestion based rout-
ing scheme for the DSA-aware detailed routing problem. We let the nets
negotiate for routing resources by adding history cost to all vertices in V ,
and resolve wire crossing conflicts and infeasible via patterns over iterations
of rip-up and reroute. The cost of a vertex v is computed by the following
formula:
\[ cost(v) = \alpha \times h_c \times n_c + \beta \times h_i \tag{5.11} \]
where hc denotes the history cost for a wire crossing conflict, nc denotes the
number of pre-routed nets having crossing conflict with the current vertex,
and hi denotes the history cost for infeasible via patterns. All history costs
are initialized to 1. The scheme works as follows. We first route all nets
sequentially in a random order on G. Then as long as wire crossing con-
flicts or infeasible via patterns exist, iterations of rip-up and reroute will
be performed. When a net i is ripped-up and rerouted, we first remove its
current route, and update the cost values on the vertices it impacts. Then
the shortest path algorithm is performed to compute a new path for net i,
and newly impacted vertices are updated accordingly. In addition, if the new
path causes any conflicts with previously routed nets, the history cost (i.e.,
hc or hi) on the corresponding vertices is incremented by 1. In this way, the
conflicting vertices grow more expensive over iterations, and those nets with
more options will tend to choose alternative routes in subsequent iterations,
so that the conflicts can potentially be resolved. In our implementation, this
procedure terminates when neither wire crossing conflicts nor infeasible
via patterns remain, or when enough iterations have been performed.
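The bookkeeping behind equation (5.11) can be sketched as follows. This is only an illustration of how the history costs make a contested vertex more expensive over iterations; the class name `History`, its methods, and the default weights α = β = 1 are assumptions made for the example.

```python
def vertex_cost(h_c, n_c, h_i, alpha=1.0, beta=1.0):
    """Equation (5.11): cost(v) = alpha * h_c * n_c + beta * h_i."""
    return alpha * h_c * n_c + beta * h_i


class History:
    """Per-vertex history costs h_c and h_i, both initialized to 1."""

    def __init__(self):
        self.h_c = {}
        self.h_i = {}

    def penalize(self, crossing=(), infeasible=()):
        """Increment the history cost of every conflicting vertex by 1."""
        for v in crossing:
            self.h_c[v] = self.h_c.get(v, 1) + 1
        for v in infeasible:
            self.h_i[v] = self.h_i.get(v, 1) + 1

    def cost(self, v, n_c, alpha=1.0, beta=1.0):
        """Current cost of vertex v, given n_c crossing nets at v."""
        return vertex_cost(self.h_c.get(v, 1), n_c, self.h_i.get(v, 1),
                           alpha, beta)
```

For example, a vertex that caused a wire crossing conflict in two consecutive iterations has h_c = 3, so with one crossing net its cost grows from 2 to 4 under the default weights.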
5.8.3 Special Challenge for DSA-Aware Detailed Routing
While routing a single net, it is very likely that more than one via is inserted
by the computed shortest path p. Dijkstra's shortest path algorithm
is performed on a fixed graph where the cost of vertices is determined by
the previously routed nets, and hence it is unable to consider the conflicts
between the newly inserted vias introduced by p. For example, suppose net
1 is pre-routed, occupying two vias a and b as shown in Fig. 5.11 (a). When
routing net 2, individually occupying via c or via d is allowed, since it
generates either a three-hole template in Fig. 5.11 (b) or a single-hole
template and a two-hole template in Fig. 5.11 (c). However, if both via c and
via d are occupied simultaneously, the result is an infeasible template in
Fig. 5.11 (d), which is not allowed.
Figure 5.11: The shortest path algorithm cannot handle via pattern
conflicts introduced by the current path. (a) Net 1 occupies via a and via b.
(b) Net 2 occupies via c, introducing a three-hole template. (c) Net 2
occupies via d, introducing a single-hole and a two-hole template. (d) Net 2
occupies both via c and via d, introducing an infeasible template.
This problem can be automatically resolved by the proposed negotiated
congestion based routing scheme. As mentioned in the previous subsection,
infeasible patterns are initially allowed while computing the shortest path p
for single net routing. However, while tracing back p, the infeasible patterns
are captured and the history cost hi for all vias involved in those patterns is
increased. For example, in Fig. 5.11 (d), when tracing back net 2, an infeasible
four-hole pattern consisting of vias a, b, c and d is captured, and hence the
history cost hi of the corresponding vertices is incremented. In this
way, the locations where infeasible patterns have occurred become more
expensive for via insertion in subsequent iterations, so that the infeasible
patterns can potentially be resolved over iterations.
5.9 Experimental Results
We implement our approximation algorithm AOPAPX in C++ and employ
SAT4J [57] as the SAT solver. Then we design a 10 nm 1D standard
cell library with 60 individual cells. To better illustrate the performance of
AOPAPX as well as the cost trend, we test the algorithm on 6 subsets of the
entire library. The experiments are performed on a Linux workstation with
a 2.8 GHz CPU and 4 GB of memory.
Table 5.1: Experimental Results for AOPAPX
# Cells  # Layouts  Univ. Alph.     Rand. Alph.     APX. Alph.     OPT. Alph.     Runtime (s)
                    Size  Cost      Size  Cost      Size  Cost     Size  Cost
10       1186       113   72.69     9     2.67      5     1.0      5     1.0      0.01
20       2445       176   127.0     14    4.81      7     1.67     7     1.67     0.03
30       4033       313   229.89    16    6.25      8     2.17     8     2.17     0.09
40       7066       425   373.94    19    7.89      10    2.64     10    2.64     0.25
50       8626       443   394.78    26    11.83     10    2.64     10    2.64     0.33
60       9583       486   435.61    31    12.69     10    2.64     10    2.64     0.40
The experimental results are displayed in Table 5.1. The first column
shows the number of cells in each test case, and the second column shows
the total number of candidate layouts obtained through enumeration. Note
that if several candidate layouts of one cell depend on the same alphabet,
we consider them as identical and keep only one of them in the experiments.
Then the third and fourth columns show the size (i.e. the number of letters)
and the normalized cost of the ‘Universal Alphabet’, which is obtained by
enumeration and comprises all the letters that the candidate layouts depend
on. In order to validate the effectiveness of AOPAPX, we first randomly
pick a candidate layout for each cell and compute the ‘Random Alphabet’.
Then we globally optimize the entire library with AOPAPX to obtain the
‘APX. Alphabet’. Comparing the results in the corresponding columns we
can see that both the alphabet size and the alphabet cost are reduced ef-
fectively by the proposed optimization approach. In addition, comparing
the ‘APX. Alphabet’ with the optimal alphabet (‘OPT. Alphabet’) we also
observe that the optimal solutions are successfully obtained for every test
case by AOPAPX. Finally, the last column shows the runtime of AOPAPX.
Although AOPAPX is an ln (n)-approximation algorithm, the experimental
results show that the performance of AOPAPX is remarkably promising in
practice, in the sense that each test case can be optimally solved within one
second. Figure 5.12 illustrates the experimental results of co-optimizing three
cells together. Before optimization, the alphabet contains seven letters and
the cost is high. After optimization, only three simple letters are needed in
the final alphabet, and the alphabet cost is reduced tremendously.
Figure 5.12: The experimental results co-optimizing three standard cells.
(Panels: 1D standard cell library and alphabet, before optimization and
after optimization.)
Comparing the last three rows in Table 5.1 we observe that the optimal
alphabet is identical for the test cases 4, 5 and 6. This is because the alphabet
obtained in test case 4 is already capable of patterning all the remaining cells
in test case 5 and test case 6, and hence it is not necessary to expand the
alphabet size even further. In standard cell library development for a certain
application, it is highly possible that the alphabet may reach a ‘saturation’
size, where further expanding the library scale will not have much impact
on the alphabet cost. Our approach is extremely useful in obtaining the
‘saturated’ alphabet with the lowest DSA manufacturing cost in practical
applications.
The DSA-aware detailed routing algorithm is implemented in C++ on
a Linux machine with 1.7GHz CPU and 4GB RAM. Figure 5.13 shows a
toy example that compares the via layer produced by a conventional router
and our proposed DSA-aware router. The initial input netlist is provided
in Fig. 5.13 (a). Then Fig. 5.13 (b) illustrates the results of conventional
detailed routing. From Fig. 5.13 (b) we can see that three infeasible via
patterns are introduced inside the dashed red circles, meaning that the eleven
conflicting vias are unprintable using feasible letters. Hence, the via layer is
not compatible with the DSA process. The routing results produced by the
DSA-aware detailed router are shown in Fig. 5.13 (c), where no infeasible
via pattern exists, and in consequence, the entire via layer can be patterned
properly using feasible letters only.
Figure 5.13: The comparison between conventional detailed routing and
DSA-aware detailed routing performed on a toy netlist. (a) Input netlist.
(b) Routing results w/o considering DSA template constraints. (c) Routing
results produced by the DSA-aware detailed router. (Legend: pin, via,
horizontal wire on layer 1, vertical wire on layer 2.)
We then perform the experiments on a set of benchmarks with different
netlist sizes, and the experimental results are shown in Table 5.2. The first
column shows the number of nets in each test case. Then each pair of the
remaining columns compares the conventional routing results and the DSA-
aware routing results in terms of total number of vias, the number of conflict-
ing vias, total wire length, required routing area and runtime, respectively.
Compared to the conventional router, the DSA router inserts fewer vias for
each test case, and all those vias can be patterned by feasible letters. In other
words, the via layers produced by our proposed router are completely compatible
with the DSA process. The DSA router does incur overhead relative to the
conventional router: in Table 5.2 the wire length increases by roughly 2% to 4%,
the required routing area by roughly 12% to 51%, and the runtime by roughly
27% to 76%. Nevertheless, enabling the DSA process for via layer patterning in
the 7 nm technology node will tremendously reduce the manufacturing cost and
improve the throughput for IC fabrication.
Table 5.2: Comparison between Conventional Detailed Routing and
DSA-aware Detailed Routing
# Nets  # Vias          # Conf. Vias   Wire Length (µm)    Area (µm²)        Runtime (s)
        Conv.   DSA     Conv.   DSA    Conv.     DSA       Conv.    DSA      Conv.  DSA
1k      2328    2168    1036    0      429.64    447.08    27.04    40.96    7      11
2k      4552    4272    1746    0      882.52    897.28    64       84.64    19     29
3k      6654    6360    2435    0      1310.76   1349.92   100      144      45     57
4k      8818    8582    2587    0      1758.68   1801.64   144      174.24   60     102
5k      11122   10904   3639    0      2202.64   2259.28   174.24   196      99     174
6k      13640   13316   5298    0      2646.8    2737.88   184.96   219.04   232    354
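The relative differences between the two routers can be recomputed directly from Table 5.2. The short Python sketch below does this arithmetic; the dictionary layout and the helper name `overhead` are chosen for this example only, and the numbers are copied from the table.

```python
# (conventional, DSA-aware) values copied from Table 5.2.
TABLE_5_2 = {
    "1k": {"vias": (2328, 2168), "wire": (429.64, 447.08),
           "area": (27.04, 40.96), "time": (7, 11)},
    "2k": {"vias": (4552, 4272), "wire": (882.52, 897.28),
           "area": (64, 84.64), "time": (19, 29)},
    "3k": {"vias": (6654, 6360), "wire": (1310.76, 1349.92),
           "area": (100, 144), "time": (45, 57)},
    "4k": {"vias": (8818, 8582), "wire": (1758.68, 1801.64),
           "area": (144, 174.24), "time": (60, 102)},
    "5k": {"vias": (11122, 10904), "wire": (2202.64, 2259.28),
           "area": (174.24, 196), "time": (99, 174)},
    "6k": {"vias": (13640, 13316), "wire": (2646.8, 2737.88),
           "area": (184.96, 219.04), "time": (232, 354)},
}


def overhead(conv, dsa):
    """Relative change of the DSA-aware result in percent (negative = fewer)."""
    return 100.0 * (dsa - conv) / conv


for nets, row in TABLE_5_2.items():
    summary = ", ".join(f"{key} {overhead(*pair):+.1f}%"
                        for key, pair in row.items())
    print(f"{nets}: {summary}")
```

Each printed line lists, for one benchmark, the percentage change in via count, wire length, routing area and runtime of the DSA-aware router relative to the conventional router.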
5.10 Conclusions
For the 7 nm technology node and beyond, DSA technology presents a
unique opportunity for co-optimizing DSA patterning and layout design to
improve the manufacturability of DSA. This chapter first studies the DSA-aware
1D standard cell library design and optimization problem. For the
first time, we define the cost function of DSA guiding templates based on the
overlay inaccuracy of DSA patterns. Then we divide the entire optimization
problem into two subproblems and solve them sequentially. The first subproblem
is trivial due to the limited cell size and can be solved through simple
enumeration. We then prove that the second subproblem (AOP) is NP-hard
and formulate it as a Weighted Partial Max-SAT problem to obtain the optimal
solution. In order to improve the efficiency, an ln (n)-approximation
algorithm (AOPAPX) is proposed and tested on a set of test cases. Experimental
results show that an optimal solution can be obtained within one
second, confirming the efficiency and effectiveness of our algorithm. Following
that, a DSA-aware detailed routing algorithm for via layer optimization
is proposed. The DSA template constraints are considered while inserting
vias during the detailed routing process, such that the produced via
layers can be fully patterned by feasible letters. The proposed work enables
the DSA process for contact and via layer patterning in the 7 nm technology
node, which tremendously reduces the manufacturing cost and improves the
throughput for IC fabrication.
CHAPTER 6
BLANK DEFECT MITIGATION FOR EUV
LITHOGRAPHY
6.1 Introduction
Extreme ultraviolet lithography (EUVL) is a leading candidate for the next
generation lithography (NGL) with finer resolution. However, the technology
faces several challenges before mass production. Besides the difficulties
in exotic light source setup and the tuning of the resist for line edge
roughness and sensitivity, chip fabrication with defective blanks remains
a huge challenge. Although the defect density and size are being reduced year
by year, the current progress is far from sufficient [12,13]. The size of the
minimum printable mask defect for the 11 nm technology node can be as small
as 20 to 25 nm [14]; however, according to a recent study [63], blank
suppliers are currently only able to achieve a single-digit number of defects
at 60 nm in size, and the defect level at 23 nm is up to 70 within a
132 × 132 mm² blank area. According to the latest ITRS roadmap [64], the
required defect size for the 11 nm technology node is 13 nm. Meeting the
stringent defect-free requirement involves the collaboration of many aspects
such as high quality blank substrate material (low thermal expansion material)
fabrication, substrate polishing, substrate cleaning, blank handling,
multilayer (ML) deposition, and high sensitivity substrate and blank defect
inspection, which may largely increase the EUVL cost of ownership [14].
Instead, it is much more cost-efficient to allow a certain number of printable
defects on the blank and mitigate their impact by covering them with device
patterns in the later mask fabrication process. The device patterns block the
out-of-phase light from the defect such that the mask defect will not impact
the printing on wafer [15].
Fig. 6.1 illustrates four defect coverage situations. Defect A is not covered
by any pattern, and might be printed on wafer and cause potential
problems [65]. Defect B is in an even worse situation as it is only partially
covered by a pattern, which damages the boundary of the pattern and causes
severe wire breaking or shorting problems [66]. Defects C and D are both
completely covered by patterns, and hence their impact can be successfully
mitigated [67]. However, there are usually some errors in blank defect
inspection and fiducial mark position measurement [68]. The tolerance
of such errors is decided by the minimum distance from the boundary of the
defect to the boundary of the device pattern which covers the defect. As
shown in Fig. 6.1, compared to defect C, defect D is located under the center
of a larger pattern; therefore, it has greater tolerance to inspection and
measurement errors.
Figure 6.1: Cover blank defects with device patterns to mitigate their
impact. The impacts of defects A and B are not mitigated. The impacts of
defects C and D can both be mitigated but with different tolerance to
inspection inaccuracy.
In EUVL mask fabrication, there is usually some freedom for the layout to
shift on the blank within a certain margin [69]. According to SEMI standard
P37-1102 [70], the available area on an EUV blank is fixed to 142 mm by
142 mm. The sizes of the layouts are usually smaller than the available area
of the blanks, and the reticle holder allows masks to shift within a certain
margin, which leaves a certain freedom to shift the layout on the blank, as
shown in Fig. 6.2. Hence, if the size and the number of defects on the blank
are kept below a certain level, it is possible to find a location to place a layout
onto a blank, such that all the defect impacts are avoided.
Figure 6.2: EUV mask: the blank and the layout. (Labels: blank, layout,
feature absorber.)
The rest of this chapter is organized as follows. Section 6.2 presents an
efficient algorithm to shift a single layout on a defective blank, such that
all defects are completely covered by the device patterns with maximum
covering margin. In many cases, it is impossible to completely mitigate all
defect impact if multiple dies are tied and moved together; hence we further
explore the flexibility of individual die shifting in Section 6.3. Even with
that, a 100% success rate in complete defect mitigation can never be guaranteed
since this also depends on the designs and defect maps. Targeting imperfect
defect mitigation between one pair of design and blank, we finally develop
an optimal design-blank matching strategy in Section 6.4 to match multiple
designs and defective blanks simultaneously. The related work is published
in [71–74].
6.2 Layout Shifting Considering Inspection Inaccuracy
In a full-chip EUV mask, there can be billions of patterns. Since all patterns
are shifted together, it is challenging to efficiently find the optimal layout
location, where all the defects are completely covered by the device patterns
with the maximum error tolerance. Some previous work [69, 75, 76] on the
layout relocation problem considered both empty regions and the regions
under device patterns as feasible regions to place the defects, among which
Zhang [76] reported much better performance than the other two. However,
the defects in empty regions may also be printed on wafer and potentially im-
pact circuit performance and the fabrication of other layers. In addition, it is
very costly for the previous work to find all feasible locations and determine
the best one with the maximum tolerance for defect position inaccuracy. Yan
et al. mentioned a simulation tool in their paper [14] with which the user
can decide the size of the absorber covering each defect. However, since there
are billions of absorbers of varying size in one layer, it is extremely difficult
to manually optimize the absorber size for each single defect, such that all
defects are successfully covered and the global error tolerance is maximized.
In this section, we propose an efficient algorithm to find all feasible
locations to place a layout on a defective blank, such that all defects are
completely covered by the device patterns in the layout. At the same time, we
are able to report the optimal location with the maximum tolerance to
inspection inaccuracy. In the first step of the algorithm, the impact region
of each defect, whose size is approximately equal to the shift margin of the
layout, is extracted. Then we formulate the layout relocation problem as a
rectilinear polygon shrinking and intersection problem. To improve the time
efficiency of our algorithm, an improved strategy is developed for the
intersection step, which reduces the time complexity from O(kn log n)
to O(kn), where k denotes the number of defects and n denotes the maximum
number of patterns in the impact regions.
6.2.1 Problem Definition
In order to cover the defects on the blank, three types of operations may
be applied to the layout: flip, rotation and shift. Flip and rotation offer
at most 8 different orientations to place the layout, depending on the
requirement of the process [76]. The shift operation can be applied to each
orientation independently to obtain all feasible locations for all orientations.
Then the globally optimal orientation and location to place the layout can be
finally determined by comparing the optimal solutions in each orientation.
Therefore, the key problem is how to efficiently shift the layout to find the
optimal layout location for a given orientation. Hence, we assume the layout
orientation is fixed and only consider layout shifting while looking for the
optimal solution for defect coverage.
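The count of at most 8 orientations comes from combining the 4 rotations by multiples of 90° with an optional flip. A minimal Python sketch that enumerates them for a set of points is given below; the function name `orientations` and the choice of rotating about the origin are assumptions made for the illustration.

```python
def orientations(points):
    """Return the 8 flip/rotation images of a collection of (x, y) points."""
    result = []
    for flip in (False, True):
        # Optional mirror about the y-axis.
        pts = [(-x, y) if flip else (x, y) for x, y in points]
        for _ in range(4):
            # Rotate by 90 degrees counter-clockwise about the origin.
            pts = [(-y, x) for x, y in pts]
            result.append(frozenset(pts))
    return result
```

For an asymmetric pattern all 8 images are distinct, while a symmetric pattern produces duplicates, which is why 8 is only an upper bound.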
As we have mentioned previously, the position of each defect relative to the
fiducial marks may not be measured accurately, and the offset between the
real defect position and the inspected position may have different directions
for different defects. The optimal solution should be able to tolerate such
inaccuracy in any direction. To simplify the problem, each defect can be
characterized by the rectangular bounding box of its impact range. Then the
error tolerance (ET) for a single defect is defined as the minimum distance
from the boundary of the defect to the boundary of the pattern covering it,
and the global error tolerance (GET) is defined as the minimum error
tolerance among all defects on the blank.
Figure 6.3: The comparison of GET between two different solutions. The
first solution in (b) has 10 nm GET which is better than the second
solution in (c) with 0 GET.
Fig. 6.3 illustrates the definition of ET and compares the GET between
two different relocation solutions for the same layout and blank. In solution
A shown in Fig. 6.3 (b), the ET of defect 1 is 20 nm, and 10 nm for defect
2. According to the definition, the GET for solution A is 10 nm. In solution
B shown in Fig. 6.3 (c), both defects touch the boundary of the covering
pattern, and hence the ET is 0 for both of them. Therefore, even though
defect 2 is covered by a larger pattern in solution B compared to solution
A, the GET of solution B is still 0, which is worse than solution A with 10 nm GET.
Based on the above analysis and definitions, the defect mitigation problem
can be defined as follows.
Definition 6.1. Single Layout Shifting Problem
Given a layout with a set of rectilinear polygon patterns, a defect
map with the size and location of each rectangular defect and the
shift margin (Δx, Δy) for the layout to shift on the blank, find the
optimal location to place the layout on the blank, where all defects are
completely covered by device patterns and the GET is maximal.
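The definitions of ET and GET can be made concrete with a short Python sketch. It assumes, for simplicity, that the covering pattern is a single axis-aligned rectangle (the layouts considered here contain general rectilinear polygons), and the function names and the (x1, y1, x2, y2) rectangle convention are hypothetical.

```python
def error_tolerance(defect, pattern):
    """ET of one defect: minimum distance from the defect boundary to the
    boundary of the covering pattern, or None if the defect is not fully
    covered. Rectangles are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    dx1, dy1, dx2, dy2 = defect
    px1, py1, px2, py2 = pattern
    margin = min(dx1 - px1, dy1 - py1, px2 - dx2, py2 - dy2)
    return margin if margin >= 0 else None


def global_error_tolerance(pairs):
    """GET: minimum ET over all (defect, covering pattern) pairs,
    or None if any defect is left uncovered."""
    ets = [error_tolerance(d, p) for d, p in pairs]
    if any(et is None for et in ets):
        return None
    return min(ets)
```

With one defect whose ET is 20 nm and another whose ET is 10 nm, as in solution A of Fig. 6.3, `global_error_tolerance` returns 10; if any defect touches the boundary of its covering pattern, the result drops to 0.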
6.2.2 Solution to Single Layout Shifting Problem
Step 1: Impact Region Extraction
Due to the limitation of the shift margin, the candidate device patterns
to cover one particular defect are also limited within a partial region of the
whole layout. We define the impact region (IR) of a defect as the minimum
rectangular region on the layout within which the defect can be located by
layout shifting. Only the device patterns within or partially within the IR
can potentially cover the defect. As shown by the dashed green regions in
Fig. 6.4, the width/height of the IR is equal to the width/height of the defect
plus the shift margin in the horizontal/vertical direction. In Fig. 6.4, pattern
A is partially located within the IR of defect 1, and pattern B is completely
within the IR of defect 2. However, pattern C and pattern D do not touch
either IR. Therefore, when the layout shifts within the shift margin, pattern
A may potentially cover defect 1, and defect 2 may be covered by pattern
B. But pattern C and pattern D can never cover either defect. Therefore,
the device patterns outside any IR are not worth considering at all for the
purpose of defect mitigation.
Figure 6.4: Illustration of the impact region. (Labels: blank, layout, shift
margin Δx and Δy, defects 1 and 2, patterns A to D.)
In the first step of the problem formulation, the IRs for every defect are
extracted according to the shift margin as well as the sizes and locations of
the defects. Then in the following steps, only the device patterns located
within or partially within an IR have to be considered for defect coverage.
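Impact region extraction can be sketched in Python as follows. The sketch assumes axis-aligned rectangles (x1, y1, x2, y2) and assumes that the IR extends from the defect toward the lower left by the shift margin, so that the defect sits at the top right corner of its IR; the function names are hypothetical, and patterns are simplified to rectangles.

```python
def impact_region(defect, shift_margin):
    """IR of a defect: the defect rectangle enlarged by the shift margin.

    Width = defect width + margin_x, height = defect height + margin_y.
    Assumption: the defect lies at the top right corner of its IR.
    """
    x1, y1, x2, y2 = defect
    margin_x, margin_y = shift_margin
    return (x1 - margin_x, y1 - margin_y, x2, y2)


def overlaps(a, b):
    """True if rectangles a and b share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidate_patterns(patterns, ir):
    """Patterns within or partially within the IR; all others can be ignored."""
    return [p for p in patterns if overlaps(p, ir)]
```

Only the patterns returned by `candidate_patterns` need to be passed on to the following steps for that defect.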
Step 2: Impact Region and Device Pattern Shrinking
As mentioned in subsection 6.2.1, each defect is considered as a rectangle
in our problem formulation. As long as the rectangle is completely covered
by any device pattern, its impact will be successfully mitigated. With such
simplification, the defects only vary from each other in size, but the device
patterns can still have different shapes and sizes. In order to reduce the
number of variables and further simplify the problem, we consider each defect
as a single point at the center of the rectangle; then shrink its IR and the
device patterns located within the IR by the size of the defect. After that, as
long as the defect point is under one shrunk pattern by layout shifting, the
original rectangular defect can be completely covered by the device pattern.
The operation of device pattern shrinking is illustrated in Fig. 6.5 (a). Fig. 6.5
(b) and Fig. 6.5 (c) illustrate the defect and its IR with device patterns before
and after shrinking respectively. From Fig. 6.5 (c) we also notice that the
size of the shrunk impact region (SIR) is the same as the size of the shift
margin, which does not depend on the defect size.
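For rectangular patterns, the shrinking operation and the equivalence it relies on can be sketched as below. This is a simplified illustration under the assumption that every device pattern is an axis-aligned rectangle (the thesis handles general polygons through a geometry solver); `Rect`, `shrink`, `covers` and `contains_point` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def shrink(r: Rect, wd: float, hd: float) -> Optional[Rect]:
    """Move every edge inward by half the defect size (Wd/2 and Hd/2).

    Returns None when the rectangle is smaller than the defect, i.e. the
    pattern is eliminated by shrinking.
    """
    s = Rect(r.x0 + wd / 2, r.y0 + hd / 2, r.x1 - wd / 2, r.y1 - hd / 2)
    return s if s.x0 <= s.x1 and s.y0 <= s.y1 else None


def covers(pattern: Rect, defect: Rect) -> bool:
    """True if the pattern completely covers the rectangular defect."""
    return (pattern.x0 <= defect.x0 and pattern.y0 <= defect.y0
            and pattern.x1 >= defect.x1 and pattern.y1 >= defect.y1)


def contains_point(r: Optional[Rect], x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the (possibly missing) rectangle."""
    return r is not None and r.x0 <= x <= r.x1 and r.y0 <= y <= r.y1
```

Under these assumptions, a defect of size Wd by Hd centered at (x, y) is covered by a pattern exactly when (x, y) lies inside the shrunk pattern, which is the property used in the following steps.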
Figure 6.5: The shrinking of defect, IR and device patterns. (a) The
shrinking operation of a device pattern: each edge of the original pattern
moves inward by Wd/2 horizontally and Hd/2 vertically, where Wd and Hd are
the width and height of the defect. (b) The original defect, IR and device
patterns. (c) The defect, SIR and device patterns after shrinking.
Step 3: Shrunk Impact Region Intersection and Coordinate Transformation
By IR and device pattern shrinking, the layout is partitioned into a number
of equally sized SIRs, and the number of SIRs is the same as the number of
defects on the blank, as illustrated in Fig. 6.6.
Figure 6.6: The layout is partitioned into SIRs by IR and device pattern
shrinking. (a) The IR of each defect. (b) IR and device pattern before and
after shrinking. (c) SIRs.
Since the original layout has to be shifted together, the shifting direction
and distance for all SIRs are identical as well. As illustrated in Fig. 6.6 (c),
before layout shifting, every defect point is located at the top right corner of
its SIR, so the locations of different defects relative to their SIRs are always
identical no matter how the layout is shifted. Therefore, the intersection of
all SIRs provides the feasible regions to locate all defects.
In order to find the optimal solution with the maximum GET, we first
intersect the SIRs to obtain all feasible regions. As shown in Fig. 6.7 (c), there
are two feasible regions obtained by SIR intersection. The error tolerance
ability of a feasible region is defined as the maximum distance from one
inner point of the region to the closest boundary. In the demo illustrated by
Fig. 6.7, the error tolerance ability of region A is obviously larger than that
of region B. In fact, the maximum GET should be equal to the maximum
error tolerance ability among different feasible regions. Hence it can be easily
determined that the red point in region A is one optimal location for both
defects.
Figure 6.7: Find all feasible regions by SIR intersection. (a) SIR 1.
(b) SIR 2. (c) Feasible regions and the error tolerance ability.
Then in the next step, the optimal location to place the defects in the SIRs
is converted to the optimal vector to shift the layout on the blank by simple
coordinate transformation. Let (xd, yd) denote the coordinates of a defect,
and (xo, yo) denote the optimal location to place the defect in its SIR; then
the optimal layout shifting vector is simply (xd − xo, yd − yo).
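The selection of the best feasible region and the conversion to a shift vector can be sketched as follows. The sketch assumes, for simplicity, that every feasible region is an axis-aligned rectangle, in which case the error tolerance ability is half of the shorter side and the rectangle center is one optimal location. For a general rectilinear region sliced into rectangles this value is only a lower bound. All names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def error_tolerance(region: Rect) -> float:
    """Largest distance from an inner point to the closest boundary."""
    return min(region.x1 - region.x0, region.y1 - region.y0) / 2.0


def best_location(regions: List[Rect]) -> Tuple[Tuple[float, float], float]:
    """Return one optimal defect location and the corresponding tolerance."""
    best = max(regions, key=error_tolerance)
    center = ((best.x0 + best.x1) / 2.0, (best.y0 + best.y1) / 2.0)
    return center, error_tolerance(best)


def shift_vector(defect_xy: Tuple[float, float],
                 optimal_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Layout shifting vector (xd - xo, yd - yo) from the text."""
    (xd, yd), (xo, yo) = defect_xy, optimal_xy
    return (xd - xo, yd - yo)
```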
Time Complexity Analysis
So far the defect mitigation problem has been formulated into pure
computational geometry problems, which can be solved by calling a popular
public solver [77]. As we have mentioned previously, the size of each SIR is
equal to the size of the shift margin, which is usually fixed for a certain
process. Let n denote the maximum number of device patterns in one IR, which
is limited by the minimum width/space design rules, and let k denote the
number of defects on the blank. Then it takes O(kn) time for IR and device
pattern shrinking, and O(kn log n) time for SIR intersection, assuming each
device pattern has a limited number of vertices. Finally, it takes O(n) time
to find the optimal solution among all feasible regions. Therefore, the time
complexity for the whole algorithm is
O(kn) + O(kn log n) + O(n) = O(kn log n).
6.2.3 Improved Strategy for SIR Intersection
In this subsection, we propose an improved SIR intersection strategy to
reduce the time complexity of the algorithm. As shown in Fig. 6.6(c), each
SIR is the same size as the shift margin. Instead of applying the solver
directly to obtain the intersection result between two SIRs, we first
partition each SIR into a number of equally sized tiles. The maximum number
of device patterns in one tile can be considered as a constant as long as the
tile size is small enough, such that the number of tiles is comparable with
the number of device patterns n. By this means, distributing the n patterns
into the tiles takes O(n) time, and it takes constant time to get the
intersection result for each pair of tiles. Since the number of tiles in each
SIR is comparable to n, getting the complete intersection result between two
SIRs takes O(n) time as well. Therefore, the time complexity for the step of
SIR intersection is reduced from O(kn log n) to O(kn). According to section
6.2.2, since SIR intersection is the dominant term in runtime, the time
complexity for the whole algorithm is also reduced to O(kn) by the improved
intersection strategy.
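The tile distribution step can be illustrated with a uniform grid. The sketch below is an assumption-laden simplification: shrunk patterns are treated as axis-aligned rectangles, tiles are squares of side `tile`, and `bin_into_tiles` is a hypothetical helper. Each pattern touches only a bounded number of tiles when the pattern size is comparable to the tile size, which gives the O(n) distribution cost mentioned above.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def bin_into_tiles(patterns: List[Rect], sir: Rect,
                   tile: float) -> Dict[Tuple[int, int], List[int]]:
    """Map each (column, row) tile of the SIR to the indices of the patterns
    that overlap it with positive area."""
    tiles: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, p in enumerate(patterns):
        # Clip the pattern to the SIR; skip it if nothing remains.
        x0, y0 = max(p.x0, sir.x0), max(p.y0, sir.y0)
        x1, y1 = min(p.x1, sir.x1), min(p.y1, sir.y1)
        if x0 >= x1 or y0 >= y1:
            continue
        # Index range of the tiles touched by the clipped pattern.
        c0 = int((x0 - sir.x0) // tile)
        c1 = math.ceil((x1 - sir.x0) / tile) - 1
        r0 = int((y0 - sir.y0) // tile)
        r1 = math.ceil((y1 - sir.y0) / tile) - 1
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                tiles[(c, r)].append(idx)
    return dict(tiles)
```

Tiles that never appear as a key are empty, so two SIRs only need to be intersected tile by tile on the keys they have in common.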
In fact, not all device patterns are qualified for defect coverage since some
patterns are even smaller than the defects, and these are directly eliminated
by device pattern shrinking. For different defect sizes, the number of valid
device patterns in the SIR is also different. Therefore, by partitioning each
SIR into very small tiles, there are usually many empty tiles without any
patterns inside, especially for very large defects. In each SIR, a tile is
defined as a valid tile if and only if it is not empty. As illustrated in
Fig. 6.8(a), three SIRs are extracted for three different defects, and each
SIR is partitioned into four equally sized tiles. Since all SIRs have the
same size and partition, the tile maps in each SIR are identical as well. The
tiles with the same relative location in the SIRs are labeled with the same
name. In Fig. 6.8(a), tiles b, c, d are valid tiles for defect 1, tiles
a, c, d are valid tiles for defect 2, and tiles a, b, c are valid tiles for
defect 3. Since we are looking for the feasible regions which are able to
mitigate the impact of all defects, we only need to consider the tiles which
are valid in all SIRs. In our algorithm, a truth table is built during device
pattern shrinking to record the validity of each tile, which is shown in
Fig. 6.8(b). According to the global truth table obtained
by Boolean AND operation, only tile c is valid for all defects. Therefore
in Fig. 6.8(c), we only consider tile c during the step of SIR intersection.
The global truth table is updated whenever an originally valid tile becomes
invalid by intersection, such that the tile will no longer be considered in
the following operations. In order to quickly identify invalid tiles and save
computational effort, we first sort all SIRs according to the defect sizes,
and then always intersect the SIRs with larger defects first, because fewer
device patterns are valid for larger defects and the intersection of two
smaller sets of patterns is more likely to create an empty set, which
identifies an invalid tile by definition.
Figure 6.8: Invalid tiles are discarded for the intersection operation.
(a) Intersection among three SIRs, each partitioned into tiles a, b (top
row) and c, d (bottom row). (b) Truth table to remember valid tiles:
[0 1; 1 1] & [1 0; 1 1] & [1 1; 1 0] = [0 0; 1 0]. (c) Only tile c has to
be considered for intersection.
By discarding invalid tiles, the efficiency of our algorithm can be improved
even further. If there exist several large defects on the blank, the number of
valid tiles will be very limited, which can be determined quickly. If there is
no feasible region to mitigate all defect impact, our algorithm may terminate
without running through all SIRs.
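The truth-table filtering with early termination can be sketched as follows. The input format (a list of pairs of defect size and set of valid tile names) and the function name are hypothetical, and the example data in the usage note are made up rather than taken from Fig. 6.8. The sketch covers only the Boolean AND pre-filter, not the polygon intersection performed inside the surviving tiles.

```python
from typing import List, Set, Tuple


def global_valid_tiles(tile_maps: List[Tuple[float, Set[str]]]) -> Tuple[Set[str], int]:
    """AND the per-defect tile validity maps, largest defect first.

    Returns the set of tiles valid for every processed defect and the number
    of SIRs that were processed before the loop stopped.
    """
    # Larger defects leave fewer valid tiles, so they are processed first.
    ordered = sorted(tile_maps, key=lambda m: m[0], reverse=True)
    valid = None
    processed = 0
    for _, tiles in ordered:
        valid = set(tiles) if valid is None else valid & tiles
        processed += 1
        if not valid:
            # No tile is valid for all defects seen so far: stop early.
            break
    return (valid or set()), processed
```

For instance, with the made-up maps `[(20, {"a", "b", "d"}), (50, {"d"}), (30, {"b", "c", "d"})]` only tile `d` survives after all three SIRs are processed, while `[(10, {"a", "b"}), (60, {"c"}), (40, {"d"})]` is rejected after the two largest defects.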
6.2.4 Experimental Results
We implement our algorithm using C++ on a workstation with an Intel
Xeon E5620 2.40GHz CPU and 36GB memory. Then we carry out our
experiments with the 11 nm designs which are generated from the Metal
1 layer of scaled Nangate Standard Cell Library [78]. All device patterns
are random 2D polygons and the average pattern density is 2.5 × 10^6/mm^2.
According to the latest ITRS report [64], the minimum half pitch for the 11
nm technology node is around 15 nm. So with 4X scaling, on the blank the
minimum absorber width is around 60 nm, and the maximum absorber width
is set up to be 170 nm. The defect maps are randomly generated where each
defect is represented by a square.
Figure 6.9: The toy test of our algorithm. (a) Pre-shift layout on the
blank. (b) Post-shift layout on the blank. Before layout shifting, all the 4
defects impact printing in (a), while after shifting the layout by
(115, 130), all defect impact is completely mitigated by device pattern
coverage in (b).
Fig. 6.9 shows the result of a toy test with our implementation, in which
the layout is a simple AOI22 cell [78] and the blank has 4 defects on it. Our
algorithm determines (115 nm, 130 nm) as the optimal shifting of the layout
on the blank, where all defects are successfully covered by device patterns and
the GET is maximized. The whole process takes less than 1µs to complete.
With the result of our algorithm validated by the toy test, we perform the
remaining experiments on layouts and blanks with practical sizes.
With the increase of defect size, the number of valid patterns which are able
to completely cover the defect is reduced. Fig. 6.10 shows the relationship
between the defect size and the percentage of valid device patterns in our
design.
Figure 6.10: The relationship between the defect size and the percentage of
valid device patterns in our design.
The defect mitigation ability for a certain design depends largely on the
sizes of the device patterns. In our cell library, since the pattern size is
between 60 nm and 170 nm on the blank, any defect smaller than 60 nm can
be completely covered by all device patterns, but no pattern is able to cover
the defects larger than 170 nm. From Fig. 6.10 we can also see that in our
design, most patterns are between 60 nm and 100 nm in size, and only 30%
of the patterns can cover defects larger than 100 nm. Table 6.1 shows the
impact of defect size on the defect mitigation results. In each experiment,
the blank is programmed with 10 randomly distributed defects with identical
size, and the shift margin for all groups of experiments is fixed to be 400 µm
by 400 µm.
The mitigation difficulty for each defect mainly depends on the defect size.
For a smaller defect, there is more room to shift the layout on the blank
while keeping the defect covered by device patterns. As shown in Table 6.1,
the total area of feasible regions for complete defect mitigation decreases
dramatically with the increase of defect size. Meanwhile, the GET for larger
defects is also smaller than that for smaller defects. Therefore, it is very
important to control the defect size in order to successfully mitigate defect
impact. With technology improvement, the defect size has been reduced year
by year. However, besides the size, the number of defects also has a large
impact on defect mitigation. In Table 6.2, the size of each defect is as
small as 20 nm, but the number of defects on each blank varies from 2 to 16
with a fixed shift margin of 400 µm by 400 µm.

Table 6.1: Impact of Defect Size on Defect Mitigation Results

Defect Size    Feasible Region Area    GET     Runtime
(nm × nm)      (nm^2)                  (nm)    (s)
10 × 10        9.9 × 10^5              44      496.01
20 × 20        4.2 × 10^5              37      474.04
30 × 30        2.2 × 10^5              18      480.12
40 × 40        7.5 × 10^3              10      452.35
50 × 50        0                       0       450.5
Table 6.2: Impact of Defect Number on Defect Mitigation Results

Defect Number    Feasible Region Area    GET     Runtime
                 (nm^2)                  (nm)    (s)
2                1.1 × 10^10             150     108.39
4                1.9 × 10^9              112     210.53
6                1.0 × 10^8              103     302.73
8                2.8 × 10^6              45      392.86
10               4.2 × 10^5              37      474.04
12               1.6 × 10^4              28      554.98
14               3.3 × 10^2              14      627.36
16               0                       0       706.55
From Table 6.2 we can see that both the area of feasible regions and the
GET are reduced dramatically with the increase of the number of defects
on the blank. This is because successful defect mitigation requires that all
defects are covered simultaneously by device patterns. Suppose there are n
defects randomly distributed on the blank and the probability of covering
one of them is p. Then the probability of covering all defects simultaneously
is approximately p^n, which reduces exponentially with the increase of defect
number n. In addition, with less probability for successful defect coverage,
the GET drops as well. Thus, reducing the defect number is as important as
the defect size control. However, the number of blank defects is very difficult
to control, especially for the small ones under 20 nm. Hence other methods
to improve the defect mitigation rate have to be investigated, among which
enlarging the shift margin might be a potentially effective way. Table 6.3
shows the impact of shift margin on defect mitigation, where each blank is
programmed with 12 defects with equal size of 20 nm, and the shift margin
varies from 200 µm by 200 µm to 800 µm by 800 µm.
Table 6.3: Impact of Shift Margin on Defect Mitigation Results

Shift Margin    Feasible Region Area    GET     Runtime
(µm × µm)       (nm^2)                  (nm)    (s)
200 × 200       1.8 × 10^3              14      149.61
400 × 400       1.6 × 10^4              28      554.98
600 × 600       3.3 × 10^4              39      1361.86
800 × 800       5.5 × 10^4              45      2368.64
We can see from Table 6.3 that by enlarging the shift margin of the layout,
both the area of feasible regions and the GET are significantly improved. This
is because the shift margin enlargement offers more candidate device patterns
to cover each defect, which increases the success rate of simultaneous defect
coverage. Meanwhile, with more valid patterns to choose from, the optimal
solution with the maximum GET also improves.
In addition, comparing Tables 6.1, 6.2, and 6.3 we also observe that the
runtime is approximately linear with respect to the number of defects and
the size of the shift margin, but does not vary much with the defect size.
This is because the strategy introduced in Subsection 6.2.3 reduces the time
complexity of SIR intersection; hence the step of device pattern shrinking
dominates the total runtime of the algorithm, which is linear in the number
of defects on the blank as well as the maximum number of device patterns in
the SIRs.
6.3 Defect Mitigation through Multi-Die Placement
In reality the die size is usually much smaller than the exposure field. To
improve the throughput, there will be multiple copies of a die on each blank.
In this case, it is not necessary to always shift all dies together as a whole
layout in order to mitigate defect impact. Instead, each die can be shifted
individually, which offers more freedom for defect impact mitigation [79].
6.3.1 Find All Feasible Relocation Positions
Problem Definition
In order to place multiple dies, we first develop an efficient algorithm to find
all relocation positions to place one valid die within the exposure field, such
that all defect impact is completely mitigated. As mentioned previously,
since the die size can be much smaller than the size of the exposure field,
the exposure field usually accommodates multiple dies. Any defect lying
within the die area must be covered completely by features of that die in
order to mitigate its impact, like defect A in Fig. 6.11. Otherwise, if the
defect is outside any die area, it is ineffective and need not be considered,
like defect B in Fig. 6.11.
In Fig. 6.11, if the features in die 3 fail to cover defect A completely,
the die becomes invalid since part of the die might not work correctly due
to the impact of the defect. In order to maximize the number of valid dies
within the exposure field, all relocation positions to place a single die must be
determined first, defined as feasible regions. Fig. 6.12 shows a toy example
where the die has only two features in the layout and there is only one defect
on the blank. As shown in Fig. 6.12(b), as long as the bottom left corner of
the die is located within the feasible regions (dashed regions in green), the
defect will be either outside the die area or covered completely by one feature
in the die, and hence the defect impact is mitigated.
Figure 6.11: Defect A lies within the die area and must be covered by
features. Defect B is outside any die area and does not need to be
considered.
Figure 6.12: A toy example to illustrate the feasible regions to place the
bottom left corner of the die for defect mitigation. (a) A toy example with
one defect on the blank and two features in the die. (b) Feasible regions to
place the bottom left corner of the die. Note that the die area can never
move out of the exposure field.
Definition 6.2. Feasible Region Exploration Problem
Given a full die layout and an exposure field with a certain number
of rectangular defects, the objective is to find all feasible regions, such
that as long as the bottom left corner of the die lies within any feasible
region, all the defects are either outside the die area or completely
covered by the features in the die, and the entire die is located within
the exposure field.
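Definition 6.2 can be read as a predicate on a single candidate position of the bottom left corner of the die. The following brute-force check is a minimal sketch of that predicate, assuming rectangular features given in die-local coordinates and rectangular defects given in exposure-field coordinates. It is only a reference oracle with hypothetical names, not the exploration algorithm developed in this section.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def is_feasible(corner: Tuple[float, float], die_w: float, die_h: float,
                field: Rect, features: List[Rect], defects: List[Rect]) -> bool:
    """Check one die position against Definition 6.2."""
    px, py = corner
    die = Rect(px, py, px + die_w, py + die_h)
    if not contains(field, die):
        return False  # the die would leave the exposure field
    for d in defects:
        if not overlaps(die, d):
            continue  # defect outside the die area: ineffective
        placed = (Rect(f.x0 + px, f.y0 + py, f.x1 + px, f.y1 + py) for f in features)
        if not any(contains(f, d) for f in placed):
            return False  # effective defect not covered by any feature
    return True
```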
Blank Region Partition
Since no part of the die can be shifted outside the exposure field, the bottom
left corner of the die can only be located within a rectangular region, as
illustrated by the dashed green region in Fig. 6.13. In addition, because die
size is much smaller than the size of the exposure field, for any defect, it is
effective if and only if the defect is within the die area. If the die is located far
away from the defect, the defect will never impact the die and is not worth
considering. Therefore, according to the position and size, each defect has a
unique impact range. As shown in Fig. 6.13(a), as long as the bottom left
corner of the die is located within the impact range of the defect, part of the
die may get impacted, and the defect is called an effective defect for the
die.
According to the impact range of each defect, the valid region in the expo-
sure field to locate the bottom left corner of the die can be partitioned into
several rectilinear polygon regions, which are defined as blank regions, and
if the bottom left corner of the die lies within the same blank region, the die
will be affected by the same group of defects. As illustrated in Fig. 6.13(b),
there are three defects (A, B and C) within the exposure field, and the ex-
posure field is partitioned into 8 blank regions based on the impact range
of each defect. The entire region 1 is a feasible region which is outside the
impact range of any defect. If the bottom left corner of the die is located
in this region, all defect impact is completely mitigated. In region 2, the
die is only impacted by defect A, and hence it is not necessary to consider
defects B and C for this region. Similarly, in regions 3, 5 and 8, only one
defect is effective. However, in regions 4 and 6, two defects become effective
and should both be considered. Region 7 is the only region where all defects
become effective and must be considered simultaneously.
Figure 6.13: The valid region in the exposure field to place the bottom left
corner of the die is partitioned into 8 blank regions based on the impact
range of each defect. (a) The impact range of a defect. (b) The blank regions
of the exposure field.
For computational convenience, we can further partition each blank region
into rectangles and consider each rectangle independently. Based on the
number of effective defects, we will develop three strategies in the following
subsections to find all feasible regions in each blank region. After that, the
union of the feasible regions found in all blank regions provides all layout
relocation positions throughout the entire blank.
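The impact range and the set of effective defects for a given die position can be sketched as below. The sketch assumes rectangular defects and treats a defect as effective when the die interior overlaps it; `Rect` and the function names are hypothetical. Two corner positions belong to the same blank region exactly when `effective_defects` returns the same list for both.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def valid_corner_region(field: Rect, die_w: float, die_h: float) -> Rect:
    """Positions of the bottom left corner that keep the die inside the field."""
    return Rect(field.x0, field.y0, field.x1 - die_w, field.y1 - die_h)


def impact_range(defect: Rect, die_w: float, die_h: float) -> Rect:
    """Corner positions for which the die area overlaps the defect."""
    return Rect(defect.x0 - die_w, defect.y0 - die_h, defect.x1, defect.y1)


def effective_defects(corner: Tuple[float, float], defects: List[Rect],
                      die_w: float, die_h: float) -> List[int]:
    """Indices of the defects that are effective for this corner position."""
    px, py = corner
    result = []
    for i, d in enumerate(defects):
        r = impact_range(d, die_w, die_h)
        if r.x0 < px < r.x1 and r.y0 < py < r.y1:
            result.append(i)
    return result
```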
Blank Region with No Effective Defect
Since die size is very small compared to the size of the exposure field, and
there are limited numbers of defects on each blank, usually there exist some
blank regions which are outside the impact range of any defect. As illustrated
by blank region 1 in Fig. 6.13(b), as long as the bottom left corner of the die
is located within this region, no defect will be located within the die area, and
hence all defect impact is completely mitigated. Thus, all the blank regions
where no effective defect exists are reported as feasible regions directly.
Blank Region with Single Effective Defect
The defects are randomly distributed on the blank, and usually most blank
regions are those with only one effective defect, such as regions 2, 3, 5 and 8 in
Fig. 6.13(b). Fig. 6.14 takes region 3 as an example to illustrate our three-
step feasible region exploration strategy for this case.
Step 1: Impacted Feature Extraction
As shown in Fig. 6.14(a), when the bottom left corner of the die moves in
region 3, part of the die will be impacted by defect B, which is defined as
impacted die area and marked as 3′ in Fig. 6.14. The width/height of 3′
is equal to the width/height of 3 plus the width/height of the defect. In this
example, there are three features in the die, F1, F2 and F3. However, when
the die moves within blank region 3, feature F1 will never touch defect B, and
hence only F2 and F3 can possibly cover the defect. The features located
within or partially within the impacted die area are defined as impacted
features. In this step, we extract all the impacted features and consider
them as possible candidates for defect coverage.
Step 2: Impacted Feature Shrinking
In the second step, the impacted features and the impacted die area 3′
are shrunk by the size of the effective defect, as shown in Fig. 6.14(b). The
shrunk die area is marked as 3′′, which is the same size as the blank region
3. After shrinking, as long as the center point of the defect is covered by any
shrunk feature located inside region 3′′, the whole defect will be completely
covered by the original features.
Step 3: Shrunk Die Area Rotation and Shift
In order to cover the center point of the defect with shrunk features in 3′′
in Fig. 6.14(b), the whole die has to be shifted to the upper right direction.
Originally the bottom left corner of the die overlaps with the bottom left
corner of the blank region 3, and the center of the defect is located at the top
right corner of region 3′′. By shifting the die, wherever the top right corner
of 3′′ can be covered by a shrunk feature in the die, the corresponding die
location is considered as a feasible location in blank region 3. Therefore, the
shrunk features in 3′′ and the feasible regions in 3 are diagonally symmetric.
Hence all feasible regions in blank region 3 can be simply obtained by
rotating 3′′ by 180 degrees and shifting it to the position of 3, as
illustrated in Fig. 6.14(c). The final feasible regions are shown by the
purple polygons in Fig. 6.14(d). As illustrated in Fig. 6.14(d), as long as
the bottom left corner of the die is located within a feasible region, the
whole defect will be completely covered by a feature in the die.
Figure 6.14: The strategy to explore all feasible regions for a blank region
with single effective defect. (a) Find the impacted features in the impacted
die area. (b) Shrink the impacted features and die area by the size of the
defect. (c) Rotate the shrunk die area 180 degrees and shift it to the blank
region. (d) All feasible regions in the blank region are finally obtained.
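For one rectangular feature and one rectangular defect, the three steps above reduce to simple interval arithmetic. The sketch below is an illustration under that rectangular assumption, with hypothetical names: the feature is given in die-local coordinates and the defect in exposure-field coordinates. The sign flip (corner = defect minus feature) corresponds to the 180 degree rotation, and the size of the result equals the size of the shrunk feature.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def feasible_corners(feature: Rect, defect: Rect) -> Optional[Rect]:
    """Die corner positions for which `feature` completely covers `defect`.

    The feature placed at corner (px, py) covers the defect when
    px + feature.x0 <= defect.x0 and px + feature.x1 >= defect.x1,
    and likewise in y. Returns None if the feature is smaller than the defect.
    """
    lo_x, hi_x = defect.x1 - feature.x1, defect.x0 - feature.x0
    lo_y, hi_y = defect.y1 - feature.y1, defect.y0 - feature.y0
    if lo_x > hi_x or lo_y > hi_y:
        return None
    return Rect(lo_x, lo_y, hi_x, hi_y)
```

The result still has to be clipped to the blank region under consideration, since the formula ignores which other defects are effective there.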
Blank Region with Multiple Effective Defects
Sometimes several defects are clustered close to each other, such that when
the die is located within some blank regions, it can be impacted by multiple
defects simultaneously, such as regions 4, 6 and 7 in Fig. 6.13(b). The objective
is to find the feasible regions to locate the die where all defects are covered
by features simultaneously. In this situation, we first consider each defect
separately and apply the same strategy as illustrated in Fig. 6.14 to find
the feasible regions for each individual defect. After that, the final feasible
regions are obtained by intersecting all different sets of feasible regions.
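When the per-defect feasible regions are represented as lists of rectangles, the final intersection can be sketched with a pairwise loop. This is a simplified, quadratic-time illustration with hypothetical names; a production implementation would rely on a polygon solver, as the thesis does.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two rectangles, or None if they are disjoint."""
    r = Rect(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1))
    return r if r.x0 <= r.x1 and r.y0 <= r.y1 else None


def intersect_sets(first: List[Rect], second: List[Rect]) -> List[Rect]:
    """All non-empty pairwise intersections between two sets of rectangles."""
    out = []
    for a in first:
        for b in second:
            r = intersect(a, b)
            if r is not None:
                out.append(r)
    return out


def feasible_for_all(per_defect: List[List[Rect]]) -> List[Rect]:
    """Regions where every effective defect is covered simultaneously."""
    if not per_defect:
        return []
    result = list(per_defect[0])
    for regions in per_defect[1:]:
        result = intersect_sets(result, regions)
        if not result:
            break  # no position covers all defects seen so far
    return result
```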
Algorithm
So far, we have been able to find all feasible regions in each blank region,
the union of which gives the complete set of feasible regions throughout the
entire mask. Since the blank regions are independent of each other, simple
parallelism can be applied to improve the efficiency of the algorithm. In our
algorithm, a public solver [77] is adopted to solve the detailed computational
geometry problems such as polygon shrinking and intersection. The detailed
algorithm is described in Algorithm 3.
Experimental Results
We implement our algorithm using C++ on a workstation with an Intel
Xeon E5620 2.40GHz CPU and 36GB memory. Then we carry out our
experiments with an 11 nm design which is generated from the Metal 1 layer
of the scaled Nangate Standard Cell Library [78] with a 4X reduction factor.
There are around 10 million features per square centimeter in each layout.
The defects are randomly distributed on the blank with sizes varying from
50 nm to 200 nm. The size of the exposure field is 10.4 cm by 13.2 cm.
Table 6.4¹ shows the experimental results with different die sizes and defect
numbers, as well as the runtime comparison between our algorithm and the
best published algorithm we could find [76].
¹ Due to the limitation of the memory size of our machine, the maximum number
of defects is set to be 8 in our experiments. However, with a reasonable
increase in the runtime, the same algorithm can be directly applied to a blank
with 20 defects, which requires around 80GB memory for a small die.
Algorithm 3: Feasible Region Exploration Algorithm
for all Defect curDefect on blank do
    for all Feature curFeature in the die do
        shrunkFeature ← Shrink curFeature by the size of curDefect
        Insert shrunkFeature into layouts[curDefect]
    end for
end for
Partition the blank into blank regions
    {This can be done by simple Hanan grid}
for all Blank region curRegion do
    if No effective defect impacts curRegion then
        Report curRegion as a feasible region
    else
        feasibleRegions ← curRegion
        for all Effective defect d impacting curRegion do
            shrunkDieArea ← impacted die area by defect d
                {As illustrated by region 3′′ in Fig. 6.14(b)}
            impactFeatures ← shrunkDieArea intersect layouts[d]
                {As illustrated by the yellow features in Fig. 6.14(c)}
            Rotate impactFeatures 180 degrees
            feasibleRegions ← feasibleRegions intersect impactFeatures
        end for
        Report feasibleRegions as feasible regions
    end if
end for
Done
Since the algorithm in [76] is designed for large full-field layouts, its
runtime is directly related to the size of the exploration region, which is
the allowable shifting freedom on the blank. Table 6.4 shows that if this
algorithm is directly applied to a very small die, where the exploration
region is as big as the whole blank, the runtime becomes unacceptably large.
Compared with the algorithm in [76], which takes more than one week before
termination, our algorithm is always able to find all relocation positions
within hours. The significant speedup of our proposed algorithm is due to the
fact that its runtime is only related to the number of features in the die,
rather than the size of the exploration region.
Table 6.4: Experimental Results

Die Size     Defect #    Feasible Region    Runtime      Runtime
(cm × cm)                Area (cm^2)        Ours (s)     Other's [76]
1 × 1        1           114.32             871.81       > 1 week
1.5 × 1.5    1           103.59             1980.91      > 1 week
1 × 1        2           113.47             1635.54      > 1 week
1.5 × 1.5    2           101.57             3747.33      > 1 week
1 × 1        3           112.33             2337.9       > 1 week
1.5 × 1.5    3           99.11              5360.52      > 1 week
1 × 1        4           111.34             3375.12      > 1 week
1.5 × 1.5    4           96.93              9161.14      > 1 week
1 × 1        5           111.41             4955.91      > 1 week
1.5 × 1.5    5           98.64              11490.1      > 1 week
1 × 1        6           111.87             5634.87      > 1 week
1.5 × 1.5    6           98.42              15058        > 1 week
1 × 1        7           108.09             6740.86      > 1 week
1.5 × 1.5    7           89.79              22792.6      > 1 week
1 × 1        8           106.92             9086.35      > 1 week
1.5 × 1.5    8           87.60              26037.1      > 1 week
6.3.2 Defect-Aware Reticle Placement
With all feasible regions to place a single die determined, the final objective
of the multi-die placement problem is to maximize the total number of valid
dies that can be packed within the exposure field, subject to the following
three constraints:
• First, different layers of the same die must be placed at the same rel-
ative locations within the exposure field for the exposure alignment
requirement.
• Second, in every layer, the bottom left points of all dies must be placed
within the feasible regions.
• Third, in each single layer, the multiple dies should not overlap with
each other.
In this subsection, we develop an approximation algorithm to maximize the
number of dies that can be placed within the feasible regions. The overall
flow of this algorithm is shown in Fig. 6.15.
Figure 6.15: The overall flow of the algorithm. Algorithm inputs: the
exposure field, the feasible regions and the die size. Step 1: Take union of
the bloated feasible regions. Step 2: Place a die at an optimal corner (a)
or a sub-optimal corner (b). Step 3: Remove infeasible regions; while the
feasible region area ≠ 0, return to Step 1.
In step 1, we consider the feasible regions as a set of rectangles. Note that
every rectilinear polygon can be sliced into a number of rectangles. For each
rectangle, we increase its height by the height of the die; then increase the
width by the width of the die to bloat it into a larger rectangular region.
The bloating operation is shown in Fig. 6.16. After that, the union of the
bloated rectangles forms the bloated feasible region.
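The bloating of a single rectangle can be written in one line. The sketch below uses hypothetical names and covers only the per-rectangle operation; the union of the bloated rectangles is left to a polygon library.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def bloat(region: Rect, die_w: float, die_h: float) -> Rect:
    """Extend a feasible rectangle rightward by the die width and upward by
    the die height. The bottom left corner stays where it is."""
    return Rect(region.x0, region.y0, region.x1 + die_w, region.y1 + die_h)
```

Since a feasible region describes positions of the bottom left corner of the die, the bloated rectangle is exactly the area that a die placed anywhere in the original rectangle can occupy.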
Figure 6.16: Feasible region bloating. (a) Original feasible region.
(b) Bloated feasible region.
Step 2 is the key step for this algorithm. In this step, we develop an
approximation strategy to select the proper corner to place a die. Every
corner consists of two orthogonal intervals. If the two orthogonal intervals of
a corner are able to completely cover the two edges of the die (both the width
and the height of the die), the corner is defined as an optimal corner, such as
corner a in Fig. 6.15. If only one interval of a corner is able to cover one edge
of the die (either the width or the height of the die), the corner is defined as
a sub-optimal corner, such as corner b in Fig. 6.15. In our corner selection
strategy, we first try to find an optimal corner in the bloated feasible region to
place the die. If an optimal corner does not exist, we will find a sub-optimal
corner to place the die. We claim that the sub-optimal corner always exists.
The proof is straightforward. Since the bloating operation for each rectangle
is only in the rightward and upward directions, the bottommost left point of
the bloated feasible region must overlap with the bottom left point of one
original rectangle. Suppose the width of that rectangle is w_r. By bloating in
the rightward direction, the width of that rectangle is extended by w_d, which
is the width of the die. Since it is the bottommost edge of the bloated region,
the length of that edge must be at least w_r + w_d, which is able to completely
cover the width of the die. Hence the bottommost left corner of the bloated
feasible region is one sub-optimal corner.
After the die placement of step 2, some original feasible regions are no
longer feasible, because a die placed in them might overlap with the already
placed die, which results in a conflict. Hence in step 3, the conflicting
feasible regions (dashed in Fig. 6.15) due to the die placement are removed
from the original feasible regions, and the feasible area is reduced. If the
total area of the feasible regions is not zero after the removal, go to
step 1 to start another loop.
Approximation Analysis
Theorem 6.1. If an optimal corner can be found in each loop of the algo-
rithm, the placement solution is optimal, meaning that the number of suc-
cessfully placed dies is maximal.
Lemma 6.1. Suppose corner a is the optimal corner found by our algorithm,
and there is an optimal solution with n dies placed within the exposure field,
but no die placed at corner a. We claim that there must be another optimal
solution also with n dies placed, where there is a die placed exactly at corner
a.
Proof: We know that the optimal solution has n dies successfully placed.
By placing another die at corner a, there will be n + 1 dies placed on the
blank. However, the newly placed die might overlap with the other dies, which
causes a conflict. Since the optimal corner a is defined as a corner with both
edges completely covering the die edges and the dies are of uniform size, the
newly placed die at corner a overlaps with at most one
existing die within the feasible region. Otherwise some die must exceed the
boundary of the feasible region as shown in Fig. 6.17(a), which is invalid.
Hence by removing the conflicting die from the blank, we get another valid
solution with n dies successfully placed. Therefore, placing the die at corner
a does not affect the optimality of the final solution.
Theorem 6.2. The number of dies placed by this approximation algorithm
is no less than half of the number placed in an optimal solution.
Suppose our algorithm places n dies within the feasible region in the final
solution, n_o of which are placed at optimal corners and n_s of which are
placed at sub-optimal corners, so that n = n_o + n_s. We have already proved
that placing the n_o dies at optimal corners does not affect optimality, so
we fix the locations of the dies placed at the optimal corners.
Lemma 6.2. Suppose there is an optimal solution which replaces the dies
at sub-optimal corners with another placement strategy. We claim that the
optimal solution cannot place more than n_o + 2n_s dies in total.
Proof: As shown in Fig. 6.17(b), the die placed at a sub-optimal corner
can block no more than two dies. In other words, by replacing the die at a
sub-optimal corner with another placement strategy, at most one more die
can be placed. Therefore, at most 2n_s dies can be placed by replacing the n_s
dies at the sub-optimal corners. Thus, the optimal solution is able to place
no more than n_o + 2n_s dies in total, which is less than or equal to
2(n_o + n_s) = 2n, twice the number of dies placed by our algorithm.
Figure 6.17: Approximation analysis of the placement algorithm. (a) Placement at optimal corner. (b) Placement at sub-optimal corner. The die
in solid green line is the placement result of our algorithm; the dies in
dashed green line show the result of one optimal placement solution; the
dies in dashed red lines are demos of invalid placement.
Based on this analysis we can conclude that in the worst case, our algorithm
is still guaranteed to place no less than half of the number of dies placed by
an optimal strategy.
Heuristic Adjustment
In each loop of our algorithm, there might be multiple optimal corners or
sub-optimal corners for selection. Even though the approximation analysis
provides the same worst case factor (one half of the optimal solution), the
actual corner selection order may result in completely different performance
in practice. In order to improve the performance of the algorithm, we introduce
some heuristics when multiple corners are available for selection.
Firstly, when there are multiple corners available, we prefer to place the
die closer to the bounding box corner of the bloated feasible region. The
bounding box of each bloated feasible region has four boundaries: top,
bottom, left and right. The weight of a corner is defined as the sum of
two distances, one from the corner to the closer horizontal boundary, the
other from the corner to the closer vertical boundary. If multiple corners are
of the same type, either optimal or sub-optimal, the corner with the least
weight is selected first.
Figure 6.18: Definition of the placement freedom (X freedom and Y freedom of the dies within the exposure field).
Secondly, for different die sizes, the placement freedom in the x and y
directions is different. The placement freedom is decided by the size of the die
and the size of the exposure field. Because the size of the exposure field is
generally not an exact multiple of the die size, even if there is no defect on
the blank and the exposure field is fully packed with dies, some empty space is
left in the exposure field. We define the placement freedom in the x and y
directions as the total empty distance left after the exposure field is fully
packed with the saturation number of dies, as illustrated in Fig. 6.18. If the
freedom in the x direction is much less than in the y direction, we prefer to
place the die closer to the vertical boundary of the bounding box rather than
the horizontal boundary, and vice versa. Therefore, the distance between the
corner and each boundary is also weighted by the inverse of the freedom in
that direction.
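The two heuristics above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names corner_weight and select_corner, the tuple layouts for corners and bounding boxes, and the example coordinates are all hypothetical and are not taken from our implementation.

```python
def corner_weight(corner, bbox, freedom=(1.0, 1.0)):
    """Weight of a candidate corner (smaller is better).

    corner  : (x, y) position of the corner
    bbox    : (xmin, ymin, xmax, ymax) bounding box of the bloated region
    freedom : (fx, fy) placement freedom in x and y; (1, 1) disables
              the freedom-based weighting (hypothetical default)
    """
    x, y = corner
    xmin, ymin, xmax, ymax = bbox
    dx = min(x - xmin, xmax - x)  # distance to the closer vertical boundary
    dy = min(y - ymin, ymax - y)  # distance to the closer horizontal boundary
    fx, fy = freedom
    # Each distance is weighted by the inverse of the freedom in its direction.
    return dx / fx + dy / fy


def select_corner(corners, bbox, freedom=(1.0, 1.0)):
    """Pick the least-weight corner among corners of the same type."""
    return min(corners, key=lambda c: corner_weight(c, bbox, freedom))


# Hypothetical example: two candidate corners of the same type.
bbox = (0, 0, 104, 132)
candidates = [(10, 60), (50, 6)]
unweighted_choice = select_corner(candidates, bbox)
weighted_choice = select_corner(candidates, bbox, freedom=(4, 12))
```

With unit freedoms the corner near the bottom boundary wins, while with a small x freedom the corner near the left (vertical) boundary is preferred, which matches the intent of the second heuristic.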
Time Complexity Analysis
For this algorithm, exactly one die is placed in each loop. If the given die
is of reasonable size with respect to the size of the exposure field, the
number of loops is bounded by a constant. Within each loop, the union
operation in step 1 takes O(n log n) runtime according to the computational
complexity analysis of the open source library [77], where n is the total
number of vertices within the exposure field. In step 2, the corner selection
takes O(n) runtime, and the feasible region removal in step 3 also takes O(n)
runtime. Therefore, the total runtime of this algorithm is O(n log n).
Experimental Results
In this subsection, we perform simulation experiments to analyze the number
of successfully placed dies with respect to the distribution of feasible regions
and die sizes. In the first experiment, we fix the die size to be 20 mm by 30
mm, and randomly generate exposure fields with different feasible region
distributions. Then we run the program to place the maximum number of
dies on different blanks to test the performance of the algorithm. The testing
results are shown in Fig. 6.19.
For a die size of 20 mm by 30 mm, at most 20 dies (5 by 4) can be placed
within the exposure field if no defect exists. From the simulation results we
can see that when the feasible regions are sparse, fewer than 20 dies
can be successfully placed. If the density is high enough,
the exposure field is fully packed with the saturation number of dies. From
Fig. 6.19(a) we observe that the number of dies successfully placed by the
algorithm is very close to the optimal solution. Almost no improvement can be
achieved even by manual modification. The simulation results of other blanks
also indicate that the solution obtained by the approximation algorithm is
very close to the optimal solution, which validates the performance of the
algorithm.
Figure 6.19: The testing result of the algorithm. (a) 15 dies successfully placed within exposure field for sparser feasible regions. (b) 20 dies successfully placed within exposure field for denser feasible regions. Both panels ("Placement result") plot the width of the exposure field (mm) against the height of the exposure field (mm).
After the validation of the algorithm, we vary the size of the die to find the
effect of die size on the placement result. As shown in Fig. 6.20, the number
of successfully placed dies always increases with the feasible region
density. However, different die sizes reach the saturation number at
different feasible region densities. The number of successfully placed
dies reaches the saturation number at a lower feasible region density when the
placement freedom is larger. The comparison is shown in Table 6.5.
Table 6.5: Saturation Capacity and Freedom for Different Die Sizes
Die Size      Saturation Capacity   X Freedom   Y Freedom
(cm by cm)                          (mm)        (mm)
2 x 2         5 x 6 = 30            4           12
2 x 3         5 x 4 = 20            4           12
2 x 4         5 x 3 = 15            4           12
3 x 3         3 x 4 = 12            14          12
3 x 4         3 x 3 = 9             14          12
4 x 4         2 x 3 = 6             24          12
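The entries of Table 6.5 can be reproduced with simple integer arithmetic. The sketch below assumes an exposure field of 104 mm by 132 mm; this size is inferred from the table itself (for example, 5 x 20 mm + 4 mm and 4 x 30 mm + 12 mm) rather than stated in this section, and the function name is hypothetical.

```python
def saturation_and_freedom(field_w, field_h, die_w, die_h):
    """Return (saturation capacity, x freedom, y freedom), lengths in mm.

    The saturation capacity is the number of dies that fit in a regular
    grid; the freedom is the empty distance left over in each direction.
    """
    nx = field_w // die_w  # dies per row
    ny = field_h // die_h  # dies per column
    return nx * ny, field_w - nx * die_w, field_h - ny * die_h


# Assumed exposure field size in mm (inferred from Table 6.5).
FIELD_W, FIELD_H = 104, 132

# Die sizes of Table 6.5 converted to mm.
die_sizes_mm = [(20, 20), (20, 30), (20, 40), (30, 30), (30, 40), (40, 40)]
table = {d: saturation_and_freedom(FIELD_W, FIELD_H, *d) for d in die_sizes_mm}
```

For the 20 mm by 30 mm die used in the first experiment, this gives a saturation capacity of 20 with 4 mm of x freedom and 12 mm of y freedom.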
Figure 6.20: The number of successfully placed dies with respect to the die
size and the feasible region density. (Plot "Placed Dies vs. Feasible Region Density": x-axis, number of feasible regions, 0 to 4500; y-axis, number of placed dies, 0 to 30; one curve per die size: 2x2, 2x3, 2x4, 3x3, 3x4, 4x4.)
6.4 Design-Blank Matching
EUV mask vendors always have multiple blanks available, and all the blanks
are fabricated and inspected before circuit designs from customers are
provided. Depending on the production volume, some designs might require
more than one mask because the lifetime of EUV masks is much shorter
than that of traditional masks used in deep UV. For any pair of blank and
layout, the layout relocation algorithms described in the previous sections
can be used to determine the optimal position to place the layout on the
blank, such that the damage caused by the defects is minimized. To
completely avoid the defect impact, each design can only be paired with a subset
of blanks to produce valid masks. Therefore, after receiving designs from
customers, the EUV mask vendor needs to select a subset of blanks to match
the designs and make valid EUV masks, as shown in Fig. 6.21. Based on
statistical analysis, Burns et al. [69] provide information on the number of
blanks to prepare for the next incoming design in order to have a high chance
of completely avoiding the defect impact. However, EUV mask vendors
can receive multiple designs from different designers. Instead of picking
one blank at a time to pair with one individual layout sequentially,
considering all designs simultaneously offers the advantage of fully utilizing
all defective blanks.
Figure 6.21: The design-blank matching problem for EUV mask vendors.
Fig. 6.22 shows an example where the EUV mask vendor has six defective
blanks to match with three different designs. Designs 1, 2 and 3 require 1, 2
and 3 masks respectively, so six valid masks should be fabricated in total. The
minimum defect impact of each blank on each design is shown in Fig. 6.22(a).
Fig. 6.22(b) shows the sequential design-blank matching result. In sequential
matching, to completely avoid defect impact, blank 2 is first selected for
design 1, then blanks 1 and 3 are selected for design 2. After that, only
blank 4 is available for design 3 among the three blanks left, and hence only
four valid masks can be produced. Blank 5 and blank 6 are both wasted.
However, as shown in Fig. 6.22(c), by simultaneously matching the three
designs together, the blank set can be fully utilized to produce six valid
masks. Therefore, the blank utilization can be improved dramatically by
matching multiple designs simultaneously rather than sequentially.
Figure 6.22: Comparison between sequential matching and simultaneous
matching. (a) Minimum defect impact for each design and each blank. (b) Sequential matching result produces four valid masks. (c) Simultaneous matching result produces six valid masks.
6.4.1 Complete Defect Impact Avoidance
In EUV mask fabrication, if defect compensation is not allowed, all defect
impact must be completely avoided by layout relocation in order to produce
valid masks. EUV mask vendors usually receive multiple designs from
different designers. Instead of selecting one blank at a time from the blank
set to pair with each design sequentially, the vendors would prefer a global
matching solution which maximizes the utilization of the blank set to
fabricate as many valid masks as possible. To formulate this problem, we
consider a zero defect impact between a design and a blank as a valid pairing,
and a nonzero defect impact as an invalid pairing. The minimum defect
impacts for each pair of design and blank are pre-calculated by the layout
relocation algorithm. The detailed problem definition and formulation are
given as follows.
Definition 6.3. Design and Blank Matching Problem
Given a set of blanks and a set of designs with different requirements
for blank quantities, and the minimum defect impact of each blank
on each design, the objective is to find an optimal matching between
the designs and the blanks, such that the number of masks immune
to defect impact is maximized.
Figure 6.23: The max-flow formulation of the design-blank matching
problem. (a) Minimum defect impacts provided by layout relocation. (b) The flow graph constructed according to (a), with source S, blank vertices B1, ..., Bm, design vertices D1, ..., Dn and target T.
We formulate this design-blank matching problem as a max-flow problem.
Suppose the vendor has m blanks to match with n designs. As shown in
Fig. 6.23(b), each blank is denoted by a grey vertex Bi (1 ≤ i ≤ m), and
each design is denoted by a blue vertex Dj (1 ≤ j ≤ n). According to the
defect impacts provided by the layout relocation algorithm in Fig. 6.23(a),
for any valid pairing between Bi and Dj, an edge Eij is added to connect Bi to
Dj. After that, we add to the graph a source vertex S and a target vertex
T, and edges Esi (1 ≤ i ≤ m) from the source vertex to every blank vertex as
well as edges Ejt (1 ≤ j ≤ n) from every design vertex to the target vertex.
In EUV mask fabrication, each blank can only be used once to fabricate
one single design because of the capping layer [80], but some designs might
require more than one blank for high volume production (HVP)
because the lifetime of the EUV mask is limited. Therefore, the capacity
upper bound for each edge Esi (1 ≤ i ≤ m) from S to the blank vertices is set to
1, which guarantees that each blank is used at most once. The capacity
upper bound for each edge Ejt (1 ≤ j ≤ n) from the design vertices to T is set to
dj, which is the number of masks demanded by the corresponding design.
In the max-flow result, a nonzero flow value on edge Eij denotes that
the blank Bi is supplied to make a mask for the design Dj. The value of the
whole flow is equivalent to the number of valid masks that can be produced.
Thus, the maximum flow provides the optimal matching solution. Fig. 6.24
illustrates a matching result calculated by the max-flow algorithm.
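The formulation above can be sketched as follows. This is a minimal illustration rather than our actual implementation: it uses a plain Edmonds-Karp max-flow written from scratch, the function names are hypothetical, the blank-to-design edges are given unit capacity as an assumption, and the impact matrix is made-up example data (it is not the matrix of Fig. 6.24).

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity is a dict of dicts."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:  # add reverse edges with zero capacity
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def match_designs_to_blanks(impact, demand):
    """impact[j][i]: minimum defect impact of blank i on design j."""
    n, m = len(impact), len(impact[0])
    cap = {"S": {("B", i): 1 for i in range(m)}}  # each blank used once
    for i in range(m):  # edge Bi -> Dj only for zero-impact (valid) pairings
        cap[("B", i)] = {("D", j): 1 for j in range(n) if impact[j][i] == 0}
    for j in range(n):  # design j needs at most demand[j] masks
        cap[("D", j)] = {"T": demand[j]}
    value, residual = max_flow(cap, "S", "T")
    pairs = sorted((i, j) for i in range(m) for j in range(n)
                   if impact[j][i] == 0 and residual[("B", i)][("D", j)] == 0)
    return value, pairs  # pairs are (blank index, design index)


# Hypothetical data: 2 designs (rows), 5 blanks (columns), 0-based indices.
example_impact = [[1, 0, 0, 0, 0],
                  [2, 0, 3, 0, 1]]
num_masks, pairing = match_designs_to_blanks(example_impact, demand=[2, 3])
```

On this made-up instance only four of the five demanded masks can be produced, since the first blank has no valid pairing with either design.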
Figure 6.24: The design-blank matching result provided by the max-flow
algorithm. The flow values on each edge are shown in blue, and the valid
pairings are shown in red. (a) Minimum defect impacts provided by layout relocation. (b) The result of the max-flow algorithm.
In Fig. 6.24, there are five blanks in the blank set to pair with two designs.
The first design requires two masks and the second design requires three.
Ideally the two designs require five masks in total. Fig. 6.24(a) shows the
pre-calculated defect impacts, and Fig. 6.24(b) shows the matching result
obtained by the max-flow algorithm. All the effective pairings provided by
the flow algorithm are marked in red. From the results we can see that the
maximum number of useful masks that can be fabricated is four. In order to
fulfill the production volume requirement of design 2, more blanks must be
included in the blank set.
6.4.2 Defect Compensation Cost Minimization
Sometimes the defect impact can be compensated by absorber modification
or blank repair. In this case, zero defect impact is no longer a must to make
a valid mask. However, the defect compensation is costly, so the total com-
pensation cost of all masks must be minimized. We define the compensation
cost minimization problem as follows.
Definition 6.4. Compensation Cost Minimization Problem
Given a set of blanks and a set of designs with different requirements
for blank quantities, and the minimum defect impact of each blank on
each design, the objective is to find an optimal matching between the
designs and the blanks, such that all required masks are produced and
the total defect compensation cost is minimized.
We formulate this problem as a min-cost flow problem, where the cost for
each design and blank is set to be the minimum defect compensation cost.
Here we assume the defect compensation cost is a function of the defect
impact calculated by the layout relocation algorithm. Fig. 6.25 illustrates
the min-cost flow formulation.
As shown in Fig. 6.25(a) and 6.25(b), we implement a function f to
convert the defect impact into defect compensation cost in the first step.
Since defect compensation is permitted, every blank is a valid candidate for
every design. Therefore, as shown in Fig. 6.25(c), the blanks and designs form
a complete bipartite graph. Any edge Eij(1 ≤ i ≤ m, 1 ≤ j ≤ n) connecting
from blank Bi to design Dj is assigned a cost Cij, which is the minimum
compensation cost between Bi and Dj. Similar to the previous max-flow
formulation, the capacity upper bound for each edge Esi(1 ≤ i ≤ m) from S
to blank vertices is set to 1, which guarantees that each blank is used at most
once. Conversely, to guarantee that the production volume requirement of
each design is satisfied, the capacity lower bound for each edge Ejt(1 ≤ j ≤ n)
from design vertices to T is set to be the number of masks demanded by the
corresponding design. After the problem formulation, the min-cost flow in
the graph provides the optimal matching result as well as the minimum total
cost for all the required masks.
Figure 6.25: The min-cost flow formulation of the compensation cost
minimization problem. (a) Minimum defect impacts provided by layout relocation. (b) Minimum compensation cost between each design and blank, obtained by applying f to the entries of (a). (c) The min-cost flow graph constructed based on (b), with edge costs such as C11 = f(3), C22 = f(2) and Cmn = 0.
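A minimal sketch of this min-cost flow formulation is given below, using successive shortest paths with Bellman-Ford. Several points are assumptions made for illustration only: the function name, the quadratic cost function f, and the example impact matrix are hypothetical, and the capacity lower bound on each edge Ejt is handled by giving Ejt a capacity of dj and pushing exactly the total demand, raising an error if that is infeasible.

```python
def min_cost_matching(impact, demand, cost_fn=lambda x: x):
    """impact[j][i]: minimum defect impact of blank i on design j."""
    n, m = len(impact), len(impact[0])
    S, T = 0, m + n + 1  # blanks are nodes 1..m, designs are m+1..m+n
    graph = [[] for _ in range(m + n + 2)]  # edge: [to, cap, cost, rev_index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(S, 1 + i, 1, 0)  # each blank is used at most once
        for j in range(n):  # complete bipartite graph with costs Cij
            add_edge(1 + i, 1 + m + j, 1, cost_fn(impact[j][i]))
    for j in range(n):
        add_edge(1 + m + j, T, demand[j], 0)

    total_cost = 0
    for _ in range(sum(demand)):  # one unit of flow per required mask
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[S] = 0
        for _ in range(len(graph) - 1):  # Bellman-Ford relaxation rounds
            updated = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], updated = dist[u] + cost, (u, k), True
            if not updated:
                break
        if dist[T] == float("inf"):
            raise ValueError("not enough blanks to satisfy all demands")
        v = T
        while v != S:  # augment one unit along the cheapest path
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u
        total_cost += dist[T]

    pairs = sorted((i, v - 1 - m) for i in range(m)
                   for v, cap, _, _ in graph[1 + i] if v > m and cap == 0)
    return total_cost, pairs  # pairs are (blank index, design index)


# Hypothetical data: 2 designs (rows), 4 blanks (columns), f(x) = x^2.
example_impact = [[0, 1, 2, 3],
                  [0, 2, 1, 3]]
best_cost, best_pairs = min_cost_matching(example_impact, [1, 2],
                                          cost_fn=lambda x: x * x)
```

Augmenting one unit at a time is sufficient here because every path from S passes through a unit-capacity edge Esi.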
6.4.3 Experimental Results
In the experiments we assume the mask vendor has three designs to match,
and designs 1, 2 and 3 require 1, 2 and 3 masks respectively, so six masks
should be fabricated in total. Four groups of blanks with different defect
densities are tested in the experiment. In each group, the number of defects on
each blank is identical but the defect locations are randomly distributed.
In group 1, each blank has 30 defects; in group 2, each blank has 35
defects; in group 3, each blank has 40 defects; and in group 4, each blank
has 45 defects. The sizes of the defects are between 50 nm and 200 nm.
Pre-calculation by Layout Relocation Algorithm
Before matching the designs with a subset of blanks, we first run the layout
relocation algorithm to find the minimum defect impact between each design
and each blank. The pre-calculation results are shown in Fig. 6.26.
Figure 6.26: The precalculation results provided by the layout relocation
algorithm.
From Fig. 6.26 we can observe that with the increase of the defect number
on each blank, the defect impact becomes more and more severe. At the
same time, it becomes more difficult to find a successful pairing for one
design to completely avoid the defect impact. In the fourth group where the
defect number per blank has increased to 45, it becomes impossible to avoid
the defect impact completely, and hence the successful pairing probability is
reduced to zero. Therefore, reducing the defect density is very important for
defect impact mitigation.
Design-Blank Matching for Complete Defect Impact Avoidance
In the first problem formulation, the defect impact must be completely
avoided by layout relocation. For this problem, we compare the blank utiliza-
tion between the proposed simultaneous matching and sequential matching.
To evaluate the utilization of the blank sets in producing valid masks, for
each group, we first include only six blanks in the blank set, then dynamically
increase the size of the blank set by adding more blanks until the required six
masks can be successfully fabricated without defect impact. The results for
simultaneous matching and sequential matching are compared in Fig. 6.27.
Figure 6.27: Blank utilization comparison between simultaneous matching
and sequential matching.
The results in group 2 and group 3 show that to produce the same number
of valid masks, simultaneous matching requires fewer blanks than sequential
matching. Thus, the simultaneous matching strategy offers an advantage in
blank utilization. In group 4, since it is impossible to completely avoid the
defect impact by layout relocation, no comparison result is shown.
Design-Blank Matching for Defect Compensation Cost Minimization
In the second problem formulation, the costly defect compensation is
permitted. To validate the benefit of simultaneous design-blank matching in
cost saving, we perform another experiment implementing the second
problem formulation, which minimizes the total defect compensation cost of the
six masks. In sequential matching, the designs are matched individually in
a greedy manner. When pairing one design at a time, the blank
with the minimum compensation cost among the remaining blanks is selected
first. The comparison result is shown in Fig. 6.28. The comparison results in
groups 2, 3 and 4 show that the simultaneous matching strategy incurs lower
cost than sequential matching to fabricate the six required masks. Thus, the
simultaneous matching strategy also offers an advantage in compensation
cost minimization.
Figure 6.28: Compensation cost comparison between simultaneous
matching and sequential matching.
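The greedy sequential baseline described above can be sketched as follows, together with a brute-force search that stands in for the simultaneous optimum on a tiny instance. The function names, the quadratic cost function and the impact matrix are hypothetical and serve only to illustrate why a greedy choice for an early design can be costly for later designs; they are not the experimental data of Fig. 6.28.

```python
from itertools import permutations


def sequential_greedy_cost(impact, demand, cost_fn=lambda x: x):
    """Match designs one at a time, always taking the cheapest blank left."""
    available = set(range(len(impact[0])))
    total, pairs = 0, []
    for j, need in enumerate(demand):
        for _ in range(need):
            # Ties are broken by the smaller blank index.
            i = min(available, key=lambda b: (cost_fn(impact[j][b]), b))
            available.remove(i)
            total += cost_fn(impact[j][i])
            pairs.append((i, j))  # (blank index, design index)
    return total, pairs


def simultaneous_optimum_cost(impact, demand, cost_fn=lambda x: x):
    """Exhaustive search over all assignments (tiny instances only)."""
    slots = [j for j, need in enumerate(demand) for _ in range(need)]
    blanks = range(len(impact[0]))
    return min(sum(cost_fn(impact[j][i]) for i, j in zip(choice, slots))
               for choice in permutations(blanks, len(slots)))


# Hypothetical data: 2 designs (rows), 4 blanks (columns), f(x) = x^2.
example_impact = [[0, 1, 2, 3],
                  [0, 2, 1, 3]]
square = lambda x: x * x
greedy_cost, greedy_pairs = sequential_greedy_cost(example_impact, [1, 2], square)
optimal_cost = simultaneous_optimum_cost(example_impact, [1, 2], square)
```

On this instance the greedy strategy spends the only zero-cost blank on the first design, so the second design is left with more expensive blanks, whereas the simultaneous optimum assigns that blank to the second design.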
6.5 Conclusions
This chapter summarizes our work on EUV blank defect mitigation by layout
relocation. Efficient algorithms are proposed considering inspection inaccu-
racy and multi-die placement respectively. Finally, since layout relocation
cannot always achieve complete defect impact mitigation, a simultaneous
design-blank matching strategy is proposed for EUV mask preparation.
CHAPTER 7
CONCLUSIONS
In this dissertation, we have studied design-technology co-optimization
problems for advanced lithography. Major advanced lithography and process
techniques below the 20 nm technology node are covered in our study,
including placement optimization for the FinFET process, cut layer optimization
and printing with hybrid lithography, enhanced SADP decomposition and
SADP-aware detailed routing, contact/via layer optimization for DSA lithography,
and EUV blank defect mitigation.
In the 16 nm technology node, we first study the edge device degradation
problem for the FinFET process. To avoid such degradation, dummy gates
are needed on device edges, which may introduce extra area penalty. In
Chapter 2, we propose a standard cell based detailed placement optimization
strategy for the 16 nm FinFET process. By flipping a subset of cells in a
standard cell row and switching pairs of adjacent cells, the placement area
can be optimally minimized. Then, targeting cut layer printing in 16 nm
1D gridded design, we propose a hybrid lithography process in Chapter 3
that involves 193i and EBL processes. By applying our cut redistribution
algorithm to the original layout, the number of E-Beam shots is minimized,
which greatly improves the throughput. Sometimes for sparser layers such as
Metal 2, the EBL process can be eliminated entirely, which tremendously
reduces the manufacturing cost.
In the 10 nm technology node, we focus on SADP lithography. In
Chapter 4, we first study the residue artifacts in conventional SADP
decomposition and propose an enhanced SID decomposition flow with model-based
verification. The simplified lithography model introduced in the
decomposition verification step tremendously improves the efficiency of the
entire flow, and the simulation results with a real lithography model verify that
the enhanced SID decomposition flow is capable of removing residue artifacts
effectively. Meanwhile, the study of the enhanced SADP decomposition
flow also reveals the necessity to consider residue artifacts in the design
phase (e.g., during the routing phase). Therefore, we further study the major
challenges faced by the SID process such as forbidden spacing, odd cycles,
anti-parallel line-end conflicts and sub-metal residue issues; then we propose
an expanded graph model to solve the SID-compliant detailed routing
problem. An overall negotiated congestion based routing scheme is developed to
resolve wire crossing and design rule conflicts over iterations of rip-up and
reroute, and all conflict-free routing layers produced by our detailed router
have been verified as 100% SID decomposable.
In the 7 nm technology node, DSA technology is the most promising
candidate for contact/via layer patterning, and it presents a unique
opportunity for DSA patterning and layout design co-optimization to improve the
manufacturability of DSA. In Chapter 5, we study both standard cell level
and full chip level contact/via layer optimization problems for DSA
lithography. At the standard cell level, given an arbitrary standard cell library,
we simultaneously optimize the layouts of every cell, such that the contact
layer of any cell in the library can be fully patterned by a set of guiding
templates, and the total cost of the templates is minimal. This
optimization problem is first proved to be NP-hard and formulated as a Weighted
Partial Maximum Satisfiability problem, which can be optimally solved with
a public SAT solver. Then we propose a bounded approximation algorithm
that solves the problem much more efficiently. At the full chip level, we
propose a DSA-aware detailed routing algorithm to optimize the via layers such
that only feasible templates are needed for via layer patterning. In addition,
among all the feasible templates, the one with better overlay accuracy has
higher priority to be selected by the router for via patterning, which
further improves the yield. By enabling the DSA process for via layer patterning
in the 7 nm technology node, the proposed detailed routing strategy
tremendously reduces the manufacturing cost and improves the throughput for IC
fabrication.
Below the 7 nm technology node, we study the EUV blank defect
mitigation problem in Chapter 6. We first present an efficient layout shifting
algorithm to find the optimal location to place a layout on a blank, such
that all defects are simultaneously covered by device patterns and the global
covering margin is maximal. Then with more defects inspected, in many
cases, it is impossible to completely mitigate all defect impact if multiple
dies are tied and moved together; hence we further explore the flexibility of
individual die shifting. Even with that, a 100% success rate in complete defect
mitigation can never be guaranteed since this also depends on the designs
and defect maps. Targeting imperfect defect mitigation between one pair
of design and blank, we finally develop an optimal design-blank matching
strategy to match multiple designs and defective blanks simultaneously.
REFERENCES
[1] S. Borkar et al., “Microarchitecture and design challenges for gigascale
integration,” in MICRO, vol. 37, 2004, pp. 3–3.
[2] G. Luk-Pat, A. Miloslavsky, B. Painter, L. Lin, P. De Bisschop, and
K. Lucas, “Design compliance for spacer is dielectric (sid) patterning,”
in SPIE Advanced Lithography. International Society for Optics and
Photonics, 2012, pp. 83 260D–83 260D.
[3] L. Liebmann, L. Pileggi, J. Hibbeler, V. Rovner, T. Jhaveri, and
G. Northrop, “Simplify to survive: prescriptive layouts ensure profitable
scaling to 32nm and beyond,” in SPIE Advanced Lithography. Interna-
tional Society for Optics and Photonics, 2009, pp. 72 750A–72 750A.
[4] M. C. Smayling, C. Bencher, H. D. Chen, H. Dai, and M. P. Duane, “Apf
pitch-halving for 22nm logic cells using gridded design rules,” in SPIE
Advanced Lithography. International Society for Optics and Photonics,
2008, pp. 69 251E–69 251E.
[5] Q. Li, “Np-completeness result for positive line-by-fill sadp process,”
in SPIE Photomask Technology. International Society for Optics and
Photonics, 2010, pp. 78 233P–78 233P.
[6] M. P. Stoykovich, H. Kang, K. C. Daoulas, G. Liu, C.-C. Liu, J. J.
de Pablo, M. Müller, and P. F. Nealey, “Directed self-assembly of block
copolymers for nanolithography: fabrication of isolated features and es-
sential integrated circuit geometries,” Acs Nano, vol. 1, no. 3, pp. 168–
175, 2007.
[7] H. Yi, X.-Y. Bao, J. Zhang, R. Tiberio, J. Conway, L.-W. Chang, S. Mi-
tra, and H.-S. P. Wong, “Contact-hole patterning for random logic cir-
cuits using block copolymer directed self-assembly,” in SPIE Advanced
Lithography. International Society for Optics and Photonics, 2012, pp.
83 230W–83 230W.
[8] H.-S. P. Wong, C. Bencher, H. Yi, X.-Y. Bao, and L.-W. Chang, “Block
copolymer directed self-assembly enables sublithographic patterning for
device fabrication,” in SPIE Advanced Lithography. International So-
ciety for Optics and Photonics, 2012, pp. 832 303–832 303.
[9] K. Lai, C.-c. Liu, J. Pitera, D. J. Dechene, A. Schepis, J. Abdallah,
H. Tsai, M. Guillorn, J. Cheng, G. Doerk et al., “Computational aspects
of optical lithography extension by directed self-assembly,” in SPIE Ad-
vanced Lithography. International Society for Optics and Photonics,
2013, pp. 868 304–868 304.
[10] Y. Du, D. Guo, M. D. Wong, H. Yi, H.-S. P. Wong, H. Zhang, and
Q. Ma, “Block copolymer directed self-assembly (dsa) aware contact
layer optimization for 10 nm 1d standard cell library,” in Computer-
Aided Design (ICCAD), 2013 IEEE/ACM International Conference on.
IEEE, 2013, pp. 186–193.
[11] Z. Xiao, Y. Du, M. D. F. Wong, and H. Zhang, “DSA template mask
determination and cut redistribution for advanced 1D gridded design,”
in SPIE Photomask Technology. International Society for Optics and
Photonics, 2013, p. 888017.
[12] T. Shoki, M. Mitsui, M. Sakamoto, N. Sakaya, M. Ootsuka, T. Asakawa,
T. Yamada, and H. Mitsui, “Improvement of total quality on euv mask
blanks toward volume production,” in Proc. SPIE, vol. 7636, 2010, p.
76360U.
[13] Y. Hirabayashi, “Development status of euvl mask blank and substrate,”
in SPIE Photomask Technology. International Society for Optics and
Photonics, 2011, pp. 81 663T–81 663T.
[14] P.-Y. Yan, Y. Liu, M. Kamna, G. Zhang, R. Chen, and F. Martinez,
“Euvl multilayer mask blank defect mitigation for defect-free euvl mask
fabrication,” in Proc. SPIE, vol. 8322, 2012, p. 83220Z.
[15] C. H. Clifford, “Simulation and compensation methods for euv lithog-
raphy masks with buried defects.” PhD dissertation, University of
California at Berkeley, 2010.
[16] K. Ahmed and K. Schuegraf, “Transistor wars,” Spectrum, IEEE,
vol. 48, no. 11, pp. 50–66, 2011.
[17] C. Auth, C. Allen, A. Blattner, D. Bergstrom, M. Brazier, M. Bost,
M. Buehler, V. Chikarmane, T. Ghani, T. Glassman et al., “A 22nm high
performance and low-power cmos technology featuring fully-depleted tri-
gate transistors, self-aligned contacts and high density mim capacitors,”
in VLSI Technology (VLSIT), 2012 Symposium on. IEEE, 2012, pp.
131–132.
[18] R. Merritt, “EETimes: TSMC starts FinFETs in 2013, tries EUV at
10 nm,” http://www.eetimes.com/electronics-news/4411693, 2013, [On-
line; accessed 4-July-2013].
[19] K. H. Yeo, S. D. Suk, M. Li, Y.-y. Yeoh, K. H. Cho, K.-H. Hong, S. Yun,
M. S. Lee, N. Cho, K. Lee et al., “Gate-all-around (gaa) twin silicon
nanowire mosfet (tsnwfet) with 15 nm length gate and 4 nm radius
nanowires,” in Electron Devices Meeting, 2006. IEDM’06. International.
IEEE, 2006, pp. 1–4.
[20] S. Bangsaruntip, G. Cohen, A. Majumdar, Y. Zhang, S. Engelmann,
N. Fuller, L. Gignac, S. Mittal, J. Newbury, M. Guillorn et al., “High
performance and highly uniform gate-all-around silicon nanowire MOSFETs
with wire size dependent scaling,” in Electron Devices Meeting
(IEDM), 2009 IEEE International. IEEE, 2009, pp. 1–4.
[21] R. Merritt, “Collaborate to innovate FinFET design ecosystem challenges
and solutions,” in Electronic Design Process Symposium (EDPS), 2013,
p. 4.
[22] K. Dombrowski, A. Fischer, B. Dietrich, I. De Wolf, H. Bender,
S. Pochet, V. Simons, R. Rooyackers, G. Badenes, C. Stuer et al.,
“Determination of stress in shallow trench isolation for deep submicron MOS
devices by UV Raman spectroscopy,” in Electron Devices Meeting, 1999.
IEDM’99. Technical Digest. International. IEEE, 1999, pp. 357–360.
[23] M. Choi, V. Moroz, L. Smith, and O. Penzin, “14 nm FinFET stress
engineering with epitaxial SiGe source/drain,” in Silicon-Germanium
Technology and Device Meeting (ISTDM), 2012 International. IEEE, 2012,
pp. 1–2.
[24] V. Moroz, “FinFET structure design and variability analysis enabled by
TCAD,” http://www.edn.com/design/eda-design/4398011/2/FinFET-
structure-design-and-variability-analysis-enabled-by-TCAD-, 2012, [On-
line; accessed 4-July-2013].
[25] C. W. Kang and S. Hong, “Timing-driven, congestion minimization, and
low power placement for standard cell layouts.”
[26] Y. Du and M. D. Wong, “Optimization of standard cell based detailed
placement for 16 nm FinFET process,” in Design, Automation and Test
in Europe Conference and Exhibition (DATE), 2014. IEEE, 2014, pp.
1–6.
[27] M. L. Fredman and R. E. Tarjan, “Fibonacci heaps and their uses in im-
proved network optimization algorithms,” Journal of the ACM (JACM),
vol. 34, no. 3, pp. 596–615, 1987.
[28] R. T. Greenway, R. Hendel, K. Jeong, A. B. Kahng, J. S. Petersen,
Z. Rao, and M. C. Smayling, “Interference assisted lithography for
patterning of 1D gridded design,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2009, p. 72712U.
[29] C. Bencher, J. Smith, L. Miao, C. Cai, Y. Chen, J. Y. Cheng, D. P.
Sanders, M. Tjio, H. D. Truong, S. Holmes et al., “Self-assembly pat-
terning for sub-15nm half-pitch: a transition from lab to fab,” in SPIE
Advanced Lithography. International Society for Optics and Photonics,
2011, p. 79700F.
[30] Y. Du, H. Zhang, M. D. Wong, and K.-Y. Chao, “Hybrid lithography
optimization with e-beam and immersion processes for 16nm 1D gridded
design,” in Design Automation Conference (ASP-DAC), 2012 17th Asia
and South Pacific. IEEE, 2012, pp. 707–712.
[31] D. Lam, D. Liu, and T. Prescop, “E-beam direct write (EBDW) as
complementary lithography,” in SPIE Photomask Technology. International
Society for Optics and Photonics, 2010, p. 78231C.
[32] D. K. Lam, E. D. Liu, M. C. Smayling, and T. Prescop, “E-beam to
complement optical lithography for 1D layouts,” in SPIE Advanced
Lithography. International Society for Optics and Photonics, 2011,
p. 797011.
[33] E. D. Liu and T. Prescop, “Optimization of e-beam landing energy for
EBDW,” in SPIE Advanced Lithography. International Society for Optics
and Photonics, 2011, p. 79701S.
[34] J. Gramss, A. Stoeckel, U. Weidenmueller, H.-J. Doering, M. Bloecker,
M. Sczyrba, M. Finken, T. Wandel, and D. Melzer, “Multi-shaped e-
beam technology for mask writing,” in SPIE Photomask Technology.
International Society for Optics and Photonics, 2010, p. 782309.
[35] “International technology roadmap for semiconductors,” 2010. [Online].
Available: http://www.itrs.net/Links/2010ITRS/Home2010.htm
[36] V. Axelrad and M. C. Smayling, “16nm with 193nm immersion
lithography and double exposure,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2010, p. 764109.
[37] H. Zhang, Y. Du, M. D. Wong, and K.-Y. Chao, “Lithography-aware
layout modification considering performance impact,” in Quality Elec-
tronic Design (ISQED), 2011 12th International Symposium on. IEEE,
2011, pp. 1–5.
[38] M. T. Hajian, “Dis-equality constraints in linear/integer programming,”
1996.
[39] “Gurobi optimizer 4.0,” 2011. [Online]. Available:
http://www.gurobi.com/
[40] Y. Ban, A. Miloslavsky, K. Lucas, S.-H. Choi, C.-H. Park, and D. Z. Pan,
“Layout decomposition of self-aligned double patterning for 2D random
logic patterning,” in SPIE Advanced Lithography. International Society
for Optics and Photonics, 2011, p. 79740L.
[41] H. Zhang, Y. Du, M. D. Wong, and R. Topaloglu, “Self-aligned
double patterning decomposition for overlay minimization and hot
spot detection,” in Design Automation Conference (DAC), 2011 48th
ACM/EDAC/IEEE. IEEE, 2011, pp. 71–76.
[42] H. Zhang, Y. Du, M. D. Wong, R. Topaloglu, and W. Conley, “Effective
decomposition algorithm for self-aligned double patterning lithography,”
in SPIE Advanced Lithography. International Society for Optics and
Photonics, 2011, p. 79730J.
[43] Z. Xiao, Y. Du, H. Zhang, and M. D. Wong, “A polynomial time exact
algorithm for self-aligned double patterning layout decomposition,” in
Proceedings of the 2012 ACM International Symposium on Physical
Design. ACM, 2012, pp. 17–24.
[44] M. Mirsaeedi, J. A. Torres, and M. Anis, “Self-aligned double-patterning
(SADP) friendly detailed routing,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2011, p. 79740O.
[45] J.-R. Gao and D. Z. Pan, “Flexible self-aligned double patterning aware
detailed routing with prescribed layout planning,” in Proceedings of the
2012 ACM International Symposium on Physical Design. ACM, 2012,
pp. 25–32.
[46] Y. Du, H. Song, J. Shiely, and M. D. Wong, “Enhanced spacer-is-
dielectric (SID) decomposition flow with model-based verification,” in
SPIE Advanced Lithography. International Society for Optics and
Photonics, 2013, p. 86840D.
[47] Y. Du, Q. Ma, H. Song, J. Shiely, G. Luk-Pat, A. Miloslavsky, and
M. D. Wong, “Spacer-is-dielectric-compliant detailed routing for self-
aligned double patterning lithography,” in Design Automation Confer-
ence (DAC), 2013 50th ACM/EDAC/IEEE. IEEE, 2013, pp. 1–6.
[48] A. B. Kahng, C.-H. Park, X. Xu, and H. Yao, “Layout decomposition for
double patterning lithography,” in Proceedings of the 2008 IEEE/ACM
International Conference on Computer-Aided Design. IEEE Press,
2008, pp. 465–472.
[49] L. McMurchie and C. Ebeling, “Pathfinder: a negotiation-based
performance-driven router for FPGAs,” in Proceedings of the 1995
ACM third international symposium on Field-programmable gate arrays.
ACM, 1995, pp. 111–117.
[50] M. C. Smayling, H.-y. Liu, and L. Cai, “Low k1 logic design using grid-
ded design rules,” in Advanced Lithography. International Society for
Optics and Photonics, 2008, p. 69250B.
[51] C. Bencher, H. Dai, and Y. Chen, “Gridded design rule scaling: taking
the CPU toward the 16nm node,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2009, p. 72740G.
[52] C. Bencher, Y. Chen, H. Dai, W. Montgomery, and L. Huli, “22nm
half-pitch patterning by CVD spacer self alignment double patterning
(SADP),” in Advanced Lithography. International Society for Optics and
Photonics, 2008, p. 69244E.
[53] P. Xu, Y. Chen, Y. Chen, L. Miao, S. Sun, S.-W. Kim, A. Berger,
D. Mao, C. Bencher, R. Hung et al., “Sidewall spacer quadruple pattern-
ing for 15nm half-pitch,” in SPIE Advanced Lithography. International
Society for Optics and Photonics, 2011, p. 79731Q.
[54] C. T. Black, R. Ruiz, G. Breyta, J. Y. Cheng, M. E. Colburn, K. W.
Guarini, H.-C. Kim, and Y. Zhang, “Polymer self assembly in semicon-
ductor microelectronics,” IBM Journal of Research and Development,
vol. 51, no. 5, pp. 605–633, 2007.
[55] H. Yi, X.-Y. Bao, J. Zhang, C. Bencher, L.-W. Chang, X. Chen,
R. Tiberio, J. Conway, H. Dai, Y. Chen et al., “Flexible control of
block copolymer directed self-assembly using small, topographical tem-
plates: Potential lithography solution for integrated circuit contact hole
patterning,” Advanced Materials, vol. 24, no. 23, pp. 3107–3114, 2012.
[56] M. C. Smayling, R. J. Socha, and M. V. Dusa, “22nm logic lithography in
the presence of local interconnect,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2010, p. 764019.
[57] “Sat4j,” 2013. [Online]. Available: http://www.sat4j.org/
[58] Y. Du, Z. Xiao, M. D. Wong, H. Yi, and H.-S. P. Wong, “DSA-aware
detailed routing for via layer optimization,” in SPIE Advanced
Lithography. International Society for Optics and Photonics, 2014,
p. 90492J.
[59] H. Yi, X.-Y. Bao, R. Tiberio, and H.-S. P. Wong, “Design strategy of
small topographical guiding templates for sub-15nm integrated circuits
contact hole patterns using block copolymer directed self assembly,”
in SPIE Advanced Lithography. International Society for Optics and
Photonics, 2013, p. 868010.
[60] M. R. Garey and D. S. Johnson, “Computers and intractability: A
guide to the theory of NP-completeness,” W. H. Freeman & Co., San
Francisco, 1979.
[61] U. Feige, “A threshold of ln n for approximating set cover,” Journal of
the ACM (JACM), vol. 45, no. 4, pp. 634–652, 1998.
[62] H. Yi and H.-S. P. Wong, “Block copolymer directed self-assembly two-
hole pattern inside peanut-shaped templates,” in EIPBN, 2013, pp. 10B–
05.
[63] S. Huh, I.-Y. Kang, C. Y. Jeong, J. Na, D. R. Lee, H.-s. Seo, S.-S. Kim,
C.-U. Jeon, J. Doh, G. Inderhees et al., “Printability and inspectability
of defects on EUV blank for 2xnm hp HVM application,” in SPIE Advanced
Lithography. International Society for Optics and Photonics, 2012,
p. 83220K.
[64] “ITRS Roadmap,” 2012. [Online]. Available: http://www.itrs.net
[65] I.-Y. Kang, H.-S. Seo, B.-S. Ahn, D.-G. Lee, D. Kim, S. Huh, C.-W. Koh,
B. Cha, S.-S. Kim, H.-K. Cho et al., “Printability and inspectability
of programmed pit defects on the masks in EUV lithography,” in SPIE
Advanced Lithography. International Society for Optics and Photonics,
2010, p. 76361B.
[66] G. M. Kloster, T. Liang, T. R. Younkin, E. S. Putna, R. Caudillo, and
I.-S. Son, “Printability of extreme ultraviolet lithography mask pattern
defects for 22-40 nm half-pitch features,” in SPIE Advanced
Lithography. International Society for Optics and Photonics, 2010,
p. 76360M.
[67] T. Kamo, T. Terasawa, T. Yamane, H. Shigemura, N. Takagi, T. Amano,
T. Tanaka, K. Tawarayama, O. Suga, and I. Mori, “Printability of EUVL
mask defect detected by actinic blank inspection tool and 199-nm
pattern inspection tool,” in SPIE Photomask Technology. International
Society for Optics and Photonics, 2010, p. 78231U.
[68] P.-Y. Yan, “EUVL ML mask blank fiducial mark application for ML defect
mitigation,” in SPIE Photomask Technology. International Society for
Optics and Photonics, 2009, p. 748819.
[69] J. Burns and M. Abbas, “EUV mask defect mitigation through pattern
placement,” in SPIE Photomask Technology. International Society for
Optics and Photonics, 2010, p. 782340.
[70] S. Hector, “Standards for EUV Masks,”
http://www.sematech.org/meetings/archives/litho/euvl/20050228
/B Hector.pdf, 2005, [Online; accessed 16-March-2014].
[71] Y. Du, H. Zhang, and M. D. Wong, “Linear time EUV blank defect
mitigation algorithm considering tolerance to inspection inaccuracy,”
in SPIE Photomask Technology. International Society for Optics and
Photonics, 2012, p. 85221R.
[72] Y. Du, H. Zhang, Q. Ma, and M. D. Wong, “Linear time algorithm to
find all relocation positions for EUV defect mitigation,” in ASP-DAC,
2013, pp. 261–266.
[73] Y. Du, H. Zhang, M. D. Wong, Y. Deng, and R. O. Topaloglu, “Efficient
multi-die placement for blank defect mitigation in EUV lithography,” in
SPIE Advanced Lithography. International Society for Optics and
Photonics, 2012, p. 832231.
[74] Y. Du, H. Zhang, M. D. Wong, and R. O. Topaloglu, “EUV mask
preparation considering blank defects mitigation,” in SPIE Photomask
Technology. International Society for Optics and Photonics, 2011,
p. 816611.
[75] A. A. Kagalwalla, P. Gupta, D.-H. Hur, and C.-H. Park, “Defect-aware
reticle floorplanning for EUV masks,” in SPIE Advanced Lithography.
International Society for Optics and Photonics, 2011, p. 79740Z.
[76] H. Zhang, Y. Du, M. D. Wong, and R. O. Topaloglu, “Efficient pattern
relocation for EUV blank defect mitigation,” in Design Automation
Conference (ASP-DAC), 2012 17th Asia and South Pacific. IEEE, 2012,
pp. 719–724.
[77] “Boost Polygon Library,” 2012. [Online]. Available:
http://www.boost.org/doc/libs/1_50_0/libs/polygon/doc/index.htm
[78] “Nangate Open Cell Library,” 2012. [Online]. Available:
http://www.si2.org/openeda.si2.org/projects/nangatelib
[79] A. A. Kagalwalla and P. Gupta, “Design-aware defect-avoidance
floorplanning of EUV masks,” IEEE Transactions on Semiconductor
Manufacturing, vol. 26, no. 1, pp. 111–124, 2013.
[80] P.-Y. Yan, G. Zhang, S. Chegwidden, E. A. Spiller, and P. B. Mirkarimi,
“EUVL mask with Ru ML capping,” in Photomask Technology. International
Society for Optics and Photonics, 2003, pp. 1281–1286.
