Control of wafer-scale non-uniformity in chemical-mechanical planarization by face-up polishing by Mau, Catherine (Catherine K.)
Control of Wafer-scale Non-uniformity in
Chemical-Mechanical Planarization by Face-up Polishing
by
Catherine Mau
B.S., Mechanical Engineering
University of California, San Diego, 2006
Submitted to the Department of Mechanical Engineering
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE IN MECHANICAL ENGINEERING
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2008
© 2008 Catherine Mau. All Rights Reserved.
The author hereby grants to MIT permission to reproduce
and to distribute publicly paper and electronic
copies of this thesis document in whole or in part
in any medium now known or hereafter created.
Signature of Author:
Department of Mechanical Engineering
May 9, 2008
Certified by:
Jung-Hoon Chun
Professor of Mechanical Engineering
Thesis Supervisor
Certified by:
Nannaji Saka
Research Affiliate, Department of Mechanical Engineering
Thesis Supervisor
Certified by:
Lallit Anand
Professor of Mechanical Engineering
Chairman, Department Committee on Graduate Students
Control of Wafer-scale Non-uniformity in
Chemical-Mechanical Planarization by Face-up Polishing
by
Catherine Mau
Submitted to the Department of Mechanical Engineering
on May 9, 2008 in partial fulfillment of the requirements for the Degree of
Master of Science in Mechanical Engineering
ABSTRACT
Chemical-mechanical planarization (CMP) is a key process in the manufacture of ultra-
large-scale-integrated (ULSI) semiconductor devices. A major concern in CMP is non-uniform
planarization, or polishing, at the wafer-scale - primarily as interconnect metal dishing and
dielectric erosion. In conventional face-down CMP, the pad is much larger than the wafer and
the wafer is always in contact with the pad. Thus, non-uniform polishing rate at the wafer-scale
is due to variations in relative velocity, normal pressure, and especially slurry distribution at the
wafer/pad interface. Wafer-scale polishing uniformity requirements are expected to be even
more stringent in the future as the ULSI technology advances toward larger wafers (450 mm)
and ever shrinking feature sizes (< 20 nm).
This thesis presents the theory and experimental validation of a novel, face-up CMP
architecture proposed for achieving a high degree of polishing uniformity, better than 95 percent,
at the wafer-scale. The novel design utilizes a small, perforated pad that contacts only a portion
of the wafer during CMP. Polishing uniformity is achieved by progressively translating the pad
away from the polished to the unpolished regions of the wafer. The theory is based on Preston's
Law for material removal rate and an optimal algorithm for pad translation. CMP experiments
were conducted on both blanket and patterned wafers to validate the theory. Polishing of blanket
wafers by non-translating pads showed that the Preston constant is higher at the center of the pad
due to increased slurry flow. Thus, perforations at the pad center were blocked to minimize the
variation in Preston constant. Face-up polishing of patterned wafers with the blocked pad
showed improved wafer-scale uniformity in material removal rate. Dielectric erosion was below
30 nm, less than 5 percent of the interconnect depth, across a 100-mm circular polished region.
However, dishing of the wider interconnects was much greater. Nevertheless, the variation in
dishing across the 100-mm region was less than 35 nm for linewidths ranging from 2.5 µm to
100 µm, also less than 5 percent. Based on the theory and experimental results, several
suggestions for further improving face-up CMP to minimize Cu dishing and dielectric erosion
are offered.
Thesis supervisor: Dr. Jung-Hoon Chun
Title: Professor of Mechanical Engineering
Thesis supervisor: Dr. Nannaji Saka
Title: Research Affiliate, Department of Mechanical Engineering
Acknowledgments
I would like to thank all the people who supported and guided me through my past two
years at MIT. This thesis would not have been possible without their contributions.
My utmost gratitude goes to my advisor, Professor Jung-Hoon Chun, who gave me the
opportunity to be part of his group and taught me how to be a researcher. Despite his busy
schedule, I knew his door was always open whenever I had a problem. I have learned a great
deal from him in problem solving and in communication. His suggestions and professional
advice were invaluable to my development as an engineer and I will take his wisdom with me as
I enter the next stage of my professional career.
I wish to thank Dr. Nannaji Saka, my second advisor, for the countless hours he spent
with me to improve my research skills. I cannot begin to enumerate the knowledge I gained
from working with him: in research, in writing, and above all, in life. I have the deepest respect
for Dr. Saka. His passion towards his work and research philosophy is what I will try to live by.
I would also like to thank my good friend and lab mate, Thor Eusner, who was never too
busy to stop what he was doing to help me with my research problems. I appreciate all his
comments and advice both inside and outside the lab. My time at MIT would not have been half
the experience that it was without Thor and I wish him all the best in his future endeavors.
To all my colleagues in the LMP graduate student office, many thanks for their support
and friendship. I will never forget the wonderful time we spent together and I wish them all the
very best. I would also like to thank my friends: Eehern Wong, Sai Hei Yeung, Leah Acker, and
Jared Thomas, for our many meals together. Our chats about anything and everything kept me
sane during the rough times.
I want to express my appreciation for the staff members at MIT who have helped me with
my research: Gerald Wentworth, David Dow, and Patrick McAtamney of the LMP machine
shop; Tim McClure and Libby Shaw of the Center of Materials Science and Engineering; and
Kurt Broderick of the Microsystems Technologies Laboratory.
I wish to acknowledge the financial support of the Semiconductor Research Corporation
Education Alliance (SRCEA) and Intel Corporation through their Master's Scholarship Program
and the funding of my research project. I also thank my industrial contacts, Dr. Paul Fischer of
Intel and Peter McKeever of Thomas West, Inc., for their invaluable advice and for providing me
with materials for my research.
I thank my best friend Marian Lee for encouraging me to always strive for the best.
Finally I thank my family, especially my sister Angela, for their continual love and
support. It is they who made me the person I am today, and I will be forever grateful to them for
providing me with the opportunities and encouragement to pursue my dreams.
Table of Contents
Title Page
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Chemical-Mechanical Planarization
1.2.1 Current CMP Tools
1.2.2 An Integrated, Multi-scale Tribological Model
1.2.3 The Face-up CMP Architecture
1.3 Organization
Nomenclature
CHAPTER 2 FACE-UP CHEMICAL-MECHANICAL POLISHING
2.1 Introduction
2.2 Geometry
2.3 Kinematic Analysis
2.4 Material Removal Rate
2.4.1 Non-Translating Pad
2.4.2 Translating Pad
2.5 Numerical Model for the Pad Translational Velocity
2.5.1 Discretization
2.5.2 Matrix Formulation
2.5.3 An Example
2.5.4 Discretization Error
2.6 Summary
Nomenclature
CHAPTER 3 POLISHING EXPERIMENTS WITH A NON-TRANSLATING PAD
3.1 Introduction
3.2 Equipment and Consumables
3.3 Kinematic Effects of Slurry Cup Rotation
3.4 Variation of k_p with Angular Velocity Ratios
3.5 Pad Position Effects
3.6 Summary
Nomenclature
CHAPTER 4 VALIDATION OF WAFER-SCALE POLISHING UNIFORMITY
4.1 Introduction
4.2 Equipment and Consumables
4.3 Blanket Wafer Polishing
4.4 Patterned Wafer Polishing
4.4.1 Dielectric Erosion
4.4.2 Cu Dishing
4.5 Summary
Nomenclature
CHAPTER 5 CONCLUSION
5.1 Summary
5.2 Suggestions for Future Work
APPENDIX A CHEMICAL DISSOLUTION OF Cu AND Ta
APPENDIX B CONTROL OF FEATURE-SCALE NON-UNIFORMITY BY SPIN COATING
B.1 Introduction
B.2 Film Thickness
B.2.1 Hydrodynamic Model
B.2.2 Experiments with Different Viscosities
B.3 Step Coverage
B.4 Preston Constant
B.5 Summary
Nomenclature
REFERENCES
List of Figures
Figure 1.1: The increase in the number of components in a chip as predicted by Dr. Gordon Moore. [Moore, 1965]
Figure 1.2: The number of transistors in recent commercial microprocessors. (Intel Corp.)
Figure 1.3: Feature and gate size trends as forecast in the ITRS. [ITRS, 2007]
Figure 1.4: Cross-section of a multilayer microprocessor chip built by IBM's 90-nm CMOS technology. (IBM)
Figure 1.5: Interconnect fabrication steps: (a) dielectric deposition and line etching, (b) via etching if the dual damascene process is used, (c) barrier layer and metal deposition, and (d) planarization by CMP.
Figure 1.6: Schematics of CMP tool architectures: (a) rotary, (b) orbital, (c) web-format with fixed-abrasives, and (d) face-up rotary with an annular, oscillating pad. [Noh, 2005]
Figure 1.7: Cu dishing and dielectric erosion in a die. [Noh, 2005]
Figure 1.8: Definition of the feature-scale non-uniformity factor, α.
Figure 1.9: Definition of the wafer-scale non-uniformity factor, β. [Noh, 2005]
Figure 2.1: Schematic of the face-up CMP architecture. [Noh, 2005]
Figure 2.2: Pad translation with respect to the polished region during face-up CMP.
Figure 2.3: Schematic depicting the path of a point P on the wafer during one wafer revolution and the definition of .
Figure 2.4: Pad locations where (a) the pad edge is covering the center region of the wafer, (b) the pad edge is just touching the center of the wafer, and (c) the pad is away from the center.
Figure 2.5: Semi-contact angles for three different pad locations for r_p* = 0.7.
Figure 2.6: Definition of the wafer and pad coordinate systems.
Figure 2.7: Schematic of the entrance and exit semi-contact angles.
Figure 2.8: Δh* versus r* for various rotational velocity ratios and pad sizes.
Figure 2.9: Schematic of the wafer radius and pad translation discretization.
Figure 2.10: Flow chart summarizing a numerical method for determining pad translation in face-up CMP.
Figure 2.11: A discretization scheme involving clusters of points.
Figure 2.12: Schematic of a translating pad and the subsequent triangular matrix formulation.
Figure 2.13: Normalized pad location versus polishing time for various rotational velocity ratios.
Figure 2.14: Schematic of the discretization of wafer radius and pad translation in the five-step example and the corresponding θ_c(r).
Figure 2.15: Dimensionless pad translational motion for the five-step example.
Figure 2.16: Pad location versus polishing time for the five-step example.
Figure 2.17: Plots of the Cu remaining on the wafer after each pad translation step.
Figure 2.18: Discretization error for different mesh sizes.
Figure 2.19: Error with 2% deliberate overpolishing.
Figure 2.20: Comparison of errors from two different discretization schemes.
Figure 3.1: Photograph of the face-up CMP tool.
Figure 3.2: Schematic of the cross section of a TWI-817 stacked pad.
Figure 3.3: Photograph of the slurry cup.
Figure 3.4: (a) Uniformly spaced and (b) blocked center pad perforation patterns.
Figure 3.5: Video screenshots from a polishing experiment with an unblocked pad.
Figure 3.6: Material removal by a pad with uniformly spaced perforations compared to theoretical results; p = 17 kPa, ω_w = ω_p = 19 rad/s (180 rpm).
Figure 3.7: Video screenshots from the blocked pad polishing experiment.
Figure 3.8: Material removal by a blocked pad compared to theoretical results; p = 17 kPa, ω_w = ω_p = 19 rad/s (180 rpm).
Figure 3.9: Variation in k_p across the wafer when polishing with different pads.
Figure 3.10: Comparison of experimental and theoretical Δh when ω_w / ω_p = 0.5.
Figure 3.11: Comparison of experimental and theoretical Δh when ω_w / ω_p = 1.5.
Figure 3.12: Normalized material removed per wafer rotation for various rotational velocity ratios.
Figure 3.13: k_p(r) for various values of r_cc calculated from MRR.
Figure 4.1: Photograph of the face-up CMP tool with the 300-mm platen.
Figure 4.2: A blocked pad with perforations at the intersection of the x-y grooves.
Figure 4.3: Video screenshots of the wafer and the pad during the course of face-up polishing; p = 13 kPa, ω_w = ω_p = 16 rad/s, and slurry flow = 150 ml/min.
Figure 4.4: Comparison of experimental and calculated pad positions versus polishing time in (a) dimensional and (b) dimensionless form.
Figure 4.5: Die map of the wafers used for dishing and erosion experiments.
Figure 4.6: Cross-section of an SKW6-2 test wafer. (SKW Associates)
Figure 4.7: Measured initial wafer surface geometry: (a) Cu deposition factor, α, (b) initial step-height, h_si, and (c) combined feature-scale geometry, α h_si.
Figure 4.8: Theoretical and actual dimensionless pad translation path.
Figure 4.9: Normalized dielectric erosion across the wafer after face-up CMP.
Figure 4.10: Normalized dielectric erosion compared with the multi-scale erosion model from Eq. (4.2), where α and h_si were approximated using the data shown in Figure 4.7 and D from Eq. (4.5).
Figure 4.11: Normalized Cu dishing across the wafer after face-up CMP.
Figure 4.12: Experimental dishing compared with the multi-scale dishing model presented in Eq. (4.5), where α and h_si were approximated using the data shown in Figure 4.7.
Figure 4.13: Schematic of a wide trench with a polymer coating.
Figure 4.14: Normalized Cu dishing versus linewidth after the polishing of uncoated and spin-coated wafers.
Figure 4.15: Confocal micrograph of a new TWI-817 pad surface.
Figure 4.16: Scaled schematic of sinusoidal pad asperities contacting a wide line feature.
Figure A.1: Wafer samples after etching with NaOH + H2O2.
Figure B.1: Schematics of (a) an ideal initial feature topography where α = 0 and h_si = 0, (b) an actual feature topography where the feature is replicated on the Cu surface, and (c) a feature coated with polymer to obtain α = 0 and h_si = 0.
Figure B.2: Profile of a scratch created on an SU-8 coating to measure initial film thickness.
Figure B.3: Comparison of experimental spin coating film thickness with theoretical film thickness.
Figure B.4: Experimental dimensionless film thickness versus dimensionless time compared with the hydrodynamic model proposed by Emslie et al.
Figure B.5: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 5 µm and λ = 10 µm.
Figure B.6: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 10 µm and λ = 20 µm.
Figure B.7: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 20 µm and λ = 40 µm.
Figure B.8: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 50 µm and λ = 100 µm.
Figure B.9: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 100 µm and λ = 200 µm.
List of Tables
Table 1.1: Input and output CMP process parameters.
Table 2.1: Discretized r and r_cc values from two different discretization schemes.
Table 3.1: Wafer, consumables, and process parameters for polishing experiments.
Table 3.2: Mechanical properties of the materials involved in CMP.
Table 3.3: Uniform pad polishing results with calculated θ_c and k_p.
Table 3.4: Blocked pad polishing results with calculated θ_c and k_p.
Table 3.5: Wafer, consumables, and process parameters for velocity ratio experiments.
Table 3.6: Results from the ω_w / ω_p = 0.5 polishing experiment.
Table 3.7: Results from the ω_w / ω_p = 1.5 polishing experiment.
Table 3.8: Wafer, consumables, and process parameters for the k_p versus r_cc experiments.
Table 4.1: Experimental parameters for the blanket wafer polishing experiments.
Table 4.2: Pad location measurements from the blanket wafer polishing experiments.
Table 4.3: Measured α and h_si on the SKW6-2 test wafers for various linewidths.
Table 4.4: Experimental conditions for patterned wafer polishing.
Table 4.5: Measured Cu dishing and dielectric erosion after face-up CMP.
Table 4.6: Statistical summary of dielectric erosion at various features after face-up CMP.
Table 4.7: Statistical summary of Cu dishing at various features after face-up CMP.
Table 4.8: Comparison of Cu dishing in uncoated and spin-coated wafers.
Table A.1: Measured SiO2 thickness on the reference sample before and after etching.
Table A.2: Measured SiO2 thickness on the blanket sample.
Table A.3: Measured SiO2 thickness on the patterned sample.
Table B.1: SU-8 spin coating and curing process steps.
Table B.2: Theoretical and experimental results for the viscosity versus thickness spin coating experiments for h_0 = 5 µm, ω = 314 rad/s, and t = 30 s.
Table B.3: Comparison of feature step-heights for SU-8 coatings of various thicknesses.
Table B.4: CMP conditions for determining the Preston constant of SU-8 photoresist.
CHAPTER 1
INTRODUCTION
1.1 Background
For the past forty years, the semiconductor industry has been fulfilling Dr. Gordon
Moore's prediction, first made in 1965 and later revised, that the number of transistors in a chip
would double roughly every two years (Figure 1.1) [Moore, 1965]. Now, there are 1.7 billion transistors in Intel Corporation's
Itanium server microprocessors. A more recent representation of the increase in the number of
transistors per microprocessor is shown in Figure 1.2. The realization of Moore's Law is made
possible by shrinking component sizes and innovations in materials. Decreasing feature size is
the logical progression in integrated circuit (IC) technology as it increases the capacity per unit
area and functionality, and decreases cost. Reduction in the size of IC components is also driven
by incessant consumer demand for smaller electronic devices. The International Technology
Roadmap for Semiconductors (ITRS) reports that feature size is expected to be reduced to 20 nm
in the year 2017 [ITRS, 2007]. Figure 1.3 shows the decreasing trend of feature and gate sizes as
forecast by ITRS. The semiconductor industry's past and future needs for shrinking devices
have thus triggered rapid advancements in manufacturing technology.
Chemical-mechanical planarization (CMP) is one such enabling technology. Developed
by International Business Machines in 1990 [Beyer et al., 1990], CMP is used at various stages
of IC fabrication for its global surface planarization capabilities. This process is crucial in the
manufacture of multilayer devices, like the microprocessor shown in Figure 1.4, since a smooth
topography is necessary to meet the depth of focus requirements for the lithography tools used in
patterning each additional layer. CMP is used to remove excess metal, or overburden, from the
wafer surface during the fabrication of multi-level interconnects. These interconnects provide
the electrical connection between two layers of the device and it is crucial that the metal
overburden is completely removed to prevent electrical shorts.
Figure 1.1: The increase in the number of components in a chip as predicted by Dr. Gordon
Moore. [Moore, 1965]
Figure 1.2: The number of transistors in recent commercial microprocessors. (Intel Corp.)
[Figure 1.3 plot: "2007 ITRS Product Technology Trends - Half-Pitch, Gate-Length"; horizontal axis: year of production; 2007-2022 ITRS range; series: DRAM M1 1/2 pitch, MPU M1 1/2 pitch (2.5-year cycle), Flash poly 1/2 pitch, MPU gate length (printed), MPU gate length (physical).]
Figure 1.3: Feature and gate size trends as forecast in the ITRS. [ITRS, 2007]
Figure 1.4: Cross-section of a multilayer microprocessor chip built by IBM's 90-nm CMOS
technology. (IBM)
Due to its high electrical conductivity, Cu is the metal of choice for interconnects. A
schematic of the damascene process used for the fabrication of Cu interconnects is shown in
Figure 1.5. The process begins with a layer of dielectric material such as SiO2. Features are
formed on the SiO2 surface by photolithography followed by an etching process, Figure 1.5(a).
Since Cu can be used to form both interconnects and vias, a dual damascene process is
commonly employed in industry. This differs from the single damascene method by including
an additional via etch process after interconnect etching, Figure 1.5(b). Next, a thin (20-50 nm)
layer of barrier metal such as Ti/TiN or Ta/TaN is deposited onto the surface to prevent Cu
diffusion into SiO2, which is followed by Cu deposition, Figure 1.5(c). Finally, Cu CMP is used
to planarize the wafer surface and remove the Cu overburden, Figure 1.5(d).
While CMP is currently used in multiple processes on various metal and dielectric
materials, this thesis focuses only on Cu CMP. It is expected, however, that the ideas presented
in this thesis could be transferred to other CMP technologies such as inter-level dielectric (ILD)
and shallow trench isolation (STI) CMP.
1.2 Chemical-Mechanical Planarization
Material removal by Cu CMP comprises two major components. Chemicals in the slurry
modify the Cu surface to yield a softer, porous layer [Cook, 1990; Du et al., 2004] while a
polymeric pad acts on abrasive particles in the slurry to mechanically remove the coating
material [Liu et al., 1996; Ahmadi and Xia, 2001; Paul et al., 2007]. Therefore, the chemical
content of the slurry, abrasive size and shape, pad material and topography all contribute to
the material removal rate. Additionally, research has shown that the process is dependent on
pressure, velocity [Preston, 1927], temperature [Mudhivarthi et al., 2005], and feature geometry
[Steigerwald et al., 1994]. Due to its complexity, which involves a multitude of inputs at various
scales, CMP is a difficult process to control and optimize, and defects are often generated.
Table 1.1 lists the primary contributors to the CMP process along with the outputs from the
process.
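The pressure and velocity dependence attributed to Preston can be illustrated with a short calculation. The sketch below assumes the commonly used form of Preston's equation, in which the material removal rate is the product of a Preston constant, the normal pressure, and the relative velocity. The function name, the value of the Preston constant, and the relative velocity are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def preston_removal_rate(k_p, pressure, relative_velocity):
    """Material removal rate (m/s) from Preston's equation: MRR = k_p * p * v.

    k_p               -- Preston constant in 1/Pa (lumps slurry, pad, and material effects)
    pressure          -- normal pressure at the wafer/pad interface in Pa
    relative_velocity -- relative sliding velocity in m/s
    """
    return k_p * pressure * relative_velocity


# Hypothetical illustrative inputs (assumptions, not thesis data).
k_p = 1.0e-13   # 1/Pa, assumed Preston constant
p = 17.0e3      # Pa, assumed normal pressure
v = 1.0         # m/s, assumed relative velocity

rate = preston_removal_rate(k_p, p, v)   # m/s
removed_nm = rate * 60.0 * 1.0e9         # thickness removed in 60 s, in nm

print(f"removal rate: {rate * 1e9:.2f} nm/s, removed in 60 s: {removed_nm:.0f} nm")
```

Because the rate is linear in both pressure and velocity, any spatial variation in either quantity, or in the Preston constant itself, maps directly onto a variation in removed thickness across the wafer.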
Figure 1.5: Interconnect fabrication steps: (a) dielectric deposition and line etching, (b) via
etching if the dual damascene process is used, (c) barrier layer and metal
deposition, and (d) planarization by CMP.
Table 1.1: Input and output CMP process parameters.
Category                  Parameters
Inputs
  Wafer parameters        Curvature; Cu thickness; feature size; feature density
  Slurry parameters       Abrasive size; abrasive material properties; concentration of abrasives; selectivity (chemistry); flow rate
  Pad parameters          Material properties; topography; conditioning
  Mechanical parameters   Relative velocity; pressure
Outputs                   Material removal; Cu dishing; dielectric erosion; scratching; contaminants; waste
1.2.1 Current CMP Tools
The most common type of CMP tool to date is a rotary setup where the wafer is polished
face down on a large pad (Figure 1.6a). Rotation of both the wafer and the pad provides
relative motion at the interface for material removal. Other types of CMP tools include orbital
(Figure 1.6b) and web formats (Figure 1.6c). The web format, or linear motion, technology is
mainly used with slurry-free fixed abrasives for STI applications [Simpson et al., 2001;
Kulawski et al., 2003]. Instead of using slurry, a roll of abrasive pad material is passed over the
rotating wafer in a linear fashion to remove material. Recently, there has also been some focus
on electrochemical-mechanical polishing (ECMP), which employs an electrical potential to
oxidize Cu to its ions and a polishing pad for removal of the passivation layer. Researchers
claim that pressures much lower than those of conventional CMP can be used with this technique,
which is a benefit for planarizing the mechanically weaker low-k materials [Economikos et al.,
2004]. Low-pressure material removal is also claimed in a face-up rotary tool that utilizes a
high-speed, oscillating pad for polishing a wafer held in the face-up orientation, Figure 1.6d,
[Hoshino et al., 2003].
1.2.2 An Integrated, Multi-scale Tribological Model
For the most part, CMP is a highly empirical process. Interest in applying physical
models to CMP has increased over the past decade in the hope of gaining a better understanding
of the material removal mechanisms and control over non-uniformities and yield.
Two primary non-uniformity issues in interconnect formation by CMP are Cu dishing
and dielectric erosion. A schematic of Cu dishing and dielectric erosion is shown in Figure 1.7.
Copper dishing is defined as the difference in height between the center of the Cu line and the
dielectric at the edge of the feature. Dishing is mainly a problem for features with wide
interconnect Cu lines because the pad deforms over the feature and applies non-uniform
pressures between the edge and the center of the feature. The reduction of metal due to dishing
can cause deterioration in electrical performance. Dielectric erosion is defined as the difference
in the dielectric thickness before and after CMP. Erosion is more widespread in features with a
high areal density of Cu compared with the dielectric because there is increased pressure on the
dielectric, and thus higher dielectric material removal rate.
Most early models of CMP focused on local material removal rates at the feature-scale.
Warnock developed a phenomenological model for the effects of feature geometry on the
process [Warnock, 1991] and Runnels approached the problem by assuming a hydrodynamic
slurry layer [Runnels, 1994] between the pad and the wafer. On the pad side, models were
Figure 1.6: Schematics of CMP tool architectures: (a) rotary, (b) orbital, (c) web-format with
fixed-abrasives, and (d) face-up rotary with an annular, oscillating pad. [Noh, 2005]
Figure 1.7: Cu dishing and dielectric erosion in a die, showing wafer-scale erosion ($e_w$),
feature-scale erosion ($e_f$), and total erosion ($e = e_w + e_f$). [Noh, 2005]
developed based on the deformation of a smooth, elastic pad [Chekina et al., 1998; Lai et al.,
2002] or by assuming that the pad deforms in discrete blocks [Fu and Chandra, 2003; Noh et al.,
2004].
Despite the analyses at the local scale, however, CMP remains a multi-scale process that
requires uniformity across the subdie (feature), die, and wafer scales. Noh developed a
comprehensive, multi-scale tribological model based on pad-wafer contact mechanics at the
feature-scale and non-uniform polishing at the wafer-scale [Noh, 2005]. Non-uniformities at the
feature- and wafer-scales are defined by the pattern geometry and the material removal rate,
respectively. These factors must be controlled in order to obtain uniform planarization.
Feature-scale non-uniformity is described by the Cu deposition factor, $\alpha$, and the initial
step-height, $h_{si}$, as shown in Figure 1.8. The deposition factor $\alpha$ is defined as:
$$\alpha = \frac{w_s}{w} \qquad (0 \le \alpha \le 1) \tag{1.1}$$
where $w_s$ is the surface trench width and $w$ is the underlying interconnect linewidth. If $\alpha = 0$,
the initial Cu surface topography is planar regardless of the underlying pattern geometry and if
$\alpha = 1$, the trench pattern is exactly reproduced on the Cu surface.
The wafer-scale non-uniformity factor, $\beta$, is defined as the ratio of material removal
rates, MRR, of the slowest and fastest field regions, shown in Figure 1.9.
$$\beta = \frac{MRR_{\text{slowest field}}}{MRR_{\text{fastest field}}} \qquad (0 < \beta \le 1) \tag{1.2}$$
If $\beta = 1$, the MRR at the slowest field is equal to that at the fastest field, and the entire wafer is
polished uniformly.
Polishing behavior at the feature-scale is characterized by a step-height evolution model,
which accounts for the change in pressure distribution on the wafer surface by pad-wafer contact
mechanics and $\alpha$. Cu dishing and dielectric erosion are then determined based on the evolution
of step-height in the overpolishing stage. In this stage, it is assumed that the fastest die continues
to polish until the slowest die reaches the endpoint. The ratio of the times required for these two
dies to reach their respective endpoints is described by $\beta$. The maximum dishing and erosion
will therefore occur in the fastest die.
Figure 1.8: Definition of the feature-scale non-uniformity factor, $\alpha$, with $w_s = \alpha w$.
Figure 1.9: Definition of the wafer-scale non-uniformity factor, $\beta$. [Noh, 2005]
Cu dishing, D, is defined as the interconnect step-height after CMP. For a rough pad
with uniform, fully plastic asperities,
D S CII.-1r p )l-exp(-t, (1.3)
D (1-wIA)SC,,ox+wA Y 6 aR,•
where $S_{Cu/ox}$ is the selectivity of the slurry, $w$ the linewidth of the interconnect, $\lambda$ the pitch, $p$ the
average pressure, $Y_p$ the yield strength of the pad, $R_a$ the radius of curvature of the pad asperities,
$\lambda_a$ the spacing between pad asperities, and $t^*$ the dimensionless polishing time for the
overpolishing stage. And $t^*$ is expressed as
=[ (1- w/ A) Sco,, +W/ A •6 6 [( ) aw 1 x
sto,,o6 Pt a2L 1 hcJ +--h- +Scix h] (1.4)
where $h_{Cu}$ is the Cu coating thickness, $h_{si}$ the initial feature step-height, and $h_o$ the thickness of
oxide removed at the slowest field region, or the amount of deliberate overpolishing.
Dielectric erosion is defined as the change in height of the high feature (dielectric) during
overpolishing:
1 1 aw+ + S -( / A) D (1.5)
e (1- w/A)Sc,,/ox +w/ Ahc, -h +S,.
From his integrated tribological model, Noh concluded that in order to minimize dishing
and erosion, $\alpha$ must be close to 0, $\beta$ close to 1, and overpolishing should be minimal. This
means that the surface should be initially planar and the whole wafer should have uniform
material removal rates.
1.2.3 The Face-up CMP Architecture
The integrated tribological model suggests that wafer-scale polishing should be uniform,
$\beta$ close to unity, to minimize dishing and erosion. Attempts to address wafer-scale
non-uniformity by the current CMP tools shown in Figure 1.6 include zonal pressure control and pad
oscillation, which rely on complicated control systems and empirical data. Often there is a lack
of physical understanding for these forms of compensation. Meanwhile, the ITRS anticipates an
increase of wafer size to 450 mm in 2012, which indicates that manufacturing technology must
be in place in the coming years. Directly applying current technologies to the larger wafer will
result in a reduction in $\beta$, that is, an increase in wafer-scale non-uniformity.
Therefore, a novel face-up CMP tool architecture was developed to control wafer-scale polishing.
This architecture utilizes a smaller diameter pad to enable kinematic control of material removal
rate across the wafer. A steep polishing gradient is developed, and a pad translation scheme is
used to reduce overpolishing time at all points on the wafer.
1.3 Organization
The object of this thesis is to study wafer-scale polishing uniformity by face-up CMP. A
set of process parameters is selected to determine their effects on wafer-scale material removal
rate and a numerical model is developed to determine the pad translation scheme for minimizing
overpolishing time. In Chapter 1, background information on the CMP process is provided and
the integrated tribological Cu CMP model is introduced. Chapter 2 describes the face-up CMP
tool architecture by relating geometric and kinematic parameters to material removal rate. Cases
for both a non-translating pad and a translating pad are discussed. Based on the requirement that
the total material removed across the wafer must be uniform, a numerical method for pad
translation is developed. Chapter 3 details an experimental study on the face-up CMP tool with a
non-translating pad and blanket Cu wafers. The wafers were polished with pads containing
different perforation patterns to investigate the change in Preston constants across the wafer. In
Chapter 4, polishing experiments with a translating pad are presented. Experiments were
performed on blanket Cu wafers to validate the numerical model by comparing the actual pad
position during polishing with the calculated values. Patterned wafers were then polished to
measure Cu dishing and dielectric erosion after face-up CMP to demonstrate wafer-scale
uniformity. Finally, Chapter 5 summarizes this thesis and suggests future work in reducing Cu
dishing and dielectric erosion using the face-up CMP technology.
Nomenclature
$D$ = Cu dishing (m)
$e$ = dielectric erosion (m)
$h$ = thickness of Cu coating removed (m)
$h_{Cu}$ = initial Cu coating thickness (m)
$h_o$ = thickness of oxide removed at the slowest field during overpolishing (m)
$h_s$ = step-height (m)
$h_{si}$ = initial step-height (m)
MRR = material removal rate (m/s)
$p$ = pressure (N/m$^2$)
$R_a$ = radius of curvature of the pad asperity (m)
$S_{Cu/ox}$ = Cu to oxide slurry selectivity
$t^*$ = dimensionless overpolishing time
$w$, $w_s$ = interconnect linewidth, surface trench width (m)
$Y_p$ = yield strength of the pad asperity (N/m$^2$)
$\alpha$ = feature-scale non-uniformity factor, Cu deposition factor
$\beta$ = wafer-scale non-uniformity factor
$\lambda$ = pitch of Cu interconnect lines (m)
$\lambda_a$ = spacing between pad asperities (m)
CHAPTER 2
FACE-UP CHEMICAL-MECHANICAL POLISHING
2.1 Introduction
It has been widely observed that the material removal rate across the wafer is non-
uniform. This non-uniformity is caused by the variation in applied pressure due to wafer
curvature [Fu and Chandra, 2001] and non-uniform slurry film thickness due to the single-point
slurry delivery system common in conventional tools [Thakurta et al., 2001]. However, little can
be done to control wafer-scale polishing with current CMP tools because the entire wafer is in
contact with the pad throughout the polishing process.
In conventional face-down CMP, the wafer is always in contact with the pad and it is
impossible to independently terminate polishing at different points on the wafer. Thus,
overpolishing will occur in areas of high material removal rates. These areas continue to polish
after the endpoint has been reached until the regions with low material removal rates also finish
polishing. Overpolishing is a primary cause of Cu dishing and dielectric erosion in CMP
[Stavreva et al., 1997; Noh, 2005]. The face-up CMP tool architecture, Figure 2.1, is proposed
to improve wafer-scale polishing uniformity by allowing the pad to translate away from the
region of the wafer that has completed polishing [Noh et al., 2006]. To allow for better control
of the pad translation, a polishing gradient is induced by geometry and kinematics in which the
highest material removal rate occurs at the center of the wafer. With such variation in material
removal rate, the pad is only required to travel uni-directionally away from the center of the
wafer, as shown in Figure 2.2.
The face-up CMP tool is also novel in the method by which slurry is delivered to the pad-
wafer interface. Non-uniform wafer-scale polishing in CMP is in part due to non-uniform slurry
distribution at the pad-wafer interface [Coppeta et al., 2000; Fu et al., 2005]. In face-down CMP,
slurry is fed at the periphery of the wafer. Therefore, overpolishing occurs at the edge of the
wafer where there is an ample supply of fresh slurry. This issue will continue to gain importance
as the industry increases the size of wafers to 450 mm [ITRS, 2007]. The face-up
Figure 2.1: Schematic of the face-up CMP architecture, with a perforated pad. [Noh, 2005]
Figure 2.2: Pad translation with respect to the polished region during face-up CMP.
CMP tool addresses the issue of uniform slurry distribution by supplying slurry through
perforations in the pad, as shown in Figure 2.1.
2.2 Geometry
The primary difference between the face-up and the conventional face-down CMP
architecture is that the pad does not cover the entire wafer during the polishing process in face-up
CMP. As a result, material removal at a point, $P(r,\theta)$, on the wafer occurs only during the period
when that point is in contact with the pad. Figure 2.3 shows the path of $P$ as the wafer completes
one revolution. The duration of contact between $P$ and the pad is dependent on the semi-contact
angle, $\theta_c$. From Figure 2.3, $\theta_c$ can be defined by applying the Law of Cosines with points $P$, $O_w$,
and $O_p$:
$$\theta_c = \cos^{-1}\!\left(\frac{r^2 + r_{cc}^2 - r_p^2}{2\,r\,r_{cc}}\right) \tag{2.1}$$
where $r_{cc}$ is the distance between the wafer center and the pad center and $r_p$ is the radius of the
pad. The limiting cases for Eq. (2.1) are: $\theta_c = \pi$ for points on the wafer in continuous contact
with the pad, and $\theta_c = 0$ for points outside the contact region. The location of the pad in relation
to the wafer and pad radius, and its relation to $\theta_c$, can be categorized into three cases:
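$$
\begin{aligned}
r_{cc} < r_p:&\quad \theta_c = \begin{cases} \pi & r < (r_p - r_{cc}) \\ 0 & r > (r_{cc} + r_p) \\ \cos^{-1}\!\left(\dfrac{r^2 + r_{cc}^2 - r_p^2}{2\,r\,r_{cc}}\right) & \text{otherwise} \end{cases} \\
r_{cc} = r_p:&\quad \theta_c = \begin{cases} \cos^{-1}\!\left(\dfrac{r}{2\,r_{cc}}\right) & 0 \le r \le (r_{cc} + r_p) \\ 0 & r > (r_{cc} + r_p) \end{cases} \\
r_{cc} > r_p:&\quad \theta_c = \begin{cases} 0 & r < (r_{cc} - r_p) \\ 0 & r > (r_{cc} + r_p) \\ \cos^{-1}\!\left(\dfrac{r^2 + r_{cc}^2 - r_p^2}{2\,r\,r_{cc}}\right) & \text{otherwise} \end{cases}
\end{aligned}
\tag{2.2}
$$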
Figure 2.3: Schematic depicting the path of a point $P$ on the wafer during one wafer
revolution and the definition of $\theta_c$.
These pad locations are shown schematically in Figure 2.4 for $r_p = 0.7\,r_w$. In
Figure 2.4(a), $r_{cc} < r_p$ and the pad overlaps the center region of the wafer, a practical initial
position for face-up CMP. Figure 2.4(b) shows the pad when its edge is just touching the center
of the wafer, $r_{cc} = r_p$, and Figure 2.4(c) shows the pad after it has translated away from the center,
$r_{cc} > r_p$. It is often useful to express Eq. (2.2) in dimensionless form. To do so, the following
dimensionless variables are introduced:
$$r^* = r / r_w, \qquad r_{cc}^* = r_{cc} / r_w, \qquad r_p^* = r_p / r_w, \qquad \theta_c^* = \theta_c / \pi \tag{2.3}$$
Equation (2.2) can then be expressed as:
$$
\begin{aligned}
r_{cc}^* < r_p^*:&\quad \theta_c^* = \begin{cases} 1 & r^* < (r_p^* - r_{cc}^*) \\ 0 & r^* > (r_{cc}^* + r_p^*) \\ \dfrac{1}{\pi}\cos^{-1}\!\left(\dfrac{r^{*2} + r_{cc}^{*2} - r_p^{*2}}{2\,r^*\,r_{cc}^*}\right) & \text{otherwise} \end{cases} \\
r_{cc}^* = r_p^*:&\quad \theta_c^* = \begin{cases} \dfrac{1}{\pi}\cos^{-1}\!\left(\dfrac{r^*}{2\,r_{cc}^*}\right) & 0 \le r^* \le (r_{cc}^* + r_p^*) \\ 0 & r^* > (r_{cc}^* + r_p^*) \end{cases} \\
r_{cc}^* > r_p^*:&\quad \theta_c^* = \begin{cases} 0 & r^* < (r_{cc}^* - r_p^*) \\ 0 & r^* > (r_{cc}^* + r_p^*) \\ \dfrac{1}{\pi}\cos^{-1}\!\left(\dfrac{r^{*2} + r_{cc}^{*2} - r_p^{*2}}{2\,r^*\,r_{cc}^*}\right) & \text{otherwise} \end{cases}
\end{aligned}
\tag{2.4}
$$
The change in $\theta_c^*(r^*)$ with each pad location is shown in Figure 2.5 for $r_p^* = 0.7$. When
the pad overlaps the center of the wafer, there is continuous contact with the pad in that region
and $\theta_c^* = 1$. The region of no contact, $\theta_c^* = 0$, is outside the wafer due to the pad size relative to
the wafer radius. For the case when the edge of the pad just touches the center of the wafer, $\theta_c^*$
begins at 0.5, which indicates that the center of the wafer is in contact with the pad for exactly
half the time of one wafer rotation. Finally, when the pad is moved away from the center of the
wafer, the termination of contact in that region is described by $\theta_c^* = 0$ near $r^* = 0$. The
magnitude of $\theta_c^*$ is also decreased in this pad position. Therefore, the period of contact at each
point on the wafer has a non-linear dependence on pad size and pad location.
Figure 2.4: Pad locations where (a) the pad edge is covering the center region of the wafer,
(b) the pad edge is just touching the center of the wafer, and (c) the pad is away
from the center.
Figure 2.5: Semi-contact angles, $\theta_c^*$, versus $r^* = r/r_w$ for three different pad locations for $r_p^* = 0.7$.
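The three cases of Eq. (2.2) can be collapsed into one small function. The following is a minimal sketch, not code from this work: the function name `semi_contact_angle` and the sample pad positions (0.5, 0.7, 0.9, in units of the wafer radius) are assumptions chosen to mirror the three pad locations of Figure 2.4 with $r_p^* = 0.7$.

```python
import math

def semi_contact_angle(r, r_cc, r_p):
    """Semi-contact angle theta_c in radians, following the cases of Eq. (2.2)."""
    if r >= r_cc + r_p:
        return 0.0                      # beyond the outer edge of the pad
    if r < abs(r_p - r_cc):
        # Inside the inner boundary: continuous contact if the pad covers
        # the wafer center (r_cc < r_p), otherwise no contact at all.
        return math.pi if r_cc < r_p else 0.0
    if r == 0.0:
        return math.pi / 2.0            # r_cc == r_p: pad edge passes through the center
    cos_arg = (r**2 + r_cc**2 - r_p**2) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, cos_arg)))  # clamp against round-off

r_p = 0.7                               # pad radius normalized by the wafer radius
for r_cc in (0.5, 0.7, 0.9):            # illustrative pad positions (assumed values)
    row = [semi_contact_angle(r, r_cc, r_p) / math.pi for r in (0.0, 0.5, 1.0)]
    print(r_cc, [round(v, 3) for v in row])
```

Dividing the result by $\pi$ gives the dimensionless angle $\theta_c^*$ of Eq. (2.4), so the same function serves both forms.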
2.3 Kinematic Analysis
The relative velocity between a point on the wafer and a point on the pad is determined
by kinematics. A set of wafer and pad coordinate systems is defined in Figure 2.6. A polar
coordinate system $(r, \theta)$ is assigned to the wafer with its origin at the center of the wafer, $O_w$. A
pad coordinate system, situated at the center of the pad, $O_p$, is defined as $(r', \theta')$. Lastly, a
global Cartesian coordinate system, $(x, y)$, is defined at the center of the wafer.
When the wafer rotates at angular velocity $\omega_w$, the tangential velocity of a point $P(r, \theta)$
on the wafer is expressed as a vector with $x$ and $y$ components:
$$\mathbf{v}_P = -\omega_w r \sin\theta\,\mathbf{e}_x + \omega_w r \cos\theta\,\mathbf{e}_y \tag{2.5}$$
Similarly, the tangential velocity at a point on the pad $P'(r',\theta')$ with pad angular
velocity, $\omega_p$, and pad translation velocity, $v_{cc}$, is expressed in the pad coordinate system as:
$$\mathbf{v}_{P'} = (v_{cc} - \omega_p r' \sin\theta')\,\mathbf{e}_x + \omega_p r' \cos\theta'\,\mathbf{e}_y \tag{2.6}$$
Points from the pad coordinate system can be converted to the wafer coordinate system
by equating their global Cartesian positions:
$$x = r\cos\theta = r'\cos\theta' + r_{cc}, \qquad y = r\sin\theta = r'\sin\theta' \tag{2.7}$$
Therefore, Eq. (2.6) can be expressed in wafer coordinates as:
$$\mathbf{v}_{P'} = (v_{cc} - \omega_p r \sin\theta)\,\mathbf{e}_x + \omega_p (r\cos\theta - r_{cc})\,\mathbf{e}_y \tag{2.8}$$
The velocity of the wafer relative to the pad is:
$$\mathbf{v}_R(r,\theta) = \mathbf{v}_P - \mathbf{v}_{P'} = -\left[(\omega_w - \omega_p)\,r\sin\theta + v_{cc}\right]\mathbf{e}_x + \left[(\omega_w - \omega_p)\,r\cos\theta + \omega_p r_{cc}\right]\mathbf{e}_y \tag{2.9}$$
Figure 2.6: Definition of the wafer and pad coordinate systems.
The magnitude of relative velocity is expressed as:
$$v_R(r,\theta) = \sqrt{\left[(\omega_w - \omega_p)\,r\sin\theta + v_{cc}\right]^2 + \left[(\omega_w - \omega_p)\,r\cos\theta + \omega_p r_{cc}\right]^2} \tag{2.10}$$
A simplifying case can be drawn from Eq. (2.10). If the wafer and the pad rotate at the
same angular velocity, $\omega_w = \omega_p = \omega$, the relative velocity is constant regardless of the point's
position on the wafer:
$$v_R = \sqrt{v_{cc}^2 + (\omega r_{cc})^2} \tag{2.11}$$
Furthermore, when the pad is not translating, $v_{cc} = 0$, or is translating at a negligible velocity,
$v_{cc} \ll \omega r_{cc}$, Eq. (2.11) can be approximated by
$$v_R \approx \omega r_{cc} \tag{2.12}$$
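The reduction from Eq. (2.10) to Eqs. (2.11) and (2.12) can be checked numerically. The sketch below is illustrative only: the function name `relative_speed` and all numerical values (angular velocities in rad/s, lengths in m) are assumptions, not parameters reported in this work.

```python
import math

def relative_speed(r, theta, omega_w, omega_p, r_cc, v_cc=0.0):
    """Magnitude of the wafer-pad relative velocity, Eq. (2.10)."""
    vx = -((omega_w - omega_p) * r * math.sin(theta) + v_cc)
    vy = (omega_w - omega_p) * r * math.cos(theta) + omega_p * r_cc
    return math.hypot(vx, vy)

omega = 10.0   # rad/s, assumed common angular velocity
r_cc = 0.05    # m, assumed wafer-center to pad-center distance

# Equal angular velocities: the speed is the same at every (r, theta), Eq. (2.11).
for r, theta in ((0.0, 0.0), (0.03, 1.0), (0.10, -2.5)):
    print(r, theta, relative_speed(r, theta, omega, omega, r_cc))

# Unequal angular velocities: the speed now varies with position on the wafer.
print(relative_speed(0.03, 1.0, 5.0, 10.0, r_cc))
```

With equal angular velocities and a stationary pad, every printed speed equals $\omega r_{cc}$, which is the approximation of Eq. (2.12).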
2.4 Material Removal Rate
The local material removal rate in CMP is described by the Preston equation [Preston,
1927],
$$\frac{dh}{dt} = k_p\,p\,v_R \tag{2.13}$$
where h is the thickness of coating removed, t the polishing time, kp the Preston constant, and p
the nominal pressure. As researchers have previously noted, kp is not a fundamental constant.
Instead, it depends on polishing conditions such as pad stiffness and surface topography, and
slurry concentration and selectivity [Liu et al., 1996; Saka et al., 2001; Luo and Dornfeld, 2003;
Noh et al., 2005]. While pressure and relative velocity are mechanical parameters that can be
measured and controlled, determining the Preston constant requires exhaustive information
gathering during the process. Thus maintaining a uniform Preston constant is an essential, yet
challenging, requirement for uniform material removal rate.
In face-down polishing, the pad is always in contact with the wafer. This means that
material is being removed from the wafer at all times during one wafer revolution. Consequently,
the thickness of coating removed during one revolution, $\Delta h_r$, is expressed as
$$\Delta h_r = \int_0^{\Delta t_r} \frac{dh}{dt}\,dt \tag{2.14}$$
where $\Delta t_r$ is the time required for one wafer revolution. In terms of $\theta$, with $dt = d\theta/\omega_w$,
Eq. (2.14) can be rewritten as:
$$\Delta h_r = \frac{1}{\omega_w}\int_{-\pi}^{\pi} \frac{dh}{dt}\,d\theta \tag{2.15}$$
In face-up CMP, however, only a portion of the wafer is exposed during polishing. Thus,
a point on the wafer may not be in contact with the pad for the full rotation. The period of time
that the pad contacts the wafer is dependent on the semi-contact angle, $\theta_c$. To account for the
non-contact period in the face-up configuration, Eq. (2.15) is expressed as
$$\Delta h_r(r) = \int_{-\theta_{c1}(r)}^{\theta_{c2}(r)} \frac{k_p\,p\,v_R}{\omega_w}\,d\theta \tag{2.16}$$
where $\theta_{c1}$ and $\theta_{c2}$ are the entrance and exit semi-contact angles, respectively. Figure 2.7 shows
the change in $\theta_c$ for a point $P$ on the wafer as it travels one wafer rotation underneath a
translating pad. In this case, $\theta_{c1} > \theta_{c2}$ as a result of the change in $r_{cc}$. If the pad is not translating,
$\theta_{c1} = \theta_{c2}$.
The average material removal rate for one wafer revolution, MRR, is thus
$$MRR(r, r_{cc}) = \frac{\Delta h_r(r, r_{cc})}{\Delta t_r} = \frac{1}{2\pi}\int_{-\theta_{c1}(r, r_{cc})}^{\theta_{c2}(r, r_{cc})} k_p\,p\,v_R\,d\theta \tag{2.17}$$
where $\Delta t_r = 2\pi/\omega_w$.
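For a stationary pad, Eq. (2.17) can be evaluated by simple quadrature. The sketch below uses a midpoint rule over $[-\theta_c, \theta_c]$ and is only an illustration: the function names, the Preston constant, the pressure, and the geometry are all assumed values, and $k_p$ and $p$ are taken as constants.

```python
import math

def semi_contact_angle(r, r_cc, r_p):
    """Semi-contact angle theta_c (rad), per the cases of Eq. (2.2)."""
    if r >= r_cc + r_p:
        return 0.0
    if r < abs(r_p - r_cc):
        return math.pi if r_cc < r_p else 0.0
    if r == 0.0:
        return math.pi / 2.0
    c = (r**2 + r_cc**2 - r_p**2) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, c)))

def relative_speed(r, theta, omega_w, omega_p, r_cc):
    """Eq. (2.10) with v_cc = 0 (non-translating pad)."""
    vx = -(omega_w - omega_p) * r * math.sin(theta)
    vy = (omega_w - omega_p) * r * math.cos(theta) + omega_p * r_cc
    return math.hypot(vx, vy)

def mrr(r, r_cc, r_p, k_p, p, omega_w, omega_p, n=2000):
    """Average removal rate per revolution, Eq. (2.17), by the midpoint rule."""
    theta_c = semi_contact_angle(r, r_cc, r_p)
    if theta_c == 0.0:
        return 0.0
    d_theta = 2.0 * theta_c / n
    integral = 0.0
    for i in range(n):
        theta = -theta_c + (i + 0.5) * d_theta
        integral += relative_speed(r, theta, omega_w, omega_p, r_cc) * d_theta
    return k_p * p * integral / (2.0 * math.pi)

# Assumed illustrative values: k_p in 1/Pa, p in Pa, lengths in m, omega in rad/s.
K_P, P = 1e-13, 1.4e4
numeric = mrr(0.06, 0.05, 0.07, K_P, P, 10.0, 10.0)
closed_form = K_P * P * 10.0 * 0.05 * semi_contact_angle(0.06, 0.05, 0.07) / math.pi
print(numeric, closed_form)
```

When the two angular velocities are equal the integrand is constant, so the quadrature should agree with $k_p\,p\,\omega\,r_{cc}\,\theta_c/\pi$; for unequal velocities the same function applies without modification.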
2.4.1 Non-Translating Pad
While the face-up tool concept requires that the pad translate away from the center of the
wafer, it is useful to first examine the case when the pad does not translate. That is, $v_{cc} = 0$ and
$r_{cc}$ is constant. If $k_p$, $p$, $\omega_w$ and $\omega_p$ are also assumed constant, the integral in Eq. (2.16) is only
dependent on $v_R$:
$$\Delta h_r(r) = \frac{k_p\,p}{\omega_w}\int_{-\theta_{c1}(r)}^{\theta_{c2}(r)} \sqrt{\left[(\omega_w - \omega_p)\,r\sin\theta\right]^2 + \left[(\omega_w - \omega_p)\,r\cos\theta + \omega_p r_{cc}\right]^2}\;d\theta \tag{2.18}$$
Furthermore, suppose that $\omega_w = \omega_p$. From Eq. (2.12), when $\omega_w = \omega_p = \omega$ and $v_{cc} = 0$,
$v_R$ is constant over the entire pad-wafer interface: $v_R = \omega r_{cc}$. The material removed per wafer
rotation at $r$ is then:
$$\Delta h_r(r) = k_p\,p\,r_{cc}\int_{-\theta_{c1}(r)}^{\theta_{c2}(r)} d\theta \tag{2.19}$$
Figure 2.7: Schematic of the entrance and exit semi-contact angles.
When the pad is not translating, the entrance semi-contact angle is equal to the exit
semi-contact angle, $\theta_{c1}(r) = \theta_{c2}(r) = \theta_c(r)$. Therefore, Eq. (2.19) can be further simplified to
$$\Delta h_r(r) = 2\,k_p\,p\,r_{cc}\,\theta_c(r) \tag{2.20}$$
and the average material removal rate per wafer revolution is given by
$$MRR(r) = \frac{k_p\,p\,\omega\,r_{cc}\,\theta_c(r)}{\pi} \tag{2.21}$$
It is often convenient to determine material removal by the time required to reach the
process endpoint, $t_e$. The relation between polishing time and material removal is expressed as
$$t_e(r) = \frac{\pi\,h_{Cu}}{k_p\,p\,\omega\,r_{cc}\,\theta_c(r)} \tag{2.22}$$
where $h_{Cu}$ is the initial Cu coating thickness.
A dimensionless variable, $\Delta h_r^*$, is defined as the ratio of the material removed at radius $r$
to the material removed at the center of the wafer ($r = 0$) per wafer rotation:
$$\Delta h_r^*(r) = \frac{\Delta h_r(r)}{\Delta h_r(0)} \tag{2.23}$$
From Eq. (2.20), the only spatially varying parameter in $\Delta h_r$ is $\theta_c$. Therefore, $\Delta h_r^*$ for
the non-translating pad and $\omega_w = \omega_p$ case is:
$$\Delta h_r^*(r) = \frac{\theta_c(r)}{\theta_c(0)} \tag{2.24}$$
Since $t_e(r)$ is inversely related to $\theta_c(r)$, Eq. (2.24) can also be expressed as
$$\Delta h_r^* = \frac{t_e(0)}{t_e(r)} \tag{2.25}$$
Figure 2.8 shows $\Delta h_r^*(r^*)$ for varying wafer-pad rotational velocity ratios and size ratios.
The decreasing polishing gradient with the fastest material removal rate at the center of the wafer
is crucial for progressive polishing. This allows the pad to translate uni-directionally toward the
edge of the wafer to avoid overpolishing.
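For the equal-velocity, non-translating case, the normalized removal profile of Eq. (2.24) depends only on geometry. The sketch below tabulates it for one assumed configuration ($r_p^* = 0.7$, $r_{cc}^* = 0.6$, so the pad overlaps the wafer center); these values and the function names are illustrative assumptions, not the settings used for Figure 2.8.

```python
import math

def semi_contact_angle(r, r_cc, r_p):
    """Semi-contact angle theta_c (rad), per the cases of Eq. (2.2)."""
    if r >= r_cc + r_p:
        return 0.0
    if r < abs(r_p - r_cc):
        return math.pi if r_cc < r_p else 0.0
    if r == 0.0:
        return math.pi / 2.0
    c = (r**2 + r_cc**2 - r_p**2) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, c)))

def normalized_removal(r_star, r_cc_star, r_p_star):
    """Delta h_r* = theta_c(r) / theta_c(0), Eq. (2.24)."""
    return (semi_contact_angle(r_star, r_cc_star, r_p_star)
            / semi_contact_angle(0.0, r_cc_star, r_p_star))

radii = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]           # r* = r / r_w
profile = [normalized_removal(r, 0.6, 0.7) for r in radii]
for r, dh in zip(radii, profile):
    # By Eq. (2.25), 1 / dh is the endpoint time at r relative to the center.
    print(f"r* = {r:.1f}   dh* = {dh:.3f}   t_e(r)/t_e(0) = {1.0 / dh:.2f}")
```

The profile decreases monotonically from the center outward, which is the polishing gradient that permits uni-directional pad translation.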
Figure 2.8: $\Delta h_r^*$ versus normalized radius, $r^*$, for various rotational velocity ratios
($\omega_w/\omega_p = 0.5$ and $1.0$) and pad sizes.
In all the cases shown, the pad slightly overlaps the center of the wafer, $r_{cc} < r_p$, which is
evident in Figure 2.8 by the simultaneous polishing of the central region. This sort of pad initial
position is necessary in practice because it guarantees that the center of the wafer is completely
polished. An additional benefit of the overlap is that it increases the material removal rate across
the wafer, thereby decreasing polishing time. Most importantly, the overlap produces a steeper
polishing gradient for better control over progressive polishing.
2.4.2 Translating Pad
The novel concept in face-up CMP is that the pad translates laterally across the wafer to
minimize overpolishing time. In the previous section, it was possible to obtain a closed-form
solution for MRR because the period of contact between a point on the wafer and the pad for
every wafer revolution is constant when the pad is not translating. For a translating pad the
semi-contact angle may change over one wafer rotation, $\theta_{c1} \neq \theta_{c2}$, and MRR is a function of the
displacement of the pad with respect to time, $r_{cc}(t)$.
The total thickness of material removed at a point on the wafer, $\Delta h(r)$, can be obtained
by integrating Eq. (2.17) from $t = 0$ to the completion of the polishing process, $t = t_e(r)$:
$$\Delta h(r) = \int_0^{t_e(r)} \frac{1}{2\pi}\int_{-\theta_{c1}(r, r_{cc})}^{\theta_{c2}(r, r_{cc})} k_p\,p\,\sqrt{\left[(\omega_w - \omega_p)\,r\sin\theta + v_{cc}\right]^2 + \left[(\omega_w - \omega_p)\,r\cos\theta + \omega_p r_{cc}\right]^2}\;d\theta\,dt \tag{2.26}$$
For uniform material removal across the wafer, $\Delta h(r)$ should be independent of $r$.
Furthermore, the total material removed should be equal to the initial Cu thickness with some
additional overpolishing thickness. In the previous chapter, $h_o$ was defined as the amount of
deliberate oxide overpolishing. Therefore, the equivalent Cu overpolishing is equal to $S_{Cu/ox}\,h_o$
and
$$\Delta h(r) = h_{Cu} + S_{Cu/ox}\,h_o \tag{2.27}$$
where $h_{Cu}$ is the initial Cu coating thickness and $S_{Cu/ox}$ is the Cu to oxide slurry selectivity. The
average material removal rate at $r$ is the total material removed divided by the total polishing
time
$$MRR(r) = \frac{h_{Cu} + S_{Cu/ox}\,h_o}{t_e(r)} \tag{2.28}$$
When the pad translates continuously, $r_{cc}$ is a function of time, $t$, and $v_{cc}$ is the time
derivative of $r_{cc}(t)$. Equation (2.26) should then be evaluated to obtain the displacement of the
pad, $r_{cc}(t)$. It is important to note that while $\Delta h(r)$ is equated to a constant, $h_{Cu} + S_{Cu/ox}\,h_o$, the
right-hand side of Eq. (2.26) still has $r$ dependencies.
The case when $k_p$ and $p$ are constants and $\omega_w = \omega_p = \omega$ is again considered. Equation (2.26)
can then be expressed as
$$\frac{k_p\,p}{2\pi}\int_0^{t_e(r)}\int_{-\theta_{c1}(r, r_{cc})}^{\theta_{c2}(r, r_{cc})} \sqrt{\left(\frac{dr_{cc}}{dt}\right)^2 + \left[\omega\,r_{cc}(t)\right]^2}\;d\theta\,dt = h_{Cu} + S_{Cu/ox}\,h_o \tag{2.29}$$
If the pad translates slowly, the entrance and exit semi-contact angles will be nearly constant per
wafer rotation and the time derivative of $r_{cc}(t)$ will be small. Assuming $\theta_{c1} \approx \theta_{c2}$ and neglecting
$v_{cc}$, the inner integral can be evaluated:
$$\frac{k_p\,p\,\omega}{\pi}\int_0^{t_e(r)} \theta_c\big(r, r_{cc}(t)\big)\,r_{cc}(t)\,dt = h_{Cu} + S_{Cu/ox}\,h_o \tag{2.30}$$
From Eq. (2.2), $\theta_c$ is a piecewise function with a constant $\pi$ term, a $\cos^{-1}$ term, and a 0 term.
Therefore, Eq. (2.30) can be rewritten as a sum of three integrals with one term being zero:
$$\int_0^{t_{e1}(r)} \pi\,r_{cc}(t)\,dt + \int_{t_{e1}(r)}^{t_{e2}(r)} \cos^{-1}\!\left(\frac{r^2 + r_{cc}(t)^2 - r_p^2}{2\,r\,r_{cc}(t)}\right) r_{cc}(t)\,dt = \frac{\pi\left(h_{Cu} + S_{Cu/ox}\,h_o\right)}{k_p\,p\,\omega} \tag{2.31}$$
where $t_{e1}$ and $t_{e2}$ are the $\theta_c$ transition times at $r$:
$$
\begin{aligned}
r_{cc}(t) < r_p - r, &\quad 0 < t < t_{e1} \\
r_p - r \le r_{cc}(t) \le r_p + r, &\quad t_{e1} < t < t_{e2} \\
r_{cc}(t) > r_p + r, &\quad t > t_{e2}
\end{aligned}
\tag{2.32}
$$
Equation (2.31) assumes that the pad is large enough to initially contact the edge of the wafer,
$r_p > r_w - r_{cc}$, and translates unidirectionally away from the center of the wafer. In the case of a
smaller pad that does not initially reach the edge of the wafer, there will be additional regions of
$\theta_c = 0$ and the transition times should be redefined to reflect that.
Equation (2.31) cannot be simply evaluated because the nature of the function $r_{cc}(t)$ is
unknown. The task is even more difficult due to $t_{e1}(r)$ and $t_{e2}(r)$ also being dependent on $r_{cc}(t)$.
Thus, numerical methods are appropriate for the resolution of pad translational motion for
uniform material removal.
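Although Eq. (2.30) cannot be inverted for $r_{cc}(t)$ in closed form, its left-hand side is easy to evaluate for any trial pad trajectory. The sketch below does so by midpoint time-stepping under the slow-translation assumption. The trajectory, the translation rate, and all process values are hypothetical and serve only to illustrate the calculation.

```python
import math

def semi_contact_angle(r, r_cc, r_p):
    """Semi-contact angle theta_c (rad), per the cases of Eq. (2.2)."""
    if r >= r_cc + r_p:
        return 0.0
    if r < abs(r_p - r_cc):
        return math.pi if r_cc < r_p else 0.0
    if r == 0.0:
        return math.pi / 2.0
    c = (r**2 + r_cc**2 - r_p**2) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, c)))

def removed_thickness(r, r_cc_of_t, t_end, r_p, k_p, p, omega, n_steps=2000):
    """Left-hand side of Eq. (2.30), integrated from 0 to t_end (midpoint rule)."""
    dt = t_end / n_steps
    total = 0.0
    for i in range(n_steps):
        r_cc = r_cc_of_t((i + 0.5) * dt)
        total += semi_contact_angle(r, r_cc, r_p) * r_cc * dt
    return k_p * p * omega / math.pi * total

# Assumed values: k_p in 1/Pa, p in Pa, omega in rad/s, lengths in m, time in s.
K_P, P, OMEGA, R_P = 1e-13, 1.4e4, 10.0, 0.07

# Sanity check with a stationary pad: the result should equal MRR * t, Eq. (2.21).
fixed = removed_thickness(0.06, lambda t: 0.05, 100.0, R_P, K_P, P, OMEGA)
expected = K_P * P * OMEGA * 0.05 * semi_contact_angle(0.06, 0.05, R_P) / math.pi * 100.0

# Hypothetical linear translation at 0.1 mm/s, slow compared with omega * r_cc.
moving = [removed_thickness(r, lambda t: 0.05 + 1e-4 * t, 100.0, R_P, K_P, P, OMEGA)
          for r in (0.01, 0.06, 0.10)]
print(fixed, expected, moving)
```

An arbitrary trajectory such as the linear one above generally removes different amounts at different radii, which is why the dwell times are solved for in the numerical model that follows.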
2.5 Numerical Model for the Pad Translational Velocity
Pad translation can be described by the change in wafer center to pad center distance over
time: $r_{cc}(t)$. The numerical approach to obtaining $r_{cc}(t)$ discretizes the wafer radius into a set of
points where the total material removed will be computed. The pad displacement is also
discretized into steps. The model then computes the duration of time the pad needs to stay at
each translation step in order to completely polish the wafer at all the discretized radial locations.
By reducing the size of the translation steps, it is possible to obtain an estimate of the continuous
pad translation function, $r_{cc}(t)$ [Mau et al., 2008].
2.5.1 Discretization
The first step in the numerical procedure involves the discretization of radial location, $r$,
and $r_{cc}$. First, the initial pad position is defined as $r_{cc,0}$. To ensure that the center of the wafer is
completely polished, $r_{cc,0}$ should be so chosen that the pad overlaps the center of the wafer, as in
Figure 2.4(a). From Figure 2.5, when $r_{cc,0} < r_p$, the overlap region on the wafer is always in
contact with the pad. Hence $\theta_c$, or material removal rate, is constant. The radius of the central
region that will polish concurrently is given by:
$$r_{overlap} = r_p - r_{cc,0} \tag{2.33}$$
To avoid redundancy, $r$ should be discretized so that only one point, $r_0$, is within the overlap
region. Therefore, let
$$r_0 \le r_{overlap} \tag{2.34}$$
Now, $r_1, \ldots, r_i, \ldots, r_n$ can be arbitrarily chosen as long as $r_i > r_0$ and the points span the
entire wafer. Because $r$ and $r_{cc}$ are coupled through $\theta_c$, the discretization of $r_{cc}$ should be
dependent on $r$. From Figure 2.5, when the pad is away from the center of the wafer, $\theta_c$ is zero
at the edge of the pad, and hence, MRR is also zero. Therefore, to avoid underpolishing between
$r_i$ and $r_{i+1}$, $r_{cc}$ should be so chosen that the edge of the pad is at $r_i$ for every translation step after
the pre-defined $r_{cc,0}$:
$$r_{cc,i+1} = r_i + r_p, \qquad i = 0, 1, \ldots, n-1 \tag{2.35}$$
Figure 2.9 shows a schematic of the wafer radius and the pad translation discretized into six
points each as defined by Eqs. (2.33)-(2.35). Note that $r_1, \ldots, r_i, \ldots, r_n$ need not be uniformly
spaced as long as the edge of the pad lies on each point on the discretized wafer radius.
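The discretization rules of Eqs. (2.33)-(2.35) translate directly into a few lines of code. The sketch below makes two simplifying assumptions that the text does not require: $r_0$ is placed exactly at $r_{overlap}$, and the remaining points are uniformly spaced out to the wafer edge. The function name and the sample values are hypothetical.

```python
def discretize(r_w, r_p, r_cc0, n):
    """Return (r, r_cc): n + 1 radial points and n + 1 pad positions."""
    r0 = r_p - r_cc0                                   # Eqs. (2.33)-(2.34), taking r_0 = r_overlap
    r = [r0 + i * (r_w - r0) / n for i in range(n + 1)]  # uniform spacing (an assumption)
    r_cc = [r_cc0] + [r[i] + r_p for i in range(n)]      # Eq. (2.35): inner pad edge sits at r_i
    return r, r_cc

# Normalized example: wafer radius 1, pad radius 0.7, initial pad position 0.6,
# six points each, as in the schematic of Figure 2.9.
r, r_cc = discretize(1.0, 0.7, 0.6, 5)
for i, (ri, rcci) in enumerate(zip(r, r_cc)):
    print(i, round(ri, 3), round(rcci, 3))
```

After the first step, each pad position places the inner edge of the pad on the previous radial point, so the pad releases exactly one discretization point per translation step.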
2.5.2 Matrix Formulation
The objective of the matrix formulation is to obtain the duration, $\Delta t_j$, the pad should
reside at each position, $r_{cc,j}$, to guarantee that the radial location, $r_i$, is completely polished.
Moreover, since translation occurs in steps, $v_{cc} = 0$ in each time interval $\Delta t_j$.
For a particular pad position, $r_{cc,j}$, the material removal rate at $r_i$ can be calculated by
Eq. (2.17). Thus, the total thickness of material removed at $P(r_i)$ by a discretely translating pad
is the sum of the material removed by the pad at each position
$$\Delta h_i = \sum_{j=0}^{n} MRR(r_i, r_{cc,j}) \cdot \Delta t_j, \qquad \text{for } i = 0, 1, \ldots, n \tag{2.36}$$
where $n$ is the number of pad translation steps. To completely remove the Cu layer, the total
thickness of material removed should be equal to the initial Cu thickness, $h_{Cu}$. In practice, there
is some deliberate overpolishing to deal with the uncertainties of the process and ensure that the
Cu is completely removed from the wafer. This is incorporated into the model by $S_{Cu/ox}\,h_o$, the
equivalent thickness of Cu removed during overpolishing. Finally, a set of equations can be
written as:
$$\sum_{j=0}^{n} MRR(r_i, r_{cc,j}) \cdot \Delta t_j = h_{Cu} + S_{Cu/ox}\,h_o \tag{2.37}$$
for $i = 0, 1, \ldots, n$.
Equation (2.37) is expressed in matrix form as:
$$\mathbf{M}\,\Delta\mathbf{t} = \left(h_{Cu} + S_{Cu/ox}\,h_o\right)\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \mathbf{h} \tag{2.38}$$
Figure 2.9: Schematic of the wafer radius and pad translation discretization.
where $\mathbf{M}$ is an $(n+1) \times (n+1)$ square matrix of material removal rates, $\Delta\mathbf{t}$ a vector of time
intervals, and $\mathbf{h}$ a vector of the Cu thicknesses to be removed.
It may be noted that the rows of $\mathbf{M}$ correspond to the $r_i$ parameter while the columns are
related to $r_{cc,j}$. The unknown in Eq. (2.38) is $\Delta\mathbf{t}$, where each entry $\Delta t_j$ represents the time the
pad stays at a particular $r_{cc,j}$. Figure 2.10 is a flow chart summarizing the steps in the pad
translation algorithm.
Each individual term of the matrix, $M_{ij}$, is the average MRR at $r_i$ when the pad is at $r_{cc,j}$,
which can be found by Eq. (2.17). Since there is only one solution to Eq. (2.38), it is possible to
have a matrix $\mathbf{M}$ such that negative $\Delta t$'s are required to satisfy the equality. Physically, this
means that overpolishing will occur at some $r_i$'s and uni-directional pad translation is not
feasible for the chosen $r_i$'s and $r_{cc,j}$'s. Negative solutions are more likely to occur when random
discretization schemes are used. For example, if $r_w$ is discretized in clusters of points and the
pad translation steps are large, as in Figure 2.11, one translation step will be required to satisfy
several $\Delta h$ equations. The appropriate discretization of $r$ and $r_{cc}$ is therefore crucial in obtaining
$\mathbf{M}$ such that the resulting $\Delta\mathbf{t}$ has only positive terms.
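The matrix formulation can be sketched end to end for the equal-velocity case, where each entry of $\mathbf{M}$ follows the closed form $k_p\,p\,\omega\,r_{cc,j}\,\theta_c(r_i, r_{cc,j})/\pi$. The code below is an illustration, not the implementation used in this work: the process values, the 150 mm wafer radius, and the small Gaussian-elimination routine (a plain-Python stand-in for a library solver) are all assumptions.

```python
import math

def semi_contact_angle(r, r_cc, r_p):
    """Semi-contact angle theta_c (rad), per the cases of Eq. (2.2)."""
    if r >= r_cc + r_p:
        return 0.0
    if r < abs(r_p - r_cc):
        return math.pi if r_cc < r_p else 0.0
    if r == 0.0:
        return math.pi / 2.0
    c = (r**2 + r_cc**2 - r_p**2) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, c)))

def solve(a, b):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(c + 1, n):
            f = aug[k][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[k][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

# Assumed values: k_p (1/Pa), p (Pa), omega (rad/s), lengths (m), removal target h (m).
K_P, P, OMEGA = 1e-13, 1.4e4, 10.0
R_W, R_P, R_CC0, N = 0.15, 0.105, 0.09, 5
h = 1.0e-6                                     # stands for h_Cu + S_Cu/ox * h_o

r0 = R_P - R_CC0                               # discretization per Eqs. (2.33)-(2.35)
r = [r0 + i * (R_W - r0) / N for i in range(N + 1)]
r_cc = [R_CC0] + [r[i] + R_P for i in range(N)]

# M[i][j]: average MRR at r_i with the pad at r_cc_j (rows ~ r_i, columns ~ r_cc_j).
M = [[K_P * P * OMEGA * rc * semi_contact_angle(ri, rc, R_P) / math.pi for rc in r_cc]
     for ri in r]
dt = solve(M, [h] * (N + 1))                   # dwell time at each pad position, in s
print([round(x, 1) for x in dt])
```

A negative entry in `dt` would signal that the chosen discretization cannot be realized by uni-directional translation, as discussed above; for this particular discretization all dwell times come out positive.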
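The matrix $\mathbf{M}$ can be further simplified if $k_p$ and $p$ are constants and $\omega_w = \omega_p = \omega$:
$$\frac{k_p\,p\,\omega}{\pi}\begin{bmatrix} r_{cc,0}\,\theta_c(r_0, r_{cc,0}) & \cdots & r_{cc,n}\,\theta_c(r_0, r_{cc,n}) \\ \vdots & \ddots & \vdots \\ r_{cc,0}\,\theta_c(r_n, r_{cc,0}) & \cdots & r_{cc,n}\,\theta_c(r_n, r_{cc,n}) \end{bmatrix}\begin{bmatrix} \Delta t_0 \\ \vdots \\ \Delta t_n \end{bmatrix} = \left(h_{Cu} + S_{Cu/ox}\,h_o\right)\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \tag{2.39}$$
where $\theta_{c,ij}$ represents $\theta_c(r_i, r_{cc,j})$. Rearranging the constants yields
$$\begin{bmatrix} \theta_{c,00} & \cdots & \theta_{c,0n} \\ \vdots & \ddots & \vdots \\ \theta_{c,n0} & \cdots & \theta_{c,nn} \end{bmatrix}\begin{bmatrix} r_{cc,0}\,\Delta t_0 \\ \vdots \\ r_{cc,n}\,\Delta t_n \end{bmatrix} = \frac{\pi\left(h_{Cu} + S_{Cu/ox}\,h_o\right)}{k_p\,p\,\omega}\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \tag{2.40}$$
Equation (2.40) shows that MRR is dependent on two geometric parameters, $\theta_c$ and $r_{cc}$. It is
important to also note that $\theta_c$ is a non-linear function of $r_i$, $r_{cc,j}$, and $r_p$ as defined by Eq. (2.2).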
Figure 2.10: Flow chart summarizing a numerical method for determining pad translation in
face-up CMP: (1) discretize $r_{cc,j}$ according to $r_i$; (2) calculate MRR for each set of $r_i$ and $r_{cc,j}$;
(3) build matrix $\mathbf{M}$, where entry $M_{ij}$ is the MRR at $r_i$ when the pad is at $r_{cc,j}$; (4) build vector $\mathbf{h}$
such that each entry $h_i$ equals the desired Cu removal at $r_i$; (5) solve the equation
$\mathbf{M}\,\Delta\mathbf{t} = \mathbf{h}$ for the column vector $\Delta\mathbf{t}$; (6) each entry $\Delta t_j$ represents the time interval the pad
should stay at $r_{cc,j}$.
Figure 2.11: A discretization scheme involving clusters of points, for which
$$\mathbf{M} = \begin{bmatrix} M_{00} & M_{01} & 0 & 0 \\ M_{10} & M_{11} & 0 & 0 \\ M_{20} & M_{21} & 0 & 0 \\ M_{30} & M_{31} & M_{32} & M_{33} \end{bmatrix}$$
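To express Eq. (2.39) in dimensionless form, the following variables are introduced:
$$r_{cc,j}^* = r_{cc,j} / r_{cc,0}, \qquad \Delta t_j^* = \Delta t_j / \Delta t_0 \tag{2.41}$$
where $\Delta t_0$ is the time interval the pad stays at the initial position $r_{cc,0}$, or the time required for the
center of the wafer to be fully polished. When there is an initial overlap of the pad at the wafer
center, $\theta_c(r = 0) = \pi$. From Eq. (2.17),
$$\Delta t_0 = \frac{h_{Cu} + S_{Cu/ox}\,h_o}{k_p\,p\,\omega\,r_{cc,0}} \tag{2.42}$$
Thus the dimensionless form of Eq. (2.40) is:
$$\begin{bmatrix} \theta_{c,00}^* & \cdots & \theta_{c,0n}^* \\ \vdots & \ddots & \vdots \\ \theta_{c,n0}^* & \cdots & \theta_{c,nn}^* \end{bmatrix}\begin{bmatrix} r_{cc,0}^*\,\Delta t_0^* \\ \vdots \\ r_{cc,n}^*\,\Delta t_n^* \end{bmatrix} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \tag{2.43}$$
where $M_{ij}^* = \theta_{c,ij}^* \cdot r_{cc,j}^*$. In this form, the dependence of MRR on process parameters, such as $k_p$, $p$,
and $\omega$, only appears within the normalizing constant, $\Delta t_0$. The pad radius, however, is
integrated into the matrix through the computation of $\theta_c$ and the discretization of $r_{cc}$.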
If θc, and therefore θc*, is triangular, the resulting M* matrix will also be triangular. By
geometry, it is possible to obtain a lower triangular θc* matrix. As the pad moves away from the
wafer center, it ceases contact with that region, resulting in θc = 0 for those points. Since the
rows of θc represent increasing r and the columns represent increasing rcc, if the pad leaves one
discretization point at every translation step, every column will have one more zero in its
top rows than the column to its left, creating an upper-right corner of zeros. The pad
translation necessary for the formulation of a lower triangular matrix is shown in Figure 2.12.
The ability to obtain a triangular M matrix is already in place under the discretization
scheme given by Eqs. (2.33)-(2.35). Figure 2.9 shows how the pad terminates contact with one
radial point at every translation step.
Figure 2.12: Schematic of a translating pad and the subsequent triangular matrix formulation.
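Equation (2.43) can then be rewritten as:
$$\begin{bmatrix}
\theta^*_{c_{0,0}}\,r^*_{cc_0} & & 0 \\
\vdots & \ddots & \\
\theta^*_{c_{n,0}}\,r^*_{cc_0} & \cdots & \theta^*_{c_{n,n}}\,r^*_{cc_n}
\end{bmatrix}
\begin{bmatrix} \Delta t^*_0 \\ \vdots \\ \Delta t^*_n \end{bmatrix}
= r^*_{cc_0}
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\quad (2.44)$$
A lower triangular θc* matrix simplifies the solution of Eq. (2.43) by facilitating forward
substitution. By definition, Δt0* = 1. Solving the equation formed by the first row of the matrix
will result in the same solution:
$$\theta^*_{c_{0,0}}\,r^*_{cc_0}\,\Delta t^*_0 = r^*_{cc_0} \;\;\Rightarrow\;\; \Delta t^*_0 = 1 \quad (2.45)$$
The equation obtained from multiplying the second row of θc*(r) will then only contain the
unknown Δt1*. Solving for Δt1* will leave Δt2* as the only unknown in the third equation, and so
on. Thus
$$\begin{aligned}
\Delta t^*_0 &= 1 \\
\theta^*_{c_{1,0}}\,r^*_{cc_0}\,\Delta t^*_0 + \theta^*_{c_{1,1}}\,r^*_{cc_1}\,\Delta t^*_1 &= r^*_{cc_0} \\
\theta^*_{c_{2,0}}\,r^*_{cc_0}\,\Delta t^*_0 + \theta^*_{c_{2,1}}\,r^*_{cc_1}\,\Delta t^*_1 + \theta^*_{c_{2,2}}\,r^*_{cc_2}\,\Delta t^*_2 &= r^*_{cc_0} \\
&\;\;\vdots \\
\theta^*_{c_{n,0}}\,r^*_{cc_0}\,\Delta t^*_0 + \theta^*_{c_{n,1}}\,r^*_{cc_1}\,\Delta t^*_1 + \cdots + \theta^*_{c_{n,n}}\,r^*_{cc_n}\,\Delta t^*_n &= r^*_{cc_0}
\end{aligned}
\quad (2.46)$$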
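The row-by-row solution of Eq. (2.46) can be sketched in a few lines of Python. This is only an illustration: the function names and the two 3 x 3 matrices are hypothetical values chosen for easy arithmetic, not results from the model. The second matrix mimics a poorly chosen discretization and produces a negative time step, which is the infeasible case discussed earlier.

```python
def forward_substitution(m_star, rhs):
    """Solve a lower triangular system M* dt* = rhs, one row at a time."""
    dt_star = []
    for i in range(len(m_star)):
        # Contribution of the steps already solved (columns to the left of the diagonal).
        known = sum(m_star[i][j] * dt_star[j] for j in range(i))
        dt_star.append((rhs[i] - known) / m_star[i][i])
    return dt_star


def is_feasible(dt_star):
    """Uni-directional translation needs every time step to be non-negative."""
    return all(step >= 0.0 for step in dt_star)


# Hypothetical lower triangular M* (illustrative numbers only).
rcc0_star = 0.6
good_matrix = [[0.6, 0.0, 0.0],
               [0.3, 0.3, 0.0],
               [0.2, 0.2, 0.2]]
good_steps = forward_substitution(good_matrix, [rcc0_star] * 3)

# Hypothetical matrix with a weak diagonal entry in the second row.
bad_matrix = [[0.6, 0.0, 0.0],
              [0.3, 0.1, 0.0],
              [0.1, 0.3, 0.2]]
bad_steps = forward_substitution(bad_matrix, [rcc0_star] * 3)

print(good_steps, is_feasible(good_steps))
print(bad_steps, is_feasible(bad_steps))
```

In the second case the long dwell forced by the weak diagonal entry overpolishes the last point, so the only way to satisfy its equation is a negative time step.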
Figure 2.13 compares the computed dimensionless pad location as a function of polishing
time for various wafer-pad rotational velocity ratios. For all cases kp and p are constants and
rp = 0.7rw, rcc,0 = rp - 0.1rw, and n = 50. It was shown in the non-translating pad case that the
polishing gradient is steeper if ωw/ωp = 0.5. Therefore, the pad translation motion is more
gradual. The steeper gradient offers easier control over pad velocity since the pad stays at each
rcc for a longer period of time. However, this control is gained at the expense of longer polishing
times. The non-translating pad case also shows that a large portion of the wafer will polish at
nearly the same time when the larger ratio is used, so for quicker pad motion and shorter
polishing times, ωw/ωp = 1.5 is appropriate. Because of the higher velocities involved, tighter
control of pad displacement is required for the desired results.
[Plot: dimensionless pad location versus dimensionless polishing time, t* = t/Δt0, for ωw/ωp = 0.5, 1, and 1.5.]
Figure 2.13: Normalized pad location versus polishing time for various rotational velocity ratios.
2.5.3 An Example
Let rw = 50 mm, rp = 35 mm and rcc,0 = 30 mm. By Eq. (2.34), r0 = 5 mm. The rest of the
wafer from r0 to rw is then evenly divided into five increments: r = {5, 14, 23, 32, 41, 50} mm.
Using Eq. (2.35), rcc,j = {30, 40, 49, 58, 67, 76} mm. The dimensionless values will be
$$r^* = \begin{bmatrix} 0.10 \\ 0.28 \\ 0.46 \\ 0.64 \\ 0.82 \\ 1.00 \end{bmatrix}
\quad \text{and} \quad
r_{cc}^* = \begin{bmatrix} 0.60 \\ 0.80 \\ 0.98 \\ 1.16 \\ 1.34 \\ 1.52 \end{bmatrix}
\quad (2.47)$$
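A schematic of the discretized wafer radius and pad translation is shown in Figure 2.14 along
with θc*(r*) for each rcc,j. From Eq. (2.43),
$$M^* = \begin{bmatrix}
0.60 & 0 & 0 & 0 & 0 & 0 \\
0.33 & 0.26 & 0 & 0 & 0 & 0 \\
0.27 & 0.27 & 0.22 & 0 & 0 & 0 \\
0.23 & 0.25 & 0.25 & 0.20 & 0 & 0 \\
0.19 & 0.23 & 0.24 & 0.24 & 0.20 & 0 \\
0.15 & 0.20 & 0.23 & 0.24 & 0.23 & 0.19
\end{bmatrix}
\quad (2.48)$$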
As noted previously, the zeros in the upper-right corner of M* represent the absence of contact
between the pad and the wafer.
Solving for Δtj* by Eq. (2.46) will result in
$$\Delta t^* = \begin{bmatrix} 1.00 & 1.03 & 0.23 & 0.26 & 0.30 & 0.37 \end{bmatrix}^T \quad (2.49)$$
which represents the dimensionless time that the pad remains at each translation step.
Equation (2.49) can be used to control the pad motion.
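The five-step example can be retraced numerically with the sketch below. Eq. (2.2) is not reproduced in this section, so the semi-contact angle is taken here from the law of cosines for a circle of radius r about the wafer center intersecting a pad of radius rp centered at rcc. That expression is an assumption, although it agrees with the θc values tabulated in Chapter 3. All function and variable names are illustrative.

```python
import math


def semi_contact_angle(r, r_cc, r_p):
    """Assumed form of Eq. (2.2): half-angle of the circle of radius r under the pad."""
    if r == 0.0:
        return math.pi if r_cc <= r_p else 0.0
    cos_theta = (r * r + r_cc * r_cc - r_p * r_p) / (2.0 * r * r_cc)
    # Clamp: full contact gives pi, no contact gives 0.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def build_m_star(r, r_cc, r_p, r_w):
    """Entries theta_c*/pi times r_cc* for every (wafer point, pad position) pair."""
    return [[semi_contact_angle(r_i, r_cc_j, r_p) / math.pi * r_cc_j / r_w
             for r_cc_j in r_cc] for r_i in r]


def forward_substitution(m_star, rhs):
    """Row-by-row solution of the lower triangular system, as in Eq. (2.46)."""
    dt_star = []
    for i in range(len(m_star)):
        known = sum(m_star[i][j] * dt_star[j] for j in range(i))
        dt_star.append((rhs[i] - known) / m_star[i][i])
    return dt_star


r_w, r_p = 50.0, 35.0                            # mm
r = [5.0, 14.0, 23.0, 32.0, 41.0, 50.0]          # wafer mesh, mm
r_cc = [30.0, 40.0, 49.0, 58.0, 67.0, 76.0]      # pad center positions, mm

m_star = build_m_star(r, r_cc, r_p, r_w)
dt_star = forward_substitution(m_star, [r_cc[0] / r_w] * len(r))
print([round(step, 2) for step in dt_star])
```

Under this assumed geometry the computed steps should land close to the values in Eq. (2.49). Small differences in the last digit are to be expected, since Eq. (2.48) lists M* to two decimals only.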
Figure 2.14: Schematic of the discretization of wafer radius and pad translation in the five-step
example and the corresponding θc*(r*).
Figure 2.15 plots rcc* versus t* for this five-step example. Points A and B represent the
time the pad remains at its initial position before the center of the wafer is clear. The pad then
translates quickly from B to C as it moves away from the center overlap region that is completely
polished. From C to D is another long timestep due to the steep decrease in θc when the pad
moves away from the center of the wafer to rcc,1, as shown in Figure 2.14. Lastly, the pad
translates more or less uniformly from D to E to finish polishing the rest of the wafer.
Assuming some common face-up polishing parameters: hCu = 1 µm, hox = 0 µm,
kp = 5.0 x 10^-13 1/Pa, p = 13 kPa, and ω = 16 rad/s, the time for the center to polish is
Δt0 = 320 s from Eq. (2.42). This value can be applied to the dimensionless result in Eq. (2.49)
to obtain the polishing time for each translation step. Figure 2.16 plots the computed pad
translational motion for this five-step example. From the obtained time durations, the thickness
of Cu remaining on the wafer after every pad translation step is computed and shown in
Figure 2.17.
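As a quick check of the arithmetic, the sketch below evaluates Eq. (2.42) with the parameters quoted above and scales the dimensionless steps of Eq. (2.49) into a dwell schedule. The selectivity value is a placeholder (it has no effect because hox = 0), and the variable names are illustrative.

```python
# Process parameters quoted in the text (SI units).
h_cu = 1.0e-6          # initial Cu thickness, m
h_ox = 0.0             # deliberate oxide overpolish, m
selectivity = 1.0      # placeholder value; irrelevant while h_ox = 0
k_p = 5.0e-13          # Preston constant, 1/Pa
p = 13.0e3             # pressure, Pa
omega = 16.0           # angular velocity of wafer and pad, rad/s
r_cc0 = 0.030          # initial pad center position, m

# Eq. (2.42): time for the wafer center to clear.
dt0 = (h_cu + selectivity * h_ox) / (k_p * p * omega * r_cc0)

# Dimensionless steps from Eq. (2.49) and pad positions from the example.
dt_star = [1.00, 1.03, 0.23, 0.26, 0.30, 0.37]
r_cc_mm = [30, 40, 49, 58, 67, 76]

schedule = []          # (pad position in mm, start time in s, end time in s)
elapsed = 0.0
for position, step in zip(r_cc_mm, dt_star):
    dwell = step * dt0
    schedule.append((position, elapsed, elapsed + dwell))
    elapsed += dwell

print("dt0 = %.0f s, total = %.0f s" % (dt0, elapsed))
for position, start, end in schedule:
    print("r_cc = %2d mm: %6.0f s to %6.0f s" % (position, start, end))
```

The total works out to a little over 1000 s, which is consistent with the time axis of Figure 2.16.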
2.5.4 Discretization Error
The set of Δt*'s obtained from Eq. (2.38) guarantees that exactly hCu + SCu/ox·hox coating
thickness is removed at each ri. The model, however, does not consider what happens between ri
and ri+1, and errors may occur in that region. After solving for Δt, the thickness of material
removed at any point P(r) along the wafer radius can be obtained by summing the material
removed at each translation step, as in Eq. (2.36), up until rcc,i+1. From Eq. (2.35), rcc,i+1 is
positioned so that the edge of the pad is at ri. Beyond this step, the pad no longer contacts the
annulus between ri and ri+1 and no material removal takes place (Figure 2.9). In the case where
kp and p are constants and ωw = ωp,
$$\Delta h(r) = \frac{k_p\,p\,\omega_w}{\pi} \sum_{j=0}^{i+1} r_{cc_j}\,\Delta t_j\,\theta_c(r, r_{cc_j}) \quad (2.50)$$
for ri < r ≤ ri+1.
Since Δh(ri) = Δh(ri+1) = hCu + SCu/ox·hox, if Δh is a non-constant, continuous function, a
maximum or minimum must exist between ri and ri+1. The extremum represents the largest
deviation from the endpoints, and therefore the largest polishing error. Because the
discretization scheme places the minimum θc at ri, Δh must increase from that point, indicating
that the extremum is a maximum and the error describes overpolishing. To find the point where
the maximum occurs, Eq. (2.50) is differentiated with respect to r and set equal to zero:
$$\frac{\partial\,\Delta h}{\partial r} = \frac{k_p\,p\,\omega_w}{\pi} \sum_{j=0}^{i+1} r_{cc_j}\,\Delta t_j\,\frac{\partial\,\theta_c(r, r_{cc_j})}{\partial r} = 0 \quad (2.51)$$
Solving Eq. (2.51) for r will result in the point of maximum polishing, rmax, and subtracting
Δh(ri) from Δh(rmax) will yield the maximum overpolishing error.
[Plot: dimensionless pad position versus t* = t/Δt0.]
Figure 2.15: Dimensionless pad translational motion for the five-step example.
[Plot: pad location versus polishing time, t [s].]
Figure 2.16: Pad location versus polishing time for the five-step example.
[Plots: Cu remaining on the wafer versus r* = r/rw; panels Step 1 through Step 5.]
Figure 2.17: Plots of the Cu remaining on the wafer after each pad translation step.
By increasing the size of the matrix or the number of discretizations, n, the region
between ri and ri+1 is decreased. As a result, θc(ri+1) is close to θc(ri) and the variation in
material removal rate between the two points will be small. The error is therefore expected to
decrease as n increases. Figure 2.18 compares the discretization error for different mesh sizes:
n = 5, 10, and 50. The errors between ri and ri+1 are obtained by finding Δh by Eq. (2.50) for a
fine mesh of points at 0.1 mm increments along the radius and comparing the result with hCu.
When the discretized intervals become smaller, the magnitude of error between the wafer mesh
points also decreases. Figure 2.18 shows, as expected, that there is no error in material removal at
the points along the discretized wafer radius.
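The error evaluation described above can be sketched as follows for the n = 5 mesh of the earlier example. As before, the law-of-cosines expression for the semi-contact angle is an assumed stand-in for Eq. (2.2), the removal is normalized by the target thickness so that kp, p and ω cancel, and all names are illustrative.

```python
import math


def semi_contact_angle(r, r_cc, r_p):
    """Assumed form of Eq. (2.2): half-angle of the circle of radius r under the pad."""
    if r == 0.0:
        return math.pi if r_cc <= r_p else 0.0
    cos_theta = (r * r + r_cc * r_cc - r_p * r_p) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def dwell_times(r, r_cc, r_p):
    """Forward substitution on sum_j (theta_c/pi) r_cc_j dt*_j = r_cc_0."""
    dt_star = []
    for i, r_i in enumerate(r):
        row = [semi_contact_angle(r_i, r_cc[j], r_p) / math.pi * r_cc[j]
               for j in range(i + 1)]
        known = sum(row[j] * dt_star[j] for j in range(i))
        dt_star.append((r_cc[0] - known) / row[i])
    return dt_star


def polishing_error(x, r_cc, dt_star, r_p):
    """Fractional error: normalized Eq. (2.50) minus 1 (positive means overpolishing)."""
    removed = sum(semi_contact_angle(x, r_cc_j, r_p) / math.pi * r_cc_j * dt_j
                  for r_cc_j, dt_j in zip(r_cc, dt_star))
    return removed / r_cc[0] - 1.0


r_p = 35.0                                        # mm
r = [5.0, 14.0, 23.0, 32.0, 41.0, 50.0]           # wafer mesh, mm
r_cc = [30.0, 40.0, 49.0, 58.0, 67.0, 76.0]       # pad positions, mm
dt_star = dwell_times(r, r_cc, r_p)

mesh_errors = [polishing_error(r_i, r_cc, dt_star, r_p) for r_i in r]
fine_mesh = [5.0 + 0.1 * k for k in range(451)]   # 0.1 mm increments from r0 to rw
fine_errors = [polishing_error(x, r_cc, dt_star, r_p) for x in fine_mesh]

print("largest error at mesh points: %.2e" % max(abs(e) for e in mesh_errors))
print("maximum overpolishing: %.1f%%" % (100.0 * max(fine_errors)))
```

Summing over all pad positions is equivalent to the upper limit i+1 in Eq. (2.50), because the contact angle is zero for every later position.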
From the previous discussion, the algorithm guarantees exact material removal at ri.
However, the computed polishing error does not always have to be 0%, since there can be
deliberate overpolishing, hox > 0. In those cases, the whole error curve will be shifted up by the
percentage of specified overpolishing, as shown in Figure 2.19. Such computation is useful for
deciding how much overpolishing time is allowable for the entire wafer to still meet
specifications. The other parameters used in the simulations are: rw = 50 mm, rp = 35 mm, and
rcc,0 = 30 mm. Additionally, kp and p are assumed constant, and ωw = ωp.
Computing the discretization error is also valuable when comparing different
discretization schemes. The scheme described by Eqs. (2.33)-(2.35) is designed to avoid
underpolishing between ri's. However, the matrix M in Eq. (2.38) can be formulated with any
set of ri's and rcc,j's. For example, consider the following two discretization schemes: uniform
intervals and no underpolishing as outlined by Eqs. (2.33)-(2.35). In the uniform intervals
scheme, rw was divided into 5 increments of 10 mm, and rcc was defined as 10 mm steps starting
from rcc,0. In the no underpolishing scheme, r0 = 5 mm by Eq. (2.34). The rest of the wafer,
from r0 to rw, was evenly divided into 5 sections. The rcc,j values were obtained using
Eq. (2.35). Table 2.1 lists the discretized values for both methods. Figure 2.20 compares the
errors from the two discretization schemes. The length parameters for the simulations are
rw = 50 mm, rp = 35 mm and rcc,0 = 30 mm.
[Figures 2.18 and 2.19: plots of polishing error [%].]
Although the magnitude of error is similar for both cases, in practice it is required to
avoid underpolishing. The excess metal on an underpolished wafer can lead to electrical shorts,
resulting in unusable dies and reduced yield. While overpolishing may still cause defects, a small
amount of overpolishing is necessary to counteract the uncertainties of the process and ensure
that all the Cu overburden is completely removed. Moreover, Figure 2.18 has shown that
overpolishing can be controlled by using finer discretizations in the model. A discretization
scheme like the one outlined by Eqs. (2.33)-(2.35) is therefore the more practical choice in pad
translation computation.
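The comparison between the two schemes can be sketched with the discretized values of Table 2.1. The semi-contact angle again uses an assumed law-of-cosines form in place of Eq. (2.2), the error is normalized by the target removal, and the names are illustrative.

```python
import math


def semi_contact_angle(r, r_cc, r_p):
    """Assumed form of Eq. (2.2): half-angle of the circle of radius r under the pad."""
    if r == 0.0:
        return math.pi if r_cc <= r_p else 0.0
    cos_theta = (r * r + r_cc * r_cc - r_p * r_p) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def dwell_times(r, r_cc, r_p):
    """Forward substitution on sum_j (theta_c/pi) r_cc_j dt*_j = r_cc_0."""
    dt_star = []
    for i, r_i in enumerate(r):
        row = [semi_contact_angle(r_i, r_cc[j], r_p) / math.pi * r_cc[j]
               for j in range(i + 1)]
        known = sum(row[j] * dt_star[j] for j in range(i))
        dt_star.append((r_cc[0] - known) / row[i])
    return dt_star


def error_range(r, r_cc, r_p, r_w, step=0.1):
    """Smallest and largest fractional polishing error on a fine radial mesh."""
    dt_star = dwell_times(r, r_cc, r_p)
    errors = []
    for k in range(int(round(r_w / step)) + 1):
        x = k * step
        removed = sum(semi_contact_angle(x, r_cc_j, r_p) / math.pi * r_cc_j * dt_j
                      for r_cc_j, dt_j in zip(r_cc, dt_star))
        errors.append(removed / r_cc[0] - 1.0)
    return min(errors), max(errors)


r_w, r_p = 50.0, 35.0                             # mm
# Values from Table 2.1 (mm).
uniform_min, uniform_max = error_range(
    [0.0, 10.0, 20.0, 30.0, 40.0, 50.0],
    [30.0, 40.0, 50.0, 60.0, 70.0, 80.0], r_p, r_w)
no_under_min, no_under_max = error_range(
    [5.0, 14.0, 23.0, 32.0, 41.0, 50.0],
    [30.0, 40.0, 49.0, 58.0, 67.0, 76.0], r_p, r_w)

print("uniform intervals: %+.1f%% to %+.1f%%" % (100 * uniform_min, 100 * uniform_max))
print("no underpolishing: %+.1f%% to %+.1f%%" % (100 * no_under_min, 100 * no_under_max))
```

With uniform intervals, each new pad position has its edge halfway between two mesh points, so the points just inside that edge receive no compensating contact and the error there is negative. Placing each pad edge on a mesh point removes that gap.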
2.6 Summary
This chapter introduces the face-up CMP tool architecture and presents a numerical
method for determining a pad translation scheme that would minimize overpolishing by face-up
polishing. First, the geometric relations between the pad and the wafer are described to define
the time of contact between a point on the wafer and the pad. Second, a kinematic analysis is
used to obtain the relative velocity of a point on the wafer with respect to the pad. Third, the
geometrical and kinematical parameters are incorporated into the Preston equation for local
material removal rate to express the material removed at a point on the wafer per wafer
revolution, which can then be used to obtain the average material removal rate at that point.
Two cases of pad motion are discussed: non-translating pad and translating pad. In the
former case, kp, p, ωw, and ωp are assumed to be constants. Since the pad is not translating, rcc
is also constant and hence vcc = 0. If the pad rotates at the same speed as the wafer, the resulting
material removal rate is only dependent on the semi-contact angle, θc. The polishing gradient is
controlled by the change in contact angle across the wafer.
Table 2.1: Discretized r and rcc values from two different discretization schemes.
            Uniform Intervals         No Underpolishing
Index, i    ri (mm)    rcc,i (mm)     ri (mm)    rcc,i (mm)
0           0          30             5          30
1           10         40             14         40
2           20         50             23         49
3           30         60             32         58
4           40         70             41         67
5           50         80             50         76
Figure 2.20: Comparison of errors from two different discretization schemes.
For the pad translation case, the objective is to obtain the pad displacement with time,
rcc(t), for uniform material removal across the wafer. Even when kp, p, ωw, and ωp are constant,
the material removal rate cannot be solved analytically due to the dependence of the time and
geometric parameters on the unknown, rcc(t). Therefore, a numerical approach is developed. In
this method, the wafer radius is discretized into a set of points. The pad displacement is treated
as a series of steps so that the non-translating pad material removal rates can be used to compute
the material removed at each step. Finally, the discretization error is evaluated and its effects on
overpolishing are obtained. The polishing error is used to show the effects of overpolishing and
to compare different discretization schemes.
Nomenclature
h = thickness of Cu coating removed (m)
hCu = initial Cu coating thickness (m)
hox = thickness of oxide removed during deliberate overpolishing (m)
Δhr, Δhr* = thickness of coating removed in one wafer rotation (m), normalized value
Δh = total thickness of coating removed (m)
kp = Preston constant (m2/N)
MRR, MRR = material removal rate, average per wafer rotation (m/s)
p = pressure (N/m2)
Ow, Op = center of the wafer and the pad
P, P' = point in the wafer coordinate system, pad coordinate system
r, θ = coordinates in the wafer polar coordinate system
r', θ' = coordinates in the pad polar coordinate system
ri = discretized radial wafer coordinate (m)
rcc, rcc* = wafer center to pad center distance (m) and normalized value
rcc,j = discretized wafer center to pad center distance (m)
rmax = point of maximum material removal between ri and ri+1 (m)
roverlap = radius of central area on wafer that will polish concurrently due to the
pad/wafer center overlap (m)
rw, rp, rp* = wafer radius, pad radius (m), and normalized pad radius
SCu/ox = Cu to oxide slurry selectivity
t, t* = polishing time (s) and normalized value
te = time required to reach the process endpoint (s)
tel, te2 = semi-contact angle transition times (s)
Δt0 = time for center of the wafer to be completely polished (s)
Δtj, Δtj* = time duration of pad translation step j (s) and normalized value
vcc = translational velocity of the pad; rate of change of rcc (m/s)
vR = relative velocity of the wafer with respect to the pad (m/s)
θc, θc* = semi-contact angle (rad) and normalized value
θc1, θc2 = entrance and exit semi-contact angle (rad)
ωw, ωp = angular velocities of the wafer and the pad (rad/s)
CHAPTER 3
POLISHING EXPERIMENTS WITH A NON-TRANSLATING PAD
3.1 Introduction
A steep polishing gradient is fundamental for progressive polishing [Saka and Chun,
2007], and therefore the material removal rate for a non-translating pad must first be validated.
Accordingly, polishing experiments were conducted on blanket Cu wafers to measure the
material removal rate and Preston constant across the wafer. Two pad perforation patterns were
used to optimize slurry distribution at the pad-wafer interface, and hence the Preston constant.
Experiments were conducted at various wafer-pad rotational velocity ratios to compare their
effect on the polishing gradient across the wafer. Finally, experiments were performed with the
pad positioned at different distances from the center of the wafer to compare the change in
Preston constants.
3.2 Equipment and Consumables
All polishing experiments were performed on a rotary face-up CMP tool, shown in
Figure 3.1. The 100-mm wafers were held with the coating facing upward in a vacuum chuck.
A perforated pad was attached to a slurry cup and the normal load was applied by compressed air.
Slurry was supplied to the cup by a peristaltic pump. Both the wafer and pad were rotated in the
same direction during polishing. To prevent interruption during the course of polishing, a video
camcorder captured the experiments and the video clips were used to analyze the areas of Cu
removal afterwards.
Commercial stacked pads manufactured by Thomas West, Inc. were used to polish the
wafers. The proprietary pad face material, TWI-817, was made of polymer-impregnated fibers,
and the SP-7 subpad was of felt-type material. X-Y grooves on the pad face aid in distributing
the slurry across the surface during polishing. A schematic of the pad cross section is shown in
Figure 3.2. The face-up CMP architecture requires the delivery of slurry through perforations in
the pad; however, this feature is not yet available commercially. Therefore, holes were punched
into the pad at the center of the squares formed by the X-Y grooves using a stainless steel punch
and a drill press. Because the tool lacked in-situ conditioning, which has been shown to extend pad life
[Muldowney and James, 2004], a new pad was used for each experiment. This ensured that the
material removal rates were comparable between experiments. Prior to polishing, the pads were
conditioned with deionized water and a stiff nylon brush. This procedure revitalized the pad
surface and removed the debris left on the surface from hole-punching.
Figure 3.1: Photograph of the face-up CMP tool.
[Schematic labels: TWI-817 face material; dimensions 6.35 mm, 0.635 mm, 0.813 mm, 1.40 mm, 1.40 mm.]
Figure 3.2: Schematic of the cross section of a TWI-817 stacked pad.
Cabot Microelectronics iCue 5001 Cu CMP slurry was used for the polishing. The
slurry contained fumed alumina particles with an average particle diameter of 2.8 µm that make
up 3% of the total slurry volume. Hydrogen peroxide making up 3% of the total volume was added
to the slurry and the pH of the mixture was measured prior to usage.
3.3 Kinematic Effects of Slurry Cup Rotation
The face-up CMP tool architecture addresses the issue of uniform slurry flow to the pad-
wafer interface by supplying slurry through perforations in the pad. In order to do so, the pad
must be attached to a slurry cup as shown in Figure 3.3. To investigate the kinematic effects of
the cup rotation on slurry distribution and material removal rate, polishing experiments were
performed on blanket Cu wafers using pads with different perforation patterns: uniformly spaced
and sized perforations as shown in Figure 3.4(a) and central region blocked, Figure 3.4(b) [Mau
et al., 2007]. The process parameters were kept constant for the two tests to isolate the effects of
slurry flow. These parameters are shown in Table 3.1 and the material properties of the coatings,
abrasive particles, and pad are listed in Table 3.2. Due to the surface topography of the pad, the
measured local material properties can span a wide range of values [Eusner, 2008].
Table 3.3 lists the polishing time, t, the radial location of the cleared region, r, and the
calculated values of semi-contact angle, θc, and Preston constant, kp, from Eqs. (2.2) and (2.22),
respectively. When polishing with a uniformly perforated pad, the Cu was first removed in an
annular pattern at r = 27 mm and the polished area extended in both directions as time
progressed. One minute after the initial substrate opening, the Cu at the center of the wafer was
cleared, creating discontinuous exposed SiO2 regions. Finally, the two regions merged to form
the final circular polished region. The polishing pattern can be seen in the video screenshots in
Figure 3.5.
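The kp column of Table 3.3 can be approximated from the clearing time and radius with the short sketch below. Eqs. (2.2) and (2.22) are not reproduced in this chapter, so both expressions here are assumptions: the semi-contact angle comes from the law of cosines, and kp = π·hCu / (p·ω·rcc·θc·t) follows from the ωw = ωp removal rate implied by Eq. (2.42). The pressure is taken from the 2.4 psi entry of Table 3.1 converted to Pa, which tracks the tabulated kp values more closely than the rounded 17 kPa.

```python
import math

PSI_TO_PA = 6894.76


def semi_contact_angle(r, r_cc, r_p):
    """Assumed form of Eq. (2.2): half-angle of the circle of radius r under the pad."""
    if r == 0.0:
        return math.pi if r_cc <= r_p else 0.0
    cos_theta = (r * r + r_cc * r_cc - r_p * r_p) / (2.0 * r * r_cc)
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def preston_constant(t_clear, r, h_cu, p, omega, r_cc, r_p):
    """Assumed form of Eq. (2.22) for equal wafer and pad angular velocities."""
    theta_c = semi_contact_angle(r, r_cc, r_p)
    return math.pi * h_cu / (p * omega * r_cc * theta_c * t_clear)


# Table 3.1 parameters (SI units).
h_cu = 1.0e-6            # m
p = 2.4 * PSI_TO_PA      # Pa
omega = 19.0             # rad/s
r_cc = 0.030             # m
r_p = 0.035              # m

# (clearing time in s, cleared radius in m) taken from three rows of Table 3.3.
observations = [(420.0, 0.025), (480.0, 0.004), (600.0, 0.007)]
k_p_values = [preston_constant(t, r, h_cu, p, omega, r_cc, r_p)
              for t, r in observations]

for (t, r), k_p in zip(observations, k_p_values):
    print("t = %3.0f s, r = %2.0f mm: kp = %.2f x 10^-13 1/Pa"
          % (t, 1000.0 * r, k_p * 1e13))
```

Rows whose radius was rounded to the nearest millimeter in the table may differ by a percent or so from the listed kp.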
Figure 3.3: Photograph of the slurry cup.
Figure 3.4: (a) Uniformly spaced and (b) blocked center pad perforation patterns.
Table 3.1: Wafer, consumables, and process parameters for polishing experiments.
Parameter                    Value
hCu (µm)                     1.0
rw (mm)                      50
Pad type                     TWI-817
rp (mm)                      35
rcc (mm)                     30
Slurry                       Cabot iCue 5001
Slurry additive              H2O2 - 3% vol
pH                           8
ωw (rad/s) (rpm)             19 (180)
ωp (rad/s) (rpm)             19 (180)
vcc (m/s)                    0
p (kPa) (psi)                17 (2.4)
Slurry flow rate (ml/min)    100
Table 3.2: Mechanical properties of the materials involved in CMP.
Material              E (GPa)        H (GPa)
Cu                    128.0          1.22
Ta                    186.0          0.80
SiO2                  92.0           15.0
Al2O3                 350.0          20.0
TWI-817 Pad (wet)+    0.09 - 0.68    0.02 - 0.17
+[Eusner, 2008]
Table 3.3: Uniform pad polishing results with calculated θc and kp.
t (s)    Δt0/t    r (mm)    r/rw    θc (rad)    θc/π    kp (x 10^-13 Pa^-1)
420      1.14     25        0.50    1.37        0.44    5.79
                  29        0.57    1.28        0.41    6.19
480      1.00     4         0.08    3.14        1.00    2.21
                  15        0.30    1.67        0.53    4.15
                  36        0.72    1.11        0.35    6.26
540      0.89     5         0.10    3.14        1.00    1.96
                  13        0.26    1.77        0.56    3.48
                  40        0.80    1.01        0.32    6.10
600      0.80     7         0.14    2.28        0.73    2.43
                  12        0.24    1.82        0.58    3.04
                  42        0.84    0.96        0.31    5.78
660      0.73     9         0.18    2.06        0.65    2.45
                  10        0.20    1.97        0.63    2.56
                  44        0.88    0.91        0.29    5.53
720      0.67     45        0.90    0.89        0.28    5.20
780      0.62     46        0.93    0.85        0.27    5.01
[Screenshots at t = 8 min, t = 10 min, and t = 12 min.]
Figure 3.5: Video screenshots from a polishing experiment with an unblocked pad.
Figure 3.6 shows the normalized thickness of material removed per wafer rotation, Δhr*,
along the wafer radius and compares the experimental results with the calculated values from
Eq. (2.24). It should be noted that Eq. (2.24) assumes that kp and p are constant across the pad-
wafer interface. The deviation between the experimental and predicted polishing gradients is a
result of a variation in kp induced by non-uniform slurry distribution in the contact area. The
initial ring of Cu removal indicates that the material removal rate was greatest near the center of
the pad where the tangential velocity of the slurry in the cup was zero. Since the slurry at this
region was essentially motionless, its downward gravitational flow was not hindered by the
inertial forces from the rotating cup. As a result, more abrasive particles were delivered to this
area and the material removal rate was enhanced.
The blocked pad in Figure 3.4(b) has no perforations at the center of the pad. Thus,
slurry flow from the center of the pad, or the region of zero tangential velocity, is suppressed.
Polishing with the blocked pad produced a uni-directional polishing gradient, which is necessary
for face-up polishing. Table 3.4 summarizes the time and location of complete Cu removal for
the blocked pad polishing experiment. The Cu at the center of the wafer first cleared at t = 7 min
and extended outward as shown in the video screenshots, Figure 3.7. The initial radius of Cu
clearance was 2 mm, which is less than the 5 mm pad-wafer center overlap. This indicates that
the effective pad radius is smaller than the actual pad radius. Possible causes of this pad edge
effect are reduced applied pressure at the edge of the pad and reduced slurry flow due to the lack
of perforations there.
Figure 3.8 compares the experimental Δhr* along the radius of the wafer with theoretical
values computed from Eq. (2.24). The plot shows that polishing progressed uni-directionally as
predicted by the theory. The disparity between the experimental and theoretical values can again
be attributed to non-constant kp. Table 3.4 also lists the calculated kp at various radii. While the
modified pad produced desirable polishing results, the calculated kp still varied within the wafer,
supporting the general notion that the Preston constant is difficult to manipulate. Nonetheless,
the modified pad mitigated kinematical effects to some extent and provided better control of kp
than the pad with uniformly spaced holes.
Figure 3.6: Material removal by a pad with uniformly spaced perforations compared to
theoretical results; p = 17 kPa, ωw = ωp = 19 rad/s (180 rpm).
Table 3.4: Blocked pad polishing results with calculated θc and kp.
[Screenshots at t = 8 min, t = 10 min, t = 12 min, and t = 14 min.]
Figure 3.7: Video screenshots from the blocked pad polishing experiment.
Figure 3.8: Material removal by a blocked pad compared to theoretical results; p = 17 kPa,
ωw = ωp = 19 rad/s (180 rpm).
Figure 3.9 compares the variation in kp along the radius of the wafer for the two pads.
Both cases showed an increase of kp with r, which also indicates that MRR is greater at the outer
edge of the wafer. Accordingly, the data points are above the predicted material removal curve
in Figures 3.6 and 3.8. The Preston constant ranged from 2.0 x 10^-13 to 6.3 x 10^-13 1/Pa when
polishing with the unblocked pad, with a mean of 4.3 x 10^-13 1/Pa, compared with 2.2 x 10^-13 to
4.7 x 10^-13 1/Pa and a mean of 3.7 x 10^-13 1/Pa when the blocked pad was used. Therefore, kp
must be somewhat controlled to produce the uni-directional polishing gradient necessary for
progressive polishing. Through further optimization of the perforation geometry, e.g., by adding or
removing holes at the center, it may be possible to gain greater control over kp.
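The range and mean quoted for the unblocked pad can be checked directly against the kp column of Table 3.3. The sketch below only summarizes those tabulated values; the blocked pad data of Table 3.4 are not reproduced in this section and are therefore left out.

```python
import statistics

# kp values for the unblocked pad, copied from Table 3.3 (units of 10^-13 1/Pa).
k_p_unblocked = [5.79, 6.19, 2.21, 4.15, 6.26, 1.96, 3.48, 6.10,
                 2.43, 3.04, 5.78, 2.45, 2.56, 5.53, 5.20, 5.01]

k_p_min = min(k_p_unblocked)
k_p_max = max(k_p_unblocked)
k_p_mean = statistics.mean(k_p_unblocked)

print("range: %.1f to %.1f, mean: %.1f (x 10^-13 1/Pa)"
      % (k_p_min, k_p_max, k_p_mean))
print("max/min ratio: %.1f" % (k_p_max / k_p_min))
```

The ratio of largest to smallest value gives a compact measure of how far kp departs from the constant assumed in Eq. (2.24).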
3.4 Variation of kp with Angular Velocity Ratios
The polishing gradient can be further controlled by manipulating the relative angular
velocity between the pad and the wafer as shown in Figure 2.8. Using pads with the blocked center
perforation pattern, blanket Cu wafers were polished at different wafer to pad angular velocity
ratios, ωw/ωp, to study its effect on material removal rate. The process parameters for these
experiments are shown in Table 3.5.
Decreasing ωw/ωp should increase the control of material removal by creating a steeper
polishing gradient, thus allowing more time to translate the pad. Table 3.6 lists the time of polish
of points along the radius of the wafer and the kp calculated from MRR. The steep gradient was
attained partially in the central region of the wafer when ωw/ωp = 0.5, as shown in Figure 3.10.
However, the issue of increasing kp towards the edge of the wafer continues to be present, and
the outer region of the wafer polished much faster than the equations predicted.
According to Figure 2.8, increasing ωw/ωp will result in a more gradual polishing
gradient. This may be useful for one-step polishing where the pad does not need to translate in
as many steps. Table 3.7 and Figure 3.11 show the polishing results when ωw/ωp = 1.5. The
initial Cu removal occurred over approximately one-third of the wafer, then extended outward
relatively slowly. The polishing gradient was therefore flatter near the center of the wafer. In
this case, the initial overlap did not have the predicted effect on material removal rate at the
center. It was expected that since θc was much larger at the overlap region, the Cu in the region
[Plot: kp (x 10^-13) versus r [mm] for the unblocked pad and the blocked pad.]
Figure 3.9: Variation in kp across the wafer when polishing with different pads.
Table 3.5: Wafer, consumables, and process parameters for velocity ratio experiments.
Parameter                    ωw/ωp = 0.5        ωw/ωp = 1.0        ωw/ωp = 1.5
hCu (µm)                     1.0                1.0                1.0
rw (mm)                      50                 50                 50
Pad type                     TWI-817            TWI-817            TWI-817
rp (mm)                      35                 35                 35
rcc (mm)                     30                 30                 30
Slurry                       Cabot iCue 5001    Cabot iCue 5001    Cabot iCue 5001
Slurry additive              H2O2 - 3% vol      H2O2 - 3% vol      H2O2 - 3% vol
pH                           8                  8                  8
ωw (rad/s) (rpm)             10 (100)           19 (180)           24 (225)
ωp (rad/s) (rpm)             21 (200)           19 (180)           16 (150)
vcc (m/s)                    0                  0                  0
p (kPa) (psi)                17 (2.4)           17 (2.4)           17 (2.4)
Slurry flow rate (ml/min)    100                100                100
Table 3.6: Results from the ωw/ωp = 0.5 polishing experiment.
t (s)    Δt0/t    r (mm)    r/rw    θc (rad)    θc/π    kp (x 10^-13 Pa^-1)
480      1.00     6         0.13    2.50        0.80    2.55
540      0.89     8         0.15    2.15        0.68    2.71
600      0.80     8         0.15    2.15        0.68    2.44
660      0.73     11        0.23    1.89        0.60    2.62
720      0.67     31        0.63    1.22        0.39    5.08
780      0.62     35        0.70    1.13        0.36    5.52
840      0.57     40        0.79    1.01        0.32    6.44
900      0.53     41        0.83    0.99        0.31    6.33
960      0.50     43        0.87    0.94        0.30    6.60
1020     0.47     44        0.88    0.91        0.29    6.57
1080     0.44     45        0.90    0.89        0.28    6.58
Figure 3.10: Comparison of experimental and theoretical Δhr* when ωw/ωp = 0.5.
Table 3.7: Results from the ωw/ωp = 1.5 polishing experiment.
t (s)    Δt0/t    r (mm)    r/rw    θc (rad)    θc/π    kp (x 10^-13 Pa^-1)
461      1.00     17        0.33    1.97        0.63    4.50
480      0.96     32        0.63    1.31        0.42    4.77
540      0.85     36        0.72    1.17        0.37    4.40
600      0.77     40        0.79    1.06        0.34    4.10
660      0.70     41        0.82    1.01        0.32    3.80
720      0.64     43        0.86    0.94        0.30    3.58
780      0.59     44        0.88    0.92        0.29    3.34
Figure 3.11: Comparison of experimental and theoretical Δhr* when ωw/ωp = 1.5.
would polish much quicker than the rest of the wafer. The gradual polishing gradient caused by
the increase in rotational velocity ratio would then cause the areas outside the overlap region to
be polished at nearly the same time. The experiment, however, showed that the whole wafer
polished in a short span of time, with the edge taking longer than expected.
In Figure 3.12, results from the velocity ratio experiments are compared with those from
the blocked pad experiment in the previous section where ωw/ωp = 1.0. The plot shows that
there is no significant change in Δhr* in the outer region of the wafer between the cases when
ωw/ωp = 0.5 and ωw/ωp = 1.0. It is likely that rotational effects on slurry output were still
influencing material removal rates, since the pad rotated at different speeds for each of the three
tests. Therefore, one may not be able to control the Preston constant by velocity ratio alone without
considering the absolute rotational velocity of the pad. Finally, it should be noted that due to the
limitations of visual endpoint detection, there may have been some discrepancies in the time of
initial substrate exposure, thus causing some disagreement between the experimental and
theoretical curves.
3.5 Pad Position Effects
It has been shown thus far that it is difficult to maintain a uniform Preston constant
during polishing. In progressive polishing, the pad is required to translate uni-directionally
across the wafer. Therefore, it is important to study the variation in kp with pad location.
Experiments were performed by polishing three blanket Cu wafers at various pad locations on
the wafer. The consumables and process parameters are summarized in Table 3.8. These
process variables were kept constant while rcc was varied. To calculate MRR and kp, the time
and location of complete Cu removal across the wafer were measured.
Figure 3.13 compares kp(r) when polishing with the pad center at different distances from
the wafer center. There was a general trend that kp increases with r, independent of rcc. However,
translating the pad also had an effect on kp(r), particularly when rce > r,. In all cases, kp was
lowest at the inner edge of the pad, which suggested that edge effects were influencing material
removal rate. Possible causes for the decrease in material removal rate include the pressure not
being uniformly distributed to the edge of the pad or a decrease in slurry flow to the edge area
[Plot: normalized material removed per wafer rotation versus r/rw for ωw/ωp = 0.5, 1.0, and 1.5.]
Figure 3.12: Normalized material removed per wafer rotation for various rotational velocity
ratios.
Table 3.8: Wafer, consumables and process parameters for the kp versus rcc experiments.
Parameter                    Value
hCu (µm)                     1.0
rw (mm)                      50
Pad type                     TWI-817
rp (mm)                      35
Slurry                       5% vol. 300 nm Al2O3 particles, 3% vol. H2O2
pH                           4
ωw (rad/s) (rpm)             18 (170)
ωp (rad/s) (rpm)             18 (170)
vcc (m/s)                    0
p (kPa) (psi)                12 (1.7)
Slurry flow rate (ml/min)    50
[Plot: kp (x 10^-13) versus r [mm].]
Figure 3.13: kp(r) for various rcc's calculated from MRR.
due to the lack of pad perforations. The experiments also showed that there was less variation in kp
toward the periphery of the wafer, even when polishing with different rcc's. Due to the small θc
in that region, the residence time of the pad was limited and the change in material removal rate
caused by variation in kp and rcc may not be as apparent.
3.6 Summary
This chapter presented a series of polishing experiments performed on a face-up CMP
tool with no pad translation. First, pads with different perforation patterns were used to polish
blanket Cu wafers to determine the effects of the pattern on slurry flow distribution. It was
determined that uniformly spaced perforations in the pad led to increased slurry flow from the
central area of the slurry cup, presumably due to a lower tangential velocity of the slurry in the
cup. The higher slurry flow resulted in an increase in Preston constant and material removal rate.
A pad with no perforations in the central area was used to block slurry flow from that region.
Polishing with the blocked pad resulted in a uni-directional polishing gradient as predicted by the
theory. Thus, the experimental results showed that the Preston constant was better controlled
with the blocked pad.
The blocked pad was then used to polish wafers at different wafer-pad angular velocity
ratios. It was shown that decreasing the ratio resulted in a steeper polishing gradient at the
central region of the wafer. However, there was minimal difference at the outer region of the
wafer when compared with the case where the angular velocities were equal. The experiments
also showed that increasing the velocity ratio created a flatter polishing gradient, which may be
useful for cases where fewer pad translation steps are desired.
Finally, blanket wafers were polished with the pad at different distances from the center
of the wafer. These experiments were used to determine the effect of pad location on Preston
constant. The results showed some variation in Preston constant, but the disparity was less at the
periphery of the wafer. This effect was likely due to the small contact angles in that region. The
variation in Preston constant was not significant, possibly due to the low material removal rates.
Nomenclature
h = thickness of Cu coating removed (m)
hcu = initial Cu coating thickness (m)
Δhr, Δhr* = thickness of coating removed in one wafer rotation (m), normalized value
kp = Preston constant (m²/N)
MRR, MRR = material removal rate, average per wafer rotation (m/s)
p = pressure (N/m²)
rcc = wafer center to pad center distance (m)
rw, rp = wafer radius, pad radius (m)
Δt0 = time for center of the wafer to be completely polished (s)
t = polishing time (s)
vcc = translational velocity of the pad; rate of change of rcc (m/s)
vR = relative velocity of the wafer with respect to the pad (m/s)
θc = semi-contact angle (rad)
ωw, ωp = angular velocities of the wafer and the pad (rad/s)
CHAPTER 4
VALIDATION OF WAFER-SCALE POLISHING UNIFORMITY
4.1 Introduction
An experimental study to validate the numerical pad translation model is outlined in this
chapter. First, the pad was manually translated at one-minute intervals to uniformly polish a
100-mm region on 300-mm blanket Cu wafers. The pad location was measured at each step and
compared with those obtained by the model. Multiple wafers were polished to test for
repeatability. Patterned 200-mm wafers were then polished using the pad translation velocities
determined by the model to study wafer-scale polishing uniformity. Cu dishing and dielectric
erosion were measured at subdies across the wafer to validate uniform material removal at the
wafer-scale.
4.2 Equipment and Consumables
Experiments were performed by polishing wafers on a rotary face-up CMP tool, shown in
Figure 4.1. A perforated pad, blocked in the center region, was attached to the slurry cup and
slurry was delivered to the cup by a peristaltic pump. Normal load was applied to the pad by
compressed air. To enable lateral movement, the pad assembly was attached to a linear stage
that consisted of a lead screw driven by a stepper motor. The stage could either be controlled
manually with a jogger, as in the blanket wafer polishing experiments, or by a computer, as in
the patterned wafer experiments.
A 300-mm wafer carrier was used for the translating pad experiments to facilitate the
polishing of 200- and 300-mm wafers. The carrier assembly consisted of an aluminum platen
and a glass plate stacked between two rubber sheets as shown in the inset of Figure 4.1. The
glass plate provided a planar reference surface and the rubber sheets prevented slippage during
the process. The wafer was situated on the top rubber sheet and was held in place by friction
during polishing. While the objective of this thesis is to obtain uniform wafer-scale polishing for
100-mm wafers, polishing larger wafers to 100 mm ensured that the pad traveled across a planar
surface. Concerns regarding the pad sliding over the interface between the wafer edge and the
carrier were avoided.
Figure 4.1: Photograph of the face-up CMP tool with the 300-mm platen.
The wafers were polished using a mixture of Cabot Microelectronics iCue 5001 Cu
polishing slurry and H2O2, which made up 3% of the total volume. The pH of the slurry was
measured prior to each experiment. Thomas West TWI-817 polishing pads with SP-7 sub-pads
were the pad materials used. These consumables were the same as those used for the non-
translating pad experiments described in Chapter 3. Due to the non-uniform Preston constant
measured in the non-translating pad experiments, blocked pads were used. To further enhance
slurry flow rate, the perforations were placed at the intersection of the grooves as shown in
Figure 4.2. Each pad was conditioned with deionized water and a stiff nylon brush before usage.
4.3 Blanket Wafer Polishing
To validate the numerical pad translation model, 300-mm blanket Cu wafers were
partially polished to a circular area of 100 mm in diameter at the center of the wafer. Three
wafers were polished using constant process parameters to test for repeatability. The process
parameters are listed in Table 4.1. In these experiments, the pad was initially positioned to cover
the center of the wafer. The pad remained at the initial position until the Cu coating at the center
of the wafer was completely removed. Next, the pad was translated toward the edge of the wafer
in one-minute intervals so that the edge of the pad touched the edge of the polished region at the
beginning of each time step. The overlap between the pad and polished region was about 2 mm.
This overlap was kept at a minimum to reduce both over- and underpolishing, yet still allow the
Cu at the boundary of the polished region to be completely removed. The process ended when
the radius of the central SiO2 region reached 50 mm. Figure 4.3 is a series of video screenshots
showing the wafer and the pad at various times during a polishing experiment.
The distance between the wafer center and the pad center, rcc, was measured after each
translation step and the results are shown in Table 4.2. The data show that the progression of
polishing was similar for the three runs with the center clearing between 5.0 and 5.5 minutes and
the total polishing time ranging from 14 to 16 minutes. This suggests that the Preston constant,
kp, was consistent among the experiments.
Figure 4.2: A blocked pad with perforations at the intersection of the x-y grooves.
Table 4.1: Experimental parameters for the blanket wafer polishing experiments.
Parameter                   Value
hcu (µm)                    1.0
rw (mm)                     50
Pad type                    TWI-817
rp (mm)                     35
rcc0 (mm)                   30
Slurry                      Cabot iCue 5001
Slurry additive             H2O2 - 3% vol
pH                          8
ωw (rad/s) (rpm)            16 (155)
ωp (rad/s) (rpm)            16 (155)
p (kPa) (psi)               13 (1.9)
Slurry flow rate (ml/min)   150
Figure 4.3: Video screenshots of the wafer and the pad during the course of face-up
polishing; p = 13 kPa, ωw = ωp = 16 rad/s, and slurry flow = 150 ml/min.
Table 4.2: Pad location measurements from the blanket wafer polishing experiments.
Run     t (s)   t/Δt0   rcc (mm)   rcc/rw
Run 1     0     0         30       0.60
        330     1.00      31       0.62
        420     1.27      34       0.68
        480     1.45      38       0.76
        540     1.64      40       0.80
        600     1.82      43       0.86
        660     2.00      50       1.00
        690     2.09      54       1.08
        720     2.18      58       1.16
        780     2.36      60       1.20
        840     2.55      67       1.34
        900     2.73      72       1.44
Run 2     0     0         30       0.60
        330     1.00      35       0.70
        420     1.27      36       0.72
        480     1.45      37       0.74
        540     1.64      38       0.76
        600     1.82      45       0.90
        660     2.00      55       1.10
        720     2.18      62       1.24
        780     2.36      68       1.36
        840     2.55      72       1.44
        900     2.73      76       1.52
        960     2.91      80       1.60
Run 3     0     0         30       0.60
        300     1.00      36       0.72
        360     1.20      36       0.72
        420     1.40      37       0.74
        480     1.60      38       0.76
        540     1.80      40       0.80
        600     2.00      43       0.86
        660     2.20      55       1.10
        720     2.40      60       1.20
        780     2.60      70       1.40
        840     2.80      73       1.46
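The dimensionless columns of Table 4.2 can be reproduced from the raw measurements. The following is a minimal sketch, assuming that Δt0 is the first nonzero time recorded in each run (the row where t/Δt0 = 1.00) and that the normalizing radius is rw = 50 mm from Table 4.1. The function name and data layout are illustrative only.

```python
# Run 1 pad-location data as listed in Table 4.2: (time in s, rcc in mm).
run1 = [(0, 30), (330, 31), (420, 34), (480, 38), (540, 40), (600, 43),
        (660, 50), (690, 54), (720, 58), (780, 60), (840, 67), (900, 72)]

R_W_MM = 50.0  # normalizing radius rw (mm), taken from Table 4.1


def normalize(run, r_w=R_W_MM):
    """Return (t / dt0, rcc / rw) pairs.

    Assumption of this sketch: dt0 is the first nonzero time in the run,
    i.e. the time at which the wafer center cleared.
    """
    dt0 = next(t for t, _ in run if t > 0)
    return [(t / dt0, r / r_w) for t, r in run]


for t_star, r_star in normalize(run1):
    print(f"t* = {t_star:.2f}   rcc/rw = {r_star:.2f}")
```

Working in the dimensionless variables lets runs with different center-clearing times be compared on the same axes.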
Figures 4.4(a) and (b) compare the experimental and calculated pad positions versus
polishing time in dimensional and dimensionless form, respectively. The M matrix in Eq. (2.38)
was computed by discretizing the wafer radius, rw, into 50 intervals and assuming that kp and
pressure, p, are constant, i.e. kp is independent of r and rcc. Using the time of initial Cu clearance
Δt0 with Eq. (2.42), the kp at the center of the wafer was computed for each polishing experiment
and the mean value, kp = 5.0 × 10^-13 1/Pa, was used as the constant value in the model. The
amount of oxide overpolishing, ho, was assumed to be zero because the pad was moved away
from the polished region shortly after the endpoint was reached.
In practice, the face-up CMP process must start with the pad slightly overlapping the
center of the wafer to guarantee that the center is completely polished. AB and CD in Figure 4.4
show two long time-steps at the beginning of the process. AB describes the time the pad must
stay at its initial position, rcc0, before the center of the wafer is polished. Next, CD is due to the
steep decrease in the semi-contact angle, θc, when the pad leaves the center of the wafer. From
Figure 2.5, when the pad terminates contact with the wafer center (rcc > rp), there is a significant
decrease in θc(r). Therefore, after the center is cleared, the pad must remain at the next step for
a long period of time in order to finish polishing the region immediately outside the overlap.
Due to the 5-mm overlap of the pad at the center of the wafer, the model assumes that a
circular region 10 mm in diameter will first be polished concurrently. Then the pad translates
quickly so that its edge is directly on the boundary of the polished region. This is shown in
Figure 4.4 by the sharp increase in rcc from B to C and constant rcc from C to D. Experimentally,
the constant pad position during time-step CD is reflected by small, discrete displacements for
the first three minutes after the center has been cleared. The pad translational velocity was not
zero because the model assumes the edge of the pad moves directly to the boundary of the
polished region, while experimentally there was an overlap between the pad and polished region.
Furthermore, edge effects reduced the effective pad radius and resulted in an initial polished area
that is smaller than the assumed 10 mm (from the pad/wafer center overlap). Therefore, a
number of small steps were required for the pad to travel from B to D.
The discrepancy between the experimental and calculated values between points D and E
in Figure 4.4(a) is primarily due to the spatial variation in Preston constant. The non-translating
pad experiments in the previous chapter, Figure 3.9, had shown an increase of kp with radial
location, r, which increased material removal rate. This increase in kp was likely due to chemical
effects from the slurry that remained on the surface of the wafer after exiting the pad-wafer
interface. Screenshots from the polishing video in Figure 4.3 show that there was a layer of
slurry on the surface of the wafer throughout the experiment. Since the pad contacted the points
on the periphery of the wafer for a shorter period of time (due to the smaller θc), the chemicals
from the residual slurry layer had a longer reaction time with the coating before the pad
mechanically removed the material. This may have altered the mechanical properties of the
coating and created a surface that was easier to remove. Therefore, as the pad moved further
away from the center of the wafer, it was not required to stay at each position for as long as the
calculated time. The experiments show that beyond point D the pad generally moved to a
particular rcc before the predicted time, i.e. the data points lie above the model curve.
Figure 4.4: Comparison of experimental and calculated pad positions versus polishing time in
(a) dimensional and (b) dimensionless form.
Finally, the spatial variation of kp also affected the total pad displacement. Point E shows
that the model requires a final pad position, rcc, of 84 mm in order to polish a 100-mm region,
while experimentally, the final rcc ranged from 72 to 80 mm. Since MRR was higher toward the
edge of the wafer, the pad was not required to travel as far as the model predicts to remove Cu up
to the edge region. However, the overprediction of total polishing time should not have a negative
effect on polishing uniformity. If the pad continued to rcc = 84 mm as prescribed by the model,
the only result would have been a larger area polished. Due to the continuous translation of the pad,
overpolishing would still be avoided.
4.4 Patterned Wafer Polishing
Patterned Cu/SiO2 test wafers, type SKW6-2, from SKW Associates were polished on the
face-up CMP tool to determine Cu dishing and dielectric erosion after face-up CMP. The
200-mm wafer consisted of 20-mm by 20-mm dies with pitch and density structures as shown in
Figure 4.5. Pitch structures with linewidths ranging from 2.5 µm to 100 µm were used for
dishing measurements and linewidths of 5 µm to 100 µm were used for erosion measurements.
A highly schematic cross-section of the wafer is shown in Figure 4.6. The SiO2 thickness was
found to be 850 nm by chemical etching experiments described in Appendix A. This value was
used in lieu of the listed value (800 nm) as the interconnect depth, hi.
Figure 4.5: Die map of the wafers used for dishing and erosion experiments. Each 20 mm by
20 mm die is labeled by linewidth (µm) / pitch (µm) and by area fraction.
Figure 4.6: Cross-section of an SKW6-2 test wafer. (SKW Associates)
Prior to polishing, the initial surface topography (surface trench width and initial
step-height, hsi) was obtained with a Tencor P10 surface profilometer with a 2-µm stylus tip.
The Cu deposition factor, α, was determined from the surface trench width for each subdie by
Eq. (1.1). The α and hsi values are listed in Table 4.3. Figures 4.7(a) and (b) plot α and hsi with
linewidth, w, respectively. As expected, both α and hsi increased with w. For w ≥ 10 µm, α is close
to 1 and hsi is over 1 µm. Therefore, the underlying structure is closely reproduced on the Cu
surface. For w < 5 µm, both α and hsi are low and the initial wafer surface is fairly planar.
Figure 4.7(c) shows the combined effects of initial surface geometry, α·hsi, with w.
The CMP process parameters for patterned wafer polishing are listed in Table 4.4. A
200-mm wafer was partially polished to a circular area of 100 mm in diameter for Cu dishing
and dielectric erosion measurements. A dimensionless pad translation model was used to
determine pad motion. This is the appropriate choice for face-up CMP experiments because the
model is only dependent on Δt0, which is a function of the polishing conditions. Therefore, any
change in Cu thickness from wafer to wafer as well as any process variation will be accounted
for by Δt0. Because the dies in the test wafers are large and feature-scale non-uniformity is not
controlled, fully polishing the entire die would result in excessive dishing in the wide lines.
Therefore, Δt0 was chosen to completely polish the eight subdies in consideration, which are
highlighted in Figure 4.5.
A dimensionless velocity parameter, v*cc, is defined as
\[ v_{cc}^{*} = \frac{\Delta t_0}{r_w}\, v_{cc} \tag{4.1} \]
To simplify the pad translation control, the theoretical motion path was approximated with a
linear path where v*cc = 0.6. Additionally, the length of time of the second translation step was
shortened to reduce overpolishing at the center of the wafer. This adjustment is attributed to the
non-constant kp and pad edge effects observed from the blanket wafer polishing experiments.
The revised path is shown in Figure 4.8. During the experiment, a LabVIEW program recorded
the time of center polishing, Δt0, and used the model to compute the pad translation velocity for
the remainder of the process. The velocity and displacement were sent to the stepper motor
controller which translated the pad assembly.
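As an illustration of how the dimensionless velocity maps back to a stage command, the sketch below inverts the definition of the dimensionless velocity, read here as v*cc = (Δt0 / rw)·vcc. It is not the LabVIEW program used in the experiments. Using the 50-mm effective radius as rw and the nominal Δt0 = 7 min from Table 4.4 are assumptions of this example, and the function name is hypothetical.

```python
def pad_velocity_mm_per_s(v_star, r_w_mm, dt0_s):
    """Invert v* = (dt0 / rw) * vcc to obtain the dimensional pad velocity vcc."""
    return v_star * r_w_mm / dt0_s


# v* = 0.6 is the slope of the linearized path; dt0 = 7 min is the nominal
# center-clearing time from Table 4.4. Taking rw = 50 mm (the effective
# radius) is an assumption of this sketch.
v_cc = pad_velocity_mm_per_s(v_star=0.6, r_w_mm=50.0, dt0_s=7 * 60)
print(f"vcc = {v_cc:.4f} mm/s")
```

Because the measured Δt0 enters directly, a wafer that clears later at the center is translated proportionally more slowly.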
Table 4.3: Measured α and hsi on the SKW6-2 test wafers for various linewidths.
w (µm)    α       hsi (µm)   α·hsi (µm)   α·hsi / hi
2.5       0.17    0.084      0.014        0.017
3.5       0.25    0.192      0.048        0.056
4.5       0.33    0.150      0.050        0.058
5         0.95    0.732      0.695        0.818
10        1       1.135      1.135        1.335
20        1       1.132      1.132        1.332
50        1       1.133      1.133        1.333
100       1       1.173      1.173        1.380
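The last two columns of Table 4.3 follow directly from the measured α and hsi. A short sketch of that arithmetic is given below, assuming the interconnect depth hi = 0.85 µm obtained from the etching experiments; the variable and function names are illustrative.

```python
H_I_UM = 0.85  # interconnect depth hi (um), from the chemical etching measurement

# (linewidth w in um, deposition factor alpha, initial step-height hsi in um)
rows = [(2.5, 0.17, 0.084), (3.5, 0.25, 0.192), (4.5, 0.33, 0.150),
        (5, 0.95, 0.732), (10, 1.0, 1.135), (20, 1.0, 1.132),
        (50, 1.0, 1.133), (100, 1.0, 1.173)]


def combined_geometry(alpha, h_si_um, h_i_um=H_I_UM):
    """Return (alpha * hsi, alpha * hsi / hi) for one feature."""
    product = alpha * h_si_um
    return product, product / h_i_um


for w, alpha, h_si in rows:
    product, normalized = combined_geometry(alpha, h_si)
    print(f"w = {w:>5} um   alpha*hsi = {product:.3f} um   alpha*hsi/hi = {normalized:.3f}")
```

The sharp jump between the 4.5-µm and 5-µm rows reflects the change in α from 0.33 to 0.95.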
Figure 4.7: Measured initial wafer surface geometry versus interconnect linewidth, w [µm]:
(a) Cu deposition factor, α, (b) initial step-height, hsi, and (c) combined feature-scale
geometry, α·hsi.
Table 4.4: Experimental conditions for patterned wafer polishing.
Parameter                   Value
hcu (µm)                    1.5
rw (mm)                     100
rw,effective (mm)           50
Pad type                    TWI-817
rp (mm)                     35
rcc0 (mm)                   30
Slurry                      Cabot iCue 5001
Slurry additive             H2O2 - 3% vol
pH                          8
ωw (rad/s) (rpm)            16 (155)
ωp (rad/s) (rpm)            16 (155)
p (kPa) (psi)               13 (1.9)
Slurry flow rate (ml/min)   150
Δt0 (min)                   7
Figure 4.8: Theoretical and actual dimensionless pad translation path.
After polishing, a Tencor P10 surface profilometer was used to measure the step-height
of the features at various radial locations across the wafer within the 100-mm central region.
Since the Cu overburden has been removed outside the line features, by definition the remaining
step-height is Cu dishing. Dielectric erosion was determined by comparing the difference in
height between the SiO2 at the center of the subdie and the edge of the subdie. Because Δt0 was
chosen so that the pad translated when the subdies were just polished, and slurry selectivity was
high, there was negligible overpolishing at the edge of the subdie, or the reference field region.
Therefore, wafer-scale erosion was neglected and the measured feature-scale erosion was
assumed to be the total erosion. The patterned wafer polishing results are shown in Table 4.5.
The experimental data show that the material removal rate across the wafer was controlled
reasonably well by face-up polishing.
4.4.1 Dielectric Erosion
Figure 4.9 plots the normalized dielectric erosion data for each subdie as a function of its
radial location on the wafer. Erosion was at most 5% of the total interconnect depth for all
subdies, which is within industry standards. The low magnitude of erosion is to be expected,
since the face-up CMP pad translation scheme is based on minimizing erosion. The algorithm is
designed so that the pad only remains at each position long enough to just expose the SiO2 layer.
Therefore, there was no SiO2 overpolishing. Additionally, the high selectivity slurry had low
material removal rates on Ta and SiO2, which further reduced the amount of SiO2 removed. The
low erosion values, however, also resulted in a large degree of scatter in the measured data. The
statistical summary listed in Table 4.6 shows that the average erosion for each feature ranged
from 1 nm to 29 nm and the standard deviation from 1 nm to 12 nm.
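The entries of Table 4.6 can be checked against the individual measurements in Table 4.5. The sketch below does this for the w = 5 µm erosion data, assuming that the reported standard deviation is the sample (n - 1) form; the text does not state which form was used, and the result agrees with the tabulated value only to within rounding.

```python
import statistics

# Dielectric erosion e (nm) for the w = 5 um features, as listed in Table 4.5.
erosion_w5_nm = [22.11, 31.09, 19.58, 16.64, 28.82,
                 42.50, 46.03, 44.62, 11.79, 23.38]

H_I_NM = 850.0  # interconnect depth hi (nm)

mean_nm = statistics.mean(erosion_w5_nm)
std_nm = statistics.stdev(erosion_w5_nm)  # sample standard deviation (assumed)

print(f"mean = {mean_nm:.2f} nm, std. dev. = {std_nm:.2f} nm, "
      f"mean / hi = {mean_nm / H_I_NM:.2f}")
```

With only ten points per feature and erosion values of a few tens of nanometers, the standard deviation is a large fraction of the mean, which is the scatter noted above.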
The multi-scale, tribological erosion model from Chapter 1 states:
e= [) awh + (- h, + SCo, - h -( w(/2A)D (1.5)
However, the purpose of face-up CMP is to maintain uniform material removal at the wafer-
scale, so f is approximately unity - the material removal rate at the "slowest" field region of the
wafer is equal to that at the "fastest". Furthermore, deliberate oxide overpolishing is not
necessary in face-up CMP because of wafer-scale polishing uniformity and the ease of endpoint
detection.
Table 4.5: Measured Cu dishing and dielectric erosion after face-up CMP.
Feature                            r (mm)   r/rw   D (nm)   D/hi   e (nm)   e/hi
w = 5 µm, λ = 10 µm, Af = 0.5        9.5    0.19    73.22   0.09   22.11    0.03
                                    14.0    0.28    66.18   0.08   31.09    0.04
                                    16.5    0.33    71.69   0.08   19.58    0.02
                                    20.0    0.40    60.70   0.07   16.64    0.02
                                    26.5    0.53    86.05   0.10   28.82    0.03
                                    28.5    0.57    74.32   0.09   42.50    0.05
                                    31.0    0.62    98.24   0.12   46.03    0.05
                                    36.0    0.72    69.81   0.08   44.62    0.05
                                    37.5    0.75    68.87   0.08   11.79    0.01
                                    46.0    0.92    55.73   0.07   23.38    0.03
w = 10 µm, λ = 20 µm, Af = 0.5       6.5    0.13   194.09   0.23   19.23    0.02
                                    15.5    0.31   153.44   0.18   24.46    0.03
                                    17.5    0.35   166.79   0.20   19.30    0.02
                                    22.5    0.45   148.01   0.17   16.83    0.02
                                    25.5    0.51   167.86   0.20   44.04    0.05
                                    28.0    0.56   159.45   0.19   17.14    0.02
                                    30.5    0.61   148.89   0.18   28.01    0.03
                                    34.5    0.69   150.57   0.18   14.23    0.02
                                    39.0    0.78   138.37   0.16   22.02    0.03
                                    45.0    0.90   130.92   0.15   26.06    0.03
w = 20 µm, λ = 40 µm, Af = 0.5       5.0    0.10   269.27   0.32   25.56    0.03
                                    15.0    0.30   205.12   0.24    4.37    0.01
                                    21.0    0.42   232.39   0.27   13.88    0.02
                                    25.0    0.50   221.73   0.26    5.40    0.01
                                    26.0    0.52   220.45   0.26    6.28    0.01
                                    32.0    0.64   227.78   0.27   12.71    0.01
                                    33.0    0.66   226.08   0.27    9.15    0.01
                                    35.0    0.70   214.16   0.25   15.04    0.02
                                    41.0    0.82   196.19   0.23   16.40    0.02
                                    45.5    0.91   213.22   0.25   19.15    0.02
w = 50 µm, λ = 100 µm, Af = 0.5      7.0    0.14   501.19   0.59   10.32    0.01
                                    17.0    0.34   423.10   0.50    1.73    0.01
                                    22.0    0.44   428.39   0.50    0.01    0.00
                                    25.0    0.50   487.95   0.57   12.01    0.01
                                    29.0    0.58   402.11   0.47    0.00    0.00
                                    30.0    0.60   448.14   0.53    9.62    0.01
                                    35.0    0.70   434.14   0.51    4.87    0.01
                                    35.5    0.71   415.24   0.49   12.69    0.01
                                    38.5    0.77   404.13   0.48    9.93    0.01
                                    45.5    0.91   405.07   0.48    0.03    0.00
w = 100 µm, λ = 200 µm, Af = 0.5    10.0    0.20   592.95   0.70    2.07    0.00
                                    13.5    0.27   588.90   0.69    0.02    0.00
                                    17.5    0.35   563.53   0.66    0.01    0.00
                                    19.0    0.38   574.95   0.68    1.31    0.00
                                    26.5    0.53   637.48   0.75    1.53    0.00
                                    29.0    0.58   606.41   0.71    1.92    0.00
                                    32.5    0.65   551.89   0.65    0.16    0.00
                                    36.0    0.72   561.25   0.66    0.01    0.00
                                    38.0    0.76   553.06   0.65    1.18    0.00
                                    45.0    0.90   533.80   0.63    0.01    0.00
Figure 4.9: Normalized dielectric erosion across the wafer after face-up CMP.
Table 4.6: Statistical summary of dielectric erosion at various features after face-up CMP.
w (µm)   Mean (nm)   Std. Dev. (nm)   Mean / hi
5        28.66       12.19            0.03
10       23.13        8.55            0.03
20       12.79        6.70            0.02
50        6.12        5.32            0.01
100       0.82        0.86            0.001
The overpolishing of SiO2 is therefore minimal and ho can be neglected. Hence,
dielectric erosion is only dependent on the feature-scale non-uniformity factors, α and hsi, and
Cu dishing. Equation 1.5 can be reduced to
\[ e = \frac{w/\lambda}{(1 - w/\lambda)\,S_{Cu/ox} + w/\lambda}\,\left(\alpha h_{si} - D\right) \tag{4.2} \]
Erosion is low when slurry selectivity is high. For the current experimental conditions,
w/λ = 0.5 and SCu/ox = 50, e = 0.02(α·hsi - D). Therefore, even if α = 1 and dishing is
neglected, erosion is still only 2% of hsi. Moreover, if D ≈ α·hsi, erosion is close to zero.
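The factor of 0.02 quoted above can be checked numerically. The sketch below assumes that the coefficient of Eq. (4.2) has the form (w/λ) / [(1 - w/λ)·SCu/ox + w/λ], which is the form that reproduces the stated value for w/λ = 0.5 and SCu/ox = 50. The function names are illustrative.

```python
def erosion_coefficient(w_over_lam, selectivity):
    """Coefficient multiplying (alpha * hsi - D) in the reduced erosion model.

    Assumed form: (w/lam) / ((1 - w/lam) * S + w/lam).
    """
    return w_over_lam / ((1.0 - w_over_lam) * selectivity + w_over_lam)


def erosion_nm(w_over_lam, selectivity, alpha, h_si_nm, dishing_nm):
    """Erosion e (nm) for given feature geometry and dishing."""
    return erosion_coefficient(w_over_lam, selectivity) * (alpha * h_si_nm - dishing_nm)


coeff = erosion_coefficient(0.5, 50.0)
print(f"coefficient = {coeff:.4f}")

# Upper bound for a wide feature: alpha = 1, hsi = 1133 nm (Table 4.3), no dishing.
print(f"e (no dishing) = {erosion_nm(0.5, 50.0, 1.0, 1133.0, 0.0):.1f} nm")
```

Because the selectivity appears in the denominator, doubling SCu/ox roughly halves the predicted erosion for the same feature geometry.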
The erosion results are compared with the multi-scale, tribological erosion model in
Figure 4.10. The model and the data are in agreement in the magnitude of erosion being less
than 3% of the interconnect depth. The model, however, suggests that erosion should increase with
linewidth while the experimental data show erosion decreasing with linewidth. However, the
experiments were conducted with a compliant pad which conformed to the wafer surface and
dishing was large for the large linewidths. The experimental erosion results were therefore
dominated by the dishing term and decreased with linewidth.
The low magnitude of erosion in features with large lines is to be expected when
polishing with a compliant pad because the pad asperities contact both the high and low features
simultaneously. Thus, the applied pressure is evenly distributed between the Cu and SiO2, and
erosion is reduced. For the smaller lines, the pad contacts only the high feature outside the
interconnect, which increases the applied pressure on SiO2, thereby increasing erosion.
Nevertheless, both the data and the model show that erosion is not a major defect issue when
overpolishing is minimized by face-up CMP.
4.4.2 Cu Dishing
Figure 4.11 shows the measured Cu dishing across the wafer normalized with
interconnect depth. For each subdie, dishing was fairly constant across the wafer, demonstrating
that wafer-scale polishing can be controlled by the face-up CMP architecture. Table 4.7
summarizes the statistical parameters from the dishing measurements. The standard deviation
for all features is less than 35 nm, or 4% of the interconnect depth. While the magnitude of
dishing is higher than the industry specification of 5% of the interconnect depth, the low variation
suggests that once the optimal process parameters and consumables are determined to minimize
dishing, face-up CMP can control dishing to approximately the same value across the wafer.
Figure 4.10: Normalized dielectric erosion compared with the multi-scale erosion model from
Eq. (4.2), where α and hsi were approximated using the data shown in Figure 4.7 and D from
Eq. (4.5). Model parameters: w/λ = 0.5, Yp = 20 MPa, Ra = 5 µm, λa = 250 µm, SCu/ox = 50,
p = 13 kPa.
Figure 4.11: Normalized Cu dishing across the wafer after face-up CMP.
Table 4.7: Statistical summary of Cu dishing at various features after face-up CMP.
w (µm)   Mean (nm)   Std. Dev. (nm)   Mean / hi
2.5       27.31       3.04            0.03
3.5       40.72       2.72            0.05
4.5       22.96       2.07            0.03
5         72.48      12.16            0.09
10       155.84      17.67            0.18
20       222.64      19.67            0.26
50       434.95      34.74            0.51
100      576.42      30.59            0.68
The multi-scale tribological dishing model introduced in Chapter 1 states:
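\[ D = \frac{S_{Cu/ox} - 1}{(1 - w/\lambda)\,S_{Cu/ox} + w/\lambda}\,\frac{p}{Y_p}\,\frac{\lambda_a^{2}}{6\pi R_a}\,\left[1 - \exp\left(-t_o^{*}\right)\right] \tag{1.3} \]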
where
* (1-w/2')Scx + 2w/1A Y 6xrR aw 1
to = 2 1 hc +--h,, + Scuox •-, (1.4)
Assuming again that the wafer-scale uniformity factor is unity and ho = 0,
\[ t_o^{*} = \frac{(1 - w/\lambda)\,S_{Cu/ox} + w/\lambda}{S_{Cu/ox}}\,\frac{Y_p}{p}\,\frac{6\pi R_a}{\lambda_a^{2}}\,\frac{\alpha w}{\lambda}\,h_{si} \tag{4.3} \]
For the current experimental conditions: w/λ = 0.5, α = 1, hsi = 1 µm, Yp = 20 MPa, Ra = 5 µm,
λa = 250 µm, and SCu/ox = 50, to* = 0.6. Then, 1 - exp(-to*) = 0.45, or 0.75 to*. Equation (1.3) can
be simplified to
\[ D = \frac{S_{Cu/ox} - 1}{(1 - w/\lambda)\,S_{Cu/ox} + w/\lambda}\,\frac{p}{Y_p}\,\frac{\lambda_a^{2}}{6\pi R_a}\,\left(0.75\,t_o^{*}\right) = 0.75\,\frac{S_{Cu/ox} - 1}{S_{Cu/ox}}\,\frac{\alpha w}{\lambda}\,h_{si} \tag{4.4} \]
which for high selectivities can be approximated as
\[ D = 0.75\,\frac{\alpha w}{\lambda}\,h_{si} \tag{4.5} \]
Furthermore, when w/λ is constant, D is only a function of α·hsi, which in turn varies with
linewidth. More importantly, this approximation is independent of pad properties (Yp, λa, Ra),
selectivity, SCu/ox, and pressure, p. Therefore even when wafer-scale polishing uniformity is
maintained by face-up CMP, Cu dishing is not zero due to feature-scale non-uniformity.
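The numerical values quoted in this derivation can be checked with a short calculation. The sketch below is hedged: the expression used for to* is an assumed form, chosen because it reproduces the stated value of about 0.6 for the listed parameters, and it takes p = 13 kPa from Table 4.4. The dishing estimate uses the high-selectivity approximation D = 0.75·α·(w/λ)·hsi. All function and variable names are illustrative.

```python
import math


def t_star(w_over_lam, alpha, h_si, y_p, r_a, lam_a, s, p):
    """Dimensionless time to* (assumed form; SI units for all lengths and pressures)."""
    prefactor = ((1.0 - w_over_lam) * s + w_over_lam) / s
    return (prefactor * (y_p / p) * (6.0 * math.pi * r_a / lam_a**2)
            * alpha * w_over_lam * h_si)


def dishing_high_selectivity(w_over_lam, alpha, h_si):
    """High-selectivity estimate D = 0.75 * alpha * (w/lam) * hsi (same units as hsi)."""
    return 0.75 * alpha * w_over_lam * h_si


# Parameters from the text; p = 13 kPa is taken from Table 4.4.
t0 = t_star(w_over_lam=0.5, alpha=1.0, h_si=1e-6, y_p=20e6,
            r_a=5e-6, lam_a=250e-6, s=50.0, p=13e3)
ratio = (1.0 - math.exp(-t0)) / t0  # linearization factor used in the text

# Dishing estimate for the 50-um lines, with hsi = 1.133 um from Table 4.3.
d_50um = dishing_high_selectivity(0.5, 1.0, 1.133)

print(f"to* = {t0:.2f}, (1 - exp(-to*)) / to* = {ratio:.2f}")
print(f"D (w = 50 um) = {d_50um:.3f} um, D / hi = {d_50um / 0.85:.2f}")
```

For the 50-µm lines this estimate gives D/hi close to 0.5, which can be compared with the measured mean of 0.51 in Table 4.7.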
Figure 4.12 shows that the experimental data agree fairly well with the dishing model. It
should be noted that the 3.5- and 4.5-µm features do not have an area fraction of 0.5 as assumed
by the model. However, since α and hsi are low for small features, dishing is low and the area
fraction effects are less significant. It is also important to note that the model presented here
assumes plastic pad asperities. Because the Thomas West TWI-817 pad has higher compliance
than the more common Rohm and Haas IC 1000 pad, it may be more appropriate to consider an
elastic rough pad model [Noh, 2005]. In the elastic model, dishing is dependent on the Young's
modulus of the pad instead of hardness. It is expected that this model will predict higher dishing
values. However, the elastic model is not considered in this work due to the need for iterative
computation.
Figure 4.12: Experimental dishing compared with the multi-scale dishing model presented in
Eq. (4.5), where α and hsi were approximated using the data shown in Figure 4.7. Model
parameters: w/λ = 0.5, Yp = 20 MPa, Ra = 5 µm, λa = 250 µm, SCu/ox = 50, p = 13 kPa.
Reducing Cu Dishing
The increase in dishing with linewidth shown in Figures 4.11 and 4.12 is due to feature-scale
effects, which are described in the model through α and hsi. From Figure 4.7, α and hsi are
large when the features are wide. Large α and hsi values increase the material removal rate so
that those features reach their endpoint earlier than the smaller features. Therefore, there will be
more overpolishing at the wide features.
To reduce Cu dishing, feature-scale uniformity must be controlled, meaning that the
initial wafer surface should be fairly planar. Appendix B presents a novel technique for
controlling initial feature topography by spin coating a layer of polymer onto the wafer prior to
polishing. The polymer material partially fills the trenches on the Cu coating so that the initial
wafer surface is close to planar, as shown in Figure 4.13. To examine the effects of spin coating,
200-mm patterned wafers were coated with SU-8 photoresist that measured 2 and 4 µm thick,
and polished on the face-up CMP tool. The polishing conditions are the same as those listed in
Table 4.4 except that polishing time was increased due to the extra coating.
Results from the experiments show that dishing was reduced for all features when the
photoresist coating was present. Table 4.8 compares dishing for the uncoated and spin-coated
wafers. The magnitude of dishing decreased as the SU-8 coating thickness increased. Dishing
on the wafer coated with 4 µm SU-8 was reduced by 58% for the 100-µm features and 71% for
the 50-µm features. These results show that it is possible to reduce dishing by controlling the
initial wafer topography with spin coating. Furthermore, there was also less variation in dishing
among different feature sizes, as shown in Figure 4.14. This would suggest that the material
removal rates at each feature were similar, which is important for minimizing overpolishing
because all the subdies will finish polishing around the same time.
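The percentage reductions reported in Table 4.8 follow from the mean dishing values with and without the coating. A brief sketch of that calculation for the 4-µm SU-8 coating is shown below; the dictionary layout and function name are illustrative.

```python
# Mean Cu dishing D (nm) by linewidth w (um), as listed in Table 4.8.
uncoated_nm = {5: 72.48, 10: 155.84, 20: 222.64, 50: 434.95, 100: 576.42}
coated_4um_nm = {5: 69.14, 10: 91.67, 20: 94.96, 50: 124.14, 100: 241.02}


def percent_reduction(before, after):
    """Reduction in dishing relative to the uncoated wafer, in percent."""
    return 100.0 * (before - after) / before


for w in sorted(uncoated_nm):
    change = percent_reduction(uncoated_nm[w], coated_4um_nm[w])
    print(f"w = {w:>3} um: dishing reduced by {change:.0f}%")
```

The reduction is small for the 5-µm lines, where the initial surface is already nearly planar, and largest for the 50-µm lines.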
Figure 4.13: Schematic of a wide trench with a polymer coating.
Table 4.8: Comparison of Cu dishing in uncoated and spin-coated wafers.
          Without coating     With 2-µm coating              With 4-µm coating
w (µm)    D (nm)    D/hi      D (nm)    D/hi    % Change     D (nm)    D/hi    % Change
5          72.48    0.09       69.68    0.08     4            69.14    0.08     5
10        155.84    0.18      122.26    0.14    22            91.67    0.10    41
20        222.64    0.26      151.53    0.18    32            94.96    0.11    57
50        434.95    0.51      192.55    0.23    56           124.14    0.14    71
100       576.42    0.68      323.36    0.38    44           241.02    0.28    58
Polishing conditions: rp = 35 mm, rcc0 = 30 mm, ωw = ωp = 16 rad/s (155 rpm), p = 13 kPa
Figure 4.14: Normalized Cu dishing versus linewidth after the polishing of uncoated and
spin-coated wafers.
Another concern in planarizing wide lines is the increased contact from pad asperities at
the low feature, which can excessively remove Cu from inside the interconnect lines.
Figure 4.15 is a micrograph of a new TWI-817 pad obtained from an Olympus LEXT OLS3000
confocal microscope. The asperity heights range from 20 µm to 80 µm and the radii of
curvature range from 2 µm to 20 µm. According to these values, the asperities can easily fit
within the Cu lines of larger linewidths. Figure 4.16 is a scaled schematic of pad asperities
contacting a wide feature, which shows that the deformation of the pad asperity outside the line
is similar to that inside the line. Additionally, the high global compliance of the TWI-817 further
enables the pad to conform to the feature topography. Therefore, the pressures applied to the
high and low features are comparable, and the Cu from these regions is removed concurrently
from the start of polishing. The step-height does not change significantly with polishing and the
final step-height is close to the initial value. When the linewidths are small, however, the larger
pad asperities glide over the high features without contacting the Cu lines. Thus, material is only
removed at the high feature and step-height is reduced.
To facilitate step-height reduction, the pad should be smooth and stiff. A 200-mm wafer
was polished with a Rohm and Haas IC 1000 pad on the face-up tool using the conditions listed
in Table 4.4. By using a smoother, stiffer pad such as the IC 1000, dishing in the 100-jtm lines
was reduced to 80 nm, or 14% of what was obtained when polishing with the TWI-817 pad.
However, it was also more difficult to achieve global planarity and further work is necessary to
utilize the IC1000 in the face-up CMP tool.
Finally, the high magnitude of dishing is also a result of the high selectivity slurry used.
The iCue 5001 slurry, with a Cu to Ta selectivity of 50 to 60, is used as a first-step Cu polishing
slurry in industrial processes due to its high Cu material removal rate. Therefore, while the
barrier layer was being removed during the last stage of polishing, proportionally more Cu was
also being removed from the interconnects. The increase in dishing is significant even when the
overpolishing time is low. It is expected that the degree of dishing can be reduced using a barrier
layer polishing slurry with low Cu selectivity.
4.5 Summary
This chapter described the face-up CMP experiments regarding pad translation. The
face-up CMP architecture requires the pad to translate away from the center of the wafer during
the polishing process to achieve uniform material removal at the wafer-scale. The numerical
model for pad translation was validated in this chapter by blanket and patterned wafer polishing
experiments.
Figure 4.15: Confocal micrograph of a new TWI-817 pad surface.
Figure 4.16: Scaled schematic of sinusoidal pad asperities contacting a wide line feature.
In the blanket wafer polishing experiments, the pad was translated at one-minute intervals
to the edge of the polished region. The pad position was measured and compared with the
computed values for those times. The results showed that the model was able to predict the
general motion of the pad. Due to the variation of the Preston constant across the pad-wafer
interface, however, the model over-predicts the time durations for the translation steps toward
the edge of the wafer. For more accurate results, it is possible to incorporate the change in the
Preston constant into the model. However, not only does this term vary with radial location and
pad location, but it is also affected by a multitude of parameters such as slurry properties, pad
properties, slurry flow rate, and rotational velocity. An enormous effort would be required to
measure Preston constants empirically at various locations on the wafer under different
polishing conditions. Therefore, it is more useful to use the model with a constant Preston
constant as a guideline for process development and then either make fine adjustments through
experimentation or utilize an in-situ endpoint sensor.
A linearized dimensionless translation model was used to control pad motion during the
polishing of a patterned wafer. Dielectric erosion and Cu dishing at subdies with line features
ranging from 2.5 µm to 100 µm were measured using a surface profilometer. The results
showed fairly uniform erosion and dishing across the wafer for each feature, demonstrating that
wafer-scale polishing was controlled. Due to the minimized overpolishing time and high slurry
selectivity, dielectric erosion was below 5% of the interconnect depth for all the measured
features. Both the multi-scale tribological erosion model and the experimental data showed
that erosion is low when overpolishing is minimized by face-up CMP.
Cu dishing ranged from 9% to 68% of the interconnect depth. The higher than expected
values were attributed to feature-scale non-uniformity and to the use of a compliant pad and a
high-selectivity slurry. However, the standard deviation of dishing across the wafer for each
feature was below 35 nm, or 4% of the interconnect depth. Therefore, it is expected that once the
consumables and process parameters are optimized, low dishing values can be obtained
uniformly across the wafer.
Comparison of the dishing results with the multi-scale tribological dishing model also
showed good agreement. Both the model and data showed that while wafer-scale polishing was
controlled, pattern dependence was still a dominant factor in material removal rate. A spin
coating method for improving initial wafer surface topography was used to control feature-scale
effects. Polishing of a coated wafer resulted in as much as 71% reduction in dishing. Large
linewidths were also more susceptible to dishing due to increased material removal inside the
lines by pad asperity contact. Experiments showed that polishing with a smoother, stiffer Rohm
and Haas IC1000 pad reduced dishing by 86% when compared to the TWI-817 pad. Lastly, it is
suggested that a barrier layer slurry with low Cu selectivity be used to further reduce dishing.
Nomenclature
Af = Area fraction of Cu to SiO2
D = Cu dishing (m)
e = dielectric erosion (m)
hcu = initial Cu coating thickness (m)
hi = interconnect depth (m)
h, = thickness of oxide removed during overpolishing (m)
hsi = initial feature step-height (m)
k_p = Preston constant (m^2/N)
MRR = material removal rate (m/s)
p = pressure (N/m^2)
R_a = radius of curvature of a pad asperity (m)
r_cc, r*_cc = wafer center to pad center distance (m), normalized value
r_w, r_p = wafer radius, pad radius (m)
t, t' = polishing time (s), normalized value
Δt_o = time for center of the wafer to be completely polished (s)
v_cc, v*_cc = translational velocity of the pad; rate of change of r_cc (m/s), normalized value
v_R = relative velocity of the wafer with respect to the pad (m/s)
w, w, = interconnect linewidth, surface trench width (m)
α = feature-scale non-uniformity factor, Cu deposition factor
β = wafer-scale non-uniformity factor
λ = pitch of Cu interconnect lines (m)
Ai = pad asperity spacing (m)
ω_w, ω_p = angular velocities of the wafer and the pad (rad/s)
CHAPTER 5
CONCLUSION
5.1 Summary
In this thesis, a face-up CMP tool architecture was introduced to control material removal
rate non-uniformity at the wafer-scale, and to minimize such defects as Cu dishing and dielectric
erosion. Models of material removal have been developed for both non-translating and
translating pads. A numerical approach for determining pad translation for uniform wafer-scale
polishing was also introduced. Polishing experiments on blanket and patterned Cu wafers were
then conducted to validate these models.
Chapter 2 introduced the face-up CMP architecture by relating geometrical and kinematic
parameters to the Preston Law for material removal rate. Equations for material removal rates
for both non-translating and translating pads were derived based on the period of contact of a
point on the wafer with the pad. A numerical method for determining the pad translation
velocity was developed based on a system of equations that equate the total material removed to
the initial Cu thickness. To practically implement the model, a discretization scheme that
eliminates under-polishing and a method of determining the discretization error between the
computational nodes were developed.
Face-up polishing experiments with a non-translating pad were presented in Chapter 3.
Blanket Cu wafers were polished with pads containing different perforation patterns to
investigate the kinematic effect of a rotating slurry cup on material removal rate and the Preston
constant. It was found that slurry flow from the central region of the pad must be blocked in
order to control the variation in Preston constant and obtain uni-directional polishing, a necessary
condition for face-up CMP. Experiments were also performed with various rotational velocity
ratios to examine kinematical methods of controlling the polishing gradient across the wafer.
Increasing the wafer-to-pad velocity ratio resulted in a more gradual polishing gradient.
However, decreasing the ratio did not show much difference when compared with the baseline
case of equal velocities. Thus, absolute velocities also affect the polishing gradient and must be
controlled. The variation of Preston constants with pad location was also obtained by polishing
blanket wafers. While the pad location can affect the Preston constant, the dominating trend is
that Preston constant increases with radial position. The Preston constant was found to be more
uniform at the periphery of the wafer.
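The role of the rotational velocity ratio can be illustrated with generic rigid-body kinematics. The sketch below is not the derivation from Chapter 2; it assumes the wafer spins about its own center, the pad spins about a parallel axis offset by r_cc, and it ignores whether a given wafer point is actually under the pad. The function name `relative_speed` and the sampled radii are illustrative choices.

```python
import math


def relative_speed(r, theta, omega_w, omega_p, r_cc):
    """Speed (m/s) of a wafer point relative to the pad surface at the same location.

    The wafer spins about the origin at omega_w (rad/s); the pad spins about the
    point (r_cc, 0) at omega_p (rad/s). (r, theta) are polar coordinates of the
    wafer point measured from the wafer center.
    """
    x, y = r * math.cos(theta), r * math.sin(theta)
    # Rigid rotation about the z-axis: v = omega * (-y', x'), with (x', y')
    # measured from the respective spin axis.
    v_wafer = (-omega_w * y, omega_w * x)
    v_pad = (-omega_p * y, omega_p * (x - r_cc))
    return math.hypot(v_wafer[0] - v_pad[0], v_wafer[1] - v_pad[1])


OMEGA_P = 16.0  # pad angular velocity (rad/s), as in the polishing conditions
R_CC = 0.030    # wafer-center to pad-center distance (m)

for ratio in (1.0, 2.0):
    speeds = [
        relative_speed(r, 0.0, ratio * OMEGA_P, OMEGA_P, R_CC)
        for r in (0.0, 0.02, 0.04, 0.06)
    ]
    formatted = ", ".join(f"{v:.2f}" for v in speeds)
    print(f"omega_w/omega_p = {ratio}: {formatted} m/s")
```

With equal angular velocities the relative speed equals ω_p r_cc at every point, so any polishing gradient in that case comes from contact time and Preston constant variation rather than from the sliding speed. With unequal velocities the sliding speed itself varies with position on the wafer.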
Chapter 4 introduced face-up CMP experiments with a translating pad. Experiments
were performed on blanket Cu wafers to validate the numerical pad translation model. The
results showed that the model predicts the general pad motion fairly well. To implement face-up
CMP for industrial use, the model can be used in conjunction with empirical data to compensate
for pad edge effects and non-uniform Preston constant. Patterned wafers were then polished to
determine Cu dishing and dielectric erosion across the wafer. The post-CMP topography was
characterized for five different subdies at ten die locations on the wafer. The results showed that
face-up polishing achieved fairly constant material removal rate at the wafer scale. Dielectric
erosion was maintained below 5% of the interconnect depth for all features. Due to feature-scale
non-uniformities, however, Cu dishing was significant and increased with feature linewidth.
Nevertheless, the variation of dishing across the wafer for each feature was less than 4% of the
interconnect depth, which also indicates good wafer-scale uniformity. Several suggestions for
reducing dishing were presented.
5.2 Suggestions for Future Work
Based on the models and experiments presented in this thesis, further work is
recommended to better utilize the face-up CMP architecture for minimizing Cu dishing and
dielectric erosion.
Feature-scale Non-uniformity: Although this thesis demonstrates the ability of face-up
CMP to control wafer-scale polishing uniformity, the initial surface topography must also be
controlled to fully eliminate dishing and erosion. Dishing is a pressing problem for large
linewidths, w > 10 µm, such as the interconnects used at the global wiring level. At these
features, the underlying trench geometry is more or less reproduced on the Cu surface. Thus,
alternative plating methods should be explored to fill the trenches so that polishing may be
performed on a sufficiently planar surface.
Appendix B examines a method for controlling feature-scale uniformity by spin coating
the wafer with a layer of polymer prior to CMP. Polishing a coated wafer showed promising
results for reducing dishing. Further work on optimizing this technique should be conducted to
address issues such as material compatibility, delamination, and ease of removal. The spin
coating and curing parameters should also be examined for improved control over the process.
Endpoint Detection: Perhaps the greatest advantage of the face-up CMP architecture is
the improved accessibility for in-situ endpoint detection. Since some portion of the wafer is
exposed at all times during face-up polishing, merely two "snapshots" of the surface are adequate
to obtain the planarization status of the entire wafer. While the pad translation model provides a
functional basis for face-up CMP, better control may be attained through in-situ endpoint
detection. Furthermore, endpoint sensing can be adapted for the variation in Preston constant
without the need for the empirical data required by the model. Therefore, methods for obtaining
the process endpoint during face-up polishing should be explored so that the pad can be precisely
translated away from a feature soon after its endpoint is reached.
APPENDIX A
CHEMICAL DISSOLUTION OF Cu AND Ta
For ellipsometric erosion measurements after patterned wafer polishing, the initial SiO2
layer thickness must be precisely known. A method for chemically etching the Cu coating and
Ta barrier layer was therefore developed so that the thickness of an unpolished layer of SiO2 can
be measured by ellipsometry.
Copper is considered a "noble metal" because of its resistance to corrosion and oxidation.
Atomically, this property is due to a filled d-band electronic structure: Cu has ten 3d electrons
and one 4s electron. Nitric acid (HNO3) was therefore chosen to dissolve the Cu due to its
oxidizing properties. The reaction of Cu with concentrated HNO3 produces Cu^2+ ions, nitrogen
dioxide, and water, and can be described by [Pauling, 1970]
\[ \mathrm{Cu + 4\,HNO_3 \rightarrow Cu(NO_3)_2 + 2\,NO_2 + 2\,H_2O} \tag{A.1} \]
For the experiments, 10 to 20 mm square samples were cleaved from a blanket Cu wafer
and a SKW6-2 patterned wafer. One milliliter of HNO3 was applied to the middle of the sample.
After a minute of reaction time, the acid was rinsed off the sample with deionized water and the
process was repeated to ensure that all the Cu was removed.
While effective in dissolving the Cu, nitric acid was unable to dissolve the Ta barrier
layer. Tantalum, due to its resistance to acids, usually requires etchants that include hydrofluoric
acid (HF). However, HF is also a known etchant for SiO2, and therefore must be avoided to
ensure that the entire initial layer of SiO2 remains. For this reason, a basic solution proposed by
Grossman and Herman for dissolving Ta was used [Grossman and Herman, 1969].
After Cu etching, the samples were immersed in a heated solution of 30% sodium
hydroxide (NaOH) and 15% hydrogen peroxide (H2O2) at 60 °C for 30 seconds. The solution
had a pH of 12. A blanket SiO2 sample that was previously measured on the ellipsometer was
also subjected to the same solution as a reference. The samples were then removed from the
solution and rinsed with deionized water. A photograph of the samples after etching is shown in
Figure A.1.
The SiO2 film thickness on the etched samples was measured with a Gaertner
ellipsometer using the quoted thickness as the initial guess. The quoted thickness is 1000 nm for
the blanket samples and 800 nm for the patterned sample. Measurements on the reference SiO2
blanket sample are listed in Table A.1. There was no discernible difference in the average film
thickness before and after etching, and only a slight 1.7 Å increase in standard deviation. Thus the
reference sample showed that the basic solution of NaOH and H2O2 does not remove SiO2.
Tables A.2 and A.3 list the film thickness on the blanket Cu and patterned samples. Due
to the lack of a field subdie in the patterned sample, measurements were taken at the oxide region
in-between two subdies. An average thickness of 988.0 nm was obtained from the blanket
sample with a standard deviation of 1.68 nm, a 1.2% deviation from the quoted thickness.
For the patterned sample, the average measured thickness was 827.1 nm with a standard
deviation of 20.2 nm, which is only 3.4% off from the quoted value.
The average SiO2 thickness for the patterned sample was used as the initial SiO2 thickness for
quantifying erosion.
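The summary statistics quoted above can be checked against the individual measurements in Tables A.2 and A.3. The following is a minimal sketch; the use of the sample (n - 1) standard deviation is an assumption that happens to match the tabulated values, and the names `MEASURED_NM`, `QUOTED_NM`, and `summarize` are illustrative.

```python
import statistics

# Quoted SiO2 thicknesses (nm) from the text.
QUOTED_NM = {"blanket": 1000.0, "patterned": 800.0}

# Ellipsometer readings (nm) transcribed from Tables A.2 and A.3.
MEASURED_NM = {
    "blanket": [991.0, 986.3, 987.2, 986.8, 988.4, 988.2],
    "patterned": [853.0, 827.4, 832.1, 851.0, 818.3, 823.6,
                  836.8, 832.1, 827.7, 780.9, 801.2, 840.7],
}


def summarize(values, quoted):
    """Return (mean, sample standard deviation, percent offset from the quoted value)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    offset_pct = 100.0 * abs(mean - quoted) / quoted
    return mean, std, offset_pct


for name, values in MEASURED_NM.items():
    mean, std, offset_pct = summarize(values, QUOTED_NM[name])
    print(f"{name}: mean = {mean:.1f} nm, std = {std:.2f} nm, offset = {offset_pct:.1f}%")
```

The same check is not meaningful for Table A.1, because its readings are listed to the nearest nanometer while the tabulated standard deviations are sub-nanometer.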
Figure A.1: Wafer samples after etching with NaOH + H2O2.
Table A.1: Measured SiO2 thickness on the reference sample before and after etching.
Measurement        Thickness (nm)
Number             Before Etching    After Etching
1                  1023              1022
2                  1023              1024
3                  1024              1024
4                  1024              1023
5                  1023              1024
6                  1024              1023
Mean (nm)          1023              1023
Std. Dev. (nm)     0.47              0.64
Table A.2: Measured SiO2 thickness on the blanket sample.
Measurement Number Thickness (nm)
1 991.0
2 986.3
3 987.2
4 986.8
5 988.4
6 988.2
Mean (nm) 988.0
Std. Dev. (nm) 1.68
Table A.3: Measured SiO2 thickness on the patterned sample.
Measurement Number Thickness (nm)
1 853.0
2 827.4
3 832.1
4 851.0
5 818.3
6 823.6
7 836.8
8 832.1
9 827.7
10 780.9
11 801.2
12 840.7
Mean (nm) 827.1
Std. Dev. (nm) 20.2
APPENDIX B
CONTROL OF FEATURE-SCALE NON-UNIFORMITY BY SPIN COATING
B.1 Introduction
Feature-scale non-uniformity affects the polishing rates of individual subdies. Therefore,
even when wafer-scale polishing is uniform, subdies with different geometries will have different
polishing rates. According to the multi-scale tribological CMP model, the ideal condition is
when α = 0 and h_si = 0, i.e., the initial wafer surface is planar and independent of the underlying
feature geometry, as shown in Figure B.1(a). Current Cu deposition methods, however,
reproduce the feature geometry on the Cu surface for linewidths greater than 10 µm. A
schematic of an actual Cu feature is shown in Figure B.1(b). Previous attempts to fill the metal
features using methods such as electroplating have been unsuccessful, especially when the
features are wide: w ≥ 10 µm. Hence, a novel method is proposed for creating a planar initial
polishing surface by spin coating a layer of polymer on the wafer prior to polishing, as shown in
Figure B.1(c). An epoxy-based photoresist, SU-8, was chosen as the polymer for spin coating
due to its compatible mechanical properties and availability. Researchers have reported a
Young's modulus of 4.02 to 4.95 GPa based on screw tensile and beam deflection testing
[Dellmann et al., 1997; Lorenz et al., 1997].
B.2 Film Thickness
Spin coating is a commonly employed technique for photoresist deposition. In this
process, the photoresist solution is poured at the center of the wafer and the wafer is rotated at
high speed to uniformly distribute the liquid on the surface. Depending on the fluid properties of
the resist, and the spin speed and time, a uniform thin film can be obtained.
B.2.1 Hydrodynamic Model
Figure B.1: Schematics of (a) an ideal initial feature topography where α = 0 and h_si = 0,
(b) an actual feature topography where the feature is replicated on the Cu surface,
and (c) a feature coated with polymer to obtain α = 0 and h_si = 0.
Emslie et al. modeled the film thickness of a viscous Newtonian fluid on a rotating disk
by assuming: (a) the disk is planar and radially infinite, (b) the fluid flow is axisymmetric, and
(c) the fluid layer is thin, so that the viscous effects are much greater than the inertial effects
[Emslie et al., 1958]. If the initial film thickness is uniform, the thickness of the fluid layer with
time is given by:
\[ h = \frac{h_0}{\sqrt{1 + \dfrac{4\omega^2 h_0^2}{3\nu}\,t}} \tag{B.1} \]
where h is the film thickness, h_0 the initial film thickness, ω the rotational speed of the disk, ν
the kinematic viscosity of the fluid, and t the time. The time constant, τ, represents the time
required for the film thickness to decrease to 1/√2 of its initial value, and is defined as
\[ \tau = \frac{3\nu}{4\omega^2 h_0^2} \tag{B.2} \]
If t/τ >> 1, Eq. (B.1) can be reduced to
\[ \frac{h}{h_0} \approx \left(\frac{t}{\tau}\right)^{-1/2} \tag{B.3} \]
From Eq. (B.2), τ is dependent on the initial film thickness. To measure the initial
thickness of the film, SU-8 photoresist was spin coated on a Cu wafer using the spreading speed
of ω = 105 rad/s (1000 rpm) for t = 10 s. The film was then cured according to the baking steps
listed in Table B.1. After curing, a scratch was formed on the resist surface using a carbon steel
blade and measured with a Tencor P10 surface profilometer. A profile of the scratch is shown in
Figure B.2. The Cu coating thickness was subtracted from the measured step-height to arrive at
an initial film thickness of 5 µm.
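The relationships in Eqs. (B.1) to (B.3) can be sketched numerically. The code below is an illustration rather than the author's implementation: the function names are hypothetical, and the inputs combine the reported h_0 = 5 µm and ω = 314 rad/s with one of the kinematic viscosities listed in Table B.2.

```python
import math


def time_constant(nu, omega, h0):
    """Eq. (B.2): time constant tau (s)."""
    return 3.0 * nu / (4.0 * omega**2 * h0**2)


def film_thickness(t, nu, omega, h0):
    """Eq. (B.1): film thickness (m) after spinning for time t (s)."""
    return h0 / math.sqrt(1.0 + t / time_constant(nu, omega, h0))


def film_thickness_long_time(t, nu, omega, h0):
    """Eq. (B.3): long-time approximation, intended for t/tau >> 1."""
    return h0 * (t / time_constant(nu, omega, h0)) ** -0.5


# Illustrative inputs: h0 and omega as reported in the text, nu taken from Table B.2.
NU = 2.7e-6    # kinematic viscosity (m^2/s)
OMEGA = 314.0  # spin speed (rad/s)
H0 = 5e-6      # initial film thickness (m)

tau = time_constant(NU, OMEGA, H0)
exact = film_thickness(30.0, NU, OMEGA, H0)
approx = film_thickness_long_time(30.0, NU, OMEGA, H0)
print(f"tau = {tau:.2f} s")
print(f"h(30 s): exact = {exact * 1e6:.3f} um, long-time = {approx * 1e6:.3f} um")
```

At t = τ the exact expression returns h_0/√2, consistent with the definition of the time constant, and for a 30-s spin the long-time form differs from Eq. (B.1) by only a few percent under these inputs.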
B.2.2 Experiments with Different Viscosities
To validate the model, SU-8 photoresists of different viscosities were spin coated on Cu
wafers at ω = 314 rad/s (3000 rpm) for t = 30 s. The solids concentration of SU-8 was varied
by diluting MicroChem SU-8 2002 photoresist with cyclopentanone and mixing the solution with a
magnetic stirrer for 10 minutes. The viscosities of the solutions were estimated by linearly
interpolating the available solids content versus viscosity data for SU-8 2000 series resists
[MicroChem]. Table B.1 lists the steps for photoresist spin coating and curing. After curing, a
scratch was made on the coated wafer surface and the scratch depth was measured with a
profilometer to determine the thickness of the resist. The Cu coating thickness was subtracted
from the total scratch depth to obtain the final thickness of the coating.
Table B.1: SU-8 spin coating and curing process steps.
Step                 Process Parameter              Time
Wafer pre-bake       T = 95 °C                      3 min
SU-8 dispense        V = 20 ml (200-mm wafer)
Spin                 ω = 105 rad/s (1000 rpm)       10 s
                     ω = 314 rad/s (3000 rpm)       30 s
Soft bake            T = 95 °C                      3 min
Exposure             Hg arc bulb                    5 min
                     λ_light = 405 nm
                     J = 3 mW/cm^2
Post-exposure bake   T = 95 °C                      5 min
Hard bake            T = 135 °C                     10 min
Figure B.2: Profile of a scratch created on an SU-8 coating to measure initial film thickness
(horizontal axis: scan length, µm).
Table B.2 lists the results for film thickness when spin coating SU-8 of different
viscosities. The computed time constants ranged from 0.82 to 2.28 s. Considering the spin time
of 30 s, the resulting dimensionless time, t/τ, ranges from 13 to 36. Thus, the approximation
made in Eq. (B.3) is valid for the experimental conditions. The measured thickness was within
9% of the theoretical thickness for the thinnest resist tested. However, the deviation between
theoretical and experimental values increased with viscosity, with the thickest resist resulting in
a 44% deviation. The larger discrepancy when coating viscous fluids may be due to the variation
in initial film thickness from test to test. Increasing viscosity results in larger time constants,
which reduces the accuracy of the approximation made in Eq. (B.3). For this case, the model has
a stronger dependence on the initial film thickness, which is not measured in-situ. Figure B.3
plots experimental thickness versus theoretical thickness. If the experimental data agreed with
the theory completely, the points would fall on the 45-degree line. The data points lying above the
line indicate that the experimental results are higher than predicted. As thickness increased, a
result of increased viscosity, the difference between the data points and the line also grew. The
experimental dimensionless film thickness, h/h_0, as a function of t/τ is compared with the
hydrodynamic model in Figure B.4.
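The comparison in Table B.2 can be approximately reproduced from the reported viscosities and measured thicknesses. This is a sketch under the assumption that the theoretical column was computed with the long-time form of the model; the constant and function names are illustrative, and small differences from the tabulated values are expected from rounding.

```python
H0 = 5e-6       # initial film thickness (m), as reported in the text
OMEGA = 314.0   # spin speed (rad/s)
T_SPIN = 30.0   # spin time (s)

# (percent solids, kinematic viscosity in m^2/s, measured thickness in um), from Table B.2.
ROWS = [
    (15, 2.7e-6, 0.896),
    (22, 5.1e-6, 1.333),
    (29, 7.5e-6, 1.988),
]


def predict(nu):
    """Return (tau in s, theoretical thickness in um) for kinematic viscosity nu."""
    tau = 3.0 * nu / (4.0 * OMEGA**2 * H0**2)   # time constant of the spin-coating model
    h_theory_um = H0 * (T_SPIN / tau) ** -0.5 * 1e6  # long-time approximation
    return tau, h_theory_um


for solids, nu, h_exp_um in ROWS:
    tau, h_theory_um = predict(nu)
    deviation_pct = 100.0 * (h_exp_um - h_theory_um) / h_theory_um
    print(
        f"{solids}% solids: tau = {tau:.2f} s, t/tau = {T_SPIN / tau:.0f}, "
        f"h_theory = {h_theory_um:.2f} um, h_exp = {h_exp_um:.3f} um, "
        f"deviation = {deviation_pct:.0f}%"
    )
```

The measured films are thicker than predicted in every case, and the gap widens with viscosity, which is the trend described above.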
B.3 Step Coverage
SKW6-2 test wafers were coated with SU-8 photoresist of various thicknesses to
determine the planarity of the wafer surface after spin coating. The coating process parameters
are as described in Table B.1. Figures B.5-B.9 show the profiles of features with linewidths
ranging from 5 to 100 µm (a) before and (b) after spin coating a 4-µm thick layer of resist. For all
features, the photoresist was able to fill the trenches reasonably well. However, the profiles also
show a reduction in film thickness directly over the features. This is likely due to the extra
material required to fill the trenches, which decreases the coating thickness when compared to a
blanket region. Table B.3 compares the feature step-heights on an uncoated wafer with those on
wafers coated with a 1 to 4 µm layer of SU-8 photoresist. The step-heights were reduced as the
Table B.2: Theoretical and experimental results for the viscosity versus thickness spin
coating experiments for h_0 = 5 µm, ω = 314 rad/s, and t = 30 s.
% Solids   ν (m^2/s)     τ (s)   log(t/τ)   h_theory (µm)   log(h/h_0)_theory   h_exp (µm)   log(h/h_0)_exp
15         2.7 x 10^-6   0.82    1.56       0.82            -0.780              0.896        -0.747
22         5.1 x 10^-6   1.55    1.29       1.14            -0.645              1.333        -0.574
29         7.5 x 10^-6   2.28    1.12       1.38            -0.560              1.988        -0.401
Figure B.3: Comparison of experimental spin coating film thickness with theoretical film
thickness (horizontal axis: h_theory, µm).
Figure B.4: Experimental dimensionless film thickness versus dimensionless time compared
with the hydrodynamic model proposed by Emslie et al. (h_0 = 5 µm, ω = 314 rad/s, t = 30 s;
horizontal axis: log(t/τ)).
Figure B.5: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 5 µm and λ = 10 µm.
Figure B.6: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 10 µm and λ = 20 µm.
Figure B.7: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 20 µm and λ = 40 µm.
Figure B.8: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 50 µm and λ = 100 µm.
Figure B.9: Profile of a subdie (a) before and (b) after spin coating with a 4-µm layer of SU-8 photoresist, w = 100 µm and λ = 200 µm.
Table B.3: Comparison of feature step-heights for SU-8 coatings of various thicknesses.
Linewidth (µm)   Step-height (nm)
                 No Coating   h_SU-8 = 1 µm   h_SU-8 = 2 µm   h_SU-8 = 4 µm
5                732          28              16              12
10               1135         58              33              30
20               1132         77              45              36
50               1133         455             63              61
100              1173         1096            396             194
resist thickness increased, which is expected since more material was available to fill the
trenches. The step-heights with the 4-µm coating ranged from 12 to 194 nm, a significant
reduction from the 732 to 1173 nm measured before coating.
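The step-height reduction achieved with the 4-µm coating can be quantified from Table B.3. The sketch below uses illustrative names (`STEP_HEIGHT_NM`, `reduction_pct`) and simply expresses each coated step-height as a percentage reduction from the uncoated value.

```python
# Step-heights (nm) transcribed from Table B.3, keyed by linewidth (um).
# Tuple order: (no coating, 4-um SU-8 coating).
STEP_HEIGHT_NM = {
    5: (732, 12),
    10: (1135, 30),
    20: (1132, 36),
    50: (1133, 61),
    100: (1173, 194),
}


def reduction_pct(uncoated, coated):
    """Percentage reduction in step-height relative to the uncoated wafer."""
    return 100.0 * (uncoated - coated) / uncoated


reductions = {w: reduction_pct(*pair) for w, pair in STEP_HEIGHT_NM.items()}
for w, value in reductions.items():
    print(f"w = {w:3d} um: step-height reduced by {value:.1f}%")
```

The smallest reduction occurs at the 100-µm linewidth and the largest at the 5-µm linewidth, spanning roughly 83% to 98%.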
B.4 Preston Constant
The Preston constant, k_p, of the fill material should be comparable to that of Cu so that the
structure remains relatively planar throughout polishing. Therefore, a polishing experiment
was performed to determine k_p,SU-8. A 200-mm Cu wafer coated with SU-8 photoresist was
polished on the face-up CMP tool with a blocked perforated TWI-817 CMP pad and iCue 5001
slurry. The process parameters are listed in Table B.4. The time required to remove the SU-8
from the central region was recorded and k_p,SU-8 was found to be 1.9 x 10^-13 1/Pa using Eq. (2.42).
This value is about 40% of k_p,Cu from the Cu polishing experiments, 5.0 x 10^-13 1/Pa. Considering
the variation in k_p, these values are within the same order of magnitude and therefore SU-8
photoresist is a suitable fill material for Cu CMP. It may be beneficial to find a material with a
k_p even closer to k_p,Cu during process optimization to reduce polishing time and
non-uniformity.
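Eq. (2.42) is not reproduced in this appendix, but the reported value is consistent with a direct application of the Preston law, MRR = k_p p v_R, to the wafer center. The sketch below rests on two assumptions that are not stated here: the wafer center stays in continuous contact with the non-translating pad (r_cc < r_p), and, with equal angular velocities, the relative velocity equals ω r_cc. All variable names are illustrative.

```python
# Values transcribed from Table B.4.
h_su8 = 2.2e-6     # SU-8 coating thickness removed at the wafer center (m)
pressure = 13e3    # applied pressure p (Pa)
omega = 16.0       # angular velocity of both the wafer and the pad (rad/s)
r_cc = 30e-3       # wafer-center to pad-center distance (m)
dt_o = 31 * 60.0   # time to clear the center of the wafer (s)

# Assumption: equal wafer and pad angular velocities give a uniform relative
# speed of omega * r_cc (rigid-body kinematics).
v_rel = omega * r_cc  # m/s

# Preston law, MRR = k_p * p * v_R, solved for k_p with MRR = h / dt,
# assuming continuous contact at the wafer center.
k_p_su8 = h_su8 / (pressure * v_rel * dt_o)  # 1/Pa

k_p_cu = 5.0e-13  # Preston constant for Cu reported in the text (1/Pa)
print(f"k_p,SU-8 = {k_p_su8:.2e} 1/Pa")
print(f"k_p,SU-8 / k_p,Cu = {k_p_su8 / k_p_cu:.2f}")
```

Under these assumptions the estimate agrees with the reported 1.9 x 10^-13 1/Pa to within rounding.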
It was also observed that there was delamination at various points on the wafer during the
polishing of resists thicker than 2 µm. This result is likely due to the thickness of the film, which
can affect exposure and curing. Therefore, the optimization of resist thickness for planarity and
removability should be further studied.
B.5 Summary
A method of spin coating a polymer thin film on the wafer surface to improve feature-scale
uniformity has been proposed. SU-8, an epoxy-based photoresist, was used as the fill
material for creating a planar initial polishing surface. First, experiments were performed to
control the coating thickness by changing the resist viscosity. The results agreed with a
hydrodynamic model for fluid film thickness on a rotating disk to within 9% for the thinnest
resist and 44% for the thickest. To determine step coverage,
SU-8 was spin coated on 200-mm patterned wafers to various thicknesses. Profiles of the
features with linewidths ranging from 5 µm to 100 µm were obtained using a surface
profilometer. Comparison of the feature topographies before and after spin coating shows an
improvement in planarity. The feature step-heights were reduced by 83 to 98%. Finally, the
coated wafer was polished on a face-up CMP tool. The Preston constant of the SU-8 was found
to be 1.9 x 10^-13 1/Pa, which is on the same order of magnitude as that of Cu, 5.0 x 10^-13 1/Pa.
SU-8 is therefore an appropriate fill material for Cu CMP. However, delamination was observed
at various regions across the wafer during the polishing of resists thicker than 2 µm. Further
study is recommended to optimize the coating thickness for planarity and removability.
Table B.4: CMP conditions for determining the Preston constant of SU-8 photoresist.
Parameter                      Value
h_SU-8 (µm)                    2.2
r_w (mm)                       100
r_w,effective (mm)             50
Pad type                       TWI-817
r_p (mm)                       35
r_cc (mm)                      30
Slurry                         Cabot iCue 5001
Slurry additive                H2O2, 3% vol
pH                             8
ω_w (rad/s) (rpm)              16 (155)
ω_p (rad/s) (rpm)              16 (155)
v_cc (m/s)                     0
p (kPa) (psi)                  13 (1.9)
Slurry flow rate (ml/min)      150
Δt_o (min)                     31
Nomenclature
h = spin coating film thickness (m)
h_0 = initial film thickness prior to spin coating (m)
h_Cu, h_SU-8 = initial Cu coating thickness, SU-8 coating thickness (m)
h_s = step-height (m)
h_si = initial feature step-height (m)
J = photoresist exposure source flux (J/m^2/s)
k_p,Cu, k_p,SU-8 = Preston constant for Cu, SU-8 (m^2/N)
p = pressure (N/m^2)
r_cc = wafer center to pad center distance (m), normalized value
r_w, r_p = wafer radius, pad radius (m)
T = temperature (°C)
t = spin coating time (s)
V = volume of photoresist solution dispensed for spin coating (m^3)
Δt_o = time for center of the wafer to be completely polished (s)
v_cc = translational velocity of the pad; rate of change of r_cc (m/s)
v_R = relative velocity of the wafer with respect to the pad (m/s)
w = feature linewidth (m)
α = feature-scale non-uniformity factor, Cu deposition factor
λ = pitch of the Cu interconnect lines (m)
λ_light = wavelength of photoresist exposure source (m)
ν = kinematic viscosity of the spin coating material (m^2/s)
τ = time constant (s)
ω = angular speed of wafer during spin coating (rad/s)
ω_w, ω_p = angular velocities of the wafer and the pad during CMP (rad/s)
References
G. Ahmadi and X. Xia, "A Model for Mechanical Wear and Abrasive Particle Adhesion During
the Chemical Mechanical Polishing Process," J. Electrochem. Soc., vol. 148, pp. G99-
G109, 2001.
K. D. Beyer, W. L. Guthrie, S. R. Makarewicz, E. Mendel, W. J. Patrick, K. A. Perry, W. A.
Pliskin, J. Riseman, P. M. Schaible, and C. L. Standley, "Chem-Mech Polishing Method
for Producing Coplanar Metal/Insulator Films on a Substrate," U.S. Patent 4 944 836,
Jul. 31, 1990.
O. G. Chekina, L. M. Keer, and H. Liang, "Wear-Contact Problems and Modeling of Chemical
Mechanical Polishing," J. Electrochem. Soc., vol. 145, pp. 2100-2106, 1998.
L. M. Cook, "Chemical Processes in Glass Polishing," J. Non-Cryst. Solids, vol. 120, pp. 152-
171, 1990.
J. Coppeta, C. Rogers, L. Racz, A. Philipossian, and F. B. Kaufman, "Investigating Slurry
Transport Beneath a Wafer During Chemical Mechanical Polishing Processes," J.
Electrochem. Soc., vol. 147, pp. 1903-1909, 2000.
L. Dellmann, S. Roth, C. Beuret, G. A. Racine, H. Lorenz, M. Despont, P. Renaud, P. Vettiger,
and N. F. de Rooij, "Fabrication process of high aspect ratio elastic structures for
piezoelectric motor applications," in Proc. Transducers 1997, Chicago, 1997, pp. 641-
644.
T. Du, D. Tamboli, V. Desai, and S. Seal, "Mechanism of Copper Removal during CMP in
Acidic H2O2 Slurry," J. Electrochem. Soc., vol. 151, pp. G230-G235, 2004.
L. Economikos, X. Wang, A. Sakamoto, P. Ong, M. Naujok, R. Knarr, L. Chen, Y. Moon, S.
Neo, J. Salfelder, A. Duboust, A. Manens, W. Lu, S. Shrauti, F. Liu, S. Tsai, and W.
Swart, "Integrated Electro-chemical Mechanical Planarization (Ecmp) for Future
Generation Device Technology," in Proc. IEEE International Interconnect Technology
Conference, San Francisco, CA, 2004, pp. 233-235.
A. G. Emslie, F. T. Bonner, and L. G. Peck, "Flow of a Viscous Liquid on a Rotating Disk," J.
Appl. Phys., vol. 29, pp. 858-862, 1958.
T. Eusner, "Nano-Scale Scratching in Chemical-Mechanical Polishing," S.M. Thesis, Dept. of
Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 2008.
G. Fu and A. Chandra, "A Model for Wafer Scale Variation of Removal Rate in Chemical
Mechanical Polishing Based on Elastic Pad Deformation," Journal of Electronic
Materials, vol. 30, pp. 400-408, 2001.
G. Fu and A. Chandra, "An Analytical Dishing and Step Height Reduction Model for Chemical
Mechanical Planarization (CMP)," IEEE Trans. Semicond. Manuf., vol. 12, pp. 477-485,
2003.
M.-N. Fu, S.-H. Liao, C.-C. Li, and P.-Y. Chang, "Slurry Transport During Chemical
Mechanical Polishing," Jpn. J. Appl. Phys., vol. 44, pp. 7843-7848, 2005.
J. Grossman and D. S. Herman, "A New Etchant for Thin Films of Tantalum and Tantalum
Compounds," J. Electrochem. Soc., vol. 116, p. 674, 1969.
S. Hoshino, Y. Kitade, N. Yoshida, and Y. Uda, "<0.1 psi Ultra Low Pressure CMP for
Copper/Ultra Low-k Films," in Proc. CMP-MIC, Marina Del Rey, CA, 2003, pp. 212-218.
ITRS, "The International Technology Roadmap for Semiconductors," www.itrs.net, 2007.
M. Kulawski, K. Henttinen, I. Suni, F. Weimar, and J. Mäkinen, "A Novel CMP Process on
Fixed Abrasive Pads for the Manufacturing of Highly Planar Thick Film SOI Substrates,"
in Proc. Mater. Res. Soc. Symp., San Francisco, CA, 2003.
J.-Y. Lai, N. Saka, and J.-H. Chun, "Evolution of Copper-Oxide Damascene Structures in
Chemical Mechanical Polishing," J. Electrochem. Soc., vol. 149, pp. G31-G50, 2002.
Y. Li, Microelectronic Applications of Chemical Mechanical Planarization. Hoboken, NJ: Wiley
Interscience, 2008.
C.-W. Liu, B.-T. Dai, W.-T. Tseng, and C.-F. Yeh, "Modeling of the Wear Mechanism During
Chemical-Mechanical Polishing," J. Electrochem. Soc., vol. 143, pp. 716-721, 1996.
H. Lorenz, M. Despont, N. Fahrni, N. LaBianca, P. Renaud, and P. Vettiger, "SU-8: a low-cost
negative resist for MEMS," J. Micromech. Microeng., vol. 7, pp. 121-124, 1997.
J. Luo and D. A. Dornfeld, "Material Removal Regions in Chemical Mechanical Planarization
for Submicron Integrated Circuit Fabrication: Coupling Effects of Slurry Chemicals,
Abrasive Size Distribution, and Wafer-Pad Contact Area," IEEE Trans. Semicond.
Manuf., vol. 16, pp. 45-56, 2003.
C. Mau, N. Saka, and J.-H. Chun, "Kinematical Effects on Slurry Flow and Material Removal
Rate in Face-up CMP," in Proc. CMP-MIC, Fremont, CA, 2007, pp. 445-452.
C. Mau, N. Saka, and J.-H. Chun, "An Algorithm for Pad Translation in Face-up CMP," in Proc.
CMP-MIC, Fremont, CA, 2008, pp. 99-106.
G. E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, vol. 38, pp.
114-117, 1965.
S. Mudhivarthi, P. B. Zantye, A. Kumar, A. Kumar, M. Beerbom, and R. Schlaf, "Effect of
Temperature on Tribological, Electrochemical, and Surface Properties During Copper
CMP," Electrochem. Solid-State Lett., vol. 8, pp. G241-G245, 2005.
G. P. Muldowney and D. B. James, "Characterization of CMP Pad Surface Texture and Pad-
Wafer Contact," in Proc. Mater. Res. Soc. Symp., Warrendale, PA, 2004.
K. Noh, "Modeling of Dielectric Erosion and Copper Dishing in Copper Chemical-Mechanical
Polishing," Ph.D. Dissertation, Dept. of Mechanical Engineering, Massachusetts Institute
of Technology, Cambridge, MA, 2005.
K. Noh, K. Kopanski, N. Saka, and J.-H. Chun, "The Effect of Pad Topography on Surface Non-
Uniformity in Cu CMP," in Proc. CMP-MIC, Fremont, CA, 2005, pp. 443-451.
K. Noh, N. Saka, and J.-H. Chun, "Effect of Slurry Selectivity on Dielectric Erosion and Cu
Dishing in Copper Chemical Mechanical Polishing," Annals of the CIRP, vol. 51, pp.
463-466, 2004.
K. Noh, N. Saka, and J.-H. Chun, "Control of the Multi-scale Non-uniformities in Cu CMP by
Face-up Polishing," in Proc. CMP-MIC, Fremont, CA, 2006, pp. 360-367.
E. Paul, J. Horn, Y. Li, and S. V. Babu, "A Model of Pad-Abrasive Interactions in Chemical
Mechanical Polishing," Electrochem. Solid-State Lett., vol. 10, pp. H131-H133, 2007.
L. Pauling, General Chemistry, 3d ed. San Francisco, CA: W. H. Freeman, 1970, pp. 274, 697-
704.
F. W. Preston, "The Theory and Design of Plate Glass Polishing Machines," J. Soc. Glass
Technol., vol. 11, pp. 214-256, 1927.
S. R. Runnels, "Feature-Scale Fluid-Based Erosion Modeling for Chemical-Mechanical
Polishing," J. Electrochem. Soc., vol. 141, pp. 1900-1904, 1994.
N. Saka and J.-H. Chun, "Face-up Chemical-Mechanical Polishing: Theory and Experiments," in
Proc. CMP-MIC, Fremont, CA, 2007, pp. 427-436.
N. Saka, J.-Y. Lai, J.-H. Chun, and N.-P. Suh, "Mechanisms of the Chemical Mechanical
Polishing (CMP) Process in Integrated Circuit Fabrication," Annals of the CIRP, vol. 50,
pp. 233-238, 2001.
A. Simpson, L. Economikos, F.-F. Jamin, and A. Ticknor, "Fixed Abrasive Technology for STI
CMP on a Web Format Tool," in Proc. Mater. Res. Soc. Symp., San Francisco, CA, 2001.
Z. Stavreva, D. Zeidler, M. Plötner, G. Grasshoff, and K. Drescher, "Chemical-Mechanical
Polishing of Copper for Interconnect Formation," Microelectron. Eng., vol. 33, pp. 249-
257, 1997.
J. M. Steigerwald, S. P. Murarka, and R. J. Gutmann, Chemical Mechanical Planarization of
Microelectronic Materials. New York, NY: Wiley Interscience, 1997.
J. M. Steigerwald, R. Zirpoli, S. P. Murarka, D. Price, and R. J. Gutmann, "Pattern Geometry
Effects in the Chemical-Mechanical Polishing of Inlaid Copper Structures," J.
Electrochem. Soc., vol. 141, pp. 2842-2848, 1994.
D. G. Thakurta, C. L. Borst, D. W. Schwendeman, R. J. Gutmann, and W. N. Gill, "Three-
Dimensional Chemical Mechanical Planarization Slurry Flow Model Based on
Lubrication Theory," J. Electrochem. Soc., vol. 148, pp. G207-G214, 2001.
J. Warnock, "A Two-Dimensional Process Model for Chemimechanical Polish Planarization," J.
Electrochem. Soc., vol. 138, pp. 2398-2402, 1991.
