Physics-Based Electromigration Modeling and Analysis and Optimization by Sun, Zeyu
UC Riverside
UC Riverside Electronic Theses and Dissertations
Title
Physics-Based Electromigration Modeling and Analysis and Optimization
Permalink
https://escholarship.org/uc/item/7tw2x57k
Author
Sun, Zeyu
Publication Date
2020
 
Peer reviewed|Thesis/dissertation
eScholarship.org Powered by the California Digital Library
University of California
UNIVERSITY OF CALIFORNIA
RIVERSIDE
Physics-Based Electromigration Modeling and Analysis and Optimization
A Dissertation submitted in partial satisfaction
of the requirements for the degree of
Doctor of Philosophy
in
Electrical Engineering
by
Zeyu Sun
March 2020
Dissertation Committee:
Professor Sheldon X.-D. Tan, Chairperson
Professor Daniel Wong
Professor Kambiz Vafai
Copyright by
Zeyu Sun
2020
The Dissertation of Zeyu Sun is approved:
Committee Chairperson
University of California, Riverside
Acknowledgments
First and foremost, I would like to express the deepest appreciation to my adviser
Dr. Sheldon X.-D. Tan for his constant guidance and tireless encouragement that help me
successfully complete this work.
I would also like to thank all the committee members Prof. Daniel Wong and
Prof. Kambiz Vafai for their beneficial direction, discussion and advises for improving my
research.
Also, I would like to extend my special thanks to all my lab members and friends
who have helped me and encouraged me during my study, research and life at UC Riverside:
Chase Cook, Han Zhou, Hengyang Zhao,Sheriff Sadiqbatcha, Shuyuan Yu, Taeyoung Kim,
Yibo Liu.
Last but not least, I would like to thank my family for their support and unwavering
love.
iv
ABSTRACT OF THE DISSERTATION
Physics-Based Electromigration Modeling and Analysis and Optimization
by
Zeyu Sun
Doctor of Philosophy, Graduate Program in Electrical Engineering
University of California, Riverside, March 2020
Professor Sheldon X.-D. Tan, Chairperson
Long-term reliability is a major concern in modern VLSI design. Literature has shown that
reliability gets worse as technology advances. It is expected that the future VLSI systems
would have shorter reliability-induced lifetime comparing with previous generations. Being
one of the most serious reliability effects, electromigration (EM) is a physical phenomenon
of the migration of metal atoms due to the momentum exchange between atoms and the
conducting electrons. It can cause wire resistance change or open circuit and result in
functional failure of the circuit. Power-ground networks are the most vulnerable part to
EM effect among all the interconnect wires since the current flow on this part is the largest
on the chip. With new generation of technology node and aggressive design strategies,
more accurate and efficient EM models are required. However, traditional EM approaches
are very conservative and cannot meet current aggressive design strategies. Besides circuit
level, EM also need to be thoroughly studied in system level due to limited power and
temperature budgets among cores on chip.
v
This research focuses on developing physical level EM model for VLSI circuits
and system level EM optimization for multi-core systems in order to overcome the afore-
mentioned problems. Specifically, for physical level, we develop two EM immortality check
methods and a power grid EM check method. Firstly, a voltage based EM immortality
analysis has been developed. Immortality condition in nucleation phase can be determined
fast and accurately for multi-segment interconnect wires. Secondly, a saturation volume
based incubation phase immortality check method has been proposed. This method can
further reduce the redundancy in VLSI circuit design by immortality check in multiphase.
Furthermore, both immortality check methods are integrated into a new power grid EM
check methodology (EMspice) as filter for EM analysis. These filters can accelerate the sim-
ulation by filtering out immortal trees so that we only need to do simulation on fewer trees
that are mortal. Coupled EM simulation considering both hydrostatic stress and electronic
current/voltage in the power grid network will be applied to these mortal trees. This tool
can work seamlessly with commercial synthesis flow.
Besides physical level reliability models, system level reliability optimization is
also discussed in this research. A deep reinforcement learning based EM optimization
has been proposed for multi-core system. Both long term reliability effect (hard error) and
transient soft error are considered. Energy can be optimized with all the reliability and other
constraints fast and accurately compared to existing reliability management techniques.
Last but not least, a scheduling based reliability optimization method for multi-core systems
has been proposed. NBTI, HCI and EM are considered jointly. Lifetime of the system can be
improved significantly compared to traditional methods which mainly focus on utilization.
vi
Contents
List of Figures x
List of Tables xiii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Electromigration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Dissertation contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Review of three phase based EM physics and stress modeling 11
2.1 Review of three phase based EM physics and stress modeling . . . . . . . . 11
2.1.1 Nucleation phase modeling . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Incubation phase modeling . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Growth phase modeling . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Fast Voltage Based Electromigration Immortality Analysis for Multi-
Segment Copper Interconnect Wires 18
3.1 Voltage-based EM stress estimation . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1 Steady-state EM-induced stress modeling . . . . . . . . . . . . . . . 19
3.1.2 New voltage-based analysis for steady-state EM stress . . . . . . . . 20
3.1.3 General and key VBEM equations . . . . . . . . . . . . . . . . . . . 24
3.1.4 Relationship to the Blech limit . . . . . . . . . . . . . . . . . . . . . 25
3.1.5 Steady-state analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.6 Study of some special cases . . . . . . . . . . . . . . . . . . . . . . . 32
3.1.7 Current crowding impact analysis . . . . . . . . . . . . . . . . . . . . 39
3.1.8 Application to mesh-structured interconnect wires . . . . . . . . . . 44
3.1.9 Application to IBM power grids . . . . . . . . . . . . . . . . . . . . . 46
3.2 Numerical validation results and discussions . . . . . . . . . . . . . . . . . . 48
3.2.1 Results for straight-line 3-terminal interconnects . . . . . . . . . . . 48
3.2.2 Results for T-shaped 4-terminal interconnect . . . . . . . . . . . . . 50
vii
3.2.3 Results for comb structure interconnects . . . . . . . . . . . . . . . . 51
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Saturation Volume Estimation For Multi-Segment Copper Interconnect
Wires 54
4.1 The new void saturation volume estimation for general multi-segment wire . 55
4.1.1 Review of the void saturation volume for single segment . . . . . . . 55
4.1.2 Proposed void saturation volume for multi-segment wires . . . . . . 59
4.2 New EM immortality check algorithm . . . . . . . . . . . . . . . . . . . . . 67
4.3 Numerical results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3.1 Model validation on two-segment wire . . . . . . . . . . . . . . . . . 69
4.3.2 Model validation on a T-shaped wire . . . . . . . . . . . . . . . . . . 71
4.3.3 Model validation on a more complicated wire . . . . . . . . . . . . . 72
4.3.4 The results from the new EM immortality check flow . . . . . . . . . 73
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5 EMSpice: Physics-Based Electromigration Check Using Coupled Elec-
tronic and Stress Simulation 76
5.1 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 The proposed new coupled EMSpice simulation for EM failure check . . . . 79
5.2.1 The new power grid EM check flow . . . . . . . . . . . . . . . . . . . 79
5.2.2 Physics-based coupled multi-physics power grid simulation . . . . . 80
5.2.3 EM immortality filtering . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Numerical results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 89
5.3.1 Accuracy comparison against existing EM analysis methods . . . . . 90
5.3.2 Filtering and coupled simulation results of Cortex . . . . . . . . . . 93
5.3.3 Filtering and coupled simulation results of ChipTop . . . . . . . . . 97
5.3.4 Comparison against existing full-chip EM analysis methods . . . . . 101
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6 Dynamic Reliability Management for Multi-Core Dark Silicon Processor
Based on Deep Reinforcement Learning 104
6.1 Review of the EM and soft error reliability models . . . . . . . . . . . . . . 105
6.1.1 Soft error reliability model considering DVFS impacts . . . . . . . . 106
6.2 DRL framework for Dark silicon optimization . . . . . . . . . . . . . . . . . 107
6.2.1 Background of deep reinforcement learning . . . . . . . . . . . . . . 107
6.2.2 Deep reinforcement learning based optimization framework . . . . . 109
6.3 Numerical result and discussions . . . . . . . . . . . . . . . . . . . . . . . . 113
6.3.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.3.2 Memory Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.3.3 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.4 Steps to converge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
viii
7 Resource Allocation for Long-term Reliability Enhancement of Real-time
Embedded Systems 119
7.1 Background on Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.1 Device-level to System-level Models . . . . . . . . . . . . . . . . . . 120
7.1.2 Unified system-level reliability model . . . . . . . . . . . . . . . . . . 125
7.1.3 Review of current scheduling techniques considering reliability . . . 127
7.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.3 Reliability-aware Real-Time Resource Allocation . . . . . . . . . . . . . . . 130
7.3.1 Motivational experiments and key observations . . . . . . . . . . . . 131
7.3.2 Proposed algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3.3 Applicability to sporadic tasks and fixed-priority scheduling . . . . . 136
7.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.4.1 Taskset Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8 Conclusion 146
8.1 Voltage based EM immortality check . . . . . . . . . . . . . . . . . . . . . . 146
8.2 Saturation volume based EM immortality check . . . . . . . . . . . . . . . . 147
8.3 Coupled Electronic and Stress Simulation . . . . . . . . . . . . . . . . . . . 148
8.4 Deep Reinforcement based reliability management . . . . . . . . . . . . . . 149
8.5 Resource allocation based reliability management . . . . . . . . . . . . . . . 149
Bibliography 151
ix
List of Figures
1.1 Evolution of current densities: Jmax, the maximum equivalent DC current
density and JEM , the current density for targeted lifetime [1] . . . . . . . . 2
2.1 EM-stress evolution of a two segment wire in the (a)nucleation phase (b)
incubation and growth phase. . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Side-view of void formation: (a) void in a via-above line (early failure mode);
(b) void in a via-below line (later failure mode). . . . . . . . . . . . . . . . . 16
3.1 A three-terminal wire, with the direction indicating electron flow. . . . . . . 20
3.2 (a) A 3-terminal wire with inactive (passive) sink, with the cathode at node
2. (b) The steady-state stress distribution of a 3-terminal wire with inactive
(passive) sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Different discretization steps (sizes) lead to the same steady-state stress. . . 30
3.4 Interconnect examples for EM analysis for a straight-line 3-terminal wire. . 32
3.5 (a) 2-D stress distribution on wire at steady state for passive reservoir (b)
EM stress versus length at steady state. . . . . . . . . . . . . . . . . . . . . 34
3.6 (a) 2-D stress distribution on wire at steady state for passive sink (b) EM
stress versus length at steady state. . . . . . . . . . . . . . . . . . . . . . . . 34
3.7 Interconnect examples for EM analysis for T-shaped 4-terminal wire. . . . . 35
3.8 (a) Illustration of T-junction interconnect with directed graph inserted to
indicate electron flow. (b) Stress at the cathode end increases linearly as the
stub is placed further away from the cathode end. . . . . . . . . . . . . . . 36
3.9 Comb structure interconnect examples for EM analysis. . . . . . . . . . . . 37
3.10 (a) EM stress validation for each comb structure interconnect with changing
LB. (b) EM stress validation for each comb structure interconnect with
changing LF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.11 L-shaped wire structure with 3 nodes. . . . . . . . . . . . . . . . . . . . . . 41
3.12 L-shaped wire structure with current-crowding effects. . . . . . . . . . . . . 42
3.13 U-shaped wire structure with 4 nodes. . . . . . . . . . . . . . . . . . . . . . 43
3.14 U-shaped wire structure with current-crowding effects. . . . . . . . . . . . . 43
3.15 A 4× 4 mesh-structured wire. . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.16 A small portion of a typical power supply network [2]. . . . . . . . . . . . . 46
x
3.17 Steady-state EM stress comparisons for each straight-line 3-terminal inter-
connect case (x: case number, y: EM stress at the node 0 (cathode node)). 49
3.18 EM stress validations for each T-shaped 4-terminal interconnect case (x: case
number, y: EM stress at the node 0 (cathode node)). . . . . . . . . . . . . . 50
3.19 EM stress validations for each comb structure interconnect case (x: number
of fingers, y: EM stress at the node 0 (cathode node)) . . . . . . . . . . . . 52
4.1 The typical stress evolution on a 30 µm copper wire computed by finite
element analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 (a) A one-segment wire and the direction indicate electron flow (b) Stress
integration area of a one-segment wire . . . . . . . . . . . . . . . . . . . . . 57
4.3 (a) A two-segment wire and the direction indicate electron flow (b) Stress
integration area of a two-segment wire . . . . . . . . . . . . . . . . . . . . . 61
4.4 Stress distribution for two-segment wire at steady state . . . . . . . . . . . 64
4.5 A T-shaped wire (Arrows indicate electron flow) . . . . . . . . . . . . . . . 65
4.6 (a) Stress on horizontal segment 0-2; (b) Stress on vertical segment 1-3 . . . 66
4.7 EM immortality check algorithm flow . . . . . . . . . . . . . . . . . . . . . 68
4.8 (a) Case 1; (b) Case 2; (c) Case 3; (d) Case 4 . . . . . . . . . . . . . . . . . 71
4.9 Complex multi-segment structure. . . . . . . . . . . . . . . . . . . . . . . . 72
4.10 (a) Complex multi-segment structure result from FEM analysis tool. (b)
Result zoom to void area . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1 Simulation Framework for EMSpice simulator . . . . . . . . . . . . . . . . . 80
5.2 Block diagram of coupled FDTD and IR drop power grid simulator . . . . . 80
5.3 Block diagram of EM immortality filter . . . . . . . . . . . . . . . . . . . . 85
5.4 (a) Part of power grid of Cortex in Synopsys ICC; (b) Part of power grid of
Cortex in proposed EM GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.5 Stress comparison between the EMSpice and Xsim on a wire from Cortex . 91
5.6 Void size comparison between EMSpice and COMSOL on a wire from Cortex
design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.7 Current densities of the mortal trees. (X and Y in um and current densities
in MA/cm2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.8 (a) Stress distribution and void formation at 8th year; (b) 12th year; (c) 20th
year for Cortex design. (X and Y in um and stress in Mpa) . . . . . . . . . 94
5.9 (a) Zoomed stress distribution and void formation at 8th year, (b) 12th year,
(c) 20th year of Cortex design. . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.10 Voltage drop on the power grid of Cortex . . . . . . . . . . . . . . . . . . . 95
5.11 Resistance change over time on a mortal tree on power grid of Cortex . . . 95
5.12 Voltage drop change over time for different nodes on power grid of Cortex . 96
5.13 Resistance change over time of failed interconnect tree wires on power grid
of Cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.14 Current densities of the mortal trees on power grid of ChipTop. (X and Y in
um and current densities in MA/cm2) . . . . . . . . . . . . . . . . . . . . . 98
5.15 (a) Stress distribution and void formation on power grid of ChipTop at 8th
year; (b) 10th year; (c) 17th year. (X and Y in um and stress in Mpa) . . . 98
xi
5.16 Void distribution on the ChipTop at 17th year on (a), left up corner(b) right
up corner, (c) left down corner, (d) right down corner . . . . . . . . . . . . 99
5.17 Voltage drop on the power grid of ChipTop . . . . . . . . . . . . . . . . . . 99
5.18 Resistance change of mortal wires on the power grid of ChipTop . . . . . . . 100
5.19 Cathode voltage drop of mortal wires on the power grid of ChipTop . . . . 100
6.1 The proposed deep learning frame work for reliability management flow . . 111
6.2 The evaluation platform for dark silicon based on DRL method . . . . . . . 112
6.3 (a) Memory used of q-learning (b) Memory used of DRL . . . . . . . . . . . 115
6.4 (a) Energy consumption with loose power, performance, temperature and
reliability bound (b) Energy consumption with tight power, performance,
temperature and reliability bound . . . . . . . . . . . . . . . . . . . . . . . 116
6.5 (a) Time to converge of DRL method (b) Time to converge of Q-learning
method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.1 (a) Balanced utilization with short lifetime. (b) Unbalanced utilization with
long lifetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.2 (a) MTTF at different frequencies for task utilization of 0.5. (b) MTTF at 1
GHz for different task utilizations . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3 (a) MTTF for all workloads at different frequencies with 0.5 utilization. (b)
MTTF for all workloads at 1 GHz with different utilizations . . . . . . . . . 138
7.4 (a) Lifetime on different frequency and utilization of benchmark fft. (b)
Lifetime on different frequency and utilization of benchmark barnes. . . . . 139
7.5 (a) Lifetime of different effects for benchmark fft at 1GHz. (b) Lifetime of
different effects for benchmark barnes at 1GHz. . . . . . . . . . . . . . . . 139
7.6 MTTF for a different number of cores (NP ) with n tasks randomly chosen
from [NP , 4NP ] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.7 (a) MTTF for a different number of tasks from 8 to 32 with 8 cores. (b)MTTF
for a different number of tasks from 64 to 256 with 64 cores. . . . . . . . . . 142
7.8 (a) MTTF for different percentage of low-MTTF tasks with 8 cores and 20
tasks. (b) MTTF for different percentage of low-MTTF tasks with 64 cores
and 160 tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
xii
List of Tables
3.1 The computed poles for the 3-terminal wire . . . . . . . . . . . . . . . . . . 31
3.2 EM stress calculated with and without current-crowding effects . . . . . . . 42
3.3 Stress values at each node for the U-shaped structure . . . . . . . . . . . . 43
3.4 Stress condition for mesh structure . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 property of IBM benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 Parameters for each straight-line 3-terminal interconnect case . . . . . . . . 49
3.7 Parameters for each T-shaped 4-terminal interconnect case (l = µm, w = µm,
and j =MA/cm2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.8 EM stress validations for comb structure interconnect cases. . . . . . . . . . 51
4.1 Comparison of void area of two methods (wire width = 1um thickness =
0.3um) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Comparison of void area of two methods (wire thickness = 0.1um) . . . . . 71
4.3 Parameters for the 13-segment interconnect wire . . . . . . . . . . . . . . . 73
4.4 Comparison of void area of two methods (wire width = 1um thickness =
0.3um) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Void size comparison between EMSpice (FDTD) and COMSOL on a wire
from Cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Parameter for the EM simulation . . . . . . . . . . . . . . . . . . . . . . . . 92
5.3 Comparison of failed tree number of three methods on the power grid of Cortex 98
5.4 Lifetime comparison of the three full-chip EM analysis methods on the power
grid of Cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.5 Comparison of failed tree number of three methods on the power grid of
ChipTop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6 Lifetime comparison of the three full-chip EM analysis methods on power
grid of ChipTop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.1 Example task parameters for Fig. 7.1 . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Base parameters for taskset generation . . . . . . . . . . . . . . . . . . . . . 141
7.3 statistics of lifetime of a system with 64 cores . . . . . . . . . . . . . . . . . 144
xiii
Chapter 1
Introduction
1.1 Motivation
1.1.1 Electromigration
Reliability has become a major design challenge and limiting factor for nanometer
VLSI designs. As VLSI technology features are pushed to the limit with each successive
generation and with the introduction of new materials and increased current densities to
satisfy performance demands. As can be seen in the prediction of International Technology
Roadmap for Semiconductor (ITRS) [1], current densities keep increasing and will result
serious reliability issues. Among these issues, Electromigration (EM) is projected to be a key
reliability issue for current and future VLSI technologies. As technology advances into the
sub-10nm regime with FinFET devices, it has been predicted that EM failure will become
more significant for interconnects due to increased current density and elevated heating.
The EM assessment and verification methods are essential in the design of VLSI circuits to
1
ensure interconnect circuits especially wires in power grid in do not have functional failure
and can function will during its lifetime.
Figure 1.1: Evolution of current densities: Jmax, the maximum equivalent DC current
density and JEM , the current density for targeted lifetime [1]
Accurately predicting the EM-induced time-to-failure for full chip is still a chal-
lenging task. Most currently employed EM models are very simple empirical models. They
are very conservative and may cause too much redundant in chip chip design. These re-
dundant caused by conservative overdesign and will have extra area, power, and energy
costs [3]. Among all these empirical models, Black’s equation [4] and Blech limit [5] are
the most commonly used. Black’s equation use curve fitting method to get the model. In
this model, EM induced lifetime only depend on temperature and current densities. This
model is very conservative since they do not consider physical meaning of EM and only
use experiment data for curve fitting. Also, only single segment with fixed wire width and
current density is considered. However, in most of VLSI designs wires are multi-segment
2
with different current densities. Blech limit is an immortality check method. If the product
of current density and length is smaller than a certain critical level, the wire can be consid-
ered as immortal. It is generally employed as immortality check method in most designs.
However, it also only considers single segment wire and not suitable for modern VLSI de-
sign. Besides, all these models only focus on device level EM analysis. System level EM
checking method is also required for the high level Mean time to failure (MTTF) check and
optimization. In order to overcome the drawbacks of empirical of EM models, many physics
based EM models are proposed. Among them one of the most famous model is Korhonen’s
model [6]. Transient hydrostatic stress described by Korhonen’s partial differential equation
(PDE) is used to simulate EM effect. However, it is not easy to solve the PDE directly
in EDA tools. So, many models and methods are developed to integrate the EM analysis
model. A closed form equation derived from Korhonen’s equation is proposed in [7, 8].
They get the exact analytic solution of the Korhonen’s equation and assume some terms
are dominant to calculate EM induced mean time to failure (MTTF). And that method
is further improved in [9] with more proper assumption. Finite difference method (FDM)
and finite element method (FEM) are also employed to resolve EM problems. For example,
[10,11] use FDM to fast solve the void nucleation and growth process and [12] focus on the
post voiding analysis of EM. Comparing with Korhonen’s equation, these methods are fast
and can be easily integrated to the EDA tools. However, it still takes a long time to do EM
analysis with these tools since there are numerous of wires on chip. For these wires, only
some of them will fail due to EM effect. So an efficient filter system is required. Also , it is
important to combine the EM model with commercial EDA tools.
3
System level reliability issues also becomes more and more serious with more ad-
vanced technology nodes. Due to increasing number of cores and high power density, re-
liability management methodologies considering reliability issues are essential for lifetime
optimization on chips. For system level, reliability issues mainly consists of long term device
reliability effects (hard error) and short term transient error of signal (soft error). Hard
error such as Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI),
and EM, may adversely affect such systems as they decrease allowable executing speed,
thereby possibly resulting in unexpected deadline misses. Soft error generally will result
to wrong signal or datum and finally result to change instruction or data value. In order
to improve the lifetime on system level, many reliability management methodologies are
proposed. Work in [13] considers the negative-bias temperature instability (NBTI) and
time-dependent dielectric breakdown (TDDB), while [14] mainly focuses on mitigation of
the thermal cycling induced reliability at the operation system level. All of those methods
did not impacts of soft-errors. Recently, dynamic reliability management for multi-core
dark silicons have been proposed [15, 16]. In this work, both EM-based hard errors and
soft-errors have been considered. Due to conflicting requirements from both hard and soft
errors, the author employs Q-learning as the management method since it can provide cost
effective solutions so that the optimal states can be selected to reduce the system energy.
However, they are not efficiency enough and lack of online optimization steady. And some
scheduling based methods are proposed. [17] uses a simple and general reliability model
which cannot represent MTTF in an accurate manner. [18] uses an EM model and focuses
on a single-core system. Hence, multi-core systems that are widely used these days cannot
4
be analyzed and the consideration of only one reliability effect also limits the practicality
of this work. [19], which considers multiple reliability effects including EM, time-dependent
dielectric breakdown (TDDB), stress migration (SM), and thermal cycling (TC). The work
uses temperature to represent such effects, with an observation that they have a strong
exponent dependence on temperature. However, temperature is not the only factor that
impacts the failure mechanisms. Other factors such as power and operation frequency are
also related to these reliability issues; hence, more specific physics-based models are required
for the faithful analysis of reliability in modern multi-core systems.
1.2 Dissertation contributions
This dissertation presents advanced technology in the physics-based modeling and
system level application of EM. The major contributions of this dissertation are summarized
as follows:
• For the voltage based EM (VBEM) analysis we present a novel and fast EM immortal-
ity check for general multi-segment interconnect wires. Instead of using current density
as the key parameter, the new method estimates the EM-induced steady-state stress
in general multi-segment copper interconnect wires based on a novel parameter, Crit-
ical EM Voltage, VCrit,EM . We show that the VCrit,EM is essentially the natural, but
important, extension of the Blech limit concept, which describes the EM immortality
condition for a single segment wire, to more general multi-segment interconnect wires.
The proposed method, called voltage-based EM or VBEMmethod, mitigates the prob-
lem of current-density-based EM criteria, which can only be applied to a single wire.
5
The new VBEM method can naturally comprehend the impact of the topology of the
wire structure on EM-induced stress. As a result, this new VBEM analysis method is
very amenable to addressing EM violations, as it brings new optimization capabilities
to the physical design flow. The VBEM stress estimation method is based on the
fundamental steady-state stress equations. This approach avoids computationally-
intensive numerical methods and can be implemented in CAD tools very easily, as
we demonstrate on real design examples. We also show that the proposed VBEM
analysis method agrees with results from the finite difference method in the steady
state through one example and also agrees with one published closed-form expression
of steady-state stress for a special 3-terminal wire case. We also study the impact of
current crowding in practical interconnect wires on the estimated steady-state stress,
which are shown to be not significant if the length of the wire is much greater than its
width. An extension of the VBEM method to consider the significant current crowd-
ing effects is also shown and additionally, we analyze mesh-structured interconnect
wires and demonstrate that the proposed VBEM method is correct and accurate on
such structures.
• For the saturation volume based EM analysis, a new formula of the void’s saturation
volume for general multi-segment interconnect wires is presented. The new formula
is based on the fundamental atom conservation at the steady-state condition of void
growth phases. The new void saturation estimation formula agrees with the existing
single segment wire saturation void volume formula and is an natural extension of the
single segment case to general multi-segment wires. In addition, we consider impacts of
6
the void volume on final stress distributions of the wire to further improve the accuracy
of the proposed formula. The proposed saturation void volume estimation can be
applied for fast EM immortality check for nucleated wires, which were considered to
be failed wires in the past. Combined with the recently proposed EM immortality
check for nucleation [20, 21], we propose a new EM immortality check algorithm,
which considers both void nucleation and void growth for the first time and thus
overcoming the conservativeness of the existing EM immortality check method. Our
numerical results show that the proposed formula agrees well with a published work
for two-segment cases, which are supported by experimental data. The formula is also
validated by the recently proposed physics-based 3D finite element (FEM) analysis
tool for general multi-segment interconnect wires. We also demonstrate new EM
immortality check flow can quickly identify the new type of immortal wires, which are
nucleated but with smaller-than-critical voids.
• For power grid level EM analysis, we propose a new full-chip EM failure analysis
tool, called EMSpice, for physics-based coupled EM and electronic analysis. EMSpice
takes power grid netlists from Synopsys ICC flow, and outputs the failed EM wires
and their resistance changes and resulting IR drops of the power grids over the given
aging time. It can be used as a EM sign-off tool or used in the EM optimization
engine for commercial EDA physical synthesis flow. EM spice can simultaneously
considers the hydrostatic stress and electronic current/voltage in a power grid net-
work. The new method starts from first principles and can solve the resulting coupled
time-varying partial differential equations in time domain to accurately stress evolu-
7
tion in multi-segment interconnect trees for EM failure analysis. EMSpice simulator
employs a finite difference time domain (FDTD) solver for stress analysis for every
interconnect tree for both nucleation and post-voiding phases. EMSpice simulator
also couples the EM analysis with IR drop analysis at time domain. Thus the solver
can consider the interplay among stress, void growth, resistance change and IR drop
in a single simulation framework. Furthermore, EMSpice simulator considers both
recently proposed nucleation phase immortality and incubation phase immortality for
the first time to remove immortal interconnect trees from EM analysis.
• For machine learning based reliability optimization, we propose a new deep reinforce-
ment learning optimization frame work considering both hard and soft errors for dark
silicon processors. Our work is motivated by the lack of online optimization tools and
low efficiency of traditional Q-learning algorithm in run time operation. We formulate
our DRM problem as minimizing the energy consumption subject to the reliability
include hard and soft errors, power, performance and thermal constraints. The hard
error model is based on a recently proposed physics based EM model. Hard and
soft errors are obtained by recently proposed reliability models. Power, performance
are acquired by X86 based micro-architecture model [22] and thermal information is
given by temperature simulators. The control operation we use dynamic voltage and
frequency scaling (DVFS) and ON/OFF pulsing action. DRL method is used to se-
lect the most cost effective operation. Experimental results show that the DRL-based
DRM method can lead to energy reduction much better than the Q-learning based
method and simple DVFS based method. Furthermore, the DRL-based method also
8
shows much smaller space complexity (linear versus the quadratic) and less steps to
converge than the Q-learning based method as well.
• For scheduling based reliability optimization, we propose a novel resource allocation
algorithm to improve the long-term reliability of multi-core real-time embedded sys-
tems. To the best of our knowledge, this work is the first to show the dissimilar-
ity between utilization and reliability and propose a reliability-aware task allocation.
Utilization is not the only factor related to reliability issue and other factors such as
power, temperature and frequency can also hurt the reliability. We combine multiple
system-level reliability models, including NBTI, HCI and EM, and apply these models
to obtain a trustworthy lifetime estimation of processor cores. Our work is motivated
by the fact that conventional allocation methods, such as worst-fit decreasing (WFD),
consider only the amount of workload (utilization), which may have a negative im-
pact on lifetime. We show that the proposed algorithm yields significantly benefits in
improving system-level lifetime over conventional methods, while guaranteeing timing
correctness of all real-time tasks in the system.
1.3 Organization
The rest of this dissertation is organized as follows. Chapter 2 provides the back-
ground knowledge about fundamentals of EM. It then focuses on Korhonen’s equation based
three phase modeling. Chapter 3 presents a voltage based fast EM immortality check for
general multi-segment interconnect wires. And a saturation volume based EM immortality
analysis is presented in Chapter 4. Chapter 5 presents a new full-chip EM failure analy-
9
sis tool using filter method and coupled simulation. Chapter 6 focus on machine learning
based reliability optimization. Chapter 7 presents scheduling based reliability optimization.
Chapter 8 concludes the dissertation.
10
Chapter 2
Review of three phase based EM
physics and stress modeling
2.1 Review of three phase based EM physics and stress mod-
eling
EM is a physical phenomenon of the migration of metal atoms along the direction
of the applied electrical field. Atoms (either lattice atoms or defects / impurities) migrate
along the trajectory of conducting electrons. Due to momentum exchange between lattice
atoms, hydrostatic stress is generated inside the embedded metal wire during migration
process. Before the hydrostatic stress reaches the critical level, the atomic flux flowing
caused by electron flow from cathode to anode can still balance with the atomic flux caused
by inhomogeneous distribution of hydrostatic stress. When the stress reaches the critical
level, a void and a hillock formation caused by conducting electrons will be formed at
11
the cathode end and the anode end. Indeed, when metal wire is passivated into a rigid
confinement, which is the case for copper dual damascene structures, the wire volume
changes (induced by the atom depletion and accumulation due to migration), and creates
tension at the cathode end and compression at the anode end of the line. However, if the
hydrostatic stress cannot reach critical level, the void will not be formed due to the balance
between atomic flux flowing caused by electron flow and inhomogeneous distribution of
hydrostatic stress.
A physics based three phase model is proposed in [9]. EM failure process in this
work is described by nucleation phase(tnuc), incubation phase(tinc) and growth phase(tgrowth).
In nucleation phase, stress start accumulate and a void will be formed in the end of nucle-
ation phase. The void start growth in incubation phase. In the end of incubation phase,
the cross section of via is blocked by void and can cause either early failure or late failure
of the wire. In late failure case, resistance start increase in the growth phase. TTF can be
described as the summation of three phases as following:
TTF = tlife = tnuc + tinc + tgrowth (2.1)
2.1.1 Nucleation phase modeling
Stress evolution is described by the stress-based model Korhonen’s model [6]. Tran-
sient hydrostatic stress σ(t) can be described by Korhonen’s partial differential equation
(PDE) with the following zero-flux boundary conditions (BC) and initial stress condition
(IC) as shown in the following:
12
∂σ(t)
∂t
= ∇ ·
(
DaB
kBT
(Ω∇σ − eZρj)
)
, in ΩL, (2.2)
∇σ(t) = eZρjNi
Ω
, on ∂ΩL ∩ ΓNi , i = 1, ..., k, (2.3)
σ(0) = [σ1(0), σ2(0), ..., σk(0)] , at t = 0 (2.4)
where Da = D0 exp(Ea/kBT ) is the effective atomic diffusivity. Ea is the activation energy
of the failure process, T is defined as absolute temperature, e is the electron charge, eZ is
the effective charge of the migrating atoms, ρ is the wire electrical resistivity, and j is the
current density. Ω is the atomic lattice volume. B is the effective bulk elasticity modulus.
kB is the Boltzmann’s constant. ΩL is the domain of simulated copper interconnect and
∂ΩL denotes its boundary. ΓNi is the ith flux termination boundary, where normal current
density jNi are prescribed, among kth termination boundaries. In Eq. (2.2), a uniform
D0, synthesized from different body and boundary diffusion coefficients, is used. Note that
Korhonen’s hydrostatic diffusion equations can be applied to 3D multi-segment interconnect
wires with multiple flux-termination boundary nodes, allowing any number of evolving voids
to be simulated. Fig 2.1(a) shows the stress evolution on a wire in the nucleation phase.
As shown in the figure, stress at the cathode increase as time goes on. While, stress at the
anode will decrease with time.
2.1.2 Incubation phase modeling
When stress reaches a critical level at tnuc, a void is formed. However, the resis-
tance of the interconnect remains almost the same since cross section of the via, which is
13
Length(um)
0 50 100
S
tr
es
s(
P
a)
×109
-2
-1
0
1
2
3
t=0
t=1e5(s)
t=2e5(s)
t=1e6(s)
t=2e6(s)
(a)
Length(um)
0 50 100
S
tr
es
s(
P
a)
×108
-10
-8
-6
-4
-2
0
2
4
t=1e4(s)
t=1e5(s)
t=3e5(s)
t=7e5(s)
t=1e6(s)
t=2e6(s)
(b)
Figure 2.1: EM-stress evolution of a two segment wire in the (a)nucleation phase (b) incu-
bation and growth phase.
recognized as critical volume, is not consumed by void. Once a void has nucleated when
stress reaches a critical level at tnuc, which is typically at or near a terminal node, the stress
at the void boundary will immediately go to zero. However, the stress around the void will
be close to the same stress level as immediately prior to the nucleation [6, 23]. As a result,
a very large stress gradient will be formed around the voids at nucleation time, which can
be described by [6, 24].
∇σ(t) = σ
δ
, on ∂ΩL ∩ Γvoid, at tnuc < t <∞ (2.5)
Γvoid denotes the void boundary, which can be a union of several discrete voids and δ is
the effective thickness of the copper-void boundary [23]. The ∇σ(t) will serve as a force to
move the void boundary toward the atomic flux direction. The void volume is determined
by the stress distribution on the remaining section of the wire. Specifically, the void volume
14
Vv(t) in a multi-segment wire will satisfy the following atom conservation equation [23,25].
Vv(t) = −
∫
ΩL
σ(t)
B
dV (2.6)
where ΩL is the volume of the remaining interconnect wire and V is the volume of the wire
and define as V = W × h× l, where W is width of wire and h is thickness of the wire and
l is the length of the wire. Notice that Eq. (2.6) is also true for the void growth phase we
discussed later. The incubation time can be defined as follows:
tnuc < t < ti where Vv(t) < Vcrit (2.7)
where Vcrit is the critical volume, which mainly depends on the cross section of via. In this
work, we assume the diameter of via is equal to the width of wire W . So the critical volume
can be written as W × h×W since the resistance start to change when void’s size reaches
this volume. Fig 2.1(b) shows the stress evolution on a wire in the incubation phase. Stress
at the cathode decrease to 0 very fast. Stress at anode keep decreasing until reach the
steady state. This will lead to growth of the void since the integration of negative stress
increase in Eq. (2.6). Void volume will saturate when the stress reach steady state.
Also, in the incubation phase, the void volume is not large enough to block the
cross section of the via and smaller than critical volume. At the end of incubation phase a
void will fill the cross section of the via. The void can cause either early failure or late failure
of the wire [26]. Early failure typically happens in via above lines as shown in Fig. 2.2(a).
The via will be blocked by the void and thus the connection to the upper layer will also be
15
blocked (capping layer is fabricated with dielectrics such as Si3N4 which does not conduct
current flow). In this case, the wire will fail in the end of incubation phase. Late failure
typically happens in via-below structures as shown in Fig. 2.2(b). When the void forms in
a via-below line and reaches critical size, current can still go through the barrier layer and
the resistance will increase in the growth phase.
(a)
(b)
Figure 2.2: Side-view of void formation: (a) void in a via-above line (early failure mode);
(b) void in a via-below line (later failure mode).
2.1.3 Growth phase modeling
In the void growth phase, the void continues to grow. The major difference between
growth phase and the incubation phase is that wire resistance starts to change (at least
meaningfully). The reason is that current has to flow over the highly resistive barrier which
is made with Ta/TaN for instance, whose resistivity is much higher than Cu. Specifically,
the wire resistance change can be approximately described as:
∆R(t) =
Vv(t)− Vcrit
WH
[
ρTa
hTa (2H +W )
− ρCu
HW
]
, at tinc < t (2.8)
16
where ρTa and ρCu are the resistivity of the barrier material (Ta/TaN) and copper, W is
the wire width, H is the copper thickness, and hTa is the barrier layer thickness.
2.2 Summary
This chapter presents a review of the three phase physics based EM model. Ko-
rhonen’s equation based stress evolution in nucleation phase and post void condition in
incubation phase and growth phase is presented. Besides, early failure and late failure
condition is also demonstrated in this chapter.
17
Chapter 3
Fast Voltage Based
Electromigration Immortality
Analysis for Multi-Segment
Copper Interconnect Wires
In this chapter, we present a novel and fast voltage based EM immortality check for
general multi-segment interconnect wires. Instead of using current density as the key param-
eter, the new method estimates the EM-induced steady-state stress in general multi-segment
copper interconnect wires based on a novel parameter,Critical EM Voltage, VCrit,EM .The
new VBEM method can naturally comprehend the impact of the topology of the wire struc-
ture on EM-induced stress.
18
3.1 Voltage-based EM stress estimation
In this section, we first present the compact EM model based on the solution of
steady-state modeling of EM effects, which lead to the voltage-based assessment for EM
failures for interconnect trees. We then show that the VBEM assessment is more suitable
for multi-branch interconnect trees. We compare the results from the voltage-based VBEM
with results from dynamic EM models to validate the new VBEM model on some complex
interconnect wire structures.
3.1.1 Steady-state EM-induced stress modeling
At the steady state, the atomic flux becomes zero, which means that Γσ = ΓEM .
As a result, we have
∂σ
∂x
=
eZρ
Ω
j (3.1)
On the other hand, the hydrostatic stress also leads to displacement of interconnect mate-
rials, which can be described as
∂u
∂x
= ασ (3.2)
∑
k
uikwik = 0 (3.3)
where, uik is the displacement and wik is the width at the k-th node on a branch between
nodes i and k, and M = 1/α is the Young’s modulus of the interconnect materials [27].
We note that Eq. (3.2) and (3.3) can be viewed as the atomic conservation (in
the sense that the total number of metal atoms will remain the same during the migration
19
process in the wire) in the stress kinetics. We notice that a similar conservation equation
was given in [7] for steady-state stress computation for multi-branch interconnects.
3.1.2 New voltage-based analysis for steady-state EM stress
Figure 3.1: A three-terminal wire, with the direction indicating electron flow.
One important observation is that the stress difference between two nodes at the
steady state can be expressed in terms of voltage, instead of current density [27]. To
demonstrate this, suppose that we have N nodes in the interconnect tree. We can discretize
(3.1) in space, if we ignore the initial or residual stress, leading to
σk − σi = eZρ
Ω
likj =
eZ
Ω
Vik (3.4)
where σk is the steady-state stress at node k, and Vik is the voltage or potential difference
from node i and node k. When Vik > 0, stress at node k is higher than the stress at node i
(we assume that tensile stress is positive and compressive stress is negative).
From (3.2) and (3.3), we have
N∑
k
akσk = 0 (3.5)
where ak is the total area of the branches connected to the node k. This equation represents
the conservation in the stress kinetics. Note that (3.5) does not include the initial residual
20
stress as we will consider that later. For instance, in Fig. 3.1, the area for node 1 is
lawa+ lbwb. For this case, (3.5) can be expressed as lawaσ0+ (lawa+ lbwb)σ1+ lbwbσ2 = 0.
We also notice that for this case, we have
N∑
k
ak = lawa + lawa + lbwb + lbwb = 2(lawa + lbwb) = 2A (3.6)
where A is the total area of the branches in the wire. By considering (3.4), we can compute
the stress for any node i in terms of all the node voltages with respect to the node i:
σi =
β
2A
N∑
k 6=i
akV
i
k (3.7)
where β = eZΩ , and V
i
k is the nodal voltage at node k with respect to node i (node i is
treated as the reference node).
If we select the reference node as the ground node with the lowest voltage of the
segment, then the stress at this node can be expressed as
σg =
β
2A
N∑
k 6=g
akVk (3.8)
where, Vk is the normal nodal voltage (with respect to cathode node g) at the node k in
the wire. The cathode node is where the tensile stress σg is the highest in the segment.
If we further define a virtual voltage as
VE =
1
2A
N∑
k 6=g
akVk (3.9)
21
which can be viewed as EM Voltage, then according to (3.4), the stress at any node with
nodal voltage Vi (with respect to ground node), can then be computed easily as in [27].
Furthermore, if the current density is not evenly distributed, an integration is
required to calculate the EM voltage as follows:
VE =
1
A
∫∫
V (x, y)dxdy (3.10)
Then the steady-state stress at the node i can be calculated as:
σi = β(VE − Vi) (3.11)
This equation shows that the EM-induced stress at any node can be easily deter-
mined by the difference between the node voltage and the EM voltage for an interconnect
segment. Therefore, EM stress determination is simplified to a problem of analyzing the
node potentials and combining them with geometric information of the interconnect. Node
potentials are readily available after circuit simulation and additional numerical computa-
tion is not required.
For EM effects, when the hydrostatic stress (EM-induced stress plus other existing
stresses) at a node hits the critical stress, σcrit, then the void starts to be nucleated and the
resistance of the wire starts to increase over time, which can be measured experimentally [28,
29].
Assuming that initial or residual stress is σinit, which is the same for all the
nodes (this may not be true in general, but we have used the assumption to simplify our
22
presentation), then we can determine the voltage required for the nucleation to happen
called Critical EM Voltage, VCrit,EM defined as below:
Vcrit,EM =
Ω
Ze
(σcrit − σinit) (3.12)
The significance of VCrit,EM is that it essentially is the natural, but important
extension of Blech product or Blech limit concept, which describes the EM immortality
condition for a single-segment wire, to more general multi-segment interconnect wires (we
will discuss this more in the next sub-section). Specifically, for any nodal voltage Vi, one
only needs to further check whether
VCrit,EM > VE − Vi (3.13)
If this condition is met for all the terminal voltages of the wire, then no EM failures will
happen. Typically, for a multi-branch wire, when it is stressed only by positive voltages, if
VE < VCrit,EM , no wire will have an EM failure because even the cathode node (with the
lowest voltage) will not fail in this case. Otherwise, EM failures may happen. This makes
the immortality check much more efficient as we typically only need to check one node for
an interconnect segment, instead of checking every branch as is done for the traditional
method. Note that if we have multiple nodes failing (3.13), all those nodes can lead to
nucleation. However, for the question of EM immortality, as long as one node fails as per
(3.13), the whole tree is mortal. On the other hand, if a wire fails the immortality check in
(3.13), then more detailed and time-dependent stress analysis is required to determine the
23
time-to-failure or mean time-to-failure, which still remains a difficult problem for general
multi-branch interconnect trees [30].
3.1.3 General and key VBEM equations
In this subsection, we summarize the general and key VBEM equations used for
the EM immortality check. For a given arbitrary interconnect tree with N nodes, assuming
the voltage in node i is Vi and the ground node is g and Vg = 0, then the stress at the node
i can be computed as
σi =
eZ
Ω
(
1
2A
N∑
k 6=g
akVk − Vi) (3.14)
Given the Critical EM Voltage, VCrit,EM , then EM immortality check for node i
becomes
VCrit,EM >
1
2A
N∑
k 6=g
akVk − Vi (3.15)
We want to remark that as far as immortality/mortality is concerned, we are only
interested in whether or not there exists at least one void formed in a given wire. If no void
is formed, the wire is immortal, else the wire is mortal. Hence, we only need to look at the
node with lowest voltage, the ground node or cathode node of the whole tree, as a result,
(3.15) can be simplified to
VCrit,EM >
1
2A
N∑
k 6=g
akVk (3.16)
If (3.16) fails, then transient EM analysis will be carried out to find the void location and
the nucleation time.
24
3.1.4 Relationship to the Blech limit
In this sub-section, we show how our VBEM analysis and Critical EM Voltage,
VCrit,EM is related to the existing Blech limit. We first show that the Blech product
essentially is the voltage-based EM assessment for just one wire segment. The proposed
VBEM method can actually be viewed as the general extension of this technique to multi-
branch interconnect wires.
Specifically, let L be the length of a single wire and j the current density of the
wire. Starting with the steady-state condition of EM stress shown in (3.1), which is also
called the Blech condition, if we integrate (3.1) along the line, we obtain
σ(x) = σinit +
eZρj
Ω
x (3.17)
where, σinit is the residual stress. The maximum tensile stress can be achieved at the
cathode end of the wire (x = L).
If the critical stress that the wire can withstand is σcrit, we can define the critical
product for EM failure as
(jL)crit =
Ω(σcrit − σinit)
eZρ
(3.18)
which is called the Blech limit or Blech product [5]. A wire is immortal for EM if it satisfies
jL < (jL)crit. As a result, the Blech product can help identify all the immortal wires
efficiently.
25
Notice that if ρ is the resistivity, ρLj is actually the voltage across the wire, then
(3.18) becomes
(jL)crit ∗ ρ = V bcrit =
Ω(σcrit − σinit)
eZ
. (3.19)
where V bcrit is actually the critical voltage for this single wire. As we can see, this equation is
the same as (3.12) for the single wire case. However, we want to stress that the newCritical
EM Voltage, VCrit,EM is more general than the Blech limit due to the following reasons:
first, the failure criterion is no longer associated with current density and length of a specific
wire segment. In other words, the existing Blech product, jL, does not work any more in
this case as the jL of individual branch not only depends on the critical stress, but also
depends on the wire structures as the stresses in each wire segment are not independent
as they affect each other. However, the proposed VCrit,EM concept can be applied to
general interconnect trees with multi-segment wires as one single EM immortality criterion.
Second, VCrit,EM still retains the benefits of the Blech limit as the voltage values can be
measured directly based on the pass/fail determination of wires from experiments. Third,
it agrees with the Blech limit for the single-wire segment case, which essentially validates
the proposed method. It has the potential be used as a new design rule parameter between
the foundry and design teams to replace or extend the current Blech limit parameter.
3.1.5 Steady-state analysis
In this sub-section, we first show that the proposed VBEM analysis agrees with
the results from the steady-state results computed from the finite difference method. Then
we show that it agrees with a published closed-form expression for a special case. We
demonstrate this using one example, a three-terminal wire as depicted in Fig. 3.1.
26
We first compare with the finite difference method. Let the total length be L.
The segment lengths la and lb in the three-terminal wire are equal to
L
2 . We then use this
wire segment length as the spatial step size and use the backward difference method for
boundary derivation as shown in [31]. We remark that the matrix derived by the 1D finite
difference method actually is singular. The reason is that the atomic conservation in the
stress kinetics is not observed in this case. In order to resolve this problem, Eq. (3.5) is
introduced to replace one row in the matrix. The resulting system of the equations for the
three-terminal wire case is presented as below:


1 0 0
0 0 0
0 0 1




σ˙0
σ˙1
σ˙2


=
κ
(L/2)2


−1 1 0
1 2 1
0 1 −1


×


σ0
σ1
σ2


+


2κGa
L
0
−2κGbL


(3.20)
Here, Ga and Gb are the EM driving forces corresponding to segments a and b,
respectively. We then rewrite these equations into the following format:
Cσ˙(t) = Aσ(t) +B
y(t) = Eσ(t)
(3.21)
In Eq. (3.21), E = (1, 0, 0), which means we are looking at the steady state of node 0 which
27
is the cathode node. Then, a Laplace transform can be applied and the transfer function
can be obtained as:
F (s) = E(sC−A)−1B (3.22)
Under step input, which is 1s in frequency domain, the Final Value Theorem can be used to
obtain the stress at steady state. If limt→∞ f(t) has a finite limit under a step input, the
Final Value Theorem for a function under the step input can be expressed as:
lim
t→∞
f(t) = lim
s→0
sF (s)
1
s
= F (s)|s=0 = −EA−1B (3.23)
We notice that (−A)−1 is
(−A)−1 = L
2
4κ
× 1
4


3 −1 −1
−1 −1 −1
−1 −1 3


(3.24)
Then, we can obtain the steady-state result of the system:
σsteady = F (s)|s=0 = (3
4
× 2κGa
L
+
1
4
× 2κGb
L
)
L2
4κ
=
(3Ga +Gb)L
8
(3.25)
On the other hand, we can compute steady-state stress at the cathode node (node
28
0) based on the voltage-based method as:
σsteady = VE × β = A1V1 +A2V2
2A
× β
=
jaLρ+ jaLρ/2 + jbLρ/2
4
× β = (3Ga +Gb)L
8
(3.26)
where VE =
A1V1+A2V2
2A is the EM voltage at the cathode node and β =
eZ
Ω . As we can
see, the results from the two methods are identical. This example gives another theoretical
validation of the proposed VBEM analysis method, which agrees with the steady-state
results from the finite difference method.
(a) (b)
Figure 3.2: (a) A 3-terminal wire with inactive (passive) sink, with the cathode at node 2.
(b) The steady-state stress distribution of a 3-terminal wire with inactive (passive) sink.
Now we show that the discretization schemes (by using different discretization
sizes) will not change the steady-state results. Fig. 3.3 shows stress analysis using the
simplified FDM method comparing COMSOL result of Korhonen’s equation. It can be
observed that the steady-state results under different discretization sizes are same although
their transient behaviors are different. In other words, discretization errors will not affect
the final steady-state results from FDM.
29
Figure 3.3: Different discretization steps (sizes) lead to the same steady-state stress.
In addition to analyzing the FDM steady state results considering different dis-
cretization schemes, we further show some pole information from the FDM analysis of
Korhonen’s equation. We use the same 3-terminal wire shown in Fig.3.1 as an example.
In this case, we discretize the two segments into 21 nodes (so we have 21 poles) instead of
just the 3 boundary nodes presented in (3.20). As we can see from Table 3.1, all the poles
are real poles and the EM system is stable. The EM-induced stress basically is progressing
monotonically for a given step current input. Such a monotonic nature of EM-induced
stress is important to ensure the steady state is sufficient for the EM immortality check.
Our study shows that this is the case for all the examples we analyzed. We also remark
that although there is still a possibility of overshoots for a system with negative poles, for
our case, it is rarely observed since they are only theoretically possible. Practically, it does
not seem necessary to worry about the oscillating behaviors in the stress evolution process.
30
Table 3.1: The computed poles for the 3-terminal wire
Poles values (×103)
-1.7541 -1.7248 -1.6767 -1.6107 -1.5286
-1.4319 -1.3230 -1.2042 -1.0783 -0.9479
-0.8161 -0.6857 -0.5598 -0.4410 -0.3321
-0.2354 -0.1533 -0.0873 -0.0392 -0.0099
-0.0000 – – – –
Furthermore, we notice that recent work in [32] gives the closed-form expression
(which is consistent with the measured experimental results) of steady-state stress for a
special three-terminal wire case with an inactive (passive) segment (Fig. 3.2(a)), and the
stress profile (Fig. 3.2(b)). In this case, the segment Ls has zero current (so it is inactive
or passive), while the active segment L has current flow j. The cathode is located at node
2. It was shown in [32] that the steady-state stress at any location x in the active segment
σ(x) is given by:
σ(x) =
eρZj(x− Ls)
Ω
+ σ1 (3.27)
where σ1 is the stress in the inactive sink as shown in Fig. 3.2(b), which is given by
σ1 = − eρZjL
2
2Ω(L+ Ls)
(3.28)
As a result, the stress at the cathode node (x = Ls + L) will become
σ(L+ Ls) =
eρZjL
Ω
− eρZjL
2
2Ω(L+ Ls)
=
eρZj
Ω
[
L− L
2
2(L+ Ls)
] (3.29)
On the other hand, based on the VBEM method, we have V0 = V1 = jLρ (as there is
31
no current in Ls and A0 = Lsw and A1 = Lw + Lsw, where w is the width of the two
segments (assuming two segments have the same width). Then, the stress at the cathode
node, σ(L+ Ls), can be computed by
σ(L+ Ls) = β × VE(L+ Ls) = β × A0V0 +A1V1
2A
= β ×
[
jLρ(Lsw + Lw) + jLρLsw
2(Lw + Lsw)
]
= β ×
[
jLρ
2
+
jLρLs
2(L+ Ls)
]
=
eρZj
Ω
[
L− L
2
2(L+ Ls)
]
(3.30)
Comparing (3.29) and (3.30), we can see again that the VBEM method agrees exactly with
the closed-form expression for this particular case given by [32].
3.1.6 Study of some special cases
In this sub-section, we study three multi-branch interconnect structures to illus-
trate the proposed method. The three structures consist of a straight-line 3-terminal wire
in Fig. 3.4, a T-shaped 4-terminal wire in Fig. 3.7, and a comb structure wire in Fig. 3.9.
We stress that the proposed method can be applied to any multi-branch tree-structured
interconnects.
The straight-line 3-terminal wire
Figure 3.4: Interconnect examples for EM analysis for a straight-line 3-terminal wire.
32
The straight-line 3-terminal wire is shown in Fig. 3.4. In this wire, the node 0 is
treated as the ground node. Note that current densities in the two segments are ja and jb,
which are determined by the rest of the circuit and may not be the same. Then, the EM
stress equation becomes:
V0 = 0, A0 = lawa, σ0 = βVE
V1 = jalaρ, A1 = lawa + lbwb, σ1 = β(VE − V1)
V2 = jblbρ+ jalaρ, A2 = lbwb, σ2 = β(VE − V2)
A = A0 +A1 +A2
where
VE =
V0A0 + V1A1 + V2A2
2A
=
V1A1 + V2A2
2A
where β = eZΩ .
Passive sink and passive reservoir configurations, described in [32], are typical
elements in the general interconnect tree. They are used as test cases in this sub-section.
Fig. 3.5 and Fig. 3.6 show the steady-state stress for the cases with passive reservoir (segment
a with ja = 0) and passive sink (segment b with jb = 0). Here we define “passive” and
“active” as representing zero current density and non-zero current density, respectively. The
analysis will focus on mitigating the EM effect in the active segment. It can be observed from
33
Fig. 3.5 that the passive reservoir (segment a) is characterized with higher stress compared
with the active sink (segment b). Thus, the void will first nucleate in the reservoir which
can relax the EM effect in the sink. On the contrary, as shown in Fig. 3.6, the existence of
passive sink (segment b) will lead to higher stress in the active reservoir (segment a), which
accelerates the void formation thus leading to EM failure in the reservoir. A comparison
of the steady-state stress predicted by VBEM with the finite element analysis (COMSOL)
simulation in both cases has demonstrated an excellent agreement.
(a) (b)
Figure 3.5: (a) 2-D stress distribution on wire at steady state for passive reservoir (b) EM
stress versus length at steady state.
(a) (b)
Figure 3.6: (a) 2-D stress distribution on wire at steady state for passive sink (b) EM stress
versus length at steady state.
34
T-shaped 4-terminal wire with stub
The structure of the T-shaped 4-terminal wire is shown in Fig. 3.7. In this case,
we have three segments which connect through the middle node 1. Current densities are ja,
jb, and jc on the three branches. In this case, if we make the branch c (the vertical branch),
the stub (its current density is set to zero, jc = 0), then the EM stress can also be obtained:
Figure 3.7: Interconnect examples for EM analysis for T-shaped 4-terminal wire.
V0 = 0, A0 = lawa
V1 = jalaρ, A1 = lawa + lbwb + lcwc
V2 = jblbρ+ jalaρ, A2 = lbwb
V3 = jalaρ+ jclcρ, A3 = lcwc
σ0 = βVE , σ1 = β(VE − V1)
σ2 = β(VE − V2), σ3 = β(VE − V3)
35
A = A0 +A1 +A2 +A3
where
VE =
V1A1 + V2A2 + V3A3
2A
(a) (b)
Figure 3.8: (a) Illustration of T-junction interconnect with directed graph inserted to in-
dicate electron flow. (b) Stress at the cathode end increases linearly as the stub is placed
further away from the cathode end.
The stub acts as a sink when it is close to an anode and serves as a source when
it is close to a cathode. The distance between the stub and the cathode and the length of
the stub can be important factors related to EM stress. As shown in Fig. 3.8(b), when the
stub is moved away from the cathode node (node 0), it allows more atoms to migrate to
the stub and thus creates more tensile stress at the cathode node. This is a commonly seen
structure in interconnect circuit design. We will discuss the effects of the distance between
the stub and the cathode as well as the length of the stub on EM stress. Thus, by adjusting
the stub location and length, we can adjust the stress at the cathode node to fix potential
EM failures in the physical design.
36
Interconnect wires with comb structure
Now, we study a more complicated interconnect structure, which is the comb or
ladder structure as shown in Fig. 3.9.
Figure 3.9: Comb structure interconnect examples for EM analysis.
In this comb-structured interconnect, we have N fingers, in which each finger
structure is assumed to be the same. Rsh is the sheet resistance of the metal and I is the
current along each finger. We assume that node 0 is still the ground node. LB and WB are
the length and width, respectively, for the body structures, LF and WF are the length and
width, respectively, for the fingers. i refers to i-th node on the body and i′ is the node on
the i-th finger.
The total area connected to node k, except node N , is Ak and total area connected
to node k′ is Ak
′, total area connected to node N is AN , and the total area of the whole
37
comb structures, A, can be expressed as:
Ak = 2WBLB +WFLF , Ak
′ =WFLF ,
AN =WBLB +WFLF ,
A = N(WBLB +WFLF )
Note that since the N -th node is only connected to one part of the body structure, the
total area connected with node N is different from other nodes. Current flows in the same
direction (which is opposite to the arrows in the figure), the highest EM tensile stress will
be generated at node 0 because it has the lowest potential. Hence, in this case, we only
need to check VE against the critical potential.
The potential at each node for Vk and Vk
′ can be obtained as:
Vk =
[
Nk − k(k − 1)
2
]
× I × LB
WB
Rsh,
Vk
′ = Vk + I × LF
WF
Rsh
Finally, the EM stress of the comb structure can be obtained as:
VE =
IRsh
12
×
[
(N + 1)(4N − 1)LB2
WBLB +WFLF
+
2(N + 1)(2N + 1)LBLFWF /WB + 6LF
2
WBLB +WFLF
]
,
σ0 = βVE
(3.31)
38
As shown in the above equation, three factors, N , LF and LB have a strong influence on
stress. Fig. 3.10 shows how the EM-induced stresses at the node 0 change with LF and LB.
As we can see, LB has a much larger impact on the stress than LF , which can be used for
EM optimization. Both LF and LB have nonlinear impacts on the stress and this nonlinear
trend is more clear for LF
Length(um)
0 50 100 150
N
om
al
iz
ed
S
tr
es
s
0
2000
4000
6000
8000
10000
Incresing length of LB
(a)
Length(um)
0 50 100 150
N
om
al
iz
ed
S
tr
es
s
3960
3980
4000
4020
4040
4060
4080
4100
4120
increasing length of LF
(b)
Figure 3.10: (a) EM stress validation for each comb structure interconnect with changing
LB. (b) EM stress validation for each comb structure interconnect with changing LF .
Other trends of stress change will be analyzed and discussed in the numerical
results section.
3.1.7 Current crowding impact analysis
In this sub-section, we study the impact of the current density distributions on
the proposed VBEM analysis method. The VBEM method assumes that current density is
evenly distributed. However, this is not always the case since for real interconnect wires, the
current density may vary and become larger around the corner areas (the current crowding
effect). We observe that if the width is not much smaller than the segment length, the
39
current crowding effect can be quite significant. In this case, the calculated nodal voltage
will be less accurate for stress calculations.
To consider the current crowding effects, instead of using (3.9) to compute the EM
voltage, we need to perform the area integration of voltage using (3.10) to compute the final
steady-state stress after current and voltage distributions are computed. As we will show in
this subsection, this will lead to more accurate results compared to the results using (3.9),
and that the proposed method can be extended to consider current crowding effects. The
area used for the integration is the total areas of the wires in 2D case (although we show 3D
structures of the wires) since (3.10) is for 2D integration. In principle, the integration can
be done over the 3D volume. It does not have to be restricted to 2D integration. We also
remark that the VBEM method will be more expensive for the numerical 2D integration
operation. This is due to the nature of the current crowding modeling problem. One has
to compute the detailed current and voltage distributions first using expensive numerical
methods such as the finite element method, to account for the effects of current crowding
on EM risks.
In the following, we study two wire structures to assess the impact of current
crowding effects. In Fig. 3.11, an L-shaped wire with 3 nodes is used as the first test
structure. Here, la and lb are not much larger than the width of wire w. A voltage is
applied on node 1 and node 0 is the ground node. Node 2 is connected with node 1 through
a stub with a current density of 0.
40
Figure 3.11: L-shaped wire structure with 3 nodes.
In this case, the current density distribution around node 1 is not uniform, as
shown in Fig. 3.12. COMSOL simulation is used to obtain that non-uniform current density
distribution as well as voltage distribution. At node 1, the current density is smaller than
the current density on the other part of the branch. This means that if the nodal voltage
is used to calculate the stress, it will be smaller than the actual condition. On the other
hand, if the branch is longer (compared to the width of the wire), the current crowding has
a smaller effect on the final steady-state stress.
Table. 3.2 summarizes the results for two cases for the stress values at the cathode
node. In case 1, la = 10, lb = 2, and j = 3MA/cm
2 with voltage difference 0.03V between
node 0 and 1, In case 2, la = 4, lb = 2, and j = 3MA/cm
2 with voltage difference is
0.012V between node 0 and 1. Column Crowding indicates that the current-crowding effect
is considered by using (3.10) and Err1 is the relative error of considering the crowding effect
compared to COMSOL. Column noCrowding indicates that no current crowding effect is
computed using (3.9) and Err2 is the error without considering the crowding effect.
As we can see from Table. 3.2, for the long branch (case 1), the error for the
41
method without considering current crowding is 1.53% and the error increases to 7.72% for
the shorter branch (case 2), which is quite significant. However, if the current crowding
effect is considered (as shown in Crowding column), the errors become smaller (less than
2%).
Figure 3.12: L-shaped wire structure with current-crowding effects.
The second structure we study is the U-shaped wire shown in Fig. 3.13. Its segment length
is not significantly larger than the width in this case and current density distribution can
be seen in Fig. 3.14, with the current-crowding effect being very visible at nodes 2 and 3.
Table 3.2: EM stress calculated with and without current-crowding effects
Cases COMSOL Crowding Err1 NoCrowding Err2
1 286MPa 284MPa 0.94% 282MPa 1.53%
2 97.7MPa 96.1MPa 1.60% 90.4MPa 7.72%
42
Figure 3.13: U-shaped wire structure with 4 nodes.
Figure 3.14: U-shaped wire structure with current-crowding effects.
Table 3.3: Stress values at each node for the U-shaped structure
Cases/Nodes 1 2 3 4
COMSOL 345.5Mpa 104.2Mpa -104.2Mpa -345.5Mpa
Crowding 339.0MPa 102.2MPa -102.2MPa -339.0Mpa
Error 1.74% 1.95% 1.95% 1.74%
No Crowd-
ing
339Mpa 135.6Mpa -135.6Mpa -339Mpa
Error 1.74% 23.16% 23.16% 1.74%
43
For the U-shaped structure, we apply 0.05V to node 1 and 0V voltage to node
4. The result is shown in Table 3.3 for the stress values at the four nodes. Row Crowding
indicates that the current-crowding effect is considered and row noCrowding indicates that
the crowding effect is not considered.
As we can see, for nodes 1 and 4, even when current crowding is not considered,
the error is only 1.74%, which is close to the errors of cases considering current crowding.
However, for nodes 2 and 3 if current crowding is not considered, the errors increase to
23.16%. On the other hand, the error reduces to 1.95% when current crowding is considered.
Thus, in both cases, if the wire segment is long enough (more than 10 times of wire width),
the current-crowding impact on stress values by the VBEM method is not significant.
3.1.8 Application to mesh-structured interconnect wires
In this section, we study if the proposed VBEM analysis method can be applied
to mesh-structured interconnect wires, which can be used at the cell-level layout design and
can be vulnerable to EM failure as well.
Fig. 3.15 shows a 4 × 4 mesh structure with 16 nodes. In order to calculate the
VE , nodal voltages and areas connected with each node are required. A voltage is applied
on node 4 and node 13 is set to be ground node. The voltage of each node can be measured
or analyzed by SPICE. Areas connected to each node are different at different locations.
For the nodes at corners (1,4,13,16), the area connected with them is 2WL, where W is the
width of the wire and L is the distance between adjacent nodes.
44
Figure 3.15: A 4× 4 mesh-structured wire.
For the nodes at the boundaries (2,3,5,8,9,12,14,15), the area connected with them
is 3WL. For the nodes in the middle (6,7,10,11), the area connected with them is 4WL.
The total area is the area summation of each branch, which is 48WL. Then the VE can be
obtained as:
VE = (2WL(V1 + V4 + V13 + V16)
+ 3WL(V2 + V3 + V5 + V8 + V9 + V12 + V14 + V15)
+ 4WL(V6 + V7 + V10 + V11))/48WL
σmax = σ13 = βVE
45
In our stressing set up, we apply 0.005V to node 4 and 0V (ground) to node 13.
The length of each branch is 10um and width is 0.1um. Table 3.4 shows the result of the
test case. As we can see, for all the cases, the VBEM method leads to less than 0.17% error
compared to COMSOL, which shows that the proposed VBEM method can be directly
applied to mesh-structured wires.
3.1.9 Application to IBM power grids
Figure 3.16: A small portion of a typical power supply network [2].
Besides small interconnect structures, we also validate the proposed method on
a large practical IBM power grid benchmark [2]. A portion of the power grid network is
shown in Fig. 3.16. Details of the benchmark are shown in Table 3.5. In this experiment,
critical stress is 500MPa [33]. The critical voltage is 3.694 × 10−3 V. For the IBM power
grid networks, COMSOL based FEM analysis method is too slow. Instead, we use a recently
proposed eigen-function-based stress analysis method for multi-segment interconnect trees
[34] as the baseline for comparison. The proposed VBEM and baseline methods were both
46
Table 3.4: Stress condition for mesh structure
Nodes 1 2 3 4
Voltage 2.498e-3V 2.880e-3V 3.656e-3V 5e-3V
COMSOL 0.2441Mpa -51.61MPa -157.0Mpa -339.5MPa
VBEM 0.2439Mpa -51.53Mpa -156.8Mpa 52.07Mpa
Error 0.0771% 0.1549% 0.1572% 0.1609%
Nodes 5 6 7 8
Voltage 2.116e-3V 2.498e-3V 3.079e-3V 3.652e-3V
COMSOL 52.15Mpa 0.2118MPa -78.62Mpa -156.6MPa
VBEM 52.07Mpa 0.2116Mpa -78.50Mpa -156.3Mpa
Error 0.1543% 0.095% 0.1547% 0.1572%
Nodes 9 10 11 12
Voltage 1.347e-3V 1.921e-3V 2.502e-3V 2.884e-3V
COMSOL 156.8Mpa 78.62MPa -0.2114Mpa -52.15MPa
VBEM 156.3Mpa 78.50Mpa -2.110Mpa -52.07Mpa
Error 0.1571% 0.1545% 0.1529% 0.1545%
Nodes 13 14 15 16
Voltage 0 1.344e-3V 2.120e-3V 0.250e-3V
COMSOL 338.5Mpa 157.0MPa 51.61Mpa -0.2441MPa
VBEM 339.8Mpa 156.8Mpa 51.53Mpa -0.2438Mpa
Error 0.1543% 0.095% 0.1547% 0.1572%
implemented with C/C + + for comparison. The comparison works were carried out on a
workstation with 2 Intel Xeon E5-2698 CPUs and 128GB memory.
We can see thatthe VBEM method has significant acceleration compared to the
baseline method. For ibmpg1, VBEM takes only 0.69 seconds to simulate all 689 trees,
which translates to a 1319X speed-up over the baseline method. All the trees in ibmpg1
are small trees with the maximum number of branches in a tree being 30. With larger
trees, the acceleration rate decreases, but one still sees 30.86X acceleration for ibmpg2,
4.43X acceleration for ibmpg3, and 4.91X acceleration for ibmpg4. Note that the lengths
Table 3.5: property of IBM benchmarks
Name ibmpg1 ibmpg2 ibmpg3 ibmpg4
#node 11572 61797 407279 474836
#branch 5580 61143 399201 384709
#max branch 30 192 965 571
#trees 689 462 7388 9358
#failed trees 249 91 1585 0
VBEM(s) 0.69 29.63 3999.97 4565.27
Baseline(s) 910.56 973.65 17720.97 22456.13
Acceleration 1319X 30.86X 4.43X 4.91X
47
of branches are not the same. As can be seen in ibmpg4, although it has trees with many
branches, the lengths of branches are small so all trees are immortal in this case.
We note that the baseline method is a transient EM analysis while the VBEM
method is a steady-state method. Another saving brought by the VBEM method is the
percentage of the immortal tree count over the total tree count, as the immortal trees will
not need the costly transient EM analysis. The immortal tree number can be significant
compared to the total tree count (ranging from 63.8% in ibmpg1 to 100% in ibmpg4 case.
Hence, the percentage of savings from the proposed VBEM method is problem-specific and
can be very significant.
3.2 Numerical validation results and discussions
In this section, we validate the proposed voltage-based EM (VBEM) check tool
against numerical analysis results. We validate the VBEM method against the results by
a finite element analysis (FEA) tool, COMSOL [35], based on the dynamic stress evolu-
tion described by Korhonen’s equation. In the following, we list the results for the three
structures we have discussed.
3.2.1 Results for straight-line 3-terminal interconnects
The parameters used for the validation cases are summarized in Table 3.6 and the
results for the 3-terminal wire are shown in Fig. 3.17, which shows the largest tensile stress
at node 0. We compare our results against COMSOL and another published EM numerical
simulator, XSim [36], which has been validated by measured results [36, 37].
48
Table 3.6: Parameters for each straight-line 3-terminal interconnect case
Case
Branch a Branch b
l w j l w j
µm µm MA/cm2 µm µm MA/cm2
1 25 1 1.25 0 1 0
2 25 1 1.25 175 1 0
3 25 1 1.25 175 1 0.125
4 25 1 1.25 175 1 0.625
5 25 1 1.25 175 1 1.25
6 10 0.1 10 25 1.25 1.25
7 10 0.2 5 25 1.25 1.25
8 10 0.3 3.3 25 1.25 1.25
9 10 0.4 2.5 25 1.25 1.25
10 10 0.5 2 25 1.25 1.25
Case Number
1 2 3 4 5 6 7 8 9 10
N
or
m
al
iz
ed
S
tr
es
s
0
20
40
60
80
100
120
140
VBEM
COMSOL
Xsim
Figure 3.17: Steady-state EM stress comparisons for each straight-line 3-terminal intercon-
nect case (x: case number, y: EM stress at the node 0 (cathode node)).
As demonstrated in Fig. 3.17, the results of the proposed method agree well with
COMSOL results. To further validate the new method, we compared our results against the
EM simulator XSim for steady-state results. As we can see, the VBEM method also agrees
very well with XSim, which further validates the proposed method. With the increase in
length and current density in the branch b, the EM stress increases. If the current density
in branch a decreases, EM stress also decreases.
49
3.2.2 Results for T-shaped 4-terminal interconnect
We next validate the VBEM method on the T-shaped 4-terminal interconnect
case. Again, we list the parameters used for the validation cases in Table. 3.7 and results of
3-terminal wires are shown in Fig. 3.18, which shows the largest tensile stress at the node
0.
Figure 3.18: EM stress validations for each T-shaped 4-terminal interconnect case (x: case
number, y: EM stress at the node 0 (cathode node)).
During the analysis of the T-shaped interconnect, forward and reverse currents
Table 3.7: Parameters for each T-shaped 4-terminal interconnect case (l = µm, w = µm,
and j =MA/cm2)
Branch Case 1 2 3 4 5 6
a
l 6 10 11 20 20 20
w 0.14 0.14 0.14 0.14 0.28 0.28
j 7.142 7.142 7.142 7.142 3.571 3.571
b
l 4 0 9 0 0 0
w 0.14 0.14 0.14 0.14 0.28 0.28
j 7.142 7.142 7.142 7.142 3.571 3.571
c
l 5 5 10 10 5 10
w 0.28 0.28 0.14 0.14 0.14 0.14
j 0 0 0 0 0 0
50
Table 3.8: EM stress validations for comb structure interconnect cases.
Comb Case Method
Number of fingers
1 2 4 6 8 10
Case 1 - WB = 1,WF = 1,
LB = 10,LF = 10
Proposed EM 10 23.75 71.25 145.42 246.25 373.75
COMSOL 10 23.78 71.33 146.50 245.10 375.88
Error 0.00% 0.08% 0.11% 0.74% 0.47% 0.57%
Case 2 - WB = 1,WF = 1,
LB = 20,LF = 10
Proposed EM 15 41.67 135 281.67 481.67 735
COMSOL 15 41.59 136.28 279.80 486.41 738.12
Error 0.00% 0.18% 0.94% 0.67% 0.98% 0.42%
Case 3 - WB = 1,WF = 1,
LB = 10,LF = 20
Proposed EM 15 29.17 77.5 152.5 254.17 382.5
COMSOL 15 29.42 77.19 152.07 257.51 385.73
Error 0.00% 0.85% 0.41% 0.28% 1.30% 0.84%
are provided in branches a and b. Again, we observe that the obtained results are also very
close to COMSOL results and the average error rate is 0.56% while the maximum error is
1.42% in case 2 reverse. Also, it can be seen if the total length of branches a and b increases,
the stress increases. Furthermore, the location and current density of the stub can have a
significant impact on the stress at the stub (branch c). With zero current density, the stub
can decrease the stress if it is closer to the cathode, or if its length is decreased. Also, if the
stub is placed further away from the cathode and its length is longer, the stress increases.
3.2.3 Results for comb structure interconnects
Now, we further validate the proposed VBEM method on the comb-structured
interconnect. We list the parameters used for different test cases and the predicted stress
and error rate in Table. 3.8. Fig. 3.19 shows the impact of the number of fingers, N , on the
stress at the node 0 for these three cases or configurations. As we can see, with an increase
in N , the EM stress increases super-linearly. Besides, we can observe from Fig. 3.10 that,
increasing LB and LF increases EM-induced stress at node 0. However, the increase in
LF only has a small effect on EM stress as compared to the increase in LB. Furthermore,
51
the results of the VBEM approach show good agreement with the results obtained from
COMSOL. The average error is 0.49% and maximum error is 1.3% in case 3 at N=8.
Number of fingers
0 5 10
N
o
rm
a
li
ze
d
S
tr
es
s
0
200
400
600
800
VBEM for case1
COMSOL for case1
VBEM for case2
COMSOL for case2
VBEM for case3
COMSOL for case3
Figure 3.19: EM stress validations for each comb structure interconnect case (x: number of
fingers, y: EM stress at the node 0 (cathode node))
3.3 Summary
In this chapter, we have presented a novel and fast EM immortality check for gen-
eral multi-branch interconnect trees. The new method estimates the EM-induced steady-
state stress in general multi-segment copper interconnect wires based on a novel parameter,
Critical EM Voltage, VCrit,EM . We have shown that the VCrit,EM essentially is the natural,
but important extension of the Blech product or Blech limit concept, which describes the
EM immortality condition for a single-segment wire, to more general multi-segment inter-
connect wires. Furthermore, the new VBEM analysis method is very amenable for EM
violation fixing as it brings new design knobs and capabilities into the physical design flow.
The resulting EM risk assessment method can be much easier to integrate with physical
design tools and flows. The new voltage-based EM stress estimation method is based on
52
the exact solution of fundamental steady-state stress equations. We have shown that the
proposed VBEM analysis method agrees with the results from the finite difference method
in the steady-state through one example and also agrees with one published closed-form
expression of steady-state stress for a special 3-terminal wire case, which further validates
the proposed method. Furthermore, we compared VBEM against the COMSOL finite ele-
ment analysis tool and another published EM numerical simulator XSim and it was shown
that the VBEM approach agrees with both of them very well in terms of accuracy. We
also studied the impact of the current crowding of practical interconnect wires on the es-
timated steady-state stress and showed that the effect is not significant if the length of
the wire length is much greater than its width. An extension of the VBEM method to
consider the significant current crowding effects was given and additionally, we analyzed
mesh-structured interconnect wires and demonstrated that the proposed VBEM method is
correct and accurate on these structures as well.
53
Chapter 4
Saturation Volume Estimation For
Multi-Segment Copper
Interconnect Wires
In this chapter, we present a new formula for fast estimation of the void’s satu-
ration volume for general multi-segment interconnect wires. The new formula is based on
the fundamental atom conservation at the steady-state condition of void growth phases.
The new void saturation estimation formula agrees with the existing single segment wire
saturation void volume formula and is an natural extension of the single segment case to
general multi-segment wires. In addition, we consider impacts of the void volume on final
stress distributions of the wire to further improve the accuracy of the proposed formula.
The proposed saturation void volume estimation can be applied for fast EM immortality
check for nucleated wires, which were considered to be failed wires in the past. Combined
54
EM immortality check for nucleation [20,21] mentioned in chapter 3, we propose a new EM
immortality check algorithm, which considers both void nucleation and void growth for the
first time and thus overcoming the conservativeness of the existing EM immortality check
method. Our numerical results show that the proposed formula agrees well with a published
work for two-segment cases, which are supported by experimental data. The formula is also
validated by the recently proposed physics-based 3D finite element (FEM) analysis tool for
general multi-segment interconnect wires. We also demonstrate new EM immortality check
flow can quickly identify the new type of immortal wires, which are nucleated but with
smaller-than-critical voids.
4.1 The new void saturation volume estimation for general
multi-segment wire
4.1.1 Review of the void saturation volume for single segment
As previously mentioned, computing the saturation void volume at steady state is
critical for the immortality check of a wire. After a void is formed in a segment, the tensile
(positive) stress around the void will gradually reduce to zero and the stress distribution
will become compressive (negative) as shown in Fig. 4.1.
From the physics perspective, once a void is formed, the void volume Vv(t) in a
multi-segment wire will satisfy the following atom conservation equation [25].
Vv(t) = −
∫
ΩL
σ(t)
B
dV (4.1)
55
0 5 10 15 20 25 30
Longitude (µm)
−5
−4
−3
−2
−1
0
H
y
d
ro
st
a
ti
c
st
re
ss
(P
a
)
×108
t = 0 s
t = 1000 s
t = 5000 s
t = 50000 s
t = 500000 s
Figure 4.1: The typical stress evolution on a 30 µm copper wire computed by finite element
analysis.
where ΩL is the volume of the remaining interconnect wire. Here W is width of wire and
h is thickness of the wire. In the steady-state, the stress is compressive and negative. As a
result, there is a negative sign in (4.1). For the one dimensional, single segment case with
wire length L, the steady state saturation volume (Vsat) of the void becomes
Vsat = −A
∫ L
0
σ(x)
B
dx = −AσmaxL
2B
(4.2)
where σmax is the maximum stress in steady state and A is its cross section area which can
be calculated by A = W × h. Since length of the void is typically much smaller than the
length of the wire (smaller than 1% of the segment), total length L is used here instead
of the length of the remaining interconnect without the void. Another formula considering
the length of remaining interconnect will be discussed in the next subsection.
56
If we only consider the one segment wire in the one-dimensional case as shown in
Fig. 4.2(a), Eq. (4.2) essentially says that the saturation volume depends on the product of
σmax and L. If we plot the stress versus the length for this one segment case, the product
actually is the area A1 in Fig. 4.2(b). Such observation is important as it will lead us to
the solution for the multi-segment case shown later.
(a)
(b)
Figure 4.2: (a) A one-segment wire and the direction indicate electron flow (b) Stress
integration area of a one-segment wire
57
To compute σmax, we need to go back to Korhonens equation, in the steady state,
in which we have
∂σ
∂x
=
σmax
L
= −jρeZ
∗
2Ω
σmax =− jρeZ
∗L
2Ω
= −VeZ
∗
2Ω
(4.3)
where V = j × L × ρ is the branch/node voltage between cathode and anode of the wire
which was also used to represent the immortality condition here. For a single wire segment,
the Blech’s criterion or Blech limit, is defined as (jL)crit =
Ω(σcrit−σinit)
eZρ , where σcrit is the
critical stress and σinit is the initial stress in the wire. As we can see that the immortality
condition of a wire (jL < (jL)crit) depends on the product of j (current density) and L
(length). The Blech limit actually can also be written as (jL)crit ∗ ρ = Vcrit = Ω(σcrit−σinit)eZ .
As a result, the immortality criteria can be further written in terms of critical branch voltage
Vcrit. Recent study shows that using the branch voltage for immortality check will become
much more convenient than current density for the multi-segment case as shown in [20,21].
Using (4.2) and (4.3), we get
Vsat =
AjρeZ∗L2
2ΩB
=
AVeZ∗L
2ΩB
(4.4)
where A is the cross-section area of the wire: A = W × h. If we have the initial stress
distribution, then
Vinit =
AσinitL
B
(4.5)
58
Therefore, void saturation volume, Vsat, for a single wire can be expressed as
Vsat = A(
σinitL
B
+
jρeZ∗L2
2ΩB
) = A(
σinitL
B
+
VeZ∗L
2ΩB
) (4.6)
which agrees exactly with [38]. However, this method only works for one dimensional single
wires.
4.1.2 Proposed void saturation volume for multi-segment wires
In this work, we propose a formula to estimate the saturation volume for general
multi-segment interconnect wires where each wire segment may have different widths.
Before we present our work, we would like to remark that void volume depends
on many factors such as shape, location. The void location and shapes are stochastic in
nature and depend on the capping and barrier materials used and the fabrication process.
The void shape also keeps changing over time and the void itself may migrate as well during
the growth process [39, 40]. As a result, It is very difficult to completely consider all those
effects. As with all the modeling work, we have to make some assumptions based on the
fundamental physics of EM effects and wire structures. Specifically, in our work, we assume
that the void has a square shape initially and eventually will grow to occupy across-section
of the wire with moving edges toward the atom migration direction. Eventually the void
shape will become the shape of a wire or via section [12]. For the void position, statistically,
the void most likely nucleate in or close to the cathode node of the wire. For the copper
dual damascene wire structure, if the via is not blocked by the void, the resistance does not
59
increase. So the volume above the via can be recognized as the critical void volume (void
fatal volume) [41, 42].
As mentioned above, void is typically formed around the cathode node of the wire.
Since tensile stress at cathode is the highest part of the wire, stress at this position exceeds
the critical stress first and void formation starts here. However, experimental data indeed
shows that voids can also form somewhere near the cathode nodes [43]. But the location of
the voids will not significantly affect the saturation volume estimation as the steady-state
compressive stress distributions of the wires will be similar if the void location with zero
stress is not far away from the cathode node. If void volume occupies the significant portion
of the wire, zero stress location may not be very close to the cathode. But in this case, the
wire will be mortal anyway and the estimation of saturation volume is less important.
Based on the previous observation, for the multi-segment case, based on (4.2), the
saturation volume will be equal to the area under the multi-segment wires (A1 + A2) as
shown in Fig. 4.3 for a two-segment case with equal widths. As a result, the saturation
volume can be boiled down to computing those areas as shown below:
In general, for a single segment (such as L1from Fig. 4.3(a)), the stress between
cathode and anode can be expressed as
σc − σa = (Va − Vc)eZ
Ω
=
jLρeZ
Ω
(4.7)
where σc and σa represent stress on cathode and anode respectively, and Vc and Va represent
voltages on cathode and anode respectively. At the steady state , the stress is linearly
distributed on the metal wire as shown by the shaded areas in Fig. 4.3(b). Since we need
60
(a)
(b)
Figure 4.3: (a) A two-segment wire and the direction indicate electron flow (b) Stress
integration area of a two-segment wire
to consider the width of each segment, which may be different, the problem becomes a 2
dimensional stress-area integration problem. Then the contribution from the steady-state
void volume, Vsat,i, for a segment i, can be found by computing the stress-areas (width
61
times length of the wire):
Vsat,i = h× ((−σc,i) + (−σa,i))× LiWi
2B
= h× (−2σc,i + VieZ
Ω
)× LiWi
2B
= h× (−2σc,i + jiLiρeZ
Ω
)× LiWi
2B
(4.8)
where Vi,ji,Li,Wi are the voltage difference between anode and cathode, current density,
length, and width of ith segment respectively. h is the thickness of the wire, which is the
same for all the wire segments. σc,i is the steady state stress on the cathode of segment i,
which becomes 0 where the void is nucleated. We note that except for the segment with
the void, steady-state stress on the cathode node of other segments are actually the anode
of the segment connected to them.
Now we are ready to calculate the total void volume at the steady-state, which
essentially is the void saturation volume. Given the steady-state void volume contribution
from segment i, Vsat,i in (4.8) . The total void volume can be calculated by adding all the
contributions together based on (4.1). Then we have the following results:
Proposition 1. For a general multi-segment wire, assume that a void is formed in the
cathode node of one of the segments and all the initial stress’ are zero. Then the saturation
volume of the void Vsat,total can be computed as
Vsat,total =
∑
i
Vsat,i = h×
∑
i
(−2σc,i + VieZ
Ω
)× LiWi
2B
= h×
∑
i
(−2σc,i + jiLiρeZ
Ω
)× LiWi
2B
(4.9)
62
where Vsat,i defined in (4.8) represents the saturation volume contribution from the ith
segment. If a non-zero initial stress is considered, we can add the initial stress contributions
as shown in (4.6).
Eq. (4.9) shows that the saturation volume is a quadratic function of wire segment
lengths and a linear function of wire width in the interconnect trees. As a result, the
saturation volume can be adjusted or controlled by wire width and length sizing. Vsat,total
is smaller than the critical volume, resistance will not increase. So the interconnect tree
can be considered as immortal in this case. And the interconnect tree would fail if the
saturation volume is larger than the critical volume.
However, since a void is formed at the cathode of the main branch, the length
of the main branch is reduced. In order to have better accuracy, a reduced main branch
with length of Lˆm is used in the calculation. Since the final saturation volume affects the
remaining length of the main branch, which further impacts the final saturation volume,
iterations are required to compute the final results. But for our method, we only iterate
once, which is sufficient as we show later. Thus a new formula is as follows:
Lˆm = Lm − Vsat,total
hWm
Vˆsat,total = h×
∑
i 6=m
(−2σˆc,i + VieZ
Ω
)× LiWi
2B
+ h× (VmeZ
Ω
)× LˆmWm
2B
= h×
∑
i 6=m
(−2σˆc,i + jiLiρeZ
Ω
)× LiWi
2B
+ h× (jmLˆmρeZ
Ω
)× LˆmWm
2B
(4.10)
63
Note that the σc,i also changes to σˆc,i since the stress on main branch changes. In
the following, we go through a few example to illustrate the new formula (4.9). The first
example is a three terminal wire shown in Fig. 4.3. Here, stress at node 1 and node 2 can
be expressed as
σ1 = 0− (V1 − 0)eZ
Ω
= −j1L1ρeZ
Ω
σ2 = −σ1 − (V2 − V1)eZ
Ω
= −(j1L1 + j2L2)ρeZ
Ω
(4.11)
Fig. 4.4 shows calculated stress at steady state during growth phase. The stress estimation
agrees with the results in Eq. (4.11). As a result, the void saturation can be computed as:
Length(um)
0 20 40 60 80 100
S
tr
es
s(
P
a
)
×109
-3.5
-3
-2.5
-2
-1.5
-1
-0.5
0
A1
A2
Figure 4.4: Stress distribution for two-segment wire at steady state
Vsat,2seg =h× −σ1L1W1 + (−σ1 − σ2)W2L2
2B
=h× (j1L
2
1W1ρeZ
2BΩ
+
(2j1L1 + j2L2)L2W2ρeZ
2BΩ
)
(4.12)
64
If we consider the void volume effect using (4.10), the reduced main branch length
Lˆ1 can be calculated by Vsat,2seg in equation (4.12)
Lˆ1 = L1 − Vsat
hW1
Vˆsat,2seg =h× −σˆ1L1W1 + (−σˆ1 − σˆ2)W2L2
2B
=h× (j1Lˆ
2
1W1ρeZ
2BΩ
+
(2j1Lˆ1 + j2L2)L2W2ρeZ
2BΩ
)
(4.13)
For the other example we will consider the T-intersection shown in Fig. 4.5 In this
Figure 4.5: A T-shaped wire (Arrows indicate electron flow)
case a void will be formed at node 0, stress at other nodes can be calculated as:
σ1 = −V1eZ
Ω
= −j1L1ρeZ
Ω
σ2 = σ1 − (V2 − V1)eZ
Ω
= −(j1L1 + j2L2)ρeZ
Ω
σ3 = σ1 − (V3 − V1)eZ
Ω
= −(j1L1 + j3L3)ρeZ
Ω
(4.14)
Fig. 4.6 shows stress at steady-state during the growth phase.
65
Length(um)
0 50 100
S
tr
es
s(
P
a
)
×109
-2.5
-2
-1.5
-1
-0.5
0
A1
A2
(a)
Length(um)
0 20 40 60
S
tr
es
s(
P
a
)
×109
-3
-2
-1
0
A3
(b)
Figure 4.6: (a) Stress on horizontal segment 0-2; (b) Stress on vertical segment 1-3
Here, the saturation void volume can be calculated as
Vsat,3seg =h× −σ1L1W1 + (−σ1 − σ2)L2W2
2BΩ
+
(−σ1 − σ3)L3W3
2B
=h× (j1L
2
1W1ρeZ
2BΩ
+
(2j1L1 + j2L2)L2W2ρeZ
2BΩ
+
(2j1L1 + j3L3)L3W3ρeZ
2BΩ
)
(4.15)
We can also consider volume effects by converting L1 to Lˆ1 as shown in equa-
tion (4.10) to improve the simulation accuracy.
In order to validate this model, we validate the estimated saturation void volume
with a recently proposed 3D FEM analysis tool [12, 44]. This tool employs stress model
in [6], current density model, as well as Joule heating induced temperature model in the
FEM based analysis method for void growth simulation. This model has high accuracy
with the FEM analysis hence we use it to validate the model proposed in this study. The
validation results are shown in section 4.3.
66
4.2 New EM immortality check algorithm
In this section, we propose a new EM immortality check algorithm for general
multi-segment wires based on the new formula for estimating the saturation void volume
discussed in the previous section.
First, we review the recently proposed EM immortality check for nucleation, called
the voltage-based EM or VBEM method [20,21]. In the new EM immortality check model,
which can be viewed as a natural extension of the well-known Blech product [5] for immor-
tality check of a single wire. The immortality of an interconnect tree can be summarized
as
Vcrit,EM > VE − Vcat (4.16)
where Vcat is the voltage at cathode. Vcrit,EM is the critical EM voltage defined by
Vcrit,EM = 1β (σcrit − σinit) where σinit is the initial stress. VE is called EM voltage, which
is computed as VE = 12A
∑
k 6=g akVk, where Vk is the normal nodal voltage (with respect to
cathode node g) at node k of the wire. ak is the total area of branches connected to node
k. Since generally the cathode node has the lowest voltage within an interconnect wire, we
may just check the cathode node using (4.16). However, this EM immortality check does
not consider the case where a void is nucleated in a wire, but the saturation void volume is
less than the critical volume.
As a result, we propose following new EM immortality check algorithm: Given
a new multi-segment wire W with its branch current and node voltage given, the new
algorithm first checks (4.16) for the cathode node. If it passes, then the wire can be
considered as immortal. Otherwise, we compute the saturation void volume Vsat from
67
(4.9) or (4.10). If Vsat < Vcrit, the wire is still immortal. Otherwise, it is mortal. Then
numerical methods are needed to compute the time to failure [9]. The whole algorithm flow
is illustrated in the Fig, 4.7.
Figure 4.7: EM immortality check algorithm flow
The new EM immortality analysis flow in Fig. 4.7 can be integrated with the P/G
network analysis tool for more accurate full-chip EM-aware IR drop analysis. When void
volume reaches over the critical volume, the current starts to flow over the barrier layer
68
and wire resistance starts to increase. But for power grid networks with redundant wiring,
the current indeed can start to flow alternative paths in the P/G networks. As a result,
the resulting P/G networks will become a time-varying network and the wire resistance
changing over time can be modeled by the proposed method, which will be very efficient
(closed form expression) and more accurate than the previous method to further remove
the conservativeness.
4.3 Numerical results and discussions
In this section, we validate the proposed saturation volume EM model by compar-
ing it against other models for post-voiding process. In subsection 4.3.1, we first validate
the saturation volume of the two segment wire in Fig. 4.3 using the method in [45] and the
proposed method. In subsection 4.3.2 and subsection 4.3.3, we study a T-shaped wire and a
more complicated 13-segment wire against the physics-based 3D FEM analysis tool [12,44]
respectively. And in subsection 4.3.4, we compare the proposed EM immortality check
algorithm with the VBEM method [20, 21] using 2-segment wire test cases with different
lengths and current densities.
4.3.1 Model validation on two-segment wire
For a two-segment wire structure, the saturation void volume estimation was pro-
posed in [45] where segment 2 is treated as a reservoir (j2 = 0). However, in this work,
the problem is still considered as 1D where all wire segments are assumed to have the same
69
width. The saturation void volume, Vmax, computed using this method is given below:
Vmax/Wh = L1 + L2 − B
K
[
√
(
Kp
B
+ 1)2 +
2L1K
B
− 1] (4.17)
where K = eZρj/Ω. Note, Vmax/wh is actually the saturation length. This work considers
the void size formulated in the cathode of the L1 segment and its impact on the stress
distributions. In order to validate our model, we compare the void volume computed using
this model against with proposed model. Our analysis shows that the void size can be small
compared to the segment length and therefore negligible. Comparisons of the void volume
calculated using method in [45] and proposed method are presented here. The results are
shown in Table 4.1. Where V1 is the saturation volume calculated using [45], Vw/o is the
Table 4.1: Comparison of void area of two methods (wire width = 1um thickness = 0.3um)
L1 (um) L2 (um) j (A/m
2) V1 (um
3) [45] Vw/o (um
3) Vw (um
3)
10 10 1010 0.0183 0.0184 0.0183
20 10 1010 0.0485 0.0491 0.0485
10 20 1010 0.0303 0.0307 0.0303
10 10 5× 109 0.0092 0.0092 0.0092
saturation volume calculated using the proposed method without void volume effect and
Vw is the saturation volume calculated using the proposed method considering void volume
effect. Among the four test cases in Table. 4.1, the maximum difference is only 1.32%
Between V1 and Vw/o. And the result Vw considering void volume effect is almost the same
with V1. It can be seen that the proposed method without void volume effect is already
very close to the method in [45] and if the void volume effect is considered, their results
matchs exactly.
70
4.3.2 Model validation on a T-shaped wire
In this subsection we compare the saturation volume estimate by proposed method
and a physics-based 3D FEM analysis tool [12,44] on a T-shaped three segment wire. Four
test cases with their estimated void volumes generated by the FEM tool is shown in Fig 4.8.
The results of aforementioned test cases are shown in Table. 4.2.
(a) (b)
(c) (d)
Figure 4.8: (a) Case 1; (b) Case 2; (c) Case 3; (d) Case 4
Table 4.2: Comparison of void area of two methods (wire thickness = 0.1um)
Case 1 Case 2 Case 3 Case 4
L1 (um) 10 10 20 10
L2 (um) 10 10 10 20
L3 (um) 10 10 10 10
W1 (um) 1 4 1 1
W2 (um) 1 1 1 1
W3 (um) 1 1 1 1
j1 (A/m
2) 1010 1010 2010 2010
j2 (A/m
2) 1010 1010 1010 1010
j3 (A/m
2) 0 0 0 0
VFEM (um
3) 0.0125 0.0185 0.0510 0.0368
Vw (um
3) 0.0121 0.0183 0.0495 0.0356
Where VFEM is the saturation volume calculated using FEM analysis tool, Vw is
71
the saturation volume calculated using the proposed method considering the void volume
effect. We can see among the four test cases in Table 4.2, the maximum difference is only
3.2% between VFEM and Vw.
4.3.3 Model validation on a more complicated wire
Now we compare our estimation with the FEM simulation from physics-based 3D
FEM analysis tool [12,44] on a more complicated wire structure with 13 segments as shown
in Fig. 4.9. Here, the width of all the wires is 0.2 um and thickness is 0.1 um. The current
Figure 4.9: Complex multi-segment structure.
densities and lengths of all the segment are given in Table. 4.3, where Brch is branch index,
J is current density, Lth is the length of the wire. Critical void volume is 0.004 um2. As
shown in the Fig. 4.10, the void is formed on segment1.
The saturation void volume estimated by the proposed method is 0.1025um3 while
the saturation void volume calculated by FEM analysis tool is 0.1021um3. As we can see,
the results are almost identical (about 0.39% difference). We remark that the errors for this
72
Table 4.3: Parameters for the 13-segment interconnect wire
Brch# CD (A/m2) Lth (um) Brch# J (A/m2) Lth (um)
1 10× 109 10 8 15× 109 10
2 5× 109 10 9 5× 109 10
3 5× 109 20 10 10× 109 30
4 5× 109 10 11 5× 109 20
5 10× 109 40 12 5× 109 20
6 5× 109 20 13 5× 109 10
7 5× 109 20 – – –
complicated wire structure is much smaller than the previous T-shaped wire. The reason is
that the saturation volumes for the T-shaped wire has much smaller while the 13-segment
wire has much large void volume compared to the length of the wires. As a result, we
obtained the much smaller relative errors for the later case.
4.3.4 The results from the new EM immortality check flow
Last, not lest, we show that we can easily identify a new type immortal wires, which
have voids formed and still remain immortal due to the voids being the critical void volumes.
We use 2-segment wires with different lengths and current densities to perform the proposed
new EM immortality check considering both saturation volume and EM immortality check
for nucleation in section 4.2. The result is shown in Table. 4.4. where VBEM represents the
Table 4.4: Comparison of void area of two methods (wire width = 1um thickness = 0.3um)
L1 (um) L2 (um) j1 (A/m
2) j2 (A/m
2) VBEM Proposed method
case1 10 10 1010 1010 Immortal Immortal
case2 20 10 1010 1010 Mortal Immortal
case3 10 50 2010 2010 Mortal Mortal
EM immortality check for nucleation for general multi-segment wires [20,21]. As we can see
in Table. 4.4, case 2 is recognized as mortal in the VBEM method. However, in our method,
73
(a)
(b)
Figure 4.10: (a) Complex multi-segment structure result from FEM analysis tool. (b) Result
zoom to void area
we can see this wire is actually immortal as the void volume is smaller than the critical
volume. That means the VBEM method is still too conservative and the new method can
74
identify more immortal cases in the EM analysis, which can translate into further reduced
guard bands and reduced circuit areas and better performance.
4.4 Summary
In this chapter, we have presented a novel formula to fast estimation of void’s sat-
uration volume for general multi-segment wires. Presented new formula is less conservative
than the existing method since void growth phase is also considered. The new void satura-
tion estimation formula agrees with the existing single segment wire saturation void volume
formula and is the natural extension of the single segment case to general multi-segment
wires. Based on the new formula, a new EM immortality check flow is proposed including
both recently proposed nucleation-based EM immortality and the void saturation volume.
The new flow can further reduce the conservativeness of existing EM failure effect analysis
which only considers the nucleation-based EM immortality check.
75
Chapter 5
EMSpice: Physics-Based
Electromigration Check Using
Coupled Electronic and Stress
Simulation
In this chapter, we present a new full-chip EM failure analysis tool, called EM-
Spice, for physics-based coupled EM and electronic analysis. EMSpice takes power grid
netlists from Synopsys ICC flow, and outputs the failed EM wires and their resistance
changes and resulting IR drops of the power grids over the given aging time. It can be
used as a EM sign-off tool or used in the EM optimization engine for commercial EDA
physical synthesis flow. EMSpice simulator simultaneously considers the hydrostatic stress
and electronic current/voltage in a power grid network. The new method starts from first
76
principles and can solve the resulting coupled time-varying partial differential equations
in time domain to accurately stress evolution in multi-segment interconnect trees for EM
failure analysis. EMSpice simulator employs a finite difference time domain (FDTD) solver
for stress analysis for every interconnect tree for both nucleation and post-voiding phases.
At the whole power grid circuit level, EMSpice simulator also couples the EM analysis with
IR drop analysis at time domain. Thus the solver can consider the interplay among stress,
void growth, resistance change and IR drop in a single simulation framework. Furthermore,
EMSpice simulator considers both recently proposed nucleation phase immortality and in-
cubation phase immortality for the first time to remove immortal interconnect trees from
EM analysis. EMSpice simulator can work seamlessly with Synopsys IC (ICC) compiler
physical synthesis flow. EMSpice simulator reads the power grid layout from Synopsys ICC
and it can output the power grid layout with current density, hydrostatic stress, IR drop,
and void distributions for any given specific time on the power grid layout in a graphic way
for better EM failure analysis and optimization.
5.1 Related works
A few full-chip EM analysis for power grid networks have been proposed re-
cently [11, 21, 46, 47]. Their methods can predict the EM lifetime of the power grid and
obtain failed trees. Specifically, Huang et al. proposed first physics-EM model based full-
chip EM analysis method [7, 46]. The EM model in this method only considers nucleation
and growth phase. An approximate closed form obtained from analytic solution of the
nucleation phase of Korhonen’s equation is employed to estimate nucleation time (tnuc).
77
Initial current densities are used for tnuc. Resistance starts to increase immediately when
the void is nucleated, i.e. t > tnuc. If the time exceed the tnuc, a simple linear model is
used for resistance change calculation. This method indeed considers interaction between
the EM and IR drops of power grids but the compact EM models are less accurate.
To get better accuracy, Chatterjee et al. proposed finite difference method (FDM)
based full-chip EM analysis tool [11, 47]. In order to accelerate the simulation, a fitting-
based immortal wire filter was applied in this work. Trees most likely to form a void were
picked out and the FDM based method were only applied to solve these trees. The filter
uses a conservative approximation model [48] to quickly estimate the nucleation time, tnuc.
If this estimated time is less than a time threshold, the trees were picked out for further
EM simulation. Time threshold in this work [11] is estimated by the Monte Carlo process,
specifically, is the mean of samples’ TTF which has a limited normal distribution. Here
the samples are from an IBM power grid. FDM was applied to solve these mortal wires
and a model order reduction technique based on matrix exponential form was applied to
reduce simulation costs. However, this method also mainly considers the EM stress without
considering impacts from wire resistance changes of power grid networks.
Cook et al. proposed a FDM accelerated by Krylov subspace based reduction
technique [10]. This method can be applied to general multi-segment interconnect wires
with time-varying currents and temperature. To further speed up FDM, the Krylov based
subspace reduction techniques in frequency domain were applied, which leads to order of
magnitude speedup over plain FDM. But this method still considers the EM stress and
ignores the EM and IR drop interaction in power grid networks.
78
5.2 The proposed new coupled EMSpice simulation for EM
failure check
5.2.1 The new power grid EM check flow
Before presenting the coupled EM-IR drop analysis for full-chip EM check and
analysis, we first present the proposed new EM sign-off and check flow as shown in Fig. 5.1.
The whole EM check flow mainly consists of four major steps: the power grid
generation step from the EDA tool (Synopsys ICC), EM immortality filtering step, the
FDTD EM solver and linear network IR drop solver, and the EM check framework GUI. In
the power grid generation step, the power grid information is constructed from Synopsys
IC Compiler (ICC) during the physical synthesis process for a specific design. Power grid
information is dumped during power grid synthesis step in ICC flow and the P/G layout
geometrical and layer, via information as well as branch currents are also dumped at the
same time. After this, the power gird and corresponding branch current are first passed
to the EM immortality filter step (to be discussed in subsection 5.2.3). Immortal trees are
filtered out since they will never fail and resistance of these trees will never change. After
this step, all the mortal trees are passed to the coupled solver step. The coupled solver
consists of the FDTD for EM stress solver and linear network IR drop solver (which will be
discussed in section 5.2.2). In this step, hydrostatic stress on these mortal tree wires will
be simulated and void position and volume size will be calculated. All this information will
then be passed to the EM check framework graphical user interface (GUI) for interactive
user analysis. The EM check framework GUI can show the current density, hydrostatic
79
Figure 5.1: Simulation Framework for EMSpice simulator
stress, IR drop, and void distributions for a given specific time. In the following section, we
will present computing steps in detail.
5.2.2 Physics-based coupled multi-physics power grid simulation
Figure 5.2: Block diagram of coupled FDTD and IR drop power grid simulator
In order to improve the accuracy of the EM stress estimation in the full-chip power
grid level, we have to consider multiple physics effects and their interaction: the stress
evolution modeled by (2.2), atom conservation equation for void volume, and electrical
potential change over time. It was shown, when void is formed, that the void volume
(shapes) and the stress distributions of the remaining wire are correlated by the following
80
atom conservation equation [25]
VS(t) =
1
B
∫
ΩL
σ(V, t)dV, ∀t (5.1)
where VS is void volume. As a result, we can estimate void volume or sizes once we know
the stress distributions.
For electrical potential, since EM analysis is done on a time scale of at least days,
we treat the electrical potential distribution u as quasi static so it can be described by the
strong form Laplace equation
∇ ·
(
1
ρ(φ)
∇u
)
= 0, inΩL, (5.2)
u = gu, on ∂ΩL ∩ Γu, (5.3)
~n · 1
ρ(φ)
∇u = gj , on ∂ΩL ∩ Γj (5.4)
where ρ(φ) is the copper resistivity that is implied by the phase field, Γu denotes boundaries
where Dirichlet (voltage) conditions are applied and gu represent voltage/potential source,
Γj denotes the boundaries where Neumann (normal current flux) conditions are applied
and gj represents current density source. For our full-chip EM analysis, however, it will
be very expensive to solve (5.2) using the numerical approaches. Instead, we assume that
homogeneous current density distribution along the wires and simple resistance network
is extracted from the power grid network layout. The modified nodal analysis (MNA) is
applied, which will lead to
M(t)u(t) = PI(t) (5.5)
81
whereM(t) is the admittance matrix for the power grid network, which is time-varying due
to the fact that wire resistance will change with EM failure effects over time. P is the b× p
input matrix, where p is the number of inputs or the size of driving current density sources
I(t). u(t) represent the nodal voltages in the network and I(t) are the current sources
from the function blocks of the chips. From this equation, we can also compute the branch
current and nodal voltages of all the branches, which are required for both immortality
check and Korhonen’s equation.
For Korhonen’s equation, the FDTD method is carried out for each multi-segment
interconnect tree for detailed stress analysis over time. We follow the similar work [10]
without the model order reduction feature. After the finite difference discretization process,
the partial differential equation in (2.2) with boundary conditions will be converted to the
following linear time invariant (LTI) system:
Cσ˙(t) = Aσ(t) +PI(t), (5.6)
σ(0) = [σ1(0), σ2(0), ..., σn(0)]
C,A are n×n matrices. P is the b× p input matrix, which is also used in (5.5). Note that
σ(0) is the initial stress at time 0. For each new simulation step, the stress from previous
simulation will be used as initial condition. Now we are ready to present the final coupled
82
EM-IR drop partial differential equations which can be written as follows:
Cσ˙(t) = Aσ(t) +PI(t),
Vv(t) =
∫
ΩL
σ(t)
B
dV,
M(t)× u(t) = PI(t),
σ(0) = [σ1(0), σ2(0), ..., σn(0)] , at t = 0
(5.7)
The three equations are coupled and solved together as shown in Fig. 5.2. Linear
network IR drop solver passes time-dependent current density information and P/G layout
information to the FDTD EM solver. Once the voids are formed and IR drop occurs in
the power grid, the current at each time step will be different. The FDTD EM solver will
provide the IR drop solver with new resistance information for wires with voids. As we can
see, these two simulations are coupled together, and wire current and resistance depend on
each other for each mortal wires. Note that C,A matrices which depend on wire structures
are time-independent in the coupled equation.
Wire length change with void growth is also considered in the coupled simulation.
The length of segment with void is updated each iteration and become shorter with void
growth. This change is very small, and do not have very significant effect on the stress
distribution on the tree since the void size generally is less than 1% of the total size of a
tree.
Impact of resistance change of power grid networks may loosen the EM constraint
significantly since current densities of the interconnects reduce. However, this effect is
not thoroughly studied in aforementioned methods in section 5.1. Huang’s method [7, 46]
83
considers this impact in the growth phase with linear model. But the accuracy of compact
linear model is not high enough for resistance estimation. Besides, Chatterjee’s method [11,
47] and Cook’s method [10] focus on EM stress analysis and ignore the impact of resistance
change caused by EM and IR drop interaction in power grid networks.
5.2.3 EM immortality filtering
FDTD based EM simulation is still computationally intensive for full chip EM
analysis. With the sheer number of interconnect trees in a chip, FDTD is not efficient.
However, in most designs, the tensile stress of many trees will not exceed the critical level
and no void will be formed on these trees. Besides, trees that do nucleate a void do not mean
they will fail since the void may not be large enough to exceed the critical void volume. If a
tree is either nucleation phase immortal or incubation phase immortal, the resistance does
not change. In other words, these trees will not have any impact on the IR drop and FDTD
EM simulation is not necessary on these trees. We first present the immortality filter flow.
The immortality filtering flow
The immortality check flow is shown in Fig. 5.3, which consists of two filtering
algorithms. First, a tree in the power grid is checked by nucleation phase filtering to see if
voids can be nucleated. If no void can be nucleated, the wire is immortal. If voids will be
nucleated, the tree is then passed to incubation phase immortality filter. If voids volume
cannot exceed critical void volume, the tree is still treated as immortal. Only the mortal
trees whose resistance will change are simulated with coupled EM-IR simulation.
84
Figure 5.3: Block diagram of EM immortality filter
We remark that the Blech’s product was widely used to find the immortal wire
before. However, it only works for single-segment [5]. For multi-segment wires, an empirical
curving-filtering based approximation method has been proposed [11,47] which is discussed
in section 5.1. However, the accuracy of this method depends on the training of the models.
In contrast, the proposed EM filtering method is based on physics-based models for the
immortality and thus is more accurate and predictable for over a wide range of stress
conditions and technology nodes. Also we consider both nucleation and incubation immortal
cases for the first time. In the section, we briefly discuss the two filtering algorithms.
85
Nucleation phase immortality filtering
If the stress on the cathode of a tree, at steady state (σsteady) , is lower than
the critical stress (σcrit), then the tree is considered to be EM nucleation phase immortal.
Based on the well-known steady state analysis method Blech product [5], if the current
density and the wire length are not large enough, σsteady cannot exceed σcrit. However, this
method is only suitable for a single wire with only one branch. Recently, a voltage-based
EM nucleation phase immortality analysis method for multi-branch interconnect trees has
been proposed [20, 21].
In this method, an EM voltage (VE) which is proportional to stress at the cathode
node (σc) is calculated as
VE = 1
2A
∑
i 6=c
aiVi (5.8)
where Vi is the normal nodal voltage (with respect to cathode node c) at node i, ai is the
total area of branches connected to node i and A is the total area of the wire. That equation
is derived from the equation for atomic conservation equation. Relationship between σc and
area of segments as well as corresponding nodal voltages is shown in that equation. With
the voltage of the ith node(Vi), steady state stress at that node (σi) can be calculated as
σi = β(VE − Vi), where β = eZΩ , e is elementary charge. A critical EM voltage Vcrit,EM is
defined by
Vcrit,EM = 1
β
(σcrit − σinit) (5.9)
where σinit is the initial stress.
86
EM failure will not happen if cathode node can pass the immortality check since
it has the lowest voltage among all the nodes on a tree. Following equation is the condition
of immortality of the tree:
Vcrit,EM > VE − Vcat (5.10)
where Vcat is the voltage at the cathode.
Incubation phase immortality filtering
Voids are formed on trees that do not pass the nucleation phase immortality. After
a void is formed, it will keep growing until saturated. Void saturation happens when two
kinds of flux balance with each other. One is the flux of atoms previously located in metal
which is consumed by the growing void and the other is the back flux of atoms generated by a
gradient of growing stress. If the void volume is smaller than the volume of the intersection,
which is recognized as critical volume Vcrit, current can still flow through the copper wire
as the wire cross-section is not blocked by the void. Once the void size is large enough
and occupies the wire cross-section, current has to go through the liner whose resistivity is
much higher than copper and the resistance of the wire will increase. As can be seen, if the
saturation volume is smaller than critical volume, the wire is still immortal although a void
is formed. This case is called EM incubation phase immortal.
In order to determine if a wire is immortal, a model predicting the saturation
volume Vsat was proposed in [49] shown as following
87
Vsat =
∑
i
Vsat,i = h×
∑
i
Asat,i
= h×
∑
i
[(−2σc,i + jiliρeZ
Ω
)× liwi
2B
]
= h×
∑
i
[(−2σc,i + VieZ
Ω
)× liwi
2B
]
(5.11)
where h is thickness of the wire and Vsat,i, Asat,i, σc,i, Vi, ji, li and wi represent the
contribution to void volume, contribution to void area, stress at the cathode, current density,
voltage and length and width of the ith segment respectively. For the segment in which
a void has nucleated, σc,i is 0 on the cathode where the void is nucleated. Except for the
segment with the void, steady-state stress on the cathode of other segments are the same
as the anode of the segment connected to them.
As shown in the Eq. (5.11), voltage and width on each branch i can contribute to
the void volume. So saturation void volume can be adjusted in order to reach incubation
immortality by modifying the voltage and width of the branches.
In order to know if the wire is EM incubation phase immortal, saturation volume
is compared with the critical volume, specifically, it should satisfy the following equation
Vcrit > Vsat (5.12)
We remarked that our incubation phase filtering is based on the assumption that
void is nucleated in cathode node, which has largest tensile stress. However, for cases that
88
void nucleated outside cathode nodes, the immortality study in this case needs to be future
investigated.
5.3 Numerical results and discussions
In this section, we present the numerical simulation and comparison results of the
proposed EMSpice simulator. In our implementation, EM immortality filtering and Linear
network IR drop solver are implemented in C++, FDTD EM analysis engine/solver was
implemented in Matlab and EM check framework GUI is implemented in Python 3.6.0 with
Matplotlib. The experiments were carried out on a Linux server with dual 3.3-GHz Xeon
processors and 316GB memory.
In order to validate our work, we use two test cases. The first test case is the
power grid of the Cortex-M0 DesignStart processor, named (Cortex). This is a 32-bit
processor that implements the ARMv6-M architecture [50]. This processor is synthesized
using Synopsys Design Compiler, and is placed and routed with Synopsys 32/28nm Generic
Library [51]. The power grid of Cortex has two layers, and there are 68 trees in total.
The second case is also microprocessor design called ChipTop. ChipTop was pro-
vided by Synopsys Electronic Design University Program [52]. It is a low power design
processor architecture, which contains cache blocks, I/O standard cells and digital stan-
dard cells. It is also synthesized using Synopsys Design Compiler, and is placed and routed
with Synopsys 32/28nm Generic Library [51]. ChipTop has a 4 layers power grid with 912
trees.
89
(a) (b)
Figure 5.4: (a) Part of power grid of Cortex in Synopsys ICC; (b) Part of power grid of
Cortex in proposed EM GUI
Fig. 5.4 shows the power grid layout of Cortex from Synopsys ICC and the same
power grid in the EM check framework GUI (graphic user interface) system. In the
Fig. 5.4(a), green boxes are standard cells and the mesh structure is the power grid. The
zoomed area of this power grid is also shown in Fig. 5.4(a). Fig. 5.4(b) is the power grid
from the EM check framework GUI. EM check framework GUI can also display the layout
in an interactive way like Synopsys ICC. Zoomed power grid is also displayed in Fig. 5.4(b).
Power grid information obtained from Synopsys ICC is then fed into the proposed
EMSpice simulator. For power grid network for Cortex, the EMSpice simulation takes about
67.14 second to finish. It takes 381.2 seconds for the simulation of power grid of ChipTop.
In the following, we present more analysis and comparison results for these two cases.
5.3.1 Accuracy comparison against existing EM analysis methods
We first compare the proposed EMSpice method with existing EM analysis meth-
ods in terms of accuracy. We verify the accuracy of the FDTD EM analysis engine used
Table 5.1: Void size comparison between EMSpice (FDTD) and COMSOL on a wire from
Cortex
Time after void formed 1 year 3 year 7 year
Void size from EMSpice (um3) 3.63 7.38 12.04
Void size from COMSOL(um3) 3.69 7.53 12.16
Error 1.65% 1.99% 0.98%
90
in the EMSpice for both nucleation and post-voiding phases on a few wire trees from the
Cortex design.
We first compare the EMSpice with a published EM simulator XSim [36] for stress
analysis of multi-segment wires in the nucleation phase. XSim is widely used to calculate
the hydrostatic stress evolution in an interconnect which is assumed to be confined within
diffusion barriers and it has been validated by the measured results. We take a tree from
Cortex for stress comparison in nucleation phase. Fig. 5.5 shows the stress comparison
between EMSpice and XSim at different time. Average error between these two method is
only 0.53%, which is very small and can be ignored.
um
1000 1050 1100 1150 1200
P
a
×109
-2
-1
0
1
2
3
4
5
EMSpice, t = 1year
Xsim, t = 1year
EMSpice, t = 2year
Xsim, t = 2year
EMSpice, t = 7year
Xsim, t = 7year
Figure 5.5: Stress comparison between the EMSpice and Xsim on a wire from Cortex
y
For EM post-voiding phase comparison, we compare EMSpice with the simulation
result of a finite element analysis (FEA) tool, COMSOL [35] based on the same partial
different equation and post-voiding boundary conditions in [24]. The void volume is also
computed by using atom conservation equation (Eq. (2.6)). A 2D wire structure was used
in the COMSOL simulation. Fig. 5.6 shows comparison on void size over time between
91
EMSpice and COMSOL, which shows that their results match very well.
Time (month)
0 200 400 600 800
V
oi
d
S
iz
e
(u
m
3
)
0
2
4
6
8
10
12
14
16
18
20
FDTD
COMSOL
Figure 5.6: Void size comparison between EMSpice and COMSOL on a wire from Cortex
design
Table 5.1 shows detailed void size comparison between the two methods on different
time for the aforementioned wire in Fig. 5.6. As can be seen, the error is less then 1.99%
and accurate enough for the void volume estimation.
We want to remark that Korhonen’s equations for EM failure modeling is well
accepted in the research communities as they are validated against the silicon data or
Table 5.2: Parameter for the EM simulation
Name Symbol Value
Atomic lattice Volume Ω 1.19× 10−29m3 [53]
Critical stress σcrit 500Mpa [33]
Critical EM voltage Vcrit,EM 3.69× 10−3V [21]
Critical volume Vcrit depends on via size
Effective bulk elasticity modulus B 1× 1011Pa [54, 55]
Effective charge number Z 10 [56]
Temperature T 373K
92
explain well the EM failure processes observed in experimental data in many published
works [6,24,39,48,57–59]. XSim [36] was also tested against the measured results from the
properly design wire test structure. As a result, comparison aginst FEM analysis based on
Korhonen’s equation and XSim will provide good accuracy valiation for EMSpice.
5.3.2 Filtering and coupled simulation results of Cortex
Firstly, efficiency of EM immortality filter is employed to reduce time of EM sim-
ulation of Cortex. This power grid has 68 trees in total. Among all of these trees, 42 trees
are filtered out by nucleation immortality filter and 15 mortal trees among them are filtered
out by incubation phase immortality filtering algorithm. So the proposed EM immortality
can filter out 78% trees, which translating to accelerating the whole simulation by almost
5 times. Fig. 5.7 shows the mortal trees (highlighted with current density distribution in
the whole chip layout). As we can see, most of the mortal wires are located around the
peripheral of the chip.
Figure 5.7: Current densities of the mortal trees. (X and Y in um and current densities in
MA/cm2)
93
After the filtering of the immortal wires, we start to analyze the lifetime of mortal
wires in the coupled EMSpice simulator. As time passes, the stress starts increasing and
voids are formed. Fig. 5.8 shows the stress evolution and void formation in the power grid.
(a) (b) (c)
Figure 5.8: (a) Stress distribution and void formation at 8th year; (b) 12th year; (c) 20th
year for Cortex design. (X and Y in um and stress in Mpa)
As can be seen from the Fig. 5.8 in the 8th year, all the wires are still in the
nucleation phase. Stress at some hotspots has increased to a high level. In the 12th year,
some voids are formed and stress near to the void starts decreasing. These wires get into
incubation phase and void sizes at that time are very small and not enough to block vias
cross sections. In the 20th year, more voids are nucleated and most of them are in the
growth phase. Some voids have grown to the saturation volume. As aforementioned in
section 2.1.2, stress accumulated on the cathode before void nucleation and decrease to zero
very fast after the void formed. A set of zoomed figures of void growth process is shown in
Fig. 5.9.
94
(a) (b) (c)
Figure 5.9: (a) Zoomed stress distribution and void formation at 8th year, (b) 12th year,
(c) 20th year of Cortex design.
Fig. 5.10 shows the voltage drop on the full-chip power grid. As we can see, there
are significant voltage drop in two areas where the void forms. It also shows that the void
formation causes the resistance change and leads to large voltage drop.
Figure 5.10: Voltage drop on the power grid of Cortex
Figure 5.11: Resistance change over time on a mortal tree on power grid of Cortex
Fig. 5.11 shows resistance change over time of one of the mortal trees experiences
the late failure effect. As shown in the figure, the void is formed at the end of nucleation
95
phase. However, the resistance does not change at that time. At the end of incubation phase,
the void occupies the cross section of the via, and resistance starts to change since this is a
late failure case. The resistance change pattern shown in Fig. 5.11 based on the three-phase
EM model mentioned in chapter 2, is consistent with the experimental observation [60].
When the increment of resistance reaches 10% (note that this failure criterion is used for
illustration purpose), this tree is marked as failed. Finally, the void saturates and resistance
change stops. Node voltages keep changing with time after void volume exceeds a critical
Figure 5.12: Voltage drop change over time for different nodes on power grid of Cortex
level. As shown in Fig. 5.12, voltage changes with void growth and a sudden voltage drop
happens once a new void exceeds the critical level. Compared with Fig. 5.12 and Fig. 5.13,
it can be observed that the time when sudden voltages drop matches well with the time
when voids exceed critical volume.
Figure 5.13: Resistance change over time of failed interconnect tree wires on power grid of
Cortex
96
Fig. 5.13 shows the resistance of some failed tree wires. We can see that trees
failed at different times. Both early failure and late failure can happen in this power grid.
Early failure cause open circuit and resistance goes to infinite. Resistance of late failure
trees increase gradually and finally stop when voids reach saturation volume.
5.3.3 Filtering and coupled simulation results of ChipTop
EM immortality filter is also applied for power grid of ChipTop which has 912
trees in total. As shown in Fig. 5.14, there are only 50 failure trees on the power grid.
Among all of these trees, 695 trees are filtered out by nucleation immortality filter and
177 immortal trees are filtered out by incubation phase immortality filtering algorithm.
After filtering, there are only 5.48% mortal trees which need further EM analysis. The
immortality filtering dramatically reduce the simulation by about 20 times. Fig. 5.14 shows
the mortal trees (highlighted with current density distribution in the whole chip layout) of
ChipTop design. We observe that many mortal tree wires are thin wires. The reason is
that these thin wires has small critical volume, although their saturation volume is not that
large, they are still mortal. On the other hand, some large wires have larger critical volume.
Even though their saturation volume is larger than thin wires, they are still incubation
phase immortal.
Fig. 5.15 shows void formation process of ChipTop. Voids are not nucleated at 8th
year. This time all trees are in the nucleation phase. As can be seen in the figure, stress
are accumulated at cathodes. At 10th year, voids start to be formed and many wires enter
incubation phase. We can see the stress near voids decreasing to zero in this figure. At
17th year, most voids are formed and some of them already reaching saturation volume.
97
Figure 5.14: Current densities of the mortal trees on power grid of ChipTop. (X and Y in
um and current densities in MA/cm2)
Zoomed figures of voids are shown in Fig. 5.16. As can be seen in the figure, the voids are
gathered in the edges of the power grid. Similar with the Cortex, the lines on the edge have
larger current densities and they are more vulnerable. Based on the result of these cases,
in the design, wires on edges of the power grid need to be optimized to avoid EM failure.
(a) (b) (c)
Figure 5.15: (a) Stress distribution and void formation on power grid of ChipTop at 8th
year; (b) 10th year; (c) 17th year. (X and Y in um and stress in Mpa)
Table 5.3: Comparison of failed tree number of three methods on the power grid of Cortex
Simulation time 8 years 12 years 20 years
Black’s method
Failed tree number 24 24 24
Failed tree percentage 35.3% 35.3 % 35.3%
Huang’s method
Failed tree number 11 13 16
Failed tree percentage 16.2% 19.1 % 23.5%
EMSpice simulator
Failed tree number 0 2 9
Failed tree percentage 0% 2.9 % 13.2%
98
(a) (b) (c) (d)
Figure 5.16: Void distribution on the ChipTop at 17th year on (a), left up corner(b) right
up corner, (c) left down corner, (d) right down corner
Fig 5.17 shows the voltage change on the power grid of ChipTop. The voltage
change significantly on the location where voids are formed. As can be seen, hill of voltages
changes are in the four corners of power grid. Both early failure and late failure contribute
to the voltage change in these locations. Fig 5.18 shows resistance change of some typical
Figure 5.17: Voltage drop on the power grid of ChipTop
wires. In the figure, two wires are early failure and the other two wires are late failure.
Here two early failure wires are with smaller current. They fail faster since the cross section
of the vias on these wires are also smaller. The other two late failure wires have larger
Table 5.4: Lifetime comparison of the three full-chip EM analysis methods on the power
grid of Cortex
Trees name
Time to Failure (years)
Black’s method Huang’s method EMSpice simulator
Tree 1 8.33 immortal immortal
Tree 2 4.45 9.17 immortal
Tree 3 3.08 6.16 16.67
Tree 4 2.17 4.50 14.58
Tree 5 1.67 3.67 10.41
99
current. They are wider wires with large cross section of the vias and long incubation
phases. So their resistance change happens later. As can be seen in the Fig 5.19, wires with
less current have less contribution to voltage change even though they have early failure and
have significant resistance change. Voltage change caused by these wires are only about 1%
of the voltage change caused by wires with large current density. However, for wires with
large current, there is very obvious voltage change when their resistance starts increasing.
Figure 5.18: Resistance change of mortal wires on the power grid of ChipTop
Figure 5.19: Cathode voltage drop of mortal wires on the power grid of ChipTop
Table 5.5: Comparison of failed tree number of three methods on the power grid of ChipTop
Simulation time 8 years 10 years 17 years
Black’s method
Failed tree number 227 227 227
Failed tree percentage 30.4% 30.4% 30.4%
Huang’s method
Failed tree number 98 144 165
Failed tree percentage 10.7% 15.8 % 18.9%
EMSpice simulator
Failed tree number 0 17 40
Failed tree percentage 0% 1.9 % 4.4%
100
5.3.4 Comparison against existing full-chip EM analysis methods
In this section, we compare the proposed EMSpice simulator against one existing
full-chip EM analysis method. We compare the EMSpice against two methods: the tradi-
tional Black’s method in which the time to failure is calculated for each segment based on
Black’s equation [4] with the same parameters used; we also employ Huang’s method [46]
as another baseline.
The comparison result of power grid of Cortex is shown in Table 5.3 and result of
power grid of ChipTop is shown in Table 5.5.
In the Black’s method, tree failure only depends on the current densities on that
tree. It cannot consider time-varying current change in the power grid. Compared with
other methods, Black’s method leads to most conservative results. Around 35% of the
trees in the power grid of Cortex and 30.4% of the trees in the power grid of ChipTop are
estimated to be failed by this method. Huang’s method considers both nucleation phase
and growth phase. But this method still is less accurate as compact models are used for
both nucleation and growth phases. It marks 23.5% trees in the power grid of Cortex and
18.9% trees in the power grid of ChipTop as the failed trees in the 20th year. The proposed
EMSpice simulator shows only 13.2% failed tree wires in the power grid of Cortex and 4.4%
failed trees in the power grid of ChipTop in the 20th year. As we can see, at the 20th
years, the number of the failed trees by EMSpice simulator is 63.5% less than the Black’s
method and 43.8% less than Huang’s method in the Cortex case. There are 76.7% less than
the Black’s method and 66.7% less than Huang’s method in the ChipTop case. As we can
see, EMSpice method can significantly reduce the over-conservations from the existing two
101
methods, which can lead to more aggressive design with less guard bands, which leads to
better performances under the same design costs.
Furthermore, we also look at the lifetime of some immortal and failed trees. Their
lifetime given by the three methods are shown in Table 5.4. As we can see, the proposed
EMSpice still gives the long lifetime estimation among the three methods for the five tree
wires. The difference can be quite significant (up to 6.71X longer). Further, we observe that
the immortal tree wire marked in the EMSpice method can be considered as mortal tree
wires in both the Black’s method and Huang’s method. The reason is that the EMSpice
method considers the unique incubation phase immortality case and it can identify the
nucleated wires as immortal as long as their void sizes are small enough.
We want to remark that longer lifetime estimation for EM analysis does not alway
mean better accuracy as recent work shows that Black’s model can be also optimistic as
well [11]. But in general, Korhonen’s based EMmodels, more physics-based EM immortality
filtering, and more accurate post-voiding analysis tend to lead to more accurate results,
which typically are less conservative than existing Black-Blech based EM models and other
compact models as shown in many recent works [9, 11, 46, 47, 61, 62].
Table 5.6: Lifetime comparison of the three full-chip EM analysis methods on power grid
of ChipTop
Trees name
Time to Failure (years)
Black’s method Huang’s method EMSpice simulator
Tree 1 12.14 immortal immortal
Tree 2 6.39 12.77 immortal
Tree 3 4.35 9.58 12.1
102
5.4 Summary
In this chapter, we have proposed a novel full-chip electromigration verification for
power grid network of nanometer VLSI chips. The new method simultaneously considers
two physics effects in interaction: the hydrostatic stress and electronic current/voltage in
a power grid network. It can solve the resulting coupled time-varying (partial) differential
equations in time domain to accurately predict time-to-failure for a power grid in the entire
chip. The new tool reads the power grid layout from Synopsys ICC. Then an immortality
filter considering both nucleation phase immortality and incubation phase immortality is
applied to remove immortal interconnect trees from EM analysis. A finite difference time
domain solver is employed for stress analysis for every interconnect tree. Metal atom con-
servation equation is used to estimate the void volume change and resistance increments
over time. The EM analysis is coupled with IR drop analysis of a whole power grid net-
works at each time step so that we can consider the interplay among stress, void growth,
resistance change, and IR drop in a single simulation framework. Accuracy of EMSpice has
been validated by comparing with a published EM simulator, XSim, for nucleation phase,
and finite element method based COMSOL for post-voiding phase. The comparison results
show that EMSpice agrees well with both methods very well.
103
Chapter 6
Dynamic Reliability Management
for Multi-Core Dark Silicon
Processor Based on Deep
Reinforcement Learning
In this chapter, we present a new deep reinforcement learning optimization frame
work considering both hard and soft errors for dark silicon processors. Our work is moti-
vated by the lack of online optimization tools and low efficiency of traditional Q-learning
algorithm in run time operation. We formulate our DRM problem as minimizing the energy
consumption subject to the reliability include hard and soft errors, power, performance and
thermal constraints. The hard error model is based on a recently proposed physics based
EM model. Hard and soft errors are obtained by recently proposed reliability models.
104
Power, performance are acquired by X86 based micro-architecture model [22] and thermal
information is given by temperature simulators. The control operation we use dynamic
voltage and frequency scaling (DVFS) and ON/OFF pulsing action. DRL method is used
to select the most cost effective operation.
6.1 Review of the EM and soft error reliability models
For the system level EM analysis, we assume that each core can have its own
voltage (voltage island) and frequency settings, and DVFS can be done locally for each
core. For the micro-architecture of EM failure we focus on power grid since it is the most
vulnerable part to EM failure. A simple mesh-structured power grid is generated for each
core, and each of them has its power island with its power regulator. We assume that power
grids for each core are the same and are less coupled as far as EM effects are concerned.
Power for each core is estimated by a power modeling framework. For the device level
power distribution, we assume that the power is evenly distributed in the power grid inside
each core. Current on each branch can be obtained from the power on each interconnect
tree. Lifetime of each tree is estimated by aforementioned three phase model [9] using
the corresponding current and temperature. When the voltage drop caused by resistance
increase of the power grid reaches a certain level, the core is considered as fail.
In order to consider lifetime impacts of different tasks, a system-level model is
proposed in [63]:
MTTFEM =
1
(
∑n
m=1(∆tm
1
MTTFm
))/T
(6.1)
105
where MTTFm is the actual MTTF (mean time to failure) under the m-th task for ∆tm
period, assuming the chip works through n different power and frequency settings and
T =
∑n
m=1∆tm.
6.1.1 Soft error reliability model considering DVFS impacts
Soft errors, or single event upset, are defined as the transient faults inside the
logic or memory on a chip, and result in an incorrect system output. The soft errors can
be caused by cosmic radiation, alpha particle decay, and thermal neutrons. Soft error rate
(SER) is the rate in which a chip or system encounters soft errors and typically can be
expressed as the number of failures in the given time. Although there is still a lack of
consensus on the exact soft error rate (SER) of specific chips and systems, it is obvious
that the SER per chip is practically increasing due to the increasing number of components
or cores on a chip. Recently a new exponential soft error models have been introduced to
account for those effects [64, 65].
For our problem, we employ an existing exponential model considering DVFS
effects on soft error rate, which assumes that the radiation-induced failure follows a Poisson
distribution, so the average soft error rate can be express as terms of operating frequency
f , supply voltage Vdd, where SER0 is the average failure rate at the maximum frequency
fmax and voltage Vmax, (so, fmin < f < fmax , Vmin < V < Vmax) in (6.2) [64].
SER(f, Vdd) = SER0e
d(fmax−f)
(fmax−fmin) (6.2)
where d is an architecture dependent constant, which is the sensitivity of failure rate with
106
DVFS. We also employ the previous work to model the relationship between operating
frequency and supply voltage to further simplify (6.2) from [66].
f = β
(Vdd − Vth)2
Vdd
(6.3)
where β is a technology-related constant, and Vth is the threshold voltage. By substitut-
ing (6.3) into (6.2), DVFS-aware SER equation can be derived as the function of only supply
voltage Vdd only [65]:
SER(Vdd) = SER0e
d(fmax−βVdd−2Vth+
V 2
th
Vdd
)
fmax−fmin
(6.4)
6.2 DRL framework for Dark silicon optimization
In this section, the DRL method based framework for the dark silicon will be
introduced. The background of DRL method will be reviewed and the DRL method based
framework applied in this work will be discussed in detail. The framework can handle high
dimension state and action space and provide a fast and effective solution for energy saving
of dark silicon processor.
6.2.1 Background of deep reinforcement learning
Reinforcement learning is the learning method whose goal is to map situations to
actions so as to maximize the reward or minimize the penalty [67]. It can handle problems
with stochastic transition without any adaptation and is a method able to converge close
to the optimal solution of a state-action function (s, a) for an arbitrary policy. After each
107
action a Q-value (Q(s, a)) will be calculated with reward/penalty.
Qt+1(s(t), a(t)) = Qt(s(t), a(t))+
α(t)×
(
PT (t+ 1) + γmin
a
(∀Qt(s(t+ 1), a))
)
−α(t)×Qt(s(t), a(t))
(6.5)
In Eq. 6.5, α(t) is learning rate between 0 and 1 for new Q-value calculation. s(t + 1) is
the state after taking action a(t), and Qt(s(t+1), a) are all possible actions’ Q-values from
future state. Discount factor γ (between 0 and 1) affects the importance of future penalty.
min(∀Qts(t+ 1), a) is the estimate of the optimal future value. The difference between
old Q-value (Qt) and learned value (PT (t + 1) + γmin
a
(∀Qt(s(t + 1), a))) updates the new
Q-value (Qt+1) with the learning rate.
As we can see in Eq. 6.5, Q-value only depends on the current state and action
takes. This Q-value is stored in a large Q-table in order to determine the next optimal state.
Dimension of the Q-table is (|A||S|) where A is a number of actions and S is a number of
states since each (s, a) pair has a corresponding Q-value. And the iterations steps to get
the final result would be in O(|A||S|2) [68]. As can be seen, the Q-learning method take
large memory to store the Q-value and need many steps to converge to the optimized state
and time to access data stored in the Q-table is long.
In order to resolve these problems, deep reinforcement learning is proposed recently
in order to handle large state action space. A Q-function in Eq. (6.6) is used to estimate
the Q-value of the (s, a) pair.
108
Q(s, a) ≈ f(s, a) (6.6)
As can be seen in Eq. (6.6), only when the result of the Q-function is accurate,
Q-learning process can find the optimal choice. Neural network is employed so as to get the
accurate Q-function and it can lead to an accurate Q-value.
In order to train the neural network, two steps are required. Data for training is
required and the loss function needs to be decided. For the training parameters, learning
experience of each time step is stored as (st, at, rt, st+1). And the Q-value of each step is
used as the output label of the training. That value is obtained by rk + γmaxa′Q(st+1, a
′)
Since the training goal is to make the estimated Q-value more close to real Q-value, so the
loss function can be written as [69]:
L = E[(Q(s, a)−Q∗(s, a))] (6.7)
Here Q(s, a) is the real Q-value and Q∗(s, a) is estimated Q-value
6.2.2 Deep reinforcement learning based optimization framework
In this subsection, we formulate our new dynamic reliability management for multi-
core dark silicon processor based on deep reinforcement learning. The purpose is minimizing
energy considering a hard and soft error induced lifetime by controlling the p-state of cores.
p-state is the performance state subject to power budget, performance deadline, and tem-
perature constraints. The dynamic reliability management method including the DRL is
109
implemented in software which is used to control the hardware performance. Learning plat-
form formulation and implementation of dark silicon evaluation platform will be discussed.
DRL method framework based formulation and solution
Based on DRL algorithm proposed in [69], a DRL method based framework is
formulated in Fig 6.1.
In the first step, a learning agent is put in a random initial state. In the proposed
framework for dark silicon, the state is the set of cores at different p-state. Each core has its
own unique power, EM and SER. So it can be seen the power, temperature, EM and SER
of the system at each states are different with different p-state of each core which leads to
different penalty functions for each action which the next state agent will go to. Agent will
transfer from state to state and each action will provide the agent a penalty and its goal is
to minimize total penalty [70].
After initial the learning agent, transition profiles (st, at, rt, st+1) and its corre-
sponding Q-value Q(st, at). These data are restored in an experience memory D and used
for neural network training (NN). This training is done offline
With offline trained NN model, we can perform the online learning process. The
agent selects the action based on a ε greedy policy. That policy is used to control the level
of greed [67]. With probability 1− ε select the action perform the maximum estimated Q-
value at = argmaxaQ(st, a). Otherwise, a random action is selected. This random selection
is used to avoid the sub-optimal decisions. Whit that random selection, the learning agent
will not stuck in the local optimized point and will perform global optimized result. After
selected action performed, the agent takes the action and goes to the next state.
110
Figure 6.1: The proposed deep learning frame work for reliability management flow
After the action performed, the new transition profiles (st, at, rt, st+1) and its Q-
value Q(st, at) are observed and stored in memory D. If the agent does not reach the end of
an execution sequence, a new learning step is performed and a new action will be selected.
Then the loop will continue. When it reaches the end of an execution sequence the NN
model will be updated. Experience replay is performed and data are randomly picked from
memory D. By using the experience replay the behavior distribution is averaged over many
of previous state so that the learning result can be more smooth and avoid the oscillations
in the output. The NN is updated with the selected data set and put back to the loop for
the learning process in the new execution sequence.
Implementation of dark silicon evaluation platform
In order to evaluate the proposed DRL method based dynamic reliability, a evalu-
ation platform is implemented as shown in Fig 6.2. In this work, SPLASH-2 benchmark [71]
is used to evaluate the dark silicon multi-core processor since it includes applications and
kernels in most of the area. The p-state of cores are defined by DVFS. In that state, the
111
Figure 6.2: The evaluation platform for dark silicon based on DRL method
Micro-architecture simulator Sniper [72] is used to obtain the performance and Cycle per
instruction (CPI) stacks. These informations are passed to power estimation model. In this
work, McPAT (multi-core power, area, and timing) [73]. It can provide dynamic and static,
even short-circuit power dissipation and provides multi-threaded and multi-core processor
models. With performance and CPI stacks from Micro-architecture simulator, the power
and energy consumption can be estimated. HotSpot [74] is employed as the thermal model.
The CPI stacks and performance from micro-architecture model, power trace from power
estimation model and thermal trace from thermal model are passed to reliability models
discussed in section 6.1 in order to calculate the EM lifetime and SER.
After all results are obtained, they are passed to the DRL method based framework
shown in Fig 6.1. Performance from micro-architecture model, average power and energy
consumption from power estimation model, average temperature from thermal model and
EM lifetime, SER from reliability model are passed to the learning model. Here energy is
the goal to optimize and the other factors are served as constraints. With the constrains
and optimization goal, we define the penalty function here. The part of a factor larger
112
than its constraint is multiplied with a large penalty coefficient and all these penalties and
energy are used to calculate the penalty function of each action as shown in Equation 6.8.
PT = PTenergy + C ×
∑
PTconstraint (6.8)
Here PT is the total penalty, PTenergy is penalty from energy. C is the large penalty
coefficient and PTconstraint is the penalty from constraints. With these parameters, the
next p-state of each core will be find. The system will run at this state for an execution
period. After that the models will calculate the updated parameters and select the new
optimized working state.
6.3 Numerical result and discussions
6.3.1 Experiment Setup
The proposed platform is implemented in Python 3.6.0 with with the numerical
libraries (Numpy 1.11.3). Neural network is trained with tensorflow 1.3.0 [75]. Micro-
architectural simulator is Sniper 6.1 [72], power estimator is McPAT 1.0.32 [73], and thermal
simulator is HotSpot5.02 [74].
Two performance states (p-state) with the clustered DVFS [76] is chosen, which
have been employed to reduce the simulation time with small solution quality degradation
due to the large number of cores in our experiment. Global DVFS method, which all active
cores have the same p-state, is used as the baseline to compare the energy optimization. In
this work, The full power mode is (2.0GHz, 1.2V) and the low power mode is (1.0GHz,0.9V).
113
In the experiment, 150 states are used and there are 150 actions. The states and actions
are decided by realistic processor running conditions.
6.3.2 Memory Usage
In this subsection, the memory efficiency of proposed DRL algorithm will be dis-
cussed. In the experiment 150 states are used. Q-learning needs a 150 × 150 size Q-table
shown in Fig 6.3(a) to store Q value since in the framework the number of actions equals
the number of states. In general case, the space complexity is Q(N2). Memory used of
DRL method include two parts. One part is used to store the NN model and the other
part is used to store the Q-value. In the memory store the NN model, since the input is
represented by two vectors represent state and action, and one element for result of penalty
function. The length of vector equals to the number of state and in this case the length
of vector equals to 150. And we assume we have A1 element in the layer connected with
input layer. The memory taken by weight between these two layers is (150 × 2 + 1) × A1.
The general formula of this part is (2 × N + 1) × A1 and space complexity is O(N). The
weight between other layers can be calculated as Ai × Ai+1 with the element in layer i is
Ai and the element in the next layer is Ai+1. Space complexity of this part is O(1). In the
Q-value part, Q-value estimated by the neural network of (st, a) pair needs to be stored in
order to find the minimum among them. N × 1 space is used to store these Q-values and
in this case the size is 150. Its space complexity is O(N). So as shown in Fig 6.3(b), only
150× 1 space is taken. Space complexity of all these parts is O(N)
0.1in As can be seen, the DRL algorithm saves more memory than the traditional
Q-learning method. In the experiment the Q-learning method can work since the states are
114
(a) (b)
Figure 6.3: (a) Memory used of q-learning (b) Memory used of DRL
not that much. However, if the p-states are increased and state-action space increase, the
Q-table will be too large to store in the memory.
6.3.3 Energy Consumption
In this subsection, energy consumption of the working state selected by DRL
method and Q-learning method will be compared. Since in the real application, there is
limited time for state switching. So the learning process need to be finished before its
deadline. In the experiment, we set the time to finish the learning is 0.008s. This is an
acceptable time for run time operation Energy consumption at the selected state with loose
and tight constraint of DRL method, Q-learning and DVFS will be compared in Fig 6.4.
Constraints are selected based on realistic processor running conditions.
Fig 6.4(a) shows the results using a loose bound for constraints. Power budget
is 800W , performance constraint is 40.3ms, temperature constraint is 345K, EM lifetime
115
DRL Qlearning DVFS
E
n
er
gy
(J
)
0
5
10
15
(a)
DRL Qlearning DVFS
E
n
er
gy
(J
)
0
5
10
15
20
(b)
Figure 6.4: (a) Energy consumption with loose power, performance, temperature and re-
liability bound (b) Energy consumption with tight power, performance, temperature and
reliability bound
constraint is more than 3 years and SER constraint is smaller than 0.6. As we can see,
the DRL-based method can reduce energy Energyreductionratio = 27.79% better than the
Q-learning based method and 42.01% better than the DVFS method. Note that the energy
reduction ratio Energyr−ratio between two methods is calculated by following equation:
Energyr−ratio =
EQ−learning − EDRL
EQlearning
(6.9)
Fig 6.4(b) use a tight bound for constraints. Power budget is 400W , performance
constraint is 24.1ms, temperature constraint is 340K, EM lifetime constraint is more than
5 years and SER constraint is smaller than 0.4. In this case, the DRL-based method
can reduce energy 53.50% better than the Q-learning method and 61.29% better than the
DVFS method. From the result we can see the DRL method significantly outperform the
Q-learning method and the DVFS method in terms of energy reduction. Also, the energy
saving of the DRL method is even more significant compared to the two methods with the
tight constraint as tight constraint can help the DRL method converges faster.
116
6.3.4 Steps to converge
In this subsection, we compare the steps to converge of the DRL method and the
Q-learning method. In the comparison, no deadline is set so both method can converge.
Steps to converge of the two methods are shown in Fig. 6.5. We can see the DRL method
Steps
0 1000 2000 3000
E
n
er
gy
(J
)
0
50
100
150
(a)
Steps
0 1000 2000 3000
E
n
er
gy
(J
)
0
50
100
150
(b)
Figure 6.5: (a) Time to converge of DRL method (b) Time to converge of Q-learning method
in Fig. 6.5(a) reach steady state after few hundred iterations. That means it does not take
very long time to reach the selected state. The jump after steady state is caused by random
action selection, whose possibility is 1− ε, mentioned in 6.2.2. That can help the learning
agent to find global optimizing action instead of local optimizing one. So the spikes on top
does not mean the optimization is not finished. The steady state is already reached. Final
state with highest possibility to appear is selected. However, the Q-learning based method
keeps oscillating until around 2700 iterations as shown in Fig 6.5(b). That explains why
the DRL method have better energy saving since with limited learning time, Q-learning
method cannot converge to optimized state while DRL method can reach the optimized
state quickly.
117
6.4 Summary
In this chapter, we have presented a new dynamic optimization approach with
deep reinforcement learning for multi-core dark silicon processors considering both hard
and soft errors. Reliability model is based on a recently proposed three phase physics-based
electromigration model and an exponential soft error model considering DVFS effects. DRL
which has better scalability and has lower memory complexity and computational complex-
ity comparing with traditional reinforcement learning is applied so as to obtain cost-effective
solutions for online optimization. A large class of multi-threaded applications is used as the
benchmark to validate and compare the proposed dynamic reliability management meth-
ods. Experimental results on a 64-core dark silicon processor show that the proposed DRL
method based framework significantly reduce memory consumed and computational time
in comparison to to traditional Q-learning based method.
118
Chapter 7
Resource Allocation for Long-term
Reliability Enhancement of
Real-time Embedded Systems
In this work, we propose a novel resource allocation algorithm to improve the long-
term reliability of multi-core real-time embedded systems. To the best of our knowledge, this
work is the first to show the dissimilarity between utilization and reliability and propose a
reliability-aware task allocation. Utilication is not the only factor related to reliability issue
and other factors such as power, temperature and frequency can also hurt the reliability.
We combine multiple system-level reliability models, including NBTI, HCI and EM, and
apply these models to obtain a trustworthy lifetime estimation of processor cores. Our work
is motivated by the fact that conventional allocation methods, such as worst-fit decreasing
(WFD), consider only the amount of workload (utilization), which may have a negative
119
impact on lifetime. We show that the proposed algorithm yields significantly benefits in
improving system-level lifetime, while guaranteeing timing correctness of all real-time tasks
in the system.
7.1 Background on Reliability
Reliability is one of the most important design consideration in modern micropro-
cessors with aggressive design strategies and decreasing feature size. It is a major challenge
and limiting factor for VLSI design with increasing temperature, current, and voltage in
more advanced nano-scale technology. Nowadays, with larger and more complex multi-core
systems, fault tolerance in the device and system levels are demanded, so reliability mod-
els with high accuracy are required. Among all the reliability issues, NBTI, HCI and EM
are the dominant effects [1, 77]. In our work, all these three reliability effects are con-
sidered in the optimization at system level. First, we will discuss how device-level models
are abstracted for system-level models. Detailed reliability models used in this work will
be reviewed. Then a system-level model to analyze all these reliability problems will be
discussed as well as system features sensitive to reliability such as utilization and frequency.
7.1.1 Device-level to System-level Models
NBTI, HCI and EM are dominant among all reliability effects. NBTI and HCI
will lead to shift of threshold voltage (Vth) of the impaired transistors which manifests in
increasing switching and path delays [77]. EM mainly acts on a power-ground network,
causes excessive IR drop, and leads to the failure of the circuit [78]. Many device-level
120
models are developed for these three reliability phenomena. However, in order to analyze
them at the system-level, these device-level models are cumbersome and not practically
applicable. To solve this problem, some system-level reliability models abstracted from the
device-level models are proposed. In the following subsections, we review these system-level
reliability models for NBTI, HCI and EM.
Model for Negative Bias Temperature Instability
Negative Bias Temperature Instability (NBTI) mainly affects p-type transistors
(PMOS). NBTI effect consists of two phases; when the circuit is under stress, and when it
is recovering. For the under stress case, logic ’0’ is applied to the gate of PMOS transistor
and |Vth| increase because of traps generated at the interface between gate oxide and channel.
For the recovery phase, in contrast, logic ’1’ is applied to the same transistor which leads
to decreasing of |Vth| since some traps are filled.
Many device level models have been developed for NBTI with different parameters
such as temperature (T), supply voltage (Vdd), and duty cycle (δ) which is the ratio between
the time a transistor is under stress and total time [79]. However, in order to fit system-level
optimization, a NBTI model at higher level is required.
A system level model was proposed in [80]. The model is
∆avgVth(t) ≤
∫ 1
0
ANu(Vdd)
(v(TB) · δB · δe · tm)n
w(δB·e, TB, t)2n dδe (7.1)
121
where
u =(Vdd − Vth) · exp((Vdd − Vth/E0))
v =ξ4 · exp(−Ea/kT )
w =1− (1− ξ1 +
√
ξ3 · v(T ) · (1− δ(t)) · tm
ξ2 +
√
v(T ) · t )
1
2n
(7.2)
Here An, n, E0 and ξi are technology dependent constants. Ea is activation energy (pos-
itive), k is Boltzmann constant, tm is the period between two measurements, TB is the
temperature of the transistor, δB is duty cycle of the block, and δe is effective duty cycle.
In VLSI design, the system level behavior depends on RTL level blocks and tran-
sistors in these blocks. However, with uniqueness of design and numerious of transistors
in blocks, it is not realistic to get exact transistors’ states for system level NBTI analysis.
So in this model, the author assumes that all the transistors in the block behave similarly
since RTL level description development is not available for system level analysis. Also they
assume the effective duty cycle δe is uniformly distributed from 0 to 1.
As can be seen, the delay change is calculated for the entire block instead of a single
transistor. So, average voltage shift is used here for delay estimation. This estimation might
be too optimistic for a certain transistor but it is still a good estimation for system-level
block. If detailed knowledge about the hardware implementation is available, partially
weighted mean can be used to achieve higher accuracy.
122
Model for Hot Carrier Injection
Hot Carrier Injection (HCI) mainly affects n-type transistors (NMOS), where ac-
celerated electrons inside the channel can collide with gate oxide interface and thereby
create electron-hole pairs [81]. An increase in Vth is caused by these free electrons since
they get trapped in the gate oxide layer. Based on experimental results in [82], HCI effect
has sub-linear dependency on the frequency (f), runtime (t), and active factor (α), which
is the ratio of the cycles in which the transistor is transitioning and the total amount of
cycles. Moreover, temperature dependency is exponential for HCI. Using the same assump-
tion used for system-level NBTI model that all the transistors in the block behave similarly.
The system level model [80] is expressed as:
∆Vth(t) = AH · √aavg,B · U(Vdd)V (TB) ·
√
αB · f · t (7.3)
where
U = exp((Vdd − Vth)/E1), V (T ) = exp(−Ea/kT ) (7.4)
AH and E1 are technology dependent constants and the activation energy Ea
is positive. aavg,B represents the average switching activity of all gates in the micro-
architectural block B. In the worst case, the average switching activity for the (near-)critical
paths can be 1. So here we use 1 for aavg,B in order to gain maximum Vth shift for general
application. So, if detailed information of the transistors can be obtained, aavg,B can be ad-
justed in order to get more accurate value. After we obtain the change of threshold voltage
(Vth), the switch delay of transistor can be calculated. With the assumption we made, all
123
transistors inside one micro-architectural block can be represented by one single transistor
TB (representative transistor). Delay of the block (∆
reldB) can be estimated by relative
delay if the representative transistor ∆reldB = ∆
reldTB . Thus, this delay can be calculated
by a power law [83] :
∆reldTB (t) = (1−
∆Vth(t)
Vdd − Vth(t0))
r − 1 (7.5)
In this equation, Vth is the only variable and delay can be obtained if Vth is
calculated. Besides HCI, NBTI delay can also be calculated using Eq. (7.5).
Model for Electromigration
Here the EM model employed is the model fro chapter 2. And for system level
optimization, We assume that each core can have its own voltage (voltage island) and
frequency settings, and DVFS can be done locally for each core. For the micro-architecture
of EM failure we focus on power grid since it is the most vulnerable part to EM failure.
A simple mesh-structured power grid is generated for each core, and each of them has its
power island with the its power regulator. We assume that power grids for each core are
the same and are less coupled as far as EM effects are concerned. Power for each core
is estimated by a power modeling framework. For the device level power distribution, we
assume that the power is evenly distributed in the power grid inside each core. This is an
reasonable assumption since it is hard to consider detailed architecture inside cores. Current
on each branch can be obtained from the power on each interconnect tree. Lifetime of each
tree is estimated by aforementioned three phase model [9] using the corresponding current
124
and temperature. When the voltage drop caused by resistance increment of the power grid
reaches a certain level, the core is considered as fail.
7.1.2 Unified system-level reliability model
Based on the system-level models for different reliability issues, we introduce a
unified system-level reliability model used in this work based on industry standard sum-of-
failure-rates (SOFR) model [63,84]. Since detailed joint model between different reliability
effects is hard to obtain, mathematical method is employed to combine these effects. Al-
though that model may not reflect the exact joint effect of all these reliability models, it
provide an efficient way to obtain the joint lifetime with different reliability models and
tasks. Although some improved methods like [85] is proposed recently, SOFR is still ac-
curate enough for most real-time systems. For multi-core systems, they will run different
tasks in different frequencies. In order to consider lifetime impacts of different tasks, a
system-level model is proposed in [63]:
MTTF =
1
(
∑n
m=1(∆tm
1
MTTFm
))/T
(7.6)
whereMTTFm is the actual MTTF (mean time to failure) under them-th task. Here actual
MTTF means if the MTTF of a system run only one task repeatedly. For ∆tm is time period
of m-th task. Assuming the chip works through n different power and frequency settings
and total time T can be represneted as T =
∑n
m=1∆tm. Here ∆tm can be represented by
125
utilization. So, MTTF can be defined by utilization as following:
MTTF =
1
(
∑n
m=1(Um
1
MTTFm
))
(7.7)
For each singleMTTFm, we treat lifetime as a resource and combine three different
effects. Industry standard sum-of-failure-rates (SOFR) model [84] is employed as:
MTTFm =
1
1
MTTFNBTI
+ 1MTTFHCI +
1
MTTFEM
(7.8)
As can be seen, in this model, the type of tasks, period of corresponding tasks
and frequencies will be the key factors of reliability analysis. The effect with the lowest
lifetime dominates the MTTFm. So the optimization on the critical effect can have a
positive improvement in the lifetime. These three factors have the same correlation trend
with frequency and utilization as well as temperature. Other important factors for MTTF
of each effect, such as temperature, can be obtained with architecture simulators such as
Sniper and Hotspot tools. Their values also change with different frequencies and task
periods for different tasks. As aforementioned, the periods of tasks can be represented by
task utilization, in this work utilization is used to relax the task period in the analysis.
In order to consider system-level reliability on a multi-core processor, the shortest lifetime
among all the cores is employed as the lifetime for all multi-core processors [14].
126
7.1.3 Review of current scheduling techniques considering reliability
Scheduling techniques are employed in embedded systems in order to address long-
term reliability challenges [17,18]. A simulated annealing-based task allocation and schedul-
ing strategy is proposed in [17]. This strategy can extend lifetime of platform-based MPSoC
design significantly with performance constrains. A cost function is defined in that work
which is:
Cost = µ · 1{∃i:ei>di} −MTTF (7.9)
where µ is a significant large number, and 1· is the indicator function which equals to 1 if
deadline is missed and equal to 0 if the scheduler can meet the deadline. As can be seen, the
cost function can decrease with higher MTTF but will be significantly higher if the deadline
missed. Simulated annealing can obtain the lowest cost, which means the longest MTTF
can be achieved with a performance constraint. However, they use only simple general
model for MTTF which is not realistic and do not have physics meaning.
Physics-based EM model is employed in [18] as the reliability model for lifetime
task optimization technique. This model is a two phase model proposed in [7] and only
nucleation phase and growth phase are considered. DVFS is employed first for each task and
corresponding MTTFs of all the performance states (p-state) are calculated. Best p-state
with maximum lifetime for each task with timing constraint is obtained by equation (7.10).
Max Lifetime(m) =
1
(
∑i=1
n (∆ti
1
MTTFi
Thyper
Ti
))/Thyper
Subject to:
ri = ∆ti +
∑
k∈hp(i)
⌈
ri
Tk
⌉
∆tk ≤ Ti
MTTFi,l ≤MTTFi ≤MTTFi,u
(7.10)
127
where m is the variable vector for MTTF, i is the task id, ∆ti is the execution time for task
i, n is the total number of task, MTTFi is the segment MTTF for task i, ri is the response
time of task i, hp(i) is the task set containing the higher priority tasks than the current
task i. Ti is the task period of task i, which is a deadline. Thyper is the hyper period for all
tasks. At the single-rate Thyper is equal to Ti. Ttotal is the total execution time of all tasks.
MTTFi,l is the minimum bound of MTTF for task i and MTTFi,u is the maximum bound
of MTTF for task i.
Response surface method (RSM) and adaptive simulated annealing (ASA) are used
optimize constrained non-linear formula in Eq. (7.10) in order to find optimized p-state.
However, this method assumes a uniprocessor real-time system and cannot consider the
multi-core systems which is generally used in the application. Also, only EM is considered
in this work and other important reliability issues such as NBTI and HCI are ignored.
Recent work [19] proposes an optimization method for system-level reliability in
multi-core systems. Their goal is to minimize the deviation of cores’ utilization from a set
point in each time interval with certain constrains:
min
∑
pi∈M
(U s − Ui(k))2 (7.11)
Although four reliability factors (EM, TDDB, SM and TC) are considered in [19],
their primary focus is on temperature since EM, TDDB, and SM are strongly dependent
on it. However, besides temperature, there are also many other momentous factors such as
power and voltage which are not considered in their work. These factors can cause lifetime
128
variation among tasks. So, balanced utilization does not necessarily lead to optimized
MTTF due to lifetime variation among tasks. We address such limitations in this paper.
7.2 System Model
The real-time system considered in this work is equipped with a single-chip multi-
core processor. The processor has NP identical cores, each of which can be configured
with different operating frequencies and voltage levels. This assumption is supported by
other existing works [86] and existing processors like ARM Cortex A75 also support per-core
DVFS [87]. The system runs an operating system that is capable of scheduling n tasks based
on task priorities. We focus on partitioned preemptive earliest-deadline-first scheduling (P-
EDF) due to many benefits: (i) partitioned scheduling does not introduce task migration
costs, thereby widely used in safety-critical embedded real-time systems, and (ii) on each
core, EDF is known to be optimal to schedule tasks with timing constraints. Under P-EDF,
each task is statically assigned to a single CPU core. Any arbitrary tie-breaking rule can
be used to determine the priorities of tasks with the same deadline, e.g., task indices.
We consider the periodic real-time task model [88], which is a widely used model
in the literature. Each task τi is represented by the following parameters:
τi := (C
f
i , Ti, Di)
• Cfi : the worst-case execution time (WCET) of a single instance (called job) of the
task τi, when it executes on a core running at f Hz of frequency
129
• Ti: the period of τi
• Di: the relative deadline of τi (Di ≤ Ti)
Note that for notational simplicity, Ci may be used instead of C
f
i in the rest of the paper,
if the frequency of τi’s core is assumed to be known. The utilization of a task τi is defined
as Ui = Ci/Ti. Without loss of generality, we assume implicit deadline for all tasks, i.e.,
Ti = Di. If the deadline of a task τi can be satisfied under any circumstance, τi is said to be
schedulable. The schedulability of tasks under P-EDF can be checked on a per-core basis.
Tasks on a core p are schedulable if the following condition satisfies:
∑
∀τi:τi∈Γp
Ci/Ti ≤ 1 (7.12)
where Γp is the set of tasks allocated on the core p.
As our main concern is system-level reliability, we assume that task execution
time is not affected by contention in multi-core shared hardware resources, such as caches
and memory buses. Details on accounting for the delay from such resources can be found
in [89–95]. Note that our proposed work can be easily integrated with them by incorporating
the blocking terms obtained by those approaches into Eq. (7.12).
7.3 Reliability-aware Real-Time Resource Allocation
In this section, we propose our resource allocation algorithm that improves the
long-term reliability of multi-core real-time systems. We first discuss the challenges observed
from motivational experiments and then provide details on the proposed algorithm.
130
7.3.1 Motivational experiments and key observations
In a multi-core system, the problem of finding an optimal allocation of tasks to
cores is typically modeled as a bin-packing problem that is known to be NP-complete [96].
As the allocation may need to be changed at runtime even in real-time embedded systems,
e.g., mode changes for different missions, bin-packing heuristics are widely used to find
a near-optimal solution , such as worst-fit decreasing (WFD) best-fit decreasing (BFD)
and first-fit decreasing (FFD). These heuristics first sort tasks in decreasing order of their
utilization so that large tasks can be allocated first, which helps improve the success rate
or reduce the number of required cores in general. WFD puts tasks in the emptiest core,
BFD puts tasks in the fullest core which they fit and FFD puts tasks in the core opened
earliest which they fit.
System-level reliability is largely affected by task allocation. As it is good to
balance the wear-out of cores, it is seemingly intuitive to evenly distribute the load across
cores, which WFD does. However, we find that this approach does not yield long system-
level lifetime due to a huge lifetime variation among tasks. For instance, tasks with high
utilization do not necessarily have a short lifetime. If such tasks are allocated by WFD, the
utilization of cores will be balanced but not the lifetime. Fig. 7.1 illustrates an example of
3 tasks allocated to 2 cores, where each bin represents a core. The MTTF and utilization
of individual tasks are shown in Table 7.1. Note that the MTTFi in Table 7.1 is calculated
by Eq. 7.7. As can be seen in Fig. 7.1(a), the utilization is balanced (0.5 for each core), but
the system-level lifetime is only 2.68 years based on Eq. 7.7. In Fig. 7.1(b), the utilization
of the first core is 0.7 and that of the second core is 0.3. Although it is unbalanced in terms
131
of utilization, the system-level lifetime is 3.9 years based on Eq. 7.6, which is 45% longer
than the prior case.
(a) (b)
Figure 7.1: (a) Balanced utilization with short lifetime. (b) Unbalanced utilization with
long lifetime
From the above example, we can observe that balanced utilization is not enough for
lifetime optimization. Task allocation should consider the fact that each task has a different
impact on lifetime regardless of its utilization; e.g., tasks with low (or high) utilization may
have short (or long) lifetime.
For the same type of tasks, the operating frequency of assigned cores and task
utilization play a dominant role in lifetime. Fig. 7.2(a) shows the lifetime (MTTF) of a
task under different frequency. Note that Cfi (WCET) is different with different frequency
since the utilization is fixed. As can be seen, MTTF decreases rapidly with increasing
frequency.This trend is also observed in [97]. In such a case, the core should run at the
lowest feasible frequency where all the tasks of that core remain schedulable, in order to
achieve the longest possible lifetime. For a task executing at a fixed frequency, as shown in
Fig. 7.2(b), MTTF decreases with increasing utilization. In this case, Cfi is different with
different utilization with a fixed frequency. The sudden change at 1.5GHz in Fig. 7.2(a) and
132
Table 7.1: Example task parameters for Fig. 7.1
Task τi Utilization Ui MTTF MTTFi
τ1 (yellow) 0.5 39.8
τ2 (green) 0.3 3.9
τ3 (blue) 0.2 8.6
at 0.9 utilization in Fig. 7.2(b) are caused by change of domination of different reliability
factors. This will be discussed in detail in section7.4.1.
Frequency(GHz)
1 1.2 1.4 1.6 1.8 2
M
T
T
F
(y
ea
rs
)
0
2
4
6
8
10
12
14
(a)
Utilization at 1GHz
0 0.2 0.4 0.6 0.8 1
M
T
T
F
(y
ea
rs
)
0
10
20
30
40
50
(b)
Figure 7.2: (a) MTTF at different frequencies for task utilization of 0.5. (b) MTTF at 1
GHz for different task utilizations
With the aforementioned problems, a new algorithm is required for task allocation
and CPU frequency selection. Different impact sources on lifetime, such as tasks and
running frequency and utilization of cores, need to be taken into consideration for lifetime
optimization.
7.3.2 Proposed algorithm
Motivated by the observations in the previous subsection, we proposed a reliability-
aware algorithm. Our goal is to maximize system-level lifetime while guaranteeing the
schedulability of all tasks in the system. To achieve this goal, our algorithm integrates
lifetime estimation and frequency selection into task allocation. As the problem of task
133
allocation itself is NP-complete [96], we develop a heuristic so that it can be used at runtime
when re-allocation is needed.
Algorithm 1 Reliability-aware Resource Allocation
Input: Γ: a taskset (all tasks in the system), P: a set of all available CPU cores in the system, F:
a set of available frequencies
Output: Success or failure
1: /* Get number of cores and nominal frequency */
2: NP ← |P|
3: fnom ← minf∈F f s.t.
∑
τi∈Γ
Cfi /Ti ≤ NP
4: if there is no feasible fnom then failure
5: end if
6: /* Divide tasks into two subsets: Γlarge and Γsmall */
7: Γlarge ←
⋃
∀τi∈Γ∧Ui≥0.5
τi
8: Γsmall ← Γ \ Γlarge
9: /* Allocate large tasks first */
10: for all τi ∈ Γlarge in inc. order of lifetime at fnom do
11: lifetimemax ← 0
12: for all pj ∈ P do
13: fpj ← minf∈F f s.t.
∑
τk∈Γpj∪{τi}
Cfk /Tk ≤ 1
14: if there is no feasible fpj then
15: continue /* τi is unschedulable on pj */
16: end if
17: lifetimepj ← Lifetime of Γpj ∪ {τi} at fpj by Eq. (7.8)
18: if lifetimepj > lifetimemax then
19: lifetimemax ← lifetimepj
20: p∗ ← pj
21: end if
22: end for
23: if lifetimemax = 0 then failure /* there is no core to schedule τi
24: else
25: Γp∗ ← Γp∗ ∪ {τi}
26: end if
27: end for
28: Do the steps from lines 10 to 27 for Γsmall success
Our proposed algorithm is given in Algorithm 1. It takes three input parameters: Γ
is the taskset to be considered, P is the set of cores in the system, and F is the set of available
operating frequencies of P. The algorithm returns success if all tasks are schedulable, and
failure otherwise. Task allocation and frequency selection results can be easily obtained by
checking the corresponding local variables before completion.
134
The algorithm first computes the nominal frequency fnom for a given taskset
(line 3), which is the minimum frequency required to schedule tasks in Γ, i.e., total uti-
lization does not exceed NP . Hence, fnom can be said to be the optimal frequency for a
given taskset. Here the optimal frequency means the lowest possible frequency to run the
taskset which will lead to longest MTTF . However, it is possible that the taskset cannot
be partitioned and fnom need to be increased to minimum feasible frequency fpj in the
following part of the algorithm. The algorithm immediately returns tasks which are un-
schedulable by any frequency in F. It then divides Γ into two subsets: Γlarge and Γsmall
(lines 7 to 8). If the utilization of a task is larger than 0.5 at fnom, it is classified as a
large task; otherwise, it is a small task. The reason for dividing tasks into two groups is to
allocate large tasks first as they have a higher impact on allocation than small tasks. The
threshold of 0.5 is widely used to distinguish large tasks in the literature because no more
than one task with 0.5 utilization can be allocated to the same core.
Once tasks are divided, the algorithm allocates tasks in Γlarge in increasing order
of their lifetime computed at fnom (line 10). Hence, tasks with shorter lifetime are allocated
first as they have a higher impact on the system-level reliability. Note that this ordering
does not give a significant disadvantage to the allocation performance of our algorithm,
compared to WFD, BFD and FFD, because the algorithm allocates large tasks separately
from small tasks. For each task τi, the algorithm computes the expected lifetime of each core
pj when τi is allocated to it. Specifically, in line 13, it finds the minimum feasible frequency
fpj for the core pj such that all tasks on pj and τi (Γpj ∪ {τi}) are schedulable at fpj under
EDF (see Eq. (7.12)). It then computes the lifetime of Γpj ∪ {τi} at fpj using Eq. (7.8),
135
and takes that value as the expected lifetime of pj (lifetimepj ). If it fails to compute the
lifetime of any core at all, this means there is no core to schedule τi and thus the algorithm
returns failure (line 23). Otherwise, the algorithm allocates τi to the core with the longest
expected lifetime among all cores. Once the allocation of large tasks is done, the algorithm
performs the same steps for small tasks (line 28). Finally, the algorithm returns success.
The time complexity of the proposed algorithm is O(n2 ·NP · |F|), where n is the number
of tasks and |F| is the number of frequency levels.
7.3.3 Applicability to sporadic tasks and fixed-priority scheduling
It is worth noting that our proposed work can also support the sporadic task
model [98] where each task has the minimum inter-arrival time between any two consecutive
instances (called jobs) of the task. In this case, some jobs of a task may have inter-arrival
times longer than the task’s minimum inter-arrival time. However, larger inter-arrival times
will benefit, or increase, the MTTF. The reasons are listed in the following. (i) In the
NBTI model given by Eq. (7.1), since the block duty cycle δB will be smaller when the
inter-arrival time is longer, the corresponding lifetime of NBTI will be increased. (ii) The
same phenomenon can be observed in the HCI model in Eq. (7.3). With longer inter-
arrival times, active factor (α) which represents the ratio of the transistor transitioning
cycles and total amount of cycles will be smaller which results to increase of lifetime. (iii)
EM lifetime also increase since the idle times are increased with longer inter-arrival times.
Therefore, the estimated MTTF using the minimum arrival time will be the worst case
MTTF because it will have the longest δB, largest α and the shortest idle time for three
reliability factors. Although this paper assumes partitioned EDF scheduling, our algorithm
136
given in Algorithm 1 can be easily extended for systems using fixed-priority scheduling on
each core. Basically, the line 13 of Algorithm 1 finds the minimum feasible core frequency
by using the schedulability condition under EDF given in Eq. 7.12. This condition can
be easily changed to the one for fixed-priority scheduling, such as the Rate Monotonic
utilization bound [88] and the iterative response-time test [99]. The other parts of the
algorithm do not need any modification as they will be unaffected by the change of the
schedulability condition.
7.4 Evaluation
In this section, we will evaluate the effectiveness of our proposed resource allocation
algorithm with randomly-generated tasksets. The performance metric we are interested in
is the system-level MTTF (also called lifetime) while all tasks in the system meet their
real-time constraints.
7.4.1 Taskset Generation
A modified architecture simulator (Sniper 6.1 [22]) with SPLASH-2 multi-threaded
application benchmarks [71] at different DVFS levels is used to capture the reliability char-
acteristics of realistic workloads and to use them in real-time taskset generation. Power is
obtained by the power simulator (McPAT 1.0.32) [73] and temperature is obtained by tem-
perature simulator (HotSpot 5.02) [74]. In this work, 45nm-based simulation framework is
employed. Among a total of eleven SPLASH-2 benchmarks, we choose the following seven
benchmarks: barnes, cholesky, fft, fmm, radiosity, radix, and raytrace, which are
137
chosen to represent a combination of long and short lifetime tasks. The simulated proces-
sor architecture has 4 power modes: (2 GHz, 1.2 V), (1.8 GHz, 1.1 V), (1.5 GHz, 1.0 V),
and (1.0 GHz, 0.9 V). We assume core DVFS sets with voltage and frequency scaling pro-
portionally, with frequencies in the extracted range (1.0 GHz-2.0GHz) from the cpufreq
governors [100], which is based on Enhanced Intel Speedstep Technology [101], and with
voltages in 4 equally-spaced voltages in the rage 0.9 V-1.2V from 45-nm technology.
Frequency(GHz)
1 1.5 2
M
T
T
F
(y
ea
rs
)
0
10
20
30
40
50
60
cholesky
fft
barnes
radiosity
fmm
raytrace
radix
(a)
utilization at 1GHz
0 0.2 0.4 0.6 0.8 1
M
T
T
F
(y
ea
rs
)
0
10
20
30
40
50
60
70
cholesky
fft
barnes
radiosity
fmm
raytrace
radix
(b)
Figure 7.3: (a) MTTF for all workloads at different frequencies with 0.5 utilization. (b)
MTTF for all workloads at 1 GHz with different utilizations
Fig. 7.3 shows the MTTFs of the SPLASH-2 benchmarks as the operating fre-
quency and the utilization of each benchmark change. As can be seen, barnes, radix and
raytrace have longer lifetime than the other four benchmarks. This is because these bench-
marks require lower power as well as lower temperature during their run time comparing
with other tasks running at same frequencies.
The MTTFs of the benchmarks are computed considering the three reliability
effects discussed in Section 7.2. Each benchmark used has a different dominant effect on
138
1Frequency (GHz)
1.5
1.81
0.8
0.6
Utilization
0.4
0.2
60
40
20
0
0
M
T
T
F
10
20
30
40
50
(a)
1
Frequency(GHz)
1.5
1.8
21.0
0.8
0.6
Utiliation
0.4
0.2
0
40
20
80
60
0
M
T
T
F
10
20
30
40
50
60
(b)
Figure 7.4: (a) Lifetime on different frequency and utilization of benchmark fft. (b)
Lifetime on different frequency and utilization of benchmark barnes.
Utilization at 1GHz
0 0.2 0.4 0.6 0.8 1
L
if
et
im
e
(y
ea
rs
)
0
50
100
150
200
NBTI
HCI
EM
MTTF
(a)
Utilization at 1GHz
0 0.2 0.4 0.6 0.8 1
L
if
et
im
e
(y
ea
rs
)
0
50
100
150
200
NBTI
HCI
EM
MTTF
(b)
Figure 7.5: (a) Lifetime of different effects for benchmark fft at 1GHz. (b) Lifetime of
different effects for benchmark barnes at 1GHz.
its MTTF. As an example, we compare in Fig. 7.5 the impact of the three reliability effects
on the lifetime of the two benchmark, fft and barnes. The lifetime of fft is dominated
by EM, and NBTI causes a higher impact than HCI when the utilization is high. On the
other hand, the lifetime of barnes is dominated by HCI if the utilization is lower than 70%,
and by NBTI otherwise. That means that, depending on the type, operating frequency
and utilization of a benchmark, the dominant effect on MTTF can be different. Based on
these results, we have classified the benchmarks into two groups: the one is for high-lifetime
139
tasks and the other is for low-lifetime tasks. The impact of the percentage of high- and
low-lifetime tasks on the system-level MTTF will be studied in the next subsection.
For each experimental setting, we use 1,000 randomly-generated tasksets. Table 7.2
shows the base parameters used for taskset generation. Systems with four to sixty four cores
are considered (NP = 4, 8, 12, 16, 32, 64). For each taskset, n tasks are generated where
n is randomly chosen from [NP , 4NP ]. The type of each task is randomly picked from
the aforementioned seven SPLASH-2 benchmarks and is used to determine the reliability
characteristics of that task. The period of a task τi is chosen from [100, 500] ms. The
utilization of τi at 2 GHz (U
2G
i ) is randomly chosen from [0.05, 0.5], and the worst-case
execution time (WCET) of τi at 2 GHz (C
2G
i ) is obtained by U
2G
i · Ti. For simplicity, we
assume that task utilization is linear to core frequency, e.g., the range of task utilization at
1 GHz is [0.1, 1].
7.4.2 Experimental Results
For comparison with our proposed method, we consider variants of the worst-fit
decreasing (WFD) best-fit decreasing (BFD) and first-fit decreasing (FFD) heuristics. Each
of these heuristics first allocates tasks to cores assuming all the cores run at the maximum
frequency (2 GHz). As aforementioned, minimum feasible frequency need to be found
in order to find longest system MTTF, it then reduces the frequency of each core pj as
much as possible while all tasks allocated to pj remain schedulable, i.e., minimize f : f ∈
{2G, 1.8G, 1.5G, 1G} s.t.∑τi∈Γpj Cfi /Ti ≤ 1. Finally, the MTTF (lifetime) of each core is
computed and the shortest MTTF is taken as the system-level MTTF. The three heuristics
combined with this approach serve as the baselines in our experiments since the existing
140
Table 7.2: Base parameters for taskset generation
Parameters Values
Number of cores (NP ) 4, 8, 12, 16, 32, 64
Number of tasks (n) [NP , 4NP ]
Task type One of SPLASH-2 benchmarks
Task util. at 2 GHz (U2Gi ) [0.05, 0.5]
Task period (Ti = Di) [100, 500] ms
Task WCET at f Hz (Cfi ) U
f
i · Ti
task allocation approaches developed for reliability, such as [102–104], focus on only the
utilization of cores during allocation and thus share the same idea as these heuristics.
Number of Cores
4 8 12 16 32 64
M
T
T
F
(y
ea
rs
)
0
1
2
3
4
5
6
7
8
WFD
BFD
FFD
proposed method
Figure 7.6: MTTF for a different number of cores (NP ) with n tasks randomly chosen from
[NP , 4NP ]
Fig. 7.6 shows the MTTF of a system with a different number of cores (NP ).
For each NP from {4, 8, 12, 16, 32, 64}, all other parameters are randomly chosen from the
ranges given in Table 7.2. As can be seen, MTTF decreases with an increment of NP . With
more cores, more tasks can be scheduled so that cores run at a higher frequency and the
utilization of them increases. The proposed method gives the highest improvement over
the other methods when the system has 12 cores. The maximum lifetime improvement
by our method is 70.45% compared to WFD and 1219.67% compared to BFD and FFD.
141
The reason for the bad performance of BFD and FFD is that they lead to imbalanced core
utilization, thereby requiring higher frequency.
Number of tasks
50 100 150 200 250
M
T
T
F
(y
ea
rs
)
0
1
2
3
4
5
6
7
WFD
BFD
FFD
Proposed Method
(a)
Number of Tasks
50 100 150 200 250
M
T
T
F
(y
ea
rs
)
0
1
2
3
4
5
6
7
WFD
BFD
FFD
Proposed Method
(b)
Figure 7.7: (a) MTTF for a different number of tasks from 8 to 32 with 8 cores. (b)MTTF
for a different number of tasks from 64 to 256 with 64 cores.
Fig. 7.7 depicts the MTTF with respect to the number of tasks in the system. Np
is set to 8 and 64 in this experiment, and the number of tasks is chosen form [NP , 4Np] with
a step equals to 2. We can see that the lifetime decreases as the number of tasks increases.
With more tasks, our proposed methods give a higher percentage of lifetime improvement
over the other methods. For instance, when there are 18 tasks, there are 182.7% increment
in lifetime compared to WFD and 1404.0% increment in lifetime compared to BFD and FFD
with 8 core case. For 64 core case, when there are 128 tasks there are 464% increment in
lifetime compared to WFD and 1438.7% increment in lifetime compared to BFD and FFD.
However, the absolute value of MTTF becomes lower as the number of tasks increases
because core frequencies will need to increase to meet task deadlines.
Fig. 7.8 shows the MTTF for different percentage of low-MTTF tasks
(task type={cholesky,fft, fmm,raytrace}). NP is set to 8 and 64 and corresponding
142
Percentage of low MTTF tasks
0 20 40 60 80 100
M
T
T
F
(y
ea
rs
)
0
2
4
6
8
10
12
WFD
BFD
FFD
Proposed Method
(a)
Percentage of low MTTF tasks
0 20 40 60 80 100
M
T
T
F
(y
ea
rs
)
0
2
4
6
8
10
12
14
16
WFD
BFD
FFD
Proposed Method
(b)
Figure 7.8: (a) MTTF for different percentage of low-MTTF tasks with 8 cores and 20
tasks. (b) MTTF for different percentage of low-MTTF tasks with 64 cores and 160 tasks.
numbers of tasks are set to 20 and 160. For each taskset with 20 tasks, we randomly choose
the percentage of low-MTTF tasks in the taskset from the range of [0, 100]. It can be
seen in the figure that lifetime reduces with higher percentage of low-MTTF tasks. When
there are 50% of low-MTTF tasks in a taskset, the proposed method achieves the largest
improvement in lifetime over the other methods, 171.7% over WFD and 755.6% over BFD
and FFD for 8 core case and for 64 core case, the proposed method achieves the largest
improvement in lifetime over the other methods, 378.1% over WFD and 1696% over BFD
and FFD when there are 80% of low-MTTF tasks in a taskset.
As aforementioned, proposed method has better lifetime comparing with other
method since lifetime is balanced across all CPU cores. Table 7.3 shows the statistics of
lifetime of a system with 64 cores. As shown in the table, with BFD and FFD method,
some cores are immortal while the minimum lifetimes are only 0.384 years. And proposed
method has higher minimum lifetime and smaller variance comparing with WFD. So we
can see proposed method achieve balanced lifetime across all cores.
143
Table 7.3: statistics of lifetime of a system with 64 cores
Lifetime (years)
Method
WFD BFD FFD Proposed Method
Max 62.5 Immortal Immortal 26.6
Min 5.99 0.384 0.384 7.56
Average 20.0 - - 15.8
Variance 136 - - 10.8
Based on these results, we conclude that our proposed method yields a significant
benefit in the system-level lifetime while satisfying the timing constraints of real-time tasks.
In this work, we propose a novel resource allocation algorithm to improve the long-term
reliability of multi-core real-time embedded systems. To the best of our knowledge, this
work is the first to show the dissimilarity between utilization and reliability and propose a
reliability-aware task allocation. Utilication is not the only factor related to reliability issue
and other factors such as power, temperature and frequency can also hurt the reliability.
We combine multiple system-level reliability models, including NBTI, HCI and EM, and
apply these models to obtain a trustworthy lifetime estimation of processor cores. Our work
is motivated by the fact that conventional allocation methods, such as worst-fit decreasing
(WFD), consider only the amount of workload (utilization), which may have a negative
impact on lifetime. We show that the proposed algorithm yields significantly benefits in
improving system-level lifetime, while guaranteeing timing correctness of all real-time tasks
in the system.
7.5 Summary
In this chapter, we have presented a long-term reliability-aware resource allocation
algorithm for multi-core real-time embedded systems. Our work considers system-level
144
models for long-term reliability effects including EM, NBTI and HCI. The lifetime obtained
with these models is treated as resource to be controlled. Our proposed resource allocation
algorithm uses this information to improve the lifetime of a system under partitioned EDF
scheduling. Unlike the conventional allocation methods for partitioned scheduling, such as
WFD, BFD, and FFD, which are agnostic to reliability, our algorithm achieves balanced
lifetime across all CPU cores. Experimental results show that the proposed algorithm can
achieve significant improvement in MTTF, which means that our algorithm substantially
outperforms the conventional scheduling methods in terms of long-term reliability. In the
future , reliability on single core after task allocation will also be studied.
145
Chapter 8
Conclusion
Physics level and System level reliability has been getting worse with more aggres-
sive design strategy. It is expected that reliability will be one of the most serious issue for
VLSI design, especially for advanced sub 7nm technology node. Both physics level relia-
bility for interconnects on chip and system level reliability for multi core systems need to
be held in high regard. In this dissertation, we propose physics level EM model for VLSI
circuits and system level EM optimization for multi-core systems. In this chapter, the main
contributions are summarized.
8.1 Voltage based EM immortality check
In chapter 3 we have proposed a EM immortality check for general multi-segment
interconnect wires. The new method estimates the EM-induced steady-state stress in gen-
eral multi-segment copper interconnect wires based on a novel parameter, Critical EM
Voltage, VCrit,EM . The proposed method, called voltage-based EM or VBEM method, mit-
146
igates the problem of current-density-based EM criteria, which can only be applied to a
single wire. The new VBEM method can naturally comprehend the impact of the topology
of the wire structure on EM-induced stress. As a result, this new VBEM analysis method
is very amenable to addressing EM violations, as it brings new optimization capabilities to
the physical design flow. The VBEM stress estimation method is based on the fundamental
steady-state stress equations. This approach avoids computationally-intensive numerical
methods and can be implemented in CAD tools very easily. We also study the impact of
current crowding in practical interconnect wires on the estimated steady-state stress, which
are shown to be not significant if the length of the wire is much greater than its width. An
extension of the VBEM method to consider the significant current crowding effects is also
shown and additionally, we analyze mesh-structured interconnect wires and demonstrate
that the proposed VBEM method is correct and accurate on such structures.
8.2 Saturation volume based EM immortality check
In chapter 4 we have proposed a new formula of the void’s saturation volume for
general multi-segment interconnect wires. The new formula is based on the fundamental
atom conservation at the steady-state condition of void growth phases. The new void satura-
tion estimation formula agrees with the existing single segment wire saturation void volume
formula and is an natural extension of the single segment case to general multi-segment
wires. In addition, we consider impacts of the void volume on final stress distributions of
the wire to further improve the accuracy of the proposed formula. The proposed saturation
void volume estimation can be applied for fast EM immortality check for nucleated wires,
147
which were considered to be failed wires in the past. Combined with the recently proposed
EM immortality check for nucleation, we propose a new EM immortality check algorithm,
which considers both void nucleation and void growth for the first time and thus overcoming
the conservativeness of the existing EM immortality check method. Our numerical results
show that the proposed formula agrees well with a published work for two-segment cases,
which are supported by experimental data. The formula is also validated by the recently
proposed physics-based 3D finite element (FEM) analysis tool for general multi-segment
interconnect wires.
8.3 Coupled Electronic and Stress Simulation
In this chapter 5 we have proposed EMSpice, which is a new full-chip EM failure
analysis tool for physics-based coupled EM and electronic analysis. EMSpice takes power
grid netlists from Synopsys ICC flow, and outputs the failed EM wires and their resistance
changes and resulting IR drops of the power grids over the given aging time. Hydrostatic
stress and electronic current/voltage in a power grid network can be considered simultane-
ously. EMSpice simulator employs a finite difference time domain (FDTD) solver for stress
analysis for every interconnect tree for both nucleation and post-voiding phases. EMSpice
simulator also couples the EM analysis with IR drop analysis at time domain. Thus the
solver can consider the interplay among stress, void growth, resistance change and IR drop
in a single simulation framework. Furthermore, EMSpice simulator considers both recently
proposed nucleation phase immortality and incubation phase immortality for the first time
to remove immortal interconnect trees from EM analysis.
148
8.4 Deep Reinforcement based reliability management
In chapter 6 we have proposed a new deep reinforcement learning optimization
frame work considering both hard and soft errors for dark silicon processors. We formulate
our DRM problem as minimizing the energy consumption subject to the reliability include
hard and soft errors, power, performance and thermal constraints. The hard error model is
based on a recently proposed physics based EM model. Hard and soft errors are obtained by
recently proposed reliability models. Power, performance are acquired by X86 based micro-
architecture model and thermal information is given by temperature simulators. The control
operation we use dynamic voltage and frequency scaling (DVFS) and ON/OFF pulsing
action. DRL method is used to select the most cost effective operation. Furthermore, the
DRL-based method also shows much smaller space complexity (linear versus the quadratic)
and less steps to converge than the Q-learning based method as well.
8.5 Resource allocation based reliability management
In chapter 7 we have proposed a novel resource allocation algorithm to improve
the long-term reliability of multi-core real-time embedded systems. To the best of our
knowledge, this work is the first to show the dissimilarity between utilization and reliability
and propose a reliability-aware task allocation. Utilization is not the only factor related to
reliability issue and other factors such as power, temperature and frequency can also hurt the
reliability. We combine multiple system-level reliability models, including NBTI, HCI and
EM, and apply these models to obtain a trustworthy lifetime estimation of processor cores.
We show that the proposed algorithm yields significantly benefits in improving system-level
149
lifetime over conventional methods, while guaranteeing timing correctness of all real-time
tasks in the system.
150
Bibliography
[1] “International Technology Roadmap for Semiconductors 2.0 (ITRS 2.0),” 2015.
http://www.itrs2.net/itrs-reports.html.
[2] S. R. Nassif, “Power Grid Analysis Benchmarks,” in Proc. Asia South Pacific Design
Automation Conf. (ASPDAC), pp. 376–381, Mar. 2008.
[3] B. Bailey, “Thermally Challenged,” Semiconductor Engineering, pp. 1–8, Dec. 2013.
[4] J. R. Black, “Electromigration-A Brief Survey and Some Recent Results,” IEEE
Trans. on Electron Devices, vol. 16, no. 4, pp. 338–347, 1969.
[5] I. A. Blech, “Electromigration in thin aluminum films on titanium nitride,” Journal
of Applied Physics, vol. 47, no. 4, pp. 1203–1208, 1976.
[6] M. A. Korhonen, P. Bo/rgesen, K. N. Tu, and C.-Y. Li, “Stress evolution due to
electromigration in confined metal lines,” Journal of Applied Physics, vol. 73, no. 8,
pp. 3790–3799, 1993.
[7] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, “Physics-based electromigration
assessment for power grid networks,” in Proc. Design Automation Conf. (DAC), pp. 1–
6, IEEE, June 2014.
[8] X. Huang, A. Kteyan, S. X.-D. Tan, and V. Sukharev, “Physics-Based Electromigra-
tion Models and Full-Chip Assessment for Power Grid Networks,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 11, pp. 1848–
1861, 2016.
[9] S. X.-D. Tan, H. Amrouch, T. Kim, Z. Sun, C. Cook, and J. Henkel, “Recent advances
in EM and BTI induced reliability modeling, analysis and optimization,” Integration,
the VLSI Journal, vol. 60, pp. 132–152, Jan. 2018.
[10] C. Cook, Z. Sun, E. Demircan, M. D. Shroff, and S. X.-D. Tan, “Fast electromigration
stress evolution analysis for interconnect trees using krylov subspace method,” IEEE
Trans. on Very Large Scale Integration (VLSI) Systems, vol. 26, pp. 969–980, May
2018.
151
[11] S. Chatterjee, V. Sukharev, and F. N. Najm, “Power Grid Electromigration Checking
using Physics-Based Models,” IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol. 37, no. 7, pp. 1317–1330, 2018.
[12] H. Zhao and S. X.-D. Tan, “Postvoiding fem analysis for electromigration failure char-
acterization,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 26,
pp. 2483–2493, Nov. 2018.
[13] S. Feng, S. Gupta, A. Ansari, and S. Mahlke, “Maestro: Orchestrating lifetime re-
liability in chip multiprocessors,” in Proceedings of the 5th International Conference
on High Performance Embedded Architectures and Compilers, HiPEAC’10, (Berlin,
Heidelberg), pp. 186–200, Springer-Verlag, 2010.
[14] A. Das, A. Kumar, and B. Veeravalli, “Reliability-driven task mapping for lifetime
extension of networks-on-chip based multiprocessor systems,” in Proceedings of the
Conference on Design, Automation and Test in Europe, DATE ’13, (San Jose, CA,
USA), pp. 689–694, EDA Consortium, 2013.
[15] T. Kim, X. Huang, V. S. H.-B. CHen, and S. X.-D. Tan, “Learning-based dynamic
reliability management for dark silicon processor considering EM effects,” in Proc.
Design, Automation and Test In Europe Conf. (DATE), pp. 463–468, IEEE, March
2016.
[16] T. Kim, Z. Sun, H.-B. Chen, H. Wang, and S. X.-D. Tan, “Energy and lifetime
optimizations for dark silicon manycore microprocessor considering both hard and
soft errors,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 25,
no. 9, pp. 2561–2574, 2017.
[17] L. Huang, F. Yuan, and Q. Xu, “On Task Allocation and Scheduling for Lifetime Ex-
tension of Platform-Based MPSoC Designs,” Parallel and Distributed Systems, IEEE
Transactions on, vol. 22, no. 12, pp. 2088–2099, 2011.
[18] T. Kim, B. Zheng, H.-B. Chen, Q. Zhu, V. Sukharev, and S. X.-D. Tan, “Lifetime
optimization for real-time embedded systems considering electromigration effects,” in
Proc. Int. Conf. on Computer Aided Design (ICCAD), pp. 434–439, IEEE, Nov. 2014.
[19] Y. Ma, T. Chantem, R. P. Dick, and X. S. Hu, “Improving system-level lifetime
reliability of multicore soft real-time systems,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 25, pp. 1895–1905, June 2017.
[20] Z. Sun, E. Demircan, M. D. Shroff, T. Kim, X. Huang, and S. X.-D. Tan, “Voltage-
Based Electromigration Immortality Check for General Multi-Branch Interconnects,”
in Proc. Int. Conf. on Computer Aided Design (ICCAD), pp. 1–7, Nov. 2016.
[21] Z. Sun, E. Demircan, M. D. Shroff, C. Cook, and S. X.-D. Tan, “Fast Electromigration
Immortality Analysis for Multisegment Copper Interconnect Wires,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 12, pp. 3137–
3150, 2018.
152
[22] T. E. Carlson, W. Heirman, and L. Eeckhout, “Sniper: Exploring the level of abstrac-
tion for scalable and accurate parallel multi-core simulations,” in International Con-
ference for High Performance Computing, Networking, Storage and Analysis (SC),
pp. 52:1–52:12, Nov. 2011.
[23] V. Sukharev, A. Kteyan, and X. Huang, “Postvoiding stress evolution in confined
metal lines,” IEEE Transactions on Device and Materials Reliability, vol. 16, no. 1,
pp. 50–60, 2016.
[24] V. Sukharev, A. Kteyan, and X. Huang, “Post-Voiding Stress Evolution in Confined
Metal Lines,” IEEE Trans. on Device and Materials Reliability, vol. 16, no. 1, pp. 50–
60, 2016.
[25] M. A. Korhonen, P. Borgesen, D. D. Brown, and C.-Y. Li, “Microstructure based
statistical model of electromigration damage in confined line metallizations in the
presence of thermally induced stresses,” Journal of Applied Physics, vol. 74, no. 8,
pp. 4995–11, 1993.
[26] S. M. Alam, C. L. Gan, C. V. Thompson, and D. E. Troxel, “Reliability computer-
aided design tool for full-chip electromigration analysis and comparison with different
interconnect metallizations,” Microelectronics Journal, vol. 38, no. 4, pp. 463–473,
2007.
[27] E. Demircan and M. D.Shroff, “Model based method for electro-migration stress de-
termination in interconnects,” in 2014 IEEE International Reliability Physics Sym-
posium, pp. IT.5.1–IT.5.6, June 2014.
[28] V. Sukharev, “Physically Based Simulation of Electromigration-Induced Degradation
Mechanisms in Dual-Inlaid Copper Interconnects,” IEEE Trans. on Computer-Aided
Design of Integrated Circuits and Systems, vol. 24, pp. 1326–1335, Sept. 2005.
[29] A. Kteyan, V. Sukharev, M. A. Meyer, E. Zschech, and W. D. Nix, “Microstructure
Effect on EM-Induced Degradations in Dual-Inlaid Copper Interconnects,” Proc. AIP
Conference, vol. 945, pp. 42–55, 2007.
[30] H. Chen, S. X.-D. Tan, V. Sukharev, X. Huang, and T. Kim, “Interconnect relia-
bility modeling and analysis for multi-branch interconnect trees,” in Proc. Design
Automation Conf. (DAC), pp. 1–6, IEEE, June 2015.
[31] C. Cook, Z. Sun, T. Kim, and S. X.-D. Tan, “Finite difference method for electromi-
gration analysis of multi-branch interconnects,” in Synthesis, Modeling, Analysis and
Simulation Methods and Applications to Circuit Design (SMACD), pp. 1–4, IEEE,
June 2016.
[32] M. Lin and A. Oates, “An electromigration failure distribution model for short-length
conductors incorporating passive sinks/reservoirs,” IEEE Transactions on Device and
Materials Reliability, vol. 13, pp. 322–326, March 2013.
153
[33] R. Gleixner and W. Nix, “A Physically Based Model of Electromigration and
Stress-Induced Void Formation in Microelectronic Interconnects,” Journal of Applied
Physics, vol. 86, no. 4, pp. 1932–1944, 1999.
[34] X. Wang, H. Wang, J. He, S. X.-D. Tan, Y. Cai, and S. Yang, “Physics-based Elec-
tromigration Modeling and Assessment for Multi-Segment Interconnects in Power
Grid Networks,” in Proc. Design, Automation and Test In Europe Conf. (DATE),
pp. 1727–1732, Mar. 2017.
[35] “Comsol multiphysics.” https://www.comsol.com/ [Oct. 16, 2013].
[36] F. L. Wei, C. L. Gan, T. L. Tan, C. S. Hau-Riege, A. P. Marathe, J. J. Vlassak, and
C. V. Thompson, “Electromigration-induced extrusion failures in cu/low-k intercon-
nects,” Journal of Applied Physics, vol. 104, no. 023529, pp. 1–10, 2008.
[37] C. S. Hau-Riege, A. P. Marathe, and Z. S. Choi, “The effect of current direction on
the electromigration in short-lines with reservoirs,” in IEEE International Reliability
Physics Symposium (IRPS), pp. 381–384, 2008.
[38] R. G. Filippi, R. A. Wachnik, H. Aochi, J. R. Lloyd, and M. A. Korhonen, “The effect
of current density and stripe length on resistance saturation during electromigration
testing,” Applied Physics Letters, vol. 69, pp. 2350–2352, Oct. 1996.
[39] C. M. Tan, Electomigration in ULSI Interconnects. International Series on Advances
in Solid State Electronics and Technology, Word Scientific, 2010.
[40] V. Mishra and S. S. Sapatnekar, “Probabilistic wire resistance degradation due to
electromigration in power grids,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 36, pp. 628–640, April 2017.
[41] C.-K. Hu, D. Canaperi, S. T. Chen, L. M. Gignac, B. Herbst, S. Kaldor, M. Krishnan,
E. Liniger, D. L. Rath, D. Restaino, R. Rosenberg, J. Rubino, S.-C. Seo, A. Simon,
S. Smith, and W.-T. Tseng, “Effects of overlayers on electromigration reliability im-
provement for cu/low k interconnects,” in Reliability Physics Symposium Proceedings,
2004. 42nd Annual. 2004 IEEE International, pp. 222–228, IEEE, 2004.
[42] L. Zhang, Effects of Scaling and Grain Structure on Electromigration Reliability of
Cu Interconnects. PhD thesis, University of Texas at Austin, 2010.
[43] T. N. Marieb, E. Abratowski, J. C. Bravman, M. Madden, and P. Flinn, “Direct
observation of the growth and movement of electromigration voids under passivation,”
AIP Conference Proceedings, vol. 305, no. 1, pp. 1–14, 1994.
[44] H. Zhao and S. X.-D. Tan, “Multi-physics-based FEM analysis for post-voiding anal-
ysis of electromigration failure effects,” in Proc. Int. Conf. on Computer Aided Design
(ICCAD), pp. 1–8, IEEE, Nov. 2018.
[45] C. W. Chang, Z.-S. Choi, C. V. Thompson, C. L. Gan, K. L. Pey, W. K. Choi, and
N. Hwang, “Electromigration resistance in a short three-contact interconnect tree,”
Journal of Applied Physics, vol. 99, no. 9, p. 094505, 2006.
154
[46] X. Huang, A. Kteyan, S. X.-D. Tan, and V. Sukharev, “Physics-based electromigration
models and full-chip assessment for power grid networks,” IEEE Trans. on Computer-
Aided Design of Integrated Circuits and Systems, vol. 35, pp. 1848–1861, Nov. 2016.
[47] V. Sukharev and F. N. Najm, “Electromigration check: Where the design and relia-
bility methodologies meet,” IEEE Transactions on Device and Materials Reliability,
vol. 18, pp. 498–507, Dec 2018.
[48] S. P. Hau-Riege and C. V. Thompson, “Experimental characterization and modeling
of the reliability of interconnect trees,” Journal of Applied Physics, vol. 89, no. 1,
pp. 601–609, 2001.
[49] Z. Sun, S. Sadiqbatcha, H. Zhao, and S. X.-D. Tan, “Accelerating Electromigration
Aging for Fast Failure Detection for Nanometer ICs,” in Proc. Asia South Pacific
Design Automation Conf. (ASPDAC), pp. 623–630, Jan. 2018.
[50] A. Limited, “Arm cortex-m0 designstart,” 2010.
[51] “Synopsys 32/28nm generic library for teaching ic design.” http://www.synopsys.com.
[52] R. Goldman, K. Bartleson, T. Wood, K. Kranen, C. Cao, V. Melikyan, and
G. Markosyan, “Synopsys’ open educational design kit: Capabilities, deployment and
future,” in 2009 IEEE International Conference on Microelectronic Systems Educa-
tion, pp. 20–24, July 2009.
[53] R. L. de Orio, Electromigration modeling and simulation. na, 2010.
[54] “Engineering toobox.” http://www.engineeringtoolbox.com.
[55] S. P. Hau-Riege and C. V. Thompson, “The Effects of The Mechanical Properties of
The Confinement Material on Electromigration in Metallic Interconnects,” Journal of
Materials Research, vol. 15, no. 8, pp. 1797–1802, 2000.
[56] K. Tu, “ Recent Advances on Electromigration in Very-Large-Scale-Integration of
Interconnects,” Journal of Applied Physics, vol. 94, no. 9, pp. 5451–5473, 2003.
[57] V. Sukharev, A. Kteyan, and E. Zschech, “Physics-Based Models for EM and SM Sim-
ulation in Three-Dimensional IC Structures,” IEEE Trans. on Device and Materials
Reliability, vol. 12, no. 2, pp. 272–284, 2012.
[58] V. Sukharev, A. Kteyan, J.-H. Choy, H. Hovsepyan, A. Markosian, E. Zschech, and
R. Huebner, “Multi-scale Simulation Methodology for Stress Assessment in 3D IC:
Effect of Die Stacking on Device Performance,” Journal of Electronic Testing, vol. 28,
pp. 63–72, Nov. 2011.
[59] S. X.-D. Tan, M. Tahoori, T. Kim, S. Wang, Z. Sun, and S. Kiamehr, VLSI Systems
Long-Term Reliability – Modeling, Simulation and Optimization. Springer Publishing,
2019.
155
[60] R. G. Filippi, P.-C. Wang, A. Brendler, K. Chanda, and J. R. Lloyd, “Implications of
a threshold failure time and void nucleation on electromigration of copper intercon-
nects,” Journal of Applied Physics, vol. 107, no. 10, 2010.
[61] V. Mishra and S. S. Sapatnekar, “Predicting Electromigration Mortality Under Tem-
perature and Product Lifetime Specifications,” in Proc. Design Automation Conf.
(DAC), pp. 1–6, Jun. 2016.
[62] A. Abbasinasab and M. Marek-Sadowska, “RAIN: A tool for reliability assessment
of interconnect networks—physics to software,” in Proc. Design Automation Conf.
(DAC), (New York, NY, USA), pp. 133:1–133:6, ACM, 2018.
[63] Z. Lu, W. Huang, J. Lach, M. Stan, and K. Skadron, “Interconnect lifetime prediction
under dynamic stress for reliability-aware design,” in Proc. IEEE/ACM Int. Conf. on
Computer Aided Design (ICCAD), pp. 327–334, IEEE, November 2004.
[64] D. Zhu, R. Melhem, and D. Mosse, “The effects of energy management on reliability
in real-time embedded systems,” in Proceedings of the 2004 IEEE/ACM International
Conference on Computer-aided Design, ICCAD ’04, (Washington, DC, USA), pp. 35–
40, IEEE Computer Society, 2004.
[65] L. Tan, S. Song, P. Wu, Z. Chen, R. Ge, and D. Kerbyson, “Investigating the interplay
between energy efficiency and resilience in high performance computing,” in Parallel
and Distributed Processing Symposium (IPDPS), 2015 IEEE International, pp. 786–
796, May 2015.
[66] Y. Zhang, K. Chakrabarty, and V. Swaminathan, “Energy-aware fault tolerance
in fixed-priority real-time embedded systems,” in Computer Aided Design, 2003.
ICCAD-2003. International Conference on, pp. 209–213, Nov 2003.
[67] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. Cambridge,
MA, USA: MIT Press, 1st ed., 1998.
[68] M. L. Littman, T. L. Dean, and L. P. Kaelbling, “On the complexity of solving
markov decision problems,” in Proceedings of the Eleventh Conference on Uncertainty
in Artificial Intelligence, UAI’95, (San Francisco, CA, USA), pp. 394–402, Morgan
Kaufmann Publishers Inc., 1995.
[69] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra,
and M. A. Riedmiller, “Playing atari with deep reinforcement learning,” CoRR,
vol. abs/1312.5602, 2013.
[70] C. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–
292, 1992.
[71] S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta, “The splash-2 programs:
characterization and methodological considerations,” in Computer Architecture, 1995.
Proceedings., 22nd Annual International Symposium on, pp. 24–36, June 1995.
156
[72] J. H. Ahn, S. Li, O. Seongil, and N. Jouppi, “Mcsima+: A manycore simulator with
application-level+ simulation and detailed microarchitecture modeling,” in Perfor-
mance Analysis of Systems and Software (ISPASS), 2013 IEEE International Sym-
posium on, pp. 74–85, April 2013.
[73] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi,
“Mcpat: An integrated power, area, and timing modeling framework for multicore and
manycore architectures,” in 2009 42nd Annual IEEE/ACM International Symposium
on Microarchitecture (MICRO), pp. 469–480, Dec 2009.
[74] K.Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan,
“Temperature-aware microarchitecture,” in International Symposium on Computer
Architecture, pp. 2–13, 2003.
[75] Abad et al., “Tensorflow: A system for large-scale machine learning,” in Proceedings
of the 12th USENIX Conference on Operating Systems Design and Implementation,
OSDI’16, (Berkeley, CA, USA), pp. 265–283, USENIX Association, 2016.
[76] T. Kolpe, A. Zhai, and S. Sapatnekar, “Enabling improved power management in
multicore processors through clustered dvfs,” in Proc. Design, Automation and Test
In Europe. (DATE), pp. 1–6, March 2011.
[77] K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R. Nassif, E. J.
Nowak, D. J. Pearson, and N. J. Rohrer, “High-performance cmos variability in the 65-
nm regime and beyond,” IBM Journal of Research and Development, vol. 50, pp. 433–
449, July 2006.
[78] S. Chatterjee, M. B. Fawaz, and F. N. Najm, “Redundancy-Aware Electromigration
Checking for Mesh Power Grids,” in Proc. IEEE/ACM Int. Conf. on Computer Aided
Design (ICCAD), pp. 540–547, 2013.
[79] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and Y. Cao, “The impact
of nbti effect on combinational circuit: Modeling, simulation, and analysis,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, pp. 173–183,
Feb 2010.
[80] F. Oboril and M. B. Tahoori, “Extratime: Modeling and analysis of wearout due to
transistor aging at microarchitecture-level,” in IEEE/IFIP International Conference
on Dependable Systems and Networks (DSN 2012), pp. 1–12, June 2012.
[81] K.-L. Chen, S. Saller, and R. Shah, “The case of ac stress in the hot-carrier effect,”
IEEE Transactions on Electron Devices, vol. 33, pp. 424–426, Mar 1986.
[82] E. Takeda, Y. Nakagome, H. Kume, and S. Asai, “New hot-carrier injection and de-
vice degradation in submicron mosfets,” IEE Proceedings I - Solid-State and Electron
Devices, vol. 130, pp. 144–150, June 1983.
157
[83] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, “A physical
alpha-power law mosfet model,” in Proceedings of the 1999 International Symposium
on Low Power Electronics and Design, ISLPED ’99, pp. 218–222, 1999.
[84] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, “The case for lifetime reliability-
aware microprocessors,” in Proceedings. 31st Annual International Symposium on
Computer Architecture, 2004., pp. 276–287, June 2004.
[85] C. Bolchini, M. Carminati, M. Gribaudo, and A. Miele, “A lightweight and open-
source framework for the lifetime estimation of multicore systems,” in 2014 IEEE
32nd International Conference on Computer Design (ICCD), pp. 166–172, Oct 2014.
[86] C. Lin, C. Chang, Y. Syu, J. Wu, P. Liu, P. Cheng, and W. Hsu, “An energy-efficient
task scheduler for multi-core platforms with per-core dvfs based on task characteris-
tics,” in 2014 43rd International Conference on Parallel Processing, pp. 381–390, Sep.
2014.
[87] A. Limited, “Arm cortex-a75 technical reference manual,” 2016.
[88] C. Liu and J. Layland, “Scheduling algorithms for multiprogramming in a hard-real-
time environment,” Journal of the ACM, vol. 20, no. 1, pp. 46–61, 1973.
[89] N. Suzuki, H. Kim, D. de Niz, B. Andersson, L. Wrage, M. Klein, and R. R. Rajkumar,
“Coordinated bank and cache coloring for temporal protection of memory accesses,”
in IEEE International Conference on Embedded Software and Systems (ICESS), 2013.
[90] H. Kim, D. de Niz, B. Andersson, M. Klein, O. Mutlu, and R. Rajkumar, “Bounding
and reducing memory interference in COTS-based multi-core systems,” Real-Time
Systems, vol. 52, no. 3, pp. 356–395, 2016.
[91] H. Kim, A. Kandhalu, and R. R. Rajkumar, “A coordinated approach for practical
OS-level cache management in multi-core real-time systems,” in Euromicro Confer-
ence on Real-Time Systems (ECRTS), 2013.
[92] H. Kim and R. Rajkumar, “Real-time cache management for multi-core virtualiza-
tion,” in ACM International Conference on Embedded Software (EMSOFT), 2016.
[93] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha, “Memory access control in
multiprocessor for real-time systems with mixed criticality,” in Euromicro Conference
on Real-Time Systems (ECRTS), 2012.
[94] H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni, “PALLOC: DRAM bank-aware
memory allocator for performance isolation on multicore platforms,” in IEEE Real-
Time Technology and Applications Symposium (RTAS), 2014.
[95] R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni, “Real-
time cache management framework for multi-core architectures,” in IEEE Real-Time
Technology and Applications Symposium (RTAS), 2013.
158
[96] D. S. Johnson, A. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham, “Worst-case
performance bounds for simple one-dimensional packing algorithms,” SIAM Journal
on Computing, vol. 3, no. 4, pp. 299–325, 1974.
[97] Y. Ma, T. Chantem, X. S. Hu, and R. P. Dick, “Improving lifetime of multicore
soft real-time systems through global utilization control,” in Proceedings of the 25th
Edition on Great Lakes Symposium on VLSI, GLSVLSI ’15, (New York, NY, USA),
pp. 79–82, ACM, 2015.
[98] A. K. Mok, “Fundamental design problems of distributed systems for the hard real-
time environment,” PhD Thesis, Massachusetts Institute of Technology, 1983.
[99] M. Joseph and P. K. Pandya, “Finding response times in a real-time system,” The
Computer Journal, vol. 29, no. 5, pp. 390–395, 1986.
[100] D. Brodowski and N. Golde, “Linux cpufreq governors,” Linux Kernel. https://www.
kernel. org/doc/Documentation/cpu-freq/governors. txt, 2013.
[101] V. Pallipadi, “Enhanced intel speedstep technology and demand-based switching on
linux,” Intel Developer Service, 2009.
[102] M. Mandelli, G. Castilhos, G. Sassatelli, L. Ost, and F. G. Moraes, “A distributed
energy-aware task mapping to achieve thermal balancing and improve reliability of
many-core systems,” in Proceedings of the 28th Symposium on Integrated Circuits and
Systems Design, SBCCI ’15, (New York, NY, USA), pp. 13:1–13:7, ACM, 2015.
[103] A. Naithani, S. Eyerman, and L. Eeckhout, “Reliability-aware scheduling on het-
erogeneous multicore processors,” in 2017 IEEE International Symposium on High
Performance Computer Architecture (HPCA), pp. 397–408, Feb 2017.
[104] J. Sun, R. Lysecky, K. Shankar, A. Kodi, A. Louri, and J. Roveda, “Workload as-
signment considering NBTI degradation in multicore systems,” J. Emerg. Technol.
Comput. Syst., vol. 10, pp. 4:1–4:22, Jan. 2014.
159
