Electromigration-Aware Architecture for Modern Microprocessors by Gabbay, Freddy et al.
Electromigration-Aware Architecture for Modern Microprocessors  
 
Freddy Gabbay1, Avi Mendelson2 and Yinnon Stav1 
Ruppin Academic Center1 
{freddyg, yinnon}@ruppin.ac.il 
Technion – Israel Institute of Technology2 
avi.mendelson@technion.ac.il 
 
Abstract 
 Reliability is a fundamental requirement in any 
microprocessor to guarantee correct execution over its 
lifetime. The design rules related to reliability depend on the 
process technology being used and the expected operating 
conditions of the device. To meet reliability requirements, 
advanced process technologies (28 nm and below) impose 
highly challenging design rules. Such design-for-reliability 
rules have become a major burden on the flow of VLSI 
implementation because of the severe physical constraints 
they impose.  
 This paper focuses on electromigration (EM), which is 
one of the major critical factors affecting semiconductor 
reliability. EM is the aging process of on-die wires and vias 
and is induced by excessive current flow that can damage 
wires and may also significantly impact the integrated-circuit 
clock frequency. EM exerts a comprehensive global effect on 
devices because it impacts wires that may reside inside the 
standard or custom logical cells, between logical cells, inside 
memory elements, and within wires that interconnect 
functional blocks. 
 The design-implementation flow (synthesis and place-
and-route) currently detects violations of EM-reliability rules 
and attempts to solve them. In contrast, this paper proposes a 
new approach to enhance these flows by using EM-aware 
architecture. Our results show that the proposed solution can 
relax EM design efforts in microprocessors and more than 
double microprocessor lifetime. This work demonstrates the 
proposed approach for modern microprocessors, although the 
principles and ideas can be adapted to other cases as well.  
1. Introduction 
 Chip reliability is an essential design requirement and is 
crucial to assure the correct functionality of a semiconductor 
integrated circuit (IC). For every product (e.g., processor), 
chip vendors are required to guarantee a minimum lifetime, 
which depends on a reliability prediction for each chip. To 
meet these reliability requirements, a design-for-reliability 
methodology was developed that, unfortunately, is highly 
complicated because it depends on the expected workload, 
the process technology, the operating voltage, and the 
temperature. As part of the design-for-reliability 
methodology of modern processors, a workflow is defined 
[1,2,3] that aims to guarantee a minimum product lifetime 
under a specified workload (i.e., the mission profile). Given 
the use of new advanced process technologies and new 
applications such as computation-intensive infrastructures 
(e.g., autonomous cars, data-center computing, cloud 
computing, life-support systems, etc.), the need for high 
reliability has recently heightened. 
 The shrinking dimensions of VLSI technology, the 
increasing density of logical elements, and the challenging 
voltage and temperature operating conditions combine today 
to make electromigration (EM) one of the most influential 
factors affecting the reliability of modern systems. EM is a 
phenomenon related to the reliability of wires and vias in 
integrated circuits and is caused by excessive current flow 
that can potentially damage a physical device. Such damage 
may either reduce a wire’s conductivity or cause wire 
disconnect, both of which lead to reliability concerns. In this 
work, we focus on the impact of EM on wires and vias that 
reside inside logical cells or memory elements or that are used 
as interconnects between logical cells or functional units. 
 To date, the design community has focused on enhancing 
chip-design implementation flow [1,2,4-10] to solve EM 
issues, whereas few works have proposed architectural 
solutions. In the present study, we propose a novel 
architecture that significantly improves reliability by 
reducing EM impact while relaxing the physical design 
efforts and significantly extending microprocessor lifetime. 
This study is based on the observation that numerous 
reliability concerns result from excessive write activities (or 
change of logical state) spread across processing elements of 
the same type (gates, logical units, or memory elements) in a 
nonuniform manner. This observation led us to develop 
enhanced resource-allocation mechanisms that uniformly 
distribute the write operations workload across all resources.  
As a result, the maximum EM stress on any single 
element is minimized, and the overall IC reliability is 
improved by up to several orders of magnitude. This work 
focuses on a microprocessor as a case study; however, the 
concepts can be applied to other ICs and applications. 
 The remainder of this paper is organized as follows: 
Section 2 introduces EM reliability challenges and reviews 
EM and previous works related to EM effects. Section 3 
introduces the limitations of modern microprocessor 
architecture in dealing with EM effects, Section 4 describes the 
proposed EM-aware architectural enhancements, and Section 
5 presents simulation results of the proposed EM-aware 
architecture. Finally, Section 6 summarizes the study and 
suggests future research works. 
2. IC reliability  
 IC reliability has become a crucial discipline in VLSI 
chip design. The need for highly reliable systems has existed 
from the early days of computing and was mainly driven in 
the past by “special systems” such as mission-critical 
embedded systems. However, given the vulnerability of the 
new process technology and the appearance of new 
applications that require safe and reliable processing such as 
autonomous cars, large-scale computing-intensive systems 
(e.g., HPC, cloud computing, data centers), and life-support 
systems, reliability today is a fundamental requirement for 
most systems. The product specifications of such systems 
impose strict requirements on reliability through the lifetime 
and operating conditions. For example, the automotive 
industry expects an IC to function reliably for 10–15 years at 
a given temperature (usually about 125 °C [11,12]) and under 
various workloads. In data-center computing, the 
requirements are slightly relaxed but remain challenging: the 
lifetime requirement demands at least ten years, whereas the 
temperature can range from 105 to 110 °C with arbitrary 
workloads. None of these reliability-sensitive applications 
can afford microprocessor faults caused by reliability issues.  
 Over the past decade, as advanced process technologies 
have been introduced, the susceptibility to reliability-related 
issues has grown dramatically. Starting at 28 nm process 
technology and below (16, 7, 5, and 3 nm), the design efforts 
dedicated to reliability have substantially increased. The 
design community has mainly tried to enhance the synthesis 
and place-and-route flows to minimize and eliminate 
reliability-related issues. Such flows involve substantial 
design efforts and, in many cases, require multiple iterations 
to make the IC comply with the design rules (also known as 
the “sign-off process”). Note that few prior studies have 
addressed these physical reliability challenges from the 
architecture point of view [5-8]. The remainder of this section 
reviews the EM phenomenon and previous studies on the 
impact of EM. 
2.1 Electromigration  
 EM is a physical phenomenon related to excessive 
current density within wires and vias. EM became a major 
concern in advanced process technologies when the 
geometrical dimension of wires and vias shrank to very small 
dimensions, making them highly susceptible to the negative 
effect of electrical current stress. This stress is induced by the 
force of conduction electrons and metal ions. When the force 
of conduction electrons reaches a certain strength level, it may 
tear atoms from the boundary of the metal and transport them 
in the direction of the current flow. If such current force is 
maintained for a long time or if current flows frequently, the 
wire may become malformed. To ease the problem, one may 
consider occasionally reversing the current direction, but 
experiments indicate that such a strategy has only a minor 
impact on the overall reliability issues (e.g., wire 
disconnect or significant change in the wire resistance). The 
occurrence of such an issue, even on a single wire, may result 
in overall chip failure. Note also that the geometrical 
granularity of wires plays a major role in susceptibility to 
EM, with smaller wire granularity encouraging greater EM 
forces. Therefore, we expect EM to continue to be a major 
challenge in semiconductors as we deploy new advanced 
process technologies [8]. 
 Black [13] derived the following formula for the EM 
mean time to failure (MTTF): 
 
$$\mathrm{MTTF} = \frac{A}{J^{n}} \, e^{\frac{E_a}{k_B T}}$$
Equation 1 - EM MTTF 
 
where A is a constant, J is the current density, 𝐸𝑎 is the 
activation energy, n is a scaling factor, kB is the Boltzmann 
constant, and T is the absolute temperature. The MTTF 
depends exponentially on temperature; in fact, higher 
temperature accelerates the negative effect of EM because it 
weakens the atomic bonds in a wire, making the wire even 
more sensitive to EM forces. Because many new 
applications, and in particular control systems (e.g., in the 
automotive or robotics fields), are required to operate at high 
temperatures of 105–125 °C, this induces much greater 
susceptibility to EM that will be highly challenging to 
mitigate during IC implementation and sign-off.  
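
As a rough illustration of the temperature dependence in Equation 1, the following sketch compares the MTTF at two operating temperatures while holding the current density fixed, so that A and J^n cancel in the ratio. The activation energy of 0.9 eV and the function name `mttf_ratio` are assumptions chosen for illustration only; they are not values taken from this work.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def mttf_ratio(t_cold_c, t_hot_c, e_a_ev):
    """MTTF(t_cold) / MTTF(t_hot) from Equation 1 with A and J held fixed."""
    t_cold = t_cold_c + 273.15  # convert Celsius to kelvin
    t_hot = t_hot_c + 273.15
    return math.exp(e_a_ev / K_B_EV * (1.0 / t_cold - 1.0 / t_hot))

# Assumed activation energy (illustrative, not from the paper).
ratio = mttf_ratio(105.0, 125.0, e_a_ev=0.9)
print(f"MTTF at 105 C is about {ratio:.1f}x the MTTF at 125 C")
```

Under this assumed activation energy, a 20 °C increase within the 105–125 °C range cuts the predicted MTTF by roughly a factor of four; a different E_a would change the factor but not the exponential trend.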
 In addition to the temperature effect, Refs. [6,14] 
express the current density J in metals as 
 
$$J = \frac{C \, V_{DD}}{W H} \, p f$$
Equation 2 - Current Density 
 
where C is the wire capacitance, W and H are the metal width 
and height, respectively, VDD is the operating voltage, f is the 
clock frequency, and p is the switching probability, also 
known as the toggle rate. To meet the reliability 
requirements, two additional design-rule constraints are 
usually imposed by advanced process technology design 
rules [14]:  
1. The current applied in every wire should be less than or 
equal to the peak current allowed by the process 
technology. 
2. The current flow in a wire must be calculated by using 
the root mean square (RMS). Note that the use of an 
average current is not useful for EM analysis because 
the average current is usually zero since the number of 
charge carriers is the same when charging or 
discharging an electrical junction. More details on RMS 
current are available in Ref. [14]. 
For advanced process technologies, RMS current has become 
a very significant concern for EM reliability because of the 
incredibly small dimensions of wires and vias. 
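
The difference between average and RMS current in the second rule can be seen on a toy waveform. The sketch below uses a made-up sequence of current samples (a charging pulse, idle time, an equal discharging pulse, idle time) in arbitrary units; the waveform shape and the helper name `average_and_rms` are assumptions for illustration, not data from this work.

```python
import math

def average_and_rms(samples):
    """Return (mean, root-mean-square) of a list of current samples."""
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return mean, rms

# Hypothetical waveform over one charge/discharge pair, arbitrary units:
# charging pulse (+1), idle, discharging pulse (-1), idle.
waveform = [1.0] * 10 + [0.0] * 40 + [-1.0] * 10 + [0.0] * 40

avg_current, rms_current = average_and_rms(waveform)
print(f"average = {avg_current:.3f}, RMS = {rms_current:.3f}")
```

The charging and discharging pulses cancel in the average, whereas the RMS value stays nonzero and grows as the pulses become more frequent, which is why RMS is the quantity constrained by the design rule.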
 Handling the design rules for both maximum current 
and RMS current is highly challenging. The maximum-
current constraint is mainly enforced by the physical design 
implementation tools that assure that the driving gates will 
not exceed the maximum-current limitation and by other 
physical design means [14]. With respect to the RMS current, 
the situation is more complex. Equation 2 shows that the RMS 
current flow within a wire is proportional to both the toggle 
rate and the clock frequency, which means that a higher 
toggle rate for logical elements increases the susceptibility to 
EM stress. Therefore, the MTTF of wires and vias can be 
increased either by increasing their physical width W or by 
minimizing their switching rate p. Increasing the width of 
metal wires has, of course, a negative impact on die area and 
on the number of available routing resources, which can 
degrade performance and increase overall power 
consumption. Minimizing the switching probability depends 
on both workload and architecture. In many cases, the 
switching probability depends on the change of logical state 
due to a write operation or to the use of logical elements for 
different computations. 
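
To make the dependence on the toggle rate explicit, Equation 2 can be substituted into Equation 1. The following is a sketch of that substitution under the assumption that all parameters other than p are held fixed; it is not a result stated in the references.

```latex
\mathrm{MTTF}
  = \frac{A}{J^{n}} \, e^{\frac{E_a}{k_B T}}
  = A \left( \frac{W H}{C \, V_{DD} \, p \, f} \right)^{n} e^{\frac{E_a}{k_B T}}
\quad\Longrightarrow\quad
\frac{\mathrm{MTTF}(p_2)}{\mathrm{MTTF}(p_1)} = \left( \frac{p_1}{p_2} \right)^{n}
```

In this form, halving the toggle rate of the most-stressed wire multiplies its predicted MTTF by 2^n (for instance, by two if n = 1 or by four if n = 2, where these values of n are purely illustrative).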
 Further studies on EM and its effects are available in 
Refs. [1,2,9,10,17,19]. To relax EM stress, we propose in 
Section 4 a novel architectural solution that exploits the 
relationship between EM and toggle rate. 
2.2 Prior Works on Electromigration  
 This subsection summarizes previous works on EM. 
The overview differentiates between works that propose EM 
solutions through the physical design flow and works that do 
so through micro-architectural or architectural solutions. 
2.2.1 Prior work based on physical design  
 EM phenomena have been broadly studied from the 
physical design point of view. Various studies [4,7,16] 
examined different interconnects such as copper or 
aluminum and how they are affected by EM under different 
process, voltage, and temperature conditions. From a 
physical point of view, the most common solution for EM is 
to widen the interconnect wires. As Equation 2 indicates, 
widening a wire reduces the current density and thus 
decreases the effect of EM. From the physical design 
viewpoint, however, it is not always the preferred solution 
because it introduces overhead: it increases the overall die 
area, which leads to more crosstalk delays and thereby 
reduces the device frequency. In addition, a 
larger die may also create timing and power challenges 
because signals would need to travel farther.  
 Modern electronic-design-automation (EDA) tool 
vendors, in conjunction with process foundries, enforce EM-
related design rules that must be met as part of the IC sign-
off process. Such tools verify that interconnects and vias 
meet the EM design rules and identify all EM-related 
violations that require design fixes. EM analysis tools are 
even able to simulate switching activity patterns extracted 
from functional simulations representing real applications 
and take these patterns into account in the EM analysis 
process. When the worst-case switching patterns cannot be 
determined, designers often use a statistical analysis provided 
by the EDA EM sign-off tool. In this 
case, the design is analyzed under a given set of switching 
probabilities, which may lead to an over-design process. The 
EM sign-off process is tedious and involves many fix 
iterations and trials. Some of the trials involve the use of 
wider metals and vias and, in several cases, may even limit 
the clock frequency, the switching rate, and the 
computational workload. The combination of all these 
limitations may result in degraded IC performance.  
 A study by Dasgupta et al. [7] introduced a 
methodology for synthesizing the design and scheduling data 
transfer from the control data flow graph to the hardware 
buses in an EM-aware manner. Their algorithm requires that 
the activity be determined in advance, so it becomes tightly 
coupled to each specific computational use that it targets.  
 A broad survey of additional physical-design-based 
techniques to mitigate EM impact is available in Ref. [10]. 
2.2.2 Prior work based on architecture  
 Only a limited number of prior works have suggested 
architecture-based solutions to the EM problem. Srinivasan 
et al. [6] suggested structural duplication and graceful 
performance degradation techniques to handle the EM effect. 
Structural duplication adds spare design structures to the IC 
and turns them on when the original structures fail. Graceful 
performance degradation, however, shuts down failing 
structures but keeps the IC functional while degrading its 
performance. This approach seems to incur a major hardware 
overhead related to the dedicated mechanisms to detect EM 
degradation through normal IC operation and the need for 
special circuits to switch on the redundant logic. In addition, 
it introduces extra power and performance overhead due to 
the addition of redundant hardware. 
 Abella et al. [8] suggested a novel architectural 
approach for “refueling” bi-directional busses by monitoring 
the current-flow direction each time data are transferred on 
the bus and suggested a mechanism that triggers current 
compensation whenever an imbalance occurs between the 
current flowing in each direction. Such a scheme could 
indeed relieve EM stress in older technologies; however, it 
has limited impact on advanced process node technologies 
because the healing effect of RMS current is less effective, 
and its negative impact on wire and via conductivity and 
reliability is more significant. In addition, given their design 
complexity, modern VLSI circuits do not commonly use 
bidirectional buses. The refueling mechanism also disrupts 
bus operation and may introduce a dynamic power overhead 
due to the reversal current.  
 Srinivasan et al. [5,20] suggested a dynamic reliability 
management approach where the processor dynamically 
maintains its lifetime reliability target by responding to the 
changing behavior of the application. This approach allows a 
processor with lower reliability to correctly operate while 
compromising performance or operating conditions. 
 Thus, applying only physical-design-based solutions 
does not suffice because of the growing challenges posed 
by EM. The remainder of this paper describes our 
comprehensive architectural solution for handling EM.  
3. Distribution of Electromigration Stress in Modern Microprocessors
 Since EM design rules are limited by the weakest link 
(i.e., the wire that is most likely to be damaged), we start 
by examining the distribution of EM stress over the entire 
design of a modern microprocessor (note that the same 
concept may be applied to other ICs and applications). In this 
work, we choose to focus on subsystems that expect an 
intensive toggling rate of wires, which results in hotspots of 
EM stress. Subsection 3.1 describes our experimental 
environment, and subsection 3.2 presents our comprehensive 
observations on EM stress in microprocessors. 
3.1 Experimental Environment 
 For this study, we used the Sniper x86-64 simulator 
[21]. We modified the simulation platform and added the 
needed mechanisms to model the behavior and measure the 
characteristics required for our experiments.  The simulation 
environment included both a detailed cycle-level x86 core 
model and a memory system. Table 1 summarizes the 
configuration of the simulation environment (based on the 
Intel Gainestown core [22]). 
 
Component            Parameter          Value
Core model           Frequency          2.66 GHz
                     Execution units    3 ALUs, 1 FP add/sub, 1 FP mul/div, 1 Branch, 1 Load unit, 1 Store unit
                     Dispatch width     4
                     Execution order    Out-of-order (instruction window: 128)
Memory system model  L1-D Cache         32 KB, 8-way, 64 B block size, LRU, 4 clock cycles access time, throughput period of one cycle
                     L1-I Cache         32 KB, 4-way, 64 B block size, LRU, 4 clock cycles access time, with instruction prefetching and an instruction queue of 16 bytes per cycle throughput
                     L2 Cache           256 KB, 8-way, 64 B block size, LRU, 8 clock cycles access time
                     L3 Cache           8 MB, 16-way, 64 B block size, LRU, 30 clock cycles access time
                     D-TLB              64 entries, 4-way
                     I-TLB              128 entries, 4-way
                     S-TLB (2nd level)  512 entries, 4-way
Table 1 – Configuration of baseline simulation model  
  
 We used the simulation benchmarks Spec2017 [23,24] 
with ref inputs. Every benchmark was run as a single-core 
workload in two different regions of interest: initialization 
phase and main execution phase (denoted “Init” and “Main,” 
respectively). Each experiment used 10 billion instructions 
(for both initialization and main execution phases).  
3.2 Experimental Observations of 
Electromigration Stress  
 This section examines the EM stress induced by three 
different parts of the microarchitecture: ALU execution units, 
architecture register files, and memory hierarchy subsystem. 
We believe that these areas involve the most intensive EM 
activities when running these workloads and, thus, will 
experience intense EM stress.  
 ALUs: Figure 1 shows the distribution of write 
operations among different ALUs when using the FIFO 
selection mechanism among all ready-to-execute 
instructions. Note that ALU0 is the most-used ALU of the 
three available, and ALU2 is the least used, which is 
attributed to the fixed allocation policy of the available 
ALUs, whereby a higher priority is given to an ALU with a 
lower index. Since ALU execution time is 1 clock cycle, all 
ALUs become available every cycle. For example, for a 
program that provides exactly one instruction per cycle, we 
expect only ALU0 to be used. Figure 1 supports this claim 
and shows that ALU0 is used at over twice the rate of 
ALU1 and nearly ten times the rate of ALU2 for most 
benchmarks. In such a logical implementation, the worst-
case switching factor of ALU0 dictates the worst-case EM 
scenario to be taken into account and applied to all ALUs.  
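
The skew produced by a fixed, lowest-index-first allocation policy can be reproduced with a few lines of simulation. The sketch below is illustrative only: the synthetic workload (0 to 3 ready ALU instructions per cycle with made-up weights) and the rotating-start variant are assumptions introduced here for contrast, and the rotating variant should not be read as the mechanism proposed in this work.

```python
import random

NUM_ALUS = 3

def simulate(ready_per_cycle, rotate):
    """Count executions per ALU for a stream of per-cycle ready counts.

    rotate=False: fixed priority, lowest ALU index first (as described above).
    rotate=True: hypothetical rotating start index, shown for contrast only.
    """
    counts = [0] * NUM_ALUS
    start = 0
    for ready in ready_per_cycle:
        # Single-cycle ALUs: all units are free again at every cycle.
        for k in range(min(ready, NUM_ALUS)):
            counts[(start + k) % NUM_ALUS] += 1
        if rotate:
            start = (start + 1) % NUM_ALUS
    return counts

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic workload (assumed weights): mostly one ready instruction per cycle.
workload = rng.choices([0, 1, 2, 3], weights=[2, 5, 2, 1], k=100_000)

fixed = simulate(workload, rotate=False)
rotating = simulate(workload, rotate=True)
print("fixed priority :", fixed)
print("rotating start :", rotating)
```

With fixed priority, ALU0 executes in every cycle that has at least one ready instruction, so its count dominates; the total work is identical in both runs, but the rotating variant spreads it across the three units.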
 
 
Figure 1 - Distribution of ALU execution count  
 
 Register-file: Our next set of experiments examines the 
EM stress on architectural registers. Figure 2 illustrates the 
distribution of write operations on general-purpose registers 
(GPRs: integer general purpose) for the Spec2017 
benchmarks. The distribution clearly is not uniform; for 
example, the RAX register is the most-stressed register in 
terms of write operations, whereas the non-legacy registers 
are hardly used and thus are significantly less stressed than 
the x86 legacy registers. The root cause of these differences 
is the nature of compiler register-allocation algorithms. 
Figure 2 also shows that the ratio of the average number of 
write operations to the maximum number of write operations 
varies from nearly 7% to 33%. This measurement is another 
indication that EM stress is not equally balanced between 
registers; thus, the register with the greatest number of writes 
dictates the overall switching ratio for EM.  
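
The average-to-maximum ratio used throughout this section is straightforward to compute from per-resource write counts. The following sketch uses made-up counts for four registers (not measured data), and the helper name `avg_to_max_ratio` is hypothetical.

```python
def avg_to_max_ratio(write_counts):
    """Average write count divided by the maximum write count."""
    peak = max(write_counts)
    if peak == 0:
        return 1.0  # no writes at all: treat as perfectly balanced
    return (sum(write_counts) / len(write_counts)) / peak

# Made-up write counts for four registers (illustrative, not measured).
example = {"RAX": 1000, "RCX": 400, "R8": 50, "R15": 10}
ratio = avg_to_max_ratio(list(example.values()))
print(f"avg/max = {ratio:.1%}")
```

A ratio of 100% would mean perfectly balanced writes; the lower the ratio, the more the busiest register alone determines the worst-case switching assumption.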
 
[Figure 1 plot: "ALU Execution Units - Execution Count". Series: ALU0, ALU1, ALU2 (left axis, millions of executions) and Avg/Max (right axis, %), per Spec2017 benchmark and phase (Init/Main).]
 
Figure 2 - Distribution of general-purpose-register writes  
 
Figure 3 presents the number of write operations on 
FP registers only for the Spec2017 benchmarks that involve 
FP operations. The results presented for this case are similar 
to the results presented in Figure 1. For FP registers, the 
number of writes is significantly greater in the registers with 
lower indexes (i.e., ZMM0, ZMM1, and ZMM2 are the 
registers with the highest write count). Similar to integer 
registers, this can also be explained by the nature of the 
register-allocation algorithm of common compilers. In this 
case, the ratio of the average number of write operations to 
the maximum number of write operations is even smaller, 
which is indicative of an even larger variance relative to 
integer registers.  
 
 
Figure 3 - Distribution of writes to floating point registers  
 
 Memory hierarchy: Memories are highly susceptible 
to EM because they employ high-density bitcells with narrow 
and long metal wires that toggle upon every change of logical 
state. In addition, physical design tools lack the ability to 
handle every bitcell in an individual manner; therefore, the 
worst-case scenario is commonly applied to all bitcells. Since 
write operations are not uniformly distributed across all 
memory bitcells, the worst-case scenario is determined by the 
bitcell with the largest number of writes.  
 Note that the granularity of EM stress differs from one 
level of memory hierarchy to another; e.g., a single byte can 
be written in the L1 cache, but a minimum granularity of the 
cache line is imposed on all other levels of the cache 
hierarchy (assuming a line-fill mechanism). Since all bits 
within the write granularity have the same EM stress, we 
assume that they all have the same probability for failure and 
therefore conventional error-correction mechanisms are not 
effective at that granularity. 
 Because the memory hierarchy is important to the 
reliability of the entire system, Figure 4 collects the write 
statistics of the components that form modern memory 
hierarchy: L1 instruction and data caches, L2 cache, L3 
cache, the instruction translation lookaside buffer (ITLB), data 
TLB (DTLB), and secondary TLB (STLB). Figure 4 shows 
the ratios of the average number of write operations (as a 
result of TLB entry allocation) per entry, which reveals that 
DTLB involves significantly more write operations than 
ITLB. DTLB also involves nearly tenfold more write 
operations than STLB. A similar observation results from 
examining the ratio of write access of the L1-D cache to that 
of the L1-I cache. The L1-I cache involves write operations 
only upon cache line replacement, whereas L1-D maintains a 
much higher rate of write operations because of block 
replacement and each time an instruction targets a memory 
location. 
 Note that, although the initial observations indicate that 
the L1-D cache and the D-TLB have the highest write rate, 
we must still carefully examine the write 
distributions in the remaining memory hierarchy. In 
particular, it is important to monitor the write distribution to 
L2 and L3 cache memories. Although our experimental 
results show that these caches maintain lower write rates, 
they may be much more susceptible to EM than the L1 caches 
because of physical design considerations. Since both the L2 
and L3 caches are significantly larger than the L1 cache, they 
involve higher-density memory bitcells and significantly 
longer and narrower interconnect metal. Equation 2 supports 
this argument by indicating that the current density is 
inversely proportional to the metal width and proportional to 
the wire capacitance. The interconnect metals in both the L2 
and L3 caches, which use long wires, introduce a much 
greater interconnect capacitance than the L1 caches. 
 
 
Figure 4 – Write ratios in memory hierarchy 
 
[Figure 2 plot: "General Purpose Writes Count". Series: RAX, RBP, RBX, RCX, RDX, RDI, RSI, RSP, R8-R15 (left axis, millions of writes) and Avg/Max (right axis, %), per Spec2017 benchmark and phase (Init/Main).]
[Figure 3 plot: "FP Registers Writes Count". Series: ZMM[0]-ZMM[31] (left axis, millions of writes) and Avg/Max (right axis, %), per FP benchmark and phase.]
[Figure 4 plot: "Memory Hierarchy EM stress ratios". Series: DTLB/ITLB, DTLB/STLB, L1-D/L1-I, L1/L2, L2/L3 (logarithmic axis), per Spec2017 benchmark and phase, plus the average.]

 Based on this observation, the next few graphs focus 
on how EM affects the L1-D cache, L2 cache, L3 cache, and 
D-TLB. In the next figures, we present histograms of write 
operations partitioned into five histogram bins: 0%–25%, 
26%–50%, 51%–75%, 76%–90%, and 91%–100%. Each bin 
shows the number of cache entries with the ratio of write 
distributions relative to the cache entry with the maximum 
number of write operations. For example, 20% for bin 26%–
50% means that 20% of the cache entries each experienced 
write operations in a ratio range of 26%–50% relative to the 
cache entry with the maximum number of write operations. 
The cache entry with the maximum number of writes is the 
entry that dictates the EM toggle-rate assumption for the 
entire cache. This illustrates how the distribution of all cache 
entries relative to the cache entry with the maximum number 
of writes can help clarify the EM stress distribution among 
all cache entries and allow us to explore new architecture to 
relieve EM stress.  
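
The binning described above can be sketched as follows. The per-entry write counts are made up for illustration, the names `BIN_UPPER_EDGES`, `BIN_LABELS`, and `write_histogram` are hypothetical, and the handling of bin boundaries (a ratio is placed in the first bin whose upper edge it does not exceed) is an assumption, since the text does not specify it.

```python
BIN_UPPER_EDGES = [0.25, 0.50, 0.75, 0.90, 1.00]
BIN_LABELS = ["0%-25%", "26%-50%", "51%-75%", "76%-90%", "91%-100%"]

def write_histogram(write_counts):
    """Fraction of entries per bin, by writes relative to the busiest entry."""
    peak = max(write_counts)
    bins = [0] * len(BIN_UPPER_EDGES)
    for count in write_counts:
        ratio = count / peak if peak else 0.0
        # Assumed boundary rule: first bin whose upper edge is >= ratio.
        for i, edge in enumerate(BIN_UPPER_EDGES):
            if ratio <= edge:
                bins[i] += 1
                break
    total = len(write_counts)
    return [b / total for b in bins]

# Made-up per-entry write counts (illustrative, not measured data).
counts = [100, 95, 60, 40, 30, 10, 5, 5, 2, 0]
shares = write_histogram(counts)
for label, share in zip(BIN_LABELS, shares):
    print(f"{label:>8}: {share:.0%}")
```

In this toy input, half of the entries fall in the lowest bin while only two entries reach the 91%–100% bin, which mirrors the kind of skew the histograms are meant to expose.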
Figure 5 shows the write histogram of D-TLB entries 
and their tags. Note that, for all Spec2017 benchmarks, only 
a small number of entries experience a large ratio (above 90% 
relative to the entry with the maximum number of writes); 
these entries dictate the overall switching rate of the D-TLB. 
The majority of entries experience much lower write rates. 
Figure 5 also presents the ratio of the average number of 
writes per entry to the maximum number of writes of all 
entries, which varies from 2% to 100%, with an average of 
55%.  
 
 
Figure 5 - Distribution of DTLB writes  
 
Figure 6 shows a histogram of writes to L1-D cache data 
lines. A phenomenon appears similar to that observed in the 
D-TLB. Only a small number of cache lines have a high write 
ratio (above 90% relative to the maximal data cache line), 
whereas the majority of cache lines experience much lower 
write ratios. In most of the benchmarks, the ratio of the 
average number to the maximum number of writes is less 
than 30%, whereas the average ratio is 33%. 
 Figure 7 shows a histogram of writes to the L2 cache 
data lines. The observations, in this case, are similar to those 
for the L1-D cache. For both data blocks and tags, we observe 
that only a small portion of cache entries (data and tags) 
experience the highest write ratio (>90% relative to the entry 
with the maximum number of writes) and, as a result, they 
indicate severe EM conditions for all cache entries. We 
observe that the ratio of the average number of writes per 
entry to the maximum number of writes of all entries is 
approximately 50%. A similar result for write operations on 
cache lines was also obtained by Valero et al. in their study 
of the different aspects of cache reliability [19]. 
 
 
Figure 6 - Distribution of L1-D cache block writes  
 
 Examination of Figures 6 and 7 shows that two benchmarks, 
631.deepsjeng-init and 649.fotonik3d-main, behave 
differently from all other benchmarks. This is 
explained by the fact that the initialization phase of 
631.deepsjeng and the main execution phase of 
649.fotonik3d have write distributions that are spread 
uniformly over most cache lines.  
 
 
Figure 7 - Distribution of L2 cache block writes  
 
 Figure 8 shows a histogram of L3 writes to cache data 
lines. For most benchmarks, the number of writes is very 
small for the majority of cache data lines: almost all of 
them experience 25% or less of the write operations seen by the 
very small portion of cache lines with the maximum number 
of writes. Overall, the ratio of the average number of write 
operations to the maximum number of writes in cache data 
lines is 8%. The benchmark 631.deepsjeng-init again exhibits 
writes spread uniformly across all cache 
lines, similar to its behavior in the L2 cache, due to 
the relatively high store instruction count that percolates to the 
L3 cache.  
Figures 9–11 illustrate the write histograms of L1-D, L2, 
and L3 tags, respectively. The tag writes are spread more 
uniformly than writes to data lines, and the majority of cache tags 
experience less variance in the number of writes. The ratio of 
the average number of tag writes to the maximum number of 
tag writes is nearly 70% on average (over all benchmarks) for 
the L1-D cache and approximately 50% for L2 and L3 tags. 
 
[Figure 5 chart: "D-TLB Entry/Tag Writes distribution". X-axis: SPEC2017 benchmark phases (Init/Main); Y-axis: 0%–100%. Legend (writes relative to the maximum): 0–0.25, 0.26–0.50, 0.51–0.75, 0.76–0.90, 0.91–1.0, Avg/Max.]
[Figure 6 chart: "L1 Data cache block writes count distribution". Same axes and legend.]
[Figure 7 chart: "L2 Cache block writes distribution". Same axes and legend.]
 
Figure 8 - Distribution of L3 cache block writes  
 
To conclude the discussion of how EM affects the 
memory hierarchy, we observe that cache data lines 
experience a write distribution with high variance: a 
minority of cache data lines are highly stressed by the 
maximum number of write operations and, as a result, dictate 
much more severe EM conditions for the entire cache. 
Similar conclusions are obtained from our observation of 
register file write access and ALU use where, in both cases, 
the EM stress induced by the workload is nonuniformly 
distributed. Such behavior leads to an over-design condition 
for EM that can degrade overall performance and increase IC 
area. In the next section, we propose architectural 
mechanisms that take EM considerations into account to 
smooth EM stress. This approach results in a dramatic 
relaxation of the overall EM sign-off design conditions. 
 
 
Figure 9 - Distribution of L1-D cache tag writes  
 
 
Figure 10 - Distribution of L2 cache tag writes  
 
 
Figure 11 - Distribution of L3 cache tag writes  
4. Proposed Electromigration-Aware 
Resource-Allocation Mechanism  
 This section introduces architectural solutions to reduce 
EM stress. The principle of the solutions is EM-aware 
resource allocation that smooths the distribution of write 
operations and of computational-element use over all 
available resources. As a result, EM stress is significantly 
reduced. Subsections 4.1–4.3 introduce EM-aware 
architectures for dealing with EM stress on ALU execution 
units, register files, and cache memories, respectively. 
4.1 Electromigration-Aware ALU Allocation 
 In the previous section, we observed that ALUs are not 
utilized in an EM-aware manner, which means that the 
maximum EM stress is dictated by a small, over-used subset 
of ALUs. The proposed EM-aware scheme assumes that all 
pending ALU instructions are allocated to a centralized 
instruction queue, and in each cycle a scheduler allocates 
ALUs to execution-ready instructions. Although the 
proposed scheme is described for ALUs, it can also be 
applied to any type of multi-execution unit employed by the 
microprocessor. 
 We present two alternatives that implement the same 
basic principle in different ways. The aim of both solutions 
is to start allocating the resources from a different leading 
point each time. The first simple solution is to have a counter 
(e.g. 32-bit counter) that is incremented each clock cycle and 
wraps around when expired so that the leading resource 
number to use is calculated as 
 
Resource# = counter mod N 
Equation 3 - Leading resource allocation 
 
where N is the number of physical resources. For our 
simulated environment, we assume N = 3. When the counter 
expires, we stop allocating resources for that cycle, reset its 
content, and continue with the allocation in the next cycle. 
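The first solution can be sketched as follows. This is a simplified illustration under stated assumptions: the k instructions issued in a cycle are assumed to occupy consecutive units starting at the leading resource, the stall-and-reset step on counter expiry is replaced by a plain wrap-around, and the function names are hypothetical.

```python
COUNTER_BITS = 32  # counter width suggested in the text


def next_counter(counter):
    """Advance the free-running counter by one cycle, wrapping on overflow."""
    return (counter + 1) % (1 << COUNTER_BITS)


def allocate_from_leading_point(k, counter, n):
    """Return the k units used this cycle, starting at counter mod n."""
    lead = counter % n  # Equation 3: Resource# = counter mod N
    # Assumption: the remaining units follow the leading one in cyclic order.
    return [(lead + j) % n for j in range(k)]


def simulate(demands, n):
    """Count how often each unit is used for a per-cycle demand sequence."""
    usage = [0] * n
    counter = 0
    for k in demands:
        for unit in allocate_from_leading_point(k, counter, n):
            usage[unit] += 1
        counter = next_counter(counter)
    return usage


# One ALU instruction per cycle for 3000 cycles on N = 3 ALUs.
print(simulate([1] * 3000, 3))
```

With a fixed starting point, the same demand sequence would place all 3000 executions on a single ALU, whereas the rotating leading point spreads them evenly.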
 The second solution is illustrated in Figure 12; here, we 
extend each resource with a single bit and add a single global 
bit for the overall management of the allocation. All counters 
are initialized to zero upon reset. 
[Figure 8 chart: "L3 Cache block writes distribution". X-axis: SPEC2017 benchmark phases (Init/Main); Y-axis: 0%–100%. Legend (writes relative to the maximum): 0–0.25, 0.26–0.50, 0.51–0.75, 0.76–0.90, 0.91–1.0, Avg/Max.]
[Figure 9 chart: "L1 Data cache tag writes count distribution". Same axes and legend.]
[Figure 10 chart: "L2 Cache Tag writes distribution". Same axes and legend.]
[Figure 11 chart: "L3 Cache tag writes distribution". Same axes and legend.]
 
Figure 12 - Electromigration-aware ALU allocation scheme 
  
The allocation algorithm is specified as follows: 
 
Algorithm 1 – EM-aware execution-unit allocation: 
Input: k ≤ n, the number of execution units to be allocated. 
Output: Vector E = (e0, e1, …, en−1); for every 0 ≤ i ≤ n−1, 
if ei = 1, execution unit i is allocated; otherwise, 
execution unit i is not allocated. 
Initialization: Ex_counter[i] = 0 for every 0 ≤ i ≤ n−1, 
Global_counter = 0; U = {0, 1, …, n−1} is the set of all execution units. 
1. M = {0 ≤ i ≤ n−1 | Ex_counter[i] = Global_counter} 
2. if k < |M| then 
3.  let P ⊆ M such that |P| = k 
4.  ei = 1 for every i ∈ P, otherwise ei = 0 
5.  Ex_counter[i]++ for every i ∈ P 
6. end if 
7. else // k ≥ |M| 
8.  let P ⊆ U\M such that |P| = k − |M|  
9.  ei = 1 for every i ∈ P ∪ M, otherwise ei = 0 
10.  Ex_counter[i]++ for every i ∈ P ∪ M 
11.  Global_counter++ 
12. end else 
13. return E 
 
 The EM-aware allocation algorithm 
selects execution units whose corresponding counter state 
equals the global counter. If the number of available 
execution units that satisfy this condition exceeds the 
number of instructions to be issued, then a subset of 
those execution units (sized to the number of instructions 
to be issued) is selected, and all their corresponding 
counters are toggled (between zero and one). Otherwise, all 
execution units whose counter state equals the global 
counter are selected, and the remaining execution units 
needed to issue the required instructions are 
selected from the pool of ALUs whose counter is not 
equal to the global counter. In this case, the global counter 
is incremented. In addition, the counters corresponding to the 
selected execution units are incremented. Table 2 shows an 
example of the algorithm output for three ALUs. 
 
Clock   ALU instructions   ALU 2, 1, 0   Global    Selected
cycle   to be issued       counters      counter   ALU(s)
0       0                  0, 0, 0       0         None
1       2                  0, 1, 1       0         0, 1
2       2                  1, 1, 0       1         2, 0
3       3                  0, 0, 1       0         1, 2, 0
Table 2 - Example of EM-aware ALU scheduling  
 
 As seen in Table 2, for each instruction issued, the 
algorithm balances the use of all execution units and thereby 
protects them from overuse. The implementation 
of the first solution is straightforward and may perform well 
given a large number of execution units. The implementation 
of the second solution is more complicated, but our 
implementation trial indicates that it can be done with 
negligible overhead in terms of logical area and computation 
time for both the ALU-selection logic and the counter-incrementation 
logic.  
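A minimal software sketch of Algorithm 1 is given below. It assumes 1-bit counters, so each "++" is a toggle, and it picks the lowest-numbered candidates whenever the algorithm leaves the choice of the subset P open; the function name `em_aware_allocate` is hypothetical. Under these assumptions, the loop at the end steps through the demand sequence of Table 2.

```python
def em_aware_allocate(k, ex_counter, global_counter):
    """One scheduling cycle of Algorithm 1 with 1-bit counters.

    ex_counter is a list of 0/1 values (one per execution unit) and is
    updated in place. Returns (selected_units, new_global_counter).
    """
    n = len(ex_counter)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the number of units")
    # M: units whose counter equals the global counter.
    matching = [i for i in range(n) if ex_counter[i] == global_counter]
    if k < len(matching):
        # Assumption: take the lowest-numbered units of M as the subset P.
        selected = matching[:k]
    else:
        # Use all of M, then fill up from the units outside M.
        others = [i for i in range(n) if ex_counter[i] != global_counter]
        selected = matching + others[:k - len(matching)]
        global_counter ^= 1  # 1-bit increment of the global counter
    for unit in selected:
        ex_counter[unit] ^= 1  # 1-bit increment of the unit counter
    return selected, global_counter


counters, global_bit = [0, 0, 0], 0  # index i holds the counter of ALU i
for cycle, demand in enumerate([0, 2, 2, 3]):
    selected, global_bit = em_aware_allocate(demand, counters, global_bit)
    # Counters are printed in the order ALU 2, 1, 0, as in Table 2.
    print(cycle, demand, counters[::-1], global_bit, selected)
```

The counters act as a one-bit "already used in this round" flag per unit, and the global bit marks which flag value denotes an unused unit in the current round.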
4.2 Electromigration-Aware Register Allocation 
 The results of the measurements presented in Section 3 
clearly indicate that write operations to registers are not 
uniformly distributed. Moreover, specific registers (e.g., 
RAX) experience an excessive number of writes. Such 
behavior by a small number of registers dictates difficult EM 
conditions for all registers and may result in reliability 
concerns. Note that this section deals mainly with 
architectural registers assigned by the compiler rather than 
with physical registers implemented by out-of-order 
(OoO) microprocessors. For the latter, physical registers 
(implemented within the reorder buffer) are usually 
implemented as a cyclic buffer and, as a result, all writes are 
spread uniformly over time.  
 The proposed architectural solution, illustrated in 
Figure 13, avoids hotspots in register writes by periodically 
changing the mapping of architectural registers to their 
physical hosting locations. The scheme is based on 
modulo rotation of the mapping between the architectural 
register identifiers and their physical locations. As illustrated 
in Figure 13, a pulse trigger is asserted to shift the register 
mapping in the register file (RF) either periodically (or each 
time CR3 is changed) or as part of the return-from-interrupt 
procedure before saving the values of the user-level process. 
A modulo counter (RF rotator) serves to map the 
architectural register number to the physical register location 
by modulo addition. After each assertion of the rotation 
trigger (at any arbitrary time point), the counter is 
incremented, and the register values are shifted between 
registers.  
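The rotation of the register mapping can be sketched in software as follows. This is an illustrative model only: the class name `RotatingRegisterFile` and its methods are hypothetical, the rotation step is assumed to be one position per trigger, and the value shift performed at each rotation is assumed to cost one write per physical register.

```python
class RotatingRegisterFile:
    """Register file whose architectural-to-physical mapping rotates."""

    def __init__(self, n):
        self.n = n
        self.rotator = 0          # modulo counter (RF rotator)
        self.cells = [0] * n      # physical register contents
        self.writes = [0] * n     # write count per physical register

    def _physical(self, arch_id):
        # Modulo addition of the architectural ID and the rotator.
        return (arch_id + self.rotator) % self.n

    def write(self, arch_id, value):
        p = self._physical(arch_id)
        self.cells[p] = value
        self.writes[p] += 1

    def read(self, arch_id):
        return self.cells[self._physical(arch_id)]

    def rotate(self):
        """Rotation trigger: shift all values by one and bump the rotator."""
        self.cells = [self.cells[(p - 1) % self.n] for p in range(self.n)]
        # Assumption: the shift writes every physical register once.
        self.writes = [w + 1 for w in self.writes]
        self.rotator = (self.rotator + 1) % self.n


rf = RotatingRegisterFile(4)
rf.write(1, 11)
for step in range(100):           # architectural register 0 is the hotspot
    rf.write(0, step)
    if (step + 1) % 25 == 0:
        rf.rotate()
print(rf.read(0), rf.read(1), rf.writes)
```

Architectural reads return the same values before and after a rotation, while the hot architectural register lands on a different physical register after each trigger.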
 
[Figure 12 diagram labels: ALU0, ALU1, …, ALU n-1, each with a 1-bit counter; one 1-bit global counter.]
 
Figure 13 – Scheme for electromigration-aware RF mapping  
4.3 Electromigration-Aware Cache Memories 
 EM in cache structures generates hot spots in various 
cache lines that are spread nonuniformly. As a result, a small 
fraction of cache lines dictates the worst EM scenario for the 
entire cache. Note that, in this subsection, the term “cache” 
refers to any architectural structure that uses a cache 
organization (e.g., TLBs, L1 cache). The principle of the 
proposed EM-aware cache memory scheme, illustrated in 
Figure 14, is similar to that of the register file solution: it avoids 
hotspots of cache writes by periodically changing the 
mapping of memory addresses to their corresponding 
physical cache sets. As with the RF solution, the 
scheme is based on modulo rotation of the mapping 
between the set field (taken from the memory address) and 
its physical set location. A pulse trigger is periodically 
asserted to shift the mapping of the set. A modulo counter 
(cache rotator) maps the address set field to the physical set 
location by modulo addition. After each assertion of the 
rotation trigger, the counter is incremented, and all cache 
lines are invalidated, as illustrated in Figure 14. To avoid the 
potential overhead incurred by flushing the content of the 
caches (and by the write-back of all the dirty lines), we 
suggest performing the rotation either very infrequently or by 
exploiting events that already require flushing these structures (e.g., 
after a sleep mode when all caches were cleaned).  
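The set rotation can be sketched with a small model of a direct-mapped tag store. The sketch is hedged: the class name `RotatingSetCache`, the 64-byte line size, and the one-position rotation step are assumptions, and the write-back of dirty lines on invalidation is not modeled.

```python
class RotatingSetCache:
    """Direct-mapped tag store with a rotating set index (illustrative)."""

    def __init__(self, num_sets, line_bytes=64):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.rotator = 0                     # modulo counter (cache rotator)
        self.tags = [None] * num_sets        # None marks an invalid line
        self.set_writes = [0] * num_sets     # write count per physical set

    def _locate(self, address):
        block = address // self.line_bytes
        index = block % self.num_sets        # set field of the address
        tag = block // self.num_sets
        # Modulo addition of the set field and the rotator.
        return (index + self.rotator) % self.num_sets, tag

    def lookup(self, address):
        physical_set, tag = self._locate(address)
        return self.tags[physical_set] == tag

    def write(self, address):
        physical_set, tag = self._locate(address)
        self.tags[physical_set] = tag
        self.set_writes[physical_set] += 1

    def rotate(self):
        """Rotation trigger: bump the rotator and invalidate all lines."""
        self.rotator = (self.rotator + 1) % self.num_sets
        self.tags = [None] * self.num_sets


cache = RotatingSetCache(num_sets=8)
for n in range(80):                          # one hot address
    cache.write(0x1000)
    if (n + 1) % 10 == 0:
        cache.rotate()
print(cache.set_writes)
```

Invalidation is needed because, after the rotator changes, a line stored under the old mapping would otherwise be looked up in the wrong physical set.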
 
 
Figure 14 - Electromigration-aware cache memory mapping 
5. Experimental Study of Electromigration-
Aware Architecture  
 This section presents the experimental results for the 
architectural solutions proposed in the previous section 
to reduce the impact of EM. Because we did not observe 
performance overhead for the techniques proposed in Section 4, 
this section focuses on how the proposed algorithms 
affect the EM stress.  
 We first examine an EM-aware solution for ALU 
execution units. Figure 15 shows how the second solution 
presented in the previous section (see Algorithm 1) affects 
the EM stress for the SPEC2017 benchmarks. Examination 
of the two solutions indicates that they behave very similarly. 
The results show that the proposed algorithm reduces 
EM stress by 50% on average over all benchmarks, with 
reductions ranging from nearly 25% to 65%. This 
result is due to the fact that the proposed scheme distributes 
ALU use uniformly, which spreads the maximum EM stress 
smoothly over all ALUs.  
 As part of this study, we also compare the instructions 
per cycle (IPC) versus EM stress reduction, as shown in 
Figure 16. Benchmarks with small IPC have a greater 
potential for EM stress reduction because of the underused 
ALUs that could potentially help reduce the maximum EM 
stress by distributing it uniformly. 
 
 
Figure 15 - Distribution of ALU execution count with 
electromigration-aware allocation 
 
 
Figure 16 - ALU electromigration stress reduction vs. IPC 
 
[Figure 13 diagram labels: registers R0 … Rn-1; RF rotator; Arch Register ID; + (modulo adder); Rotated Register ID; RF write port; Rotate trigger.]
[Figure 14 diagram labels: address fields Tag, Index, Offset; Cache rotator; + (modulo adder); Set selection; Data blocks; Tags; Rotate trigger; Cache invalidate.]
[Figure 15 chart: "ALU Execution Units with EM aware scheduling - Execution Count and EM stress Reduction". X-axis: SPEC2017 benchmark phases (Init/Main); axes: execution count (millions) and EM stress reduction (0%–70%). Legend: ALU0, ALU1, ALU2, EM stress reduction.]
[Figure 16 chart: "EM stress reduction vs. IPC". X-axis: SPEC2017 benchmark phases (Init/Main). Legend: IPC, EM stress reduction.]

 The next results show the EM stress reduction obtained 
by the proposed architectural solution for both the GPR 
register file and the FP register file (see Figures 17 and 18, 
respectively). For both register files, the number of writes is 
distributed uniformly over all registers, and no hotspots remain 
(e.g., RAX or ZMM0). In addition, the write stress decreases 
dramatically, by nearly 80% on average for the GPR register 
file and 90% for the FP register file. The rotation trigger in 
the simulation was asserted every 10,000,000 clock cycles. 
We also examined different rotation trigger rates 
and found that this value does not impact performance.  
 
 
Figure 17 – Distribution of general purpose register writes 
with electromigration-aware allocation 
   
 
Figure 18 - Distribution of FP register writes with 
electromigration-aware allocation 
   
 As part of the EM study, we also observed that the flags 
and stack-pointer registers experience an excessive number of 
write operations, which makes them highly susceptible to 
EM. Figure 19 illustrates the number of write 
operations to the flags register and stack-pointer register and 
compares them with the maximum number of writes per 
register in the GPR register file. For almost all benchmarks, 
the number of writes to the flags register significantly 
exceeds those to the GPR and stack-pointer registers. This 
result is due to the fact that almost every computation 
instruction involves an implicit write to the flags 
register, which motivates us to extend the EM-aware scheme 
proposed for the GPR register file to include both the flags and 
stack-pointer registers. Figure 19 shows that, in this case, the 
maximum number of write operations (EM stress) is reduced 
even more (varying from 80% to >90%).  
 
 
Figure 19 - Distribution of general purpose registers, flags, and 
stack pointer writes with electromigration-aware allocation 
 
 The last part of this section is devoted to examining the 
reduced write stress for the TLBs and cache memory data 
lines and tags. The experimental results are illustrated in 
Figures 20–22, respectively. In most cases, the EM write 
stress is significantly reduced as a result of the repetitive 
rotation of the set mapping and the cache invalidation. Such 
rotation and invalidation actions help to distribute write 
operations uniformly over all sets and ways. For the D-TLB, 
we suggest triggering the rotation either when the TLB is 
flushed by the system or periodically 
(e.g., every 10M TLB accesses). For the L1-D cache, we 
suggest a similar periodic rotation trigger every 10M 
accesses. For all these options, the performance overhead is 
minimal, and the EM stress is reduced, as indicated in Figure 
20. As previously discussed, for both L2 and L3, we suggest 
triggering the set rotation upon each system wakeup from 
sleep mode. In this case, no performance overhead is 
incurred. In our simulation, we use the same trigger interval 
as for the L1-D cache, 10M cache accesses, for both 
the L2 and L3 caches.  
 Figure 20 illustrates the write-stress reduction for the 
D-TLB. On average, the write stress is reduced by 44% over 
all benchmarks. Figure 21 summarizes the reduction in EM 
write stress for the L1-D, L2, and L3 caches; the average 
reduction in the maximum number of 
writes is 69%, 46%, and 92%, respectively. Figure 22 
summarizes the EM stress reduction in cache tags. In this 
case, the EM stress reduction is 28%, 46%, and 46% for the 
L1-D, L2, and L3 caches, respectively. Note that the 
experimental results of the EM-aware architectural solution 
are consistent with the results presented in Section 3. These 
figures suggest that a smaller ratio of the average number of 
write operations to the maximum number of writes 
corresponds to greater EM stress reduction. 
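The link between the Avg/Max ratio and the achievable reduction can be made concrete with a rough estimate. The sketch assumes an idealized balancing scheme that keeps the total number of writes unchanged and spreads them perfectly evenly, so that the new maximum equals the old average; the function name is hypothetical, and the ratios are the rounded averages quoted in Section 3, so the estimates need not match the measured per-benchmark averages exactly.

```python
def ideal_stress_reduction(avg_over_max):
    """Reduction of the maximum write count if writes were spread evenly.

    Assumes the total number of writes is unchanged, so the balanced
    maximum equals the original average.
    """
    if not 0.0 < avg_over_max <= 1.0:
        raise ValueError("avg_over_max must be in (0, 1]")
    return 1.0 - avg_over_max


# Rounded average Avg/Max ratios quoted in Section 3.
section3_ratios = {
    "D-TLB": 0.55,
    "L1-D data lines": 0.33,
    "L2 data lines": 0.50,
    "L3 data lines": 0.08,
}
for name, ratio in section3_ratios.items():
    print(f"{name}: about {ideal_stress_reduction(ratio):.0%}")
```

These estimates are close to the reductions reported for the D-TLB and for the cache data lines, which is the consistency the text refers to.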
 
[Figure 17 chart: "General Purpose Register Writes with EM aware scheduling and EM stress reduction". X-axis: SPEC2017 benchmark phases (Init/Main); axes: writes (millions) and EM stress reduction (0%–100%). Legend: RAX, RBP, RBX, RCX, RDX, RDI, RSI, RSP, R8–R15, EM stress reduction.]
[Figure 18 chart: "FP Register Writes with EM aware scheduling and EM stress reduction". X-axis: 607.cactuBSSN, 620.omnetpp, 625.x264, 628.pop2, and 649.fotonik3d phases; axes: writes (millions) and EM stress reduction (75%–100%). Legend: ZMM registers.]
[Figure 19 chart: "General Purpose, stack and flags registers Writes with EM aware scheduling and EM stress reduction". X-axis: SPEC2017 benchmark phases (Init/Main); axes: writes (millions) and EM stress reduction (0%–100%). Legend: Flags, Max GPRs, Stack pointer, EM stress reduction.]
 
Figure 20 - DTLB electromigration stress reduction 
 
 
Figure 21 - Cache lines electromigration stress reduction 
 
 
Figure 22 - Cache tags electromigration stress reduction 
 
 Based on the experimental results, we observe that EM 
stress can be significantly reduced in various microprocessor 
building blocks that are highly susceptible to EM, namely, 
execution units, register files, and the memory hierarchy. The 
observations detailed herein reveal an average reduction in 
EM stress of 50% for ALUs, 80%–90% for the register files, 
and 46%–92% for cache-memory data blocks. These results 
indicate that the proposed EM-aware solution should allow 
microprocessor designers to significantly relax the maximum 
toggling rate and, as a result, to avoid a significant number of 
potential EM violations.  
 Alternatively, the reduction in the maximum switching 
rate translates into an extended device lifetime. As indicated 
in Section 2, the MTTF is inversely proportional to the switching rate, 
so a reduction of 50% in the switching rate should double the 
lifetime. These numbers, of course, depend on the workload 
being run by the microprocessor, and benchmarks exist 
where EM stress is reduced even more (e.g., in 600.perlbench 
the write reduction in the memory hierarchy exceeds 70%, 
which may more than triple the overall lifetime). Still, a small 
number of benchmarks exist with less EM stress reduction 
(e.g., 628.pop2, for which the EM stress reduction is 5%–
25%). For such benchmarks, the overall gain in lifetime is 5%–33%. 
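The lifetime figures quoted above follow from simple arithmetic, sketched below under the assumption that the MTTF scales inversely with the maximum switching rate, which is the relation implied by the statement that a 50% reduction doubles the lifetime. The function name `lifetime_multiplier` is hypothetical.

```python
def lifetime_multiplier(stress_reduction):
    """Lifetime factor for a fractional reduction r of the switching rate.

    Assumes MTTF is inversely proportional to the switching rate,
    so the lifetime scales by 1 / (1 - r).
    """
    if not 0.0 <= stress_reduction < 1.0:
        raise ValueError("stress_reduction must be in [0, 1)")
    return 1.0 / (1.0 - stress_reduction)


for r in (0.05, 0.25, 0.50, 0.70):
    gain = lifetime_multiplier(r) - 1.0
    print(f"reduction {r:.0%}: lifetime x{lifetime_multiplier(r):.2f} "
          f"(gain {gain:.0%})")
```

Reductions of 5% and 25% correspond to lifetime gains of roughly 5% and 33%, a 50% reduction doubles the lifetime, and a 70% reduction gives a factor slightly above three.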
6. Conclusions 
 Microprocessor reliability is a crucial requirement that 
introduces major micro-architectural and design challenges 
in advanced process nodes. In this study, we observed that 
microprocessors are highly susceptible to EM because they 
process highly variable dynamic workloads on non-EM-
aware micro-architectures. We introduce herein an 
architectural solution that takes into account the EM effect 
and reduces excess use of execution units and write 
operations to register files and memory-hierarchy elements. 
The principle of the proposed solution is based on EM-aware 
resource-allocation mechanisms that smoothly distribute 
write operations and the use of computational elements over 
all available resources. The experimental results indicate that 
the proposed architecture reduces EM stress, and thereby 
relaxes the EM sign-off conditions, by 50% for ALUs, 80%–90% for the register 
files, and 46%–92% for the data blocks of cache memories. 
In addition, because the MTTF is inversely proportional to the 
switching rate, these results translate to at least a twofold 
extension in lifetime. Of course, this result depends on the 
specific workload; for certain benchmarks, the lifetime 
extension may be threefold or even higher. 
 EM has become a major challenge in advanced 
technologies, and further studies are required to continue 
exploring new architectures and to identify other avenues to 
reduce EM and extend device lifetime. In this study, we 
examined how EM stress affects modern microprocessors, 
although the approach used herein may be extended to other 
processing elements such as VLIW machines, DSPs, network 
processors, security engines, GPUs, and TPUs. 
 
References 
[1]  X. Xuan, Analysis and Design of Reliable Mixed-Signal CMOS 
Circuits, PhD thesis, Georgia Inst. of Technology, Dept. of Electrical 
and Computer Engineering, 2004. 
[2]  J. Lienig and G. Jerke, Embedded Tutorial: Electromigration-Aware 
Physical Design of Integrated Circuits, Proc. 18th Int’l Conf. VLSI 
Design (VLSID 05), IEEE Press, 2005, pp. 77-82. 
[3]  J. Lienig, Introduction to electromigration-aware physical design. In 
Proceedings of the International Symposium on Physical Design 
(ISPD’06). ACM, New York, 39–46. 
[4] J. Lienig. Electromigration and Its Impact on Physical Design in Future 
Technologies. Proceedings of the 2013 ACM International symposium 
on Physical Design, March 2013. 
[5] J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers. Lifetime 
Reliability: Toward an Architectural Solution. IEEE Micro, special 
issue on Emerging Trends, vol. 25, issue 3, May-June 2005, 2-12. 
[6]  J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers. Exploiting 
Structural Duplication for Lifetime Reliability Enhancement. the 
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
60
0.
pe
rlb
en
ch
 - 
In
it
60
0.
pe
rlb
en
ch
 - 
M
ain
60
2.
gc
c -
 In
it
60
2.
gc
c -
 M
ai
n
60
5.
m
cf
 - 
In
it
60
5.
m
cf
 - 
M
ain
60
7.
ca
ct
uB
SS
N 
- M
ai
n
62
0.
om
ne
tt
p 
- i
ni
t
62
0.
om
ne
tt
p 
- M
ai
n
62
3.
xa
la
nc
bm
k -
 In
it
62
3.
xa
la
nc
bm
k -
 M
ain
62
5.
x2
65
 - 
In
it
62
5.
x2
65
 - 
M
ai
n
62
8.
po
p2
 - 
In
it
62
8.
po
p2
 - 
M
ain
63
1.
de
ep
sje
ng
 - 
In
it
63
1.
de
ep
sje
ng
 - 
M
ain
64
1.
le
el
a -
 In
it
64
1.
le
el
a -
 M
ain
64
8.
ex
ch
an
ge
2 
- I
ni
t
64
8.
ex
ch
an
ge
2 
- M
ain
64
9.
fo
to
ni
k3
d 
- I
ni
t
64
9.
fo
to
ni
k3
d 
 - 
M
ain
65
7.
xz
. -
 In
it
65
7.
xz
. -
 M
ain
D-TLB EM Writes stress reduction 
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
60
0.
pe
rlb
en
ch
 - 
In
it
60
0.
pe
rlb
en
ch
 - 
M
ain
60
2.
gc
c -
 In
it
60
2.
gc
c -
 M
ai
n
60
5.
m
cf
 - 
In
it
60
5.
m
cf
 - 
M
ain
60
7.
ca
ct
uB
SS
N 
- M
ai
n
62
0.
om
ne
tt
p 
- i
ni
t
62
0.
om
ne
tt
p 
- M
ai
n
62
3.
xa
la
nc
bm
k -
 In
it
62
3.
xa
la
nc
bm
k -
 M
ain
62
5.
x2
65
 - 
In
it
62
5.
x2
65
 - 
M
ai
n
62
8.
po
p2
 - 
In
it
62
8.
po
p2
 - 
M
ain
63
1.
de
ep
sje
ng
 - 
In
it
63
1.
de
ep
sje
ng
 - 
M
ain
64
1.
le
el
a -
 In
it
64
1.
le
el
a -
 M
ain
64
8.
ex
ch
an
ge
2 
- I
ni
t
64
8.
ex
ch
an
ge
2 
- M
ain
64
9.
fo
to
ni
k3
d 
- I
ni
t
64
9.
fo
to
ni
k3
d 
 - 
M
ain
65
7.
xz
. -
 In
it
65
7.
xz
. -
 M
ain
Cache Line Writes EM Stress Reduction 
L1-D L2 L3
Figure: Cache Tag Writes EM Stress Reduction for L1-D, L2 and L3 (y-axis: 0% to 100%; x-axis: SPEC CPU2017 benchmarks, Init and Main phases).
Proceedings of the 32nd International Symposium on Computer 
Architecture (ISCA'05) June 2005. 
[7] A. Dasgupta and R. Karri, Electromigration Reliability Enhancement 
Via Bus Activity Distribution, Proc. 33rd Ann. Conf. Design 
Automation (DAC 96), ACM Press, 1996, pp. 353-356. 
[8] J. Abella, Xavier Vera, Osman S. Unsal, Oguz Ergin, Antonio 
González and James W. Tschanz. Refueling: Preventing Wire 
Degradation due to Electromigration. IEEE Micro, vol. 28, no. 6, 
Nov.-Dec. 2008. 
[9]  J. Tao et al., Modeling and Characterization of Electromigration 
Failures under Bidirectional Current Stress, IEEE Trans. Electron 
Devices, vol. 43, no. 5, May 1996, pp. 800-808. 
[10] J. Abella and X. Vera, Electromigration for Microarchitects. ACM 
Computing Surveys (CSUR), March 2010, Article No. 9. 
[11] Operating Temperature, Wikipedia - 
https://en.wikipedia.org/wiki/Operating_temperature.  
[12] Failure Mechanism Based Stress Test Qualification for Integrated 
Circuits. Automotive Electronics Council, Component Technical 
Committee, AEC-Q100 Rev-G standard. 
[13] J. R. Black, “Electromigration – A brief survey and some recent 
results,” IEEE Trans. on Electron Devices (April 1969), 338-347. 
DOI= http://dx.doi.org/10.1109/T-ED.1969.16754 
[14] Andrew B. Kahng, Siddhartha Nath and Tajana S. Rosing, On Potential 
Design Impacts of Electromigration Awareness. 2013 18th Asia and 
South Pacific Design Automation Conference (ASP-DAC). 
[15]  I. A. Blech, Electromigration in thin aluminum films on titanium 
nitride, J. Appl. Phys., vol. 47 (1976), 1203–1208. 
http://dx.doi.org/10.1063/1.322842 
[16]  C. S. Hau-Riege, An introduction to Cu electromigration, Microel. 
Reliab., vol. 44 (2004), 195–205. DOI= 
http://dx.doi.org/10.1016/j.microrel.2003.10.020 
[17] A. Scorzoni, B. Neri, C. Caprile, F. Fantini, Electromigration in 
thin-film interconnection lines: models, methods and results, Materials 
Science Reports, New York: Elsevier, vol. 7 (1991), 143–219. 
http://dx.doi.org/10.1016/0920-2307(91)90005-8 
[18] D. Young, A. Christou, Failure mechanism models for 
electromigration, IEEE Trans. on Reliability, vol. 43(2) (June 1994), 
186–192. DOI= http://dx.doi.org/10.1109/24.294986 
[19] A. Valero, N. Miralaei, S. Petit, J. Sahuquillo, and T. M. Jones. On 
Microarchitectural Mechanisms for Cache Wearout Reduction. IEEE 
Transactions on Very Large-Scale Integration (VLSI) Systems, Vol. 
25, No. 3, March 2017. 
[20] J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers, The Case for 
Lifetime Reliability-Aware Microprocessors, Proceedings of 31st 
International Symposium on Computer Architecture (ISCA '04) June 
2004. 
[21] T. E. Carlson, W. Heirman, and L. Eeckhout. Sniper: Exploring the 
level of abstraction for scalable and accurate parallel multi-core 
simulations. In Proceedings of the International Conference for High 
Performance Computing, Networking, Storage and Analysis (SC), 
Nov. 2011. 
[22]  Michael E. Thomadakis. The architecture of the Nehalem processor 
and Nehalem-EP smp platforms. Technical report, December 2010. 
http://sc.tamu.edu/systems/eos/nehalem.pdf.  
[23] A. Limaye and T. Adegbija, “A Workload Characterization of the SPEC 
CPU2017 Benchmark Suite,” in 2018 IEEE International Symposium on 
Performance Analysis of Systems and Software (ISPASS), pp. 149–158, 
April 2018. 
[24] Q. Wu, Steven Flolid, Shuang Song, Junyong Deng, Lizy K. John. Hot 
Regions in SPEC CPU2017. 2018 IEEE International Symposium on 
Workload Characterization (IISWC). 
