Memory BIST with Statistical Failure Analysis for Diagnosis of Resistive-Open Defects due to Electromigration and Stress-Induced Voiding in an SRAM by Kim, Woongrae
MEMORY BIST WITH STATISTICAL FAILURE ANALYSIS FOR 
DIAGNOSIS OF RESISTIVE-OPEN DEFECTS DUE TO 










A Thesis Dissertation 
Presented to 














In Partial Fulfillment 
of the Requirements for the Degree 
Master of Science in the 








Georgia Institute of Technology 
December 2015 
 






















Approved by:   
   
Dr. Linda Milor, Advisor 
School of Electrical and Computer 
Engineering 
Georgia Institute of Technology 
  
   
Dr. Azad J Naeemi 
School of Electrical and Computer 
Engineering 
Georgia Institute of Technology 
  
   
Dr. Albert B Frazier  
School of Electrical and Computer 
Engineering 
Georgia Institute of Technology 
  
   









I would like to thank Prof. Linda Milor for her professional guidance on my 
research and thesis. Since I joined her group in 2013, I have learned novel 
approaches and skills for solving difficult problems and pursuing academic achievements.  
I would also like to thank Prof. Naeemi and Prof. Frazier for serving as 
dissertation committee members and for their insightful comments, which improved my 
thesis.  
Finally, I would like to thank my colleagues, Dae-Hyun Kim, Chang-Chih 
Chen, Soonyong Cha, Tazhi Lu, and Kexin Yang, for their collaboration and a lot of help to 














TABLE OF CONTENTS 
ACKNOWLEDGEMENTS 
LIST OF TABLES 
LIST OF FIGURES 
LIST OF SYMBOLS AND ABBREVIATIONS 
SUMMARY 
CHAPTER 
1 INTRODUCTION 
2 BACKGROUND 
3 WEAROUT MODELING 
3.1 Modeling EM effect in an SRAM array 
3.2 Modeling SIV effect in an SRAM array 
3.3 Wearout Simulation Flow and Results 
4 BUILT-IN SELF TEST FOR DIAGNOSIS 
4.1 Built-In Self Test System and Architecture 
 4.1.1 Overview of BIST System 
 4.1.2 BIST Controller 
 4.1.3 Output Response Analyzer (ORA) 
 4.1.4 Built-In Self Test Area 
4.2 BIST Algorithm for Wearout in an SRAM Array 
 4.2.1 Overview of the Test Algorithm 
 4.2.2 Algorithm to Find Proper Reference Cells 
 4.2.3 TF1, TF2, DRF1, and DRF2 Tests 
 4.2.4 TF3 Algorithm 
 4.2.5 TF4 Algorithm 
 4.2.6 Detectable Range of Wearout Mechanisms 
4.3 Reconfigurable Platform to Generate BIST System and Test Bench 
5 STATISTICAL ANALYSIS FOR SEPARATE DISTRIBUTIONS 
5.1 Statistical Analysis for Wearout in an SRAM Array 
5.2 Statistical Analysis with Stress Acceleration 



















LIST OF TABLES 
Table 3.1: Fault Groups and Indices for Resistive Open Faults 
Table 4.1: Test Modes and BIST Patterns 
Table 4.2: Simulation Results for the TF1 and TF2 
Table 4.3: Simulation Results for the TF3 for O6 and O7 during Read ‘0’ 
Table 4.4: Simulation Results for the TF4 during Read Operations 
Table 4.5: Digital Logic Equations for Diagnosis with TF4 
Table 4.6: Detectable Range of Inserted Resistances 





















   






























Figure 2.1: Weibull distribution for wearout mechanisms for memory bit 
failure distribution.         
 
Figure 3.1: (a) Possible backend wearout sites in a physical layout of a 6T 
SRAM cell: via/contact voiding due to EM (O2, O5, O9) and SIV (O1–O11), 
and (b) via/contact voiding in the SRAM cell schematic. 
 
Figure 3.2  Flow of system-level modeling of chip wearout [27]. 
 
Figure 3.3  Four use scenarios provided by Intel 
 
Figure 3.4.  The characteristic lifetimes of vias/contacts due to EM for 32Kb 
SRAM cells for different use scenarios: the cumulative probability distribution 
of lifetime for vias/contacts due to EM. The characteristic lifetime is the time at 
which about 63% of samples fail.  Variation is due to variation in stress experienced 
by the 32K cells. 
 
Figure 3.5  The characteristic lifetimes of vias/contacts due to SIV for 32Kb 
SRAM cells for various use scenarios. 
 
Figure 4.1.  Overview of built-in self test (BIST) system. 
 
Figure 4.2  Test architecture in built-in self test area. 
 
Figure 4.3.  Sensing circuit for current variations in data lines [1]:   (a) current 
subtractor and amplifier block, (b) current digitizer, and (c) weighted  
reference current generator.  Wn can be selected to generate different reference 
currents. 
 
Figure 4.4.  BIST algorithm. 
 
Figure 4.5. Write ‘0’ with the TF1 algorithm: an SRAM cell with O4 and O5. 
 
Figure 4.6.  DRF1 test to distinguish O4 from O5 fault. 
 
Figure 4.7.  Write and read logic ‘0’ after a write ‘1’ operation: (a) an SRAM 















































Figure 4.8.  Simulation results with TF3: (a) bitline pair voltages and its 
digitized value from a cell with an O6 fault, and (b) bitline pair voltages and 
its digitized value from a cell with an O7 fault. 
 
Figure 4.9: Detectable ranges of an inserted resistance in an SRAM cell using 
current test scheme: TF1, TF2, DRF1, and DRF2 tests for O2, O3, O4, and O5 
faults. 
 
Figure 4.10: Reconfigurable platform to generate the customized BIST for the 
various sizes of caches based on the commercial tool from Mentor Graphics. 
 
Figure 4.11: BIST implementation flow based on the commercial tool from 
Mentor Graphics. 
Figure 5.1: Failure rate distribution using a simulator which determines the 
stress distribution of SRAM cells inside a microprocessor for EM and SIV 
with four use scenarios: (a) corporate, (b) gaming, (c) office, and (d) general 
scenario. 
Figure 5.2  (a) Error analysis for 𝑃𝑐ℎ𝑖𝑝 when simulation data from the wrong 
use scenario (gaming scenario) are used for failure analysis for the “true” 
corporate scenario for EM and SIV, (b) error analysis for λ − λ′ with 𝑃𝑚,𝑆𝐼𝑉 
and 𝑃𝑚,𝐸𝑀 for general use. 
Figure 5.3  Failure rate distribution using a simulator with general use scenario 
for EM and SIV.  Each group for each temperature contains three sub groups 











LIST OF SYMBOLS AND ABBREVIATIONS 
η                           Weibull characteristic lifetime  
BIST  Built-In Self Test 
EM  Electromigration 
SIV  Stress-Induced Voiding 
TPG  Test Pattern Generator 
CS   Chip Selection Signal 
WE  Write driver enable signals 
W_data  Data inputs 
SAE  Sense amplifier enable signals 
PRE  Precharge circuit enable signals 
V_pre  Precharge voltages 
T_row_addr Test row address 
T_col_addr Test column address 
ORA  Output Response Analyzer 
SC  Sensing Circuit 
I/Os  Inputs and Outputs 








We present a Built-In Self Test (BIST) scheme for failure diagnosis of via/contact 
voiding due to electromigration (EM) and stress-induced voiding (SIV) in an SRAM 
array. The BIST system aims to detect the backend wearout mechanisms in faulty cells 
and identify the cause of the failure in an SRAM array. This electrical test methodology 
enables more efficient physical failure analysis without significant test cost.  
Some faulty sites for different backend mechanisms result in exactly the same 
electrical failure signature. For these faulty sites, we can identify the cause of failure and 
potentially distinguish EM and SIV failures by matching the observed failure rate from 
BIST and the failure distribution function based on mathematical reliability models. 
Hence, the statistical method helps to determine separate wearout distributions for EM vs. 
SIV with electrical on-chip tests only. 
The estimation of wearout distributions from the statistical failure analysis and 
BIST methodology can be useful in identifying wearout model parameters for each 
mechanism, leading to determining the wearout limiting mechanisms in the field.  The 
extracted models can then be compared with process-level models to check whether the 
lifetime estimation is correctly done, and if not, appropriate corrections can be made to 









Reliability is becoming more important since process technology scaling leads to 
the reduction of dimensions of devices and interconnects without reducing the supply 
voltage in proportion. System designers try to address the reliability issues through the 
use of redundancy and error correcting codes (ECCs). However, systems can fail in the 
field even with the repair methodology if the memory does not have sufficient 
redundancy or if the wearout rate is faster than expected. For these cases, failing chips are 
usually returned to the manufacturer, and the chip is expected to be analyzed to diagnose 
the cause of failure.   
The standard method is physical failure analysis, which involves deprocessing the 
chip to locate the defect visually. The success rate of physical failure analysis is 
not always high, and its cost is significant. The proposed work can guide physical 
failure analysis and increase its efficiency by electrically locating the faulty cells 
and determining the cause of failure when failures are due to resistive opens.  
According to the International Technology Roadmap for Semiconductors, 
commercial high performance processors are expected to contain 82% embedded 
memory on average. Since the embedded memory, which is usually SRAM, is designed 
with the tightest design rules, embedded memory provides the appropriate vehicle to 
diagnose most wearout failures. Moreover, because SRAM failures can be recovered 
using error correcting codes, the SRAM can contain many failing cells whose causes of 
failure can be analyzed. The use of on-chip electrical tests with the 
statistical failure analysis methodology enables efficient diagnosis of the causes of failure 
for larger samples of failing cells, which in turn enhances confidence in the results of 
failure analysis.  
The wearout mechanisms are classified as frontend and backend wearout 
mechanisms. In prior work, the impact and detection of frontend mechanisms in an 
SRAM array has been studied [1],[2]. This work considers the detection of backend 
wearout problems.  Backend wearout mechanisms include backend time-dependent 
dielectric breakdown (BTDDB) and via/contact voiding due to current stress-dependent 
electromigration (EM) and temperature-stress-dependent stress-induced voiding (SIV) 
[3]. These backend wearout mechanisms cause resistive bridges and resistive opens in an 
SRAM array.  The diagnosis of resistive bridges has previously been studied [4],[5].  
Hence, this work focuses on the failure analysis of resistive opens caused by EM and SIV 
[6].  Although it is expected that most failures in an SRAM array are due to a smaller set 
of the frontend wearout mechanisms, especially BTI and GTDDB, we have included a 
much larger set of wearout mechanisms for completeness of the failure analysis.  
The proposed work is a BIST scheme for diagnosis, which enables the efficient 
location of multiple wearout failures in an SRAM array. Unlike the resistive-open models 
used in [7], the fault model includes resistive-open defects in vias/contacts, considering 
only EM and SIV effects that are feasible based on the physical layout of a cell. In 
addition, since we can build separate wearout distributions for each via/contact, 
we can perform hypothesis testing to distinguish EM and SIV failures, which both result 
in the same resistive-open contact/via faults, by matching the failure rate due to each 




Process technology scaling can lead to wearout of devices and interconnects, 
especially in nanoscale technologies. Backend time-dependent dielectric breakdown 
(BTDDB), stress-induced voiding (SIV), and electromigration (EM) are the major 
causes of backend wearout.  
The SIV mechanism has been studied in [8]-[10]. The thermal mechanical stress 
between metals and dielectric materials causes the directionally biased motion of atoms. 
The biased atomic motion can increase a via resistance and can create voids inside of a 
via. This is called stress induced voiding, which leads to timing and functional failures in 
many digital systems.  
Electromigration (EM) can result in exactly the same electrical failure signatures 
as SIV. The EM mechanism has been studied in [11]-[15]. EM is driven by the 
transfer of momentum from the electrical current to ions in the metallic lattice, 
which transports metallic ions into the neighboring material. This mechanism reduces 
via dimensions and increases via resistance.   
For our aging analysis, we model wearout mechanisms with the Weibull 
distribution as  
$$P(t) = 1 - \exp\left[-(t/\eta)^{\beta}\right] \qquad (1)$$
where 𝜂  is the characteristic lifetime, 𝛽  is the shape parameter which describes the 





























Using equation (1), we can derive the following equations to define the 
characteristic lifetime for wearout problems: 
$$-\ln\bigl(1 - P(t)\bigr) = (t/\eta)^{\beta} \qquad (2)$$
$$\ln\bigl(-\ln(1 - P(t))\bigr) = \beta \ln(t) - \beta \ln(\eta). \qquad (3)$$
When t is 𝜂, 𝑃(𝜂)=1-exp (−1). Hence, the characteristic lifetime, 𝜂, can be defined as the 


















Fig. 2.1 Weibull distribution for wearout mechanisms for memory bit failure distribution. 
 
The proposed work develops a BIST methodology and a statistical failure 
analysis method that can detect and distinguish failing bits due to EM and SIV. For 
aging analysis of the SRAM array, equation (3) can be used to plot the failure data of 
memory fail bits due to the aging process and to extract the characteristic lifetime and 
the shape parameter. Fig. 2.1 presents the plot of $\ln(t)$ vs. $\ln(-\ln(1 - P(t)))$ for 
failures of each memory fail bit based on equation (3). If there are N memory cells in an 
SRAM array, then the first failure is associated with probability $1/2N$, the second failure 
is associated with probability $3/2N$, etc.  If we record the time to failure, $t_1$ for the first 
failure, $t_2$ for the second failure, etc., then with several failures, we can solve for the 
Weibull parameters for the time-to-failure of the memory cells.  In particular, when we plot 
the ordered pairs $\bigl(\ln(t_1), \ln(-\ln(1 - \tfrac{1}{2N}))\bigr)$, 
$\bigl(\ln(t_2), \ln(-\ln(1 - \tfrac{3}{2N}))\bigr)$, etc., the x-intercept is $\ln(\eta)$ 
and the slope is $\beta$, as illustrated in Fig. 2.1.  Hence, we can estimate the 
Weibull parameters of the time-to-failure of all memory cells by determining the 
time-to-failure of several sample cells in an SRAM array.   
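
The Weibull-plot procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the function name `weibull_fit` and the use of an ordinary least-squares line through the plotted points are assumptions made for the example. The plotting positions 1/2N, 3/2N, ... and the relation between the fitted line and (η, β) follow equation (3).

```python
import math

def weibull_fit(failure_times, n_cells):
    """Estimate (eta, beta) from the first k failure times of n_cells cells.

    The i-th ordered failure is assigned probability (2i - 1) / (2N), and a
    straight line is fitted to ln(t) vs. ln(-ln(1 - P)) as in equation (3).
    """
    times = sorted(failure_times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        p = (2 * i - 1) / (2 * n_cells)           # 1/2N, 3/2N, 5/2N, ...
        xs.append(math.log(t))
        ys.append(math.log(-math.log1p(-p)))      # ln(-ln(1 - p)), stable for small p
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                              # slope of the Weibull plot
    intercept = mean_y - beta * mean_x            # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)             # x-intercept is ln(eta)
    return eta, beta
```

Only the earliest failures are needed, which is why a few failing cells in a large array are enough to estimate the parameters for all cells.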
Prior studies have primarily focused on the test for faults in order to drive the 
repair algorithm, including automated test and reconfiguration of SRAMs [17]-[23].  If 
the array has insufficient redundancy and cannot be reconfigured, only diagnosis is 
required.  In prior work on diagnosis, methodologies to identify the cause of failure have 
been proposed to avoid costly physical failure analysis [24]-[26]. However, none of these 






























































            CHAPTER 3 
WEAROUT MODELING 
Several alternative SRAM layouts exist, each designed for different 
applications [27].   We choose a physical layout which has many possible 











Fig. 3.1(a) Possible backend wearout sites in a physical layout of a 6T SRAM cell: 
via/contact voiding due to EM (O2, O5, O9) and SIV (O1–O11), and (b) via/contact 
voiding in the SRAM cell schematic. 
 
3.1 Modeling EM in an SRAM Cell 
Current transfers momentum to ions in the metallic lattice, which causes some of 
the metallic ions to move into the adjacent material. This process, 
electromigration (EM), reduces via/contact dimensions and thereby 
increases resistance [28]. The characteristic lifetime, $\eta_{EM}$, of a via/contact due to EM 
can be modeled as [28] 
$$\eta_{EM} = A_{EM}\, T / j_{EM} \qquad (4)$$
where T is temperature, $j_{EM}$ is the current density, and $A_{EM}$ is a technology-dependent 
fitting constant [28].  
With advanced process technology, vias/contacts connected to short wires do not 
suffer from voids, since the gradual movement of conductor atoms creates 
a back-stress that reduces the effective material flow caused by EM [11],[12]. Via voiding 
occurs only when the product of current density and wire length exceeds a critical 
value; the wire length at which this happens for a given current density is called the 
Blech length. In an SRAM cell, vias/contacts connected to bitline 
pairs and the VDD line are at risk of EM [12]. Other vias/contacts do not 
satisfy the critical requirements for the Blech length or for the high unidirectional 
current density needed to form via voids.  Hence, we assume that only O2, O5, and O9 in 
Fig. 3.1 have a risk of void formation due to EM. Although EM is more likely in a larger 
SRAM array (L2 and L3 caches), where the longer wires connected to the VDD and bitline 
vias/contacts are more likely to exceed the Blech length, we include EM models in our 
test structure to make the diagnosis scheme more general for various types of memory 
applications.  
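
As a small worked illustration of the two ideas in this section, the sketch below checks the Blech criterion and compares EM lifetimes under equation (4). The function names and the critical product are hypothetical; the thesis does not give a numerical value for the critical current-density-length product, so any value passed in is an assumption of the caller.

```python
def em_at_risk(current_density, wire_length, critical_product):
    """Blech criterion: voiding is possible only if j * L exceeds a critical value.

    critical_product is technology dependent and is NOT given in the text;
    it must be supplied by the user in units consistent with j and L.
    """
    return current_density * wire_length > critical_product


def relative_em_lifetime(temp_a, j_a, temp_b, j_b):
    """Ratio eta_EM(a) / eta_EM(b) from equation (4); the constant A_EM cancels."""
    return (temp_a / j_a) / (temp_b / j_b)
```

For example, under equation (4) a via carrying twice the current density of another at the same temperature has half the characteristic lifetime, so `relative_em_lifetime(350.0, 2.0, 350.0, 1.0)` evaluates to 0.5.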
3.2 Modeling SIV in an SRAM Cell 
Thermal mechanical stress between the metal and the dielectric can induce 
directionally biased motion of atoms at high temperatures. This motion can cause 
stress-induced voiding (SIV), leading to an increase in via/contact resistance 
[8],[9]. The resistance of a via/contact depends on the operating temperature of the chip. 
The characteristic lifetime, $\eta_{SIV}$, due to the SIV mechanism is modeled as 
$$\eta_{SIV} = A\, W^{-M} (T_0 - T)^{-N} \exp\left(\frac{E_a}{kT}\right) \qquad (5)$$
which is a function of the linewidth, W, the geometry stress component, M, the 
stress-free temperature, $T_0$, the thermal stress component, N, the activation energy, 
$E_a$, and the fitting constant, A [28]; k is Boltzmann's constant. Unlike the 
resistive-open fault model presented in prior work [7], our fault model has eleven 
possible worn-out via/contact locations (O1-O11) due to SIV, as presented in Fig. 3.1. 
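
To make the dependence on linewidth and temperature concrete, the following sketch evaluates an SIV lifetime model. It assumes the commonly used form η_SIV = A · W^(−M) · (T0 − T)^(−N) · exp(Ea / kT), built from the parameters named above; the function name and all numerical values a caller might pass are illustrative, not parameters from the thesis.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def eta_siv(width, temp_k, a, m, n, t0_k, ea_ev):
    """Characteristic SIV lifetime under the assumed model form.

    width  : linewidth W (any consistent unit)
    temp_k : operating temperature T in kelvin, must be below t0_k
    a      : fitting constant A
    m, n   : geometry and thermal stress components M and N
    t0_k   : stress-free temperature T0 in kelvin
    ea_ev  : activation energy Ea in eV
    """
    if temp_k >= t0_k:
        raise ValueError("model is defined only for T below the stress-free temperature")
    return (a * width ** (-m) * (t0_k - temp_k) ** (-n)
            * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k)))
```

With M = 2, for instance, doubling the linewidth above a via reduces its characteristic lifetime by a factor of four, which is why vias under different metal widths in the same cell have different SIV lifetimes.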


















































A memory generator [31] is used to generate a 32Kb memory, which we use to analyze 
whether it is possible to diagnose all wearout mechanisms. The simulation framework 
presented in Fig. 3.2 [32]-[35] acquires the detailed electrical stress and temperature of 
each SRAM cell within a memory embedded in a microprocessor [36] running various 
standard benchmarks [37].  The details of the required data and the flow diagram for the 
microprocessor reliability simulator are presented in Fig. 3.2. 
The microprocessor reliability simulator presented in Fig. 3.2 extracts the 
microprocessor activity profile using an FPGA emulator. The activity profiles are used to 
estimate the power profile, which, when combined with the layout, determines the 
temperature profile.  The vulnerable areas and features for the backend mechanisms are 
extracted from the physical layout and linked to the electrical and temperature profiles, 
from which feature lifetime is computed.   The feature-by-feature lifetime data are finally 
combined to compute a composite lifetime distribution for the full system of the memory 
and/or processor. 
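
The last step of the flow, combining feature-by-feature lifetimes into a composite distribution, can be illustrated as follows. The text does not state the combination rule, so this sketch assumes a weakest-link (series) system with independent features, each following the Weibull distribution of equation (1); the function names are hypothetical.

```python
import math


def weibull_cdf(t, eta, beta):
    """Failure probability at time t from equation (1)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def composite_failure_probability(t, features):
    """Failure probability of a system that fails when any feature fails.

    features is an iterable of (eta, beta) pairs, one per via/contact.
    Independence between features is an assumption of this sketch.
    """
    survival = 1.0
    for eta, beta in features:
        survival *= 1.0 - weibull_cdf(t, eta, beta)
    return 1.0 - survival
```

Under this assumption the composite distribution is dominated by the features with the shortest characteristic lifetimes, which is consistent with the goal of identifying lifetime-limiting mechanisms.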
We also consider various use scenarios, including corporate, gaming, office work, 
and general usage [38], since the characteristic lifetimes depend on workload. These 
realistic use conditions consist of fractions of time in operation, standby, and off states 
and the time in each case is presented in Fig. 3.3.   
Note that the stress for each via/contact depends on the average current density, 
temperature, and the geometry components (see equations (4) and (5)). When the current 
density is significantly different for each via/contact, the lifetimes of each via/contact 

























































































characteristic lifetime distribution due to EM for the 32Kb cells for various use scenarios.  












Fig. 3.4.  The characteristic lifetimes of vias/contacts due to EM for 32Kb SRAM cells 
for different use scenarios: the cumulative probability distribution of lifetime for 
vias/contacts due to EM. The characteristic lifetime is the time at which about 63% of 
samples fail.  Variation is due to variation in stress experienced by the 32K cells. 
 
Also, the characteristic lifetimes of each via/contact due to SIV are a function of the 
linewidth of the metal above the via/contact and the geometry stress component (see 
equation (5)). The lifetimes due to SIV are also not the same since the linewidths are not the same 







































Fig. 3.5.  The characteristic lifetimes of vias/contacts due to SIV for 32Kb SRAM cells 
for various use scenarios. 
 
Resistive-open defects at locations O2, O5, and O9 produce the same electrical 
failure signature whether they are caused by EM or SIV. Hence, statistical failure analysis 
is also needed to diagnose the probability distributions of the causes of failure using the 
relative failure rates at each site for each mechanism in Fig. 3.4 and Fig. 3.5. We 
summarize the three open groups for open defects in locations O2, O5, and O9 due to EM and 
SIV in Table 3.1. 
TABLE 3.1  FAULT GROUPS AND INDICES FOR RESISTIVE OPEN FAULTS 
Group         EM       SIV 
OG 1 (m=1)    O2_EM    O2_SIV 
OG 2 (m=2)    O5_EM    O5_SIV 
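
To illustrate how relative failure rates can separate two mechanisms that share one electrical signature, the sketch below uses a standard competing-risks result: if EM and SIV act independently on the same site, the probability that a failure occurring at time t was caused by EM is the EM hazard rate divided by the sum of the two hazard rates. This is an illustrative calculation under that independence assumption, not the statistical procedure developed later in the thesis; the function names are hypothetical.

```python
def weibull_hazard(t, eta, beta):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def em_fraction(t, eta_em, beta_em, eta_siv, beta_siv):
    """Probability that a failure observed at time t at one site is due to EM,
    assuming EM and SIV are independent competing mechanisms."""
    h_em = weibull_hazard(t, eta_em, beta_em)
    h_siv = weibull_hazard(t, eta_siv, beta_siv)
    return h_em / (h_em + h_siv)
```

For an open group such as OG 1, a fraction near 0.5 means that the electrical test alone cannot attribute the failure, and the observed failure rate across many cells has to be matched against the two model distributions.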





BUILT-IN SELF TEST FOR DIAGNOSIS 
4.1 Built-In Self Test System and Architecture 















Fig. 4.1.  Overview of built-in self test (BIST) system. 
 
 
Fig. 4.1 shows a floorplan of the BIST system for diagnosis of backend wearout 
mechanisms for EM and SIV. The BIST component consists of a BIST controller, a 
sensing circuit (SC), and the output response analyzer (ORA). First, the BIST controller 
generates test patterns which include test addresses for decoders, read and write 






























detect leakage current from faulty cells due to backend wearout mechanisms. Third, the 
output response analyzer diagnoses the cause of wearout and stores the result. 
4.1.2 BIST Controller  
The BIST controller consists of a test pattern generator (TPG) and acts as a 
feedback system from the output response analyzer (ORA) to the memory array. The 
TPG generates test patterns to implement the special test algorithms for wearout 
mechanisms. The test pattern contains a chip selection signal (CS), write driver enable 
signals (WE), data inputs (W_data), sense amplifier enable signals (SAE), precharge 
circuit enable signals (PRE), precharge voltage control signals (V_pre), and addresses 
(T_row_addr, T_col_addr).  
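
The signals that make up one test pattern can be collected in a simple record, sketched below. The field names mirror the signal names in the text; the field types, the signal polarities, and the helper `write_pattern` are assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BistPattern:
    """One step of a test pattern produced by the TPG (illustrative model)."""
    cs: bool          # chip selection signal (CS)
    we: bool          # write driver enable (WE)
    w_data: int       # data input (W_data), 0 or 1
    sae: bool         # sense amplifier enable (SAE)
    pre: bool         # precharge circuit enable (PRE)
    v_pre: float      # precharge voltage selected by V_pre, in volts
    t_row_addr: int   # test row address
    t_col_addr: int   # test column address


def write_pattern(row, col, bit, v_pre):
    """Build a write step; active-high polarity is assumed for every enable."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return BistPattern(cs=True, we=True, w_data=bit, sae=False, pre=False,
                       v_pre=v_pre, t_row_addr=row, t_col_addr=col)
```

Keeping the record immutable reflects the fact that each pattern step is generated once by the TPG and then applied unchanged to both banks.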
In the test mode, the BIST controller disconnects the built-in self test area from 
the active I/O signals and connects it to the test patterns from the TPG. After the 
tests are finished, the repair block starts the repair procedure. For the repair procedure, 
the SRAM contains redundant rows in each bank, and their addresses are stored in 
the registers of the repair block.    
4.1.3 Output Response Analyzer (ORA)  
The output response analyzer (ORA) performs diagnosis using test results when 
any fault detect signal comes from the sensing circuit (SC) in the test area. When the SC 
in the built-in self test area detects current variation or a functional error, the circuit sends 
test addresses and a fault trigger signal to the ORA. 
4.1.4 Built-In Self Test Area 
Fig. 4.2 presents the test area in half of the SRAM system which contains four 































































































column decoder acts as the bridge between the 128 bitline pairs and the eight global data 
line pairs, which consist of eight data lines from the 128 bitlines and another eight data 
lines from the 128 bitline-bars through the column decoder. However, to implement the 
special BIST pattern, we must activate individual cells for each test step. Hence, we extend 
the test column addresses (T_col_addr) from four bits to seven bits so that they use the additional 



































































Fig. 4.3 Sensing circuit for current variations in data lines [1]:   (a) current subtractor and 
amplifier block, (b) current digitizer, and (c) weighted  reference current generator.  Wn 
can be selected to generate different reference currents. 
 
The current test circuit (Tckt) in Fig. 4.3 [1] detects current variations in the data 
lines in Fig. 4.2.  The current at input B is subtracted from the current at input A, which 
results in current I1 (see Fig. 4.3(a)). I1 is then fed into the current amplifier and the 
amplified current, I2, is mirrored onto the current digitizer in Fig. 4.3(b). When the 
current digitizer detects that I2 is less than a reference current from the weighted 
reference current generator shown in Fig. 4.3(c), the output logic is activated to logic ‘1’ 
which triggers the ORA to start diagnosis procedures. The reference current is easily set 
by selecting Wn in Fig. 4.3(c). To provide the appropriate reference current for each test 
pattern, we have designed additional logic to control Cn in the current digitizer. 
The current test circuit was used to test wearout mechanisms in [1],[2],[4],[5],[6]. 
We extend the use of the current test to locate and diagnose faulty cells suffering from 
EM/SIV. The current test methodology improves the testability of the inserted 
resistance of open defects. However, leakage currents also vary with process variations, 
which can degrade detection. Hence, we have 
designed the test schemes and patterns to make the BIST tolerant to process variations. 
The minimum current trigger level of the SC in Fig. 4.3 can be set to about 6.6 uA, and 
the circuit can detect EM/SIV mechanisms in the presence of 10% process variation 
corners. 
Faulty cells in both the upper and lower banks in Fig. 4.2 are monitored through a 
pairwise comparison of cells, one in each bank. Each SC unit contains two current test 
circuits (Tckt) for bitline testing and two others (\Tckt) for bitline-bar testing to 
analyze current variations in the data lines. Each SC unit can monitor a data line pair 
from the upper bank and another data line pair from the lower bank. A data line from the 
16 bitlines in the upper bank is connected to both input A of Tckt 1 and input B of Tckt 2 
in the SC unit. One of the data lines from the 16 bitlines in the lower bank is connected 
to input B of Tckt 1 and input A of Tckt 2 (see Fig. 4.2 and 4.3(a)).  Similarly, data 
lines from 16 bitline-bars in the upper bank and 16 bitline-bars in the lower bank are 
also connected to the inputs of the \Tckts in each SC unit. Digital test logic (DFT) in 
the middle of the bank performs functional checks on read and write data in the data lines. 
The current analysis results depend on the capacitance and resistance of a bitline 
pair and the data line. Hence, a significant mismatch between the path length from a cell 
under test to the test circuit and the path length from a reference cell (a good cell) to 
the same test circuit can lead to a false diagnosis result, even if neither cell is faulty.  
To minimize the diagnosis errors due to mismatch in path length, we set the maximum 
allowed length mismatch between the data paths from the cells in the upper and lower 
banks to be 110um. We divide each bank into 64 sub-blocks to keep the length mismatch 
under the limit.  When the cell under test is in the upper bank, we pick a good cell in the 
same sub-block of the lower bank as the reference cell.   
Undetectable faults might exist if the leakage currents from faulty cells from the 
upper bank and lower bank are exactly the same. However, since the leakage currents 
depend on the degree of wearout for via/contact sites, currents from two cells are 
generally different because wearout is progressive. Thus, undetectable faults from 
matched leakage currents are highly unlikely and have little impact on the test coverage 
of the BIST system. 
Our BIST system is a highly reconfigurable system for various array sizes for 
level one, level two, and level three caches. We simply add several registers for address 
counters in the BIST controller and additional registers to store the larger number of 
addresses of failed cells if we want to increase test address ranges for a larger memory 
array. Also, when we increase I/O sizes for the larger memories, we need more test 
circuits, such as those shown in Fig. 4.2.  
When we reconfigure the BIST for a larger memory array, we have to divide the 
SRAM array into more sub-blocks to keep the length mismatch within the maximum allowed 
110um and avoid timing mismatch at the inputs of the current test circuit in Fig. 4.2.  For 
example, for a 32Mb SRAM array with 12-bit row addresses, 8-bit column addresses, 
and 32 I/Os, we have to divide the array into 131,072 sub-blocks, each with its own 
reference cell.  
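
The arithmetic behind this example can be checked with a short calculation. The sub-block size of 256 cells is not stated in the text; it is inferred here by dividing the 32Mb array by the 131,072 sub-blocks quoted above, and the function names are illustrative.

```python
CELLS_PER_SUB_BLOCK = 256  # assumption: inferred from 32 Mb / 131,072 sub-blocks


def array_bits(row_addr_bits, col_addr_bits, ios):
    """Total number of cells addressed by the given row/column widths and I/O count."""
    return (1 << row_addr_bits) * (1 << col_addr_bits) * ios


def sub_block_count(row_addr_bits, col_addr_bits, ios,
                    cells_per_sub_block=CELLS_PER_SUB_BLOCK):
    """Number of sub-blocks (and reference cells) needed for the array."""
    bits = array_bits(row_addr_bits, col_addr_bits, ios)
    if bits % cells_per_sub_block != 0:
        raise ValueError("array size is not a multiple of the sub-block size")
    return bits // cells_per_sub_block
```

With 12 row address bits, 8 column address bits, and 32 I/Os, `array_bits` gives 33,554,432 cells (32Mb) and `sub_block_count` gives 131,072, matching the figure in the text.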
Since the BIST system operates on each sub-block unit in paired arrays, the test 
pattern and cycle are repeated for each sub-block. Hence, having more sub-blocks does 
not impact test coverage and there is no significant area overhead for the BIST controller 
and the ORA logic. 
If the memory is designed with more scaled technologies, the off-state leakage 
current can be a more significant problem [39]. This may lead current test schemes to be 
less effective.  However, our BIST scheme utilizes a current comparison between two 
cells in the paired sub-banks. Since the off state leakage current depends on the process 
technology, the initial level of the leakage from the paired cells is still likely to be similar, 
cancelling out any enhanced leakage effect from the technology. If the reference cell 
selection controls for the initial leakage currents, then it is likely that the proposed BIST 
schemes will be effective for scaled technologies. Nevertheless, as we move to more 
scaled technologies, more reference cells and/or trigger limits may be needed for the 
current tests to better account for variation in initial leakage currents. 
4.2 BIST Algorithm for Wearout in an SRAM Array 
4.2.1 Overview of the Test Algorithm  
Table 4.1 and Fig. 4.4 show the test algorithm implemented by our BIST system 
[6]. First, the BIST conducts scanning tests to find a set of proper reference cells in each 
sub-block. With a set of reference cells, one of the inputs of the SC in Fig. 4.3 is 
connected to a data line for the reference cell and another is connected to a data line for a 
cell under test in its paired bank. The BIST system then starts the test algorithms for 
diagnosis.  For our simulations, we model each open defect as an inserted resistance of 
10MΩ. 

[Fig. 4.4 flowchart. Step 1: find reference cells by scanning all cells in the 64 
sub-blocks. Step 2: TF1 to TF4. If a fault is detected by the TF1 or TF2 test (O2, O3, O4, 
O5), the DRF1 or DRF2 test is conducted to distinguish O2, O3, O4, and O5. The TF3 test is 
applied next, followed by the TF4 test, which detects and distinguishes O1, O8, O9, O10, 
and O11. If no fault is detected, the counter for the test address increments its value.] 










Test mode   Test node    Pattern                  Target faults   Test 
Current     Data lines   (w1,w0,r0)               O4, O5          TF 1 
Current     Data lines   (w1,w0,pre[1.2V],r0)     O4 vs. O5       DRF 1 
Current     Data lines   (w0,w1,r1)               O2, O3          TF 2 
Current     Data lines   (w0,w1,pre[1.2V],r1)     O2 vs. O3       DRF 2 
Digital     Data lines   (w1,w0,r0)               O6, O7          TF 3 
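
The pattern strings in the rows above can be expanded into explicit operation sequences, as in the following sketch. Reading `w`/`r` as write/read of the given logic value and `pre[1.2V]` as a precharge to 1.2 V is an interpretation of the notation; the dictionary and function names are hypothetical, and only the rows visible above are included.

```python
import re

# Patterns copied from the table rows above.
BIST_PATTERNS = {
    "TF1": "(w1,w0,r0)",
    "DRF1": "(w1,w0,pre[1.2V],r0)",
    "TF2": "(w0,w1,r1)",
    "DRF2": "(w0,w1,pre[1.2V],r1)",
    "TF3": "(w1,w0,r0)",
}


def parse_pattern(pattern):
    """Expand a pattern string into a list of (operation, value) tuples."""
    ops = []
    for token in pattern.strip("()").split(","):
        token = token.strip()
        precharge = re.fullmatch(r"pre\[([0-9.]+)V\]", token)
        if precharge:
            ops.append(("precharge", float(precharge.group(1))))
        elif re.fullmatch(r"[wr][01]", token):
            op = "write" if token[0] == "w" else "read"
            ops.append((op, int(token[1])))
        else:
            raise ValueError(f"unrecognized operation: {token!r}")
    return ops
```

For example, `parse_pattern(BIST_PATTERNS["DRF1"])` yields a write of '1', a write of '0', a precharge step at 1.2 V, and a read of '0'. TF1 and TF3 share the same operations and differ in whether the data lines are observed with the current test or the digital test.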



















Fig. 4.4  BIST algorithm. 
4.2.2 Algorithm to Find Proper Reference Cells 
The test algorithm identifies faulty cells and reference cells by comparing the 
leakage currents from two test cells.  If one is much larger than the other, then a faulty 
cell has been identified.  If not, both cells can be considered as reference cells. 
Before starting the BIST algorithms, the BIST system has to find proper reference 
cells to provide an appropriate reference current to the sensing circuit (SC) shown in Fig. 
4.3 in each of the 64 sub-blocks of each bank.  64 reference cells are determined for each 
bank to minimize the chance of a fault detection error due to the mismatch in path length.  
Any cell in a sub-block of a bank can be set as a reference cell for any cell in the same 
sub-block of the paired bank.  We use W1 in Fig. 4.3(c) to set the current trigger level to 
6.5uA to distinguish a proper cell from faulty cells. 
To find the proper reference cells, the up/down counter for row and column 
addresses implements a scanning algorithm in each sub-bank. The BIST controller 
generates the same row/column addresses and the test pattern ⇑(w1, r1, w0, r0) for the 
upper and lower banks. The ORA uses a fault trigger signal from current testing of the 
data lines as well as digital testing of data lines for each cell. 
When there is no fault-trigger signal, we can identify the proper reference cells, 
and we store their addresses in the ORA. If there is a fault trigger signal, then the search 
for a nearby reference in the same sub-block continues until one is found by changing 
addresses. Once a pair of reference cells for each sub-block is found, the BIST controller 


























After the screening test to find the sets of reference cells, the BIST can locate 
faulty cells in the SRAM array through a pairwise comparison with the proper reference 
cell. 
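The reference-cell search of Section 4.2.2 can be sketched in software as follows. This is only an illustrative model of the scan, not the BIST hardware: the function names, the per-address current lists, and the rule of raising a trigger when the two data-line currents differ by more than the 6.5uA level are assumptions made for the example.

```python
TRIGGER_UA = 6.5  # current trigger level set with W1 (value taken from the text)


def find_reference_pair(upper_ua, lower_ua, digital_fault=None, trigger_ua=TRIGGER_UA):
    """Scan one sub-block and return the first address with no fault trigger.

    upper_ua / lower_ua: hypothetical data-line currents (uA) for the same
    addresses in the upper and lower bank, measured with (w1, r1, w0, r0).
    digital_fault: optional per-address booleans from the digital data-line test.
    """
    for addr, (i_up, i_lo) in enumerate(zip(upper_ua, lower_ua)):
        # Assumed trigger rule: one current is much larger than the other.
        current_trigger = abs(i_up - i_lo) > trigger_ua
        digital_trigger = bool(digital_fault[addr]) if digital_fault else False
        if not (current_trigger or digital_trigger):
            return addr  # both cells at this address can serve as reference cells
    return None  # no usable pair was found in this sub-block


def find_all_references(sub_blocks):
    """sub_blocks: list of (upper_ua, lower_ua) tuples, one per sub-block."""
    return [find_reference_pair(up, lo) for up, lo in sub_blocks]
```

In this sketch the search simply moves to the next address when a trigger is raised, which mirrors the text's statement that the search continues within the same sub-block until a reference is found.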













Fig. 4.5.  Write ‘0’ with the TF1 algorithm: an SRAM cell with O4 and O5. 
 
The TF1 (transition fault) test in Table 4.1 is used to detect O4 and O5, and the TF2 test is used to detect O2 and O3. The BIST controller then conducts the DRF1 (data retention fault) test to distinguish O5 from O4, and the DRF2 test to distinguish O2 from O3 (see Table 4.1). The bitline sense amplifiers are turned off during the read operations of the test algorithms.
Fig. 4.5 shows the TF1 test with the O4 fault. Even with an O4 or O5 fault due to EM or SIV, the write ‘1’ operation completes properly. When the write driver drives bitline-bar to 0V, the M6 transistor drives n2 to ground. Although M4 can hold the charge on the n2 node, n2 is discharged to below 0.6V through M6, and the n1 value also flips to logic ‘1’. Since M3 has been turned on and M5 cannot pull n1 down to logic ‘0’ during the write ‘0’ because of the large load of O4, n1 becomes stuck at logic ‘1’. M6 then cannot pull the n2 node up to 0.6V because of the path between n2 and ground through M3. During the read ‘0’ operation that follows the write ‘1’ and write ‘0’ operations, the voltage on bitline-bar is discharged from 1.09V to 0.01V because it is connected to n2, which holds 0V.
TABLE 4.2 SIMULATION RESULTS FOR THE TF1 AND TF2
Fault    Read ‘0’ (TF1):                     Read ‘1’ (TF2):
         max. current variation from /BL     max. current variation from BL
Proper   0 uA                                0 uA
O1       0.3 uA                              0.3 uA
O2       4.5 uA                              29.3 uA
O3       4.5 uA                              29.3 uA
O4       29.3 uA                             4.5 uA
O5       29.3 uA                             4.5 uA
O6       1.0 uA                              0.1 uA
O7       0.1 uA                              1.0 uA
O8       4.5 uA                              < 0.1 uA
O9       4.6 uA                              4.6 uA
O10      < 0.1 uA                            4.5 uA
O11      4.5 uA                              4.5 uA
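As a quick check of how the current trigger separates the faults in Table 4.2, the sketch below applies the 6.6 uA trigger level mentioned in the text to the tabulated values. The dictionary layout and the function name `flagged_by` are illustrative assumptions, and entries reported as below 0.1 uA are represented by 0.1 as an upper bound.

```python
TRIGGER_UA = 6.6  # trigger level selected with W2 (value taken from the text)

# Maximum current variation in uA from Table 4.2: (read '0' / TF1, read '1' / TF2).
# Entries listed as "< 0.1 uA" are stored as 0.1, i.e. as an upper bound.
TABLE_4_2 = {
    "Proper": (0.0, 0.0), "O1": (0.3, 0.3),
    "O2": (4.5, 29.3), "O3": (4.5, 29.3),
    "O4": (29.3, 4.5), "O5": (29.3, 4.5),
    "O6": (1.0, 0.1), "O7": (0.1, 1.0),
    "O8": (4.5, 0.1), "O9": (4.6, 4.6),
    "O10": (0.1, 4.5), "O11": (4.5, 4.5),
}


def flagged_by(test_index, trigger_ua=TRIGGER_UA):
    """Return the faults whose current variation exceeds the trigger level.

    test_index 0 selects the TF1 column, test_index 1 the TF2 column.
    """
    return sorted(f for f, v in TABLE_4_2.items() if v[test_index] > trigger_ua)


tf1_hits = flagged_by(0)  # faults that trip the trigger during read '0'
tf2_hits = flagged_by(1)  # faults that trip the trigger during read '1'
```

With these values only O4 and O5 exceed the trigger in TF1 and only O2 and O3 in TF2; the largest non-target variation in the table (4.6 uA for O9) stays below the trigger level.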
 
If O5 is located between a bitline and M5, the problem with the TF1 pattern is the same (see Fig. 4.5). Since the write driver cannot pull down the n1 node through the large inserted load during the write ‘0’ operation, n1 is stuck at logic ‘1’, and this prevents M6 from pulling up the n2 node. Thus, the voltage on bitline-bar changes from 1.09V to 0.01V, and this induces current variations in the data lines connected to the bitline-bars. Table 4.2 shows that the current variation with O4 or O5 is significant during the read ‘0’ operation. We select W2 in Fig. 4.3(c) to set the reference current to 6.6 uA to











[Fig. 4.6 plot. Panel title: DRF pattern (W1, W0, precharge 1.2V, R0). Horizontal axis: time, 0 to 20 µs. Trace label: bitline with O4 (resistance: 113 KΩ and above).]









 Additional test steps are needed to distinguish O4 from O5. The DRF1 algorithm 
presented in Table 4.1 analyzes data retention properties during a long read operation.  
The (w1, w0) pattern leads the n1 node in Fig. 4.5 to be stuck at logic ‘1’. After the write 
‘0’ is completed, the BIST controller sends PRE1 (precharge circuit enable signal) and 
Vpre1 (1.2V) to the bank. At this moment, the bitline is pulled up to logic ‘1’ and the 












Fig. 4.6.  DRF1 test to distinguish O4 from O5 fault. 
 
During a very long read ‘0’ operation (20us), as presented in Fig. 4.6, the n3 node 
in a cell with an O4 fault is charged to 986mV, and M5 can prevent the bitline from being 
discharged (see Fig. 4.5). On the other hand, the M5 transistor with an O5 cannot hold the 
bitline charge due to the large inserted load (see Fig. 4.5). Thus, the bitline which is 
connected to the cell with the O5 fault can be easily discharged. Fig. 4.6 shows that the 

























with the very long 20us test time.  
The test algorithms for the O2 and O3 faults are very similar, since the fault locations are symmetric to those of O4 and O5 (see Fig. 3.1(b)). The TF2 algorithm is used to detect O2 and O3 through an analysis of the leakage currents on a bitline (see Tables 4.1 and 4.2). The DRF2 algorithm distinguishes O2 from O3 by sensing the voltage degradation of bitline-bar, for the same reason as with O5.
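The sequence of current tests described above (TF1 or TF2 for detection, followed by DRF1 or DRF2 for separation) can be summarized as a small decision procedure. The sketch below is a hedged illustration: the function name and boolean inputs are hypothetical, and the mapping of a degraded bitline-bar to O2 in DRF2 is inferred from the stated symmetry with O5 rather than spelled out in the text.

```python
def diagnose_current_tests(tf1_trigger, tf2_trigger,
                           drf1_bitline_discharged=None,
                           drf2_bitline_bar_discharged=None):
    """Combine the current-test outcomes into a fault label.

    tf1_trigger / tf2_trigger: True when the sensing circuit raises a fault
    trigger during the TF1 or TF2 read.
    drf1_bitline_discharged: True when the bitline loses its precharge during
    the long read of DRF1 (behaviour described for O5); None if DRF1 was not run.
    drf2_bitline_bar_discharged: same observation on bitline-bar for DRF2
    (assumed to indicate O2 by symmetry with O5).
    """
    if tf1_trigger:
        if drf1_bitline_discharged is None:
            return "O4 or O5"
        return "O5" if drf1_bitline_discharged else "O4"
    if tf2_trigger:
        if drf2_bitline_bar_discharged is None:
            return "O2 or O3"
        return "O2" if drf2_bitline_bar_discharged else "O3"
    return None  # no current-test fault; TF3 and TF4 would follow
```

Returning an ambiguous label when the DRF result is missing keeps the two-stage structure of the algorithm visible: detection alone narrows the fault to a pair, and only the retention test separates the pair.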











Fig. 4.7.  Write and read logic ‘0’ after a write ‘1’ operation: (a) an SRAM cell with O6, 
and (b) an SRAM cell with O7. 
 
 
For the TF3 test algorithm, the BIST controller writes logic ‘1’ with a very long 
period of 70ns in faulty cells to set the initial value at n1 node and n3 node to ‘1’ and n2 
node and n4 node to ‘0’ (see Fig. 4.7).  Then a write ‘0’ and a read ‘0’ are executed.  
Bitline sense amplifiers are turned off during the read ‘0’ operation. The digital logic 
block detects voltage mismatch patterns on the bitline and bitline-bar due to O6 and O7. 
Fig. 4.7(a) presents write ‘0’ and read ‘0’ for a faulty cell containing O6 at the n1 























































M4 to 0V due to the large resistance due to SIV. The gate becomes stuck at logic ‘1’, 
leading to turning M3 on.  Since the current from M6 is directly discharged to ground 
through M3 during the write ‘0’ operation, the voltage on n2 node (0V) does not change, 
and M2 stays on. When the read ‘0’ operation starts, M3 pulls down the bitline-bar from 
1.13 V to 0V, and M2 pulls up the bitline voltage from 0.01V to 0.74V at the same time. 






















Fig. 4.8. Simulation results with TF3: (a) bitline pair voltages and their digitized values from a cell with an O6 fault, and (b) bitline pair voltages and their digitized values from a cell with an O7 fault.
 
TABLE 4.3. SIMULATION RESULTS FOR THE TF3 FOR O6 AND O7 DURING READ ‘0’
Fault         BL [V]          /BL [V]      Digitized BL   Digitized /BL
Proper        0               1.2          0              1
O1, O8–O11    0               1.2          0              1
O6            0.01 -> 0.74    1.13 -> 0    0 -> 1         1 -> 0
O7            0.01 -> 0.74    1.21 -> 0    0 -> 1         1 -> 0
 
Fig. 4.7(b) presents an SRAM cell with O7 due to SIV at the n4 node with the 
same initial conditions. Similarly, M10 and M12 cannot pull up the gate of M8 to logic ‘1’ 
during the short write ‘0’, pulling down the n3 node to 0V. After the read ‘0’ starts and 
the write driver is disconnected, M8 pulls up the logic value of n3 from ‘0’ to ‘1’. The 
new value of n3 node turns M9 on, pulling down the bitline-bar voltage from 1.21V to 
0V. More time is needed for M8 to pull the n3 node up than for M3 to pull the n2 node 
down. Hence, the M9 transistor is turned on at 111.9ns, and the bitline-bar voltage starts 
to decrease at 111.9 ns (see Fig. 4.8(b)).    
The waveform of the digitized values from the bitline pair, presented in Fig. 4.8, is used to identify O6 and O7. When a faulty cell with O6 or O7 is tested, the waveform pattern in Table 4.3 is detected. The BIST system stores five sample points (Rg1 − Rg5) for each data line pair, as presented in Fig. 4.8. Since the falling edge of bitline-bar differs between the O6 and O7 cases, the Rg3 register is used to distinguish them. Based on the stored register values, the ORA diagnoses the O6 and O7 faults using the following equations.
                               FO6 = (!Rg1) ∩ (Rg2) ∩ (!Rg3) ∩ (Rg4) ∩ (!Rg5)                         (6)
                               FO7 = (!Rg1) ∩ (Rg2) ∩ (Rg3) ∩ (Rg4) ∩ (!Rg5)                          (7)
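Equations (6) and (7) translate directly into a few lines of logic. The following sketch evaluates them on a list of five register bits; the function name `diagnose_tf3` and the list-based input are assumptions for illustration, and the ORA itself implements this as digital hardware.

```python
def diagnose_tf3(registers):
    """Evaluate equations (6) and (7) on the five stored bits Rg1..Rg5.

    registers: sequence of five 0/1 values in the order (Rg1, ..., Rg5).
    Returns "O6", "O7", or None when neither equation is satisfied.
    """
    rg1, rg2, rg3, rg4, rg5 = (bool(b) for b in registers)
    # The two equations differ only in the polarity of Rg3.
    f_o6 = (not rg1) and rg2 and (not rg3) and rg4 and (not rg5)
    f_o7 = (not rg1) and rg2 and rg3 and rg4 and (not rg5)
    if f_o6:
        return "O6"
    if f_o7:
        return "O7"
    return None
```

Because the two equations differ only in Rg3, at most one of them can hold for a given register pattern, so the order of the checks does not matter.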
4.2.5 TF4 Algorithm 
 
TABLE 4.4 SIMULATION RESULTS FOR THE TF4 DURING READ OPERATIONS
          Sub-step 1             Sub-step 2             Sub-step 3
          w1 -> w0 ->            w1 -> w0 ->            w0 -> w1 ->
          pre (1.2V) -> r0       pre (0V) -> r0         pre (0V) -> r1
Fault     BL[V]     /BL[V]       BL[V]     /BL[V]       BL[V]     /BL[V]
Proper    0         1.25         0         0.75         0.75      0
O1        1.23      1.25         0.40      0.77         0.77      0.40
O8        0         1.25         0.75      0            0.75      0
O9        0         1.25         0.29      0.47         0.47      0.29
O10       0         1.25         0         0.75         0         0.75
O11       1.15      1.25         0.36      0.43         0.43      0.36
Reg.      Rg6       Rg7          Rg8       Rg9          Rg10      Rg11
 
Defect O1 is the contact between the sources of the NMOS cell transistors and the ground path (see Fig. 3.1(b)). With this defect, the M7 or M9 transistor in Fig. 3.1(b) cannot pull down a bitline or a bitline-bar during the read operation. We use the test pattern (w1, w0, precharge (1.2 V), r0) to test the ability of the cell to pull down. The BIST controller first conducts write ‘1’ and write ‘0’ operations. It then asserts the precharge circuit enable signal (pre1) and the precharge voltage signal (Vpre1) to charge the bitline pair to 1.2V before the read ‘0’ operation. For a proper cell, the M7 transistor in Fig. 3.1(b) should discharge the bitline to 0V during the read ‘0’ operation. However, the M7 transistor in a faulty cell with the O1 fault cannot discharge the bitline because of the large inserted resistance between M7 and ground. Table 4.4 shows that the test result with O1 differs from that of a proper cell when sub-step 1 of the TF4 algorithm is executed.
Defect O9 is the worn-out contact between the sources of the PMOS devices and the VDD path (see Fig. 3.1(b)). As with O1, the large resistance keeps the M8 or M10 transistor in Fig. 3.1(b) from pulling up a bitline or bitline-bar during a read operation. The test pattern (w1, w0, precharge (0V), r0) is therefore used to test whether the M10 transistor can pull up the bitline-bar properly. During the hold time between the write ‘0’ and read ‘0’ operations, the BIST system sends a precharge voltage signal (0V) to pull down the bitline pair. When the read ‘0’ starts, the M10 transistor in a faulty cell with O9 cannot pull up bitline-bar because of the large inserted resistance (O9). As a result, both the bitline and bitline-bar are logic ‘0’ in the sub-step 2 column of Table 4.4.
Defects O8 and O10 are contacts between the drain of a PMOS transistor and a signal path (see Fig. 3.1(b)). To detect the O8 fault, the test pattern in sub-step 2 of TF4 is used. When the read ‘0’ starts, the M10 transistor in Fig. 3.1(b) has to hold the signal node connected to its drain at logic ‘1’, and the M7 transistor has to hold its drain node at logic ‘0’, for proper operation.
However, since the M10 transistor cannot hold the node because of the large resistance of the O8 fault, the node can be discharged. The node connected to M10 then changes to logic ‘0’, and the drain node of M7 changes to logic ‘1’. This changes the logic values on the bitline and bitline-bar to ‘high’ (0.75V) and ‘low’ (0V), respectively, as presented in the sub-step 2 column of Table 4.4.
To detect O10, we revise the test pattern to (w0, w1, precharge (0V), r1). The sub-step 3 column of Table 4.4 shows that the logic values on the bitline pair are swapped with the O10 fault, for the same reason as with the O8 fault.
Defect O11 causes a fault since a word line cannot control the access transistors 
(see Fig. 3.1(b)). The logic on a bitline pair cannot be changed when the bitline pair is 
precharged.  
TABLE 4.5 DIGITAL LOGIC EQUATIONS FOR DIAGNOSIS WITH TF4
Fault   Boolean equation
O1       Rg6 ∩ Rg7 ∩ !Rg8 ∩  Rg9 ∩  Rg10 ∩ !Rg11
O8      !Rg6 ∩ Rg7 ∩  Rg8 ∩ !Rg9 ∩  Rg10 ∩ !Rg11
O9      !Rg6 ∩ Rg7 ∩ !Rg8 ∩ !Rg9 ∩ !Rg10 ∩ !Rg11
O10     !Rg6 ∩ Rg7 ∩ !Rg8 ∩  Rg9 ∩ !Rg10 ∩  Rg11








































[Fig. 4.9 plot labels. Trigger level: 6.6 uA (TF1 test with O2, O3; TF2 test with O4, O5). Annotations: from 83.3 KΩ (O2, O5); from 144 KΩ (O3, O4). Curves for O2/O5 and for O3/O4, each under three conditions: no length mismatch with process variation; 110 um length mismatch with no Vth variation; no length mismatch with process variation and NBTI.]
Boolean equations for diagnosis of the cause: The logic values from a bitline pair during the TF4 test are stored in Rg6 - Rg11 (see Table 4.4). Using digital logic that implements the Boolean equations in Table 4.5, we can identify the location of each voided via/contact in an SRAM cell.
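To illustrate how the register values of Table 4.4 feed the equations of Table 4.5, the sketch below digitizes the simulated bitline voltages and matches them against the four tabulated equations. The 0.6 V digitizing threshold and all names are assumptions made for the example; the text does not state the threshold used by the digital logic block. Table 4.5 as given lists no equation for O11, so the sketch leaves that pattern unclassified.

```python
V_TH = 0.6  # assumed digitizing threshold in volts (not specified in the text)

# Bitline voltages from Table 4.4, in register order Rg6..Rg11:
# (BL, /BL) for sub-step 1, sub-step 2, sub-step 3.
TABLE_4_4 = {
    "Proper": (0, 1.25, 0, 0.75, 0.75, 0),
    "O1": (1.23, 1.25, 0.40, 0.77, 0.77, 0.40),
    "O8": (0, 1.25, 0.75, 0, 0.75, 0),
    "O9": (0, 1.25, 0.29, 0.47, 0.47, 0.29),
    "O10": (0, 1.25, 0, 0.75, 0, 0.75),
    "O11": (1.15, 1.25, 0.36, 0.43, 0.43, 0.36),
}

# Table 4.5 rewritten as register patterns: each equation is a product of all
# six registers (plain or negated), so it is true for exactly one pattern.
TABLE_4_5 = {
    "O1": (1, 1, 0, 1, 1, 0),
    "O8": (0, 1, 1, 0, 1, 0),
    "O9": (0, 1, 0, 0, 0, 0),
    "O10": (0, 1, 0, 1, 0, 1),
}


def digitize(voltages, v_th=V_TH):
    """Convert six bitline voltages into the bits stored in Rg6..Rg11."""
    return tuple(int(v > v_th) for v in voltages)


def diagnose_tf4(voltages):
    """Return the fault whose Table 4.5 equation is satisfied, or None."""
    regs = digitize(voltages)
    for fault, pattern in TABLE_4_5.items():
        if regs == pattern:
            return fault
    return None
```

Under the assumed threshold, each of the O1, O8, O9, and O10 rows of Table 4.4 satisfies its own equation and no other, while the proper cell satisfies none.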
4.2.6 Detectable Range of Wearout Mechanisms 
 
Fig. 4.9 shows the detectable ranges of resistances in faulty cells with via/contact voiding (O2-O5) using the TF1, TF2, DRF1, and DRF2 patterns. We have applied process variation (10% process variation corners) and the Negative Bias Temperature Instability (NBTI) effect to determine the detectable range. To distinguish
















Fig. 4.9. Detectable ranges of an inserted resistance in an SRAM cell using current test 
scheme: TF1, TF2, DRF1, and DRF2 tests for O2, O3, O4, and O5 faults. 
 
the faulty cell and a good reference cell should be larger than 6.4 uA. 6.4 uA is the 
critical current variation for a cell with the worst case open fault.  Hence, we set the 
minimum current trigger level for the test to 6.6 uA after adding some margin for noise.  
Even with process variations, bitline length mismatch, and some noise effects, our BIST 
system detects a cell with the O2 or O5 fault with an inserted resistance larger than 
66.7KΩ and a cell with an O3 or O4 problem with a resistance larger than 113KΩ.  
Table 4.6 shows a summary of detectable ranges of the inserted resistances for all 
possible open via/contacts with the maximum allowed bitline length mismatch. We can 
see that process variation degrades detection and testability. In spite of this, our BIST 
system and algorithm can detect and distinguish all possible EM/SIV wearout locations 
in an SRAM array. The total test time for the 64Kb SRAM with 176 faulty cells (176 
open via/contacts) is 0.0379s.   






Fault     Test type   Detectable range   Detectable range (with PV)   The worst range
O1        Digital     179KΩ ~            184.7KΩ ~                    184.7KΩ ~
O2,O5     Current     63.9KΩ ~           66.7KΩ ~                     66.7KΩ ~
O3,O4     Current     109KΩ ~            113KΩ ~                      113KΩ ~
O6        Digital     2.38MΩ ~           2.62MΩ ~                     2.62MΩ ~
O7        Digital     4.21MΩ ~           4.37MΩ ~                     4.37MΩ ~
O8,O10    Digital     170KΩ ~            230KΩ ~                      230KΩ ~
O9        Digital     374KΩ ~            400KΩ ~                      400KΩ ~
O11       Digital     5.69MΩ ~           5.94MΩ ~                     5.94MΩ ~
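The worst-case column of Table 4.6 can be read as a per-site lower bound on the resistance that the BIST is able to detect. A minimal sketch of such a lookup is shown below; the dictionary, the function name `is_detectable`, and the choice of an inclusive comparison are assumptions for illustration only.

```python
# Worst-case minimum detectable resistance in ohms (last column of Table 4.6).
WORST_CASE_MIN_OHM = {
    "O1": 184.7e3,
    "O2": 66.7e3, "O5": 66.7e3,
    "O3": 113e3, "O4": 113e3,
    "O6": 2.62e6,
    "O7": 4.37e6,
    "O8": 230e3, "O10": 230e3,
    "O9": 400e3,
    "O11": 5.94e6,
}


def is_detectable(fault, resistance_ohm):
    """True when the inserted resistance lies in the worst-case detectable range."""
    return resistance_ohm >= WORST_CASE_MIN_OHM[fault]


# The 10 MOhm open used in the simulations is inside every range.
all_detected_at_10m = all(is_detectable(f, 10e6) for f in WORST_CASE_MIN_OHM)
```

The lookup also makes the spread between sites visible: the current-tested sites (O2 to O5) are detectable from roughly a hundred kilohms, whereas O6, O7, and O11 require resistances in the megohm range.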
 
4.3 Reconfigurable Platform to Generate BIST System and Test Bench   
Caches are implemented in hierarchies of between 1 and 3 levels [40]. The first-level 
(level 1) cache is used for temporary storage of instructions and data and is usually 
























































































caches hold program data and instructions with an SRAM array which contains from 
several hundred kilobytes to a few megabytes of cells. The customized BIST should test 
all designed caches in the processor. Also, since cache sizes and operating frequencies 
are different, our BIST implementation platform should be flexible. Hence, a 
reconfigurable BIST implementation platform and flow are needed to generate the 
customized BIST and test bench to test various types and sizes of memory systems in 












Fig. 4.10. Reconfigurable platform to generate the customized BIST for the various sizes 
of caches based on the commercial tool from Mentor Graphics. 
 
Fig. 4.10 presents the BIST system architecture for each mechanism, based on the BIST system and algorithm for the single SRAM system presented in Table 4.1 and Fig. 4.4. The system is a hybrid platform that combines a BIST part from a commercial BIST generation tool [41] with a customized part. Because the hybrid platform follows the implementation flow of the commercial BIST tool, the BIST system and JTAG test bench are highly reconfigurable for different process technologies, cache sizes, and memory architectures.
In the BIST system wrapper in Fig. 4.10, the standard test pattern generator (TPG) 
in the BIST controller, the built-in repair analysis (BIRA), test access port (TAP) 
controller, and the JTAG interface are generated from the commercial tool. Based on the 
basic components, we have designed a customized BIST controller, test scheduler, 
customized ORA, and mux systems in the BIST system wrapper to implement the 
specific algorithms for each wearout mechanism in Table 4.1 and Fig. 4.4. 
The BIST controller contains the standard test pattern generator (TPG) generated by the commercial BIST tool and the customized controller. The standard TPG creates the address and read/write data patterns for standard test algorithms, such as the March algorithm used for production test before the chip is shipped from the manufacturer [42]. The customized controller contains the register-type circuits that generate our specific test patterns in Table 4.1. The customized output response analyzer (ORA) is embedded into the analyzer together with the BIRA module, which is generated by the BIST generation tool. Using the test results from the test circuit, the customized logic in the ORA determines the wearout failures with the algorithm in Table 4.1 (see Fig. 4.10). The standard BIRA module from the tool is used when standard test algorithms need to be executed.
Also, since address sizes and input and output (I/O) widths are not the same for all 
different types of memories, there is a need to design mux systems in the BIST system 
wrapper between the BIST controller and each test memory to match the sizes of address 
and I/O widths (see Fig. 4.10). The test scheduler in Fig. 4.10 uses the userbit registers in 
the TAP controller to set the test schedule for each test step in Table 4.1. The userbit can 
be set in the BIST generation tool when the BIST netlist and testbench are generated (see 










Fig. 4.11. BIST implementation flow based on the commercial tool from Mentor 
Graphics. 
 
Fig. 4.11 shows the BIST implementation flow, revised from the flow of the commercial BIST tool to make the customized BIST reconfigurable. The tool inputs include the behavioral models of the top modules in the BIST system wrapper, the memory definitions, and the userbit definition for test step selection. The behavioral models of our customized controller, test scheduler, and mux systems are included in these inputs. With the BIST tool inputs in place, the commercial BIST tool flow starts. In step 1 and step 2 of Fig. 4.11, the BIST tool assembles the BIST modules and generates the behavioral models of the top modules for the JTAG interface, the TAP controller, the standard TPG, and the BIRA. Once the behavioral models of the submodules have been generated, we insert the behavioral model of the ORA into the analyzer block. Since the ORA module is connected to the BIRA submodules generated in step 2, it has to be added between step 2 and step 3. Step 3 and step 4 then perform synthesis and physical design on the behavioral models of the top modules and submodules, with the specific design constraints for each application and process technology.
To generate the test bench as a JTAG standard for the algorithm in Table 4.1, the 
BIST tool flow can be used (see Fig. 4.11). As the tool inputs for the generation of the 
test bench, we can set the test pattern for addresses and data for each test step for each 
memory size in Table 4.1. With the specific inputs and the generated BIST intellectual 
property (IP), the test algorithm in Table 4.1 is converted to a JTAG standard through 















STATISTICAL ANALYSIS FOR SEPARATE DISTRIBUTIONS 
5.1 Statistical Analysis for Wearout in an SRAM Array  
For open groups (OG1-3) due to EM and SIV presented in Table 3.1, the cause of 
a fault cannot be identified using only electrical tests since both mechanisms can cause 
the same electrical signatures due to the open defect. 
Hence, we add an additional statistical analysis methodology to determine the 
cause of the backend wearout. In this section, we propose to diagnose the fraction of 
failures for each confounded mechanism with statistical failure analysis combined with 
field test data from BIST and reliability simulation data. The fraction of failures from EM 
vs. SIV is estimated by matching the failure rate of each fault site from BIST in the field 
to statistical data from a reliability simulator [32]-[35].  
For open groups due to EM and SIV in Table 3.1, the Weibull characteristic 
lifetime and the shape parameter are defined as 𝜂𝑙,𝑚 and 𝛽𝑙,𝑚, respectively. 𝑚 is used as the 
index for the open group (OG1-3) in the 𝑙th cell. The reliability simulator in Fig. 3.2 
computes 𝜂𝑙,𝑚   and the corresponding values of the shape parameter for all possible 
wearout sites in the embedded SRAM array using benchmarks and scenarios to determine 
the stress profiles. 
𝛽𝑙,𝑚 is usually assumed to have a constant value, 𝛽, for each mechanism. Then, the characteristic lifetime for each group of opens, 𝜂𝑚, can be estimated with
                             $\eta_m = \left( \sum_{l} \eta_{l,m}^{-\beta} \right)^{-1/\beta}$.                                                     (8)



































































































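The overall lifetime of the SRAM, 𝜂𝑐ℎ𝑖𝑝, for each mechanism is the solution of
                             $1 = \sum_{m} \left( \eta_{chip} / \eta_m \right)^{\beta}$.                                                 (9)
Based on 𝜂𝑚, m=1,…,3, for each open group, the probability that the open failure is located in the 𝑚th group of locations is
                             $P_m = \left( \eta_{chip} / \eta_m \right)^{\beta}$.                                                 (10)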
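Equation (9) has a closed-form solution, 𝜂𝑐ℎ𝑖𝑝 = (Σ 𝜂𝑚^(−𝛽))^(−1/𝛽), and substituting it into (10) gives probabilities that sum to one. The sketch below works this through numerically; the function names and the lifetime values used in the example are hypothetical and are not taken from the reliability simulator.

```python
def chip_lifetime(etas, beta):
    """Solve 1 = sum_m (eta_chip / eta_m)**beta for eta_chip (equation (9))."""
    return sum(eta ** (-beta) for eta in etas) ** (-1.0 / beta)


def failure_fractions(etas, beta):
    """Return P_m = (eta_chip / eta_m)**beta for every open group (equation (10))."""
    eta_chip = chip_lifetime(etas, beta)
    return [(eta_chip / eta) ** beta for eta in etas]


# Hypothetical characteristic lifetimes (arbitrary units) for two groups.
example = failure_fractions([1.0, 2.0], beta=2.0)  # approximately [0.8, 0.2]
```

In the example the group with the shorter characteristic lifetime accounts for about 80% of the failures, which is the weakest-link behavior the equations encode.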
The relative frequency of each open group is estimated based on the relative failure frequency of SIV (λ) and EM (1 − λ). Hence, the relative frequency of the faulty sites in the chip, 𝑃𝑚,𝑐ℎ𝑖𝑝, is
                             $P_{m,chip} = \lambda P_{m,SIV} + (1 - \lambda) P_{m,EM}$,                                    (11)
where the probabilities of failures due to SIV and EM for each open group are 𝑃𝑚,𝑆𝐼𝑉 and













Fig. 5.1.  Failure rate distribution using a simulator which determines the stress 
distribution of SRAM cells inside a microprocessor for EM and SIV with four use 




































Fig. 5.1 shows the failure rate, 𝑃𝑚,𝑐ℎ𝑖𝑝, due to EM and SIV, for different relative 
fractions of SIV and EM failures, λ. 𝑃𝑚,𝑐ℎ𝑖𝑝  is obtained from the observed fraction of 
failures in each open group with the BIST scheme. 𝜆 is computed by regression:  
 















Fig. 5.2. (a) Error analysis for 𝑃𝑐ℎ𝑖𝑝 when simulation data from the wrong use scenario (gaming scenario) are used for failure analysis for the “true” corporate scenario for EM and SIV, (b) error analysis for λ − λ′ with 𝑃𝑚,𝑆𝐼𝑉 and 𝑃𝑚,𝐸𝑀 for general use.
 
If there is uncertainty in the actual use scenario, there can be errors in estimation 
of the probabilities of failure. When instead of the corporate use scenario, the simulator 
runs with the gaming use scenario, the errors in estimation of probabilities of failure are 
presented in Fig. 5.2(a).   
For the error analysis, we assume that the 𝑃𝑚,𝐸𝑀 and 𝑃𝑚,𝑆𝐼𝑉 values can be modeled as normal distributions with standard deviation σ. We first compute 𝑃𝑚,𝑐ℎ𝑖𝑝 for the samples of λ using equation (12). Then, equation (12) is solved for λ′ with the computed 𝑃𝑚,𝑐ℎ𝑖𝑝 while σ of the normal distribution is varied randomly. Fig. 5.2(b) presents the errors from the analysis.
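One way to carry out the regression for λ is an ordinary least-squares fit of the mixture model in equation (11) to the observed group fractions. The sketch below shows this under that assumption; it is not necessarily the exact estimator of equation (12), and the function name, the clamping of the result to [0, 1], and the numbers in the example are illustrative choices.

```python
def estimate_siv_fraction(p_chip, p_siv, p_em):
    """Least-squares estimate of lambda in P_chip = lambda*P_SIV + (1-lambda)*P_EM.

    p_chip: observed failure fractions per open group (from BIST data).
    p_siv, p_em: simulated failure fractions per open group for each mechanism.
    """
    # Rewrite the model as (P_chip - P_EM) = lambda * (P_SIV - P_EM) and fit lambda.
    num = sum((pc - pe) * (ps - pe) for pc, ps, pe in zip(p_chip, p_siv, p_em))
    den = sum((ps - pe) ** 2 for ps, pe in zip(p_siv, p_em))
    if den == 0:
        raise ValueError("SIV and EM distributions are identical; lambda is not identifiable")
    return min(1.0, max(0.0, num / den))  # clamp to a valid fraction


# Hypothetical distributions for the three open groups.
P_SIV = [0.5, 0.3, 0.2]
P_EM = [0.2, 0.3, 0.5]
P_CHIP = [0.3 * s + 0.7 * e for s, e in zip(P_SIV, P_EM)]  # mixture with lambda = 0.3
lam = estimate_siv_fraction(P_CHIP, P_SIV, P_EM)
```

The denominator shows why similar SIV and EM distributions make the estimate fragile: when the two simulated distributions nearly coincide, small errors in the observed fractions are divided by a small number.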
5.2 Statistical Analysis with Stress Acceleration 
 The matching with the small set of open groups using Fig. 5.1 is vulnerable to 
errors in the measured probability of failures, 𝑃𝑚,𝑐ℎ𝑖𝑝, from the real test using the BIST 
system. Process variations within or between dies can create the variations in each 
probability of failure value. If the failure distribution does not vary significantly for 
different relative fraction of SIV and EM failures, λ, it can be hard to distinguish the 
wearout mechanisms when process variation is applied. Hence, we use more test groups 
with more stress acceleration conditions to make our statistical methodology tolerant to 
errors.  
SIV is more sensitive to temperature variation than EM (see Equations (4),(5)). 
Hence, we create more test sets with different temperature acceleration conditions. Using 
the different temperature acceleration conditions, we can cause the failure distribution to 
vary significantly with the different relative fractions of SIV and EM failures, λ.  
We set 11 temperature acceleration test sets for each open group (OG1, OG2, and 
OG3). The temperature acceleration test sets are denoted in Table 5.1.  Then, combining 
the temperature sets with open groups in Table 3.1, we can create 33 test sets (11 
temperature conditions x 3 open groups) for failure analysis. 
TABLE 5.1 TEMPERATURE ACCELERATION CONDITIONS
Index n            1     2     3     4     5     6     7     8     9     10    11
Temperature [K]    270   280   290   300   310   320   330   340   350   360   370
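The 33 test sets are simply the cross product of the temperature conditions in Table 5.1 and the three open groups. A short sketch of that enumeration follows; the tuple layout and variable names are assumptions for illustration.

```python
from itertools import product

TEMPERATURES_K = list(range(270, 371, 10))  # n = 1..11 in Table 5.1
OPEN_GROUPS = ("OG1", "OG2", "OG3")         # m = 1..3 in Table 3.1

# Each test set is (temperature index n, temperature in K, open group).
TEST_SETS = [
    (n, temp, group)
    for (n, temp), group in product(enumerate(TEMPERATURES_K, start=1), OPEN_GROUPS)
]
```

Each entry corresponds to one (n, m) pair, so the list can serve as the index set for the per-group quantities used in the rest of this section.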
 
Then, the characteristic lifetimes for each group, 𝜂𝑛,𝑚,can be computed with 















































The overall lifetime of the SRAM, 𝜂𝑐ℎ𝑖𝑝, for each mechanism is the solution of
                             $1 = \sum_{n} \sum_{m} P_{n,m}$.                                                      (14)
Given 𝜂𝑛,𝑚, m=1,…,3, n=1,…,11, for each open group, the probability that the failure is located in the (𝑛, 𝑚)th group of locations is
                             $P_{n,m} = \left( \eta_{chip} / \eta_{n,m} \right)^{\beta}$.                                               (15)
Overall, the relative frequency of the fault sites in the chip, 𝑃𝑛,𝑚,𝑐ℎ𝑖𝑝, is
                             $P_{n,m,chip} = \zeta P_{n,m,SIV} + (1 - \zeta) P_{n,m,EM}$,                                (16)
where 𝜁 is the relative fraction of SIV failures (and 1 − 𝜁 that of EM failures) when the additional acceleration condition sets are used. The parameter 𝜁 is computed by regression:
                                          𝜁 =




                             (17) 
𝑃𝑛,𝑚,𝑐ℎ𝑖𝑝 can be measured from the observed fraction of failures for each open group, 
using our BIST methodology. When we collect the relative failure rates for each group 
from the chip with BIST system, we can estimate 𝑃𝑛,𝑚,𝑐ℎ𝑖𝑝 . 𝑃𝑛,𝑚,𝑆𝐼𝑉  and 𝑃𝑛,𝑚,𝐸𝑀 are 
computed from the aging reliability simulator [32]-[35].    
   
 
                                                                                            
 
 
Fig. 5.3  Failure rate distribution using a simulator with general use scenario for EM and 
SIV.  Each group for each temperature contains three sub groups (OG1, OG2, OG3). 
 
Fig. 5.3 presents the failure rate, 𝑃𝑛,𝑚,𝑐ℎ𝑖𝑝, due to EM and SIV, for different relative fractions of SIV and EM failures, 𝜁. Each group for each temperature contains the three sub groups (m=1..3) in Table 3.1. We can see that for the SIV mechanism (𝜁=1), the failure rate varies significantly, since temperature is a more dominant factor for the SIV mechanism. Additional acceleration condition sets that exploit the different acceleration dependencies of the mechanisms can therefore be used to distinguish them. Unlike the failure distribution with a single accelerated set in Fig. 5.1, the additional variation and matching sets in Fig. 5.3 can help the statistical failure analysis to be more tolerant to process variations and to the real test data, 𝑃𝑛,𝑚,𝑐ℎ𝑖𝑝, when
















SUMMARY AND CONCLUSION 
The objective of this research has been to develop built-in self-test and statistical failure analysis methodologies for electrical detection and diagnosis of backend wearout mechanisms due to EM and SIV in an SRAM array. This work has considered only backend wearout mechanisms in the SRAM array itself, because failures are much more likely in SRAM cells owing to the smaller feature sizes of the cell and the larger number of via sites. SRAM peripheral circuits are generally designed with much looser design rules. Also, the number of vias in the SRAM peripheral circuits is 0.105% of the number of vias in the SRAM array. Hence, the failure rate of the SRAM peripheral circuits can be ignored.
In future work, we will investigate numerical optimization methodologies to reduce and optimize the acceleration test sets for failure analysis. We have added many acceleration test sets for failure analysis. However, it is not easy to perform so many aging experiments. Hence, we will look for novel methodologies to extract some critical sets









[1] F. Ahmed and L. Milor, “Analysis of on-chip monitoring of gate oxide breakdown in 
SRAM cells,” IEEE Trans. VLSI, vol. 20, no. 5, pp. 855-864, May 2012. 
[2] F. Ahmed and L. Milor, “NBTI resistant SRAM design,” in Proc. Int. Workshop on 
Advances in Sensors and Interfaces, 2011, pp. 82-87. 
[3] C.-C. Chen and L. Milor, “System-level modeling and microprocessor reliability 
analysis for backend wearout mechanisms,” in Proc. Design Automation and Test in 
Europe, 2013, pp. 1615-1620. 
[4] W. Kim, C.-C. Chen, S. Cha, and L. Milor, “MBIST and statistical hypothesis test for 
time dependent dielectric breakdowns due to GOBD vs. BTDDB in an SRAM array,” 
IEEE VLSI Test Symp., 2015.  
[5] W. Kim and L. Milor, "Built in self test methodology for diagnosis of backend
wearout mechanisms in SRAM cells," Proc. VLSI Test Symp., 2014.
[6] W. Kim, S. Cha, and L. Milor, “Memory BIST for On-Chip Monitoring of Resistive-
Open Defects due to Electromigration and Stress-Induced Voiding in an SRAM 
Array,” Proc. Conf. on Design of Circuits and Integrated Systems, 2014.  
[7] L. Dilillo, P. Girard, S. Pravossoudovitch, A. Virazel, S. Born, and M. Hage-Hassan,
"Resistive-open defects in embedded-SRAM core cells: Analysis and March test
solution," Proc. Asian Test Symp., 2004, pp. 266-271.
[8] T. Oshima, K. Hinode, H. Yamaguchi, H. Aoki, K. Tori, T. Saito, K. Ishikawa, J. 
Noguchi, M. Fukui, T. Nakamura, S. Uno, K. Tsugane, J. Murata, K. Kikushima, H. 
Sekisaka, E. Murakami, K. Okuyama, and T. Iwasaki, “Suppression of Stress-Induced 
Voiding in Copper Interconnects,” Int. Electron Devices Meeting, 2002. 
[9] R. Wang, C.C. Lee, L.D Chen, K. Wu, and K.S. Chang-Liao, “A study of Cu/Low-k 
stress-induced voiding at via bottom and its microstructure effect,” Microelectronics 
Reliability, vol. 46, no. 9, pp. 1673-1678, Oct. 2006. 
[10] K. Yoshida, et al. "Stress-induced voiding phenomena for an actual CMOS LSI 
interconnects." IEEE International Electron Devices Meeting, 2002. 
[11] A. H. Fischer, et al. "Electromigration failure mechanism studies on copper 
interconnects." Proc. IEEE International Interconnect Technology Conference, 2002.  
[12] Z. Guan, et al. "SRAM bit-line electromigration mechanism and its prevention 
scheme." IEEE International Symposium Quality Electronic Design (ISQED), 2013. 
[13] H. Tsuchiya, and Y. Shinji, "Electromigration lifetimes and void growth at low 
cumulative failure probability." Microelectronics Reliability, vol. 46, no. 9-11, pp. 
1415-1420, 2006. 
[14] C. J. Christiansen, et al. "Via-depletion electromigration in copper 
interconnects." IEEE Trans. Device and Materials Reliability, vol. 6, no. 2, pp. 163-
168, 2006. 
[15] B. Li, et al. "Line depletion electromigration characterization of Cu 
interconnects." IEEE Trans. Device and Materials Reliability, vol. 4, no. 1, pp. 80-85, 
2004. 
[16] Muhammad Bashir and Linda Milor, "Backend low-k TDDB chip reliability 
simulator." 2011 IEEE International Reliability Physics Symposium (IRPS). 
[17] S.-Y. Kuo and W.K. Fuchs, “Efficient spare allocation in reconfigurable arrays,” 
IEEE Design & Test of Computers, vol. 4, no. 1, pp. 24-31, Feb. 1987. 
[18] T. Kawagoe, J. Ohtani, M. Niiro, T. Ooishi, M. Hamada, and H. Hidaka, “A built-
in self-repair analyzer (CRESTA) for embedded DRAMs,” Proc. Int. Test Conf., 
2000, pp. 567-574. 
[19] C.-T. Huang, C.-F. Wu, J.-F. Li, and C.-W. Wu, “Built-in redundancy analysis for 
memory yield improvement,” IEEE Trans. Reliability, vol. 52, no. 4, pp. 386-399, 
Dec. 2003. 
[20] P. Ohler, S. Hellebrand, and H.-J. Wunderlich, “An integrated built-in test and repair 
approach for memories with 2D redundancy,” Proc. IEEE European Test Symp., 
2007, pp. 91-96. 
[21] W. Jeong, I. Kang, and S. Kang, “A fast built-in redundancy analysis for 
memories with optimal repair rate using a line-based search tree,” IEEE Trans. VLSI, 
vol. 17, no. 12, pp. 1665-1678, Dec. 2009. 
[22] S.-K. Lu, Y.-C. Tsai, C.-H. Hsu, K.-H. Wang, and C.-W. Wu, “Efficient built-in 
redundancy analysis for embedded memories with 2-D redundancy,” IEEE Trans. 
VLSI, vol. 14, no. 1, pp. 31-42, Jan. 2006. 
[23] S.-K. Lu, C.-L. Yang, Y.-C. Hsiao, and C.-W. Wu, “Efficient BISR techniques for 
embedded memories considering cluster faults,” IEEE Trans. VLSI, vol. 18, no. 2, pp. 
184-193, Feb. 2010. 
[24] S. Naik, F. Agricola, and W. Maly, “Failure analysis of high-density CMOS 
SRAMs,” IEEE Design & Test of Computers, vol. 10, no. 2, pp. 13-23, June 1993. 
[25] J.B. Khare, W. Maly, S. Griep, and D. Schmitt-Landsiedel, “Yield-oriented 
computer-aided defect diagnosis,” IEEE Trans. Semiconductor Manufacturing, vol. 8, 
no. 2, pp. 195-206, May 1995. 
[26] H. Balachandran and D.M.H. Walker, “Improvement of SRAM-based failure 
analysis using calibrated Iddq testing,” Proc. VLSI Test Symp., 1996, pp. 130-136.  
[27] G. Apostolidis, D. Balobas, and N. Konofaos, “Design and simulation of 6T 
SRAM cell architectures in 32nm technology,” PACET 2015. 
[28] C.-C. Chen and L. Milor, “System-level modeling and microprocessor reliability 
analysis for backend wearout mechanisms,” in Proc. Design Automation and Test in 
Europe, 2013, pp. 1615-1620. 
[29] C.-C. Chen, F. Ahmed, and L. Milor, “Impact of NBTI/PBTI on SRAMs within 
microprocessor systems: modeling, simulation, and analysis,” Microelectronics 
Reliability, vol. 53, no. 9-11, pp. 1183-1188, Sept.-Nov. 2013. 
[30] R. Kwasnick, A.E. Papathanasiou, M. Reilly, A. Rashid, B. Zaknoon, and J. Falk, 
“Determination of CPU use conditions,” Proc. Int. Reliability Physics Symp., 2011, 
pp. 2C.3.1-2C.3.6. 
[31] Memory compiler: www.arm.com. 
[32] C.-C. Chen, F. Ahmed, and L. Milor, “A comparative study of wearout 
mechanisms in state-of-art microprocessors,” Proc. IEEE Int. Conf. Computer 
Design, 2012, pp. 271-276. 
[33] C.-C. Chen and L. Milor, “Microprocessor aging analysis and reliability modeling 
due to back-end wearout mechanisms,” IEEE Trans. VLSI, 2015. 
[34] C.-C. Chen and L. Milor, “System-level modeling and microprocessor reliability 
analysis for backend wearout mechanisms,” in Proc. Design Automation and Test in 
Europe, 2013, pp. 1615-1620. 
[35] C.-C. Chen, F. Ahmed, and L. Milor, “Impact of NBTI/PBTI on SRAMs within 
microprocessor systems: modeling, simulation, and analysis,” Microelectronics 
Reliability, vol. 53, no. 9-11, pp. 1183-1188, Sept.-Nov. 2013. 
[36] LEON3 processor: www.gaisler.com  
[37] Mibench benchmark: http://www.eecs.umich.edu/mibench  
[38] R. Kwasnick, A.E. Papathanasiou, M. Reilly, A. Rashid, B. Zaknoon, and J. Falk, 
“Determination of CPU use conditions,” Proc. Int. Reliability Physics Symp., 2011, 
pp. 2C.3.1-2C.3.6. 
[39] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. 
Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power," 
Computer, vol. 36, no. 12, pp. 68-75, 2003. 
[40] A. González, F. Latorre, and G. Magklis. "Processor microarchitecture: An 
implementation perspective." Synthesis Lectures on Computer Architecture 5.1, pp. 
1-116, 2010.  
[41] L. Denq and C. Wu, "A hybrid BIST scheme for multiple heterogeneous 
embedded memories," Proc. IEEE Asian VLSI Test Symp., 2007. 
[42] V.A. Vardanian and Y. Zorian. "A march-based fault location algorithm for static 
random access memories." Proc. of IEEE International On-Line Testing Workshop, 
2002. 
 
 
 
 
 
 
