Interconnect yield analysis and fault tolerance for field programmable gate arrays by Campregher, Nicola
Interconnect Yield Analysis and Fault 
Tolerance for Field Programmable 
Gate Arrays 
Nicola Campregher 
A thesis submitted for the degree of 
Doctor of Philosophy of the University of London 
and for the Diploma of Membership of the 
Imperial College 
Department of Electrical and Electronic Engineering 
Imperial College of Science, Technology and Medicine 
University of London 
August 2006 
Abstract 
In an effort to increase design flexibility and performance, FPGA manufacturers
exploit the manufacturing process to its limits. This is particularly evident
for the metal layers: the latest technology allows up to 12 metal layers to be
manufactured, and as a result the total area occupied by interconnects in an
FPGA can be as high as 90% of the entire device.
As wire widths and separations continue to shrink, the probability of a defect
arising in the interconnect structures is significantly higher than anywhere
else on the device, leading to high yield losses. The losses are particularly
important for large FPGAs. Low yields of very large devices mean that the
number of working dies out of a whole wafer can be as low as one or two.
Yields are very sensitive data: manufacturers are reluctant to provide any
kind of information regarding their manufacturing process yield. This thesis
presents an FPGA yield analysis framework to establish the extent of the yield
loss problem. The developed models are based on well known yield analysis
techniques, and provide a base for understanding and measuring yield losses,
as well as estimating the potential benefits of yield enhancement techniques.
The results show that yields of large devices manufactured with the latest
technology process can be as low as 40% due to faults in the interconnect
resources alone. Projections made using the SIA roadmap further show that
future technology nodes will result in 0% yield for large devices.
This thesis further proposes a new approach to interconnect fault tolerance
in FPGAs. Firstly, a Built-In Self-Test procedure based on fault knowledge to
locate faulty interconnect resources on the device is presented. Secondly, an
innovative fault avoidance technique is presented: while previous approaches
were based on hardware redundancy or reconfiguration, the technique
presented exploits both the high regularity and the reconfiguration properties of
FPGA devices. Based on fine grain redundancy, the scheme is demonstrated to
improve wafer yields considerably while maintaining timing and area degradation
within acceptable limits. The fault tolerance technique is shown to provide
yield enhancements of up to 90% while incurring timing and area penalties
of just 8.5% and 4.5% respectively.
Acknowledgements 
First and foremost I would like to thank my supervisor Prof. Peter Cheung for 
his guidance and support. His enthusiasm has been truly contagious during 
the project and I thoroughly enjoyed working with him. 
Many thanks to my second supervisor, Dr. George Constantinides. George 
has been a role model for the duration of my studies, providing valuable advice 
and guidance. Thanks are due to Dr Milan Vasilko for providing interesting 
discussions and comments during the PhD. Many thanks to all the engineers 
from Altera and Xilinx who have often spared time during conferences to 
discuss my ideas and problems. 
I would like to express my gratitude to my parents for all the sacrifices they 
have made for me. This thesis is dedicated to them. Thanks to my family for 
always supporting me, and in particular to my uncle Paolo who first inspired 
me into the world of electronics. 
I am thankful to Andy and Nat for all their help and encouragement during 
difficult times. I am forever indebted to them. Thanks to all the Circuits and 
Systems people for making the lab a great place to work. I will always cherish 
their friendship. Thanks to Leo for being a great friend, and to the Lycee 
gang, to all my rowing friends, to my cycling buddies and to the Jolly crew 
for putting up with me for all these years. A massive thanks to Jessica for her 
love, patience and most importantly for being great. 
Contents 
1 Introduction 18 
1.1 Motivation 	  18 
1.2 Objective 	  19 
1.3 Outline 	  19 
1.4 Statement of Originality 	  21 
2 Background 23 
2.1 Introduction 	  23 
2.2 FPGA Architecture 	  24 
2.3 Yield Analysis and Prediction 	  25 
2.3.1 	Yield Analysis for FPGA 	  27 
2.4 Testing 	  28 
2.4.1 	Fault detection 	  28 
2.4.2 	Fault Diagnosis 	  37 
2.5 General VLSI Static Fault Tolerance techniques 	  39 
2.5.1 	Via replication 	  40 
2.5.2 	Compaction 	  41 
2.5.3 Routing 	  41 
2.6 FPGA-Specific Static Fault Tolerance Techniques 	 42 
2.6.1 Hardware modifications 	  43 
2.6.2 Modification to CAD tools 	  51 
2.6.3 Static Fault Tolerance in Commercial Devices 	 60 
2.7 Dynamic Fault Tolerance 	  64 
2.8 Summary 	  64 
3 Yield Analysis of FPGA Interconnect 67 
3.1 Introduction 	  67 
3.2 Interconnect layer model 	  70 
3.2.1 	SIA predictions 	  72 
3.3 Critical Area 	  73 
3.3.1 	Parametric failures 	  76 
3.3.2 	Critical area of FPGA interconnect layers 	  80 
3.4 Yield Analysis 	  81 
3.4.1 	Manufacturing technology assumptions 	  85 
3.5 Results 	  87 
3.5.1 	Different metal layers on a single die 	  87 
3.5.2 	More advanced technology nodes 	  89 
3.6 Summary 	  91 
4 Exploiting FPGA properties to improve wafer yields 93 
4.1 	Introduction 	  93 
4.2 Yield of chips with redundancy 	  95
4.3 Repairable and non-repairable areas 	  95
4.4 Yield improvement schemes 	  96
4.4.1 	Redundant row 	  97 
4.4.2 	Spare wires 	  98 
4.4.3 	Array shifting 	  98 
4.5 Improving yield with fault tolerance schemes 	  99 
4.6 Further analysis of redundant row 	  99 
4.7 Working devices 	  102 
4.8 Improving yield with fault tolerant reconfiguration 	  105 
4.9 Summary 	  107 
5 Yield Enhancements of Design-Specific FPGA 109 
5.1 Introduction 	  109 
5.2 Design-Specific FPGAs 	  111 
5.3 Problem formulation 	  114 
5.3.1 	Probability of Successful Mapping 	  114 
5.3.2 	Multiple bitstreams 	  119 
5.4 Results 	  120 
5.5 Summary 	  126 
6 Built in Self Test of FPGA Interconnect 128 
6.1 Introduction 	  128 
6.2 Fault grading 	  129 
6.3 Testing strategy 	  132 
6.3.1 WUTs Grouping 	  133 
6.3.2 TVG and ORA Operation 	  134 
6.3.3 BIST controller operation 	  135 
6.4 Implementation 	  137 
6.4.1 Number of Configurations 	  139 
6.4.2 Wire Testing Phases 	  139 
6.4.3 Switch Matrix Testing Phases 	  140 
6.4.4 Case Study: Xilinx Virtex II Pro 	  142 
6.5 Summary 	  142 
7 Interconnect Fault Avoidance for FPGA 	 145 
7.1 Introduction 	  145 
7.2 Fault tolerance for FPGAs 	  147 
7.2.1 Motivation and Requirements 	  147 
7.2.2 Architecture Exploration 	  149 
7.2.3 Proposed Fault Tolerance technique 	  153 
7.2.4 ILP formulation 	  155 
7.2.5 Fault avoidance 	  158 
7.3 Timing and Area Analysis 	  161 
7.3.1 Single fault 	  161 
7.3.2 Multiple faults 	  166 
7.4 Yield Analysis 	  167 
7.4.1 	Single fault 	  167 
7.4.2 	Multiple faults 	  168 
7.5 Summary 	  170 
8 Conclusions 172 
8.1 Summary 	  172 
8.2 Future work 	  174 
8.3 The future of FPGA fault tolerance 	  177 
List of Figures 
2.1 Internal structure of a typical FPGA [Bet98]. 	  24 
2.2 	Classification of SRAM-based FPGA approaches to fault detec- 
tion [DI03] 	  29 
2.3 Roving area under test [AES01] 	  33 
2.4 Classification of SRAM-based FPGA approaches to fault diagnosis
[DI03] 	  37
2.5 Introducing redundant vias in the silicon [AW97]. 	  40 
2.6 Introducing a spare row in FPGAs [HSN+93] 	  44 
2.7 Fault avoidance via reconfiguration networks [KI94b] 	 46
2.8 (a) Row of factory reconfigurable FPGA cells. (b) Reconfig-
uration around defective cell with configuration data for the 
defective cell reset to the null state [HD98]. 	  48 
2.9 Automatic connection of reserved segments to a cover cell for 
reconfiguration. (a) Top cell-to-channel connection. (b) Right 
cell-to-channel connection [HD98] 	  49 
2.10 A defect-tolerant switch block [YL05] 	  50 
2.11 Single length defect avoidance [YL05] 	  50 
2.12 Fault tolerant FPGA block [BHKW00]. In fault tolerant mode, 
block X is used to replicate the function of any defective look-up 
table. 	  52 
2.13 Shifting the data in eight directions with the king-shifting dis- 
tribution [DKI99] 	  57 
2.14 Low overhead approach [LMSP99] 	  59 
2.15 (a) datapath reconfiguration to avoid faulty row and (b) repli- 
cating connections to the general routing matrix. 	  62 
2.16 Routing signals to avoid faulty row 	  63 
3.1 Interconnect metal layer. The gaps between the metal lines 
account for vias connections between layers. 	  70 
3.2 Exponential trend existing between minimum feature size and 
FPGA array size 	  72 
3.3 Typical device cross section [Sem04] 	  74 
3.4 Catastrophic faults relative to size. Similar sized defects may 
only cause a fault if a pattern is broken or two patterns joined. 	74 
3.5 	Fault probability kernel K(x), defect size distribution S(x), and 
fault probability 0(x). Note that the majority of defects have 
size similar to the minimum feature size xo. 	  75 
3.6 Parameters in the fault probability kernel K(x). L, w and d are 
architectural parameters, while h is dependent on x, the size of 
the defect 	  76 
3.7 Delay of non-uniform wires. 	  77 
3.8 Critical area of more complex structures susceptible to extra 
material defects. 	  81 
3.9 Critical area of more complex structures susceptible to missing 
material defects. 	  83 
3.10 Yield comparison of different metal layers for the 90nm technol- 
ogy node. 	  88 
3.11 Metal 1 interconnect layer yield comparison for different tech- 
nology nodes. 	  89 
3.12 Overall die yield due to interconnect defects 	  90 
3.13 Relative occurrence of defects in a die for current and future 
technology nodes 	  91 
4.1 Yield comparison of three well known fault tolerance schemes 
at the 90nm technology node 	  100 
4.2 Yield improvement derived from extra row redundancy for the 
maximum predicted array sizes 	  101 
4.3 Yield enhancements derived from partitioning the device with 
localized redundancy 	  102 
4.4 Improving efficiency of redundancy scheme by partitioning the 
device can result in greater yield enhancements for the 22nm 
technology node 	  103 
4.5 	Number of good dies per 12 inch wafer as a function of array size 
for 90nm technology with and without extra row redundancy 
and fault tolerance 	  104 
4.6 Number of good dies for 45 nm and 22nm technology nodes . 	 105 
4.7 Yield enhancements of fault reconfiguration and hardware re- 
dundancy at 45nm 	  106 
5.1 Split of costs of IC design and manufacturing [Alt05] 	 112
5.2 Cost vs Volume for different ASIC and FPGA Methodologies
[Xil05] 	  113
5.3 	 fi(x, y), a bivariate normal distribution, indicates higher con- 
gestions towards the middle of the device. 	  115 
5.4 Yield of Structured FPGAs vs Array size (M x M) for devices 
built using a 90nm process. An average yield gain of 25% is 
expected over the functional yield, with peaks of 35%. 	 121 
5.5 Device yields for 22nm process. Functional yields for larger 
devices are expected to be close to 0%. The overall yield is 
likely to consist of devices exhibiting multiple faults 	 122 
5.6 	Probability of finding a design match for Design-Specific FPGA 
vs interconnect utilization 	  123 
5.7 Maximum utilization vs target yield in Design-Specific FPGA 
for 90nm and 22nm processes 	  124 
5.8 Mapping two bitstreams to a large device manufactured on a 
90nm process 	  124 
5.9 An example of multiple bitstream mappings on 90nm devices . 	 125 
6.1 Testing Strategy 	  133 
6.2 Grouped WUTs between ORAs and TVGs 	  134 
6.3 BIST operation during test 	  136 
6.4 Three configuration phases 	  140 
6.5 Switch Matrix fault diagnosis phases 	  141 
6.6 Floorplan view of a test configuration 	  143 
7.1 A segment of the full architecture 	  150 
7.2 Design classification in the presence of a fault. The inherent 
redundancy of FPGA is not sufficient to guarantee successful 
place and route under the presence of an interconnect fault. . . 152 
7.3 Spread of timing variation for designs successfully placed and 
routed but with timing failure 	  152 
7.4 	Full architecture routing channel connectivity with spare resources 	  153
7.5 An example architecture with limited connectivity but higher 
performance 	  154 
7.6 Modelling connectivity using adjacency matrices 	  155 
7.7 Example architecture with improved connectivity for fault tol- 
erance 	  158 
7.8 Example of fault avoidance through re-routing 	  159 
7.9 Switch positions to load into bitstream shift register 	 160 
7.10 Fault avoidance algorithm pseudo-code 	  161 
7.11 Percentage of faults causing design degradation in fault tolerant 
architecture 	  162 
7.12 Percentage of faults causing timing variations in fault tolerant 
architecture 	  162 
7.13 Tri-state buffer sharing at switch block and output pin connection
block [Bet98] 	  164
7.14 Routing area analysis of fault tolerant architectures with buffer 
sharing 	  165 
7.15 Routing area analysis of fault tolerant architectures without 
buffer sharing 	  165 
7.16 Percentage of faults causing design degradation for devices ex- 
hibiting 2 or 3 functional faults 	  166 
7.17 Percentage of faults causing design degradation for devices ex- 
hibiting 5 or 10 functional faults 	  167 
7.18 Percentage of faults causing timing variations under the pres- 
ence of multiple faults 	  168 
7.19 Using the proposed fault tolerance scheme can almost double 
the productivity for very large devices at 90nm 	  169 
7.20 Improving wafer yields at 45nm by tolerating up to 5 functional 
faults per device 	  169 
List of Tables 
2.1 Summary of some popular FPGA device characteristics. 	 26 
2.2 Summary of FPGA fault tolerance schemes. 	  66 
3.1 SIA roadmap for interconnects [Sem04] 	  73 
3.2 	Spatial defect distribution f (D) 	  83 
5.1 Architecture used for Place and Route analysis. The parame-
ters Fc and Fs indicate the connection block and switch block 
population respectively 	  117 
5.2 	Interconnect utilization of MCNC benchmarks placed and routed 
using a FPGA architecture which resembles commercial FPGAs 118 
6.1 BIST selection 	  138 
6.2 Stuck-on faults resolution 	  141 
7.1 Architecture used for Place and Route analysis. The parame-
ters Fc and Fs indicate the connection block and switch block 
population respectively 	  150 
7.2 Switches used by original and fault tolerant configurations . . 	 160 
Chapter 1 
Introduction 
1.1 Motivation 
As FPGAs continue their growth in the semiconductor market, manufacturers
continuously search for extra performance and flexibility. In an effort to
improve device operation and reduce power and area, component dimensions
and the spacing between them are often reduced to a minimum. As the industry
entered the deep sub-micron region, however, the reduction of lithographic
pattern width and separation has led to a higher occurrence of defects: this
has particularly affected the metal layers, up to 12 of which can be
manufactured in the latest technology process [Xil06]. As the width and
separation of metal lines are decreased, the metal layers are becoming the
main source of device faults.
A defect is defined as any variation from the intended design; in the deep
sub-micron region any deviation from the design is likely to cause a
catastrophic failure, due to the inherently low tolerances of such small
structures. Manufacturing yields are therefore decreasing, leading to higher
device costs and hampering the use of FPGAs in large volume projects.
In order to ensure that large FPGAs can still be manufactured with
reasonable yield, some kind of fault tolerance technique has to be devised.
Altera has employed redundancy in its devices for a number of generations, and
it claims the yield benefits are invaluable. The scheme used by Altera is
however not well suited to island-style FPGAs, and is thought to provide
fault tolerance in the logic and local routing only, not in the metal layers
used for the general routing network.
1.2 Objective 
The primary objective of this thesis is to investigate the effects of defects in 
the metal layers on manufacturing yield of FPGA devices, and to develop 
techniques to tolerate such faults and improve productivity. 
1.3 Outline 
This thesis is structured as follows. In Chapter 2 a review of previous and 
concurrent work in related areas, such as yield analysis, fault tolerance and 
testing, is presented. 
Chapter 3 introduces the yield analysis technique from which many of
the results presented are generated. Based on well known yield analysis
models, the technique relies on layout analysis of metal layers. From the layout,
the sensitivity to defects is extracted which, coupled with defect size and defect
density distributions, provides a yield analysis framework for the metal layers
of FPGA devices. The model is used to show that the yield losses expected
from defects in the metal layers are significant, especially for larger devices. It
is further refined in conjunction with predictions made by the SIA roadmap for
semiconductors to show that low yields are also expected for future technology
nodes.
Chapter 4 builds on the work in Chapter 3 to provide an in-depth yield 
analysis of some of the current fault tolerance techniques. In particular, a 
spare-row scheme is thoroughly analyzed, providing a measure of the yield 
benefits achieved by using this technique. 
Chapter 5 deviates slightly from pure fault tolerance techniques to provide 
an analysis of another approach to yield enhancement. Designated as Design-
Specific FPGA, this approach offers one-time programmable FPGAs which are 
structurally equivalent to standard devices. By losing the reprogrammability of 
devices, and therefore utilizing a limited number of resources, it is possible that 
devices which exhibit a fault can be used if only in a limited capacity. Chapter 
5 provides a statistical analysis of the benefits of losing the reprogrammability 
of devices to favor yield. 
Chapter 6 describes a Built-In-Self-Test (BIST) technique which is used 
to provide fault location. Used in conjunction with a priori knowledge of the 
type of fault the device exhibits, the BIST technique efficiently locates the fault 
in the interconnect; the information can then be used by the fault tolerance 
scheme to avoid the fault. 
The fault tolerance scheme is presented in Chapter 7. Based on the re-
configuration properties of FPGAs as well as exploiting the high regularity of 
devices, the fault tolerance scheme is shown to offer significant yield benefits 
at very small timing and area costs. Importantly, due to the nature of the 
scheme, it can support multiple faults, which are expected in most devices in 
future technology nodes. 
Chapter 8 provides a summary of the main contributions of the thesis and
opportunities for future work.
1.4 Statement of Originality 
There are two main areas of contribution in this thesis. One area involves 
the yield analysis of the metal layers of FPGA devices, and is discussed in 
Chapters 3, 4 and 5. The second area is fault tolerance for FPGAs, and is 
discussed in Chapters 6 and 7. The main contributions are summarized below; 
smaller contributions are summarized in the introduction of every chapter. 
• A yield analysis framework for the metal layers of FPGA devices, based 
on established techniques, used to prove that yield losses for large devices 
are significant [CCCV05a]. 
• A yield analysis of previously presented fault tolerance techniques,
including Altera's redundancy scheme and the Xilinx EasyPath program
[CCCV05b, CCCV06b].
• A novel approach to interconnect fault tolerance, based on the
reconfiguration properties and high regularity of FPGA devices [CCV04, CCCV06a].
Chapter 2 
Background 
2.1 Introduction 
This chapter aims to provide an overview of the different approaches taken by
researchers in the fields of yield analysis, testing, and static and dynamic
fault tolerance for FPGAs. The structure of this literature review is as follows: Section
2.2 introduces the basic elements of FPGA architecture. Section 2.3 presents 
a brief account of research in the field of VLSI yield analysis, while Section 
2.4 focuses on testing techniques for FPGA. Section 2.5 outlines some of the 
popular techniques used to improve yields of VLSI devices, and Section 2.6 
presents yield enhancement techniques specific to FPGAs. Section 2.7 briefly 
introduces techniques in the field of dynamic fault tolerance. Finally Section 
2.8 summarizes the work and provides a brief look into what future research 
trends might be. 
Figure 2.1: Internal structure of a typical FPGA [Bet98]. (Labels: programmable routing, logic block.)
2.2 FPGA Architecture 
An FPGA consists of an M × N array of Configurable Logic Blocks (CLBs),
which are programmed with configuration data to generate logic functions. A
typical FPGA device is shown in Figure 2.1 [Bet98]. The set of all configuration
data makes up the FPGA configuration. Each CLB is connected to
other CLBs via the programmable interconnect network. The input/output
signals are routed outside the FPGA using input/output blocks (IOBs).
A Configurable Logic Block is made of one or more n-input Look Up Tables
(LUTs), each of which can generate any n-input logic function. Each LUT output
is connected to a D flip-flop, which can be bypassed by means of a
multiplexer.
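The structure described above can be illustrated with a minimal Python sketch. The names LogicElement and configure, and the MSB-first addressing of the configuration bits, are assumptions made for illustration only; they are not taken from any vendor's device model.

```python
from itertools import product


class LogicElement:
    """Hypothetical model of one n-input LUT followed by a bypassable D flip-flop."""

    def __init__(self, n_inputs, truth_table, registered=False):
        if len(truth_table) != 2 ** n_inputs:
            raise ValueError("truth table must have 2**n entries")
        self.n_inputs = n_inputs
        self.truth_table = list(truth_table)  # the LUT configuration bits
        self.registered = registered          # multiplexer select: True = use flip-flop
        self.ff_state = 0                     # value currently held in the flip-flop

    def lut(self, inputs):
        # The inputs form an address into the configuration bits (MSB first).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.truth_table[address]

    def clock(self, inputs):
        """Apply inputs for one clock cycle and return the element output."""
        combinational = self.lut(inputs)
        if not self.registered:
            return combinational        # flip-flop bypassed by the multiplexer
        previous = self.ff_state        # registered output lags by one cycle
        self.ff_state = combinational
        return previous


def configure(n_inputs, function, registered=False):
    """Build the truth table of an arbitrary n-input Boolean function."""
    table = [function(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]
    return LogicElement(n_inputs, table, registered)
```

For example, configure(3, lambda a, b, c: a ^ b ^ c) gives a three-input parity element whose clock((1, 0, 1)) output is 0. With registered=True the same value would appear one clock cycle later.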
A switch matrix is used to program the interconnect network. A connection
between two lines is created by enabling the switch that joins them.
FPGA manufacturers have recently begun to embed hard IP blocks in the FPGA
fabric, offering customers high performance blocks to complement the
configurable logic. These hard IP blocks include memory blocks, multipliers, and
even general-purpose microprocessors. Table 2.1 summarizes the features of some
of the most popular devices available on the market.
2.3 Yield Analysis and Prediction 
The problem of yield loss has been studied since the integrated circuit manu-
facturing process began [Pos58]. As the electronics market expanded and more 
money was invested, the need to study economically viable solutions became 
apparent. Studies continued on the automation side, in order to make the 
manufacturing process as precise as possible [Mye61]. Designers discovered 
that semiconductor device yield is determined primarily by the defect density 
and the circuit's critical area, i.e. the portion of the circuit active area in which 
the occurrence of a defect results in yield loss [FP85]. 
Research concentrated on these two aspects, and on how to improve the 
manufacturing process. Mathematical models were introduced to calculate the 
critical area of a given device and to predict yield [FP85, FP92a, Sta83]. All 
of this work is based on some underlying assumptions, many of which are still 
the subject of investigation [FP92b]. 
Table 2.1: Summary of some popular FPGA device characteristics.
Device Family | Logic Block Type | Embedded Memory | Other Features
Altera Stratix [Alt03] | 4-input LUT | 512 bits, 4 Kbits, and 512 Kbits blocks | DSP blocks and embedded multipliers
Altera Stratix II [Alt04] | 8-input adaptive logic module | 512 bits, 4 Kbits, and 512 Kbits blocks | DSP blocks and embedded multipliers
Xilinx Virtex II [Xil04a], Xilinx Virtex II Pro [Xil04b], Xilinx Virtex 4 [Xil05b] | 4-input LUT | 18 Kbits blocks | Embedded PowerPC processor, embedded multipliers
Actel ProAsic+ [Act05] | 3-input block | 256x9 bit blocks | Flash based
Lattice EC/ECP [Lat06a] | 4-input LUT | Up to 535 Kbits | DSP blocks
Lattice XP [Lat06b] | 4-input LUT | Up to 414 Kbits | Mix of Flash and SRAM cells
2.3.1 Yield Analysis for FPGA 
While general VLSI techniques exist to analyze and predict manufacturing 
yields, yield analysis for FPGA often concentrates on probabilistic analysis of 
the array as a collection of smaller entities. An early approach from Doumar 
and Ito [DI03], and also used by Hanchek and Dutt [HD98], assumes that 
defects are independently distributed on the chip, allowing defect probabilities 
to be modelled using the Poisson distribution. The probability that one CLB
is defect free is given by (2.1), where $A_c$ is the cell area and $D$ is the defect
density.
$$p = e^{-D A_c} \qquad (2.1)$$
One shortfall of this model is that it underestimates the yield when
the product $D A_c$ exceeds unity.
The yield of a non-defect tolerant FPGA chip is given by (2.2), where $\lambda$ is
the number of defects per unit area and $S_{NDT}$ is the non-defect tolerant area.
$$Y_{NDT} = e^{-\lambda S_{NDT}} \qquad (2.2)$$
Yield expressions of fault tolerant chips vary depending on the fault toler-
ance scheme. These can be found in [DI03, HD98]. 
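As a rough numerical illustration of the Poisson model of (2.1) and (2.2), the following Python sketch evaluates the defect-free probability of a single cell, of a chip without defect tolerance, and of a whole array. The function names, and the defect density and cell area used in the note below, are hypothetical and chosen only to show the arithmetic.

```python
import math


def cell_yield(defect_density, cell_area):
    """Probability that one CLB is defect free under the Poisson model of (2.1)."""
    return math.exp(-defect_density * cell_area)


def chip_yield(defect_density, non_tolerant_area):
    """Yield of a chip with no defect tolerance under the model of (2.2)."""
    return math.exp(-defect_density * non_tolerant_area)


def array_yield(defect_density, cell_area, rows, cols):
    """Yield of an array in which every one of rows*cols cells must be defect free.

    Defects are assumed independent, so the per-cell probabilities multiply.
    """
    return cell_yield(defect_density, cell_area) ** (rows * cols)
```

With an assumed defect density of 0.5 defects per cm^2 and a cell area of 1e-4 cm^2, a 100 x 100 array covers 1 cm^2, so array_yield(0.5, 1e-4, 100, 100) and chip_yield(0.5, 1.0) both evaluate to exp(-0.5), roughly 0.61. The agreement follows from the independence assumption: the product of the per-cell probabilities equals the exponential of the summed areas.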
2.4 Testing 
Testing of a system is an experiment in which the system is exercised and its 
resulting response analysed to ascertain whether it behaves correctly [ABF90]. 
Testing procedures can be divided into two separate categories: fault detection 
and fault diagnosis. Detection aims to determine whether a circuit or design 
behaves as expected; if not, the circuit or design is deemed faulty. Diagnosis 
only takes place on faulty devices: the aim is to locate and identify the cause 
of the misbehaviour and it is therefore a more complex procedure than fault 
detection. Fault diagnosis is only required if the manufacturer or the user plans 
to enhance the yield and/or tolerate faults. For this reason fault detection and 
diagnosis are very rarely considered together in research studies. In the case 
of FPGA devices, the tests performed for fault detection vary greatly from 
those performed for fault diagnosis, hence the need to differentiate between 
the various solutions present in either field. 
2.4.1 Fault detection 
Conventional research studies targeting fault detection can be categorised as
shown in Figure 2.2 [DI03]. The approaches fall into three main classes,
depending on the FPGA features they exploit for testing. The first class,
testing using the programmability, makes full use of the reconfiguration
properties of FPGAs. The second class, labelled Design for Testability,
modifies the original structure of devices with the sole aim of facilitating
tests. Finally, the third class applies IDDQ testing to FPGA devices. Each
class has different advantages and disadvantages. An overview of each class
follows, outlining the work that has been carried out.
SRAM-based FPGA fault detection
    Testing of the chip using only the programmability
        Testing of CLBs
            Testing of a single CLB
            Testing of CLB array (strategy)
                BIST
                Array-based approach
                Universal test
        Testing of interconnect resources
            BIST
            Non-BIST
        Testing of I/O, programming circuits, etc.
    Design for Testability
        Testing of CLBs
            Modified scan
            Shifting data
        Testing of interconnect resources
            Shifting data
        Testing of CLBs and interconnect resources
            Shifting data
    IDDQ test
        Testing of CLBs
        Testing of I/O
Figure 2.2: Classification of SRAM-based FPGA approaches to fault detection
[DI03].
Testing using programmability 
Detecting faults in CLB array 
When testing CLBs only, the formulation of the testing problem is much
simplified, as it becomes a matter of generating enough test patterns to ensure
that the CLB is exhaustively tested [DI03]. As the time required to perform the
test is directly proportional to the number of configurations needed to fully
test the CLB, minimizing that number is the natural way to improve test times.
This has been carried out in [HL96, SKCA96, HMCL98, RPFZ99, RPFZ97].
The most comprehensive work in FPGA testing was documented in [SKCA96, 
AS01, SLA97, SWHA98]. This research introduces a new method of Built in 
Self Test (BIST) exploiting the reprogrammability of FPGA systems. Testabil-
ity is achieved without any hardware overhead, as the test is carried out when 
the chip is off-line. The BIST logic then "disappears" once the chip is put 
on-line again. The test is carried out by having the test controller (Automatic 
Test Equipment, CPU, or maintenance processor) load the test configurations 
needed to create the BIST logic into the FPGA, initiate the tests, read the 
test result and produce a pass/fail indication. 
To support the self-test procedure, some of the logic blocks are configured 
as Test Pattern Generators (TPGs) or Output Response Analyzers (ORAs). 
The strategy is to then configure a group of Configurable Logic Blocks (CLBs) 
as exhaustive TPGs and comparison-based ORAs, and the remaining CLBs 
as blocks under test (BUTs). BUTs are repeatedly configured to test them 
in all their modes of operations. A test session is a sequence of test phases 
that completely tests the BUTs. Once the BUTs have been tested, the roles of 
the CLBs are changed so that in the next test session previous BUTs become 
TPGs or ORAs and vice versa. Using this method, a minimum of two test
sessions is needed to completely test the device.
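The comparison-based scheme can be sketched in Python as follows. This is a simplified model under stated assumptions, not the implementation of [SKCA96]: each BUT is reduced to a single LUT, a fault is modelled as a function that corrupts the loaded truth table, and neighbouring BUTs are compared in pairs. All names are hypothetical.

```python
from itertools import product


def exhaustive_tpg(n_inputs):
    """Test pattern generator: yields every n-bit input pattern once."""
    return product((0, 1), repeat=n_inputs)


def lut_output(truth_table, pattern):
    """Evaluate a LUT, treating the pattern as an MSB-first address."""
    address = int("".join(map(str, pattern)), 2)
    return truth_table[address]


def ora_compare(but_a, but_b, n_inputs):
    """Comparison-based ORA: pass only if both BUTs agree on every pattern."""
    return all(
        lut_output(but_a, p) == lut_output(but_b, p)
        for p in exhaustive_tpg(n_inputs)
    )


def run_test_phase(buts, config, n_inputs):
    """Load one configuration into all BUTs and compare neighbouring pairs.

    `buts` maps a block name to a function returning the truth table the
    block actually realises when asked to hold `config` (faults included).
    Returns the list of pairs whose ORA flagged a mismatch.
    """
    names = sorted(buts)
    realised = {name: buts[name](config) for name in names}
    failures = []
    for a, b in zip(names, names[1:]):
        if not ora_compare(realised[a], realised[b], n_inputs):
            failures.append((a, b))
    return failures
```

Suppose clb1 has its first configuration bit stuck at 0, modelled as lambda cfg: [0] + list(cfg[1:]), while clb0 and clb2 are fault free. Loading the XNOR table [1, 0, 0, 1] makes both ORAs adjacent to clb1 report a mismatch, whereas loading the XOR table [0, 1, 1, 0] reports nothing because the stuck bit is never excited. This is why the BUTs have to be reconfigured across several test phases.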
This approach eliminates the need for dedicated hardware resources for 
testing in the device, as it makes full use of the reprogrammability of the 
FPGA. Since the test sequences are generic, in that they are a function of 
the FPGA architecture and independent of what is programmed in the device, 
this technique can be used at every testing level. In addition this technique is 
independent of the size of the device and it provides testing for interconnect 
resources and programming circuits, albeit not exhaustive. 
This work was further improved in [SLKA96]: the authors presented a
solution to the scaling problem of the original BIST approach using
combinational modules connected in geometrically regular patterns, called
Iterative Logic Arrays (ILAs). This solves the scaling issue at the cost of a
much more complex and time consuming testing algorithm. It is proved in the
paper that three test sessions are required to test the whole device. This is
due to the nature of the FPGA architecture, which is not particularly suited to
ILAs and hence requires some CLBs to be programmed to support the testing
functions, in much the same way as TPGs and ORAs were programmed in the
original algorithm.
Probably the most straightforward design in terms of algorithm complexity
is the so-called "naive" approach [HL96]. The CLBs are tested by simply
connecting their inputs and outputs to the I/O pins of the device. The test is
then carried out by an external entity. This methodology is limited by the
number of I/O pins available on the device, and can result in very long down
time. The array-based approach [HMCL98], developed by the same authors, uses a
row or column-based structure to perform the tests. Rows of CLBs are configured
as AND or XOR gates and their outputs are propagated by the following row
of CLBs to the FPGA I/O pins.
Other test methodologies include the application of conventional universal 
fault testing to FPGAs [IFM+95], and the introduction of the concept of test 
transparency to interconnect testing [HT00]. 
While all the work discussed above assumes that the device is taken off-line 
for testing, it is sometimes necessary to keep the chip on-line and guarantee 
a minimal level of functionality. The most interesting work on this subject 
was proposed by Shombert and Shnidman in two separate works in the context of 
systolic arrays [SMSP98, SS98] and then refined to be applied to FPGA devices 
[AES01]. The main idea is to have only a relatively small portion of the chip 
off-line and under test, while the rest of the device stays on-line and 
continues operation. Testing of the entire FPGA is accomplished by repeatedly 
moving different sections of the system's logic into the most recently tested 
part, thus allowing a new area of the chip to be tested. Figure 2.3 
illustrates this process, called roving spares, fault scanning or roving STARs 
in the respective works. While the pioneering works needed some degree of 
redundancy to implement the design, for FPGAs roving relies on Run-Time 
Reconfiguration, making this a very efficient approach. 
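The roving process can be illustrated with a minimal sketch. It assumes a 
device divided into columns, a test area exactly one column wide, and logic 
that can be relocated one column at a time; the function name and the data 
layout are hypothetical and are not taken from [AES01]. 

```python
def rove(num_columns):
    """Sweep a one-column self-test area across a device.

    `layout[i]` is the logical function hosted by physical column i,
    or None for the column that is currently off-line and under test.
    """
    # Assumption: the test area starts in the last column.
    layout = list(range(num_columns - 1)) + [None]
    tested = []
    star = num_columns - 1
    for _ in range(num_columns):
        tested.append(star)                # test the off-line column
        nxt = (star - 1) % num_columns     # neighbour to be vacated next
        layout[star] = layout[nxt]         # move its logic into the tested column
        layout[nxt] = None                 # the neighbour goes off-line
        star = nxt
    return tested, layout


tested, layout = rove(4)
```

After as many steps as there are columns, every column has been off-line 
exactly once while all logical functions remained mapped somewhere on the 
device. 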
Detecting faults in the interconnect resources 
[Figure: device regions labelled Working and Self-Testing] 
Figure 2.3: Roving area under test [AES01]. 
Interconnects occupy the majority of the chip area in FPGA devices. It 
therefore makes sense to concentrate efforts on eliminating faults and defects 
from them, as faults are more likely to occur there than in other parts of 
the device. 
There are two main strategies for interconnect testing in FPGAs. One is 
the application of the BIST approach to interconnects [SWHA98, NNJ02, HT02, 
LS03]. The BIST technique proposed by Stroud et al. [SWHA98] is a 
comparison-based approach. Two groups of Wires Under Test (WUTs) receive 
identical test patterns and their outputs are compared by ORAs. This approach, 
however, fails to detect multiple faults that have identical faulty behavior 
in the two WUT groups. This BIST technique is also aimed at dynamic faults, 
and can be used during device operation. It provides complete fault coverage; 
however, it requires an external reconfiguration controller. A similar concept 
has been proposed by Niamat et al. [NNJ02]. Test vectors are applied to two 
sets of four wires and their outputs are compared. The ORA in this case does 
not produce a pass/fail indication but a bit sequence. The output response is 
then stored in a LUT and used at later stages to locate the position of the 
fault. A different implementation based on parity checking was proposed by 
Sun et al. [SXXT01]. In this approach the outputs of the WUTs, connected in 
snake-like chains, are checked for parity against the parity of the original 
test vectors at the ORA to produce a pass/fail result. This approach, however, 
due to the way parity checking is done, has only 50% error detection 
efficiency and is very time consuming. The authors in [HT02] presented a BIST 
scheme for cluster-based FPGA architectures. Based on the concept of test 
transparency, they define configurations which enable test access to high 
density logic clusters embedded within each FPGA tile. In general, the BIST 
approaches reported so far require many configurations and are unsuitable if 
fast testing is needed. 
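The limitation of comparison-based testing noted above can be shown with a 
small sketch. It assumes stuck-at faults only and models each group of WUTs as 
a tuple of bits; the function names and the fault model are illustrative and 
are not taken from [SWHA98]. 

```python
def apply_faults(pattern, stuck_at):
    """Return the pattern observed at the far end of a group of wires.

    `stuck_at` maps a wire index to the value (0 or 1) it is stuck at.
    """
    return tuple(stuck_at.get(i, bit) for i, bit in enumerate(pattern))


def ora_detects(patterns, faults_a, faults_b):
    """Comparison-based ORA: report a failure if the two groups ever disagree."""
    return any(
        apply_faults(p, faults_a) != apply_faults(p, faults_b)
        for p in patterns
    )


patterns = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
single = ora_detects(patterns, {2: 0}, {})       # fault in one group only
masked = ora_detects(patterns, {2: 0}, {2: 0})   # same fault in both groups
```

A fault present in only one group makes the two responses differ, whereas the 
same fault present in both groups yields identical responses and escapes 
detection. 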
The other strategy is a more classical approach that does not use BIST. 
Various researchers have proposed different models [RFZ97, RPFZ98, MY0I96], 
based on different assumptions. All these methods, albeit very fast and 
compact, are limited in functionality by the number of I/O pins needed to 
access the chip at various stages of the testing process. They are thus 
unsuitable for applications that require resource-inexpensive testing. 
Detecting faults in other components 
Apart from LUTs and interconnect, faults in the other circuitry used to 
program the FPGA have to be considered. The programming circuit of LUT-based 
FPGAs consists of two shift registers, a control circuit and a configuration 
memory cell array. The latter component can be tested using conventional 
techniques. Research has addressed the testing of shift registers using only 
the properties of the programming circuit, without using additional hardware 
[MY0+99]. This is the only documented research aimed at these specific 
components, and it assumes a fully working control circuit. At present, a test 
for the control circuit has not been documented. 
Design for Testability 
Design for Testability is rapidly becoming a very interesting area of research. 
It aims at the modification of the original hardware to a new structure to 
simplify the testing procedure. As chips increase in size and functionality and 
move towards SoC, testing each individual component using classic techniques 
can be time and resource consuming; hence the need to modify the structure 
to allow faster, more efficient testing. 
One approach is based on a modified scan procedure to sequentially test every 
module in an FPGA [AARCS91]. The authors claim to provide 100% fault 
coverage for single stuck-at faults. Unfortunately, this method requires several 
tests and does not fully exploit the regularity of FPGAs to reduce test times. 
Another very interesting approach in this category proposes modifying the 
SRAM so that it can shift the configuration data. Testing is then achieved by 
loading the configuration data only once, instead of loading it many times as 
in other methods. This method is applied to both CLBs [DI99a] and 
interconnects [DI99b]. A fundamental problem with this approach, however, is 
that it assumes the SRAM is made up of shift registers. If the SRAM is a 
conventional RAM array, it will not be possible to shift the data on the chip. 
Design for testability in FPGAs has only recently come of age. However, it 
seems a popular belief that it will prove fundamental in the future as systems 
move towards System-on-Chip, thus making other test approaches too costly 
to be applied. 
IDDQ Testing 
IDDQ testing is considered to be very convenient for testing parts of the 
FPGA that are not usually testable using the programmability. This area of 
research targets testing of specific parts of the FPGA, including I/O blocks 
and internal parts of the CLBs. In particular, bridging faults within the CLB 
have been tested [ZWL98b, ZWL98a]. The same research project later targeted 
inputs and outputs [ZWL99]. Interestingly, no recent work has addressed the 
test of interconnect resources. 
This type of test is rather popular because it does not suffer from the 
limited number of FPGA I/O pins, since the test output signals do not need to 
be conducted to an off-chip platform to be observed [Raj95]. The main drawback 
of this method is the test time, which is very long because the current 
measurement is slow. However, this limitation is accepted in VLSI testing. It 
is then the manufacturer that becomes the only one qualified to determine 
whether the test time is acceptable or not. 
IDDQ testing is set to disappear from the conventional testing strategies, as 
the very high leakage currents found in modern FPGAs tend to distort the test 
results [KRH00]. 
SRAM-based FPGA fault diagnosis 
    Diagnosing faults in CLBs 
        Diagnosing the faulty CLB using the programmability 
            BIST 
            Array-based 
            I Approach 
            Universal test 
        Diagnosing the faulty CLB using design for testability 
            Modified scan diagnosis 
            Shifting the configuration data 
    Diagnosing faults in interconnect resources 
        Fault diagnosis using the programmability 
            BIST 
            Non-BIST 
        Fault diagnosis using the design for testability 
            Shifting data 
Figure 2.4: Classification of SRAM-based FPGA approaches to fault diagnosis 
[DI003]. 
2.4.2 Fault Diagnosis 
Fault diagnosis is essential if the manufacturer or the users plan to incorporate 
fault or defect tolerance in a design. It can be implemented after any level of 
testing, depending on whether it is applied by the manufacturer or the user. 
The research studies in this area can be categorised as shown in Figure 2.4 
[DI003]. 
Diagnosing faults in the CLB array 
Most researchers believe that diagnosing the precise faulty point within a 
CLB (multiplexer, LUT, connection) is not required, as any of these faults 
will render the whole CLB unusable. Effort is generally spent on locating the 
fault rather than on determining the cause of the misbehaviour. 
An improved BIST approach to provide fault location is proposed in [SLA97]. 
The improvement is achieved by exploiting the regular structure of the FPGA. 
Four test sessions are performed: after the first two sessions the row where a 
fault is present can be determined, and then the chip is rotated by 90 degrees. 
During the third and fourth sessions the column where the fault is present can 
be determined. It is then simple to deduce the exact position of the faulty 
CLB. 
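The deduction of the faulty position from the two orientations can be 
sketched as follows. The sketch assumes that a test session fails for every 
row containing a faulty CLB, and it models the 90 degree rotation as a simple 
transpose of the array; both are simplifications and the function names are 
hypothetical. 

```python
def failing_rows(size, faulty_clbs):
    """Rows whose test sessions fail: any row containing a faulty CLB."""
    return sorted({r for r, _ in faulty_clbs if 0 <= r < size})


def locate(size, faulty_clbs):
    """Return (row, column) of a single faulty CLB, or None if undetermined."""
    rows = failing_rows(size, faulty_clbs)
    # Rotation is modelled as a transpose: columns become rows, so the same
    # row-oriented sessions now report the failing column.
    rotated = {(c, r) for r, c in faulty_clbs}
    cols = failing_rows(size, rotated)
    if len(rows) == 1 and len(cols) == 1:
        return rows[0], cols[0]
    return None  # no fault, or several faults that cannot be separated


position = locate(4, {(2, 1)})
```

With a single faulty CLB the intersection of the failing row and the failing 
column is unique; with several faults the intersection may be ambiguous, which 
the sketch reports as None. 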
An attempt has been made to introduce fault diagnosis and consequently 
fault tolerance using a high level synthesis approach [APS02]. The proposed 
idea partitions the dataflow path of the application circuit into disjoint sub-
graphs in which self checking is achieved by using one of the error detection 
techniques available in the literature for the specific type of computation per-
formed. These subgraphs are then suitably grouped into disjoint clusters. 
Each cluster is mapped into a tile (portion of the FPGA). When a fault is de-
tected in a subgraph of a tile, the whole tile is reconfigured onto a spare tile of 
the FPGA. The authors declare that clustering is performed so that both the 
circuit complexity of the overall system and the interconnection complexity 
among tiles for reconfiguration are limited. Unfortunately the authors failed 
to demonstrate their algorithm with any concrete results. Furthermore, the 
effects of wiring delays are ignored. 
Universal test approaches have been proposed for fault diagnosis as well, 
but they suffer from very long test times [IMF97, IMF98]. Array-based ap-
proaches do not need any modification, and fault diagnosis is achieved by 
applying the test strategy twice, once with the chip rotated by 90 degrees. 
Unlike for testing, design for testability for fault diagnosis has not made 
any relevant progress. The main question here is how to improve the actual 
FPGA design, in order to achieve fault diagnosis. One of the reasons as to 
why research in this field has not advanced may be the fact that in order to 
improve the FPGA design, knowledge of the actual structure of the FPGA is 
required, and such information is not readily available. 
Diagnosing faults in the interconnect resources 
The interconnect resources in an FPGA are very complex. Therefore fault 
diagnosis might require a long time. Most of the research carried out in this 
field makes use of the programmability of FPGA devices. Only one research 
project has attempted to define a design using Design for Testability 
parameters [DI99b]. The approach was explained in a previous section, but it 
is worth noting that the test does not work on the latest FPGA devices. This 
is because it assumes a regular interconnect structure, which is not found in 
modern designs. 
2.5 General VLSI Static Fault Tolerance Techniques 
One of the most commonly used techniques for VLSI yield enhancement is 
that of layout modification. The main advantage of this approach when applied 
to FPGAs is that it does not require any change to the CAD tools or any 
additional resources, and only requires minor fabric modifications. One of the 
advantages of careful layout is the reduction of the critical area of a device, 
and consequently its sensitivity to defects. When dealing with metal layers, 
an example of layout modification for yield enhancement is increasing the 
spacing between metal lines. Such a modification does not affect the RC 
characteristic of interconnect; however, it will result in functional and 
performance differences when applied to the active logic region. 
Layout modification techniques are introduced in the next subsections. 
Figure 2.5: Introducing redundant vias in the silicon [AW97]. Panels: Original 
Layout; Vias Added, showing an "on track" added via and an "off track" added 
via. 
2.5.1 Via replication 
Via failure is a common cause of fault within a VLSI device. In IC design 
it is often possible to insert in the layout an extra via to partner an existing 
one [AW97]. This is true in particular for sparse circuits, where there is suf-
ficient free space left between tracks for an extra via. An example of this is 
shown in Figure 2.5. The extra via enables a single via fault to be tolerated, 
if the defect is an isolated random event and does not affect the other via. 
The via addition results show that, on average, more than 50% of 
non-redundant vias can be replicated. This of course depends on the density 
of the circuit considered. 
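The benefit of via replication can be estimated with a simple sketch. It 
assumes that every via fails independently with the same probability, which 
matches the isolated random defect condition stated above, and that a 
replicated pair fails only if both of its vias fail. The function name and 
the numerical values are illustrative assumptions, not results from [AW97]. 

```python
def via_yield(n_vias, p_fail, replicated_fraction):
    """Probability that no connection is lost to a via failure.

    n_vias: number of via sites in the original layout.
    p_fail: assumed failure probability of a single via.
    replicated_fraction: share of via sites given a redundant partner.
    """
    n_paired = round(n_vias * replicated_fraction)
    n_single = n_vias - n_paired
    # A single via survives with probability (1 - p); a pair is lost only
    # when both vias fail, which happens with probability p squared.
    return (1 - p_fail) ** n_single * (1 - p_fail ** 2) ** n_paired


baseline = via_yield(100_000, 1e-6, 0.0)
improved = via_yield(100_000, 1e-6, 0.5)
```

Under these assumptions, replicating half of the vias roughly halves the 
yield loss attributable to via failures, since paired vias contribute almost 
nothing to the loss. 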
2.5.2 Compaction 
The main purpose of compactors is to minimize the die area, in order to man-
ufacture more dies in a wafer. Most tools however also perform wire length 
minimization and jogs reduction. 
A first technique based on non critical element placement was proposed by 
Allan et al. [AWH92]. The scheme proposed local modifications such as in-
creasing contact size, increasing contact overlap and increasing track width. A 
slightly different approach was taken by the authors of [CK92, CK95a, CK95b], 
where yield improvement is achieved post-compaction by reducing the sensi-
tivity of the layout to both short and open circuit defects. The sensitivity is 
reduced by redistributing the spacing between non critical elements. A tech-
nique to reduce the sensitivities to short faults was also presented in [BM96]. 
Based on network flow analysis, this method also allows minimization of cross-
talk. 
2.5.3 Routing 
Additional yield enhancement can be achieved by modifications to the rout-
ing and layer assignment. Most routers try to minimize the number of vias 
in the layout [FCF90]. Sometimes, however, to avoid a via, the router may 
introduce longer wire segments, which are more susceptible to manufacturing 
imperfections. 
In [CK96] the authors proposed a new routing algorithm that reduces wire 
length as well as the number of vias to achieve higher yields. The reduction 
in wire length results in lower sensitivities to open and short defects, and 
the lower via count (a reduction of 30%) further improves the layout 
sensitivity. Via count reduction is also achieved with complex layer 
assignment, as described in [CK95a]. Another study [Kuo] achieves yield 
enhancement by using a new algorithm to improve layer assignment and via 
allocation. 
The cost function of maze routers has also been modified to include the 
probability of spot defects [HXJ95]. The new cost function reduced the sensi-
tivity of layouts by 6.4% on average. 
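A defect-aware cost function for a maze router can be sketched as follows. 
The exact cost function of [HXJ95] is not reproduced here; the sketch assumes 
an additive form in which each grid cell costs one unit of wire length plus a 
weighted defect probability, and it uses Dijkstra's algorithm as the maze 
search. All names and values are illustrative. 

```python
import heapq


def defect_aware_cost(defect_prob, weight):
    """Cell cost = unit wire length + weight * assumed defect probability."""
    return [[1.0 + weight * p for p in row] for row in defect_prob]


def route(grid_cost, start, goal):
    """Dijkstra maze routing; entering cell (r, c) costs grid_cost[r][c]."""
    rows, cols = len(grid_cost), len(grid_cost[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new = cost + grid_cost[nr][nc]
                if new < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Assumed defect probabilities: only the centre cell is defect-prone.
defect_prob = [[0.0, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.0]]
plain = route(defect_aware_cost(defect_prob, 0.0), (1, 0), (1, 2))
aware = route(defect_aware_cost(defect_prob, 10.0), (1, 0), (1, 2))
```

With a zero weight the router takes the shortest path through the 
defect-prone cell; with a sufficiently large weight it accepts a longer 
detour, which illustrates the trade-off between wire length and defect 
sensitivity. 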
2.6 FPGA-Specific Static Fault Tolerance Techniques 
The high regularity of FPGAs and their reprogrammability offer two main 
methods to tolerate faults. The first, more obvious one, is to replicate some 
of the elements in the array and swap them at the factory for defective ones. 
The second method for fault tolerance assumes that, when programmed, not all 
of the FPGA resources will be used, and some spare ones will be left unused. 
Techniques have been presented to ensure that defective resources are left 
unused by the programmed FPGA. 
This section offers an overview of the proposed methods to implement fault 
tolerance in FPGAs. It is split into two subsections; the first includes research 
that requires changes to the FPGA die in order to achieve fault tolerance. The 
second subsection contains a summary of techniques that require changes to 
the CAD tools to support reconfiguration around a faulty resource. 
2.6.1 Hardware modifications 
Manufacturing spare resources to be swapped for defective ones is a widely 
used technique to enhance the yield of memory devices. The regular 2-D array 
of memory chips made it possible to introduce spare rows or columns which 
would be swapped in or out of the array as needed via fuse blowing. 
Hardware redundancy for FPGAs was first proposed by Hatori et al. 
[HSN+93]. The authors proposed a method to introduce a spare row or column in 
the array without affecting the device performance. The swap, to be 
performed at the factory, would eliminate the whole row where the defect is 
present, by means of blowing a fuse. The main innovation of this work was 
the placement of row selectors after the row decoders, as shown in Figure 2.6. 
The scheme would then only require adjustments to the row selectors, as 
opposed to the whole row decoder/selector block. The routing segments are also 
extended to allow full routability when a row is swapped. 
The spare row approach has been adopted by Altera in its commercial 
devices [RMLP]. Factory programming allows the datapath to be modified so as 
to bypass rows that contain faulty cells. Connections to resources outside the 
logic block array (IOBs) are maintained by reconfiguring the datapath. 
Figure 2.6: Introducing a spare row in FPGAs [HSN+93]. Panels: (a) without 
defective row; (b) with defective row; (c) without redundancy architecture. 
The main aim of most hardware based fault tolerance solutions is to ensure 
that every device is logically equivalent to a defect-free one. If this is achieved, 
the design process is only carried out once and the resulting device configu-
ration can be applied to all devices. Kelly and Ivey [KI94b, KI94a] presented 
a technique for defect tolerance based around reconfiguration networks. The 
technique uses a network of reconfiguration switches together with an on chip 
reconfiguration processor to bypass defective elements. The proposed 
transformation algorithm consists of simple bit comparisons and data shifts, 
taking on the order of minutes to complete. Fault tolerance is achieved by 
modifying the array and ensuring that the defective elements are not included, 
as shown in Figure 2.7. Once the array is reconfigured, normal configuration 
loading commences. The scheme calls for modifications to the wiring segments 
as well as 
the inclusion of a spare element at the end of each row. The use of the on-chip 
reconfiguration processor eliminates the need for user or factory intervention. 
The authors in [HD98, HD96] propose a scheme whereby the FPGA con-
figuration datapath is modified to remove a faulty resource. The original con-
figuration data can then be used, with the use of additional wiring segments. 
Under the principle of node covering, each primary node is assigned a cover 
node which can be reconfigured if the primary node is found to be faulty. If a 
fault is identified through testing, the FPGA can be reconfigured such that the 
faulty cell is replaced by its cover, which in turn is replaced by its own cover, 
and so on until a spare cell in the chain is reached, as shown in Figure 2.8. 
This work was further refined in [MD99] to consider a dynamic node covering 
technique. 
Figure 2.7: Fault avoidance via reconfiguration networks [KI94b]. 
The reconfiguration procedure assumes that cover segments necessary for 
reconfiguration around a faulty cell are provided by the routing tools and 
included in the initial configuration data. When configuration data is loaded 
in the absence of faults, it bypasses the spare node, and a factory enable signal 
forces the configuration pass-transistors of the spare cell to the OFF state, 
as shown in Figure 2.9. If a cell is faulty, the faulty cell's configuration pass 
transistors are turned off instead. The user is unaware of the presence of a fault, 
and laser-programmable links burned at the factory allow the configuration 
data to bypass the faulty cell. 
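The effect of node covering on a single row can be sketched as follows. The 
sketch assumes a row whose last cell is the spare and in which each cell is 
covered by its right-hand neighbour; it models only the resulting assignment 
of functions to cells, not the routing segments or the pass-transistor 
control. The function name is hypothetical. 

```python
def reconfigure_row(functions, faulty_index=None):
    """Assign logical functions to the physical cells of one row.

    functions: functions hosted by the primary cells, in order.
    faulty_index: physical index of the faulty cell, or None.
    The row has len(functions) + 1 physical cells; the last is the spare.
    Returns one entry per physical cell, with None for the disabled cell.
    """
    n = len(functions)
    if faulty_index is None:
        return list(functions) + [None]       # spare stays disabled
    if not 0 <= faulty_index <= n:
        raise ValueError("faulty_index outside the row")
    # Each function at or beyond the fault moves to its cover cell, so the
    # chain of replacements ends at the spare.
    return list(functions[:faulty_index]) + [None] + list(functions[faulty_index:])


row = reconfigure_row(["A", "B", "C", "D"], faulty_index=1)
```

The order of the functions along the row is preserved, which is why the 
original configuration data can be reused once the faulty cell is bypassed. 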
A variation of this rerouting makes fault tolerance transparent to the user 
by using extra wiring and factory-configured switches on the tracks to physi-
cally "warp" the channel routing segments around a faulty cell while maintain-
ing the same logical routing configuration. The extra switching delay overhead 
is too high for use in reconfiguring around individual FPGA cells, but can be 
employed more efficiently using blocks of cells [HTA94]. 
A similar concept was employed by the authors in [YL05]. Multiplexers are 
introduced in the switch block and connection block of the FPGA, as shown 
in Figure 2.10, allowing every wire to be shifted along the routing channel by 
one or two tracks, thus enabling a faulty track to be bypassed, as shown in 
Figure 2.11. 
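The track-shifting idea can be sketched for a single wire segment. The sketch 
assumes that a segment may move by at most two tracks in either direction and 
that defects are known per channel position and track; it ignores conflicts 
with other wires, which the multiplexer structure of [YL05] handles in 
hardware. The names are hypothetical. 

```python
def shifted_track(track, position, defects, channel_width, max_shift=2):
    """Pick a track for one segment, preferring the smallest shift.

    defects: set of (position, track) pairs known to be faulty.
    Returns the chosen track, or None if no shift within range works.
    """
    offsets = [0]
    for k in range(1, max_shift + 1):
        offsets += [k, -k]                    # try +k before -k
    for offset in offsets:
        candidate = track + offset
        inside = 0 <= candidate < channel_width
        if inside and (position, candidate) not in defects:
            return candidate
    return None


chosen = shifted_track(track=3, position=5, defects={(5, 3)}, channel_width=8)
```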
In the context of Wafer Scale Integration, Chapman [Cha96] proposed a 
WSI fault tolerant FPGA. The long term concept is to build the wafer scale 
FPGA, test the cells after fabrication and determine the working logic blocks. 
Then a permanent defect avoidance routing would be performed to hook 
together only the working cells and avoid the faulty ones. A similar technique 
is adopted by [KDFJ89]. 
Figure 2.8: (a) Row of factory reconfigurable FPGA cells. (b) Reconfiguration 
around defective cell with configuration data for the defective cell reset to 
the null state [HD98]. Legend: channel segment in use, channel segment 
reserved, channel segment unused. 
Figure 2.9: Automatic connection of reserved segments to a cover cell for 
reconfiguration. (a) Top cell-to-channel connection. (b) Right cell-to-channel 
connection [HD98]. 
Figure 2.10: A defect-tolerant switch block [YL05]. 
Figure 2.11: Single length defect avoidance [YL05]. 
Fault tolerant logic elements 
Bartzick et al. [BHKW00] presented a new fault tolerant logic block for 
FPGAs. The procedure is based on runtime testing that does not increase the 
delay of the functional blocks. The proposed logic block consists of four 
cells, three standard cells and one special cell, as shown in Figure 2.12. 
When working in standard, non-fault tolerant mode, all four cells are used to 
implement logic functions. When working in fault tolerant mode, three cells 
perform logic functions while the other is tested. The logic block also 
contains a comparator unit, which compares the output of the cell under test 
against a reference value. The special cell is used to restore the function of 
any cell found to be faulty. The status of every cell can be stored or 
restored at any point, allowing full operation of the circuit. 
2.6.2 Modification to CAD tools 
FPGAs are inherently redundant. A typical design mapped onto an FPGA will, 
in general, not use all the resources on the device. It is therefore reasonable 
to look for methods that exploit the unused resources to replace defective 
ones. 
Figure 2.12: Fault tolerant FPGA block [BHKW00]. In fault tolerant mode, 
block X is used to replicate the function of any defective look-up table. 
This section introduces some of the work done to implement fault tolerance 
in FPGAs exploiting the reprogrammability of devices. 
Defect maps 
Defect map techniques are commonly used in computer hard drives to provide 
the data controller with the locations of the faulty sectors, which are then 
avoided. In an FPGA, if the location of the fault is known a priori, it 
becomes possible to run through the design process and obtain a circuit that 
does not require the use of the defective resources. 
The concept of defect maps for programmable devices was first explored by 
Demjanenko in [DU90]. The implementation of the technique required prior 
knowledge of the location of the faults within the array. With this knowledge, a 
feasibility matrix is set up, which indicates the allowable product line mapping 
for the given device. 
A similar concept has been used to implement defect tolerance on the 
Teramac Custom Computer [CAC+91] and in multiple FPGA systems in [HW05]. 
Defect tolerance is implemented by first detecting and locating the defective 
resources using diagnostic tests. Information about the defects is then stored 
in a defect database. Finally, when user designs are mapped onto Teramac, the 
mapping software reads the database and maps the design using only 
defect-free resources. 
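The role of the defect database in the mapping step can be sketched with a 
deliberately simple placer. It assumes a rectangular array of identical sites 
and a defect map given as a set of coordinates; the greedy placement and all 
names are illustrative and do not reproduce the Teramac mapping software. 

```python
def place(design_blocks, rows, cols, defect_map):
    """Assign each design block to a site that is not in the defect map.

    defect_map: set of (row, col) sites recorded as defective.
    Returns a dict mapping block name to (row, col).
    """
    free_sites = [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if (r, c) not in defect_map
    ]
    if len(design_blocks) > len(free_sites):
        raise ValueError("not enough defect-free sites for the design")
    return dict(zip(design_blocks, free_sites))


placement = place(["a", "b", "c"], rows=2, cols=2, defect_map={(0, 1)})
```

The design fits as long as the number of defect-free sites is at least the 
number of blocks, which is the sense in which unused resources absorb the 
defects. 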
The work presented in [Mis04] proposes to split the CAD flow algorithms 
into two parts, one to be performed per-design, and one to be performed 
per-fabric. The per-design component will only need to be performed once by 
the application developer, whereas the per-fabric part will be performed by 
the users for each device against a defect map, stored in the device, in order 
to generate a unique configuration that will be functional on a defective 
device. At present there are no results available for this scheme so it is 
impossible to comment on its efficiency. It can however be argued that, in 
mainstream applications, this approach would not be considered the most 
practical because every device potentially requires its own configuration 
bitstream. 
Dynamic reconfiguration 
Works based on a priori knowledge of the defect locations are described in 
[NNRD94, HC90, RN95, MCZL97]. These approaches have an obvious disadvantage 
in that the layout tools are required to perform a new routing of a 
circuit for each new faulty cell or wiring segment encountered. 
The authors of [ML96] present an algorithm for placement reconfiguration 
that minimizes the timing degradation and also reduces the amount of 
reprogramming that needs to be done. The scheme is slightly faster than 
performing placement and routing again to avoid the faulty CLB, thanks to a 
concept known as the slack neighbourhood. A slack neighbourhood graph is 
constructed to represent the amount by which a net delay can be increased 
without violating the timing constraints. Reconfiguration is then performed 
taking into account the net slack given by the slack neighbourhood graph, 
which makes it possible to determine efficiently how to route a net after a 
CLB swap. 
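The use of net slack when choosing a replacement CLB can be illustrated with 
a simplified sketch. It is not the algorithm of [ML96]: it assumes that net 
delay is proportional to Manhattan distance, that every net attached to the 
faulty CLB has a single other endpoint, and that a spare is acceptable only 
if no net's delay increase exceeds its slack. All names are hypothetical. 

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_spare(faulty, spares, nets, delay_per_unit=1.0):
    """Choose a spare CLB that respects the slack of every attached net.

    nets: list of (other_endpoint, slack) for nets attached to the faulty CLB.
    Returns the admissible spare with the smallest total added delay, or None.
    """
    best = None
    for spare in spares:
        added = [
            delay_per_unit * (manhattan(spare, end) - manhattan(faulty, end))
            for end, _ in nets
        ]
        if all(extra <= slack for extra, (_, slack) in zip(added, nets)):
            total = sum(added)
            if best is None or total < best[0]:
                best = (total, spare)
    return None if best is None else best[1]


nets = [((0, 1), 1.0), ((3, 1), 2.0)]
choice = pick_spare((1, 1), [(1, 3), (2, 1), (5, 5)], nets)
```

Spares that would lengthen any net beyond its slack are rejected before the 
remaining candidates are compared, so the timing constraints of the original 
design are not violated by the swap. 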
One of the most comprehensive works in this field was carried out at the 
University of Kentucky [ESSA00, AES01], a project which also resulted in the 
BIST techniques presented in Section 2.4.1. The method proposed is based 
on the roving self testing areas (STARs) fault detection/location strategy. 
STARs rely on partial reconfiguration to modify the configuration of the area 
under test without affecting the configuration of the system function, and on 
dynamic reconfiguration to allow uninterrupted execution of the system 
function while reconfiguration takes place, as shown in Figure 2.3. Once a 
fault is detected, the working area application is dynamically reconfigured 
around the fault with no additional system function interruption. The authors 
also introduce the concept of partially usable blocks to increase efficiency 
and utilization. 
Shifting and re-Routing 
Emmert and Bhatia in [EB97] investigated the problem of rapidly 
reconfiguring FPGA mapped designs with applications to fault tolerance and 
yield improvement. The scheme uses a matching technique to determine empty, 
non-faulty logic block locations suitable for reconfiguring the mapped design, 
and then uses a shift-move strategy to shift the functions of the logic blocks 
between matched locations. The scheme requires prior knowledge of the 
location of the fault, as well as enough unused resources to be able to 
reconfigure any faulty logic blocks. 
A shifting scheme was also proposed in [DKI99]. The idea is identical to the 
one proposed by Emmert and Bhatia, where the user data is shifted on-chip so 
that the defects are avoided. The work presents two different shifting methods 
(king-shifting and horse-allocation) to shift the whole array to avoid the fault. 
The scheme requires the reservation of spare cells in the original configuration, 
which are used to remap defective cells, as shown in Figure 2.13. To achieve 
shifting, a new design of the SRAM is also presented, which allows the config-
uration data of each row and each column to be shifted to the adjacent row 
and column respectively. 
Engineers from Xilinx presented a defect tolerance scheme based on the 
JBits suite [SG01]. The JBits tools offer a function to specify circuit 
parameters such as bit width as late as at run-time. These Run-Time 
Parameterizable (RTP) Cores allow the circuit to be modified and configured 
in-system at run-time, based on user input or sensor data. The authors have 
expanded the available function to include three basic modes (Skip CLB, Skip 
Row, Skip Column) of constructing RTP cores in the presence of faulty CLBs. 
Thus defects can be tolerated at run-time, without the need for any additional 
software or modification to the device architecture. 
Figure 2.13: Shifting the data in eight directions with the king-shifting 
distribution [DKI99]. Legend: used CLB, spare CLB, direction of array 
shifting. 
With regard to fault tolerance and re-mapping to avoid faults, some research 
focuses on reducing dependence on routers by either avoiding them or reducing 
the time required for the router to route the circuit. One method proposed to 
reduce the routing time is incremental routing [EB98]. Incremental routing is 
a process by which some of the signal nets are ripped up (unrouted) and 
rerouted, while the majority of the signal nets are left intact. The problem 
with these methods is that they require an on-board router to perform 
incremental reconfiguration, and the size and complexity of the routing tools 
make it impractical to have an on-board router for fault tolerant 
reconfiguration in remote systems. 
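Incremental routing can be sketched on a small grid. The sketch assumes that 
each net is a path of grid cells with one source and one sink, that nets do 
not share cells, and that a breadth-first search is an acceptable stand-in 
for the router; none of this is taken from [EB98] and all names are 
hypothetical. 

```python
from collections import deque


def bfs_route(start, goal, blocked, rows, cols):
    """Shortest path from start to goal that avoids blocked cells."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None


def incremental_reroute(nets, faulty, rows, cols):
    """Rip up and reroute only the nets that use the faulty cell."""
    result = dict(nets)
    ripped = [name for name, path in nets.items() if faulty in path]
    for name in ripped:
        del result[name]                      # rip up the affected net
    for name in ripped:
        # Cells of all nets still in place, plus the fault, are off limits.
        blocked = {faulty} | {cell for p in result.values() for cell in p}
        new_path = bfs_route(nets[name][0], nets[name][-1], blocked, rows, cols)
        if new_path is None:
            raise RuntimeError("no route found for net " + name)
        result[name] = new_path
    return result


nets = {"a": [(0, 0), (0, 1), (0, 2)], "b": [(2, 0), (2, 1), (2, 2)]}
rerouted = incremental_reroute(nets, faulty=(0, 1), rows=3, cols=3)
```

Only net "a" is rerouted in this example; net "b" keeps its original path, 
which is what keeps the amount of reconfiguration small. 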
Dutt et al. [DST99] implemented a similar reconfiguration fault tolerance 
scheme with an incremental routing approach on a number of benchmark circuits. 
With the fault locations known a priori, the reconfiguration path from the 
faulty cell to the spare cell was based on network flow techniques. In case 
of unavailability of free routing resources for incremental routing, the nets 
occupying the required interconnect resources were moved to other routing 
resources to enable routing. The method also required an on-board router. 
Pre-compiled partial configurations 
The technique presented in [EC01] for incremental routing does not require 
a router at the time fault tolerant reconfiguration takes place. Using the 
proposed algorithm, all the incremental configuration files for programmable 
interconnect are pre-compiled. The result is small partial configuration files, 
requiring little storage and allowing quick download to the device. 
Another method of eliminating the router to implement logic fault tolerance 
was demonstrated by Lach et al. [LMSP98b, LMSP98a]. During the design 
phase, the design is partitioned into tiles. Each tile, containing several logic 
blocks and programmable interconnects, has some resources left spare. This 
enables the compilation of multiple configurations for each tile, each using
different resources to implement identical functionality, as shown in Figure 2.14.
If a resource is found to be faulty, a new tile configuration is loaded that avoids
the fault. The tile interfaces are fixed, and thus enable tile replacement. The
main drawback of such a scheme is the number of replacement configurations
needed, which grows as the square of the tile size. The technique is extended
in [LMSP99] to support interconnect faults, although the probability of success
of this technique on interconnect is limited by the number of inter-tile
interconnections.
Figure 2.14: Low overhead approach [LMSP99].
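The tile replacement step described above can be illustrated with a minimal sketch. This is not the implementation of [LMSP98b, LMSP98a]: the tile contents, the resource names and the function `pick_tile_configuration` are hypothetical, and each pre-compiled configuration is reduced to the set of resources it uses.

```python
def pick_tile_configuration(configurations, faulty_resources):
    """Return the name of the first pre-compiled configuration that uses
    none of the faulty resources, or None if every configuration is hit."""
    for name, used in configurations.items():
        if used.isdisjoint(faulty_resources):
            return name
    return None


# Hypothetical tile with four logic blocks; the design needs three of them,
# so each alternative configuration leaves a different block spare.
TILE_CONFIGS = {
    "spare_lb3": {"lb0", "lb1", "lb2"},
    "spare_lb2": {"lb0", "lb1", "lb3"},
    "spare_lb1": {"lb0", "lb2", "lb3"},
    "spare_lb0": {"lb1", "lb2", "lb3"},
}

chosen = pick_tile_configuration(TILE_CONFIGS, {"lb1"})
print(chosen)
```

Because the tile interface is fixed, only the selected tile configuration needs to be reloaded; no routing is performed at repair time.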
The main advantage of the pre-compiled configuration approach is the
minimization of system downtime, as the configurations are pre-generated.
However, achieving high fault coverage requires a large amount of storage for
the different configurations. With FPGAs continuing to increase in size, the
extra cost due to configuration storage overhead may be too high, making these
solutions unattractive.
A solution to this issue was proposed in [HM01]. Different configurations 
are generated by shifting some or all of the CLB columns in the initial configu-
rations along one direction in order to create similarities among configurations. 
Such similarities then result in higher configuration compression, and less stor-
age required. In addition, this technique does not require high precision fault 
location; the process can be replaced by a much faster gross fault location 
function, thus saving time and storage. 
More work on pre-compiled partial configurations was carried out by the
authors of [LT00]. The work presented is applied to cluster-based FPGAs, and
implements intra-cluster fault tolerance with the aim of simplifying CAD tech-
niques for fault tolerance implementation. The fault tolerance approach takes 
advantage of device routing architecture to quickly swap unused logic and 
routing resources in place of faulty ones within logic clusters. 
2.6.3 Static Fault Tolerance in Commercial Devices 
Altera has implemented redundant circuits in its programmable devices for
quite some time, claiming great yield benefits [Altb]. Although the circuit
designs are not available, there exist patents in Altera's name highlighting
solutions to implement redundancy in programmable devices [RMLP, NZJ, CLL+].
There is no way to know whether the circuits described in the patent
applications are indeed used in commercial devices; notwithstanding, the concepts
expressed are worth mentioning for a complete review of fault tolerant systems.
The schemes presented by Altera are based on spare row redundancy.
The device's hierarchical logic architecture favours this type of implementation
over that of island-style FPGAs. Having determined the type of redundancy
required, the designs concentrate on efficient swapping procedures. Two schemes
exist [RMLP, CLL+], the latter of which was filed for patent very recently,
a sign of continuing research in the field.
The earlier work [RMLP] is based on shifting the configuration data avoid-
ing the programming of the row where the fault is present; using a complex 
system of multiplexers in the configuration datapath, a row can be bypassed 
and its entire configuration data shifted to the row below it, as shown in Fig-
ure 2.15(a). This scheme is supported by a complex line segmentation [NZJ], 
whereby the connection of any row to the general routing network can be 
replicated by the row below it via the use of a single extra switch, as shown in 
Figure 2.15(b). 
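The row bypass can be pictured as a mapping from logical configuration rows to physical rows. The sketch below is only an illustration of that idea under simple assumptions (one spare row at the bottom of the array, at most one faulty row); the function `physical_row` is hypothetical and does not describe the multiplexer circuitry of [RMLP].

```python
def physical_row(logical_row, bad_row=None):
    """Map a logical row to a physical row. Rows above the faulty row are
    unchanged; the faulty row and all rows below it shift down by one, so
    the last logical row ends up in the spare row."""
    if bad_row is None or logical_row < bad_row:
        return logical_row
    return logical_row + 1


# Four logical rows on a device with five physical rows (row 4 is the spare).
mapping = [physical_row(r, bad_row=2) for r in range(4)]
print(mapping)
```

With no faulty row the mapping is the identity and the spare row is simply left unprogrammed.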
The latest patent [CLL+] depicts a more complex fault avoidance technique,
also based on multiplexers. The scheme, depicted in Figure 2.16, uses a
multiplexing network to re-route signals onto faultless portions of the device.
The multiplexers are programmed by once again modifying the configuration
datapath.
Figure 2.15: (a) datapath reconfiguration to avoid faulty row and (b) replicating
connections to the general routing matrix.
[Figure labels: standard configuration, configuration from previous row, standard
switch, fault tolerant switch, CLB row, configuration shift register, spare row.]
Figure 2.16: Routing signals to avoid faulty row.
[Three panels, each showing rows R to R+12: one with no faulty row, one with row
R+8 marked bad, and one with row R+3 marked bad.]
2.7 Dynamic Fault Tolerance 
Recently, ICs have become sensitive not only to radiation in space, but also 
to upsets at ground level, as a result of the continuous shrinking of fab-
rication technology for semiconductors. While dynamic fault tolerance is 
not the subject of this thesis, it is worth mentioning some of the most suc-
cessful works, as they can sometimes be used for both static and dynamic 
faults. The reprogrammability of FPGAs leads to high logic density in terms 
of SRAM memory cells, which are very sensitive to radiation and require 
protection in harsh environments. This has led researchers to explore so-
lutions to enable FPGAs to tolerate these single event upsets. Techniques 
based on Triple Modular Redundancy (TMR), Duplication with Compari-
son (DWC), and Concurrent Error Detection (CED) have been presented 
[LLD03, MSS+96, YM01, DP94, dLKNH+04]. 
Both major manufacturers [Alta, Xilb] offer aerospace-specific packaging,
aimed at reducing the impact of dynamic faults over the lifetime of the device.
Xilinx has provided its aerospace-specific FPGA to the successful Mars Rover
module [Xilc].
2.8 Summary 
The main focus of this thesis is yield analysis and static fault tolerance for 
interconnect in FPGA devices. The literature review has provided an overview 
of the techniques presented by researchers in the field. 
Yield analysis and prediction techniques have been presented; VLSI yield 
analysis is a well established practice, dating back to the early days of manu-
facturing process automation. Work specific to FPGA in this field is rare and 
is based on simplistic analysis, and therefore not well suited to analyze the 
yield of modern devices. 
Techniques to extensively test devices have been reviewed comprehensively. 
Testing techniques can be categorised into fault detection and fault diagnosis. 
The first category's aim is to obtain a pass/fail indication on whether the 
device exhibits a fault, while its location can be found through diagnosis. The 
latter is most used in conjunction with fault avoidance techniques. Testing is
often a time consuming procedure, hence the aim of most researchers
to minimize test time.
The last area of research presented in this thesis is that of fault avoidance.
Techniques exploiting the main properties of FPGAs have been reported. The
first, more obvious approach is that of hardware redundancy: by introducing
spare components in the device, it becomes possible to swap the faulty ones out.
Research often concentrates on optimizing the granularity of the redundancy as
well as the swapping procedure. The second approach to static fault tolerance
in FPGAs is that of bitstream manipulation: by modifying a bitstream to suit
the faulty device it becomes possible to avoid any component exhibiting a
fault. The review is summarized in Table 2.2.
Table 2.2: Summary of FPGA fault tolerance schemes.

Fault tolerance scheme                                         Hardware       CAD tools      Timing   Area     Multiple  Complexity
                                                               modifications  modifications  penalty  penalty  faults
Spare Row [HSN+93]                                             Yes            No             High     Low      Yes       Low
Reconfiguration Network [KI94b, KI94a]                         Yes            No             Low      High     No        High
Node Covering [HD96, HD98]                                     Yes            No             Low      High     No        Medium
Fine grain wire redundancy [YL05]                              Yes            No             High     High     Yes       Low
Fault tolerant Logic Block [BHKW00]                            Yes            No             High     High     Yes       High
Defect Maps [DU90, CAC+97, HW05]                               No             Yes            Low      Low      Yes       High
Per-fabric Routing [Mis04]                                     No             Yes            Low      Low      Yes       High
Slack neighborhood [ML96]                                      No             Yes            Low      Low      Yes       High
Roving STARs [ESSA00, AES01]                                   No             Yes            High     High     Yes       High
Array Shifting [EB97, DKI99]                                   Yes            Yes            High     High     No        High
JBits fault tolerance [SG01]                                   No             Yes            Low      Low      No        High
Pre-compiled partial configurations [LMSP98a, LMSP98b, LT00]   No             Yes            Low      High§    No        High
On-board router [EC01]                                         Yes            Yes            Low      Yes‡     No        High

§ Extra off-chip configuration memory is required.
‡ Extra silicon area required for the router circuitry.
Chapter 3 
Yield Analysis of FPGA 
Interconnect 
3.1 Introduction 
The area occupied by wiring channels and interconnect configuration circuits 
in an FPGA is significant, occupying up to 90 percent of the chip area [BFV92]. 
With current trends aiming to reduce the area occupied by wiring segments 
in the routing channels, wire width and wire spacing have been reduced. This 
has in turn led to more frequent wiring defects, such as breaks and
shorts, and hence to a decrease in manufacturing yield and fewer functioning
devices at fixed manufacturing costs.
The most important contributors to manufacturing yield loss are failures 
caused by local unintended product-process interactions [Sim01]. Device de-
fects can be divided into three different categories [FP92a]: 
• Gross defects such as wafer scratches, which will almost certainly cause 
a yield loss; 
• Parametric defects, which affect device performance and reliability 
and can in extreme cases affect yields; 
• Random defects, which are defined as any deviation from the original 
design, and will only affect yield under specific circumstances. Random 
defects could also affect device performance and reliability. 
Gross defects are often considered as one-offs and are thus impossible to 
predict. Parametric variations, on the other hand, are a common occurrence in 
modern manufacturing processes and lead to "speed binning": the manufactur-
ers can guarantee only up to a specific performance level. This phenomenon is 
perhaps best explained by looking at common personal computer CPUs. Iden-
tical devices are tagged to distinguish their operation at different clock speeds, 
and are often restrained to operate at a specific clock frequency by means of 
extra control circuitry. This has in turn driven enthusiasts to "overclock" their 
CPUs, i.e. force the microprocessor to operate at a higher clock speed in order 
to achieve better performance at lower costs [P4OC]. Modeling of parametric 
variations is a very broad area of study, and will not be explored in this study. 
Random defects can be further divided into two types: pinhole and spot 
defects. Pinhole defects occur in dielectric insulators; they are usually small 
and only account for a very small yield loss [Sta83]. Spot defects are due to 
extra or missing material in the active layers. Extra material often causes a 
short between two conducting paths, whereas missing material can cause an 
open circuit. Since spot defects account for most of the yield loss in metal 
layers, it is this aspect that has been examined and is reported in this chapter. 
The main aims of this study are: 
• to quantify the extent of the yield loss due to interconnect spot defects; 
• to estimate the yield of large FPGA devices for future technology using 
defect parameters from the SIA roadmap predictions [Sem04]; 
• to develop a yield analysis framework which can easily be utilized to 
explore the yield enhancements achieved by fault tolerance schemes. 
The work is based on well known yield prediction techniques, dependent on
device layout analysis. An interconnect model that represents the
photolithographic patterns present in the metal layers of an FPGA die is suggested.
From this model, the portion of the area susceptible to defects, known as the
critical area, is extracted and used to formulate a yield prediction model.
This chapter is structured as follows: Section 3.2 introduces the intercon-
nect layer model used in the analysis, Section 3.3 offers a detailed account of 
the critical area extraction, while Section 3.4 provides an overview of the yield 
analysis techniques used in this work. In Section 3.5 the results of the analysis 
are presented. Finally, Section 3.6 concludes the chapter and suggests areas for
future research on the subject.
Figure 3.1: Interconnect metal layer. The gaps between the metal lines account
for via connections between layers.
[Figure labels: M columns, M rows, switch matrices, L, Width (W), Height (H).]
3.2 Interconnect layer model 
FPGAs have, by nature, a regular, repeating structure. Their logic architecture 
is formed by an array of identical logic blocks and switch matrices. As a result, 
all the metal connections between logic blocks are also regularly shaped and 
distanced. 
FPGAs offer wires of various lengths to connect one logic block to another.
Interconnects that span single and multiple logic blocks and that have varying
degrees of connectivity are manufactured to offer the highest flexibility
during routing without affecting performance. Such a hierarchical interconnect
structure is typical of the latest generation of FPGA devices, such as the most
advanced Altera [Alt04] and Xilinx [Xil05b] chips. An interconnect metal layer
can therefore be modelled as a collection of lines of similar length, grouped in 
channels, leading from one logic block to another. A model of a possible metal 
layer design is shown in Figure 3.1; this diagram only shows interconnects that 
span a single logic block. 
The parameters used to define any given metal layer are defined below: 
• M - width and height of CLB array in the FPGA. The device is assumed 
to be a square array of MxM CLBs. 
• lines - Number of interconnects in a wiring channel. 
• L - Length of line. This measure differs depending on the type of line 
manufactured on the metal layer. 
• w - width of the conducting path. 
• s - space between conducting paths. 
The parameter M is extracted for current technology from the largest Xil-
inx devices [Xilb]. Predictions on array size for future technology nodes are 
formulated by following FPGA technology trends, as shown in Figure 3.2. Es-
timates on the maximum array size are made by extrapolation, assuming that 
maximum die size is constant for all technology nodes. 
It is assumed that the width and the space between paths have identical 
size. The size of each parameter can then be found by halving the wire pitch 
value, which is a technology-dependent quantity. Short lines are manufactured 
at the lower layers, whereas the higher layers will host the longer, global lines. 
All inter-layer patterns (vias, contacts) are assumed to be contained in the 
areas above the switch matrices. It is therefore possible to model all lines as 
straight, parallel patterns, equally spaced. The patterns are only broken over
the switch matrices, which are regularly arranged in the FPGA logic array.
Figure 3.2: Exponential trend existing between minimum feature size and
FPGA array size.
[Plot of array size against minimum feature size, with data points for existing
Xilinx devices and predicted points for the 45nm and 22nm nodes.]
With regards to the metal layers, it is assumed that the silicon space is 
used to a maximum: area not occupied by the metal lines may be occupied by 
vias and contacts, but no free area is left on the silicon. 
3.2.1 SIA predictions 
This section provides a brief summary of the relevant predictions made by the
SIA regarding interconnect dimensions [Sem04]. Table 3.1 provides a list of the
dimensions relevant to this study for the technology nodes taken into account.
Note that some of the manufacturable solutions are not known. This in 
particular applies to the 22nm technology node. For most of the other pre-
dictions, solutions are known and are already being tested in large volumes. 
For those parameters which do not have a manufacturable solution, the SIA 
roadmap offers an indication of the dimensions likely to be obtained. 
Table 3.1: SIA roadmap for interconnects [Sem04].

Year                                                  2004   2010   2016
Technology node                                       hp90   hp45   hp22
Number of metal levels                                10     12     14
Metal 1 wiring pitch (nm)                             214    108    54
Intermediate wiring pitch (nm)                        275    135    65
Minimum global wiring pitch (nm)                      410    205    100
Cluster parameter                                     2      2      2
Critical defect size (nm)                             45     23     11
Overall electrical defect density D0 (faults/m^2)     2210   2210   2210
Random D0 (faults/m^2)                                1395   1395   1395
Predicted array size (M)                              160    300    550
Figure 3.3 shows a typical cross section of a modern device [Sem04], high-
lighting the increasing interconnect dimensions as more metal layers are man-
ufactured. 
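As a small worked example, the wire width w and spacing s used by the model can be derived from the pitches in Table 3.1 under the assumption stated in Section 3.2 that width and space are equal, so that each is half the wiring pitch. The dictionary layout and the function name `width_and_space` are illustrative choices only.

```python
# Wiring pitches in nm, copied from Table 3.1.
PITCH_NM = {
    "hp90": {"metal1": 214, "intermediate": 275, "global": 410},
    "hp45": {"metal1": 108, "intermediate": 135, "global": 205},
    "hp22": {"metal1": 54, "intermediate": 65, "global": 100},
}


def width_and_space(pitch_nm):
    """Return (w, s) in nm, assuming w == s == pitch / 2."""
    half = pitch_nm / 2.0
    return half, half


for node, layers in PITCH_NM.items():
    for layer, pitch in layers.items():
        w, s = width_and_space(pitch)
        print(f"{node:5s} {layer:12s} w = {w:6.1f} nm  s = {s:6.1f} nm")
```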
3.3 Critical Area 
The critical area of a lithographic pattern is defined as the portion of the total 
chip area within which the occurrence of a defect results in a fault [FP92a]. 
More specifically, a defect of size x will only cause a fault if its center falls in 
a particular section of the chip, as shown in Figure 3.4. Figure 3.4 shows how 
defects of equal size may or may not cause a catastrophic fault, depending on 
where their center falls. 
The critical area is defined in (3.1), where $A_{Total}$ is the total die area, x is
the defect size, K(x) is the fault probability kernel, and S(x) is the defect size
distribution.

\[ A_c = A_{Total} \int_0^{\infty} K(x)\, S(x)\, dx \qquad (3.1) \]

The integral term is sometimes referred to as θ(x), the fault probability.
Figure 3.5 shows a graphical representation of these functions for a single
metal pattern susceptible to missing material defects. The fault probability
kernel shows how the portion of the defect-sensitive chip area varies as the
defect size varies.

Figure 3.3: Typical device cross section [Sem04].
[Figure labels: Global (up to 5), Intermediate (up to 8), Metal 1, Metal 1 pitch.]
Figure 3.4: Catastrophic faults relative to size. Similar sized defects may only
cause a fault if a pattern is broken or two patterns joined.
[Figure labels: missing material, extra material, non catastrophic, catastrophic.]
Figure 3.5: Fault probability kernel K(x), defect size distribution S(x), and
fault probability θ(x). Note that the majority of defects have size similar to
the minimum feature size x0.
The critical area for extra material defects for two parallel metal
interconnects is shown in Figure 3.6(a) as the area within the dashed rectangle. The
total area is thus L × j, where j = 2w + s − 2y and y is the distance of the center
of the defect from the lower edge of the metal pattern. Details on how to
calculate the value of y are given in Section 3.3.1.
The critical area for missing material defects for a single metal interconnect
is depicted in Figure 3.6(b) as the area sandwiched by the two dotted lines.
The total area is thus L(w + 2h), where h = x/2 − (w − d) and d is defined as the
minimum strip of metal needed in order to guarantee conduction. Details on
how to calculate the value of d are given in Section 3.3.2. The fault probability
kernel K(x) for this function is shown in Figure 3.5.
Figure 3.6: Parameters in the fault probability kernel K(x). L, w and d are
architectural parameters, while h is dependent on x, the size of the defect.
[Panels: (a) extra material, (b) missing material.]
There has been much discussion regarding the defect size distribution
[SISM99]. It is now widely accepted that S(x) should increase linearly to
a maximum value until the defect size reaches the minimum feature size x0
(see Figure 3.5), and then fall away from the maximum value as a function that is
inversely proportional to the cube of the defect size [FP92a]. Defects of size
smaller than x0, mainly due to process-related dirt, are not considered, as they
will not result in a catastrophic fault.
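The shape just described can be written down explicitly. The sketch below assumes one common normalization, S(x) = x/x0² for x ≤ x0 and S(x) = x0²/x³ for x > x0, chosen so that the two branches meet at x0 and the distribution integrates to one; the text fixes only the linear rise and the inverse-cube fall, so the constants, the function names and the example value of x0 are assumptions.

```python
def defect_size_pdf(x, x0):
    """Assumed S(x): linear rise up to x0, inverse-cube decay beyond it."""
    if x <= 0.0:
        return 0.0
    if x <= x0:
        return x / x0**2
    return x0**2 / x**3


def fraction_larger_than(x, x0):
    """Closed-form fraction of defects larger than x under the same S(x)."""
    if x <= 0.0:
        return 1.0
    if x <= x0:
        return 1.0 - x**2 / (2.0 * x0**2)
    return x0**2 / (2.0 * x**2)


X0_NM = 45.0  # illustrative value only
for size in (0.5 * X0_NM, X0_NM, 2 * X0_NM, 10 * X0_NM):
    print(size, defect_size_pdf(size, X0_NM), fraction_larger_than(size, X0_NM))
```

Under this normalization half of all defects are larger than x0, and the fraction larger than x falls off as 1/x², which is why large defects contribute little despite their large critical area.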
3.3.1 Parametric failures 
Defects on metal layers do not only cause catastrophic failures: often they
affect the parametric characteristics of the interconnects instead. If, however,
the extra delay caused by the defect is too high compared to the delay of
defect-free lines, the fault is treated as catastrophic. The following analysis
illustrates how to extract the sensitivity of metal lines to defects which do
not cause catastrophic failures, but affect the yield.
Figure 3.7: Delay of non-uniform wires.
[Figure labels: w + s, f(q), L.]
Consider the line arrangement shown in Figure 3.7, of a uniform wire of
length L and width w neighbouring a general non-uniform wire of length L, a
distance s + w away from the uniform wire and with width described by a function
f(q). The total capacitance of these lines is the sum of three terms [Gar79],
shown in (3.2), where $C_p$ is the parallel plate capacitance, $C_f$ is the fringe
capacitance, and $C_c$ is the coupling capacitance, also known as the line to line
capacitance.

\[ C_{Total} = C_p + C_f + C_c \qquad (3.2) \]

In the case of minimally spaced wires, as is the case in FPGA interconnects,
the coupling capacitance can be approximated by (3.3), where $\epsilon_{ox}$ is the electric
permittivity coefficient of the insulator, and t is the thickness of the wire.

\[ C_c = 2 \epsilon_{ox} \cdot t / s \qquad (3.3) \]
The non-uniform wire in Figure 3.7 is partitioned into n equal segments,
each of length $\delta q = L/n$, and approximated as an RC distributed network [GW99].
Let segment i be labelled as $q_i$. For segment $q_i$, there exists a coupling
capacitance between the segment and the neighbouring wire. The coupling
capacitance of segment $q_i$, labelled $C_{c,i}$, is given by (3.4), where $c_c$ is the
coupling capacitance per unit length.

\[ C_{c,i} = \frac{c_c\, \delta q}{w + s - f(q_i)} \qquad (3.4) \]

The total capacitance of segment i is then given by (3.5), where $c_0$ is the
parallel plate capacitance per unit length and $c_f$ is the fringing capacitance
per unit length.

\[ C_i = \left( c_0 f(q_i) + c_f + \frac{c_c}{s + w - f(q_i)} \right) \delta q \qquad (3.5) \]

The total segment resistance is given by (3.6), where $r_0$ is the resistance
per unit length.

\[ R_i = \frac{r_0\, \delta q}{f(q_i)} \qquad (3.6) \]
The timing analysis for the non-uniform wire is then performed using the
Elmore delay model [Elm48]. Under the Elmore delay model, the delay
through a segment is given by the segment's resistance multiplied by the
segment's downstream capacitance. The delay $T_{D,i}$ of segment i is thus given by
(3.7).

\[ T_{D,i} = R_i \times \sum_{k=0}^{i-1} C_k \qquad (3.7) \]

And the delay through the non-uniform wire is given by (3.8).

\[ T_D = \sum_{i=0}^{n-1} \left\{ R_i \times \sum_{k=0}^{i-1} C_k \right\} \qquad (3.8) \]

As $\delta q \to 0$, the closed form expression of (3.8) becomes (3.9).

\[ T_D = r_0 \int_0^L \frac{1}{f(q)} \int_0^q \left( c_0 f(u) + c_f + \frac{c_c}{w + s - f(u)} \right) du\, dq \qquad (3.9) \]

(3.9) can then be used to extract the extra delay caused by the defect. The
extra allowable delay is an input parameter of the yield analysis framework.
Expressions for the function f(q) for extra material defects $f_{ex}(q)$ and missing
material defects $f_{mis}(q)$ are given in (3.10) and (3.11) respectively.

\[ f_{ex}(q) = \begin{cases} w & 0 < q < z - x/2 \\ y + x/2 & z - x/2 < q < z + x/2 \\ w & z + x/2 < q < L \end{cases} \qquad (3.10) \]

\[ f_{mis}(q) = \begin{cases} w & 0 < q < z - x/2 \\ d & z - x/2 < q < z + x/2 \\ w & z + x/2 < q < L \end{cases} \qquad (3.11) \]

In order to calculate the critical area of wires the parameters y and d
from (3.10) and (3.11) are extracted and included in the kernel expression as
explained in Section 3.3.
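The discrete sums of the Elmore analysis lend themselves to a short numerical sketch. The code below assumes per-segment values $R_i = r_0\,\delta q / f(q_i)$ and $C_i = (c_0 f(q_i) + c_f + c_c/(s + w - f(q_i)))\,\delta q$, with f evaluated at the segment midpoint, and accumulates $R_i$ times the capacitance of segments 0 to i−1. All numerical values are arbitrary unit-free placeholders, the helper names are hypothetical, and the factor-of-two threshold is the catastrophic-fault assumption listed in Section 3.4.1.

```python
def elmore_delay(width_at, length, n, w, s, r0, c0, cf, cc):
    """Sum over segments of R_i times the capacitance of segments 0..i-1."""
    dq = length / n
    delay = 0.0
    cap_before = 0.0  # running sum of C_k for k < i
    for i in range(n):
        f = width_at((i + 0.5) * dq)  # wire width at the segment midpoint
        r_i = r0 * dq / f
        c_i = (c0 * f + cf + cc / (s + w - f)) * dq
        delay += r_i * cap_before
        cap_before += c_i
    return delay


def defect_profile(w, z, x, defect_width):
    """Width w everywhere except over a defect of size x centred at z."""
    def width_at(q):
        return defect_width if z - x / 2 < q < z + x / 2 else w
    return width_at


# Arbitrary illustrative values (no physical units implied).
W_LINE, SPACE, LENGTH, SEGMENTS = 1.0, 1.0, 100.0, 2000
R0 = C0 = CF = CC = 1.0

nominal = elmore_delay(lambda q: W_LINE, LENGTH, SEGMENTS,
                       W_LINE, SPACE, R0, C0, CF, CC)
narrowed = elmore_delay(defect_profile(W_LINE, 50.0, 2.0, 0.2), LENGTH,
                        SEGMENTS, W_LINE, SPACE, R0, C0, CF, CC)

# Parametric fault treated as catastrophic above twice the designed delay.
is_catastrophic = narrowed > 2.0 * nominal
print(nominal, narrowed, is_catastrophic)
```

For a uniform wire the sum approaches $r_0 c L^2 / (2w)$ with $c = c_0 w + c_f + c_c/s$ as n grows, which gives a quick sanity check on the discretization.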
3.3.2 Critical area of FPGA interconnect layers 
Calculating the critical area for the interconnect model presented in Section 3.2 
is a relatively trivial task, since the interconnect layer is modelled as a collection 
of identical and parallel lines. 
For extra material defects, the continuous spectrum of defect size is divided
into five main regions, four of which are shown in Figure 3.8. The region not
shown is that of defects of size less than 2(y − w), which do not cause a fault.
In region (a) the critical area between two metal lines is split evenly into two
regions, each growing in both directions. The size of the gap between the two
separate regions is directly proportional to the size of y. In region (b) the two
separate regions meet and continue to grow in both directions over the metal
lines, leading to region (c), where the critical areas from adjacent metal lines
meet and the critical area proceeds to grow in one direction only. Finally,
in region (d) the whole layer becomes susceptible to faults. The final kernel
equation is shown in (3.12).
For missing material defects, the continuous spectrum of defect size is
divided into four main groups, as shown in Figure 3.9. In region (a), defects are
smaller than w − d and hence do not cause a catastrophic fault. In region (b)
the critical area around each metal line grows in both directions according to
the analysis presented in Section 3.3, leading to region (c), where the critical
areas from adjacent metal lines meet and the critical area proceeds to grow in
one direction only. Finally, in region (d) the whole layer is susceptible to faults.
The final kernel equation is shown in (3.13).
Figure 3.8: Critical area of more complex structures susceptible to extra
material defects.
[Panels: (a) 2(y − w) < x < s; (b) s < x < s + w/2; (c) s + w/2 < x < W/M − L;
(d) x > W/M − L. Legend: interconnect, critical area, increasing critical area.]
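To make the four regions for missing material defects concrete, the sketch below encodes a piecewise kernel with the region boundaries w − d, s + 2(w − d) and W/M − L, and integrates it against a defect size distribution to obtain the fault probability and the critical area. It is a sketch under stated assumptions, not the thesis implementation: the kernel is normalized by a total area taken as H × W, the defect size distribution uses an assumed normalization, and the values of L, d, x0 and the number of lines per channel are hypothetical (the channel is assumed to be fully packed with lines).

```python
import math


def defect_size_pdf(x, x0):
    """Assumed S(x): x/x0^2 up to x0, x0^2/x^3 beyond (integrates to 1)."""
    if x <= 0.0:
        return 0.0
    return x / x0**2 if x <= x0 else x0**2 / x**3


def k_open(x, M, lines, L, w, s, d, H, W):
    """Piecewise kernel for missing material defects, normalized by H * W."""
    gap = W / M - L  # break in the lines over a switch matrix
    if x < w - d:
        return 0.0
    if x < s + 2 * (w - d):
        area = M**2 * lines * L * (x - (w - 2 * d))
    elif x < gap:
        area = H * M * (L + math.sqrt(x**2 - (w - d)**2))
    else:
        area = H * W
    return area / (H * W)


def fault_probability(kernel, x0, x_max, steps=20000):
    """Midpoint-rule integral of K(x) S(x) on [0, x_max], plus the analytic
    tail x0^2 / (2 x_max^2), valid when K(x) = 1 for all x >= x_max."""
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += kernel(x) * defect_size_pdf(x, x0) * dx
    return total + x0**2 / (2.0 * x_max**2)


# Illustrative hp90-like values in nm; L, d and X0 are hypothetical.
DIE = 38.1e6                    # 1.5in die side
ARRAY, WIDTH, SPACE, MIN_STRIP = 160, 107.0, 107.0, 50.0
TILE = DIE / ARRAY
example = dict(M=ARRAY, lines=int(TILE / (WIDTH + SPACE)), L=0.9 * TILE,
               w=WIDTH, s=SPACE, d=MIN_STRIP, H=DIE, W=DIE)
X0 = 45.0

theta = fault_probability(lambda x: k_open(x, **example), X0,
                          x_max=TILE - example["L"])
critical_area = DIE * DIE * theta
print(theta, critical_area)
```

With a fully packed channel the second and third branches nearly coincide at x = s + 2(w − d), so the kernel grows without a large jump between regions (b) and (c).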
3.4 Yield Analysis 
For a non-constant defect density, the probability of finding n defects in a chip
of critical area $A_c$, assuming a defect density D, is given by (3.14), where
f(D) is known as the spatial defect distribution.

\[ P(n, A_c, D) = \int_0^{\infty} f(D)\, \frac{(A_c D)^n e^{-A_c D}}{n!}\, dD \qquad (3.14) \]

Many studies have been carried out to find out the function best describing
the spatial defect distribution [C093, Mur64, See68, Sta91]. Some of the most
common are shown in Table 3.2.
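\[ K_{short}(x) = \begin{cases}
0 & x < 2(y - w) \\
2 \times M^2 \times lines \times L \times (w - y + x/2) & 2(y - w) < x < s \\
M^2 \times lines \times L \times \sqrt{x^2 - (y - s)^2} & s < x < s + w/2 \\
H \times M \times L' & s + w/2 < x < W/M - L \\
H \times W & x > W/M - L
\end{cases} \]
where $L' = L + \sqrt{x^2 - (s)^2}$. (3.12)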
Figure 3.9: Critical area of more complex structures susceptible to missing
material defects.
[Panels: (a) 0 < x < w − d; (b) w − d < x < s + 2(w − d); (c) s + 2(w − d) < x < W/M − L;
(d) x > W/M − L. Legend: interconnect, critical area, increasing critical area.]
Table 3.2: Spatial defect distribution f(D).
(The Function column of the original table contains a sketch of f(D) against D for each model.)

Model                   Yield equation
Murphy (Triangular)     $Y = \left( \frac{1 - e^{-A_c D}}{A_c D} \right)^2$
Murphy (Rectangular)    $Y = \frac{1 - e^{-2 A_c D}}{2 A_c D}$
Poisson                 $Y = e^{-A_c D}$
Seeds                   $Y = \frac{1}{1 + A_c D}$
Gamma                   $Y = \left( 1 + \frac{A_c D}{\alpha} \right)^{-\alpha}$
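The yield equations associated with the spatial defect distributions can be compared numerically. The sketch below evaluates the Poisson, Murphy, Seeds and Gamma yield expressions as functions of the product a = A_c D (the expected number of faults per die); the function names are illustrative and the Murphy forms assume a > 0.

```python
import math


def yield_poisson(a):
    return math.exp(-a)


def yield_murphy_triangular(a):
    return ((1.0 - math.exp(-a)) / a) ** 2  # requires a > 0


def yield_murphy_rectangular(a):
    return (1.0 - math.exp(-2.0 * a)) / (2.0 * a)  # requires a > 0


def yield_seeds(a):
    return 1.0 / (1.0 + a)


def yield_gamma(a, alpha):
    return (1.0 + a / alpha) ** (-alpha)


for a in (0.1, 0.5, 1.0, 2.0):
    print(a, yield_poisson(a), yield_murphy_triangular(a),
          yield_murphy_rectangular(a), yield_seeds(a), yield_gamma(a, 2.0))
```

The Gamma form spans the other models: with a clustering parameter of 1 it coincides with the Seeds expression, and as the clustering parameter grows it tends to the Poisson expression.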
\[ K_{open}(x) = \frac{1}{A_{Total}} \begin{cases}
0 & x < w - d \\
M^2 \times lines \times L \times (x - (w - 2d)) & w - d < x < s + 2(w - d) \\
H \times M \times L' & s + 2(w - d) < x < W/M - L \\
H \times W & x > W/M - L
\end{cases} \]
where $L' = L + \sqrt{x^2 - (w - d)^2}$. (3.13)
It is now widely accepted that the most suitable function is the Gamma
function [FP92a], as shown in (3.15), where $\alpha$ and $D_0$ are known as the clustering
parameter and average defect density respectively.

\[ f(D) = \frac{1}{\Gamma(\alpha)\, \beta^{\alpha}} D^{\alpha - 1} e^{-D/\beta}, \qquad \alpha = \frac{D_0^2}{var(D)}, \qquad \beta = \frac{var(D)}{D_0} \qquad (3.15) \]

Combining (3.14) and (3.15) results in (3.16).

\[ P(n, A_c, D) = \frac{\Gamma(\alpha + n)}{n!\, \Gamma(\alpha)} \, \frac{(A_c D_0 / \alpha)^n}{(1 + A_c D_0 / \alpha)^{n + \alpha}} \qquad (3.16) \]

As the yield is the probability of obtaining defect-free chips, the yield
prediction is made using (3.17).

\[ Y = P(0, A_c, D) = \frac{1}{(1 + A_c D_0 / \alpha)^{\alpha}} \qquad (3.17) \]
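The fault-count distribution obtained from the Gamma model is a negative binomial, and can be evaluated directly. The sketch below assumes the form $P(n) = \frac{\Gamma(\alpha+n)}{n!\,\Gamma(\alpha)} \frac{(A_c D_0/\alpha)^n}{(1 + A_c D_0/\alpha)^{n+\alpha}}$ and works in log space to avoid overflow; the function names and the example load A_c D_0 = 1 are hypothetical, and `yield_tolerating` additionally assumes that every fault up to the stated count can be repaired.

```python
import math


def prob_n_faults(n, ac_d0, alpha):
    """Probability of exactly n faults; ac_d0 = A_c * D_0 must be > 0."""
    r = ac_d0 / alpha
    log_p = (math.lgamma(alpha + n) - math.lgamma(n + 1) - math.lgamma(alpha)
             + n * math.log(r) - (n + alpha) * math.log1p(r))
    return math.exp(log_p)


def yield_tolerating(max_faults, ac_d0, alpha):
    """Fraction of dies with at most max_faults faults (all assumed repairable)."""
    return sum(prob_n_faults(n, ac_d0, alpha) for n in range(max_faults + 1))


AC_D0, ALPHA = 1.0, 2.0  # hypothetical load; cluster parameter from Table 3.1
for k in range(4):
    print(k, prob_n_faults(k, AC_D0, ALPHA), yield_tolerating(k, AC_D0, ALPHA))
```

Setting `max_faults` to zero recovers the defect-free yield, while larger values give an upper bound on what a fault tolerance scheme able to repair that many faults could achieve.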
3.4.1 Manufacturing technology assumptions 
In obtaining the results presented in Section 3.5, the following assumptions are 
made about the FPGA architecture. 
• The biggest device is 1.5in X 1.5in for all technology nodes. 
• All metal connections between logic blocks are regularly shaped (i.e. 
straight and parallel) and regularly distanced. 
• Halving the minimum feature size results in quadrupling the maximum 
array size. 
• Vias and contacts are only present in the regions directly above the switch 
matrices. 
• Lower metal layers are used for shorter, faster lines. 
• Any parametric variation greater than double the designed value is con-
sidered a catastrophic fault. 
• Heterogeneous FPGAs are assumed to have similar interconnect struc-
ture to homogeneous FPGAs. 
These assumptions follow on from the fact that FPGAs are extremely reg-
ular structures and this regularity should reflect on the layout of the metal 
layer. 
Furthermore, the following assumptions are made regarding the manufac-
turing process: 
• Defects are not closely packed; this follows from the SIA prediction of a
cluster parameter of 2.
• Defects are square, with side x. 
• Defect size distribution follows an inverse power law shape 1/x3. 
• The random average defect density remains constant for all technology 
nodes (from SIA roadmap). 
• The defect density is not constant over the whole wafer and follows a 
gamma distribution. 
• Width of lines and space between lines have identical size as given by 
the metal line pitch in the SIA roadmap. 
• The die yield is the product of the individual layer yields. 
• Wafers are 12in in diameter. 
Apart from the last two assumptions, all of these assumptions are impossi-
ble to verify unless access to manufacturing data is granted. However they all 
follow on from results of previous published studies [MMR99, SISM99], which 
support the reasoning behind the assumptions. 
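The assumption that the die yield is the product of the individual layer yields can be sketched as follows, using the Gamma yield expression $(1 + A_c D_0/\alpha)^{-\alpha}$ for each layer. The per-layer values of A_c D_0 in the example are hypothetical placeholders, not results of this work, and the function names are illustrative.

```python
import math


def layer_yield(ac_d0, alpha=2.0):
    """Gamma-model yield of one metal layer; ac_d0 = A_c * D_0 for that layer."""
    return (1.0 + ac_d0 / alpha) ** (-alpha)


def die_yield(layer_loads, alpha=2.0):
    """Die yield as the product of the individual layer yields."""
    return math.prod(layer_yield(load, alpha) for load in layer_loads)


# Hypothetical A_c * D_0 values: one metal 1 layer, two intermediate, two global.
example_loads = [0.20, 0.10, 0.10, 0.03, 0.03]
print([round(layer_yield(v), 4) for v in example_loads], die_yield(example_loads))
```

Since every factor is at most one, the die yield can never exceed the yield of its worst layer, which is why the layer with the finest pitch dominates the overall loss.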
3.5 Results 
Using the proposed model and the information provided by the SIA roadmap, 
it is possible to analyze the predicted yield of interconnect layers with different 
characteristics. This allows the prediction of current and future yields, and the 
exploration of potential benefits of different fault tolerant schemes. 
3.5.1 Different metal layers on a single die 
The SIA roadmap classifies interconnect layers into three types: metal 1,
intermediate and global, each with different characteristics (Figure 3.3). Figure 3.10
shows the predicted yields when each of the three interconnect layers is considered
separately for the 90nm technology node. As can be seen, the metal 1 layer, which
is the lowest layer of interconnect, causes higher yield loss than the other
layers. The lowest metal layer is used for short, fast metal interconnects. Being
smaller and closer to the source and sink, these lines are the most used, and
are also the most susceptible to defects. It follows that a fault tolerant scheme
targeting these lines exclusively will increase the yield of a wafer
considerably. The maximum FPGA array size shown on the graph is taken to
be 160 × 160 (M = 160), which is extracted from the size of Xilinx's XC4V200 device
manufactured in this technology.
Figure 3.10: Yield comparison of different metal layers for the 90nm technology
node.
[Plot of yield against array size (M).]
3.5.2 More advanced technology nodes 
The continuous development of wafer fabrication techniques means that inter-
connect characteristics will keep on evolving in the future. Driven by the need 
for higher density, metal lines will get smaller, thinner and, as a consequence, 
more susceptible to spot defects. 
Figure 3.11 shows the predicted yield of the metal 1 layer for the technology 
nodes given in Table 3.1. The graph also highlights the maximum predicted 
array size (M) for each technology node. The yield loss is relatively well con-
tained for the 90nm node. However, drastic losses appear at 45nm and 22nm. 
The yield loss due to metal 1 alone for the maximum array size in 22nm is 
predicted to be over 80%. 
Figure 3.11: Metal 1 interconnect layer yield comparison for different
technology nodes. [Plot: yield versus array size (M) for the 22nm, 45nm and
90nm nodes, with the maximum predicted array size marked for each node.]
Figure 3.12 shows the total expected yield loss due to interconnect defects 
on all layers. Yield of around 40% for the biggest devices is estimated for 
the 90nm process technology. This value will certainly decrease if defects in 
the logic layers are considered. Predicted yield due to all interconnect defects 
for the 22nm node is close to 0%. Some form of fault tolerant scheme must 
therefore be introduced in order to produce any usable devices. 
Figure 3.12: Overall die yield due to interconnect defects. [Plot: yield
versus array size (M).]
While for the 90nm technology node most dies from a single wafer are expected
to exhibit at most one fault, the expected occurrences will vary considerably
as geometries are reduced for the 45nm and 22nm technology nodes.
Figure 3.13 depicts the expected occurrences of devices exhibiting multiple
faults. It can be observed that achieving high yields at the 90nm technology
node implies tolerating at most 2 faults; at 45nm, however, achieving yields
of around 80% requires a fault tolerance scheme able to support at least 4
faults. Finally, at the 22nm technology node, support for over 10 faults is
required in order to improve yields significantly.
Figure 3.13: Relative occurrence of defects in a die for current and future
technology nodes. [Chart: occurrence (%) versus number of faults per die, from
0 to 11+, with per-node and cumulative series for the 90nm, 45nm and 22nm
nodes.]
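The relationship between fault counts and the number of faults a scheme must
tolerate can be explored numerically. The following is a minimal sketch,
assuming that the number of faults per die follows the clustered (negative
binomial) defect model whose one-fault term appears as (4.2) in Chapter 4. The
function names and the numerical inputs are illustrative only and are not
taken from the data behind Figure 3.13.

```python
from math import exp, lgamma


def fault_probability(i, critical_area, defect_density, alpha):
    """Negative binomial probability of exactly i faults on a die."""
    ratio = critical_area * defect_density / alpha
    # Gamma(alpha + i) / (i! * Gamma(alpha)), evaluated through log-gamma.
    coefficient = exp(lgamma(alpha + i) - lgamma(alpha) - lgamma(i + 1))
    return coefficient * ratio ** i / (1.0 + ratio) ** (alpha + i)


def min_tolerable_faults(target_yield, critical_area, defect_density, alpha,
                         limit=1000):
    """Smallest n whose cumulative probability of 0..n faults reaches the target."""
    cumulative = 0.0
    for n in range(limit + 1):
        cumulative += fault_probability(n, critical_area, defect_density, alpha)
        if cumulative >= target_yield:
            return n
    return None  # target not reached within the search limit


if __name__ == "__main__":
    # Hypothetical inputs: an average of 1, 3 and 8 faults per die, alpha = 2.
    for mean_faults in (1.0, 3.0, 8.0):
        print(mean_faults, min_tolerable_faults(0.8, mean_faults, 1.0, 2.0))
```

As the average number of faults per die grows, the number of faults that must
be tolerated to reach a given yield grows with it, which is the trend the
figure illustrates.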
3.6 Summary 
This chapter has introduced some of the fundamentals of yield analysis. Using
well known techniques applied to an FPGA layout model, it has been possible
to analyze the effects of random spot defects on the wafer yield of FPGA
devices. The model suggests that, using current technology, over 50% of large
devices from a wafer are lost due to interconnect faults.
Using predictions formulated by the SIA roadmap it has further been possible
to estimate future wafer yields as technology advances to smaller geometries.
Predictions estimate that it will be impossible for FPGA manufacturers to
continue the current trend of increasing array size at every technology node.
It is estimated that even at 45nm, yields of large devices will be close to 0.
These results have not been verified against actual manufacturing data, as
these are impossible to obtain. Dropping yields are, however, a known problem
in the semiconductor industry [RA04], and this work tries to offer a first
step towards dealing with the problem in the FPGA industry. The results
suggest that FPGA yields will become a serious concern, and that action needs
to be taken to ensure the continued advancement of FPGA devices.
Fault tolerance techniques exist which can ensure that a device can function 
under the presence of a fault. The next chapter explores refinements which 
can be applied to the yield analysis framework presented in order to evaluate 
the benefits of such fault tolerance schemes. 
Chapter 4 
Exploiting FPGA properties to 
improve wafer yields 
4.1 Introduction 
The analysis presented in Chapter 3 estimates that the yield of large FPGA 
devices is set to drop significantly as device geometries are reduced. Even at 
90nm a large FPGA can lose up to 60% of its wafer yield due to interconnect 
faults alone. 
In Chapter 2 an overview of the techniques available to improve yield has
been offered; very few, however, quantify the extent of the improvements, and
those which do offer a numerical estimate use very simple yield models which
often do not reflect real manufacturing scenarios. The work presented in this
chapter builds on the foundations of the yield analysis framework presented in
Chapter 3 to provide a comparison of the yield enhancements offered by some
well known fault tolerance schemes.
The aims of the work presented in this chapter are:
• to provide a yield analysis framework to easily evaluate the benefits of
fault tolerance schemes on wafer yields; 
• to evaluate the yield benefits of some well known fault tolerance schemes; 
• to understand the inherent benefits of the only fault tolerance scheme 
known to be utilized in commercial FPGAs [Altb]; 
• to explore the yield benefits of using device reconfiguration for fault 
tolerance. 
This chapter is organized as follows: Sections 4.2 and 4.3 provide an account
of the extensions to the yield analysis framework necessary to study yield 
enhancements. Section 4.4 introduces the fault tolerance schemes chosen for 
the comparison, while the results of the comparison are presented in Section 
4.5. Section 4.6 presents a detailed analysis of the only known commercial fault 
tolerance scheme. Section 4.7 further extends the yield analysis framework by 
looking at the total number of working dies from a single wafer, and Section 4.8 
offers a brief analysis of fault tolerant reconfiguration compared to hardware 
redundancy. Finally, Section 4.9 summarizes the chapter. 
4.2 Yield of chips with redundancy 
For a chip able to function under the presence of functional faults, called an 
n-tolerant chip, where n is the maximum number of tolerable faults, the total 
yield will be made up of chips exhibiting 0, 1, 2, 3..., n faults. The total yield 
for an n-tolerant chip is given by (4.1): 
\[ Y_{n\text{-tolerant}} = \sum_{i=0}^{n} P(i, A_c, D). \tag{4.1} \]
If we assume that the majority of chips will have either 0 or 1 fault, the
probability of a chip exhibiting exactly one fatal fault is given by (4.2):
\[ P(1, A_c, D) = \frac{A_c D_0}{(1 + A_c D_0/\alpha)^{\alpha+1}}. \tag{4.2} \]
And the yield of a chip able to function in the presence of one fault is given
by (4.3):
\[ Y_{1\text{-tolerant}} = \frac{1}{(1 + A_c D_0/\alpha)^{\alpha}}
   \left[ 1 + \frac{\alpha A_c D_0}{\alpha + A_c D_0} \right]. \tag{4.3} \]
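As a check on (4.3), the 1-tolerant yield can be written out as the sum of the
zero-fault and one-fault terms. The short derivation below assumes that the
zero-fault term comes from the same clustered defect model that gives (4.2);
it is a sketch of the algebra only and adds nothing to the model.

```latex
\begin{align*}
P(0, A_c, D) &= \frac{1}{(1 + A_c D_0/\alpha)^{\alpha}}, \\
Y_{1\text{-tolerant}} &= P(0, A_c, D) + P(1, A_c, D) \\
 &= \frac{1}{(1 + A_c D_0/\alpha)^{\alpha}}
    \left[ 1 + \frac{A_c D_0}{1 + A_c D_0/\alpha} \right] \\
 &= \frac{1}{(1 + A_c D_0/\alpha)^{\alpha}}
    \left[ 1 + \frac{\alpha A_c D_0}{\alpha + A_c D_0} \right].
\end{align*}
```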
4.3 Repairable and non-repairable areas 
A typical FPGA usually has a large number of identical array elements that
serve to implement logic functions (CLBs), and a small amount of programming
and peripheral circuits (IOBs). If either the peripheral or the programming
circuits fail, the entire chip will fail. If, however, an array element fails,
the fault could be tolerated using fault tolerance techniques.
It is therefore important, when analyzing the yield improvements of a fault 
tolerance scheme, to consider the proportion of repairable areas and non-
repairable ones. The total area of the device is given by the sum of the array 
area, and the non-repairable areas, such as IOBs. 
The total yield is given by the product of the individual yields, calculated 
using the respective critical areas of the repairable (A0) and non-repairable 
regions (Anr ). The yield of a fault tolerant device will be the product of 
the defect free non-repairable area and the fault-tolerant repairable region, as 
shown in (4.4). 
\[ Y_{n\text{-tolerant}} = P(0, A_{nr}, D) \times \sum_{i=0}^{n} P(i, A_0, D). \tag{4.4} \]
The amount of non-repairable area is also a variable in our model. For the
purpose of our analysis the non-repairable regions of the FPGA are assumed to
occupy 10% of the total die area. Furthermore, it is assumed that those
regions are manufactured using larger geometries; the yields are calculated
accordingly.
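The product described above, a defect-free non-repairable region multiplied by
a repairable region with at most n faults, can be evaluated numerically. This
is a minimal sketch, assuming the clustered defect model of Section 4.2 and
treating the two regions as independent. The function names, the separate
defect densities for the two regions and the example values are hypothetical;
they only illustrate a 90% / 10% split of the critical area.

```python
from math import exp, lgamma


def fault_probability(i, critical_area, defect_density, alpha):
    """Negative binomial probability of exactly i faults in a region."""
    ratio = critical_area * defect_density / alpha
    coefficient = exp(lgamma(alpha + i) - lgamma(alpha) - lgamma(i + 1))
    return coefficient * ratio ** i / (1.0 + ratio) ** (alpha + i)


def tolerant_device_yield(n, area_repairable, area_non_repairable,
                          density_repairable, density_non_repairable, alpha):
    """Yield of a device whose repairable region tolerates up to n faults."""
    # The non-repairable region must be entirely free of faults.
    non_repairable_ok = fault_probability(
        0, area_non_repairable, density_non_repairable, alpha)
    # The repairable region may contain 0, 1, ..., n faults.
    repairable_ok = sum(
        fault_probability(i, area_repairable, density_repairable, alpha)
        for i in range(n + 1))
    return non_repairable_ok * repairable_ok


if __name__ == "__main__":
    # Hypothetical example: a lower defect density is assumed for the
    # larger-geometry non-repairable region.
    for n in range(3):
        print(n, round(tolerant_device_yield(n, 0.9, 0.1, 1.0, 0.5, 2.0), 4))
```

However many faults the repairable region tolerates, the yield in this sketch
can never exceed the zero-fault probability of the non-repairable region.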
4.4 Yield improvement schemes 
For the purpose of yield enhancement, some of the best known fault tolerance
schemes for FPGAs have been analyzed. This section provides a brief overview
of the schemes chosen. The schemes have been studied for application to a
90nm process and to tolerate only a single fault. While this scenario is
perhaps conservative, the schemes taken into consideration were only intended
to deal with such an eventuality. The yield analysis presented therefore
demonstrates the validity of those schemes only under these specific
circumstances and highlights the need for a fault tolerance scheme that
improves on the shortfalls of the techniques presented.
4.4.1 Redundant row 
Hardware redundancy for FPGAs was first proposed by Hatori et al [HSN+93]. 
The authors proposed a method to introduce a spare row or column in the array 
without affecting the device performance. The swap, to be performed at the 
factory, would eliminate the whole row where the defect is present, by means 
of blowing a fuse. The main contribution of this work was the placement of 
the row selectors after the row decoders. The fault tolerance would then only 
require changes to the row selectors. The routing segments are also extended 
to allow full routability when a row is swapped. 
This technique is used by Altera, albeit with modifications to suit the 
slightly different architecture of its devices [RMLP, NZJ]. It is claimed that 
redundancy has a major effect on the manufacturing yields of FPGAs [Altb]. 
Further analysis of the benefits of hardware redundancy is given in Section 4.6. 
4.4.2 Spare wires 
The method proposed in [HD98] allows up to one faulty segment in the channel 
portion along each side of every cell to be tolerated. The scheme is based on 
the addition of a spare segment in each channel, which is used to substitute 
any faulty segment. The swap, to be performed at the factory, makes use of 
extra pass transistors to redirect incoming signals to an adjacent wire. All 
lines are re-mapped until the spare segment is reached. 
4.4.3 Array shifting 
Doumar [DKI99] proposed a fault tolerance method based on array shifting, 
where the user data is shifted on-chip so that defects are avoided. The work 
presents two different shifting methods (king-shift and horse allocation) to 
shift the whole array and avoid the fault. The scheme requires some cells to be 
left unused so that the shifting algorithm can re-map the design and leave the 
faulty cell as the unused one. Maximum usage is defined as 89% for the king 
shifting approach and 80% for the horse allocation. For the purpose of our 
analysis, we assume that a bigger array is manufactured, in order to guarantee 
a fixed size array usage. The results obtained are very similar for both shifting 
schemes; due to space restrictions and for clarity purposes, only the results for 
the king shifting approach are presented. 
4.5 Improving yield with fault tolerance schemes
Using the framework presented, it is possible to analyze the impact of a fault
tolerance scheme on the yield of FPGA dies. Figure 4.1 shows the potential
yield improvements of using the three hardware redundancy methods taken
into account. The scheme that offers the most improvement is the spare wires
scheme proposed in [HD98], labelled extra grid in the figure. The extra row
scheme has, for smaller arrays, a comparatively larger overhead, and therefore
does not offer as much improvement as the other schemes. As array size grows,
however, the overhead reduces, and for large devices the advantages of this
scheme become obvious. The shifting method, on the other hand, proves
beneficial for smaller devices, but is quickly surpassed by the other schemes
as the array size grows, due to the increasing area overhead necessary to
offer full usage of the intended array.
At maximum array size, when functional yield is expected to be around
40%, the extra grid scheme can improve yields to over 80%, proving to be the
most beneficial. Extra row redundancy can increase yields up to 76%, while
the shifting method only achieves an improvement to 71%.
4.6 Further analysis of redundant row 
Introducing a spare row in the FPGA array has proven to be a successful yield
enhancement method for the latest Altera devices [Altb]. In this section we
provide an in-depth analysis of the benefits on manufacturing yield offered by
this scheme. While the exact scheme used by Altera targets faults in the logic
and local routing only, it is assumed here that the spare-row technique applied
to island style FPGA can also support faults in the general routing network.
Figure 4.1: Yield comparison of three well known fault tolerance schemes at
the 90nm technology node. [Plot: yield versus array size (M) for the shifting,
extra row and extra grid schemes.]
Since the SIA Roadmap indicates that spot defects are generally not clus-
tered together (Cluster parameter = 2), we assume the worst case scenario 
that each defect requires a separate redundant row or column to be substi-
tuted. This assumption introduces other issues such as fault detection and 
location, timing effects and method of substitution. These issues are however 
outside the scope of this study. 
Adding hardware redundancy can improve the yield but comes at the cost of an
increase in silicon area. Figure 4.2 shows how adding extra rows can improve
yield. The number of faults that can be tolerated increases with additional
redundant circuits. It is possible that any given number of faults will require
a smaller number of extra rows or columns, if they happen to be close together
on the silicon. However, under the worst case scenario when defects are widely
scattered, n redundant rows or columns are required to tolerate n faults. At
the 90nm technology node, most devices will only exhibit a limited number of
faults, which implies that adding more than a certain amount of extra area
will not result in any further improvement. This is shown in Figure 4.2; the
yield rises rapidly to full scale for a relatively small amount of extra
silicon area. At 22nm, each device will exhibit multiple faults, and it can be
observed how varying amounts of added silicon area can produce different
amounts of yield improvement, up to the full wafer yield.
Figure 4.2: Yield improvement derived from extra row redundancy for the
maximum predicted array sizes. [Plot: yield versus percentage area overhead
for the 22nm, 45nm and 90nm nodes.]
When dealing with multiple faults in very large devices, the possibility of
localized redundancy has to be explored. This is achieved by dividing the
array into partitions and providing redundancy at a smaller scale. An example
of this is shown in Figure 4.3. This diagram shows how the same amount of
redundancy (2 extra rows) can tolerate more than just 2 faults if localized
redundancy is used. Figure 4.4 depicts this argument graphically. The device
taken into account in this graph is the biggest device to be manufactured at
22nm. At around 10% area overhead the yield can be increased to 60% with
redundancy but no partitioning. However, due to the scattered occurrence of
defects on the wafer, dividing the device into 16 partitions can result in a
yield of nearly 85% for the same area overhead.
Figure 4.3: Yield enhancements derived from partitioning the device with
localized redundancy. [Diagram: an FPGA array with spare and faulty rows,
shown with no partition (2 tolerable faults) and with 4 partitions (4
tolerable faults).]
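The effect of partitioning can be illustrated with a small combinatorial
sketch. It assumes, purely for illustration, that a die carries exactly k
faults, that each fault falls into one of p equally sized partitions
independently and uniformly, and that each partition can repair at most s
faults with its own spares. This toy model and its function name are
hypothetical and are not the model used to produce Figure 4.4.

```python
from math import factorial


def repairable_probability(faults, partitions, spares_per_partition):
    """Probability that no partition receives more faults than it has spares.

    Faults are assumed to fall uniformly and independently over the partitions.
    """
    # Coefficients of (sum_{j<=s} x^j / j!)^p, truncated at degree `faults`.
    base = [1.0 / factorial(j) for j in range(spares_per_partition + 1)]
    poly = [1.0]
    for _ in range(partitions):
        product = [0.0] * min(len(poly) + len(base) - 1, faults + 1)
        for a, coeff_a in enumerate(poly):
            for b, coeff_b in enumerate(base):
                if a + b <= faults:
                    product[a + b] += coeff_a * coeff_b
        poly = product
    if faults >= len(poly):
        return 0.0  # more faults than all the spares together can absorb
    # Multinomial count of valid placements divided by all p^k placements.
    return poly[faults] * factorial(faults) / partitions ** faults


if __name__ == "__main__":
    # Same total number of spares: one partition with two spares versus
    # four partitions with one spare each.
    for k in range(1, 5):
        print(k, repairable_probability(k, 1, 2), repairable_probability(k, 4, 1))
```

Under these assumptions a die with two faults is always repairable without
partitioning, and with probability 0.75 when split into four partitions with
one spare each. The partitioned arrangement can, however, also repair some
dies with three or four faults, which the unpartitioned one cannot.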
Figure 4.4: Improving efficiency of redundancy scheme by partitioning the
device can result in greater yield enhancements for the 22nm technology node.
[Plot: yield versus percentage area overhead for no partition and for 4, 16
and 36 partitions.]
4.7 Working devices
Another way to view the advantages of adding redundancy to improve yield
is to calculate the total number of usable dies that result from the fabrication
of a single wafer. The total number of dies that can fit on a wafer is obtained
using (4.5), from [FP92a], where Re is the wafer radius, and H and W are the
physical dimensions of the chip.
71..R
W  
2 R. 
N(H,W) = 	6 CI " H 
(4.5) 
The number of good dies available from a wafer then becomes the product 
of the yield and the total number of fabricated dies. 
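The trade-off between yield and die count can be sketched as follows. Because
the exact form of (4.5) is not reproduced here, the sketch substitutes a
commonly used gross-die approximation, pi*R^2/A - 2*pi*R/sqrt(2*A) with
A = H*W, which is an assumption and not necessarily identical to the
expression from [FP92a]. The function names and the die dimensions are
hypothetical.

```python
import math


def gross_dies_per_wafer(wafer_radius, die_height, die_width):
    """Approximate number of whole dies on a circular wafer.

    Uses the common estimate pi*R^2/A - 2*pi*R/sqrt(2*A), where A = H*W.
    This is a stand-in for (4.5), not a transcription of it.
    """
    die_area = die_height * die_width
    estimate = (math.pi * wafer_radius ** 2 / die_area
                - 2.0 * math.pi * wafer_radius / math.sqrt(2.0 * die_area))
    return max(0, math.floor(estimate))


def good_dies_per_wafer(die_yield, wafer_radius, die_height, die_width):
    """Expected good dies: yield multiplied by the number of fabricated dies."""
    return die_yield * gross_dies_per_wafer(wafer_radius, die_height, die_width)


if __name__ == "__main__":
    # Hypothetical die sizes in mm on a 12 inch wafer (radius taken as 150 mm).
    # A spare row makes the die slightly taller but raises its yield.
    print(good_dies_per_wafer(0.40, 150.0, 20.0, 20.0))
    print(good_dies_per_wafer(0.76, 150.0, 21.0, 20.0))
```

The second call fits fewer dies on the wafer than the first, so the gain in
good dies is smaller than the gain in yield alone would suggest.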
Figure 4.5 shows how the number of good dies varies against increasing
array size for the 90nm technology node when one extra row is added to
tolerate one fault. Note how, despite yields increasing from 40% up to 76%
(Figure 4.1), the total number of working dies for the maximum array size
grows only from 14 to 20; this is due to the smaller number of dies being
fitted on the wafer as a result of the increased area due to extra rows.
Figure 4.6 shows how the number of good dies per wafer varies as more faults
are to be tolerated for future technology nodes. It is important to note how
the 45nm curve reaches a peak and then drops as extra area is added to
tolerate more faults. This is due to the little yield improvement obtained by
adding a significant amount of extra area. The drop does not, however, imply
that full yield is achieved. Simply, the advantage of being able to tolerate
more faults comes at the price of fewer dies fitting on the wafer.
Figure 4.5: Number of good dies per 12 inch wafer as a function of array size
for 90nm technology with and without extra row redundancy and fault tolerance.
[Plot: number of good dies per wafer versus array size (M), with and without
hardware redundancy.]
Figure 4.6: Number of good dies for 45nm and 22nm technology nodes. [Plot:
number of good dies per wafer versus number of tolerable faults for the 22nm
and 45nm nodes.]
4.8 Improving yield with fault tolerant reconfiguration
FPGAs are inherently redundant. A typical design mapped onto an FPGA will,
in general, not use all the resources on the device. It is therefore reasonable
to consider the case where, employing a built-in self-test (BIST) strategy and
suitable place-and-route algorithms, a method could be found to exploit the
unused interconnect resources to replace any defective ones. For example, if
an open fault exists on an interconnect line, the line may be replaced by an
unused track nearby. Again, many issues remain to be resolved before such a
scheme is viable, but these are outside the scope of this study. For example,
how can a defect be located, and how can a suitable free resource that
maintains the timing integrity of the design be identified? However, it
remains interesting to compare such a fault tolerant approach with schemes
using redundant circuits from the perspective of yield improvement.
Figure 4.7: Yield enhancements of fault reconfiguration and hardware
redundancy at 45nm. [Plot: yield versus number of tolerable faults for
reconfiguration and hardware redundancy, and the difference between the two.]
Figure 4.7 shows the yield of a large device (maximum array) obtained using
a hardware redundancy scheme (extra row) and a fault tolerance scheme based
on device reconfiguration at 45nm. A reconfiguration-based scheme does not
incur any area overhead, resulting in smaller devices and a smaller critical
area. Therefore its yield will always be larger than that of a hardware
redundancy scheme. The advantage of the reconfiguration-based fault tolerance
scheme is clearest for the smaller devices, where a redundant scheme incurs a
proportionally higher area overhead. However, although a fault tolerance
scheme based on device reconfiguration costs no extra silicon area, it comes
at great complexity costs.
4.9 Summary 
This chapter has presented the enhancements made to the yield analysis frame-
work introduced in Chapter 3 to analyze the yield of chips able to function 
under the presence of functional faults. 
The framework has then been applied to study the yield benefits of three 
well known fault tolerance schemes for FPGAs, including the only scheme 
known to have commercial usage. 
It has been shown that great yield enhancements can be achieved for large 
FPGA devices using fault tolerance schemes at 90nm. Two of the fault tol-
erance schemes taken into account have been designed with parameters not 
suitable for use beyond 90nm, e.g. support for a single fault; the other tech-
nique evaluated in this study can be extended for future technology nodes, 
and, being the only one to have found commercial use, has been thoroughly 
analyzed in this study. 
The results have shown that, although concerns exist regarding the
manufacturing yield of large FPGAs, solutions are available to deal with the
issue and improve yield significantly. Every scheme has been analyzed
exclusively from a manufacturing yield point of view. Other issues exist
regarding, for example, fault location, timing integrity, and swap mechanism,
but these are outside the scope of this study.
Timing integrity is a particularly delicate issue, and hardware redundancy is
known to cause timing degradation [HD98]. The requirements for multiple fault
support and timing integrity have thus motivated the development of the fault
tolerance scheme presented in Chapter 7.
Chapter 5 
Yield Enhancements of 
Design-Specific FPGA 
5.1 Introduction 
With ever increasing ASIC manufacturing costs, companies are looking for
alternatives to ensure that their small to medium volume devices can be
manufactured at reasonable costs. Although FPGAs have long been used for
prototyping designs, their higher unit cost when compared with ASICs often
discourages their use in production. Various attempts have been made to reduce
the part-cost of designs that originate from FPGA prototypes [Alt05]. One
such effort is based on the idea of restricting an FPGA part to one specific
design, an approach designated Design-Specific FPGA in this study. Xilinx's
Easypath [Xil05a] is one example. The cost saving of this approach is mostly
due to two factors:
• It reduces the cost of manufacturing test because only limited function-
ality appropriate to the specific design is required; 
• devices may contain defects as long as faulty resources are not used by 
the design, thus increasing the number of working devices that can be 
obtained from a wafer. 
The device therefore cannot be guaranteed to function under all circum-
stances, and is expected to be used with only one bitstream, losing the pos-
sibility to reprogram the FPGA. This study investigates the potential yield 
enhancement of the Design-Specific FPGA approach for current and future 
technology nodes. Only catastrophic interconnect defects are being considered 
in this work. 
The impact of interconnect defects on yield of current and future FPGA de-
vices was first reported in Chapter 3. It was shown that the yield expected for 
devices manufactured using 22nm technology can be very low for large devices, 
suggesting that some form of fault tolerant scheme is likely to be required. 
The yield enhancement offered by three different fault tolerant schemes was 
reported in Chapter 4. The issue of yield enhancement gained by introducing 
redundancy has also been studied in a recent work [YL05, CLL+], suggesting 
that the problem is being addressed by manufacturers and researchers. As far 
as the author is aware, no published work so far addresses the issue of yield 
enhancement of Design-Specific FPGAs. The novel contributions of this work 
are: 
• to develop a model of the probability that a given design would map to a
FPGA device using the Design-Specific approach without the need for
redundancy or rerouting;
• to combine this probabilistic model with the interconnect yield model 
developed in Chapter 3 so that the yield enhancement of Design-Specific 
FPGAs can be quantified for current and future technology nodes; 
• to investigate the probability that more than one design may be mapped 
to a given device. 
This chapter is structured as follows. Section 5.2 introduces the concept
of Design-Specific FPGAs and explains some of the issues related to this ap-
proach. In Section 5.3, the problem of yield prediction for Design-Specific 
FPGAs is formulated. Section 5.4 presents the results of our analysis and 
their implications, and finally Section 5.5 concludes the chapter and suggests 
areas for future research on the subject. 
5.2 Design-Specific FPGAs 
With ever increasing development and manufacturing costs, as shown in
Figure 5.1 [Alt05], small to medium companies have begun to look for
alternatives to ASICs. While for small sized designs FPGAs and CPLDs can still
provide good returns, for big designs FPGAs are often too expensive. The
introduction of Structured ASICs has filled a gap in the market, allowing the
fast and cheap development of small to medium volume devices.
Figure 5.1: Split of costs of IC design and manufacturing [Alt05]. [Chart:
cost split between design, verification and layout; software; test and
product; and masks and wafers, for the 0.18 µm, 0.15 µm, 0.13 µm, 90 nm,
65 nm and 45 nm nodes.]
Structured ASICs are usually made of an array of standard logic cells, con-
nected by custom interconnect layers [WT04]. By using a common logic layer 
and only a limited number of custom metal layers, non-recurring engineering 
(NRE) costs are greatly reduced, allowing small to medium volumes to be 
produced at much lower costs. Furthermore, the use of prefabricated common 
logic layers allows for faster turnaround time, which is becoming an important 
factor as the lifespan of devices reduces [Mak02]. 
FPGA manufacturers have recently introduced their own solutions to compete
with Structured ASICs, which threaten a significant portion of the FPGA
market, as shown in Figure 5.2 [Xil05a]. In particular, FPGA manufacturers
have concentrated on efficient migration from the prototyping stage, which is
often carried out using FPGAs, to volume manufacturing. While some
manufacturers have implemented their own Structured ASIC architecture coupled
with a CAD flow to migrate FPGA bitstreams to the Structured ASIC [Alt05],
others have identified advantages in using partially tested FPGA devices as
non-reprogrammable parts that are only guaranteed to work with given
bitstreams [Xil05a]. It is the latter approach which is the subject of this
work.
Figure 5.2: Cost vs Volume for different ASIC and FPGA Methodologies
[Xil05]. [Plot: cost versus volume for ASICs, Structured ASICs,
Design-Specific FPGAs and FPGAs.]
When customers begin their prototyping stage, they are offered a defect-free 
device to implement their design. Once the design is finalized, the bitstream 
is sent to the FPGA manufacturer, who prepares custom test vectors to test 
the devices. The chips which pass the tests are then guaranteed to work only 
with the provided bitstream, and only minor tweaks are allowed [Xil05a]. This
method assures full compatibility with the prototype FPGA, and the timing
requirements are guaranteed to be met, at lower overall device cost.
5.3 Problem formulation 
This section describes the analysis carried out to examine the yield of Design-
Specific FPGAs. Section 5.3.1 describes the main assumptions used and the 
analysis of the probability that a given design would map to a device exhibiting 
catastrophic interconnect faults, while Section 5.3.2 describes how the work is 
extended to provide an analysis of multiple design mappings. 
5.3.1 Probability of Successful Mapping 
Considering a design which needs to map n signals on a layer made of k inter-
connects, we aim to find the probability that all signals are assigned to faultless 
interconnects, given that faulty interconnects exist on the layer. k, the total 
number of interconnects on any given layer, is a parameter of our interconnect 
model, calculated assuming minimal line spacing and no free silicon left on the 
device. 
The probability that a signal is assigned to an interconnect j at location x, y
on the layer is modelled by a symmetric distribution. It has been shown that,
due to the need to optimize speed and density, place and route tools tend
to create congestion towards the middle of the device [Bet98, BR97]. The
results shown are obtained using a normalized bivariate normal distribution
fj, given in (5.1), where x and y are the coordinates of interconnect j. This
assumes that the corners of the die are less likely to be utilized and higher
utilization is expected in the middle of the device. This function is shown in
Figure 5.3. Results obtained from other symmetrical distributions suggest that
the shape of the function fj does not influence the results as long as symmetry
is maintained.
\[ f_j(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
   \exp\left[ -\frac{1}{2} \left( \frac{x^2}{\sigma_x^2}
   + \frac{y^2}{\sigma_y^2} \right) \right] \tag{5.1} \]
Figure 5.3: fj(x, y), a bivariate normal distribution, indicates higher
congestion towards the middle of the device. [Surface plot: interconnect usage
probability over the die.]
The probability that an interconnect j at location x, y exhibits a
catastrophic fault, gj, is modelled as a bivariate gamma distribution,
expressed in Equation 5.2, where D is the wafer defect density and α is the
clustering factor, as introduced in Section 3.4 [FP92a]. The bivariate gamma
distribution models the spatial distribution of defects, as explained in
Section 3.4, and is used here to determine the distance between faults within
a die.
F(S) = 	 i 
r-rri  
i=1 
ll
n 	v.2)-1 _F 
p=1 1 - Z--/q=1 q 
(5.4) 
gj = r ( x 	y )[ vari Jo/31a  
e var(D) 	 (5.2) 
For any set S of interconnects the probability that S is a set of usable lines 
is given by the function G(S) in (5.3). 
	
\[ G(S) = \prod_{j \notin S} g_j \prod_{j \in S} (1 - g_j). \tag{5.3} \]
The probability that all signals are assigned into set S is given by (5.4). 
1 	DDO  
And the overall probability that a successful signal assignment is achieved 
over a single metal layer is given by (5.5). 
\[ P(\text{success}) = \sum_{S} G(S) F(S). \tag{5.5} \]
The probability that a successful signal assignment is made over all layers 
is the product of the individual layer probabilities. 
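The mapping probability for a single layer can also be estimated by
simulation. The sketch below is a Monte Carlo illustration under several
assumptions that are not taken from the thesis: interconnects are laid out on
a small rectangular grid, each line is faulty independently with a
user-supplied probability (in place of the bivariate gamma model of (5.2)),
and signals are assigned to distinct lines by weighted sampling without
replacement as one plausible reading of (5.4). All function names and
parameter values are hypothetical.

```python
import math
import random


def usage_weights(columns, rows, sigma_x, sigma_y):
    """Normalized bivariate normal weights, peaked at the centre of the layer."""
    cx, cy = (columns - 1) / 2.0, (rows - 1) / 2.0
    raw = [math.exp(-0.5 * (((x - cx) / sigma_x) ** 2
                            + ((y - cy) / sigma_y) ** 2))
           for y in range(rows) for x in range(columns)]
    total = sum(raw)
    return [value / total for value in raw]


def assign_signals(weights, n_signals, rng):
    """Pick n distinct interconnects, each draw weighted by the usage weights."""
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(n_signals):
        position = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(position))
        remaining.pop(position)
    return chosen


def estimate_success(weights, fault_probabilities, n_signals, trials, seed=0):
    """Fraction of trials in which every signal lands on a fault-free line."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Draw one fault map, then one signal assignment, per trial.
        faulty = [rng.random() < g for g in fault_probabilities]
        used = assign_signals(weights, n_signals, rng)
        if not any(faulty[j] for j in used):
            successes += 1
    return successes / trials


if __name__ == "__main__":
    weights = usage_weights(8, 8, 2.0, 2.0)
    print(estimate_success(weights, [0.05] * 64, 20, 5000))
```

Repeating the estimate for each metal layer and multiplying the results gives
the probability over all layers, in line with the product stated above.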
The final part of the analysis is concerned with interconnect utilization.
Interconnect utilization is independent of logic utilization, and often
depends on the nature of the design rather than its size. For example, the
requirements for memory and buses in a design are likely to affect the routing
utilization independently of the logic design underneath.
The aim of this part of the analysis is to quantify the number of active
metal lines on any given metal layer. Interconnects of different lengths are
likely to be manufactured on different metal layers, using different
geometries. Larger geometries are less susceptible to defects, meaning that
higher layer yields will be achieved, as shown in Section 3.5.1. The purpose
of this analysis is to quantify the extent to which lines built on layers with
higher yields are used with respect to interconnects built on other layers.
For the analysis, we assume that longer lines will be built on higher layers
and with larger geometries.
To quantify interconnect utilization, an FPGA architecture which resem-
bles that of commercial FPGAs has been prepared for VPR, a tool which 
provides full reports on interconnect usage for each design [BR97]. The ar-
chitecture is modelled in VPR using the parameters shown in Table 5.1. The 
full suite of MCNC benchmarks [Yan91] has then been placed and routed, 
to acquire data regarding usage of each interconnect type. The benchmark
circuits were placed and routed using the smallest possible array into which
each design would fit, so that utilization figures across all designs are
normalized. The interconnect utilization results are summarized in Table 5.2.
Table 5.1: Architecture used for Place and Route analysis. The parameters Fc
and Fs indicate the connection block and switch block population respectively

Line length    Segment frequency    Fc      Fs
1              0.33                 1       1
4              0.25                 1       1
8              0.25                 0.33    1
long           0.17                 0.5     1
Channel width: 24
Table 5.2: Interconnect utilization of MCNC benchmarks placed and routed
using an FPGA architecture which resembles commercial FPGAs

                  Utilization by line length
Design            1         4         8         long      Average
ex5p              0.337     0.806     0.692     0.827     0.666
tseng             0.069     0.669     0.462     0.676     0.469
apex4             0.283     0.793     0.682     0.78      0.46
misex3            0.25      0.797     0.661     0.833     0.635
alu4              0.155     0.761     0.69      0.72      0.581
diffeq            0.0827    0.715     0.57      0.794     0.54
dsip              0.0382    0.506     0.395     0.257     0.299
seq               0.266     0.804     0.737     0.785     0.648
apex2             0.297     0.813     0.738     0.847     0.674
des               0.0673    0.454     0.374     0.133     0.257
bigkey            0.0543    0.57      0.436     0.214     0.319
s298              0.0987    0.771     0.681     0.756     0.577
spla              0.435     0.826     0.771     0.938     0.743
frisc             0.305     0.794     0.74      0.945     0.696
elliptic          0.178     0.762     0.701     0.907     0.637
ex1010            0.255     0.834     0.765     0.625     0.62
s38417            0.0898    0.737     0.631     0.355     0.453
clma              0.392     0.833     0.799     0.903     0.732
Average           0.169     0.736     0.640     0.633     0.545
Relative usage    7.8%      33.8%     29.3%     29.1%

The interconnect utilization ratio is, by definition, the ratio n/k, thereby
indicating how many routing resources are being used by a given device.
Assuming that each line type is manufactured on the FPGA using two metal
layers, one for the vertical interconnects and one for the horizontal ones, the
probability that all signals are mapped on the layer using the faultless
interconnects can be found using Equation (5.5). The rightmost column in
Table 5.2 indicates the weighted average interconnect utilization of each
design; this figure offers a measure of the active area on the die, which has
been used to categorize designs into high and low utilization.
Finally, the last row in Table 5.2 summarizes the relative interconnect usage
for each interconnect type. This has been used to determine which layers are
likely to be more heavily populated, and allows the analysis to be carried out
using overall interconnect utilization as opposed to individual layer
utilization.
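The text does not spell out how the relative usage row is obtained. One reading that reproduces the tabulated values to within rounding is to normalise the per-type average utilizations so that they sum to one; the sketch below shows that reading. The function name and the normalisation itself are assumptions, and only the four averages are taken from Table 5.2.

```python
def relative_usage(average_utilization):
    """Normalise per-type average utilization so that the shares sum to 1."""
    total = sum(average_utilization.values())
    return {line: value / total for line, value in average_utilization.items()}


# Column averages from the "Average" row of Table 5.2.
averages = {"1": 0.169, "4": 0.736, "8": 0.640, "long": 0.633}
shares = relative_usage(averages)
percentages = {line: 100.0 * share for line, share in shares.items()}
```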
The successful mapping probability indicates the proportion of devices that 
could be used with a specific design even when they may exhibit interconnect 
faults. The overall yield of a Design-Specific FPGA for a given design is then
calculated as the sum of the functional (i.e. fault-free) yield and the fraction 
of usable devices exhibiting functional faults. 
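The sum described above can be sketched as follows. The sketch assumes that the faulty devices are split by their number of faults and that a mapping success probability is available for each fault count; both inputs and the function name are hypothetical, and the numbers in the usage lines are illustrative rather than results from this work.

```python
def design_specific_yield(functional_yield, faulty_fraction, match_probability):
    """Functional yield plus the usable share of faulty devices.

    faulty_fraction[k]   : fraction of all devices with exactly k faults
    match_probability[k] : probability that the design maps onto such a device
    """
    usable_faulty = sum(
        fraction * match_probability.get(k, 0.0)
        for k, fraction in faulty_fraction.items()
    )
    return functional_yield + usable_faulty


# Illustrative values only: 40% fault-free, 30% with one fault, 20% with two.
example = design_specific_yield(0.4, {1: 0.3, 2: 0.2}, {1: 0.5, 2: 0.25})
```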
5.3.2 Multiple bitstreams 
If multiple designs are to be mapped to a device, the correlation of the designs 
will affect the probability of successful mappings. If the designs are identical, 
the overall success probability is obviously the same as the individual design 
success probability. If, on the other hand, the designs are very different, the 
overall success probability is likely to be significantly reduced. It is therefore 
important to quantify the design correlation when analyzing the overall success 
probability. 
In order to determine the correlation between two designs, we calculate the 
Degree of Similarity, a measure of how many resources are shared between the 
designs. A value of 1 for the Degree of Similarity means that the designs are 
identical, while a value of 0 means that no common resources exist between 
the designs. 
For the case of two designs being mapped to a device, the design netlists
are placed and routed using VPR. A comparison of the interconnect resources
used, taken from the routing files, then yields the Degree of Similarity
between the designs. The technique can easily be extended to analyze multiple
bitstream mappings.
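The exact normalisation of the Degree of Similarity is not given here. A minimal sketch, assuming it is computed as the ratio of shared resources to all resources used by either design (the Jaccard index), behaves as described: 1 for identical designs and 0 for designs with no common resources. The resource identifiers are hypothetical, and extracting them from the VPR routing files is not shown.

```python
def degree_of_similarity(resources_a, resources_b):
    """Shared routing resources divided by all resources used by either design."""
    a, b = set(resources_a), set(resources_b)
    if not a and not b:
        return 1.0  # two empty designs are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical resource identifiers: (channel, track) pairs.
design_1 = [("x3", 0), ("x3", 1), ("y7", 4)]
design_2 = [("x3", 1), ("y7", 4), ("y9", 2)]
similarity = degree_of_similarity(design_1, design_2)
```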
5.4 Results 
The placed and routed results from VPR (Table 5.2) were used to compute the
probability that a given design can be mapped onto an FPGA that may have
interconnect defects. The MCNC benchmark circuits were divided into three
categories: those with high interconnect utilization (such as spla), those with 
low interconnect utilization (such as des) and those with average utilization. 
Using (3.16) and (5.5) we predicted the interconnect yield for conventional and 
design-specific FPGAs as a function of the complexity of the device as shown 
in Figure 5.4. The solid line shows the predicted functional yield (i.e. yield of 
defect-free devices) due to interconnect defects for devices built using a 90nm 
process. The dotted lines show the expected yield with design-specific FPGAs 
for different interconnect utilizations. Assuming that the largest array
built using this technology is 160 x 160 CLBs, the functional yield is around
40%. According to these results, limiting FPGAs to a specific design can
improve the yield by 20% to 35% depending on the utilization of interconnect
resources. Since Design-Specific FPGAs can tolerate device defects as long as
the faulty interconnects are not used, such an improvement in yield is expected.
Figure 5.4: Yield of Structured FPGAs vs Array size (M x M) for devices
built using a 90nm process. An average yield gain of 25% is expected over the
functional yield, with peaks of 35%. (Curves: average case, low utilization,
high utilization and functional yield; the maximum array size is marked.)
Furthermore, the probability that a defective interconnect remains unused is
higher for a low utilization design, and therefore the yield gain is higher,
as expected.
As manufacturing technology moves to new technology nodes and smaller
geometries, it is expected that larger devices will exhibit multiple functional
faults. Figure 5.5 shows the proportion of devices with different numbers of
faults as a function of device size for the 22nm process. The solid line
indicates the expected fraction of devices which are defect-free. This drops to
near zero for devices larger than 400 x 400 CLBs. The dotted lines show the
expected percentage of devices with different numbers of defects. Figure 5.6
shows the probability of successfully finding a match between designs of
different interconnect utilization and an FPGA with one to four defects. As
expected, the probability of a match decreases with increasing interconnect
utilization. It is also harder to find a match if the device exhibits a higher
number of defects. In fact, for devices with four or more defects, the
probability of a match falls below 10% for interconnect utilization as low as
0.3.

Figure 5.5: Device yields for 22nm process. Functional yields for larger devices
are expected to be close to 0%. The overall yield is likely to consist of devices
exhibiting multiple faults. (Axes: percentage of devices vs array size in
arbitrary units; curves: fault free, 2 faults, 4 faults and 6 faults.)
Figure 5.7 shows the maximum achievable interconnect utilization for
different target yields for Design-Specific FPGAs manufactured in 90nm, 45nm
and 22nm processes. For this result, we assume that the largest arrays are used
for all technologies (160x160 for 90nm, 300x300 for 45nm and 550x550 for 22nm).
This result suggests that for the 90nm process, if the target yield is 50%, the
maximum interconnect utilization can be almost 0.6. It also suggests that the
Design-Specific FPGA approach does not really help the large 22nm devices.
Even for low interconnect utilization (say 0.3), the achievable yield is as low
as 10%.

Figure 5.6: Probability of finding a design match for Design-Specific FPGA vs
interconnect utilization. (Curves: 1 fault, 2 faults, 3 faults and 4 faults.)
If multiple designs are to be mapped to the same device, the yield also de-
pends on the Degree of Similarity between the designs. As shown in Figure 5.8, 
yields of large devices built on a 90nm process rapidly decrease as the number 
of common interconnect resources between two designs decreases. There exists 
a region where no yield is expected if two very dissimilar designs, both with 
high utilization, are to be used. This is due to the high number of faultless 
resources needed in order to map the two designs. 
Figure 5.7: Maximum utilization vs target yield in Design-Specific FPGA for
90nm, 45nm and 22nm processes. (Horizontal axis: target yield in percent.)

Figure 5.8: Mapping two bitstreams to a large device manufactured on a 90nm
process. (Axes: interconnect utilization, Degree of Similarity and yield.)

Figure 5.9 shows how yields would compare for two designs to be mapped
on an FPGA built using a 90nm process. The solid line represents the
reference functional yield. The graph shows the yields of two designs exhibiting
interconnect utilization characteristics similar to two designs from the MCNC
benchmark suite: apex4 (low utilization) and misex3 (high utilization). These
two designs were chosen so that the Degree of Similarity between them could
be calculated and used in this analysis.

Figure 5.9: An example of multiple bitstream mappings on 90nm devices.
(Axes: yield vs array size (M); curves: high & low utilization, high
utilization design, low utilization design and functional; the maximum array
size is marked.)
The Degree of Similarity between the apex4 and misex3 designs was found
to be 0.63. This value was kept constant for varying array sizes in order to
carry out the analysis. Figure 5.9 shows how the need to combine the two 
designs in the same device would affect the yield; for large arrays, the yield 
loss compared with the high utilization design yield used on its own is of the 
order of 5%. 
It is relevant to note that, while mapping individual designs offers
significant yield advantages even when high utilization is needed, the yield
enhancement from mapping two designs to the same device is at most in the
region of 20%, which may not justify the use of Design-Specific FPGAs as a
yield enhancement scheme.
5.5 Summary 
In this study we have introduced an analysis of the yield advantages of using 
FPGAs as one-time programmable devices for small to medium volumes. By 
using fixed design bitstreams, devices which might exhibit functional faults in 
other conditions can be used to map designs, therefore driving up manufactur-
ing yields. An analysis of interconnect layer usage has been presented, which 
allowed the yield of a specific design to be calculated. 
Using existing yield models it has been possible to analyze the improve-
ments which can be derived by losing the reprogrammability of FPGAs. By 
using the SIA roadmap predictions it has been possible to study the effects of 
such a scheme as more advanced technology nodes are introduced. 
The effect of higher interconnect utilization on the yields has been studied, 
to gauge future requirements for the scheme. Finally, we have examined the 
success probability of this technique if multiple bitstreams are to be used on a 
device. 
We have shown that significant yield advantages can be achieved using
the Design-Specific FPGA approach. However, the requirements for successful
design mappings denature FPGAs, whose main quality lies in the infinite
reprogrammability of devices. As the results in Section 5.4 show, the
potential yield improvement of this approach for 22nm technology and beyond is
questionable.
This study has a number of limitations. The interconnect model used may 
not match the existing heterogeneous FPGAs and could therefore be improved. 
Verification of the results against manufacturer's data could be made if such 
data can be obtained. Notwithstanding, the work provides interesting insights 
into the benefits and limitations of Design-Specific FPGAs as a yield enhance-
ment technique. 
Chapters 3, 4 and 5 have provided the background analysis to motivate 
the research in this field. The work presented in these chapters has shown 
the limitations of the previous research in the area, and has spawned the 
research presented in the following chapters. Chapters 6 and 7 introduce what 
the author believes to be a more sophisticated approach to the problem of 
interconnect fault tolerance for FPGAs. 
Chapter 6 
Built in Self Test of FPGA Interconnect
6.1 Introduction 
Current trends aiming to minimize wire widths and separation have led to a
higher occurrence of functional faults in the interconnect layers of FPGA
devices. Chapter 3 presented an analysis of the impact of such faults on the
yield of FPGA devices, while Chapter 4 analyzed the benefits on yield of some of
the best-known yield enhancement techniques. Chapter 4 further highlighted
the need to develop a fault tolerance scheme to be used for future technology
nodes, and how the requirements for a fault tolerance scheme are likely
to change. As an alternative to once again increasing wire widths and
separation, a method to categorize devices exhibiting similar functional
defects is proposed, in order to provide a solution to tolerate such physical
defects and increase manufacturing yield.
This work aims to take advantage of the deep knowledge manufacturers
have of the defect occurrences in their devices [Xil03], while trying not to
affect the user's load and device performance.
This study will introduce a new method to categorize faulty devices, as well
as providing test procedures to locate device defects whenever needed. The
Built-In-Self-Test (BIST) requires a relatively small number of configurations
to efficiently locate and identify a specific type of defect, determined by the
defect grade of the device. The BIST architecture is easily scalable and can
detect multiple functional faults.
This chapter will introduce a new approach to fault detection. Section 6.2 
introduces the fault grading concept on which the fault detection is based; Sec-
tion 6.3 gives essential information on the Built-In-Self-Test procedure. Section 
6.4 provides some details of the implementation, and finally, in Section 6.5 a 
brief summary of the chapter is offered together with some concepts for future 
developments. 
6.2 Fault grading 
Devices undergo a multitude of manufacturing tests at various stages of the
manufacturing process. Some degree of test is performed at the wafer level,
where defective devices are marked and discarded after cutting. Parametric
tests are performed on packaged devices; failing devices are once again
discarded, whereas devices that pass parametric tests are "binned" into
different speed categories depending on how they performed during tests. Failed
devices are mainly discarded even though the total amount of resources affected
by a physical defect is minimal. Some of these devices can be used, albeit not
at full capacity. Manufacturers have already looked at ways to reuse faulty
devices. One such solution is offered by Xilinx with their Easypath devices
[Xil05a] as examined in Chapter 5. Easypath devices are tested only for the
resources used for a specified design. This means that devices do not have to
pass all manufacturing tests, but only a limited number of them. Customers are
then offered devices mapped exclusively for their design, but at a reduced
cost. They however lose the possibility to reconfigure the devices for future
upgrades.
Instead of using the Easypath approach, this work proposes that devices 
can be categorised with respect to the functional fault they exhibit. Functional 
faults are independent of physical faults, such as open vias or spot defects found 
during manufacturing tests [Xil03]. A specific functional fault only affects part
of the device, and if it can be avoided, the rest of the chip can be made to work 
with only slightly altered characteristics. Our fault grading scheme aims to 
provide fault categories for defective devices that have failed similar functional 
tests. 
The concept of fault grading is very similar to that of speed grading: de-
vices will always exhibit different characteristics, and are therefore categorised 
according to specific parameters. Devices are marked and designs compiled 
according to those specific parameters. It is therefore possible to generate 
new categories, and using this information defective devices can be used to 
implement the majority of designs. 
The fault grades contain information about the fault the device exhibits. 
The amount of information the fault grades contain is a trade-off between 
what is needed in order to avoid the fault and generalization of the defect. One 
extreme is a fault grade that contains the exact location and type of fault. The 
other extreme is a simple tag identifying a faulty device. A good compromise 
is a fault grade that indicates what type of resource is affected. This leads to 
a limited number of fault grades, that contain enough information to generate 
test configurations to locate the fault in the device during power-on. 
As an example, consider a Xilinx Virtex II PRO device and its general
routing matrix [Xil04b]. This device offers 4 different types of lines: direct
connections, double lines, hex lines and long lines. Four fault grades could be
used to categorize faults on the wire resources. Two grades could be used for
switch matrix faults, one to identify stuck-on faults and one for stuck-off
faults. These grades are chosen with a defect tolerance scheme in mind, and
how to avoid certain defects with the lowest overhead possible. Assuming that
all other resources are unaffected, we can efficiently test all the interconnects
of the same type that could possibly be faulty, during the power-on sequence,
in order to provide an alternative design to avoid the faulty resource.
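The six grades in this example can be written down as a small enumeration, which is how a tool flow might carry the grade alongside a device. This is only a sketch: the identifier names and the helper function are hypothetical, and only the six categories themselves come from the example above.

```python
from enum import Enum


class FaultGrade(Enum):
    """One grade per resource type that may be faulty (names are illustrative)."""
    DIRECT_LINE = "direct connection"
    DOUBLE_LINE = "double line"
    HEX_LINE = "hex line"
    LONG_LINE = "long line"
    SWITCH_STUCK_ON = "switch matrix stuck-on"
    SWITCH_STUCK_OFF = "switch matrix stuck-off"


WIRE_GRADES = {
    FaultGrade.DIRECT_LINE,
    FaultGrade.DOUBLE_LINE,
    FaultGrade.HEX_LINE,
    FaultGrade.LONG_LINE,
}


def resource_kind(grade):
    """Kind of resource the power-on test has to cover for this grade."""
    return "wire" if grade in WIRE_GRADES else "switch matrix"
```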
6.3 Testing strategy 
A BIST methodology is proposed to detect and diagnose a functional fault on 
a known interconnect resource type. The strategy consists of a point to point 
analysis, where a wire is forced to a logical value at one end and observed at 
the other. If the observed output is not equal to the input, a fault is present. 
A BIST environment consists of three elements: 
• Test Vector Generator (TVG); 
• Wires Under Test (WUT); 
• Output Response Analyzer (ORA). 
TVGs select the pattern of 0's and 1's to be applied on the WUTs, while
the ORAs compare the WUTs' response against a predefined sequence and
issue a pass/fail signal accordingly.
Considering the nature of modern FPGAs, where routing channels are
considerably large, it is feasible to group the Wires Under Test together and
perform an analysis at the ORA of all grouped wires.
TVGs and ORAs can be implemented using Configurable Logic Blocks
(CLBs). As most modern devices are made of large CLBs (comprising
multiple LUTs), a TVG and an ORA can be implemented using a single CLB.
The TVG/ORA combinations are arranged in a chain that spans the entire
width or height of the device. When a fault is detected, a 'fail' signal is passed
on through to the chain end. The propagation within the chain is synchronized
with a clock, so that an ORA in the Nth position in the chain will only
be allowed to access the chain at the Nth clock cycle. When a 'fail' signal is
detected at the end of the chain, the position of the chain in the array is
found by the BIST controller using simple decoding logic. A diagram of such a
system is shown in Figure 6.1.

Figure 6.1: Testing Strategy. (TVG = Test Vector Generator, WUT = Wires
Under Test, ORA = Output Response Analyzer. The Test Selector drives the TVGs,
and the ORAs access the fault chain on the 1st, 2nd, ..., Nth clock cycle.)
6.3.1 WUTs Grouping 
Taking into account that 4-input LUTs are the main building block in most
modern FPGAs, the simplest ORA implementation uses one such element.
The TVGs are implemented using multiple LUTs, one for each wire
in the set of WUTs. This allows complete independence of test vectors
between wires in a set of WUTs. As a compromise between TVG and ORA
implementations, it was decided to group the WUTs in groups of 4. This
arrangement would require a single 4-input LUT for the ORA, whereas four
4-input LUTs would be required for the TVGs. Such quantities are not uncommon
in readily available devices [Xil04b]. The implementation can be altered to best
fit any device with different architectural characteristics, such as the latest
Altera Stratix 2 device [Alt04], which is based on a variable sized CLB. The
resulting arrangement for a 4-LUT based FPGA is shown in Figure 6.2.

Figure 6.2: Grouped WUTs between ORAs and TVGs
The dotted lines in Figure 6.2 represent wires from the adjacent set of wires,
which have to be driven with signals opposite to those of the adjacent WUT to
account for bridging faults across assigned sets. Those wires are not considered
at the ORA but might nonetheless have an effect on the WUT in case of bridging
faults.
6.3.2 TVG and ORA Operation 
TVGs generate bit sequences to account for all possible faults that could de-
velop in the interconnect resource. They are implemented as simple look-up 
tables, where the output is selected from the Test Selector input. The Test 
Selector input is generated from the BIST controller, and is a global signal 
routed to all TVGs. For wiring faults, four basic test vectors can detect any
defective occurrence. These, defined as the four basic vectors, are:
• "0000" Tests for stuck-at-1 faults.
• "1111" Tests for stuck-at-0 faults.
• "1010" Alternating 1's and 0's. Tests for bridging faults.
• "0101" Alternating 1's and 0's, in opposite order from the previous test
vector. Tests for bridging faults.
The basic test vectors can identify the set of WUTs that contains a fault.
To correctly identify the exact faulty wire within a given set of WUTs, extra
test vectors can be used. This second set of vectors is dependent upon the
results of the four basic test vectors and is decided by the BIST controller. The
function of the second set of vectors is purely to improve the fault resolution.
The ORA's function is to generate a pass/fail signal according to some
predefined parameters. The ORA is designed to fit in a single 4-input LUT and
under our scheme, it will issue a 'pass' signal only if the four WUTs have logical
values corresponding to one of the 4 basic test vectors. Under all other
circumstances it will issue a 'fail' signal.
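The behaviour described in this section can be simulated in a few lines. The sketch below drives each basic vector onto a group of four wires, applies an optional stuck-at fault, and lets a software stand-in for the ORA issue 'fail' whenever the observed value is not one of the four basic vectors. It models stuck-at faults only (bridging faults are left out), and the function names and the dictionary used to describe a fault are assumptions made for illustration.

```python
BASIC_VECTORS = ("0000", "1111", "1010", "0101")


def drive(vector, stuck=None):
    """Value seen at the ORA when `vector` is driven onto four wires.

    `stuck` maps a wire number (1 to 4) to the value it is stuck at."""
    stuck = stuck or {}
    return "".join(str(stuck.get(i + 1, int(bit))) for i, bit in enumerate(vector))


def ora_fails(observed):
    """ORA rule from the text: pass only if the inputs form a basic vector."""
    return observed not in BASIC_VECTORS


def signature(stuck=None):
    """Four-bit result string: bit j is 1 if basic vector j produced a 'fail'."""
    return "".join("1" if ora_fails(drive(v, stuck)) else "0" for v in BASIC_VECTORS)
```

For instance, `signature({2: 1})` models Wire 2 stuck at 1 and `signature({1: 0})` models Wire 1 stuck at 0; the resulting strings can be compared against the rows of Table 6.1.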
6.3.3 BIST controller operation 
The BIST controller operation during test is shown in Figure 6.3. While the 
test vectors are being propagated through the chains, the BIST controller 
checks the end of all chains for any 'fail' signal being issued. If such a sig-
nal is found, the current counter value (representing how far along the chain 
the vectors have been propagated) and the chain end identifier represent the 
coordinate of the ORA that has detected the fault. 
var ChainEnds: array of binary (0 to N-1) := (all = 0)   // Chain ends
var result: array of binary (0 to 3) := '0000'           // Test results
var counter, x_coord, y_coord: integer

begin
  for (j in 0 to 3)
    case j is                                  // Test vectors
      when (0) - apply 0000
      when (1) - apply 1111
      when (2) - apply 1010
      when (3) - apply 0101
    end case
    counter := 0
    for (x in 0 to M-1)
      for (i in 0 to N-1)
        if ChainEnds(i) = 1 then               // Fault found
          result(j) := 1                       // Fault recorded
          x_coord := counter
          y_coord := i
        end if
      end for
      counter := counter + 1
    end for
  end for
end

Figure 6.3: BIST operation during test

The output from the BIST controller is a string of four bits indicating which
of the four basic test vectors has found a fault. If, for instance, the string of
results from the BIST controller is 1000, test vector 1 has caused a fault. This
means that the fault present in the system is a stuck-at-1 fault, as test vector
1 could not have caused or detected any other unexpected behavior.
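The loop of Figure 6.3 can be sketched in software as below. The hardware chain ends are replaced by a callback, `read_chain_ends`, which is a hypothetical hook standing in for sampling the end of every chain at a given clock cycle; the function name and return format are likewise assumptions.

```python
BASIC_VECTORS = ("0000", "1111", "1010", "0101")


def run_basic_tests(read_chain_ends, chain_length, n_chains):
    """Apply each basic vector, clock the fail chains and record failures.

    read_chain_ends(vector, cycle) returns a sequence of n_chains bits, where
    bit i is 1 if chain i reports 'fail' at that clock cycle."""
    result = [0, 0, 0, 0]
    location = None
    for j, vector in enumerate(BASIC_VECTORS):
        for cycle in range(chain_length):      # position along the chain
            ends = read_chain_ends(vector, cycle)
            for chain in range(n_chains):      # which chain end fired
                if ends[chain] == 1:
                    result[j] = 1
                    location = (cycle, chain)
    return "".join(map(str, result)), location
```

The returned pair holds the four-bit result string and the (position along chain, chain index) coordinate of the last ORA that reported a fault, mirroring the counter and chain identifier used in the text.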
From the inspection of the test results of the basic test vectors the BIST
controller can determine what type of fault is present in the system and apply
other test vectors to identify exactly which wire in the group of WUTs is faulty.
Note that any fault or combination of faults confined within the set of WUTs
would cause at least two tests to fail. The only possible fault not confined
within the set of WUTs is a bridge onto an adjacent set of wires. This causes
only one of the bridging test vectors to fail. From the combination of failed
tests the BIST controller can narrow the fault down to 2 or 3 wires or pairs of
wires in the set of WUTs, as shown in Table 6.1. The second set of test vectors
is designed purely to increase the fault resolution by selection of any one of
the already selected wires. During propagation of the extra test vectors, the
pass/fail signals from the ORAs are used as selection between wires to identify
the faulty one.
If, for example, the combined test results are 1010, the fault is limited
to either Wire 2 or Wire 4 being stuck at 1. The next test vector, 1110, is
then propagated. As by this point the possibility of any other fault has been
eliminated, the ORA inputs can only be 1110, if Wire 2 is s-a-1, or 1111, if
Wire 4 is s-a-1. The first option will result in a 'fail' signal from the ORA,
whereas the second option will result in a 'pass' signal. The fault resolution
can be increased to identify precisely the faulty wire from each ORA response.
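The selection step in this example generalises to the other stuck-at rows of Table 6.1. The sketch below stores, for each result string, the two candidate faults and the follow-up vector, then keeps the candidate whose predicted ORA response matches the observed one. The table contents are taken from Table 6.1; the data layout and function names are assumptions, and the bridging rows are not covered.

```python
BASIC_VECTORS = ("0000", "1111", "1010", "0101")

# Result of the four basic tests -> (candidate faults, follow-up vector).
# Each candidate is (wire number, stuck-at value).
SECOND_STAGE = {
    "1010": (((2, 1), (4, 1)), "1110"),
    "1001": (((1, 1), (3, 1)), "0111"),
    "0110": (((1, 0), (3, 0)), "1000"),
    "0101": (((2, 0), (4, 0)), "0001"),
}


def observed(vector, wire, value):
    """ORA input when `vector` is applied and `wire` is stuck at `value`."""
    bits = list(vector)
    bits[wire - 1] = str(value)
    return "".join(bits)


def resolve(result, ora_failed):
    """Candidates consistent with the ORA response to the follow-up vector."""
    candidates, vector = SECOND_STAGE[result]
    return [
        (wire, value)
        for wire, value in candidates
        if (observed(vector, wire, value) not in BASIC_VECTORS) == ora_failed
    ]
```

With the result string 1010, a 'fail' on the follow-up vector leaves only Wire 2 stuck at 1, and a 'pass' leaves only Wire 4 stuck at 1, matching the worked example.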
6.4 Implementation 
The BIST strategy proposed is to be used with prior knowledge of the faulty 
resource. The BIST strategy is a point to point one, where test vectors are 
applied at one end to a set of WUTs and observed at the other. TVGs and 
ORAs are arranged in rows, so that pass/fail results are propagated and read 
from only one location for each row. The BIST controller decodes the outputs 
from the end of the ORA chains to provide fault location. The WUTs are 
grouped in sets in order to offer the highest degree of parallelism considering 
the architectural and strategic constraints. The total number of configurations 
needed to complete testing is dependent upon the total number of wires of 
the same type present in the device. The configurations are grouped into
phases, where configurations belonging to the same phase aim to test different
interconnects appertaining to the same channel.

Table 6.1: BIST selection

Results of test vectors (Wire 1 - Wire 4)
(1) 0000   (2) 1111   (3) 1010   (4) 0101   Fault                     Next vector
0          0          0          0          No fault                  N/A
1          0          1          0          Wire 2 or Wire 4 s-a-1    1110
1          0          0          1          Wire 1 or Wire 3 s-a-1    0111
0          1          1          0          Wire 1 or Wire 3 s-a-0    1000
0          1          0          1          Wire 2 or Wire 4 s-a-0    0001
0          0          1          1          Bridge                    All previous 4
0          0          0          1          Bridge onto next set      All previous 4
0          0          1          0          Bridge onto next set      All previous 4
All others                                  Multiple faults           Composite
6.4.1 Number of Configurations 
To fully test a routing channel all lines originating or terminating from a
single CLB have to be tested. If the architecture has L lines of a specific type
originating from any CLB in a channel, then testing all lines in a channel will
need $\lceil L/4 \rceil$ configurations. Modern FPGAs rarely have more than
20 lines of any type originating from any CLB in one channel [Xil04b], hence 5
configurations are sufficient to test all lines in a channel. These make up a
test phase.
6.4.2 Wire Testing Phases 
Considering an M x N array, with M CLBs in each vertical channel and N
CLBs in each horizontal channel, M + 1 vertical routing channels and N +
1 horizontal routing channels exist [BFV92]. Testing of each horizontal or
vertical routing channel requires all the CLBs in a row or column, respectively.
In the vertical and horizontal directions, testing of all channels requires at
least 2 phases: during the first, N or M channels respectively are tested,
and in the second the remaining channel is tested. The second phases of the
vertical and horizontal channel testing can be combined, as shown graphically
in Figure 6.4.

Figure 6.4: Three configuration phases (panels (a), (b) and (c))

Three phases are therefore required to test all the lines of the same type in
all channels. If $\lceil L/4 \rceil$ configurations are required for each phase,
a total of $3 \times \lceil L/4 \rceil$ configurations are needed for
testing the whole device.
6.4.3 Switch Matrix Testing Phases 
To test for switch matrix faults the WUTs signals are routed through switch 
matrices. Stuck-off faults are dealt with just like open faults. In the event 
of stuck-on faults, bridges are created within the switch matrix configurations 
for detection. 
The switch matrix configurations needed to test for all stuck-on and stuck-
off faults are shown in Figure 6.5. The diagram shows the routing inside the 
switch matrix in order to cause bridging faults under all possible matrix com-
binations. At the same time, the routing shown also explores all the possible 
connections within the matrix itself. 
The testing scheme remains unchanged: if an ORA detects an unexpected
behavior, a 'fail' signal is propagated through the end of the chain. The only
tweak to the original scheme is that two TVGs connected to the same switch
matrix produce opposite signals. In the case of stuck-on switches, this causes
a behavior identical to that of a bridging fault.

Figure 6.5: Switch Matrix fault diagnosis phases (routing configurations (i)
to (iv))

Table 6.2: Stuck-on faults resolution

Faulty connection    Detected by routing configurations
North-East           ii, iii, iv
North-West           i, iii, iv
South-East           i, iii
South-West           ii
North-South          i, ii
East-West            i, ii, iii
The diagnosis of stuck-off faults is straightforward, as a 'fail' signal from any
ORA can only be caused by one faulty connection in all routing configurations.
For stuck-on faults, however, a different analysis has to be performed. A
permanent connection between two terminals will cause all the ORAs connected
to the faulty switch matrix to detect a fault. But any permanent connection
will only be detected during a specific number of routing configurations. From
the analysis of the result of the 4 test phases, the BIST controller can determine
the exact faulty connection. The faults and failures caused are summarized in
Table 6.2.
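Because every row of Table 6.2 lists a different set of failing routing configurations, the diagnosis reduces to a lookup. The sketch below shows one way to express it; the table contents are from Table 6.2, while the function name and the use of `None` for an unrecognised pattern are assumptions.

```python
# Set of routing configurations that report a fault -> stuck-on connection.
STUCK_ON_SIGNATURES = {
    frozenset({"ii", "iii", "iv"}): "North-East",
    frozenset({"i", "iii", "iv"}): "North-West",
    frozenset({"i", "iii"}): "South-East",
    frozenset({"ii"}): "South-West",
    frozenset({"i", "ii"}): "North-South",
    frozenset({"i", "ii", "iii"}): "East-West",
}


def diagnose_stuck_on(failing_configurations):
    """Return the faulty connection, or None if the pattern is not in the table."""
    return STUCK_ON_SIGNATURES.get(frozenset(failing_configurations))
```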
6.4.4 Case Study: Xilinx Virtex II Pro 
The Xilinx Virtex II Pro [Xi104b] device family allows TVGs and ORAs to 
be implemented in a single CLB, thanks to the high number of 4-input LUTs 
present. For the purposes of this case study the case where a double line in 
the general routing matrix is faulty is considered for a XC2VP20 device. The 
Virtex II Pro has 20 double lines originating from a CLB in both vertical 
and horizontal channels. This leads to a total of 5 configurations per test 
phase. Therefore a complete test would require 15 configurations to fully 
test all double lines available in the FPGA. Assuming a worst-case scenario 
of bitstream download through the JTAG port, each device configurations 
would require 249ms, so the total time required for reconfigurations is 3.74s. 
This time can be considerably reduced if a SelectMap interface is used for 
download. In this case, total download time would be just over 0.3s. The 
actual test time, in terms of clock cycles is in both cases much smaller than 
the configuration download time and thus it would not affect total testing time 
by a great amount. A test configuration is shown in Figure 6.6. 
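The reconfiguration time quoted above follows directly from the number of configurations and the per-download time. The short sketch below reproduces the JTAG figure; the phase count of 3 is an assumption inferred from the stated totals (15 configurations at 5 per phase), and the variable names are illustrative.

```python
# Figures taken from the case study text.
CONFIGS_PER_PHASE = 5
TEST_PHASES = 3          # assumption: implied by 15 configurations in total
JTAG_DOWNLOAD_MS = 249   # worst-case download time per configuration

total_configs = CONFIGS_PER_PHASE * TEST_PHASES
total_jtag_ms = total_configs * JTAG_DOWNLOAD_MS

print(f"{total_configs} configurations, {total_jtag_ms / 1000:.3f} s over JTAG")
```

The result, 3.735 s, rounds to the 3.74 s stated in the text.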
6.5 Summary 
A new framework for FPGA interconnect testing has been presented. The con-
cept of device fault grading has been introduced, together with simple, effective 
testing procedures. Under this scheme it is possible to load testing configurations
to the FPGA with the specific aim of locating a fault whose nature is
already known. The testing is done completely on-chip via a dedicated BIST
controller or partially off-chip using a microprocessor as the testing controller.
Figure 6.6: Floorplan view of a test configuration
This work provides manufacturers and users with a different approach 
to fault tolerance. The development of this framework is based around the 
assumption that defective devices will show similar functional faults spread 
around the chip area. It is possible to categorize these defects with respect to 
their functional faults. During the design process the fault is assumed to be
located anywhere on the chip, and the usage of the faulty resource can be reduced
to a minimum. The exact location of the fault can then be found by loading the
proposed test configurations during the power-on sequence.
This work forms part of a complete fault tolerance scheme: it takes the first 
step towards allowing functionality under the presence of a fault by locating 
the fault in the FPGA array. The next part of the fault tolerance scheme is 
fault avoidance, details of which are given in the next Chapter. 
Chapter 7 
Interconnect Fault Avoidance 
for FPGA 
7.1 Introduction 
The area occupied by wiring channels and interconnect configuration circuits 
in an FPGA is significant, occupying 50 to 90 percent of the chip area [BFV92]. 
With current trends aiming to reduce the area occupied by wiring segments in 
the routing channels, wire width and wire spacing have been reduced. This has 
in turn led to higher occurrences of wiring defects, such as breaks and shorts,
a decrease in manufacturing yield, as shown in Chapter 3, and fewer functioning
devices at fixed manufacturing costs.
The nature of FPGA devices provides two separate ways of dealing with 
defects. The first and more obvious method is based on hardware redundancy, 
capitalizing on the high regularity of the FPGA array to swap a faulty resource 
with a spare functioning one. The second method is based on exploiting the 
reconfiguration properties of the device, tweaking the design to fit around the 
defective resource. Both methods come with significant area and timing over-
heads; only a limited number of the proposed schemes have proved successful 
and have been implemented by manufacturers [Altb, NZJ, RMLP]. 
In this work a new fault tolerant scheme is proposed, based on both hard-
ware redundancy and reconfiguration. This new approach to fault tolerance 
is based on modifying some of the underlying characteristics of a given FPGA 
architecture, and the principle is demonstrated using a simple routing archi-
tecture. 
The proposed fault tolerance scheme has the following notable advantages: 
• estimated device yields increase from 40% to 100% for large devices built 
at 90nm as predicted by the yield analysis tool introduced in Chapter 3; 
• estimated worst case timing degradation of 8.5%, independent of the 
number of faults present on the device; 
• semi-permanent defect correction through configuration readback; 
• support for multiple non-localized defects; 
• can be extended to support dynamic fault tolerance. 
The major disadvantage of this fault tolerance scheme comes in the need 
for minor modifications to the device's configuration controller; these require 
a small amount of silicon area and cause extra latency during the power-up 
sequence. The small modifications required to the reconfiguration controller do 
not however overshadow the significant yield benefits achieved by the proposed 
fault tolerance scheme. 
This chapter is structured as follows. Section 7.2 explains the line of
thought which led to the development of the scheme, describes the fault
tolerance scheme itself and gives details on the implementation. The area and
timing results of our pilot architecture are presented in Section 7.3, while the
yield analysis results are presented in Section 7.4. Finally Section 7.5 concludes
the chapter and suggests areas for future research.
7.2 Fault tolerance for FPGAs 
This section describes the motivation and line of thought that led to the de-
velopment of the fault tolerance scheme described in this Chapter. 
7.2.1 Motivation and Requirements 
The occurrence of defects during manufacturing is a random process. While 
data exist regarding the density and clustering of defects, it is impossible to 
formulate prediction models regarding the location of the defects within a die. 
As such, the probability of obtaining even two defective devices which exhibit 
the same functional fault in exactly the same location is almost non-existent, 
unless systematic causes of defects exist. 
As FPGAs continue their expansion into the semiconductor market, they 
are more and more often utilized in medium volume products. One of the 
biggest challenges offered by fault tolerance is therefore ensuring that the same 
bitstream produced can be successfully matched to tens, hundreds, and poten-
tially even thousands of non-identical devices. Therefore the development of 
a fault tolerant scheme has the following primary goals: 
• increase the number of usable devices in each wafer; 
• impose no extra design burden for the customer; 
• minimize impact on timing closure.
Area, despite being an important issue for most research, is not mentioned 
as a primary goal. This is because in order to increase the number of usable 
devices obtained from the manufacturing process the yield advantage has to 
overcome the area overhead. It is therefore more important to increase yields
than to reduce area overheads.
Under these constraints the most reasonable approach is to automatically 
manipulate the design before programming the FPGA. By generating ad-hoc 
bitstreams for each device the "uniqueness" factor of fault tolerance is eliminated.
This can be achieved either through a configuration controller or
through an on-board placer and router, to be run with prior knowledge of the
fault location [DST99]. However, other factors affect these types of approach,
most notably the overhead required to manipulate the bitstreams in a reasonable
amount of time before programming. With designs getting more and
more complex a full re-generation of the bitstream is infeasible.
Therefore hardware redundancy, coming with greater area and timing penal-
ties, has been the preferred method for implementing fault tolerance. As de-
vice performance is improved as a result of manufacturing technology and 
architectural improvements, the timing degradation resulting from the extra 
switching required to avoid the fault can be within acceptable limits for non-
performance-dependent applications. Any form of hardware redundancy does, 
however, restrict the performance of devices, and in a fast moving semicon-
ductor industry even the smallest degradations are crucial. 
7.2.2 Architecture Exploration 
The first step of our analysis was conducted to understand exactly how the 
highly redundant nature of FPGA affected the probability of a design be-
ing successfully placed and routed. VPR, an open source place and route 
tool [BR97], was modified in order to provide fault injection. This is achieved 
by making an interconnect resource unavailable to the router to simulate the 
presence of a catastrophic interconnect fault. One at a time, all interconnect 
resources available in the device have been tagged faulty, made unavailable 
and a full Place and Route pass has been performed. 
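The single-fault injection loop can be sketched on a toy routing graph. This is only an illustration of the procedure described above, not the modified VPR: a breadth-first connectivity check stands in for the full place and route pass, and the graph, node names and function names are all hypothetical.

```python
from collections import deque


def reachable(edges, source, sink):
    """Breadth-first search over an undirected routing graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_fault_campaign(edges, source, sink):
    """Tag each resource faulty in turn and record whether routing succeeds."""
    results = {}
    for faulty in edges:
        healthy = [e for e in edges if e != faulty]  # resource made unavailable
        results[faulty] = reachable(healthy, source, sink)
    return results


# Toy graph: two parallel tracks (t1, t2) from pin A to a shared stub s,
# which is the only way to reach pin E.
EDGES = [("A", "t1"), ("t1", "s"), ("A", "t2"), ("t2", "s"), ("s", "E")]
outcome = single_fault_campaign(EDGES, "A", "E")
for fault, routed in outcome.items():
    print(fault, "routes" if routed else "fails")
```

In this toy case faults on the duplicated tracks are absorbed, while a fault on the unduplicated stub makes routing fail, which mirrors the role that replicated pin-to-pin connections play in the analysis.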
Two architectures have been taken into account. The first, labelled full, is
a full connectivity, low performing architecture. Its general routing network is
made up of short segments only spanning one logic block. This architecture
has the highest degree of routing flexibility; however, due to the high number of
switches required to achieve the full connectivity, it is also a low performance
architecture. Figure 7.1 shows a small section of the full architecture.
Figure 7.1: A segment of the full architecture (Connection Block (a) with pins A, B, C, D, N; switch matrix; Connection Block (b) with pins E, F, G, H, O)
Table 7.1: Architecture used for Place and Route analysis. The parameters Fc
and Fs indicate the connection block and switch block population respectively
Line length    Segment frequency    Fc      Fs
1              0.33                 1       1
4              0.25                 1       1
8              0.25                 0.33    1
long           0.17                 0.5     1
Channel width: 24
The second architecture, labelled segmented, most resembles commercial
devices: its general routing network comprises multiple-length segmented
interconnects, with low connectivity. This type of routing architecture is less
flexible than the full architecture; however, it provides significant benefits in
terms of performance. A study on the optimal levels of segmentation and
connectivity has been presented in [Bet98]. The architectural parameters of
the segmented architecture used for this analysis are summarized in Table 7.1.
Selected benchmarks from the MCNC suite [Yan91] have been placed and 
routed in the modified VPR using the smallest possible array and minimum 
channel widths normally required for faultless devices. The results were split
into three categories:
• Successful - The design was successfully placed and routed with the same 
timing characteristics of the faultless device. 
• Timing failure - The design was successfully placed and routed but the 
timing was affected when compared to the faultless device. 
• Failed - The design could not be successfully placed and routed. 
Figure 7.2 shows the outcome of the place and route analysis. The percent-
age of faults causing the design to fail routing can be as high as 60% for the 
segmented architecture. The spread of timing variation is shown in Figure 7.3, 
and in both architectures some designs successfully place and route but exhibit 
very high timing degradation. 
On average, the full architecture only fails the place and route process in 
under 10% of cases. However, if the experiment is repeated using one extra 
track in each routing channel, all designs route successfully and within timing 
constraints. 
Figure 7.1 shows a portion of the full architecture. The striking feature of 
this type of architecture is the high connectivity from all pins of connection 
block (a) (A,B,C,D,N) to all pins of connection block (b) (E,F,G,H,O). More
importantly, due to the nature of the architecture it is possible to replicate all 
pin-to-pin connections by simply widening the routing channel and introducing 
spare resources, as shown in Figure 7.4. It is this feature which enables designs 
to be placed and routed while maintaining similar timing characteristics to the 
original design. Duplicating pin-to-pin connections has thus been identified as
being the key to achieving fault tolerance without disrupting the timing.
Figure 7.2: Design classification in the presence of a fault. The inherent redundancy
of FPGA is not sufficient to guarantee successful place and route
under the presence of an interconnect fault. (y-axis: percentage of P&R passes; benchmarks: alu4, ex5p, tseng, bigkey, dsip, des, apex2, frisc, misex3, average; series: Successful, Failed Timing, Failed routing)
Figure 7.3: Spread of timing variation for designs successfully placed and
routed but with timing failure (x-axis: percentage timing variation, in bins 0-5%, 5-10%, 10-20%, 20-30%, 30-40%, 40-50%, >50%; series: Full, Segmented)
Figure 7.4: Full architecture routing channel connectivity with spare resources
This raises the question of how much redundancy is really needed in FPGAs.
The analysis has shown that there is no need to introduce many more
spare resources; rather, it is convenient to maximize and improve the use of the
available ones left over by the placer and router, and perhaps introduce spares
at a very fine grain level. This has led to the development of the fault tolerant
architecture discussed in the next section.
7.2.3 Proposed Fault Tolerance technique 
The highly regular structure and high connectivity of the full architecture 
enables all pin-to-pin connections to be replicated with relative ease. Full con-
nectivity, however, is subject to larger area and considerable timing penalties. 
High performance architectures thus have limited connectivity, as shown in 
the sample architecture in Figure 7.5; the architecture is designed taking into 
account routability and performance only. It is proposed here to re-evaluate 
the connectivity of FPGA devices taking into account fault tolerance as a
parameter. It should be noted that the diagram in Figure 7.5 does not represent
the segmented architecture, and has only been used in order to demonstrate 
the functionalities of the proposed fault tolerance scheme. 
Figure 7.5: An example architecture with limited connectivity but higher performance
(Connection Block (a) with pins A, B, C, D, N; switch matrix; Connection Block (b) with pins E, F, G, H, O)
Consider the architecture shown in Figure 7.5. The connectivity in each 
connection block and the switch matrix can be expressed mathematically using 
adjacency matrices, as shown in Figure 7.6. A matrix entry of "1" signifies
that a connection exists between a pin and an interconnect (in a connection
block) or between one interconnect and another (in a switch matrix).
Conversely, a matrix entry of "0" means no connection exists. The product
of all three matrices yields the overall connectivity matrix, where each entry 
indicates the number of possible routes to connect each pin to all others. 
In order to improve fault tolerance the aim is to re-evaluate the overall 
connectivity matrix by increasing each non-zero matrix entry by one or more, 
and derive the individual adjacency matrices accordingly. Integer Linear Pro-
gramming was used to solve this problem, and the formulation is shown in the 
next subsection. 
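The overall connectivity matrix is a plain matrix product, which can be sketched in a few lines. The matrices below are read from the extracted text of Figure 7.6, which is imperfect, so the numeric values should be treated as illustrative rather than authoritative; the names `CB_A`, `SWITCH`, `CB_B` and `matmul` are hypothetical.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


# Rows: pins A, B, C, D, N; columns: interconnects I1..I5.
CB_A = [
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
# Switch matrix: each interconnect continues straight through (identity).
SWITCH = [[1 if r == c else 0 for c in range(5)] for r in range(5)]
# Rows: interconnects I1..I5; columns: pins E, F, G, H, O.
CB_B = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]

overall = matmul(matmul(CB_A, SWITCH), CB_B)
for pin, row in zip("ABCDN", overall):
    print(pin, row)
```

Each entry counts the distinct routes between a pin pair. For example, pin B reaches pin F over two different interconnects, so that entry is 2, whereas an entry of 1 marks a connection with no alternative route.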
A, B, C, D, N = LUT (a) pins
E, F, G, H, O = LUT (b) pins

Connection Block A
     I1  I2  I3  I4  I5
A     0   1   0   0   1
B     1   0   0   1   0
C     1   0   1   0   0
D     0   1   0   0   1
N     0   0   1   1   0

Switch matrix
     I1  I2  I3  I4  I5
I1    1   0   0   0   0
I2    0   1   0   0   0
I3    0   0   1   0   0
I4    0   0   0   1   0
I5    0   0   0   0   1

Connection Block B
      E   F   G   H   O
I1    1   1   0   0   0
I2    1   0   1   0   0
I3    0   0   1   1   0
I4    0   1   0   0   1
I5    0   0   0   1   1

Overall Connectivity
      E   F   G   H   O
A     1   0   1   1   1
B     1   2   0   0   1
C     1   1   1   1   1
D     1   0   1   1   1
N     0   1   1   1   1

Figure 7.6: Modelling connectivity using adjacency matrices
7.2.4 ILP formulation 
This section introduces the Integer Linear Programming (ILP) formulation used
to split an overall connectivity matrix into the product of three adjacency
matrices, each representing a connection block or a switch matrix. In order to linearize
the problem the formulation has been split into two stages, where in the first 
stage the overall connectivity matrix is split into the product of an adjacency 
matrix and an intermediate matrix; in the second stage the intermediate matrix 
is further divided into two adjacency matrices. The formulation of the first 
stage is shown in this section, as it depicts the most general case of the problem. 
Consider an architecture where each routing channel is t tracks wide and 
each logic block has p pins which connect to the routing channel. The overall 
connectivity matrix for this kind of architecture is modelled using a matrix C 
of p rows by p columns, which is the product of an adjacency matrix A and 
an arbitrary matrix B of dimensions p x t and t x p respectively. The
main constraint is shown in (7.1).
$$\forall i, \forall j: \quad \sum_{k=1}^{t} a_{ik} b_{kj} \geq c_{ij} + 1 \tag{7.1}$$
As both a and b are variables in the system, a dummy variable d is introduced,
constrained by (7.2) to express (7.1) in linear form. Substituting for d
yields (7.3).
$$d_{ikj} \leq a_{ik} b_{kj} \tag{7.2}$$
$$\sum_{k=1}^{t} d_{ikj} \geq c_{ij} + 1 \tag{7.3}$$
Considering $a_{ik} \in \{0, 1\}$, (7.2) can be replaced by (7.4), which is then
linearly expressed by (7.5) and (7.6), where U is an upper bound on b.
$$a_{ik} = 0 \Rightarrow d_{ikj} \leq 0, \qquad a_{ik} = 1 \Rightarrow d_{ikj} \leq b_{kj} \tag{7.4}$$
$$d_{ikj} \leq a_{ik} U \tag{7.5}$$
$$d_{ikj} \leq b_{kj} \tag{7.6}$$
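The linearization can be checked exhaustively for small bounds. The sketch below assumes, as the text does, that a is binary and that b is a non-negative integer no larger than U; under those assumptions the largest d permitted by the two linear bounds equals the product a times b. The function names are hypothetical.

```python
def largest_feasible_d(a, b, upper_bound):
    """Largest d allowed by the two linear bounds d <= a*U and d <= b."""
    return min(a * upper_bound, b)


def linearization_matches_product(upper_bound):
    """Check every binary a and every integer b in [0, U]."""
    return all(
        largest_feasible_d(a, b, upper_bound) == a * b
        for a in (0, 1)
        for b in range(upper_bound + 1)
    )


print(linearization_matches_product(4))
```

The check also shows why U must be a genuine upper bound on b: if b could exceed U, the first bound would cap d below the true product when a is 1.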
Finally, in order to preserve the regular structure of the FPGA, the algorithm
needs to ensure that all pins connect to a maximum number of interconnects
and vice versa all interconnects only connect to a fixed number of pins.
These conditions are expressed by (7.7) and (7.8), where N1 and N2 are the
intended number of connections present, if a is a connection block adjacency
matrix. Similar constraints are used for switch block adjacency matrices.
$$\forall i: \quad \sum_{k} a_{ik} \leq N_1 \tag{7.7}$$
$$\forall k: \quad \sum_{i} a_{ik} \leq N_2 \tag{7.8}$$
The aim of the formulation is to obtain an optimized solution to minimize 
area. The ILP objective is thus to minimize the number of non-zero entries in the
adjacency matrices, as they represent the number of switches present in each 
connection block and switch block. Considering the fact that transistors in 
connection blocks and switch matrices are likely to have different properties, 
the size and performance of each was based on the work presented in [Bet98]. 
The final objective equation is shown in (7.9), where T1  and T2 are the relative 
sizes of the transistors in the resource being modelled by matrices A and B. 
$$\min \; \sum_{i,k} (T_1 a_{ik}) + \sum_{k,j} (T_2 b_{kj}) \tag{7.9}$$
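For a very small instance the optimisation can be solved by exhaustive search, which makes the objective and the main constraint concrete. This is a hedged sketch rather than the ILP solver used in the thesis. It assumes that both factors are binary, that the route-count constraint is applied only to non-zero entries of C (following the description in Section 7.2.3), and that T1 = T2 = 1; the per-pin and per-track limits N1 and N2 are omitted. All names are hypothetical.

```python
from itertools import product


def matmul(left, right):
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def binary_matrices(rows, cols):
    """Yield every rows x cols matrix with entries in {0, 1}."""
    for bits in product((0, 1), repeat=rows * cols):
        yield [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]


def cheapest_factorisation(C, tracks, T1=1, T2=1):
    """Return (cost, A, B) minimising T1*sum(A) + T2*sum(B) such that every
    non-zero entry of C gains at least one extra route in A x B."""
    p = len(C)
    best = None
    for A in binary_matrices(p, tracks):
        cost_a = T1 * sum(map(sum, A))
        for B in binary_matrices(tracks, p):
            cost = cost_a + T2 * sum(map(sum, B))
            if best is not None and cost >= best[0]:
                continue  # cannot improve on the best solution so far
            routes = matmul(A, B)
            if all(routes[i][j] >= C[i][j] + 1
                   for i in range(p) for j in range(p) if C[i][j] > 0):
                best = (cost, A, B)
    return best


# Two pins on each side, three tracks, one original route per diagonal pair.
C = [[1, 0], [0, 1]]
best_cost, best_A, best_B = cheapest_factorisation(C, tracks=3)
print(best_cost)
```

Each required pin pair needs two routes, and each route costs one switch in each factor, so the minimum for this instance is 8 switches. Exhaustive search scales exponentially with matrix size, which is why an ILP formulation is needed for realistic architectures.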
The final result of the ILP problem solving for the sample architecture 
shown in Figure 7.5 is shown in Figure 7.7. Due to space limitations, only 
connections in the horizontal channels are shown. For a complete architecture 
the vertical connections are also considered. 
Figure 7.7: Example architecture with improved connectivity for fault tolerance
(Connection Block (a), switch matrix and Connection Block (b); legend: extra switches, spare resources)
7.2.5 Fault avoidance 
The fault avoidance is based on a node covering scheme. Each point to point
connection is "covered" by another option, so that if a track becomes unavailable
as a result of a fault, the "covering" track is used instead.
An example transformation is shown in Figure 7.8. The original design, 
depicted in the top diagram, utilizes a faulty track (second from the bottom of 
connection block (a)). The transformation algorithm automatically swaps the 
signal to utilize a different track. This in turn "knocks" another signal, which 
originally utilized the covering track, onto its own cover. The final result is 
depicted on the bottom diagram of Figure 7.8. 
The implementation of the configuration controller requires knowledge of 
all the possible transformations. The type of information required for storage 
is shown in Table 7.2, and the switch numbering is shown in Figure 7.9. The
contents of Table 7.2 are architecture specific and are pre-generated. The
controller function is to load the necessary configuration switch states into
local memory, where they can be modified and then fed back into the bitstream.
In the example given in Figure 7.8 only the connections to the channel to the
right of the fault are considered: for full fault coverage the channels above, below
and to the left are also taken into account.
Figure 7.8: Example of fault avoidance through re-routing (top: original; bottom: fault tolerant)
The main purpose of the controller is to swap all original connections with
the fault tolerant ones: in the example given above, the connection from pin
A to pin G utilizes a faulty track and hence its cover needs to be used. This
requires opening the original switches (no. 1, 17 and 30) and asserting the ones
for the fault tolerant cover (no. 2, 20 and 32). Since switch no. 20 is already being
used by the connection from pin D to pin H, another cover has to be found for
that connection. This recursive algorithm continues until the cover utilizes only
spare resources. A pseudo-code version of the algorithm is given in Figure 7.10.
Figure 7.9: Switch positions to load into bitstream shift register
Table 7.2: Switches used by original and fault tolerant configurations
Pin to pin    Original switches    Fault tolerant switches
A to G        1 - 17 - 30          2 - 20 - 32
D to H        11 - 20 - 35         12 - 23 - 34
The controller, which for the results presented in Section 7.6 was integrated 
into VPR, was also implemented in VHDL and synthesized using Synplicity 
ASIC and UMC's 130nm libraries. The resulting circuit size is 0.03 mm², running
at a speed of over 350 MHz. A single fault recovery is achieved in 43 clock
cycles, not including the fetching of the data from the bitstream. The total 
clock cycles required to complete the node covering procedure is dependent on 
the number of faults and the usage of the routing channel exhibiting the fault. 
var faulty_line  % input

function repair(line, bitstream)
    var cover := find_cover(line);
    if (check_pins_used(get_pins(cover), bitstream)) then
        % the cover is occupied: move its signal onto its own cover first
        repair(cover, bitstream);
    end if;
    % the cover is now free: move the signal from line onto it
    update_bitstream(line, bitstream);
end function repair

function find_cover(line)
    return lookup(line, cover_table);
end function find_cover

function update_bitstream(line, bitstream)
    set_to_0(get_pins(line));
    set_to_1(get_pins(find_cover(line)));
end function update_bitstream

main()
    load bitstream;
    repair(faulty_line, bitstream);
    store bitstream;
end;
Figure 7.10: Fault avoidance algorithm pseudo-code
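The recursive covering step can be exercised on the two connections listed in Table 7.2. The following is a minimal sketch under simplifying assumptions, not the VHDL controller: routes are held in a dictionary instead of a bitstream, a cover is considered occupied when another connection uses any of its switches, and there is no guard against cyclic covers or connections missing from the table. The names `COVER_TABLE` and `repair` are hypothetical.

```python
# Transcribed from Table 7.2: connection -> (original switches, cover switches).
COVER_TABLE = {
    "A->G": ((1, 17, 30), (2, 20, 32)),
    "D->H": ((11, 20, 35), (12, 23, 34)),
}


def repair(connection, routes, table=COVER_TABLE):
    """Move `connection` onto its cover, first displacing any other
    connection that currently uses one of the cover's switches."""
    _, cover = table[connection]
    for other, switches in list(routes.items()):
        if other != connection and set(switches) & set(cover):
            repair(other, routes, table)  # knock the occupant onto its own cover
    routes[connection] = cover


# Start from the original configuration; the A->G route uses the faulty track.
routes = {name: original for name, (original, _) in COVER_TABLE.items()}
repair("A->G", routes)
print(routes)
```

Repairing A to G finds switch 20 in use by D to H, so D to H is moved to switches 12, 23 and 34 first, after which A to G takes switches 2, 20 and 32, matching the example in the text.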
7.3 Timing and Area Analysis 
7.3.1 Single fault 
The fault tolerant architecture, developed using the technique presented in 
Section 3.3.1 applied to the segmented architecture allows the entire design set 
to be placed and routed successfully, as shown in Figure 7.11. On average, 
over 70% of faults did not affect the designs in any way. The remaining faults 
only caused minor timing violations, as shown in Figure 7.12. All of the 
timing variations are within 8.5% of the original design, while the majority is 
contained within 5%. 
Figure 7.11: Percentage of faults causing design degradation in fault tolerant
architecture (y-axis: percentage of P&R passes; series: Successful, Failed Timing)
Figure 7.12: Percentage of faults causing timing variations in fault tolerant
architecture (x-axis: percentage timing variation)
Figure 7.14 and Figure 7.15 depict the area overhead incurred by the fault
tolerant architecture for routing networks with and without buffer sharing
respectively. Buffer sharing allows area savings as only one buffer is required
to drive the signal for each pin in the connection block and for each switch in
the switch matrix. This situation is shown graphically in Figure 7.13(b).
The area result graphs depict the total routing area required to implement
the architecture with and without fault tolerance. The graphs also show how
varying the parameter Fc, the fraction of pins each track connects to, affects
the overall area requirements (Fc = 1 means the interconnect track connects
to all pins, while Fc = 0 means the track does not connect to any pins). The
total area is shown in "minimum-size" transistor count, an area measuring
technique introduced in [Bet98]. This method takes into account the different
sizes of connection block, buffer, and switch block transistors, and it offers
a much more realistic measure of the total area required to implement an
architecture.
The results shown in Figure 7.14 depict the routing area requirements for 
an architecture with buffer sharing. In order to calculate the total tile area 
the transistor counts of the logic block would need to be included. Our sample 
routing architecture is, for a routing channel width of 16, 13% larger than the
segmented architecture from which it is derived. When logic is included, this
value is reduced to 4.5%. If no buffer sharing is considered, the area overhead 
including logic is raised to 19% for routing only, as shown in Figure 7.15. 
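The two overhead figures for the buffer-sharing case are linked by the share of the tile that routing occupies. The sketch below is a back-of-envelope inference, not a result from the thesis: it assumes the logic block area is unchanged by the scheme, so that the tile overhead equals the routing overhead multiplied by the routing share of the tile.

```python
# Figures quoted in the text for a routing channel width of 16.
ROUTING_OVERHEAD = 0.13   # routing area only
TILE_OVERHEAD = 0.045     # routing plus logic

# Assumption: logic area is unaffected, so
#   TILE_OVERHEAD = ROUTING_OVERHEAD * routing_fraction
routing_fraction = TILE_OVERHEAD / ROUTING_OVERHEAD

print(f"implied routing share of tile area: {routing_fraction:.1%}")
```

Under that assumption routing would account for roughly a third of the tile in minimum-size transistor count; the figure is only as reliable as the assumption behind it.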
Figure 7.13: Tri-state buffer sharing at switch block and output pin connection
block [Bet98]: (a) no buffer sharing, (b) buffer sharing
Figure 7.14: Routing area analysis of fault tolerant architectures with buffer
sharing (x-axis: routing channel width, 8 to 22; series: single line area with and without fault tolerance for Fc = 0.5, 0.75 and 1)
Figure 7.15: Routing area analysis of fault tolerant architectures without buffer
sharing (x-axis: routing channel width, 8 to 22; series: single line area with and without fault tolerance for Fc = 0.5, 0.75 and 1)
Figure 7.16: Percentage of faults causing design degradation for devices exhibiting
2 or 3 functional faults (y-axis: percentage of P&R passes; benchmarks: alu4, ex5p, tseng, bigkey, dsip, des, apex2, frisc, misex3, average; series: Successful, Failed Timing)
7.3.2 Multiple faults 
Due to the sparse occurrence of faults and the fine granularity of the fault 
tolerance scheme, it is possible to tolerate multiple faults at no extra area 
penalty. Figure 7.16 depicts the place and route results for devices exhibiting 2 
and 3 faults. While the average number of designs suffering timing degradation 
is higher, no unsuccessful designs exist. The same is true for designs exhibiting 
5 and 10 faults, shown graphically in Figure 7.17. 
The most interesting results however arise from the timing analysis of de-
vices with multiple faults. The results are shown in Figure 7.18. The maximum 
timing degradation has not changed, remaining at 8.5%, and the majority is 
still within 5% of the original design. These results prove the independence of 
the timing degradation on the number of faults present on the device. Timing 
degradations only arise if the faulty track happens to be used by a net on
the critical path: more faults simply increase the probability of such an event
happening.
Figure 7.17: Percentage of faults causing design degradation for devices exhibiting
5 or 10 functional faults (y-axis: percentage of P&R passes; benchmarks: alu4, ex5p, tseng, bigkey, dsip, des, apex2, frisc, misex3, average; series: Successful, Failed Timing)
7.4 Yield Analysis 
Using the techniques presented in Chapters 3 and 4 it has been possible to
analyze the effect of the new fault tolerance scheme on yield. 
7.4.1 Single fault 
The rather large area overhead is overshadowed by the significant yield increase,
which is reflected in the total number of working devices obtained from a 12 in
wafer. Figure 7.19 shows the variation in the number of working dies per wafer
as a function of array size, shown here in arbitrary units. The largest device
built at 90nm, the technology being analysed here, is shown by the vertical
dotted line in Figure 7.19. If no hardware redundancy is used, 14 working dies
can be expected. Using our fault tolerance scheme the total number of working
dies can be increased to 26, thereby almost doubling the productivity. This is
to be compared with the results shown in Section 4.7 of Chapter 4, which showed
that a fault tolerance scheme based on spare row redundancy only yields 20 dies
under the same circumstances.
Figure 7.18: Percentage of faults causing timing variations under the presence
of multiple faults (x-axis: percentage timing variation, in bins 0-2.5%, 2.5-5%, 5-7.5%, 7.5-10%; series: 2, 3, 5 and 10 faults)
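The productivity claim can be checked from the die counts quoted above. This short sketch only restates that arithmetic; the function name `productivity_gain` is illustrative.

```python
def productivity_gain(dies_with_scheme, dies_baseline):
    """Ratio of working dies per wafer relative to the unprotected baseline."""
    return dies_with_scheme / dies_baseline


BASELINE_DIES = 14    # no hardware redundancy
PROPOSED_DIES = 26    # proposed fault tolerance scheme
SPARE_ROW_DIES = 20   # spare row redundancy (Section 4.7)

print(f"proposed scheme: x{productivity_gain(PROPOSED_DIES, BASELINE_DIES):.2f}")
print(f"spare rows:      x{productivity_gain(SPARE_ROW_DIES, BASELINE_DIES):.2f}")
```

The proposed scheme gives a gain of about 1.86, which is the basis for "almost doubling", against about 1.43 for spare row redundancy.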
7.4.2 Multiple faults 
Tolerating multiple faults is especially useful for future technology nodes, when 
devices are predicted to exhibit multiple abnormalities. Figure 3.13 depicts the 
number of faults to be expected for large devices manufactured at 90, 45 and 
22nm. In particular, for the 45nm node yields of almost 90% can be achieved 
by tolerating up to 5 faults, as shown in Figure 7.20. 
Figure 7.19: Using the proposed fault tolerance scheme can almost double the
productivity for very large devices at 90nm (x-axis: array size M, arbitrary units; series: with and without fault tolerance; marker: maximum array size)
Figure 7.20: Improving wafer yields at 45nm by tolerating up to 5 functional
faults per device (x-axis: array size, arbitrary units; y-axis from 0 to 1; series: with and without fault tolerance)
Full scale yield is achieved at 45nm for all but the largest devices if
support for up to 5 faults is provided.
7.5 Summary 
A new approach to fault tolerance in FPGAs has been presented. The method 
proposes to re-evaluate the routing architecture of FPGA devices to include 
fault tolerance as a measuring parameter as well as performance and routabil-
ity. 
The scheme proposed is based on node-covering techniques to replace faulty 
tracks by spare ones. Area overhead is limited by minimizing the number of 
extra switches required to implement the node-covering. 
It has been shown that even in the worst-case scenario timing variation
is within 8.5% of the original design, while area overheads are as little as 4.5%.
Using the yield analysis techniques presented in Chapter 3 it has also been 
possible to prove that yields can be increased significantly, almost doubling the 
total number of working dies per wafer despite the area overhead for current 
technology nodes. The fine granularity of the redundant circuits allows for 
multiple non-localized defects to be tolerated at no extra area and timing 
costs compared to single fault tolerance. 
The scheme requires modifications to the configuration controller to im-
plement the fault avoidance. The extra circuitry required by the configura-
tion controller to implement fault tolerance is less than a square millimeter in 
size, an overhead greatly overshadowed by the significant yield enhancements 
achieved. 
Chapter 8 
Conclusions 
8.1 Summary 
This thesis examined several aspects of FPGA fault tolerance.
The work presented deals exclusively with static faults - imperfections arising
during the manufacturing process - in the metal layers of FPGA dies. These
imperfections significantly affect yields and contribute to high part costs,
thereby hampering the growth of FPGAs within the semiconductor industry.
Being very sensitive data, yield figures are never released by manufacturers.
While information can be found on the types of fault exhibited by
devices [Xil03], their relative occurrence is unknown. In order to quantify the
extent of the yield losses relative to the metal layers alone, a yield analysis
framework has been developed. Based on established techniques and applied
to the metal layers of FPGA devices, the framework has been used to show
that yield losses deriving from the metal layers are a cause for concern.
Using the data provided by the SIA roadmap, it has been possible to further 
improve the framework to enable yield predictions as technology scales down 
to more advanced technology nodes. The results show that the growth in size 
of FPGAs is going to be limited by significant yield losses. If current size 
growth trends are to be maintained some form of fault tolerance has to be 
introduced. 
Both major FPGA manufacturers have introduced partial remedies to the 
problem. Altera has used logic redundancy in its products for a number of
generations, and it claims that the yield benefits are significant. The claim 
has been verified using the yield analysis framework developed, and proved to 
be correct. The type of redundancy used, however, is not well suited to island 
style FPGAs, and is limited to coping with faults in the logic and local routing 
only. 
Xilinx has proposed a totally different approach to fault tolerance, which 
has enabled the company to enter larger volume markets as a result. The 
program, called EasyPath, offers customers one-time programmable devices
structurally identical to standard FPGAs, which are only guaranteed to work 
with a pre-designated bitstream. The need to only provide a limited number 
of functional resources enables devices which exhibit a fault in an unused part 
of the chip to be offered to customers; higher yields and reduced test time 
contribute to lower overall costs. The yield benefits achieved by this type of 
approach have been analysed using the developed yield analysis framework, 
even though the analysis is limited to faults in the metal layers. It has been 
shown that, while significant advantages can be achieved with current technol-
ogy, it is a strategy that is likely to provide diminishing advantages in future 
technology nodes. 
The results obtained from the yield analysis have enabled the development 
of a targeted fault tolerance scheme. The fault tolerance work presented in 
this thesis has been divided into two parts: fault location and fault avoidance. 
Fault location is provided by a Built-In Self Test (BIST) procedure that is fast
and precise, and that can be run at device power-up to provide the device with
knowledge of the fault.
The fault avoidance is achieved by exploiting both the high regularity 
and reconfiguration properties of FPGA devices. The method proposes to 
re-evaluate the routing architecture of FPGA devices to include fault toler-
ance as a measuring parameter as well as performance and routability. The 
fault avoidance technique has been shown to guarantee function in the
presence of one or more faults at minimal area and timing costs. Area overhead
is shown to be as little as 4.5% while timing degradation is contained within 
8.5% in the worst-case scenario. 
8.2 Future work 
FPGA fault tolerance is a largely unexplored area of research. In terms of
dynamic fault tolerance, not covered in this thesis, some work is under way,
but the applications are rare [Xila]. Perhaps, as FPGAs are more often utilized
in aerospace applications, tolerance to Single Event Upsets will spur more
research in the field; however, it seems that these faults are currently
dealt with by shielding the device using better packaging techniques, as was
done for the FPGA devices used by NASA in their Mars Rover [Xila].
In terms of static fault tolerance, i.e. manufacturing yield improvement, 
the biggest obstacle to research is the type of information required to
carry out the work. Because yield figures are very sensitive data, manufacturers
are not willing to share them. This thesis forms a preliminary analysis of
the area, and is based on a rather large number of assumptions not verifi-
able without a manufacturer's support. This is particularly true for the yield 
analysis framework presented. This was based on figures taken from the SIA 
roadmap for semiconductors, which many engineers from the industry believe 
to be quite a way off the mark. Unfortunately, the SIA roadmap is, for aca-
demics, the only resource available for the latest technology: partnership with 
a manufacturer thus becomes the only way to reasonably achieve better results 
in this field. Notwithstanding, the yield analysis framework presented provides 
researchers in the field of FPGA fault tolerance with a base to compare the 
yield benefits achieved by fault tolerance schemes. 
The yield analysis framework has been limited for the purpose of this study 
to the metal layers of FPGA devices. For a complete yield analysis, however, 
the logic layers would need to be considered as well. Modeling the logic layer 
of any electronic device can prove to be very difficult, due to the complexity of 
the structures present. This should however not discourage researchers in the 
field: the literature available on the subject and the numerous patents filed by 
electronics firms should enable researchers to formulate a reliable yield analysis 
model, and perhaps discover different requirements for tolerating faults in the 
logic layers of FPGA devices.
The work presented in this thesis is based on homogeneous FPGAs, which
are no longer the industry standard. While the general routing matrix in
heterogeneous FPGAs is consistent across the device, the local routing is likely
to be different for each hard IP core embedded in the fabric. There exists
therefore an opportunity for future research to expand fault tolerance to deal
with defects in the local routing of heterogeneous FPGAs.
But perhaps the most challenging area of research is in fault localization 
and swapping. Many issues remain unresolved in this area, most notably 
how to determine whether a faulty device can be repaired or not. Packaging is 
expensive, therefore it is beneficial to find out whether a device can be repaired 
before it is packaged. This is the field where the most improvements can be 
achieved. At present, no academic work has been carried out on Back End 
of Line (BEOL) testing. There exists therefore scope for innovative thinking. 
Current BEOL testing techniques are aimed at only identifying faulty devices, 
in the least possible amount of time. Determining whether the fault can be
repaired requires fault diagnosis, and in most cases very high precision is a
must. So how can a trade-off be found between the need to establish the nature
of a fault and the need to keep a die under test for as short a period of time as
possible?
The actual swapping procedure can itself be expensive; many memory re-
dundancy schemes are based on fuse blowing, which is both area and time 
consuming. This approach is unfeasible for high performance logic devices. 
The author believes that the approach presented in this thesis, based on re-
configuring the bitstream at download time, is the most viable method of 
substitution. It does however require modification to the bitstream controller 
and it introduces extra latency during the power-up sequence. These costs are
far outweighed by the benefits achieved as a result of using this approach.
However, in the fast-paced world of electronics every shred of performance is
crucial; future research should concentrate on ensuring that the users are
completely unaware of any possible defects on the device. This target is only
achievable if the fault avoidance is done completely by the manufacturer: this
would perhaps result in a more expensive procedure, but it is crucial that the
users are not affected in the slightest by fault repairing techniques.
8.3 The future of FPGA fault tolerance 
Some researchers and engineers are sceptical about the real need for fault 
tolerance in FPGA devices; their argument is often based on the fact that 
the efficiency of clean rooms is bound to improve and as a result so will the 
manufacturing process. Some argue that the yield problem has been branded 
as a real issue for a number of generations and, so far, it has proved not to be 
a problem. These arguments may be true: the limitations of the yield analysis 
framework presented in this thesis have been outlined very clearly. However it 
is unreasonable to argue that current yields are at an acceptable level. Wafers
comprising large FPGA dies often yield only one or two devices, especially
during the early stages of the product's development cycle. It is therefore 
important that work in the field of fault tolerance continues. Full scale yields 
will probably never be achieved, but large improvements are possible, and if a 
suitable fault tolerance scheme is developed, these enhancements could come 
at very low cost, benefiting the users as much as the manufacturers. 
Work in the field of fault tolerance is bound to continue, and it is likely that 
it will go beyond static catastrophic faults. Dynamic faults have been briefly 
mentioned in previous sections, however one area that the author believes will 
become extremely important is that of parametric variation. In Chapter 3 
parametric defects were introduced, and it was explained how the modeling 
of such defects is rather complex and goes beyond the scope of this thesis. 
However, this only referred to wafer-wide parametric defects, and an efficient 
solution to this problem is already present in the form of "speed binning". 
Recently, however, large dies have begun to exhibit within-die parametric
variations. This can seriously hinder a device's performance, as the device itself
will only be allowed to operate at a maximum clock speed restricted by the
slowest areas on the die.
These issues could be dealt with in programmable devices: it is reason-
able to think that a large FPGA device could be programmed to operate as 
a collection of separate and independent sub-systems, operating at different 
speeds. It could therefore be possible to place these sub-systems on the die in 
suitable device regions. Once again, the problem of device "uniqueness" could 
pose a real threat to any real-life application, but it is conceivable that
these entities could be placed on the die during, perhaps, the power-up
sequence. Other, more sophisticated solutions could perhaps exploit "dynamic
parallelism", where system elements could be designed at different levels of
parallelism, one of which will be chosen to ensure a constant throughput across
the device. The problem is real and has only come of age in the deep sub-micron
device era, therefore there exists scope for much research.
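The idea of placing independent sub-systems in suitable device regions can be sketched as a small matching problem. The example below is a hypothetical illustration rather than a method developed in this thesis: it assumes that each die region has a measured maximum clock frequency and that each sub-system occupies exactly one region. All names and numbers are invented for the example.

```python
from typing import Dict, Optional


def single_clock_limit(region_fmax_mhz: Dict[str, float]) -> float:
    """Clock limit when the whole die runs as one system: the slowest region."""
    return min(region_fmax_mhz.values())


def place_subsystems(
    region_fmax_mhz: Dict[str, float],
    subsystem_freq_mhz: Dict[str, float],
) -> Optional[Dict[str, str]]:
    """Assign each sub-system to its own region, fastest demand to fastest region.

    Returns a sub-system -> region mapping, or None if no valid one-to-one
    assignment exists under the stated assumptions.
    """
    if len(subsystem_freq_mhz) > len(region_fmax_mhz):
        return None
    regions = sorted(region_fmax_mhz.items(), key=lambda kv: kv[1], reverse=True)
    subsystems = sorted(subsystem_freq_mhz.items(), key=lambda kv: kv[1], reverse=True)
    placement: Dict[str, str] = {}
    for (name, required), (region, fmax) in zip(subsystems, regions):
        if required > fmax:
            return None  # this sub-system cannot meet timing in any free region
        placement[name] = region
    return placement


# Hypothetical within-die variation: three regions with different speeds.
regions = {"north": 300.0, "centre": 250.0, "south": 200.0}
subsystems = {"dsp": 280.0, "control": 150.0}

print("single-clock limit (MHz):", single_clock_limit(regions))
print("placement:", place_subsystems(regions, subsystems))
```

Run as a single system, the hypothetical die is limited by its slowest region; split into sub-systems, the fast block can be kept in the fast region. Matching demands and regions in sorted order is sufficient here only because each sub-system is assumed to need exactly one region.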
The field of FPGA fault tolerance is vast: it covers dynamic and static 
faults, catastrophic and parametric defects, interconnect and logic layers. This 
thesis represents the first comprehensive work to study yield analysis and fault 
tolerance limited to the FPGA interconnect layers. All the major aspects of 
this field of study have been examined, and it is hoped that the work will 
encourage more research in this underexplored field. 
Glossary and Acronyms 
ATE                Automatic Test Equipment
BIST               Built In Self Test
BUT                Block Under Test
CAD                Computer Aided Design
CLB                Configurable Logic Block
Connection Block   A configurable interconnect resource; it connects logic resources to routing wires
CPU                Central Processing Unit
Dynamic Fault      Circuit fault arising during the lifetime of the device; it can be repaired with a re-program.
ILA                Iterative Logic Array
IOB                Input/Output Block
IP                 Intellectual Property
FPGA               Field Programmable Gate Array
Fault Detection    Testing procedure aimed at determining whether a fault is present in the device
Fault Diagnosis    Testing procedure aimed at locating the fault and establishing the causes of failure
Hard IP            Synthesized IP captured in low level format that is specific to a manufacturing process
LUT                Look Up Table
ORA                Output Response Analyzer
RTR                Run-Time Reconfiguration
SoC                System-on-a-Chip. Complex hardware/software systems implemented in a single chip
Soft IP            Non-synthesized generic and platform independent source code
STAR               Self Testing Area
Static Fault       Circuit fault due to manufacturing imperfections; cannot be repaired, only avoided.
Switch matrix      A configurable interconnect resource; it connects routing wires within the general routing network
TPG                Test Pattern Generator
VHDL               VHSIC (Very High Speed Integrated Circuit) Hardware Description Language
VLSI               Very Large Scale Integration
WUT                Wire Under Test
List of Publications 
Published 
• Nicola Campregher, Peter Y. K. Cheung, Milan Vasilko: BIST Based
Interconnect Fault Location for FPGAs. FPL 2004: 322-332
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: Analysis of Yield Loss due to Random Photolithographic
Defects in the Interconnect Structure of FPGAs. FPGA 2005: 138-148
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: Yield Modelling and Yield Enhancement for FPGAs using
Fault Tolerance Schemes. FPL 2005: 409-414
• Nicola Campregher: FPGA Interconnect Fault Tolerance. FPL 2005:
725-726
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: Yield Enhancements of Design-Specific FPGAs. FPGA
2006: 93-100
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: Reconfiguration and Fine-Grained Redundancy for Fault
Tolerance in FPGAs. To appear in: FPL 2006.
Submitted for Publication
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: Yield Analysis of Field Programmable Gate Array Metal
Layers. Submitted to: Transactions on Computer-Aided Design.
• Nicola Campregher, Peter Y. K. Cheung, George A. Constantinides,
Milan Vasilko: FPGA Interconnect Fault Tolerance via Reconfiguration
and Fine-Grain Redundancy. Submitted to: Transactions on Circuits and
Systems.
Bibliography 
[AARCS91] K. Al-Ayat, R.Chan, C.L. Chan, and T. Spreed. Array archi-
tecture for ATPG with 100% fault coverage. In Defect, Fault 
Tolerance VLSI Systems, pages 213-226, 1991. 
[ABF90] 	Miron Abramovici, Melvin A. Breuer, and Arthur D. Friedman. 
Digital systems testing and testable design. IEEE Press, Piscat-
away, NJ, 1990. 
[Act05] 	Actel Corp. ProAsic+ Datasheet. 2005. 
[AES01] 	M. Abramovici, J. M. Emmert, and C. E. Stroud. Roving 
stars: An integrated approach to on-line testing, diagnosis, and 
fault tolerance for FPGAs in adaptive computing systems. In 
NASA/DoD workshop on evolvable hardware, pages 73-92, Long 
Beach, CA, 2001. 
[Alta] 	Altera Corp. http://www.altera.com. 
[Altb] 	Altera Corp. Press release. Available at
"http://www.altera.com/corporate/news_room/releases/releases_archive/2000/pr_redundancy.html".
[Alt03] 	Altera Corp. Stratix Datasheet. 2003.
[Alt04] 	Altera Corp. Stratix 2 Datasheet. 2004.
[Alt05] 	Altera Corp. Hardcopy 2 Datasheet. 2005. 
[APS02] 	A. Antola, V. Piuri, and M. Sami. On-line diagnosis and re-
configuration of FPGA systems. In International workshop 
on electronic design, test and applications, pages 291-296, 
Christchurch, New Zealand, 2002. 
[AS01] 	M. Abramovici and C. E. Stroud. BIST-based test and diagnosis
of FPGA logic blocks. IEEE Transactions on Very Large Scale 
Integration Systems, 9(1):159-172, 2001. 
[AW97] 	G.A. Allan and A.J. Walton. Automated redundant via place-
ment for increased yield and reliability. Proceedings of the SPIE 
- The International Society for Optical Engineering Microelec-
tronic Manufacturing Yield, Reliability, and Failure Analysis III, 
1-2 Oct. 1997, 3216:114-25, 1997. 
[AWH92] 	G.A. Allan, A.J. Walton, and R.J. Holwill. A yield improve-
ment technique for IC layout using local design rules. IEEE 
Transactions on Computer-Aided Design of Integrated Circuits 
and Systems, 11(11):1355-62, 1992. 
[Bet98] 	V. Betz. Architecture and CAD for Speed and Area Optimization 
of FPGAs. PhD Thesis, University of Toronto, 1998. 
[BFV92] 	S. D. Brown, R.J. Francis, and Z. G. Vranesic. Field Programmable
Gate Arrays. Kluwer Academic Publishers, Norwell, MA, 1992.
[BHKW00] T. Bartzick, M. Henze, J. Kickler, and K. Woska. Design of a
fault tolerant FPGA. In Proceedings of FPL 2000. 10th Inter-
national Conference on Field Programmable Logic and Applica-
tions, 27-30 Aug. 2000, pages 151-156, Villach, Austria, 2000. 
Springer-Verlag. 
[BM96] 	C. Bamji and E. Malavasi. Enhanced network flow algorithm for 
yield optimization. In Proceedings of 33rd Design Automation 
Conference, 3-7 June 1996, pages 746-51, Las Vegas, NV, USA, 
1996. 
[BR97] 	V. Betz and J. Rose. VPR: a new packing, placement and rout-
ing tool for FPGA research. In Field-programmable Logic and 
Applications. 7th International Workshop, FPL '97. Proceedings, 
pages 213-222, 1997.
[CAC+97] 	W.B. Culbertson, R. Amerson, R.J. Carter, P. Kuekes, and 
G. Snider. Defect tolerance on the Teramac custom computer. 
In Proceedings of The 5th Annual IEEE Symposium on Field-
Programmable Custom Computing Machines, 16-18 April 1997, 
pages 116-23, Napa Valley, CA, USA, 1997. IEEE Comput. Soc. 
[CCCV05a] N. Campregher, P.Y.K. Cheung, G.A. Constantinides, and 
M. Vasilko. Analysis of Yield Loss due to Random Pho-
tolithographic Defects in the Interconnect Structure of FP-
GAs. In Thirteenth ACM International Symposium on Field-
Programmable Gate Arrays, Monterey, CA, 2005. 
[CCCV05b] N. Campregher, P.Y.K. Cheung, G.A. Constantinides, and 
M. Vasilko. Yield modelling and yield enhancement for FP-
GAs using fault tolerance schemes. In Proceedings of FPL 2005. 
15th International Conference on Field Programmable Logic and 
Applications, pages 409-414, Tampere, Finland, 2005. 
[CCCV06a] N. Campregher, P. Y. K. Cheung, G.A. Constantinides, and 
M. Vasilko. Reconfiguration and Fine-Grained Redundancy for 
Fault Tolerance in FPGAs. To appear in Proceedings of Field-
Programmable Logic and Applications, 2006. 
[CCCV06b] N. Campregher, P.Y.K. Cheung, G.A. Constantinides, and
M. Vasilko. Yield Enhancements of Design-Specific FPGA. 
In Fourteenth ACM International Symposium on Field- 
Programmable Gate Arrays, Monterey, CA, 2006. 
[CCV04] 	N. Campregher, P. Y. K. Cheung, and M. Vasilko. BIST based 
interconnect fault location for FPGAs. In Proceedings, Field-
Programmable Logic and Applications, pages 322-332, 2004. 
[Cha96] 	G. Dufort B. Chapman. Making defect avoidance nearly invisible 
to the user in wafer scale field programmable gate arrays. In 
Defect and fault tolerance in VLSI systems, pages 11-20. 
[CK92] 	V.K.R. Chiluvuri and I. Koren. New routing and compaction 
strategies for yield enhancement. In Proceedings 1992 IEEE In-
ternational Workshop on Defect and Fault Tolerance in VLSI 
Systems, 4-6 Nov. 1992, pages 325-34, Dallas, TX, USA, 1992. 
IEEE Comput. Soc. Press. 
[CK95a] 	Zhan Chen and I. Koren. Layer assignment for yield enhance-
ment. In Proceedings of International Workshop on Defect and 
Fault Tolerance in VLSI, 13-15 Nov. 1995, Proceedings the 
IEEE International Workshop on Defect and Fault Tolerance in 
VLSI Systems, pages 173-80, Lafayette, LA, USA, 1995. IEEE 
Comput. Soc. Press. 
[CK95b] 	V.K.R. Chiluvuri and I. Koren. Layout-synthesis techniques for 
yield enhancement. IEEE Transactions on Semiconductor Man- 
ufacturing, 8(2):178-87, 1995. 
[CK96] 	V. K. Chiluvuri and I. Koren. Wire length and via reduction for 
yield enhancement. In Microelectronic Manufacturing Yield, Re-
liability, and Failure Analysis II, Oct 16-17 1996, volume 2874, 
pages 103-111, Austin, TX, USA, 1996. 
[CLL+] 	Micheal Chan, Paul Leventis, David Lewis, Ketan Zaveri,
Hyun Mo Yi, and Chris Lane. Redundancy structures and methods
in a programmable logic device, US patent application no.
US2005/0264318 A1. Altera Corporation, Dec. 1, 2005.
[CO93] 	G. Cheek and G. O'Donoghue. Yield models in a design for
manufacturability environment: a bibliography. In Semicon-
ductor Manufacturing Science Symposium, 1993. ISMSS 1993., 
IEEE/SEMI International, pages 133-135, 1993. 
[DI99a] 	A. Doumar and H. Ito. An automatic testing and diagnosis for 
FPGAs. In Pacific Rim international symposium on dependable 
computing, pages 45-52, Hong Kong, China, 1999. IEEE Com-
puter Society. 
[DI99b] 	A. Doumar and H. Ito. Testing the logic cells and interconnect 
resources for FPGAs. In Asian test symposium, pages 369-374, 
Shanghai, China, 1999. IEEE.
[DI03] 	A. Doumar and H. Ito. Detecting, diagnosing, and tolerating
faults in SRAM-based Field Programmable Gate Arrays: A sur-
vey. IEEE Transactions on Very Large Scale Integration Sys-
tems, 11(3):386-405, 2003. 
[DKI99] 	A. Doumar, S. Kaneko, and H. Ito. Defect and fault tolerance 
FPGAs by shifting the configuration data. In Defect and fault 
tolerance in VLSI systems, pages 377-385, Albuquerque, NM,
1999. IEEE Computer Society. 
[dLKNH+04] F.G. de Lima Kastensmidt, G. Neuberger, R.F. Hentschke, 
L. Carro, and R. Reis. Designing fault-tolerant techniques for 
SRAM-based FPGAs. Design & Test of Computers, IEEE, 
21(6):552-562, 2004. 
[DP94] 	S. Durand and C. Piguet. FPGA with self repair capabilities. 
In ACM Second International Workshop on Field-Programmable 
Gate Arrays, pages 1-6, 1994. 
[DST99] 	S. Dutt, V. Shanmugavel, and S. Trimberger. Efficient incre-
mental rerouting for fault reconfiguration in field programmable 
gate arrays. In 1999 IEEE/ACM International Conference on 
Computer-Aided Design. Digest of Technical Papers, 7-11 Nov. 
1999, pages 173-6, San Jose, CA, USA, 1999. IEEE. 
[DU90] 	M. Demjanenko and S.J. Upadhyaya. Yield enhancement of 
field programmable logic arrays by inherent component redun-
dancy. IEEE Transactions on Computer-Aided Design of Inte-
grated Circuits and Systems, 9(8):876-84, 1990. 
[EB97] J. M. Emmert and D. Bhatia. Partial reconfiguration of FPGA 
mapped designs with applications to fault tolerance and yield 
enhancement. In Field Programmable Logic and its Application, 
Lecture Notes in Computer Science, vol.1304, pages 141-150, 
1997. 
[EB98] J.M. Emmert and D. Bhatia. Incremental routing in FPGAs. 
In ASIC Conference 1998. Proceedings. Eleventh Annual IEEE 
International, pages 217-221, 1998. 
[EC01] J.M. Emmert and J.A. Cheatham. On-line incremental routing 
for interconnect fault tolerance in FPGAs minus the router. In 
Proceedings 2001 IEEE International Symposium on Defect and 
Fault Tolerance in VLSI Systems, 24-26 Oct. 2001, pages 149-
57, San Francisco, CA, USA, 2001. IEEE Comput. Soc. 
[Elm48] 	W.C. Elmore. The transient response of damped linear networks
with particular regard to wideband amplifiers. Journal of Applied
Physics, 19:55-63, 1948.
[ESSA00] 	J. Emmert, C. Stroud, B. Skaggs, and M. Abramovici. Dy-
namic fault tolerance in FPGAs via partial reconfiguration. In 
Proceedings 2000 IEEE Symposium on Field-Programmable Cus-
tom Computing Machines, 17-19 April 2000, pages 165-74, Napa 
Valley, CA, USA, 2000. 
[FCF90] 	Sung-Chuan Fang, Kuo-En Chang, and Wu-Shiung Feng. Via 
minimization with associated constraints in three-layer routing 
problem. In 1990 IEEE International Symposium on Circuits 
and Systems, 1-3 May 1990, pages 1632-5, New Orleans, LA, 
USA, 1990. 
[FP85] 	A.V. Ferris-Prabhu. Modeling the critical area in yield forecasts. 
Solid-State Circuits, IEEE Journal of, 20(4):874-878, 1985. 
[FP92a] 	Albert V. Ferris-Prabhu. Introduction to semiconductor device 
yield modeling. Artech House materials science library. Artech 
House, Boston, 1992. 
[FP92b] 	A.V. Ferris-Prabhu. On the assumptions contained in semicon-
ductor yield models. Computer-Aided Design of Integrated Cir-
cuits and Systems, IEEE Transactions on, 11(8):966-975, 1992. 
[Gar79] 	R. Garg. Characteristics of coupled microstriplines. IEEE Trans-
actions on Microwave Theory and Techniques, MTT-27(7):700-
5, 1979. 
[GW99] 	Y. Gao and D.F. Wong. Optimal wire shape with consideration
of coupling capacitance under Elmore delay model. In Proceedings
of the ASP-DAC '99, Asia and South Pacific Design
Automation Conference, pages 217-220 vol.1, 1999.
[HC90] 	N. Hastie and R. Cliff. The implementation of hardware sub-
routines on field programmable gate arrays. In Proceedings of 
the IEEE Custom Integrated Circuits Conference, 1990., pages 
31.4/1-31.4/4, 1990. 
[HD96] 	F. Hanchek and S. Dutt. Node-covering based defect and fault
tolerance methods for increased yield in FPGAs. In VLSI design,
pages 225-229, Bangalore, India, 1996. IEEE Computer Society
Press. 
[HD98] 	F. Hanchek and S. Dutt. Methodologies for tolerating cell and 
interconnect faults in FPGAs. IEEE Transactions on Computers 
C, 47(1):15-33, 1998. 
[HL96] 	W. K. Huang and F. Lombardi. An approach for testing
programmable/configurable Field Programmable Gate Arrays. In
VLSI test symposium, pages 450-455, Princeton, NJ, 1996. IEEE
Computer Society Press.
[HM01] 	W.J. Huang and E.J. McCluskey. Column-based precompiled
configuration techniques for FPGA fault tolerance. In Proceed- 
ings of 9th Annual IEEE Symposium on Field-Programmable 
Custom Computing Machines, 2001. 
[HMCL98] W. K. Huang, F. J. Meyer, X. T. Chen, and F. Lombardi. Testing 
configurable LUT-based FPGA's. IEEE Transactions on Very 
Large Scale Integration Systems, 6(2):276-283, 1998. 
[HSN+93] 	F. Hatori, T. Sakurai, K. Nogami, K. Sawada, M. Takahashi, 
M. Ichida, M. Uchida, I. Yoshii, Y. Kawahara, T. Hibi, Y. Saeki, 
H. Muroga, A. Tanaka, and K. Kanzaki. Introducing redundancy 
in field programmable gate arrays. In Custom Integrated Circuits 
Conference, 1993., Proceedings of the IEEE 1993, pages 7.1.1-
7.1.4, 1993. 
[HT00] 	I. G. Harris and R. Tessier. Interconnect testing in cluster-based 
FPGA architectures. In Design Automation Conference, num-
ber 37, pages 49-54, 2000. 
[HT02] 	I.G. Harris and R. Tessier. Testing and diagnosis of interconnect 
faults in cluster-based FPGA architectures. Computer-Aided De-
sign of Integrated Circuits and Systems, IEEE Transactions on, 
21(11):1337-1343, 2002. 
[HTA94] 	N.J. Howard, A.M. Tyrrell, and N.M. Allinson. The yield en-
hancement of field-programmable gate arrays. IEEE Transac- 
tions on Very Large Scale Integration (VLSI) Systems, 2(1):115-
23, 1994. 
[HW05] 	Z. Hyder and J. Wawrzynek. Defect tolerance in multiple-FPGA
systems. In Field Programmable Logic and Applications, 2005. 
International Conference on, pages 247-254, 2005. 
[HXJ95] 	E. P. Huijbregts, H. Xue, and J.A.G. Jess. Routing for reliable 
manufacturing. IEEE Transactions on Semiconductor Manufac-
turing, 8(2):188-194, 1995. 
[IFM+95] 	T. Inoue, H. Fujiwara, H. Michinishi, T. Yokohira, and 
T. Okamoto. Universal test complexity of Field-Programmable 
Gate Arrays. In 4th Asian test symposium, pages 119-125, Ban-
galore, India, 1995. IEEE. 
[IMF97] 	T. Inoue, S. Miyazaki, and H. Fujiwara. On the complexity of 
universal fault diagnosis for look-up table FPGAs. In Asian test 
symposium, pages 195-200, Akita, Japan, 1997. IEEE. 
[IMF98] 	T. Inoue, S. Miyazaki, and H. Fujiwara. Universal fault diagnosis 
for lookup table FPGAs. IEEE Design and Test of Computers, 
15(1):39-44, 1998. 
[KDFJ89] 	Vijay Koumar, Anton Dahbura, Fred Fischer, and Patrick Juola. 
An approach for the yield enhancement of programmable gate
arrays. In IEEE International Conference on Computer Aided 
Design, pages 226-229, 1989. 
[KI94a] 	Jason L Kelly and Peter A Ivey. A novel approach to defect tol-
erant design for SRAM based FPGAs. In ACM Second Interna-
tional Workshop on Field-Programmable Gate Arrays, Monterey, 
1994. 
[KI94b] 	J.L. Kelly and P.A. Ivey. Defect tolerant SRAM based FPGAs. 
In Proceedings 1994 IEEE International Conference on Com-
puter Design: VLSI in Computers and Processors, 10-12 Oct. 
1994, pages 479-82, 1994. 
[KRH00] 	A. Keshavarzi, K. Roy, and C.F. Hawkins. Intrinsic leakage
in deep submicron CMOS ICs - measurement-based test solutions.
Very Large Scale Integration (VLSI) Systems, IEEE Transac-
tions on, 8(6):717-723, 2000. 
[Kuo] 	S.-Y. Kuo. YOR: a yield-optimizing routing algorithm by minimizing
critical areas and vias. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 12(9):1303-11.
[Lat06a] 	Lattice Semi. EC/ECP Datasheet. 2006. 
[Lat06b] 	Lattice Semi. XP Datasheet. 2006.
[LLD03] 	J. Lohn, G. Larchev, and R. DeMara. Evolutionary fault re-
covery in a Virtex FPGA using a representation that incorpo- 
rates routing. In International Parallel and Distributed Process-
ing Symposium (IPDPS 2003), 22-26 April 2003, Nice, France, 
2003. 
[LMSP98a] J. Lach, W. H. Mangione-Smith, and M. Potkonjak. Low over-
head fault-tolerant FPGA systems. IEEE Transactions on Very 
Large Scale Integration Systems, 6(2):212-221, 1998. 
[LMSP98b] J. Lach, W.H. Mangione-Smith, and M. Potkonjak. Efficiently 
supporting fault-tolerance in FPGAs. In Proceedings of FPGA 
98. 1998 International Symposium on Field Programmable Gate 
Arrays, pages 105-15, Monterey, CA, USA, 1998. 
[LMSP99] 	J. Lach, W. Mangione-Smith, and M. Potkonjak. Algorithms for 
efficient runtime fault recovery on diverse FPGA architectures. 
In Defect and fault tolerance in VLSI systems, pages 386-394, 
Albuquerque, NM, 1999. IEEE Computer Society.
[LS03] 	J. Liu and S. Simmons. BIST-diagnosis of interconnect fault
locations in FPGAs. In Canadian Conference on Electrical and
Computer Engineering, 2003., volume 1, pages 207-210, 2003.
[LT00] 	V. Lakamraju and R. Tessier. Tolerating operational faults in
cluster-based FPGAs. In Field programmable gate arrays; FPGA
'00 ACM/SIGDA, pages 187-194, Monterey, CA, 2000. ACM.
[Mak02] 	T. Makimoto. The hot decade of field programmable technologies.
2002 IEEE International Conference on Field-Programmable
Technology (FPT), pages 3-6, 2002.
[MCZL97] 	F.J. Meyer, Xiaotao Chen, J. Zhao, and F. Lombardi. Fault 
tolerance of one-time programmable FPGAs with faulty routing 
resources. In 1997 Proceedings Second Annual IEEE Interna-
tional Conference on Innovative Systems in Silicon, 8-10 Oct. 
1997, pages 155-64, Austin, TX, USA, 1997. 
[MD99] 	N.R. Mahapatra and S. Dutt. Efficient network-flow based tech-
niques for dynamic fault reconfiguration in FPGAs. In Pro-
ceedings of the 29th Annual International Symposium on Fault-
Tolerant Computing, 15-18 June 1999, pages 122-9, Madison, 
WI, USA, 1999. 
[Mis04] 	Mahim Mishra. Scalable defect tolerance beyond the SIA 
roadmap. In Field Programmable Logic and its Applications, Lecture 
Notes in Computer Science vol. 3203, 2004. 
[ML96] 	A. Mathur and C.L. Liu. Timing driven placement reconfig-
uration for fault tolerance and yield enhancement in FPGAs. 
In Proceedings of European Design and Test Conference, 11-14 
March 1996, pages 165-9, Paris, France, 1996. 
[MMR99] 	R. Mangaser, C. Mark, and K. Rose. Interconnect constraints 
on BEOL manufacturing. In Advanced Semiconductor Manu-
facturing Conference and Workshop, 1999 IEEE/SEMI, pages 
304-308, 1999. 
[MSS+96] 	G.A. Mojoli, D. Salvi, M.G. Sami, G.R. Sechi, and R. Stefanelli. 
KITE: a behavioural approach to fault-tolerance in FPGA-based 
systems. In Proceedings. 1996 IEEE International Symposium 
on Defect and Fault Tolerance in VLSI Systems, pages 327-34, 
1996. 
[Mur64] 	B.T. Murphy. Cost-size optima of monolithic integrated circuits. 
Proceedings of the IEEE, 52(12):1537-1545, 1964. 
[Mye61] 	J. Myer. A survey of semiconductor materials technology. IRE 
Transactions on Component Parts, 8(2):65-69, 1961. 
[MYO+99] H. Michinishi, T. Yokohira, T. Okamoto, T. Inoue, and H. Fujiwara. 
Testing for the programming circuit of SRAM-based FPGAs. 
IEICE Transactions on Information and Systems E Series 
D, 82(6):1051-1057, 1999. 
[MYOI96] 	H. Michinishi, T. Yokohira, T. Okamoto, and T. Inoue. A test 
methodology for interconnect structures of LUT-based FPGAs. 
In Asian test symposium, pages 68-74, Hsinchu, Taiwan, 1996. 
[NNJ02] 	M.Y. Niamat, R. Nambiar, and M.M. Jamali. A BIST scheme 
for testing the interconnects of SRAM-based FPGAs. In The 
45th Midwest Symposium on Circuits and Systems, volume 2, 
pages II-41-4, 2002. 
[NNRD94] J. Narasimhan, K. Nakajima, C.S. Rim, and A.T. Dahbura. 
Yield enhancement of programmable ASIC arrays by reconfigu-
ration of circuit placements. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, 13(8):976-86, 
1994. 
[NZJ] 	Triet Nguyen, Changsong Zhang, and David Jefferson. Line Segmentation 
in Programmable Logic Devices having Redundancy 
Circuitry, US patent no. 6759817 B2. Altera Corporation, issued 
July 6, 2004. 
[P40C] 	7 GHz Pentium - How much can a processor take? Available at 
"http://news.softpedia.com/news/7GHz-Pentium-6202.shtml" 
[Pos58] 	A. Postle. Problems in manufacturing component parts for au-
tomation. Production Techniques, IRE Transactions on, 3(1):9-
10, 1958. 
[RA04] 	M. Rencher and G. Allan. What's yield got to do with IC design? 
Available at 
"http://i.cmpnet.com/eedesign/2003/inside_eedesign7.pdf". 
[Raj95] 	Rochit Rajsuman. IDDQ testing for CMOS VLSI. Artech House 
optoelectronics library. Artech House, Boston, 1995. 
[RFZ97] 	M. Renovell, J. Figueras, and Y. Zorian. Test of RAM-based 
FPGA: methodology and application to the interconnect. In 
15th IEEE VLSI Test Symposium, 1997, pages 230-237, 1997. 
[RMLP] 	Srinivas T. Reddy, Manuel Mejia, Andy L. Lee, and Bruce B. 
Pedersen. Programmable logic device with redundancy circuit, 
US patent no. 6344755 B1, issued February 5, 2002. 
[RN95] 	K. Roy and S. Nag. On routability for FPGAs under faulty 
conditions. IEEE Transactions on Computers, 44(11):1296-1305, 
1995. 
[RPFZ97] 	M. Renovell, J. M. Portal, J. Figueras, and Y. Zorian. Test 
pattern and test configuration generation methodology for the 
logic of RAM-based FPGA. In Asian test symposium, pages 
254-261, Akita, Japan, 1997. IEEE. 
[RPFZ98] 	M. Renovell, J. M. Portal, J. Figueras, and Y. Zorian. Testing 
the local interconnect resources of SRAM-based FPGA's. In 
Asian test symposium, pages 513-520, Singapore, 1998. 
[RPFZ99] 	M. Renovell, J. Portal, J. Figueras, and Y. Zorian. Minimizing 
the number of test configurations for different FPGA families. 
In Asian test symposium, pages 363-368, Shanghai, China, 1999. 
IEEE. 
[See68] 	R.B. Seeds. Yield and cost analysis of bipolar LSI. IEEE Trans-
actions on Electron Devices, 15(6):409, 1968. 
[Sem04] 	Semiconductor Industry Association. The International Technology 
Roadmap for Semiconductors, 2004. 
[SG01] 	P. Sundararajan and S. A. Guccione. Run-time defect toler-
ance using JBits. In Field programmable gate arrays; FPGA 
'01, ACM/SIGDA, pages 193-200, Monterey, CA, 2001. ACM. 
[Sim01] 	P.L.C. Simon. Yield Modeling for Deep Sub-Micron IC Design. 
PhD thesis, University of Eindhoven, 2001. 
[SISM99] 	H. Sato, M. Ikota, A. Sugimoto, and H. Masuda. A new defect 
distribution metrology with a consistent discrete exponential for-
mula and its applications. IEEE Transactions on Semiconductor 
Manufacturing, 12(4):409-418, 1999. 
[SKCA96] 	C. Stroud, S. Konala, P. Chen, and M. Abramovici. Built-In Self-
Test of Logic Blocks in FPGAs (Finally, A Free Lunch: BIST 
Without Overhead!). In VLSI test symposium, pages 387-392, 
1996. 
[SLA97] 	C. Stroud, E. Lee, and M. Abramovici. BIST-Based Diagnostics 
of FPGA Logic Blocks. In International test conference, pages 
539-547, Washington, DC, 1997. 
[SLKA96] 	C. Stroud, E. Lee, S. Konala, and M. Abramovici. Using ILA 
Testing for BIST in FPGAs. In International Test Conference, 
pages 68-75, 1996. 
[SMSP98] 	N. R. Shnidman, W. H. Mangione-Smith, and M. Potkonjak. 
On-line fault detection for bus-based Field Programmable Gate 
Arrays. IEEE Transactions on Very Large Scale Integration Sys-
tems, 6(4):656-666, 1998. 
[SS98] 	L. A. Shombert and J. W. Sheppard. A behavior model for 
next generation test systems. Journal of Electronic Testing, 
13(3):299-314, 1998. 
[Sta83] 	C.H. Stapper. Modeling of integrated circuit defect sensitivities. 
IBM Journal of Research and Development, 27:549-557, 1983. 
[Sta91] 	C.H. Stapper. On Murphy's yield integral. IEEE Transactions 
on Semiconductor Manufacturing, 4(4):294-297, 1991. 
[SWHA98] C. Stroud, S. Wijesuriya, C. Hamilton, and M. Abramovici. 
Built-in self-test of FPGA interconnect. International Test Con-
ference, pages 404-411, 1998. 
[SXXT01] 	X. Sun, S. Xu, J. Xum, and P. Trouborst. Design and imple-
mentation of a parity-based BIST scheme for FPGA global in-
terconnects. In CCECE, 2001. 
[WT04] 	Kun-Cheng Wu and Yu-Wen Tsai. Structured ASIC, evolution 
or revolution? In Proceedings of the International Symposium 
on Physical Design, pages 103-106, 2004. 
[Xila] 	Xilinx Inc. Aerospace and Defense Design Challenges. Available at 
http://www.xilinx.com/products/silicon_solutions/market_specific_devices/aero_def/index.htm. 
[Xilb] 	Xilinx Inc. http://www.xilinx.com. 
[Xilc] 	Xilinx Inc. Xilinx Chips Land on Mars. Available at 
"http://www.xilinx.com/publications/xcellonline/xcell_50/xc_mars50.htm". 
[Xil03] 	Xilinx Inc. The Reliability Report. Sep. 2003. 
[Xil04a] 	Xilinx Inc. Virtex II Datasheet. 2004. 
[Xil04b] 	Xilinx Inc. Virtex II Pro Datasheet. 2004. 
[Xil05a] 	Xilinx Inc. EasyPath Devices Datasheet. 2005. 
[Xil05b] 	Xilinx Inc. Virtex 4 Datasheet. 2005. 
[Xil06] 	Xilinx Inc. Virtex 5 Datasheet. 2006. 
[Yan91] 	S. Yang. Logic synthesis and optimization benchmarks, version 
3.0. Microelectronics Centre of North Carolina, 1991. 
[YL05] 	A.J. Yu and G.G.F. Lemieux. Defect tolerant FPGA switch 
block and connection block with fine-grain redundancy for 
yield enhancement. In Proceedings of FPL 2005. 15th Inter-
national Conference on Field Programmable Logic and Applica-
tions, pages 255-262, Tampere, Finland, 2005. 
[YM01] 	Shu-Yi Yu and E.J. McCluskey. Permanent fault repair for FP-
GAs with limited redundant area. In Proceedings 2001 IEEE In-
ternational Symposium on Defect and Fault Tolerance in VLSI 
Systems, pages 125-33, San Francisco, CA, USA, 2001. 
[ZWL98a] 	L. Zhao, D. Walker, and F. Lombardi. Detection of bridging 
faults in logic resources of configurable FPGAs using IDDQ. In 
International Test Conference, pages 1037-1046, 1998. 
[ZWL98b] 	L. Zhao, D. M. H. Walker, and F. Lombardi. IDDQ test-
ing of bridging faults in logic resources of reconfigurable field 
programmable gate arrays. IEEE Transactions on Computers, 
47(10):1136-1152, 1998. 
[ZWL99] 	L. Zhao, D. Walker, and F. Lombardi. IDDQ testing of in-
put/output resources of SRAM-based FPGAs. In Asian test 
symposium, pages 375-382, Shanghai, China, 1999. IEEE. 
