Self-organising techniques for tolerating faults in 2-dimensional processor arrays by Evans, Richard Anthony
A Thesis Submitted for the Degree of PhD at the University of Warwick 
 
Permanent WRAP URL: 
http://wrap.warwick.ac.uk/109958  
 
THE BRITISH LIBRARY DOCUMENT SUPPLY CENTRE
TITLE: Self-organising Techniques for Tolerating Faults in 2-Dimensional Processor Arrays
AUTHOR: Richard Anthony Evans
INSTITUTION and DATE: University of Warwick
Attention is drawn to the fact that the copyright of 
this thesis rests with its author.
This copy of the thesis has been supplied on condition 
that anyone who consults it is understood to recognise 
that its copyright rests with its author and that no 
information derived from it may be published without 
the author’s prior written consent.
THE BRITISH LIBRARY
DOCUMENT SUPPLY CENTRE
Boston Spa, Wetherby
West Yorkshire
United Kingdom

Self-organising Techniques 
for Tolerating Faults in 
2-Dimensional Processor Arrays
by
Richard Anthony Evans 
BSc(Eng), ACGI, CEng, MIEE
PhD Thesis
Submitted to:
Department of Computer Science 
University of Warwick
From research carried out at:
The Royal Signals and Radar Establishment 
Malvern, Worcestershire
October 1988


Contents
List of Figures
List of Tables
Summary
Acknowledgements
1 Introduction, Objectives and Overview
1.1 Introduction
1.2 Thesis Objectives
1.3 Overview of the Thesis
2 The Evolution of Parallel Processing
2.1 The von Neumann Model of Computation
2.1.1 Technological Advance
2.1.2 The Need for High Performance Computers
2.1.3 Changing Attitudes to Computer Design
2.2 The Introduction of Parallelism
2.2.1 Classification of Computer Architectures
2.2.2 Pipelined Parallel Architectures
2.3 The Need for Fault Tolerance
2.3.1 Reliability Improvement
2.4 Wafer Scale Integration
2.5 Types of Fault Tolerance
2.5.1 Time Redundancy
2.5.2 Hardware Redundancy
2.5.3 Algorithmic Fault Tolerance
2.6 Scope of the Project
3 Fault Distributions in Integrated Circuits
3.1 Introduction
3.2 Types of Defect in Integrated Circuits
3.3 Random Point Defects
3.3.1 Causes of Point Defects
3.3.2 Clustering of Point Defects
3.3.3 Radial Variations in Point Defect Distributions
3.4 Modelling Integrated Circuit Yield
3.4.1 Poisson Distribution
3.4.2 Compound Poisson Statistics
3.4.3 Generalised Negative Binomial Statistics
3.5 Implications for Fault Tolerant Techniques
4 A Review of Hardware Fault-Tolerance for Processor Arrays
4.1 Introduction
4.2 Classification of hardware fault tolerance schemes
4.3 Switch Organisation and Configuration Schemes
4.3.1 Nodal Fault Tolerance
4.3.2 Row or Column Replacement Schemes
4.3.3 Hierarchical Fault Tolerance
4.3.4 Row Generation Schemes
4.3.5 Global Organisation
4.4 Switch Implementations
4.4.1 Hard Configurable Schemes
4.4.2 Firm Configurable Switching Schemes
4.4.3 Soft-Configurable Switching Schemes
4.4.4 Vote-Configurable Switching Schemes
4.4.5 Self-Organising Switching Schemes
4.5 WSI Demonstrators
4.5.1 Trilogy
4.5.2 Anamartic and the Solid State Disk Memory
4.5.3 MIT and Lincoln Laboratory
4.5.4 GTE Laboratory
4.6 Conclusions
5 Self-Organising Algorithms for Two-Dimensional Processor Arrays
5.1 Introduction
5.2 Definitions
5.3 Algorithm 1: WINNER in One Dimension
5.3.1 Self Organisation
5.3.2 Control Circuitry
5.3.3 Array Boundary Conditions
5.3.4 Interaction of REQuest and AVAILability Signals
5.3.5 Serial Description of WINNER Operation
5.3.6 5-Neighbour WINNER algorithm
5.4 Algorithm 2: WINNER In Two Dimensions
5.4.1 Double Site Condition
5.4.2 Crossovers
5.5 Advantages of the WINNER Approach
5.6 Concluding Remarks
6 Performance of the WINNER Algorithm
6.1 Introduction
6.2 Simulation
6.3 Choice of Language
6.4 Simulation Requirements
6.4.1 Program Parameters
6.4.2 Program Flow Chart
6.4.3 Square Arrays
6.5 WINNER Simulation Results
6.5.1 Graphical Presentation of WINNER Results
6.6 Improving WINNER Performance
6.6.1 5-Neighbour WINNER Algorithm
6.6.2 Array Partitioning
6.7 Comparison with other Algorithms
6.7.1 Bounds on Performance
6.7.2 Algorithm Performance Comparisons
6.8 Conclusions
7 Hardware Implementation of the WINNER algorithm
7.1 Introduction
7.2 Hardware Requirements of WINNER Control Circuitry
7.2.1 Gate-Level WINNER Control Circuitry
7.2.2 Transistor Complexity of Control Circuitry
7.3 Input/Output Interface Circuitry
7.3.1 Selecting Functional Rows
7.3.2 Data Input Circuitry
7.3.3 Data Output Circuitry
7.4 Column Interconnection Circuitry in Partitioned Arrays
7.5 Simulation of the WINNER Hardware
7.5.1 The ELLA Hardware Description Language
7.5.2 Simulations using ELLA
7.6 Hardware for other Configuring Techniques
7.6.1 Moore and Mahat’s Scheme
7.6.2 Sami and Stefanelli’s Scheme
7.6.3 Comparison of Hardware Requirements
8 Self-Testing of Self-Organising Arrays
8.1 Introduction
8.2 Design for Testability
8.2.1 Scan path techniques
8.2.2 Level Sensitive Scan Design - LSSD
8.3 Self-test techniques
8.3.1 Linear Feedback Shift Registers
8.3.2 Compression of test results
8.3.3 Compression by counting
8.3.4 Compression by Recursive Compaction
8.3.5 Compaction of Multiple Input Streams
8.4 Self testing requirements in WINNER
8.4.1 Signature Comparison
9 Reducing Control Circuit Vulnerability
9.1 Introduction
9.2 The Ideal Self-Organising Array
9.3 Inherent Fault-Tolerance of the Control Circuit
9.4 Dual-Rail Implementation of Control Circuitry
9.4.1 Fault Detection by Simple Duplication
9.4.2 Fault Detection using True and Complement Circuits
9.5 Application of Duplicated Circuits to WINNER
9.5.1 Performance of Duplicated Control Circuitry
9.6 Hardware Requirements for Two-rail Implementation
9.7 External Testing of the Control Circuitry
9.7.1 Control Circuit Testing Strategy
9.7.2 Scan Path testing procedure
9.7.3 Control Circuitry Fault Masking Procedure
9.7.4 Modified Scan path Register
9.8 Hardware Complexity of the Scan Path Approach
9.9 Other Tests
9.9.1 Testing of the Signature Comparator
9.9.2 Vertical Bypass Circuitry
9.9.3 Horizontal Interconnections
9.10 Benefits of the Scan path Test procedure
9.11 Comments
10 WINNER Demonstrator
10.1 The Need for a Demonstrator
10.2 Type of Demonstrator
10.3 Demonstrator Objectives
10.4 Demonstrator Specification
10.5 Processor Array Function
10.6 Implementation Options
10.6.1 LSI Components
10.6.2 Custom Chip or Gate Array
10.6.3 EPROM Implementation
10.7 Demonstrator Circuitry Requirements
10.7.1 Circuit for the WINNER Cell
10.7.2 Other Circuitry
10.8 External Testing
10.8.1 Computer Interface circuitry
10.8.2 Testing Software
10.9 Demonstrator Results
10.10 Concluding Remarks
11 Applications of WINNER
11.1 Introduction
11.2 Application to Processor Arrays
11.2.1 The Transputer
11.2.2 The Distributed Array Processor (DAP)
11.2.3 Digital Signal Processing Arrays
11.3 Application in Array Implementation
11.4 High Availability Systems
11.4.1 Special Considerations
11.4.2 General Comments
11.5 Silicon Hybrids
11.5.1 Requirement for Fault Tolerance
11.6 Wafer Scale Integration
12 Conclusions and Further Work
12.1 Conclusions
12.2 Suggestions for Further Work
12.2.1 Fabrication of a Monolithic circuit
12.2.2 Extension to Tree Structures
12.2.3 Fault Tolerant Switching Networks
Bibliography
APPENDICES
A WINNER Performance Simulation Program
A.1 Program Overview
A.2 Suitability for Other Algorithms
A.3 Program Listing
B ELLA Simulation Program
B.1 Program Listing
C WINNER Demonstrator Test Program
C.1 Program Overview
C.2 Program Listing
D Published Papers
D.1 Papers Included in the Appendix
D.2 Other Publications
List of Figures
2.1 Systolic Matrix x Matrix Multiplier
2.2 Typical arrangement for algorithmic fault-tolerance
3.1 The main Classes of integrated Circuit Defect
3.2 Types of Integrated Circuit Defect
3.3 Sizes of Integrated Circuit Defects
3.4 Probability of defect causing a fault
3.5 Typical Clustering tendency of defects
3.6 Radial yield variation
3.7 Chip yield predicted by the Poisson model
3.8 Number of faults per chip
3.9 Yield predicted by Negative Binomial model
4.1 Classification of switching schemes
4.2 Classification of switch implementations
4.3 Technique of Triple Modular Redundancy
4.4 Nodal yield improvement achieved with TMR
4.5 Toleration of TMR voting circuit faults
4.6 Fault-tolerant memory
4.7 Simple Row-bypass Scheme
4.8 Hierarchical fault-tolerance scheme
4.9 Row-oriented scheme of Moore and Mahat
4.10 A global interconnection scheme
4.11 Cross-section of laser link
4.12 MNOS transistor
4.13 Spiral technique of Aubusson and Catt
5.1 Main components of a WINNER cell
5.2 Equivalent hexagonal and orthogonal arrays
5.3 Array configured using WINNER
5.4 Schematic WINNER cell
5.5 Step-by-step array configuration
5.6 Generation of REQuest signals
5.7 WINNER applied in 2 dimensions
5.8 WINNER cell for 2D configuration
5.9 Double-site and crossover modes
6.1 Harvest of Hedlund’s scheme
6.2 Schematic of the table of simulation results
6.3 Characteristic form of the table of results
6.4 Flow-chart of the simulation program
6.5 Table of simulation results
6.6 Array yield versus processor yield
6.7 Overhead versus processor yield
6.8 Array yield versus target array size
6.9 5-Neighbour WINNER performance
6.10 Array yield versus array width
6.11 Effect of partitioning on array yield
6.12 Array yield versus number of partitions
6.13 Upper bounds on configuration performance
6.14 Comparison of configuration performance
7.1 Gate implementation of control circuitry
7.2 CMOS implementation of some common gates
7.3 Data input circuitry
7.4 Data output circuitry
7.5 Intercolumn routing circuitry
7.6 Comparison of hardware requirements
8.1 The scan-path testing technique
8.2 Block diagram of a typical self-testing system
8.3 A linear feedback shift register
8.4 Single-input compaction circuit
8.5 Multiple-input compaction circuit
8.6 Simple signature comparison method
9.1 Fault detection by simple duplication
9.2 True and complement two-rail implementation
9.3 Two-rail input buffer circuits
9.4 A two-rail signature comparator
9.5 Percentage of active faults
9.6 Scan-path scheme applied to WINNER
9.7 Scan-path arrangement within a WINNER cell
9.8 Modified scan-path register
9.9 A TMR based signature comparator
9.10 A fully testable signature comparator
9.11 A fully verifiable WINNER cell
10.1 4-bit by 4-bit array multiplier
10.2 Orthogonally interconnected multiplier
10.3 Block diagram of the WINNER demonstrator
10.4 Circuit of demonstrator cell
10.5 Equivalent circuit of the Control PROM
10.6 Equivalent circuit of the Processor PROM
10.7 Equivalent circuit of the Scan PROM
10.8 Equivalent circuit of data entry PROM
10.9 Equivalent circuit of data output PROM
10.10 Photograph of complete demonstrator
10.11 Single cell of the demonstrator array
10.12 Photograph of configured demonstrator array
11.1 Schematic view of the Transputer
11.2 Schematic view of the ICL DAP
List of Tables
3.1 Types of Integrated Circuit Defect
4.1 Laser link parameters
5.1 Generation of AVAILability output signals
5.2 Generation of REQuest output signals
9.1 Conversion of two-rail input signals
Summary
This thesis is concerned with research into techniques for tolerating the
defects which inevitably occur in integrated circuits during processing. The
research is motivated by the desire to permit the fabrication of very large
(> 1 cm²) integrated circuits having a viable yield, using standard chip
processing lines. Attention is focussed on 2-dimensional arrays of identical
processing elements with nearest-neighbour, orthogonal interconnections,
and techniques for configuring such arrays in the presence of faults are
investigated. In particular, novel algorithms based on the concept of
self-organisation are proposed and studied in detail. The algorithms involve
associating a small amount of control logic with each processing element
in the array. The extra logic allows the processing elements to communicate
with each other and come to a collective decision about how working
processors should best be interconnected. The concept has been studied in
considerable depth and the implications of the algorithms in a practical system
have been thoroughly considered and demonstrated by construction of a
small array at printed circuit board level, complete with software controlled
testing procedures.
The thesis can be considered in four main parts as follows. The first part
(chapters 1 to 4) starts by presenting the objectives of the research and then
motivates it by examining the increasing need for processor arrays. The
difficulty of implementing such arrays as monolithic circuits due to integrated
circuit defects is then considered. This is followed by a review of published
work on hardware fault tolerance for regular arrays of processors. The second
part (chapters 5 and 6) is devoted to the concept of self-organisation
in processor arrays and includes a detailed description and evaluation of
the algorithms followed by a comparison with other published techniques.
Considerations such as hardware requirements and overheads, reducing the
vulnerability of critical circuitry, self-testing, and the construction of the
demonstrator are covered in the third part (chapters 7 to 10). The fourth
part (chapters 11 and 12) considers potential applications for the research in
both monolithic and non-monolithic systems. Finally, the conclusions and
some suggestions for further work are presented.
Acknowledgements
The author wishes to acknowledge the support, assistance and encouragement
of the following people and institutions throughout the course of the
research leading to this thesis:
Dr Tim Thorp, for encouraging me to register as a postgraduate student
and for being understanding during the writing of the thesis,
Dr John McWhirter, for his inspiration, and day to day encouragement and
support,
Dr Adrian Mears, for suggesting that I register as a postgraduate student,
Professor Graham Nudd, for supervising me throughout the thesis and for
his suggestions, encouragement and comments,
Dr John McCanny, for his helpful discussions and comments on the thesis,
Alan Brewer and Gary Ellis, for their hard work in building the demonstrator
of a self-organising array,
Kevin Palmer, for helpful discussions and assistance during the design and
construction of the demonstrator array,
Dr Ian Proudler, for many useful discussions, particularly concerning
two-rail circuit implementations,
My wife Josie, for her love and support throughout this thesis but particularly
during the difficult times,
The Computer Science Department of the University of Warwick, for accepting
me as a part-time postgraduate student,
The Ministry of Defence (Procurement Executive), for providing full funding
for University fees, travel and attendance at conferences as well as
for permitting me to submit my research as a PhD thesis.
Chapter 1
Introduction, Objectives and 
Overview
1.1 Introduction
“In 1959 the number of transistors that would fit on a chip was 1; 
now it has surpassed 1 million. As material limits are reached, the 
pace is slowing, but by the year 2000 there will be chips containing
1 billion components”.
This quotation by James Meindl (Meindl, 1987) is partly factual and
partly speculative, but certainly indicates the scale of the challenge facing
integrated circuit researchers and designers over the coming decade. Currently
the chips containing 1 million transistors are memory devices consisting of
vast numbers of identical storage cells, and are fabricated on highly optimised
processing lines and sold in huge volumes. General purpose circuits
have not yet reached this complexity in part because they cannot be sold
in high enough volumes to sufficiently amortise the design and fabrication
costs. The fabrication costs of these very large general purpose circuits are
high because their yield is low. If techniques could be found to increase the
yield, costs would fall and marketability should significantly increase.
One way to increase chip yield is to reduce the number of defects which
occur randomly in the processing line. This is already being done wherever
possible, but with continual reductions in device geometries, the task is
becoming ever more difficult and increased standards of cleanliness are often
required simply to maintain existing yields. The net result is that chips
above about 1 cm² in area are rarely produced commercially.
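The dependence of yield on chip area can be illustrated with a minimal sketch. It assumes the simple Poisson yield model, Y = exp(-A·D), in which fatal defects fall independently and uniformly over the chip; the contents list a Poisson model among those treated in Chapter 3. The function name and the defect density used here are hypothetical and chosen purely for illustration; they are not values taken from the thesis.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Probability that a chip contains no fatal defect.

    Assumes the simple Poisson model Y = exp(-A * D): defects are
    independent and uniformly distributed (an illustrative assumption).
    """
    return math.exp(-area_cm2 * defect_density_per_cm2)


if __name__ == "__main__":
    density = 2.0  # hypothetical fatal defects per cm^2, for illustration only
    for area in (0.25, 0.5, 1.0, 2.0):
        y = poisson_yield(area, density)
        print(f"area {area:4.2f} cm^2 -> yield {y:.3f}")
```

Under this model the yield falls exponentially with area, which is consistent with the observation that large chips without any tolerance of defects quickly become uneconomic.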
Reductions in device geometry permit increased numbers of transistors
to be placed on a chip of any given area. However, although further
reductions are possible, it is clear that the rate of reduction is slowing down.
This is mainly because the equipment and process lines needed to
handle the small device structures become almost exponentially more
expensive, requiring fundamental changes in, for example, lithographic and
etching techniques.
With this background it is hardly surprising that many researchers have 
been investigating an alternative technique to enable chip complexity to be 
significantly increased at reasonable cost. This technique is called Fault Tol­
erance or Defect Tolerance. The application of fault tolerant techniques to 
complex electronic systems has received the attention of researchers since 
electronics began to be used in safety-critical applications such as the con­
trol of aircraft or large industrial plant. These systems start life in a fully 
functional form but in the event of the in-service failure of a component, can 
either fail in a safe manner, or ideally, continue their function as if no fault 
were present. The application of fault tolerant techniques to integrated cir­
cuits is in general quite different. In integrated circuits the problem is that a 
small number of components within a large chip are faulty at the start of the 
chip’s life and these cause the entire chip to be discarded as a failure. The 
primary aim therefore is to develop the ideas of fault tolerance to be able to 
tolerate defects present in the circuits. The ability of the circuit to tolerate 
in-service failures in addition to fabrication defects would be a bonus.
1.2 Thesis Objectives
In this thesis we address the problem of design techniques to enable inte­
grated circuits to tolerate fabrication defects. In particular we consider the
implementation of large 2-dimensional arrays of identical processing elements 
with nearest-neighbour, orthogonal interconnections, in which some of the 
processing elements may be permanently defective. The objectives of the 
thesis are as follows:
1. To study existing techniques for fault tolerance or defect tolerance ap­
plied to integrated circuit fabrication.
2. To develop novel techniques for tolerating defects based on the concept 
of Self-organisation. Self-organisation implies that a circuit is both able 
to detect and then automatically configure itself around each defect 
without any external assistance to produce a fully functional array.
3. To evaluate the performance of the self-organising techniques devel­
oped in (2), and to compare them with existing approaches to the 
configuration problem.
4. To demonstrate the self-organising techniques by constructing a pro­
cessor array which embodies the concepts.
1.3 Overview of the Thesis
In order to assist the reader, there follows a brief overview of the thesis on a 
chapter by chapter basis.
Chapters 2 and 3 motivate the work in the remainder of the thesis and 
together attempt to answer the question: Why do we need fault tolerance? 
Chapter 2 reviews the evolution of computing from single von Neumann machines 
to the large multi-processor architectures available today and considers 
some of the reasons for the trend towards parallel processing. Chapter 3 
then considers in detail the types of defect which occur in integrated circuits 
and presents some mathematical models which are commonly used to predict 
the yield of integrated circuits. Characteristic yield variations over the wafer 
surface are also discussed.
Chapter 4 reviews the state of the art in techniques suitable for tolerating 
integrated circuit defects. Schemes for configuring circuits around defects as 
well as techniques for implementing the switching elements are considered.
Chapter 5 describes the novel self-organising algorithms which have been 
developed within the project. The reader who is familiar with the back­
ground and motivation for fault tolerant techniques can skip straight to this 
chapter if desired. The self-organising algorithms are evaluated in chapter 6 
and then compared with the existing published techniques. The hardware 
requirements of the self-organising algorithms are considered in chapter 7.
The concept of self-organisation requires that each processor in the array 
can produce a go/no-go signal to indicate whether it is functional or not. 
This signal is then used in the configuration process. Chapter 8 discusses 
how this could be achieved by Self-testing techniques.
One of the problems in a system designed to automatically tolerate de­
fects is that the circuitry used to carry out the configuration could itself be 
defective. This problem is addressed in chapter 9 in which two techniques 
for reducing the vulnerability of the most critical circuits are presented.
Chapters 7 and 10 are both concerned with the hardware required in a 
self-organising array. Chapter 7 considers the hardware overhead which the 
self-organising approach will incur for the user, while chapter 10 describes 
an array which has been built to demonstrate the ideas incorporating the 
techniques described in chapters 5 and 9.
Finally, in chapters 11 and 12 we consider potential applications of the 
self-organising concepts other than in integrated circuit fabrication, and then 
conclude by putting forward some suggestions for further work.
Chapter 2
The Evolution of Parallel 
Processing
2.1 The von Neumann Model of Computation
For the past 30 years or so the von Neumann model of computation has 
dominated nearly every aspect of computing from the largest mainframe to 
the smallest home computer. During this period, however, the physical ap­
pearance of von Neumann computers has changed dramatically. In the early 
days of computing a complete room full of vacuum tube circuitry and many 
kilowatts of power were required to run a machine of modest capability (by 
today's standards!). Today, it is possible to integrate a computer of many 
times this processing power on a single chip and sell a complete machine 
at such a low price that it has become possible to buy one for a child at 
Christmas! Machines of this type are of course quite basic computers and 
their speed of operation is not the highest consideration. At the other ex­
treme, large mainframe machines which serve whole departments or sites on 
a time-sharing basis are still quite bulky pieces of equipment, but are able 
to handle many tens or sometimes hundreds of users almost simultaneously.
2.1.1 Technological Advances
Although the von Neumann model of computing is a simple concept and to 
some who are working at the frontiers of computing research now appears a 
little ‘old fashioned’, it still stands up very well to today’s requirements for 
general purpose computing. This has been made possible by essentially four 
major technological advances over the years as described in the following 
paragraph.
The first commercially produced computer, the UNIVAC I, appeared in 
1951 and used electronic valves with gate delays of about 1 µs as its switching 
elements (Eckert et al, 1951). Machines of this type have been classed as First 
Generation computers. In 1960, valves were replaced by solid state devices, 
in the form of germanium transistors, which had gate delays of approximately 
0.25 µs - these were Second Generation machines. The first silicon integrated 
circuits were introduced in about 1965 and contained a few gates per chip 
operating with a 10 ns gate delay and by the mid seventies achieving 1 ns gate 
delays. Third generation machines used these components. In the 1970s, 
the ability to manufacture integrated circuits became widespread, and their 
complexity rapidly increased so that by 1980 it became possible to integrate 
a complete microcomputer onto a single chip - the era of fourth generation 
computing had begun.
The microprocessor was largely responsible for two major trends in computing 
in the 1980s. Firstly, it opened the computing community to the 
masses and made it possible to construct a very powerful machine which 
could fit on a desk. No longer was it essential to log into the mainframe 
computer where CPU time and memory were limited and response was often 
slow - ‘one per desk’ became the motto of computer manufacturers. Secondly, 
the availability of small, reliable and above all relatively cheap microproces­
sors meant that the idea of multi-processor parallel computing could be given 
serious practical consideration.
2.1.2 The Need for High Performance Computers
There are numerous tasks which can be usefully handled by a computing 
machine but which are extremely computationally intensive and often require 
the entire computing power of a high speed von Neumann machine in order 
that the task can be completed in a reasonable time. Such tasks include 
various simulation problems, and tasks related to engineering design such 
as circuit placement and routing, and complex scientific and mathematical 
calculations (Hillis, 1986). Other tasks often cannot be carried out at the 
required rate even with an entire single processor machine. These include 
processing electrical signals in real time (Signal Processing) such as in radar, 
processing images (for example the reconstruction of medical ultrasonic 
scanner signals), pattern recognition etc. It is tasks such as these which 
require special attention and the solution has been to introduce parallelism 
into computing (Hockney and Jesshope, 1981).
2.1.3 Changing Attitudes to Computer Design
The need for a fundamental change in the design of special purpose comput­
ers away from essentially serial, von Neumann machines towards machines 
employing some form of parallelism is also necessitated by the same technol­
ogy which makes parallelism a practical possibility. Since 1950, the speed 
of the components used in the construction of computers has increased by 
a factor of about 1000, from 1 µs to less than 1 ns. Furthermore, the number 
of transistors per chip has approximately doubled every two years since 
the first integrated circuit was developed (Nomura 1985). This has been 
achieved by a combination of both the technological progression from valves 
to silicon transistors as described above, and of improved processing meth­
ods which have allowed devices of ever decreasing geometries to be reliably 
fabricated. Currently, devices at about the 1 µm to 2 µm gate length level are 
in commercial production with sub-micron devices having been successfully 
demonstrated at the research level. As geometries are scaled down further,
it will obviously be possible to integrate even more components onto a given 
area of silicon. However, a new problem rears its head in that although the 
propagation delay of devices decreases linearly (to first order) with reducing 
geometries, the delay associated with the interconnecting tracks remains ap­
proximately constant. It follows that system clock rates will become more 
and more dominated by the track delays as devices become smaller. As a re­
sult, it will be difficult to achieve significant increases in system throughput 
simply by producing devices with smaller geometries. This technological bar­
rier to increased speed has been an important factor in the rapidly increasing 
interest in parallel processing (Meindl 1985).
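The scaling argument above can be illustrated with a minimal numerical sketch. It assumes a first-order model in which the clock period is simply the sum of a device delay proportional to the device geometry and a constant track delay. The function names and the numerical values (0.5 ns of device delay per µm of geometry, 0.5 ns of track delay) are illustrative assumptions and are not taken from the text.

```python
def delays_ns(geometry_um, device_delay_per_um_ns=0.5, track_delay_ns=0.5):
    """Return (device delay, track delay) in ns for a given geometry.

    Assumed first-order model: the device delay scales linearly with
    geometry, while the interconnecting track delay stays constant.
    """
    device_delay_ns = device_delay_per_um_ns * geometry_um
    return device_delay_ns, track_delay_ns


def clock_period_ns(geometry_um, **model):
    """Clock period taken as device delay plus track delay (an assumption)."""
    device, track = delays_ns(geometry_um, **model)
    return device + track


def track_fraction(geometry_um, **model):
    """Fraction of the clock period accounted for by the track delay."""
    device, track = delays_ns(geometry_um, **model)
    return track / (device + track)


if __name__ == "__main__":
    for geometry in (2.0, 1.0, 0.5, 0.25):
        print(geometry, clock_period_ns(geometry), track_fraction(geometry))
```

With these assumed values, halving the geometry from 2 µm to 1 µm shortens the clock period from 1.5 ns to 1.0 ns, a factor of 1.5 rather than 2, and the share of the period taken by the tracks keeps growing as the geometry shrinks.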
2.2 The Introduction of Parallelism
The earliest known reference to parallelism in a computing machine is thought 
to have been in 1842, by L F Menabrea (Kuck 1977), who referred to a 
machine capable of computing products of pairs of 20 digit numbers when 
used for repetitive tasks like the generation of numerical tables. Menabrea 
suggests that the computing machine could be arranged to provide several 
results simultaneously. Parallelism today means much more than just producing 
several independent results simultaneously, of course; it includes the 
concept of partitioning a large problem into smaller sub-problems which, although 
they may be largely independent of each other, must nevertheless cooperate 
to produce a combined output result. For this reason both the hardware and 
the algorithm should ideally be considered together and the process is aptly 
described by the term Algorithmic Engineering.
The transition from serialism to parallelism in computing has taken many 
decades to reach its current position even though structures for parallel ma­
chines had been proposed in the 1950s. The delay of about 30 years before 
any serious commercial machines appeared is largely due to limitations in 
technology which made arrays of processors very difficult and costly to im­
plement. Until about the mid 1970’s only architectures based on a single
stream of instructions and data had achieved any commercial success, no­
tably the CRAY-1 in 1976. Of the many theories and structures based on 
multi-computer architectures proposed around 1950, the work of von Neu­
mann was probably the most notable. Von Neumann carried out a theoretical 
study which showed that a 2-dimensional array of computing elements with 
29 states could perform all operations because it could simulate the behaviour 
of a Turing machine (von Neumann, 1966). Unger (1958) proposed practi­
cal structures based on von Neumann’s work. This early work on arrays of 
processors can be considered to have been the progenitor of the SOLOMON, 
ILLIAC IV and ICL DAP machines which appeared in the 1970’s (Hockney 
and Jesshope, 1981). Similarly, Holland (1959) describes an assembly of processors 
each obeying their own instruction stream - an early vision of today's 
Transputer arrays. With the arrival of LSI/VLSI in the 1980’s, small, reliable 
and cheap processors became available and permitted array architectures to 
be considered seriously. The first array computer was the SOLOMON com­
puter (Simultaneous Operation Linked Ordinal MOdular Network) and is 
described in Slotnick et al (1962). This was a two-dimensional array of 
32x32 processing elements each with a memory for 128 32-bit numbers and 
an arithmetic unit which worked in a bit-serial manner under control of a sin­
gle instruction stream and control unit. Although never built as described in 
the 1962 paper, it led to the development of the ILLIAC IV, the Burroughs 
PEPE floating point processor arrays, the Goodyear Aerospace STARAN 
and the ICL DAP, all arrays of single-bit processing elements (Hockney 1981).
2.2.1 Classification of Computer Architectures
Computer architectures can be classified according to their structure (Shore 
1973) or according to the relationship between instructions and data (Flynn 
1972). Although Flynn’s classification is less precise than Shore’s, it seems 
to be more popular and involves four classes as follows:
1. SISD - Single Instruction stream/Single Data stream. This is the 
conventional von Neumann machine in which there is one stream of 
instructions executed by a single processor operating on a single stream 
of data. It is possible to imagine an SISD machine with more than one 
processor, but it is clear that such a machine does not provide any 
more computing power than the single processor version!
2. SIMD - Single Instruction stream/Multiple Data stream. This clas­
sification implies parallelism since it describes computers in which the 
same instruction is executed on several streams of data simultaneously. 
Vector machines such as the CRAY-1 are included in this category as 
well as processor arrays such as ILLIAC IV, ICL DAP and the Con­
nection Machine from the Thinking Machines Corporation.
3. MISD - Multiple Instruction stream/Single Data stream. This class 
is essentially void since it implies that several instructions are being 
executed on a single data item simultaneously.
4. MIMD - Multiple Instruction stream/Multiple Data stream. This 
class implies parallelism and the ability of the machine to simultaneously 
execute several different instructions on different data. An 
example of a MIMD machine is an array of transputers (Inmos, 1984) 
each of which can potentially contain a different program. The PASM 
machine (Siegel et al, 1981) can also be considered in this category. 
PASM is essentially a SIMD array but is partitionable, with each partition 
being controlled independently. It therefore has SIMD/MIMD 
capability.
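The four classes listed above follow mechanically from whether a machine has one or several instruction streams and one or several data streams. As a minimal sketch (the function name `flynn_class` is our own and purely illustrative), the rule can be written as:

```python
def flynn_class(instruction_streams, data_streams):
    """Return Flynn's class name for the given numbers of streams.

    'S' stands for a single stream and 'M' for multiple streams.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("at least one instruction and one data stream are needed")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


if __name__ == "__main__":
    # One control unit driving a 32x32 array of processing elements.
    print(flynn_class(1, 32 * 32))
    # Sixteen processors, each with its own program and data.
    print(flynn_class(16, 16))
```

Such a rule only captures the coarse distinction made by the classification; a partitionable machine of the PASM kind would fall into different classes depending on how it is configured.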
Of the above categories it is the SIMD and MIMD architectures, both 
involving multiple processing elements (generally arrays) which are of interest 
in this thesis. In particular we are interested in two dimensional arrays of 
interconnected processing elements.
2.2.2 Pipelined Parallel Architectures
The programmable computer architecture is not the only application area 
for parallelism. There are many functions in the field of signal and image 
processing which benefit from both parallelism and pipelining of operations 
but where each processor performs a fixed task. A particularly important 
dedicated architecture of this type is the systolic array which was proposed 
by Kung (1980).
Systolic Arrays
A systolic array is an array of identical processing elements each of which 
communicates only with its nearest neighbours with all such communication 
taking place via latches. The latches are all clocked in synchronism. The 
function of the array is determined by the function of the processors and the 
way in which data streams are arranged to flow through the array. Systolic 
arrays have many important properties including total physical and electrical 
regularity, a high degree of parallelism through the use of many processors 
and high throughput rates due to inherent pipelining. The parallelism and 
pipelined nature of systolic arrays makes them ideal for use in signal process­
ing applications where high throughput rates are essential to cope with real 
time data, but where the latency due to pipelining is not a serious problem.
Probably the best known systolic array is that proposed by Kung (1980), 
for the multiplication of banded matrices. This consists of a diamond shaped 
array of hexagonally interconnected processors each of which performs a 
multiply-accumulate function as illustrated in figure 2.1. Briefly the oper­
ation of the circuit is as follows. Data words enter the cells as shown and 
are multiplied together within the cell. The product is added to an incom­
ing sum and the combined result passed out northwards. Each processing 
element therefore computes part of the final result and by the time a result 
emerges from the top of the array it has been fully formed.
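The behaviour of a systolic array of multiply-accumulate cells can be sketched in software. The example below is a deliberately simplified illustration and is not the hexagonal banded-matrix array described above: it assumes a rectangular mesh with orthogonal nearest-neighbour connections, dense matrices, and partial sums which stay in each cell rather than moving through the array. All names are our own. Operands enter at the west and north edges, skewed by one clock period per row or column, and every cell passes its operands on through latches on each clock.

```python
def systolic_matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m) on a simulated n x m mesh.

    Each cell multiplies the operands arriving from its west and north
    neighbours, adds the product to a local sum, and latches the operands
    so that they reach the east and south neighbours one clock later.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]     # local partial sums
    a_reg = [[0] * m for _ in range(n)]   # latches for data moving east
    b_reg = [[0] * m for _ in range(n)]   # latches for data moving south

    for t in range(n + m + k - 2):        # clock periods needed to drain the array
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                # west edge: row i is delayed by i clocks
                    s = t - i
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:                # north edge: column j is delayed by j clocks
                    s = t - j
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # the multiply-accumulate function
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b       # all latches are clocked in synchronism
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because every cell reads only the latched outputs of its immediate neighbours, no data is broadcast, which is the property that makes such arrays attractive for integration.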
Figure 2.1: Systolic Matrix × Matrix Multiplier after Kung (1980).
Bit-level Systolic Arrays
Many of the systolic arrays proposed by Kung and others involved cells 
which performed a multiply-accumulate function. McCanny and McWhirter 
(1981) recognised that the multiply-accumulate function itself possessed in­
herent parallelism and showed how a systolic multiplier could be constructed 
in which each cell operates at the bit level. This was a significant step for­
ward for both signal processing and VLSI design. For signal processing it 
meant that pipelining had been introduced at the bit level and that systems 
based on such arrays would have a very high throughput. For VLSI the 
implications are several. Firstly, because large numbers of bit-level cells can 
be integrated onto a single chip, entire functional blocks, for example mul­
tipliers, correlators and convolvers (Urquhart and Wood 1984, Evans et al 
1983) can be implemented as single chip systolic systems. Secondly, since all 
cells in a systolic array have only nearest neighbour connections, the com­
munication problems within the chip are simplified by avoiding the need to 
broadcast data. Furthermore, since all cells are physically and electrically 
identical, arrays can be scaled easily and so systolic arrays of arbitrary size 
can be designed without the need to worry about timing problems.
Wavefront Arrays
As we have seen, a systolic array involves data streams which move across 
the array in a completely synchronous fashion. The correct timing of the 
interactions between individual data elements is ensured by having each cell 
in the array controlled by the same clock. The systolic approach has advan­
tages in terms of simple control and predetermined array timing, but does 
not necessarily result in a system with the highest possible performance. 
This is evident from the fact that the clock period must be sufficiently long 
to allow the slowest cell to complete its calculation. However, some cells may 
have completed their calculation well before the end of the clock period and 
could in principle start a new calculation without waiting for the next clock
period to start. Such cells are therefore essentially held back to keep them in 
synchronism with the slowest cells.
The wavefront array concept (Kung and Gal-Ezer, 1982) attempts to 
extract the fullest possible performance from a system. Instead of using a 
global clock to synchronise communications between cells, an asynchronous 
handshaking mechanism between cells is used. The handshaking procedure 
allows a cell to proceed with a calculation as soon as it has passed on the 
previous results to another cell and received the necessary inputs. This means 
that the data streams traversing the array will not have linear wavefronts, but 
will have regions where parts of the wavefront have become more advanced 
than the rest. In an array in which the cell computation time is a function 
of input data rather than spatial position, the average effect should be a net 
increase in system throughput.
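The difference between the two timing disciplines can be sketched for a simple linear chain of cells. The model below is an illustrative assumption rather than a description of any particular wavefront array: `times[k][c]` is the (hypothetical) time cell c needs for data item k, the systolic clock period is set by the slowest computation, and in the wavefront case a cell starts an item as soon as its input has arrived and it has finished its previous item. Back-pressure from a slow downstream cell is ignored.

```python
def systolic_time(times):
    """Total time when every step takes one global clock period."""
    items, cells = len(times), len(times[0])
    period = max(max(row) for row in times)   # clock set by the slowest computation
    return (items + cells - 1) * period       # pipeline fill plus one item per clock


def wavefront_time(times):
    """Total time with handshaking between neighbouring cells."""
    items, cells = len(times), len(times[0])
    finish = [[0.0] * cells for _ in range(items)]
    for k in range(items):
        for c in range(cells):
            input_ready = finish[k][c - 1] if c > 0 else 0.0  # upstream result received
            cell_free = finish[k - 1][c] if k > 0 else 0.0    # previous item passed on
            finish[k][c] = max(input_ready, cell_free) + times[k][c]
    return finish[-1][-1]


if __name__ == "__main__":
    data_dependent = [[1, 3], [2, 1], [1, 1]]  # three items through two cells
    print(systolic_time(data_dependent), wavefront_time(data_dependent))
```

In this model the wavefront array is never slower than the systolic one, and the two coincide when every computation takes the same time.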
2.3 The Need for Fault Tolerance
There are three main reasons why designers are interested in fault tolerance:
• To completely mask out the effect of faults occurring in service in 
safety-critical applications,
• To enable fast repair of faulty systems by manual initiation of a fault 
tolerant scheme,
• To enable fabrication faults in monolithic devices to be circumvented 
so that device yields can be increased and ultra large circuits can be 
built.
2.3.1 Reliability Improvement
The application of fault tolerant techniques to the above problems has evolved 
over many years because of the particular need at the time. The earliest need 
was for an improvement in the reliability of computers and is as old as the 
first computers, which were inherently unreliable due to the valves and relays
used in their construction. The move to solid state devices brought with it a 
dramatic improvement in reliability, which together with improved construc­
tion techniques such as printed circuit boards, largely alleviated the problem 
for all except the most complex systems.
As more and more complex systems continued to be built, much greater 
emphasis had to be placed on reliability, for example in the US space program 
of the early 1960s (Avizienis, 1976). An additional motivation was that 
increasing amounts of electronic circuitry were finding their way into other 
safety critical applications such as the control of aircraft and large factories. 
Human life depended on the correct operation of these circuits and much 
effort went into the study of techniques which could increase the probability 
of correct circuit operation over specified time intervals (Siewiorek and 
Swarz, 1982).
In more recent times, the move towards parallel processing has enabled 
more complex machines to be designed and built. Even with highly reliable 
components which have survived the burn-in procedure, in-service failure can 
be a serious problem. However, although the hardware complexity which can 
readily be achieved in parallel processing systems brings with it problems of 
potential unreliability, the nature of parallel processing architectures also 
holds the key to their solution. The key lies in the regular structure of 
parallel processing machines, which are ideally suited to the inclusion of 
spare or redundant elements which can then be used to take the place of any 
which become faulty during operation. It is this feature which has led to 
much research on techniques for configuring processor arrays.
2.4 Wafer Scale Integration
The regular nature of processor arrays and the relative ease with which fault 
tolerance can be introduced into them has stimulated interest in whole wafer 
integrated circuits, or Wafer Scale Integration (WSI). Such circuits represent 
the ultimate goal of integrated circuit designers. However, wafer scale circuits
are impossible to fabricate successfully without some form of fault tolerance 
since the probability of at least one defect being present in the circuit is 
almost unity. In WSI the idea is to fully utilise the available silicon area on 
the wafer to fabricate a single monolithic device. In the case of a regular 
array of processing elements this avoids dicing the wafer into separate chips, 
testing, packaging and reassembling into an array similar in topology to the 
way the processors were arranged on the original wafer.
WSI is expected to be capable of providing higher performance systems 
due to the close proximity of the processing elements and the fact that it is 
not necessary for the elements to communicate via the high capacitance world 
which exists outside the monolithic silicon. The advantages of reductions 
in size, weight and power follow in a straightforward manner. However, 
unlike conventional systems which are constructed using selected components 
which have been found fully functional after thorough testing, WSI circuits 
have a fixed set of processing elements. Because the yield of the processing 
elements on the wafer is far from 100%, the ability of WSI circuits to tolerate 
fabrication defects is essential.
Faults in the elements on the wafer can be caused in many ways, but 
perhaps the most common are short and open circuits in the wiring, pinholes 
in the oxides, and dust particles affecting the etching processes. The larger 
the elements on the wafer, the greater the probability that any one of them 
will contain a fault. Extending this concept to a circuit covering most of the 
surface of a wafer, it becomes easy to see that a WSI circuit would never be 
fully functional without some in-built fault tolerance.
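The dependence of fault probability on circuit area can be illustrated with a minimal sketch. It assumes the simple Poisson yield model, in which defects fall independently and uniformly over the wafer; this is only one of several models in the yield literature and is chosen here for illustration. The defect density of 2 defects per cm² and the circuit areas are likewise assumed values, not figures from the text.

```python
import math


def poisson_yield(area_cm2, defect_density_per_cm2):
    """Probability that a circuit of the given area contains no defect,
    under the assumed Poisson model Y = exp(-D * A)."""
    return math.exp(-defect_density_per_cm2 * area_cm2)


def prob_at_least_one_defect(area_cm2, defect_density_per_cm2):
    """Probability that the circuit contains one or more defects."""
    return 1.0 - poisson_yield(area_cm2, defect_density_per_cm2)


if __name__ == "__main__":
    density = 2.0  # assumed defects per square centimetre
    for area in (0.25, 1.0, 10.0, 50.0):  # from a small chip up to a wafer scale circuit
        print(area, prob_at_least_one_defect(area, density))
```

Under these assumptions the probability of at least one defect rises rapidly with area and is indistinguishable from unity for a circuit covering tens of square centimetres.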
2.5 Types of Fault Tolerance
It must be stated at the outset that an essential requirement of any approach 
to fault tolerance is the inclusion of some form of redundancy. This can be 
in the form of either time or hardware.
2.5.1 Time Redundancy
Time redundancy implies that all the circuitry in the fault tolerant system 
is necessary if the system is to operate at its rated maximum speed, but 
that failures of individual elements of the system do not cause failure of the 
entire system even though the failed elements are not replaced. The system 
design is such that the remaining functional parts of the system perform the 
tasks which would normally have been carried out by the faulty elements. 
Since the total amount of hardware in the system is fixed, the speed of 
operation of the system is reduced. An analogy is a factory employing men 
to perform individual tasks which when combined result in the generation 
of a product. The full complement of staff is fixed and when all are present the 
factory achieves its maximum product output rate. However, when one or 
more of the men is ill or on holiday (i.e. faulty), the product output of the 
factory will drop but probably not fall to zero since the tasks of the missing 
men will be shared by those present. The main problem of implementing 
time redundancy in hardware is designing an efficient algorithm for sharing 
the tasks among processors; one technique is presented by Sami (1984).
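The effect of time redundancy on speed can be sketched as follows. This is a minimal illustration which assumes that any functional processor can perform any task and that tasks are shared round-robin; it is not the technique of Sami (1984), and the function names are our own.

```python
def reassign_tasks(num_tasks, functional):
    """Share the tasks round-robin among the functional processors.

    `functional` is a list of booleans, one per processor.
    Returns a dictionary mapping processor index to its list of tasks.
    """
    workers = [p for p, ok in enumerate(functional) if ok]
    if not workers:
        raise ValueError("no functional processors remain")
    schedule = {p: [] for p in workers}
    for task in range(num_tasks):
        schedule[workers[task % len(workers)]].append(task)
    return schedule


def relative_speed(num_tasks, functional):
    """Speed relative to the fault-free system, assuming that every
    processor completes one task per time step."""
    schedule = reassign_tasks(num_tasks, functional)
    steps = max(len(tasks) for tasks in schedule.values())
    fault_free_steps = -(-num_tasks // len(functional))  # ceiling division
    return fault_free_steps / steps


if __name__ == "__main__":
    # Four tasks on four processors, one of which is faulty.
    print(reassign_tasks(4, [True, True, False, True]))
    print(relative_speed(4, [True, True, False, True]))
```

In this small case a single faulty processor halves the speed, because one of the remaining processors must carry out two tasks in sequence; the system nevertheless continues to function.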
2.5.2 Hardware Redundancy
Unlike time redundancy, hardware redundancy allows the system to oper­
ate at its rated speed even when faults (up to some maximum number) 
are present in the system. Perhaps the most obvious form of hardware re­
dundancy is the use of extra hardware elements and the inclusion of some 
arrangement for replacing a faulty part of the circuit with a spare, in other 
words completely removing the faulty element from the circuitry being used. 
Techniques of this type are discussed in detail in Chapter 4 and will therefore 
not be pursued here.
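The benefit of spare elements can be estimated with a simple calculation. The sketch below assumes an idealised scheme in which elements fail independently with the same probability and any spare can replace any faulty element; practical configuration schemes are more restricted, so the figures it gives are upper bounds. The function name and the numerical values are illustrative assumptions.

```python
from math import comb


def array_yield(required, spares, element_yield):
    """Probability that at least `required` of the `required + spares`
    elements are functional (binomial model, ideal replacement assumed)."""
    total = required + spares
    return sum(
        comb(total, good)
        * element_yield ** good
        * (1.0 - element_yield) ** (total - good)
        for good in range(required, total + 1)
    )


if __name__ == "__main__":
    # A 64-element array built from elements with an assumed 90% yield.
    for spares in (0, 4, 8, 16):
        print(spares, array_yield(64, spares, 0.9))
```

Without spares the yield of the array is simply the element yield raised to the power of the number of elements, which falls rapidly as the array grows.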
Figure 2.2: Typical arrangement for algorithmic fault-tolerance.
2.5.3 Algorithmic Fault Tolerance
It is also possible to design systems with hardware redundancy which have 
the redundancy built into the algorithm. A typical arrangement is illustrated 
in figure 2.2. In these systems, all of the hardware is used all of the time with 
redundant calculations taking place in the redundant hardware. Instead of 
inhibiting any faulty elements from taking part in the computation process 
by switching them out of the system, they are allowed to remain in place 
and generate faulty outputs. Data signals are encoded before entering the 
system and any faulty signals generated within the system propagate to 
the outputs. Here they are detected and corrected in a decoding process. 
This approach is called Algorithmic Fault Tolerance since faults which occur 
during the execution of the algorithm can be tolerated. The main advantage 
of algorithmic fault tolerance is that it is able to cope with intermittent as 
well as permanent failures, completely masking the effect of the fault from 
the output without any loss of information. In a well designed system of this 
type the user need not be aware that a fault had occurred.
Systems based on algorithmic fault tolerance include transmission sys­
tems using error correcting codes, where each data word to be transmitted 
is encoded with a number of check bits. The check bits are subsequently 
used by a decoding circuit to correct errors in the received data word. A 
good treatment of transmission codes is given in MacWilliams and Sloane 
(1977). Error correcting codes have also been used in memory chips to detect 
and correct both fabrication and in-service soft errors (Cliff 1974). Other 
techniques are based on AN codes which involve pre-multiplying operands 
by a constant value A and checking that the result is a multiple of A. AN 
codes are useful for detecting errors which have occurred in arithmetic op­
erations like addition and multiplication. A discussion of these and other 
coding techniques is presented by Wakerley (1978).
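The principle of an AN code can be shown with a minimal sketch for addition. The check constant A = 3 and the function names are illustrative assumptions. Since the sum of two multiples of A is itself a multiple of A, a result which is not divisible by A indicates that an error has occurred. With A = 3, any single-bit error changes the result by a power of two, which is never a multiple of 3, and is therefore detected.

```python
A = 3  # assumed check constant, chosen for illustration


def encode(x):
    """Pre-multiply an operand by the constant A."""
    return A * x


def is_valid(codeword):
    """A result is accepted only if it is a multiple of A."""
    return codeword % A == 0


def decode(codeword):
    """Recover the value, raising an error if the check fails."""
    if not is_valid(codeword):
        raise ValueError("arithmetic error detected")
    return codeword // A


if __name__ == "__main__":
    total = encode(5) + encode(7)  # addition is carried out on coded operands
    print(decode(total))
    corrupted = total ^ 0b100      # simulate a single-bit fault in the adder output
    print(is_valid(corrupted))
```

Note that this arrangement only detects the error; correcting it requires a suitably chosen constant and a more elaborate decoding process.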
2.6 Scope of the Project
In this project we address the problem of applying hardware fault tolerance 
to 2-dimensional arrays of identical processors with nearest neighbour in­
terconnections since such circuits are becoming increasingly important in 
many areas of signal and data processing. We will focus our attention on 
techniques suitable for use in the fabrication of large area integrated circuits 
since this is probably the most difficult problem of fault tolerance. However, 
the techniques can also be used in other applications.
In chapter 3 we further motivate the work by investigating the distribu­
tions of faults which occur in processed silicon wafers. Then in chapter 4 
we review the state of the art in hardware fault tolerance techniques with 
examples of application to both linear and 2-dimensional arrays. As we 
shall see, the work is particularly relevant to parallel processing, and due to 
the regularity of the arrays used in this field, efficient fault tolerant schemes 
can be developed.
Chapter 3
Fault Distributions in 
Integrated Circuits
3.1 Introduction
In chapter 2 we described the increasing move towards parallel circuit archi­
tectures, such as arrays of processing elements, for high performance com­
putation. Typically, a single processing element might be integrated onto a 
chip and the chips then interconnected in the form of an array as required. 
The regularity of processor arrays naturally leads one to consider the pos­
sibility of integrating the entire array onto one large chip so that the tasks 
of dicing, packaging, re-testing, and assembling of the individual processors 
into the required array are eliminated. This approach is termed large area 
integration or more commonly wafer scale integration (WSI).
One of the problems of WSI is that the large area of the circuit means 
that the probability of at least one fault-causing defect being present in the 
circuit is almost unity, and the yield of the circuit will be zero. As we shall 
see in chapters 4 and 5, it is possible to incorporate redundant circuit ele­
ments so that faulty elements can be replaced. However, the effectiveness of 
redundancy depends strongly on the fault distribution (Mangir 1984). For 
this reason it is important to have an understanding of the distribution of 
the faults to be tolerated before attempting to incorporate redundancy into 
the circuit. In this chapter we discuss the main fault-producing mechanisms
FAULT STATISTICS 21
[Figure 3.1 axes: average number of defects per wafer versus area of defect (µm²), both logarithmic. Regions marked: small random defects, intermediate-sized defects, gross defects. Labelled causes: dislocations, pinholes, dust, scratches, breakages, missed steps, alignment.]
Figure 3.1: The Main Classes of Integrated Circuit Defect.
which occur in silicon integrated circuits and consider how they are dis­
tributed over the wafer. We then investigate the problem of modelling these 
distributions mathematically and present some of the models most commonly 
used in the literature. Finally we draw conclusions about the approach to 
fault tolerance which would be best suited to overcoming faults in a large 
integrated circuit.
3.2 Types of Defect in Integrated Circuits
Defects are present in wafers produced by any semiconductor process line. 
These defects can occur due to a variety of different causes and may be 
classified into three main groups according to Peltzer (1983), as shown in 
figure 3.1. The diagram is for a typical semiconductor wafer and shows the 
frequency of occurrence of each of the three defect types as a function of 
defect area. On the right hand side of the figure, we see that large area, 
or gross defects occur much less than once per wafer. These defects tend
to affect large portions of the wafer or even the entire wafer and render it 
unusable. Gross defects can be caused by breakages, missed processing steps, 
poor mask alignment, or incorrect processing, which could result in shifted 
device parameters. Many of these causes of defect can be detected during the 
processing, and further processing of the wafer can thus be avoided. However, 
wafers with shifted device parameters may not be detected until after the 
processing stages have been completed.
At the other end of the scale of defect sizes are the random or point 
defects. A precise definition of when a defect is a point defect is difficult to 
provide, but defects which affect the operation of a single or small number 
of primitive devices such as transistors might typically be described as point 
defects. This means that defects with areas less than approximately 100 µm² 
would be considered to be point defects and occur in relatively large numbers, 
perhaps several hundred or so on a 4 inch wafer. Typical average defect 
densities for current processes are in the region of 2 to 5 defects per square 
centimetre (Chen et al, 1988). Point defects can be caused by a multitude of 
different mechanisms including dust particles, contaminated chemicals and 
pinholes in oxides, and will be discussed in more detail in the next section.
The class of intermediate-sized defects lies between the gross and point 
defects and includes defects due to residues remaining from photolithographic 
processes, small scratches, etc. They are largely superficial but nevertheless 
can severely detract from the overall chip yield if not properly controlled. 
Intermediate-sized defects occur typically as a result of poor handling and 
poor process cleanliness.
Stapper et al (1983) give the proportions of failures due to gross and 
random defects as shown in figure 3.2. As can be seen, typically over 80% of 
losses are caused by random defects.
Of the above categories it is the class of random or point defects which 
will concern us in the remainder of this chapter as it is the largest cause of 
yield loss in large area integrated circuits. In particular we will be interested 
in the distribution of the point defects over the surface of the wafer and how
[Figure 3.2 labels: % losses, divided into gross yield losses and random point-defect losses.]
Figure 3.2: Typical Types and Proportions of Integrated Circuit Defect.
this influences the yield of chips on the wafer.
3.3 Random Point Defects
3.3.1 Causes of Point Defects
The Reliability Analysis Centre (RAC) in Rome collects data on all aspects of 
failure of many devices and systems. In table 3.1 we reproduce data collected 
by the RAC on LSI integrated circuit failures which occurred during initial 
testing after manufacture. This and other tables can be found in Siewiorek 
and Swarz (1982). The table shows that there are many types of point 
defect and that there is considerable variation between bipolar and MOS 
technologies. As one might expect, MOS circuits, being essentially surface 
devices, are more susceptible to defects in oxides and diffusions than are 
bipolar circuits.
The defects shown in table 3.1 are caused by imperfections in the pro­
cessing, including the initial fabrication of the silicon wafer itself. They arise 
during processing due to random fluctuations in the conditions of the pro-
Failure Type               % Bipolar Failures   % MOS Failures
Surface                            29                 45
Oxide Defects                      14                 25
Diffusion Defects                   1                 10
Metallisation Defects              21                  1
Interconnection Defects            29                  4
Input Circuit Defect                1                  8
Bond Defect                         5                  7
Table 3.1: Types of Integrated Circuit Defect.
cessing equipment and chemicals, and include fluctuations in grain size of 
metallisation, resistivities of polysilicon regions, small bubbles in solutions 
and resists (Lawson, 1966), quality of contact regions, step coverage of met­
allisation and contamination. These fluctuations are very difficult to control 
because they occur at such a microscopic level. Defects in the initial wafer 
preparation can include chemical inclusions and crystal imperfections which 
act as recombination or generation centres and can cause degraded device 
performance (Mangir, 1984).
Size Distribution of Point Defects
Point defects occur in a range of sizes. A typical size distribution is shown 
in figure 3.3. This data has been gathered by Stapper et al (1983) and shows 
that the average defect size for this particular process is about 3.2 µm, but 
that the most frequently occurring defect size is between 1.5 and 2.0 µm. 
As one might expect, the measured frequency of the distribution decreases 
as the defect size increases. However, it also reduces for defect sizes below 
about 1.5 µm. This is due to the resolution limit of the photolithographic 
equipment; small defects on the masks are not sufficiently resolved to cause 
real defects to appear (Stapper et al, 1983).
[Figure 3.3 axes: relative frequency versus defect size (µm).]
Figure 3.3: Typical Size Distribution of Integrated Circuit Random Point 
Defects.
Probability of a Defect Causing a Fault
The presence of a defect in an integrated circuit does not necessarily mean 
that it will cause a fault in that circuit. There are many areas within the cir­
cuit which are entirely blank (ie contain no components or interconnections) 
and these will be unaffected by the presence of a defect. In some cases defects 
occurring on components, especially interconnect, will be smaller than that 
component and may not cause a fault. An example of this is a small hole in 
a much wider metal track. The track will still operate in the presence of the 
defect. The problem of whether the circuit will experience reliability prob­
lems when in service due to electromigration in the region of higher current 
density around the hole in the track will not be considered here.
[Figure 3.4 axis label: probability of a defect causing a failure.]
Figure 3.4: Typical Probability-of-Failure Curve for Random Point Defects.
It is possible to generate a curve of probability of failure as a function of 
defect size by using a defect monitor. A defect monitor is simply a special 
chip designed so that the number and effect of defects occurring on the chip 
can be readily and accurately measured. A typical probability-of-failure 
curve is shown in figure 3.4. As can be seen, very small defects typically 
cause no faults at all and have a zero probability-of-failure, while large defects 
always cause faults and have a value of unity.
3.3.2 Clustering of Point Defects
From the earliest days of analysing integrated circuit yield it was clear that 
the distribution of point defects was not entirely random over the surface of 
the wafer, and that defects tended to cluster together (Murphy, 1964). This 
tendency has been investigated by integrated circuit manufacturers who have 
employed inspectors to monitor the progress of wafers through the process 
line and count the number of particle defects on the wafer surface at each 
stage. This is done by shining a bright light obliquely across the wafer sur­
face. The particles on the surface scatter the light and can thus be counted.
Figure 3.5: Typical Wafer Defect Maps showing Tendency to Cluster.
The wafers shown in figure 3.5 show results from a rather ‘dirty’ process line 
but clearly indicate the clustering tendency of the particles (Stapper, 1983). 
Stapper suggests that the clusters are caused by aggregates of particles which 
have collected in the manufacturing machinery and have been shaken loose 
by vibrations, pressure changes or gas flow changes. It is thought that these 
groups of particles will form clouds in gases and liquids used in the process 
line. Where the clouds reach the wafer surface, particles will be clustered.
Stapper also reports that edge clustering can occur while wafers are being 
held in ‘boats’, the carriers used to support wafers during processing. While 
in these boats, wafers can become contaminated from one side only and so 
will have more defects on the exposed edge than on the other parts of the 
periphery. Edge clustering has also been detected by Gupta and Lathrop 
(1972).
Clustering of defects is important in integrated circuits since it leads 
to higher chip yields than would be expected with purely random defect 
distributions. This is because the regions of clustered defects leave other 
regions relatively free of defects and these regions have a correspondingly
higher chip yield.
3.3.3 Radial Variations in Point Defect Distributions
It has been reported by several authors that the distribution of point defects 
on a wafer is dependent on the distance of the observed region from the 
centre of the wafer (Yanagawa 1972, Perloff et al 1981, Ferris-Prabhu et al 
1987). The variations have been analysed by measuring the yield of devices 
across many wafers and plotting a wafer map containing the average yield 
obtained at each chip site. Their results are shown in figure 3.6 and it can 
be seen that all curves show distinct reductions in yield towards the edge of 
the wafer. It is also clear that a slight reduction in yield is experienced at 
the centre of the wafer. Explanations for these yield reductions have been 
proposed as follows. Perloff believes that the reduction in yield towards the 
edge of the wafer is primarily due to material defects, image distortion and 
mask registration problems. The yield reduction at the centre of the wafer 
has been associated with variations in thickness of the resist layers occurring 
during processing. During resist application, the wafer is first placed on a 
chuck and rotated at high speed. Liquid resist is then poured onto the centre 
of the wafer and the excess is spun off by the action of the rotational forces 
leaving a fairly uniform layer thickness over most of the wafer. However, the 
rotational force acting on the resist varies as a function of distance from the 
centre of the wafer, being close to zero at the centre. This results in a slightly 
thicker resist layer being present towards the centre of the wafer. Thicker 
resist layers will contain on average more defects due to their larger volume 
per unit area. In addition, in contact processes, where the photolithographic 
mask is placed in close contact with the wafer surface during exposure, more 
damage is likely to be caused to the wafer surface in areas of thicker resist.
[Figure 3.6 axis label: normalised average yield.]
Figure 3.6: Variation of Chip Yield and Defect Density with Radial Position 
on the Wafer: (a) Yanagawa (1972), (b) Ferris-Prabhu (1987), (c) Perloff et 
al (1981).
3.4 Modelling Integrated Circuit Yield
Ever since the development of the first integrated circuit, chip manufacturers 
have been interested in being able to model the process yield mathematically. 
Their interest in yield models is not merely academic but important for three 
main reasons:
1. Process control. Measurements from test chips included on the 
wafers passing through the process lines are stored in a database and 
used to evaluate the components of the yield model. These components
are plotted as a function of time and frequently reviewed and serve as 
an early warning of problems in the line. The problems can be rectified 
before serious yield losses occur.
2. Product scaling. When a product is being manufactured on a process 
line, it is possible to estimate the yield of another product if it were to 
be processed on the same line.
3. Product planning. When new products are being planned, yield 
models can assist in setting targets for future production.
In this thesis our interest in yield models is different. We require a knowl­
edge of the relationship between chip yield and chip area so that a sensible 
choice of module area can be selected for use in a fault tolerant, large area 
integrated circuit. If too large a module area is chosen, insufficient functional 
modules will be available for configuration into the functional array. On the 
other hand, a choice of too small a chip area will result in a larger overhead 
of configuring circuitry per module.
Modelling integrated circuit yield is not a trivial task for two main reasons 
as follows.
1. Manufacturers are reluctant to divulge information about the yield 
of chips fabricated on their process lines because they believe that 
this would adversely affect their position in the highly competitive 
integrated circuit market. This means that most manufacturers have 
independently developed their own models for their processes. This 
results in many models with few common links.
2. To be accurate, a yield model must take account of the vast variations 
between wafers, batches and process lines as well as variations with time 
and with operator. This can result in very complex models containing 
many variables, some of which can be very difficult to determine.
As a result there are almost as many different yield models as there are 
researchers working in the area. Another problem seems to be that the
development of some models has been based on a very small sample of data, 
so that these models are not generally of use. What is required is a balance 
between model complexity and ease of use.
Mathematical yield models have been proposed by many researchers. 
Price (1970) maintained that integrated circuit defects should follow Bose- 
Einstein statistics, while Gupta and Lathrop (1972) and Murphy (1971) 
thought that Maxwell-Boltzmann statistics should be used. Stapper (1983), 
on the other hand, investigates the use of generalised negative binomial statis­
tics and shows that they are applicable to a wide range of chip sizes.
In the following subsections we present the simplest and most intuitive 
model based on Poisson statistics, a modified version of the Poisson model 
and finally a model which has gained wide acceptance, the generalised neg­
ative binomial model.
3.4.1 Poisson Distribution
The Poisson distribution is the simplest model of integrated circuit yield. It 
assumes that defects occur independently of each other at random positions 
across the wafer. The general form of the Poisson distribution applied to 
integrated circuits is

    $$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!} \qquad (3.1)$$

where $P(X = k)$ is the probability that $k$ defects will occur per chip, and 
$\lambda$ is the average number of faults per chip. The yield of the chips clearly 
occurs when $k = 0$, and is therefore given by

    $$Y = P(X = 0) = e^{-\lambda} \qquad (3.2)$$

$\lambda$ is often written as

    $$\lambda = AD \qquad (3.3)$$

where $A$ is the chip area and $D$ is the average number of defects per unit 
area. The Poisson yield is illustrated graphically as a function of chip area 
in figure 3.7 for various values of $D$.
[Figure 3.7 axis label: normalised chip yield.]
Figure 3.7: Chip Yield as a Function of Area as given by the Poisson Model.
It has been known for many years that the Poisson model is too simplistic 
to accurately represent the defect distributions found in integrated circuits. 
Stapper (1986) has shown this quite clearly using data from memory chips. 
Figure 3.8 shows the measured distribution of single-bit failures compiled 
from 450 memory chips fabricated on a modern process line. From the 
data, the average number of defects per chip is 28.6, while the percentage 
of chips with no faults, ie the yield, is 27.5%. Using the value of 28.6 for $\lambda$ 
in equation 3.2, we obtain a yield of $e^{-28.6}$, or $3.8 \times 10^{-13}$: essentially zero! 
The discrepancy between the measured and calculated values for chip yield 
shows that the Poisson model does not represent the data.
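The size of this discrepancy can be checked directly. The following minimal Python sketch evaluates the Poisson yield of equation 3.2 for the quoted average of 28.6 defects per chip and sets it beside the measured yield of 27.5%. The function and variable names are illustrative assumptions.

```python
import math


def poisson_yield(mean_defects_per_chip):
    """Equation 3.2: probability that a chip contains zero defects."""
    return math.exp(-mean_defects_per_chip)


measured_yield = 0.275            # 27.5% of the 450 chips had no faults
predicted = poisson_yield(28.6)   # average of 28.6 defects per chip

print(f"Poisson prediction: {predicted:.1e}")
print(f"Measured yield:     {measured_yield}")
```

The prediction is roughly twelve orders of magnitude below the measured value.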
It is clear from the data in the above illustration that the deviation from 
Poisson statistics is due to the clustering of defects. Clustering results in 
non-uniform defect densities with higher concentrations of defects in localised 
areas. This gives rise to other areas with relatively low defect densities, with
[Figure 3.8 axis label: number of failing chips.]
Figure 3.8: Typical Distribution of the Number of Faults per Chip (average 
over 450 chips).
higher chip yields. For this reason clustering is actually beneficial in terms 
of increasing yield.
3.4.2 Compound Poisson Statistics
Compound Poisson statistics attempt to improve the basic Poisson model 
by making the average number of defects per chip a random variable. This 
enables the density of defects to vary over the surface of the wafer in a 
random manner. The wafer can be considered to be divided into a number 
of independent regions each having a random fault distribution, but each 
containing differing average numbers of faults. Each region is given an index 
number, $i$, and within each region, the Poisson distribution is assumed to be 
valid with an average number of defects equal to $\lambda_i$. Associated with each 
region $i$ is a probability distribution, $P_i$. The compound Poisson distribution 
is then given by

    $$P(X = k) = \sum_{i=1} P_i \, \frac{e^{-\lambda_i} (\lambda_i)^{k}}{k!} \qquad (3.4)$$

resulting in

    $$Y = P(X = 0) = \sum_{i=1} P_i \, e^{-\lambda_i}. \qquad (3.5)$$

The form of $P_i$ is completely general and depends entirely on the nature of 
the fault distributions observed in manufactured chips. The problem arises 
in determining $P_i$ since both the form and the parameters of the distribution 
must be matched to the process data.
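To illustrate how the compound Poisson yield of equation 3.5 lets clustering raise the predicted yield, the sketch below evaluates the sum of region probability times exp(-average defects) for a wafer divided into two kinds of region. The region probabilities and defect averages are hypothetical values chosen purely for illustration; they are not fitted to any process data.

```python
import math


def compound_poisson_yield(regions):
    """Equation 3.5: sum over regions of P_i * exp(-lambda_i).

    `regions` is a list of (P_i, lambda_i) pairs whose P_i sum to 1.
    """
    return sum(p * math.exp(-lam) for p, lam in regions)


# Hypothetical wafer: 30% of chips lie in a nearly clean region and
# 70% in a heavily clustered one (assumed values).
regions = [(0.3, 0.1), (0.7, 40.0)]

mean_defects = sum(p * lam for p, lam in regions)
print(f"average defects per chip: {mean_defects:.2f}")
print(f"compound Poisson yield:   {compound_poisson_yield(regions):.3f}")
print(f"simple Poisson yield:     {math.exp(-mean_defects):.1e}")
```

With these assumed values the average is about 28 defects per chip, yet the predicted yield is about 27%, because the nearly clean region dominates the sum.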
3.4.3 Generalised Negative Binomial Statistics
Moore (1970) suggested the use of negative binomial statistics for modelling 
integrated circuit yield in the form
    $$Y = Y_0 (1 + AD/\alpha)^{-\alpha}, \quad \alpha > 0 \qquad (3.6)$$

where $Y_0$ is a gross particle yield and $\alpha$ is a constant cluster parameter. Low 
values of $\alpha$ are associated with severe clustering, while as $\alpha \to \infty$, the model 
approaches the random defect distribution of Poisson statistics. For real 
wafers, $\alpha$ is typically in the range 0.5 to 4 (Ketchen, 1985). The yield model 
is illustrated in figure 3.9 as a function of chip area for various values of $\alpha$. 
The values of $Y_0$, $D$ and $\alpha$ must be determined from measurements made on 
wafers taken from the process line to be modelled.
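The behaviour of equation 3.6 for different values of the cluster parameter can be sketched as follows. The chip area and defect density are illustrative assumptions, and the parameter names (`alpha` for the cluster parameter, `gross_yield` for the leading yield factor) are chosen here for readability.

```python
import math


def neg_binomial_yield(area, defect_density, alpha, gross_yield=1.0):
    """Equation 3.6: Y = Y0 * (1 + A*D/alpha) ** (-alpha), alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return gross_yield * (1.0 + area * defect_density / alpha) ** (-alpha)


area, density = 1.0, 2.0  # cm^2 and defects per cm^2 (assumed values)

# Small alpha (severe clustering) gives the highest yield; a very
# large alpha approaches the Poisson value exp(-A*D).
for alpha in (0.5, 1.0, 4.0, 1e6):
    print(alpha, round(neg_binomial_yield(area, density, alpha), 4))
print("Poisson", round(math.exp(-area * density), 4))
```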
Data of yield versus chip area can be determined for a process by a 
method using chip multiples which operates as follows. Wafers containing 
a large number of identical chips are processed on the production line to 
be modelled. Each chip is then tested and a map produced of the position 
of functional chips. From this the yield of the chips on each wafer can be 
calculated from $N_f/N_t$, where $N_f$ and $N_t$ are the number of functional chips 
and the total number of chips on the wafer respectively. The yield from the 
wafers can then be combined to give an average chip yield. The next step is 
to place a grid over each wafer in which each rectangle of the grid surrounds 
exactly two chips. The yield of the double-area chip can then be estimated 
by counting the number of rectangles in which both chips are functional.
[Figure 3.9 axis label: normalised chip yield.]
Figure 3.9: Chip Yield as a Function of Chip Area as given by the Generalised 
Negative Binomial Model.
This process can be repeated for larger chip multiples to produce a set of 
yield data for different chip sizes. The data can then be used to estimate 
the values of $Y_0$, $D$ and $\alpha$ for the model by, for example, a non-linear least 
squares fitting procedure.
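The chip-multiple counting procedure can be sketched in a few lines of Python. The wafer map below is a small hypothetical example, and for simplicity it is assumed to be rectangular, which a real wafer is not; grid rectangles that do not fit completely on the map are ignored.

```python
def multiple_yield(wafer_map, rows, cols):
    """Estimate the yield of a chip made of rows x cols adjacent sites.

    `wafer_map` is a rectangular list of lists of booleans, True for
    a functional chip. A grid rectangle counts as functional only if
    every chip inside it is functional.
    """
    good = total = 0
    for r in range(0, len(wafer_map) - rows + 1, rows):
        for c in range(0, len(wafer_map[0]) - cols + 1, cols):
            block = [wafer_map[r + i][c + j]
                     for i in range(rows) for j in range(cols)]
            total += 1
            good += all(block)
    return good / total


# Hypothetical 4 x 4 map of tested chips (True = functional).
wafer = [
    [True,  True,  False, True],
    [True,  True,  True,  True],
    [False, True,  True,  True],
    [True,  True,  True,  False],
]

for shape in ((1, 1), (1, 2), (2, 2)):
    print(shape, multiple_yield(wafer, *shape))
```

The resulting pairs of area multiple and yield are the kind of data to which the model parameters would then be fitted.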
Stapper (1986) has used this procedure and shows that the model is representative 
over a chip area range of 1 to 36, the widest range ever published 
for which a single yield model is applicable. The model has also been found 
to give a good fit to data by several other researchers including Turley et al 
(1974) and Paz and Lawson (1977).
3.5 Implications for Fault Tolerant Techniques
In this chapter we have described the causes and effects of defects in inte­
grated circuits and have presented some models which have been proposed
for estimating yield. Unfortunately, it is inevitable that much of what has 
been described has been general in nature due to the lack of real data avail­
able in the literature. To generate real data by carrying out tests on a process 
line would represent a separate thesis and is therefore not appropriate in the 
context of this work.
However, from the point of view of fault-tolerant integrated circuits, two 
points are important:
1. chip yield varies with radial position on the wafer, and
2. chip yield as a function of area can be determined using multiple chips 
for any given process. The ability of yield models to predict yields 
outside the range of available data is less important.
Furthermore, the wafer defect maps shown in figure 3.5 indicated that 
large areas of a wafer can often have a high density of defects. This makes 
the task of designing wafer scale circuits a difficult one since the clusters 
of faulty elements resulting from the defects seriously limit the ability to 
configure a functional circuit. For this reason, it seems likely that truly 
full wafer circuits will be limited to linear arrays and memory, for example, 
since such applications have few topological constraints. For two-dimensional 
arrays which are the subject of this thesis, it is more likely that large chips 
will be the main application, since these can make good use of the areas of 
lower defect density on the wafer.
Chapter 4
A Review of Hardware 
Fault-Tolerance for Processor 
Arrays
4.1 Introduction
In chapters 2 and 3 we have described the increasing interest in large inte­
grated circuits, possibly up to the size of an entire wafer and have highlighted 
the problems of achieving this goal due to defects which inevitably occur in 
the silicon substrate or which are introduced during processing. It would be 
comforting to think that these defects could one day be eliminated. However, 
although standards of cleanliness during processing are being continually in­
creased as a result of moving to smaller device geometries, it is unlikely that 
defect densities will reduce to zero for two main reasons. Firstly, the control 
of defects is a very difficult task. Secondly, the use of reduced device geometries 
means that even if some of the larger defects can be controlled, smaller 
defects become more significant.
For these reasons, the ability to tolerate faults is essential if large area 
circuits are ever to be produced successfully. Fault tolerant techniques will be 
needed not only to overcome fabrication faults in highly complex monolithic 
circuits produced in the context of Wafer Scale Integration, but also to cope
The main body of this review chapter is to be published as a tutorial on Wafer Scale 
Integration as part of a book in Evans (1989a).
HARDWARE FAULT-TOLERANCE: A REVIEW 38
with in-service failures so that in the event of a failure, a system can resume 
correct operation after a short interruption, having been reconfigured around 
the detected fault.
In this chapter we classify and then review the techniques currently avail­
able for incorporating hardware fault tolerance into processor arrays, includ­
ing the switch organisation and the method of implementing the switches. 
Many of the approaches which have been proposed in the scientific literature 
have not yet been demonstrated in real hardware. However, the details of 
some demonstration systems and devices have been published and these are 
included in the review wherever appropriate.
4.2 Classification of hardware fault tolerance 
schemes
Hardware fault tolerance techniques can be classified in two ways as follows:
1. according to the strategy for fault avoidance defined by the way in 
which the switching elements are organised,
2. according to the way in which switching elements are implemented.
Any particular fault tolerant scheme will be a member of a class from each 
category. The classes of switch organisation scheme are shown in figure 4.1 
while the methods of switch implementation are shown classified in figure 4.2.
The switch implementation classification is essentially that of Katevenis 
and Blatt (1985) and is presented from left to right in order of increasing 
lateness of binding. This means that those techniques on the left hand side 
of the tree are fixed at the time of manufacture or configuration and are 
essentially permanent for the rest of the life of the device. Switch implementations 
further to the right become fixed progressively later in their life and 
have increasing facilities for re-configuration.
Figure 4.1: Classification of switching schemes for fault tolerant strategies 
in 2-dimensional processor arrays.
4.3 Switch Organisation and Configuration 
Schemes
In this section we briefly describe some of the approaches to configuration 
which fall into the classes shown in figure 4.1.
4.3.1 Nodal Fault Tolerance
The objective of Nodal fault tolerance is to increase the yield of the individual 
processing nodes within an array to 100% so that an acceptable overall array 
yield is achieved without having to configure the connection between nodes.
Figure 4.2: Classification of switch implementation methods used in fault tolerant integrated circuits.
The simplest method for doing this is to use Triple Modular Redundancy, or 
TMR for short. TMR involves using three processors in place of each of the 
original single processing nodes, together with a voting circuit. The voting 
circuit provides the output of the TMR node by delivering the majority 
verdict of the outputs of the three processors, all of which execute the same 
function on identical data. In this way any one of the processors can be 
faulty, but the voting circuit will still output the correct result. A TMR 
fault tolerant array is shown in figure 4.3(a) with an individual TMR node 
being shown in figure 4.3(b).
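The action of the voting circuit can be sketched as a bitwise 2-out-of-3 majority function. This is a software illustration only, assuming that each processor output is an integer word; the names are hypothetical and no particular hardware voter is implied.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority of three integer output words."""
    return (a & b) | (a & c) | (b & c)


correct = 0b10110010
faulty = correct ^ 0b01000001   # one processor produces a wrong word

# Whichever single processor is faulty, the voted output is correct.
print(majority(faulty, correct, correct) == correct)
print(majority(correct, faulty, correct) == correct)
print(majority(correct, correct, faulty) == correct)
```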
The TMR scheme has the advantage that it is very simple to implement 
since no configuration of the processors in the array is required at all, and 
no testing of the processors is required. It also has the advantage that it 
can tolerate in-service faults, both permanent and intermittent, without the 
user being affected or even needing to be aware that a fault exists. However, 
against these advantages there are some serious drawbacks. TMR systems 
require a large hardware overhead since each processor is now triplicated. 
Furthermore, the statistics involved in 2-out-of-3 majority voting schemes 
indicate that the yield of the processors involved must be quite high in order 
to gain any nodal yield advantage at all from using the scheme, and that even 
at best, the gain in yield is not dramatic. This can be seen from figure 4.4 
which shows the processor yield and the TMR node yield. It can be seen 
that a processor yield of 50% is required before the node yield exceeds the 
processor yield. In addition it can be seen that the maximum gain in yield 
is achieved when the original processor yield is about 87%, at which point 
the node yield has risen to 95%.
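The break-even point at 50% and the node yield of 95% quoted above can be reproduced with a short calculation. The sketch assumes that the three processors fail independently and that the voting circuit is perfect; the function name is illustrative.

```python
def tmr_node_yield(p):
    """Probability that at least two of three processors work.

    Assumes independent processor failures and a perfect voter.
    """
    return 3 * p**2 * (1 - p) + p**3


for p in (0.30, 0.50, 0.87):
    print(f"processor yield {p:.2f} -> node yield {tmr_node_yield(p):.3f}")
```

Below a processor yield of 50% the TMR node is worse than a single processor, which is why the scheme only pays off when the processor yield is already fairly high.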
The curve in figure 4.4 assumes that the voting circuits have a perfect 
yield, which would not be the case in practice. The effect of faults in the 
voting circuitry can be reduced, however, by employing a voting circuit at 
the input to each of the triplicated processors. This scheme is illustrated 
in figure 4.5 and allows voting circuit faults to be tolerated since they now 
appear as faults on the input of a processor and will be treated as such by
Figure 4.3: Technique of Triple Modular Redundancy: (a) TMR array, (b) 
TMR node.
[Figure 4.4 axes: yield (%) versus processor yield (%).]
Figure 4.4: Nodal yield improvement achieved with TMR. 
later voting circuits.
The very high hardware overhead of TMR for a small gain in yield has 
prompted the study of other methods of implementing nodal fault tolerance. 
One alternative method is to use two processors in place of the original 
processor, as against three for TMR. The idea is then to use a switch to select 
only one of the two processing elements for use in the array. This approach 
requires that the user knows which of the processing elements is working 
correctly and therefore implies that testing of the processing elements must 
be carried out. This could be done either by external test or by some form of 
self-test procedure. A technique using two processors per site was successfully 
used by Grinberg et al (1984) to increase the yield of individual wafers in 
their 3D computer based on stacked wafers. They used discretionary wiring 
to select between the two PEs on the node.
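A rough indication of the node yield available from this two-processor approach is sketched below. It rests on assumptions that are not stated in the text: the two processors fail independently, and the selection switch and the test procedure are themselves fault free.

```python
def duplex_node_yield(p):
    """Node works if at least one of its two processors works.

    Assumes independent failures and a perfect switch and test.
    """
    return 1 - (1 - p) ** 2


for p in (0.30, 0.50, 0.87):
    print(f"processor yield {p:.2f} -> node yield {duplex_node_yield(p):.3f}")
```

Under these assumptions the node yield exceeds the processor yield at every value between 0 and 1, at the cost of needing to know which processor is functional.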
Figure 4.5: Technique for tolerating voting circuit faults in TMR. 
4.3.2 Row or Column Replacement Schemes
As we have seen, nodal fault tolerance minimises or even eliminates the need 
to consider how to reconfigure an array to avoid faults and tries to sufficiently 
increase the yield of the nodes so that they can be connected directly into 
an array. Most hardware fault tolerance techniques, however, rely upon some 
form of alteration to the connections between processors in order to generate 
a subset of the main array which is fully functional. In this way, faulty 
processors are completely isolated from the functional part of the array. The 
simplest technique of this type is the row-selection or row-bypass method.
Row Selection
The row selection technique is widely used in memory chips for yield en­
hancement (Moore, 1986), and is illustrated in figure 4.6. There are many
Figure 4.6: Schematic of a fault tolerant memory showing row de-selection 
technique.
ways in which the row selection procedure can be organised and several of 
these are discussed by Fitzgerald and Thoma (1980). The idea is to incor­
porate spare rows (or columns) into the array of memory elements and to 
use these rows to replace any rows from the main array which are found 
to contain faults. This technique is very simple to implement in memory 
chips because there are no signal interconnection paths between cells, and 
spare rows can be selected simply by programming the decoding circuitry 
appropriately. The decoder is often programmed by blowing electrical fuses.
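The spare-row idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the programmed decoder is modelled as a lookup table from logical to physical row addresses, and the name `build_row_map` and the numbering of spare rows after the main rows are hypothetical, not taken from the cited papers.

```python
def build_row_map(n_main, n_spare, faulty_rows):
    """Map each logical row address to a fault-free physical row.

    Physical rows 0..n_main-1 form the main array and rows
    n_main..n_main+n_spare-1 are the spares (an assumed numbering).
    Returns None if there are too few good spares to repair the chip.
    """
    faulty = set(faulty_rows)
    # Spare rows may themselves be faulty, so keep only the good ones.
    good_spares = [r for r in range(n_main, n_main + n_spare)
                   if r not in faulty]
    row_map = {}
    for logical in range(n_main):
        if logical not in faulty:
            row_map[logical] = logical             # decoder left as fabricated
        elif good_spares:
            row_map[logical] = good_spares.pop(0)  # decoder reprogrammed
        else:
            return None                            # not enough spares
    return row_map


if __name__ == "__main__":
    # Four main rows and two spares; main row 1 and spare row 4 are faulty.
    print(build_row_map(4, 2, [1, 4]))
```

Because memory cells have no connections to their neighbours, the remapping is a pure addressing matter; this is the property that is lost when the same idea is applied to processor arrays.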
Many memory manufacturers claim that the row selection technique is
useful in the early stages of production of a device for increasing yield and
give figures ranging from a 30-fold yield increase in immature processes,
reducing to a 1.5-fold yield increase in a mature process (Smith, 1981).
However, NEC claim not to need fault tolerance at all; see Posa (1981) and
Rogers (1982).
Row Bypass
In memory chips, rows are simply selected from those available so that suffi­
cient cells are available for storing information. In processor arrays, a similar 
technique can be applied, but in addition, the connectivity between process­
ing elements must be maintained. This means that when a row containing a 
fault is de-selected, its input signals must also be diverted to an alternative, 
functional row. This can be achieved most simply by employing bypass cir­
cuitry around each row so that the whole row can be bypassed in the event 
of it containing one or more faults. All spare rows are initially bypassed and 
the bypass is removed when the row is brought into operation.
A row bypass scheme like that described above was proposed by
McCanny and McWhirter (1983), who also present figures which indicate that 
significant yield increases can be obtained. Figure 4.7(a) illustrates their 
approach and shows how multiplexers are used for the bypass mechanism. 
Figure 4.7(b) presents a graph of estimated yield increase which can be 
achieved, assuming a processing element complexity of about 10 gates. It 
can be seen that chips with a yield of around 10% without fault tolerance 
could yield at about 40% if fault tolerance were to be included. Similarly, 
chips with an initial yield of 1% might be increased to 20%. It is the latter 
of these two predictions which is likely to interest chip manufacturers since 
it could enable them to produce a larger device than previously possible and 
still retain an economic yield. This could enable the company to stay ahead 
of its competitors.
Moore et al (1986) extend the basic row bypass technique by considering
the effect of an imperfect multiplexer yield on the overall array yield. They
propose various modified bypass circuits, some involving more than one
multiplexer per cell, but which are thereby able to tolerate many of the
faults which could occur in the bypass circuitry.
Figure 4.7: Row bypass scheme of McCanny and McWhirter (1983): (a)
Array circuitry, (b) Estimated yield improvement.
Figure 4.8: Hierarchical scheme of Hedlund and Snyder (1982).
An important requirement of the bypass technique is that the yield of 
the individual processing elements is very high so that there is only a small 
number of faults in the array compared with the number of rows in the 
array. It is this constraint which enables the simple method of discarding 
entire rows to be beneficially employed. If too many faults are present in the 
array it is likely that many or even all of the rows will contain faults, and no 
increase in yield will be achieved.
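The dependence of the bypass technique on a high processing element yield can be illustrated with a simple calculation. The sketch below assumes independent faults, perfect bypass circuitry and that any fault-free row can be used; these are simplifying assumptions made for illustration, not the model used by the authors cited above. The function names are hypothetical.

```python
from math import comb


def row_yield(p_pe, n_cols):
    """Probability that all n_cols processing elements in a row work."""
    return p_pe ** n_cols


def array_yield(p_pe, n_rows, n_cols, n_spare):
    """Probability that at least n_rows of the n_rows + n_spare rows work.

    Assumes independent faults and fault-free bypass multiplexers.
    """
    q = row_yield(p_pe, n_cols)
    total = n_rows + n_spare
    return sum(comb(total, k) * q**k * (1 - q)**(total - k)
               for k in range(n_rows, total + 1))


if __name__ == "__main__":
    # Compare a high and a low processing element yield for a 16 x 16 array.
    for p in (0.999, 0.99, 0.9):
        print(p, [array_yield(p, 16, 16, s) for s in (0, 2, 4)])
```

Under these assumptions a few spare rows help considerably when the element yield is high, whereas for a low element yield almost every row contains a fault and the spares give no useful improvement.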
4.3.3 Hierarchical Fault Tolerance
Hierarchical fault tolerance techniques have similarities to nodal fault
tolerance. In the scheme proposed by Hedlund and Snyder (1982), illustrated in
figure 4.8, processing elements are grouped into blocks of twelve out of which
only four are required to work. The four working processors are then
interconnected as a 2 by 2 subarray within each block and the blocks are then
interconnected to form a 2-dimensional array. If any block in a row of blocks
is unable to configure a 2 by 2 subarray, the whole row of blocks is bypassed.
This scheme offers two levels of hierarchy and potentially allows a functional
array to be configured from an array containing a large number of faulty
devices, but also requires a very large overhead of redundant processors. It
should be noted that although Hedlund and Snyder have chosen to bypass a
whole row of blocks if a single block in that row cannot be configured into a
2 by 2 subarray, it would also be possible to use a more sophisticated strategy,
such as one of those described in the next sections, for avoiding faulty blocks.
In this way an improved yield characteristic might be achieved.
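The two levels of the hierarchy can be illustrated numerically. The following Python sketch gives an optimistic upper bound only: it assumes independent faults and that any four working processors in a block of twelve can be wired into a 2 by 2 subarray, which the real switching network may not allow. The names `k_of_n` and `block_row_yield` are hypothetical.

```python
from math import comb


def k_of_n(p, k, n):
    """Probability that at least k of n independent units work."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


def block_row_yield(p_pe, blocks_per_row, needed=4, group=12):
    """Probability that every block in a row of blocks is usable.

    A row of blocks is bypassed as soon as one of its blocks fails,
    so the row is usable only if all of its blocks are usable.
    """
    return k_of_n(p_pe, needed, group) ** blocks_per_row


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, k_of_n(p, 4, 12), block_row_yield(p, 8))
```

The block-level figure shows how the heavy redundancy tolerates a low processor yield, while the row-level figure shows what is lost by discarding a whole row of blocks for a single unusable block.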
4.3.4 Row Generation Schemes
In these fault tolerant schemes, the idea is to generate functional rows of pro­
cessing elements in which each functional row is constructed by taking one 
functional processor from each column of the array. The functional rows are 
then interconnected down the columns in the vertical direction to form the 
2-dimensional array. Any faulty or unused processors encountered when in­
terconnecting down a column are bypassed. Several row generation schemes 
are presented in the literature by Sami and Stefanelli (1983), Moore and 
Mahat (1985) and Bentley and Jesshope (1986). The schemes of Moore and 
Mahat are reproduced in figure 4.9 together with two columns of processors 
which have been configured by the techniques.
Scheme A is the simplest and operates as follows. Cell 2 can be connected
to any one of cells 4, 5 or 6. If cell 2 is to be connected to cell 5, switches F
and G are opened, and E and H are closed. Similarly, a connection between
cells 2 and 4 requires that switches F, E, D and C be open and H, G, A and B be
closed. One of the drawbacks of this simple scheme is that when a connection
involving the use of the row-shift line is made, two other cells immediately
become unusable, and one of these could be functional. To overcome this,
scheme B employs extra switches and communication wiring to enable all
cells to use the row-shift line simultaneously.
Figure 4.9: Row-oriented configuration scheme of Moore and Mahat (1985):
(a) Scheme A, (b) Scheme B, (c) Scheme C.
Scheme C is more sophisticated and allows double row shifts. This pro­
vides the cells with a greater degree of connectivity which offers increased 
flexibility to avoid faults.
The scheme of Sami and Stefanelli (1983) is similar to that of Moore and
Mahat but allows as many row shifts as necessary. It therefore has superior
performance but, due to the high overhead, is only suitable for arrays with a
small number of spare rows.
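The effect of the permitted number of row shifts can be illustrated with a simplified model. The sketch below is not the switch-level procedure of any of the cited schemes: it assumes that the k-th logical row always takes the k-th working cell of every column, and it simply discards a logical row if two adjacent columns would need a shift larger than `max_shift`. The function name and the grid representation are hypothetical.

```python
def generate_rows(good, max_shift):
    """Form logical rows by taking the k-th working cell of every column.

    good[r][c] is True when the cell in physical row r, column c works.
    Returns a list of logical rows, each a list of physical row indices
    (one per column). A logical row is kept only if the shift between
    adjacent columns never exceeds max_shift.
    """
    n_rows, n_cols = len(good), len(good[0])
    # Physical row indices of the working cells in each column, top down.
    working = [[r for r in range(n_rows) if good[r][c]]
               for c in range(n_cols)]
    n_logical = min(len(col) for col in working)
    rows = []
    for k in range(n_logical):
        path = [working[c][k] for c in range(n_cols)]
        if all(abs(a - b) <= max_shift for a, b in zip(path, path[1:])):
            rows.append(path)
    return rows


if __name__ == "__main__":
    good = [
        [True, True, True],
        [True, False, True],
        [True, True, False],
        [True, True, True],
    ]
    for shift in (0, 1, 2):
        print(shift, generate_rows(good, shift))
```

With no shift allowed only the completely fault-free physical row survives; allowing a single shift recovers three logical rows from the same 4 by 3 array. Setting `max_shift` to the number of physical rows corresponds to allowing as many shifts as necessary.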
4.3.5 Global Organisation
Schemes classified under the heading of global organisation offer a much 
greater flexibility as to how the cells are interconnected than other ap­
proaches. The most general scheme is probably that proposed by Katevenis 
and Blatt (1985) which is reproduced in figure 4.10.
The idea of global organisation is to provide the array with buses which 
run the entire length of the array between both the rows and columns. 
Switching nodes are inserted at the intersections of the buses so that con­
nections can be selectively made between separate buses, and between buses 
and processing elements. In principle the global nature of the scheme means 
that a processor anywhere in the array could be connected to any other pro­
cessor. This would provide an excellent ability for avoiding faults, but could 
lead to timing problems due to extra transmission delays being introduced 
into signal lines.
The operation of a global configuration scheme can be described generally 
as follows. First the buses are tested by an external tester, and then the 
working buses are used to give access to the switching nodes. These are 
tested and the combination of working buses and switching nodes is used to 
apply test patterns to the processing nodes. Finally the array is configured 
by setting the switches to the appropriate positions.
Many authors have proposed similar global configuration schemes. These
include Hsia et al (1979), Raffel et al (1983) and Gaverick and Pierce (1985).
The schemes differ mainly in the way in which the switching elements are
implemented rather than having significant differences in configuration
strategy.
Figure 4.10: A global interconnection scheme after Katevenis and Blatt
(1985).
4.4 Switch Implementations
In this section we review the methods by which the switches used in a con­
figuring scheme can be implemented. The approaches have improved very 
significantly over the past two decades and can now provide a highly reliable 
interconnection medium.
4.4.1 Hard Configurable Schemes
The earliest proposals for fault tolerant circuits involved the use of hard 
configurable schemes, in particular discretionary wiring. Later proposals 
suggested using fuses at various parts of the circuit under the control of 
electrical heating or laser cutting. The laser cutting technique has been 
included in this section on hard configurable switching schemes since it has 
until recently been an irreversible process. However, recent reports indicate 
that the laser cutting processes may be reliably reversed, and this process 
will also be discussed.
Discretionary Wiring Approaches
The earliest attempts at increasing the area of integrated circuits were based
on the principle of discretionary wiring. The idea is to place more circuit
elements on the chip or wafer than actually required to perform the function
and to test each of these elements by probing the wafer. The results of the
test can then be represented as a wafer map and a metal mask can be
designed which would interconnect the working devices. Sack (1964) proposes
this approach for enabling whole wafer circuits to be produced. He
demonstrates a complete wafer containing 108 gates interconnected in the
form of a shift register. There are many variations on the basic discretionary
wiring technique. Some are described in Petritz (1967), Lathrop (1967) and
Calhoun (1969). All of these approaches appear to offer advantages at the level
of integration available at the time (about 5000 gates on a 1 inch diameter
wafer).
One of the problems with discretionary wiring is that it relies on there
being very few faults in the wiring layer, which, although achievable at the
device geometries of the late 1960s, is unlikely to be successful at 1 µm
geometries and with 4-6 inch diameter wafers. Another problem is that
the probe testing of the devices on the wafer causes damage to the wafer
surface which increases the probability of a fault occurring during subsequent
processing. An interesting variation on the theme of discretionary wiring
is the approach proposed by Barsuhn (1978), in which he fabricates a wafer of
memory chips. Faulty chips are replaced by good, individual mirror image
chips which are flip-chip bonded over the faulty device. Barsuhn claims
success with this method for a 2.25 inch diameter wafer.
Electrical Fuses
Electrical fuses are commonly employed as the method of implementing
the necessary switching in yield enhancement techniques for memory chips
(Moore, 1986). They have also been extensively used in PROMs and
Programmable Array Logic (PALs) for defining logic functions, although erasable
techniques based on stored charge have now largely taken over. The electrical
fuse technique is based upon heating the fuse, which is commonly made of
aluminium or polysilicon, so that it melts and creates an open circuit. When
used in conjunction with pull-up or pull-down components, a change in logic
level can be achieved and subsequently used to control other circuits.
Although such fuses can in principle be combined in a circuit to allow a reversal
of the effect of blowing a fuse by blowing a second, the fusing process itself is
essentially irreversible. This means that the fusing technique cannot be used
to isolate parts of a wafer for testing purposes and subsequently reconnect
them.
Laser Linking and Cutting
The technique of using a laser beam to either cut or weld signal paths has
been extensively studied at MIT Lincoln Laboratory by Raffel and his
research team in the Restructurable VLSI (RVLSI) approach to large area
integration (Raffel, 1983). The approach now seems to be a strong contender
for Wafer Scale Integration due to its reliability and ease of execution. The
structures used for linking and welding, together with details of the
procedures used and some results, are presented by Chapman (1985).
Figure 4.11: Cross-section of laser link from Chapman (1985).
Laser Power              > 1.2 W
Pulse Width              ≈ 1 ms
Open Link Resistance     > 10^14 Ω
Formed Link Resistance   < 1 Ω
Failure Rate             < 0.01 %
Capacitance              ≈ 35 fF
Table 4.1: Laser link parameters.
The MIT structure for making links between first and second metal layers
is reproduced in figure 4.11 and details of the laser pulse required and the
link parameters are given in table 4.1. During the linking process, for which
an argon laser focussed to a 10 µm spot size is used, successive melting of
the second layer metal, the amorphous silicon insulator and part of the first
layer metal occurs. This creates a silicon-aluminium alloy which provides
the conducting path. An important feature of the melting process is that
it occurs over a relatively long time period with a low power pulse (1 ms as
against 100 ns for commercial laser cutting systems). This avoids the splatter
which normally accompanies metal vaporisation. MIT claim that the failure
rate of the process is below that of the processing defects occurring during
link fabrication.
The link structure described above also enables cuts to be made by using 
the laser to melt either the first or second layer metal just before it enters
the link structure. Cuts have been successfully carried out using the same 
low power as used for linking so that splatter is avoided.
4.4.2 Firm Configurable Switching Schemes
These schemes are characterised by switch reversibility combined with
non-volatility of the switch setting. From the point of view of testing and
configuring a wafer, this type of switching scheme is attractive, since areas of
the wafer can be temporarily isolated while a detailed local test is carried
out. Mistakes in configuration, or faults occurring after configuration, can
also be conveniently dealt with. There are two main approaches to Firm
Configurable Switches: the Floating Gate FET and the MNOS transistor.
These are described more fully in the following paragraphs.
Floating Gate FET
The operation of a floating gate FET switch (Shaver, 1984) is similar to a
normal FET in that it is the voltage on the gate of the FET which
determines whether the transistor is on or off. The difference is in the manner
in which the gate voltage is applied. In a normal FET the gate voltage is
controlled directly by applying a potential to a wire connected to the gate
electrode. However, the gate of a floating gate FET is not connected to any
source of potential; its voltage is determined by the amount of charge stored
on the gate itself. This charge is deposited by irradiating the gate with a beam
of electrons of the appropriate energy. Normally-on or normally-off FETs can
be fabricated by selecting the appropriate channel polarity. Under
irradiation by an electron beam, an n-channel depletion device is turned on, while
a p-channel enhancement device is turned off.
An important feature of floating gate FETs is that the switch can be
reversed by discharging the gate. This can be achieved in one of two ways: by
standard ultra-violet irradiation or by electron beam irradiation. In the first
of these techniques, the UV radiation allows the gate to discharge by
photo-injection through the gate oxide. The UV radiation can be applied by
a flood lamp or alternatively it can be localised so as to selectively discharge
a single gate. The second technique is attractive since it can be carried out
in the same machine as originally used to charge the gate. A low energy
beam is used which generates a secondary emission of electrons from the
gate which is larger than the irradiating beam current. Since under these
conditions more electrons leave the gate than arrive at it, the charge on the
gate reduces.
Figure 4.12: MNOS transistor.
Although the floating gate FET switch is very attractive and will
probably be acceptable for commercial devices, the retention time of charge on
the gate may be too short for military devices (Shaver, 1984). However, the
use of the floating gate FET in wafer scale integration is being investigated
as part of the ESPRIT project number 824. An overview of this project is
presented by Trilhe and Saucier (1987).
The MNOS Transistor Switch
The MNOS transistor illustrated in figure 4.12 is commonly used in
Electrically Alterable Read Only Memories (EAROMs). These devices can store
information for many years but can also be altered in a simple manner by the
application of the appropriate programming signals, which tend to be about
25 to 40 volts. The programming voltages cause injection of electrons into
the boundary region between the silicon nitride layer and the silicon dioxide
layer. When the programming voltage is removed, the charge is retained
since the boundary region is isolated. Erasure is achieved in a similar
manner, with stored charge being repelled from the boundary and absorbed into
the substrate (see Muroga, 1982).
Although the MNOS transistor switch is simple in operation, it does have 
drawbacks in the context of wafer scale integration. The main problem is 
that a connection to control the programming would be required for each 
transistor and these would have to be accessible from the edge of the wafer. 
For small numbers of switches this may be feasible, but for large numbers, 
the problem will be serious.
4.4.3 Soft-Configurable Switching Schemes
The main type of soft-configurable switching scheme uses externally
controlled electrical switching elements. This type of switch implementation is
probably the one which most people would first think of. The idea is to
design the switching nodes using ordinary logic gates. These are then
controlled from an external source so that the desired connections are made
between the processors. The great advantage of electrical switches is that
they use only standard circuit components which are the same as those used
for the remainder of the circuitry. In addition, since no specialised equipment
is required, re-configuration can potentially be carried out in the field if
in-service faults occur. However, their main disadvantage is the same as for
MNOS transistor switches: the wiring needed to control them can
become a serious problem. This type of switch implementation technique has,
however, been used successfully by Anamartic (formerly Sinclair Research)
in their wafer scale disk memory. Their switches are configured under the
control of an external tester and the scheme is described in Aubusson and
Catt (1978).
4.4.4 Vote-Configurable Switching Schemes
Circuits of this type have already been considered under the heading of nodal 
fault tolerance. However, although it is not immediately obvious, it is worth 
remembering that they are a form of electrical switch. Their advantage over 
externally controlled switches is that no wiring or global control is required; 
all the information they require to output the correct result is available 
locally. It is unfortunate that the hardware redundancy associated with 
the processing node to be used with this type of switch is so high as to 
be impractical in most cases, especially where the yield of the individual 
processors is low.
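The sense in which a voter acts as a locally controlled switch can be seen from a minimal Python sketch of a 2-out-of-3 majority function. The representation of processor outputs as integer words and the function name are assumptions made for illustration.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three integer words."""
    return (a & b) | (b & c) | (a & c)


if __name__ == "__main__":
    correct = 0b1010
    faulty = 0b0110          # one of the three processors gives a wrong word
    print(bin(majority(correct, correct, faulty)))
```

The voter in effect selects the output of an agreeing pair of processors using only the three local outputs, so no configuration wiring is needed; the price is the threefold replication of the processor.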
4.4.5 Self-Organising Switching Schemes
This type of switch is the subject of the remaining chapters of this thesis. The 
idea combines the convenience that external switches offer in terms of ease 
of implementation and potential for reconfiguration in the field but has the 
great advantage that no external control of the switches is needed. This not 
only removes the need for large numbers of extra pins just for configuration 
purposes, but also means that external computation to calculate the desired 
configuration pattern is not necessary; the ability to make decisions about 
the configuration pattern resides within the array cells themselves.
4.5 WSI Demonstrators
Much of the research which has been carried out on wafer scale integration
has been limited to paper exercises backed up in many cases by computer
simulations. There have been relatively few examples of actual devices being
built, although this is now changing and several demonstration devices are
currently being developed. In this section we describe some of the devices
which have been fabricated and comment on those under current
development.
4.5.1 Trilogy
Trilogy is probably the best known company involved in wafer scale integra­
tion. Their very ambitious project to build an IBM compatible mainframe 
computer on a single wafer received much attention in the press. The circuit 
was partitioned into about 1500 blocks containing between 10 and 50 gates 
each and triple modular redundancy was the method used to increase the 
block yield. In order to reduce the effect of clustered defects, the triplicated 
blocks were not placed adjacent to each other.
Unfortunately, Trilogy were unsuccessful in their attempt and investors 
have been wary of WSI ever since. Two main reasons have been suggested 
by Peltzer (1983) for Trilogy’s failure. The first is that the spacing of the 
triplicated blocks led to an increased transmission time between blocks and 
as a result the project fell short of its target of an IBM compatible device 
due to lack of speed. The second reason is that the technology chosen for 
implementing the circuitry was ECL, and resulted in a power consumption 
of about 1 kW on a 4 inch wafer. This led to tremendous problems of thermal 
management.
4.5.2 Anamartic and the Solid State Disk Memory
In the UK the wafer-scale memory built by Anamartic (formerly Sinclair 
Research) is probably the best known. The reason for this is that right from 
the start the device has been specifically aimed at the consumer market and 
as a result has received much attention in both the technical and national 
press. In addition, a novel configuration technique has been used and this has 
captured the attention of many observers. Working wafers with 0.5 megabits 
of storage were demonstrated in 1985. A higher density wafer is currently 
being developed as a commercial product which Anamartic hope will be able 
to replace conventional disks and have both much improved reliability and 
access time.
The technique employed by Anamartic for enabling the wafer to provide
sufficient yield is commonly called the Catt Spiral, which was proposed by
Aubusson and Catt (1978). The scheme generates a linear array of
interconnected memory blocks starting from one block at the edge of the wafer
and adding extra blocks to the chain one by one. The configuration is
implemented using electrical switches which are controlled by an external
tester/controller.
The implementation procedure operates as follows. The controller ini­
tially tests one of the chips on the periphery of the array. If the chip is 
faulty, another peripheral chip is chosen until a functional device is found. 
Instructions are then sent to the functional chip to tell it to connect itself 
to one of its neighbouring chips. The way in which the neighbour is chosen 
is described later. The chosen neighbour is then tested by the external con­
troller by sending test patterns through the first chip, and into the neighbour. 
Test results are passed along the reverse route. If the neighbour is faulty, an 
alternative neighbour is chosen until a functional neighbour is found. This 
chip is then added to the chain. The chain is then further built up by re­
peating the process. If at any point in the configuration procedure a chip at 
the end of the chain is found to have no functional neighbours, it is removed 
from the chain and the previous chip in the chain is used as the new chain 
end. This backtracking ability can also enable the chain to escape from dead 
ends which may exist on the array.
The order in which neighbours are selected as candidates for the next
position in the chain determines the shape of the final chain. In the
Anamartic design, the most right-hand neighbour is selected and this results in
a chain of good chips which hugs the outer edge of the array and spirals in
towards the centre of the wafer. Figure 4.13 shows a wafer which has been
configured in this way. For wafer scale integration it has the advantage that
all the interconnects between elements of the chain are checked at the time
of testing and as a result a chip cannot be included in the chain unless all
wires are intact. Another advantage of the way in which the linear array is
built up block by block is that the external switch control signals are applied
serially. This avoids the need for large numbers of pins on the wafer.
Figure 4.13: Wafer configured as a linear processor array using the spiral
technique of Aubusson and Catt (1978).
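The chain-growing procedure with right-hand preference and backtracking can be sketched as follows. This is an illustrative model only, not the Anamartic implementation: the wafer is assumed to be a rectangular grid, each chip is assumed to be tested at most once, a chip removed from the chain is not tried again, and the function returns the longest chain seen. All names, including `grow_chain`, are hypothetical.

```python
# Directions in clockwise order: North, East, South, West.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def grow_chain(good, start, heading=1):
    """Grow a linear chain of working chips with right-hand preference.

    good[r][c] is True for a working chip and start is a (row, col)
    position on the edge of the grid. Returns the longest chain found
    as a list of (row, col) positions.
    """
    n_rows, n_cols = len(good), len(good[0])
    if not good[start[0]][start[1]]:
        return []
    chain = [(start, heading)]   # position and heading on arrival
    tried = {start}              # chips already tested, good or bad
    best = [start]
    while chain:
        (r, c), d = chain[-1]
        # Try the right-hand neighbour first, then straight on, then left.
        for turn in (1, 0, -1):
            nd = (d + turn) % 4
            nr, nc = r + DIRS[nd][0], c + DIRS[nd][1]
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if not inside or (nr, nc) in tried:
                continue
            tried.add((nr, nc))
            if good[nr][nc]:
                chain.append(((nr, nc), nd))
                if len(chain) > len(best):
                    best = [pos for pos, _ in chain]
                break
        else:
            # Dead end: drop this chip; the previous chip becomes the end.
            chain.pop()
    return best


if __name__ == "__main__":
    good = [
        [True, True, True],
        [False, True, True],
        [True, True, True],
    ]
    print(grow_chain(good, (0, 0)))
```

In the example the chip in the bottom left corner is working but can only be reached as a dead end, so it is dropped from the chain by backtracking while the remaining seven working chips are linked.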
Working prototype devices have been successfully fabricated and have 
demonstrated that the yield of the control circuitry on the cells is adequate, 
with around 90% of cells having working circuitry. As far as is known, no 
details of performance have been published, but in a public demonstration 
of a device, it was clear that a large proportion of the memory elements were 
also functional and could be connected to the spiral chain.
The use of electrical switches means of course that the spiral pattern 
generated by the test and configuration procedure is volatile and will need 
to be reapplied each time the device is powered up. The device could be 
retested each time, but Anamartic have chosen to store the configuration 
pattern in a ROM. The contents of the ROM can then be loaded into the 
wafer before it is used.
4.5.3 MIT and Lincoln Laboratory
The work at Lincoln Labs on wafer scale integration is based on their
technique of Restructurable Very Large Scale Integration, or RVLSI for short.
Several demonstrators have been built successfully and a review of the
progress of the project is presented in Rhodes (1986).
One of the demonstrators based on the RVLSI approach is a digital
integrator consisting of 256 10-bit counters partitioned into 64 cells. Each cell
contains four 10-bit counters. The complete device contains 130,000
transistors on a 4 inch wafer using 5 µm CMOS technology. The configuration
switches use 1,900 laser anti-fuses and 137 laser fuses.
4.5.4 GTE Laboratory
GTE are implementing a pipelined processor in WSI (Cole, 1985). The 
pipelined element contains a high speed sequencer, a micro-code RAM, a 32 
bit ALU, and status and storage registers. Each element contains 150,000 
transistors and 60 elements can be implemented on a 3 inch wafer. A self-test 
procedure is incorporated in each element. This checks the element itself and 
also the interconnections to the neighbouring elements. If a fault is found, 
an electron beam programmable switch is used to disconnect the offending 
processing element.
4.6 Conclusions
In this necessarily brief review of published techniques for incorporating
hardware fault tolerance into processor arrays we have seen that the main
differences between the techniques reviewed are in the method of organising
the switches to obtain a better utilisation or harvest of the functional
processors and in the way in which the switches are actually implemented. A
common feature of many of the approaches is that the array is configured
before being used in a system and is then essentially fixed for the rest of
its operational life. In principle, the schemes implemented using electrical
switching elements could be reconfigured, but would need to be removed
from the system before reconfiguration could take place. This is because the
necessary test equipment for fault location, and the means for calculating
the configuration pattern, are separate from the array and are unlikely to be
provided within the system.
For this reason there appears to be a niche for a fundamentally new 
approach to the problem of WSI and fault tolerance in general in which 
external control of the switches is not required and where configuration can 
be carried out by the array itself. Such a scheme would have obvious labour 
saving benefits in a WSI circuit but could also be extremely useful in other 
applications involving processor arrays including silicon hybrid circuits, and 
systems constructed from pcbs.
In the next chapter, we consider novel algorithms by which a two dimen­
sional processor array containing faulty processors can organise itself into a 
functional array without any external assistance. In later chapters, the basic 
self-organising algorithms are developed into practical systems.
Chapter 5
Self-Organising Algorithms for 
Two-Dimensional Processor 
Arrays
5.1 Introduction
In Chapter 4 we have seen that many of the approaches to hardware fault tol­
erance in 2-dimensional arrays involve the use of electrical switching networks 
to allow faulty processors to be replaced by spares. In all cases the switches 
are set by some external controller which decides which switches should be 
used and how the array should be configured. The aim of this project has 
been to investigate techniques which enable an array to automatically organ­
ise itself around the faulty elements and generate a functional 2-dimensional 
array from an array containing many faults. Ideally the array would be able 
to do this without any external assistance. This would avoid the need for an 
external controller and would also provide a system which could readily be 
reconfigured if a fault occurs in service.
We have developed what we believe to be a novel solution to the
2-dimensional array configuration problem. In our technique, which we call
WINNER, an acronym for Wafer Integration by Nearest Neighbour Electrical
Reconfiguration (Evans, 1985), each cell within the array is provided with
some intelligence in the form of a small amount of additional control
circuitry. This is sufficient to enable each processing element to independently
and simultaneously make local decisions about how it should be connected
to neighbouring elements, taking into account its own functionality, the
functionality of its neighbours and the connection priorities to these neighbours
which are defined in specific algorithms. The effects of these local decisions
propagate throughout the array and manifest themselves globally as a
complete self-organisation of the functional processing elements into a correctly
interconnected functional 2-dimensional processor array. We call such arrays
Self-Organising Arrays.
The main body of this chapter is to be published in book form in Evans (1989b).
A number of algorithms incorporating the concepts of self-organisation 
can be derived. For the purposes of this thesis we concentrate our attention 
on two related algorithms which illustrate the technique. The first scheme 
applies the WINNER algorithm in one dimension of the array only, and 
generates functional rows of processors. A simple fault bypassing technique is
then used to form the second dimension, (Evans, 1985). It is the simpler 
of the two algorithms and is presented in section 5.3. The second method 
applies the WINNER algorithm in two dimensions and is discussed in sec­
tion 5.4, (Evans et al, 1985).
We describe the algorithms at a high logical and operational level. De­
tailed circuit level descriptions and results of performance simulations etc 
are presented in later chapters.
5.2 Definitions
The following terms will be used in the subsequent description of the algorithms.
Processor: This is the circuit which performs the node function when the 
array is operating in-service. It communicates with four neighbours to 
the North, South, East and West.
Control circuit: This comprises the extra logic which is added to the 
processor to provide it with the self-organisational ability.
Figure 5.1: Relationship between Cell, Control circuitry and Processor.
Cell: A cell is the combination of the processor and its control circuitry and 
is illustrated in figure 5.1.
The assumption that the processor has only orthogonal connections does 
not limit the generality of the approach since all other nearest neighbour 
array interconnection structures can be reduced to the orthogonal form. An 
example of a hexagonally interconnected array and its orthogonal equivalent 
are shown in figures 5.2(a) and (b) respectively. The transformation has 
been carried out in the following manner. Firstly, the connections which 
are already orthogonal remain unchanged. Next, each diagonal connection 
directly connecting a cell to its south-eastern neighbour is re-routed via the 
eastern neighbour. This means that a dummy connection is required in each 
cell as shown in figure 5.2(d) instead of the diagonal connections shown in 
figure 5.2(c).
Figure 5.2: Hexagonally interconnected array and its orthogonal equivalent.
5.3 Algorithm 1: WINNER in One Dimension
The WINNER algorithm in one dimension is so called because it results in an 
array in which a number of functional rows have been generated. These rows 
avoid faulty cells but do not themselves form a two-dimensional array. The 
second dimension of the array is formed by interconnecting the functional 
rows in the vertical direction, bypassing any faulty cells encountered in the 
process. In order to describe the algorithm, we present an array which has 
been configured using the algorithm and explain how the configuration has 
been achieved by giving a simple pencil and paper description. We then 
show how this procedure can be implemented as a circuit and describe the 
operation in detail.
Figure 5.3: Configuration of a 2-dimensional array using WINNER: (a) Con­
figuration of a perfect array, (b) Array containing faults, (c) Rows configured 
around the faults, (d) Vertical connections made: array configuration com­
plete.
Figure 5.3 shows the main stages of an array which is being configured 
by the one-dimensional WINNER algorithm. Figure 5.3(a) shows how an 
array with no faulty processors would be configured. Figure 5.3(b) shows an 
array containing faults which is to be configured in the example, with faulty 
processors being indicated by a cross. On paper the functional rows can be 
generated as follows, with the cell numbers under consideration being shown 
in figure 5.3(c) and being referred to in brackets. Starting with the left hand 
column of the array, choose the functional cell nearest to the top of the array
(1). From this cell, look at the column to the right and make a connection 
to one of the three (in general) nearest neighbour cells in that column, (in 
this case 2 or 7, since the upper boundary of the array limits the choice 
to two cells). In making the choice always choose a functional cell, with a 
preference for the cell nearest the top of the array. Repeat this procedure 
until the right hand side of the array is reached, (2,3,4,10). At this point, 
one complete functional row (1,2,3,4,10) has been generated. Subsequent 
rows are formed in a similar manner treating the cells used in a previous 
functional row as if they were faulty cells (ie avoid using them). The second 
row would therefore be (6,12,8,9,15). The procedure minimises the distance 
of the rows from the top of the array and therefore maximises the number 
of rows which can be generated on the array.
In some rows a dead end may be encountered. This is illustrated in 
the generation of the third functional row in figure 5.3(c) and occurs when 
processor 17 connects to 13. None of the neighbours of processor 13 are 
AVAILable and so the row must backtrack to 17 and try 18. The row can 
then be continued to completion. In a large array several back trackings 
may be required on some rows. A fourth row in this diagram cannot be 
constructed because a complete dead end is encountered. The row must 
therefore backtrack to the boundary of the array.
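As an illustration only, the pencil and paper procedure above can be sketched serially in Python. The grid representation (`faulty[r][c]`), the function name `build_rows` and the `dead` set used to remember dead ends are assumptions made for this sketch and do not appear in the thesis. The code is a serial stand-in for the procedure, not the circuit described later.

```python
def build_rows(faulty):
    """Serial sketch of row generation with backtracking.

    faulty[r][c] is True for a faulty cell. Returns a list of rows, each a
    list of (row, column) cells running from the left to the right edge.
    """
    n_rows, n_cols = len(faulty), len(faulty[0])
    used = set()   # cells already taken by an earlier functional row
    dead = set()   # cells found to lead only to dead ends

    def free(r, c):
        return (0 <= r < n_rows and not faulty[r][c]
                and (r, c) not in used and (r, c) not in dead)

    def extend(r, c):
        """Return a path from (r, c) to the right-hand edge, or None."""
        if c == n_cols - 1:
            return [(r, c)]
        for nr in (r - 1, r, r + 1):        # prefer the cell nearest the top
            if free(nr, c + 1):
                tail = extend(nr, c + 1)
                if tail is not None:
                    return [(r, c)] + tail
        dead.add((r, c))                    # dead end: backtrack
        return None

    rows = []
    for r in range(n_rows):                 # start cells, topmost first
        if free(r, 0):
            path = extend(r, 0)
            if path is not None:
                used.update(path)           # later rows treat these as faulty
                rows.append(path)
    return rows


# The top-left cell reaches a dead end, so the only row starts one cell lower.
grid = [[False, False, True],
        [False, True,  True],
        [False, False, False]]
print(build_rows(grid))   # [[(1, 0), (2, 1), (2, 2)]]
```

Remembering dead ends is safe here because cells are only ever removed from consideration as rows are formed, so a dead end for one row remains a dead end for every later row.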
Each functional row formed in this way contains a processor taken from 
each column of the array. The second dimension of the required array can 
therefore be generated by making vertical connections between processors of 
each column and bypassing faulty processors and processors which although 
not faulty are nevertheless unused. This process completes the construction 
of the 2-dimensional array and is shown in figure 5.3(d). It is clear that the 
resulting array will be smaller than the original array due to the presence 
of the faulty elements but it should be noted that the x dimension of the 
functional array is identical to that of the given array, while the y dimension 
depends on the number of faults which occur in each column of the array.
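The formation of the second dimension can be sketched in the same way. The example below uses the two functional rows quoted above, (1,2,3,4,10) and (6,12,8,9,15), and assumes that the cells of figure 5.3(c) are numbered row by row in a five-column array. That numbering is an assumption of this sketch: it is consistent with the nearest-neighbour moves described in the text but cannot be checked against the figure here. The names `cell_to_rc` and `vertical_links` are hypothetical.

```python
def cell_to_rc(n, n_cols=5):
    """Convert a 1-based cell number to (row, column), assuming row-major numbering."""
    return ((n - 1) // n_cols, (n - 1) % n_cols)


def vertical_links(rows):
    """Link each functional row to the next one, column by column.

    Returns (upper cell, lower cell, bypassed physical rows) for every link.
    The bypassed cells are faulty or unused and act only as wires.
    """
    links = []
    for upper, lower in zip(rows, rows[1:]):
        for (r1, c), (r2, _) in zip(upper, lower):
            links.append(((r1, c), (r2, c), list(range(r1 + 1, r2))))
    return links


rows = [[cell_to_rc(n) for n in (1, 2, 3, 4, 10)],
        [cell_to_rc(n) for n in (6, 12, 8, 9, 15)]]
links = vertical_links(rows)
print(links[1])   # ((0, 1), (2, 1), [1]): cell 2 reaches cell 12 by bypassing cell 7
```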
5.3.1 Self Organisation
We now consider how this pencil and paper procedure can be embodied 
within a circuit so that it can operate automatically. To simplify the problem 
we make the following assumptions, the validity of which will be discussed 
in chapters 8 and 9.
1. Each processor contains some method enabling it to indicate reliably 
whether or not it is working. This could in principle be achieved by a 
self test procedure.
2. All connections between processors are fault-free.
3. Each control circuit associated with a processor is fault-free.
4. All connections between control circuits are fault-free.
5. All data signals flow from left to right and top to bottom. This sim­
plifies the description of the algorithms but in no way makes them less 
general.
In order to enable faulty cells to be avoided and to allow control circuits 
in adjacent cells to communicate, a cell with greater connectivity than the 
original processor is required. A schematic view of a cell with sufficient con­
nectivity to perform the one-dimensional self-organising function is shown in 
figure 5.4. We can see that although the North-to-South (N-S) connection is 
unchanged, extra channels have been provided on the Eastern and Western 
sides for both inter-processor and inter-control circuit communication. These 
connections allow the cell to communicate with its nearest neighbours to the 
NW, W and SW, and the NE, E and SE directions respectively. Control cir­
cuits in neighbouring cells communicate via single-bit control lines indicated 
in figure 5.4 as REQuest (REQ) and AVAILability (AVAIL) signals.
Figure 5.4: Schematic of cell suitable for 1-dimensional WINNER algorithm. 
5.3.2 Control Circuitry
The function of the control circuitry in each cell is to decide how the cell 
should be connected to its neighbours in adjacent columns. In addition it 
must decide whether or not it should act as a bypass in the North-South 
direction. These decisions must be made on the basis of the following infor­
mation, which is the only information available to the control circuitry:
• a knowledge of whether the processor in its cell is functional or faulty,
• REQuest inputs from its North-West, West and South-West 
neighbours,
• AVAILability inputs from its North-East, East and South-East neigh­
bours.
Using this information the control circuitry must perform the following 
functions:
• generate REQuest and AVAILability signals and route them to neigh­
bouring cells,
• select data inputs from the appropriate neighbour and apply them to 
the processor,
• bypass the processor in the North-South direction if the processor is faulty 
or unused.
Convention for REQuest and AVAILability signals
The following convention for REQuest and AVAILability signals will be used 
throughout this thesis:
REQuest and AVAILability signals are active when at a logic 1 
level and passive when at a logic 0 level.
Based on this convention, the phrases "to send an AVAILability signal" and 
"to send a REQuest" imply that the signals being sent are TRUE.
If a cell A outputs an AVAILability signal to another cell B it means that 
cell A contains a processor which is AVAILable for connection if REQuested 
by cell B. If a cell A outputs a REQuest signal to some other cell B it means 
that cell A wishes to set up a communication channel between its processor 
and the processor in B. Cell A can only send a REQuest to cell B if B is 
sending an AVAILability signal to cell A. If such a communication channel 
becomes set up then A is said to have been connected to B and the incoming 
data signals to cell B from cell A are directed to the processor in cell B.
The manner in which the REQuest and AVAILability signals are generated 
by a cell forms the heart of the algorithm and is described in detail in the 
following sections.
Generation of AVAILability Signals
A cell generates AVAILability output signals according to the following rules:
REQuest Inputs          AVAILability Outputs
REQNW  REQW   REQSW     AVAILNW  AVAILW  AVAILSW
TRUE   X      X         TRUE     FALSE   FALSE
FALSE  TRUE   X         TRUE     TRUE    FALSE
FALSE  FALSE  TRUE      TRUE     TRUE    TRUE
FALSE  FALSE  FALSE     TRUE     TRUE    TRUE
X = TRUE or FALSE
Table 5.1: Generation of AVAILability output signals.
1. A cell can only output a TRUE AVAILability signal if it contains a 
processor which is fault-free (ie the self-test shows it to be functional), 
and at least one TRUE AVAILability signal is being received from its 
NE, E or SE neighbours.
2. If (1) is satisfied, then the priority system given in table 5.1 operates to 
decide in which directions to send TRUE availability signals depending 
upon the incoming REQuest signals.
From the bottom line of table 5.1 we can see that a cell receiving no 
REQuests will output a TRUE AVAILability signal to each of its three left 
hand neighbours. This allows any of the neighbours to make a REQuest if 
it wishes to do so at a later time. From the first three lines of the table 
we see that once a REQuest has been received, the cell may or may not 
remain AVAILable to the other neighbours depending on the priority of the 
REQuest. A REQuest from the NW neighbour has the highest priority, with 
the W and SW neighbours having successively lower priorities.
The scheme allows a priority of connections to be established so that a 
REQuest from the NW has highest priority, and REQuests from the W and 
SW have successively lower priorities. Such a scheme is required to ensure 
that a stable solution is reached. The scheme causes cells to output FALSE 
AVAILability signals to neighbouring cells if they have no chance of obtaining 
a connection. This occurs for example when a higher priority connection has 
already been established.
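The two rules for AVAILability generation can be written as a small function. This is a sketch, not the circuit of chapter 7. It assumes the reading of table 5.1 in which a REQuest from the NW withdraws AVAILability from the W and SW neighbours, a REQuest from the W withdraws it from the SW neighbour only, and a REQuest from the SW withdraws nothing. The function name and argument order are hypothetical.

```python
def avail_outputs(functional, avail_in, req_in):
    """Return (AVAILNW, AVAILW, AVAILSW) for one cell.

    avail_in = (AVAILNE, AVAILE, AVAILSE) received from the right.
    req_in   = (REQNW, REQW, REQSW) received from the left.
    """
    req_nw, req_w, _req_sw = req_in
    # Rule 1: a faulty cell, or one with no onward route, is never AVAILable.
    if not (functional and any(avail_in)):
        return (False, False, False)
    # Rule 2: priority system of table 5.1 (NW highest, then W, then SW).
    if req_nw:
        return (True, False, False)
    if req_w:
        return (True, True, False)
    return (True, True, True)


# A working cell with an onward route and a REQuest from the West stays
# AVAILable to the NW and W but not to the SW.
print(avail_outputs(True, (False, True, False), (False, True, False)))
# (True, True, False)
```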
Rule (1) above gives the cell a global look-ahead capability even though 
each cell is capable only of local communication. This enables clustered 
faults to be avoided in the following way. Information is passed between cells 
from East to West about the availability of other cells. This allows a cell A 
to prohibit another cell from connecting to it if A either contains a faulty 
processor or would be part of a dead-end route, ie a route that would not 
be able to be completed due to some blockage later. Such a dead end route 
could occur, for example, if three vertically adjacent processors were faulty. 
In this case a functional processor to the left of the centre faulty processor 
would find that all of its possible connections to neighbours are unavailable. 
The functional processor would then declare itself to be unavailable. The 
scheme allows information about blockages to be transmitted from right to 
left to all the relevant processors, which then decide upon some appropriate 
avoiding action. These features will be illustrated in section 5.3.5.
Generation of REQuest Signals
Request signals are output from a cell according to a different set of rules:
1. A cell can only output a TRUE request signal if its processor is fault- 
free and at least one request has been received from one of its NW, W 
or SW neighbours.
2. If (1) is satisfied then the cell outputs a single TRUE request value 
to one of its NE, E and SE neighbours depending upon the incoming 
AVAILability signals according to the priority given in table 5.2.
These rules ensure that only one REQuest signal is output from any cell, 
which in turn ensures that a cell can never accidentally become connected 
to more than one neighbouring cell in any column.
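The REQuest rules and the priorities of table 5.2 can be sketched in the same style. The function name and the tuple ordering are assumptions made for illustration.

```python
def req_outputs(functional, req_in, avail_in):
    """Return (REQNE, REQE, REQSE) for one cell.

    req_in   = (REQNW, REQW, REQSW) received from the left.
    avail_in = (AVAILNE, AVAILE, AVAILSE) received from the right.
    """
    # Rule 1: only a working cell that has itself been REQuested may REQuest.
    if not (functional and any(req_in)):
        return (False, False, False)
    avail_ne, avail_e, avail_se = avail_in
    # Rule 2: a single REQuest goes to the most northerly AVAILable neighbour.
    if avail_ne:
        return (True, False, False)
    if avail_e:
        return (False, True, False)
    if avail_se:
        return (False, False, True)
    return (False, False, False)


print(req_outputs(True, (False, True, False), (False, True, True)))
# (False, True, False)
```

Because each branch returns at most one TRUE value, the sketch has the property stated in the text: a cell never REQuests more than one neighbour.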
AVAILability Inputs         REQuest Outputs
AVAILNE  AVAILE  AVAILSE    REQNE  REQE   REQSE
TRUE     X       X          TRUE   FALSE  FALSE
FALSE    TRUE    X          FALSE  TRUE   FALSE
FALSE    FALSE   TRUE       FALSE  FALSE  TRUE
FALSE    FALSE   FALSE      FALSE  FALSE  FALSE
X = TRUE or FALSE
Table 5.2: Generation of REQuest output signals.
5.3.3 Array Boundary Conditions
When a number of WINNER cells are interconnected to form an array, the 
inputs around the edges of the array are not connected to other cells and must 
be defined explicitly. The values of the boundary REQuest and AVAILability 
lines are defined as follows.
Boundary AVAILability Inputs
• Set NE and SE boundary AVAILability inputs to FALSE,
• Set E boundary AVAILability inputs to TRUE.
Boundary REQuest Inputs
• Set NW and SW boundary REQuest inputs to FALSE,
• Set each W boundary REQuest input to TRUE if the corresponding W 
AVAILability output from the array is TRUE; otherwise set it to FALSE.
5.3.4 Interaction of REQuest and AVAILability Signals
The availability and request signals together provide the cells with all the 
information they need about their surroundings in order to be able to form 
functional rows of interconnected processors. The priority system for sending 
and receiving request and availability signals ensures that stable functional 
rows are established from west to east and from north to south starting in 
the top left hand corner of the array. The priority system also ensures that 
each row formed is as close as possible to the northern edge of the array, 
thus maximising the number of rows generated.
The one dimensional WINNER algorithm operates automatically when 
the array is switched on and is a totally asynchronous technique. All cells 
continuously make decisions based on the information available to them. 
This information will be changing as the organisation of the array gradually 
evolves to its stable state. Therefore, in the early stages of self organisa­
tion the array may be highly dynamic with cells forming and relinquishing 
connections to other cells as a result of being overridden by higher priority 
decisions which have been made at other localities and have rippled through 
the array. Connections may in fact experience a number of iterations of this 
type but will always settle into a self-consistent, stable state.
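The settling behaviour can be explored with a small software relaxation. The sketch below is an assumption-laden stand-in for the hardware: it updates all cells together in repeated sweeps, whereas the real array is asynchronous, and it uses the reading of table 5.1 in which higher priority REQuests withdraw AVAILability from lower priority neighbours. Whether every fault pattern settles under this particular update order is not established here, hence the sweep limit. All names are hypothetical.

```python
def settle(functional, max_sweeps=1000):
    """Relax REQuest/AVAILability signals until stable; return the rows formed."""
    R, C = len(functional), len(functional[0])
    none = (False, False, False)
    avail = [[none] * C for _ in range(R)]   # per cell: outputs to (NW, W, SW)
    req = [[none] * C for _ in range(R)]     # per cell: outputs to (NE, E, SE)

    def sig(table, r, c, k):
        # Lines with no cell behind them read FALSE.
        return table[r][c][k] if 0 <= r < R and 0 <= c < C else False

    for _ in range(max_sweeps):
        new_avail = [row[:] for row in avail]
        new_req = [row[:] for row in req]
        for r in range(R):
            for c in range(C):
                if c == C - 1:     # eastern boundary: only the E input is TRUE
                    a_ne, a_e, a_se = False, True, False
                else:
                    a_ne = sig(avail, r - 1, c + 1, 2)
                    a_e = sig(avail, r, c + 1, 1)
                    a_se = sig(avail, r + 1, c + 1, 0)
                if c == 0:         # western boundary: REQW mirrors own AVAILW
                    q_nw, q_w, q_sw = False, avail[r][0][1], False
                else:
                    q_nw = sig(req, r - 1, c - 1, 2)
                    q_w = sig(req, r, c - 1, 1)
                    q_sw = sig(req, r + 1, c - 1, 0)
                ok = functional[r][c] and (a_ne or a_e or a_se)
                new_avail[r][c] = (ok, ok and not q_nw,
                                   ok and not (q_nw or q_w))       # table 5.1
                go = functional[r][c] and (q_nw or q_w or q_sw)
                new_req[r][c] = (go and a_ne, go and not a_ne and a_e,
                                 go and not a_ne and not a_e and a_se)  # table 5.2
        if new_avail == avail and new_req == req:
            break
        avail, req = new_avail, new_req
    else:
        raise RuntimeError("signals did not settle within max_sweeps")

    rows = []
    for r in range(R):
        if any(req[r][0]):         # this left-hand cell starts a functional row
            path, rr = [], r
            for c in range(C):
                path.append((rr, c))
                ne, _e, se = req[rr][c]
                rr += -1 if ne else (1 if se else 0)
            rows.append(path)
    return rows


print(settle([[True, True], [True, True]]))    # [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
print(settle([[True, False], [True, True]]))   # [[(0, 0), (1, 1)]]
```

In the fault-free example the lower left cell briefly REQuests its NE neighbour before the upper row claims it, which is a small instance of a connection being formed and then relinquished.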
The array can be visualised as having two levels of hierarchy. One level 
comprises the underlying asynchronous network of control circuitry which is 
capable of establishing communication channels between appropriate neigh­
bours to generate a functionally orthogonal array as described. The second 
level is the array of processors containing a number of faulty elements which 
can be considered to be overlaid on the array of control circuitry which then 
forms connections between processors as appropriate.
5.3.5 Serial Description of WINNER Operation
Although the interactions of REQuest and AVAILability signals between the 
cells of the array occur simultaneously, it is helpful from the point of view 
of understanding the operation of the algorithm to consider the process as a 
sequence of distinct events as follows.
We assume that initially no REQuest or AVAILability signals are present 
in the array other than fixed boundary input values. From this starting 
point, no REQuest signals can be generated by any cells until at least one 
AVAILability signal has reached the left hand side of the array; this is the 
first stage of the configuration process. The AVAILability signals from each 
cell to its neighbours are generated starting from the right hand column of 
the array and working column by column across the array. For each column, 
the three availability outputs of each cell in the column are set to TRUE 
only if the cell has at least one TRUE incoming AVAILability signal. This 
procedure has the effect of flushing out the dead end routes from the array. 
Any dead end route will result in all the cells on that route outputting FALSE 
AVAILability signals. This is illustrated in figure 5.5 which shows the column 
by column movement of AVAILability signals across the array. The faulty 
cells (marked in black) are obviously un-AVAILable; however, in addition, 
some of the functional elements are shown as being un-AVAILable. The cell 
labelled A is un-AVAILable since all of its right hand neighbours are faulty, 
while that labelled B is un-AVAILable because although not all right hand 
neighbours are faulty, none of them are AVAILable.
Figure 5.5: Step-by-step generation of AVAILability signals showing 
avoidance of dead-end routes.
In the second stage of the process REQuest signals are generated starting 
from the left hand column and working from left to right across the array. 
This is illustrated in figure 5.6 in which we consider an input REQuest being 
applied only to the uppermost cell in the left hand column which is 
outputting a TRUE AVAILability signal. Further REQuest signals are then 
generated in subsequent columns by following a path of TRUE AVAILability 
signals according to the rules given in table 5.2. In this way a complete 
functional row can be constructed since a cell outputting a TRUE AVAILability 
signal indicates that a continuous path exists between the left and right hand 
sides of the array.
Figure 5.6: Generation of REQuest signals for first functional row.
Subsequent functional rows are generated by alternate cycles of AVAIL- 
ability and REQuest generation until all possible functional rows have been 
formed. This procedure ensures that any processors which have become 
unavailable as a result of the presence of the first functional row will 
output the appropriate AVAILability signals.
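The serial description lends itself to a short sketch in which the two stages alternate. The function names `flush` and `serial_winner` and the grid representation are assumptions of this sketch. It collapses the three AVAILability outputs of a cell into one flag, which is sufficient for the serial view in which rows are formed one at a time and used cells are then treated as faulty.

```python
def flush(free):
    """Stage 1: sweep from the right; a cell is AVAILable only if it is free
    and at least one nearest neighbour in the next column is AVAILable."""
    R, C = len(free), len(free[0])
    avail = [[False] * C for _ in range(R)]
    for c in range(C - 1, -1, -1):
        for r in range(R):
            onward = c == C - 1 or any(
                avail[n][c + 1] for n in (r - 1, r, r + 1) if 0 <= n < R)
            avail[r][c] = free[r][c] and onward
    return avail


def serial_winner(functional):
    """Alternate AVAILability and REQuest stages until no row can be formed."""
    free = [row[:] for row in functional]
    R, C = len(free), len(free[0])
    rows = []
    while True:
        avail = flush(free)
        # Stage 2: REQuest the uppermost AVAILable cell in the left hand column.
        start = next((r for r in range(R) if avail[r][0]), None)
        if start is None:
            return rows
        path, r = [(start, 0)], start
        for c in range(1, C):
            # An AVAILable cell always has an AVAILable neighbour to its right.
            r = next(n for n in (r - 1, r, r + 1) if 0 <= n < R and avail[n][c])
            path.append((r, c))
        for pr, pc in path:
            free[pr][pc] = False      # used cells are unavailable to later rows
        rows.append(path)


array = [[True, True,  False],
         [True, False, False],
         [True, True,  True]]
print(flush(array)[0])        # [False, False, False]: the top row is a dead end
print(serial_winner(array))   # [[(1, 0), (2, 1), (2, 2)]]
```

Because the first stage removes dead-end routes before any REQuest is made, the second stage never needs to backtrack.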
5.3.6 5-Neighbour WINNER algorithm
In the foregoing description of the WINNER algorithm only nearest-neighbour 
interconnections between adjacent columns were permitted. This resulted in 
a cell which could communicate with three neighbouring cells in both the 
column to the left and the right. However, this restriction is not due to 
any fundamental limiting property of WINNER and the algorithm can be 
extended in a straightforward manner to a scheme which allows a cell to 
communicate, for example, with any of 5 neighbouring cells in the column 
to the left or right.
The main advantage of a 5-neighbour interconnection scheme is that con­
figuration performance will be improved. This is due to the longer range
communication which has been introduced which allows some cells to be 
used in the configured array which would otherwise have been omitted from 
the array. The truth tables for the control logic can be extended in the obvi­
ous manner to handle the extra inputs and outputs and will not be described 
in detail.
5.4 Algorithm 2: WINNER in Two Dimensions
Algorithm 1 makes the basic assumption that it is always possible to bypass 
faulty cells in the vertical direction. In many cases this may be a valid as­
sumption and the technique could be used successfully in many applications. 
However, when we started this work our basic philosophy was that the con­
figured array should completely avoid all faulty processors. In this section we 
will show how the one dimensional WINNER algorithm can be extended to 
encompass this philosophy by applying it to both the rows and the columns 
simultaneously, so that bypassing of the faulty processors is avoided. To see 
how this is done the reader is referred to figure 5.7.
Figure 5.7(a) illustrates a typical small array which has been configured 
in the horizontal direction using the WINNER algorithm in 1 dimension. 
Functional rows have been generated which each contain a number of working 
processors equal to the width of the original array. Figure 5.7(b) shows the 
same array which has been configured in the vertical direction using the 
same one dimensional WINNER algorithm but this time operating on the 
columns. Here, columns are constructed which keep as close as possible to 
the left hand side of the array and each functional column is equal in length 
to the height of the original array.
Since the configurations generated by the algorithm consist of full width 
rows and full height columns, one might suppose that the superposition of 
the rows and the columns would generate an array of functional processors 
at the points of intersection of each row with each column, and that the 
processors within the array region would have orthogonal interconnections. 
The simple superposition of separately configured rows and columns is shown 
in figure 5.7(c), where the processors forming the final array are indicated 
in black. Some processors have either horizontal or vertical connections but 
not both. These are discussed in section 5.4.1 and are controlled to act 
as bypasses in the direction in which they have connections. The size of 
the functional array is determined by the number of functional rows and 
columns generated. If p is the number of functional rows and q the number 
of functional columns, then simple superposition should generate a functional 
array of dimensions p x q. A cell suitable for use with this algorithm would 
have extra communication channels and control signal paths for both the 
horizontal and vertical directions and is illustrated schematically in figure 5.8.
Figure 5.7: The WINNER algorithm applied in two dimensions: (a) Configured 
rows, (b) Configured columns, (c) Superposition of the functional rows 
and columns to create a functional array.
At first sight this simple superposition process appears to be quite 
straightforward. However, there are in fact two types of undesirable condition which 
can occur when the rows and columns are superimposed in such a simple 
manner. These have been called double site and crossover conditions and 
must be handled within the algorithm if a correctly configured array is to be 
produced for all fault distributions. The example illustrated here contains 
several double sites. As we shall see, a simple technique has been developed 
which results in an algorithm capable of configuring a functional array from 
any fault distribution.
5.4.1 Double Site Condition
Referring to the small array configured by simple superposition of 
functional rows and columns shown in figure 5.9(a) we see that there are pairs of 
processors, for example A and B, which both belong to the same functional 
row and the same functional column. The effect of this is that there are two 
processor sites, A and B, where only one is required. The problem can be 
overcome by instructing one of the processors to act as a bypass in both the 
horizontal and vertical directions. In our implementation we always instruct 
the upper processor to become the bypass although the lower processor could 
equally be chosen. The rule for doing this is as follows:
If a cell finds that it is REQuesting to be connected to a cell 
to its SW or SE in both its row and its column configuration 
algorithms, it will become a bypass for its horizontal and vertical 
connections.
Figure 5.8: Schematic of cell suitable for 2-dimensional WINNER algorithm.
At first sight it may appear that since the process of avoiding double sites 
requires functional processors to be discarded the functional array size might 
be reduced as a result. However, these processors could not have formed part 
of the functional array anyway and discarding them does not affect the array 
size of p x q.
Figure 5.9: Double-site and crossover modes which can occur with simple 
superposition of functional rows and columns: (a) Double-site condition, 
(b) Fundamental crossover mode 1, (c) Fundamental crossover mode 2, (d) 
Composite crossover mode, (e) Composite crossover mode.
5.4.2 Crossovers
Crossovers occur when a row and a column intersect each other at a point 
other than a processor site. Crossovers can occur in two distinct ways as 
shown in figures 5.9(b) and (c). Unlike the double site condition which can 
be overcome without altering the configured rows and columns, the solution 
to the crossover condition requires either the functional row or functional 
column containing the crossover to be physically altered so that the super­
position crossover does not occur. Alteration of the row or column may of 
course produce disturbances which propagate throughout the array until a 
new stable configuration is achieved.
The two fundamental modes in which crossovers can occur are shown in 
figure 5.9(b) and (c). Other crossovers can occur, such as those shown in 
figure 5.9(d) and (e) but these are simply superpositions of the fundamental 
modes. In mode 1, figure 5.9(b), the crossover could be avoided by making 
cell 3 unavailable to cell 2 in the presence of the link between cells 1 and 4 
(link 1-4). Alternatively, cell 4 could be made unavailable to cell 1 in the 
presence of link 2-3. In a similar manner, mode 2 crossovers, figure 5.9(c), 
can be avoided by making cells 3 or 4 unavailable in the presence of links 1-4 
or 2-3 respectively.
We have proposed two solutions to the crossover problem, either of which 
can be embodied within the final two dimensional WINNER algorithm. The 
first requires an additional single-bit control signal to pass between cells in 
the East-West and North-South directions while the second requires a change 
only to the logic of the control circuitry in each cell.
Crossover Avoidance using Extra Control Communication
This technique requires the use of an extra, single-bit control signal which 
must pass between cells in both the horizontal and vertical directions. This 
allows the rows to be generated as in the one-dimensional WINNER algo­
rithm but restricts the generation of columns to sites which will not cause
crossovers. As we have seen from figure 5.9(b), the crossover could be avoided 
if cell 3 was made unAVAILable to cell 2, and in figure 5.9(c), the crossover 
could be avoided if cell 4 was made unavailable to cell 1. The first of these 
can be achieved by using a single extra bit which propagates between cells 
from North to South. The bit indicates whether or not the cell which gen­
erated it is outputting a row REQuest in the SE direction and if so causes 
the column AVAILNE signal in the cell below to be inhibited. The second 
crossover mode may be avoided in a similar manner by passing an extra bit 
from West to East, indicating whether a cell has output a row REQuest to 
the NE, and if so inhibiting the column AVAILNW in the cell to its right. 
The cost of this technique is two extra inputs and outputs plus two (A AND 
NOT B) logic functions to perform the inhibitory action.
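The inhibitory action described above amounts to two gates per cell. A minimal sketch follows, in which the function and argument names are hypothetical and each argument is a single-bit signal.

```python
def gate_column_avail(col_avail_ne, col_avail_nw,
                      north_row_req_se, west_row_req_ne):
    """Apply the two (A AND NOT B) inhibits to a cell's column AVAILability.

    north_row_req_se: extra bit from the cell above, TRUE if that cell is
                      outputting a row REQuest in the SE direction.
    west_row_req_ne:  extra bit from the cell to the left, TRUE if that cell
                      has output a row REQuest to the NE.
    Returns the gated (column AVAILNE, column AVAILNW) outputs.
    """
    return (col_avail_ne and not north_row_req_se,
            col_avail_nw and not west_row_req_ne)


# The cell above holds a row link to the SE, so column AVAILNE is inhibited.
print(gate_column_avail(True, True, True, False))   # (False, True)
```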
Crossover Avoidance by Alteration to Control Circuitry
The ideal solution to the crossover problem would be one involving a simple 
alteration to the control circuitry in each cell without requiring any extra 
connections between cells, since this is likely to introduce the smallest over­
head of area into the algorithm. An approach incorporating these concepts 
has been developed and is now described.
To avoid mode 1 crossovers we wish to make cell 3 unavailable to cell 2 
in the presence of link 1-4. This can be achieved without using extra control 
bits by noting that if link 1-4 exists, then because this is the highest priority 
direction for the row generation circuitry in cell 4, cell 4 will output a FALSE 
AVAILability signal to cell 3. If the 1-4 link does not exist, then cell 4 outputs 
a TRUE AVAILability signal to cell 3. This means that the 1-4 link can be 
detected by cell 3 by the value of the AVAILability signal coming from cell 4. 
The AVAILability signal can therefore be used to modify the AVAILability 
of cell 3 in the NE direction of the column generation circuitry, ie to cell 2. 
This technique can be implemented by ANDing the NE column AVAILability 
output signal of each cell with the incoming Eastern row-AVAILability signal. 
This gives the row link priority over the column link, which must then find
an alternative route.
In a similar manner, mode 2 crossovers can be avoided. In this case, a 
FALSE column AVAILability output from cell 4 to cell 2 indicates that the 
link 1-4 is present. This information can then be used in cell 2 to inhibit 
(with a single AND gate) its row AVAILability to cell 3. An alternative row 
route will then have to be found.
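Both alterations reduce to a single AND gate on an existing AVAILability output. The sketch below shows the gating only; the function and argument names are hypothetical, and for mode 2 the directions of the signals are left unnamed because they depend on the cell layout of figure 5.9(c).

```python
def gate_mode1(col_avail_ne_out, row_avail_e_in):
    """Mode 1: the NE column AVAILability output is ANDed with the incoming
    Eastern row AVAILability, so a FALSE row signal (the higher priority row
    link exists) withdraws the column AVAILability."""
    return col_avail_ne_out and row_avail_e_in


def gate_mode2(row_avail_out, col_avail_in):
    """Mode 2: a row AVAILability output is ANDed with an incoming column
    AVAILability, so a FALSE column signal inhibits the row link."""
    return row_avail_out and col_avail_in


# Cell 4 reports FALSE row AVAILability, so cell 3 becomes unavailable to cell 2.
print(gate_mode1(True, False))   # False
```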
5.5 Advantages of the WINNER Approach
The WINNER self-organising approach to the configuration of 2-dimensional 
processor arrays is quite different from other published techniques and has 
several distinct advantages as follows:
• The array can configure itself automatically without the need for ex­
ternal assistance,
• The self-organising algorithm is fully convergent and cannot become 
unstable,
• As we shall see in chapter 7, the control circuitry associated with each 
cell in the array is simple, about 20 gates, resulting in a low hardware 
overhead,
• The technique results in good utilisation of the functional processors 
particularly when processor yield is greater than 80%,
• No global control lines are required,
• The array is potentially capable of being re-configured in the event of 
an in-service failure and could therefore be useful for remotely sited 
equipment, or in equipment requiring a very fast repair time.
5.6 Concluding Remarks
The self-organising algorithms presented in this chapter form the basis for 
this thesis. The algorithms have so far been described at a high level of
operational detail. In later chapters we develop the ideas more fully and 
cover logical and statistical simulations, testing, hardware requirements and 
describe a system which has been built to d 'monstrate the ideas.
Chapter 6
Performance of the WINNER Algorithm
6.1 Introduction
In chapter 5 we described a novel self-organising algorithm called WINNER 
for providing fault tolerance in 2-dimensional processor arrays. In this chap­
ter we address the task of evaluating the performance of WINNER for differ­
ent sizes of array, processor yield and overhead. We also consider a technique 
which could be used to increase the performance of the WINNER algorithm 
for large processor arrays. The technique involves partitioning an array into 
a number of groups of columns. We then consider the configuration ap­
proaches proposed by other authors as described in chapter 4 and compare 
their performances with that of WINNER.
6.2 Simulation
The ideal method by which to evaluate and compare performances of differ­
ent configuring algorithms designed to tolerate faults in integrated circuits 
would be to fabricate chips which have been designed using the techniques 
and observe how well the techniques tolerated real defects. The final prov­
ing of a concept must be done in this way, but initial comparisons can be 
achieved in a much more efficient manner by computer simulation. Simula­
tion is not a perfect tool for evaluating fault tolerant techniques since it is
WINNER PERFORMANCE 90
extremely difficult to generate a precise model of the defects introduced by 
the fabrication process. Particularly difficult in this respect are faults affect­
ing global circuitry such as power supplies and clock lines. Furthermore, the 
distribution of defective processing elements can only be estimated since as 
we saw in chapter 3, this varies from process to process. However, even in 
the presence of these difficulties, simulation offers high levels of user inter­
action and flexibility at relatively low cost and remains the most important 
tool used in the literature for evaluating algorithms. In this thesis, simula­
tion is used as the main basis for evaluation and comparison of algorithm 
performance.
6.3 Choice of Language
The ultimate aim of this research into self-organising algorithms is to pro­
duce hardware suitable for use in the design of large integrated circuits and 
possibly even Wafer Scale Integration. It is therefore possible in principle to 
use a hardware description language (HDL) such as ELLA to evaluate the 
performance of WINNER. However, although ELLA is ideal for simulating 
a single array at the hardware level, and can provide essential information 
about the signal levels existing in the array, it is not particularly suited 
to simulating large numbers of arrays with different fault distributions to 
provide statistical information about the algorithm. However, by describing 
algorithms in a behavioral manner rather than in hardware form it is possible 
to use a serial programming language (SPL) for simulation purposes.
SPLs have a number of advantages over HDLs for statistical simulation 
of the type to be carried out, as follows:
1. The flexibility of SPLs means that changes in the algorithms, parameters 
or statistical requirements can be easily made by simply changing 
parameters of the program; in HDLs such changes can often produce 
many consequential changes.
2. The simulations require many different random fault distributions around 
which to configure an array; these are readily generated in SPLs but 
not in HDLs.
3. With SPLs, the scope for analysing the results of a simulation within 
the program is almost unlimited, whereas in HDLs there is almost 
no opportunity even for counting the number of rows which have been 
configured.
4. The efficiency, in terms of the CPU time required, is usually better in 
SPLs than in HDLs, partly because of the way in which the rules are 
specified; in SPLs the rules can be simpler because they are described 
at a higher level.
5. The amount of memory required for SPLs is much smaller than that 
for HDLs, since the HDL representation of the algorithm has all the 
connections between elements of the array explicitly included for all 
elements for all time. In SPLs, the connections between array elements 
are represented within loops, and only a single connection exists at 
any one time.
For these reasons, the statistical simulations were carried out using the 
SPL Algol 68.
6.4 Simulation Requirements
The first step in the task of simulating the fault tolerant algorithms is to 
decide what aspects of the algorithms are to be evaluated. It is also important 
that the results can be sensibly compared with equivalent results 
produced by other algorithms. A characteristic which has become popular 
in the literature is the performance of the algorithm in utilising the working 
processors in the array. This is frequently termed the Harvest, and is defined 
as the fraction of the total number of functional cells which have been used 
in the configured array; it is usually expressed as a percentage. The concept
of a harvest enables an estimate to be produced of how well an algorithm 
has performed since it is directly related to the number of cells which are 
potentially available for use in configuring an array. For example, in an array 
containing 10 rows and 10 columns of processors with a 50% yield, only 50 
processors are of any use, and it is the percentage of these which can be 
configured that provides the figure for the harvest.
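The definition of harvest given above can be written out as a short calculation. The following is a minimal Python sketch only; the function name `harvest` and its two arguments are illustrative assumptions and are not taken from the simulation program of Appendix A. The figure of 40 configured cells in the example is hypothetical.

```python
def harvest(cells_used: int, functional_cells: int) -> float:
    """Percentage of the functional cells that end up in the configured array."""
    if functional_cells <= 0:
        raise ValueError("harvest is undefined when no cell is functional")
    if not 0 <= cells_used <= functional_cells:
        raise ValueError("cells_used must lie between 0 and functional_cells")
    return 100.0 * cells_used / functional_cells


# The 10 by 10 array with a 50% yield from the text has 50 functional cells.
# Supposing (hypothetically) that 40 of them are configured:
print(harvest(40, 50))  # 80.0
```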
It is the view of the author, however, that although the harvest does 
provide a handle on algorithm performance, it is not the most useful characteristic 
for evaluation purposes. In some algorithms, the curve of harvest 
against processor yield for a particular array size and overhead decreases 
monotonically with decreasing processor yield. For other configuration al­
gorithms, in particular those based on nodal fault tolerance, this is not the 
case. For Hedlund’s 4-out-of-12 nodal fault tolerance scheme the relation­
ship between harvest and processor yield for a 10 by 10 array with 100% 
overhead has been evaluated by Franzon (1986) and is shown in figure 6.1. 
When the processor yield is 100%, the harvest is 33.3%, since only a third 
of the processors are theoretically being used. The introduction of a single 
fault anywhere in the array causes the harvest to rise since the same func­
tional array can now be constructed from fewer functional processors. As 
the processor yield is reduced the harvest eventually starts to fall since rows 
containing many functional processors become unusable.
However, the main interest of a potential user of a configuring algorithm 
is not the harvest, since it does not tell him directly what size of array he will 
need in order to achieve the required array size with a particular probability. 
For this reason, the work in this chapter is based on configuring target arrays 
of various sizes, with the number of spare rows of cells required to enable the 
target array to be configured being evaluated for a range of processor yields. 
From this information, a potential user can immediately deduce the size of 
array he will need for his application.
Figure 6.1: Harvest of the scheme of Hedlund (1982). (Vertical axis: harvest, %.)
6.4.1 Program Parameters
A program to perform the required simulation has been written and generates 
a table of results as illustrated schematically in figure 6.2. The program can 
be run for each different target array size required. The table of results 
consists of a two dimensional array of numbers and is essentially a yield map 
for the appropriate target array as a function of processor yield and rows 
of overhead, in the x and y directions of the table respectively. Each result 
in the table is an average over many arrays having identical overhead and 
processor yield, but differing random fault distributions.
Several program parameters can be varied as follows:
1. Size of target array,
2. Range of number of rows of overhead,
3. Range of processing element yield,
4. Number of samples (for statistically significant set).
These parameters are set at the start of each program run.
Figure 6.2: Schematic of the table of simulation results. (Entries: percentage of the sample of configured arrays which reach the target array size; horizontal axis: processor yield.)
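The way such a table is built up from the four parameters can be sketched as follows. This is a minimal Python illustration and not the Algol 68 program of Appendix A. All names are hypothetical, cells are assumed to fail independently, and the configuring rule used (`row_bypass_configures`, in which a row is usable only if it contains no faulty cell) is a simple stand-in for WINNER, which is not reproduced here.

```python
import random


def row_bypass_configures(faulty, target_rows):
    """Stand-in configurer (NOT WINNER): a row is usable only if fault-free."""
    usable_rows = sum(1 for row in faulty if not any(row))
    return usable_rows >= target_rows


def array_yield(n, spare_rows, processor_yield, samples, rng,
                configures=row_bypass_configures):
    """Percentage of random arrays from which an n by n target is configured."""
    successes = 0
    for _ in range(samples):
        # True marks a faulty cell; each cell is assumed to fail independently.
        faulty = [[rng.random() >= processor_yield for _ in range(n)]
                  for _ in range(n + spare_rows)]
        if configures(faulty, n):
            successes += 1
    return 100.0 * successes / samples


def results_table(n, overhead_rows, yields_percent, samples, seed=0):
    """Map (rows of overhead, processor yield in %) to array yield in %."""
    rng = random.Random(seed)
    return {(r, y): array_yield(n, r, y / 100.0, samples, rng)
            for r in overhead_rows for y in yields_percent}


yields_percent = list(range(60, 101, 8))
table = results_table(n=4, overhead_rows=range(0, 5),
                      yields_percent=yields_percent, samples=200)
for r in range(4, -1, -1):  # highest overhead first, as in the result tables
    print(r, [table[(r, y)] for y in yields_percent])
```

Passing a different `configures` function is how another configuring algorithm would be substituted in this sketch; the table-building loop itself does not change.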
6.4.2 Program Flow Chart
Initially, a program to carry out the simulation was written in a fairly 
straightforward manner which involved calculating the result for each point 
in the result table. However, it became clear that large amounts of CPU 
time were being consumed and it was necessary to optimise the program to 
reduce run times so that large arrays could be simulated. It was noted that 
the tables of results had a characteristic form similar to that shown schemat­
ically in figure 6.3. As can be seen, all the useful information in the table 
is contained within a fairly narrow band bounded by a region of zero array 
yield on the left and 100% array yield on the right. It was therefore clear 
that most of the CPU time is spent calculating predictable values and that 
significant time savings could be made if only values within the band were 
evaluated. The problem is that the position of the band within the table is 
unknown at the start of simulation.
Figure 6.3: Characteristic form of the table of results.
This problem has been overcome by using a program whose flow chart is 
shown in figure 6.4. The essential feature is that the program first searches 
for the band and, when it is found, evaluates all entries within the band using a 
recursive procedure. The procedure detects the edges of the band by noting 
the first occurrence of a 0% or 100% array yield, and uses this information to 
avoid doing further unnecessary calculations. The program listing has been 
included in Appendix A for the benefit of the interested reader.
For a typical table, the CPU time has been reduced to less than a quarter 
of that used when all table positions were evaluated.
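The saving comes from not evaluating entries that lie outside the band. The sketch below illustrates the idea in Python with a simplified, non-recursive scan; it is not the recursive procedure of Appendix A. It assumes that array yield never decreases as processor yield or overhead increases, and `evaluate`, `fill_band` and the synthetic `demo_yield` function are hypothetical names introduced for illustration.

```python
def fill_band(overheads, yields, evaluate):
    """Fill a results table, evaluating only entries in or beside the band.

    overheads must be sorted in descending order and yields in ascending
    order. Entries to the left of the band are set to 0% and entries to the
    right of it to 100% without calling evaluate.
    """
    table = {}
    evaluated = 0
    start = 0  # first column that may still hold a non-zero entry
    for r in overheads:
        first_nonzero = None
        saturated = False
        for j, y in enumerate(yields):
            if j < start:
                table[(r, y)] = 0.0      # zero at higher overhead, so zero here
            elif saturated:
                table[(r, y)] = 100.0    # right of the first 100% entry
            else:
                value = evaluate(r, y)
                evaluated += 1
                table[(r, y)] = value
                if value > 0.0 and first_nonzero is None:
                    first_nonzero = j
                if value >= 100.0:
                    saturated = True
        start = first_nonzero if first_nonzero is not None else len(yields)
    return table, evaluated


def demo_yield(r, y):
    """Synthetic, purely illustrative stand-in for a costly simulation."""
    return max(0.0, min(100.0, (y - (90 - 4 * r)) * 12.5))


overheads = [3, 2, 1, 0]
yields = list(range(60, 101, 4))
table, evaluated = fill_band(overheads, yields, demo_yield)
print(evaluated, "of", len(overheads) * len(yields), "entries evaluated")
```

With real simulation data the entries are statistical estimates, so the monotonic assumption holds only approximately near the edges of the band.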
6.4.3 Square Arrays
A 2-dimensional processor array can have arbitrary numbers of rows and 
columns. However, it is not possible to simulate all combinations of rows 
and columns and in the absence of a requirement for a specific size of target 
array, it was decided that it would be best to simulate a number of different 
sizes of square target array. However, the simulation program is quite general 
and could be used for arrays of any dimension if desired.
Figure 6.4: Flow-chart of the simulation program.
6.5 WINNER Simulation Results
Full tables of results have been generated for a range of target array sizes as 
follows:
• 4 by 4,
• 5 by 5,
• 6 by 6,
• 8 by 8,
• 10 by 10,
• 12 by 12,
• 16 by 16.
Above target array sizes of 16 by 16, the CPU time required to generate 
a full table of results becomes very large (for example, more than 1 CPU day). For 
this reason, partial tables have been generated for:
• 18 by 18,
• 20 by 20,
• 32 by 32
Each target array size has been evaluated with an overhead between 0% 
and 200%. This range was chosen since it was felt that an overhead of 
more than 200% (ie 3 times the circuitry of the target array) was probably 
generally unacceptable. The partial tables of results for target arrays larger 
than 16 by 16 contain results for 200% overhead only.
A typical table of results is shown in figure 6.5. This is actually for a 10 
by 10 target array, but other sizes of array are similar in shape. A number 
of useful graphs can be drawn from the information in each table as well 
as from the relationships between tables. These graphs are discussed in the 
following section.
OVERHEAD (ROWS): array yield (%) entries, in order of increasing processor yield
15: 3.5 32.5 76.0 98.0
14: 6.5 16.5 71.0 94.0
13: 11.5 56.0 88.6 98.5
12: 7.5 41.5 80.0 98.0
11: 4.0 22.0 65.0 94.5 99.0
10: 1.6 9.0 54.5 89.5 99.0
9: 4.0 31.6 81.6 97.6
8: 2.0 15.0 59.0 92.5 99.0
7: 4.6 36.0 86.0 97.0
6: 3.0 16.5 54.6 91.6
5: 2.5 36.6 80.0 99.0
4: 6.5 43.5 89.6
3: 6.0 67.0 97.0
2: 14.0 61.0
1: 7.5 71.6
0:
PROCESSOR YIELD (%) columns: 60 64 68 72 76 80 84 88 92 96 100
Figure 6.5: Table of simulation results for a WINNER array.
6.5.1 Graphical Presentation of WINNER Results
Several families of curves can be drawn from the tables of results produced 
by the simulation program:
1. Array yield as a function of processor yield for various values of over­
head,
2. Overhead as a function of processor yield for various values of array 
yield,
3. Array yield as a function of target array size.
These are described in detail in the following sections.
Variation of Array Yield with Processor Yield
For each size of target array a curve can be drawn of the array yield achieved 
as a function of processing element yield for different values of overhead. The 
families of curves for target arrays of 5 by 5, 10 by 10 and 16 by 16, configured 
using the WINNER algorithm with 3-neighbour connectivity are shown in 
figure 6.6. With a processor yield of 100% every array can be configured 
to produce the target array. However, as the processor yield is reduced, the 
array yield remains at 100% until a critical level of processor yield is reached. 
At this point the array yield begins to drop rapidly. For an overhead of 30%, 
ie 3 spare rows in the 10 by 10 target array, the critical processor yield is 
about 95%, and the array yield becomes virtually zero at a processor yield of 
85%. Arrays with larger percentage overheads have lower values for critical 
processor yield. The main feature to note from these curves is the steepness 
of the fall from 100% array yield to 0%. This means that a very small change 
(say 1 or 2 percent) in processor yield can have a very significant effect (say 
10 or 20 percent) on the array yield.
A similar family of curves can be drawn for each size of target array. Each 
family is similar in form, but the position of the curves in the x direction 
moves to the right for larger arrays, indicating poorer array yields as target 
array size increases. This feature is investigated in a later section.
Overhead as a Function of Processor Yield
Curves of the overhead required as a function of processor yield to achieve 
different values of array yield are probably the most useful to a potential user. 
Such curves are essentially contour maps of the tables of results generated 
by simulation. In practice they have been produced from the curves of array 
yield against processor yield described in the previous section since these 
curves allow interpolation between the relatively coarse points of the table.
Figure 6.6: Array yield as a function of processor yield for different values of 
processor overhead: (a) 5 by 5 array, (b) 10 by 10 array, (c) 16 by 16 array.
Figure 6.7: Overhead as a function of processor yield for different values of 
array yield: (a) 5 by 5 array, (b) 10 by 10 array, (c) 16 by 16 array.
Figure 6.7 shows the curves for a 10 by 10 target array with contours of 
constant array yield of 10%, 50% and 90%. Curves for other values of array 
yield can easily be drawn but have been omitted for clarity. The curves can 
be used to estimate the size of array which would be necessary to generate 
a 10 by 10 target array for a given processor yield. As an example, if a 50% 
average array yield is required and the processor yield is 80%, then it can 
be seen that an overhead of 50% is required, resulting in a starting array 
of 15 rows by 10 columns. Conversely, the required processor yield can be 
estimated from given values of array yield and overhead.
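The arithmetic of the example, and the interpolation used to read a contour point off a curve of array yield against processor yield, can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, linear interpolation between neighbouring points is assumed, and the two curve points used in the example are invented for illustration rather than taken from the simulation results.

```python
import math


def starting_array(n_target, overhead_percent):
    """Rows and columns of the starting array for a square n_target array."""
    spare_rows = math.ceil(n_target * overhead_percent / 100)
    return n_target + spare_rows, n_target


def yield_at_level(points, level):
    """Processor yield at which a curve crosses the given array yield.

    points is a list of (processor yield, array yield) pairs sorted by
    processor yield; linear interpolation is used between neighbours.
    Returns None if the curve does not cross the level.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    return None


# The example from the text: a 10 by 10 target with 50% overhead.
print(starting_array(10, 50))  # (15, 10)

# Hypothetical curve points, for illustration only.
print(yield_at_level([(76, 20.0), (80, 60.0)], 50.0))  # 79.0
```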
A family of similar curves can be produced for each size of target array 
and the relationship between these different families is the subject of the next 
section.
Variation of WINNER performance with array size
An important result which has emerged from simulating a number of different 
sizes of array is that for any given element yield and percentage overhead, 
the array yield becomes less for larger array sizes. This can be seen from 
figure 6.8 which shows the processing element yield required to achieve array 
yields of 10%, 50% and 90% as a function of array size. As can be seen, 
all the curves show that for a particular array yield, an increased processing 
element yield is required as array size is increased. However, the gradient of 
the curve does reduce rapidly with increasing target array size.
6.6 Improving WINNER Performance
In this section we consider the effect of two techniques designed to improve 
the performance of the basic WINNER algorithm. Both techniques involve 
increasing the connectivity of the cells in the array so that greater scope for 
avoiding faulty cells is available. The first involves increasing the number of 
neighbours with which each cell can communicate from 3 to 5 as described in 
chapter 5. In this technique, apart from each cell having more neighbours to 
choose from during configuration, the WINNER algorithm operates exactly 
as in the 3-neighbour case. The second technique involves partitioning an
array into several groups of columns which are configured separately and 
then joined by a longer range communication network.
Figure 6.8: Variation of array yield with target array size. (Horizontal axis: square target array size, cells on a side.)
6.6.1 5-Neighbour WINNER Algorithm
The 5-neighbour WINNER algorithm has been simulated in exactly the same 
manner as the 3-neighbour algorithm and the results are shown in figure 6.9. 
Figure 6.9(a) shows the relationship between array yield and processor yield 
for a 10 by 10 array with various levels of overhead, while figure 6.9(b) is 
a contour map of constant array yield as a function of processor yield and 
overhead. The corresponding performance of the 3-neighbour algorithm is 
shown dotted for comparison. As can be seen, at 100% overhead, and 50%
array yield, the 5-neighbour WINNER algorithm requires a 67% processor 
yield whereas the 3-neighbour algorithm requires a 73% processor yield. At 
200% overhead, the required processor yields are 55% and 63% respectively.
Figure 6.9: 5-Neighbour WINNER simulation results: (a) Array yield as a 
function of processor yield, (b) Overhead as a function of processor yield.
6.6.2 Array Partitioning
Although the shape of the curves in figure 6.8 indicates that higher process­
ing element yields or greater overheads will be required to achieve a given 
array yield as the size of the target array is increased, the results can also 
be interpreted in a more optimistic way. Suppose we want to generate an 
N by N target array with a certain yield. We can estimate the overhead 
and element yield which would be required to achieve this. According to 
figure 6.8, however, these figures can be reduced if we generate four target 
arrays with dimensions N/2 by N/2, and butt them together to produce the 
required array. The curves tell us that using this partitioning approach, the 
N by N target array can be generated with lower processing element yield. 
Indeed, we could generate nine N/3 by N/3 arrays to provide even greater 
advantage.
In practice there is little advantage to be gained by splitting the initial 
array in the horizontal direction, since the same result is achieved by 
partitioning the array into groups of columns. Each of the groups of columns 
is then configured as usual and the blocks connected together using routing 
circuits between each block. The routing circuits simply join the N functional 
rows of one block to the N functional rows of the neighbouring block 
and can be designed so that the self-organising ability of the entire array is 
maintained across the partitions. The actual circuitry required for this is 
presented in chapter 7. With more and more partitions, we eventually end 
up with single columns; this is the ideal row generating algorithm, but it 
requires a large amount of circuitry to interconnect the configured groups of 
columns. There is therefore a trade off between array yield and number of 
partitions and this is investigated in the following section.
The reason that the partitioning procedure provides an advantage over 
the straight self-organising algorithm is that it introduces a degree of longer 
range communication into an otherwise nearest-neighbour communication 
algorithm. This allows certain previously intolerable fault distributions to 
be tolerated by allowing faults to be avoided using long range communication.
The relationship between array yield and array width has been simulated 
for a target array of 12 rows with groups of columns of width 2, 3, 4, 5, and 6. 
The results have been plotted in figure 6.10. It can be seen that for, say, 50% 
array yield we need about 65% element yield for the 12 by 12 array but only 
48% element yield for an array only 2 columns wide. At first sight 
this sounds like a tremendous improvement since in theory, by placing six 12 
by 2 arrays side by side and routing between them we could produce a 12 by 
12 array from a much lower element yield. In practice, of course, the routing 
circuitry does not have a 100% yield itself, and the increase in array yield 
will be modified by the routing circuit yield. The following section considers 
the overall advantage which could be gained.
Figure 6.10: Array yield as a function of array width for a target array of 12 
rows. (Axes: processor yield, %, against array width, cells.)
Effect of Partitioning
In this section the effect of partitioning on array yield is examined with the 
yield of the column interconnection circuitry being taken into account. The 
results are presented in graphical form in figures 6.11 and 6.12. Figure 6.11 
shows how the results are obtained, and is described below, while figure 6.12 
compares the results for various values of starting yield. All the results are 
for 200% overhead and illustrate the relationship between overall array yield 
and the number of groups of columns into which the array is partitioned.
In figure 6.11, curve A shows the increase in array yield which would 
be achieved if the extra circuitry (column interconnection circuitry and 
control circuitry) had a perfect yield (ie 100%). The curve has been drawn 
by taking an arbitrary value of element yield, in this case 65%, and noting 
from the table of simulation results the value of array yield achieved (in this 
case 50%). The other points on the curve have been found by looking at 
the simulation results of arrays which have been partitioned into 12x6, 12x4, 
12x3, and 12x2 arrays, and noting the new array yield for the same processor 
yield of, in this case, 65%. In other words we are observing the change in 
array yield for a fixed element yield as we reduce the partition size.
Figure 6.11: Effect of partitioning on array yield for a 12 row target array. (Horizontal axis: number of intercolumn routing circuits, ie number of partitions - 1.)
Figure 6.12: Effect of the number of partitions on array yield. (Horizontal axis: number of intercolumn routing circuits, ie number of partitions - 1.)
As shown by curve A, the use of two partition blocks causes the array 
yield to increase from 50% to almost 100%. Obviously the introduction of 
further blocks cannot increase the array yield any further and the curve is 
flat for these values. The effect of the yield of the column interconnection 
circuitry results in further curves as follows. Curve B shows the estimated 
yield of the column interconnection circuitry, which acts as a reducing factor 
on the array yield. The number of column interconnection circuits increases 
with the number of partitions, and their combined yield therefore drops 
exponentially as shown¹. The yield of the control circuitry in each cell is high and since, as we 
shall see in chapter 9, it is possible to employ techniques to mask the effect 
of control circuit faults from the rest of the array, little degradation of the 
array yield will result from such faults. The net array yield is the product 
of the yield of the interconnection circuitry and the basic array yield and is 
shown in curve C.
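The construction of curve C from curves A and B can be sketched numerically. The following Python fragment is illustrative only: the function name, the assumed yield of 0.9 for a single routing circuit and the list of basic array yields are all hypothetical values chosen to show the shape of the calculation, not figures from the simulations. Each routing circuit is assumed to fail independently and to be non fault tolerant.

```python
def net_array_yield(basic_yield, routing_circuit_yield, n_routing_circuits):
    """Curve C: basic array yield (curve A) times interconnect yield (curve B)."""
    # Curve B: every routing circuit must work, so the combined yield falls
    # exponentially with the number of circuits.
    interconnect_yield = routing_circuit_yield ** n_routing_circuits
    return basic_yield * interconnect_yield


# Hypothetical curve A values for 0, 1, 2, ... routing circuits.
basic_yields = [0.50, 0.98, 1.00, 1.00, 1.00]
for circuits, basic in enumerate(basic_yields):
    print(circuits, round(net_array_yield(basic, 0.9, circuits), 3))
```

Under these assumed numbers the net yield rises with the first routing circuit and then falls again, which is the peaked behaviour the text attributes to curve C.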
As can be seen, the introduction of one partition, ie using two blocks each 
being half the width of the original target array, provides an improvement in 
array yield from about 40% to about 55%. However, increasing the number 
of partitions does not improve the array yield further because the progres­
sively poorer yield of the routing circuitry begins to dominate, and the initial 
increase in array yield is gradually eroded. The results of figure 6.11 illus­
trate how one curve C has been generated, but of course a whole family of 
curves of this type can be drawn for different element yields. Several of these 
curves are presented in figure 6.12.
The most likely application of this approach is in arrays which have a very 
small or even zero array yield. In these cases, production may be non-viable 
unless the array yield can be increased. As can be seen, the worse the initial 
array yield, the more partitions are required before the peak in the array 
yield is reached. If the element yield is well below that required to achieve 
a non-zero array yield, several partitions are required before any increase in 
array yield is achieved, but the increase actually achieved is proportionally 
greater.
¹ This assumes that the column interconnection circuitry is not fault tolerant.
6.7 Comparison with other Algorithms
In this section we compare the performance of the WINNER algorithms 
(3-neighbour and 5-neighbour) with the performance of other published con­
figuring algorithms. Comparisons are made with the following algorithms:
1. Simple row-bypassing,
2. Triple Modular Redundancy (TMR),
3. The 4-out-of-12 nodal fault tolerance scheme of Hedlund and Snyder 
(1982),
4. The scheme of Moore and Mahat (1985),
5. The best row generation scheme theoretically possible; Sami and Ste­
fanelli (1983)
6. The best global configuration scheme theoretically possible.
Contours of constant array yield for each of these algorithms will be 
plotted as a function of overhead and processor yield. The overhead required 
to achieve a particular array yield for a given processor yield can then be 
taken as a measure of algorithm performance.
6.7.1 Bounds on Performance
When evaluating the performance of any system it is useful to have estimates 
of highest and lowest possible performances even if these can only be achieved 
in theory. These limits are called the upper and lower bounds. Bounds on 
the performance of configuring algorithms can be determined as follows.
Lower Bound on Performance
The lower bound on performance for configuring a two-dimensional array 
clearly occurs when the array is unable to tolerate any faults and as a result 
will have zero array yield for any processor yield of less than 100%. This is 
a rather trivial bound.
Upper Bound on Performance
The upper bound on configuration performance is of more interest. In fact 
two bounds are relevant to us as follows:
1. Upper bound on performance of all possible configuration schemes,
2. Upper bound on performance of row generation schemes.
The first of these can be calculated in a simple manner since it occurs 
when all functional processors in the array are used in configuring the target 
array. The fractional overhead is given by
$$\text{Overhead} = \frac{1 - Y_p}{Y_p} \qquad (6.1)$$
where Yp is the processing element yield. This formula represents an absolute 
upper bound and cannot be exceeded.
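The bound follows from a simple count. A starting array with fractional overhead k contains (1 + k) times as many cells as the target array, and on average a fraction Yp of them is functional. If every functional cell is used, Yp(1 + k) = 1, so k = (1 - Yp)/Yp. The Python sketch below evaluates this expression; the function name is a hypothetical choice, and the calculation rests on the reading of the bound just described.

```python
def minimum_overhead(processor_yield):
    """Smallest fractional overhead any configuration scheme could need.

    Assumes every functional processor can be used in the target array.
    """
    if not 0.0 < processor_yield <= 1.0:
        raise ValueError("processor_yield must be in the range (0, 1]")
    return (1.0 - processor_yield) / processor_yield


for y_p in (1.0, 0.8, 0.5):
    print(y_p, minimum_overhead(y_p))
```

At a processor yield of 50%, for example, the bound is an overhead of 100%: half of a double-sized array is functional on average.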
The upper bound on the performance of row generation schemes occurs 
when any row containing at least as many functional processors as there are 
columns in the required target array is considered to be a functional row. 
The functional rows are then interconnected in an appropriate manner to 
form the functional array.
Assuming that an N row by N column target array is required, that the 
fractional overhead is k and that the probability that a processor is functional 
is p, we can use the binomial distribution to find the probability that a row 
is functional, P(Row), as follows
$$P(\mathrm{Row}) = \sum_{i=N}^{kN} {}^{kN}C_i \, p^i (1 - p)^{kN - i} \qquad (6.2)$$
giving an array yield of
$$P(\mathrm{Array}) = [P(\mathrm{Row})]^N \qquad (6.3)$$
where
$${}^{kN}C_i = \frac{(kN)!}{i!\,(kN - i)!} \qquad (6.4)$$
The upper bounds calculated above are shown in figure 6.13.
Figure 6.13: Upper bounds on array yield for a 10 by 10 target array as a 
function of processor yield.
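The row generation bound is a binomial tail probability raised to the power of the number of rows, and can be evaluated directly. The following is a minimal Python sketch; the function names are hypothetical, and `row_cells` stands for the number of cells fabricated in each row (the quantity kN in the formulae), which is an interpretation made for this example. Cells are assumed to fail independently.

```python
from math import comb


def p_row(n, row_cells, p):
    """Probability that a row of row_cells cells has at least n functional."""
    return sum(comb(row_cells, i) * p**i * (1.0 - p)**(row_cells - i)
               for i in range(n, row_cells + 1))


def p_array(n, row_cells, p):
    """Array yield bound: all n rows of the target must be functional rows."""
    return p_row(n, row_cells, p) ** n


# Illustrative use: a 10 by 10 target with 15 cells fabricated per row.
for p in (0.6, 0.7, 0.8, 0.9):
    print(p, p_array(10, 15, p))
```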
6.7.2 Algorithm Performance Comparisons
Each of the configuring algorithms has been simulated under the same 
conditions as the WINNER algorithm so that a fair comparison can be made. 
The results are presented in graphical form in figure 6.14 which shows the 
overhead required to achieve a 50% array yield for a 10 by 10 target array, 
as a function of processor yield. The upper performance bounds have been 
included for reference.
As can be seen from figure 6.14, the row-bypass scheme, TMR and Hedlund’s 
scheme perform rather poorly. The performance of the row-bypass 
scheme is poor because it is a very simple algorithm and requires very few 
switches and interconnections in its implementation. TMR is also a very 
simple scheme but suffers because the minimum possible overhead is 200%. 
Hedlund’s scheme requires configuration of the block of 12 processors and 
also has a minimum overhead of 200%. It appears to offer little advantage 
over TMR.
Figure 6.14: Comparison of the performance of various configuring schemes. 
A: TMR, B: Hedlund’s scheme, C: Row bypass scheme, D: WINNER 3-neighbour, 
E: WINNER 5-neighbour, F: Moore scheme C, G: Sami and Stefanelli’s 
scheme.
Of the remaining configuring schemes simulated, it can be seen that 
scheme C of Moore and Mahat (1985) performs better than the WINNER 
algorithms, although the difference between the Moore scheme and the 
WINNER algorithm with 5 neighbours is not significant at processor yields above 
about 70%. The reason for this is that both schemes have the same degree 
of connectivity, ie 5, but whereas the communication path lengths in 
WINNER are limited to two cells, the Moore scheme permits any communication 
length to be used. This enables the Moore scheme to perform better than 
WINNER at lower processor yields. However, long communication paths can 
cause serious delays in high performance processor arrays, and Moore and 
Mahat suggest in their paper that the communication lengths could be 
restricted if desired. This would reduce the performance of their scheme and 
bring it closer to that of the 5-neighbour WINNER algorithm for processor 
yields below 70%.
The best row generation algorithm is that of Sami and Stefanelli (1985) 
and essentially represents the upper bound of row generation schemes. The 
algorithm involves spare columns rather than spare rows and considers any 
row containing sufficient functional cells to form a row of the target array to 
be a functional row. Such functional rows are then interconnected using a 
bus oriented scheme. Although the results for this algorithm are presented 
for an overhead range of 0 to 200%, the cost in terms of switches and inter­
connections quickly becomes impractical and in practice overheads of a few 
columns would be the maximum contemplated.
At processor yields above about 80%, the difference in performance be­
tween the WINNER algorithm with 3 neighbours, 5 neighbours or the Moore 
scheme is small, and in the absence of any other constraints, the scheme with 
the simplest hardware implementation should be chosen.
6.8 Conclusions
In this chapter we have evaluated the performance of the WINNER algo­
rithms and compared them with competing techniques which have been 
published in the literature. From the simulations it has become clear that 
the techniques of row-bypassing, Triple Modular Redundancy and Hedlund’s 
4-out-of-12 nodal fault tolerance scheme all have a very poor performance 
compared with the other schemes simulated. The scheme of Sami and Ste­
fanelli (1983) is clearly the best but least practical scheme. For processor 
yields above about 80% there is little to choose between the Moore and Ma­
hat (1985) scheme C, and the two different WINNER schemes. Below 80%
processor yields, Moore and Mahat’s scheme or the 5-neighbour WINNER 
scheme should be chosen.
Chapter 7
Hardware Implementation of
the WINNER algorithm
7.1 Introduction
In chapter 5 we described several related algorithms which could be used 
to enable a 2-dimensional processor array to organise itself around faulty 
processors and generate a functional array. The purpose of this chapter is 
to consider the hardware implications of the approach. This will include the 
hardware requirements for the control circuitry, and the circuitry required for 
entering and removing data from the configured array. Where appropriate 
we develop formulae relating complexity to the number of data signal lines 
passing between cells. We also discuss the simulations of hardware which 
have been carried out to verify correct operation of the circuits. We restrict 
our study to the 3-neighbour WINNER algorithm applied to one dimension, 
the rows, of the array. This limitation has been imposed because as we have 
seen from chapter 6, the one-dimensional algorithm offers the best perfor­
mance and is therefore the most likely algorithm to be used in practice. The 
extension of the hardware to the 5-neighbour algorithm is simple.
We then consider the hardware required to implement other configuring 
schemes proposed in the literature, and compare these with WINNER.
7.2 Hardware Requirements of WINNER Control Circuitry
In this section we consider the hardware requirements for implementing the 
control circuitry for the WINNER algorithm applied in one dimension. We 
first consider the gate level implementation to obtain a circuit which is inde­
pendent of technology and then consider how this circuit could be translated 
into a CMOS transistor-level design and develop a formula for the number 
of transistors required to implement it.
7.2.1 Gate-Level WINNER Control Circuitry
The gate level implementation of the WINNER control circuitry is shown in 
figure 7.1. We have assumed for simplicity that only a single data line passes 
through the cell in the vertical and horizontal directions. The circuit can 
be readily extended to multiple data lines as will be described later. The 
circuitry has been divided into two distinct parts, as follows.
1. Decision making logic,
2. Data routing logic.
The decision making logic communicates with neighbouring cells via the 
REQuest and AVAILability signals and eventually decides which data con­
nections should exist between which cells. The data routing circuitry is 
shown shaded, while the remainder of the circuitry forms the decision mak­
ing logic.
The decision making logic is essentially a direct implementation of the 
truth tables for REQuest and AVAILability generation given in chapter 5 
with the assumption that the pass/fail indication would be provided by a 
mechanism such as self-test as described in chapter 8. The data routing 
circuitry selects the data associated with an incoming REQuest signal and 
applies it to the input of the processor. Since the algorithm prevents more 
than one REQuest being sent to any cell, this function can be implemented
in the simple manner shown, where incoming REQuests are ANDed with
their respective data lines and the outputs of the AND gates are ORed
together to produce the required processor input. In the vertical direction,
the processor is bypassed by the multiplexor circuit unless the cell contains a
functional processor AND has at least one TRUE REQuest AND one TRUE
AVAILability input. This means that cells containing faulty processors and
cells which are unused are omitted from the configured array.
Figure 7.1: Gate implementation of the control circuitry required to
perform the 1-dimensional WINNER algorithm.
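The data routing behaviour described above can be sketched in a few lines
of Python. This is only a behavioural illustration of the AND-OR selection
and the vertical bypass condition as stated in the text, not a transcription
of figure 7.1; the function and argument names are hypothetical.

```python
def processor_input(reqs, data):
    """AND each incoming REQuest with its data line, then OR the results.

    The algorithm guarantees at most one TRUE REQuest, so the OR simply
    selects the data line belonging to that REQuest.
    """
    return any(r and d for r, d in zip(reqs, data))


def cell_in_use(processor_ok, reqs, avails):
    """A cell joins the configured array only if its processor is functional
    AND it has at least one TRUE REQuest AND one TRUE AVAILability input."""
    return bool(processor_ok and any(reqs) and any(avails))


def vertical_output(processor_ok, reqs, avails, vertical_in, processor_out):
    """Multiplexor: pass the processor output when the cell is in use,
    otherwise bypass the processor in the vertical direction."""
    if cell_in_use(processor_ok, reqs, avails):
        return processor_out
    return vertical_in
```

For example, a faulty cell (processor_ok False) always returns vertical_in,
so it is transparent to the column in which it sits.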
The complexity of the above circuitry is 18 gates counting each element 
in the circuit as a single gate. This figure, however, has limited significance 
because the basic building block in integrated circuits is the transistor and 
the number of transistors per gate varies depending on the technology in 
which it is to be implemented. For this reason the next section considers the 
transistor level circuit that would be required in a CMOS implementation.
7.2.2 Transistor Complexity of Control Circuitry
The CMOS implementations of the gates used in the circuit of figure 7.1 are
shown in figure 7.2. A direct translation of the WINNER control circuitry
into transistors can be made by replacing each gate with its transistor-level
equivalent. A formula for the number of transistors can now be developed.
The complexity of the circuit can be expressed as:
$$\text{Number of transistors} = T_p + T_x L_x + T_y L_y \qquad (7.1)$$
where $T_p$ is the number of transistors required for the decision circuitry
which generates REQuest and AVAILability signals, $T_x$ and $T_y$ are the
numbers of transistors required for routing each horizontal and vertical
data line respectively, and $L_x$ and $L_y$ are the numbers of horizontal and
vertical data lines respectively which must be routed through the cell.
The values of $L_x$ and $L_y$ depend on the function of the processor.
From figures 7.1 and 7.2 the values of $T_p$, $T_x$ and $T_y$ can be determined
as $T_p = 60$, $T_x = 6$ and $T_y = 6$, giving:
$$\text{Number of transistors} = 60 + 6 L_x + 6 L_y \qquad (7.2)$$
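As a worked illustration of equation 7.2, suppose (purely as an assumption
for this example) that a cell routes 16 data lines in each direction, so
that $L_x = L_y = 16$:
$$\text{Number of transistors} = 60 + 6(16) + 6(16) = 252$$
Under this assumption the fixed decision logic accounts for 60 of the 252
transistors, and the remainder grows linearly with the number of data lines.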
Figure 7.2: CMOS implementation of the gates used in the WINNER control
circuitry.
In a real design it is likely that the circuit would be optimised for the
particular technology to be used in the fabrication process. This may involve
using NAND and NOR gates wherever possible instead of AND and OR
gates. However, it is not appropriate to carry out an optimisation of this
type here, since as far as possible we require results which are independent
of technology.
7.3 Input/Output Interface Circuitry
A characteristic of the one-dimensional WINNER algorithm is that func­
tional rows are generated extending from one side of the array to the other. 
This means that access to the ends of the rows is easy since they reside at 
the edges of the array. However, the precise positions of the ends of the
functional rows depend on the distribution of faulty processors in the array
and will not generally be known in advance. From the user's point of view
this is unacceptable, since inputs must be applied to, and outputs received
from, fixed sets of I/O pins. As a result, interface circuitry for both inputs
and outputs has been designed to automatically perform the mapping of a 
fixed set of data inputs onto a spatially variable set of functional rows, and 
vice-versa for the array outputs.
7.3.1 Selecting Functional Rows
We assume that the data flow through the array is from left to right and from 
top to bottom. This does not limit the applicability of what is to follow, but 
makes its description more straightforward.
From the description of the WINNER algorithm presented in chapter 5 
we remember that REQuest signals pass from left to right across the array 
and AVAILability signals pass from right to left. On the left hand side of the 
array, the end of each functional row will be marked by a TRUE AVAILability 
signal emerging from the western output of a cell. Similarly, on the right hand 
side of the array, a TRUE REQuest signal from the eastern output of a cell
marks the other end of a functional row. The ends of the rows on the left and
right hand sides of the array will have the same spatial ordering, but in most
cases will have different spatial positions. The presence of TRUE REQuest
and AVAILability outputs in the positions described could therefore be used
by an interface circuit to control the routing of data into and out of the 
configured array.
In addition to performing the data routing task described above, the input 
interface circuitry should be able to detect whether sufficient rows have been 
configured and if not, inform the user. It should also be possible for the 
input interface circuitry to disable surplus functional rows if more have been 
configured than are required.
We now describe circuitry suitable for providing the input and output 
interface functions.
7.3.2 Data Input Circuitry
A data input circuit suitable for ensuring that data signals are input to 
the appropriate rows of the WINNER array is illustrated in figure 7.3(a). 
It consists of an array of simple, identical cells whose function is shown 
in figure 7.3(b) together with a single AND gate per row to provide the 
REQuest inputs to the array. The height of the data input array is equal to 
the number of rows in the WINNER array while the width is equal to the 
number of functional rows required in the configured array. The REQ inputs 
to the top of the data input array are all set to logic 1.
Operation of the Data Input Circuitry
The data input circuitry operates in the following manner. The value of REQ 
indicates the status of the data input with which it is associated, being TRUE 
in cells through which the data passes before being routed to a functional 
row and FALSE thereafter. Similarly, the AVAILability signal entering the
data input array from the WINNER array indicates the status of the row of
the configured array with which it is associated. It is TRUE in a cell of a
row of the data input array only if the corresponding row of the WINNER
array is functional and an input data line has yet to be routed to it. It
is FALSE in all other cells in the row. Once set to FALSE, the REQ and
AVAILability signals cause cells of the data input array to take no further
part in the routing of data into the configured array.
Figure 7.3: Data input circuitry, (a) Data input array, (b) circuit of single
cell.
The right hand column of the data input array will route its data down 
the column until a cell with a TRUE AVAILability input is reached. This 
will be at a position corresponding to the start of the first functional row 
of the configured array. Within this cell the input data is routed to the 
functional row via the pass transistor, and both the REQ and AVAILability 
output signals from the routing cell are set to FALSE. This means that no 
other data input will be connected to the same functional row and that the 
current data input will not be connected to any other functional row. A 
REQuest input to the WINNER array is generated by the AND gates on 
the right hand side of the array whenever a functional row is detected. The 
routing of the other data inputs takes place in a similar manner.
Insufficient Functional Rows
It is possible that the distribution of faults in the array is such that fewer 
functional rows are configured than are required. In this case, one or more 
of the data inputs will not be routed to a functional row. The shortfall will 
manifest itself in the data input circuitry as one or more REQ signals which 
remain TRUE when they emerge from the bottom of the data input array and
may be detected by the OR gate shown on figure 7.3(a). If the output of the
OR gate is TRUE, at least one of the required functional rows could not be
configured.
Surplus Functional Rows
In the same way that some fault distributions may result in insufficient func­
tional rows being generated, others may result in a surplus. These surplus 
rows must be avoided so that they do not interfere with the function of the
required rows. In chapter 5 we saw that when the WINNER algorithm is 
applied to one dimension of the array, cells would be controlled to act as 
bypasses in the vertical direction if the processor within the cell is faulty or 
if no REQuest inputs are received by the cell, ie if the cell is unused. Surplus 
rows can therefore be avoided by sending TRUE REQuest signals only to the 
required number of functional rows. This is achieved automatically by the 
data input array by feeding the REQ input arriving vertically at each cell in 
the left hand column of the data input array into the horizontal REQuest 
input in the same cell, as shown in figure 7.3(a). The REQ signals passing 
between cells in the data input array are then TRUE until all data inputs 
have been routed to a functional row. Thereafter, the REQ signal is FALSE 
and inhibits REQuests being applied to the WINNER array.
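The behaviour of the data input array described in this section (routing,
detection of insufficient rows and suppression of surplus rows) can be
sketched behaviourally as follows. This is a minimal model built from the
prose only, since the cell circuit of figure 7.3(b) is not reproduced here;
the function name, the representation of signals as Python booleans and the
treatment of the AVAILability inputs as fixed values are all assumptions.

```python
def route_inputs(avail, n_inputs):
    """Model the data input array.

    avail    : AVAILability from the WINNER array, one bool per physical row
               (top to bottom); TRUE marks the start of a functional row.
    n_inputs : number of data inputs, i.e. columns of the data input array.

    Returns (mapping, req_to_array, insufficient):
      mapping      : {data input index: physical row it is routed to}
      req_to_array : REQuest applied to each physical row of the WINNER array
      insufficient : TRUE if some data input could not be routed
    """
    n_rows = len(avail)
    mapping = {}
    avail_in = list(avail)          # AVAIL entering the current column
    req_at_row = [False] * n_rows   # vertical REQ seen in the last column
    req_bottom = []                 # REQ emerging from the bottom of each column

    for col in range(n_inputs):     # column 0 is routed first
        req = True                  # REQ inputs at the top are set to logic 1
        avail_out = []
        req_at_row = []
        for row in range(n_rows):
            req_at_row.append(req)  # REQ arriving vertically at this cell
            if req and avail_in[row]:
                mapping[col] = row  # connect this data input to the row
                req = False         # this input connects to no other row
                avail_out.append(False)  # no other input uses this row
            else:
                avail_out.append(avail_in[row])
        req_bottom.append(req)
        avail_in = avail_out

    # Surplus rows: only rows reached while the last column's vertical REQ
    # is still TRUE receive a REQuest (AND gate per row).
    req_to_array = [a and r for a, r in zip(avail, req_at_row)]
    # Insufficient rows: OR of the REQ signals leaving the bottom of the array.
    insufficient = any(req_bottom)
    return mapping, req_to_array, insufficient
```

With avail = [False, True, True, False, True, True] and three data inputs,
the inputs are routed to physical rows 1, 2 and 4, and the surplus
functional row 5 receives no REQuest.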
7.3.3 Data Output Circuitry
The data output circuitry required to map the spatially variant data outputs 
from the right hand end of functional rows onto a fixed set of output lines is 
shown in figure 7.4(a). As can be seen, the circuit is in the form of an array 
similar to that used for the data input array. The cell function is shown in 
figure 7.4(b) and it will be noticed that it is in fact identical to the cell used 
in the input array (if the straight through REQuest line is removed from 
the input cells), and is simply rotated anti-clockwise through 90 degrees in 
the array. The operation of the data output array is identical to that of the 
input array. Data emerging from the first functional row is therefore routed 
to the first column of the output array, and so on.
7.4 Column Interconnection Circuitry in Partitioned Arrays
As was described in chapter 6 the overall array yield can be increased by care­
ful partitioning of the processor array into groups of columns. The columns 
are then configured separately and joined together to form the final array.
Figure 7.4: Data output circuit, (a) Data output array, (b) circuit of a single
cell.
Figure 7.5: Intercolumn routing circuitry, (a) Intercolumn array, (b) single 
cell of left hand part of array, (c) single cell of right hand part of array.
The circuitry required to perform the interconnection of the columns is shown
in figure 7.5(a) with the cell circuitry being shown in figures 7.5(b) and
7.5(c). The operation of the column interconnection circuit is very similar
to that of the data input and data output circuits described in the previous
sections. Essentially, the left hand array in figure 7.5(a) is identical to
the data output array, except that a REQuest output accompanies each output
signal. The data and REQuest outputs from the functional rows of one group
of columns emerge from the top of the array and are fed into the second
array as shown. The second array is simply a data input array which routes
the signals to the appropriate functional rows. True AVAILability signals
are fed to all rows of the left hand group of columns.
Since the column interconnection circuitry transfers the REQuest signals 
from one group of columns to the other, all the features of the self-organising 
algorithm are maintained across the partition, including the handling of sur­
plus and insufficient functional rows.
7.5 Simulation of the WINNER Hardware
In this section we describe the simulation of the hardware used in the
WINNER cell. The hardware description language ELLA was used, and this is
briefly described first.
7.5.1 The ELLA Hardware Description Language
ELLA is an acronym for Electronic Logic LAnguage and was developed at 
RSRE Malvern (Morison et al, 1982), and is currently being marketed by 
Praxis Systems Limited of Bath, UK. It is a hardware description language 
(HDL) together with a simulator and design environment. Like other HDLs 
ELLA allows the user to describe circuits in a programming language and 
then animate them by running the simulator. During simulation, inputs can 
be applied to the circuit being simulated and the responses generated by 
the circuit can be observed. This means that circuits can be checked for 
satisfactory operation and mistakes corrected before any effort is put into 
physical design of the circuit. In order to enable circuits with unexpected 
behaviours to be analysed, the inputs and outputs of every node in the 
circuit can also be observed as required. This is the equivalent of being able 
to observe any point of a breadboard circuit using a logic probe. ELLA is 
a particularly suitable language for use in describing the circuits used 
in WINNER because it has very powerful instructions which allow regular 
arrays to be described simply and in an elegant fashion.
7.5.2 Simulations using ELLA
All of the circuits proposed for the WINNER algorithm have been described 
and simulated in ELLA and have operated as expected. Arrays of both
one- and two-dimensional WINNER cells have been simulated and a number
of different fault distributions applied. A correctly configured array was
generated for every fault distribution applied.
The benefits of including all the programs written for the purposes of 
simulating circuits described in this thesis are limited. However, the ELLA 
program written to describe the one dimensional WINNER algorithm is pre­
sented in Appendix B for the interested reader.
7.6 Hardware for other Configuring Techniques
As we found in chapter 6, the main competitors to the WINNER algorithms 
were the configuration techniques proposed by Moore and Mahat (1985) and 
Sami and Stefanelli (1983). In this section we consider the hardware which 
would be required to implement these techniques and compare the results 
with that needed in WINNER.
7.6.1 Moore and Mahat’s Scheme
Moore and Mahat proposed three configuration schemes, A, B and C. Schemes 
B and C have the best performance and will be used here for comparison 
purposes. From figure 4.9 it can be seen that scheme B requires 5 switching 
elements per cell while scheme C requires 8 per cell for each data line passing 
between cells in the horizontal direction. In addition, each switch will require 
a latch to store the switch position. We also assume that full latches will be 
used and interconnected serially so that switch control data can be clocked 
serially into a shift register.
Assuming that a full latch requires 12 transistors, and that each switch
requires 2 transistors, the circuitry required in each cell to perform the
routing function is as follows.
$$\text{Number of transistors (Scheme B)} = 60 + 10 L_x \qquad (7.3)$$
$$\text{Number of transistors (Scheme C)} = 96 + 16 L_x \qquad (7.4)$$
7.6.2 Sami and Stefanelli’s Scheme
Although the configuration scheme proposed by Sami and Stefanelli (1983) is
not likely to be practical for more than a few spare rows, a general formula for
the complexity of the routing circuitry has been derived. The connectivity
requirement in the algorithm is for a fully connected network whose height
is equal to the height of the array, and whose width is equal to the number
of spare rows in the array. This results in a routing complexity as follows.
$$\text{Number of transistors} = 12 N_S + 2 N_S L_x \qquad (7.5)$$
As can be seen, both terms depend on the number of spare rows, $N_S$. The
equation can be rewritten as
$$\text{Number of transistors} = N_S (12 + 2 L_x) \qquad (7.6)$$
indicating that the circuitry enclosed in brackets is required per cell for each
spare row used.
7.6.3 Comparison of Hardware Requirements
The formulae derived in previous sections have been presented graphically
in figure 7.6, which shows the complexity per cell needed to implement each
configuration scheme as a function of the number of horizontal data signal
lines passing between cells. In the case of Sami and Stefanelli’s scheme,
several curves for different numbers of spare rows are shown.
It can be seen that the scheme of Sami and Stefanelli requires the least
hardware when 1 or 2 spare rows are used. However, with 4 spare rows, the
basic 3-neighbour WINNER scheme becomes more attractive, requiring less
hardware than any of the other schemes.
Figure 7.6: Comparison of the hardware requirements of various
configuration schemes (number of transistors per cell against number of
horizontal data signal lines).
The fact that WINNER requires less hardware than most other schemes 
is an interesting result, since the other schemes also require external control 
of the switches, whereas WINNER achieves fully automatic decision making 
and data routing. This result was quite unexpected and arises mainly be­
cause no switch control information needs to be stored in the cell since it is 
generated by the cell itself.
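The per-cell formulae of this chapter can be evaluated side by side with a
short Python sketch. The coefficients are those of equations 7.2 to 7.6;
the function names are hypothetical, and the default of zero vertical data
lines for WINNER is an assumption made here so that all schemes are compared
on horizontal lines only. It is not claimed to reproduce figure 7.6.

```python
def winner(lx, ly=0):
    """Equation 7.2: 3-neighbour WINNER control circuitry per cell.
    ly defaults to 0 (assumption: compare horizontal routing only)."""
    return 60 + 6 * lx + 6 * ly


def moore_b(lx):
    """Equation 7.3: Moore and Mahat scheme B (5 latches, 5 switches/line)."""
    return 60 + 10 * lx


def moore_c(lx):
    """Equation 7.4: Moore and Mahat scheme C (8 latches, 8 switches/line)."""
    return 96 + 16 * lx


def sami_stefanelli(lx, n_spare):
    """Equation 7.6: Sami and Stefanelli routing circuitry per cell."""
    return n_spare * (12 + 2 * lx)


def compare(lx, spare_rows=(1, 2, 4)):
    """Return a dict of transistor counts per cell for lx horizontal lines."""
    counts = {
        "WINNER (3-neighbour)": winner(lx),
        "Moore and Mahat B": moore_b(lx),
        "Moore and Mahat C": moore_c(lx),
    }
    for ns in spare_rows:
        counts[f"Sami and Stefanelli, {ns} spare"] = sami_stefanelli(lx, ns)
    return counts
```

Under these assumptions, compare(8) gives 108 transistors for WINNER, 140
and 224 for schemes B and C, and 28, 56 and 112 for Sami and Stefanelli's
scheme with 1, 2 and 4 spare rows respectively.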
Chapter 8
Self-Testing of Self-Organising
Arrays
8.1 Introduction
In Chapter 5 when describing the operation of the self-organising techniques 
based on the WINNER algorithm it was assumed that some method ex­
isted by which the processor in each cell of the array could reliably indicate 
whether or not it was functional.
Techniques for self-test are well known in the literature and have been 
used in the design of production devices, for example the Motorola micro­
processor range (Daniels and Bruce, 1985). Self-testing approaches have 
evolved from earlier work on improving the manual testability of circuits. In 
this chapter we describe the motivation behind the goals of design for testa­
bility and of self-test and present an established approach to both problems. 
In chapter 9 these approaches (with slight modification) will be applied to 
the testing problem in WINNER.
8.2 Design for Testability
It is now widely recognised that whilst VLSI technology offers many
advantages in terms of processing power per Watt or per square centimetre, it
also generates numerous problems concerning the testability of the circuits
so created. This stems partly from the fact that the circuits contain many
more components which all have to be checked and partly because the
ratio of input/output pins to components on a chip usually decreases as the
complexity of the chip increases (Landman and Russo, 1971). This means that
access to the internal nodes of the chip can often become severely limited,
resulting at best in long test times or, at worst, in incomplete testing of
the device.
Many researchers are active in the field of design for testability which 
aims to improve external access to internal nodes of the chip by incorpo­
rating extra hardware for use during testing. This has an obvious cost in 
terms of hardware but the advantages it offers often outweigh the extra cost. 
Most approaches to achieving a testable design rely on a serial scanning ar­
rangement to improve access to the internal components of the circuit. Two 
of these, the scan path and Level Sensitive Scan Design (LSSD), are now 
described.
8.2.1 Scan path techniques
One of the first examples of a scheme to increase testability was published by 
Kobayashi et al (1968). This paper is written in Japanese but describes what 
is known today as the scan path. The idea of the scan path is to allow data 
to be introduced and extracted from a circuit through a single pair of data 
lines so that the I/O overhead is kept to a minimum while maintaining a high
level of controllability and observability. A scan path comprises a number of
cascaded shift register elements each of which can receive data from either 
the output of the previous shift register or from a parallel input. The outputs 
of these elements are applied as test stimuli to the circuit under test, while, 
in parallel mode, the parallel inputs to the scan path are provided by the 
outputs of the circuit under test. A single scan path element and a block 
diagram of a scan path in position within a circuit are illustrated in figure 8.1. 
It is normal, but not essential for the circuit being tested by the scan path 
to be purely combinational. In sequential circuits it has been suggested by 
Williams and Angell (1973) that switches could be incorporated to change 
the circuit from normal mode to test mode. In test mode, the latches would
be connected in the form of a serial shift register which would then be used
in the same way as a scan path to test the remaining combinational circuitry.
In both combinational and sequential circuits the test time can usually be
reduced by using several independent scan paths which can then be used in
parallel.
Figure 8.1: The scan-path testing technique.
The scan path technique will be used in later sections as the basis for a 
control circuit testing strategy for use with the WINNER algorithm.
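The scan path operation described in this section (serial load of a
stimulus, parallel capture of the response, serial unload) can be sketched
as follows. This is a behavioural illustration only; the class and function
names are hypothetical, and the circuit under test is assumed to be purely
combinational with as many outputs as inputs.

```python
class ScanPath:
    """A chain of shift register elements with serial and parallel modes."""

    def __init__(self, length):
        self.reg = [0] * length

    def shift(self, serial_in):
        """Serial mode: shift one place and return the bit leaving the chain."""
        serial_out = self.reg[-1]
        self.reg = [serial_in] + self.reg[:-1]
        return serial_out

    def capture(self, parallel_in):
        """Parallel mode: load the outputs of the circuit under test."""
        self.reg = list(parallel_in)


def scan_test(logic, stimulus):
    """Apply one stimulus to a combinational circuit through a scan path.

    logic    : function mapping a list of input bits to a list of output bits
    stimulus : list of bits; stimulus[i] is applied to input i
    """
    path = ScanPath(len(stimulus))
    for bit in reversed(stimulus):      # last bit in first, so bit i ends at i
        path.shift(bit)
    path.capture(logic(path.reg))       # one parallel clock captures response
    shifted_out = [path.shift(0) for _ in range(len(stimulus))]
    return shifted_out[::-1]            # restore output order 0..n-1
```

For instance, with a toy circuit logic = lambda b: [b[0] ^ b[1], b[1] & b[2],
1 - b[2]], the call scan_test(logic, [1, 0, 1]) returns [1, 0, 0], using only
a serial input and a serial output to reach all three nodes.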
8.2.2 Level Sensitive Scan Design - LSSD
Another example of a technique which can improve the testability of a device 
is the Level Sensitive Scan Design technique or LSSD for short (Eichelberger 
and Williams, 1977). This technique formalises the scan path approaches for 
combinational and sequential circuits and has become a commonly used tool
in the design of circuits and systems.
8.3 Self-test techniques
The previous section has briefly outlined techniques for increasing the testa­
bility of circuits using serial scanning latches. These techniques form the 
basis for self testing circuits. The motivations behind self testing circuits 
are many. As circuit complexities increase, the number of test patterns re­
quired to fully test a circuit also increases, usually at a faster rate than the 
increase in number of gates in the circuit. This means that test times using 
serial scanning techniques can become unacceptably long. Furthermore, the 
increase in performance of circuits means that test equipment must be of 
the highest quality if tests are to be carried out at the rated speed of the 
device. Such test equipment is extremely expensive, and may soon become 
impossible to build to the required specification. On-board self-test can help 
in both of these areas.
The main features of a self test approach are illustrated in figure 8.2 and 
comprise the circuit under test together with a method of generating test
stimuli and a method of compressing the results produced by the circuit 
when stimulated. It is of course possible to store a number of selected test 
patterns in a ROM and apply these to the circuit under test in a sequential 
manner. A ROM could also be used to store the expected responses from the 
circuit and compare them with the actual responses. Any differences could 
then be noted and used to pass or fail the circuit. This approach, however, 
would be very costly to implement since large numbers of test patterns are 
normally required for a full test. For this reason most self test techniques use 
an exhaustive sequence of test patterns which can be produced cheaply by a 
counter or Linear Feedback Shift Register (LFSR). In addition, test results 
are not individually compared with expected results but are compressed into 
a much reduced form which can be checked in a simple manner using a 
comparator. LFSRs and compressors are discussed in the following sections.
Figure 8.2: Block diagram of a typical self-testing system.
8.3.1 Linear Feedback Shift Registers
A typical LFSR is illustrated in figure 8.3(a). It comprises a number of 
cascaded shift register elements together with one or more exclusive-OR gates 
which feed information back from the outputs of some of the stages to the 
main input. When the shift register is clocked from any initial state (except 
all zeroes), a sequence of ones and zeroes can be observed at any point in 
the register, say its input, with delayed versions of the sequence appearing 
at successive register outputs. This sequence depends upon the initial state,
the length of the register and the positions and number of feedback taps. 
The sequence produced by the register depicted in figure 8.3(a) is given in
figure 8.3(b). It can be seen that all 15 possible states are achieved in the
register at some point in the cycle, which then repeats. Shorter cycles can
be achieved with different feedback taps, but for self-testing purposes we
are generally interested in the longest, or maximal length sequences. The
sequences produced appear to be random but are repeatable from a given
initial state, hence the alternative name of the LFSR is the Pseudo-random
pattern generator.
INITIAL STATE       1 0 0 0   1
                    1 1 0 0   1
                    1 1 1 0   1
                    1 1 1 1   0
                    0 1 1 1   1
                    1 0 1 1   0
                    0 1 0 1   1
                    1 0 1 0   1
                    1 1 0 1   0
                    0 1 1 0   0
                    0 0 1 1   1
                    1 0 0 1   0
                    0 1 0 0   0
                    0 0 1 0   0
                    0 0 0 1   1
REPEATS FROM HERE   1 0 0 0   1
Figure 8.3: Generation of test patterns: (a) A linear feedback
shift-register (LFSR), (b) The patterns produced by the circuit in (a).
The LFSR is very suitable for use as a source of test patterns in self­
testing circuits because it is a very simple structure, even simpler than a 
counter, which could perform a similar function. Because its output sequence 
is predictable it can be used in a deterministic way to generate expected 
results from the circuit under test.
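The state sequence listed in figure 8.3(b) can be reproduced with a short
Python sketch. The tap positions used here (exclusive-OR of the first and
fourth stages, fed back to the first stage) are an inference from the listed
patterns rather than something read from figure 8.3(a), and the function
name is hypothetical.

```python
def lfsr_cycle(initial=(1, 0, 0, 0), taps=(0, 3)):
    """Return the list of states visited before the register repeats.

    initial : starting contents of the shift register, first stage first
    taps    : indices of the stages that are XORed to form the feedback
    """
    start = tuple(initial)
    state = start
    states = []
    for _ in range(2 ** len(start)):    # guard against non-returning cycles
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = (feedback,) + state[:-1]    # shift, feedback enters stage 1
        if state == start:
            break
    return states
```

Starting from 1 0 0 0 the sketch visits 15 distinct states before returning
to the initial state, while the all-zero state maps onto itself, which is why
it is excluded as a starting point.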
8.3.2 Compression of test results
The motivation behind attempting to compress test responses emerging from 
the circuit under test is to reduce the cost (mainly in time) of scanning out 
large numbers of responses serially from the circuit for immediate checking 
by an external tester. The idea is to perform most of the evaluation of the 
responses within the circuit being tested. A compressed result is essentially 
a cumulative, short pattern, dependent on a long sequence of test responses.
Compression of test responses can be achieved by two main methods, 
namely counting and recursive compaction.
8.3.3 Compression by counting
The idea here is to count the occurrences of some characteristic in
the sequence. Typical characteristics are the number of transitions, ie
0-to-1 or 1-to-0, or the number of edges, ie transitions in one direction. The
assumption is that in a circuit containing a fault, the number of counted 
transitions will be different from that produced by a perfect circuit. The 
final count therefore provides a much reduced value which can be checked 
in a simple manner. Counting can be achieved using conventional counter 
circuits. The technique turns out to be inferior to the recursive compaction 
technique described in the following section and will not be further discussed.
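A small Python sketch shows both the idea of compression by counting and
its weakness. The bit streams below are invented for illustration and the
function names are hypothetical.

```python
def count_transitions(bits):
    """Count 0-to-1 and 1-to-0 transitions in a serial response stream."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def count_rising_edges(bits):
    """Count transitions in one direction only (0-to-1)."""
    return sum(a == 0 and b == 1 for a, b in zip(bits, bits[1:]))


good = [0, 1, 1, 0, 1, 0]        # response of a fault-free circuit (invented)
detected = [0, 1, 1, 1, 1, 0]    # faulty response with a different count
escaped = [0, 1, 0, 0, 1, 0]     # faulty response with the same count
```

Here count_transitions gives 4 for good, 2 for detected and 4 for escaped,
so the third stream differs from the fault-free response yet yields the same
compressed value and would pass the test.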
8.3.4 Compression by Recursive Compaction
A circuit capable of performing recursive compaction is illustrated in fig­
ure 8.4. The reader will immediately notice the similarity to the LFSR 
pattern generator described in a previous section. The difference is that in 
order to provide an input for the serial data stream which is to be com­
pressed, an additional exclusive-OR gate is included in the feedback path as 
shown. This extra input allows the incoming serial data stream to modulate 
the feedback to the first stage of the LFSR, which is then remembered by the 
shift register. For a given input sequence and given initial state, the same
final pattern or signature will always be produced and can be used to detect
streams which contain faults.
Figure 8.4: A recursive compaction circuit for single-bit input streams.
The use of the LFSR to compress data streams has been described by
Frohwerk (1977), who also coined the term Signature Analysis to describe the
approach when applied to circuit testing. Frohwerk shows that a single error
in the incoming sequence will always be detected, and that the probability
of non-detection when the number of faults is unrestricted is 1/2^n, where n
is the number of stages in the compactor LFSR. He also shows that counting
techniques will fail to detect faults with a probability greater than
1/2^n; clearly, recursive compaction is the method to be chosen.
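The behaviour described above can be explored with a small software model of a single-input compactor. This is a sketch under stated assumptions: a four-stage register, feedback taken from the last two stages (the actual feedback connections of figure 8.4 are not reproduced here), an all-zero initial state and a made-up response stream.

```python
from itertools import product


def signature(stream, n_stages=4, taps=(2, 3), seed=None):
    """Compact a serial bit stream into an n-stage LFSR signature.

    taps are 0-based stage indices fed back to the first stage. The
    values used here are an assumption, not taken from figure 8.4.
    """
    state = list(seed) if seed is not None else [0] * n_stages
    for bit in stream:
        feedback = bit            # extra exclusive-OR gate for the input
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]   # shift by one stage
    return tuple(state)


good = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical fault-free response
reference = signature(good)

# Check that every single-bit error changes the signature.
single_errors_detected = all(
    signature(good[:i] + [good[i] ^ 1] + good[i + 1:]) != reference
    for i in range(len(good))
)

# Count the non-zero error patterns which leave the signature unchanged.
aliases = sum(
    1
    for err in product([0, 1], repeat=len(good))
    if any(err)
    and signature([g ^ e for g, e in zip(good, err)]) == reference
)
```

For this eight-bit stream and four-stage register, 15 of the 255 possible non-zero error patterns leave the signature unchanged, which is close to the 1/2^n figure quoted above.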
8.3.5 Compaction of Multiple Input Streams
If the incoming data stream which is to be compacted consists of several 
serial streams, a slightly modified circuit is required. This is illustrated in 
figure 8.5, and shows exclusive-OR gates inserted between stages of the shift 
register in addition to the extra gate in the feedback path to the first stage. 
Each of these extra gates can accommodate a serial input data stream. The 
number of such streams is limited by the number of stages in the LFSR.
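A multiple-input compactor can be modelled in the same way. The sketch below assumes a four-stage register with feedback from the last two stages and one input stream per stage. These choices, and the example data, are illustrative and are not read from figure 8.5.

```python
def misr_signature(words, n_stages=4, taps=(2, 3)):
    """Compact parallel input words (one bit per stage) into a signature."""
    state = [0] * n_stages
    for word in words:
        if len(word) > n_stages:
            # The number of streams is limited by the number of stages.
            raise ValueError("more input streams than LFSR stages")
        padded = list(word) + [0] * (n_stages - len(word))
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        shifted = [feedback] + state[:-1]
        # Each stage has an exclusive-OR gate accepting one input stream.
        state = [s ^ b for s, b in zip(shifted, padded)]
    return tuple(state)


def with_bit_flipped(words, i, j):
    """Return a copy of words with bit j of word i inverted."""
    copy = [list(w) for w in words]
    copy[i][j] ^= 1
    return copy


# Three hypothetical response streams sampled over four clock cycles.
words = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
reference = misr_signature(words)

all_single_errors_detected = all(
    misr_signature(with_bit_flipped(words, i, j)) != reference
    for i in range(len(words))
    for j in range(3)
)
```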
Figure 8.5: Recursive compaction of multiple input streams.
8.4 Self testing requirements in WINNER
Most of the literature on self testing describes techniques up to the point 
where the compressed value, or signature, of the response data streams is 
produced. This is because the motivation for self-test is primarily to reduce 
test time and increase the speed at which the test can be carried out so 
that it is more representative of the speed at which the circuit under test 
will operate when in service. For this purpose it is sufficient to read the 
signature by clocking it out of the circuit and comparing it, in an external
tester, with the expected value.
In WINNER, the self testing approach is to be applied separately to each 
processor in the array and the correctness or otherwise of each signature gen­
erated should ideally be determined by the corresponding cells themselves. 
We therefore need a circuit which can perform this function.
8.4.1 Signature Comparison
The comparison of the test signature with the expected value will involve the 
use of a comparator which has been preset (probably hardwired) with the 
expected signature. The output of the comparator (a single bit) will then 
provide a go/no-go indication which can be used by the control circuitry 
during the configuration of the array. A simple method by which this
comparison could be achieved is illustrated in figure 8.6, in which the register
outputs in which a 1 is expected are ANDed together, while those positions
in which zeroes are expected are NORed together. The AND and NOR outputs
are then combined in a second AND gate which provides the required
result, with a 1 indicating pass and a zero indicating fail.
Figure 8.6: Simple signature comparison method (correct signature 101110).
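The comparison logic can be checked exhaustively in software. The following sketch models the AND, NOR and final AND gates for the example signature 101110; the function name is hypothetical.

```python
from itertools import product


def signature_passes(register_bits, expected="101110"):
    """Hardwired comparison: returns 1 for pass, 0 for fail."""
    ones = [b for b, e in zip(register_bits, expected) if e == "1"]
    zeros = [b for b, e in zip(register_bits, expected) if e == "0"]
    and_out = int(all(ones))        # positions where a 1 is expected
    nor_out = int(not any(zeros))   # positions where a 0 is expected
    return and_out & nor_out        # second AND gate


# Of the 64 possible register contents, only the expected one passes.
passing = [p for p in product([0, 1], repeat=6) if signature_passes(p)]
```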
Chapter 9
Reducing Control Circuit
Vulnerability
9.1 Introduction
In the foregoing descriptions of the operation of the WINNER algorithm we 
have described techniques for testing the processors within each cell of the 
array and how cells containing a faulty processor can be avoided by self- 
organising techniques. Throughout these discussions it has been assumed 
that the control circuitry operates correctly at all times. In general, however, 
this assumption will not be valid and in applications such as large area 
integration, it will almost certainly be necessary to take steps to identify 
faulty control circuits.
The purpose of this chapter is to address the problem of faults occurring 
in the control circuitry. The aim is not merely to detect the presence of 
faults, since this would result in the entire array having to be discarded, but 
to develop techniques by which the faults can be tolerated. As we shall see, 
this simply results in losing the opportunity to use the processor in the cell 
containing the faulty control circuit.
We present two quite different techniques for tolerating control circuit 
faults although both techniques exploit the same property of the WINNER 
algorithm as will be described. In the first technique, duplicated control 
circuits are used in each cell to enable faults to be detected and automat­
ically provides tolerance to many faults. The second technique is a novel 
approach involving the use of an external tester to mask out the effect of 
control circuit faults (Evans 1986, and Evans and McWhirter 1987). This 
approach can exhaustively test each control circuit and the inter-cell wiring 
and can guarantee that a correctly configured array has been produced. The 
hardware overheads associated with the techniques are also considered.
9.2 The Ideal Self-Organising Array
The ideal self-organising array would be one in which no external assistance is 
required during either the testing or configuration phases. The system should 
be able to detect any single or multiple faults present in the system and offer
100% confidence that, if an array of the required size can be configured,
it is actually functional. If insufficient working processors are available, the
system should be able to indicate this to the user.
In practice, the above requirements cannot be fully achieved because it is 
not possible for a system, none of whose component parts has been proven 
correct, to make guaranteed decisions. This is particularly true in the field 
of integrated circuits where faults will undoubtedly be present in the array. 
In such circuits we cannot rely on any part of the circuitry on the chip to 
perform its predefined task correctly. This means that in order to achieve 
100% confidence in the circuit the manufacturer must perform at least a 
small test using an external, known-to-be good tester on some part of the 
circuitry. The tested circuitry, if found to be functional, can subsequently 
be relied upon and used in further tests of the wafer. The challenge is to 
develop a testing strategy which requires only a small amount of circuitry to 
be externally tested, and to be able to carry out the externally applied test 
in a simple manner.
9.3 Inherent Fault-Tolerance of the Control 
Circuit
The control circuitry used in the WINNER algorithm can be considered to 
have two types of output signal, either active or passive. This property 
arises from the fact that only TRUE outputs affect neighbouring cells in an 
active way, possibly resulting in the neighbour taking some positive action. 
FALSE outputs do not have any positive effect on neighbouring cells and are 
therefore considered to be passive outputs.
The property of having active and passive output signals means that the
control circuits each have an inherent ability to mask some faults as now 
described. Since FALSE outputs have only a passive effect on neighbouring 
cells, any fault resulting in one or more stuck-at-0 faults on the control circuit 
outputs will not affect the rest of the array in a detrimental manner. The 
stuck-at-0 output error does not propagate beyond the boundaries of the cell 
containing the fault.
The proportion of single stuck-at faults which can be masked in this
way has been shown by simulation to be 50%. Techniques for masking the 
remaining 50% of single and multiple stuck-at faults are described in the 
following sections.
9.4 Dual-Rail Implementation of Control Circuitry
In the basic WINNER algorithm a fault in the control circuitry could cause an 
array to be incorrectly configured if it produces an erroneous TRUE REQuest 
or TRUE AVAILability output. This would result in the entire array being 
discarded because there is no mechanism within WINNER which can tolerate 
such faults.
The probability of a fault in the control circuitry resulting in an entire
array having to be discarded can be significantly reduced if duplicated control
circuits are used. This technique, sometimes called ‘two-rail’ implementation, 
(Wakerly, 1978), requires the use of two independent control circuits in each 
cell of the array. At its simplest, the idea is that if both circuits receive 
the same inputs they should generate identical outputs. If one of the circuits 
contains a fault, then for at least one input pattern, at least one of its outputs 
will be different from the corresponding output of the other, fault free circuit. 
This difference can then be detected.
In the context of the WINNER algorithm we need only to detect the 
erroneous signal and stop it propagating throughout the array. It is not 
necessary to correct the erroneous signal since, as we shall see, the self- 
organising capability of WINNER allows the cell containing the faulty control 
circuitry to be avoided.
There are two main variants of the two-rail implementation technique as 
follows:
1. Simple duplication of the control circuits,
2. The use of true and complement control circuits.
We now describe how these techniques can be used to detect errors in the 
outputs of circuits, and then show how the idea can be used in WINNER to 
enable faults in the control circuitry to be both detected and tolerated.
9.4.1 Fault Detection by Simple Duplication
In this approach two identical circuits are used in place of the single original 
circuit as shown for a simple circuit in figure 9.1. Each circuit receives 
identical input signals and in the absence of faults the two circuits should 
produce identical output signals. The possible outputs are as follows:
0 0 |
1 1 | no error
                                                                (9.1)
0 1 |
1 0 | error present
Figure 9.1: Fault detection by simple duplication: (a) Conventional circuit, 
(b) Two-rail implementation.
Any difference in the outputs from the two circuits can be detected using 
an exclusive-OR gate. If each circuit has more than one output line, cor­
responding pairs of output lines are compared using separate exclusive-OR 
gates. Once the circuits have reached a stable state an error in an output pair 
will be indicated by a TRUE output from the exclusive-OR gate. The out­
puts of the exclusive-OR gates could be ORed together to produce a single 
error indication if desired.
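The checking arrangement just described can be sketched as follows. The function name and the example output values are hypothetical. The last case shows a failure affecting both circuits identically, which this simple scheme cannot see.

```python
def duplication_error(outputs_a, outputs_b):
    """Return 1 if any corresponding pair of outputs differs."""
    # One exclusive-OR gate per pair of corresponding output lines.
    pair_flags = [a ^ b for a, b in zip(outputs_a, outputs_b)]
    # The exclusive-OR outputs are ORed into a single error indication.
    error = 0
    for flag in pair_flags:
        error |= flag
    return error


fault_free = duplication_error([1, 0, 1], [1, 0, 1])
one_circuit_faulty = duplication_error([1, 0, 1], [1, 1, 1])
both_lose_power = duplication_error([0, 0, 0], [0, 0, 0])
```

In the third case both sets of outputs have collapsed to the same level, so no error is flagged even though neither circuit is working.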
This approach will detect single or multiple faults provided that no fault 
present in one of the circuits produces the same error as any of the faults
Figure 9.2: True and complement two-rail implementation.
present in the other. However, if the fault is due to a pair of output wires 
being shorted together, or failure of the power supply to both circuits, no 
fault will be detected since both output wires will carry identical signal levels.
9.4.2 Fault Detection using True and Complement Circuits
This approach avoids the problem of non-detection of shorted output pairs 
by requiring that one of the duplicated circuits is implemented in the normal 
manner to produce the required output function, F, but that the other is 
designed to produce the logically inverted output function, NOT F from 
logically inverted inputs as shown in figure 9.2. This means that a fault free 
pair of circuits will always produce output signal pairs each containing both 
a TRUE and a FALSE value. The possible output signals in each output 
pair are as follows:
0 0 |
1 1 | error present
                                                                (9.2)
0 1 |
1 0 | no error
This type of two-rail circuit is preferred since it can detect unidirectional 
multiple errors such as those caused by loss of power, etc. In addition, the 
use of true and complement circuits means that there are no constraints on 
the layout of the circuit when being fabricated as integrated circuits, since 
faults common to both circuits will be detected.
The presence of an error can be detected as before by exclusive-OR gates 
which now will produce a FALSE output if the signals in a pair are identical.
In both of these duplication techniques, the exclusive-OR gates can be placed 
either at the input to the circuit or at its output. In the former case, faults 
in both the previous circuit and the interconnecting wire can be detected, 
while the latter detects faults in the circuit but not those in the wiring.
Neither of the above approaches can correct errors since the error
detection circuitry has no way of knowing which of the duplicated circuits
produced the correct output. As a result, faults can only be detected
and not tolerated by these schemes.
However, when used in the context of the WINNER algorithm, the error
detection capability of the duplicated circuits combined with the
self-organising nature of WINNER enables many faults in the control circuitry
to be both detected and masked automatically. The way in which this operates
is described in the following section.
9.5 Application of Duplicated Circuits to WINNER
In the WINNER algorithm the outputs of the control circuitry are either 
active or passive as discussed in section 7.3. This property can be exploited 
in the two-rail implementation of control circuits to stop propagation of 
erroneous signals beyond the cell containing the fault. This can be achieved 
by using extra circuitry to convert two-rail input signals containing errors 
into passive input signals. In a normal control circuit, the active signal
Figure 9.3: Two-rail input circuits for: (a) True WINNER control circuitry, 
(b) Complement WINNER circuitry.
level is TRUE, and the passive level is FALSE, while in a control circuit 
implemented as the complement of a normal circuit, the active and passive 
levels are reversed.
Assuming that true and complement circuits are used in preference to 
simple duplication, the inputs to the true and complement circuits for each 
value of the input pair are given in table 9.1. The two-rail signals can be 
converted according to table 9.1 for feeding to the true and complement 
circuits using the circuits shown in figure 9.3.
Input Pair Value   Error Status   Input to            Value of Input
True   Comp                       Control Circuitry   True   Comp
0      0           Error          Passive             0      1
1      1           Error          Passive             0      1
0      1           No error       Passive             0      1
1      0           No error       Active              1      0
Table 9.1: Conversion of Two-rail Control Circuit Input Signals.
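The conversion in table 9.1 can be expressed compactly in software. The sketch below uses one possible gate-level reading, in which the true circuit receives an active input only for the pair (1, 0). This is an assumption, since the circuits of figure 9.3 are not reproduced here, and the function names are hypothetical.

```python
def convert_pair(true_rail, comp_rail):
    """Map a received two-rail pair to (true input, complement input)."""
    # Active only when the pair is the valid active code (1, 0);
    # every other pair, valid or erroneous, is treated as passive.
    active = true_rail & (1 - comp_rail)
    return (active, 1 - active)


def pair_has_error(true_rail, comp_rail):
    """A true/complement pair is in error when both rails are equal."""
    return int(true_rail == comp_rail)


# Rebuild table 9.1: pair -> (error flag, converted inputs)
table = {
    (t, c): (pair_has_error(t, c), convert_pair(t, c))
    for t in (0, 1)
    for c in (0, 1)
}
```

Both erroneous pairs, (0, 0) and (1, 1), are converted to the passive value (0, 1), so an error cannot appear as an active signal at the input of the neighbouring control circuit.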
Figure 9.4: A two-rail signature comparator.
The signature analysis comparator can also be implemented in two-rail 
form and circuits to generate true and complement comparison outputs are 
shown in figure 9.4. These signals can be treated in the same way as the 
other two-rail inputs discussed above, since an error can be treated as a 
failing signature and assigned a passive value.
9.5.1 Performance of Duplicated Control Circuitry
It is difficult to predict accurately the effect that duplicating the control
circuitry will have in a real circuit. However, the performance of the technique
has been estimated by simulation with varying numbers of stuck-at faults as
follows.
Both the single control circuit and the true and complement implemen­
tations have been simulated by applying a number of randomly distributed 
faults to the gates of the circuit with each fault being randomly stuck at 0 
or 1. For each value of the number of faults, the circuit was simulated with 
200 different distributions of faults and the number of distributions which 
caused one or more active errors in the outputs was counted and expressed 
as a percentage of the total sample. The results are presented in figure 9.5. 
It can be seen that for the single circuit implementation, 50% of single faults
cause active output errors which would result in the entire self-organising
Figure 9.5: Percentage of active faults as a function of the number of faults 
occurring in the control circuit. Curve A: conventional implementation, 
curve B: dual-rail implementation.
array being discarded. By contrast, the true and complement implemen­
tation produces no active errors for any single fault. In both curves, the 
percentage of active errors rises rapidly with increasing numbers of faults, 
but the true and complement implementation can contain, on average, 3 
random, stuck-at faults and still perform better than the single circuit.
Both curves, but particularly that for the single control circuit, saturate 
for large numbers of faults. This effect is believed to be due to the increased
probability of a fault, which on its own would cause an active error, being 
masked by another fault which causes a passive error. For example, if a 
particular fault, deep in the circuit, causes an output to become stuck-at- 
1, ie an active error, a second fault could reverse the effect if it causes a 
stuck-at-0 error on the same output.
9.6 Hardware Requirements for Two-rail Implementation
In the two-rail implementation, two sets of decision logic are required. In 
addition, error detection circuitry on each input for both circuits will be 
required, amounting to 84 transistors. However, the data routing circuitry 
is unchanged, resulting in a cell complexity as follows.
Number of transistors (two-rail) = 204 + 6L_h + 6L_v                (9.3)
This is about a factor of three greater than the single control circuit
implementation. However, due to the significantly improved tolerance to
faults, and the fact that the dual rail circuitry is still small compared to the
processing elements which are likely to be used, the technique could be very
useful if problems of control circuit yield are encountered.
9.7 External Testing of the Control Circuitry
In this section we present a strategy based on (Evans and McWhirter, 1987) 
in which a scan path approach is used to detect faults occurring in the 
circuitry which has not been tested during the self-test procedure. We then 
show, with a small modification to the design of the scan path cell, how the 
effect of any faults detected can be masked so that the array can continue 
operating correctly. This approach provides a second level of fault tolerance 
in the WINNER technique.
9.7.1 Control Circuit Testing Strategy
The control circuitry associated with each element of the array is a purely 
combinational circuit and as such can be tested easily if it can be accessed 
from an external source. The required access can be provided by including a 
serial scan path between each column of processors as illustrated in figure 9.6 
in which each dot represents a group of scan path registers in the AVAILability,
REQuest and signal paths. Figure 9.6 also shows the position of
Figure 9.6: Schematic of the scan-path testing approach applied to WINNER.
Figure 9.7: Scan-path arrangement within a WINNER cell: (a) Cell complete 
with scan-path, (b) Circuit of a single scan-path register.
the test pattern generator and checker which would both be external to the 
array and would be known-good-units. A cell complete with its scan path is 
illustrated in figure 9.7(a) with a single conventional scan path register being 
shown in figure 9.7(b). When the cells are joined together to produce the 
array a separate scan path is required on the left hand side of the array to 
provide the I/O access to the left hand column of cells as shown in figure 9.6.
As can be seen, the function of the scan register is controlled by two
inputs, A and B. These signals determine the mode of the register according
to the table shown in figure 9.7(b). Data can be clocked into the register
from the output of the previous register stage (A = 0, B = 0), or in parallel
from the output of the circuitry under test (A = 1, B = 0). In addition, two
other modes are available. One of these is used when the circuit is operating
normally (A = 1, B = 1) and permits data signals to pass through the
register cell from the output of one circuit under test to another without
passing through a latched delay. This is achieved by using the straight-through
path, which bypasses the latch. The final mode (A = 0, B = 1) allows the
straight-through path to be tested from the external source. This is essential
so that when the circuit is switched into normal operating mode after testing
is completed, correct operation of the straight-through path is ensured.
9.7.2 Scan Path testing procedure
The scan path testing approach operates as follows. Patterns are generated 
by the test pattern generator and with the scan register controls set to serial- 
load mode, are clocked serially into the scan paths. The outputs of the 
registers are connected directly to the inputs of the control circuitry and so 
the values in the registers are automatically applied as test pattern stimuli 
to the control circuits. After each pattern has been loaded, the scan path 
registers are set to parallel-load mode and the outputs of the control circuits 
are clocked into the registers. These results are then clocked out of the scan 
path serially and are checked for correctness. Any errors are noted and from 
the position of the error in the serial output pattern can be associated with 
the output from a particular cell.
The regular nature of the array and the simplicity of the control circuitry
mean that the number of test patterns required to perform an exhaustive test
is small. The cell shown in figure 9.7(a) has seven input signals and therefore 
requires only 128 different test patterns. Since all the cells in the array are 
identical, the same test pattern can be used for every cell, and this further 
reduces the testing complexity.
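The exhaustive test can be sketched in software as follows. The seven-input function used here is a placeholder invented for the illustration and is NOT the WINNER control logic; the fault model and all names are likewise hypothetical. The sketch applies all 2^7 = 128 patterns to every cell and reports the positions of the cells whose responses differ from those of the reference.

```python
from itertools import product


def reference_control(inputs):
    """Placeholder 7-input combinational function (not the WINNER logic)."""
    a, b, c, d, e, f, g = inputs
    return ((a & b) | c, d ^ e, f & (1 - g))


def first_output_stuck_at_zero(inputs):
    """The same function with its first output stuck at 0."""
    out = reference_control(inputs)
    return (0,) + out[1:]


def find_faulty_cells(cells):
    """Apply every input pattern to every cell and locate any failures."""
    patterns = list(product([0, 1], repeat=7))
    faulty = set()
    for pattern in patterns:
        expected = reference_control(pattern)
        for position, cell in enumerate(cells):
            if cell(pattern) != expected:
                faulty.add(position)
    return len(patterns), sorted(faulty)


# A hypothetical column of three cells, the middle one faulty.
cells = [reference_control, first_output_stuck_at_zero, reference_control]
n_patterns, faulty = find_faulty_cells(cells)
```

The same pattern set is applied to every cell, so the number of distinct patterns does not grow with the size of the array.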
The scan path procedure can detect faults within the control circuitry of 
the array and locate the fault to a particular cell in the array. This is the 
first requirement of a technique which can mask the faults from the rest of 
the array. The second requirement is for a technique to perform the masking 
process. This can be achieved using the approach now described.
9.7.3 Control Circuitry Fault Masking Procedure
As previously described, the design of the control circuitry is such that the 
outputs of the circuit are active only when at a high (logic 1) level. When any 
output is low, (logic 0), it has no effect on neighbouring control circuits and is 
termed passive. This property can be exploited to mask out cells containing 
faulty control circuits from the rest of the array if every output of the faulty 
control circuit is forced to zero, ie any active signal values are inhibited. The 
inhibit function must of course be performed by circuitry which is known to 
be fault-free so as to ensure correct execution of the masking process, and is 
described in the following section.
9.7.4 Modified Scan path Register
The inhibit function can be implemented using a scan path testing approach 
as described previously, but in which each scan path register has a small 
modification. A modified scan path register cell is illustrated in figure 9.8. 
When compared with the register previously used, it will be seen that the 
only difference is that the straight through path has been replaced by an 
AND gate, whose second input is supplied by the latch output. The effect 
of this is that when the straight through mode is selected, the value at the 
output of the register is determined not only by the parallel input signal, but 
also by the value in the latch. If the latch contains a high level, the circuit 
operates exactly as before, with the parallel input value passing directly to 
the parallel output. Alternatively, if the latch contains a low level, the AND 
gate will always output a low value, and therefore inhibit any active signal 
on the parallel input.
From the above description, it will be apparent that in order to mask 
out cells containing faulty control circuits, it is necessary to preload the 
latches, with zeroes being placed in registers corresponding to the outputs 
of the faulty circuit, and ones everywhere else. This can be achieved in the 
external test equipment by constructing a map of faulty cells, and generating
Figure 9.8: Modified scan-path register for use with WINNER.
the appropriate mask pattern at the end of the test phase. This is then loaded 
into the scan path registers before the configuration phase commences.
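The masking step can be sketched as follows. The number of outputs per cell, the ordering of bits in the mask and the function names are assumptions made for the illustration only.

```python
def straight_through_output(parallel_in, latch):
    """Modified register in straight-through mode: input ANDed with latch."""
    return parallel_in & latch


def build_mask(n_cells, outputs_per_cell, faulty_cells):
    """One latch bit per control output: 0 for faulty cells, 1 elsewhere."""
    mask = []
    for cell in range(n_cells):
        bit = 0 if cell in faulty_cells else 1
        mask.extend([bit] * outputs_per_cell)
    return mask


# Hypothetical column of three cells with six outputs each; cell 1 is faulty.
mask = build_mask(n_cells=3, outputs_per_cell=6, faulty_cells={1})

# Suppose every output in the column is erroneously active (logic 1).
raw_outputs = [1] * 18
masked = [straight_through_output(v, m) for v, m in zip(raw_outputs, mask)]
```

The outputs of the faulty cell are forced to the passive level, while those of the other cells pass through unchanged.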
An important point about the modified scan path register is that it can 
be fully tested before being relied upon to perform the fault masking process.
Both of the inputs and the output of the AND gate can be checked, as can the 
lower multiplexer and the latch. The parallel input of the upper multiplexer 
cannot be tested explicitly, but a fault in this area will be detected as a fault 
on the output of the circuit feeding the multiplexer.
9.8 Hardware Complexity of the Scan Path 
Approach
A transistor level circuit of the scan path cell shown in figure 9.8 contains 28 
transistors. One scan-path cell is required for each of the three AVAILability
and REQuest outputs of the cell. We also need one scan path cell for each
horizontal data line. The complexity of the scan path circuitry required per 
cell is therefore:
Number of transistors = 28(6 + L_h)                                 (9.4)
This means that the circuitry required for the scan path testing procedure 
is greater than that required for the control circuitry and at first sight this
may not appear to be a sensible way forward. However, the testing strategy 
enables testing not only of the control circuitry, but also of the interconnec­
tions between processors and of the output stages of the self-test circuitry. 
The scan path design is also such that it can be fully tested from the external 
tester before being used to mask out faults. It is believed therefore that the 
overhead of the scan path circuitry is acceptable if it is essential that correct 
operation of the WINNER circuitry is to be guaranteed.
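As a worked example of the scan path overhead, the count of 28 transistors per scan-path cell, six control outputs and one cell per horizontal data line can be evaluated for a given number of data lines. The figure of eight data lines used below is hypothetical and is chosen only to give a concrete number.

```python
SCAN_CELL_TRANSISTORS = 28   # transistors in one modified scan-path cell
CONTROL_OUTPUTS = 6          # three AVAILability and three REQuest outputs


def scan_path_transistors(horizontal_data_lines):
    """Scan path transistors per cell for a given number of data lines."""
    cells_needed = CONTROL_OUTPUTS + horizontal_data_lines
    return SCAN_CELL_TRANSISTORS * cells_needed


# Hypothetical cell with eight horizontal data lines.
example = scan_path_transistors(8)
```

For eight horizontal data lines this gives 28 x 14 = 392 transistors per cell.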
9.9 Other Tests
There are three areas which have not been tested by either the processor 
self-test procedure or the scan path test of the control circuitry. These are:
• the signature comparator circuitry,
• the vertical bypass circuitry used in the 1-dimensional WINNER algorithm,
• the interconnections between the control circuitry and the processor in 
the horizontal direction.
Tests to check these components are described in the following sections, 
and involve the scan path testing procedure described above.
9.9.1 Testing of the Signature Comparator
The self-test procedure should detect faults in the processor with a high de­
gree of reliability, and the presence of the faults will show up when the test
signature is compared with the template signature. It is possible, however, 
for a fault to be present in the comparator itself, and to propagate an in­
correct go/no-go indication to the control circuitry. There are many ways in 
which this problem can be overcome. We present two methods. The first is 
very simple and based on triple modular redundancy, but cannot guarantee 
to detect all faults. The second involves more additional circuitry but is 
capable of detecting any faults.
The TMR Approach
In this approach, the gates used in the comparator shown in figure 8.6 are 
triplicated, as shown in figure 9.9(a). A voting circuit is then used to de­
liver the output go/no-go signal to the control circuitry. The voting circuit 
required is very simple and is shown in figure 9.9(b). Because of its simplic­
ity it should have a very high probability of correct operation, but there is 
always a small probability of a fault occurring in the voting circuit itself.
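The voting arrangement can be sketched as follows, assuming that the voter is a conventional two-out-of-three majority function (the circuit of figure 9.9(b) is not reproduced here). The comparator copies and the stuck-at fault are modelled by hypothetical Python functions.

```python
def majority(a, b, c):
    """Two-out-of-three vote on single-bit inputs."""
    return (a & b) | (b & c) | (a & c)


def compare(register_bits, expected):
    """Fault-free comparator copy: 1 if the signature matches."""
    return int(list(register_bits) == list(expected))


def stuck_at_pass(register_bits, expected):
    """Faulty comparator copy whose output is stuck at 1 (pass)."""
    return 1


def tmr_go(register_bits, expected, copies):
    """Vote on the outputs of three comparator copies."""
    a, b, c = (copy(register_bits, expected) for copy in copies)
    return majority(a, b, c)


expected = [1, 0, 1, 1, 1, 0]
bad_signature = [1, 0, 1, 1, 1, 1]

# One faulty copy is outvoted; two faulty copies are not.
one_fault = tmr_go(bad_signature, expected, [compare, stuck_at_pass, compare])
two_faults = tmr_go(bad_signature, expected,
                    [stuck_at_pass, stuck_at_pass, compare])
```

A single faulty copy is outvoted, but two faulty copies produce an incorrect pass indication, which is why this approach cannot guarantee to detect all faults.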
Fully Testable Comparator
There are many ways in which a scheme to enable full testing of the com­
parator circuit can be designed. In the scheme to be presented we attempt to 
minimise the amount of testing required at the expense of including a small 
amount of extra hardware and for this reason a serial method of comparison 
is used rather than the parallel method presented previously. The approach 
is illustrated in figure 9.10. The template signature is stored in a shift reg­
ister which is entirely separate from the LFSR used in the compactor. Once 
the self-test has been completed, the signature and the template signature 
are clocked out of their respective registers and compared bit-by-bit in a sin­
gle exclusive-OR gate whose output feeds a JK latch designed to remember 
any occurrences of a difference between the two signatures. The latch output 
then provides the go/no-go signal to the control circuitry.
Figure 9.9: A TMR based signature comparator.
The JK latch and the exclusive-OR gate can both be tested during the
test of the control circuitry via the scan paths since the value of the go/no-go
signal on the JK latch output alters the function of the control circuitry.
The two states of the latch can be exercised during testing in the following 
manner. A reset signal is required to switch the latch into the correct state 
(Q=0) prior to signature comparison and this can be used to check the go
output value. More importantly, the no-go output state can be checked by
using a seed in the LFSR whose lsb is different from that of the template
signature. This will cause the exclusive-OR gate to detect an error and will
set the latch into the no-go state (Q=1). In this way the user can be sure
that the comparison circuit is operating correctly.
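The serial comparison can be modelled in a few lines. The sketch assumes that the latch is wired so that a mismatch sets it and nothing clears it until the next reset (for example J driven by the exclusive-OR output and K held low), and that both registers are clocked out lsb first. These details are assumptions and are not taken from figure 9.10.

```python
def serial_compare(signature_bits, template_bits):
    """Return the latch output Q after comparison: 0 = go, 1 = no-go."""
    q = 0                      # latch reset before the comparison starts
    for s, t in zip(signature_bits, template_bits):
        mismatch = s ^ t       # single exclusive-OR gate
        if mismatch:
            q = 1              # latch remembers any difference
    return q


template = [0, 1, 1, 1, 0, 1]            # hypothetical template, lsb first
go = serial_compare(template, template)

# Exercise the no-go state with a value whose lsb differs from the template.
seed = [template[0] ^ 1] + template[1:]
no_go = serial_compare(seed, template)
```

The two results correspond to the two latch states which must be exercised during testing.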
9.9.2 Vertical Bypass Circuitry
The vertical bypass circuitry is only used in the 1-dimensional WINNER 
algorithm, but it is an important component since all bypasses must be 
functional if the array is to operate correctly. Figure 9.11 illustrates a cell 
in which the vertical bypass circuitry can be tested in a simple manner.
The diagram shows the processor and control circuitry within the cell, to­
gether with the test pattern generator (TPG) and signature analyser, which 
together perform the processor self-test function. The vertical bypass func­
tion is performed by the multiplexer M3. An additional multiplexer M2 has 
been included to permit selection between the test pattern generator and the 
Northern data input to the cell. With M2 and M3 selecting the northern 
data input and the bypass line respectively, data will pass through the cell 
unchanged in the vertical direction. When cells are connected in an array, a 
complete column of cells can be checked for correct operation of the vertical 
bypass by ensuring that data applied at the top edge of the array emerges 
unchanged at the bottom edge. If any bypasses are found to be faulty, the 
entire array must be discarded. However, the circuitry involved is small, and 
the probability of achieving a fully functional array is high.
9.9.3 Horizontal Interconnections
The connections between the control circuits in adjacent cells are checked au­
tomatically during the scan path testing procedure. However, those between 
the control circuitry and the processor are only partially checked during the 
processor self-test procedure. These connections can be fully checked using 
the approach shown in figure 9.11 in which the multiplexers M1 and M4 have
been included for the purpose. These multiplexers bypass the processor in
the horizontal direction and are used in the bypass mode during the test
of the control circuitry by the scan path procedure. Signals can then be
passed through the cell and should be detected unchanged at the output. At
the end of the test, M1 is controlled to select the data emerging from the
processor, while M4 continues to select the horizontal input data from the
control circuitry. All other parts of the circuitry are fully tested by the self 
test procedure.
9.10 Benefits of the Scan Path Test Procedure
The control circuitry test procedure provides a number of benefits which are 
listed below.
1. A small number of simple test patterns is required.
2. The number of distinct test patterns is independent of the size of the 
array, and of the function of the processor used in the array.
3. Interconnections between cells are checked automatically.
4. Faults in the control circuitry and cell interconnections can be masked, 
providing a second level of fault tolerance in the array.
5. The scan path circuitry can be fully tested before being required to perform 
any role in which it must function correctly.
In addition, the vertical and horizontal interconnections between the control 
circuitry and the processor in each cell can be fully checked, as can the 
operation of the signature comparator.
9.11 Comments
In this chapter we have proposed two techniques which could be used to 
reduce the vulnerability of the WINNER control circuitry. The techniques 
are quite different and are likely to find use in different situations. The 
first technique involving duplicated control circuits requires less additional 
hardware than the external testing approach and is more suited to general 
applications, such as the fabrication of large integrated circuits, in which the 
configured array can be tested before use. For single faults occurring in each 
pair of control circuits, the duplication approach has the same performance 
as the external testing approach, being able to mask any fault. The second 
technique, requiring external testing of the control circuitry can be used if 
an absolute guarantee is required that the control circuitry, self-test signa­
ture comparator, and cell interconnections are functioning correctly. This 
assurance may be required in remotely sited applications, where testing the 
configured system may be impossible.
Chapter 10
WINNER Demonstrator
10.1 The Need for a Demonstrator
In previous chapters we have described the operation of the WINNER self- 
organising algorithm and its associated testing strategy, and illustrated the 
performance of the approach by simulation at both functional and hardware 
levels. This may be adequate as a purely academic investigation of the con­
cept of self-organisation but is not sufficient if the work is to eventually be 
realised in a practical system. The WINNER techniques could be used in 
either a monolithic device such as a large area integrated circuit, or in a 
system required to have a high availability, which might be constructed from 
many separate modules, perhaps printed circuit boards containing compo­
nents. For designers of these systems, the existence of a simple but practical 
demonstration of the WINNER algorithm would be a valuable aid to increas­
ing confidence, understanding and appreciation of the benefits the approach 
can offer.
10.2 Type of Demonstrator
The most attractive approach to demonstrating the WINNER algorithm 
would be to apply WINNER to a large area integrated circuit which would 
normally be expected to have a yield close to zero, and to show that the 
use of WINNER in the device allows a yield significantly above zero to be
achieved. However, the design and fabrication of such a device would con­
stitute a thesis in its own right and cannot be tackled within the constraints 
of the current project. Furthermore, a monolithic demonstrator would be 
very inflexible in the sense that the distribution of faults for a given de­
vice is fixed. This means that the ability to examine the response of the 
configuration algorithm to different fault distributions would be limited.
A more tenable approach is to construct an array of printed circuit 
boards (pcbs), each comprising a single WINNER cell containing processor, 
control circuitry and test circuitry. This approach has been chosen for the 
WINNER demonstrator for this project and has several advantages over a 
monolithic device as a first demonstrator. Firstly, it can be completed within 
the project timescale. Secondly, the cost involved in fabricating the 
demonstrator would be relatively low compared to a monolithic device. Thirdly, 
the larger physical scale of the demonstrator will present greater opportunity 
for producing a design whose operation can be understood by a lay observer. 
This could be achieved by the inclusion of extra features such as lights to 
indicate the positions of faulty cells and configured rows, and the ability to 
manually introduce faults into the array to demonstrate operation of the 
configuration and fault masking processes.
10.3 Demonstrator Objectives
The objectives for the WINNER demonstrator are as follows:
1. To verify correct operation of the WINNER self-organising algorithm 
when implemented in hardware.
2. To provide a visual illustration of the operation of the algorithm that 
can be appreciated and understood by both expert and non-expert 
observers.
3. To verify the operation of the testing strategy and the procedure for 
masking control circuitry faults.
4. To act as a first prototype which could lead towards development of a 
large area silicon demonstrator based on WINNER.
10.4 Demonstrator Specification
Consideration of the hardware implementation of WINNER has led to the 
following specification for the demonstrator.
1. The demonstrator should be capable of configuring a functional array 
with 4 rows and 4 columns from an array containing 6 rows and 4 
columns. It should use the one-dimensional WINNER algorithm since 
as described in chapter 6, this is the algorithm most likely to be used 
in practice.
2. The processor design is to be kept simple by avoiding the use of self­
test circuitry since this is the subject of much research elsewhere and is 
known to be possible. It is considered adequate for the purposes of this 
demonstration to use a switch on each cell to indicate the condition of 
the processor. This provides the flexibility to alter the fault distribution 
to test and illustrate various array configurations.
3. The configuration of the functional rows should be indicated visually 
using LEDs or similar illuminating device. It should also be possible 
to configure the array in slow motion so that the interaction between 
REQuest and AVAILability signals can be observed.
4. Correct functioning of the configured processor array must be demon­
strated and it should be obvious to an observer that this is the case. 
This could be achieved by allowing the array to perform its function 
on known input data and observing the output.
5. The necessary row selection circuitry for both inputs and outputs 
should be included.
6. The modified scan-path testing procedure is to be included and should 
permit detection and masking of real faults introduced into the con­
trol circuitry and inter-cell connections, for example, by shorting wires 
together. This will necessitate the development of an external tester 
for generating test patterns, evaluating responses from the array and 
producing the appropriate fault masking pattern. The location of the 
cell in which the fault was detected should be indicated visually.
10.5 Processor Array Function
Since the main purpose of constructing a demonstrator is to prove the opera­
tion of the WINNER algorithm and its associated test strategy, the function 
of the processors used in each WINNER cell and the function of the pro­
cessor array itself are of secondary importance. For this reason an array of 
very simple processing elements (PEs) has been chosen. The array performs 
the multiplication of two binary numbers A and B using ripple-through PEs 
and is shown in figure 10.1(a). The $i$th bit of A, $a_i$, $0 \le i \le 3$, is broadcast 
to each PE in the $i$th row, while the $j$th bit of B, $b_j$, $0 \le j \le 3$, is broadcast 
to each PE in the $j$th column of the array. Each PE takes inputs $a_i$ and $b_j$, 
s and c and generates a new value for s and c as shown in figure 10.1(b). 
The values of s and c are formed by multiplying the incoming bit from A 
with that from B and adding the product to the incoming s and c values. 
The main array of PEs in figure 10.1(a) generates all the partial products 
required for the final result and also performs part of the task of summing 
the partial products. The remainder of the task is carried out by the extra 
row of full adders shown at the bottom of the main array. The product of 
A and B emerges as shown. The multiply function carried out by this array 
is ideal for this project because the PEs are very simple and, in addition, an 
observer will instantly recognise whether or not the array is producing the 
correct product.
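The behaviour of such an array can be sketched in software. The wiring below is an assumption, since figure 10.1 is not reproduced here: it uses the conventional carry-save arrangement in which the sum output of a PE passes diagonally to the next row and the carry output passes straight down, with a final ripple-carry row of full adders. The function names are illustrative only.

```python
def pe(a, b, s_in, c_in):
    """Processing element: add the partial product a.b to incoming s and c."""
    p = a & b
    s_out = p ^ s_in ^ c_in
    c_out = (p & s_in) | (p & c_in) | (s_in & c_in)
    return s_out, c_out


def array_multiply(a_word, b_word, n=4):
    """Multiply two n-bit numbers on an n x n array of PEs (assumed wiring)."""
    a = [(a_word >> i) & 1 for i in range(n)]  # a[i] is broadcast along row i
    b = [(b_word >> j) & 1 for j in range(n)]  # b[j] is broadcast down column j
    s_prev = [0] * (n + 1)  # sums from the row above (extra entry is always 0)
    c_prev = [0] * n        # carries from the row above
    product = 0
    for i in range(n):
        s_row = [0] * (n + 1)
        c_row = [0] * n
        for j in range(n):
            # Sum arrives diagonally from column j+1, carry from directly above.
            s_row[j], c_row[j] = pe(a[i], b[j], s_prev[j + 1], c_prev[j])
        product |= s_row[0] << i  # column 0 emits product bit i
        s_prev, c_prev = s_row, c_row
    # Extra row of full adders: ripple-carry sum of the remaining s and c bits.
    carry = 0
    for k in range(n):
        x, y = s_prev[k + 1], c_prev[k]
        product |= (x ^ y ^ carry) << (n + k)
        carry = (x & y) | (x & carry) | (y & carry)
    return product


if __name__ == "__main__":
    print(array_multiply(13, 11))
```

An exhaustive comparison against ordinary integer multiplication over all 256 input pairs is a convenient check that the assumed wiring is self-consistent.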
It will be noticed that the array shown in figure 10.1(a) contains diagonal
connections between PEs. These must be removed before the WINNER algo­
rithm can be applied. Figure 10.2(a) shows the same array redrawn without 
diagonal interconnections. An extra dummy connection has been made as 
shown in figure 10.2(b) which enables the diagonal path to be generated from 
two orthogonal paths. The function of the PE is otherwise unchanged.
10.6 Implementation Options
The circuitry for the cell to be used in the demonstrator can be implemented 
in a number of ways. Those which have been considered are described below 
and one of the approaches is selected for the design.
10.6.1 LSI Components
The most straightforward implementation approach is to use standard, off- 
the-shelf components such as those available from the TTL or CMOS families 
of logic circuits. The appropriate components are simply selected from the 
catalogue and the pcb is designed to interconnect them in accordance with 
the circuit diagram. However, although the circuitry required in the cell is 
relatively simple, 17 LSI chips would be required which together with the 
necessary LEDs etc would result in a pcb of about 5 inches square. It was 
felt that this would in turn result in an unacceptably large array size.
10.6.2 Custom Chip or Gate Array
In this approach, the circuitry for an entire cell would be placed on a single 
chip. A cell library is used to provide standard functional blocks such as 
gates and latches and these are placed on the silicon and interconnected as 
required. The completed chip could then be used on a pcb together with 
the appropriate LEDs and switches. This approach would incur the least 
effort in pcb design but would be relatively inflexible in terms of the ability to 
introduce faults at various locations within the cell.
Figure 10.2: Multiplier array redrawn with orthogonal interconnections: (a) 
Array, (b) Single cell.
10.6.3 EPROM Implementation
Since much of the circuitry required in the WINNER cell is combinational 
logic, it is possible to design a compact circuit using EPROMs. EPROMs 
are Erasable Programmable Read Only Memories, which can be erased for 
alteration or re-use by prolonged exposure to ultra-violet light. EPROMs can 
implement a truth table of combinational logic if data corresponding to the 
output required from each input address is stored in the memory. EPROMs 
are primarily designed for use in computer systems and as a result store 
8-bit data words. One 8-bit word is therefore output by the EPROM for 
each address input. The number of bits in the input address is determined 
by the size of the EPROM, with, for example, a 1K x 8 EPROM having a 
10-bit address. The most cost effective EPROM at the time the design of 
the demonstrator was being undertaken was an 8K x 8 device, having 13 
address inputs. The advantage of using EPROMs in the demonstrator is that 
the number of chip packages can be reduced. This reduces the pcb size and 
design time. However, the cost of using EPROMs is approximately the same 
as that for LSI devices.
EPROMs have some of the advantages of both the custom chip approach 
and the LSI approach for this project. For this reason they have been chosen 
as the method for implementing the combinational logic required in the cells. 
Careful partitioning of the circuit will be necessary to minimise the number 
of EPROMs required.
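As an illustration of how an EPROM can stand in for combinational logic, the sketch below builds the memory image for a small function by evaluating it at every address. The bit assignments (which address lines carry a, b, s and c, and which data bits carry the outputs) are assumptions made for the example and are not taken from the demonstrator design.

```python
def program_rom(logic, address_bits):
    """Build a ROM image: one 8-bit word per address, computed from the logic."""
    return bytes(logic(addr) & 0xFF for addr in range(1 << address_bits))


def pe_logic(addr):
    """Example truth table: a multiplier PE with 4 inputs and 2 outputs.

    Assumed address layout: bit 3 = a, bit 2 = b, bit 1 = s_in, bit 0 = c_in.
    Assumed data layout: bit 0 = s_out, bit 1 = c_out.
    """
    a = (addr >> 3) & 1
    b = (addr >> 2) & 1
    s_in = (addr >> 1) & 1
    c_in = addr & 1
    total = (a & b) + s_in + c_in  # value 0..3: low bit is sum, high bit is carry
    return total


if __name__ == "__main__":
    rom = program_rom(pe_logic, address_bits=4)
    # "Reading" the ROM replaces evaluating the gates.
    print(rom[0b1111])
    # An 8K x 8 device has 13 address inputs.
    print(1 << 13)
```

The same approach extends to any purely combinational block, limited only by the number of address inputs and the eight data outputs per device, which is why the partitioning of the cell circuitry matters.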
10.7 Demonstrator Circuitry Requirements
A block diagram of the demonstrator is shown in figure 10.3, and shows the 
main components of the system. In addition to the 6-row by 4-column WIN­
NER array, there are other circuits which are required to perform functions 
such as generation and display of input data and results, row selection for the 
input and output of data to the array, and interfacing to the BBC computer. 
The circuits for these functions are described in more detail in the following 
sections.
Figure 10.3: Block diagram of the WINNER demonstrator.
10.7.1 Circuit for the WINNER Cell
The circuitry required to implement a cell to the specification of the 
demonstrator is illustrated in figure 10.4. The functions shown in blocks are 
implemented in separate EPROMs and are shown in detail in figures 10.5,
10.6 and 10.7. The inputs and outputs on the right hand side of the cell each 
pass through a latch which is additional to the latch required for the scan 
path testing scheme. The extra latches serve no function in the WINNER 
algorithm, but allow the array to be configured one step at a time by clock­
ing the latches slowly. The partitioning into circuits suitable for EPROM 
implementation has been carried out so that the WINNER control circuitry 
which generates REQuest and AVAILability signals is in one EPROM, while 
the processor and data routing circuitry is in another. The logic involved in 
the scan path testing circuitry requires two EPROMs for its implementation 
as shown.
The logic levels of the input and output REQuest and AVAILability sig­
nals are indicated by LEDs which are illuminated when the signal is at a high 
logic level. These LEDs are placed on the periphery of the cell corresponding 
approximately to the positions where the signals enter or leave the board. A 
switch is included on each cell to simulate the output of a self-test circuit. 
The switch controls an input to the EPROM which contains the control cir­
cuitry. A green LED at the centre of the pcb is used to indicate the result 
of the simulated self-test; if illuminated, it indicates that the processor is 
functional, and vice-versa. Another LED, also in the centre of the pcb is 
used to indicate the position of cells which have been found to contain faults 
during the scan path testing of the cell. A cell found to have a fault will 
cause its (red) LED to flash. Test points have been included on each cell 
to allow the introduction of faults on the REQuest and AVAILability signal 
lines. These are not the only faults which can be applied, but they are the 
ones most likely to be used to demonstrate the fault masking process.

Figure 10.5: Equivalent circuit of the Control PROM.
10.7.2 Other Circuitry
Extra Row of Full Adders
This circuit has been implemented in a single EPROM, and takes its inputs 
from the bottom of the WINNER array.
Input/Output Row Selection Circuitry
The circuitry used in the input and output row selection arrays is purely 
combinational and can therefore be implemented using EPROMs. The cir­
cuits of the cells of the input and output selection arrays together with the 
circuitry placed on each EPROM are shown in figures 10.8 and 10.9 respec­
tively. Sufficient circuitry for two rows of cells has been placed in a single 
EPROM, and three EPROMs can be cascaded to form input and output 
arrays of the required dimensions.
Figure 10.7: Equivalent circuit of the Scan PROM.
Figure 10.8: Equivalent circuit of the data entry PROM.
Data Generation and Display
The two data words to be multiplied together on the configured array are 
each generated by a circuit comprising a 4-bit binary up/down counter circuit 
clocked by debounced switches. The value in the counters is displayed on 
7-segment LED displays with one of the values being fed to the input data 
row selection circuit, and the other directly to the cells in the array. The 
display circuitry also receives the product generated by the array (via the 
output data row selection circuitry), and displays the result. This should 
therefore be equal to the product of the two input numbers when the array 
has been configured.
Scan Path Testing Circuitry
Most of the circuitry required to implement the scan path testing procedure is 
embodied within the cells themselves. However, since a scan path is required 
on the left and right hand sides of each cell and the cells have been designed 
with a scan path on only the right hand side, an extra scan path is required 
on the far left hand side of the array. This has been implemented in the 
same manner as those within cells, except that hand wired boards have been 
used.
Figure 10.9: Equivalent circuit of the data output PROM.
10.8 External Testing
The scan-path testing procedure requires the use of an external tester which 
is known to be fully functional. The tester must be capable of applying 
test patterns to the array via the scan paths, receiving the responses from 
the array, and generating the appropriate fault masking patterns. These 
must then be loaded into the scan paths before configuration commences. 
It is possible to design a piece of dedicated hardware to perform this task 
but it would be very inflexible to changes in test pattern, and would take 
considerable time to design and construct. A better approach, and the one 
adopted in this project, is to use a computer and program it to perform the 
required task. We have used a BBC computer, with the I/O and printer ports 
being used to transfer data. A circuit has been designed which provides the 
necessary interface between the computer and the array.
10.8.1 Computer Interface Circuitry
The BBC computer has insufficient input/output ports to allow one port to 
be dedicated to each of the inputs and outputs which need to be monitored. 
For this reason an interface circuit between the computer and the array is 
required. The function of the interface circuit is to provide a set of latches 
with one latch for each input and output. A multiplexer arrangement is 
included so that different subsets of latches can be selected. The computer 
can then either write to the subset, or read from it as required.
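The latch-and-multiplexer arrangement can be pictured with a simple software model. Everything in this sketch is hypothetical: the class name, the subset width of 8 bits and the number of subsets are chosen only to illustrate the idea of sharing one port among many signals, and do not describe the actual interface circuit.

```python
class LatchInterface:
    """Model of latches grouped into subsets that share a single port."""

    def __init__(self, n_subsets, width=8):
        self.mask = (1 << width) - 1
        self.latches = [0] * n_subsets  # one stored word per subset
        self.selected = 0               # subset currently routed to the port

    def select(self, subset):
        """Drive the multiplexer so that one subset is connected to the port."""
        if not 0 <= subset < len(self.latches):
            raise ValueError("no such latch subset")
        self.selected = subset

    def write(self, value):
        """Store a word in the selected subset (computer to array)."""
        self.latches[self.selected] = value & self.mask

    def read(self):
        """Return the word held in the selected subset (array to computer)."""
        return self.latches[self.selected]


if __name__ == "__main__":
    interface = LatchInterface(n_subsets=4)
    interface.select(2)
    interface.write(0b10110001)
    interface.select(0)
    print(interface.read())  # a different subset is unaffected
    interface.select(2)
    print(interface.read())
```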
10.8.2 Testing Software
The BASIC program which runs the test sequence is included in Appendix C 
together with a brief description of its operation. The program makes full 
use of the PROCedures available in BBC BASIC and is therefore relatively 
easy to read. The instructions for setting the various array inputs to TRUE 
or FALSE have also been written so that a simple procedure call is all that 
is required to change the value of an input. The program has been designed 
so that it operates on a menu basis. The operation of the array can therefore 
be controlled without knowing the details of how the program operates.
10.9 Demonstrator Results
A photograph of the completed demonstrator system together with computer 
and computer interface is shown in figure 10.10. A close-up photograph of a 
single cell of the WINNER array is shown in figure 10.11, and a configured 
array is shown in figure 10.12. This has been taken through a red filter so 
that only the REQuest signals can be seen. These indicate the positions of 
the functional rows.
The completed demonstrator operates fully to specification and has been 
demonstrated to many people. The configuration can be observed in slow 
motion and the interaction between REQuest and AVAILability signals can 
be seen clearly. The display of the product generated by the array is always 
correct when the scan path test has been carried out and when the array is 
in configuration mode. When the distribution of processor faults is altered 
by adjusting the positions of the switches, the display of the result generated 
by the array multiplier returns to the correct value after a brief transient 
period while the array is reconfiguring.
Figure 10.10: Photograph of the demonstrator showing the WINNER array, 
computer interface and computer.
Figure 10.11: Single cell of the demonstrator array. The values of the 
REQuest and AVAILability inputs and outputs are indicated by the red and 
green LEDs respectively.
The testing procedure based on the BBC computer is rather slow. How­
ever, this is a feature of the BBC and the way in which the program has been 
written. It could be greatly speeded up by writing critical portions of the 
program in machine code or by using a more powerful computer. A dedicated 
circuit to generate the test patterns would also operate much more rapidly. 
The problem is of little consequence to the WINNER algorithm itself.
Figure 10.12: Photograph of the demonstrator array when configured for a 
particular fault distribution. (Taken through a red filter to highlight the 
configuration pattern.)
Real faults, such as stuck-at-1 or 0 can be introduced into the control 
circuitry via the test points on the pcbs, or by other means, and these are 
all correctly detected by the test sequence and the culprit cell identified by a 
flashing LED. Subsequent configuration of the array shows that the cell into 
which the fault has been introduced has been correctly avoided and is not 
part of a functional row even though it may appear to be outputting TRUE 
AVAILability signals. The resulting product is also correct.
10.10 Concluding Remarks
The main purpose in constructing a self-organising array based on WIN­
NER was to demonstrate that the ideas proposed in chapters 5 and 9 are 
sound and practically realisable. We have shown this to be the case, and 
have not encountered any unforeseen problems. Further work towards the 
development of a large area monolithic device based on WINNER can now 
be undertaken, confident in the knowledge that the underlying configuration 
and fault-masking techniques are sound.
Chapter 11
Applications of WINNER
11.1 Introduction
The WINNER self-organising technique described in the previous chapters 
can potentially find use in several different applications. In this chapter we 
briefly outline some of these applications and provide an indication of the 
specific details of each in terms of special implementation requirements. The 
first section is concerned with the types of processor array which could benefit 
from a technique like WINNER, while later sections focus on the application 
of WINNER to various different implementation approaches. The chapter is 
not meant to provide complete details of how WINNER would actually be 
used in each application, since this would constitute a thesis in its own right.
11.2 Application to Processor Arrays
As was described in chapter 2, arrays of processing elements are finding 
increasing use in high performance computers and in digital signal processing 
applications. In this section we briefly mention several processor arrays which 
are currently of interest in system design and in which the WINNER self- 
organising technique could be used.
Figure 11.1: Schematic view of the Transputer.
11.2.1 The Transputer
Transputer chips are manufactured by Inmos Ltd, see Inmos (1984). A 
Transputer is essentially a RISC microprocessor with additional circuitry to 
provide it with four high speed bi-directional serial data links. The links 
are completely asynchronous and have been designed to allow easy intercon­
nection of many transputers in an array. A schematic view of a Transputer 
is shown in figure 11.1. The connectivity of the array can be designed to 
suit the problem in hand. For orthogonally interconnected arrays, or arrays 
which can be transformed so that they contain only orthogonal interconnec­
tions, it would be possible to use WINNER to provide the basis of a fault 
tolerant capability. Such arrays are useful in image processing applications, 
where part of an image might typically be allocated to each processor in the 
array. An array of transputers is a MIMD machine since each transputer can 
contain a separate program and operate on separate data.
An advantage of the transputer in the context of WINNER is that the 
serial nature of the interconnecting links means that only a pair of wires 
is required for communication in each of the four directions. This means that 
routing of the data lines can be achieved with very little hardware. Currently,
the transputer is not able to test itself, so some external mechanism would 
be required to perform this function.
Being a single chip, the transputer also lends itself to wafer scale integra­
tion, although at present, the size of the chip means that the necessary yield 
of chips on the wafer is likely to be insufficient for economic production.
11.2.2 The Distributed Array Processor (DAP)
The DAP (Reddaway, 1973) is built by ICL and is a high-performance gen­
eral purpose array processor which supports a high level language called DAP 
Fortran. The DAP can be programmed to perform many tasks in the field of 
digital signal processing including speech processing and image processing. 
It is a SIMD machine and contains a 64 by 64 array of bit-level processing 
elements each having 4kbits of memory. The processing elements communi­
cate with neighbouring elements via an orthogonally interconnected mesh as 
shown in figure 11.2. Each processing element in the DAP is currently im­
plemented using several separate chips but communication between elements
is by single bit signal lines, which is ideal for WINNER.
11.2.3 Digital Signal Processing Arrays
Under this heading is grouped a number of arrays which essentially perform 
a fixed task although they are programmable in terms of coefficients and 
operands. Their purpose is to perform a limited class of operations very 
rapidly on streams of high speed data. Functions may include matrix opera­
tions, filtering operations such as filter banks as well as predefined operations 
on images, such as edge detection and segmentation. Arrays can be built 
from standard chips such as multipliers, cascadable filters or microproces­
sors specifically designed for use in digital signal processing arrays, such as 
the Systolic Node chip (Hargrave, 1986) which is currently under develop­
ment at STL.
Since the processing elements tend to be available as single chips there is 
great potential for improved performance by implementing arrays as mono­
lithic circuits.
11.3 Application in Array Implementation
The following applications of the WINNER self-organising technique are pre­
sented in order of increasing difficulty of implementation.
• High availability applications,
• Implementation as part of an active substrate in a silicon hybrid,
• As the configuration technique in Wafer Scale Integration.
It should be possible to use WINNER in all of the above applications 
without much extra effort and without any modification to the basic WIN­
NER algorithms or hardware. In the following sections, we describe the 
essential features of each application and the main problems which must be 
addressed.
11.4 High Availability Systems
A High Availability system is one which although not able to tolerate faults 
as they occur can nevertheless rapidly reconfigure itself to avoid the fault. 
This means that the user of the system would receive corrupted output data 
for a short period while the system reconfigures. High availability systems are 
useful both in remote applications such as satellites, where manual repair is 
not practical, and in applications where the time to carry out manual repair 
would be unacceptable.
With the increasing interest in parallel processing techniques as a means 
of achieving high performance computers, there are several computing ma­
chines available which consist of two-dimensional arrays of processing el­
ements. In general, currently available technology does not permit these 
arrays to be implemented as single chips due to the overall complexity of 
the system, and the arrays are usually partitioned into a number of printed 
circuit boards. When the processor array is manufactured, it is tested to 
check that it is fully functional. However, when in service, it is possible 
for components or interconnections to fail and automatic repair would be 
advantageous in many applications.
WINNER could be used to provide this automatic repair capability. A 
special feature of this application is that all the circuitry involved, including 
that specific to WINNER is fully functional at the start of the life of the sys­
tem. This is different to the other potential applications of the technique. As 
a result, a straightforward implementation of the WINNER control circuitry 
together with self-test of the processor is all that is required.
11.4.1 Special Considerations
Self-Test
Some mechanism is required by which the array can be informed that it 
contains a fault. The simplest, but not very satisfactory approach could 
involve the user who could initiate a retest and reconfiguration when he
detects an error. A better method would be to automatically perform a 
self-test at regular intervals or whenever the processor is idle. Any detected 
faults could then be avoided by reconfiguration.
In systems constructed from a number of interconnected printed circuit 
boards, the most likely failure mechanism is a fault occurring at the interface 
between different levels of integration. In a typical system inter-chip com­
munication might include the following interfaces: chip-to-package, package- 
to-board, board-to-backplane, backplane-to-board, board-to-package, and 
package-to-chip. Since the interconnections between individual processors 
are likely to involve communication across some of these interfaces, the self­
test approach used in the system must be capable of checking the inter- 
processor connections.
Control Circuitry
At the start of the life of the system, each control circuit will be fully func­
tional. If the control circuitry is implemented on a single chip it will have a 
small probability of failure compared with the rest of the processors. How­
ever, since the most likely cause of failure is the interconnections between 
boards, the operation of the configuration circuitry could be significantly 
affected since signals are passed between processing elements. Faults in the 
interconnections between control circuits in adjacent cells can be tolerated by 
taking advantage of the active and passive conditions which were described 
in chapter 9. In this application, we simply need to design the input lines to 
the control circuits such that an open-circuit line is pulled low and 
therefore appears as a passive input level. The circuit receiving the 
passive signal will then treat its neighbour as if it were a faulty cell 
and ignore it.
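The effect of the pull-down can be illustrated with a minimal sketch in Python (not the language used in this thesis, and not part of the original work). It assumes, as stated above, that a low level is the passive condition. The names Level, read_input and neighbour_usable are hypothetical and are chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    PASSIVE = 0  # logic low, taken here to be the passive condition
    ACTIVE = 1   # logic high, taken here to be the active condition


def read_input(driven: Optional[Level]) -> Level:
    """Model a control-circuit input line fitted with a pull-down.

    `driven` is the level placed on the line by the neighbouring cell,
    or None when the interconnection is open-circuit.
    """
    if driven is None:
        # Nothing drives the line, so the pull-down holds it low.
        return Level.PASSIVE
    return driven


def neighbour_usable(driven: Optional[Level]) -> bool:
    """A neighbour is considered only if its signal is seen as active."""
    return read_input(driven) is Level.ACTIVE
```

In this model a broken interconnection and a genuinely faulty neighbour are indistinguishable to the receiving circuit, which is the behaviour the text relies on.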
11.4.2 General Comments
One of the advantages of using WINNER in a high-availability application 
based on an array of pcbs is that only a small number of redundant elements
are required. This is because the manufacturer can ensure that all of the 
array elements are fully functional before installing the system, and sufficient 
spare capacity need only be provided to cope with the expected failure and 
repair rates. The minimum requirement is one spare row, but provided that 
not more than one fault occurs in any column, several faults can be tolerated 
by the spare cells. This results in a very cost-effective solution and enables 
the array to be configured automatically whenever a fault is detected.
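The condition quoted above for a single spare row (not more than one fault in any column) can be checked mechanically. The following Python fragment is a minimal sketch of that check only; it does not perform the WINNER configuration itself, and the function name and the (row, column) fault representation are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def within_single_spare_row_condition(faults: Iterable[Tuple[int, int]]) -> bool:
    """Return True if no column contains more than one faulty cell.

    `faults` is a collection of (row, column) positions of faulty cells.
    Duplicate reports of the same cell are counted once.
    """
    per_column = Counter(col for _row, col in set(faults))
    return all(count <= 1 for count in per_column.values())
```

For example, faults at (0, 0), (2, 1) and (1, 3) satisfy the condition, whereas faults at (0, 0) and (3, 0) do not, since both lie in column 0.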
11.5 Silicon Hybrids
In a conventional hybrid, individual dice, or chips mounted in leadless chip 
carriers are bonded to a ceramic substrate. The substrate provides the re­
quired interconnections between the chips through metal tracks (usually 
gold) which are screen-printed onto its surface. This approach has been 
used for many years and offers a much reduced size of system implementation. 
However, further miniaturisation using the conventional hybrid technique is 
limited by the low density of interconnect possible on the ceramic substrate. 
For this reason the replacement of the ceramic substrate with a silicon wafer 
or part wafer has been the subject of research for several years.
A Silicon Hybrid is a compromise between a conventional hybrid circuit 
and Wafer Scale Integration and has many advantages over conventional 
hybrids, a few of which are listed below (Hagge, 1988):
• Wafers can use integrated circuit lithography techniques to generate 
the high density interconnections,
• Silicon hybrids can have active substrates, in which some components 
are fabricated in the substrate material and others bonded to the sur­
face,
• The higher density interconnect means that chip-to-chip propagation 
delays are reduced due to shorter interconnections.
11.5.1 Requirement for Fault Tolerance
An advantage of both the conventional hybrid approach and the silicon hy­
brid approach from the point of view of fabrication is that each chip to be 
bonded to the substrate can be fully tested before use. In addition, the inter­
connections on the silicon substrate can be checked by a probe test. During 
assembly, however, the bonding process may not have a 100% yield, which 
will result in some substrates being faulty. At present, the technology of 
silicon hybrids is immature and no firm details are available on the 
reliability of the bonding process. Faulty bonds can probably be repaired 
by reflowing, but in applications involving large numbers of chips, each 
with a dense pinout structure, repair is likely to prove costly.
An approach which could avoid having to locate and repair bonding de­
fects if a 2-dimensional array is to be fabricated is to use the WINNER 
technique and implement the control circuitry in the active substrate of the 
hybrid. A further advantage of this approach would be that faults occurring 
in-service can also be automatically tolerated.
Placing the configuration control circuitry in the active substrate means 
that it remains entirely separate from the chips being attached to the surface 
of the silicon. As a result, no alterations are required to the chips themselves 
and so standard components can be used. The only requirements are that 
the processors are self-testing and that the yield of the control circuitry is 
sufficient to allow a high percentage of the active substrates to be fully func­
tional. There is already a trend towards self-testing chips since they reduce 
test times in conventional stand-alone chips and if this trend continues, such 
chips would be ideal for use in silicon hybrids.
11.6 Wafer Scale Integration
Wafer Scale Integration (WSI) is possibly the most demanding application of 
both the WINNER technique and the silicon process being used. In WSI, no
component can be guaranteed to work and the best one can do is to ensure 
that the process has a yield which, on average, produces acceptable results.
The original idea behind WINNER was to develop a configuration tech­
nique suitable for enabling WSI, or at least, large chips to become a reality 
for 2-dimensional arrays of processors. We believe that the techniques have 
been developed to the stage where it is appropriate to consider building a 
WSI demonstration. However, the configuration of the processors is not the 
only problem involved in WSI, and the following list highlights some of the 
other areas which must be addressed if WSI is to be successful:
• Control circuit yield,
• Power supply distribution and integrity,
• Clock line distribution and integrity,
• Disconnection of faulty processors,
• Fault model for the process being used,
• Lithographic techniques for large area exposure,
• Packaging techniques,
• Thermal management within the package.
Some of these problems have been addressed within this project and the 
results reported in this thesis. These include techniques for improving control 
circuit yield, and a study of integrated circuit fault distributions. It is not 
possible to address the other problems here since they are beyond the scope of 
this thesis. However, some work has been done in the context of specific WSI 
architectures on the distribution of power and clock signals by Warren et al 
(1986) and Coleman and Lea (1986). The problem of disconnecting faulty 
processors is discussed by Warren and Lea (1987). Problems associated with 
packaging and thermal management have been considered by Pitt (1987).
Chapter 12
Conclusions and Further Work
12.1 Conclusions
In this thesis we have presented the results of a project concerned with re­
search into configuration techniques suitable for use in Wafer Scale Integra­
tion. Motivation for the research has been generated from two independent 
directions. Firstly, the increasing trend towards regular parallel processing 
architectures is likely to have a significant, beneficial impact on the design of 
integrated circuits and enable very large chips to be designed with moderate 
effort. This increase in the level of integration should then feed back into 
the field of parallel processing in the form of reduced hardware costs etc. 
Secondly, these large chips are at present impossible to produce successfully 
due to the fabrication defects which are always introduced during silicon pro­
cessing. A study of parallel processing trends and of the fault distributions 
occurring in integrated circuits has been carried out and the results have 
been presented.
A detailed study of published approaches to the problem of configuring 
a 2-dimensional array has been carried out. Each author proposes his own 
approach to arranging the switching elements which perform the routing of 
data to avoid faulty elements. Many different methods of implementing the 
switching elements are also proposed and we have developed classifications 
for the switch organisation methods and implementation methods. We show 
that there is an alternative, novel approach to configuration which has not
been addressed in the literature, namely that of Self-organisation.
In the remainder of the thesis we have concentrated on developing the 
concept of self-organisation. We have proposed a novel algorithm, called 
WINNER, by which a 2-dimensional array can avoid the faulty processing 
elements and organise itself into a functional array without any external as­
sistance. We have performed simulations which compare the performance of 
WINNER with other published techniques. We show that the configuration 
performance of WINNER is comparable to other approaches when processor 
yield is 80% or more, although it becomes less competitive at lower yields. 
In addition, WINNER has several advantages:
• The array can configure itself automatically without the need for ex­
ternal assistance,
• The self-organising algorithm is fully convergent and cannot become 
unstable,
• The control circuitry associated with each cell in the array is simple, 
about 20 gates, resulting in a low hardware overhead. In fact for arrays 
with more than four spare rows, the overhead is less than the competing 
approaches,
• The control circuitry has the inherent ability to tolerate certain faults 
occurring within it. In addition, it can be designed so that it can 
tolerate a range of other faults without affecting the operation of the 
remainder of the array,
• The technique results in good utilisation of the functional processors 
particularly when processor yield is greater than 80%,
• No global control lines are required,
• The array is potentially capable of being re-configured in the event of 
an in-service failure and could therefore be useful for remotely sited 
equipment, or in equipment requiring a very fast repair time.
The fact that the hardware overhead of the WINNER technique compares 
favourably with competing techniques is a somewhat surprising result. It is 
due to the fact that the required configuration of the functional array is 
calculated by the control circuitry itself and as a result there is no need to 
use latches to store switch control information within the array.
A weakness of any fault tolerant strategy is that faults occurring in the 
switching circuitry can render the entire array unusable. This problem does 
not appear to have been addressed in the configuration techniques published 
in the literature. However, we have shown that the WINNER approach 
has the intrinsic ability to tolerate some faults in the control circuitry. In 
addition, we have developed a technique which enables many more control 
circuit faults to be tolerated automatically. We have also shown that all of 
the control circuits in the array can be fully checked by a simple external 
test if a simple novel scan-path arrangement is incorporated into the array. 
Faulty control circuits can subsequently be masked from the rest of the array 
before configuration takes place.
We have shown that the WINNER concept is practically realisable by 
constructing a small demonstrator based on an array of printed circuit boards. 
The array demonstrates the operation of the basic WINNER configuration 
algorithm and also allows real faults to be introduced into the control cir­
cuitry (by shorting signal lines high or low). These faults are then detected 
by the scan-path testing scheme and after their effect has been masked from 
the rest of the array correct configuration of the array is demonstrated. As 
described in the following section, the WINNER technique is currently being 
used by a UK company as part of their research into Wafer Scale Integration.
Finally, we have briefly outlined some of the applications which could 
benefit from the WINNER approach. These include high-availability sys­
tems, silicon hybrid construction and reliability improvement, and Wafer 
Scale Integration. We have identified several systems using orthogonally in­
terconnected arrays of processors which could potentially benefit from the 
WINNER technique.
12.2 Suggestions for Further Work
As far as possible, the problems concerning the application of WINNER 
to orthogonally interconnected two-dimensional arrays have been addressed 
within this thesis. In the context of further research outside the scope of this 
thesis, however, there are several areas on which attention could be focussed.
12.2.1 Fabrication of a Monolithic Circuit
The real test of WINNER will occur when it is used in the design and fabrication 
of a monolithic circuit. It is only when used in a real application that 
the technique will be fully tested and any weaknesses uncovered.
In the past few months, a British company has shown interest in the 
WINNER technique and intends to use it as the basis for its in-house research 
program into wafer scale integration. This is very encouraging both from the 
point of view of WINNER and of the long term view and commitment being 
shown by the company concerned.
The project will contain a number of stages, the first of which will be to 
demonstrate that the control circuitry can be fabricated with a sufficiently 
high yield and that some functional devices can be obtained. Initially only 
a very simple processing element will be used, such as that used in the 
demonstrator reported in chapter 10. If necessary, the dual-rail techniques 
described in chapter 9 will be used to increase control circuit yield. When it 
has been demonstrated that sufficient control circuit yield can be achieved, 
a larger processor will be used and a practical wafer scale circuit will be 
designed.
At present the project is at an early stage and no results have yet been 
obtained.
12.2.2 Extension to Tree Structures
It may be possible to apply the concept of self-organisation to other inter­
connection patterns, in particular to tree structures. Tree structures are of
interest in the field of artificial intelligence because they allow searching of 
rule bases to be carried out in parallel (Hillis, 1986). One of the attrac­
tions of the tree structure, particularly in wafer scale integration is that only 
a single processing element needs to communicate with the outside world. 
However, one of the problems of developing a self-organising tree structure 
is that the optimum configuration is likely to depend on the size of the tree 
which can be configured, and of course this will not be known at the start 
of the configuration process.
12.2.3 Fault Tolerant Switching Networks
There is much interest in the literature in Interconnection Networks for com­
puters; see for example the review by Adams, Agrawal and Siegel (1987). 
Engineers are interested in designing switching networks capable of 
interconnecting any of a group of machines to any other in the group. This 
represents a challenge in itself in terms of designing a switch with the ap­
propriate trade-off between hardware used to implement the switch and the 
flexibility of the switch. Also of interest is the ability of the interconnection 
networks to tolerate faults, so that in the event of a fault in the switch, an 
alternative route between the required machines is available.
The link between this work and WINNER is that WINNER is a type of 
fault tolerant switching network. It may be possible, with some modification 
of the particular WINNER algorithm presented in this thesis, to apply a 
similar approach to this area. For example, in an interconnection network, 
the removal of a connection between two parties should have no effect on the 
machines still connected. In the basic WINNER approach, the removal of 
a functional row would cause all functional rows below the one removed to 
reconfigure to take up the vacated space.
An initial, very brief study has been carried out which suggests that this 
problem can be overcome by including latches in the WINNER control cir­
cuitry to store the configuration pattern of each functional row until the row 
is no longer required. Space vacated by the removal of interconnections would
be used wherever possible when new interconnections are set up. However, 
much more work is required in this area before useful algorithms emerge.
Bibliography
Adams G B, Agrawal D P and Siegel H J (1987), Fault-Tolerant Multistage 
Interconnection Networks, IEEE Computer, June 1987, pp 14-27.
Aubusson R C and Catt I (1978), Wafer Scale Integration - a Fault Tolerant 
Procedure, IEEE J of Solid State Circuits, Vol 13, No 3, pp 339-344.
Avizienis A (1976), Fault-Tolerant Systems, IEEE Trans. on Computers, 
C-25, No.12, December 1976, pp 1304-1312.
Barsuhn H (1969), Functional Wafer - A New Step in LSI, 1977 European 
Solid State Circuits Conference (Ulm, Germany), pp 79-80.
Bentley L and Jesshope C R (1986), The Implementation of a Two-dimen­
sional Redundancy Scheme in a Wafer Scale High Speed Disk Memory, 
in Wafer Scale Integration, C R Jesshope and W R Moore eds, Bristol 
UK, Adam Hilger, pp 187-197.
Calhoun D F (1969), The Pad Relocation Technique for Interconnecting LSI 
Arrays of Imperfect Yield, Proc 1969 Fall Joint Computer Conference, 
pp 99-109.
Catt I (1981), Wafer Scale Integration, Wireless World, July 1981, pp 57-59.
Chapman (1985), Laser-Linking Technology for RVLSI, Proc International 
Workshop on Wafer Scale Integration, Southampton University, July 
1985, pp 204-215.
Chen W, Mavor J, Denyer P B and Renshaw D (1988), Superchip Architec­
ture for Implementing Large Integrated Systems, Proc IEE Part E, Vol 
135, No 3, May 1988, pp 137-150.
Cliff R A and Rao T R N (1974), Improving Yield of LSI Memory Chips by 
the Application of Coding, 8th Princeton Conf on Information Science 
and Systems, pp 386-390.
Cole B C (1985), Wafer Scale Integration Faces Pessimism, Electronics Week, 
April 1, 1985, pp 49-53.
Coleman J N and Lea R M (1986), Clock Distribution Techniques for Wafer 
Scale Integration, in Wafer Scale Integration, eds Moore and Jesshope, 
Adam Hilger, 1986, pp 46-53.
Daniels R G and Bruce W C (1985), Built-in Self-test Trends in Motorola 
Microprocessors, IEEE Design and Test, April 1985, pp 64-71.
Eckert J P Jr, Weiner J R, Welsh H F and Mitchell H F (1951), The UNIVAC 
System in Bell and Newell 1971, pp 157-169.
Eichelberger E B and Williams T W (1977), A Logic Design Structure for 
LSI Testability, 14th Annual Design Automation Conference, June 
1977, pp 462-468.
Evans R A, McCanny J V, McWhirter J G, McCabe, Wood D and Wood K 
W (1983), A CMOS Implementation of a Systolic Multi-bit Convolver 
Chip, Proc VLSI-83, Trondheim, Norway, 1983, pp 227-235.
Evans R A (1985), A Self Organising, Fault Tolerant, 2-Dimensional Array, 
Proc. VLSI-85, Tokyo, Japan, ed E Hoebst, North Holland 1986, pp 
239-248.
Evans R A, McCanny J V and Wood K W (1985), Wafer Scale Integration 
Based on Self-Organisation, Proc. Workshop on Wafer Scale Integration, 
Southampton, ed C Jesshope and W Moore, Adam Hilger, 1986,
pp 101-112.
Evans R A (1986), Wafer Scale Integration of Systolic and other Two-dimen­
sional Processor Arrays, proc IFIP Workshop on Wafer Scale Integra­
tion, Grenoble, March 1986.
Evans R A and McWhirter J G (1987), A Hierarchical Testing Strategy for 
Self-organising Fault-tolerant Arrays, in ‘Systolic Arrays’ , eds Moore, 
McCabe and Urquhart, Adam Hilger (Bristol) UK, pp 229-238.
Evans R A (1989a), Wafer Scale Integration in Design and Test Techniques 
for VLSI and WSI Circuits, R E Massara ed., Peter Peregrinus Ltd, 
London, to be published 1989.
Evans R A (1989b), Self-Organising Arrays for Wafer Scale Integration in 
Design and Test Techniques for VLSI and WSI Circuits, R E Massara 
ed., Peter Peregrinus Ltd, London, to be published 1989.
Ferris-Prabhu A V, Smith L D, Bonges H A and Paulsen J K (1987), Radial 
Yield Variations in Semiconductor Wafers, IEEE Circuits and Devices 
Magazine, Vol 3, No 2, March 1987, pp 42-47.
Fitzgerald B F and Thoma E P (1980), Circuit Implementation of Fusible 
Redundant Addresses in RAMs for Productivity Enhancement, IBM Journal 
of Research and Development, Vol 24, No 3, pp 291-298.
Flynn M J (1972), Some Computer Organisations and their Effectiveness, 
IEEE Trans on Computers, C-21, pp 948-960.
Franzon P (1986), Interconnect Strategies for Fault Tolerant 2D VLSI 
Arrays, proc ICCD, 1986.
Frohwerk R A (1977), Signature Analysis - a new Digital Field Service 
Method, Hewlett Packard Journal, May 1977, pp 2-8.
Gaverick S L and Pierce E A (1983), A Single Wafer 16-point 16 MHz FFT 
Processor, Proc 1983 Custom Integrated Circuits Conference, IEEE, pp 
244-248.
Grinberg J et al (1984), 3D Computing Structures for High Throughput 
Information Processing, in VLSI Signal Processing, P R Cappello ed,
1984, pp 2-14.
Gupta A and Lathrop J W (1972), Yield Analysis of Large Integrated 
Circuit Chips, IEEE J. Solid-State Circuits, vol SC-7, pp 389-395, Oct 
1972.
Hargrave P J (1986) A VHPIC Node Processor, Proc 1986 Military Microwave 
Conference, pp 405-410.
Hagge J K (1988), Ultra-reliable Packaging for Silicon-on-silicon WSI, 38th 
Electronic Components Conference, May 1988.
Hedlund K S and Snyder L (1982), Wafer Scale Integration of Configurable 
Highly Parallel (CHiP) Processor, Proc 1982 Int Conference on Parallel 
Processing, IEEE, pp 262-264.
Hedlund K (1985), WASP - A WAfer Scale Systolic Processor, proc ICCD
1985, pp 665-671.
Hillis W D (1986), The Connection Machine, MIT Press, Cambridge, Mas­
sachusetts, ISBN 0-262-08157-1, 1986.
Hockney R W and Jesshope C R (1981), Parallel Computers, Adam Hilger 
Ltd, Bristol, UK.
Holland J H (1959), A Universal Computer Capable of Executing an 
Arbitrary Number of Sub-programs Simultaneously, Proc 16th East Joint 
Computer Conference, pp 108-113.
Hsia Y, Chang G C C and Erwin F D (1979), Adaptive Wafer Scale 
Integration, Proc 11th Conference on Solid State Devices, Tokyo, Jap J of 
Applied Physics, Vol 19, pp 193-202, Supplement 19-1.
Hu S M (1979), Some Considerations in the Formulation of IC Yield 
Statistics, Solid State Electronics, Vol 22, pp 205-211, February 1979.
Inmos Ltd (1984), Inmos Preliminary data sheet IMS T424 Transputer, 1984.
Katevenis M G H and Blatt M G (1985) Switch Design for Soft-Configurable 
WSI Systems, Proc 1985 VLSI Conference.
Kent F P (1983), Yield Enhancement of Integrated Circuits by Fault Toler­
ant Design, BSc Project Report, University of Southampton.
Ketchen M B (1985), Point Defect Yield Model for Wafer Scale Integration, 
IEEE Circuits and Devices Magazine, July 1985, pp 24-34.
Kobayashi A, Matsue S and Shibe H, (1968), A Flip-flop circuit with FLT 
(Fault-location-technique) capability, (in Japanese), Proc 1968 IECEO 
Conference, p 962.
Kuck D J (1977), A Survey of Parallel Machine Organisation and Program­
ming, IEEE Trans Comput. C-17, pp 758-70.
Kung H T (1980), Algorithms for VLSI Processor Arrays in Introduction to 
VLSI Systems by C Mead and L Conway Addison-Wesley 1980, p 271.
Kung S Y and Gal-Ezer R J (1982), Synchronous vs. Asynchronous Com­
putation in VLSI Array Processors, Proc SPIE Conference, 1982, Ar­
lington, VA.
Landman B S and Russo R L (1971), On Pin versus Block Relationship for 
Partition of Logic Graphs, IEEE Transactions on Computers, Vol C-20, 
No 12, December 1971, pp 1469-1479
Lathrop J W et al (1967), A Discretionary Wiring System as the Interface 
Between Design Automation and Semiconductor Manufacture, Proc 
IEEE, Vol 55, 1967, pp 1988-1997.
Lawson T R Jr (1966), A Prediction of the Photoresist Influence on 
Integrated Circuit Yield, SCP and Solid State Technology, July 1966, pp 
22-25.
MacWilliams F J and Sloane N J (1977), Theory of Error Correcting Codes, 
Elsevier, North Holland, 1977.
Mangir T E (1984), Sources of Failures and Yield Improvement for VLSI: 
Part I, Proc IEEE, Vol 72, No 2, June 1984, pp 690-708.
Manning F B (1977), An Approach to Highly Integrated Computer-maintain­
ed Cellular Arrays, IEEE Trans on Computers, C26, pp 536-552.
McCanny J V and McWhirter J G, (1982), Implementation of Signal Pro­
cessing Functions using 1-bit Systolic Arrays, Electronics Letters, Vol 
18, 1982, pp 241-243.
McCanny J V and McWhirter J G (1983), Yield Enhancement of Bit-level 
Systolic Array Chips using Fault Tolerant Techniques, Electronics Let­
ters, Vol 19, No 14, July 1983, pp 525-527.
Meindl J D (1985), Interconnection Limits on Ultra Large Scale Integration, 
Proc VLSI-85, Tokyo, North Holland, pp 13-19.
Meindl J D (1987), Chips for Advanced Computing, Scientific American, 
October 1987, pp 54-62.
Moore G E (1970), What Level of LSI is best for you?, Electronics, Vol 43, 
pp 126-130, February 16 1970.
Moore W R and Day M J (1984), Yield Enhancement of a large Systolic Ar­
ray Chip, Microelectronics and Reliability, Vol 25, No 2, 1985, pp 291- 
294.
Moore W R and Mahat R (1985), Fault Tolerant Communications for Wafer 
Scale Integration of a Processor Array, Microelectronics and Reliability, 
Vol 25, No 2, 1985, pp 291-294.
Moore W R, McCabe A P H and Bawa V (1986), Fault Tolerance in a Large 
Bit-level Systolic Array, in Wafer Scale Integration, C R Jesshope and 
W R Moore eds, Bristol UK, Adam Hilger 1986, pp 259-272.
Moore W R (1986), A Review of Fault Tolerant Techniques for the 
Enhancement of Integrated circuit Yield, IEEE Computer, Vol 74, No 5, pp 
684-698.
Morison J D, Peeling N E and Thorp T L (1982), ELLA: A Hardware De­
scription Language, Proc IEEE ICCC’82, September 1982, pp 604-607.
Muroga S (1982), VLSI Systems Design, Publ. John Wiley and Sons, pp 
264-266.
Murphy B T (1964), Cost-Size Optima of Monolithic Integrated Circuits, 
Proceedings IEEE, Vol 52, December 1964, pp 1537-1545.
Murphy B T (1971), Comments on: A New Look at Yield of Integrated 
Circuits, Proc IEEE, Vol 59, July 1971 p 1128.
Nomura H (1985), Current Status, Future Trends and Impact of VLSI, Proc 
VLSI-85, Tokyo, North Holland, p 3-11.
Paz O and Lawson T R (1977), Modification of Poisson Statistics: 
Modelling Defects Induced by Diffusion, IEEE J. Solid State Circuits, Vol 
SC-12, October 1977, pp 540-546.
Peltzer D L (1983), Wafer-Scale Integration: The Limits of VLSI?, VLSI 
Design, September 1983, pp 43-47.
Perloff D S, Wahl F E, Mallory C L and Mylroie S W (1981), Microelectron­
ic Test Chips in Integrated Circuit Manufacturing, Solid State Technol­
ogy, Sept 1981, pp 75-80.
Petritz R I (1967), Current Status of LSI Technology, IEEE J of Solid State 
Circuits, Vol SC-2, No 4, 1967, pp 130-147.
Pitt K E G (1987), WSI Packaging Problems, Proc IFIP Workshop on Wafer 
Scale Integration, Brunel University, 1987.
Posa J G (1981), What to do when the Bits go out, Electronics, July 28, pp 
117-120.
Price J E (1970), A New Look at Yield of Integrated Circuits, Proc IEEE,
Vol 58, August 1970, pp 1290-1291.
Raffel J I, Anderson A H, Chapman G H, Gaverick S L, et al (1983), A 
Demonstration of very large area Integration using Laser restructuring, Proc 
1983 Int Symp. on Circuits and Systems, IEEE, pp 781-784.
Reddaway S F (1973), DAP - A distributed array processor, 1st annual 
symposium on Computer Architecture (IEEE/ACM), Florida.
Rhodes F M (1986), Performance Characteristics of the RVLSI Technology,
Proc IFIP Workshop on Wafer Scale Integration, Grenoble, France,
1986.
Rogers T J (1982), Redundancy in RAMs, proc Int. Solid State Circuits 
Conference, February 1982, pp 228-229.
Sack K A (1964), Evolution of the Concept of a Computer on a Slice, Proc 
IEEE, Vol 52, 1964, pp 1713-1720.
Sami M and Stefanelli R (1983), Reconfigurable Architectures for VLSI 
Processing Arrays, Proc 1983 AFIPS National Computer Conference, 
Anaheim, California, pp 565-577.
Sami M G and Stefanelli R (1984), Fault Tolerance of VLSI Processor 
Arrays - the Time-Redundant Approach, Proc Real Time Systems Symp, 
IEEE, Austin.
Shaver D C (1984), Electron-Beam Customisation, Repair and Testing of Wa­
fer Scale Circuits, Solid State Technology, February 1984, pp 135-139.
Shore J E (1973), Second Thoughts on Parallel Processing, Comput. Elec­
trical Eng, Vol 1, pp 95-109.
Siegel H J, Siegel L J, Kemmerer F C, Mueller P T Jr, Smalley H E Jr and 
Smith S D (1981), PASM: A Partitionable SIMD/MIMD System for 
Image Processing and Pattern Recognition, IEEE Trans on Computers, 
Vol C-30, No 12, 1981, pp 934-946.
Siewiorek D P and Swarz R S (1982), The Theory and Practice of Reliable 
System Design, Digital Press, 1982, ISBN: 0-932376-13-4.
Slotnick D L, Borck W C and McReynolds R C (1962), The SOLOMON Com­
puter, AFIPS Conf proc. 22, pp 97-107.
Smith R T (1981), Using a laser beam to substitute good cells for bad, 
Electronics, July 28, 1981, pp 131-134.
Stapper C H, Armstrong F M and Saji K (1983), Integrated Circuit Yield Stat­
istics, Proc IEEE, Vol 71, No 4, April 1983, pp 453-470.
Stapper C H (1985), The Effects of Wafer to Wafer Defect Density 
Variations on Integrated Circuit Defect and Fault Distributions, IBM J. of 
Research and Development, Vol 29, January 1985, pp 87-97.
Stapper C H (1986), On Yield, Fault Distributions and Clustering Particles, 
IBM Journal of Research and Development, Vol 30, No 3, May 1986.
Stewart J H, (1977), Future Testing of Large LSI Circuit Cards, Semicon­
ductor Test Symposium, October 1977, pp 6-15.
Trilhe J and Saucier G (1987), An European program on Wafer Scale 
Integration, proc VLSI-87, Vancouver, August 1987.
Turley A P and Herman D S (1974), LSI Yield Predictions based upon Test 
Pattern Results: an Application to Multi-level Metal Structures, IEEE 
Trans Parts, Hybrids, Packaging, Vol PHP-10, Dec 1974, pp 230-234.
Unger S H (1958), A computer oriented towards spatial problems, Proc Inst 
Radio Eng (USA), 46, pp 1744-50.
Urquhart R B and Wood D, (1984), Systolic Matrix and Vector Multiplica­
tion Methods for Signal Processing, Proc IEE part F, Vol 131, 1984, 
pp 623-631.
Wakerley J (1978), Error Correcting Codes, Self Checking Circuits and Ap­
plications, Elsevier, North-Holland.
Warren K D, Abdelrazik M B E, McKirdy R D and Lea R M (1986), A 
Power Distribution Strategy for WSI, in Wafer Scale Integration, eds Moore 
and Jesshope, Adam Hilger, 1986, pp 54-61.
Warren K D and Lea R M (1987), Electrical Design Issues for Wafer Scale 
Integration, proc IFIP Workshop on Wafer Scale Integration, Brunel 
University, Sept 1987.
Williams M J Y and Angell J B (1973), Enhancing testability of Large Scale 
Integrated Circuits via test points and additional Logic, IEEE Transac­
tions on Computers 1973, pp 46-60.
Yanagawa T (1972), Yield Degradation of Integrated Circuits due to Spot 
Defects, IEEE Trans. on Electronic Devices, Vol ED-19, No 2, Feb 
1972, pp 190-197.
von Neumann J (1966), A System of 29 states with a general transition rule, 
in Theory of Self-reproducing Automata, ed A W Burks (Urbana, Illi­
nois: University of Illinois press) pp 305-17. (First published in 1952).
Appendix A
WINNER Performance 
Simulation Program
This appendix contains a listing of the program used for simulating the per­
formance of the WINNER 3-neighbour algorithm. It is written in ALGOL- 
68RS and was run on a DEC-VAX 8600. Comments have been included in 
the text to aid in its understanding.
A.1 Program Overview
The program allows statistical data on the WINNER self-configuring algo­
rithm to be gathered for different sizes of target array, different values of 
processing element yield, and for a variety of values of processing element 
overhead.
The user specifies a target array size, range of overhead, range of processor 
yield and the number of samples, N, to be evaluated and averaged for each 
position in the table of results. Then, for each combination of overhead and 
processor yield the program configures N arrays and records the number of 
arrays from which a functional array at least equal to the target array can 
be configured. Finally, these values are then plotted as a table.
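The overall sampling procedure described above can be summarised by the following minimal sketch. It is written in Python rather than ALGOL-68RS and is not the program listed in this appendix. The configuration step is passed in as a function, and the default stand-in used here (counting only physical rows that are entirely fault-free) is deliberately crude: it is not the WINNER algorithm and is included only so that the sketch runs. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Array = List[List[bool]]  # True marks a working cell


def random_array(phys_rows: int, cols: int, yield_pct: float,
                 rng: random.Random) -> Array:
    """Generate an array in which each cell works with probability yield_pct %."""
    return [[rng.random() * 100.0 < yield_pct for _ in range(cols)]
            for _ in range(phys_rows)]


def count_fault_free_rows(array: Array) -> int:
    """Stand-in for the configuration step (NOT the WINNER algorithm)."""
    return sum(all(row) for row in array)


def success_table(rows: int, cols: int,
                  overheads: Sequence[int], yields: Sequence[float],
                  samples: int,
                  configure: Callable[[Array], int] = count_fault_free_rows,
                  seed: int = 1) -> Dict[Tuple[int, float], int]:
    """For each (overhead, yield) pair, count how many of `samples` random
    arrays configure into at least `rows` functional rows."""
    rng = random.Random(seed)
    table: Dict[Tuple[int, float], int] = {}
    for ovhd in overheads:
        for yield_pct in yields:
            successes = 0
            for _ in range(samples):
                array = random_array(rows + ovhd, cols, yield_pct, rng)
                if configure(array) >= rows:
                    successes += 1
            table[(ovhd, yield_pct)] = successes
    return table
```

A call such as success_table(4, 4, [0, 1, 2], [80, 90, 100], 50) returns one count per table position. To approximate the results reported in the thesis, a faithful implementation of the configuration procedure would have to be supplied as `configure`.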
In order to minimise the CPU time required, the main program has two 
distinct parts. The first part, represented by the procedure ‘find band’, 
searches for the band of results which separates the 0% values from the 100%
values in the table of results. Once found, a recursive procedure, ‘evaluate 
band’, is called which evaluates the band region only. This approach reduces 
the CPU time by a factor of about 4 compared with that required to 
evaluate every point in the results table. The configuration of each array 
using the 3-neighbour WINNER algorithm is carried out by the procedure 
called ‘ findrows’ .
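The two-part search can be illustrated with the following Python sketch. It follows the structure of ‘find band’ and ‘evaluate band’ as read from the listing, but the function names, the zero-based indexing and the bounds checks in the first phase are assumptions made for the sketch. `sample(o, y)` is a hypothetical callback returning the number of successful configurations out of `iterations`.

```python
def evaluate_band_only(n_ovhd, n_yield, iterations, sample):
    """Evaluate only the table entries in or next to the non-extreme band.
    Returns a dict {(overhead_index, yield_index): successes}."""
    results = {}

    def evaluate(o, y):
        results[(o, y)] = sample(o, y)

    # Phase 1 ('find band'): start at lowest overhead and highest yield,
    # then step until a value strictly between 0 and `iterations` is found.
    o, y = 0, n_yield - 1
    while True:
        evaluate(o, y)
        if results[(o, y)] == 0 and o + 1 < n_ovhd:
            o += 1                      # all failed: try more overhead
        elif results[(o, y)] == iterations and y > 0:
            y -= 1                      # all succeeded: try a lower yield
        else:
            break

    # Phase 2 ('evaluate band'): spread recursively from that point.
    def spread(o, y):
        here = results[(o, y)]
        steps = []
        if here != 0:                   # lower yield/overhead may be non-zero
            steps += [(o, y - 1), (o - 1, y)]
        if here != iterations:          # higher yield/overhead may be < 100%
            steps += [(o, y + 1), (o + 1, y)]
        for p, q in steps:
            if 0 <= p < n_ovhd and 0 <= q < n_yield and (p, q) not in results:
                evaluate(p, q)
                spread(p, q)

    spread(o, y)
    return results
```

Entries that never appear in the returned dictionary lie outside the band and were not sampled at all, which is where the saving in CPU time comes from.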
A.2 Suitability for Other Algorithms
Although this particular program is only suitable for simulating the 3-neighbour
WINNER algorithm, its general structure has been used to simulate other
algorithms to produce the results given in chapter 6. This was done by altering
the procedure ‘findrows’ to correspond to the required algorithm, with
the number of configured rows being recorded by ‘functrow’.
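As an illustration of the role played by ‘findrows’, the following Python sketch follows the 3-neighbour procedure as read from the scanned listing (edge set-up, availability update, and row construction with preference for the upper neighbour). It is a re-expression under stated assumptions rather than the original code: the function name `find_rows`, the list-of-lists fault map and the returned row map are all hypothetical.

```python
def find_rows(failure):
    """failure[r][c] is True for a faulty cell. Returns (number of functional
    rows found, map of functional row numbers with 0 for unused cells)."""
    nrows, ncols = len(failure), len(failure[0])
    # One-cell border around a 1-based interior: the top and bottom border
    # rows and the left border column are unavailable; the right border
    # column is available so that the last real column can always connect.
    avail = [[0 < i <= nrows and j > 0 for j in range(ncols + 2)]
             for i in range(nrows + 2)]
    rowno = [[0] * (ncols + 2) for _ in range(nrows + 2)]

    def update_avail():
        # Sweep right to left: a cell is available only if it is unused,
        # working, and has an available neighbour in the next column.
        for j in range(ncols, 0, -1):
            for i in range(1, nrows + 1):
                avail[i][j] = (
                    avail[i][j]
                    and (avail[i + 1][j + 1] or avail[i][j + 1]
                         or avail[i - 1][j + 1])
                    and not failure[i - 1][j - 1]
                )

    functrow = 0
    for i in range(1, nrows + 1):
        update_avail()
        if not avail[i][1]:
            continue
        functrow += 1
        phys = i
        avail[i][1] = False
        rowno[i][1] = functrow
        for j in range(2, ncols + 1):
            if avail[phys - 1][j]:        # prefer the upper neighbour
                phys -= 1
            elif not avail[phys][j]:      # otherwise straight on, else down
                phys += 1
            avail[phys][j] = False
            rowno[phys][j] = functrow
    return functrow, [r[1:ncols + 1] for r in rowno[1:nrows + 1]]
```

For a 3 x 3 array with a single faulty cell in the centre, this sketch finds two functional rows, the second of which detours through the bottom row to avoid the fault.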
A.3 Program Listing
PROGRAM statistical configuration
CO Simulation of the WINNER algorithm with three-neighbour
   connectivity.
   VARIABLES
   physrow    - row-number of physical rows on the array.
   functrow   - row-number of functional rows on the array.
   rows       - target for number of rows to be configured.
   cols       - target for number of columns to be configured.
   randstart  - parameter used by 'nextrandom'.
   chipyield  - percentage yield of cells in array.
   cell       - array of elements to be configured.
   minovhd    - minimum value of the overhead range (in rows).
   maxovhd    - maximum value of the overhead range (in rows).
   minyield   - minimum value of the yield range (in %).
   maxyield   - maximum value of the yield range (in %).
   yieldstep  - incremental yield between minyield and maxyield.
   iterations - number of samples for each result in the table.
CO
BEGIN
INT physrow, functrow, rows, cols, randstart;
INT minovhd, maxovhd, minyield, maxyield, yieldstep, iterations;
INT ovhdrange, yieldrange, ovhdelem, yieldelem;
CHAR file;
MODE CELL = STRUCT(BOOL failure, avail, INT rowno);
{ ************ Read in data ************ }
print(("Enter target array in rows and columns",newline));
read((rows, cols));
print(("Enter min and max overhead in rows",newline));
read((minovhd, maxovhd));
print(("Enter yield range and step (%)",newline));
read((minyield, maxyield, yieldstep));
print(("Enter number of iterations required",newline));
read(iterations);
print(("Output to file? <y or n>",newline));
read(file);
{ ************ Set up output file and channel ************ }
FILE arrayout;
REF FILE current = IF file = "y"
                   THEN INT est;
                        { outputfile is logical name }
                        est:=establish(arrayout, "outputfile",
                                       standoutchannel,1,1000,150);
                        IF est/=0
                        THEN fault(est,"Fault in establish")
                        FI;
                        arrayout
                   ELSE standout
                   FI;
{ ************* Set up data dependent arrays ************** }
ovhdrange := maxovhd-minovhd+1;
yieldrange := (maxyield-minyield)%yieldstep+1;
{ Set up array to hold results as they are generated }
[ovhdrange,yieldrange]REAL bin;
{ Set up array to record which points in result table have }
{ been evaluated. }
[ovhdrange,yieldrange]BOOL been;
clear(bin);
clear(been);
{ ********* Print out initial data to output file ********* }
putf(current, ($ "Target array size = " 2zd," rows by"2zd" columns."2l $,
               rows,cols));
putf(current, ($ "Overhead Range = "2zd," rows to "2zd" rows. "2l $,
               minovhd,maxovhd));
putf(current, ($ "Yield Range = "2zd," to "2zd," %, Step = "2zd" %"2l $,
               minyield,maxyield,yieldstep));
putf(current, ($ "Number of Iterations = "3zd 2l $, iterations));
{ ********** Start of procedure declarations *********** }
PROC setfaults = (REF[,]CELL cell, INT ovhd, chipyield)VOID:
{ Sets up random fault distribution for array to be configured }
BEGIN
INT goodrow, goodcol, goodcells, goodposition;
INT actualrows:=rows+ovhd;
FORALL cc IN cell
DO FORALL c IN cc DO failureOFc:=TRUE OD OD;
goodcells := ENTIER(actualrows*cols*chipyield/100);
WHILE goodcells/=0
DO goodposition:=
      ENTIER(nextrandom(randstart)*actualrows*cols+1);
   goodcol:=(goodpositionMODcols);
   IF goodcol=0 THEN goodcol:=cols FI;
   goodrow:=ENTIER((goodposition-goodcol)/cols)+1;
   IF NOT failureOFcell[goodrow,goodcol]
   THEN SKIP
   ELSE failureOFcell[goodrow,goodcol]:=FALSE;
        goodcells MINUSAB 1
   FI
OD
END;
PROC setedges = (REF[,]CELL cell, INT ovhd)VOID:
{ Sets up edge conditions in array to be configured }
BEGIN
INT actualrows:=rows+ovhd;
FOR i FROM 0 TO actualrows+1
DO
   FOR j FROM 0 TO cols+1
   DO availOFcell[i,j]:=
         IF j=0 OREL i=0 OREL i=actualrows+1
         THEN FALSE
         ELSE TRUE
         FI
   OD
OD
END;
PROC updateavail = (REF[,]CELL cell, INT ovhd)VOID:
{ Updates availability signals after each functional }
{ row is established. }
BEGIN
INT actualrows:=rows+ovhd;
FOR j FROM cols BY -1 TO 1
DO FOR i TO actualrows
   DO availOFcell[i,j]:=
         availOFcell[i,j] ANDTH
         (availOFcell[i+1,j+1] OREL
          availOFcell[i,j+1] OREL
          availOFcell[i-1,j+1]) ANDTH NOT
         failureOFcell[i,j]
   OD
OD
END;
PROC findrows = (REF[,]CELL cell, INT ovhd)VOID:
{ Finds as many functional rows as possible }
BEGIN
INT actualrows:=rows+ovhd;
REF CELL ptr := NIL;
{ clear(rownoOFcell); }
FORALL cc IN cell
DO FORALL c IN cc DO rownoOFc:=0 OD OD;
functrow:=0;
FOR i TO actualrows
DO updateavail(cell,ovhd);
   IF availOFcell[i,1]
   THEN physrow:=i;
        functrow PLUSAB 1;
        rownoOFcell[i,1]:=functrow;
        availOFcell[i,1]:=FALSE;
        FOR rhcell FROM 2 TO cols
        DO IF ptr := cell[physrow-1,rhcell];
              availOFptr
           THEN physrow MINUSAB 1
           ELIF ptr := cell[physrow,rhcell];
                availOFptr
           THEN SKIP
           ELSE ptr := cell[physrow+1,rhcell];
                physrow PLUSAB 1
           FI;
           availOFptr := FALSE;
           rownoOFptr := functrow
        OD
   FI
OD
END;
PROC printarray = (REF[,]CELL cell, INT ovhd, chipyield)VOID:
{ Prints configured array }
BEGIN
INT actualrows:=rows+ovhd;
putf(current, ($"Chip Yield = " 3zd.d " Percent"
               2l$,chipyield));
putf(current, ($"Number of Functional Rows found = "3zd 2l$,functrow));
putf(current, $ n(cols)(aq)l $);
FOR i TO actualrows
DO FOR j TO cols
   DO putf(current,
           IF failureOFcell[i,j] THEN "x"
           ELIF rownoOFcell[i,j]=0 THEN
           ELSE whole(rownoOFcell[i,j] MOD 10,0)
           FI)
   OD
OD
END;
PROC configure array =
     (REF [,]CELL cell, INT ovhd, chipyield)VOID:
{ Sets up and configures a single complete array }
BEGIN
setedges(cell,ovhd);
setfaults(cell,ovhd,chipyield);
findrows(cell,ovhd)
END;
PROC evaluate point = (INT ovhdelem, yieldelem)VOID:
{ Evaluates one point in the table of results }
BEGIN
INT ovhd:=ovhdelem+minovhd-1;
[0 : rows+ovhd+1, 0 : cols+1] CELL cell;
INT chipyield:=minyield+yieldstep*(yieldelem-1);
TO iterations
DO configure array(cell,ovhd,chipyield);
   IF functrow>=rows
   THEN bin[ovhdelem,yieldelem] PLUSAB 1
   FI
OD;
been[ovhdelem,yieldelem]:=TRUE
END;
PROC find band = VOID:
{ Searches for band of non-extreme values in table of results }
BEGIN
WHILE evaluate point(ovhdelem,yieldelem);
      IF bin[ovhdelem,yieldelem]=0
      THEN ovhdelem PLUSAB 1;
           TRUE
      ELIF bin[ovhdelem,yieldelem]=iterations
      THEN yieldelem MINUSAB 1;
           TRUE
      ELSE FALSE
      FI
DO SKIP OD
END;
PROC evaluate band = (INT ovhdelem, yieldelem)VOID:
{ Evaluates all points within non-extreme band }
BEGIN
IF yieldelem>1
THEN IF NOT been[ovhdelem,yieldelem-1] ANDTH
        bin[ovhdelem,yieldelem]/=0
     THEN evaluate point(ovhdelem, yieldelem-1);
          evaluate band(ovhdelem,yieldelem-1)
     FI
FI;
IF ovhdelem>1
THEN IF NOT been[ovhdelem-1,yieldelem] ANDTH
        bin[ovhdelem,yieldelem]/=0
     THEN evaluate point(ovhdelem-1, yieldelem);
          evaluate band(ovhdelem-1,yieldelem)
     FI
FI;
IF yieldelem<yieldrange
THEN IF NOT been[ovhdelem,yieldelem+1] ANDTH
        bin[ovhdelem,yieldelem]/=iterations
     THEN evaluate point(ovhdelem,yieldelem+1);
          evaluate band(ovhdelem,yieldelem+1)
     FI
FI;
IF ovhdelem<ovhdrange
THEN IF NOT been[ovhdelem+1,yieldelem] ANDTH
        bin[ovhdelem,yieldelem]/=iterations
     THEN evaluate point(ovhdelem+1, yieldelem);
          evaluate band(ovhdelem+1,yieldelem)
     FI
FI
END;
PROC printresults = VOID:
{ Prints table of results }
BEGIN
put(current, ("SIMULATION OF WINNER WITH CONNECTIONS TO 3 NEIGHBOURS",
              newline, newline));
put(current, ("PERCENTAGE ARRAY YIELD AS A", newline));
put(current, ("FUNCTION OF OVERHEAD AND PROCESSOR YIELD",
              newline,newline));
put(current, (" OVERHEAD (ROWS)", newline,newline));
putf(current, $2zdx "|", n(yieldrange)(x2z.z) l$);
FOR ovhd FROM maxovhd BY -1 TO minovhd
DO putf(current, ovhd);
   FOR yieldelem TO yieldrange
   DO putf(current,
           IF ENTIER bin[ovhd-minovhd+1,yieldelem]
              MOD 100 = 0
           THEN 0
           ELSE bin[ovhd-minovhd+1,yieldelem]
           FI)
   OD
OD;
putf(current, $4x n(5*yieldrange+1)"-" l 6x
              n(yieldrange)(2zd2x) l 4x
              n(yieldrange*5%2-9)x
              "PROCESSOR YIELD (%)"$);
FOR yield FROM minyield BY yieldstep TO maxyield
DO putf(current, yield) OD
END;
{ ****************** Main Program ****************** }
{ Set starting point for ovhdelem and yieldelem }
ovhdelem := 1;
yieldelem := yieldrange;
find band;
evaluate band(ovhdelem, yieldelem);
{ Convert results to percentages }
FOR m TO ovhdrange
DO FOR n TO yieldrange
   DO bin[m,n]:=bin[m,n]/iterations*100 OD
OD;
printresults
END
FINISH
Appendix B
ELLA Simulation Program
This file contains the ELLA description of the hardware required to
implement the WINNER algorithm. The array of WINNER cells has been
described in parameterised form so that arrays of differing sizes can be
simulated relatively easily. A simple processing element is used so that program
complexity is kept to a minimum. The function of each processor in both
the horizontal and vertical directions is to add one to an incoming digit and
pass on the results to neighbouring cells.
B.1 Program Listing
INT m = 6,
    n = 6.
# m and n determine the size of the array #
# as defined by the macro 'array' #
TYPE bool = NEW (t|f|x),
     data = NEW da/(0..200),
     limint = NEW i/(1..n).
COM
The gates which carry out the boolean functions AND, NOT
and OR are defined below. Gates MUX_AND and MUX_OR are based
on their boolean equivalents, but act as multiplexers such
that all bar one of the inputs are controls and determine
the output (which is an ELLA integer type). TWO_IP_MUX
defines a multiplexer with two inputs and a control line,
and PROCESSOR defines a mathematical function (in this case,
to add one to an ELLA integer).
MOC
# ---------- AND ---------- #
FN AND = (bool: input1 input2) -> bool:
CASE (input1, input2)
OF (t, t): t,
   (f, bool)|(bool, f): f
ELSE x
ESAC.
# ---------- NOT ---------- #
FN NOT = (bool: input) -> bool:
CASE input
OF f: t,
   t: f
ELSE x
ESAC.
# ---------- THREE_IP_AND ---------- #
FN THREE_IP_AND = ([3]bool: ip) -> bool:
CASE ip OF
   (t, t, t): t,
   (f, bool, bool)|(bool, f, bool)|(bool, bool, f): f
ELSE x
ESAC.
# ---------- THREE_IP_OR ---------- #
FN THREE_IP_OR = ([3]bool: ip) -> bool:
CASE ip OF
   (f, f, f): f,
   (t, bool, bool)|(bool, t, bool)|(bool, bool, t): t
ELSE x
ESAC.
# ---------- FOUR_IP_AND ---------- #
FN FOUR_IP_AND = ([4]bool: ip) -> bool:
CASE ip
OF (f, bool, bool, bool)|(bool, f, bool, bool)|
   (bool, bool, f, bool)|(bool, bool, bool, f): f,
   (t, t, t, t): t
ELSE x
ESAC.
# ---------- MUX_AND ---------- #
FN MUX_AND = (bool: ip1, data: ip2) -> data:
CASE ip1
OF t: ip2,
   f: da/0
ELSE ?data
ESAC.
# ---------- MUX_OR ---------- #
FN MUX_OR = ([3]data: ip) -> data:
CASE (ip[1]>da/0, ip[2]>da/0, ip[3]>da/0)
OF (t, f, f): ip[1],
   (f, t, f): ip[2],
   (f, f, t): ip[3]
ELSE ?data
ESAC.
# ---------- TWO_IP_MUX ---------- #
FN TWO_IP_MUX = (data: ip1 ip2, bool: ctrl) -> data:
CASE ctrl
OF t: ip1,
   f: ip2
ESAC.
# ---------- PROCESSOR ---------- #
# The boolean input 'passfail' determines whether #
# or not the cell is working #
FN PROCESSOR = (data: ip1 ip2, bool: ip3)
               -> (data, data, bool):
BEGIN
LET out1 = ip1 + da/1.
LET out2 = ip2 + da/1.
LET passfail = ip3.
OUTPUT (out1, out2, passfail)
END.
# ---------- CELL ---------- #
COM
This function connects the processor to the control
logic (which comprises those gates defined), such that
each cell is able to communicate with its neighbours.
MOC
FN CELL = (data: ipn ipnw ipw ipsw,
           bool: reqnw reqw reqsw
                 availse availe availne pf) ->
          ([4]data,
           # ops opse ope opne #
           [6]bool
           # availnw availw availsw reqse reqe reqne #):
BEGIN
MAKE PROCESSOR: processor,
     TWO_IP_MUX: two_ip_mux,
     [6]AND: and,
     [3]MUX_AND: mux_and,
     THREE_IP_AND: three_ip_and,
     FOUR_IP_AND: four_ip_and,
     [2]THREE_IP_OR: three_ip_or,
     MUX_OR: mux_or,
     [4]NOT: not.
JOIN (ipn, mux_or, pf) -> processor,
     (processor[1], ipn, and[5]) -> two_ip_mux,
     (mux_and[1], mux_and[2], mux_and[3]) -> mux_or,
     reqnw -> not[1],
     (not[1], and[3]) -> and[1],
     reqw -> not[2],
     (not[2], and[1]) -> and[2],
     (reqnw, ipnw) -> mux_and[1],
     (reqw, ipw) -> mux_and[2],
     (reqsw, ipsw) -> mux_and[3],
     (reqnw, reqw, reqsw) -> three_ip_or[1],
     (processor[3], three_ip_or[1]) -> and[4],
     (processor[3], three_ip_or[2]) -> and[3],
     (and[3], three_ip_or[1]) -> and[5],
     (availne, availe, availse) -> three_ip_or[2],
     availne -> not[3],
     (not[3], and[4], availe) -> three_ip_and,
     availe -> not[4],
     (not[4], not[3], and[4], availse) -> four_ip_and,
     (and[4], availne) -> and[6].
OUTPUT ((two_ip_mux, processor[2], processor[2],
         processor[2]), (and[3], and[1], and[2],
         four_ip_and, three_ip_and, and[6]))
END.
# ---------- MACRO : ARRAY ---------- #
COM
The macro ARRAY defines an array of cells with m
columns and n rows with inputs for those cells on the
edges of the array. FN ARRAY_OF_CELLS creates m x n
cells which are then connected together systematically,
each direction being considered separately.
MOC
MAC ARRAY{INT m n} = ([m]data: ipn, [m+n-1]data: ipnw ipsw,
                      [n]data: ipw,
                      [m+n-1]bool: reqnw reqsw availse availne,
                      [n]bool: reqw availe, [m][n]bool: pf) ->
                     ([m][n]data, [m][n]data):
                     # ops ope #
BEGIN
FN ARRAY_OF_CELLS = ([m][n][4]data: ip1, [m][n][7]bool: ip2)
                    -> [m][n]([4]data, [6]bool):
   [INT i=1..m][INT j=1..n]
      CELL (ip1[i][j][1], ip1[i][j][2],
            ip1[i][j][3], ip1[i][j][4],
            ip2[i][j][1], ip2[i][j][2],
            ip2[i][j][3], ip2[i][j][4],
            ip2[i][j][5], ip2[i][j][6],
            ip2[i][j][7]).
COM
In connecting the cells together, each cell is
considered separately for each direction. If the
cell is on the edge of the array, input connections
are made, otherwise the cell is connected to adjacent ones.
MOC
MAKE ARRAY_OF_CELLS: array.
JOIN ([INT i=1..m][INT j=1..n]
        (IF j=1 THEN ipn[i]
         ELSE array[i][j-1][1][1] FI,          # ipn #
         IF j=1 THEN ipnw[i]
         ELSE IF i=1 THEN ipnw[j]
              ELSE array[i-1][j-1][1][2]
              FI
         FI,                                    # ipnw #
         IF i=1 THEN ipw[j]
         ELSE array[i-1][j][1][3] FI,           # ipw #
         IF j=n THEN ipsw[i]
         ELSE IF i=1 THEN ipsw[j]
              ELSE array[i-1][j+1][1][4]
              FI
         FI
        ),
      [INT i=1..m][INT j=1..n]
        (IF j=1 THEN reqnw[i]
         ELSE IF i=1 THEN reqnw[j]
              ELSE array[i-1][j-1][2][4]
              FI
         FI,                                    # reqnw #
         IF i=1 THEN reqw[j]
         ELSE array[i-1][j][2][5] FI,           # reqw #
         IF j=n THEN reqsw[j]
         ELSE IF i=1 THEN reqsw[j]
              ELSE array[i-1][j+1][2][6]
              FI
         FI,                                    # reqsw #
         IF j=n THEN availse[i]
         ELSE IF i=m THEN availse[j]
              ELSE array[i+1][j+1][2][1]
              FI
         FI,                                    # availse #
         IF i=m THEN availe[j]
         ELSE array[i+1][j][2][2] FI,           # availe #
         IF j=1 THEN availne[i]
         ELSE IF i=m THEN availne[j]
              ELSE array[i+1][j-1][2][3]
              FI
         FI,                                    # availne #
         pf[j][i])) -> array.
LET colop = [INT i=1..m][INT j=1..n]array[i][j][1][1],
    rowop = [INT j=1..n][INT i=1..m]array[i][j][1][3].
COM
The output format is to output the signals from
each cell in the easterly and southerly directions.
'colop' stores the southerly outputs in columns, and
'rowop' stores the easterly outputs in rows.
MOC
OUTPUT (colop, rowop)
END.
# ---------- COUNTER ---------- #
COM
This function basically defines a ring counter which
starts at 1 and counts one more for every time cycle.
When 12 is reached, the counter loops back to 1 on
the following time cycle.
MOC
FN COUNTER = (bool) -> data:
BEGIN SEQ
   STATE VAR count INIT da/0;
   FN INC = (data: ip) -> data:
      ARITH IF ip=(2*n+1) THEN 1
            ELSE (ip + 1) FI;
   LET out = count;
   count := INC(out);
   OUTPUT (out)
END.
# ---------- LIMIT ---------- #
# This function ensures indexing in 'ROWMUX' #
# does not become 0 #
FN LIMIT = (data: ip) -> limint: ARITH ip.
# ---------- < ---------- #
FN < = (data: a b) -> bool: ARITH IF a<b THEN 1
                                  ELSE 2 FI.
FN - = (data: a b) -> data: ARITH a-b.
# ---------- = ---------- #
FN = = (data: a b) -> bool: ARITH IF a=b THEN 1
                                  ELSE 2 FI.
# ---------- MAC: ROWMUX ---------- #
COM
This macro multiplexes the output rows of data
from 'ARRAY'. Successive rows are output each
time unit by way of the increasing count (select).
MOC
MAC ROWMUX{INT m n} = ([m][n]data: ip1, [m][n]data: ip2)
                      -> [m]data:
BEGIN
LET select = COUNTER(t).
OUTPUT
CASE select=da/1
OF t: ([INT j=1..m]da/111)
ELSE CASE select<da/8
     OF t: ([INT j=1..m]ip2[[LIMIT(select-da/1)]][j]),
        f: ([INT j=1..m]ip1[j][[LIMIT(select-da/7)]])
     ESAC
ESAC
END.
# ---------- CONFIGURE ARRAY ---------- #
COM
This function configures the array of cells. To change
the dimensions of the array, different values are
given to m and n (which are defined as INTegers at the
start of the program).
MOC
FN CONFIGARRAY = ([m]data: ip1, [n]data: ip2, [m][n]bool: ip3)
                 -> ([m]data):
BEGIN
LET diagonal = (ip1, [m+n-1]da/1, [m+n-1]da/1, ip2, [m+n-1]f,
                [m+n-1]f, [m+n-1]f, [m+n-1]f, [n]t, [n]t, ip3).
MAKE ARRAY{m, n}: array,
     ROWMUX{m, n}: rowmux.
JOIN diagonal -> array,
     array -> rowmux.
OUTPUT rowmux
END.
Appendix C
WINNER Demonstrator Test
Program
C.1 Program Overview
This program, written in BASIC and run on a BBC computer, controls the
WINNER demonstrator array. All inputs to the array are generated by the
program, and all outputs from the array are monitored. The main purpose
of the program is to provide the test patterns for the scan path test of the
control circuitry. In addition, the program allows selection between several
modes of operation, such as normal configuration, test, and control circuit
fault masking. The program itself is controlled by a menu, from which the
user can select the required mode of operation.
The program makes full use of procedures, which makes it relatively easy to
understand. However, comments have been incorporated where appropriate.
C.2 Program Listing
5 REM *** MAIN PROGRAM ***
10 CLS
20 PROCSETUP
30 PROCREFGEN
40 PROCMENU
50 IF KEY=1 THEN PROCFILL : PROCCONFIG
60 IF KEY=2 THEN 30
70 IF KEY=3 THEN PROCFINDFAULTS : PROCMASKINIT : PROCCHECKTEST : PROCMASKGEN : PROCLOADMASK : PROCCONFIG
75
76 REM *** PROCEDURE DECLARATIONS ***
77
80 DEF PROCSETUP
85 REM *** SETS UP THE NECESSARY OUTPUT PORTS AND ARRAYS ***
90 A=0:B=64:C=128:D=192
100 ?&FE62=195 : REM DDRB SET
110 ?&FE63=&FF : REM DDRA OUT
120 ?&FE6C=&0A : REM PULSE OUT
130 ?&FE61=0 : REM CLEAR LSB
140 ?&FE61=64 : REM CLEAR
150 ?&FE61=128 : REM CLEAR
160 ?&FE61=192 : REM CLEAR MSB
170 DIM ARRAY(4,48)
180 DIM ARRAY2(4,48)
190 DIM RESULTS(4,48)
200 DIM MASK(48)
210 DIM FMASK(48)
215 VCLK=?&FE60
216 ?&FE60=(VCLK OR 3)
220 ENDPROC
230
240 DEF PROCREFGEN
245 REM *** APPLIES TEST TO FAULT-FREE ARRAY AND ***
246 REM *** STORES OUTPUTS FOR USE AS REFERENCES ***
247 REM *** FOR COMPARISON WITH FAULTY OUTPUTS ***
250 FOR Q=1 TO 4
260 FOR Q1=1 TO 48
270 ARRAY(Q,Q1)=0
280 RESULTS(Q,Q1)=0
290 NEXT Q1
300 NEXT Q
310 CLS
320 PRINT "GENERATING REFERENCE ARRAY"
330 PRINT:PRINT
340 PROCAPPLYTEST
350 FOR I=1 TO 4
360 FOR J=1 TO 48
370 ARRAY(I,J)=RESULTS(I,J)
380 NEXT J
390 NEXT I
400 ENDPROC
410
420 DEF PROCFINDFAULTS
425 REM *** APPLIES TEST PATTERN TO ARRAY TO ***
426 REM *** FIND CTRL CCT FAULTS ***
430 FOR Q=1 TO 4
440 FOR Q1=1 TO 48
450 ARRAY2(Q,Q1)=0
460 RESULTS(Q,Q1)=0
470 NEXT Q1
480 NEXT Q
490 CLS
500 PRINT "GENERATING FAULT ARRAY"
510 PRINT:PRINT
520 PROCAPPLYTEST
530 FOR I=1 TO 4
540 FOR J=1 TO 48
550 ARRAY2(I,J)=RESULTS(I,J)
560 NEXT J
570 NEXT I
580 ENDPROC
590
600 DEF PROCFILL
601 REM *** FILL SCAN PATHS WITH 1'S TO ALLOW ***
602 REM *** NORMAL CONFIGURATION ***
605 PROCAOFF
606 PROCBOFF
620 PROCSINSON
630 FOR F=1 TO 100
640 PROCCLK
650 NEXT F
660 PROCAON
665 PROCBON
670 ENDPROC
680
690 DEF PROCAPPLYTEST
700 REM *** APPLY TEST PATTERN TO THE ARRAY ***
710 PRINT " PLEASE WAIT"
720 PRINT :PRINT
730 DATA 0,102,36,3
740 PROCAOFF
750 PROCBOFF
760 RESTORE
770 FOR PATNUM=0 TO 3
780 READ CNT
790 PRINT "APPLYING TEST PATTERN NUMBER ";PATNUM+1
800 SLICENUM=1
810 FOR ROW=1 TO 6
820 BIT = 1
830 FOR BITPOS=1 TO 8
840 IF (CNT AND BIT)=0 THEN PROCSINSOFF ELSE PROCSINSON
850 PROCREAD
860 PROCCLK
870 BIT=BIT*2
880 SLICENUM=SLICENUM+1
890 NEXT BITPOS
900 NEXT ROW
910 PROCAON
915 PROCDELAY
920 PROCSLOCLK
930 PROCSLOCLK
940 PROCCLK
945 PROCDELAY
950 PROCAOFF
960 NEXT PATNUM
970 PROCLAST48
980 ENDPROC
990
1000 DEF PROCMASKINIT
1005 REM *** INITIALISE FAULT MASKING ARRAY TO 1'S ***
1010 FOR POSN=1 TO 48
1020 MASK(POSN)=255
1030 FMASK(POSN)=255
1040 NEXT POSN
1050 ENDPROC
1060
1070 DEF PROCCHECKTEST
1075 REM *** COMPARE RESULTS OF TEST WITH REFERENCE ***
1080 FOR COL=1 TO 4
1090 FOR ROW=1 TO 48
1100 P=ARRAY(COL,ROW) EOR ARRAY2(COL,ROW)
1110 P=NOT P
1120 P=(P AND (MASK(49-ROW)))
1130 MASK(49-ROW)=P
1140 NEXT ROW
1150 NEXT COL
1160 ENDPROC
1170
1180 DEF PROCSINSON
1185 REM *** SET GROUP OF SUM INPUTS TO LOGIC 1 ***
1190 PROCS1ON
1200 PROCS2ON
1210 PROCS3ON
1220 PROCS4ON
1230 PROCS5ON
1240 ENDPROC
1250
1260 DEF PROCSINSOFF
1265 REM *** SET GROUP OF SUM INPUTS TO LOGIC 0 ***
1270 PROCS1OFF
1280 PROCS2OFF
1290 PROCS3OFF
1300 PROCS4OFF
1310 PROCS5OFF
1320 ENDPROC
1330
1340 DEF PROCREAD
1341 REM *** READ INFO FROM THE ARRAY INTO COMPUTER ***
1345 VREAD=?&FE60
1350 ?&FE60=(VREAD OR 128)
1355 VREAD=?&FE60
1360 ?&FE60=(VREAD AND 191)
1370 AB=(?&FE60 AND 60)/4
1380 VREAD=?&FE60
1385 ?&FE60=(VREAD OR 192)
1390 CD=(?&FE60 AND 32)/2
1400 ANS = AB OR CD
1410 RESULTS(PATNUM,SLICENUM)=ANS
1420 ENDPROC
1430
1440 DEF PROCLAST48
1445 REM *** CLOCK OUT LAST 48 BITS OF TEST RESPONSE ***
1450 PATNUM=4
1460 FOR SLICENUM=1 TO 48
1470 PROCREAD
1480 PROCCLK
1490 NEXT SLICENUM
1500 ENDPROC
1510
1520 DEF PROCMASKGEN
1525 REM *** GENERATE ARRAY TO MASK CTRL CCT FAULTS ***
1530 FOR COL=1 TO 4
1540 LINDEX=2^(5-COL)
1550 RINDEX=2^(4-COL)
1560 INDOR=LINDEX OR RINDEX
1570 FOR ROW=1 TO 6
1580 OSET=((ROW-1)*8)+1
1590 IF ROW<>1 THEN AVAILNW=(MASK(OSET-1))AND LINDEX ELSE AVAILNW=LINDEX
1600 AVAILW=(MASK(OSET+5))AND LINDEX
1610 IF ROW<>6 THEN AVAILSW=(MASK(OSET+9))AND LINDEX ELSE AVAILSW=LINDEX
1620 REQNE=(MASK(OSET))AND RINDEX
1630 REQE=(MASK(OSET+2))AND RINDEX
1640 REQSE=(MASK(OSET+6))AND RINDEX
1650 LAND=AVAILNW AND AVAILW AND AVAILSW
1660 RAND=REQNE AND REQE AND REQSE
1670 MASKVAL=LAND OR RAND
1680 IF MASKVAL<>INDOR THEN PROCRMASK
1690 NEXT ROW
1700 NEXT COL
1710 ENDPROC
1720
1730 DEF PROCRMASK
1735 REM *** USED IN PROCMASKGEN. ASSIGNS MASK BIT ***
1736 REM *** TO EACH OUTPUT OF A FAULTY CELL ***
1740 PRINT"FAULT AT "
1750 PRINT"ROW ";ROW
1760 PRINT"COL ";COL
1770 PRINT"--------------"
1780 IF ROW<>1 THEN FMASK(OSET-1)=(MASK(OSET-1))AND(NOT LINDEX)
1790 FMASK(OSET+5)=(MASK(OSET+5))AND(NOT LINDEX)
1800 IF ROW<>6 THEN FMASK(OSET+9)=(MASK(OSET+9))AND(NOT LINDEX)
1810 FMASK(OSET)=(MASK(OSET))AND(NOT RINDEX)
1820 FMASK(OSET+2)=(MASK(OSET+2))AND(NOT RINDEX)
1830 FMASK(OSET+6)=(MASK(OSET+6))AND(NOT RINDEX)
1840 FMASK(OSET+3)=(FMASK(OSET+3))AND(NOT RINDEX)
1850 FMASK(OSET+4)=(FMASK(OSET+4))AND(NOT RINDEX)
1860 ENDPROC
1870
1880 DEF PROCLOADMASK
1885 REM *** LOAD MASK INTO ARRAY ***
1890 PROCAOFF
1900 PROCBOFF
1910 FOR I=1 TO 48
1920 P=FMASK(49-I)
1930 IF (P AND 1)=1 THEN PROCS4ON ELSE PROCS4OFF
1940 IF (P AND 2)=2 THEN PROCS3ON ELSE PROCS3OFF
1950 IF (P AND 4)=4 THEN PROCS2ON ELSE PROCS2OFF
1960 IF (P AND 8)=8 THEN PROCS1ON ELSE PROCS1OFF
1970 IF (P AND 16)=16 THEN PROCS5ON ELSE PROCS5OFF
1980 PROCCLK
1990 NEXT I
2000 PROCAON
2010 PROCBON
2020 ENDPROC
2030
2040 DEF PROCCONFIG
2045 REM *** MENU FOR CONTROLLING CONFIGURATION MODES ***
2050 CLS
2060 PRINT : PRINT : PRINT : PRINT
2070 PRINT" 1. CONTINUOUS CONFIGURE"
2080 PRINT
2090 PRINT" 2. ONE CONFIGURE CLOCK PULSE"
2100 PRINT
2110 PRINT" 3.RETURN TO MENU "
2120 PRINT :PRINT
2130 PRINT" PLEASE SELECT OPTION (1,2 OR 3)"
2140 CONKEY=GET : CONKEY=CONKEY-48
2150 IF CONKEY<1 OR CONKEY>3 THEN 2140
2160 IF CONKEY=3 THEN 40
2170 IF CONKEY=2 THEN PROCSLOCLK : GOTO 2140
2180 IF CONKEY=1 THEN PROCCONTCONF : GOTO 2060
2190
2200 DEF PROCCONTCONF
2205 REM *** ALLOWS ARRAY TO CONFIGURE CONTINUOUSLY ***
2210 CLS
2220 PRINT:PRINT:PRINT:PRINT
2230 PRINT "CONTINUOUSLY CONFIGURING"
2240 PRINT
2250 PRINT " PRESS ANY KEY TO STOP"
2260 IF INKEY(1)=-1 THEN PROCSLOCLK : GOTO 2260
2270 ENDPROC
2280
2290 DEF PROCMENU
2295 REM *** GENERATES TOP LEVEL MENU ***
2300 CLS
2310 PRINT : PRINT : PRINT : PRINT
2320 PRINT"MENU"
2330 PRINT
2340 PRINT" 1. W.I.N.N.E.R. ALGORITHM DEMO"
2350 PRINT
2360 PRINT" 2. SCAN REFERENCES"
2370 PRINT
2380 PRINT" 3. SCAN & MASK OUT FAULTS"
2390 PRINT :PRINT:PRINT
2400 PRINT"PLEASE SELECT OPTION (1,2 OR 3)"
2410 KEY=GET:KEY=KEY-48
2420 IF KEY<1 OR KEY>3 THEN 2410
2430 ENDPROC
2435
2436 REM *** PROCEDURES TO SET ARRAY INPUT SIGNALS ***
2437
2440 DEF PROCAON
2450 A=A OR 1
2460 GOTO 3780
2470 DEF PROCAOFF
2480 A=A AND 62
2490 GOTO 3780
2500 DEF PROCBON
2510 A=A OR 2
2520 GOTO 3780
2530 DEF PROCBOFF
2540 A=A AND 61
2550 GOTO 3780
2560 DEF PROCDATN1ON
2570 A=A OR 4
2580 GOTO 3780
2590 DEF PROCDATN1OFF
2600 A=A AND 59
2610 GOTO 3780
2620 DEF PROCDATN2ON
2630 A=A OR 8
2640 GOTO 3780
2650 DEF PROCDATN2OFF
2660 A=A AND 55
2670 GOTO 3780
2680 DEF PROCDATN3ON
2690 A=A OR 16
2700 GOTO 3780
2710 DEF PROCDATN3OFF
2720 A=A AND 47
2730 GOTO 3780
2740 DEF PROCDATN4ON
2750 A=A OR 32
2760 GOTO 3780
2770 DEF PROCDATN4OFF
2780 A=A AND 31
2790 GOTO 3780
2800 DEF PROCTESTON
2810 B=B OR 65
2820 GOTO 3800
2830 DEF PROCTESTOFF
2840 B=B AND 126
2850 GOTO 3800
2860 DEF PROCS1ON
2870 B=B OR 66
2880 GOTO 3800
2890 DEF PROCS1OFF
2900 B=B AND 125
2910 GOTO 3800
2920 DEF PROCS2ON
2930 B=B OR 68
2940 GOTO 3800
2950 DEF PROCS2OFF
2960 B=B AND 123
2970 GOTO 3800
2980 DEF PROCS3ON
2990 B=B OR 72
3000 GOTO 3800
3010 DEF PROCS3OFF
3020 B=B AND 119
3030 GOTO 3800
3040 DEF PROCS4ON
3050 B=B OR 80
3060 GOTO 3800
3070 DEF PROCS4OFF
3080 B=B AND 111
3090 GOTO 3800
3100 DEF PROCS5ON
3110 B=B OR 96
3120 GOTO 3800
3130 DEF PROCS5OFF
3140 B=B AND 95
3150 GOTO 3800
3160 DEF PROCSUM1ON
3170 C=C OR 129
3180 GOTO 3820
3190 DEF PROCSUM1OFF
3200 C=C AND 190
3210 GOTO 3820
3220 DEF PROCSUM2ON
3230 C=C OR 130
3240 GOTO 3820
3250 DEF PROCSUM2OFF
3260 C=C AND 189
3270 GOTO 3820
3280 DEF PROCSUM3ON
3290 C=C OR 132
3300 GOTO 3820
3310 DEF PROCSUM3OFF
3320 C=C AND 187
3330 GOTO 3820
3340 DEF PROCSUM4ON
3350 C=C OR 136
3360 GOTO 3820
3370 DEF PROCSUM4OFF
3380 C=C AND 183
3390 GOTO 3820
3400 DEF PROCC1ON
3410 C=C OR 144
3420 GOTO 3820
3430 DEF PROCC1OFF
3440 C=C AND 175
3450 GOTO 3820
3460 DEF PROCC2ON
3470 C=C OR 160
3480 GOTO 3820
3490 DEF PROCC2OFF
3500 C=C AND 159
3510 GOTO 3820
3520 DEF PROCC3ON
3530 D=D OR 193
3540 GOTO 3840
3550 DEF PROCC3OFF
3560 D=D AND 254
3570 GOTO 3840
3580 DEF PROCC4ON
3590 D=D OR 194
3600 GOTO 3840
3610 DEF PROCC4OFF
3620 D=D AND 253
3630 GOTO 3840
3640 DEF PROCCLK
3650 V=?&FE60
3660 ?&FE60=(V AND 254)
3670 FOR DEL=1 TO 10
3680 NEXT DEL
3690 ?&FE60=V
3700 ENDPROC
3710 DEF PROCSLOCLK
3720 V=?&FE60
3730 ?&FE60=(V AND 253)
3740 FOR DEL=1 TO 10
3750 NEXT DEL
3760 ?&FE60=V
3770 ENDPROC
3780 ?&FE61=A
3790 GOTO 3850
3800 ?&FE61=B
3810 GOTO 3850
3820 ?&FE61=C
3830 GOTO 3850
3840 ?&FE61=D
3850 ENDPROC
3855
5000 DEF PROCDELAY
5010 FOR DELAY=1 TO 60
5020 NEXT DELAY
5030 ENDPROC
Appendix D
Published Papers
D.1 Papers Included in the Appendix
This appendix contains the most important papers published by the author
which are relevant to the research in this thesis. Several other papers have
been published and presented at various forums, but are similar in content
to those listed below and have not been included in this appendix.
The following papers are included in chronological order:
Evans R A (1985), A Self Organising, Fault Tolerant, 2-Dimensional Array,
Proc. VLSI-85, Tokyo, Japan, ed E Hoebst, North Holland 1986, pp
239-248.
This is the first publication of the self-organising technique and
describes the WINNER algorithm applied to one dimension of an array
only. Circuitry for automatically entering and retrieving data from the
functional rows of the array is described.
Evans R A, McCanny J V and Wood K W (1985), Wafer Scale Integration
Based on Self-Organisation, Proc. Workshop on Wafer Scale Integration,
Southampton, ed C Jesshope and W Moore, Adam Hilger, 1986,
pp 101-112.
This paper extends the WINNER concept and shows how it can be
applied to both dimensions of an array.
Evans R A and McWhirter J G (1987), A Hierarchical Testing Strategy for
Self-organising Fault-tolerant Arrays, in ‘Systolic Arrays’, eds Moore,
McCabe and Urquhart, Adam Hilger (Bristol) UK, pp 229-238.
Introduces the scan-path technique for testing the control circuitry in
WINNER.
D.2 Other Publications
The following publications are not included in this appendix since they are 
strongly based on chapters of this thesis:
Evans R A (1989), Wafer Scale Integration in ‘Design and Test Techniques
for VLSI and WSI Circuits’, R E Massara ed., Peter Peregrinus Ltd,
London, to be published 1989.
This is a review of wafer scale integration research based on chapter 4 to 
be published in book form. In addition it contains a section describing 
the motivation behind the research into WSI.
Evans R A (1989), Self-organising Arrays for Wafer Scale Integration in
‘Design and Test Techniques for VLSI and WSI Circuits’, R E Massara
ed., Peter Peregrinus Ltd, London, to be published 1989.
This is to be published in the same book as above, and is a detailed 
description of the WINNER algorithm, including some details about 
its performance.
A SELF-ORGANISING
FA ULT-TOLERAN T, 2-OIN EN SIO N AL ARRAY
R ic h a rd  A . Evans
Royal S ig n a ls  and Radar E s ta b lis h m e n t 
St A n drew 's Road, M a lv e rn , U o rc s ,  
UR14 3PS , England
A self-organising, fault tolerant algorithm for generating
2-dimensional arrays is described. The algorithm is
completely stable for all processor fault distributions and
requires no external control. Any required degree of fault
tolerance may be introduced by incorporating additional rows
into the array, with an overhead of about 20 gates/processor.
The technique appears to be very suitable for use in the area
of Wafer Scale Integration and high reliability systems.
1. INTRODUCTION
The past few years have seen a dramatic increase in interest in fault tolerant
techniques suitable for use with hardware designs. This increase can be
largely attributed to two main factors. Firstly, there is broad agreement in
the VLSI community that larger chips will be required in the future and in order
to keep the cost of these chips at an economic level, manufacturers will need
to consider employing fault tolerant techniques to increase the very low
yields expected at these chip sizes. Secondly, widespread importance is now
being placed on regular architectures consisting of arrays of identical
processing elements. It is well known that these arrays can offer high
computation rates by exploiting parallel processing and pipelining as well as
simplifying the design process. In addition, the regularity allows redundant
elements to be incorporated into the arrays in a very simple manner and these
in turn can be used to replace any faulty elements which occur in the active
part of the array.
Regular arrays of processing elements can be considered in two categories.
The first category includes arrays constructed from simple processing elements
each of which contains a small number of gates. An example of such an
architecture is the bit-level systolic array [1], where each element contains
a single full adder and a few latches. An array containing elements of this
type would normally be implemented as a single chip and for this reason it has
been termed a 'chip level array'.
The second category includes arrays in which the individual elements are more
complex. Examples are array processing computers like the DAP [2] and GRID
[3], systolic arrays of the type proposed by Kung [4], and arrays of
Transputers [5]. With these arrays, each element is likely to require a
complete chip for its implementation, while the whole array may be considered
suitable for implementation as a Wafer Scale system. These arrays have been
termed 'system level arrays'.
Copyright © Controller HMSO, London, 1985.
The fault tolerant strategy used with a particular array will depend upon
the category into which the array falls. At the chip level, a recently
published technique [6] involves the use of redundant rows of processors
which replace complete rows of the array containing one or more faults. Each
row containing a fault is bypassed and a spare is switched in to replace it.
This technique is effective when used in chip level arrays because each row of
the array is a fairly simple circuit and the amount of good circuitry
discarded by switching out a faulty row is small. In addition the switching
operation can be performed by some very simple additional logic.
At the system level, the complexity of each cell means that it is inefficient
to discard a whole row of cells if just one of its members is faulty.
Furthermore, if a fixed maximum percentage overhead of extra control circuitry
for performing the fault tolerant operation is permitted, then it will be
possible to include a larger absolute quantity of logic per processor at the
system level than at the chip level. This presents the opportunity of
employing a more intelligent fault tolerant strategy.
This paper addresses the problem of applying fault tolerance to system level
arrays. As our key concept we propose a method by which an array of
processors can be given the ability to automatically organise itself in such a
way as to construct a functional 2-dimensional array from a given
2-dimensional array containing a number of faulty processors. Redundancy is
introduced in the form of additional rows of processors, the number of which
can be chosen to give the degree of fault tolerance required. The approach
appears to be suitable for use both with arrays of interconnected chips and at
the WSI level.
In Section 2 we present the proposed algorithm and describe a practical
implementation of it in Section 3. Section 4 gives details of the theoretical
and simulated performances for various yield characteristics, while a method
of improving the user interface to a configured array is presented in Section
5. Finally, in Section 6 we discuss the merits of the technique and consider
ways in which it might be improved.
2. THE ALGORITHM
The aim of this algorithm is to construct a 2-dimensional array of
orthogonally interconnected processors with connections in the x and y
directions. It is assumed that some mechanism exists whereby each processor
is able to indicate whether or not it is faulty. In practice this might be
achieved by some form of self-testing scheme, or perhaps by an externally
applied testing strategy.
Figure 1. (a) Perfect array, (b) Configured imperfect array.
Figure 1(a) illustrates the type of array being considered. The array shown
has a perfect interconnection pattern since there are no faulty processors in
the array. Figure 1(b) shows a similar array but which now contains a number
of faulty processors, each indicated by a cross. It can be seen that an array
which functions as an orthogonal array (termed a functionally orthogonal
array) can still be constructed by using the interconnections shown. The
interconnections in the horizontal direction have been bent by allowing
processors to communicate with their diagonal neighbours if necessary, and this
enables rows of functional processors to avoid faulty processors. It is
obvious that this functional array will be smaller than the corresponding
perfectly connected array, but it should be noted that the x dimension of the
functional array is identical to that of the given array, while the dimension
in the y direction will depend upon the fault distribution of the array. This
is characteristic of the algorithm to be described.
An interconnection pattern such as that described above could be achieved by
laser programming or by electrical fuse blowing. However, the method proposed
here relies upon logical configuration which will occur automatically when the
array is switched on. In order to achieve this it is necessary to associate
some additional control circuitry with each processor in the array. The
combination of a processor and its control circuitry will be called a 'cell'.
A cell interconnection pattern suitable for use with the algorithm is
illustrated in figure 2. It should be noted that although an orthogonally
interconnected array of functional processors is to be constructed, a more
complex cell interconnection scheme is required in order that faulty cells can
be avoided. The cell illustrated has a single connection in the north-south
direction which is identical to that of the processor. However, in the
east-west direction, sufficient communication channels have been provided to
allow a cell to communicate Westwards with its NW, W or SW neighbours, and
Eastwards with its NE, E or SE neighbours.
[Figure 2: diagram of a single cell. Signals labelled on the diagram: IPNW,
REQNW, IPN, REQNE, IPNE, AVAILNW, AVAILNE, REQW, IPW, AVAILW, REQE, OPE,
AVAILE, REQSW, REQSE, IPSW, AVAILSW, OPS, AVAILSE, OPSE.]
Figure 2. Cell Input/Output requirements.
The function of each cell is to establish communication channels between its
internal processor and the processor of a neighbouring cell in the columns to
its left and right, so that, in a global context, fault-free processors are
connected to form a number of functional rows which together make up the
required functional array. Faulty or unused processors are bypassed in the
north-south direction by the control circuitry, which effectively eliminates
them from the functional array.
Each cell has a number of control inputs in addition to the processor
communication channels. Referring to figure 2 it can be seen that signals
labelled REQNW, REQW and REQSW enter from the NW, W and SW directions
respectively. These signals are 'request' signals, and similar signals leave
the cell in the directions of NE, E and SE as shown. Each cell also has
control signals labelled AVAILNE, AVAILE and AVAILSE entering from the NE, E
and SE directions respectively. These are 'availability' signals, and
corresponding signals leave the cell in the directions of NW, W and SW.
If a cell A outputs a true request signal to some other cell B it means that
cell A wishes to set up a communication channel between its processor and the
processor in B. If such a communication channel becomes set up then A is said
to have been 'connected' to B. If a cell A outputs a true availability signal
to another cell B it means that cell A contains a processor which is available
for connection if requested by cell B. The manner in which these signals are
generated by a cell forms the heart of the algorithm.
Availability Signals
A cell generates its availability output signals according to the following
rules:
(1) A cell can only output a true availability signal if it
    contains a processor which is fault-free (ie the self-test shows
    it to be functional), and at least one true availability signal
    has been received from its NE, E or SE neighbours.
(2) If (1) is satisfied, then the following priority system
    operates to decide in which directions to send true availability
    signals:
    Request Inputs           |  Availability Outputs
    REQNW   REQW    REQSW    |  AVAILNW  AVAILW  AVAILSW
    TRUE    X       X        |  TRUE     FALSE   FALSE
    FALSE   TRUE    X        |  TRUE     TRUE    FALSE
    FALSE   FALSE   TRUE     |  TRUE     TRUE    TRUE
    FALSE   FALSE   FALSE    |  TRUE     TRUE    TRUE
    X = Any value
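The two availability rules can be sketched as a small truth function. This is an illustrative sketch only, not the gate-level design: the function name, argument names and the tuple ordering (AVAILNW, AVAILW, AVAILSW) are assumptions made for the example.

```python
def availability_outputs(fault_free, avail_ne, avail_e, avail_se,
                         req_nw, req_w, req_sw):
    """Return (AVAILNW, AVAILW, AVAILSW) for one cell (hypothetical helper)."""
    # Rule (1): the processor must be fault-free and at least one
    # neighbour to the east must itself be available.
    if not (fault_free and (avail_ne or avail_e or avail_se)):
        return (False, False, False)
    # Rule (2): priority table. A request from the NW masks the W and SW
    # outputs; a request from the W masks only the SW output.
    # REQSW never changes the outputs, so req_sw is deliberately unused.
    if req_nw:
        return (True, False, False)
    if req_w:
        return (True, True, False)
    return (True, True, True)
```

Note that AVAILNW depends only on rule (1), which is why a request from the NW can always displace a lower-priority connection.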
This scheme allows a priority of connections to be established so that a
request from the NW has highest priority, and requests from the W and SW have
successively lower priorities. Such a scheme is required in an iterative
system to resolve problems such as a cell A becoming connected to a cell to
the SW of A in advance of receiving a request from a cell to the NW of A.
The second request has higher priority and must be able to override the SW
connection and establish its own connection in its place. In addition, the
scheme causes a FALSE availability signal to be output to cells which have no
chance of obtaining a connection to A, ie if A is already connected to another
cell with some priority, a cell with lower priority cannot become connected
to A.
The first rule gives the cell a global look-ahead capability even though each
cell is capable only of local communication. Information is passed between
cells from east to west about the availability of other cells. This allows a
cell A to prohibit another cell from connecting to it if A either contains a
faulty processor or would be part of a dead-end route, ie a route that would
not be able to be completed due to some blockage later. The scheme allows
information about blockages to be transmitted from right to left to all the
relevant processors, which then decide upon some appropriate avoiding action.
Request Signals
Request signals are output from a cell according to a different set of rules:
(1) A cell can only output a true request signal if its processor
    is fault-free and at least one request has been received from one
    of its NW, W or SW neighbours.
(2) If (1) is satisfied then the cell outputs a single true request
    value to one of its NE, E and SE neighbours according to the
    priority tabulated below:
    Input Availability          |  Output Request
    AVAILNE  AVAILE  AVAILSE    |  REQNE   REQE    REQSE
    TRUE     X       X          |  TRUE    FALSE   FALSE
    FALSE    TRUE    X          |  FALSE   TRUE    FALSE
    FALSE    FALSE   TRUE       |  FALSE   FALSE   TRUE
    FALSE    FALSE   FALSE      |  FALSE   FALSE   FALSE
    X = Any value
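The request rules can be sketched in the same style. Again this is only an illustration under assumed names; the tuple ordering (REQNE, REQE, REQSE) is chosen for the example and is not taken from the original design.

```python
def request_outputs(fault_free, req_nw, req_w, req_sw,
                    avail_ne, avail_e, avail_se):
    """Return (REQNE, REQE, REQSE) for one cell (hypothetical helper)."""
    # Rule (1): the processor must be fault-free and must itself have
    # been requested by a neighbour to the west.
    if not (fault_free and (req_nw or req_w or req_sw)):
        return (False, False, False)
    # Rule (2): request the most northerly available eastern neighbour,
    # so that at most one request output is ever true.
    if avail_ne:
        return (True, False, False)
    if avail_e:
        return (False, True, False)
    if avail_se:
        return (False, False, True)
    return (False, False, False)
```

Because the branches are mutually exclusive, the sum of the three outputs is never greater than one, which is the single-connection property stated in the text.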
The rules ensure that only one request signal is output from any cell, which
in turn ensures that a cell can never accidentally become connected to more
than one neighbouring cell in the Easterly and Westerly directions.
The availability and request signals together provide the cells with all the
information they need about their surroundings in order to be able to form
functional rows of interconnected processors. The priority system for sending
and receiving request and availability signals ensures that stable functional
rows are established from west to east and from north to south starting in the
top left hand corner of the array. The priority system also ensures that each
row formed is as close as possible to the northern edge of the array.
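The way the two sets of rules interact across a whole array can be explored with a small software model. The sketch below is an assumption-laden illustration, not the ELLA or Algol description used in this work: it updates every cell synchronously from an all-FALSE start until nothing changes, it assumes that REQW is held TRUE on the western edge and AVAILE is held TRUE on the eastern edge, and it ties all off-array diagonal inputs to FALSE. The function and variable names are hypothetical.

```python
def configure(fault_free, max_sweeps=1000):
    """Iterate the cell rules to a fixed point.

    fault_free[r][c] is True for a good processor; row 0 is the northern
    edge and column 0 the western edge. Returns (avail, req), where
    avail[r][c] = (AVAILNW, AVAILW, AVAILSW) and
    req[r][c]   = (REQNE, REQE, REQSE) are the outputs of cell (r, c).
    """
    rows, cols = len(fault_free), len(fault_free[0])
    off = (False, False, False)
    avail = [[off] * cols for _ in range(rows)]
    req = [[off] * cols for _ in range(rows)]

    def read(grid, r, c, k, edge=False):
        # Output k of cell (r, c); off-array neighbours read as `edge`.
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c][k]
        return edge

    for _ in range(max_sweeps):
        new_avail = [[off] * cols for _ in range(rows)]
        new_req = [[off] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not fault_free[r][c]:
                    continue  # faulty cells output FALSE everywhere
                # Availability inputs from the NE, E and SE neighbours
                # (assumed TRUE from the E on the eastern edge).
                a = (read(avail, r - 1, c + 1, 2),
                     read(avail, r, c + 1, 1, True),
                     read(avail, r + 1, c + 1, 0))
                # Request inputs from the NW, W and SW neighbours
                # (assumed TRUE from the W on the western edge).
                q = (read(req, r - 1, c - 1, 2),
                     read(req, r, c - 1, 1, True),
                     read(req, r + 1, c - 1, 0))
                if any(a):
                    new_avail[r][c] = (True, not q[0], not (q[0] or q[1]))
                    if any(q):
                        new_req[r][c] = (a[0],
                                         a[1] and not a[0],
                                         a[2] and not (a[0] or a[1]))
        if new_avail == avail and new_req == req:
            return avail, req
        avail, req = new_avail, new_req
    raise RuntimeError("no fixed point reached within max_sweeps")


def functional_rows(req):
    # A TRUE REQE leaving the eastern edge marks the end of a functional row.
    return sum(1 for row in req if row[-1][1])
```

For example, a 3 by 2 array whose north-east processor is faulty settles with two functional rows: the two upper cells of the west column each connect to their SE neighbour, and the bottom-left cell declares itself unavailable because all of its eastern neighbours are taken.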
It is possible for a processor to be omitted from the functional array for
several different reasons. These can be best seen by reference to figure 3.
Cells marked with a cross are obviously omitted because they are faulty.
Figure 3. Some good processors are not used.
However, although cells A and B are fault-free, they have been omitted from
the functional array because of the presence of faults in nearby cells as
shown, which cause a shadowing effect. The control circuitry in cell A will
detect the fact that all of its NE, E and SE neighbours are unavailable and
will declare itself to be unavailable as a result. Cell B has been omitted
because it cannot be requested by its NW, W or SW neighbours, since they are
either faulty or already connected to another cell with higher priority.
Cells labelled C are omitted because there are insufficient available cells to
allow further complete functional rows to be formed.
All cells which are either faulty or unused for any reason are controlled to
activate a north-south bypass which simply renders the cells transparent in
the vertical direction.
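The effect of the bypass on vertical connectivity can be pictured with a short sketch. It assumes a hypothetical boolean grid `used`, in which `used[r][c]` is True when cell (r, c) belongs to a functional row, and simply skips over transparent cells; it says nothing about how the bypass is built in hardware.

```python
def vertical_neighbours(used, r, c):
    """Rows of the nearest used cells to the north and south of (r, c).

    Bypassed (faulty or unused) cells are treated as wires, so the search
    passes straight through them. None means the array edge was reached.
    """
    north = next((i for i in range(r - 1, -1, -1) if used[i][c]), None)
    south = next((i for i in range(r + 1, len(used)) if used[i][c]), None)
    return north, south
```

With a single column in which only rows 0 and 3 are used, the processor in row 0 sees row 3 as its southern neighbour, exactly as if rows 1 and 2 were absent.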
3. CELL IMPLEMENTATION
Control circuitry suitable for use with processors having one input or output
wire in each of the N, S, E and W directions has been designed and introduces
an overhead of approximately 20 two-input gates per processor. For processors
with more than one wire in these directions it will be necessary to add some
extra gates for each extra wire. A 2-dimensional array of these cells has
been described at the gate level in the hardware description language, ELLA
[7], and the operation of the array has been simulated for a number of
different processor fault distributions. A correctly configured circuit has
been produced each time.
In a practical implementation of this technique it is possible to visualise
the whole array as having two sections. One section comprises an underlying
array of control circuitry which is capable of establishing communication
channels between appropriate neighbours to generate a functionally orthogonal
array as described. The second section is an array of processors containing a
number of faulty devices which can be considered to be overlaid on the array
of control circuitry, which then forms interconnections between processors as
appropriate. Since the control overhead in the array is only about 20 gates per
processor, the probability of a fault occurring in the control circuitry is
likely to be acceptably low. However, if desired it should be possible to use
techniques such as triplication of the control circuitry in order to reduce
this probability even further.
4. PERFORMANCE
In order to try to establish the fault tolerant characteristics of the
technique theoretically, a model of the algorithm was used. In the model we
have assumed that all the control circuitry operates correctly and that faults
only occur in the processors. In addition we have assumed that the
distribution of faults over the array is entirely random and have adopted the
simplistic view that each fault acts independently of other faults in
influencing the number of functional rows which are generated.
An important feature of the fault distribution is the maximum number of faulty
processors occurring in any one column of the array, since it is primarily this
value which limits the number of processors available for forming functional
rows. The calculations give the factor (called the Redundancy Factor) by
which the number of rows must be increased in order to achieve a functional
array of x by y processors compared with a fault free array of x by y
processors. In other words:
    Actual no. of rows needed to generate y functional rows
        = Redundancy Factor × y
The first order theoretical performance is plotted in figure 4.
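As a worked illustration of these two statements, the sketch below computes the limit imposed by the worst column and applies the redundancy factor relation. The function names and the numerical values in the usage note are hypothetical and are not results from this work; the column limit follows only from the fact that every functional row uses exactly one processor from each column.

```python
import math


def max_functional_rows(fault_free):
    """Upper limit on functional rows: the good-processor count of the
    worst column (rows minus the maximum number of faults per column)."""
    rows, cols = len(fault_free), len(fault_free[0])
    return min(sum(1 for r in range(rows) if fault_free[r][c])
               for c in range(cols))


def physical_rows_needed(y, redundancy_factor):
    """Actual number of rows = Redundancy Factor x y, rounded up to a
    whole number of rows."""
    return math.ceil(redundancy_factor * y)
```

For instance, with an assumed redundancy factor of 1.5, an array intended to deliver 10 functional rows would be built with 15 physical rows.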
In order to assess the validity of the model the algorithm was described and
simulated in Algol. A number of simulations were performed on an array with
10 rows and 10 columns and random distributions of faults. An estimate of the
redundancy factor was then derived for different yield values by averaging the
results from 20 simulations with independent fault distributions. The results
by calculation and simulation are presented in figure 4. It can be seen that
both curves are similar in shape and that there is good correlation at yields
above about 60%. However, the simulation results indicate that the redundancy
factor rises much earlier than predicted. This is thought to be due to second
order effects becoming significant at yields below about 60%. The most
dominant effect is likely to be that in the presence of a large number of
faults, some of the functional cells will become inaccessible because they are
partially or completely surrounded by faulty cells. This will reduce the
effective yield of the array and tend to move the theoretical curve to the
right.
[Figure 4: plot of Redundancy Factor, theoretical and simulated curves.]
Figure 4. Theoretical and simulated performances.
5. TRANSPARENT USER INTERFACE
The technique so far described seems to be potentially useful, but still
leaves much to be desired from the user's point of view. The user needs to
know, for example, where functional rows of processors start and end in order
to apply inputs and receive outputs from the array. In the vertical direction
this is not such a problem because the functional array is the same width as
the given array.
In the horizontal direction, however, true availability signals emerging from
the W side of the array mark the start of each functional row, while on the E
side, true request signals mark the ends of functional rows. These signals
can be used in conjunction with some simple extra circuitry to make the
fault-tolerant array appear like a perfect array to the user, who need not be
aware that he is using a physically imperfect array.
This task can be performed using an input array of cells and an output array
of cells whose circuitry is illustrated in figures 5 and 6. Figure 5 shows a
single cell for each of the input and output arrays, while figure 6
illustrates the cell interconnections required to form the arrays. It should
be noted that the cells are mirror images of each other, with the input cell
having an extra wire passing through it. Each of the input and output arrays
has as many rows as there are physical rows in the self-configuring array, and
as many columns as there are horizontal inputs or outputs to the configuring
array.
Figure 5. Cells for: (a) Input array, (b) Output array.
Figure 6. (a) Input and (b) Output interface arrays.
The input array operates as follows. The CTRL inputs at the top of
the array are set to TRUE and signals are applied to the signal inputs. Each
CTRL signal passes down its respective column until it encounters a cell which
has a TRUE availability signal arriving from the East. This cell is then
controlled to connect its signal input to its horizontal signal line, which in
turn is connected to the main array. The cell then outputs FALSE CTRL and
AVAILW values. This prevents any other signal inputs from becoming connected
to the same input of the main array, and also prevents the signal input just
connected to the main array from becoming connected to any other input of the
main array. In this way, the rightmost signal input applied to the input
array will be connected to the first available functional row starting from
the top of the array, with other inputs becoming connected to successively
lower functional rows. It should be noted that the NW, SW, NE and SE inputs
to cells on the periphery of the array are connected to a FALSE value. This
provides the necessary boundaries within which the configuration algorithm is
to operate.
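The behaviour of the input array described above can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the cell circuitry of figure 5: the availability values leaving the west side of the main array are taken as fixed inputs, column 0 is assumed to be the column nearest the main array (the rightmost signal input), and the function name is hypothetical.

```python
def connect_inputs(avail_w, n_inputs):
    """Map each signal input to a physical row of the main array.

    avail_w[r] is the AVAILW value emerging from the west side of row r
    (row 0 at the top). Returns a list whose k-th entry is the row that
    input column k connects to, or None if no row was found.
    """
    avail = list(avail_w)
    row_of = []
    for _ in range(n_inputs):
        ctrl = True          # CTRL is set TRUE at the top of each column
        chosen = None
        passed_west = []     # AVAILW values handed to the next column
        for r, a in enumerate(avail):
            if ctrl and a:
                chosen = r   # connect this column's signal input to row r
                ctrl = False # output FALSE CTRL to the cell below
                passed_west.append(False)  # and FALSE AVAILW to the west
            else:
                passed_west.append(a)
        row_of.append(chosen)
        avail = passed_west
    return row_of
```

With availability pattern FALSE, TRUE, FALSE, TRUE and three signal inputs, the first input is connected to row 1, the second to row 3, and the third finds no row to connect to.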
It is important to note that some special action is necessary if the number of
functional rows generated by the self-configuring circuitry exceeds the number
of signal inputs. In this case we must ensure that the extra functional rows
are bypassed in the north-south direction. This can be simply achieved by
feeding the CTRL output from each cell in the left hand column into the REQW
input of the next row of the main array as shown. This ensures that TRUE
request signals are applied to all functional rows except those which are in
excess of the required number.
The output array operates in a similar manner to the input array, except that
it is now controlled by the request signals emerging from the main array.
Outputs appear at the top of the array with the output from the topmost
functional row being on the left.
An I n t e r e s t i n g  fe a tu re  o f  e i t h e r  the In p u t  o r  o u tp u t a rr a y s  1s t h a t  th e y  can 
a ls o  p ro v id e  In fo rm a tio n  a b o u t whether s u f f i c ie n t  f u n c t i o n a l  row s h a v e  been 
fo u n d  f o r  th e  num ber o f  s i g n a l  I n p u t s  w h ic h  have been a p p lie d .  Such an 
In d ic a t i o n  can be g e n e ra te d  by ORing th e  CTRL o u tp u ts  em erging  from the b o tto m  
o f  e i t h e r  a r r a y .  I f  s u f f i c i e n t  f u n c t io n a l  rows a re  a v a i l a b le ,  a l l  th e  CTRL 
o u tp u ts  shou ld  be FALSE. H o w ever, 1 f  some s ig n a l  In p u ts  do n o t have a ro w  to  
c o nne ct t o ,  one o r  more o f  th e  CTRL o u tp u ts  w i l l  s t i l l  be TR UE.
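The sufficiency check described above amounts to a single OR over the CTRL outputs at the bottom of the input (or output) array. A minimal sketch, assuming those outputs are available as a list of booleans; the function name `rows_insufficient` is illustrative and does not come from the paper:

```python
from typing import Sequence


def rows_insufficient(ctrl_outputs: Sequence[bool]) -> bool:
    """OR together the CTRL outputs emerging from the bottom of the array.

    A TRUE result means at least one signal input found no functional row.
    """
    return any(ctrl_outputs)


print(rows_insufficient([False, False, False]))  # every input found a row
print(rows_insufficient([False, True, False]))   # one input is still unconnected
```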
6. DISCUSSION
In this section we consider the merits of the technique and discuss the
validity of the assumptions which have been made.
The technique possesses a number of attractive properties. These include the
simplicity of the control circuitry, stability for all processor fault
distributions, and the fact that the algorithm requires no global control. In
addition the technique is applicable to any orthogonally connected array
regardless of size.
A number of assumptions have been made. Firstly it has been assumed that each
processor can test itself. This is not unreasonable as there is currently
much interest in the area of self-test. However it will be necessary to ensure
that a processor does not incorrectly declare itself to be functional due to
some fault in the test circuitry. Secondly, it has been assumed that the
control circuitry in each cell is fault-free. This is not a totally valid
assumption but the probability of it being true can be increased significantly
by using other fault tolerance techniques, for example triplication with
voting, or by relaxing the design rules for critical parts of the circuit.
The assumption that faulty processors will be distributed entirely randomly
across the array may not always be valid. In the case of a wafer, it is
likely that faults will occur in clusters; in addition, a processor close to
the array edge will have a higher probability of failure than one nearer to
the middle of the array. It may be possible, however, to overcome this to
some extent by, for example, discarding the outer portion of the wafer.
The algorithm could find application in any system which involves the use of
arrays of identical processors. It could be used to enhance the reliability
of a system, for example a Distributed Array Processor [5], or to improve the
yield of a Wafer Scale Integrated Circuit. The minimum complexity of the
processors will be governed by the acceptable overhead presented by the
control circuitry. In addition to its use for generating functional
2-dimensional arrays, the technique could also be used to construct a linear
chain of processors by joining the ends of the functional rows. In this case
the processor bypass circuitry would not be required.
REFERENCES
[1] Evans R A, et al, 'A CMOS Implementation of a Systolic Convolver Chip',
Proc VLSI-83, Trondheim, Norway, August 1983.
[2] Hockney R W and Jesshope C R, 'Parallel Computers', Adam Hilger Ltd.,
Bristol, pp 178-192.
[3] Corry A G, et al, 'Image Processing with VLSI', Proc NATO ASI on Impact of
Processing Techniques on Communications, Chateau de Bonas, France, July
11-22, 1983.
[4] Kung H T and Leiserson C R, 'Algorithms for VLSI Systems', Section 8.3 of
'Introduction to VLSI Systems' by Mead C and Conway L, Addison Wesley, 1980.
[5] 'INMOS Preliminary Data Sheet IMS T424 Transputer', INMOS Ltd, Nov 1984.
[6] McCanny J V and McWhirter J G, 'Yield Enhancement of Bit-Level Systolic
Array Chips using Fault Tolerance Techniques', Electronics Letters, 7 July
1983, Vol 19, No 14, pp 525-527.
[7] Morison J D, et al, 'ELLA: A Hardware Description Language', Proc IEEE,
ICCC 82, Sept 1982, pp 604-607.
3.5 WAFER SCALE INTEGRATION BASED 
ON SELF-ORGANISATION
R A Evans, J V McCanny and K W Wood
INTRODUCTION
Advances in VLSI technology have led to increasing interest in two- 
dimensional processor array architectures as a means of implementing
hardware systems which are required for the high speed computation of 
highly structured operations. Such applications are encountered in real-time 
digital signal and image processing and in scientific computation (Reddaway 
1979, Robinson and Moore 1982, Duff 1978, Kung and Leiserson 1980). 
Given the current trend towards wafer scale integration (WSI) (McDonald et
al 1984, Moore 1986), it is important to consider how such architectures
could benefit from developments in this type of technology, particularly as
they exhibit a number of features which are attractive from a WSI point of
view. First, their highly regular nature should make such systems easier to 
design than ones based on random logic. Secondly, their strong dependence 
on nearest neighbour connections should help avoid problems associated 
with propagation delays on long random interconnects. Such problems 
appear to have been the major reasons why Trilogy's recent WSI venture did
not result in the production of commercial devices (McDonald et al 1984).
A number of techniques have now been proposed and/or demonstrated
which are applicable to two-dimensional processor arrays. Broadly speaking
these can be divided into two main classes: (a) those which require some
form of post-processing to be done to the wafer, such as discretionary wiring
or the use of lasers to make or break electrical connections (Raffel et al 1984,
Petritz 1967); and (b) those which involve reconfiguration by electronic
switches which are usually addressed from the edge of the wafer (Hedlund
and Snyder 1984, Katevenis and Blatt 1985).
The main advantage of the second class of techniques over the first is that 
reconfiguration can be carried out during normal circuit operation and this 
allows further faults to be avoided as they develop during use. The scope for 
doing this using post-fabrication techniques is extremely limited. However,
the use of programmable switches and their associated mesh of control buses
(which in general must be designed to have a high probability of working)
represents a considerable overhead in terms of silicon area and power
consumption by comparison with techniques such as laser welding.
Copyright © Controller HMSO, London, 1986.
In this section we examine fault tolerance in two-dimensional processor 
arrays and present what we believe to be a novel solution to the problem. 
The general approach which we will describe could be classified under (b)
above, in that it is based on electronic switching. However, it eliminates the 
need for any global control and occurs automatically when the array is 
switched on. The technique utilises the cellular properties of arrays and is 
applicable to systems such as systolic arrays which have nearest neighbour 
interconnections only. Each cell within the array is given some intelligence 
in the form of a small amount of additional circuitry. This enables each
processing element independently to make local decisions about how it 
should be connected to neighbouring elements, taking into account its own 
functionality, the functionality of its neighbours and the connection priorities 
to these neighbours which are defined in specific algorithms. The effects of 
these local decisions propagate throughout the array and manifest themselves
globally as a complete self-organisation of the functional processing
elements into a correctly interconnected functional two-dimensional 
processor array.
A number of algorithms incorporating these basic concepts can be derived. 
For our present purposes we concentrate our attention on two related 
algorithms which illustrate the basic technique. The first scheme is the 
simpler of the two; the second method to be described is of a more general
nature. The relative merits of the two approaches are considered, along with 
a number of other important issues such as practical implementation, and 
ability to cope with various fault distributions. The major conclusions which 
can be drawn from the work are presented at the end.
ALGORITHM NUMBER 1
The aim of both algorithms to be described in this section is to construct an
orthogonally interconnected two-dimensional array of processors of the
type shown in figure 3.5.1(a) from an array of processing elements, some of
which may be faulty. Figure 3.5.1(b) shows an example of an array which has
been connected in such a manner, with faulty processors each indicated by a
cross. The interconnections in the horizontal direction have been altered in
the vicinity of faulty elements by allowing processors to communicate with
their diagonal neighbours and this enables rows of functional processors to
avoid faulty processors. It is obvious that the functional array generated in
this way will be smaller than the corresponding perfect array due to the
presence of the faulty elements but it should be noted that for the wiring
scheme shown in figure 3.5.1(b) the x dimension of the functional array is
identical to that of the given array, while the y dimension depends on the
number of faults which occur in each column of the array.
Figure 3.5.1 (a) Perfect array, (b) array with faults avoided.
An array of processors containing faulty elements can be given the ability
to organise itself into a structure similar to that shown in figure 3.5.1(b) if the
following assumptions are made: (i) that each processor contains some form
of self-testing circuitry which allows it to indicate whether it is working, and
(ii) that all connections in the vertical direction can be designed so that they
are fault-free and are organised so that initially, at switch-on, all processors
are bypassed. Bypass connections around a processor are only removed to
allow a cell to become part of the functional array if the cell itself is functional
and is contained within a functional row.
The method proposed for fault tolerance is assumed to occur automatically
when the array is switched on and is a totally asynchronous
technique. The decisions which cells make about their connections to 
neighbours depend not only on whether a neighbour is faulty but also on the 
decisions being made by those neighbours about their own environment. In 
the early stages of self-configuration the situation may be highly dynamic 
with cells forming and relinquishing connections to other cells as a result of 
being overridden by higher priority decisions which have been made at other 
localities and have rippled through the array. Connections may in fact 
experience a number of iterations of this type but will always settle into a
self-consistent, stable state.
A basic cell with the required self-healing capability is shown in figure
3.5.2. It should be noted that although the N-S connections are similar to
those required in a non-fault-tolerant circuit, extra channels have been
provided on the eastern and western sides which allow the cell to communicate
with neighbours to the NW, W and SW, and the NE, E and SE
respectively. Each cell also has a number of control inputs in addition to the
processor communication channels. These are indicated in figure 3.5.2 as
request (REQ) and availability (AVAIL) signals.
If a cell A outputs a TRUE request signal to some other cell B it means that 
cell A wishes to set up a communication channel between its processor and 
the processor in B. If such a communication channel becomes set up then A 
is said to have been ‘connected’ to B. If a cell A outputs a TRUE availability
signal to another cell B it means that cell A contains a processor which is 
available for connection if requested by cell B. The manner in which these 
signals are generated by a cell forms the heart of the algorithm.
Figure 3.5.2 Basic cell having self-healing capability (port labels include IPNW, REQNW, IPN, REQNE and IPNE).
Availability signals
A cell generates availability output signals according to the following rules:
(i) A cell can only output a TRUE availability signal if it contains a processor
which is fault-free (i.e. the self-test shows it to be functional), and at
least one TRUE availability signal has been received from its NE, E or SE
neighbours.
(ii) If (i) is satisfied, then the priority system of table 3.5.1 operates to 
decide in which directions to send TRUE availability signals.
Table 3.5.1
      Request inputs              Availability outputs
REQNW    REQW     REQSW      AVAILNW   AVAILW    AVAILSW
TRUE     X        X          TRUE      FALSE     FALSE
FALSE    TRUE     X          TRUE      TRUE      FALSE
FALSE    FALSE    TRUE       TRUE      TRUE      TRUE
FALSE    FALSE    FALSE      TRUE      TRUE      TRUE
This scheme allows a priority of connections to be established so that a 
request from the NW has highest priority, and requests from the W and SW 
have successively lower priorities. Such a scheme is required in an iterative 
system to ensure that a stable solution is reached. The scheme causes cells to 
output FALSE availability signals to neighbouring cells if they have no chance 
of obtaining a connection. This might occur for example when a higher
priority connection has already been established.
The first rule gives the cell a global look-ahead capability even though 
each cell is capable only of local communication. This enables clustered 
faults to be avoided in the following way. Information is passed between 
cells from east to west about the availability of other cells. This allows a cell
A to prohibit another cell from connecting to it if A either contains a faulty 
processor or would be part of a dead-end route, i.e. a route that would not be 
able to be completed due to some blockage later. Such a dead-end route 
could occur, for example, if three vertically adjacent processors were faulty. 
In this case a functional processor to the left of the centre faulty processor
would find that all of its possible connections to neighbours are unavailable.
The functional processor would then declare itself to be unavailable. The 
scheme allows information about blockages to be transmitted from right to 
left to all the relevant processors, which then decide upon some appropriate 
avoiding action.
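The east-to-west look-ahead can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden illustration, not the authors' circuit: it ignores the priority-based withholding of table 3.5.1 and treats a cell as usable when it is fault-free and at least one of its NE, E or SE neighbours is usable. The function name `can_reach_east_edge` and the choice to treat the east edge as always available are hypothetical.

```python
from typing import List


def can_reach_east_edge(ok: List[List[bool]]) -> List[List[bool]]:
    """Simplified look-ahead: which cells can still start a route eastwards.

    ok[r][c] is the self-test result of the processor in row r, column c.
    Assumption: request priorities are ignored, so a cell is usable if it
    is fault-free and some NE, E or SE neighbour is usable.
    """
    rows, cols = len(ok), len(ok[0])
    usable = [[False] * cols for _ in range(rows)]
    # Information flows from east to west, one column at a time.
    for c in range(cols - 1, -1, -1):
        for r in range(rows):
            if not ok[r][c]:
                continue
            if c == cols - 1:
                usable[r][c] = True  # assumed: the east edge is always available
            else:
                usable[r][c] = any(
                    usable[n][c + 1] for n in (r - 1, r, r + 1) if 0 <= n < rows
                )
    return usable


# Three vertically adjacent faults in the middle column of a 5 x 3 array.
ok = [[True, True, True],
      [True, False, True],
      [True, False, True],
      [True, False, True],
      [True, True, True]]
usable = can_reach_east_edge(ok)
print([row[0] for row in usable])  # [True, True, False, True, True]
```

In this example the functional processor to the left of the centre fault finds all three of its eastern neighbours unavailable and so declares itself unavailable, which is the dead-end case described in the text.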
Request signals
Request signals are output from a cell according to a different set of rules:
(i) A cell can only output a TRUE request signal if its processor is fault-free
and at least one request has been received from one of its NW, W or SW
neighbours.
(ii) If (i) is satisfied then the cell outputs a single TRUE request value to one
of its NE, E and SE neighbours according to the priority tabulated in
table 3.5.2.
These rules ensure that only one request signal is output from any cell, 
which in turn ensures that a cell can never accidentally become connected to 
more than one neighbouring cell in the easterly and westerly directions.
Table 3.5.2
      Input availability              Output request
AVAILNE   AVAILE    AVAILSE     REQNE    REQE     REQSE
TRUE      X         X           TRUE     FALSE    FALSE
FALSE     TRUE      X           FALSE    TRUE     FALSE
FALSE     FALSE     TRUE        FALSE    FALSE    TRUE
FALSE     FALSE     FALSE       FALSE    FALSE    FALSE
The availability and request signals together provide the cells with all the 
information they need about their surroundings in order to be able to form 
functional rows of interconnected processors. The priority system for 
sending and receiving request and availability signals ensures that stable 
functional rows are established from west to east and from north to south 
starting in the top left-hand corner of the array. The priority system also 
ensures that each row formed is as close as possible to the northern edge of 
the array, thus maximising the number of rows generated.
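The interplay of the two rule sets can be explored with a small software model. The following is a sketch under stated assumptions rather than the authors' ELLA or ALGOL model: all cells are updated synchronously until nothing changes (the real circuit is asynchronous), every west-edge cell is assumed to receive a TRUE request from the W, every east-edge cell a TRUE availability from the E, and all other missing edge inputs are FALSE. The names `avail_out`, `request_out`, `settle` and `count_rows` are illustrative.

```python
from typing import List, Tuple

Triple = Tuple[bool, bool, bool]
OFF: Triple = (False, False, False)


def avail_out(ok: bool, req: Triple, av: Triple) -> Triple:
    """Availability outputs (NW, W, SW) following table 3.5.1."""
    if not (ok and any(av)):
        return OFF
    req_nw, req_w, _ = req
    if req_nw:
        return (True, False, False)
    if req_w:
        return (True, True, False)
    return (True, True, True)


def request_out(ok: bool, req: Triple, av: Triple) -> Triple:
    """Request outputs (NE, E, SE) following table 3.5.2."""
    if not (ok and any(req)):
        return OFF
    av_ne, av_e, av_se = av
    if av_ne:
        return (True, False, False)
    if av_e:
        return (False, True, False)
    if av_se:
        return (False, False, True)
    return OFF


def settle(ok: List[List[bool]], max_steps: int = 1000) -> List[List[Triple]]:
    """Iterate all cells synchronously until the signals stop changing."""
    rows, cols = len(ok), len(ok[0])
    avail = [[OFF] * cols for _ in range(rows)]
    reqs = [[OFF] * cols for _ in range(rows)]

    def get(grid, r, c, k):
        inside = 0 <= r < rows and 0 <= c < cols
        return grid[r][c][k] if inside else False  # missing diagonal inputs are FALSE

    for _ in range(max_steps):
        new_avail = [[OFF] * cols for _ in range(rows)]
        new_reqs = [[OFF] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                req_in = (get(reqs, r - 1, c - 1, 2),             # SE output of NW cell
                          True if c == 0 else reqs[r][c - 1][1],  # E output of W cell
                          get(reqs, r + 1, c - 1, 0))             # NE output of SW cell
                av_in = (get(avail, r - 1, c + 1, 2),             # SW output of NE cell
                         True if c == cols - 1 else avail[r][c + 1][1],  # W output of E cell
                         get(avail, r + 1, c + 1, 0))             # NW output of SE cell
                new_avail[r][c] = avail_out(ok[r][c], req_in, av_in)
                new_reqs[r][c] = request_out(ok[r][c], req_in, av_in)
        if new_avail == avail and new_reqs == reqs:
            return reqs
        avail, reqs = new_avail, new_reqs
    raise RuntimeError("signals did not settle")


def count_rows(reqs: List[List[Triple]]) -> int:
    """A functional row ends in an east-edge cell that is requesting eastwards."""
    return sum(1 for row in reqs if row[-1][1])


# 2 x 3 array with a faulty processor in the middle of the top row.
reqs = settle([[True, False, True],
               [True, True, True]])
print(count_rows(reqs))  # 1
print(reqs[0][0])        # (False, False, True): the top-left cell requests its SE neighbour
```

In this small case the single functional row steps down around the fault and back up again, and the bottom-left cell ends up without a route, so one row is formed from two physical rows.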
ALGORITHM NUMBER 2
The above approach makes the basic assumption that it is always possible to 
bypass faulty cells in the vertical direction. However, when we started this 
work our basic philosophy was that the configured array should completely 
avoid all faulty processors. In this section we will show how the technique of 
algorithm 1 can be extended to encompass this philosophy by configuring 
both the rows and the columns. To see how this is done the reader is referred
to figure 3.5.3.
Figure 3.5.3(a) illustrates a typical array which has been configured in the 
horizontal direction using algorithm 1. Functional rows have been generated 
which each contain a number of working processors equal to the width of the 
original array. Figure 3.5.3(b) illustrates the same array which has now been
configured in the vertical direction using an identical algorithm operating 
vertically. Here, columns are constructed which keep as close as possible to 
the left-hand side of the array and each functional column contains cells 
equal to the height of the original array.
Since the configurations generated by the algorithm consist of full width 
rows and full height columns, the superposition of the rows and the columns 
must generate an array of functional processors at the points of intersection
of each row with each column. In addition, the processors within the array
region will have orthogonal interconnections. The superposition of the rows
and columns is shown in figure 3.5.3(c), where the processors forming the
final array are indicated in black. Some processors have either horizontal or 
vertical connections but not both. These are controlled to act as bypasses in 
the direction in which they have connections. The size of the functional array 
is determined by the number of functional rows and columns generated. If p 
is the number of functional rows and q the number of functional columns,
then superposition will generate a functional array of dimensions p × q.
A cell suitable for use with this algorithm would have extra communication 
channels and control signal paths for both the horizontal and vertical 
directions.
There are in fact two types of undesirable condition which can occur when 
the rows and columns are superimposed. These have been called ‘double
site’ and ‘crossover’ conditions and must be handled separately if a correct
array is to be produced. The example illustrated here contains several 
double sites.
Double Site Condition
Referring to figure 3.5.4(a) we see that there are two processors (A and B) 
belonging to the same functional row which both occur in the same functional 
column (or vice-versa). The effect of this is that there are two processor 
sites, A and B, where only one is required. The problem can be overcome by 
instructing one of the processors to act as a bypass in both the horizontal and 
vertical directions. In our implementation we always instruct the upper 
processor to become the bypass. The rule for doing this is as follows:
If a cell finds that it is requesting to be connected to a cell to its SW or SE 
for both its rows and its columns, it will become a bypass for its 
horizontal and vertical connections.
Figure 3.5.4 (a) Double site condition, (b), (c) crossovers.
At first sight it may appear that since the process of avoiding double sites 
requires functional processors to be discarded the functional array size 
might be reduced as a result. However, these processors could not have 
formed part of the functional array anyway and discarding them does not 
affect the array size of p × q.
Crossovers
This problem occurs when a row and a column intersect each other at a point 
other than a processor site. Crossovers can occur in two distinct ways as 
shown in figures 3.5.4(b) and (c). Unlike the double site condition which can
be overcome without altering the configured rows and columns, the solution 
to the crossover condition requires either the row or the column containing 
the crossover to be physically altered so that the superposition crossover 
does not occur. Alteration of the row or column may of course produce 
effects which propagate throughout the array until a new stable configuration 
is achieved.
We have devised a technique which avoids the crossover condition. It 
requires the use of an extra bit in both the horizontal and vertical directions.
These extra bits allow the rows to be generated as before, but restrict the 
generation of columns to sites which will not cause crossovers. It can be seen 
from figure 3.5.4(b) that the crossover could be avoided if cell 3 was made
‘unavailable’ to cell 2. In figure 3.5.4(c), the crossover could be avoided if
cell 4 was made ‘unavailable’ to cell 1. In the first case, this can be achieved
by using an extra bit which propagates between cells in the north-south
direction. The bit indicates whether or not a cell is outputting a row request
in the SE direction and if so it causes the column AVAILNE signal in the cell
below to be inhibited. The second crossover case may be avoided in a similar
manner by passing an extra bit from west to east, indicating whether a cell
has output a row request to the NE, and if so inhibiting the column AVAILNW
in the cell to its right. The cost of this technique is two extra inputs and
outputs plus two (A AND NOT B) logic functions to perform the inhibitory
action.
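The two inhibit functions can be written out directly. This is a minimal sketch of the gating only, assuming the uninhibited column availability signals have already been computed by the column version of the algorithm; the function and parameter names are illustrative.

```python
def and_not(a: bool, b: bool) -> bool:
    """The (A AND NOT B) gate used for the inhibitory action."""
    return a and not b


def column_avail_ne(avail_ne: bool, row_req_se_above: bool) -> bool:
    """Column AVAILNE, inhibited when the cell above is sending a row request SE."""
    return and_not(avail_ne, row_req_se_above)


def column_avail_nw(avail_nw: bool, row_req_ne_left: bool) -> bool:
    """Column AVAILNW, inhibited when the cell to the left has sent a row request NE."""
    return and_not(avail_nw, row_req_ne_left)


print(column_avail_ne(True, True))   # False: the extra bit blocks the crossover
print(column_avail_nw(True, False))  # True: no row request to the NE, signal passes
```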
We have also investigated the possibility of avoiding crossovers without 
using any extra bits by exploiting features of the availability signals. This is 
in fact possible but can unfortunately become unstable for certain fault 
distributions. For this reason the technique is not described in detail here 
and would not be recommended for use in practice.
DISCUSSION
A number of important questions arise concerning the implementation and 
application of the two schemes described. In both approaches the basic
assumption is made that each processor has the ability to test itself. This is
felt to be a reasonable assumption given the current interest in the area of 
self-test and the fact that a number of chips are now available which possess 
this capability. It is also important to note that although the algorithms 
described in this paper have been presented with wafer scale integration in 
mind they are equally applicable to high availability and high reliability 
systems. For example, a major application of the techniques may be to 
circuits such as multi-processors on a hybrid or printed circuit board. The 
basic assumption of fault-free bypass circuitry in the vertical direction which 
is implicit in the first algorithm should be reasonably easy to achieve for this 
type of application.
Control circuitry suitable for use with processors with single input and 
output lines in each of the N, S, E and W directions has been designed for 
both methods. In the first case this introduces an overhead of approximately 
20 two-input gates per processor whilst in the case of the second method the
corresponding figure is roughly doubled. For systems in which processors 
are connected via multi-bit buses, additional circuitry is required for each 
wire in order to allow the whole bus to be routed to the appropriate 
neighbour. Generally speaking, the overheads required are relatively small 
but the approach is obviously best suited to systems in which interprocessor 
communication is by serial links. Systems built from the INMOS Transputer, 
for example, are therefore seen as ideal candidates for the methods described 
(INMOS 1984).
Both algorithms have been validated using the hardware description 
language ELLA (Morison et al 1982). This has allowed the techniques to be
described and simulated at the gate level for a number of different fault 
distributions. In all cases correctly configured arrays were produced.
In a practical implementation of this technique it is possible to visualise 
the whole array as having two sections. One section comprises an underlying 
asynchronous network of control circuitry which is capable of establishing
communication channels between appropriate neighbours to generate a 
functionally orthogonal array as described. The second section is an array of 
processors containing a number of faulty elements which can be considered 
to be overlaid on the array of control circuitry which then forms connections
between processors as appropriate. Since the control overhead in an array is 
only a few tens of gates per processor the probability of a fault occurring in 
the control circuitry is likely to be acceptably low. However, if desired, it 
should be possible to use techniques such as triplication of the control 
circuitry in order to reduce this probability even further.
An interesting application of our approach is in self-timed systems such as
wavefront array processors (Kung 1982), where the only global signals
required are power rails. Such a circuit would consist of a collection of totally
autonomous cells each with the capability of forming links with its neighbours
to generate a functional two-dimensional array and each with the
independent capability of controlling the timing of information between
itself and its neighbours.
It is important to ascertain the ability of the two algorithms described to 
cope with various fault distributions. Models of the algorithms were therefore
written in ALGOL and each was simulated for a 10 x 10 array of cells
with random distributions of faults. A number of different simulations were
carried out for arrays with processor yields of between 50% and 100%. Then 
by averaging the results obtained at each yield value we were able to 
estimate the ‘overhead factor’, which for a given target array size and overall 
processor yield indicates the factor by which the number of cells in the target 
array must be multiplied in order that, on average, it will be possible to form 
the target array. These results are presented in figure 3.5.5. The shapes of 
both curves are similar with the value of the overhead factor initially rising 
slowly from unity as the cell yield drops from 100% but then rising more 
rapidly for yields less than 60% and 80% for algorithms 1 and 2 respectively. 
This is mainly due to the fact that functional cells start to become inaccessible 
below these values due to being partially or completely surrounded by faulty 
cells and this reduces the number of functional rows and columns which can
be formed.
Figure 3.5.5 Algorithm performances.
Since algorithm 2 effectively uses algorithm 1 in both the row and column 
directions, one would expect the overhead factor for algorithm 2 to be the 
square of that for algorithm 1. The results of the simulations confirm this and 
clearly indicate that if bypassing of faulty processors is acceptable then
algorithm 1 should be used. An alternative way of expressing the performance
of the algorithms is to say that if the initial array has a processor yield
of 75% then on average a ‘harvest’ of 60% of the working processors will be
achieved by algorithm 1 and a 25% harvest by algorithm 2.
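The harvest figures can be related back to the overhead factor with a line of arithmetic. The relation used below is an inference from the definitions in the text, not a formula given by the authors: if the target array contains harvest × yield × (total cells), then the overhead factor is 1 / (yield × harvest). The function name is illustrative.

```python
def overhead_from_harvest(cell_yield: float, harvest: float) -> float:
    """Overhead factor implied by a harvest figure.

    Assumed relation: target cells = harvest * cell_yield * total cells,
    so overhead = total cells / target cells = 1 / (cell_yield * harvest).
    """
    return 1.0 / (cell_yield * harvest)


f1 = overhead_from_harvest(0.75, 0.60)  # algorithm 1 at 75% yield
f2 = overhead_from_harvest(0.75, 0.25)  # algorithm 2 at 75% yield
print(round(f1, 2), round(f2, 2), round(f1 ** 2, 2))  # 2.22 5.33 4.94
```

On the quoted harvest figures, and under this assumed relation, the overhead for algorithm 2 (about 5.3) comes out close to, though a little above, the square of that for algorithm 1 (about 4.9).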
The discussion up to this point has neglected the important issue of inputs 
and outputs to the fault-tolerant arrays described. The user of such a circuit
obviously needs to be able to apply his input signals and receive output
signals at appropriate points at the extreme ends of functional rows and 
columns of the array. Connections between a set of input/output ports and 
the main array can in fact be made by applying the same principles of self- 
configuration which have been applied to the array itself. As is described in 
detail elsewhere (Evans 1985), one can make use of the availability and
request signals which emerge from the edges of the array to route the inputs
and outputs of cells to a set of pads located around the edge of the wafer. 
With this facility, the array appears as a perfect circuit to the user and he 
need not be aware that the circuit he is using actually contains a number of 
faulty processors.
CONCLUSIONS
In this section we have proposed a novel approach to achieving fault 
tolerance in any processor array using techniques based on self-organisation.
For the purposes of illustration we have concentrated our
attention on an orthogonally interconnected two-dimensional array and 
have described two specific algorithms which can be used, each having its 
own relative merits. However, many variants of the self-organising approach 
can be developed and can be applied to a range of computational structures, 
particularly those with regular interconnections. It is hoped to present 
further discussion on the broader application of the basic concepts in the 
future.
REFERENCES
Duff M J B 1978 Review of the CLIP Image Processing System Proc. Nat. Comput.
Conf. 1055-1060
Evans R A 1985 A Self Organising, Fault-Tolerant, 2-Dimensional Array Proc.
VLSI-85 (Amsterdam: North-Holland)
Hedlund K and Snyder L 1984 Systolic Architectures - A Wafer Scale Approach
Proc. IEEE Int. Conf. on Computers 604-610
INMOS Ltd 1984 INMOS Preliminary Data Sheet IMS T424 Transputer
Katevenis M G H and Blatt M G 1985 Switch Design for Soft-Configurable WSI
Systems Proc. 1985 VLSI Conf.
Kung H T and Leiserson C R 1980 Algorithms for VLSI Processor Arrays Introduction
to VLSI Systems (Reading, Mass.: Addison-Wesley) Ch. 8
Kung S Y et al 1982 Wavefront Array Processor: Language Architecture and
Applications IEEE Trans. Comput. C-31 1054
McDonald J F, Roger E H and Rose K 1984 The Trials of Wafer Scale Integration
IEEE Spectrum (October) 32-39
Moore W R 1984 A Review of Fault Tolerant Techniques for the Enhancement of
Integrated Circuit Yield GEC J. Res. 2(1) 1-15
Morison J D et al 1982 ELLA: A Hardware Description Language Proc. IEEE
ICCC 82 604-607
Petritz R I 1967 Current Status of LSI Technology IEEE J. Solid State Circuits 2(4)
130-147
Raffel J et al 1984 A Wafer Scale Digital Integrator Proc. IEEE Int. Conf. on Computer
Design: VLSI in Computers (ICCD '84) 121-126
Reddaway S F 1979 The DAP Approach Infotech State of the Art Report:
Supercomputers 2 311-329 ed C R Jesshope and R W Hockney (Maidenhead: Infotech
International Ltd)
Robinson I N and Moore W R 1982 A Parallel Processor Array Architecture and its
Implementation in Silicon Proc. IEEE CICC 41-45
A HIERARCHICAL TEST STRATEGY FOR SELF-ORGANISING
FAULT-TOLERANT ARRAYS
R A EVANS and J G McWHIRTER
INTRODUCTION
In the past, increases in the performance of electronic systems have to a
large extent been gained as a result of improved device processing. This has
provided higher levels of integration together with reduced device propagation
delays. The realisation that devices are approaching fundamental performance
limits, and the fact that for many applications the advances are not keeping
pace with the desire for faster processing, has been fuelling interest in
parallel architectures for a number of years and is now a major driving force
behind the increasing interest in Wafer Scale Integration.
The main difference between single chips and wafer scale devices is that Wafer
Scale Integration, or WSI for short, requires fault tolerance to be built in
as a standard procedure before devices can be fabricated; chips in general do
not require fault tolerance although some specialised devices, such as
memories, have incorporated fault tolerance for many years. For this reason,
many researchers in the WSI field, for example Chevalier and Saucier (1985),
Raffel (1985), and Moore and Mahat (1985), have devoted their efforts to
developing techniques by which working systems can be generated from systems
containing faulty components. Most of the work has focussed on arrays of
processors since their regularity allows a global pool of redundant elements
to be employed. Each redundant element can in principle be used to replace a
faulty cell anywhere in the array provided that the appropriate switching
arrangement is incorporated.
Several problems require study. Firstly, one and two dimensional arrays
require very different treatment. In a linear, or one dimensional array,
faulty elements can simply be bypassed; in a two dimensional array, the
network connectivity must be maintained in the presence of the faults and this
is a more complex task. In this paper we consider only two dimensional
arrays. Secondly, in WSI we are dealing with the unknown in the sense that we
do not know which parts of the circuit are functional and which are faulty.
It may be that the switches themselves are faulty. We therefore need to
investigate ways in which faults can be detected and tolerated, not just in
the array itself, but in the configuration logic, the self-test circuitry, and
any other circuitry which might be used. In short, we need to develop a
verifiably functional system so that the user can be confident that his array
will configure itself correctly.
In this paper, we present a hierarchical approach to testing and fault
tolerance within the array. This allows the user to verify that the circuit
is working and also maximises the array yield. In the following section we
briefly review an approach to 2-dimensional WSI processor arrays in which the
functional elements have the ability to organise themselves around the faulty
ones in order to construct a functional array. This is followed by a
discussion of the requirements and potential problems of testing the array
elements and the control circuitry and a presentation of a hierarchical
strategy which permits external testing of all the control circuitry with the
emphasis on verifiability by the user. We also show how faults in the control
circuitry and test circuitry can be tolerated. In the final section we
attempt to quantify the effect of the hierarchical test strategy on the
overall array yield.
Copyright © Controller HMSO, London, 1986.
THE 'WINNER' SELF-ORGANISING ALGORITHM
'WINNER' is an acronym for 'Wafer Integration by Nearest Neighbour Electrical
Reconfiguration', and is an algorithm for configuring a 2-dimensional array of
processors in the presence of faults; see Evans et al (1985). The central
concept of the algorithm involves distributing a small amount of control
circuitry throughout a 2-dimensional array of processing elements such that
each processor has an identical extra circuit associated with it. The extra
circuitry gives the processors the ability to decide how they should be
connected to their neighbouring processors based on a knowledge of their
neighbours' functionality and availability. The interaction between
processors occurs locally, with processors communicating only with nearest
neighbours, and this results in a complete self-organisation of the array into
a functional array.
Figure 1. Connectivity of Self-Organising Cell.
A cell with interconnections suitable for generating orthogonally
interconnected arrays of processors is illustrated in figure 1. It can be
seen that in addition to the North, South, East and West connections
normally required for an orthogonally interconnected array, the cell has
North-West, North-East, South-West and South-East connections. These increase
the connectivity of the cell and allow faulty cells to be avoided.
Furthermore, there are connections prefixed by REQ and AVAIL. These are
single bit signals which allow the control circuits in adjacent cells to
interact with each other. The manner in which these interactions take place
forms the heart of the WINNER self-organising algorithm, and is presented in
table 1 in the form of a truth table giving the cell AVAILability outputs
corresponding to REQuest inputs, and vice-versa.
Table 1. Generation of Availability and Request Signals
Cell must be functional, and at least one REQuest or AVAILability
input must be TRUE, otherwise REQuest and AVAILability
outputs become FALSE. (X = DON'T CARE)
REQUEST INPUTS                 |  AVAILABILITY OUTPUTS
REQNW     REQW      REQSW      |  AVAILNW   AVAILW    AVAILSW
TRUE      X         X          |  TRUE      FALSE     FALSE
FALSE     TRUE      X          |  TRUE      TRUE      FALSE
FALSE     FALSE     TRUE       |  TRUE      TRUE      TRUE
FALSE     FALSE     FALSE      |  TRUE      TRUE      TRUE
AVAILABILITY INPUTS            |  REQUEST OUTPUTS
AVAILNE   AVAILE    AVAILSE    |  REQNE     REQE      REQSE
TRUE      X         X          |  TRUE      FALSE     FALSE
FALSE     TRUE      X          |  FALSE     TRUE      FALSE
FALSE     FALSE     TRUE       |  FALSE     FALSE     TRUE
FALSE     FALSE     FALSE      |  FALSE     FALSE     FALSE
FOR BOUNDARY INPUTS: HORIZONTAL INPUTS = TRUE
                     DIAGONAL INPUTS = FALSE
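The truth table can be read as a small combinational function. The following Python sketch is only an illustration of table 1, not the circuit used in the paper: the function name `winner_cell_outputs` and the tuple ordering are assumptions, and the gating condition follows a literal reading of the table preamble (all outputs FALSE when the cell is faulty or when no input at all is TRUE).

```python
def winner_cell_outputs(functional, req_in, avail_in):
    """Return (avail_out, req_out) for one cell, following table 1.

    req_in    = (REQNW, REQW, REQSW)        request inputs
    avail_in  = (AVAILNE, AVAILE, AVAILSE)  availability inputs
    avail_out = (AVAILNW, AVAILW, AVAILSW)  availability outputs
    req_out   = (REQNE, REQE, REQSE)        request outputs
    """
    all_false = (False, False, False)
    # Assumed literal reading of the table preamble: a faulty cell, or a cell
    # with no TRUE input at all, drives every output FALSE.
    if not functional or not (any(req_in) or any(avail_in)):
        return all_false, all_false

    req_nw, req_w, _req_sw = req_in
    avail_ne, avail_e, avail_se = avail_in

    # Upper half of table 1: a request from NW withdraws availability to W
    # and SW; a request from W withdraws availability to SW only.
    avail_out = (True, not req_nw, not (req_nw or req_w))

    # Lower half of table 1: request the highest-priority available
    # neighbour, in the order NE, E, SE.
    req_out = (
        avail_ne,
        (not avail_ne) and avail_e,
        (not avail_ne) and (not avail_e) and avail_se,
    )
    return avail_out, req_out
```

For a cell on the array boundary, the table's rule (horizontal inputs TRUE, diagonal inputs FALSE) corresponds in this sketch to passing `(False, True, False)` for the inputs that come from outside the array.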
The local decisions made by the cells within the array occur simultaneously.
In the early stages of self-organisation the situation may be highly dynamic,
with cells forming and relinquishing connections to other cells as a result
of being overridden by higher priority decisions which have been made at other
localities and have propagated through the array. However, although
connections may experience a number of iterations of this type, stable
functional rows of interconnected processors are formed, one by one, starting
at the top of the array. Once formed, these rows are no longer affected by
the activity of the cells in the lower parts of the array. In effect, a
'boundary' moves through the array from top to bottom, above which stable rows
exist, and below which stable rows have yet to be formed. When the boundary
passes out of the bottom of the array, all possible functional rows will have
been formed. This process is always completed within a fixed number of steps,
approximately 2N, where N is the dimension of the array. The priorities
for REQuest and AVAILability which are given in table 1 ensure that no
unresolvable contention problems can occur.
The interactions of the REQuest and AVAILability signals described above
generate functional rows of interconnected processors spanning the entire
width of the array. These rows have one processor in each column of the array.
To form a functional array the rows can be connected together in the vertical
direction by making connections between all the cells in a given column, but
bypassing faulty cells and cells which, although functional, are not part of a
functional row. An array which has been configured in this way is illustrated
in figure 2.
Figure 2. Example of a Configured Array.
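The vertical connection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the hardware mechanism of the paper: the function name `vertical_links` and the boolean grid `in_row` (TRUE where a cell is functional and belongs to a completed functional row) are hypothetical.

```python
def vertical_links(in_row):
    """For each column, link consecutive cells that belong to functional rows.

    in_row[r][c] is True when the cell at physical row r, column c is
    functional and part of a functional row; every other cell is bypassed.
    Returns, per column, a list of (upper_row, lower_row) index pairs.
    """
    n_rows = len(in_row)
    n_cols = len(in_row[0]) if n_rows else 0
    links = []
    for c in range(n_cols):
        # Physical rows of the cells actually used in this column, top down.
        used = [r for r in range(n_rows) if in_row[r][c]]
        # Each used cell connects to the next used cell below it.
        links.append(list(zip(used, used[1:])))
    return links
```

Because every functional row contributes exactly one cell to each column, each column ends up with the same number of links, one fewer than the number of rows formed.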
Simulations to estimate the performance of the algorithm have been carried out
and the results are plotted in figure 3. The graph shows a number of curves,
each representing a constant value of array yield. These show how the
array yield varies with both redundancy overhead and processor yield, and
indicate that processor yields of 60% or greater are likely to be required in
a practical system.
HIERARCHICAL TEST STRATEGY
From the manufacturer's point of view, the ideal WSI system would be one in
which the array elements are capable of performing a full self-test, and
configuring themselves automatically into a functional array with 100%
reliability and without any external assistance. In the real world such a
system can only be a dream since we cannot rely on any part of the circuitry
on the wafer to perform its predefined task correctly. From a practical point
of view this means that the manufacturer must perform at least a small test on
some part of the circuitry. The tested circuitry can then be relied upon and
used in further tests of the wafer. The challenge is to develop a strategy
which requires a small amount of circuitry to be externally tested, and to be
able to carry out the test in a simple manner.
Figure 3. Performance of the WINNER Algorithm.
Although essential to the self-organising algorithm, the control circuitry can
cause serious degradation in the overall array yield for the following reason.
The self-organising array consists of a number of processing elements together
with their associated control circuits. Provided that sufficient spare rows
have been used, the configuration algorithm can potentially configure a
functional array even if many of the processors are faulty. However, for an
array to be sure of working, all the control circuits must work, and although
each is very simple, the total amount of control circuitry in the array is
quite large. It is easy to see that in a large array, the array yield
achieved is likely to be dominated by the possibly poor yield of the array of
control circuits rather than the configured array of processors. The test and
fault tolerant hierarchy to be described reduces this problem to a more
acceptable level.
The testing strategy operates at several levels. The highest level is the
self-test performed automatically by the elements of the array. This is
assumed to be based on the signature analysis approach. The other levels of
test are all performed by an external source and allow testing of the control
circuitry, the signature analyser comparator and the scan paths which are used
to test the control circuits. In addition, faults in the control circuits and
comparator can be tolerated by masking out the circuits which are faulty so
that they are ignored by the other elements within the array. Faults in the
scan paths can also be tolerated to some extent as will be described.
Control Circuit Tests
In considering the self-organising algorithm described in section 2, it can
be seen that a fault in a control circuit could be disastrous. For example,
an AVAILability output in a cell containing a faulty processor could be
stuck-at-1, incorrectly indicating that the cell is available for use. This
could cause the faulty processor to be inadvertently configured into the array
and would result in the array being non-functional. For this reason, an
external test of the control circuitry is essential.
The control circuitry associated with each element of the array is a purely
combinational circuit and as such can be tested easily if it can be accessed
from an external source. The required access can be provided by including a
scan path between each column of processors as illustrated in figure 4, in
which each dot represents a group of scan path registers in the AVAILability,
REQuest and signal paths. A single scan register is illustrated in figure 5
and its function for different values of A and B is shown in table 2 (the AND
gate should be ignored for the moment). By applying the appropriate
combination of control signals at the A, B and clock inputs, test patterns can
be loaded serially into the scan paths from the external tester (A=0,
B=0), applied in parallel to all the control circuits (A=1, B=0), and the
outputs clocked out serially for checking (A=0, B=0). This procedure allows
all control circuit faults to be detected with a small number of test
patterns. To allow the control circuitry to operate normally, A and B are both
set to 1 to allow signals to pass straight through the scan register.
Figure 4. Schematic of the Scan Path Testing Approach.
Table 2. Scan Path Function.
A  B  Scan Function
0  0  Serial-load shift register
0  1  Serial test of straight through connection
1  0  Parallel-load shift register
1  1  Activate straight through path
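The load, apply and read-out sequence described above can be sketched with a toy software model. This is an illustration only, under several assumptions: the class `ScanPath`, the helper `scan_test` and the callable `circuit` are hypothetical names, only the serial-load and parallel-load modes of table 2 are modelled, and the circuit under test is assumed to return as many bits as it receives.

```python
class ScanPath:
    """Toy model of one scan path (a chain of scan registers)."""

    # Mode names taken from table 2, keyed by the (A, B) control inputs.
    MODES = {
        (0, 0): "serial-load",
        (0, 1): "serial test of straight-through connection",
        (1, 0): "parallel-load",
        (1, 1): "straight-through",
    }

    def __init__(self, length):
        self.bits = [0] * length

    def clock(self, a, b, serial_in=0, parallel_in=None):
        """Apply one clock pulse; return the bit shifted out, if any."""
        mode = self.MODES[(a, b)]
        if mode == "serial-load":
            shifted_out = self.bits[-1]
            self.bits = [serial_in] + self.bits[:-1]
            return shifted_out
        if mode == "parallel-load":
            self.bits = list(parallel_in)
            return None
        return None  # the remaining two modes are not modelled in this sketch


def scan_test(circuit, pattern):
    """Shift a pattern in, capture the circuit response, shift it out."""
    path = ScanPath(len(pattern))
    for bit in reversed(pattern):           # A=0, B=0: serial load
        path.clock(0, 0, serial_in=bit)
    response = circuit(path.bits)           # pattern applied in parallel
    path.clock(1, 0, parallel_in=response)  # A=1, B=0: capture the outputs
    shifted = [path.clock(0, 0) for _ in pattern]  # A=0, B=0: read out
    return shifted[::-1]                    # last register emerges first
```

An external tester would compare the returned bits with the response expected from a fault-free control circuit for the same pattern.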
Having detected a faulty control circuit it is desirable to be able to
tolerate it rather than discard the whole wafer because of a small fault.
From table 1, it can be seen that a cell receiving a FALSE AVAILability signal
from its neighbour cannot output a REQuest to that neighbour. Furthermore, a
cell receiving a FALSE REQuest input from a neighbour is not influenced at all
by that neighbour. As a result, if the REQuest and AVAILability outputs of a
cell containing a faulty control circuit could be forced to be FALSE, it would
effectively mask the faulty cell out of the array. This can be achieved by
using a modified scan path register in which the link between the upper and
lower multiplexers is replaced by an AND gate as shown dotted in figure 5.
During the test phase, the register operates in exactly the same way as an
ordinary scan path register, but in the operational mode (A=1, B=1), the
output signal can be controlled either to follow the input signal or to output
a permanent FALSE value; this is achieved by preloading the register from the
external tester with a 1 or 0 level respectively.
Figure 5. Modified Scan Register for Tolerating Control Circuit Faults.
The components within the scan path registers themselves can be thoroughly
tested by the external tester by simple test patterns. This ensures that they
can be relied upon to force REQuest and AVAILability signals to FALSE when
controlled to do so.
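The masking behaviour of the modified register in its operational mode amounts to ANDing each outgoing signal with the preloaded bit. A minimal sketch, assuming hypothetical names `masked_output` and `mask_cell` and a dictionary of signal names chosen purely for illustration:

```python
def masked_output(signal_in, preloaded_bit):
    """Operational mode (A=1, B=1): follow the input when the preloaded
    bit is 1, output a permanent FALSE when it is 0."""
    return bool(signal_in) and bool(preloaded_bit)


def mask_cell(outputs, control_circuit_ok):
    """Apply the same preloaded bit to every REQuest/AVAILability output
    of one cell, so a cell with a faulty control circuit is masked out."""
    bit = 1 if control_circuit_ok else 0
    return {name: masked_output(value, bit) for name, value in outputs.items()}
```

With the bit preloaded to 0, neighbouring cells see only FALSE REQuest and AVAILability signals from the masked cell, which by table 1 means they neither request it nor are influenced by it.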
Signature Comparator Test
A potential problem with processors which have on-board self-test is that even
if the self-test result shows the processor to be functional, there is still
the possibility that the signature comparator is at fault. This problem can
be overcome by testing the comparator from an external source using the scan
path registers described above. We assume that the signature of the self-test
is formed in a register of some kind and can be clocked serially into the
comparator. To test the comparator, we need to temporarily break the serial
connection to allow test patterns to be injected into the comparator. The
test pattern required will depend upon the structure of the comparator, but in
principle it is possible to carry out a full test which checks that a pass
indication is only delivered by the comparator when the correct signature is
applied. The results of the test can be monitored via the scan paths. The
comparator test could be carried out at the same time as the test on the
control circuitry, and as with the control circuit test, if a comparator is
found to be faulty, the outputs of the cell containing the comparator are
forced to zero before configuration takes place.
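The idea of a full comparator test can be sketched in software as an exhaustive check. This is only an illustration: the paper notes that the real test pattern depends on the comparator structure, whereas the sketch simply tries every input, which is practical only for short signatures. The names `comparator_is_sound`, `good_comparator` and `faulty_comparator`, the 4-bit signature and the modelled fault are all assumptions.

```python
def comparator_is_sound(comparator, expected, width):
    """Check that the comparator signals a pass for the expected signature
    and for no other pattern of the given bit width."""
    for pattern in range(2 ** width):
        if comparator(pattern) != (pattern == expected):
            return False
    return True


EXPECTED = 0b1011  # hypothetical 4-bit reference signature


def good_comparator(signature):
    return signature == EXPECTED


def faulty_comparator(signature):
    # Hypothetical fault: input bit 0 stuck at 1, so 0b1010 also passes.
    return (signature | 0b0001) == EXPECTED
```

A comparator that fails this check would be treated like a faulty control circuit, with the outputs of its cell forced to FALSE before configuration.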
[Figure 6. Duplicated scan paths (path 1, path 2) and multiplexer between adjacent columns.]
So far we have shown how to fully test the scan paths, the control circuits
and the comparator. However, the amount of circuitry required in the scan
paths is not insignificant and their yield may therefore be less than desired.
This can be improved by noting that the scan paths can be duplicated to
introduce fault tolerance into the paths. Instead of placing a single scan
path between each column of cells in the array, two scan paths are used, as
shown in figure 6. After the two scan paths have been externally tested, one
of the two scan paths can be selected for use by appropriately controlling a
column of multiplexers associated with each pair of scan paths.
EFFECT ON ARRAY YIELD
In order to assess the benefits of introducing the proposed testing strategy
into the array we need to consider its effect on the overall yield of the
array. This is not an easy task because of the difficulty of making realistic
estimates of yields of individual components. However, in order to make a
reasonable comparison we have made the following assumptions:
The probability that a single gate works is estimated as follows. We assume
that each processor has an independent probability of working of 0.65, and so
the probability that a single gate works is therefore $Y_g = 0.65^{1/10,000}$.
This figure can now be used when estimating the yields of control circuits, etc.
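As a worked step, and assuming (as the expression for the gate yield implies) that a 10,000-gate processor works only if every one of its gates works independently, the gate yield and the yield of any block of n gates follow directly. The symbols $Y_p$ and $Y(n)$ are introduced here for illustration only:

```latex
Y_p = Y_g^{10\,000} = 0.65
\quad\Longrightarrow\quad
Y_g = 0.65^{1/10\,000} \approx 0.999957,
\qquad
Y(n) = Y_g^{\,n} = 0.65^{\,n/10\,000}.
```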
In a 10 by 10 target array operating with 100% redundant elements, there are
200 cells. In the absence of any technique to tolerate faults in the control
circuit or comparator, the yield of the control/comparator array would be
$Y_g^{200 \times 80} \approx 0.5$. This means that however good the configuration algorithm [...]
The inclusion of the scan paths in the array alters this situation, since now
the control/comparator circuits do not all have to work. The yield of the
control circuits is now reflected in the processor yield which is slightly
reduced from 0.65 to about 0.648. However, all the scan paths must work if
the array is to have any chance of configuring correctly. The total scan path
circuitry in an array with 200 cells is about 8400 gates. The probability
that the whole array of scan paths is functional is therefore 0.7, which is
better than the figure of 0.5 for the control circuitry alone but not really
acceptable. A significant improvement in scan path yield can be achieved,
however, by placing two scan paths instead of one in between each column of
cells. The probability that a single column of scan path registers works is
about 0.965 and that of one column scan path out of two being functional is
0.998. The figure for each column of multiplexers required to select the
functional scan path is about 0.993. When all the columns are considered, the
probability of the array of scan paths working becomes 0.92, which is a great
improvement on the figure of 0.5 for the control circuitry alone.
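The figures quoted in this section can be reproduced, to the precision given, from the stated assumptions. The sketch below is a check of the arithmetic rather than the authors' own calculation: the variable and function names are hypothetical, the number of scan path columns (10) is inferred from 8400 gates in total at 840 gates per column, and the multiplexer column yield of 0.993 is taken directly from the text because its gate count is not stated.

```python
PROCESSOR_GATES = 10_000
PROCESSOR_YIELD = 0.65
CELLS = 200                   # 10 by 10 target array, 100% redundancy
CONTROL_GATES = 30            # per cell
COMPARATOR_GATES = 50         # per cell
SCAN_GATES_PER_COLUMN = 840
SCAN_COLUMNS = 10             # inferred: 8400 total gates / 840 per column
MUX_COLUMN_YIELD = 0.993      # quoted in the text; gate count not given


def block_yield(gates):
    """Yield of a block of independent gates: Y_g ** gates."""
    return PROCESSOR_YIELD ** (gates / PROCESSOR_GATES)


per_cell = CONTROL_GATES + COMPARATOR_GATES

# Every control circuit and comparator must work (no fault tolerance).
control_array_yield = block_yield(CELLS * per_cell)

# With masking, control/comparator faults simply lower the cell yield.
cell_yield = PROCESSOR_YIELD * block_yield(per_cell)

# A single scan path per column: all of them must work.
single_scan_yield = block_yield(SCAN_GATES_PER_COLUMN * SCAN_COLUMNS)

# Duplicated scan paths: at least one of two must work in each column,
# and the selecting multiplexer column must work as well.
scan_column_yield = block_yield(SCAN_GATES_PER_COLUMN)
one_of_two_yield = 1 - (1 - scan_column_yield) ** 2
dual_scan_yield = (one_of_two_yield * MUX_COLUMN_YIELD) ** SCAN_COLUMNS

print(f"control/comparator array: {control_array_yield:.3f}")
print(f"cell incl. control logic: {cell_yield:.3f}")
print(f"single scan path array:   {single_scan_yield:.3f}")
print(f"one scan column:          {scan_column_yield:.4f}")
print(f"one of two scan columns:  {one_of_two_yield:.4f}")
print(f"duplicated scan array:    {dual_scan_yield:.3f}")
```

The computed values agree with the text's rounded figures of 0.5, 0.648, 0.7, 0.965, 0.998 and 0.92 to within about one unit in the last quoted digit.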
Assumptions used in the yield estimates:
Processor Complexity         10,000 gates
Processor Yield              65%
Target Array Size            10 by 10
Redundancy Overhead          100%
Control Circuit Complexity   30 gates/cell
Comparator Complexity        50 gates/cell
Scan Path Complexity         840 gates/column
CONCLUSIONS
We have described a hierarchical approach to testing a wafer scale,
two-dimensional array which is to be configured using the WINNER algorithm.
The technique also includes the ability to tolerate faults in a very simple
manner at a number of levels within the circuit including the processors, the
self-test signature comparator, the control circuitry, and scan path
registers. We have shown that the use of the combination of fault tolerant
techniques achieves a much improved probability of an array being functional.
In addition, the user can be much more confident that his system is fully
functional than he could before, because he can now verify by simple tests
that each section of the configuration and test logic is functional.
REFERENCES
Chevalier G and Saucier G, 'A Programmable Switch for Fault Tolerant WSI
of Processor Arrays', Proc WSI Workshop, Southampton, UK, July 1985.
Evans R A, McCanny J V and Wood K, 'Wafer Scale Integration Based on
Self-Organisation', ibid.
Moore W R and Mahat R, 'Fault Tolerant Communications for WSI of a
Processor Array', Microelectronics and Reliability, 25, 2, pp291-294.
Raffel J I, 'The RVLSI Approach to WSI', Proc WSI Workshop, Southampton, UK,
July 1985.
