Self test and self repair strategies in VLSI architectures for high speed digital correlation by Blackley, William Sinclair
SELF TEST AND SELF REPAIR STRATEGIES 
IN VLSI ARCHITECTURES FOR HIGH SPEED 
DIGITAL CORRELATION 
William Sinclair Blackley 
A Thesis Submitted to the Faculty of Science, 
University of Edinburgh, for the degree of 
Doctor of Philosophy. 
Department of Electrical Engineering 
1985 
ABSTRACT
In this thesis, the concepts of self test and self 
repair are applied to a VLSI architecture for digital 
polarity correlation. A prototype correlator chip has 
successfully demonstrated the value of regular array 
architectures with built-in self test and self repair in 
the implementation of large area silicon systems. 
The polarity correlation function is implemented 
using an overloading integrating counter technique. This 
technique permits direct cascading of individual correla-
tor chips, without using additional components, to give 
complete flexibility in choice of correlator delay and 
resolution. Regularity, and a concerted strategy of design 
for testability in the chip's architecture, allow the 
correlator to perform self test and self repair in an 
economic and efficient manner. The built-in self test and 
self repair mechanisms automatically detect and eliminate 
failed channels in the VLSI circuit. 
A review of correlation techniques in VLSI, and the 
concepts of fault tolerance and yield enhancement are 
presented. The correlator has been fabricated on a five-
micron N-channel MOS process and results from the proto-
type chips are reported. 
DECLARATION OF ORIGINALITY 
This thesis, composed entirely by myself, reports on 
work conducted by myself in the Department of Electrical 













Photograph of Eu349 Digital Polarity Correlator Chip 
Table of Contents 
GLOSSARY ........................................... xi
CHAPTER 1: INTRODUCTION 	 1 
1.1. 	VLSI: Linking Design and Test .................1 
1.2. 	Layout of Thesis ................................3 
CHAPTER 2: 	CORRELATION THEORY AND TECHNIQUES 4 
2.1. Introduction 	.................................. 4 
2.2. Interpreting the Correlation Function ......... 6 
2.3. Historical 	Development 	........................ 12 
2.4. Correlation 	Principles 	......................... 13 
2.4.1. Random 	Data 	Concepts 	........................ 13 
2.4.2. Fundamental Estimation Errors 	............... 15 
2.4.3. Discrete Time 	Correlation 	................... 17 
2.5. Correlation 	Techniques 	........................ 19 
2.5.1. Quantisation of Input Data 	.................. 20
2.5.2. Direct Analogue Correlation 	................. 22 
2.5.3. Stieltjes 	Correlation 	....................... 23 
2.5.4. Relay 	Correlation 	............................ 24 
2.5.5. Multilevel 	Digital 	Correlation 	.............. 25 
2.5.6. Polarity-Coincidence Correlation 	............ 27 
2.5.7. Modified Correlators: 	Dither 	................ 28 
2.6. Polarity Correlation and the Overloading 
Integrating Counter 	Technique 	....................... 30 
2.6.1. Polarity Correlation 	 30 
2.6.2. 	Overloading Counter Technique ...............34 
2.7. 	Summary .......................................38 
CHAPTER 3: 	INTEGRATED CIRCUIT CORRELATORS .......... 40 
3.1. 	Introduction 	.................................. 40 
3.2. 	Correlation Architectures 	..................... 41 
3.2.1. 	Serial 	Architecture 	......................... 41 
3.2.2. 	Parallel 	Architecture 	....................... 43 
3.2.2.1. 	Parallel architecture with temporal 
integration and 	spatial delay 	....................... 43 
3.2.2.2. 	Parallel architecture with temporal 
integration and spatial delay using the overload- 
ing 	counter 	technique 	............................... 47 
3.2.2.3. 	Parallel architecture with spatial 
integration and temporal 	delay 	....................... 50 
3.2.2.4. 	Parallel architecture with spatial 
integration and temporal delay using pipe-organ 
structures 	.......................................... 56 
3.2.3. 	Serial Parallel Architectures (DELTIC) 58 
3.2.4. 	Systolic 	Architectures 	...................... 60 
3.2.4.1. 	Temporal 	Integration 	...................... 60 
3.2.4.2. 	Spatial 	Integration 	....................... 62 
3.3. 	Correlation Cube: The Difference Between 
Temporal 	and Spatial 	Integration 	.................... 67 
3.3.1. 	Correlator Architecture based on Spatial 
Integration 	......................................... 69 
3.3.2. 	Correlation Architecture based on Temporal 
Integration 	......................................... 72 
3.3.3. 	Display of 	Correlation Output 	............... 73 
3.4. 	Summary 	....................................... 74 
CHAPTER 4: VLSI DESIGN STRATEGIES FOR TESTABILITY 
AND 	FAULT 	TOLERANCE 	................................. 76 
4.1. Introduction 	.................................. 76 
4.2. Test Philosophies and The Motivation Behind 
Design 	for 	Testability 	............................... 78 
4.3. Design for Testability Methods 	................ 81 
4.3.1. Objectives 	.................................... 81 
4.3.2. Ad 	Hoc 	Methods 	.............................. 82 
4.3.3. Scan 	Methods 	................................ 83 
4.3.4. Built-In 	Self 	Test Methods 	.................. 87 
4.4. VLSI Design for Testability in the Eu349 
Correlator 	Chip 	..................................... 91 
4.5. Integrated Circuit Yield Statistics 	........... 94 
4.5.1. Scope 	....................................... 94
4.5.2. Yield Loss due to Gross Defects 	............. 96 
4.5.3. Yield Model for Random Defects 	.............. 98 
4.5.4. General Yield Model for VLSI Chips with No 
Redundancy 	.......................................... 102 
4.5.5. Yield Model for Chips with Redundancy ....... 104 
4.5.6. Cost 	of 	Redundancy 	.......................... 110 
4.6. Yield Enhancement Techniques 	.................. 112 
4.6.1. Scope 	....................................... 112 
4.6.2. Integrated Circuit Redundancy Schemes 	....... 113 
4.6.2.1. Bypass 	schemes 	............................ 114 
4.6.2.2. Nearest neighbour 	schemes 	................. 116 
4.6.2.3. Chaining 	schemes 	.......................... 116 
4.6.3. Comparison of Redundancy Schemes 	............ 117 
4.7. Yield Enhancement Features in the Eu349 
Correlator Chip 	..................................... 119 
4.8. Summary 	....................................... 121
CHAPTER 5: DESIGN AND TEST OF THE PROTOTYPE 
INTEGRATED CIRCUIT ..................................123 
5.1. Introduction 	.................................. 123 
5.2. Architecture of the Basic Polarity Correla- 
tor................................................. 123 
5.3. Architecture of the Correlator with Built-In 
Self 	Test 	and 	Self 	Repair 	Features 	.................. 127 
5.4. Design of 	the Eu349 	Correlator 	................ 131 
5.4.1. System 	Overview 	............................. 131 
5.4.2. Correlator Array 	Design 	..................... 137 
5.4.3. Peripheral 	Circuit Design 	................... 144 
5.5. Test 	Strategy 	................................. 144 
5.5.1. Initial 	Test 	................................ 144 
5.5.2. Self 	test 	................................... 145 
5.5.3. Self 	repair 	................................. 146 
5.5.4. Run 	......................................... 146
5.6. Test System Configuration and Results 	......... 147 
5.6.1. Test 	Configuration 	.......................... 147 
5.6.2. Test 	Results 	................................ 147 
5.6.3. Yield 	Enhancement 	........................... 159 
5.7. Summary 	....................................... 161 
CHAPTER 6: 	CONCLUSIONS .............................162 
6.1. 	Summary of Work ...............................162 
6.2. 	Further Work ..................................164 
ACKNOWLEDGEMENTS ................................... 167
APPENDIX 1. EU349 CORRELATOR DESIGN 	 168 
A1.1. 	Introduction .................................168 
A1.2. 	Silicon Design ...............................169 
A1.3. 	Peripheral Circuitry Design ..................172 
A1.4. 	Power Supply Considerations ..................180 
APPENDIX 2. EU349 	TEST 	SCHEDULE 	.................... 181 
A2.1. Introduction 	.................................... 181 
A2.2. Test MCR 	as 	Shift 	Register 	.................... 181 
A2.3. Test OSR and DSR as Shift Registers 	.......... 182 
A2.4. Test MCR Effect on both OSP and DSP 	.......... 182 
A2.5. Test SET and CLEAR Features of DSP 	........... 183 
A2.6. Test Latches and MCR Parallel Load 	........... 183 
A2.7. Test Latches and OSR Parallel Load 	........... 184 
A2.8. Self Test 	Sequence 	........................... 185 
A2.9. Self Repair 	Sequence 	......................... 187 
A2.10. Correlation Test 	Sequence 	................... 187 
APPENDIX 3. EU349 TEST CONFIGURATION ...............190 
A3.1. 	Introduction .................................190 
A3.2. 	Prototype Test Configuration .................190 
A3.3. 	DAS Data Probes ..............................191 
A3.3.1. 	91A32 Data Acquisition Module ..............192 
A3.3.2. 	91P16 Pattern Generator Module .............192 
A3.3.3. 	91P32 Pattern Generator Module .............193 
A3.4. Channel Specification 	........................
A3.5. Timing 	Diagram 	............................... 
A3.6. Trigger Specification 	........................ 
A3.7. Pattern Generator 	- 	Timing 	................... 
A3.8. Pattern Generator Instruction Codes 	.......... 
A3.9. Pattern Generator 	- 	Program 	.................. 









GLOSSARY 
Symbol 	 Description 
α Clustering parameter
A Defect susceptible chip area 
A0 Chip area without redundancy 
AE Chip area with redundancy 
Am Module area 
B Bandwidth
BILBO Built-In Logic Block Observer 
BIST Built-In Self Test 
CCD Charge Coupled Device 
CLK Clock
CMOS Complementary Metal-Oxide-Semiconductor 
CRC Cyclic Redundancy Check 
Δt Sample interval
D Average defect density 
DAS Digital Analysis System 
DELTIC Delay Line Time Compressor Correlator 
DIL Dual-In-Line 
DSR Data Shift Register 
DUT Device Under Test 
ECL Emitter Coupled Logic 
EXNOR Exclusive-NOR function 
EXOR Exclusive-OR function 
Coincidence function 
FM Figure of merit 
F{x(t)} Fourier transform of x(t) 
Gyx Cross power spectrum of y(t) and x(t)
GND Ground 
Isat Value of drain current at saturation
I2L Integrated Injection Logic
k Sequence index variable 
k' Process gain factor
λ Average number of faults (λ=AD)
L MOS transistor gate length 
LFSR Linear Feedback Shift Register 
LSB Least Significant Bit 
LSSD Level Sensitive Scan Design 
m Number of input sample pairs
MCR Multiplexer Control Register 
MISR Multiple Input Signature Register 
MOS Metal-Oxide-Semiconductor 
MUX Multiplexer 
N Capacity of each integrating counter 
NMOS N-channel Metal-Oxide-Semiconductor 
OD Overload detect 
OSR Overload Shift Register 
OVRFLO Integrating counter overload signal 
p1 Clock phase 1
p2 Clock phase 2 
PCM Pulse Code Modulation 
PMOS P-channel Metal-Oxide-Semiconductor
PRBS Pseudo-Random Binary Sequence 
q Quantisation interval 
q(τ) Contents of integrating counter
rdyx(τ) Direct digital correlation function
rnxx(τ) Normalised autocorrelation function of x(t)
rpyx(τ) Polarity correlation function
rryx(τ) Relay correlation function
rsyx(τ) Stieltjes correlation function
rxx(τ) Autocorrelation function of x(t)
rxxx(τ1, τ2) Triple correlation function
ryx(τ) Crosscorrelation function of y(t) and x(t)
RMS Root Mean Square 
σx² Mean square value of x
σ Standard deviation
Sx Spectrum of x
SAW Surface Acoustic Wave 
SGN Signum, i.e. polarity 
τ Time delay
TB Time-bandwidth product 
Vds MOS transistor drain to source voltage 
Vgs MOS transistor gate to source voltage
Vth MOS transistor threshold voltage
VBB MOS back bias voltage supply 
VDD MOS transistor drain voltage supply 
VIN Input voltage 
VLF Very Low Frequency 
VLSI Very Large Scale Integration 
VSS MOS transistor source voltage supply 
W MOS transistor gate width 
W/L MOS gate aspect ratio 
x+ Quantised version of x 
Y Preassembly yield 
YCRD Correctable random defect yield
Enhanced yield 
Yield due to gross defects 
Y  Yield due to random defects 
YUNC Uncorrectable random defect yield
Yeff Effective yield
Module yield 
CHAPTER 1 
INTRODUCTION 
1.1. VLSI: Linking Design and Test 
The maturing of silicon integrated circuit technology
from large scale to very large scale integration has
improved performance, reduced costs and opened new systems
applications. However, one important facet of integrated
circuit technology lags dangerously behind the complexity
potential of VLSI: establishing the integrity of the VLSI
design in terms of initial design validation, manufacturing
quality, and fault tolerance [1].
This thesis addresses the need to embody a testabil-
ity scheme within the VLSI integrated circuit itself. It 
presents details of a digital polarity correlator archi-
tecture with built-in self test and self-repair mechan-
isms. Results obtained from a prototype integrated cir-
cuit chip fabricated in five-micron enhancement/depletion 
N-channel MOS technology demonstrate the concept. 
Correlation techniques are widely used in communica-
tions, instrumentation, telemetry, sonar, radar, and in 
medical diagnosis. Important correlation properties 
include the ability to detect a desired signal in the 
presence of noise or other signals, to recognise specific 
patterns, and to determine time delays through various 
media. Electronic systems for computation of the correla-
tion function have been available for many years, but they 
have been large and inefficient. With the development of 
VLSI, correlation can be performed efficiently and with 
fewer components. 
The integrated circuit described here offers a
digital implementation of the polarity correlation function
using an overloading integrating counter technique
[2]. The VLSI architecture offers high speed operation,
long (programmable) integration time, and an arbitrarily 
long correlation time delay. The mathematical theory of 
correlation analysis, including the effects of finite sam-
pling and quantisation is presented as a prerequisite to 
deriving the overloading integrating counter technique. 
From this theoretical base the architecture of the corre-
lator chip is described. 
The architecture consists of a linear cascade of 
identical correlation elements. The performance of the 
correlator depends on the serial connection of correctly 
functioning correlation elements. To optimise the perfor-
mance and gain full advantage of the VLSI architecture, a 
design philosophy was adopted which includes design for 
testability, self test, and fault tolerance. 
The question "why design for testability?" 	is 
answered by discussing some existing test philosophies. 
Various approaches exist, and each has its specific appli-
cations, but there is no general agreement on how to 
design for testability. The thesis examines the "ad hoc" 
testability approach, which consists of circuit partition-
ing and added test points. This is contrasted with the 
"structured" testability approach, where the test problem 
is solved at a much lower design level. The object of a 
structured approach is to reduce the sequential complexity 
of a logic network and thus aid test generation and verif-
ication. 
Built-in test and self test techniques are also dis-
cussed. 	Built-in test techniques, 	when used in 
conjunction with redundant circuitry and reconfiguration 
techniques in VLSI, provide the basis of self repairing 
systems. The ease with which built-in self test and self 
repair techniques have been employed in the VLSI architec-
ture to be described here, is demonstrated by the very low 
overhead required in silicon area. 
1.2. Layout of Thesis 
In Chapter 2, a concise background and theory of 
correlation is presented. The effect of finite averaging 
time, discrete sampling, and quantisation of input data 
are discussed. Quantisation of the input data is used to 
link direct correlation to relay correlation, and to 
polarity correlation. The overloading integrating counter 
technique is then derived for the polarity correlator. 
A review of silicon correlators is presented in 
Chapter 3. In this chapter a comparison is made between 
the various published architectures (including the one 
expounded by this thesis), that have been realised as sil-
icon integrated circuits. In Chapter 4, the concept of 
design for testability is discussed, and the subject of 
integrated circuit yield statistics is introduced. Cir-
cuit redundancy is discussed as a method for achieving 
yield enhancement and fault tolerance. 
The integrated circuit design is described in Chapter 
5. The architecture of the basic correlator is shown 
modified to allow built-in self test and self repair. The 
performance of the prototype chip and the experimental 
results of the self repair concept are also presented in 
Chapter 5. 
Chapter 6 summarises and highlights observations from 
the work. In addition, areas of special interest that may 
be considered for further investigation are identified. 
CHAPTER 2 
CORRELATION THEORY AND TECHNIQUES 
2.1. Introduction 
Correlation analysis is of great interest to 
engineers and scientists. A wide range of engineering 
applications of random data analysis centres around the 
determination of linear relationships between two or more 
sets of data. These linear relationships may be extracted 
in terms of a correlation function [3,4]. Correlation 
techniques are widely used in communications [5], sonar 
[6], radar [7,8,9], and medicine [10,11,12], where they 
are used to detect known signals in the presence of noise 
or other signals [13,14]. They have application in many 
areas such as spectral estimation [15], time response
measurements of linear systems [16,17,18,19], pattern 
recognition [20,21,22,23,24], and time delay estimation 
[25,26,27] including flow measurement [28,29,30,31,32,33]. 
The bandwidths of the signals to be correlated vary from
several Hz in seismology and very low frequency (VLF)
radio wave studies [34], to several MHz in photon
spectroscopy [35,36], radio astronomy [37,38], or plasma
physics experiments [39,40,41], for example. Other fields in
which correlators are useful include flaw detection and 
system health monitoring [42,43]. 
This chapter deals with the historical development of 
correlator systems, and the mathematical theory of corre-
lation analysis. A brief summary of random data concepts 
is included, on account of the statistical nature of 
correlations. The concept of the ideal correlation 
coefficient, which is computed over an infinite number of 
data sets, is related to the correlation function, which 
is computed over a single data set for a finite length of 
time. This relationship is crucial to the physical reali-
sation of a correlation system. 
A correlator consists of three basic elements: a 
delaying device, a multiplier, and an averager or integra-
tor as shown in Figure 2.1. 
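[Block diagram: Signal 2 passes through a DELAY; Signal 1
and the delayed Signal 2 are multiplied (X), and the product
feeds an INTEGRATOR, which drives the Output.]
Figure 2.1. Basic elements of a correlator.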
Direct implementation of the correlation function imposes 
a large processing cost. Consequently, considerable 
effort has been expended to devise approximations that 
will reduce the cost involved. Significant reductions are 
achieved when signals are converted to the sampled-data 
form, and the analogue integration process is replaced by 
one of summation. Further reductions follow when quan-
tised signal representations are used. This chapter 
discusses the various forms of correlator which arise from 
the use of quantisation, of a varied degree, and the use 
of "dither" signals. In addition, the inevitable processing
errors, which result from the necessary approximations
to the ideal correlation coefficient, are examined. 
Firstly however, the interpretation of the correlation 
function shall be studied. 
2.2. Interpreting the Correlation Function 
Correlation functions may be divided into two
categories: autocorrelation and crosscorrelation. The
autocorrelation function $r_{xx}(\tau)$ of the time function
x(t) is defined as
$$ r_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t-\tau)\, dt \tag{2.1} $$
where $\tau$ is a continuous time delay parameter. Autocorrelation
represents a comparison of an input signal with a
time delayed replica of itself. The autocorrelation function
can yield useful information about the signal x(t).
For example, the value of the autocorrelation function at
zero delay is simply the mean square value $\sigma_x^2$ of the
signal, that is,
$$ r_{xx}(0) = \sigma_x^2 \tag{2.2} $$
In addition, if the signal contains periodic components,
then the resulting autocorrelation function will also
exhibit periodic components. This feature is useful in
recovering periodic signals buried in noise or other
interference [13]. Other special properties of the
autocorrelation function are
$$ |r_{xx}(\tau)| \le r_{xx}(0) \tag{2.3} $$
and
$$ r_{xx}(\tau) = r_{xx}(-\tau) \tag{2.4} $$
A typical autocorrelation function is illustrated in Fig-
ure 2.2. 
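The properties in Equations 2.2 and 2.3, and the persistence of periodic components, can be checked numerically. The following is a minimal sketch only, assuming NumPy and a finite sampled record; the function name `autocorr`, the test signal, and all parameter values are hypothetical and are not taken from the thesis.

```python
import numpy as np

def autocorr(x, max_lag):
    """Time-averaged autocorrelation estimate for lags 0..max_lag.

    Assumption: the record is finite, so each lag is averaged over the
    samples that overlap rather than over an infinite interval.
    """
    n = len(x)
    return np.array([np.mean(x[k:] * x[:n - k]) for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
t = np.arange(4000)
# Hypothetical test signal: a sinusoid (period 50 samples) buried in noise.
x = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 1.0, t.size)

r = autocorr(x, 100)

print("r(0)            :", r[0])
print("mean square of x:", np.mean(x ** 2))
print("largest |r(k)|  :", np.max(np.abs(r)))
print("r at one period :", r[50], " r at half a period:", r[25])
```

The zero-lag value reproduces the mean square of the record, and the estimate at one signal period remains clearly positive even though the sinusoid is not visible in the raw samples.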
Figure 2.2. Autocorrelation function of a zero-mean sig-
nal. 
The correlation between two signals x(t) and y(t) is
given by the crosscorrelation function
$$ r_{yx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T y(t)\, x(t-\tau)\, dt \tag{2.5} $$
where $r_{yx}$ is simply the averaged product of y lagged with
respect to x. When the value of the crosscorrelation is
high for some value of lag $\tau$, it can be said that x and y
are similar, in some sense, at this lag value. Some special
properties are
$$ |r_{yx}(\tau)| \le [r_{xx}(0)\, r_{yy}(0)]^{1/2} \tag{2.6} $$
or
$$ |r_{yx}(\tau)| \le \sigma_x \sigma_y $$
where $\sigma_y^2 = r_{yy}(0)$, i.e. the mean square value of y, and
$$ r_{yx}(\tau) \ne r_{yx}(-\tau) \quad \text{but} \quad r_{yx}(\tau) = r_{xy}(-\tau) \tag{2.7} $$
The most straightforward interpretation of the 
crosscorrelation function is in the context of time delay 
estimation [44,45,46,47]. Consider the propagation path
shown in Figure 2.3.







Figure 2.3. Non-dispersive propagation path. 
In this example the signal, represented by x(t), propagates
through the nondispersive, linear path and combines
with statistically independent noise n(t), to produce
the output response y(t). Assuming, for simplicity,
that the frequency response function of the propagation
path is a constant H(f) = K, that the propagation distance
is d, and that the propagation velocity is c, it follows
that [4]
$$ y(t) = K\, x(t - d/c) + n(t) \tag{2.8} $$
The crosscorrelation function between x(t) and y(t) is
then
$$ r_{yx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T [K\, x(t - d/c) + n(t)]\, x(t-\tau)\, dt = K\, r_{xx}(\tau - d/c) \tag{2.9} $$
The noise term makes no contribution because n(t) is
statistically independent of x(t) and, assuming zero-mean
signals, the average of their product vanishes.
So, in this simple example, the crosscorrelation function
is given by the autocorrelation function of x(t) multiplied
by K and displaced in time to have a peak at $\tau_1 = d/c$.
Thus, the crosscorrelation function can be used to determine
either the distance d, the velocity c, or the time
delay $\tau_1$ of the propagation path. In realistic situations,
such as in flow metering, the model is less
straightforward. Turbulence in the flow causes the
crosscorrelation to become asymmetrical about its maximum
and adopt a skewed form [48,49], as shown in Figure 2.4.
[Diagram labels: FLOW, L, CORRELATOR, ryx]
Figure 2.4. Skewing of crosscorrelation functions due to 
the effects of flow turbulence. 
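The time delay estimate described for the non-dispersive path can be sketched numerically. This is an illustrative example only, assuming NumPy, white Gaussian test signals, and a delay expressed as a whole number of sample intervals; the names `crosscorr`, `true_delay`, and `gain`, and their values, are hypothetical.

```python
import numpy as np

def crosscorr(y, x, max_lag):
    """Estimate r_yx(k) as the mean of y[n] * x[n - k] for k = 0..max_lag."""
    n = len(x)
    return np.array([np.mean(y[k:] * x[:n - k]) for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n_samples = 5000
true_delay = 37   # d/c expressed in sample intervals (hypothetical value)
gain = 0.8        # K, the constant path response (hypothetical value)

# Source signal x and the received signal y = K * x(t - d/c) + n(t).
x = rng.normal(0.0, 1.0, n_samples)
y = gain * np.concatenate([np.zeros(true_delay), x[:-true_delay]])
y += rng.normal(0.0, 1.0, n_samples)   # statistically independent noise

r_yx = crosscorr(y, x, 100)
estimated_delay = int(np.argmax(r_yx))

print("estimated delay (samples):", estimated_delay)
print("peak value of r_yx       :", r_yx[estimated_delay])
```

The position of the peak gives the delay, and for a unit-variance source the height of the peak approximates the path gain K. A turbulent flow would broaden and skew this peak, which the sketch does not model.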
Normalised correlation functions are defined by the
following expressions. Firstly, for autocorrelation,
$$ r_{nxx}(\tau) = \frac{r_{xx}(\tau)}{r_{xx}(0)}, \qquad -1 \le r_{nxx}(\tau) \le 1 \tag{2.10} $$
and secondly, for crosscorrelation,
$$ r_{nyx}(\tau) = \frac{r_{yx}(\tau)}{[r_{xx}(0)\, r_{yy}(0)]^{1/2}}, \qquad -1 \le r_{nyx}(\tau) \le 1 \tag{2.11} $$
where $r_{nxx}$ and $r_{nyx}$ are the normalised correlation
functions, and $r_{yy}(0)$ and $r_{xx}(0)$ are the mean square
values of the signals y and x respectively.
Normalisation of the function makes interpretation
clear when $r_{nxx}$ or $r_{nyx}$ equals ±1.0 or zero. However, when
the result is less than 1.0 but greater than zero, the
significance is less clear. To help with interpretation
some associated functions are introduced. For the purposes
of this thesis they are mentioned only briefly; a
more detailed account is given by Roth [4]. Firstly, the
correlation integral is closely related to the convolution
integral, the only significant difference being the time
reversal operation required by the convolution integral.
For example,
$$ T\, r_{yx}(\tau) = y(t) * x(-t) \tag{2.12} $$
where the star indicates the convolution of the two time
functions. Another useful function is the cross-power
spectrum, $G_{yx}$, which is the Fourier transform of the
crosscorrelation function,
$$ G_{yx} = F\{r_{yx}\} \tag{2.13} $$
In addition, the cross-power spectrum may be obtained from
the linear spectra, thus:
$$ G_{yx} = S_y\, S_x^{*} \tag{2.14} $$
where $S_y = F\{y(t)\}$, $S_x = F\{x(t)\}$, and $S^{*}$ indicates the complex
conjugate of S. In addition, there are triple correlation
functions, which are defined as the average product of the
input signal at three instants in time, i.e. two time
lags. Thus the triple correlation $r_{xxx}(\tau_1, \tau_2)$ is given by
$$ r_{xxx}(\tau_1, \tau_2) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t-\tau_1)\, x(t-\tau_2)\, dt \tag{2.15} $$
A study of triple correlation is presented by Lohmann and 
Wirnitzer [50], and a correlator which can produce triple 
correlations has been reported by Corti et al [36]. In 
combination, therefore, these functions provide comprehen-
sive and powerful tools for measurement and analysis. 
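The link between the crosscorrelation function and the cross-power spectrum can be verified on sampled data. The sketch below is illustrative only and assumes NumPy, real-valued signals, and a circular (periodic) correlation sum so that the discrete Fourier transform identity holds exactly for a finite record; the overall 1/T scaling is omitted, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256
x = rng.normal(size=n)
# y is x circularly delayed by 5 samples, plus a little independent noise.
y = np.roll(x, 5) + 0.1 * rng.normal(size=n)

# Circular crosscorrelation sum: c[k] = sum over n of y[n] * x[(n - k) mod N].
c = np.array([np.sum(y * np.roll(x, k)) for k in range(n)])

# Cross-power spectrum obtained in two ways.
g_from_correlation = np.fft.fft(c)                        # transform of the correlation
g_from_spectra = np.fft.fft(y) * np.conj(np.fft.fft(x))   # S_y times conjugate of S_x

print("spectra agree    :", np.allclose(g_from_correlation, g_from_spectra, atol=1e-6))
print("correlation peak :", int(np.argmax(c)))
```

Taking the inverse transform of `g_from_spectra` recovers the same correlation sequence, which is the basis of transform-domain correlators.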
2.3. Historical Development 
In the 1950s, computation of the correlation function
was performed using a variable delay line, a single
multiplier, and a single integrator. The delay line had
to be non-dispersive over the frequency range of interest.
Various early methods using magnetic tape loops are
reviewed by Cheney [51] and Lange [52]. In 1952 Brooks
and Smith proposed a general purpose analogue computer for
correlation functions [53], in which the delay parameter was
provided by staggered magnetic tape inputs. In the following
year Bennett used a tapped delay line to replace the
staggered tape inputs [54]. A complete integration period was
required by these early systems for each successive value
of the time delay $\tau$, and although they produced accurate
estimates of the correlation function, their computation
time was too long for many applications.
A relentless demand for ever greater computational 
speed prompted the development of digital correlators. By 
quantising the input signals into two levels, the tasks of 
multiplication and integration become simple arithmetic 
procedures. Quantisation causes an increase in the vari-
ance of the output, but as we shall see later, the effect 
can be reduced by integrating over a longer time. 
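The simplification brought by two-level quantisation can be illustrated as follows. This is a sketch under stated assumptions, not the circuit described in this thesis: it assumes NumPy, Gaussian test signals, and a single delay value, and shows only that averaging the product of the signs is the same as counting coincidences with an exclusive-NOR function. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
x = rng.normal(size=n)
y = 0.7 * x + 0.3 * rng.normal(size=n)   # a signal correlated with x

# Two-level (polarity) quantisation: keep only the sign, as one bit.
xb = x >= 0
yb = y >= 0

# Multiplying the +/-1 polarities and averaging ...
sx = np.where(xb, 1, -1)
sy = np.where(yb, 1, -1)
product_average = np.mean(sx * sy)

# ... is equivalent to counting coincidences (EXNOR) in a counter.
coincidences = np.count_nonzero(~(xb ^ yb))
counter_estimate = (2 * coincidences - n) / n

print("average of sign products  :", product_average)
print("estimate from coincidences:", counter_estimate)
```

Because each product is +1 for a coincidence and -1 otherwise, the sum of products equals twice the coincidence count minus the number of samples, so a gate and a counter replace the multiplier and integrator.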
With the development of Large Scale Integration 
(LSI), it became more economically feasible to make tapped 
delay lines, and arrays of multipliers and integrators. 
This heralded an era of parallel processing where values 
of the correlation function, for many values of the time 
delay parameter $\tau$, are computed simultaneously. This
improvement in processing power, made possible by LSI and 
VLSI, is essential for real time measurement and control 
applications. However, in each area of application of 
correlation analysis, there exists a need to compromise 
between sampling rate, the level of input quantisation,
and the length of the integration period. 
The remaining sections of this chapter deal with the 
fundamentals of correlation theory. The effects of sam-
pling rate, input quantisation, and observation period are 
examined. To introduce the theory, a short summary of 
random data concepts is presented. 
2.4. Correlation Principles 
2.4.1. Random Data Concepts 
Physical phenomena of interest in engineering are 
usually described in terms of amplitude versus time func-
tions, known as "time history records". Many of these 
phenomena are "non-deterministic", or "random"; that is, 
each measurement produces a unique time history record 
which is not likely to be repeated, and cannot accurately 
be predicted. 
In the case where the measurements of a physical
phenomenon are considered random, the resulting time
history record represents only one instance of what might
have happened. To gain a fuller understanding of the
phenomenon one must consider the set of all possible time
history records that could have occurred. For example, a
set of time history records $x_i(t)$, i = 1,2,3,..., is
illustrated in Figure 2.5.
[Plot: time history records up to xn(t) against time t, with
the instants t1 - τ and t1 marked]
Figure 2.5. Ensemble of time history records defining a
random process {x(t)}.
This is referred to as the "ensemble" that defines the
random process {x(t)}. Given an ensemble of time history
records, the average properties can be computed at any
specific times $t_1$ and $t_1 - \tau$, that is, the autocorrelation
function at time delay $\tau$ is given by
$$ r_{xx}(\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i(t_1)\, x_i(t_1 - \tau) \tag{2.16} $$
In the general case, where one or more of the average 
values vary with time, the process is said to be
"non-stationary". In the special case, where the average
values do not vary with time, the
process is said to be "stationary". For almost all
stationary data the average values computed over the ensemble
at time t 1 will equal the corresponding average values 
computed over all time from any single time history 
record. Thus the autocorrelation function may be written 
as 
$$ r_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t-\tau)\, dt \tag{2.17} $$
where x(t) is any arbitrary record from the ensemble 
{x(t)}. The justification for the interchange of time and 
ensemble averaging is given by the ergodic hypothesis 
[55,56]. 
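The interchange of ensemble and time averaging can be illustrated with a simple stationary process. The sketch below is an assumption-laden example, not taken from the thesis: it uses NumPy and an ensemble of unit-amplitude sinusoids with uniformly random phase, for which both averages should approach half the cosine of the phase shift over the delay. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
period = 40                      # samples per cycle (hypothetical)
omega = 2 * np.pi / period
lag = 7                          # time delay in samples (hypothetical)
t1 = 500                         # the specific time t1 (hypothetical)

# Ensemble average (as in Equation 2.16): many records, one time instant.
n_records = 20000
phases = rng.uniform(0.0, 2 * np.pi, n_records)
ensemble_avg = np.mean(np.sin(omega * t1 + phases)
                       * np.sin(omega * (t1 - lag) + phases))

# Time average (as in Equation 2.17): one record, many time instants.
t = np.arange(lag, lag + 200 * period)
phase = phases[0]
time_avg = np.mean(np.sin(omega * t + phase) * np.sin(omega * (t - lag) + phase))

# Analytical value for a unit-amplitude random-phase sinusoid.
expected = 0.5 * np.cos(omega * lag)

print("ensemble average:", ensemble_avg)
print("time average    :", time_avg)
print("analytical value:", expected)
```

The two estimates agree to within the random error of the finite ensemble, which is the practical content of the ergodic hypothesis for this process.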
2.4.2. Fundamental Estimation Errors 
In practice the number of data records available for 
analysis by ensemble averaging techniques, or the length 
of a data record used for analysis by time averaging tech-
niques, will always be finite. Therefore, the average 
properties of the data can only be estimated and never 
computed exactly. As a result, certain errors arise. 
These are in addition to numerous potential errors from 
other sources, such as errors that might arise, for exam-
ple, from input transducers, signal pre-processing, and 
analogue to digital conversion (quantisation). 
The estimation errors can be divided into two 
classes: bias error and random error. Bias error is a 
systematic error that will appear with the same magnitude 
and in the same direction from one analysis to the next. 
Random error is a haphazard scatter in the results from 
one analysis to the next, of different samples from the 
same random data. It is a direct result of averaging over 
a finite number of time history records, or over a single 
record of finite length; it is therefore present in all 
analyses. 
Random error is defined by the standard deviation of 
the estimate about its expected value, and it is often 
normalised to the parameter being estimated [3]. The nor-
malised value is inversely proportional to the square root 
of the number of records N, or record length T. Hence to 
reduce the random error to half its value, the number of 
records, or the integration time, must be increased by a 
factor of four. 
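The inverse square root dependence of the random error on record length can be demonstrated by simulation. This is a minimal sketch, assuming NumPy and independent Gaussian samples, and using the mean square value (the autocorrelation at zero delay) as the estimated parameter; the function name `random_error` and the record lengths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_error(record_length, trials=2000):
    """Standard deviation of the mean-square estimate over many trials."""
    records = rng.normal(0.0, 1.0, size=(trials, record_length))
    estimates = np.mean(records ** 2, axis=1)
    return np.std(estimates)

err_n = random_error(250)     # record of N samples
err_4n = random_error(1000)   # record of 4N samples
ratio = err_n / err_4n

print("random error, N samples :", err_n)
print("random error, 4N samples:", err_4n)
print("ratio                   :", ratio)
```

Quadrupling the record length should bring the ratio close to two, within the scatter expected from a finite number of trials.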
Also, for time averaged estimates, the normalised 
random error is inversely proportional to the square root 
of the data bandwidth B. This means that for those appli-
cations where the data bandwidth is very wide, as often 
occurs in communications, a relatively short record might 
provide highly accurate estimates. In contrast, for those 
applications where the data bandwidth is typically narrow, 
as occurs in studies of ocean movements or atmospheric 
turbulence, very long records may be required to obtain 
acceptably accurate results. 
In the next section the discussion is extended to 
include the correlation function of discrete time sampled 
data, and then to discrete time quantised data. 
2.4.3. Discrete Time Correlation 
In the case of sampled inputs, the process of 
integration is replaced by one of summation, and Equations 
2.1 and 2.5 may be rewritten as 
1 N-i r' 	(kt) = 	E x(nt).x(nt-kt) 	2.18 
	
XX 	 n=O 
1 N-I r' (k 1 t) = N E y(nt).x(nt-kt) 	2.19yx 	 n=O 
where k and n are integers. The notation $r'(k\Delta t)$
represents an approximation to the defined correlation
function, but for convenience the approximation will be
written simply as $r(k\Delta t)$. The analogue signals are
normally sampled at equally spaced time intervals, $\Delta t$,
with delay calculated at $k\Delta t$ intervals, where k is an
integer from 0 to K-1 and K is the number of correlation
points. In practice the maximum number of samples N
(corresponding to the maximum integration time $N\Delta t$)
is finite.
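The summation in Equation 2.19 can be sketched directly in software. The function below is an illustrative assumption rather than anything taken from the thesis: in particular, samples of x before the start of the record are treated as absent, so the sum for lag k starts at n = k while still being divided by N.

```python
def cross_correlation(y, x, num_lags):
    """Estimate r_yx(k) = (1/N) * sum_n y[n] * x[n - k] for k = 0 .. num_lags-1.

    Assumption: terms with n - k < 0 are omitted (x is unknown before
    the start of the record), but the divisor remains N.
    """
    n_samples = len(y)
    result = []
    for k in range(num_lags):
        acc = 0.0
        for n in range(k, n_samples):
            acc += y[n] * x[n - k]
        result.append(acc / n_samples)
    return result

# y is x delayed by two sample periods, so the estimate peaks at k = 2.
x = [1, -1, 1, 1, -1, -1, 1, -1]
y = [0, 0] + x[:-2]
r = cross_correlation(y, x, num_lags=4)
peak_lag = r.index(max(r))
print(r, peak_lag)
```

Each of the K correlation points costs N multiply-accumulate operations, which is the processing cost that the quantised correlators of Section 2.5 set out to reduce.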
The sampling period $\Delta t$ is related to the signal
bandwidth B and the number of correlation points required
to define the peak of the function. If we assume that we
require p points within the peak region to define the peak
position adequately, then the sampling period $\Delta t$ is
given by [57]
$$\Delta t = \frac{1}{B(p+1)} \tag{2.20}$$
In certain applications, such as flow rate measure-
ment or time delay estimation, a reduction can be achieved 
in the number of delay increments required to implement 
the function. At minimum flow velocities (that is, max-
imum time delays), the number of points computed to define 
the peak far exceeds the number required to determine the 
position of the peak accurately. The amount of redundant 
information in these situations can be reduced by increas-
ing the time delay increment at longer correlated time 
delays [58,16]. Alternatively, a variable sampling rate, 
which is derived from the flow velocity, may be used, 
although this approach is unreliable when there is a step 
change in the flow velocity [59]. 
As we have already seen, there are several sources of 
estimation error. Finite averaging time, finite 
bandwidth, noise, waveform sampling, and waveform quanti-
sation all contribute to the variance of the result. 
Expressions relating the variance to the averaging time, 
bandwidth, and mean square signal to noise ratios have 
been derived theoretically and confirmed experimentally by 
several authors [60,61,16]. Sampling and quantisation 
both introduce noise and, in addition, sampling can limit 
bandwidth. 
Intuitively, one would expect the accuracy of the 
correlation estimate to increase with increasing sampling 
rate. However, Kay [61] has shown that, for long averag-
ing times (time bandwidth product greater than 25), the 
variance does not significantly reduce as the sampling 
rate increases. For short averaging times (time bandwidth 
product less than 25), the variance does reduce when the 
analogue waveform is sampled faster than the Nyquist rate. 
Sampling at twice the Nyquist rate appears to be a good 
compromise between the desires of minimising the mean 
square error and of maintaining a low sampling rate. This 
result concurs with the earlier analysis of Bowers et al 
[62]. 
2.5. Correlation Techniques 
Direct implementation of the correlation function 
imposes a large processing cost. Considerable effort has 
been expended to devise approximations that will reduce 
this cost. Notable reductions are achieved when signals 
are converted to sampled-data form, and the analogue 
integration process is replaced by one of summation. 
Further reductions follow when quantised signal represen-
tations are used. This section is concerned with the 
classes of correlator which arise from the use of quanti-
sation and dither. In Chapter 3, where integrated circuit 
correlators are reviewed, another facet of correlator 
implementation is introduced: that is, computation using 
parallel techniques, serial techniques, or a combination 
of the two. 
Four basic types of correlator, resulting from the 
use of quantisation and dither, can be defined as follows: 
(1) Direct correlators, where both inputs are analogue.
(2) Stieltjes correlators [63,64,65], where one input is
quantised and the other is analogue. The relay
correlator is a limiting case, where the digital
channel is quantised to just two levels, +1 and -1.
(3) Digital correlators, where both inputs are quantised.
The polarity correlator is the limiting case of this
class of correlator, where both input signals are
quantised to two levels, +1 and -1, before correlation.
(4) Modified correlators, where a dither signal is added
to the digital input or inputs. The modified relay
correlator is a special case of the modified
Stieltjes correlator, and the modified polarity
correlator is the limiting case of modified digital
correlators.
These classes of correlators are described in more detail 
in the following sections. Firstly, however, a brief dis-
cussion about quantisation is presented. 
2.5.1. Quantisation of Input Data 
Quantisation is the process of replacing analogue 
samples with approximate values taken from a finite set of 
allowed values [66]. Quantisation is employed in the
design of correlators so that the benefits of digital
circuit techniques may be exploited. It will be seen that
the disadvantages arising from the errors introduced by
quantisation are often compensated by the reduction in
the correlator's cost and complexity.
There are various forms of quantiser, the more 
sophisticated of which add a minimum of distortion or 
quantisation noise to the signal. The simplest and most 
common form is the zero-memory quantiser. In this case, 
the output value is determined from only one correspond-
ing input sample, independent of the values of earlier (or 
later) analogue samples applied to the quantiser input. 
More sophisticated is the block quantiser, which looks at
a group, or block, of input samples simultaneously, and 
produces a block of output values to represent the 
corresponding input samples. Another class of quantisers, 
which could be described as sequential, includes digitis-
ing schemes such as delta modulation, differential PCM, 
and other adaptive versions. A sequential quantiser 
stores some information about previous samples and gen-
erates the present quantised output using both the current 
input and the stored information. 
For the purposes of this thesis, we need not be con-
cerned further with the details of quantisation, other 
than its effect on the realisation and accuracy of the 
correlation function. In this respect the most important
parameter of quantisation is the number of quanta, or
quantum levels, allocated to each input of the correlator.
For example, coarse quantisation in both inputs of a 
correlator permits the use of much less complex circuits 
for multiplication and summation etc., than would be the 
case for one with finely quantised data. On the other 
hand, coarse quantisation leads to a degradation in the 
accuracy of the correlation function. However, the degra-
dation in the output can be eliminated by averaging over a 
longer period, since the errors are essentially random. 
But, in the case of extremely coarse quantisation, i.e. 
two levels, significant bias errors are incurred which 
cannot be removed by simply extending the integration 
period. These bias errors are eliminated by the use of
dither, or auxiliary signals [67,68]. A dither signal is 
added to the input of a digital correlator before quanti-
sation. Unfortunately, dither signals introduce an addi-
tional source of random errors into the system, which, in 
turn, must be eliminated by integrating over a longer time 
[69]. 
Another form of quantisation, delta sigma quantisa-
tion, is the basis of a separate class of correlators. 
Delta sigma correlators are described by several authors 
[64,70,71,72,73,74], but they are beyond the scope of this 
thesis. 
Quantisation can have a significant effect on the 
complexity of correlators. In 1962, Watts [75] presented 
a detailed analysis of the effect that quantisation has on 
correlator performance and derived a general form for mul-
tiplier correlators. The direct analogue correlator, the 
digital correlator, and the Stieltjes correlator are all 
shown to be special cases of the general form. 
Amplitude quantisation is a non-linear process. When
such a non-linear operation is incorporated into a system,
detailed analytical treatment of the system becomes
extremely difficult. Statistical analysis of such a
system can, however, be relatively easy because it is
possible to investigate the statistical effects of
quantisation in detail. It can be shown that, for many
cases, quantisation is equivalent to the addition of random
independent noise with a mean square value equal to
one-twelfth the square of the quantisation interval [76].
Thus, a quantised signal is considered to be equal to the
original signal plus additive quantisation noise. Writing
the quantised signals as $\hat{x}$ and $\hat{y}$, we have
$\hat{x} = x + a$ and $\hat{y} = y + b$. The correlation
function of two quantised signals $\hat{x}$ and $\hat{y}$
(with zero means) may then be expressed as
$$r_{\hat{y}\hat{x}} = r_{yx} + r_{ya} + r_{bx} + r_{ba} \tag{2.21}$$
where $r_{yx}$ is the correlation between the signals y and x,
$r_{ya}$ is the correlation between the signal y and the
quantisation noise a, $r_{bx}$ is the correlation between the
signal x and the quantisation noise b, and $r_{ba}$ is the
correlation between the quantisation noise a and the
quantisation noise b.
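The one-twelfth rule quoted above can be verified with a short simulation. This is a sketch under stated assumptions: a uniform, zero-memory, rounding quantiser applied to unit-variance Gaussian samples, with a step size chosen arbitrarily for illustration.

```python
import math
import random

def quantise(value, step):
    """Uniform zero-memory quantiser: round to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)

def quantisation_noise_power(step, n_samples=50_000, seed=5):
    """Mean square of the quantisation error for unit-variance Gaussian input."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        err = quantise(x, step) - x
        acc += err * err
    return acc / n_samples

step = 0.25                      # illustrative quantisation interval q
measured = quantisation_noise_power(step)
predicted = step * step / 12.0   # q^2 / 12
print(f"measured={measured:.6f}  predicted={predicted:.6f}")
```

The agreement deteriorates as the interval becomes comparable with the signal amplitude, which is the regime of the two-level quantisers considered later.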
2.5.2. Direct Analogue Correlation 
Analogue or continuous correlators are those correla-
tors in which the signals are processed directly, without 
any form of amplitude distortion being used. They have 
been termed ideal correlators because, with the same input 
signals and noise, their signal to noise ratio is not 
exceeded by any other form of correlator. However, they 
possess important disadvantages such as drift. The imple-
mentation of analogue correlators has been described in 
Section 2.3. The analogue multipliers used in them can be 
realised using transistor circuits, and the delay opera-
tion can be performed by LC circuits, or tape recorder 
systems. The integration can be achieved using current 
summing amplifiers or low pass filters. 
When the analogue signals are sampled at the Nyquist 
rate, or faster, sampled data techniques, such as charge 
coupled devices, may be employed. Analogue correlators 
realised using integrated circuit techniques are described 
in Chapter 3. 
2.5.3. Stieltjes Correlation 
The Stieltjes correlator is a special form of the 
general configuration, in which one of the inputs is 
analogue and the other is coarsely quantised [63,64,65]. 
Since only one of the inputs is quantised, the output of a 
Stieltjes correlator is 
$$r_{syx} = r_{yx} + r_{bx} \tag{2.22}$$
The only error term is the term $r_{bx}$, which, even for quite
coarse quantisation of y, can be extremely small. Watts 
[75] has shown that when the digital channel is quantised 
into three levels, the Stieltjes correlation function is 
related to the direct correlation function to within 1%. 
The circuitry required to implement this correlator
represents a considerable saving in complexity when
compared with direct correlation [77]. The correlation delay
is implemented digitally in one channel. The multiplication
may be performed by a digital-to-analogue converter
with the analogue signal as its reference. A disadvantage
of the Stieltjes correlator, as with all analogue
correlators, is the difficulty concerning drift.
2.5.4. Relay Correlation 
A special case of the Stieltjes correlator is the 
relay correlator, which is illustrated in Figure 2.6. 
[Block diagram: the analogue input y feeds a block labelled PHASE followed by an integrator giving the output; the input x passes through Sgn and a shift register delay.]
Figure 2.6. Basic configuration of a relay correlator. 
In this case one of the inputs is quantised into two
levels, denoted by the 'sgn' operator, and the other is
analogue. Sgn(x) means signum(x), a function of the value
+1 for positive x and -1 for negative x. The output of
the relay correlator $r_{ryx}$, for sampled inputs with
Gaussian statistics, is related to the direct correlation
function by
$$r_{ryx}(k\Delta t) = \left[\frac{2}{\pi}\right]^{1/2} \rho_{yx}(\tau)\, \sigma_y \tag{2.23}$$
where
$$r_{ryx}(k\Delta t) = \frac{1}{N} \sum_{n=0}^{N-1} y(n\Delta t)\, \mathrm{sgn}[x(n\Delta t - k\Delta t)] \tag{2.24}$$
$\rho_{yx}(\tau)$ is the normalised correlation function at the
lag $\tau = k\Delta t$, and $\sigma_y$ is the RMS value of the
signal y, given by $(r_{yy}(0))^{1/2}$. The relay correlator
represents a compromise between the direct digital
correlator and the polarity-coincidence correlator, both in
terms of accuracy, and in terms of circuit complexity. To
achieve results of the same level of accuracy as the ideal
analogue correlator, the integration time must be
approximately 1.5 times longer.
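The behaviour of the relay correlator can be illustrated numerically. The sketch below is an assumption-laden check, not the author's procedure: it uses the standard result for zero-mean, jointly Gaussian inputs that the average of y.sgn(x) equals the square root of 2/pi, times the normalised correlation coefficient, times the RMS value of y. The correlation coefficient, RMS value, sample count, and seed are all illustrative.

```python
import math
import random

def relay_correlation(rho, sigma_y, n_samples=100_000, seed=7):
    """Average of y * sgn(x) for jointly Gaussian x, y with coefficient rho."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        x = g1
        y = sigma_y * (rho * g1 + math.sqrt(1.0 - rho * rho) * g2)
        # Multiplying by sgn(x) only ever reverses the sign of y.
        acc += y if x >= 0.0 else -y
    return acc / n_samples

rho, sigma_y = 0.6, 2.0
measured = relay_correlation(rho, sigma_y)
predicted = math.sqrt(2.0 / math.pi) * rho * sigma_y
print(f"measured={measured:.4f}  predicted={predicted:.4f}")
```

Because the multiplication reduces to a sign reversal, the hardware needs only a polarity switch in the analogue channel, which is the saving in complexity referred to above.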
2.5.5. Multilevel Digital Correlation 
Direct digital correlation, or multilevel correlation,
in which quantisation is done using more than two
levels, is illustrated in Figure 2.7.
Figure 2.7. Basic configuration of digital correlator. 
It has been shown [76] that, in many cases, even for
fairly large quantisation intervals (for example, the
total range of x or y divided into eight intervals), the
terms $r_{ya}$ and $r_{bx}$, defined above, are negligible, and
the term $r_{ba}$ is also negligible, except when x equals y,
in which case $r_{ba} = q^2/12$, where q is the quantisation
interval. This is known as the Sheppard correction to the
mean square for grouped data [78]. This correction is a
simplification of a more general expression, given by Gersho
[66], and assumes that the intervals of quantisation $q_i$
are equal, that is, $q_i = q$, where i is an integer from 0 to
L-1 in an L level quantiser. Thus, the output of the
direct digital correlator $r_{dyx}$, using uniform
quantisation, may be taken to be
$$r_{dyx}(\tau) = \begin{cases} r_{yx}(\tau) + q^2/12, & x = y,\ \tau = 0 \\ r_{yx}(\tau), & x \neq y,\ \text{all } \tau \\ r_{yx}(\tau), & x = y,\ \tau \neq 0 \end{cases} \tag{2.25}$$
The hardware realisation of a direct digital correla-
tor involves complex digital circuitry but results in an 
accurate correlation estimate. A description of a 2-bit 
by 2-bit digital correlator for measuring the spectra of 
radio astronomy signals is given by Ables et al [79], and 
the Hewlett Packard correlator [80], which quantises the 
input signals into three levels in one channel and seven 
levels in the other, has been used successfully for many 
years. Another digital correlator, which resembles a 3-
level by 3-level correlator is presented by Dewdney [81]. 
In this case the circuit complexity is reduced by accumu-
lating the product transitions rather than the products 
themselves. The penalty incurred by this technique is a 
six percent decrease in output signal to noise ratio, when 
compared with a normal three level correlator. The loss 
in signal to noise ratio may be recovered by increasing 
the integration time, since integration time is propor-
tional to the square of the signal to noise ratio. A spe-
cial case of multilevel correlation is digital relay 
correlation. This is in addition to polarity correlation, 
which is discussed in the next section. A digital relay 
correlator, illustrated in Figure 2.8, averages the pro-
duct of the quantised values of the y-input, and the 
polarity of the x-input, using a digital adder/subtractor 
and digital store. 
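[Block diagram: the input y passes through a quantiser to an adder/subtractor and accumulator giving the output; the input x passes through Sgn and a shift register delay to provide the add/subtract control.]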
Figure 2.8. Configuration of a digital relay correlator. 
The circuit complexity represents a compromise between 
full multilevel correlation and polarity correlation. 
2.5.6. Polarity-Coincidence Correlation 
A special case of digital correlation is the 
polarity-coincidence correlator, shown in Figure 2.9. 






[Block diagram not reproduced; counter control legend: 0 = count down.]
Figure 2.9. Configuration of a polarity correlator. 
In this case the inputs are quantised to two levels, ±1,
denoted by the sgn operator, as above. If the input
signals have Gaussian statistics, the polarity correlation
function $r_{pyx}$ may be related to the direct correlation
function by
$$r_{pyx}(\tau) = \frac{2}{\pi} \arcsin[\rho_{yx}(\tau)] \tag{2.26}$$
where
$$r_{pyx}(k\Delta t) = \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{sgn}[y(n\Delta t)]\, \mathrm{sgn}[x(n\Delta t - k\Delta t)] \tag{2.27}$$
and $\rho_{yx}(\tau)$ is the normalised correlation function of
the signals y and x, and $\tau$ is the time lag between the two
signals, as given in Equation 2.11. The arcsine
relationship was first reported by Van Vleck in 1943 and
subsequently by Van Vleck and Middleton in 1966 [82]. The
hardware realisation of the polarity-coincidence
correlator is very much simpler than that of the direct
digital correlator. The signal delay is implemented by a
single-bit shift register, multiplication is achieved by
exclusive-NOR gates, and the integration process is
performed by simple counters. The correlation estimate
obtained from a polarity correlator is less accurate than
one obtained from a direct digital correlator [83], and
accordingly requires an integration time which is
approximately 2.5 times longer to achieve the same level of
accuracy. Polarity correlation is treated in more detail
in Section 2.6.
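The arcsine relationship lends itself to a simple numerical check. The following sketch assumes zero-mean, jointly Gaussian inputs with a chosen normalised correlation coefficient; the function name, sample count, and seed are illustrative. It forms the polarity correlation from sign coincidences and then inverts the arcsine law to recover the underlying coefficient.

```python
import math
import random

def polarity_correlation(rho, n_samples=100_000, seed=11):
    """Average of sgn(y) * sgn(x) for jointly Gaussian inputs."""
    rng = random.Random(seed)
    coincidences = 0
    for _ in range(n_samples):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        x = g1
        y = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        if (x >= 0.0) == (y >= 0.0):
            coincidences += 1
    # Each coincidence contributes +1 and each anti-coincidence -1.
    return 2.0 * coincidences / n_samples - 1.0

rho = 0.5
r_p = polarity_correlation(rho)
predicted = (2.0 / math.pi) * math.asin(rho)   # arcsine law
recovered = math.sin(math.pi * r_p / 2.0)      # inverse of the arcsine law
print(f"r_p={r_p:.4f}  predicted={predicted:.4f}  recovered rho={recovered:.4f}")
```

For many applications, such as locating the peak of the function, the inversion is unnecessary because the arcsine mapping is monotonic and preserves the peak position.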
2.5.7. Modified Correlators: Dither 
The significant bias errors incurred when extreme 
clipping operations, such as sgn(x), are used, can be 
eliminated by adding a dither signal to the signal to be 
clipped, as shown in Figure 2.10. 
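[Block diagram: a dither signal is added to each of the inputs y and x before quantisation; the quantised x channel is delayed, and the two channels are multiplied to form the output.]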
Figure 2.10. Configuration of a modified digital correla-
tor. 




where the signal input magnitude must be maintained equal 
to or less than the upper bound A on the amplitude of the 
dither signal. A detailed analysis of modified
correlators has been presented by Berndt [84], and by Chang
and Moore [67]. Landsberg and Cohen [85] have reported a
modified digital correlator which uses three levels of
quantisation in both channels.
A polarity correlator can be modified to give an
unbiased output, and the modification is applicable to any
random process with bounded inputs. It is achieved by
adding uniformly distributed, statistically independent
noise to each of the input signals before they are
clipped. A wide range of random dither signals has been
used to modify correlators, but it has subsequently been
found that deterministic signals can be used if they
have uniformly distributed amplitude values [86,68,67].
Dither signals have found numerous applications in fields
such as communications, where they enable capture of a
wanted signal despite the presence of unwanted
interference [87], and control, where they improve the
performance of quantised sampled-data systems.
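The effect of dither on a polarity correlator can be demonstrated with a small simulation. This is a sketch only, resting on the following assumptions: the inputs are bounded by A, each dither signal is uniformly distributed over [-A, +A] and independent of everything else, and the clipped product is rescaled by A squared. Under those assumptions the average of sgn(x + d) is x/A, so the dithered output estimates the direct correlation. The signal model and all names are illustrative.

```python
import random

def sgn(value):
    return 1.0 if value >= 0.0 else -1.0

def compare_correlators(n_samples=100_000, bound=1.0, seed=3):
    """Return (direct, plain polarity, dithered polarity) estimates at zero lag."""
    rng = random.Random(seed)
    direct = plain = dithered = 0.0
    for _ in range(n_samples):
        # Bounded, non-Gaussian test signals with |x| <= bound and |y| <= bound.
        x = rng.uniform(-bound, bound)
        y = 0.5 * x + 0.5 * rng.uniform(-bound, bound)
        d1 = rng.uniform(-bound, bound)   # dither for the x channel
        d2 = rng.uniform(-bound, bound)   # independent dither for the y channel
        direct += x * y
        plain += sgn(x) * sgn(y)
        dithered += sgn(x + d1) * sgn(y + d2)
    scale = bound * bound
    return direct / n_samples, plain / n_samples, scale * dithered / n_samples

direct, plain, dithered = compare_correlators()
print(f"direct={direct:.4f}  plain polarity={plain:.4f}  dithered={dithered:.4f}")
```

The dithered estimate agrees with the direct one but is noticeably noisier per sample, which reflects the longer integration time that dither demands.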
2.6. Polarity Correlation and the Overloading Integrating 
Counter Technique 
2.6.1. Polarity Correlation 
Implementation of a high speed correlator requires an 
array of multipliers, delay elements, and accumulators, 
either analogue or digital. Polarity correlation methods 
minimise the complexity of the computational elements by 
discarding the magnitude information of the input 
sequences. Digital design techniques can then be employed 
to realise the multipliers by EXNOR gates, the delay ele-
ments by a digital shift register and the accumulators by 
simple counting circuits. This results in a more economi-
cal and more compact implementation than would otherwise 
be achieved, the penalty for which is an increase in 
integration time to obtain a correlation function with 
acceptable variance [88]. The polarity correlation
function is nonlinearly related to the direct correlation
function by the Van Vleck arcsine relation, Equation
2.26, for input sequences which have Gaussian statistics.
In Chapter 3 details are presented of previously
reported techniques for obtaining the polarity correlation
function [89,90,91,92,93]. These techniques include
parallel counters [94,95,96,97], which are not directly
cascadable and hence non-optimal for VLSI implementation.
The prototype chip described here is based on an
interpretation of the polarity correlation function which
permits the elimination of parallel counters and results in a
highly regular correlator structure amenable to VLSI
implementation. The structure also permits direct
cascading of correlator stages. A block diagram of a correlator
using this approach is shown in Figure 2.11.
[Block diagram: a preset integration time N is applied; each delay stage feeds its own counter (for example, counter q3 for r(3)).]
Figure 2.11. Block diagram of a polarity correlator. 
Polarity correlation is based on the computation of
the discrete function,
$$r_{pyx}(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{sgn}[y_n]\, \mathrm{sgn}[x_{n-\tau}] \tag{2.29}$$
which is based on Equation 2.27, but, for convenience, the
time lag $k\Delta t$ is replaced by the symbol $\tau$, and the
sampled signals of the form $y(n\Delta t)$ are replaced by a
data sequence $y_n$ with sequence index n. Complete positive
correlation ($r_{pyx} = 1$) occurs when the polarities of the
input samples (assuming the mean of both inputs to be zero)
are at all times equal, yielding an average product of +1.
Complete negative correlation ($r_{pyx} = -1$) occurs when the
polarities of the input samples are never equal (inverse
proportionality), yielding an average product of -1. In
the case where the input samples are not related
($r_{pyx} = 0$) the sum of the positive products will equal
the sum of the negative products and the average product
will be zero.
Implementation of polarity correlation requires an
analogue comparator circuit to convert $\mathrm{sgn}[x] = x/|x|$
and $\mathrm{sgn}[y] = y/|y|$ into logic 1 if the signal is
positive and logic 0 if the signal is negative. Note that
this definition means that a logic 0 represents -1 (see
Section 2.5.6). The time delay $\tau$ between the two signals
is achieved by using a digital shift register where a
particular value of delay is defined by the product of the
number of preceding shift register stages and the sample
clock period, $\Delta t$. Multiplication is performed by the
Boolean coincidence function, EXNOR, whose output is 1
only if the inputs are equal. If time-successive
values of the coincidence function $F_n(\tau)$ are summed in a
digital counting circuit for a period of T seconds, where
$T = N\Delta t$, then the contents of the counter at the end of
the period will be proportional to the relevant value of
the correlation function. The EXNOR function can only be
regarded as performing multiplication if the logic 0 is
allowed to represent -1. Thus, a logic 1 in the
coincidence signal would indicate 'increment by one' the
contents of the counter and a logic 0 would indicate
'decrement by one' the contents of the counter. This would
necessitate the use of up-down counters which are
undesirable from a VLSI circuit design point of view. However,
it is possible to use simple up-counters whose contents,
$q(\tau)$, can be related to the correlation function in the
following way. Firstly the contents of an integrating
counter are given by
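$$q(\tau) = \sum_{n=0}^{N-1} F_n(\tau) \tag{2.30}$$
where $F_n(\tau)$ is the coincidence function bit stream defined
by
$$F_n(\tau) = \frac{1}{2} + \frac{1}{2}\, \mathrm{sgn}[y_n]\, \mathrm{sgn}[x_{n-\tau}] = 1 \text{ or } 0 \tag{2.31}$$
Thus, by substituting into 2.30,
$$q(\tau) = \sum_{n=0}^{N-1} \frac{1}{2} + \sum_{n=0}^{N-1} \frac{1}{2}\, \mathrm{sgn}[y_n]\, \mathrm{sgn}[x_{n-\tau}] \tag{2.32}$$
$$= \frac{N}{2} + \frac{N}{2}\, r_{pyx}(\tau) \tag{2.33}$$
Hence,
$$r_{pyx}(\tau) = \frac{2\,q(\tau)}{N} - 1 \tag{2.34}$$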
where $r_{pyx}(\tau)$ is the polarity correlation function as
given by Equation 2.29. Thus, Equation 2.34 gives a
measure of the correlation function using the integrating
counter contents, $q(\tau)$, after sampling N times. At
maximum positive correlation ($r_{pyx} = +1$) a maximum count
$q(\tau) = N$ is obtained after sampling N times. In the case
of maximum negative correlation ($r_{pyx} = -1$), where the
input samples are never equal, the coincidence signal is
always zero, resulting in a zero count, $q(\tau) = 0$. In the
case of zero correlation ($r_{pyx} = 0$), a count of
$q(\tau) = N/2$ is reached after sampling N times.
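The up-counter scheme can be modelled in a few lines. The sketch below is illustrative rather than a description of the chip: logic levels are held as 0 or 1, the EXNOR gate is modelled as one minus the exclusive-OR, and, as an assumption, only samples for which the delayed x value exists are counted, so that N here is the record length minus the lag.

```python
def exnor(a, b):
    """Boolean coincidence function: 1 only if the two logic levels are equal."""
    return 1 - (a ^ b)

def upcounter_correlation(y_bits, x_bits, lag):
    """Return (q, r) where q counts coincidences and r = 2q/N - 1."""
    q = 0
    n_samples = 0
    for n in range(lag, len(y_bits)):
        q += exnor(y_bits[n], x_bits[n - lag])   # up-counter increments on a 1
        n_samples += 1
    return q, 2.0 * q / n_samples - 1.0

x_bits = [1, 0, 1, 1, 0, 0, 1, 0]
y_bits = [0, 0] + x_bits[:-2]          # x delayed by two sample periods
inverted = [1 - b for b in x_bits]     # polarities never equal to those of x

print(upcounter_correlation(y_bits, x_bits, lag=2))     # complete positive correlation
print(upcounter_correlation(y_bits, x_bits, lag=0))
print(upcounter_correlation(inverted, x_bits, lag=0))   # complete negative correlation
```

The counter never decrements, yet the three limiting cases described above (q = N, q = N/2, and q = 0) are recovered by the linear mapping of Equation 2.34.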
2.6.2. Overloading Counter Technique 
An alternative approach to polarity correlation is
based on an integrating overloading counter technique
[98,99,2], which eliminates the requirement for a value of
$q(\tau)$ to be available. Instead, the correlation function
is computed using the number of samples required to
achieve overload count conditions, $q(\tau) = N$, in a given
integrating counter. The concept of the technique is
illustrated by Figure 2.12, which shows the relationship
between the contents of an integrating counter, $q(\tau)$, and
the number of samples, which is now a variable, m.
[Plot: contents of integrating counter, q, against contents of sample counter, m; overload occurs at q = N, and is reached soonest when $r_{pyx} = +1$.]
Figure 2.12. Relationship between the contents of the in-
tegrating counters and the number of input sample pairs. 
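The number of samples, m, can be related to the polarity
correlation function by writing $q(\tau)$ as,
$$q(\tau) = N = \sum_{n=0}^{m-1} \left( \frac{1}{2} + \frac{1}{2}\, \mathrm{sgn}[y_n]\, \mathrm{sgn}[x_{n-\tau}] \right) \tag{2.35}$$
$$= \frac{m}{2} + \frac{m}{2}\, r_{pyx}(\tau) \tag{2.36}$$
where
$$r_{pyx}(\tau) = \frac{1}{m} \sum_{n=0}^{m-1} \mathrm{sgn}[y_n]\, \mathrm{sgn}[x_{n-\tau}] \tag{2.37}$$
Hence, in this case,
$$r_{pyx}(\tau) = \frac{2N}{m} - 1, \quad \text{for } m \geq N \tag{2.38}$$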
where N is the capacity of the integrating counters and m
is the number of samples required to achieve overload
conditions in the integrating counter corresponding to time
delay $\tau$. An overload occurs after m = N samples when
correlation is maximum and positive. In the case of zero
correlation an overload occurs after m = 2N samples, and
when the correlation is maximum and negative an overload
would occur only after an infinite number of samples. Note
that an overload cannot occur until $m \geq N$.
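The relationship between the overload instant and the correlation value can be sketched as follows. This is a minimal model under stated assumptions: the coincidence bit stream for one delay is supplied as a list of 0s and 1s, the integrating counter has capacity N, and the function names are hypothetical.

```python
def samples_to_overload(coincidences, capacity):
    """Return the sample count m at which the up-counter reaches capacity.

    Returns None if the counter has not overloaded by the end of the stream.
    """
    q = 0
    for m, bit in enumerate(coincidences, start=1):
        q += bit
        if q == capacity:
            return m
    return None

def correlation_from_overload(m, capacity):
    """Equation 2.38: r = 2N/m - 1, valid for m >= N."""
    return 2.0 * capacity / m - 1.0

N = 4
m_full = samples_to_overload([1] * 10, N)      # polarities always equal
m_half = samples_to_overload([0, 1] * 10, N)   # equal half of the time
m_none = samples_to_overload([0] * 10, N)      # polarities never equal

print(m_full, correlation_from_overload(m_full, N))
print(m_half, correlation_from_overload(m_half, N))
print(m_none)
```

Only the overload event and the shared sample count need to leave each stage, so the counter contents themselves never have to be read out.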
A polarity correlator using the overloading counter 
technique thus comprises a delaying shift register con-
nected to a parallel array of coincidence detectors and 
integrating counters. A block diagram of a polarity 
correlator using the overloading counter technique is
shown in Figure 2.13. 
Figure 2.13. Polarity correlator block diagram using the 
overloading counter technique. 
An overload pattern shift register is used to inspect the
overload condition of the counters. The evolving pattern
of overload states defines the correlation function shape,
and the time delay position of the first integrating
counter to overload defines the position of the most
significant peak of the function. A sample counter is
included to count the number of input samples, m, so that
the value of the correlation function may be computed for
any integrating counter to overload. If the maximum
capacity of the sample counter is set to be twice the capacity
of the integrating counters, the significance range is
limited to $1 \geq r \geq 0$. If it is required to cover the
range $1 \geq r \geq -1$, two correlator circuits working in
parallel can be used, with one covering the positive range
and the other covering the negative range.
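The complete arrangement of delaying shift register, coincidence detectors, integrating counters, and sample counter can be modelled behaviourally. The sketch below is an illustration under several assumptions and does not describe the circuit of the chip: the shift register is initially cleared, stage k holds the x sample delayed by k periods, the sample counter stops the run at twice the counter capacity, and the test data are a pseudorandom bit sequence and a delayed copy of it.

```python
import random

def overload_pattern(y_bits, x_bits, num_lags, capacity):
    """Return, for each delay stage, the sample count at overload (or None)."""
    shift_register = [0] * num_lags       # stage k holds x[n - k]
    counters = [0] * num_lags             # integrating up-counters
    overload_at = [None] * num_lags
    sample_limit = min(2 * capacity, len(y_bits))   # sample counter capacity
    for m in range(1, sample_limit + 1):
        shift_register = [x_bits[m - 1]] + shift_register[:-1]
        for k in range(num_lags):
            if overload_at[k] is None:
                # EXNOR coincidence detector feeding the up-counter.
                counters[k] += 1 - (y_bits[m - 1] ^ shift_register[k])
                if counters[k] == capacity:
                    overload_at[k] = m
    return overload_at

rng = random.Random(42)
x_bits = [rng.randint(0, 1) for _ in range(200)]
true_delay = 5
y_bits = [0] * true_delay + x_bits[:-true_delay]

capacity = 64
overload_at = overload_pattern(y_bits, x_bits, num_lags=12, capacity=capacity)
overloaded = [k for k, m in enumerate(overload_at) if m is not None]
first_lag = min(overloaded, key=lambda k: overload_at[k])
r_values = [None if m is None else 2.0 * capacity / m - 1.0 for m in overload_at]
print(first_lag, overload_at[first_lag], r_values[first_lag])
```

The first stage to overload identifies the delay between the two sequences, and stages that have not overloaded by the time the sample counter is full correspond to correlation values below zero.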
Such a system is most suitably realised using
integrated circuit technology, and an early device
implemented 12 stages of correlation using pMOS technology
[28]. The correlator chip described in this thesis
consists of a linear cascade of identical correlation
elements, and has been fabricated in 5 micron nMOS
technology. The performance of the correlator depends on the
serial connection of correctly functioning correlation
elements. To optimise performance, and gain full
advantage of the VLSI architecture, a design strategy was
adopted which addresses testability, yield enhancement,
and improved reliability. The design incorporates
built-in self test (BIST) and self repair mechanisms, which
automatically detect and eliminate failed correlation
stages in the VLSI circuit [100,101,102,103].
2.7. Summary 
In this chapter, correlation theory has been 
presented. It has been shown that, for stationary, 
ergodic signals, a temporal correlation function with fin-
ite integration time can approximate the true correlation 
coefficient. The effects of sampling, quantisation, and 
dither have been described. The main conclusion is that 
any physically realisable correlation system must comprom-
ise accuracy with integration time, or circuit complexity. 
The overloading integrating counter technique for 
polarity correlation has also been described, and the pro-
totype correlator chip, featuring built-in self test and 
self repair mechanisms, has been introduced. Design 
details of a 28 stage prototype chip (termed the Eu349) 
are reported in Chapter 5. In the next chapter a review 
of silicon integrated circuit correlators is presented, in 
which the Eu349 chip's architecture, and how it relates to 
other integrated circuit correlators, is discussed. 
CHAPTER 3 
INTEGRATED CIRCUIT CORRELATORS 
3.1. Introduction 
Devices for computing correlation functions have been 
implemented using a variety of technologies and tech-
niques. They span the entire gamut of signal processing 
techniques from optical signal processors to microcomputer 
systems; from surface acoustic wave devices to charge cou-
pled devices; and from electronic systems built with small 
scale and medium scale integrated circuits, to full custom 
VLSI processors. This chapter reviews correlation tech-
niques which have been realised by silicon integrated cir-
cuits. Implementations based on optical techniques 
[104,105,106,107], acousto-optical techniques [108,109], 
ultrasonic and surface acoustic wave (SAW) techniques 
[110,111,42] are beyond the scope of this discussion. So
too are integrated optical correlators [112], which have
received considerable attention and will find applications 
in parallel array signal processing problems such as real 
time image processing. Also excluded from this discussion 
are the microprocessor based correlators. These, in gen-
eral, use a microprocessor to control a dedicated peri-
pheral circuit which performs the delay, multiply, and 
accumulate operations [113,114,59,115]. In some cases,
however, algorithms are used which allow the microprocessor
to compute the correlation function with a minimum of 
additional circuitry. Examples of this are the zero 
crossing algorithm of Henry [116], and the skip algorithm 
of Fell [117]. 
In the remaining sections of this chapter silicon 
correlators are discussed. The architectural concepts, 
which distinguish the VLSI implementations, include serial 
correlators, parallel correlators, and serial/parallel 
correlators. The discussion also includes systolic arrays 
and examples are given for bit-systolic, word-systolic, 
linear, and two-dimensional systolic architectures. 
3.2. Correlation Architectures 
3.2.1. Serial Architecture 
The basic elements of a correlator are shown in Figure 3.1.
[Block diagram: inputs y and x, with x passing through a delay element, feed a multiplier followed by an integrator whose output is $r_{yx}(\tau)$.]
Figure 3.1. Serial correlator.
In a serial correlator, this configuration is implemented
directly, and its operation is straightforward. The
underlying principle can be described in terms
of a temporal correlation lag and a temporal integration 
(architectures which implement spatial lags or spatial 
integrations are discussed in the next section). By
making use of temporal techniques, the serial correlator
minimises the circuitry required to implement the
function. However, the penalty for this simplicity is a long
processing time. To compute, for example, $r(\tau_1)$, the
delay element is first set to the value $\tau_1$. Then, the
input data sequences are multiplied, and the results are
integrated. After the integration period, computation for
this single point of the function is complete. The entire
computation is repeated for the next value of correlation
delay, hence the term 'temporal lag and temporal
integration'. Due to the long processing time, serial
correlators are not common in VLSI implementations, although
if the signal bandwidth is large and the signal has
stationary characteristics, then serial correlation is
useful and very simple to implement. One instance of an
integrated serial correlator, designed to verify a
correlation algorithm which uses a pseudorandom dither
signal, has been reported [86].
In general, there are two ways to increase the compu-
tation rate of a signal processing system. One is to use 
faster components and the other is to use concurrency. 
The last decade has seen an order of magnitude decrease in 
the cost and size of integrated circuit components, but 
only an incremental increase in component speed. With 
current technology, tens of thousands of gates can be 
fabricated on a single chip, but no gate is much faster 
than its TTL (Transistor-Transistor Logic) counterpart of 
ten years ago. Since the technology trend indicates a 
diminishing growth rate for component speed, any major 
improvement in computation rate must come from the con-
current use of many processing elements. The degree of 
concurrency is largely determined by the underlying algo-
rithm. Optimum performance can be achieved when the algo-
rithm is designed for the most effective degrees of pipe-
lining and multiprocessing [118]. However, it must be 
noted that, when a large number of processing elements 
work simultaneously, coordination and communication prob-
lems become significant [119,120]. The objective, there-
fore, is to design algorithms which allow high degrees of 
concurrency, while employing only simple, regular communi-
cations and control. Direct cascading of cells for system 
expansion is also important. Systolic architectures,
introduced by Kung and Leiserson [121], provide a
solution to the above objectives.
A systolic system consists of a set of synchronously 
clocked, interconnected cells, each of which is capable
of performing some simple operation. The cells are usu-
ally connected together to form a systolic array or sys-
tolic tree. Information flows between cells in a pipe-
lined fashion and communication with the outside world 
occurs only at the boundary of the array. Features to 
avoid in the design of a systolic system are global broad-
casting of signals across the array, and fan-in of many 
outputs to a single computational element [122]. These 
criteria will be illustrated by the correlator architectures
in the remaining sections of this chapter.
3.2.2. Parallel Architecture 
In the previous section, a serial correlator was 
described as employing temporal delay and temporal 
integration. The first parallel correlation architecture 
to be discussed here achieves concurrency by replacing the 
temporal delay with spatial delay. 
3.2.2.1. Parallel architecture with temporal integration 
and spatial delay 
This parallel architecture is shown in Figure 3.2. 
[Figure 3.2 block labels: Sample Clock, SAMPLE COUNTER, INTEGRATING COUNTERS (RESET)]
Figure 3.2. Parallel architecture with temporal integration
and spatial delay.
It can be seen from Figure 3.2 that the correlation func-
tion is computed by an array of multipliers, integrators, 
and delay elements. Each point of the function is com-
puted simultaneously by a dedicated multiplier and 
integrator. The delay operation is implemented by a 
tapped shift register. At each cycle of the computation 
all the delayed values of the input signal are available,
hence the term "spatial delay architecture".
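A minimal behavioural sketch of this arrangement follows, assuming +/-1 samples and a shift register that starts empty (held at zero). The names are hypothetical and the model ignores word widths and output multiplexing; it only shows that every lag is updated in the same cycle.

```python
def parallel_correlate(x, y, n_lags, n_samples):
    """Time-integrating correlator with a tapped shift register (spatial delay)."""
    taps = [0] * n_lags          # tap j holds x delayed by j samples
    counters = [0] * n_lags      # one integrator per correlation point
    for k in range(n_samples):
        taps = [x[k]] + taps[:-1]            # shift the new x sample in
        for j in range(n_lags):              # all lags updated in the same cycle
            counters[j] += y[k] * taps[j]    # dedicated multiplier and integrator
    return counters


# Hypothetical +/-1 test data: y is x delayed by two samples.
x = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]
y = [1, 1] + x[:-2]
counters = parallel_correlate(x, y, n_lags=4, n_samples=12)
```

After twelve cycles all four points of the function are available at once, at the cost of one multiplier and one integrator per lag.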
The architecture considered here has a major disad-
vantage in that a parallel output is required by each 
integrator. In the case of a digital circuit this output 
is around 8-16 bits wide for each point of the function. 
Direct communication with every integrator would involve a
large number of output pins unless some form of multiplex-
ing were used. An example of such an implementation 
(although not an integrated circuit) is reported by Corti 
et al [36], where each integrated result is shifted seri-
ally across the array of counter/registers to the output. 
This technique defeats the purpose of the spatial time 
delay by reintroducing a temporal operation at the output. 
The method is only advantageous when the integration 
period greatly exceeds the maximum time lag, or the 
integration period is too long to be implemented by a spatial
integrator (spatial integration is discussed in section
3.2.2.3). The correlator of Corti et al was
designed to correlate weak optical signals over a range of
108 delays with an integration period of approximately
65,000 sample periods. Currently, the maximum integration
period using spatial integration is 512 sample periods,
which is only possible using analogue current summing
techniques [123,63]. Integration times in digital
integrated correlators are much shorter. Chips which
comprise 128 integrating stages are state of the art [90].
The architecture of Figure 3.2 may be optimised for 
VLSI implementation by incorporating the overloading 
integrating counter technique, which is discussed in the 
next section. 
Another time-integrating correlator, but with a dif-
ferent architecture to the one described above, is 
reported by Burke et al [124]. The technique, which is 
illustrated in Figure 3.3, is peculiar to analogue corre-
lators employing charge coupled devices (CCDs), and will 
be discussed again in section 3.2.2.4. The correlation 
delay is achieved spatially, but in this case N(N+1) 
shift register cells are required, compared with N shift
register cells.
[Figure 3.3 block labels: Delay Elements, Integrators, CCD out]
Figure 3.3. CCD time integrating correlator.
In the case of CCDs, there are advantages in using the
larger array of CCD cells. The principal advantage this
array has over its equivalent using only N cells is that
the requirement for non-destructive sensing of the CCD
outputs is eliminated. This greatly simplifies the CCD 
design and clocking scheme. 
3.2.2.2. Parallel architecture with temporal integration 
and spatial delay using the overloading counter technique 
The architecture of the Eu349 chip falls into this 
category. It is shown in Figure 3.4. The arguments 
presented in the previous section apply also to this 
architecture. 
[Figure 3.4 block labels: Sample clock; inputs x, y; SAMPLE COUNTER (m); DECODE N; Counters; Latches; Overload pattern; $r(\tau) = N/m$]
Figure 3.4. Architecture of overloading type correlator.
A full description of the overloading counter technique is 
given in Chapter 2, but briefly the operation of the cir-
cuit is as follows: A small modification to the 
integrating counters leads to a system which is much more 
suitable for realisation as a large scale integrated cir-
cuit. Instead of monitoring the total contents of each 
counter at the end of a predetermined integration period, 
the counters are arranged to indicate when a preset value 
is reached. Thus when a counter overloads (i.e. exceeds 
its preset capacity), an overload bit is stored in an 
associated latch. Clearly the first counter to overload 
indicates the position of the most significant peak of the 
polarity correlation function. If integration is allowed 
to proceed after the peak has been detected then progres-
sively more counters will overload and the pattern of 
overload states will grow as shown in Figure 3.5. The 
envelope of the overload pattern describes the shape of 
the function. 
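The overloading behaviour can be illustrated with a small software model. It is a sketch under stated assumptions rather than a description of the Eu349 circuit: polarity bits are coded 0/1, a counter is taken to overload when it reaches `preset`, and the function and variable names are invented for the example.

```python
def overload_correlate(x, y, n_lags, preset, n_cycles):
    """Polarity correlator whose counters latch an overload bit at `preset`."""
    taps = [None] * n_lags       # delayed polarity bits of x (None = not yet filled)
    counters = [0] * n_lags
    latches = [0] * n_lags       # one overload bit per channel
    first = None                 # lag of the first counter to overload
    for k in range(n_cycles):
        taps = [x[k]] + taps[:-1]
        for j in range(n_lags):
            if taps[j] is not None and taps[j] == y[k]:   # EXNOR of polarity bits
                counters[j] += 1
            if counters[j] >= preset and not latches[j]:
                latches[j] = 1
                if first is None:
                    first = j
    return latches, first


# Hypothetical polarity data: y is x delayed by two samples.
x = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 1] + x[:-2]
high_threshold = overload_correlate(x, y, n_lags=4, preset=6, n_cycles=12)
low_threshold = overload_correlate(x, y, n_lags=4, preset=5, n_cycles=12)
```

With the higher preset only the channel at the peak overloads; with the lower preset neighbouring channels overload later in the run, so the latch pattern widens around the peak while the first channel to overload is unchanged.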
Figure 3.5. Overload patterns from overloading integrating
counters.
This architecture can be described as a linear systolic 
array. Advantages include cascadability, without the need 
for external components; long programmable integration 
time; and nearest neighbour communications. There is no 
fan-in. Two versions of this architecture have been real-
ised by silicon integrated circuits, one using pMOS tech-
nology [28,98], and the other, the Eu349 described by this 
thesis, in nMOS technology [2,100]. The Eu349 has, how-
ever, some novel design features which allow it to perform 
automatic self test and self repair [101,103,102]. This 
aspect of the design is discussed in Chapter 5. 
3.2.2.3. Parallel architecture with spatial integration 
and temporal delay 
Figure 3.6 shows the elements of a spatial integration
correlator. The operation of such an architecture is as 
follows: both signals are stored in registers, with taps 
at each stage connected to a parallel array of multi-
pliers. The products are summed over the array to give 
the integrated result for a single value of correlation 
delay. Subsequent values of the correlation function at 
different delays are then computed by shifting one of the
signals.
Figure 3.6. Parallel correlator using spatial integration.
The architecture is especially suited to analogue correlators
due to the ease with which the summing network can be
implemented using analogue techniques. A purely analogue 
correlator has been reported which consists of 64 cascaded 
stages to give integration over 64 samples [125,126]. MOS 
storage capacitors are employed for the "static" channel, 
charge coupled devices are used for the "active" channel
(i.e. the one to be shifted), and single MOS transistors 
perform the analogue multiplication. Currents from all 
the multiplier transistors are summed on a common source 
busbar and summing amplifier. 
The majority of the analogue correlators in this 
category implement relay (analogue-binary) correlation. 
Again, CCD shift registers are used in the active channel 
and digital shift registers are employed in the static 
channel. Current summing is a method of integrating the 
sample products which requires less silicon area than 
digital methods. Relay correlators with 64 stages [127], 
128 stages [128,129], and even 512 stages [63,123] have
been reported. An example of an analogue/digital correlator
has also been reported [130]. Analogue information is sampled
and held at fixed sites on the chip and digital informa-
tion is shifted past them. The digital channel, which is 
quantised into 7 bits, controls the selection of 7 binary 
area-ratioed MOS capacitors per correlator stage. The 
area penalty for employing a 7 bit digital channel is that 
the chip contains only 16 stages of correlation. 
However, analogue techniques have serious disadvantages,
not least with the CCDs. Complicated clocking
schemes, clock breakthrough, bias, and leakage are some of 
the problem areas. Digital correlators are therefore 
desirable but generally require more silicon area to 
implement, unless the accuracy of polarity correlation is 
sufficient for the application. One digital polarity 
correlator architecture retains the analogue output and 
current summing technique in an attempt to enjoy the best 
of both worlds [131]. The chip consists of 64 stages of
correlation, each with its own current generator.
Figure 3.7. Digital correlator using current summing
integration technique.
A system for spread spectrum communications based on this 
chip is described by Saethermoen et al [132]. To allow
for oversampling (twice the Nyquist rate), in-phase and
quadrature correlation, and 4-bit quantisation, the system 
required a total of 16 stages of correlation per integra-
tion sample. The correlator described has an integration 
time of 1024 samples, which was achieved by cascading 256 
correlator chips. 
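As a check on the figures quoted above, and assuming 64 stages per chip as stated for [131], the chip count can be reconstructed as follows. The factorisation of the 16 stages into 2 x 2 x 4 is an assumed reading of the text, not something the source states explicitly.

```latex
\begin{align*}
\text{stages per integration sample} &= 2\ (\text{oversampling}) \times 2\ (\text{I and Q}) \times 4\ (\text{bits}) = 16 \\
\text{total stages} &= 16 \times 1024 = 16384 \\
\text{chips} &= 16384 / 64 = 256
\end{align*}
```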
The architecture of a correlator employing all digi-
tal techniques consists of digital shift registers, digi-
tal multipliers and a digital summing network. A recent 
digital correlator chip [90,89] consists of four 32-bit
polarity correlator modules. Their individual outputs can
be combined in a variety of ways to implement 1x4 bit
quantised inputs, 2x2 bit quantisation, 2x1, or 1x1 bit
inputs, all with a corresponding compromise in integration
time. There is also a facility for quadrature signal 
correlation. In the case of a polarity correlator the 
multipliers are EXNOR gates and the summing network is a 
parallel counter. A parallel counter is a combinational 
circuit that determines how many of its inputs are at a 
given logical state (usually logic 1), expressing the
result as a parallel binary number on its outputs. Parallel
counters have been extensively researched
[133,94,95,134]. They are difficult to design, in that
they lack modularity in an arithmetic sense. For example 
a large parallel counter can only be made from two smaller 
parallel counters by using extra components to combine the 
separate outputs. As a result, a large parallel counter 
is best designed recursively, starting from the minimum 
implementation (3-input full adder) and working up, 
geometrically. Such an approach is the basis of a silicon 
compiler for parallel counters reported by Cappello [135]. 
The architecture of a 31-bit parallel counter is shown in 
Figure 3.8. Parallel counters occupy a significant por-
tion of the silicon area in digital correlator designs. 
Also, pipelining is normally employed to increase the
throughput rate, which increases the required area still
further. Multi-valued logic techniques have been used to
reduce the silicon area required by integrated parallel
counters, as shown in Figure 3.9. Area savings of nearly
50% using quaternary logic have been reported [96,97].
Multi-valued logic circuits are most easily realised in
technologies such as ECL and I2L [136]. This means that
very high clocking rates (50 MHz) are possible in digital
correlators.
Figure 3.8. 31-bit binary parallel counter using binary
full adders.
Figure 3.9. Quaternary logic parallel counter using full
adder circuits.
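The recursive construction of a parallel counter from full adders can be sketched as below. This is one well-known way of building a (2^n - 1)-input counter from two half-size counters and a ripple adder; it is offered as an illustration and is not claimed to be the exact network of Figure 3.8 or of the compiler in [135]. All names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def parallel_counter(bits, stats):
    """Count the ones among 2**n - 1 inputs; returns the count LSB first.

    stats["fa"] is incremented for every full adder used.
    """
    if len(bits) == 1:
        return [bits[0]]                      # a 1-input counter is just a wire
    half = len(bits) // 2
    left = parallel_counter(bits[:half], stats)
    right = parallel_counter(bits[half:2 * half], stats)
    carry = bits[-1]                          # leftover input enters as carry-in
    out = []
    for a, b in zip(left, right):             # ripple-add the two sub-counts
        s, carry = full_adder(a, b, carry)
        stats["fa"] += 1
        out.append(s)
    return out + [carry]


inputs = [1, 0, 1, 1, 0] * 6 + [1]            # 31 hypothetical EXNOR outputs
stats = {"fa": 0}
word = parallel_counter(inputs, stats)
count = sum(b << i for i, b in enumerate(word))
```

In this sketch the two half-size counters cannot simply be joined: an extra row of full adders is needed at every level of the recursion to merge their outputs, which is the lack of arithmetic modularity referred to in the text.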
Note the lack of modularity in the architecture of the parallel
counter, and the high degree of global interconnection
and fan-in that is necessary. A method which solves these 
particular problems is the architecture shown in Figure 
3.10. Here the summation is distributed along the corre-
lator array. The operation of the circuit will be slower 
than one with a pipelined parallel counter unless pipelin-
ing is incorporated here also, and the circuit is operated 
in a systolic fashion. However, other problems are then 
introduced, since the summing network must be pipelined 
along with the other elements of the correlator. 
[Figure 3.10 block labels: Y SHIFT REGISTER, X SHIFT REGISTER, multipliers (X), Output]
Figure 3.10. Spatial integration using distributed
adders.
This is the basis of the systolic correlators discussed in 
section 3.2.4. 
3.2.2.4. Parallel architecture with spatial integration 
and temporal delay using pipe-organ structures 
A special architecture, shown in Figure 3.11 and 
termed a pipe-organ correlator, is equivalent to the con-
ventional spatial integrating correlator of Figure 3.6. 
It arises from the fact that CCDs which do not
require non-destructive sensing techniques are much
simpler to construct than their non-destructive sensing
counterparts. Every delay element in the conventional
architecture transfers its stored information to two inputs,
namely the next delay element, and a multiplier. It is
essential, therefore, that the information remains intact
during the process. To avoid this requirement, the same
algorithm can be implemented using separate delay lines
for each correlation point. Every delay cell now feeds
only one input. The stored information may now be destroyed
during a transfer operation. In CCD technology,
the simplification in circuit design (and control) that 
the technique of destructive sensing permits is often 
worth the extra area involved [137,138]. Miller and Berry 
have described a pipe-organ correlator, where the extra 
area required is reduced by merging CCD cells in groups of 
four [65]. 
A dual of this architecture, which uses temporal
integration and spatial delay, is the CCD time integrating
correlator of Figure 3.3.
[Figure 3.11 label: y in (ODD)]
Figure 3.11. Spatially integrated pipe-organ correlator.
3.2.3. Serial Parallel Architectures (DELTIC) 
The architectures described in the previous sections 
have comprised functional elements, all of which
operate at a common clock rate. This section deals with 
serial/parallel architectures, termed delay-line time 
compressor (DELTIC) correlators [6,139], where internal 
circuitry operates at a higher rate than the sample rate. 
Figure 3.12. DELTIC correlator configuration. 
Here the data is time compressed, or expanded in 
bandwidth, by a factor N, to permit a single, fast multi-
plier to perform the required NxN multiplications in a 
time equal to N input sample periods. Thus the data con-
tained in the recirculating store must be recirculated at 
a rate of N times the input sample rate. For a fixed data 
record, the memory information is held for N complete 
recirculations before being replaced by a new record. In 
the case of a varying input signal, the oldest memory sam-
ple is replaced by a new input sample after each recircu-
lation. Multiplying each sample of the recirculating data 
by a reference signal and integrating N samples provides 
one point of the correlation function. Further points are 
obtained on successive recirculations. The disadvantage 
of this architecture is that the correlation rate is lim-
ited by the speed of the single multiplication element. 
The concurrent use of an array of multipliers, as 
described in section 3.2.2, increases the correlation rate 
significantly. It also yields a more modular design
which is more suitable for VLSI implementation. 
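The DELTIC principle for a varying input signal can be modelled as follows. The sketch assumes an initially empty (zero) store and a hypothetical four-sample reference; the inner loop stands for the single multiplier working at N times the sample rate.

```python
def deltic_correlate(samples, reference):
    """One correlation point per recirculation of an N-sample store."""
    n = len(reference)
    store = [0] * n                     # recirculating store, oldest sample first
    points = []
    for new_sample in samples:          # one input sample period
        store = store[1:] + [new_sample]     # replace the oldest sample
        acc = 0
        for k in range(n):              # N multiplications at N times the sample rate
            acc += store[k] * reference[k]   # the single, fast multiplier
        points.append(acc)
    return points


reference = [1, -1, 1, 1]               # hypothetical reference record
samples = [0, 0, 1, -1, 1, 1, 0, 0]     # the reference embedded in a zero background
points = deltic_correlate(samples, reference)
```

Every output point costs N sequential multiplications, which is why the correlation rate is bounded by the speed of the one multiplier.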
3.2.4. Systolic Architectures 
Systolic architectures have been reviewed by several 
authors [122,121,140,141]. In this section, only those 
systolic architectures for integrated circuit correlation 
will be discussed. The algorithms that underlie these 
architectures can again be divided into two categories: 
spatially integrating correlators, and time integrating 
correlators. 
3.2.4.1. Temporal Integration 
The Eu349 correlator chip, shown in Figure 3.4, is a
time integrating linear systolic array, which uses global 
control signals but no fan-in. There is only a single 
delay on both input and output shift registers, and the 
elements of the delay may be cascaded directly. This 
architecture can also be adapted to provide fault tolerant 
features by simply adding two multipliers and one latch 
per correlator stage. The circuit design of the Eu349
chip is discussed in detail in chapter 4. A time
integrating correlator chip, similar in concept to the 
Eu349, has been reported by Barral and Moreau [142], and 
is illustrated in Figure 3.13. 
Figure 3.13. Bit-serial systolic correlator (single
stage).
The correlator stage shown in Figure 3.13 can compute a 
single correlation point integrated over a maximum of 512 
samples. The samples are 12-bit two's complement numbers 
and the chip contains 11 identical stages. The architecture
is bit serial. From the viewpoint of this discussion,
however, the main difference between this architecture
and that of the Eu349 is the extra pipeline delay
between each stage. The control signals propagate through 
the pipeline from one stage to the next; thus global con-
trol signals are avoided. Disadvantages of this architec-
ture include low correlation rate due to the bit-serial 
implementation (300 kHz maximum); integration time limited 
to 512 samples; and only 11 parallel stages of correlation 
per chip. 
The remaining examples of systolic correlation
chips employ spatial integration. The conceptual
difference between time integration and spatial integra-
tion is treated in Section 3.3. 
3.2.4.2. Spatial Integration 
In devising systolic architectures all the possible 
permutations of the three quantities (reference, input, 
and results) and the two parameters (moving or stationary) 
are explored. For example, the architecture shown in 
Figure 3.6 can be described as having "stationary refer-
ence signal, moving input signal and stationary output". 
At each shift cycle the stationary outputs fan-in to the 
single summing network. A similar situation is shown in
Figure 3.10, the only difference being the summing net-
work, which is now distributed over the array. 
Another permutation is shown in Figure 3.14. In this 
example, which is the architecture of a correlator chip 
described by Snelling and Penn [143], the summing network, 
and input signal channel, are pipelined. If the summing 
network alone were pipelined, an architecture would be produced
which has a stationary reference signal, moving input sig-
nal, and a moving output signal, which will not compute a 
correlation function unless the adjacent bits of the input 
signal (and hence the output signal) are separated by 
zeros. One alternative solution, adopted by Snelling and 
Penn, is to introduce a pipeline delay at each stage in 
the input signal, as well as in the summing network. The 
correlator described by Snelling and Penn is also pipe-
lined into bit slices. The complete architecture allows 
1-bit x 8-bit correlation, but integration is only over 8
samples. Another alternative solution, which is described
by Kawahara [144], is to remove the delay entirely from
the input signal. This architecture is shown in Figure
3.15. The chip described by Kawahara computes a 3-bit x
4-bit correlation function integrated over 32 samples. 
The output word size is 11 bits. 
[Figure 3.14 label: PIPELINE DELAYS]
Figure 3.14. Systolic correlator with pipelined summing
network and input register.
[Figure 3.15 label: PIPELINE REGISTERS]
Figure 3.15. Correlator architecture with pipelined summing
network and global input signal.
Finally, a two dimensional systolic array of simple 
1-bit processor and memory cells, which can compute corre-
lation functions, is described by McWhirter et al 
[145,146]. The silicon implementation of the architecture
[147] provides 4-bit x 1-bit correlation, employing spa-
tial integrations over 64 data samples. The correlation 
algorithm uses a moving reference signal, moving input 
signal and moving results. Zeros are interspersed between 
adjacent bits of the input data words and the reference 
words to achieve the desired interaction of the com-
ponents. The architecture of the correlator is shown in 
Figure 3.16. 
Figure 3.16. Two-dimensional systolic correlator of
McWhirter et al.
As a result of the interspersed zeros and the continual 
contra-flow of the data and reference bits, a diamond 
shaped region of valid interaction propagates down the 
array, as shown in Figure 3.17. 
Figure 3.17. Data flow in the systolic correlator of
McWhirter et al.
The partial products inside the interaction area eventu-
ally reach the bottom edge of the array where they are 
accumulated by the adder cells (marked (b) in Figure 
3.16). Only those partial products which are relevant to 
the particular correlation point being accumulated, will 
have any effect, since all others will have a zero in one, 
or both of the multiplicands. The correlator operates in 
a bit serial manner, and produces a valid result every 
4N-1 clock cycles (for two's complement numbers), where N 
is the length (integration time) of the array. A CMOS
realisation of this architecture, where N=64, operating at
20 MHz could provide 16-bit results at a rate of just
over 78 kHz (20 MHz divided by 4N-1 = 255 clock cycles).
A disadvantage of
this type of systolic array is one of throughput, particularly
if the array is large. Another disadvantage is that
arrays must be cascaded geometrically to allow for inter-
nal word growth in the partial products. In practice, 
truncation is used to limit the permitted word growth. 
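The throughput penalty follows directly from the 4N-1 cycle figure given above. The short calculation below is a sketch, with a hypothetical function name and an arbitrary choice of array lengths, showing how the result rate falls as the array grows at a fixed 20 MHz clock.

```python
def result_rate_hz(clock_hz, n):
    """Output rate when one valid result appears every 4N-1 clock cycles."""
    return clock_hz / (4 * n - 1)


# Result rate at a 20 MHz clock for three hypothetical array lengths.
rates = {n: result_rate_hz(20e6, n) for n in (16, 64, 256)}
```

Quadrupling the array length reduces the result rate by roughly a factor of four, so longer integration is bought at the expense of throughput.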
3.3. Correlation Cube: The Difference Between Temporal 
and Spatial Integration 
In this section the contrasting architectures of time 
integrating correlators and spatially integrating correla-
tors are discussed. There are two points to note in par-
ticular: time delay implementation and integration tech-
nique. Figures 3.18 and 3.19 show respectively spatial 
integration and time integration architectures. 
[Figure 3.18 label: integration time (N)]
Figure 3.18. Correlator architecture using a single
parallel integrator.
[Figure 3.19 labels: Inputs x, y; PD = polarity detect]
Figure 3.19. Eu349 architecture using a parallel array of
serial integrators.
In Figure 3.18, the relative delay between the sig-
nals is achieved by dumping the x register contents into 
the reference register. In this way the x signal is held 
stationary while the y signal is shifted past. The time 
delay window is given by the period between the x register 
parallel dumps. In contrast the Eu349 architecture delays 
only one input signal. Hence each stage of the correlator 
introduces a unit of time delay between the input signals. 
For the Eu349 the time delay window is given by the number 
of correlator stages, and may be increased easily by cas-
cading the Eu349 chips. 
Integration time in the Eu349 is governed by the 
capacity of the integrating counters, which is programmable.
Thus the integration time may be varied from 1 to
32,766, regardless of the number of correlation stages in 
the cascade. 
In the architecture of Figure 3.18, the integration 
time is determined by the length of the correlator, that 
is, the number of bits in the shift registers. 
Integration time is therefore short. The integration time 
may be increased by cascading the chips, but this is dif-
ficult because the individual chip outputs (typically 7 
bits for an integration time of 64) must be added together 
using external circuitry [94,95]. 
The differences in the architectures of Figures 3.18 
and 3.19, may be summarised by viewing correlation in 
three dimensions: function amplitude, time delay, and 
integration time. A correlation cube to demonstrate this 
is shown in Figure 3.20. Both architectures incorporate 
two physical dimensions and one time dimension. It can be 
seen in Figure 3.20 how the two physical dimensions of the 
architectures occupy orthogonal slices of the correlation 
cube. 
[Figure 3.20 labels: architecture using parallel integration; architecture of Eu349]
Figure 3.20. The correlation cube showing the relationship
between the two contrasting architectures.
3.3.1. Correlator Architecture based on Spatial Integra-
tion 
For this architecture, the correlation equation 
$$ r(\tau) = \sum_{k=0}^{N-1} y_k \, x_{k-\tau} \tag{3.1} $$
is implemented by storing the reference signal $y_k$ in a
(maskable) register latch of N stages, where N represents
the correlator integration time. The input signal $x_k$ is
shifted along a tapped shift register and at the kth tap,
on each clock cycle, the product $y_k x_{k-\tau}$ is produced.
Summing these individual products in a single (parallel)
counter produces the desired correlation function. The
output is one b-bit value of the function for each clock
cycle. Effectively, the parallel counter is integrating
the products for all k, from 0 to N-1, simultaneously, at
a single delay value $\tau$, per clock cycle.
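A behavioural sketch of this spatially integrating scheme for the polarity case is given below. It assumes 0/1 polarity bits, an initially cleared shift register, and a reference loaded so that tap 0 faces the newest input sample; the function names and test pattern are hypothetical.

```python
def spatial_correlate(x, y_ref):
    """Spatially integrating polarity correlator: one output word per clock cycle."""
    n = len(y_ref)
    taps = [0] * n                   # tapped shift register for x, newest sample first
    outputs = []
    for sample in x:
        taps = [sample] + taps[:-1]
        products = [1 - (a ^ b) for a, b in zip(taps, y_ref)]   # EXNOR gates
        outputs.append(sum(products))                           # parallel counter
    return outputs


def output_width(n):
    """Bits needed to express a count between 0 and n inclusive (the b-bit word)."""
    return n.bit_length()


pattern = [1, 0, 1, 1]               # hypothetical reference, in time order
y_ref = pattern[::-1]                # stored so that tap 0 faces the newest sample
outputs = spatial_correlate([0, 0] + pattern + [0, 0], y_ref)
```

The count reaches its maximum of N on the cycle at which the input is aligned with the stored reference. For N = 64 the helper gives a 7-bit output word, in line with the figure quoted earlier for an integration time of 64.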
Increasing code lengths by cascading individual 
correlator chips is complicated by the need to add 
together the b-bit words from individual parallel 
counters. In this parallel counter structure the correla-
tor integration time is less than (if masking is used), or 
equal to the reference code length. The correlator, 
therefore, has three degrees of freedom: integration time, 
correlation lag or delay, and correlation amplitude. 
These may be represented on the correlation cube as shown 
in Figure 3.21. 
[Figure 3.21 axis labels: correlation amplitude $r(\tau)$; integration time (N); correlation lag ($\tau$). Annotation: correlation function builds up in this direction]
Figure 3.21. Correlation cube for the spatially integrating
correlator.
The correlation function of Figure 3.22, which has 
fixed integration time N, and produces one value of $r(\tau)$
for each lag $\tau$, would be depicted as shown in Figure 3.21,
on the rear face of the cube, since the integration over N 
samples is effectively performed instantaneously within 
the parallel counter.
Figure 3.22. The correlation function from the spatially
integrating architecture.
3.3.2. Correlation Architecture based on Temporal
Integration
In the Eu349, the correlation equation 
$$ r(\tau) = \sum_{k=0}^{N-1} y_k \, x_{k-\tau} \tag{3.2} $$
is implemented. The input signal $x_k$ is shifted through a
tapped shift register and at each tap the product $y_k x_{k-\tau}$
is produced. Note that, in contrast to the previous
architecture, the Eu349 does not involve on-chip latching
of the reference waveform $y_k$, and that each individual
sample of y is applied simultaneously to all stages. A
serial counter/integrator on each stage integrates the
product values for all values of k, up to a maximum of N-1.
The value N, which represents the integration time, is
simply the preset capacity of the serial counters, and has
nothing whatsoever to do with the number of stages in the
Eu349 correlator. Each integrating counter in the Eu349
is dedicated to integrating the product values for one
particular value of lag or delay. The value of lag is
determined by the position of the counter in the overall
array. Thus the contents of the $\tau$th counter, after an
integration time of N, will be
$$ \sum_{k=0}^{N-1} y_k \, x_{k-\tau} = r(\tau) \tag{3.3} $$
which is exactly the function produced by the spatially
integrating correlator.
The main difference between the two architectures can 
be visualised with reference to the correlation cube. 
[Figure 3.23 labels: correlation function builds up in this direction; correlation lag ($\tau$)]
Figure 3.23. Correlation cube for the correlator Eu349.
Whereas in the spatially integrating correlator, the 
integration over N samples is performed "instantaneously" 
in the parallel counter to produce one value of the corre-
lation function per clock cycle, in the Eu349, the array 
of serial counters simultaneously offers the values of the
correlation function at all time lags as a function of 
integration time. Thus, after an integration time of N 
clock cycles, the values in the serial counters will 
represent the final correlation function, identical to the 
result from the spatially integrating device. 
In the Eu349, the values $r(\tau)$ can be read out from
the array of counters to yield the correlation function. 
In contrast to the spatially integrating device, the Eu349 
offers direct cascading of individual chips without 
requiring the use of additional circuitry, to increase the 
maximum length of reference code and lag value, whilst 
offering an independent variation of integration time N by 
the presettable serial counters. Also, the correlation 
rate is not affected by the size of the array. 
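Direct cascading can be illustrated with a simple behavioural model. The class below is hypothetical and much simplified (no overload latches, +/-1 samples, zero-initialised delay line): the x sample shifted out of one chip feeds the next, while every chip sees the same y sample.

```python
class CorrelatorChip:
    """Hypothetical model of one time-integrating chip with `stages` channels."""

    def __init__(self, stages):
        self.taps = [0] * stages
        self.counters = [0] * stages

    def clock(self, x_in, y):
        """Shift x_in into the delay line, integrate, return the sample shifted out."""
        x_out = self.taps[-1]
        self.taps = [x_in] + self.taps[:-1]
        for j, tap in enumerate(self.taps):
            self.counters[j] += y * tap
        return x_out


def run_cascade(chips, x, y):
    """Clock a chain of chips; x ripples from chip to chip, y is common to all."""
    for xk, yk in zip(x, y):
        link = xk
        for chip in chips:
            link = chip.clock(link, yk)
    return [c for chip in chips for c in chip.counters]


# Hypothetical +/-1 test data: y is x delayed by two samples.
x = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]
y = [1, 1] + x[:-2]
two_chips = run_cascade([CorrelatorChip(2), CorrelatorChip(2)], x, y)
one_chip = run_cascade([CorrelatorChip(4)], x, y)
```

In this model two 2-stage chips in cascade give the same counter contents as a single 4-stage array, with no adder or other external component between them, and the number of clock cycles needed does not depend on the length of the chain.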
3.3.3. Display of Correlation Output 
With the spatially integrating correlator, the b-bit 
output of the correlation function is obtained each clock 
cycle. With the Eu349, a latency of N clock cycles (N is 
the integration time through the correlation cube of Fig-
ure 3.23) is required before the correlation function can 
be read out. As an alternative display mechanism, the 
overloading counter technique can be used to provide a 
bit-serial output of the correlation function for applica-
tions where the integration time is significantly greater 
than the maximum lag value, or reference code length. 
Here, when one of the presettable serial counters over-
loads, a flag is set, the overload status (one bit for 
each counter) is read out for all counters from a serial 
shift register, and the time lag of the correlation func-
tion peak (see Figure 3.23) can be determined. On later 
clock cycles (representing lesser correlation signifi-
cance), several other counters will have overloaded, so 
that points around the main correlation peak, and other 
lesser peaks of the function may be displayed (see Figure 
3.24). 
[Figure 3.24 axis label: counter index ($\tau$)]
Figure 3.24. Display technique for the Eu349 correlator.
3.4. Summary 
In this chapter, several implementations of silicon 
correlators have been discussed. The architectures may be 
classified by observing whether time integrating or spa-
tially integrating techniques have been used. The 
difference between these two concepts has been illustrated 
by the correlation cube. Further segregation of correla-
tor architectures may be made by observing which computa-
tional techniques have been used, namely bit serial, bit 
parallel, polarity, systolic etc. 
Parallel and concurrent techniques are employed to an 
ever increasing extent in integrated circuit correlators. 
However there exists a compromise between using a large 
number of very simple concurrent operations, and using a 
small number of complex cells, to achieve a common objective.
In the DELTIC correlator, a single, fast multiplier
is used. In the systolic correlator of McWhirter et
al, delay, multiply, and add operations are distributed
over a large 2-dimensional array of simple cells. However,
partial products are only generated in cells within
an interaction region and these in turn are only used to 
form a product on every alternate clock cycle. Further-
more, to achieve useful integration times a large array of 
cells is required, and to increase the integration time 
requires cells to be cascaded. Normally this would not be 
a disadvantage; it is in fact preferable for VLSI archi-
tectures.to be modular and cascadable. However the output 
rate of this correlator is inversely proportional to the 
size of the array. 
The architecture of the Eu349 correlator achieves a 
balance between concurrency, cascadability and correlation 
rate. The architecture is concurrent in that each point 
of the correlation function is computed in parallel. The 
architecture is directly cascadable, and the correlation 
rate is independent of the length of the array. 
CHAPTER 4 
VLSI DESIGN STRATEGIES 
FOR TESTABILITY AND FAULT TOLERANCE 
4.1. Introduction
The concepts of design for testability and fault 
tolerance in integrated circuit design become important as 
feature sizes shrink and chip sizes increase. The chip 
described in this thesis embodies a design for testability 
strategy and provides yield enhancement and fault toler-
ance through the use of redundancy. These two topics are 
discussed in this chapter. 
Design for testability addresses the two major facets 
of the chip testing problem: test pattern generation, and 
test response verification. At the circuit complexities 
presented by VLSI the need to design testable logic cir-
cuits is crucial, and considerable work has been done in 
recent years in devising design strategies that produce 
highly testable circuits [148,149,150,151,152]. Testabil-
ity can be achieved by: 
a) ad hoc partitioning of a VLSI design into small
testable modules or stages,
b) the inclusion of a systematic testability scheme,
such as scan path, and
c) built-in test and self test strategies, and associated
data compression techniques.
Fault tolerance is undoubtedly a desirable property in
any electronic system. In order to take full advantage of
VLSI, the design strategy should include techniques for 
fault tolerance and yield improvement. Examples of these 
techniques are
a) modified design rules, which reduce the probability
of yield loss due to critical spacings, or random
defects (defect avoidance),
b) replication of critical circuits with associated
majority voting schemes (concurrent fault tolerance),
and
c) modified VLSI architectures in which redundant circuit
modules can be switched into operation to compensate
for defective areas (nonconcurrent fault
tolerance).
In this thesis, attention will be restricted to non-
concurrent schemes, referred to in paragraph (c) above. 
Increased design and implementation costs should be 
expected when redundancy is incorporated into a VLSI 
design. A figure of merit can be defined, however, which 
takes into account the improvement of yield and the 
increase in implementation cost. The yield enhancement 
scheme is worthwhile when the figure of merit is greater
than unity, i.e. when the cost of the redundant chips is
lower than the cost of the nonredundant chips. The figure
of merit for redundantly designed chips is a maximum when 
approximately 10% of the circuit is redundant [153]. This 
implies that chips can be designed around an optimum 
amount of additional circuitry to improve yield. 
The Eu349 chip described in this thesis has been
designed for testability and fault tolerance. Further-
more, the design strategy allows faulty stages to be 
detected and eliminated automatically. The circuit design 
is presented in Chapter 5, but as a precursor, the sub-
jects of VLSI design for testability, and design for fault 
tolerance and yield enhancement will be reviewed in this 
chapter. 
4.2. Test Philosophies and The Motivation Behind Design 
for Testability 
With the increase in complexity of logic that can be 
fabricated on a VLSI chip, there is a growing problem in 
validating the logical behaviour of the chip at manufac-
ture. Traditional test techniques require the derivation 
of input test stimuli, and associated output responses. 
Exhaustive testing of circuits demands the consideration 
of all possible logic states in which a circuit can exist. 
This strategy rapidly becomes uneconomical in complex, or 
deeply sequential circuits, since the costs and times 
involved in test pattern generation grow exponentially 
with increasing circuit complexity [152]. Techniques to 
reduce the number of test stimuli are based on the use of 
fault models and a knowledge of the internal structure of 
the circuit. The most common fault model is the stuck-at 
model. More comprehensive models are possible [154] but 
they substantially increase the difficulty of test pattern 
generation and do not offer any significant compensating 
advantages [151]. 
The efficiency of a test set is measured by its fault 
coverage, which, in the case of a stuck-at fault model, 
refers to the percentage of possible stuck-at faults the 
test set will detect. Fault simulation is commonly used 
in logic circuit testing to evaluate whether a generated 
test set does indeed detect the faults it was intended to 
detect. It is also used to compute the fault coverage. 
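As an illustration of how fault simulation yields a fault
coverage figure, the following minimal Python sketch injects
every single stuck-at fault into a small hypothetical circuit,
y = (a AND b) OR c, and reports the percentage detected by a
given test set. The circuit, the net names and the function
names are assumptions made for this example only.

```python
from itertools import product

# Nets of the hypothetical circuit y = (a AND b) OR c
NETS = ["a", "b", "c", "n1", "y"]


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is None or (net_name, stuck_value)."""

    def net(name, value):
        # Override the value of the one net that carries the stuck-at fault
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a = net("a", a)
    b = net("b", b)
    c = net("c", c)
    n1 = net("n1", a & b)
    return net("y", n1 | c)


def fault_coverage(test_set):
    """Percentage of single stuck-at faults detected by `test_set`."""
    faults = list(product(NETS, (0, 1)))
    detected = [
        f for f in faults
        if any(simulate(*t) != simulate(*t, fault=f) for t in test_set)
    ]
    return 100.0 * len(detected) / len(faults)
```

In this sketch the four patterns (1,1,0), (0,1,0), (1,0,0) and
(0,0,1) detect all ten faults, whereas the single pattern
(1,1,0) detects only four of them.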
There are a number of difficulties with this 
approach. Firstly, a fault model is required. In VLSI 
circuits the classical assumption that only single stuck-
at faults need be modelled is not sufficient [154]. More 
comprehensive models are possible, but they increase the 
task of test pattern generation. Secondly, test pattern 
generation is required. Automatic test pattern generation 
[155] is very costly and typically does not provide a suf-
ficiently high fault coverage. For sequential circuits at 
VLSI complexity, automatic test generation is extremely 
difficult, and manual generation is time consuming and 
error prone [149]. 
One method which avoids the problem of producing a 
specific test pattern is random testing [156]. In this 
case a relatively large number of random patterns are 
applied to the circuit under test. If the response is 
found to match the expected circuit response, then it can 
be assumed, within a specified confidence limit, that the 
circuit is fault-free. Random testing has been found to 
be an extremely effective means for fault detection in
combinational circuits, but its effectiveness in dealing 
with sequential circuits is not easily defined [151]. 
An alternative to gate level testing is functional 
testing. This approach has the advantage that tests can 
be generated without having a detailed knowledge of the 
gate structure of the chip. The problem with functional
testing, however, is that the only way to be certain that 
the circuit is fault-free, is to perform an exhaustive 
functional test. Since exhaustive testing is only feasible
for circuits which have few inputs and few sequential
states, functional testing, on its own, is not a
practical approach to VLSI testing.
From the foregoing discussion, it can be concluded 
that testing becomes increasingly difficult as designs 
approach VLSI complexity. Methods used to reduce the 
amount of test data reduce, in turn, the fault coverage, 
and in any case are difficult to automate for large cir-
cuits. The only solution to these problems is to reduce 
the complexity of VLSI circuits, at least with regard to 
testing. Hence the term "design for testability". Figure 
4.1 shows a comparison between test costs with, and 
without, design for testability. The test costs without 
design for testability grow exponentially with increasing 
complexity, in contrast to the almost linear characteristics
of test costs for circuits which incorporate a design for
testability strategy.
(Plot axis: circuit complexity in gates/chip, marked at 5k and 20k.)
Figure 4.1. Comparison of test costs with, and without,
design for testability.
This thesis addresses the need to embody a testabil-
ity scheme within the VLSI integrated circuit itself, and 
describes a methodology which makes this possible for well 
structured systems. 
4.3. Design for Testability Methods 
4.3.1. Objectives 
Testability involves two important concepts: control-
lability and observability. Controllability is the abil-
ity to establish a circuit in a controlled initial state, 
and observability is the ability to observe externally, 
the internal states. Design for testability involves 
increasing the controllability and observability of the 
constituents of a design by decomposing the overall design 
into more manageable elements. The cost of design for 
testability can be measured by the number of additional 
package pins required for test purposes, the number of 
additional test circuits required, and any loss in perfor-
mance resulting from design for testability techniques. 
Increased circuit complexity reduces fabrication
yield [153]. Thus, the increased chip costs involved in
using extra silicon area for test purposes must be weighed
against the savings in test costs, which are usually
reflected by test time. Typically the use of test circuits
which increase the chip area by approximately 10% is
considered reasonable [158]. The variation of relative 
test costs with test circuit area overhead is shown in 
Figure 4.2 [157].
(Plot axis: test circuit area overhead (%), marked at 10, 20, 30 and 40.)
Figure 4.2. Typical variation of chip costs as a function
of test circuit area overhead.
The significance of the test circuit area overhead depends 
on the type, and application, of the chip being designed. 
For low cost, high volume, modest performance designs, an 
acceptable test overhead is around 10%, whereas for high 
performance, low volume applications, test overheads of 
100% may be acceptable. 
4.3.2. Ad Hoc Methods 
Ad hoc approaches to design for testability are in 
fact simply guidelines on how to improve the testability 
of a particular circuit. The testability problem has to 
be addressed again and again with every new design. The 
most common ad hoc method is circuit partitioning with 
added test points. This allows the circuit to be split 
into functional sub-modules, each of which may be accessed 
and tested individually. The type of circuit architecture 
is important to the choice of ad hoc testability scheme. 
For example, bus structured circuits, such as microproces-
sors, are easily partitioned, using the busses as test 
points. However, with growing VLSI complexity, additional 
design for testability schemes must be employed within the 
sub-modules. 
4.3.3. Scan Methods 
The scan path method of design for testability 
enhances the controllability and observability of a VLSI 
circuit by allowing access to the internal states of a 
circuit [159]. The principle of the technique is to pro-
vide additional facilities within the circuit, so that the 
storage devices can be tested separately from the rest of 
the circuit; the future state of the internal variables 
can be set to any desired value independent of their 
present values; and the values of the internal variables 
can be accessed and observed directly. These facilities 
can be achieved by establishing a scan path through the 
storage devices, as shown in Figure 4.3. The scan path 
operates in two modes. In normal mode, the storage devices
in the scan path are not linked together and the normal
operation of the circuit is not affected. In scan
mode, the storage devices are linked to form a shift
register. The serial input and serial output provide
controllability and observability of the internal states of
the circuit.
Figure 4.3. Scan path in design for testability.
Level sensitive scan design (LSSD) is a method of 
constructing scan paths which relies on strict design 
rules and guidelines. LSSD circuits are designed so that their
operation is as independent as possible from the circuit's 
a.c. parameters, such as degraded rise and fall times, 
degraded propagation delays, or other faults that may 
introduce race or hazard conditions. As a result, the 
potential effect of failure mechanisms that cause timing 
faults is reduced [160]. 
The method of testing using scan path is as follows. 
Firstly the scan path is itself tested. This is done by 
selecting scan path mode, i.e. the storage elements configured
as a shift register. The status and operation of
each storage device is tested using the Scan Data In, Scan 
Data Out, and Clock facilities shown in Figure 4.3. The 
test procedure uses a flush test followed by a shift test. 
Flush test begins by initialising the storage elements to 
0. Then a single 1 is clocked through the scan path from
the Scan Data In input to the Scan Data Out output. The
test can be repeated with a single 0 flushed through a
background of 1s. Flush test checks the ability of each
storage device to assume a 0-state and a 1-state, and the 
ability to transfer the stored state to the output. Shift 
test consists of clocking the sequence 00110011... through 
the scan path shift register. This sequence exercises 
each storage device through all combinations of present 
state and future state [158]. 
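The flush and shift tests described above can be sketched in
Python as follows. This is a minimal behavioural model, not the
circuit of Figure 4.3: the class ScanChain, the stuck-cell fault
map and the function names are assumptions introduced for the
example.

```python
class ScanChain:
    """Shift-register model of a scan path (scan mode only)."""

    def __init__(self, length, stuck=None):
        self.cells = [0] * length
        # Hypothetical fault map: cell index -> value the cell is stuck at
        self.stuck = dict(stuck or {})
        self._apply_faults()

    def _apply_faults(self):
        for index, value in self.stuck.items():
            self.cells[index] = value

    def clock(self, scan_data_in):
        """Shift one bit in and return the bit now visible at Scan Data Out."""
        self.cells = [scan_data_in] + self.cells[:-1]
        self._apply_faults()
        return self.cells[-1]


def stream_passes(chain, fill, stream):
    """Initialise every cell to `fill`, clock `stream` through, check the output."""
    n = len(chain.cells)
    for _ in range(n):
        chain.clock(fill)
    observed = [chain.clock(bit) for bit in stream]
    # A fault-free chain delays the input stream by n - 1 clocks
    expected = ([fill] * (n - 1) + list(stream))[:len(stream)]
    return observed == expected


def scan_path_ok(chain):
    """Flush test (both polarities) followed by the 00110011... shift test."""
    n = len(chain.cells)
    flush_one = stream_passes(chain, 0, [1] + [0] * n)   # single 1 through 0s
    flush_zero = stream_passes(chain, 1, [0] + [1] * n)  # single 0 through 1s
    shift = stream_passes(chain, 0, [0, 0, 1, 1] * n)    # 00110011...
    return flush_one and flush_zero and shift
```

A fault-free chain passes all three checks, while a chain with
any cell stuck at 0 or 1 fails at least one of the flush tests.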
Secondly, the circuitry between scan path nodes can 
be tested. This is done by selecting scan path mode and 
shifting a predetermined test pattern into the storage 
devices. Also, a set of test vectors is applied to the
primary inputs. Then the circuit is switched to normal 
operation. The steady state output response of the cir-
cuit under test can now be clocked into the storage dev-
ices. Finally, scan path mode is reselected, and the con-
tents of the storage devices are clocked out. These 
values, plus the values directly observable on the primary 
outputs, can be compared with the expected fault-free 
response. 
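The three steps of this procedure (scan in, capture in normal
mode, scan out) can be sketched in Python as below. The circuit
under test is a hypothetical 3-bit counter with an enable input,
chosen only for illustration; scan_test, good_counter and
faulty_counter are names invented for this sketch.

```python
def scan_test(logic, n_bits, state_pattern, primary_inputs):
    """Apply one scan test; return (captured state, primary outputs)."""
    # 1. Scan mode: shift the predetermined pattern into the storage devices
    register = [0] * n_bits
    for bit in reversed(state_pattern):
        register = [bit] + register[:-1]
    # 2. Normal mode: one system clock captures the steady state response
    register, outputs = logic(register, primary_inputs)
    # 3. Scan mode: clock the contents of the storage devices out again
    scanned_out = []
    for _ in range(n_bits):
        scanned_out.append(register[-1])
        register = [0] + register[:-1]
    return scanned_out[::-1], outputs


def good_counter(state, inputs):
    """Fault-free next-state logic of a 3-bit counter with enable."""
    (enable,) = inputs
    value = int("".join(str(b) for b in state), 2)
    next_state = [int(b) for b in format((value + enable) % 8, "03b")]
    carry_out = int(value == 7 and enable == 1)
    return next_state, [carry_out]


def faulty_counter(state, inputs):
    """Same logic with the most significant next-state bit stuck at 0."""
    next_state, outputs = good_counter(state, inputs)
    next_state[0] = 0
    return next_state, outputs
```

Scanning in the state 011 with enable = 1 gives the captured
state 100 for the fault-free logic and 000 for the faulty
version, so comparison with the expected response reveals the
fault.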
The total test time is determined mainly by the 
number of stages in the scan path which, in turn, is 
determined by the number of individual logic blocks to be 
tested. Optimum scan testing requires the inclusion of a 
complete scan path which leaves no sequential logic cir-
cuits during test mode. However, speed, performance, or 
area constraints, may restrict the use of this technique, 
with the result that parts of the circuit are sequential 
during the test. 
The implementation overhead of a scan path test stra-
tegy, in terms of additional design and silicon area, 
depends on the basic structure of the circuit, and the 
availability of circuit elements that are suitable for 
conversion into scan path elements. The simplest form of 
scan path test strategy is to add scan path shift regis-
ters to the VLSI design. Clearly this involves increased 
circuit area. A more attractive scan path implementation 
involves functional conversion of existing circuit ele-
ments into the required reconfigurable storage elements, 
thus reducing test area overheads to a minimum. Such a 
strategy is often forced upon the designer by the archi-
tecture and design software of semi-custom integrated cir-
cuits. In the UK5000 gate array [161], for example, the 
rows of uncommitted logic cells are sandwiched between 
rows of predefined LSSD latches. When the designer 
requires a storage element he is forced to use one of the 
LSSD latches. In this way the design is guaranteed 
testable. In the case of the Eu349 correlator design, 
functional conversion of existing circuitry has been 
extensively used. 
The effect of scan paths on circuit performance is 
only of importance when additional scan path register 
stages have to be included in the design. Otherwise, only 
increased loading and routing need be considered. 
The primary advantage of the scan path method is that 
as few as three extra circuit pins need be used to allow 
test-enable, and data input and output. However, the scan 
path merely allows access to internal circuit nodes to 
enhance controllability and observability. Testing cir-
cuits that have scan paths incorporated still requires 
external test pattern generation, and test response moni-
toring, to derive the test result. 
4.3.4. Built-In Self Test Methods 
Built-in self test (BIST) is a design for testability 
strategy in which test pattern generation and circuit 
response monitoring are performed within the system. This
can be done either concurrently or nonconcurrently. Con-
current (on-line) methods use a variety of error-
detecting, error-correcting, and self-checking codes. 
Nonconcurrent methods require an external activation which 
initiates the built-in test and inhibits the normal func-
tion of the circuit. The advantages of self test are that 
the test may be repeated as and when necessary during the 
service life of the system, and not simply at manufacture. 
For example, the system may be configured to initiate a 
self test automatically at each power-on. This thesis is 
primarily concerned with nonconcurrent self test methods, 
and the remainder of this section shall deal with two 
implementation techniques for built-in test. These are
signature analysis [162,163], and BILBO (Built-In Logic 
Block Observation) [164,165]. 
In built-in test, it is essential that the test
pattern is short, or at least that it can be generated 
easily by a small amount of additional circuitry. The 
same criterion applies to test response monitoring. Test 
pattern generation can be simplified by using pseudo-
random binary sequences (PRBS) [166] which are easy to 
generate on chip using a simple linear feedback shift 
register (LFSR), as shown in Figure 4.4. 
Pseudo-random outputs B1, B2, B3, B4, advanced by Clock:

B1  B2  B3  B4
 1   1   1   1    (initial values)
 0   1   1   1
 0   0   1   1
 0   0   0   1
 1   0   0   0
 0   1   0   0
 0   0   1   0
 1   0   0   1
 1   1   0   0
 0   1   1   0
 1   0   1   1
 0   1   0   1
 1   0   1   0
 1   1   0   1
 1   1   1   0
 1   1   1   1    (cycle repeats)

Repeat cycle: 2^4 - 1 = 15 patterns.
(0000) is the lockup pattern.
Figure 4.4. Configuration of a PRBS generator.
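The sequence tabulated in Figure 4.4 can be reproduced by the
short Python sketch below. The feedback connection (B3
exclusive-ORed with B4 and fed back into B1) is inferred here
from the tabulated states rather than taken from the circuit
diagram, and the function names are illustrative.

```python
from itertools import islice


def lfsr_states(seed=(1, 1, 1, 1)):
    """Yield successive (B1, B2, B3, B4) states of a 4-stage LFSR."""
    state = tuple(seed)
    while True:
        yield state
        b1, b2, b3, b4 = state
        # Shift one place towards B4; new B1 is the XOR of old B3 and B4
        state = (b3 ^ b4, b1, b2, b3)


def prbs_patterns(count, seed=(1, 1, 1, 1)):
    """Return the first `count` states, starting from `seed`."""
    return list(islice(lfsr_states(seed), count))
```

Starting from 1111 the generator visits all 15 non-zero states
before repeating, and a seed of 0000 never leaves the lockup
state.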
Data compression techniques, such as signature 
analysis, can greatly reduce the problem of test response 
monitoring [167,156]. Signature analysis is carried out 
using a linear feedback shift register, adapted to perform 
cyclic redundancy checking (CRC) on the test response 





Figure 4.5. Signature analysis register. 
The test sequence is sampled and clocked into the shift 
register. The contents of the shift register are influ-
enced not only by the next sampled value of the test 
sequence, but, by virtue of the feedback structure, by the 
current contents of the register. In this way, any corr-
uption of the sampled bit stream causes a corresponding 
corruption in the contents of the shift register. At the 
end of the test period, the accumulated contents of the
shift register represent the signature of the node under
test. The signature is compared with the expected
fault-free signature, and a match indicates that the node is
fault-free; a mismatch indicates that the response
sequence is corrupt in some way. For CRC signatures, the
probability of a corrupt data stream generating the same
signature as the fault-free data stream is extremely low,
quickly approaching $2^{-n}$ as the length of the data stream
exceeds the length n of the shift register [162].
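A minimal Python sketch of serial signature analysis is given
below. It assumes a 4-stage register with feedback taken from
the last two stages; these taps and the function names are
assumptions for the example and are not taken from Figure 4.5.
The second function counts, by exhaustive search, how many
corrupted streams of the same length alias to the fault-free
signature.

```python
from itertools import product


def signature(bits, n=4, taps=(2, 3)):
    """Compress a serial bit stream into an n-bit signature."""
    register = [0] * n
    for bit in bits:
        feedback = bit
        for tap in taps:
            feedback ^= register[tap]
        # Shift, entering the input bit XORed with the feedback
        register = [feedback] + register[:-1]
    return tuple(register)


def aliasing_count(good, n=4, taps=(2, 3)):
    """Number of corrupted streams that give the fault-free signature."""
    reference = signature(good, n, taps)
    count = 0
    for candidate in product((0, 1), repeat=len(good)):
        if list(candidate) != list(good):
            if signature(candidate, n, taps) == reference:
                count += 1
    return count
```

For an 8-bit stream and this 4-bit register, 15 of the 255
possible corrupted streams escape detection, a fraction close
to 2^-4, and no single-bit error escapes.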
The BILBO technique [164,165] is a recent innovation 
which draws together all the main elements of design for 
testability, including pseudo-random test pattern 
generation, scan path, and signature analysis. The tech-
nique reduces the test overhead by exploiting the shift 
register elements, common to all three schemes. The basic 
BILBO element is illustrated in Figure 4.6. 
(Diagram labels: parallel data inputs Z1 to Z4; mode
control C; parallel data outputs.)
Figure 4.6. Basic BILBO element.
Each BILBO consists of a latch register and some additional
gates for shift and feedback operations. Four different
functional modes can be selected using the two mode
controls C1 and C2 [164]. In the first mode (C1 = 1, C2 =
1), each latch is independent and can be used in normal
operation. In the second mode (C1 = 0, C2 = 0), the BILBO is
configured as a shift register and operates as a scan path.
In the third mode (C1 = 1, C2 = 0), the BILBO is
functionally converted into a multiple input signature
register (MISR), and in the fourth mode (C1 = 0, C2 = 1)
the latches are reset.
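The four modes can be summarised by the behavioural Python
sketch below. The mode encoding follows the text; the register
length, the feedback taps and the class and argument names are
assumptions made for the example, and gate-level detail is
omitted.

```python
class Bilbo:
    """Behavioural model of a BILBO register with mode controls C1, C2."""

    def __init__(self, n=4, taps=(2, 3)):
        self.q = [0] * n
        self.taps = taps  # assumed feedback taps (0-based stage indices)

    def clock(self, c1, c2, z=None, scan_in=0):
        n = len(self.q)
        z = [0] * n if z is None else list(z)
        if (c1, c2) == (1, 1):
            # Normal mode: independent latches load the parallel inputs
            self.q = z
        elif (c1, c2) == (0, 0):
            # Scan path mode: plain shift register
            self.q = [scan_in] + self.q[:-1]
        elif (c1, c2) == (1, 0):
            # MISR mode: shift with feedback, XORing in the parallel inputs
            feedback = 0
            for tap in self.taps:
                feedback ^= self.q[tap]
            shifted = [feedback] + self.q[:-1]
            self.q = [s ^ zi for s, zi in zip(shifted, z)]
        else:
            # (C1, C2) = (0, 1): reset the latches
            self.q = [0] * n
        return list(self.q)
```

With the parallel inputs held at zero, MISR mode in this model
reduces to a plain linear feedback shift register and therefore
generates a pseudo-random sequence of period 15.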
Multiple input signature registers can perform either 
pseudo-random sequence generation, or signature analysis. 
PRBS generation is achieved by setting the parallel inputs
to zero. As a signature analysis register the BILBO can
operate in two modes: serial input, or parallel input. In
serial input mode, the test data is clocked into Z1 while
the remaining parallel inputs are held at zero. In paral-
lel input mode, the test sequences are clocked into some, 
or all of the Z-inputs. The theory of multiple input sig-
nature analysis is complex, and is beyond the scope of 
this thesis. The most important aspect of signature 
analysis, as regards this thesis, is that the probability 
of fault detection is very high. It can be shown that the
probability of failing to detect errors from L input vectors
of m bits each, by an n bit MISR is [167]

$$P = \frac{2^{mL-n} - 1}{2^{mL} - 1} \qquad (4.1)$$

assuming all error sequences to be equally likely. For
mL much greater than n this is approximately $2^{-n}$.
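To give a feel for the magnitudes involved, the sketch below
evaluates the expression exactly in Python. It assumes that
equation 4.1 is read as the fraction (2^(mL-n) - 1)/(2^(mL) - 1),
that is, the probability that an error escapes detection; the
function name is illustrative.

```python
from fractions import Fraction


def escape_probability(m, L, n):
    """Probability that an error escapes an n-bit MISR (assumed form of 4.1).

    m: bits per input vector, L: number of input vectors, n: MISR length.
    """
    return Fraction(2 ** (m * L - n) - 1, 2 ** (m * L) - 1)


def detection_probability(m, L, n):
    """Complement of the escape probability."""
    return 1 - escape_probability(m, L, n)
```

For example, with m = 4, L = 100 and n = 16 the escape
probability is within rounding error of 2^-16, so the
probability of detection exceeds 99.99%.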
4.4. VLSI Design for Testability in the Eu349 Correlator 
Chip 
This section contains a summary of the design 
features that are relevant to the Eu349 correlator chip. 
Details concerning the operation of the architecture have 
been described in Chapter 3; details of the chip design 
will be presented in Chapter 5. 
A block diagram showing the main elements of the pro-
totype correlator is presented in Figure 4.7. The figure 
shows an array of coincidence detectors and integrating 
counters, whose inputs and outputs are linked together by 
two shift registers, the data shift register (DSR), and 
the overload shift register (OSR). In test mode the DSR
and OSR act as scan paths, while the integrating counters
perform signature analysis. Signature analysis provides a
self test of the integrating counters, and after a
complete test period, the counters contain the compressed
signatures of each correlator stage. There are only two 
primary data inputs to the correlator, therefore an 
exhaustive functional test is possible, and only requires 
four different test patterns. Each test pattern, however, 
must be repeated for the number of clock cycles necessary 
to complete the signature analysis. In a complete test 
period, four integrating counter self tests, and one
(Block diagram labels: control circuitry, logic, MISR, comparator.)
Figure 4.7. Correlator block diagram showing Built-In
Self Test features.
At the end of each integrating counter self test the sig-
nature must be checked for correctness. This is done by 
an external signal called Fidelity-Test (F-Test), and the 
result, a single GO/NO-GO status bit, is stored in the
associated overload latch. The results of the full test 
for each correlator stage in the array may then be exam-
ined using the overload shift register in scan path mode. 
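The flow of this self test can be illustrated by the toy Python
model below. It is only a sketch under strong assumptions: each
stage is reduced to a coincidence detector feeding a counter,
the final count stands in for the compressed signature, and the
function names, the number of clocks and the fault are all
hypothetical. The actual circuit is described in Chapter 5.

```python
from itertools import product


def good_stage(a, b):
    """Assumed fault-free coincidence detector: 1 when the inputs agree."""
    return int(a == b)


def stuck_high_stage(a, b):
    """Hypothetical faulty stage whose detector output is stuck at 1."""
    return 1


def run_self_test(stages, clocks=16):
    """Return one GO (1) / NO-GO (0) status bit per correlator stage."""
    status = [1] * len(stages)
    # Two primary data inputs give four exhaustive test patterns
    for a, b in product((0, 1), repeat=2):
        expected = clocks if a == b else 0
        for index, detect in enumerate(stages):
            # The counter integrates the detector output over the test period
            count = sum(detect(a, b) for _ in range(clocks))
            if count != expected:
                status[index] = 0
    return status
```

In this model an array with one faulty stage in the middle
returns the status bits 1, 0, 1, which correspond to the values
that would be read out through the overload shift register.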
The architecture of the Eu349 correlator uses the 
results of the self test (the GO/NO-GO status bits) to 
provide yield enhancement and fault tolerance. This 
aspect of the design is summarised in Section 4.7, and is 
addressed in more detail in Chapter 5. 
4.5. Integrated Circuit Yield Statistics 
4.5.1. Scope 
The integrated circuit described in this thesis con-
tains approximately 7500 MOS transistors interconnected to 
perform a specific electronic function. The probability 
that all the devices and their interconnections will func-
tion correctly depends on the control exercised during the 
series of complex processing steps used in their manufac-
ture. The fraction of chips that satisfy the final test 
programme is called the yield. This section of the thesis 
deals with the mathematical models used to predict yield. 
Yield statistics are important in both controlling a sem-
iconductor fabrication process, and in predicting the 
yield of future semiconductor products. They are also 
essential for analysing (or anticipating) the effectiveness
of a yield enhancement scheme. It is this particular
application of yield statistics that is of primary
interest here. 
The yield associated with integrated circuit fabrica-
tion can be divided into three parts. The first part 
results from catastrophic defects, such as wafer breakage, 
missing or erroneous processing steps etc., which prevent 
the circuits ever reaching final test. These defects will 
not be included in the discussion. The second part, known 
as pre-assembly test yield, deals with localised process 
defects, and the third part takes account of faults caused 
by packaging. The main area of interest here is the 
second yield category, the pre-assembly test yield. This 
yield can be divided into two classes. Firstly, there are 
gross yields, which are the result of gross defects, such 
as process parameter variations, causing large areas of 
the wafer to fail, and secondly, random yields, which are 
the result of random defects, such as thin oxide pinholes. 
The dependence of yield upon chip area has been 
extensively studied in the literature. Various theories 
have been presented, and analytical expressions derived to 
fit statistical data based on defect density distributions 
[168,169,170,171,172,173,174]. The work is based on random
defect distributions, and the papers differ in their
treatment of various defects being distinguishable or 
indistinguishable from each other. 
Attempts at yield calculations that take redundancy 
into account have largely concerned memory chips 
[175,176,177]. The model presented by Schuster [177] is
based on the exponential dependence of yield on the active 
chip area. The defects are separated as correctable, 
uncorrectable, and gross imperfections, and the net yield 
is calculated as the product of these three independently 
calculated yields. Stapper et al [175] have described a 
yield model with redundancy based on the Gamma distribu-
tion of defects. They then use mixed Poisson statistics 
to derive a yield expression to describe the yield of 
redundantly designed memory chips. 
Researchers have also been concerned with redundancy 
in non-memory VLSI chips [178,153,179], and they all agree 
that redundantly designed circuits have more chance of 
working than nonredundant designs. Mangir et al [153], in 
their model, account for the effects of the complexities 
of areas, connectivities between different areas, and the 
effect of regularity of interconnections, which would 
affect the processing tolerances, and hence yield. 
Before describing in detail a yield model for random 
defects, it is necessary to describe the yield losses due 
to gross defects. 
4.5.2. Yield Loss due to Gross Defects 
Gross defects, which are normally associated with 
errors in the process parameters, may cause large areas, 
or entire wafers, to have no functioning chips. Examples 
of these parameters are transistor gain, threshold voltages,
contact resistance, and parasitic capacitances.
Entire wafers will fail if the values of these parameters 
fall outside of their specified range. In marginal cases 
parts of wafers may fail, as shown in Figure 4.8. Gross 
yield losses may also be caused by errors in photolitho-
graphic processes. Examples of these are over or under 
exposure of the photosensitive resist material, optical 
distortions, and misalignment of mask patterns. Such
defects do not cause chips to fail in random patterns
on the wafer, which is why they must be treated separately
from random defects.
Figure 4.8. Wafer map showing gross yield. The shaded 
chips are functioning correctly. 
Special test circuits for measuring the process parameters 
are usually fabricated on the wafer, either in the free 
space between the chips (the scribe channel), or in 
reserved areas of each chip (test stripe), or in a small 
number (5-6) of chip size replacement "drop-ins". Mask
misalignment can also be measured in this way. The frac-
tion of test devices whose measured parameters lie within 
the required range, contributes to the gross yield. 
Stapper [180] gives an example of the relative yield
losses occurring in the manufacture of a 64k-bit random 
access memory (RAM) chip. These are reproduced in Figure 
4.9. The actual values of the yield losses are 
proprietary information and have not been published. Note 
that the parametric yield accounts for less than 5% of the 
total yield loss. 
(Chart labels: gross yield losses; random defect losses,
made up of random photo defects, random pinhole defects,
random leakage defects, and miscellaneous random defects.)
Figure 4.9. Relative yield losses. Random defects cause
most of the losses.
Figure 4.9 represents data obtained from a specific pro-
cess line for a specific product, and therefore may not be 
applicable to other circuit types or fabrication facili-
ties. 
4.5.3. Yield Model for Random Defects 
The data shown in Figure 4.9 indicates that random 
defects cause approximately five times as many chips to
fail as gross defects. Random defect models are, therefore,
an important factor in semiconductor yield statistics.
Due to the nature of random defects, and to the com-
plexity of the fabrication process, it is impossible to 
tell whether observed defects will cause actual chip 
failures. Therefore, the random defect model must be 
divided into two parts. The first part deals with the 
average number of failures or faults that can be caused by
a large number of different defect mechanisms. The second 
part deals with the statistical distribution of the aver-
age number of faults per chip. According to this theory, 
each defect type is associated with a probability that it 
will cause a failure. This probability can be multiplied 
by the number of defects in the corresponding category to 
obtain the average number of failures or faults per chip. 
This must be done for each defect type. Several failure 
models that have been developed for this purpose are 
described by Stapper et al [180]. However, this theory 
leads to very cumbersome expressions involving hundreds of 
terms, the data for many of which would be very difficult 
to obtain. Fortunately, for the purposes of this thesis, 
a simpler model using a single average defect density will 
suffice. 
The simple theory using Poisson statistics on a random
distribution of faults predicts that the yield falls off
exponentially with the average number of faults per chip,
or with the chip area (if the fault density is
constant). In practice, however, it has been
observed that the defect distribution is non-uniform and 
the yield falls off less sharply, but nevertheless signi-
ficantly with increasing chip area [181,182]. To account 
for this, a wide range of random defect models have been 
reported. Price [169], and later Mangir et al [153], 
maintained that defects should be modelled by Bose-Einstein
statistics. Others have favoured Maxwell-Boltzmann
statistics [183,170,184]. Stapper et al [180]
discuss Poisson, Binomial, and Generalised Negative Bino-
mial statistics, and conclude that each one of these may 
be applied to yield theory. The correct model is the one 
which fits the data best, and according to Stapper et al
the Generalised Negative Binomial distribution is the most
suitable for modelling present day semiconductor
manufacturing.
When simple Poisson statistics are used, the yield
due to random defects is given by

$$Y = e^{-\lambda} \qquad (4.2)$$

where $\lambda$ is the average number of faults, given by the
product of the defect susceptible chip area A, and the
average defect density D,

$$\lambda = AD \qquad (4.3)$$
However, the average number of faults per chip λ varies
from chip to chip, from wafer to wafer, and from batch to
batch. To take account of these variations a yield model
that uses the sum of many thousands of fault terms is
required. The sum may be approximated by an integral.
The yield is then given by
Y = \int_0^{\infty} e^{-\lambda} g(\lambda) \, d\lambda     4.4
where g(λ) is a probability distribution function of
faults per chip. This model was reported by Murphy in
1964 [168] with uniform and triangular distributions given
for g(λ). Murphy's results, however, took no account of
the fact that defects in semiconductor fabrication tend to
cluster. A more suitable yield model reported by Stapper
in 1973 [171] uses a Gamma distribution for g(λ), and an
expression for yield is obtained of the form
Y = (1 + \sigma^2/\lambda)^{-\lambda^2/\sigma^2}     4.5
where λ is the mean, and σ is the standard deviation of
the Gamma distribution. Defining a constant
\alpha = \lambda^2/\sigma^2     4.6
gives
Y = (1 + \lambda/\alpha)^{-\alpha}     4.7
This distribution is known as the Generalised Negative
Binomial distribution. The parameter α depends on the
spread of the fault distribution and takes into account
the clustering of defects.
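The step from 4.4 to 4.7 can be sketched as follows. This is a worked derivation under a stated assumption, not a reproduction of Stapper's own working: g is taken to be a Gamma density with shape α and mean \bar\lambda, where the bar is introduced here only to distinguish the mean from the integration variable.

```latex
% Assumption: Gamma density with shape \alpha and mean \bar\lambda,
% so that its variance is \sigma^2 = \bar\lambda^2 / \alpha.
g(\lambda) = \frac{(\alpha/\bar\lambda)^{\alpha}}{\Gamma(\alpha)}
             \,\lambda^{\alpha-1} e^{-\alpha\lambda/\bar\lambda}

Y = \int_0^{\infty} e^{-\lambda} g(\lambda)\, d\lambda
  = \frac{(\alpha/\bar\lambda)^{\alpha}}{\Gamma(\alpha)}
    \int_0^{\infty} \lambda^{\alpha-1}
    e^{-(1+\alpha/\bar\lambda)\lambda}\, d\lambda
  = \frac{(\alpha/\bar\lambda)^{\alpha}}{\Gamma(\alpha)}
    \cdot \frac{\Gamma(\alpha)}{(1+\alpha/\bar\lambda)^{\alpha}}
  = \left(1 + \frac{\bar\lambda}{\alpha}\right)^{-\alpha}
```

The variance relation in the first comment is what fixes the constant defined in 4.6.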
Clustering is believed to be caused by the aggrega-
tion of particles that have collected in the manufacturing 
equipment. When shaken loose by vibrations, pressure 
changes, etc., these clumps of particles form clouds in 
the fluids used for processing the integrated circuits. 
Where such clouds reach the wafer surface, particles are 
clustered. Even when contaminating particles are uni-
formly distributed in the fluids, they are electrostati-
cally attracted to the nearest edge of the wafer. This 
leads to edge clustering, a phenomenon in IC fabrication 
that has been widely observed [185,186,170,187]. 
A comparison of yield models is shown in Figure 4.10. 
Figure 4.10. Comparison of yield models: yield (10% to 100%)
against chip area (mm^2); one curve is labelled Poisson
statistics.
It is interesting to note that the expressions for yield
given in Figure 4.10 can be linked to each other by the
value of the clustering parameter α. Low values of α are
used to model severe clustering. When α = 1 the yield model
in 4.7 takes the form Y = (1 + \lambda)^{-1}, which is the same,
mutatis mutandis, as the Bose-Einstein model reported by
Price [169], and Mangir et al [153]. When α approaches
infinity, 4.7 approaches e^{-\lambda}, which is the same as the
simple Poisson model in 4.2. In this case there is no
clustering, i.e. a uniform distribution.
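The two limiting cases can be checked numerically. The following is a minimal sketch of expressions 4.2 and 4.7; the function names and the sample value of one fault per chip are illustrative assumptions, not taken from the thesis.

```python
import math


def poisson_yield(lam):
    """Simple Poisson yield (expression 4.2) for an average of lam faults per chip."""
    return math.exp(-lam)


def negative_binomial_yield(lam, alpha):
    """Generalised Negative Binomial yield (expression 4.7).

    alpha is the clustering parameter; small alpha models severe clustering.
    """
    return (1.0 + lam / alpha) ** (-alpha)


if __name__ == "__main__":
    lam = 1.0  # illustrative value: one fault per chip on average
    for alpha in (0.5, 1.0, 10.0, 1000.0):
        print(alpha, negative_binomial_yield(lam, alpha))
    print("Poisson limit:", poisson_yield(lam))
```

For a fixed number of faults per chip, the printed yield decreases towards the Poisson value as the clustering parameter grows.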
4.5.4. General Yield Model for VLSI Chips with No Redundancy
The yield model described in the previous section can 
be considered a general model for random yield. To complete
the model, gross yield, which was discussed in
Section 4.5.2, must be taken into account. Gross yields
may be incorporated into the general model simply as a yield
multiplier,
Y = Y_G (1 + \lambda/\alpha)^{-\alpha}     4.8
where Y_G is the average yield due to gross defects, listed
in Section 4.5.2.
In practice, it is a difficult task to obtain con-
sistent data for even these key features of a yield model. 
This is due to several important causes: 
a) 	The information is proprietary and is rarely
disclosed.
b) 	State of the art processes often change more quickly
than the data can be compiled.
c) 	A yield model that has been derived for one
particular process will often not apply to another process,
even of the same type.
d) 	A yield model that has been derived for one
particular circuit will often not apply to a new circuit,
because of the dependency on circuit complexity and
interconnect [153].
Therefore, the general yield expression is often simplified
by assuming the values α → ∞ and Y_G = 1, to produce
the Poisson yield expression,
Y = e^{-\lambda}     4.9
which assumes a uniform distribution of faults. This
expression tends to produce a lower yield estimate than is
observed in practice, so it can be considered as a lower
bound. The upper bound can be expressed by the Bose-Einstein
model, which is obtained by setting α = 1. This
results in the expression for yield,
Y = (1 + \lambda)^{-1}     4.10
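As a rough illustration of how the two bounds bracket a yield estimate, the sketch below evaluates 4.9 and 4.10 for a given chip area and defect density, with an optional gross yield factor. The function name, argument names, and the example figures are assumptions made for illustration only.

```python
import math


def yield_bounds(area_cm2, defect_density, gross_yield=1.0):
    """Return (lower, upper) yield estimates for a chip without redundancy.

    lower: Poisson model, no clustering (expression 4.9).
    upper: Bose-Einstein model, clustering parameter equal to 1 (expression 4.10).
    gross_yield scales both bounds; 1.0 reproduces the simplified expressions.
    """
    lam = area_cm2 * defect_density  # average number of faults per chip
    lower = gross_yield * math.exp(-lam)
    upper = gross_yield / (1.0 + lam)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical chip: 0.2 cm^2 of defect susceptible area, 5 defects/cm^2.
    low, high = yield_bounds(0.2, 5.0)
    print(f"yield between {low:.3f} and {high:.3f}")
```

The gap between the two values widens as the average number of faults per chip increases, which is why the choice of model matters most for large chips.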
4.5.5. Yield Model for VLSI Chips with Redundancy 
In the previous sections it has been stated that 
integrated circuit yield is reduced by gross defects, and 
by faults caused by random defects in the materials and 
photolithography. In a yield enhancement scheme, where 
faulty circuit stages are replaced by redundant stages, or 
in a scheme where faulty stages are simply bypassed to 
leave a partially functioning chip, it has been observed 
that the defect susceptible portion of the chip is divisi-
ble into two areas [177]. The first area is where random 
defects can cause failures in the circuit stages or 
modules (the words stage and module are synonymous in this 
context). Defects in this area are correctable by replac-
ing the faulty module. The remaining defect susceptible 
portion of the chip is uncorrectable, and any defects 
occurring in this area cause the chip to fail. Uncorrect-
able circuitry includes redundancy switching circuits, 
chip test status latches, clock lines and interconnect, 
input/output buffers etc. The net yield Y_E after the
enhancement scheme has been implemented is, therefore,
the product of the gross defect yield Y_G, the correctable
random defect yield Y_CRD, and the uncorrectable random
defect yield Y_UNC, that is [188],
Y_E = Y_G Y_{UNC} Y_{CRD}     4.11
With the aid of a block diagram of the integrated 
circuit, shown in Figure 4.11, expressions for the 
correctable and uncorrectable yields can be obtained. 
Figure 4.11. Block diagram of Eu349, showing correctable
and uncorrectable area in the yield enhancement scheme.
(Legend: uncorrectable bypass circuitry; uncorrectable
peripheral circuitry.)
Figure 4.11 shows an array of correlation stages 
surrounded on three sides by pad drivers and buffer circu-
itry, some miscellaneous logic, and power and clock lines. 
This area, shown hatched, is uncorrectable. In addition, 
the area shown cross hatched contains the multiplexer control
register; this too is uncorrectable. The yield of
the hatched and cross-hatched region is denoted Y_UNC and
is expressed by
Y_{UNC} = e^{-D A_{UNC}}     4.12
where A_UNC is the uncorrectable area, and D is the defect
density.
The module yield Y_m can be calculated by any of the
yield models discussed in Section 4.5.2. For simplicity,
a Poisson defect distribution will be assumed here. Thus,
Y_m = e^{-D A_m}     4.13
where A_m is the module area. This expression can now be
used to derive an expression for the correctable random
defect yield Y_CRD.
The correctable yield of a one dimensional array of
identical modules, each having the probability Y_m of
working, is determined using binomial statistics as follows.
If there are no spare (redundant) modules, then the
yield of the array is simply
Y_{CRD} = Y_m^N     4.14
where N is the required number of modules in the array.
If there is one spare module, the yield becomes
Y_{CRD} = Y_m^{N+1} + (N+1) Y_m^N (1 - Y_m)     4.15
Here the first term represents the probability that all
N+1 modules are functioning, and the second term
represents the N+1 possible combinations of N working
modules and one defective module.
Extending this approach to the case where there are N
required modules and S spare modules in the array, then
the probability of having at least N working modules from
an array of N+S=M modules is
Y_{CRD} = \sum_{j=0}^{M-N} \binom{M}{j} Y_m^{M-j} (1 - Y_m)^j     4.16
where \binom{M}{j} represents the number of possible
combinations of M modules taken j at a time.
Finally, by substituting 4.16 into 4.11, the expression
for the enhanced yield can be written as
Y_E = Y_G Y_{UNC} \sum_{j=0}^{M-N} \binom{M}{j} Y_m^{M-j} (1 - Y_m)^j     4.17
where Y_m = e^{-D A_m} and Y_{UNC} = e^{-D A_{UNC}}. This expression has
been evaluated for a range of parameter values, and the
results, enhanced yield versus redundancy, are plotted in
Figures 4.12(a), 4.12(b), and 4.12(c). In each figure,
the total number of modules M remains constant, and the
gross yield has been normalised to unity. The following
observations can be made:
a) 	The yield saturates after a certain amount of redun- 
dancy. This occurs when the yield of the correctable 
areas approaches unity, and increasing redundancy 
ceases to have a significant effect on overall yield. 
b) 	The increase in yield is greatest for chips with the 
highest defect density. Therefore redundancy is most 
effective in low yielding processes. 
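The enhanced yield expression can be evaluated directly. The sketch below is one possible way to do so, assuming Poisson module and uncorrectable-area yields as in the text; the function names are hypothetical, and the parameter values in the usage loop are those printed on the figure labels (M = 28, A_m = 0.007 cm^2, D = 10 defects/cm^2) together with an assumed uncorrectable area of 0.1 cm^2.

```python
import math


def module_yield(defect_density, module_area):
    """Poisson yield of a single module, exp(-D * A_m)."""
    return math.exp(-defect_density * module_area)


def correctable_yield(y_m, required, spares):
    """Probability that at least `required` of `required + spares` modules work."""
    total = required + spares
    return sum(
        math.comb(total, j) * y_m ** (total - j) * (1.0 - y_m) ** j
        for j in range(spares + 1)  # j counts defective modules, at most `spares`
    )


def enhanced_yield(defect_density, module_area, unc_area, required, spares,
                   gross_yield=1.0):
    """Net yield: gross yield x uncorrectable-area yield x correctable yield."""
    y_unc = math.exp(-defect_density * unc_area)
    y_m = module_yield(defect_density, module_area)
    return gross_yield * y_unc * correctable_yield(y_m, required, spares)


if __name__ == "__main__":
    total_modules = 28  # total array size held constant, as in the figures
    for spares in range(9):
        y = enhanced_yield(10.0, 0.007, 0.1, total_modules - spares, spares)
        print(spares, round(y, 3))
```

With the total held constant, each extra spare adds one more tolerated failure pattern, so the printed yield rises and then levels off at the yield of the uncorrectable area.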












Figure 4.12(a). Yield vs. Redundancy (A_m = 0.007 cm^2;
horizontal axis: number of redundant modules R, 0 to 8).
Figure 4.12(b). Yield vs. Redundancy for various values
of uncorrectable area (Y_G = 1, M = 28, A_m = 0.007 cm^2,
D = 10 defects/cm^2; horizontal axis: number of redundant
modules R, 0 to 8).
Figure 4.12(c). Yield vs. Redundancy for various values
of module area (M = 28, A_UNC = 0.1 cm^2; horizontal axis:
number of redundant modules R, 0 to 8).
4.5.6. Cost of Redundancy 
The increase in net yield is not obtained without 
cost. 	The redundant circuits require extra chip area. 
Therefore, there are fewer chips per wafer. 	Also the 
redundancy scheme requires switching mechanisms, or addi-
tional circuitry which is uncorrectable and thus detracts 
from the yield. The compromise between the increase in 
yield due to the action of a redundancy scheme, and the 
decrease in yield due to the implementation of such a 
scheme is discussed by many authors on yield enhancement 
[177,153,176]. 
The effective yield of a yield enhancement scheme is
found using the enhanced yield Y_E and the proportional
increase in chip area that is required to implement the
scheme. The penalty due to the increased area is
expressed by the ratio of the chip area without redundancy
A_0 to the area with redundancy A_E. The effective yield is
defined as the product of the enhanced yield and the area
penalty term,
Y_{eff} = Y_E (A_0/A_E)     4.18
For cost considerations a figure of merit FM may be 
defined. This takes into account the relative yield 
improvement and the relative increase in area required. 
The figure of merit is defined as 
FM = (Y_E/Y_0)(A_0/A_E)     4.19
where Y_0 is the yield without redundancy. If FM > 1 then
a cost advantage is attained by the use of redundancy. 
Figure 4.13 shows how the figure of merit varies with 
redundancy. The relationship indicates that a circuit can 
be designed around an optimum amount of redundancy. 
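A simple search over the number of spares illustrates the optimum. The sketch below rests on several assumptions that are not taken from the thesis: the number of required modules is held fixed, the chip area is modelled as the uncorrectable area plus the total module area, and the uncorrectable area is the same with and without redundancy (so the gross and uncorrectable yields cancel in the ratio). All names and example values are hypothetical.

```python
import math


def correctable_yield(y_m, required, spares):
    """Probability that at least `required` of `required + spares` modules work."""
    total = required + spares
    return sum(
        math.comb(total, j) * y_m ** (total - j) * (1.0 - y_m) ** j
        for j in range(spares + 1)
    )


def figure_of_merit(defect_density, module_area, unc_area, required, spares):
    """FM = (Y_E / Y_0) * (A_0 / A_E) under the area model described above."""
    y_m = math.exp(-defect_density * module_area)
    yield_gain = correctable_yield(y_m, required, spares) / y_m ** required
    area_0 = unc_area + required * module_area             # no redundancy
    area_e = unc_area + (required + spares) * module_area  # with spares
    return yield_gain * area_0 / area_e


def best_redundancy(defect_density, module_area, unc_area, required, max_spares):
    """Number of spares in 0..max_spares that maximises the figure of merit."""
    return max(
        range(max_spares + 1),
        key=lambda s: figure_of_merit(defect_density, module_area, unc_area,
                                      required, s),
    )


if __name__ == "__main__":
    s = best_redundancy(10.0, 0.007, 0.1, 24, 12)
    print(s, figure_of_merit(10.0, 0.007, 0.1, 24, s))
```

A real design would also charge the switching circuitry to the area with redundancy, which lowers the figure of merit and tends to move the optimum towards fewer spares.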
Figure 4.13. Figure of Merit vs. Redundancy for various
values of defect density (Y_G = 1, M = 28, A_m = 0.007 cm^2;
horizontal axis: number of redundant modules R, 0 to 8).
4.6. Yield Enhancement Techniques 
4.6.1. Scope 
This section deals with the concept of yield enhance-
ment. Attention is focused, however, on yield enhancement 
techniques which incorporate redundant circuits and 
switching mechanisms, so that faulty circuit elements may 
be replaced by redundant ones after an initial test 
period. The discussion, therefore, does not include "on-
line" self checking circuits [189]. 
4.6.2. Integrated Circuit Redundancy Schemes 
There are many techniques for implementing redundancy 
schemes. These range from non-volatile "once only" confi-
gurations, which normally are carried out by the manufac-
turer, to the volatile schemes which may be configured as 
often as necessary by the host system. For example, dis-
cretionary interconnect layers provide a method of repair 
in non-volatile schemes, as do fusible links, and laser 
personalisation. Electrically programmable storage elements,
and programmable links, are used to configure volatile
redundancy schemes. Furthermore, latches and other
electrically alterable configurations can be reprogrammed
in the field if necessary. Thus, redundancy included on
chip for yield improvement purposes can also be used for
field maintenance and to improve reliability.
The design effort in VLSI is minimised by using regular
repeated architectures. Also, chips with a large
number of identical cells are the most obvious candidates
for yield enhancement. Memory chips certainly have such 
an architecture and were among the first to benefit from 
redundancy techniques. 	They are particularly suitable 
since there is no interaction between the cells. 	As the 
interconnection complexity increases, either an increasing
amount of circuitry must be dedicated to routing around
faulty cells, or a less flexible use of the spares has to be
accepted. In 1967 Tammaru and Angell [178] proposed the 
concept of treating groups of interconnected elements, 
rather than single gates, as the smallest units to be 
tested and replaced with spares. In this manner the com-
plexity and cost of reconfiguration can be reduced. 
Architectures for yield enhancement which are of interest 
here, consist of an array of identical cells. There are 
several reconfiguration schemes. These schemes fall into 
three categories: bypass schemes, nearest neighbour, and 
simple chaining schemes. 
4.6.2.1. Bypass schemes 
Bypass schemes use a fixed sequence of cells but 
extra data paths are available so that one or more faulty 
cells may be bypassed, depending on which scheme is 
employed. These schemes should not be confused with 
chaining schemes which are described in Section 4.6.2.3. 
In the bypass scheme, the switching mechanism is part of 
the cell. Therefore, defective switching mechanisms can
be bypassed. In the chain scheme, the switching network 
is regarded as separate from the array cells and accord-
ingly must be defect free. 
Examples of some bypass schemes are shown in Figure 
4.14. These schemes have been compared by Moore and Day 
[187]. 



















Figure 4.14. Bypass schemes, including double castellation.
They were selected because they contain no crossovers and 
can be mapped compactly into silicon. An example of the 
silicon layout for the 1,3 zig-zag scheme is shown in Fig-
ure 4.15. 
Figure 4.15. Layout of the 1,3 zig-zag scheme. 
4.6.2.2. Nearest neighbour schemes 
The nearest neighbour concept for yield enhancement 
is best suited to two dimensional arrays of regular cells 
as shown in Figure 4.16. 
Figure 4.16. Nearest neighbour scheme. 
The scheme depends on each cell being able to take any one 
of its neighbours as its successor. The resulting path 
therefore, is not fixed as in the bypass and chain 
schemes, but may snake around in any desired pattern 
[190,191]. 
4.6.2.3. Chaining schemes 
This category contains the simplest of all yield 
enhancement architectures. A chaining scheme consists of 
a fixed array of cells which can either be connected into
a chain or not. The main advantage of this scheme is its
simplicity, in both implementation and configuration. The 
concept can easily be extended to 2 dimensional arrays and 
to architectures with interconnectivity which would be too 
complex for the alternative schemes [192]. The main draw-
back of this scheme is that, unlike the others, the 
switching network must be defect free, because it is 
uncorrectable. The chain scheme, for a linear array is 
shown in Figure 4.17. 
Figure 4.17. Chain scheme for yield enhancement. 
4.6.3. Comparison of Redundancy Schemes 
The differences between the above yield enhancement 
schemes can be described in terms of the amount by which 
they improve yield, and their implementation costs. The 
nearest neighbour schemes will not be considered because 
they are not suitable for linear array architectures, such 
as that of the Eu349 correlator chip. This leaves the 
bypass schemes and the chain schemes. 
The degree of yield enhancement that may be obtained 
using a bypass scheme, is determined, to a large extent, 
by the complexity of the scheme, and by the routing algo-
rithm. In one of the simplest algorithms, a faulty cell 
would enable the bypass route which connects its natural 
predecessor to its natural successor. This algorithm, 
however, only works for single faults, since two or more 
consecutive faulty cells result in total chip failure. 
This is illustrated by bypass 'a' in Figure 4.18. In a 
more sophisticated scheme, one that can implement a more 
complex routing algorithm, a faulty cell would enable 
bypass 'b' in Figure 4.18. 
Figure 4.18. Bypass scheme with two consecutive faults. 
Thus, an increase in fault tolerance is achieved by pro-
viding more bypass routes. A study by Moore and Day [187] 
has shown that yields are improved by using more complex 
zig-zag schemes, but for the same yield improvement the
castellation schemes have consistently higher cost over-
heads, and therefore may be rejected. 
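The effect of the number of bypass routes on fault tolerance can be illustrated with a simplified model. In this sketch a bypass scheme is characterised only by the longest run of consecutive faulty cells that its routes can jump over (1 for bypass 'a', 2 for bypass 'b'); faulty switches and the terminating cells are ignored. The function names and this abstraction are assumptions made for illustration, not the routing algorithm studied by Moore and Day.

```python
def longest_fault_run(faulty):
    """Length of the longest run of consecutive faulty cells.

    faulty is a sequence of booleans, True meaning the cell is faulty.
    """
    longest = 0
    run = 0
    for is_faulty in faulty:
        run = run + 1 if is_faulty else 0
        longest = max(longest, run)
    return longest


def bypass_can_repair(faulty, max_skip):
    """True if every run of faulty cells is short enough to be bypassed.

    max_skip is the number of consecutive faulty cells one route can skip.
    """
    return longest_fault_run(faulty) <= max_skip


if __name__ == "__main__":
    isolated = [False, True, False, True, False]
    adjacent = [False, True, True, False]
    print(bypass_can_repair(isolated, max_skip=1))  # single faults only
    print(bypass_can_repair(adjacent, max_skip=1))  # two consecutive faults
    print(bypass_can_repair(adjacent, max_skip=2))  # extra bypass route
```

In this model the simplest scheme repairs any number of isolated faults but fails on the first adjacent pair, which is the case shown in Figure 4.18.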
Bypass schemes have an advantage that the switching 
circuitry is an integral part of the array cells. Thus, 
defective switches can be tolerated by the scheme. A 
disadvantage is that special terminating cells are 
required at the ends of the array. These cells select the 
desired start and finish paths, and connect them to the 
input and output pads respectively, as shown in Figure 
4.19. 
Figure 4.19. Terminating a bypass architecture, showing the
starting cell and the terminating cell.
By far the simplest yield enhancement technique is 
the chain scheme. 	In this case a faulty cell merely 
enables its own bypass. It has an advantage that no spe-
cial terminating cells are required. Also, it can 
tolerate consecutive faulty cells, although it is possible 
that signals may be required to go through several bypass 
switches before reaching the next working cell. The addi-
tional delays through these switches may reduce the max-
imum attainable clock rate. This aspect of a chain scheme 
may be viewed as a restriction in the number of consecu-
tive faults that may be tolerated, if the circuit is to 
operate up to a specified maximum clock rate. The only 
serious disadvantage associated with the chain scheme is 
that the switching logic is critical and must work. Thus, 
the yield of the chip is given, approximately, by the yield
of the critical, uncorrectable areas of the chip.
Due to its simplicity, the chain scheme can easily be 
extended to provide yield enhancement in arrays that have 
more than single inter-cell connections, and to two
dimensional arrays. The two dimensional array digital
correlator, reported by McCanny and McWhirter [192], incorporates
this type of yield enhancement technique. 
4.7. Yield Enhancement Features in the Eu349 Correlator 
Chip 
This section contains a summary of the yield enhance-
ment features that have been used in the design of the 
Eu349 correlator chip. Details concerning the design can 
be found in Chapter 5. 
In the Eu349 digital correlator, there are two inter-
connections per cell which require switching mechanisms. 
Therefore, in view of the expandability, and low cost, the 
chain scheme has been chosen to provide yield enhancement 
in the correlator array. The disadvantage that is associ-
ated with the chain scheme, in that the switching network 
is uncorrectable, is considered to be outweighed by the 
advantages listed above, since the area of the uncorrect-
able switching network is less than 10% of the area of the 
correctable correlator array. 
The yield enhancement scheme in the Eu349 correlator 
consists of, for each correlator stage, two multiplexers 
(one per inter-cell connection), and one controlling 
latch. 	The fault status of each correlator stage is 
stored in its associated latch. 	The stored information 
then controls the multiplexers; a faulty stage causes the 
relevant inputs to be switched to the respective outputs, 
thus isolating the faulty stage. A block diagram of the
yield enhancement features of the Eu349 chip is shown in
Figure 4.20. Details of the design are given in Chapter
5. 
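The action of the latch-controlled multiplexers on one serial path can be sketched behaviourally. This is a simplified model under stated assumptions, not the chip's circuit: each stage is reduced to a one-bit register followed by a 2:1 multiplexer, a set latch selects the stage input instead of the register output, and only one of the two switched connections is modelled. The function name and data layout are hypothetical.

```python
def clock_chain(state, faulty, data_in):
    """Advance a bypassable shift register chain by one clock.

    state[i]  : bit currently held in stage i's register
    faulty[i] : True if stage i's latch marks it as faulty (bypassed)
    Returns (new_state, serial_output).
    """
    new_state = list(state)
    carried = data_in  # signal arriving at the input of the current stage
    for i, bad in enumerate(faulty):
        new_state[i] = carried  # every register still clocks in its input
        if not bad:
            carried = state[i]  # multiplexer passes the register output on
        # faulty stage: multiplexer passes the stage input straight through
    return new_state, carried


if __name__ == "__main__":
    faulty = [False, True, False, False]  # stage 1 failed self test
    state = [0, 0, 0, 0]
    outputs = []
    for bit in [1, 0, 0, 0, 0, 0]:  # a single 1 followed by zeros
        state, out = clock_chain(state, faulty, bit)
        outputs.append(out)
    print(outputs)
```

The delay through the chain equals the number of working stages, so a bypassed stage shortens the chain by one stage without breaking it.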
Figure 4.20. Yield enhancement features of the Eu349
correlator chip. (Labelled signals and blocks: x, y, DSR,
MCR, OSR, Overload Flag, F-Test, Preset, Integrating
Counters.)
4.8. Summary 
Two important subjects in integrated circuit design, 
namely testability and yield, have been discussed. 
Methods by which the testability of a design may be 
enhanced have been described and summarised for the par-
ticular case of the Eu349 correlator chip. Close linking 
of design and test has enabled the architecture of the 
Eu349 to achieve a high degree of testability at a very 
low overhead. 
Yield enhancement through the use of redundant circu-
itry is of central importance to the design of the Eu349 
correlator chip. Yield models and yield enhancement 
techniques have been described. A binomial model for the
yield of redundantly designed circuits has been presented, 
and the curves produced by the model show that the yield 
of an array of identical modules, such as in the Eu349 
correlator, increases rapidly with the addition of spare 
modules, but saturates to a level determined by the defect 
density and the uncorrectable area of the chip. 
In the next chapter, details of the design, and test 
results of the correlator chip are presented. 
CHAPTER 5 
DESIGN AND TEST OF THE PROTOTYPE INTEGRATED CIRCUIT. 
5.1. Introduction 
In this chapter, details of the prototype chip design 
are presented. A "top-down" approach is adopted, and in 
Section 5.2, the description begins with reference to a 
"floor plan" of the basic correlator system. This floor 
plan represents the architecture of the chip before 
built-in self test and repair features are added. 
In Section 5.3, the architecture is shown modified, 
to allow built-in self test and repair, and the circuitry 
required to perform self test and self repair is dis-
cussed. In Section 5.4, the integrated circuit design is 
described. 
In Section 5.5, the test strategy is described. This 
section refers to test programs and configuration pro-
cedures for the Tektronix Digital Analysis System (DAS 
9100). In Section 5.6, the test results for the batch of 
130 chips are presented. The results show the effective-
ness of the test strategy and yield enhancement scheme. 
5.2. Architecture of the Basic Polarity Correlator 
The theory of polarity correlation using the over-
loading integrating counter technique is presented in Sec-
tion 2.6. Figure 5.1 shows the architecture of a correla-
tor that implements the technique. The VLSI architecture 
offers high speed operation, long (programmable) integra-
tion time, and an arbitrary range of correlation time 
delay or resolution. It consists of a data shift register,
or DSR, connected to a parallel array of coincidence
detectors and integrating counters. The counters each 
have a single bit output which indicates the overload con-
dition of the counter. This overload output is latched, 
to be transferred subsequently, to the overload shift 
register, or OSR. The pattern held in the OSR may then be 
shifted serially off chip to display the correlation
function. An additional output from the chip is the overload
flag.
Figure 5.1. Architecture of the basic polarity correlator
using the overloading integrating counter technique.
(Key: OSR = overload shift register; D = delay element;
C = coincidence function; L = one bit latch. Labelled
signals: x, y, samples m, preset counter capacity.)
The control circuitry required for this architecture 
consists of a sample counter that has twice the capacity 
of the integrating counters, and additional circuitry to 
monitor the overload flag. Two modes of operation are 
necessary: peak detection, and function display. 
In peak detection mode, the objective is to locate 
the first integrating counter to overload. It operates as 
follows: 
Correlation commences with a reset pulse which clears 
the overload latches and presets the integrating counters 
to their start value. This sets their capacity to N. 
After at least N input sample pairs, the overload flag 
signals the arrival of the first overload, which 
represents the most significant peak of the correlation 
function. The contents of the sample counter m are used 
to compute the significance, or the ordinate, of the 
detected correlation peak, using Equation 2.38. The time 
lag, or the abscissa of correlation peak, is then calcu-
lated by transferring the contents of the latches to the 
overload shift register, and counting the leading zeros in 
the pattern as it is shifted out. The system is then 
reset and correlation begins once more. Successive over-
load patterns may be viewed as a pulse train whose fre-
quency is inversely proportional to the time lag at peak 
correlation. In other words, the frequency is propor-
tional to the flow velocity (in a correlation based flow 
meter for example). 
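Peak detection mode can be sketched in software as follows. This is a behavioural illustration only, under several assumptions not stated in the text: stage k compares y with x delayed by k samples, a coincidence is equality of the two bits, a counter overloads when its count reaches the preset capacity, and ties are resolved in favour of the smallest lag. The function name is hypothetical, and the conversion of m to a correlation significance (Equation 2.38) is not reproduced.

```python
def peak_detect(x, y, n_stages, capacity):
    """Return (lag, m) for the first integrating counter to overload.

    x, y     : equal-length sequences of polarity bits (0 or 1)
    n_stages : number of correlation stages (delays 0 .. n_stages - 1)
    capacity : count at which an integrating counter overloads
    lag is the number of leading zeros in the overload pattern and m is the
    sample count at that moment. Returns None if nothing overloads.
    """
    dsr = [0] * n_stages     # data shift register, cleared at reset
    counts = [0] * n_stages  # integrating counters, preset at reset
    for m, (x_bit, y_bit) in enumerate(zip(x, y), start=1):
        dsr = [x_bit] + dsr[:-1]        # stage k now holds x delayed by k
        for k in range(n_stages):
            if dsr[k] == y_bit:         # coincidence of the two polarities
                counts[k] += 1
        pattern = [1 if c >= capacity else 0 for c in counts]
        if any(pattern):
            return pattern.index(1), m  # leading zeros give the time lag
    return None


if __name__ == "__main__":
    period = [1, 1, 1, 0, 1, 0, 0]
    x = [period[t % 7] for t in range(100)]
    y = [0, 0, 0] + x[:-3]              # y is x delayed by three samples
    print(peak_detect(x, y, n_stages=8, capacity=20))
```

Because y is an exact delayed copy of x in this usage, the stage with a delay of three samples counts on every clock and overloads as soon as the sample count reaches the capacity.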
Display mode operates similarly to peak detection 
mode except that the system is not reset after the 
occurrence of the first overload. Instead, at suitable 
intervals of correlation significance (that is, at regular 
intervals along the vertical axis of the correlation func-
tion), the contents of the OSR are shifted out and 
displayed, as discussed in Section 3.2.2.2 and in Section 
3.3.3. 
The initial design criteria for the basic correlator 
are as follows: 
Cascadability: the number of correlation points, or 
the resolution of the time delay axis, is determined by 
the number of cascaded correlation stages. In the proto-
type design, there is no limit to the number of stages 
that may be cascaded. 
Long, programmable integration time: in correlation 
applications where the time-bandwidth product (TB) is low, 
as discussed in Chapter 2, long integration times are 
required. The integrating counters in the Eu349 prototype 
chip have a capacity of approximately 2^15 states. To
maintain flexibility, and allow the correlator to address 
both high and low TB applications, the integrating 
counters must be programmable. 
Design style, static or dynamic: there are two impor-
tant considerations here. First, the integrating counters 
are required to count at a rate determined by the number 
of coincidences in the data. In other words, the count 
rate is proportional to the correlation of the input bit 
streams, and therefore may vary from zero to the sample 
rate. Consequently the design of the integrating counters 
must be static, regardless of the factors that determine 
the sampling rate. Second, the shift registers, OSR and
DSR, may be static or dynamic depending on the sampling
rate requirements. However, to allow flexibility in
choice of sampling rate, these shift registers must be
static. Using static registers for the OSR and DSR functions
incurs a small area penalty of approximately 2%.
Note that the architecture in Figure 5.1 differs from 
that in Figure 2.14, in that the output of the integrating 
counter comes from the left of the counter rather than the 
right. This detail allows the DSR, the coincidence gates,
the OSR, and associated latches to be laid out close to 
each other on the silicon. The significance of this facet 
of the architecture will emerge in the next section, when 
the modifications to the basic correlator architecture to 
accommodate the self test and self repair philosophy are 
described. 
5.3. Architecture of the Correlator with Built-In Self 
Test and Self Repair Features 
The VLSI architecture considered here, consists of a 
long series connection of identical correlation stages. 
If any stage suffers faults during manufacture, or becomes 
faulty during service, the whole chip will fail. A self 
test and self repair strategy has been devised to overcome 
this problem. The self test sequence is started each time 
the chip is switched on; any faulty stages discovered as a 
result of the test are automatically bypassed. This 
reconfigures the working stages into a continuous serial 
connection. Thus, faults that develop during the working 
life of the chip are automatically eliminated every time 
the chip is switched on. Modifications to the basic 
correlator architecture, to accommodate the self test and 
self repair philosophy, are shown in Figure 5.2. Figure 
5.2(a) shows the basic correlator stage; Figure 5.2(b) 
shows the basic stage modified to perform built-in self-
test and self-repair. A block diagram showing 8 stages of 
the array is presented in Figure 5.3. The architecture is 
well structured and thus maps easily on to silicon.
Figure 5.2(a). Basic correlator stage; (b). Correlator
stage with built-in self-test and self-repair circuitry.
KEY: D = Delay; C = Coincidence gate; L = Latch;
M = Multiplexer control; X = Multiplexer.
Figure 5.3. Block diagram of the correlator array, showing
8 of the repeated stages.
The principal additions for self-test are the input 
signal "F-test" (for function test), and its associated
anticoincidence detector (EXOR) at the "set" input of each 
latch. Also, there is a parallel set and reset facility 
in the data shift register. All other circuitry required 
by the test strategy already exists as part of the basic 
correlator. The principal additions for self repair are 
the multiplexer control register, or MCR, and 2:1 multi-
plexers on the data shift register and overload shift 
register outputs. 
In test mode, the DSR, MCR, and OSR shift registers 
act as scan paths, and signature analysis is performed by 
the integrating counters. The result of the signature 
analysis is compared with the known good signature using 
the F-test input. The results are latched for subsequent 
use in the self repair scheme. (The test sequence is 
described in detail in Section 5.5.) Testability is 
achieved using functional conversion to such an extent, 
that the silicon area overhead is only 2%. This is illus-
trated by Figure 5.4, which shows the floor plan of one 
correlation stage in the Eu349 chip. 
Figure 5.4. Floor plan of one correlation stage in the
Eu349 chip, illustrating the relative areas of the
components in the design. (Blocks shown: clocks, F-Test, GND,
VDD, MCR, multiplexers, exnor of PRBS counter, overload
latch, OSR, 15 bit integrating counter, DSR, exnor and
clock drivers, set and preset DSR. Shading marks the area
overhead for built-in self test and for built-in self
repair.)
The self-repair technique requires that the data 
shift register, and the overload pattern shift register, 
each have a 2:1 multiplexer connected to their outputs. 
The technique requires a multiplexer control register for 
storing the control information for these multiplexers. 
The multiplexer control register is the key feature in the 
self repair scheme. After the self-test sequence, the MCR 
contains the pass/fail status for each stage. In the case 
of a failure, the input and output registers of the corre-
lator stage are bypassed via the multiplexers, so that the 
malfunctioning stage is short-circuited. The number of 
functioning stages on the chip can be read out serially 
from the MCR by reconfiguring it as a shift register. 
This parameter represents the maximum attainable correla-
tion delay (or resolution) and can be used for chip 
reject/accept decisions in production test. The self-test 
and repair sequence may be repeated as required during the 
service life of the chip. 
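The accept/reject decision described above amounts to counting the pass entries in the pattern read out of the MCR. A minimal sketch follows, assuming that a 1 in the pattern marks a failed stage (the actual polarity is not stated here) and using hypothetical function names and an arbitrary acceptance threshold.

```python
def count_working_stages(mcr_bits, fail_level=1):
    """Number of functioning stages in a pattern shifted out of the MCR.

    fail_level is the bit value taken to mean "stage failed"; this polarity
    is an assumption made for the sketch.
    """
    return sum(1 for bit in mcr_bits if bit != fail_level)


def accept_chip(mcr_bits, min_stages, fail_level=1):
    """Accept the chip if enough stages survive self test and repair."""
    return count_working_stages(mcr_bits, fail_level) >= min_stages


if __name__ == "__main__":
    pattern = [0, 1, 0, 0, 1]  # hypothetical five-stage read-out
    print(count_working_stages(pattern), accept_chip(pattern, min_stages=3))
```

The count is also the maximum correlation delay that the repaired chip can contribute when several chips are cascaded.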
The layout of the MCR and its associated multiplexers
is simplified by manipulating the architecture of
the correlator stage so that the DSR and OSR are laid out
close to each other topographically, as discussed in the 
previous section. As a result, the overhead for self-
repair is not greater than 6% additional silicon area. 
The self repair features are shown cross-hatched in Figure 
5.4. 
5.4. Design of the Eu349 Correlator 
5.4.1. System Overview 
A prototype digital correlator featuring self test 
and self repair has been fabricated on a six-micron N-
channel MOS process. The prototype design, shown in Fig-
ure 5.5, contains 28 parallel stages of correlation, each 
of which implements the block diagram in Figure 5.2(b).
Figure 5.6. Eu349 correlator chip floor plan, showing the
peripheral area (pads, buffers) and the test stripe.
The correlator design uses a two phase non-overlapping
clock system. The phases are denoted φ1 and
φ2, respectively, and in general, data is sampled on φ1,
and stored on φ2. The design is semistatic throughout.
This means that during one clock phase (in this case φ2)
the stored state can be maintained indefinitely. Thus,
the clock frequency, and therefore the sampling frequency
of the correlator can range from dc to 4 MHz (for this
fabrication process).
The prototype devices have been packaged in 40-pin
dual-in-line ceramic packages. The pin designations
are shown in Figure 5.7.












































Figure 5.7. Pin designation for packaged Eu349 correlator 
chips. 
A summary of the functional description of each pin is
given in Table 5.1.
TABLE 5.1
PIN-OUT FUNCTIONAL DESCRIPTION
            FUNCTION        PIN NUMBER
Inputs      x (DSR) i/p     4
            y i/p           3
            MCR i/p         17
            OSR i/p         18
            F-TEST          20
            I1 - I15        21-28, 34-40

            MCR hold        10
            MCR shift       11
            Reset           14
Outputs     OSR o/p         1
            MCR o/p         2
            x (DSR) o/p     13
            OVERLOAD        19

Supplies    VSS             29, 30, 32, 33
            VBB             31
Truth tables for the operation of the control signals to
the overload shift register, data shift register, and
multiplexer control register are listed in Tables 5.2 to 5.4
respectively.
TABLE 5.2
OSR TRUTH TABLE
OSR pl       Effect
L            Serial shift from input pin
H            Parallel load from latches
TABLE 5.3
DSR TRUTH TABLE
DSR pl       DSR s/c      Effect
L            L            Serial shift from input pin
L            H            Serial shift from input pin
H            L            Parallel load zeros (CLEAR)
H            H            Parallel load ones (SET)
TABLE 5.4
MCR TRUTH TABLE
MCR shift    MCR hold     Effect
L            L            Parallel load MCR from latches
L            H            Hold MCR contents stationary
H            L            Serial shift MCR
H            H            Serial shift MCR
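The truth tables can be restated as small lookup tables. The following Python sketch is illustrative only: the dictionary and function names are assumptions made for this example and are not part of the Eu349 design.

```python
# Control-signal truth tables (Tables 5.2 to 5.4) as lookup tables.
# Logic levels are written "L" and "H"; all names are illustrative.

# Key: OSR pl
OSR_MODES = {
    "L": "serial shift from input pin",
    "H": "parallel load from latches",
}

# Key: (DSR pl, DSR s/c)
DSR_MODES = {
    ("L", "L"): "serial shift from input pin",
    ("L", "H"): "serial shift from input pin",
    ("H", "L"): "parallel load zeros (CLEAR)",
    ("H", "H"): "parallel load ones (SET)",
}

# Key: (MCR shift, MCR hold)
MCR_MODES = {
    ("L", "L"): "parallel load MCR from latches",
    ("L", "H"): "hold MCR contents stationary",
    ("H", "L"): "serial shift MCR",
    ("H", "H"): "serial shift MCR",
}


def mcr_mode(shift, hold):
    """Return the MCR operation selected by the two control inputs."""
    return MCR_MODES[(shift, hold)]


def dsr_mode(pl, sc):
    """Return the DSR operation selected by the two control inputs."""
    return DSR_MODES[(pl, sc)]
```

Note that in both two-input tables one control dominates: MCR shift high always selects a serial shift, and DSR pl low always selects a serial shift.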
The chip design is divisible into two main areas: 
correlator array circuitry, and peripheral circuitry. 
Each area can be subdivided further into its component 
nMOS modules. The nMOS modules are listed in Appendix 1. 
5.4.2. Correlator Array Design 
The correlator array is composed of 28 identical
stages of correlation. This number is not fundamental to
the architecture; it is determined by the amount of
available silicon area. At the time of design, the maximum
available chip size measured 5.08 mm by 5.08 mm, of which
a border 0.5 mm wide is required for mandatory test
structures and peripheral circuitry. Thus an area of
approximately 4 mm by 4 mm is available for the layout of
the correlator array: enough silicon for 28 stages.
Each correlator stage, designated Module STG100,
implements the circuit shown in Figure 5.8. The floor
plan and layout of module STG100 are shown in Figure 5.9.




Figure 5.8. Circuit schematic of the repeated correlator
stage.




[Figure 5.9: floor plan labels include CLOCKS and EXNOR.]
Figure 5.9. Floor plan and nMOS layout of Module STG100.
This represents the repeated correlator stage.
The interstage connections are either bit-serial,
nearest-neighbour communications, or globally broadcast
control signals. This allows the stages to form a serial
cascade by simple abutment in the y direction. All
connections to and from the correlator array are made via the
peripheral circuitry. Connections to the outside world
are made through pads in the peripheral area, where input
protection and buffering take place. Some control
signals to the array are generated in the peripheral area,
and therefore the inputs to the chip, as shown in Figures
5.6 and 5.7, differ from the inputs to the correlator
array, as shown in Figure 5.8. The circuitry of the
peripheral area is summarised in the next section.
Referring to Figure 5.8, the data shift register,
DSR, consists of three inverters and five pass
transistors. The three pass transistors that form the input to
the shift register select the required input source. In
one instance the source is the x data input (the x data
output from the previous correlation stage), and in the
other cases the input sources are VDD and GND, so that the
register may be set or cleared respectively. These data
selectors only operate during φ1 (that is, while phase 1
of the clock is high). Thus, data is transferred from the
selected source to be stored, dynamically, on the gate of
inverter Inv1 under the control of φ1. Static storage is
implemented during φ2. Inverter Inv2 provides a feedback
loop that regenerates the stored state so that it may be
stored indefinitely (so long as φ2 remains high to enable
the feedback loop). The stored information is also
transferred to the register output under the control of
φ2. The same basic semistatic shift register circuitry
can be found in the DSR, OSR, MCR, and the shift register
stages that make up the integrating counter.
The integrating counter is a 15-element PRBS counter.
The feedback is the logical exclusive NOR of the 14th and
15th outputs. Thus the count length is 2^15 less one
forbidden state, in which all 15 registers contain logical ones.
All of the other possible combinations are legal, and one 
such combination may be used to indicate that the counter 
has reached capacity. The simplest combination to detect 
using nMOS circuitry is the all zero state which requires 
a 15-input NOR gate. To detect the all ones state, which 
would be required if exclusive OR feedback were used, a 15 
input NAND gate is required. In nMOS, it is desirable to 
construct NOR gates in preference to NAND gates; therefore 
exclusive NOR feedback has been implemented. 
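The behaviour of the counter can be checked with a short simulation. The Python sketch below is a behavioural model only, not the nMOS circuit. It assumes that register 1 is packed into bit 14 of an integer and register 15 into bit 0, so that hexadecimal 4000 means "register 1 holds a one". The function names are illustrative.

```python
MASK = (1 << 15) - 1  # fifteen register bits


def step(state):
    """Advance the PRBS counter by one count pulse.

    Assumed packing: bit 14 of `state` is register 1, bit 0 is register 15.
    """
    out14 = (state >> 1) & 1        # 14th output
    out15 = state & 1               # 15th output
    feedback = 1 ^ (out14 ^ out15)  # exclusive NOR of the two outputs
    # Every register passes its value to the next one; register 1
    # receives the feedback bit.
    return (state >> 1) | (feedback << 14)


def count_length(start=0):
    """Count the steps needed to return to `start`."""
    state = step(start)
    length = 1
    while state != start:
        state = step(state)
        length += 1
    return length


if __name__ == "__main__":
    print("state after all zeros:", format(step(0), "04X"))
    print("all ones is a lock-up state:", step(MASK) == MASK)
    print("count length from all zeros:", count_length(0))
```

The all-ones state maps onto itself under exclusive NOR feedback, which is why it is the forbidden state, while the all-zeros state is an ordinary member of the count sequence and can serve as the overload code.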
The 15-bit shift register in the integrating counter 
is laid out on silicon in the form of a ring. The benefit 
of doing this is that the length of the feedback connec-
tion is minimised, and thus the delays between stages of 
the shift register are approximately equal. 
The integration time, or counter capacity may be pro-
grammed by presetting the combination of ones and zeros 
that represents the starting point for the count sequence. 
The counter then counts from this starting sequence and 
produces an "overload detect", or OD, pulse when the all 
zeros state is reached. The combination of ones and 
zeros, or start word, that represents a particular 
integration time is derived by simulating the action of 
the integrating counter in reverse. The simulation pro-
gram takes as its input the required integration time in 
clock cycles; it then steps back through the PRBS sequence 
from the all zero state for the specified number of clock 
cycles and prints the combination of ones and zeros at 
that point in the sequence. Table 5.5 lists some integra-
tion times and corresponding integrating counter start 
words. 
TABLE 5.5
Integrating Counter Start Words
Integration time    Counter Start Code
(clock cycles)      (binary)           (hex)
5                   000000000010101    0015
10                  000001010101010    02AA
15                  101010101010101    5555
32                  011001100110001    3331
64                  001011010011100    169C
128                 010010001110001    2471
256                 011100011100010    38E2
512                 001110110001011    1D8B
1024                010011101001111    274F
2048                011000011111110    30FE
4096                001111000000011    1E03
8192                000011111110000    07F0
16384               000000011111111    00FF
32766               100000000000000    4000
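The reverse stepping procedure can be sketched in a few lines of Python. This is a reconstruction for illustration, not the original simulation program. It assumes the same bit packing as the table (register 1 is the leftmost binary digit), and the names `step_back` and `start_word` are hypothetical.

```python
MASK = (1 << 15) - 1  # fifteen register bits


def step_back(state):
    """Return the counter state one count pulse before `state`.

    Assumed packing: bit 14 is register 1, bit 0 is register 15.
    In the forward direction register 1 receives XNOR(reg14, reg15)
    and register 15 receives the old register 14, so the old register
    15 is recovered as XNOR(new register 1, new register 15).
    """
    new_reg1 = (state >> 14) & 1
    new_reg15 = state & 1
    old_reg15 = 1 ^ (new_reg1 ^ new_reg15)
    return ((state << 1) & MASK) | old_reg15


def start_word(integration_time):
    """Step back from the all-zeros overload state by the given
    number of clock cycles and return the resulting start word."""
    state = 0
    for _ in range(integration_time):
        state = step_back(state)
    return state


if __name__ == "__main__":
    for cycles in (5, 10, 15, 32):
        word = start_word(cycles)
        print(cycles, format(word, "015b"), format(word, "04X"))
```

Running the loop for 5, 10, 15 and 32 clock cycles gives a quick cross-check against the corresponding rows of Table 5.5.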
Referring again to Figure 5.8, the operation of the
circuit is as follows. The x data and the y data inputs
are compared by the comparator module, designated COMP10.
This module has two functions: first, to produce the count
pulse for the integrating counter, and second, to
synchronise the RESET pulse with φ1. Producing the count
pulse for the integrating counter requires the module to
perform the logical exclusive NOR of the x and y data, to
perform the necessary logic so that the count pulse is
disabled during a RESET, to synchronise the count pulse
with φ1, and to provide adequate buffering for the count
and RESET pulses.
The purpose of the RESET pulse is to clear the
overload latch and load a preset start word into the
integrating counter. The input to each shift register stage in
the integrating counter can come from one of two sources:
the preceding shift stage, or the preset inputs I1 to I15.
The selection is controlled by either a count pulse or a
RESET pulse and the operation must be mutually exclusive.
To prevent both events occurring at the same time, the
RESET signal disables the generation of a count pulse.
There is similar reasoning behind the design of the other
shift register control signals. The input select controls
for the DSR are designed to be mutually exclusive, as are
the controls for the MCR and OSR respectively.
After the RESET pulse has cleared the overload latch, 
and preset the integrating counter, the sampled data is 
shifted along the DSR. When the x and y inputs are equal 
(at the correlator stage under discussion) a count pulse 
is generated which increments the integrating counter. 
Eventually, as this operation continues, the counter 
reaches the overload state, i.e. all zeros, and produces 
an overload detect pulse. In normal circuit operation, 
the F-test signal is held low, and the overload detect 
pulse sets the overload latch. This in turn causes the 
output signal "overload flag" to change state (high to 
low), which indicates that a correlation peak has been 
detected. The overload flag is a wired-OR output, so that 
the Eu349 devices may be arbitrarily cascaded. 
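The run-mode behaviour of a single stage can be summarised by a small behavioural model. The Python sketch below is an illustration under simplifying assumptions: the x and y samples are supplied directly to the stage, RESET is taken to have already loaded the start word, and the function names are hypothetical. The bit packing (register 1 in bit 14) matches the hexadecimal start words of Table 5.5.

```python
def step(state):
    """One count pulse of the 15-bit exclusive NOR feedback counter
    (assumed packing: bit 14 is register 1, bit 0 is register 15)."""
    feedback = 1 ^ (((state >> 1) & 1) ^ (state & 1))
    return (state >> 1) | (feedback << 14)


def cycles_to_overload(x_bits, y_bits, start_word):
    """Return the clock cycle on which the stage overloads, or None.

    A count pulse is generated whenever the x and y samples are
    equal (exclusive NOR); the overload state is all zeros.
    """
    state = start_word
    for cycle, (x, y) in enumerate(zip(x_bits, y_bits), start=1):
        if x == y:
            state = step(state)
            if state == 0:
                return cycle
    return None


if __name__ == "__main__":
    # 0x5555 is the Table 5.5 start word for 15 clock cycles.
    print(cycles_to_overload([0] * 20, [0] * 20, 0x5555))
    print(cycles_to_overload([0] * 20, [1] * 20, 0x5555))
```

When the inputs agree on every sample the stage overloads after exactly the programmed integration time; when they agree less often the overload arrives later, and the stages that overload first mark the correlation peak.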
Under the control of OSR-pl (parallel load control 
for the OSR), the values stored by the overload latches 
are transferred, in parallel, to the overload shift regis-
ter OSR. Then, again under the control of the OSR-pl, the 
overload pattern is shifted off chip via the OSR serial 
output. 
In the above discussion, it has been assumed that the 
MCR contains the necessary bit pattern to configure the 
cascade of correlator stages into a continuous serial con-
nection of correctly functioning stages. The method by 
which this is performed is described in Section 5.5, but 
the circuitry used to perform self repair is discussed 
here. 
The MCR is similar in design to the OSR and DSR. The
controls to the MCR allow it to operate in three modes:
serial shift, parallel load, and hold. The output of the MCR
controls the multiplexers at the outputs of the DSR and
OSR. When a logic one is stored in the MCR element of a
particular correlator stage, the DSR and OSR inputs are short
circuited to their respective outputs. When this is done,
the affected correlation stage serves only to link
together its two immediate neighbours. The overall
effect, therefore, is that correlation stages may be
selected and eliminated from the correlation array by
inserting logic ones into the relevant bit positions in
the MCR. The built-in self test and self repair procedure
is a method whereby faulty stages may be identified and
eliminated automatically.
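The effect of the MCR on the shift register chains can be illustrated with a behavioural model. The following Python sketch assumes an idealised chain, with one register bit per stage and a multiplexer that passes the stage input straight to the stage output when the MCR bit is one. Pass-transistor delays are ignored and the function name is hypothetical.

```python
def flush_delay(mcr):
    """Clock cycles for a single logic one to flush through the chain.

    mcr[i] == 1 means that stage i is bypassed.
    """
    regs = [0] * len(mcr)             # one shift register bit per stage
    stimulus = [1] + [0] * len(mcr)   # a single one on a background of zeros
    for cycle, data_in in enumerate(stimulus):
        signal = data_in
        next_regs = []
        for bypass, reg in zip(mcr, regs):
            next_regs.append(signal)            # register samples the stage input
            signal = signal if bypass else reg  # multiplexer: bypass or register
        if signal == 1:
            return cycle                        # the one has reached the output
        regs = next_regs
    return None


if __name__ == "__main__":
    healthy = [0] * 28
    repaired = [0] * 28
    repaired[4:7] = [1, 1, 1]         # three consecutive stages bypassed
    print(flush_delay(healthy), flush_delay(repaired))
```

Each logic one in the MCR removes one clock cycle from the flush delay, which is the property the initial test uses to verify the bypass multiplexers.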
During self test, a correctly functioning correlator
stage produces two overload detect pulses. The overload
detect output is continuously compared with the expected
good output using the F-test input and the exclusive OR
gate, denoted OD-EXOR in Figure 5.8. Any deviation from
the expected good output sets the overload latch. Thus,
faulty correlation stages have their overload latches set
during the self test period. The number of faulty stages
in a cascade may be determined by transferring the
contents of the overload latches to the OSR and shifting the
pattern off chip. The number of ones in this pattern
represents the number of faulty stages in the cascade.
Self repair is carried out by transferring the contents of
the overload latches to the MCR. This is done using the
MCR-hold and MCR-shift controls in combination (both
controls low).
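The comparison performed during self test can be sketched as follows. This Python model is illustrative only: each stage is represented by a list of overload detect samples, F-test by a list of expected samples, and the function name is hypothetical.

```python
def run_self_test(od_streams, f_test):
    """Return the overload latch value for each stage after self test.

    od_streams: one list of overload detect (OD) samples per stage.
    f_test:     the expected OD samples, applied to every stage.
    A latch is set as soon as OD differs from F-test (the OD-EXOR gate)
    and stays set until the next RESET.
    """
    latches = []
    for od in od_streams:
        latch = 0
        for observed, expected in zip(od, f_test):
            if observed ^ expected:
                latch = 1
        latches.append(latch)
    return latches


if __name__ == "__main__":
    expected = [0, 0, 1, 0, 0, 1]   # two overload detect pulses
    good = [0, 0, 1, 0, 0, 1]
    stuck_low = [0, 0, 0, 0, 0, 0]  # stage that never overloads
    early = [0, 1, 0, 0, 0, 1]      # stage that overloads at the wrong time
    latches = run_self_test([good, stuck_low, early], expected)
    print(latches, "faulty stages:", sum(latches))
```

Because the latch records any mismatch, both a missing pulse and a pulse at the wrong time mark the stage as faulty.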
The multiplexers associated with the DSR and OSR out-
puts each consist of one inverter and two pass transis-
tors. In normal circuit operation, the "bypass" transis-
tor is turned off, and the "output" transistor is turned 
on (see Figure 5.8). When a correlator stage is identi-
fied as being faulty the bypass transistor is turned on, 
and the output transistor is turned off. Thus, the input 
data to a subsequent correlator stage will have passed 
through n bypass transistors and one output transistor, 
when n preceding, contiguous stages have been identified 
as faulty. If n is greater than three, then the operation 
of the DSR (or OSR) is degraded due to the excessive delay 
introduced by the series connection of output and bypass 
transistors. (The delay in a series connection of four 
pass transistors is approximately equal to the delay in 
one inverter.) This system therefore cannot guarantee the
repair of more than three consecutive faults. However, the
probability that more than three consecutive faults will 
occur is very low, and can be estimated to be less than 
10 for an overall yield of 20%. 
5.4.3. Peripheral Circuit Design 
The peripheral circuitry consists of input and output 
pads, power supply pads, buffer circuits, and some random 
logic for generating or synchronising control signals. 
The peripheral circuitry is described in Appendix 1. 
5.5. Test Strategy 
The test strategy for the correlator consists of
built-in self test and self repair procedures. These
procedures are off-line; they are therefore distinct from,
and do not impede, the normal operation of the correlator
in "run" mode. The test strategy is divisible into three
parts which are summarised here. A detailed step by step
test schedule is listed in Appendix 2. The three parts of
the test strategy are: initial test, self test, and self
repair.
5.5.1. Initial Test 
During the initial test period three tests are
carried out on the critical elements of the design, namely
the scan path registers. These registers (DSR, MCR and
OSR) and their various control functions are not covered
by the self test and repair strategy, and therefore must
be tested to check that the subsequent self test and
repair procedures are possible. The initial test sequence
is as follows:
a) 	Test MCR, OSR and DSR as shift registers and measure
their delay. This is done using a flush test, as
described in Section 4.3. The MCR must be flushed
with zeros and held static while the DSR and OSR are
tested.
b) 	Test the effect of the MCR on the DSR and OSR
registers. This is done by shifting n "ones" into the MCR
and then measuring the delay of the DSR and OSR
registers, which should each be reduced by n.
c) 	Test the parallel load facilities of the DSR, OSR and
MCR registers, and the set and reset facilities of
the overload latches.
5.5.2. Self test 
In the self-test period, a full functional test of
the correlation array takes place. In this test, step (b)
of the sequence below is repeated four times, once for each
possible combination of the two binary input signals, x
and y. Initially the MCR must be flushed with all zeros
and held static.
a) 	Reset latches and integrating counters. The counters
are loaded with 4000 (in hexadecimal), a number that
corresponds to the maximum integration time of
2^15 - 2 = 32766 sample clock cycles, as described in
Section 5.4.2.
b) 	Set up the input conditions x and y and set or clear
the DSR register as required. Shift x and y through
the correlator for 32766 clock cycles. When the inputs
are equal, F-test must be set HIGH to coincide with
the expected overload detect pulse.
c) 	Parallel load latches into OSR. The overload pattern
may be shifted out for observation.
5.5.3. Self Repair 
The self repair sequence follows the self test
sequence. During the self test sequence the overload
signal is compared with the expected value of the overload
signal. Any deviation from the expected signal results in a
logic 1 stored in the corresponding latch. Thus, when the
self test sequence has finished the logic 1's and 0's
stored in the latches are the results of the self test,
where a logic 1 indicates a faulty stage. The self repair
operation transfers this information to the MCR which in
turn causes the faulty stages to be bypassed. The net
effect is a series connection of correctly operating
correlation stages. The following sequence is required.
a) 	Parallel load MCR.
b) 	Hold MCR static.
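The repair step, together with the limit of three consecutive bypassed stages noted in Section 5.4.2, can be sketched as follows. This Python fragment is illustrative: the function name and the returned tuple are assumptions made for this example, not features of the chip.

```python
def self_repair(latches, max_consecutive=3):
    """Model the self repair step.

    latches: overload latch values after self test (1 = faulty stage).
    Returns (mcr, working_stages, repair_guaranteed).
    """
    mcr = list(latches)           # parallel load: MCR-shift low, MCR-hold low
    working_stages = mcr.count(0)

    # Find the longest run of consecutive bypassed stages.
    longest = run = 0
    for bit in mcr:
        run = run + 1 if bit else 0
        longest = max(longest, run)

    # More than three bypass transistors in series degrades the
    # DSR and OSR, so the repair is then not guaranteed.
    return mcr, working_stages, longest <= max_consecutive


if __name__ == "__main__":
    latches = [0] * 28
    latches[1] = 1                # one faulty stage
    mcr, working, ok = self_repair(latches)
    print(working, ok)
```

The count of zeros in the MCR is the number of working stages, which is the parameter used for accept or reject decisions in production test.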
5.5.4. Run 
The run period follows automatically after the
self-test and repair sequence is completed. After the test
period the number of zeros stored in the MCR represents
the number of correctly operating correlation stages. The
following sequence may occur during the run period.
a) 	Monitor the overload flag status, and/or display
output from the OSR.
b) 	Compute ordinate and abscissa of correlation peak.
c) 	Reset, and repeat correlation.
5.6. Test System Configuration and Results 
5.6.1. Test Configuration 
The test equipment used to carry out the functional 
test comprises a Tektronix Digital Analysis System (DAS 
9100), one dual power supply unit, and a purpose-built 
test-jig. 
The test-jig incorporates power supply decoupling,
one external load resistor for the wired-OR overload
output from the correlator chip, and a 40 pin dual-in-line
(DIL), zero insertion force IC socket. The test-jig
provides an interface between the DAS and the device under
test (DUT), which is either a packaged chip or a probed
chip on a wafer. In both cases electrical connection is
made via the 40 pin DIL socket.
Initially 10 packaged chips, which had passed a
visual inspection, were functionally tested. However,
many more samples were required to demonstrate the yield
enhancement capability of this design, so the remaining
wafers were probe-tested. The Eu349 chip was fabricated
as part of a multi-project wafer, with only 24 chip sites
per wafer. Consequently only 130 candidates were
available for testing.
5.6.2. Test Results 
The results are divisible into two parts. These are 
chip verification results, and yield enhancement results. 
Chip verification consists of initial test sequence 
results, and self test and repair sequence results. These 
results are demonstrated here using display material from 
the Tektronix DAS. Yield enhancement results are dis-
cussed in Section 5.6.3. 
The initial test sequence is shown in Figure 5.10.
These tests are equivalent to those described in Appendix
2, Sections A2.2 to A2.7, but are abbreviated and linked
together to form a continuous display. These abbreviated
elements of the initial test sequence, and part of the
self test sequence, are small enough to fit into a single
DAS pattern generator program, and the resulting data
sequences are short enough to fit into the DAS acquisition
memory. This shorthand method allows a large number of
devices to be checked easily. Chips that pass this test
can then be given a more exhaustive test according to
Appendix 2.
[Figure 5.10: DAS timing display. Legible trace labels: X-i/p, YBAR-i/p, OSR-i/p, DSR-s/c, MCR-hold, RESET, F-TEST.]
Figure 5.10. Results for initial test sequence.
The left hand side of Figure 5.10 is shown expanded
in Figure 5.11. The figure shows 16 traces. The top four
traces show the inputs to the device under test, and the
group of four traces below these represent the outputs.
The remaining eight traces are the chip control signals.
This figure shows tests that verify the function of the
MCR, DSR and the OSR.
[Figure 5.11: DAS timing display. Legible trace labels: YBAR-i/p, OSR-i/p, MCR-i/p, X-o/p, OSR-o/p, MCR-o/p.]
Figure 5.11. Tests to verify the function of the MCR,
DSR, and the OSR.
In Figure 5.11 there are three dense vertical lines
labelled "T", "M", and "C", for "trigger", "marker", and
"cursor" respectively. The sequence of events before the
marker is concerned with flushing zeros through the MCR,
DSR, and OSR. At the marker, the MCR controls indicate
that the contents of the MCR are being held static, that
is, the MCR is neutralised. Also at the marker, single
logic ones are presented to the x and the OSR inputs.
After 28 clock cycles these logic ones, against a
background of zeros, have shifted through the registers and
appear at the x and OSR outputs. The point at which they
appear is marked by the cursor. The time delay between
the marker and the cursor is given in the top left of the
figure in a line starting "C - M", and is shown to be 28
µs. This shows a sample of test A2.3.
Starting at the cursor position, and moving to the
right, a similar test is shown with logic ones being
shifted through the MCR. Although it is not shown
explicitly, the delay through the MCR is also 28 µs. This
shows a sample of test A2.2. The next sequence tests the
effect of the MCR on the other shift registers (test
A2.4). The sequence starts where the MCR input goes HIGH
for the second time. This MCR input pattern represents a
group of three consecutive logic ones which are shifted
into the MCR and held static. Then a simple flush test is
performed on the DSR and OSR, by shifting single logic
ones into the DSR and OSR against a background of zeros.
The resulting delay through these registers can be
measured as before, and is shown here to be 25 µs. This test
sequence is completed by flushing the MCR with zeros to
neutralise it for the next test. In doing this, the logic
ones that had been held static in the MCR can be seen
emerging from the MCR output.
The next test sequence represents test A2.5, and is
concerned with the SET and CLEAR features of the DSR. The
sequence starts 28 clock cycles before the point where the
control signals DSR-pl and DSR-s/c go HIGH. At the time
when these signals go HIGH, a background of zeros has
been shifted through the DSR. DSR-pl and DSR-s/c then
cause the DSR to parallel load all ones, which can be
observed as a bank of 28 logic ones in the x serial
output. After this bank of ones has shifted out, the x input
is set HIGH, and a background of ones is established in
the DSR. When the pattern reaches the output, the signal
DSR-pl is pulsed HIGH again, this time with DSR-s/c LOW,
and the DSR is cleared. The result of this action can be
seen as a large gap before the final block of ones in the
x output.
The second part of the initial test sequence is shown
in Figure 5.12. This figure shows the waveforms relating
to tests A2.6 and A2.7, where the MCR and OSR parallel
load operations, and the overload latches set and clear
operations are tested.
















Figure 5.12. Initial test sequence relating to tests A2.6 
and A2.7. 
The sequence for test A2.6 begins with a RESET pulse 
to clear the overload latches. There immediately follows 
a control combination (MCR-shift LOW, MCR-hold LOW) which 
transfers the contents of the latches to the MCR. The MCR 
controls are then changed (MCR-shift HIGH) to shift the 
MCR contents out through the MCR output for observation. 
Since the latches were reset, the observed output should 
be all zeros, as can be seen in the MCR output in the 
region around the cursor in Figure 5.12. To complete the 
test, this sequence is repeated with one additional
feature: an F-test pulse which immediately follows the
RESET. The F-test signal works correctly
and sets the overload latches. The contents are then
transferred as before, and shifted out for observation.
The bank of logic ones, as expected, can be seen on the
MCR serial output.
Test sequence A2.7 is similar to A2.6 except that the
OSR is tested instead of the MCR. Two pulses on the
OSR-pl control indicate where the bank of zeros, and the bank
of ones begin, respectively, on the OSR serial output.
Figures 5.13 and 5.14 show some of the input and
output waveforms from two correlator chips, recorded
during the self test and repair period. For
display purposes the integration time of the correlator
has been reduced to just 15 clock cycles. Figure 5.13
shows the correlation output of a "golden chip", that is,
a fully functional chip, while Figure 5.14 shows the
output of a chip that has one failed stage. The top four
traces in each figure represent the inputs to the device.
In each figure the x and y inputs sequence through their
four possible combinations in accordance with the test
strategy described in Section A2.8.
















Figure 5.13. Self test sequence for fully functional, or 
"golden chip" 


















Figure 5.14. Self test and repair sequence for a chip 
with one faulty correlation stage. 
The significant points to note in Figures 5.13 and
5.14 are the MCR input and the OSR output. All the other
signals are the same for both chips, with the exception of
the MCR control signals, MCR-hold and MCR-shift. With
reference to Figures 5.13 and 5.14, and moving left to
right from the cursor, the overload output (OVRFLO) has
changed from logic 1 to 0. This indicates that at least
one integrating counter has overloaded after the
prescribed period of 15 clock cycles. This result is
expected since the inputs have been equal, with x and y
both zero over this period.
When OVRFLO next goes high, the correlator has been 
reset and the next correlation test, with x = 0 and y = 1, 
is begun. Also at this time, the overload pattern, that 
is, the contents of the latches, are transferred to the 
OSR and shifted out for display. Now we can see the 
difference between the "golden chip", Figure 5.13 and the 
faulty chip, Figure 5.14. The OSR should contain a series 
of 28 logic ones and in Figure 5.14 there is a logic 0 in 
position number 2, indicating a fault in stage 2. The 
correlation test is repeated for the remaining combina-
tions of x and y, and the fault is again exposed on the 
OSR output in the case where x and y are both equal to 1. 
Self repair is then carried out on the faulty chip.
A single logic 1 is shifted into bit position 2 of the
MCR. This causes stage 2 to be bypassed. The correlation
test, with x and y both equal to 1, is repeated
several times at a period of 27 rather than 28 clock
cycles, and the incorrect logic 0 on the OSR output is
eliminated. The result is a "golden chip" containing 27
stages of correlation.
Figure 5.15 shows an expanded view of the repair 
sequence. The part of the figure labelled "A" represents 
the correlation overloads for the input combination x = y 
= 1. The overload pattern is displayed on the OSR serial 
output, and it should contain a continuous block of logic 
ones. However, with an apparent stuck-at-zero fault in 
stage 2 there is a zero at this position. 









[Figure 5.15: DAS timing display. Legible trace labels: DSR-s/c, DSR-pl, OSR-pl, MCR-shft, MCR-hold, RESET, F-TEST. Regions marked A, B, C and D.]
Figure 5.15. Zoom in on the self repair sequence.
Part "B" shows the logic one in the MCR input being
shifted into the bit position of the MCR that corresponds
to the second stage in the correlator array. Part "C"
represents a correlation of the input combination x = y =
1. The overload output can be seen to go LOW, as
expected, after the prescribed 15 clock cycle integration
time. Part "D" of the figure shows the overload pattern
displayed on the OSR serial output. The period between
the RESET and OSR-pl pulses is now 27 clock cycles so that
the OSR is reloaded with correlation results before data
shifted from its serial input appears at the serial
output.
The full self test and self repair sequence, as 
described in Appendix 2, uses the F-test signal to emulate 
the expected overload pattern. The action of the F-test 
signal is to invert the overload pattern shown in Figure 
5.15, so that it contains a logic one at the position of 
the faulty stage. The self repair sequence would then 












The correlation test, as described in Section A2.10, 
is similar to the self test except for the action of the 
F-test signal. However, the self test sequence shown in 
Figures 5.13 and 5.14, is a modification of test A2.8 that 
demonstrates both self test and correlation test. There-
fore correlation test need not be treated separately. 
5.6.3. Yield Enhancement 
This section contains the results of the first 130 
processed chips. Figure 5.16 shows a chart of number of 
chips plotted against number of working stages. It shows 
that 29 of the 130 candidates passed the initial test and 
that 27 of these yielded more than 20 stages of correla-
tion. 
[Figure 5.16: horizontal axis, No. of working stages.]
Figure 5.16. Distribution of functioning stages.
Listed below are the test results for each wafer. 	The 
multi-project wafers each contained 24 correlator chips. 
TABLE 5.6
RESULTS OF CHIP TEST
Candidates        Without Self Repair    With Self Repair
tested            (100% working)         (at least 75% working)
Packaged (10)     0                      2
Wafer #1 (24)     1                      5
Wafer #2 (24)     0                      5
Wafer #3 (24)     0                      6
Wafer #4 (24)     0                      0
Wafer #5 (24)     2                      9
TOTALS (130)      3                      27
YIELD (%)         2.3                    20.7
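The totals and yields in Table 5.6 follow directly from the per-wafer counts. The short Python check below recomputes them; the variable names are illustrative. Rounded to one decimal place, 27/130 is 20.8%, which the table lists as 20.7.

```python
# (candidates tested, fully working without repair, usable with repair)
results = {
    "Packaged": (10, 0, 2),
    "Wafer 1": (24, 1, 5),
    "Wafer 2": (24, 0, 5),
    "Wafer 3": (24, 0, 6),
    "Wafer 4": (24, 0, 0),
    "Wafer 5": (24, 2, 9),
}

tested = sum(row[0] for row in results.values())
without_repair = sum(row[1] for row in results.values())
with_repair = sum(row[2] for row in results.values())

yield_without = 100.0 * without_repair / tested
yield_with = 100.0 * with_repair / tested

# "At least 75% working" corresponds to 21 or more of the 28 stages.
min_stages = int(0.75 * 28)

if __name__ == "__main__":
    print(f"tested: {tested}")
    print(f"yield without self repair: {yield_without:.1f}%")
    print(f"yield with self repair:    {yield_with:.1f}%")
    print(f"improvement factor:        {with_repair / without_repair:.0f}")
```

On these figures, self repair raises the number of usable chips by a factor of nine.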
Although these results are based on a small
statistical population (130 chips), they nevertheless show
strong agreement with the theoretically predicted figures.
For example, the expected distribution of the number of
working stages is shown in Figure 5.17.














[Figure 5.17: horizontal axis, No. of working stages.]
Figure 5.17. Distribution of working stages according to
Equation 4.16.
5.7. Summary 
In this chapter, the architecture and design of the 
Eu349 digital correlator has been described. Additions to 
the basic architecture, that make possible built-in self 
test and self repair strategies have been discussed. The 
net result of the design strategy, that closely links 
design to test, is a well structured, and regular VLSI 
architecture. 
The test results show the correct operation of the 
device as a correlator, and demonstrate the principles of 
self test and self repair. 
CHAPTER 6 
CONCLUSIONS 
6.1. Summary of Work 
This thesis has described built-in self test and self 
repair strategies in VLSI architectures for digital corre-
lation. In Chapter 2, correlation theory was presented. 
Correlation techniques from analogue through to digital 
polarity implementations were discussed. It has been 
shown that, for stationary, ergodic signals, a temporal 
correlation function with finite integration time can 
approximate the true correlation coefficient. The effects 
of sampling, quantisation, and dither have been described. 
The main conclusion is that any physically realisable 
correlation system must compromise accuracy with integra-
tion time, and measurement time with circuit complexity. 
The overloading integrating counter technique for 
polarity correlation has also been described, and the pro-
totype correlator chip, featuring built-in self test and 
self repair mechanisms, has been introduced. 
In Chapter 3, several implementations of silicon
correlators have been discussed. The architectures may be
classified by observing whether time integrating or
spatially integrating techniques have been used. The
difference between these two concepts has been illustrated by
the correlation cube. Further segregation of correlator
architectures may be made by observing which computational
techniques have been used, namely bit serial, bit
parallel, polarity, systolic etc.
Parallel and concurrent techniques are employed to an 
ever increasing extent in integrated circuit correlators. 
However there exists a compromise between using a large 
number of very simple concurrent operations, and using a 
small number of complex cells, to achieve a common objec- 
tive. 	In the DELTIC correlator, discussed in Section 
3.2.3, a single, fast, multiplier is used. 	In the sys- 
tolic correlator, discussed in Section 3.2.4.2, delay, 
multiply, and add operations are distributed over a large 
2-dimensional array of simple cells. However, partial 
products are only generated in cells within an interaction 
region and these in turn are only used to form a product 
on every alternate clock cycle. Furthermore, to achieve 
useful integration times a large array of cells is 
required, and to increase the integration time requires 
cells to be cascaded. Normally this would not be a disad-
vantage; it is in fact preferable for VLSI architectures 
to be modular and cascadable. However the output rate of 
this correlator is inversely proportional to the size of 
the array. 
The architecture of the Eu349 correlator achieves a 
balance between concurrency, cascadability and correlation 
rate. The architecture is concurrent in that each point 
of the correlation function is computed in parallel. The 
architecture is directly cascadable, and the correlation 
rate is independent of the length of the array. 
In Chapter 4, two important subjects in integrated 
circuit design, namely testability and yield, have been 
discussed. Methods by which the testability of a design 
may be enhanced have been described and summarised for the 
particular case of the Eu349 correlator chip. Close link-
ing of design and test has enabled the architecture of the 
Eu349 to achieve a high degree of testability at a very 
low overhead. 
Yield enhancement through the use of redundant circu-
itry is of central importance to the design of the Eu349 
correlator chip. Yield models and yield enhancement techniques
have been described. A binomial model for the
yield of redundantly designed circuits has been presented, 
and the curves produced by the model show that the yield 
of an array of identical modules, such as in the Eu349 
correlator, increases rapidly with the addition of spare 
modules, but saturates to a level determined by the defect 
density and the uncorrectable area of the chip. 
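The behaviour summarised above can be illustrated with a minimal sketch of a binomial yield model. It is not the model of Chapter 4: the Poisson expression for module yield, the function name, and all numerical values used with it are assumptions made for illustration only.

```python
from math import comb, exp


def redundant_array_yield(n_required, n_spare, module_area,
                          fixed_area, defect_density):
    """Yield of an array that needs n_required good modules out of
    n_required + n_spare identical modules (hypothetical model).

    Assumptions: defects are Poisson distributed, so a module of area A
    is good with probability exp(-D * A); the uncorrectable (fixed) area
    must be completely defect free.
    """
    p_module = exp(-defect_density * module_area)
    y_fixed = exp(-defect_density * fixed_area)
    n_total = n_required + n_spare
    # Binomial sum: probability that at least n_required modules are good.
    y_array = sum(
        comb(n_total, k) * p_module ** k * (1.0 - p_module) ** (n_total - k)
        for k in range(n_required, n_total + 1)
    )
    return y_fixed * y_array
```

With no spares the expression reduces to the product of all module yields and the fixed-area yield. As spares are added the binomial sum approaches one, so the yield saturates at the fixed-area term, which depends only on the defect density and the uncorrectable area.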
In Chapter 5, the architecture and design of the
Eu349 digital correlator have been described. Additions to
the basic architecture that make built-in self test and
self repair possible have been discussed. The net result
of the design strategy, which closely links design to
test, is a well structured, fault tolerant VLSI
architecture.
The test results show the correct operation of the 
device as a correlator, and demonstrate the principles of 
self test and self repair. Results from the first batch 
of processed wafers have demonstrated that yield can be 
improved considerably at a very low cost in circuit over-
head; the initial sample's yield enhancement factor was 
9.0 for 130 chips tested. In addition, any of these chips 
can be given an exhaustive functional test in less than 
150 ms at 1 MHz.
6.2. Further Work 
The work described by this thesis provides a signifi-
cant base for further research. Both the self repair 
aspect of the VLSI architecture, and the advantages it 
holds for high speed digital correlation would be worth 
further investigation. One such project would involve 
redesigning the correlator array on to a wafer of its own 
so that many thousands of the chips may be made. With
such large numbers of test candidates, a comprehensive 
yield model for the fabrication process could be esta-
blished. Another research topic would be to expand the 
correlation architecture and self repair technique to mul-
tibit direct-digital correlation. 
The investigation of large area silicon systems is 
rapidly becoming an important topic in microelectronics 
research. The correlator architecture discussed here 
would play a significant role in the development of a 
wafer scale, or large area silicon system. If, for example,
the correlator were fabricated on a 2µm CMOS process,
the 7 mm x 7 mm chip would contain approximately 256 
parallel stages of correlation. Cascades of these chips 
would provide very attractive high speed, high resolution 
correlation systems. 
In the prototype device, the control circuitry has 
not been included on chip. An interesting situation can 
be envisaged where each chip contains the required control 
circuitry to supervise any arbitrary length of correlation 
cascade. When these chips are cascaded, either discretely 
or as part of a wafer scale system, a second tier of fault 
tolerance can be introduced. This situation would be 
achieved if each correlator control circuit could be iso-
lated from the correlation system. The system would con-
sist of a cascade of identical chips, each with their own 
controller. However, only one controller in the entire 
cascade may be active at any time. The important fact is 
that it would not matter which controller was active. 
Thus, for a cascade of four correlator chips, there would 
be three redundant control circuits. The active control 
circuit, in addition to controlling the correlation array, 
would also control the other redundant control circuits. 
This controller, the "master" chip, would signal all other 
control circuits to adopt their transparent mode. The 
system would be reconfigurable. Thus, in addition to the 
normal self repair and reconfiguration of the correlator 
array stages, the "master" controller can be reselected, 
if it is found to be defective. This concept has been 
investigated in a Master of Science degree project, and 
silicon layout has been produced for an overloading
correlator with such a "master" controller [193].
In conclusion, the design of regular, cascadable VLSI 
architectures for high speed digital correlation, coupled 
with low circuit-overhead self test and self repair stra-
tegies, holds potential for the fabrication of high yield-
ing large area silicon systems. 
ACKNOWLEDGEMENTS 
I offer sincere thanks to Dr. Mervyn A. Jack and Dr. 
James B. Jordan, my supervisors in this research work. I 
would also like to thank my colleagues in the Department 
of Electrical Engineering, and Wolfson Microelectronics 
Institute for their help and encouragement. Thanks are 
also due to the staff of the Edinburgh Microfabrication 
Facility for processing the integrated circuits. 
I also thank my wife Alison, and my family, for their 
constant help and encouragement. 
APPENDIX I 
EU349 CORRELATOR DESIGN
A1.1. Introduction 
The Eu349 prototype polarity correlator is a monolithic
n-channel MOS integrated circuit. The VLSI structure
implements polarity correlation using an overloading
integrating counter technique. The device architecture 
permits the direct cascading of individual correlator 
chips without the need for additional components, to give 
complete flexibility in choice of correlator delay and 
resolution. Additional features include programmable 
integration time, built-in self test, and built-in self 
repair capabilities. 
The prototype device consists of a cascade of 28 
identical correlation stages. Each stage comprises a 
delay element (DSR), an exclusive-NOR gate for the 
multiplication/comparison process of correlation, a 15-bit 
programmable integrating counter, and a counter overload 
latch. In addition the chip contains a parallel-
in/serial-out shift register (OSR) for serially shifting 
the values of the correlation function off chip, and a 
parallel-in/parallel-out shift register (MCR) used to con-
trol the self repair multiplexers. There are two multi-
plexers per stage. 
A1.2. Silicon Design 
The Eu349 correlator chip consists of a correlator 
array and peripheral circuitry. The correlator array is 
composed of 28 identical modules. Each module, or corre-
lation stage, consists of a further level of sub-modules. 
The design is structured in such a way that correlator 
stages may be cascaded by simple abutment. The correlator 
chip is composed of the following hierarchy: 
	
<chip>   : <array><peripheral circuitry>
<array>  : <28 x <STG100>>
<STG100> : <DSR10><COMP10><MCR10><OSR10><OL10><PN100>
<PN100>  : <PN30><7 x <PN10>><PN20>
where the module names have the following meanings:
<STG100> : Correlator Stage
<DSR10>  : Data Shift Register and MUX.
<COMP10> : Comparator and PN100 Clock Buffer
<MCR10>  : Multiplexer Control Register
<OSR10>  : Overload Shift Register and MUX.
<OL10>   : Overload Latch
<PN100>  : Integrating Counter
<PN30>   : Stage 15 and Feedback EXNOR
<PN10>   : Repeated Section of Counter
<PN20>   : Link between Stages 7 and 8
A block diagram of the integrating counter, module 
PN100, is shown in Figure A1.1. The counter consists of a 
cascade of 15 semistatic shift register stages, and one 
exclusive NOR module. The reasons for implementing the 
integrating counter in the shape of a ring with semistatic 
shift register elements, and the reasons for choosing 
exclusive NOR feedback instead of exclusive OR, are 
discussed in Section 5.4.2. The overload detect circuitry,
which consists of a 15 input NOR gate, is distributed
throughout the counter.
[Figure: ring of shift register stages with the integrating counter start code on the parallel input, stages 8 to 14 labelled, the overload detect output, and the VDD and VSS rails.]
Figure A1.1. Block diagram of integrating counter.
The integrating counter is composed of three sub-
modules: PN10, PN20, and PN30. Module PN10 contains two 
shift register stages: stage n and stage 15 - n, where 1 ≤ n ≤
7, and is replicated seven times along the integrating
counter. Module PN20 completes the connection between 
shift register stages 7 and 8, and provides the VSS con-
nection to the correlator array. Module PN30 provides the 
exclusive NOR feedback connection of the integrating 
counter, incorporates shift register stage 15 and the 
depletion mode pull-up transistor that forms part of the 
overload detect NOR gate. 
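The counting action of such a ring can be sketched behaviourally. The sketch below is not taken from the layout: the bit ordering (i1 as least significant bit), the shift direction, and the choice of the two lowest stages as the exclusive NOR taps are assumptions. They were chosen because they reproduce the start words quoted in Appendix 2, where 5555-hex gives 15 clock cycles and 4000-hex gives the maximum integration time. Section 5.4.2 remains the reference for the actual design.

```python
STAGES = 15
MASK = (1 << STAGES) - 1
OVERLOAD_STATE = 0      # all stages low: detected by the 15 input NOR gate
LOCKUP_STATE = MASK     # all stages high: fixed point of XNOR feedback


def step(state):
    """Advance the ring by one clock (assumed tap positions)."""
    # Exclusive NOR of the two lowest stages feeds the highest stage.
    feedback = 1 ^ ((state ^ (state >> 1)) & 1)
    return (state >> 1) | (feedback << (STAGES - 1))


def clocks_to_overload(start_word):
    """Count the clocks from a start word to the all-zero overload state."""
    state = start_word & MASK
    if state == LOCKUP_STATE:
        raise ValueError("all-ones start word never reaches overload")
    clocks = 0
    while state != OVERLOAD_STATE:
        state = step(state)
        clocks += 1
    return clocks
```

Under these assumptions `clocks_to_overload(0x5555)` is 15 and `clocks_to_overload(0x4000)` is 2^15 - 2. The all-ones word is a fixed point of the exclusive NOR feedback, which leaves the all-zero word free to serve as the overload state.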
The shift register stages that make up the integrat-
ing counter are also used in other modules in the 
correlator design. Simulation results for this basic sem-
istatic shift register stage are shown in Figure A1.2. 
The input data for this simulation is shown in Figure 
A1.3. Figure A1.2 shows four voltage traces. The non-overlapping
clocks, φ1 and φ2, are drawn together in the
top grid. The middle grid shows the input waveform to the
shift register, and the lower grid shows the output
waveform. The figure shows that the input data, which is
sampled on φ1, appears on the output when φ2 becomes
active. The simulation shows the shift register working
at 4 MHz. 











[Figure: simulated clock, input and output voltage traces against time in microseconds; TEMP = TNOM.]
Figure A1.2. Simulation of basic shift register element. 
SEMISTATIC SHIFT REGISTER
.SUBCKT INVK8 10 20 500 100
ME1 20 10 0 100 MENH2 6U 12U
MD1 500 20 20 100 MDEP2 24U 6U
.ENDS INVK8
.SUBCKT INVK4 10 20 500 100
ME1 20 10 0 100 MENH2 6U 12U
MD1 500 20 20 100 MDEP2 12U 6U
.ENDS INVK4
.SUBCKT MINPAS 10 20 30 100
MP1 10 20 30 100 MENH2 6U 6U
.ENDS MINPAS
.SUBCKT SRSS 10 20 30 40 500 100
XP1 10 30 35 100 MINPAS
XP2 35 40 45 100 MINPAS
XP3 55 40 65 100 MINPAS
XN1 35 55 500 100 INVK8
XN2 55 45 500 100 INVK4
XN3 65 20 500 100 INVK8
.ENDS SRSS
VDD 500 0 DC 5
VBB 100 0 DC -2.5
VP1 30 0 PULSE 0 5 20N 4N 2N 100N 250N
VP2 40 0 PULSE 5 0 16N 4N 2N 110N 250N
VIN 5 0 PULSE 5 0 2N 8N 4N 134N 500N
XSR1 10 20 30 40 500 100 SRSS
CLOAD 20 0 0.05P
.TRAN 5N 1000N
.GRAPH TRAN V(10) V(20) V(30) V(40)
.WIDTH OUT=80
.MODEL MENH2 NMOS (LEVEL=2 VTO=0.75 GAMMA=0.46
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=8.5E14 NFS=1E10 XJ=1.5U LD=1.25U UO=700
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.MODEL MDEP2 NMOS (LEVEL=2 VTO=-4.7 GAMMA=0.7
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=2.0E15 NFS=1E10 XJ=1.5U LD=1.25U UO=550
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.END
Figure A1.3. Input data for shift register simulation. 
A1.3. Peripheral Circuitry Design 
The peripheral circuitry consists of input and output 
buffers. It also contains three combinational logic circuits
for the generation of control signals. The output
buffers are standard library designs, slightly modified to 
fit the available space. The input buffers are designed 
according to the capacitive load that they must drive so 
that the chip can operate at 4 MHz. The capacitance is 
determined by calculating the number of gates to be 
driven, and the area of interconnect. A buffer with a 
drive capability of 5pF in a rise time of 20ns is adequate 
for all but two input pads. The remaining inputs are φ1,
which requires a drive capability of 12pF in 20ns, and φ2,
which requires a drive capability of 65pF in 20ns. Circuit
simulations for each of these buffers are shown in
Figures A1.4 to A1.9. 
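As a rough check on these drive requirements, the average current needed to charge each load can be estimated from I = C ΔV / Δt. The sketch below assumes a full 5 V swing (the VDD value used in the simulation decks) and a linear ramp over the 20ns rise time. The function name is hypothetical and the result is only a first-order estimate, not a substitute for the circuit simulations.

```python
def average_drive_current(load_farads, swing_volts=5.0, rise_seconds=20e-9):
    """Average charging current, in amperes, for a capacitive load.

    Assumes the load is charged through the full swing in exactly the
    stated rise time (first-order estimate only).
    """
    return load_farads * swing_volts / rise_seconds


# Loads quoted in the text: general inputs, phase 1 clock, phase 2 clock.
LOADS_FARADS = {"general": 5e-12, "phi1": 12e-12, "phi2": 65e-12}

CURRENTS_AMPS = {
    name: average_drive_current(load) for name, load in LOADS_FARADS.items()
}
```

On this estimate the φ2 buffer must source about thirteen times the current of the general purpose buffer, which is consistent with the much wider output devices in its simulation deck.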






Figure A1.4. Simulation results for 5pF input buffer. 
INPUT BUFFER TO DRIVE 5PF IN 20NS
.SUBCKT BUFF 10 30 500 100
ME1 20 10 0 100 MENH2 6U 96U
MD1 500 20 20 100 MDEP2 6U 24U
ME2 30 20 0 100 MENH2 6U 96U
MD2 500 10 30 100 MDEP2 6U 24U
ME3 500 10 30 100 MENH2 6U 72U
.ENDS BUFF
VDD 500 0 DC 5
VBB 100 0 DC -2.5
VIN 5 0 PULSE 0 5 10N 10N 10N 60N 160N
RCABLE 5 6 50
CCABLE 6 0 50P
RP 6 10 1K
CP 10 0 1P
X1 10 20 500 100 BUFF
CL 20 0 5P
.TRAN 2N 200N
.GRAPH TRAN V(5) V(10) V(20)
.MODEL MENH2 NMOS (LEVEL=2 VTO=0.75 GAMMA=0.46
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=8.5E14 NFS=1E10 XJ=1.5U LD=1.25U UO=700
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.MODEL MDEP2 NMOS (LEVEL=2 VTO=-4.7 GAMMA=0.7
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=2.0E15 NFS=1E10 XJ=1.5U LD=1.25U UO=550
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.END
Figure A1.5. Simulation data for 5pF input buffer. 
Figure A1.6. Simulation results for 12pF input buffer. 
INPUT BUFFER TO DRIVE 12PF IN 20NS
.SUBCKT BUFF 10 30 500 100
ME1 20 10 0 100 MENH2 6U 96U
MD1 500 20 20 100 MDEP2 6U 24U
ME2 30 20 0 100 MENH2 6U 384U
MD2 500 10 30 100 MDEP2 6U 96U
ME3 500 10 30 100 MENH2 6U 288U
.ENDS BUFF
VDD 500 0 DC 5
VBB 100 0 DC -2.5
VIN 5 0 PULSE 0 5 10N 10N 10N 60N 160N
RCABLE 5 6 50
CCABLE 6 0 50P
RP 6 10 1K
CP 10 0 1P
X1 10 20 500 100 BUFF
CL 20 0 15P
.TRAN 2N 200N
.GRAPH TRAN V(5) V(10) V(20)
.MODEL MENH2 NMOS (LEVEL=2 VTO=0.75 GAMMA=0.46
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=8.5E14 NFS=1E10 XJ=1.5U LD=1.25U UO=700
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.MODEL MDEP2 NMOS (LEVEL=2 VTO=-4.7 GAMMA=0.7
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=2.0E15 NFS=1E10 XJ=1.5U LD=1.25U UO=550
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.END
Figure A1.7. Simulation data for 12pF input buffer. 









Figure A1.8. Simulation results for 65pF input buffer. 
INPUT BUFFER TO DRIVE 65PF IN 20NS
.SUBCKT BUFF65 10 30 500 100
ME1 20 10 0 100 MENH2 6U 384U
MD1 500 20 20 100 MDEP2 6U 96U
ME2 30 20 0 100 MENH2 6U 3072U
MD2 500 10 30 100 MDEP2 6U 768U
ME3 500 10 30 100 MENH2 6U 2304U
.ENDS BUFF65
VDD 500 0 DC 5
VBB 100 0 DC -2.5
VIN 5 0 PULSE 0 5 10N 10N 10N 60N 160N
RCABLE 5 6 50
CCABLE 6 0 50P
RP 6 10 1K
CP 10 0 1P
X1 10 20 500 100 BUFF65
CL 20 0 100P
.TRAN 2N 200N
.GRAPH TRAN V(5) V(10) V(20)
.MODEL MENH2 NMOS (LEVEL=2 VTO=0.75 GAMMA=0.46
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=8.5E14 NFS=1E10 XJ=1.5U LD=1.25U UO=700
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.MODEL MDEP2 NMOS (LEVEL=2 VTO=-4.7 GAMMA=0.7
+CGSO=4.5E-10 CGDO=4.5E-10 CJ=1.0E-4 CJSW=1.0E-9 JS=1.0E-7
+TOX=8E-8 NSUB=2.0E15 NFS=1E10 XJ=1.5U LD=1.25U UO=550
+UEXP=0.1 UTRA=0.3 VMAX=5E4 NEFF=3.0 XQC=0.4 DELTA=1.0)
.END
Figure A1.9. Simulation data for 65pF input buffer. 
Combinational logic is included in the peripheral 
circuitry to generate control signals for the DSR, OSR and
MCR. These circuits are shown in Figures A1.10 to A1.12
respectively. The circuits perform input buffering in 
addition to their logic functions. 












Figure A1.11. Logic for generating OSR control signals. 
[Figure: logic with inputs MCR-hold and MCR-shift.]
Figure A1.12. Logic for generating MCR control signals. 
A1.4. Power Supply Considerations 
Supply currents for logic gates are calculated under
the approximations that the pull-down's resistance is
negligible, and that the pull-up transistor is in
saturation. Thus,
\[ I_{sat} = k' \frac{W}{L} (V_{gs} - V_{th})^2, \quad [V_{ds} > (V_{gs} - V_{th})] \qquad (A1.1) \]
\[ I_{sat} \approx 0.3 \, \frac{W}{L} \ \mathrm{mA}, \qquad (A1.2) \]
where:
$k' \approx 20 \, \mu A/V^2$,
$V_{gs} \approx 0 \, V$,
$V_{th} \approx -4 \, V$, and
$V_{ds} \approx 5 \, V$.
W and L are the width and length of the active area of the
pull-up device.
The VDD and VSS metal line width is determined by the
current carrying capability of the aluminium tracks. The
metal migration limit is estimated to be 1mA/µm², where
the metal thickness is 1µm. Thus, a 10µm metal track can
carry 10mA. Taking the worst case condition where all
inverters in the correlator chip are turned on, the supply
current can be estimated to be approximately 150mA. Therefore,
the VDD and VSS metal line widths must be 150µm.
However, since there are two each of VDD and VSS pads, the
major power line widths can be reduced to 75µm.
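The arithmetic of this section can be sketched as follows. The sketch assumes that the saturation current scales with the W/L ratio of the pull-up device, and that the migration limit is a current density of 1 mA per square micron of track cross-section, which is the value implied by a 1µm thick, 10µm wide track carrying 10mA. The function names are hypothetical.

```python
K_PRIME_A_PER_V2 = 20e-6   # process gain factor quoted in the text
V_GS = 0.0                 # depletion pull-up: gate tied to source
V_TH = -4.0                # depletion threshold voltage quoted in the text


def pullup_saturation_current(w_over_l):
    """Saturation current in amperes of a pull-up with the given W/L."""
    return K_PRIME_A_PER_V2 * w_over_l * (V_GS - V_TH) ** 2


def track_width_um(current_ma, limit_ma_per_um2=1.0, thickness_um=1.0):
    """Minimum metal width in microns for a given current in mA."""
    return current_ma / (limit_ma_per_um2 * thickness_um)


total_supply_ma = 150.0                               # worst case estimate
single_feed_width = track_width_um(total_supply_ma)   # one VDD or VSS pad
dual_feed_width = track_width_um(total_supply_ma / 2) # two pads share it
```

A pull-up with W/L = 1 draws about 0.32 mA under these values, in line with the 0.3 mA figure above, and splitting the worst case current between two pads halves the required track width.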
APPENDIX 2 
EU349 TEST SCHEDULE 
A2.1. Introduction 
The Test Schedule consists of a series of simple test 
procedures. Each procedure is presented as a subsection 
containing a list of the individual tests required to ver-
ify a specific chip function. 
The test schedule procedures are summarised in Table 
A2.1.
TABLE A2.1 
Summary of Test Schedule 
A2.2. Test MCR as shift register. 
A2.3. Test OSR and DSR as shift registers.
A2.4. Test MCR effect on both OSR and DSR.
A2.5. Test SET and CLEAR features of DSR.
A2.6. Test latches and MCR parallel load. 
A2.7. Test latches and OSR parallel load. 
A2.8. Verify self test sequence. 
A2.9. Verify self repair sequence. 
A2.10. Verify correlation performance. 
A2.2. Test MCR as Shift Register 
The Multiplexer Control Register is a critical element
in the correlator circuit. It is required to be
functional and neutralised before other tests are
attempted. (It is neutralised by flushing with zeros and
holding the all zero state static.)
1. Perform a flush test on the MCR. (Flush test is
described in Section 4.3.3.)
2. Perform a shift test by shifting the pattern
00110011... through the MCR. (Shift test is
described in Section 4.3.3.)
3. Check that the time delay between MCR input and MCR
output is 28 clock cycles.
4. Neutralise MCR.
A2.3. Test OSR and DSR as Shift Registers 
The MCR must be neutralised before carrying out this 
test. 
1. Neutralise MCR.
2. Perform a flush test on both the OSR and DSR.
3. Perform a shift test on both the OSR and DSR.
4. Check that the time delays through the OSR and DSR
are 28 clock cycles.
A2.4. Test MCR Effect on both OSR and DSR 
This test should be repeated with the input patterns 
11001100..., 01100110..., 00110011... and 10011001.... 
1. Shift a binary pattern containing n ones into the
MCR.
2. Hold MCR contents static.
3. Flush test, and shift test the OSR and DSR.
4. Check that the time delays through the OSR and DSR
are 28 - n clock cycles. Note that the bypass circuitry
is not guaranteed to work correctly if the pattern
contains blocks of more than 3 ones together.
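The expected result of this test can be sketched as a small check on the MCR pattern. The function names and the list representation of the MCR contents are assumptions made for illustration; only the 28 stage length, the 28 - n delay, and the limit of 3 adjacent ones come from the text.

```python
ARRAY_STAGES = 28     # correlation stages on one Eu349 device
MAX_BYPASS_RUN = 3    # longest block of ones the bypass is stated to handle


def expected_delay(mcr_bits):
    """Clock cycles through the OSR or DSR for a given MCR pattern.

    mcr_bits holds one 0 or 1 per stage; a 1 bypasses that stage.
    """
    if len(mcr_bits) != ARRAY_STAGES:
        raise ValueError("MCR pattern must hold one bit per stage")
    return ARRAY_STAGES - sum(mcr_bits)


def bypass_guaranteed(mcr_bits):
    """True if no block of adjacent ones exceeds the stated limit."""
    longest = current = 0
    for bit in mcr_bits:
        current = current + 1 if bit else 0
        longest = max(longest, current)
    return longest <= MAX_BYPASS_RUN


pattern = [1, 1, 0, 0] * 7   # the 11001100... test pattern over 28 stages
```

For the 11001100... pattern half of the stages are bypassed, so `expected_delay(pattern)` is 14 clock cycles and the pattern stays within the bypass limit.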
A2.5. Test SET and CLEAR Features of DSR 
Neutralise the MCR before performing this test.
1. Parallel load logic ones into DSR (SET DSR). This is
done by asserting the control signals DSR-pl and
DSR-s/c both HIGH.
2. Serially shift 28 zeros into DSR and observe DSR output.
3. Serially shift 28 logic ones into DSR.
4. Parallel load zeros into DSR (CLEAR DSR). Assert
DSR-pl control HIGH, and DSR-s/c control LOW.
5. Serially shift 28 logic ones into DSR and observe DSR
output.
A2.6. Test Latches and MCR Parallel Load 
Neutralise the MCR before performing this test. This
test sets/resets the overload latches, and transfers the
latch contents to the MCR. The normal input to the latch,
that is, the overload detect output from the integrating
counter, cannot be disabled. Therefore, the integrating
counter start word must be set to 4000-hex, or some other
suitably large number, to prevent the integrating counter
generating an overload detect output during the test.
(The RESET control also loads the integrating start word.)
1. Reset all latches using RESET control signal.
2. Parallel load latches into MCR. Assert MCR-shift
LOW; MCR-hold LOW.
3. Shift 28 zeros into MCR and observe MCR serial output.
Assert MCR-shift HIGH.
4. Reset again all latches using RESET control signal.
5. Set all latches using F-test control signal. Assert
F-test HIGH.
6. Parallel load MCR.
7. Shift 28 zeros into MCR and observe MCR serial output.
A2.7. Test Latches and OSR Parallel Load 
Neutralise the MCR before performing this test. This 
test sets/resets the overload latches, and transfers the 
latch contents to the OSR. The normal input to the latch, 
the overload detect output from the integrating counter, 
cannot be disabled. Therefore, the integrating counter 
start word must be set to 4000-hex, or some other suitably 
large number, to prevent the integrating counter generat-
ing an overload detect output during the test. (The RESET 
control also loads the integrating start word.) 
1. Reset all latches using RESET control signal.
2. Parallel load latches into OSR. Assert OSR-pl HIGH.
3. Shift 28 zeros into OSR and observe OSR serial output.
Assert OSR-pl LOW.
4. Reset again all latches using RESET control signal.
5. Set all latches using F-test control signal. Assert
F-test HIGH.
6. Parallel load OSR.
7. Shift 28 zeros into OSR and observe OSR serial output.
A2.8. Self Test Sequence 
The self test sequence consists of four repetitions 
of a single test. The test begins by resetting the correlator
and setting up the initial conditions to each correlation
stage. The objective of the test is to make each
correlator stage correlate the same data. They should all 
then produce the same result which can easily be verified 
using the F-test signal to emulate the expected good 
response of the correlator array. The test is repeated 
for each of the four combinations of input data, xy = 00, 
01, 11, 10. The F-test signal is set HIGH to correspond 
to the expected overload detect pulse at the end of the 
integration period of the "00" input combination. It is
again set HIGH for the duration of the expected overload 
at the end of the integration period of the "11" combina-
tion. At all other times F-test is LOW. 
Neutralise the MCR before performing this test. 
1. Reset all latches and load 4000-hex into integrating
counters using the RESET control. This number
represents the maximum integration time of the correlator.
2. Clear the DSR using the DSR-pl and DSR-s/c controls
(DSR-pl HIGH; DSR-s/c LOW). Set up the input conditions
with x = 0 and y = 0, and shift zeros into the
DSR.
3. Emulate the expected response of the circuit with
F-test signal. In this case F-test is set HIGH to
coincide with the expected overload.
4. Set up the input conditions with x = 0 and y = 1,
shift zeros into the DSR.
5. Emulate the expected response of the circuit with
F-test signal. In this case F-test is held LOW.
6. Set the DSR using the DSR-pl and DSR-s/c controls
(DSR-pl HIGH; DSR-s/c HIGH). Set up the input conditions
with x = 1 and y = 1, shift logic ones into the
DSR.
7. Emulate the expected response of the circuit with
F-test signal. In this case F-test is set HIGH to
coincide with the expected overload.
8. Set up the input conditions with x = 1 and y = 0,
shift logic ones into the DSR.
9. Emulate the expected response of the circuit with
F-test signal. In this case F-test is held LOW.
10. Parallel load latches into the OSR.
11. Shift contents of the OSR to observe the number of
detected faults.
A2.9. Self Repair Sequence 
This test directly follows the self test sequence. 
The self test status of the correlator array is stored in 
the overload latches. The self repair sequence consists 
simply of transferring this information to the MCR. 
1. Parallel load MCR using the controls MCR-shift and
MCR-hold (both LOW).
2. Hold contents of MCR static (MCR-shift LOW; MCR-hold
HIGH).
3. The self test procedure may now be repeated, without
neutralising the MCR, to verify the self repair procedure.
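The relationship between the self test and self repair sequences can be sketched behaviourally. The sketch assumes that a good stage overloads only when x and y agree throughout the integration period, and that a stage is flagged as faulty when its response differs from the F-test emulation for any of the four input combinations. How the overload latch accumulates this comparison on chip is not modelled, and all names below are hypothetical.

```python
INPUT_COMBINATIONS = ((0, 0), (0, 1), (1, 1), (1, 0))


def f_test(x, y):
    """Expected good response: overload only when the inputs agree."""
    return x == y


def self_test(stages):
    """Return one fault flag per stage (1 = faulty, 0 = good).

    Each stage is a callable stage(x, y) -> bool giving whether that
    stage overloads for constant inputs x and y.
    """
    return [
        1 if any(stage(x, y) != f_test(x, y)
                 for x, y in INPUT_COMBINATIONS) else 0
        for stage in stages
    ]


def self_repair(fault_flags):
    """Copy the fault flags to the MCR and report the effective length."""
    mcr = list(fault_flags)
    return mcr, mcr.count(0)


def good_stage(x, y):
    return x == y


def stuck_low_stage(x, y):
    return False          # never overloads


def stuck_high_stage(x, y):
    return True           # always overloads
```

With 28 stages of which one never overloads and one always overloads, `self_test` flags exactly those two stages and `self_repair` leaves an effective array length of 26, which is the number of zeros held in the MCR.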
A2.10. Correlation Test Sequence 
Correlation test is similar to the self test sequence 
except that the F-test signal is inactive, that is, held 
LOW. As in self test, the data inputs are cycled through 
the four combinations 00, 01, 11, 10. This test is per-
formed using a short integration time so that the results 
of all four input combinations may be displayed together 
on the DAS screen. The integration time is chosen to be 
less than the length of the correlator array, which in the 
case of one Eu349 device is 28 delay stages. This means 
that when the input conditions are such that an overload 
should occur after the specified integration time, then it 
can be displayed by parallel loading the OSR and shifting 
out its contents every 28 clock cycles. 
The correlation test may be carried out either before 
or after self repair has been carried out. If it is to be 
performed before self repair, then the MCR must be neutralised;
OSR parallel load and RESET pulses should occur
every 28 clock cycles. In this case the test will show
the response of all correlation stages, including faulty
ones if they exist. When the test is performed after self
repair, the MCR must be held static to maintain the configuration
of the array. Also, the OSR-pl and RESET pulses
must occur every n clock cycles, where n represents the
effective length of the correlator array, i.e. n is the
number of zeros held in the MCR.
Correlation test requires the following sequence of 
events: 
1. Reset all latches and load 5555-hex into integrating
counters using the RESET control. This number
represents an integration time of 15 clock cycles.
It is implied here that 5555-hex will be loaded on
each subsequent RESET pulse.
2. At the same time clear the DSR using the DSR-pl and
DSR-s/c controls (DSR-pl HIGH; DSR-s/c LOW). Also,
set up the input conditions with x = 0 and y = 0, and
shift n zeros into the DSR.
3. Parallel load the OSR (OSR-pl HIGH) and RESET correlator.
4. At the same time set up the input conditions with x =
0 and y = 1, shift n zeros into the DSR.
5. Parallel load the OSR (OSR-pl HIGH) and RESET correlator.
6. At the same time set the DSR using the DSR-pl and
DSR-s/c controls (DSR-pl HIGH; DSR-s/c HIGH). Set up
the input conditions with x = 1 and y = 1, shift n
logic ones into the DSR.
7. Parallel load the OSR (OSR-pl HIGH) and RESET correlator.
8. At the same time set up the input conditions with x =
1 and y = 0, shift n logic ones into the DSR.
9. Parallel load the OSR (OSR-pl HIGH) and RESET correlator.
10. Shift contents of the OSR to observe the results of
paragraphs 7 and 8 above.
APPENDIX 3 
EU349 TEST CONFIGURATION 
A3.1. Introduction
The specific configuration for the Digital Analysis 
System (DAS) is described here. The description is 
divided into seven subsections which describe the indivi-
dual menus and POD configurations, from the data probes, 
data acquisition, display and triggering menus, to the 
pattern generator program and instruction codes. 
A3.2. Prototype Test Configuration 
The bonding diagram and pin arrangement of the proto-
type IC is shown in Figure A3.1. 






























[Figure: bonding diagram; legible pin labels include x (DSR) o/p, DSR-s/c, DSR-pl, OSR-pl, MCR-hold and MCR-shift.]
Figure A3.1. Eu349 bonding diagram. 
Patterns of input and control data are generated by 
the "pattern generator" section of the DAS. Outputs from 
the chip under test and, if required, inputs and control
signals are acquired and displayed by the "data acquisition"
section of the DAS. DAS input and output is
achieved through data probes. 
A3.3. DAS Data Probes 
The DAS is composed, for the purposes of this test, 
of the following modules: 
1. One 91A32 Data Acquisition Module.
2. One 91P16 Pattern Generator Module.
3. One 91P32 Pattern Generator Module.
Each module has specific data probe connections, which are 
described here. 
A3.3.1. 91A32 Data Acquisition Module 
One 91A32 Data Acquisition Module with a maximum of 
32 data channels at 25 MHz is installed. In this case 16
channels are used to acquire and display inputs to, and 
outputs from the chip under test. These channels are 
listed in Table A3.1. 
TABLE A3.1
POD Assignments for 91A32 Data Acquisition Module
Channel     Function       Channel     Function
POD 2A 0    OSR o/p        POD 2B 0    MCR-hold
POD 2A 1    MCR o/p        POD 2B 1    MCR-shift
POD 2A 2    y i/p          POD 2B 2    x (DSR) o/p
POD 2A 3    x (DSR) i/p    POD 2B 3    RESET
POD 2A 4    φ1             POD 2B 4    MCR i/p
POD 2A 5    DSR-s/c        POD 2B 5    OSR i/p
POD 2A 6    DSR-pl         POD 2B 6    OVRFLO
POD 2A 7    OSR-pl         POD 2B 7    F-TEST
POD 2A Q    ...            POD 2B Q
A3.3.2. 91P16 Pattern Generator Module 
One 91P16 Pattern Generator Module with a maximum of 
16 data channels at 25 MHz is installed. In this case 11
channels are used to provide all the control signals 
necessary for the correlator chip. These channels are 
listed in Table A3.2. 
TABLE A3.2
POD Assignments for 91P16 Pattern Generator Module
Channel        Function       Channel        Function
POD 1B 0       MCR i/p        POD 1C 0       MCR-hold
POD 1B 1       OSR i/p        POD 1C 1       MCR-shift
POD 1B 2       y i/p          POD 1C 2       -
POD 1B 3       x (DSR) i/p    POD 1C 3       -
POD 1B 4       DSR-pl         POD 1C 4       OSR-pl
POD 1B 5       DSR-s/c        POD 1C 5       -
POD 1B 6       -              POD 1C 6       -
POD 1B 7       RESET          POD 1C 7       F-TEST
POD 1B STRB    φ1             POD 1C STRB    φ2
POD 1B CLK     -              POD 1C CLK     -
A3.3.3. 91P32 Pattern Generator Module 
One 91P32 Pattern Generator Module with a maximum of 
32 data channels at 25 MHz is installed. In this case 15
channels are used to provide the parallel input, i1 to
i15, to the integrating counters of the correlator chip. 
These channels are listed in Table A3.3. 
TABLE A3.3
POD Assignments for 91P32 Pattern Generator Module
Channel        Function    Channel        Function
POD 4A 0       i1 (LSB)    POD 4B 0       i9
POD 4A 1       i2          POD 4B 1       i10
POD 4A 2       i3          POD 4B 2       i11
POD 4A 3       i4          POD 4B 3       i12
POD 4A 4       i5          POD 4B 4       i13
POD 4A 5       i6          POD 4B 5       i14
POD 4A 6       i7          POD 4B 6       i15
POD 4A 7       i8          POD 4B 7
POD 4A STRB    ...         POD 4B STRB
POD 4A CLK     ...         POD 4B CLK
The permitted values which may be given to the bits
i1 to i15 are summarised in the section on Pattern Generator
Instruction Codes, Section A3.8.
A3.4. Channel Specification 
The Channel Specification menu is for controlling the 
display format of the data acquisition channels. It 
divides channels into groups, sets display radix and 
polarity values, and determines probe input thresholds. 
Table A3.4 shows the grouping of the acquisition 
channels and their POD IDs into data inputs, data outputs, 
and control signals. 
TABLE A3.4
Grouping of Acquisition Data
Group    Name           POD ID    Function
A        x (DSR) i/p    2A 3      Data Inputs
         y i/p          2A 2
         OSR i/p        2B 5
         MCR i/p        2B 4
B        x (DSR) o/p    2B 2      Data Outputs
         OSR o/p        2A 0
         MCR o/p        2A 1
C        OVRFLO         2B 6      Overflow Flag
D        DSR-s/c        2A 5      DSR Control
         DSR-pl         2A 6
E        OSR-pl         2A 7      OSR Control
F        MCR-shift      2B 1      MCR Control
         MCR-hold       2B 0
0        RESET          2B 3      Reset Latches & Load Counters
1        F-TEST         2B 7      Set Latches or Fault Repair
2        φ1             2A 4      Chip's φ1
The display radix and polarity fields (not shown) are 
set to binary and positive respectively. The probe input 
thresholds are all set to 2.6 volts MOS. PODs 2D and 2C, 
which are not required in the correlator test, are unas-
signed. 
A3.5. Timing Diagram 
Once in memory, acquired data may be displayed in a 
timing diagram format. In this format the DAS displays up 
to 16 logic waveforms representing the high and low states
in each clock cycle. Screen editing is used for viewing
different portions of memory, altering the display magnification,
and for labelling and rearranging the channel
orders.
Table A3.5 shows how the channels are labelled and 
rearranged for the correlator test. 
TABLE A3.5
Arrangement and Labelling of Channels for Display
POD ID    Name           Display
2A 3      x (DSR) i/p
2A 2      y i/p          INPUT DATA
2B 5      OSR i/p
2B 4      MCR i/p
2B 2      x (DSR) o/p
2A 0      OSR o/p        OUTPUT DATA
2A 1      MCR o/p
2B 6      OVRFLO
2A 5      DSR-s/c
2A 6      DSR-pl
2A 7      OSR-pl
2B 1      MCR-shift
                         CONTROL SIGNALS
2B 0      MCR-hold
2B 3      RESET
2B 7      F-TEST
2A 4      φ1             CHIP'S φ1
A3.6. Trigger Specification 
The Trigger Specification menu is for controlling the 
modules used during data acquisition. It specifies which 
modules are used, their clock rates, clock qualifiers, and 
trigger parameters. 
For the correlator test only one 91A32 data acquisi-
tion module is used, and it is operated from the DAS 
internal clock. The trigger word is positioned at the 
beginning of the acquisition memory. The acquisition 
memory of the DAS is not large enough to store all data 
from the correlator chip during a complete test, so suit-
able trigger words must be specified to acquire the 
desired portion of the test results. 
A3.7. Pattern Generator - Timing 
The Timing sub-menu of the pattern generator is for 
entering the characteristics of the strobe signals 
asserted in the Program sub-menu. It is also used to 
select the pattern generator's start mode, either single 
step or run. 
Figure A3.2 shows the Timing sub-menu. It indicates
that STROBE 0, which is the output line labelled STRB from
POD 1B, is set up to perform the φ1 function in the
correlator chip; and that STROBE 1, the STRB output line from
POD 1C, is set up to perform the φ2 function. (See also
Table A3.2.)
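[Figure A3.2: Timing sub-menu with POD, WIDTH, DELAY and SHAPE
entries for each strobe, and the strobe waveforms drawn against
the 1000 ns DAS output clock period.]
Figure A3.2. DAS Pattern Generator Timing Sub-Menu and
clock strobes.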
Figure A3.2 illustrates the specified features of the 
two strobe signals. STROBE 0 is the short duration clock 
phase. When STROBE 0 is high, information is transferred
from one circuit element to another. STROBE 1 is the long 
duration clock phase. During this phase information is 
stored and maintained by the semistatic clocking scheme 
adopted in the correlator design. 
A3.8. Pattern Generator Instruction Codes 
This section provides a key to the pattern generator 
program instructions. The program, which is given in the 
next section, consists of a sequence of in-line instruc-
tions, each containing, inter alia, three fields for gen-
erating a bit pattern on the 48 (maximum) output lines. 
In this case, the fields of interest are the two relating
to PODs 4B and 4A, and PODs 1B and 1C.
Tables A3.6 and A3.7 show a list of correlator chip 
functions and their associated codes in hexadecimal; 
Table A3.6 is concerned with data input signals to the 
chip, while Table A3.7 is concerned with the function con-
trol signals. 
TABLE A3.6
DAS Instruction Codes for Correlator Input Signals

Input Data     POD4BA   POD1CB
x, DSR (0)     0000     0000
x, DSR (1)     0000     0008
y (0)          0000     0004
y (1)          0000     0000
OSR (0)        0000     0000
OSR (1)        0000     0020
MCR (0)        0000     0000
MCR (1)        0000     0001

TABLE A3.7
DAS Instruction Codes for Correlator Control Signals
and Integrating Counters Start Value (ICSV).

Chip Function            POD4BA   POD1CB
OSR serial shift         0000     0000
OSR parallel load        0000     1000
DSR serial shift         0000     0000
DSR set all ones         0000     0030
DSR set all zeros        0000     0010
MCR serial shift         0000     0200
MCR parallel load        0000     0000
MCR hold contents        0000     0100
RESET load & reset       ICSV     0080
F-TEST set latches       0000     8000
A particular program instruction is obtained by OR-ing
the required function and data input codes in Tables
A3.6 and A3.7.
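The OR-ing of codes can be illustrated with a short sketch. The hexadecimal values are the POD1CB codes copied from Tables A3.6 and A3.7; the dictionary keys and the function name `pod1cb_word` are hypothetical names chosen for this illustration and are not part of the DAS menu.

```python
# POD1CB codes (hexadecimal) copied from Tables A3.6 and A3.7.
# The key strings are illustrative labels only (DSR = delay shift register).
INPUT_CODES = {
    "x=0": 0x0000, "x=1": 0x0008,
    "y=0": 0x0004, "y=1": 0x0000,
    "OSR=0": 0x0000, "OSR=1": 0x0020,
    "MCR=0": 0x0000, "MCR=1": 0x0001,
}
CONTROL_CODES = {
    "OSR serial shift": 0x0000, "OSR parallel load": 0x1000,
    "DSR serial shift": 0x0000, "DSR set all ones": 0x0030,
    "DSR set all zeros": 0x0010,
    "MCR serial shift": 0x0200, "MCR parallel load": 0x0000,
    "MCR hold contents": 0x0100,
    "RESET": 0x0080, "F-TEST": 0x8000,
}


def pod1cb_word(functions, inputs):
    """OR together the codes of the selected chip functions and input values."""
    word = 0
    for name in functions:
        word |= CONTROL_CODES[name]
    for name in inputs:
        word |= INPUT_CODES[name]
    return word


# Example: hold the MCR contents while applying y = 0.
word = pod1cb_word(["MCR hold contents"], ["y=0"])
print(f"{word:04X}")
```

Functions whose code is 0000 contribute nothing to the word, so they are selected simply by leaving the corresponding bits clear.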
Whenever the RESET function is selected, which resets 
the correlator latches and loads the integrating counters 
to their start value, then that value must be supplied in 
the menu field for PODs 4B and 4A. Illegal start values 
are: 
15 zeros (0000-hex or 8000-hex), which represents a 
zero integration time. 
15 ones (7FFF-hex or FFFF-hex), which represents an 
infinite integration time. 
Apart from these two conditions there are 32766 permitted 
values. Some values, produced by simulating the integrat-
ing counter, are listed in Table 5.5. 
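The start value rule can be checked mechanically. The following is a minimal sketch which assumes, as the paired hexadecimal values above suggest, that the unused sixteenth line is the most significant bit of the POD4BA field; the names `ICSV_MASK` and `is_legal_icsv` are illustrative.

```python
ICSV_MASK = 0x7FFF  # assumption: the 15 lines in use are the low 15 bits


def is_legal_icsv(value):
    """Return True unless the 15 start-value bits are all zeros or all ones."""
    low15 = value & ICSV_MASK
    return low15 not in (0x0000, 0x7FFF)


# Count the permitted 15-bit start values.
legal_count = sum(is_legal_icsv(v) for v in range(0x8000))
print(legal_count)
```

The count of permitted values agrees with the figure of 32766 quoted above, that is 2^15 less the two illegal patterns.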
A3.9. Pattern Generator - Program 
The Program sub-menu of the DAS pattern generator is 
for entering the program instructions, and for selecting 
the output clock and strobe signals. 
The pattern generator program is listed in Figure
A3.3. The output clock in this case is derived from the
DAS internal master clock, and the clock period, specified
by the field on the third line of the menu, is 1 µs.
The Interrupt, Pause, and Inhibit signals are not 
used in the correlator test, and their inactive, default 
values are selected. 
[Figure A3.3: DAS screen listing of the pattern generator
program, in columns SEQ, LABEL, POD4DC, POD4BA, POD1CB,
INSTRUCTIONS and STROBES.]
Figure A3.3. Pattern Generator Program for Prototype IC 
Test. 
The program consists of a sequence of in-line pattern 
generator code which is divided into seven columns. These 
are, from left to right in Figure A3.3: 
SEQ: program sequence number. Each number
corresponds to one program line. Altogether there are 245
sequences.
LABEL: up to four characters for labelling a specific 
program line, and up to 32 labels may be assigned. Labels 
are for use with GOTO, CALL, and INTERRUPT CALL instruc-
tions. 
POD4DC: 16 data output lines; not used in this
program.
POD4BA: 16 data output lines; 15 lines are used to 
generate the integrating counters' start value (see Sec-
tion A3.8). 
POD1CB: 16 data output lines; 11 lines are used to
generate the control signals and the input signals for the
correlator chip (see also Section A3.8).
INSTRUCTIONS: program instructions for code compres-
sion, which include: CALL, GOTO, RETURN, REPEAT, HOLD, 
COUNT, and HALT. The REPEAT, HOLD, and COUNT instructions 
require numerical parameters; no more than six unique 
numerical parameters may be shared among them. 
STROBES: the numbers in this column refer to the
strobes defined in the Timing sub-menu which are to be
asserted during the current program sequence. In this
case STROBE 0 represents φ1 on chip; STROBE 1 represents
φ2 on chip. Note that when the pattern generator is
started any strobes asserted on SEQ 0 are ignored. These
strobes will only be asserted if SEQ 0 is accessed again
by a loop or call.
APPENDIX 4 
AUTHOR'S PUBLICATIONS 
DIGITAL POLARITY CORRELATOR
BASED ON AN OVERLOADING COUNTER
TECHNIQUE
Indexing terms: Integrated circuits, VLSI
A VLSI structure to implement a digital polarity correlator
using an overloading integrating counter technique is
reported. The implementation permits direct cascading of
individual correlator chips without using additional circuits,
to give complete flexibility in choice of correlator delay and
resolution. The design considered offers significant
performance advantages in high-speed correlation applications.
Introduction: Correlation is based on computation of the
correlation function

    r_{yx}(\tau) = \frac{1}{N} \sum_{k=0}^{N-1} y_k \, x_{k-\tau}    (1)

where y_k and x_k are analogue or digital sampled data
sequences. Implementation of a high-speed correlator requires,
therefore, an array of multipliers, delay elements and
accumulators, either analogue or digital. Polarity correlation
methods minimise the complexity of the computational elements by
discarding the magnitude information of the input sequences.
Digital design techniques can then be employed to realise the
multipliers by EXNOR gates, the delay elements by a digital
shift register and the accumulators by simple counting circuits.
This results in a more economical and more compact
implementation than would otherwise be achieved, the penalty for
which is an increase in integration time to obtain a correlation
function with acceptable variance.¹ The polarity correlation
function is nonlinearly related to the (direct) correlation
function, eqn. 1, by the Van Vleck arc sine relation² for input
sequences which have Gaussian statistics.
Previously reported techniques for obtaining the polarity
correlation function have included parallel counters,³,⁴ which
are not directly cascadable and hence nonoptimal for VLSI
implementation. This letter describes an interpretation of the
polarity correlation function which permits the elimination of
parallel counters and results in a highly regular correlator
structure amenable to VLSI implementation. The structure
also permits direct cascading of correlator stages. Details of a
28-stage prototype correlator chip based on this approach are
presented.
Polarity correlation: Polarity correlation is based on
computation of the discrete function

    r_{pyx}(\tau) = \frac{1}{N} \sum_{k=0}^{N-1} \mathrm{sgn}[y_k] \, \mathrm{sgn}[x_{k-\tau}]    (2)

Complete positive correlation (r_pyx = +1) occurs when the
polarities of the input samples (assuming the mean of both inputs
to be zero) are at all times equal, yielding an average product
of +1. Complete negative correlation (r_pyx = -1) occurs when
the polarities of the input samples are never equal (inverse
proportionality), yielding an average product of -1. In the
case where the input samples are not related (r_pyx = 0) the sum
of the positive products will equal the sum of the negative
products, and the average product will be zero.
Implementation of polarity correlation requires an analogue
comparator circuit to convert sgn[x] = x/|x| and
sgn[y] = y/|y| into logic 1 if the signal is positive and logic
0 if the signal is negative. The time delay τ between the two
signals is achieved by using a digital shift register where a
particular value of delay is defined by the product of the
number of preceding shift register stages and the sample clock
period P. Multiplication is performed by the Boolean
coincidence function EXNOR, whose output is 1 only if the inputs
are both equal. If time-successive values of the coincidence
function F(τ) are summed in a digital counting circuit for a
period T seconds, where T = NP, then the contents of the
counter at the end of the period will be proportional to the
relevant value of the correlation function. The EXNOR
function can only be regarded as performing multiplication if the
logic 0 is allowed to represent -1. Thus, a logic 1 in the
coincidence signal would indicate 'increment by one' the
contents of the counter, and a logic 0 would indicate 'decrement
by one' the contents of the counter. This would necessitate the
use of up-down counters which are undesirable from a VLSI
circuit design point of view. However, it is possible to use
simple up-counters whose contents q(τ) can be related to the
correlation function in the following way. First, the contents
of an integrating counter are given by

    q(\tau) = \sum_{k=0}^{N-1} F_k(\tau)    (3)

where F_k(τ) is the coincidence function bit stream defined by

    F_k(\tau) = \tfrac{1}{2} + \tfrac{1}{2} \, \mathrm{sgn}[y_k] \, \mathrm{sgn}[x_{k-\tau}] = 1 \text{ or } 0    (4)

Thus, by substituting into eqn. 3,

    q(\tau) = \frac{N}{2} + \frac{N}{2} \, r_{pyx}(\tau)    (5)

Hence,

    r_{pyx}(\tau) = \frac{2 q(\tau)}{N} - 1    (6)

where r_pyx(τ) is the polarity correlation function as given by
eqn. 2. Thus eqn. 6 gives a measure of the correlation function
using the integration counter contents q(τ) after sampling N
times. At maximum positive correlation (r_pyx = +1) a
maximum count q(τ) = N is obtained after sampling N times.
In the case of maximum negative correlation (r_pyx = -1),
where the input samples are never equal, the coincidence
signal is always zero, resulting in a zero count, q(τ) = 0. In the
case of zero correlation (r_pyx = 0), a count of q(τ) = N/2 is
reached after sampling N times.
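The up-counter relation of eqn. 6 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, a sample of exactly zero is treated as positive (an assumption), and the sum is taken over the samples for which the delayed value x[k - tau] exists.

```python
def sgn(value):
    """Polarity of a sample; zero is treated as positive (an assumption)."""
    return 1 if value >= 0 else -1


def counter_contents(y, x, tau):
    """q(tau): number of coincidences (EXNOR = 1) between y[k] and x[k - tau]."""
    return sum(1 for k in range(tau, len(y)) if sgn(y[k]) == sgn(x[k - tau]))


def polarity_correlation(y, x, tau):
    """Eqn. 6: r = 2q/N - 1, with N the number of sample pairs counted."""
    n = len(y) - tau
    q = counter_contents(y, x, tau)
    return 2 * q / n - 1


y = [1.0, -2.0, 3.0, -4.0]
print(polarity_correlation(y, y, 0))                   # identical polarities
print(polarity_correlation(y, [-v for v in y], 0))     # opposite polarities
print(polarity_correlation([1, 1, 1, 1], [1, -1, 1, -1], 0))  # half coincide
```

The three cases reproduce the limits quoted in the text: a full count for complete positive correlation, a zero count for complete negative correlation, and a count of N/2 for zero correlation.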
Overloading counter technique: An alternative approach to
polarity correlation is based on an integrating overloading
counter technique,⁵ which eliminates the requirement for a
value of q(τ) to be at all times available. Instead, the
correlation function is computed using the number of samples
required to achieve count conditions, q(τ) = N, in a given
integrating counter. The concept of the technique is illustrated
by Fig. 1, which shows the relationship between the contents of
an integrating counter q(τ) and the number of samples, which
is now a variable m.
[Fig. 1: contents of the integrating counter plotted against the
contents of the sample counter m, with axis marks at N and 2N;
overload occurs at q = N.]
Fig. 1 Relationship between an integrating counter overload and the
contents of the sample counter
The number of samples m can be related
to the polarity correlation function by writing q(τ) as

    q(\tau) = N = \sum_{k=0}^{m-1} \left( \tfrac{1}{2} + \tfrac{1}{2} \, \mathrm{sgn}[y_k] \, \mathrm{sgn}[x_{k-\tau}] \right) = \frac{m}{2} + \frac{m}{2} \, r_{pyx}(\tau)    (7)

where

    r_{pyx}(\tau) = \frac{1}{m} \sum_{k=0}^{m-1} \mathrm{sgn}[y_k] \, \mathrm{sgn}[x_{k-\tau}]    (8)

Hence, in this case,

    r_{pyx}(\tau) = \frac{2N}{m} - 1 \quad \text{for } m \geq N    (9)

where N is the capacity of the integrating counter and m is the
number of samples required to achieve overload conditions in
the integrating counter corresponding to time delay τ. An
overload occurs after m = N samples when correlation is
maximum and positive. In the case of zero correlation an
overload occurs after m = 2N samples and after an infinite
number of samples when the correlation is maximum and
negative. Note that an overload cannot occur until m ≥ N.
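The overload relation r = 2N/m - 1 can be illustrated by counting samples until an integrating counter of capacity N fills. This is a behavioural sketch, not a model of the chip circuitry; the function names are hypothetical and the coincidence bit stream is supplied directly as a list of 0s and 1s.

```python
def samples_to_overload(coincidence_bits, capacity):
    """Return m, the number of samples taken when the counter reaches
    `capacity` coincidences, or None if it never overloads."""
    count = 0
    for m, bit in enumerate(coincidence_bits, start=1):
        count += bit
        if count == capacity:
            return m
    return None


def correlation_from_overload(m, capacity):
    """Polarity correlation estimate 2N/m - 1 for an overload after m samples."""
    return 2 * capacity / m - 1


N = 8
m_max = samples_to_overload([1] * 16, N)        # inputs always agree
m_zero = samples_to_overload([0, 1] * N, N)     # inputs agree half the time
print(m_max, correlation_from_overload(m_max, N))
print(m_zero, correlation_from_overload(m_zero, N))
print(samples_to_overload([0] * 100, N))        # inputs never agree
```

With N = 8 the counter overloads after 8 samples when the inputs always agree and after 16 samples when they agree half the time, and it never overloads when they never agree, in line with the three cases described above.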
A polarity correlator using the overloading counter
technique thus comprises a delaying shift register connected to a
parallel array of coincidence detectors and integrating
counters. An overload pattern shift register is used to inspect
the overload condition of the counters. The evolving pattern
of overload states defines the correlation function shape and
the time-delay position of the first integrating counter to
overload defines the position of the most significant peak of the
function. A sample counter is included to count the number of
input samples m so that the value of the correlation function
may be computed easily for any integrating counter to
overload. If the maximum capacity of the sample counter is set to
be twice the capacity of the integrating counters the
significance range is limited to 1 ≥ r ≥ 0. If it is required to
cover the range 1 ≥ r > -1, two correlator circuits working in
parallel can be used with one covering the positive range and
the other covering the negative range.
Such a system is most suitably realised using integrated-
circuit technology and an early device implemented 12 stages
of correlation using p-MOS technology. Fig. 2 shows the
block diagram of a polarity correlator with additional control
circuitry to realise a technique for displaying the correlation
function and to provide built-in self test and self repair. The
built-in self test and self repair mechanism automatically
detects and eliminates failed channels in the VLSI circuit. The
failed channels are short-circuited to maintain a series
connection of correctly operating channels. Design parameters of a
prototype chip, containing 28 parallel stages of correlation
and fabricated on a 5 µm n-channel MOS process are 4 MHz
sample rate with integration time programmable to a
maximum of 2^15 samples. The architecture shown allows
direct cascading of chips, without using additional
components, to give correlation delays of arbitrary length. Sample
rates up to 40 MHz with up to 512 parallel stages of
correlation per chip can be expected from available VLSI fabrication
processes.
Fig. 2 Layout diagram of polarity correlator using the overloading
integrating counter technique
Key: OSR, overload shift register; D, delay element; C, coincidence
function; L, one bit latch
Conclusions: A VLSI structure has been described which offers 
an attractive digital implementation of a high-speed polarity 
correlator. Individual chips may be directly cascaded to realise 
a correlator with arbitrary resolution or delay, in contrast to 
other digital correlator circuits, particularly those using the 
parallel counter technique, which cannot be easily cascaded 
and do not render regular VLSI structures. Furthermore the 
operating speed of parallel counter based correlators is limited 
by carry signal propagation delays whereas the correlator 
described here, which is composed mainly of simple shift regis-
ter stages can approach the maximum clocking rate of a 
chosen VLSI fabrication process. The architecture described, 
through built-in self test and self repair techniques, offers 
enhanced production yield and in-service reliability. 
Acknowledgments: This work was carried out under a UK 
Science & Engineering Research Council grant. 
W. S. BLACKLEY 	 20th July 1983 
M. A. JACK 
J. R. JORDAN 
Department of Electrical Engineering 
University of Edinburgh, King's Buildings 
Mayfield Road, Edinburgh EH9 3JL, Scotland 
References
1 HAYES, A. M., and MUSGRAVE, G.: 'Correlator design for flow
measurement', Radio & Electron. Eng., 1973, 43, pp. 363-368
2 VAN VLECK, J. H., and MIDDLETON, D.: 'The spectrum of clipped
noise', Proc. IEEE, 1966, 54, pp. 2-19
3 SWARTZLANDER, E. E.: 'Parallel counters', IEEE Trans., 1973, C-22,
pp. 1021-1024
4 DADDA, L.: 'Composite parallel counters', ibid., 1980, C-29, pp.
942-946
5 JORDAN, J. R., and BECK, M. S.: 'Correlation function display and
peak detection', Electron. Lett., 1972, 24, pp. 602-604
ADVISORY GROUP FOR AEROSPACE RESEARCH & DEVELOPMENT 
7 RUE ANCELLE 92200 NEUILLY SUR SEINE FRANCE 
NORTH ATLANTIC TREATY ORGANIZATION 
BUILT-IN TEST AND SELF REPAIR MECHANISMS 
IN A DIGITAL CORRELATOR INTEGRATED CIRCUIT 
by 
W.S. Blackley, M.A. Jack, J.R. Jordan 
Department of Electrical Engineering 







A VLSI digital correlator architecture which incorporates built-in self test 
and self repair mechanisms is described. The architecture offers testability and
reliability, and the overhead for the test and repair circuitry is only one latch 
and two multiplexers per correlator stage. The correlator has been fabricated on a 
5-micron nMOS process and results from the first batch of processed chips are 
reported. 
INTRODUCTION 
The advantages in terms of increased complexity, improved performance, reduced 
costs and new systems applications made available as silicon integrated circuit 
technology matures from the level of large scale integration (LSI) to very large 
scale integration (VLSI) have been widely recognised. However, one important facet
of integrated circuit technology which lags dangerously behind the complexity
potential of VLSI, is the problem of establishing the integrity of the VLSI design in
terms of initial design validation, manufacturing quality and longer term opera-
tional reliability [1,2]. 
This paper addresses the need to embody a testability scheme within the VLSI 
integrated circuit itself and presents details of a digital polarity correlator 
architecture with built-in self test (and self-repair) mechanisms. The concept is 
demonstrated using results obtained from a prototype integrated circuit chip which 
has been fabricated in 5-micron enhancement/depletion n-channel MOS technology.
Correlation techniques are widely used in communications, instrumentation, com-
puters, telemetry, sonar, radar, medical and other signal processing systems 
[3,4,5]. The desirable properties of correlation include the ability to detect a 
desired signal in the presence of noise or other signals; the ability to recognise 
specific patterns, and the ability to measure time delays through various media. 
Electronic systems for computation of the correlation function have been avail-
able for many years, but they have been large and inefficient. With the development 
of VLSI, correlation can now be performed efficiently, with a minimal number of
components.
The correlator chip presented here consists of a linear cascade of identical
correlation elements. The performance of the correlator depends on the serial
connection of correctly functioning correlation elements. To optimise the performance
and gain full advantage of the VLSI architecture a design strategy was adopted which
includes testability, yield enhancement, and reliability improvement.
TEST STRATEGY REQUIREMENTS IN VLSI DESIGN
A VLSI test strategy must ideally allow for a range of differing test
environments to be experienced by the circuit during its operational service. These
environments can be summarised as:
a) prototype characterisation; to include design validation and parametric
testing.
b) production test; to include yield enhancement features.
c) service or maintenance test; to include self-repair features.
In prototype characterisation it is essential to identify and localise
individual faults to enable fault diagnosis and correction. Prototype faults may be
process-related faults statistically distributed over a processed wafer, or they may
be design faults (errors) such as forgotten contact holes, wrong interconnections or
excessive signal delays. Prototype testing is invariably carried out by the
designer(s) using automatic test equipment (ATE), microprobing or electron beam
facilities.
Production test requirements include both process quality checks and functional
checks. Process quality control is achieved either by a number of chip-size
drop-in replacements spaced over the wafer or by using a small test area on each chip.
Measures of transistor parameters, contact resistance and capacitance values are
made to check production tolerances. In production test, functional (and
parametric) tests must be minimised since here testing time and costs are important.
Functional tests need only yield a limited number of the significant internal states
since it is not generally possible to redesign or repair at this stage.
In maintenance and systems test, fault diagnosis is precluded so a simple
GO/NO-GO indication for the circuit is adequate.
The correlator architecture considered here incorporates design for test which
offers the potential of valid use at each stage in the life of a VLSI circuit. To
appreciate the ease with which this architecture has been adapted to perform self
test and self repair, the concept of polarity correlation and its silicon
realisation must be discussed.
POLARITY CORRELATION
Polarity correlation is based on the computation of the discrete function,

    r_{pyx}(\tau) = \frac{1}{N} \sum_{k=1}^{N} \mathrm{sgn}[y_k] \, \mathrm{sgn}[x_{k-\tau}]    (1)

where r_pyx(τ) is the value of the correlation function between two signals, x and y.
Complete positive correlation occurs when the polarities of the input samples
(assuming the mean of both inputs to be zero) are at all times equal, yielding an
average product of +1. Complete negative correlation occurs when the polarities of
the input samples are never equal (inverse proportionality), yielding an average
product of -1. In the case where the input samples are not related, the sum of the
positive products will equal the sum of the negative products and the average
product will be zero.
Implementation of polarity correlation requires an analogue comparator circuit
to convert sgn[x] = x/|x| and sgn[y] = y/|y| into logic 1 if the signal is positive and
logic 0 if the signal is negative. The time delay τ between the two signals is
achieved by using a digital shift register where a particular value of delay is
defined by the product of the number of preceding shift register stages and the
sample clock period, P. Multiplication is performed by the Boolean coincidence
function, EXNOR, whose output is 1 only if the inputs are both equal. If
time-successive values of the coincidence function are summed in a digital counting
circuit for a period T seconds, where T = NP, then the contents of the counter at the
end of the period will be proportional to the relevant value of the correlation
function.
Polarity correlation methods minimise the complexity of the computational
elements by discarding the magnitude information of the input sequences. Digital
design techniques can then be employed to realise a more economical and more compact
implementation than would otherwise be achieved, the penalty for which is an
increase in integration time to obtain a correlation function with acceptable
variance [6]. The polarity correlation function is nonlinearly related to the (direct)
correlation function by the Van Vleck arc sine relation [7] for input sequences
which have Gaussian statistics.
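The arc sine relation can be checked numerically. The sketch below assumes the standard form of the relation for zero-mean, jointly Gaussian inputs with direct correlation coefficient rho, namely r_pyx = (2/π) arcsin(rho); this form is not written out in the text above. The function names, the sample size and the random seed are illustrative choices.

```python
import math
import random


def arcsine_law(rho):
    """Polarity correlation predicted for Gaussian inputs with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def simulated_polarity_correlation(rho, n, seed=1):
    """Average product of polarities over n pairs of correlated Gaussian samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y has unit variance and correlation rho with x
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        total += 1 if (x >= 0) == (y >= 0) else -1
    return total / n


rho = 0.5
print(arcsine_law(rho))
print(simulated_polarity_correlation(rho, 20000))
```

The simulated value approaches the predicted one only as the number of samples grows, which illustrates the integration time penalty noted above.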
Previously reported techniques [8,9] for obtaining the polarity correlation
function have included parallel counters [10,11] which are not directly cascadable
and hence non-optimal for VLSI implementation. This paper describes an
interpretation of the polarity correlation function which permits the elimination of parallel
counters and results in a highly regular correlator structure amenable to VLSI
implementation. As a consequence direct cascading of correlator stages to any
arbitrary level is possible.
The structure is based on an integrating overloading counter technique [12,13],
in which the correlation function is computed using the number of input samples
taken to achieve overload count conditions in a given integrating counter. An
overload flag bit for each counter is used instead of the counter contents. This
reduces the complexity of the structure to bit-serial input and output. The number
of input samples, m, can be related to the polarity correlation function by [14]

    r_{pyx}(\tau) = \frac{2N}{m} - 1 \quad \text{for } m \geq N    (2)

where N is the capacity of the integrating counter and m is the number of samples
required to achieve overload conditions in the integrating counter corresponding to
time delay τ. An overload occurs after m = N samples when correlation is maximum
and positive. In the case of zero correlation an overload occurs after m = 2N
samples and after an infinite number of samples when the correlation is maximum and
negative. Note that an overload cannot occur until m ≥ N.
A polarity correlator using the overloading counter technique is shown in
Figure 1. It comprises a delaying shift register connected to a parallel array of
coincidence detectors and integrating counters. An overload pattern shift register
is used to inspect the overload condition of the counters. The evolving pattern of
overload states defines the correlation function shape and the time delay position
of the first integrating counter to overload defines the position of the most
significant peak of the function. A sample counter is included to count the number of
input samples, m, so that the value of the correlation function may be computed for
any integrating counter to overload. If the maximum capacity of the sample counter
is set to be twice the capacity of the integrating counters the significance range
is limited to 1 ≥ r ≥ 0. If it is required to cover the range 1 ≥ r > -1, two
correlator circuits working in parallel can be used with one covering the positive
range and the other covering the negative range.
CORRELATOR ARCHITECTURE FOR SELF TEST AND SELF REPAIR
The VLSI architecture considered here consists of a long series connection of
identical correlation stages. If any one of these stages suffers faults during
manufacture or becomes faulty during service then complete chip failure will be
experienced. A self-test and self-repair structure has been devised to overcome
this problem. The self-test sequence is initiated each time the chip is switched on
and any faulty stages discovered as a result of these tests will be automatically
bypassed so that the working stages are reconfigured to form a continuous serial
connection. Faults developing during the working life of the chip will thus be
automatically eliminated every time the chip is switched on. The self-test control
circuit must offer high reliability and therefore employs redundant circuit
techniques; however, assuming fault conditions to be evenly distributed over the chip
area, it can be expected that the majority of faults will be experienced in the large
area taken by the integrating counters. Using these self-test and repair
strategies, the overall manufacturing yield of good working chips is enhanced and longer
working life can be expected.
The principal additions to the basic correlator stage of Figure 2(a) to allow
it to perform built-in self-test and self-repair are shown in Figure 2(b). The
delay shift register (DSR) and the overload pattern shift register (OSR) each have a
2 to 1 multiplexer added and a multiplexer control register (MCR) has been included
to store the control information for these multiplexers. Full functional testing is
possible due to the extent of the link between design and test. A large degree of
circuit partitioning is incorporated in the design and this, coupled with the DSR,
OSR and MCR shift registers acting as scan paths [15] allows all the internal states
to be controlled and observed.
Figure 1. Layout diagram of digital polarity correlator using the overloading ...
KEY: OSR, overload shift register; D, delay element; C, coincidence function.
The key feature in the self repair mechanism is the Multiplexer Control
Register (MCR) which, after the self-test sequence, contains the pass/fail status for
each stage. A circuit schematic of the MCR and one multiplexer is shown in Figure
3. In the case of a failure the input and output registers of the correlator stage
are bypassed using the multiplexers, thereby short circuiting the malfunctioning
stage. The number of functioning stages on the chip can be read out serially from
the MCR by reconfiguring it as a shift register. This parameter represents the
maximum attainable correlation delay and can be used for chip reject/accept decisions
in production test. The self-test and repair sequence may be repeated as required










Figure 3. Zoom in on floor plan: Bypass loop and multiplexer control register.
TEST STRATEGY
The correlator operates in three distinct modes: initial test, self-test and
repair, and run. During the initial test period three simple tests are carried out
on the most basic elements of the design, namely the scan path registers. These
registers (DSR, MCR and OSR) and their various control functions are tested to check
that a chip is acceptable immediately after fabrication. The initial test sequence
is as follows:
a) Test DSR, OSR and MCR as shift registers and measure their delay.
b) Test the effect of the MCR on the DSR and OSR registers. This is done by
shifting n ones into the MCR and then measuring the delay of the DSR and OSR
registers, which should each be reduced by n.
c) Test the parallel load facilities of the DSR, OSR and MCR registers.
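The second of these tests can be illustrated with a small behavioural model of a shift register whose stages may be bypassed. It is a sketch under stated assumptions, not a description of the chip circuitry: a logic 1 in the MCR is taken to select the bypass path for the corresponding stage, and the function name `measure_delay` is hypothetical.

```python
def measure_delay(mcr):
    """Clock a single 1 into a shift register chain and return the number of
    clock cycles before it appears at the output.

    mcr[i] == 1 means stage i is bypassed (assumed polarity); a bypassed
    stage passes its input straight to the next stage.
    """
    regs = [0] * len(mcr)
    for clock in range(len(mcr) + 1):
        signal = 1 if clock == 0 else 0          # input to the first stage
        for i, bypass in enumerate(mcr):
            if not bypass:
                # the stage stores its input and presents its old contents
                regs[i], signal = signal, regs[i]
        if signal:
            return clock
    raise RuntimeError("marker bit never reached the output")


stages = 28
print(measure_delay([0] * stages))                       # no stage bypassed
print(measure_delay([1] * 5 + [0] * (stages - 5)))       # five ones in the MCR
```

With 28 stages and five ones shifted into the MCR, the measured delay falls from 28 to 23 clock cycles, a reduction equal to the number of ones.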
The self-test period is where the chip effectively tests itself and
reconfigures its registers so that all of the working stages are connected in series. In
this test the following sequence is repeated four times according to the possible
combinations of the two binary input signals, X and Y.
a) Reset Latches and Integrating Counters. The counters are loaded with 4000-hex,
a number corresponding to the maximum integration time of 2^15 - 2 = 32766
sample clock cycles.
b) Set up the input conditions (X and Y) by setting or clearing the DSR register
as required. Shift X and Y through correlator for 32766 clock cycles.
c) Parallel load Latches into OSR. The overload pattern may be shifted out for
observation.
The self repair sequence follows the self test sequence. During the self test
sequence the overload signal is compared with the expected value of the overload
signal, and any deviation from the expected signal results in a logic 1 being stored
in the corresponding Latch. Thus, when the self test sequence has finished, the logic
1s and 0s stored in the Latches are the results of the self test, where a logic 1
indicates a faulty stage. The self repair operation essentially transfers this
information to the Multiplexer Control Register, which in turn causes the faulty
stages to be bypassed. The net effect is a series connection of correctly operating
correlation stages.
The run period follows automatically after the self-test and repair sequence is
completed. Note that after the test the contents of the MCR may be inspected to
ensure that enough of the correlator stages are working to satisfy the requirements
of the system into which the chip is to be installed.
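The self-test and self-repair sequence described above can be sketched behaviourally as follows. This is a minimal model, not the chip's logic: it assumes a healthy stage overloads only when X and Y coincide for the whole integration period, and the stuck-at fault model and all function names are hypothetical.

```python
from itertools import product

def stage_overload(x, y, stuck_at=None):
    """Overload flag of one stage after a full integration period.

    Assumption: a healthy counter overloads only when the inputs coincide
    (x == y) for the whole period. `stuck_at` is a hypothetical fault model.
    """
    return int(x == y) if stuck_at is None else stuck_at

def self_test(stuck_faults):
    """Run the four X/Y tests; return one latch bit per stage (1 = faulty)."""
    latches = [0] * len(stuck_faults)
    for x, y in product((0, 1), repeat=2):
        expected = int(x == y)
        for i, fault in enumerate(stuck_faults):
            if stage_overload(x, y, fault) != expected:
                latches[i] = 1  # any deviation is latched and stays set
    return latches

def self_repair(latches):
    """Copy the latch pattern into the MCR; a 1 bypasses that stage."""
    mcr = list(latches)
    return mcr, mcr.count(0)

faults = [None] * 28
faults[26] = 0                      # stage 27: overload output stuck at 0
mcr, working = self_repair(self_test(faults))
print(working)                      # number of stages left in series
```

The value of `working` plays the role of the stage count read out serially from the MCR for accept/reject decisions.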
PROTOTYPE DESIGN
A prototype digital correlator featuring self-test and self-repair has been
fabricated on a 5 micron n-channel MOS process. The prototype design contains 28
parallel stages of correlation, each of which implements the block diagram of Figure
2(b). The area of the chip is 5.08 mm by 5.08 mm.
The layout of the two parallel stages of correlation is shown, annotated, in
Figure 4. Each stage is composed of cells which may be repeated by abutting in the
y-direction. The largest cell is the presettable PRBS counter, which has 15 shift
register stages and thus a maximum count of approximately 32K samples. The layout
of the presettable counter is in the form of a ring in order to minimise the circuit
delays between each shift register stage. The correlator design is semi-static
throughout. This means that the clock frequency, and thus the sampling frequency of
the correlator, can range from d.c. to 4 MHz (for this fabrication process). From
Figure 4 it may be seen that the presettable counter occupies most of the active
area of the chip. Also shown is the area taken up by the self-test and repair
circuitry. The overhead for self-test and self-repair is approximately 6%.
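To make the "15 shift register stages, approximately 32K samples" figure concrete, the sketch below steps a 15-bit linear feedback shift register and measures its cycle length. The feedback taps are an assumption (the standard PRBS-15 polynomial x^15 + x^14 + 1); the source does not state which taps the chip uses, and the function names are illustrative only.

```python
def lfsr15_step(state):
    """Advance a 15-bit shift register with XOR feedback.

    Assumed taps: stages 15 and 14 (polynomial x^15 + x^14 + 1).
    """
    feedback = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | feedback) & 0x7FFF

def cycle_length(seed=0x4000):
    """Count clock cycles until the register returns to its preset value."""
    state, steps = lfsr15_step(seed), 1
    while state != seed:
        state = lfsr15_step(state)
        steps += 1
    return steps

print(cycle_length())  # length of the maximal sequence for the assumed taps
```

A maximal-length 15-stage register visits 2^15 - 1 states, which is the order of magnitude quoted in the text; presetting the register to a different state shortens the count before a chosen overload state is reached.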
Figure 4. nMOS layout of 2 stages of correlation.
TEST RESULTS 
The correlator chip has been functionally tested using a Tektronix DAS 9100
Digital Analysis System coupled to a Teledyne Probe Station. Initially 10 packaged
chips, which had passed a visual inspection, were functionally tested. However, many
more samples were required to demonstrate the yield enhancement capability of this
design, so the remaining wafers were probe-tested. Unfortunately, only 130
candidates were available for testing since the chip was fabricated as part of a
multi-project wafer. More wafers are, however, to be processed.
Figure 5(a) and (b) show some of the input and output waveforms from two
correlator chips that occurred during the self test and repair period. For display
purposes the integration time of the correlator has been reduced to just 15 clock
cycles. Figure 5(a) shows the correlation output of a golden chip, while Figure
5(b) shows the output of a chip which has one failed stage. The top four traces in
each figure represent the inputs to the device. In each figure the X and Y inputs
sequence through their four possible combinations in accordance with the test
strategy described above. For clarity, the control signals which cause, for example,
parallel load OSR, or reset counters, have not been shown.
Figure 5. Test results from two correlator chips: a golden chip (a) and a chip
with one faulty stage (b). Logic-analyser timing diagrams; the traces include X OP,
OSR OP, MCR OP and the overload output.
The significant points to note in Figure 5 are the Multiplexer Control Register
input (MCR IP) and the Overload Shift Register output (OSR OP). All the other signals
shown are the same for both chips. With reference to Figure 5 and moving left to
right from the cursor, the overload output (OVRFLO) has changed from logic 1 to 0.
This indicates that at least one of the integrating counters has overloaded after
the prescribed period of 15 clock cycles (see above). This result is expected since
the inputs have been equal (X=0, Y=0) over this period.
When OVRFLO next goes high, the correlator has been reset and the next
correlation test (X=0, Y=1) is begun. Also at this time, the overload pattern, i.e.
the contents of the Latches, is transferred to the OSR and shifted out for display.
Now we can see the difference between the golden chip, Figure 5(a), and the faulty
chip, Figure 5(b). The OSR should contain a series of 28 logic 1s, and in Figure
5(b) there is a logic 0 in position number 27, indicating a fault in stage 27. The
correlation test is repeated for the remaining combinations of X and Y, and the
fault is again exposed on the OSR output in the case where X = Y = 1.
Self repair is then carried out on the faulty chip. A single logic 1 is shifted
into bit position 27 of the MCR, which causes stage 27 to be bypassed. The
correlation test, with X=Y=1, is repeated several times at a period of 27 rather
than 28, and the incorrect logic 0 on the OSR output is eliminated. The result is a
golden chip containing 27 stages of correlation.
YIELD ENHANCEMENT
This section contains the results of the first 130 processed chips. The
results are preliminary and the sample is small. Figure 6 shows a chart of number
of chips plotted against number of working stages. It shows that 29 of the 130
candidates passed the initial test and that 27 of these yielded more than 20
stages of correlation.









Figure 6. Number of chips vs. number of working stages (horizontal axis: number of working stages, 0 to 28).
Listed below are the test results for each wafer. The multi-project wafers
each contained 24 correlator chips.

                             Without self repair    With self repair
                             (28 stages working)    (i.e. >20 stages working)
Packaged (10 candidates)              0                      2
Wafer #1 (24 candidates)              1                      5
Wafer #2 (24 candidates)              0                      5
Wafer #3 (24 candidates)              0                      6
Wafer #4 (24 candidates)              0                      0
Wafer #5 (24 candidates)              2                      9
                                    ====                   ====
TOTALS (130 candidates)               3                     27

YIELD with no yield enhancement:  2.3%
YIELD with yield enhancement:    20.7%
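The totals and yields in the table can be checked with a few lines of arithmetic. The data below are copied from the table; the variable names are illustrative only.

```python
# (batch, candidates, good without self repair, good with self repair)
results = [
    ("Packaged", 10, 0, 2),
    ("Wafer 1", 24, 1, 5),
    ("Wafer 2", 24, 0, 5),
    ("Wafer 3", 24, 0, 6),
    ("Wafer 4", 24, 0, 0),
    ("Wafer 5", 24, 2, 9),
]

total = sum(r[1] for r in results)
good_plain = sum(r[2] for r in results)        # all 28 stages working
good_repaired = sum(r[3] for r in results)     # more than 20 stages working

yield_plain = good_plain / total
yield_repaired = good_repaired / total
enhancement = good_repaired / good_plain       # ratio of the two yields

print(total, good_plain, good_repaired)
print(f"{100 * yield_plain:.2f}% -> {100 * yield_repaired:.2f}%, factor {enhancement:.1f}")
```

Note that 27/130 is about 20.77 percent, which the table quotes as 20.7, and that the enhancement factor is simply 27/3.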
CONCLUSIONS 
A digital polarity correlator architecture which incorporates all of the 
required features of a built-in self-test and repair strategy has been described. 
The test strategy will carry the design through all of the varying test requirements 
to be encountered by the chip. Incorporating extra stages of correlation on-chip 
permits the use of self-repair mechanisms for enhanced production yield and in-
service reliability. 
The VLSI structure offers an attractive digital implementation of a high speed 
polarity correlator. Individual chips may be directly cascaded to realise a corre-
lator with arbitrary resolution or delay, in contrast to other digital correlator 
circuits, particularly those using the parallel counter technique, which cannot be 
easily cascaded and do not render regular VLSI structures. Furthermore, the operat-
ing speed of parallel counter based correlators is limited by carry signal propaga-
tion delays whereas the correlator described here, which is composed mainly of sim-
ple shift register stages can approach the maximum clocking rate of a chosen VLSI 
fabrication process. 
The results from the functional testing of the first batch of processed chips 
have been reported. They demonstrate that a considerable improvement in yield can 
be obtained at a very low circuit overhead. A yield enhancement factor of 9.0 has 
been obtained for the initial sample of 130 chips. In addition, this chip can be
given an exhaustive functional test in less than 150 ms at 1 MHz.
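The quoted test time can be sanity-checked from figures given earlier in the paper, under the assumption (not stated in the source) that the test is dominated by the four full-length integration periods of the self-test sequence:

```latex
\begin{align*}
  \text{clock cycles} &\approx 4 \times (2^{15} - 2) = 4 \times 32766 = 131064, \\
  t_{\text{test}} &\approx \frac{131064}{1\ \text{MHz}} \approx 131\ \text{ms} < 150\ \text{ms}.
\end{align*}
```

On this reading, roughly 19 ms would remain for the initial register tests and for shifting the overload pattern out of the OSR.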
References
1.  T. W. Williams, "Design for Testability: What's the Motivation?," VLSI Design,
    pp. 21-23 (October 1983).
2.  E. B. Eichelberger and E. Lindbloom, "Trends in VLSI Testing," pp. 339-348 in
    VLSI 83, ed. F. Anceau and E. J. Aas, Elsevier Science Publishers B. V. (North
    Holland) (1983).
3.  J. S. Bendat and A. G. Piersol, Engineering Applications of Correlation and
    Spectral Analysis, John Wiley and Sons, Chichester (1980). A Wiley-Interscience
    Publication.
4.  Y. W. Lee, T. P. Cheatham Jr., and J. B. Wiesner, "Application of Correlation
    Analysis to the Detection of Periodic Signals in Noise," Proc. IRE, Vol. 38,
    pp. 1165-1171 (October 1950).
5.  J. E. Tanner and C. Mead, "A Correlating Optical Motion Detector," Proc. Conf.
    on Advanced Res. in VLSI, MIT, Cambridge, MA, pp. 57-64 (January 1984).
6.  A. M. Hayes and G. Musgrave, "Correlator design for flow measurement," The
    Radio & Electronic Engineer, Vol. 43, pp. 363-368 (June 1973).
7.  J. H. Van Vleck and D. Middleton, "The Spectrum of Clipped Noise," Proc. IEEE,
    Vol. 54, pp. 2-19 (January 1966).
8.  Eldon, "Correlation - A Powerful Technique for Digital Signal Processing,"
    Application Notes, TRW LSI Products, California, Vol. TP-17, pp. 1-22 (1981).
9.  W. Current, "A High Data-Rate Digital Output Correlator Design," IEEE Trans.
    Comput., Vol. C-29, pp. 403-405 (May 1980).
10. E. E. Swartzlander Jr., "Parallel Counters," IEEE Trans. Comput., Vol. C-22,
    pp. 1021-1024 (November 1973).
11. L. Dadda, "Composite Parallel Counters," IEEE Trans. Comput., Vol. C-29,
    pp. 942-946 (October 1980).
12. J. R. Jordan and M. S. Beck, "Correlation Function Display and Peak Detection,"
    Electron. Lett., Vol. 8, pp. 602-604 (November 1972).
13. W. S. Blackley, M. A. Jack, and J. R. Jordan, "Digital Polarity Correlator," UK
    Patent Application Nos. 8306797 and 8300699 (11th March 1983).
14. W. S. Blackley, M. A. Jack, and J. R. Jordan, "VLSI Digital Polarity Correlator
    Based on an Overloading Counter Technique," Electron. Lett., Vol. 19, pp.
    761-762 (September 1983).
15. T. W. Williams and K. P. Parker, "Design for Testability - A Survey," IEEE
    Trans. Comput., Vol. C-31, pp. 2-15 (January 1982).
ACKNOWLEDGEMENTS
This work was carried out under a UK Science & Engineering Research Council grant.
This chip's test and repair overhead is only one latch
and two multiplexers per correlator stage. Yield on the
first batch processed was enhanced nine to one.
A Digital Polarity 
Correlator with Built-in 
Self Test and Self Repair 
William S. Blackley, Mervyn A. Jack, and James R. Jordan 
University of Edinburgh 
An earlier version of this article 
appeared in the International Test 
Conference Proceedings, October 1983. 
The maturing of silicon integrated circuit technology from large- 
scale to very large scale integration 
has improved performance, reduced 
costs, and opened new systems ap-
plications. However, one important 
facet of integrated circuit technology 
lags dangerously behind the complex-
ity potential of VLSI: establishing the 
integrity of the VLSI design in terms 
of initial design validation, manufacturing quality, and long-term
operational reliability [1,2].
This article addresses the need to 
embody a testability scheme within the 
VLSI integrated circuit itself. It also 
presents details of a digital polarity  
correlator architecture with built-in 
self-test and self-repair mechanisms. 
Results obtained from a prototype 
integrated circuit chip fabricated in 
five-micron enhancement/depletion 
N-channel MOS technology demon-
strate the concept. 
Correlation techniques are widely 
used in communications, instrumen-
tation, computers, telemetry, sonar, 
radar, medical, and other signal pro-
cessing systems. Desirable correla-
tion properties include the ability to 
detect a desired signal in the presence 
of noise or other signals, to recognize 
specific patterns, and to measure time 
delays through various media. 
Electronic systems for computation of the correlation function have
been available for many years, but they have been large and inefficient.
With the development of VLSI, correlation can be performed efficiently
and with fewer components.
Our correlator chip consists of a linear cascade of identical correlation
elements. The performance of the correlator depends on the serial
connection of correctly functioning correlation elements. To optimize
performance and gain full advantage of the VLSI architecture, we adopted a
design strategy that includes testability, enhances yield, and improves
reliability.
Summary 
Correlation techniques are widely used in communications, in-
strumentation, computers, telemetry, and other signal processing 
systems to detect a desired signal in the presence of noise, to recognize 
patterns, and to measure time delays. With the development of VLSI,
correlation can be performed efficiently with a minimal number of components.
The correlator chip presented in this article consists of a linear 
cascade of identical elements; failure of any one element causes com-
plete chip failure. Therefore, we devised a self-test and self-repair struc-
ture to automatically bypass faulty stages. 
The overhead for self test and self repair was approximately six per-
cent of the chip area. The results of functional testing of the first batch of 
processed chips demonstrated a nine-to-one yield enhancement and an 
exhaustive functional test time of less than 150 milliseconds. The self-
repair mechanism provides high in-service reliability. 
0740-7475/84/0500-0042501.00 © 1984 IEEE 	 IEEE DESIGN & TEST 
Test strategy requirements in VLSI design

Ideally, a VLSI test strategy allows a circuit to experience a range of
test environments during its operational service. In summary, these
environments are

• prototype characterization, to include design validation and parametric
  testing;
• production test, to include yield-enhancement features; and
• service or maintenance test, to include self-repair features.

In prototype characterization, it is essential to identify and localize
individual faults to enable fault diagnosis and correction. Prototype faults
can be process-related and statistically distributed over a processed wafer,
or they can be design faults (errors), such as omitted contact holes, wrong
interconnections, or excessive signal delays. Prototype testing is invariably
carried out with automatic test equipment, microprobing, or electron
beam facilities.

Production test requirements include both process quality checks and
functional checks. Process quality control is achieved by means of a
number of chip-size, drop-in replacements spaced over the wafer or by
dedicating a small area on each chip to testing.

Transistor parameters, contact resistance, and capacitance values are
measured in order to check production tolerances. In production test,
functional (and parametric) tests must be minimized, since testing time and
costs are important. Functional tests need only yield a limited number of
the significant internal states, since it is not generally possible to
redesign or repair at this stage.

In maintenance and systems test, fault diagnosis is precluded; a simple
GO/NO-GO indication for the circuit is adequate.

The correlator architecture considered here incorporates a design for
test with the potential for valid use at each stage in the life of a VLSI
circuit. The ease with which this architecture has been adapted to perform
self test and self repair can only be discussed within the context of
polarity correlation and its silicon realization.
Polarity correlation 
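Polarity correlation is based on the time average (denoted by the overbar)
of the product of the signal polarities,

$$ r(\tau) = \overline{\operatorname{sgn}[x(t-\tau)]\,\operatorname{sgn}[y(t)]} \qquad (1) $$
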
where r(τ) is the value of the correlation function between two signals, x
and y. Sgn[x] means signum[x], a function of value +1 for positive x and
-1 for negative x. Complete positive correlation occurs when the polarities
of the input samples (assuming the mean of both inputs to be zero) are at
all times equal, yielding an average product of +1. Complete negative
correlation occurs when the polarities of the input samples are never equal
(inverse proportionality), yielding an average product of -1. In the case
where the input samples are not related, the sum of the positive products
will equal the sum of the negative products, and the average product will
be zero.
Implementation of polarity correlation requires an analog comparator
circuit to convert sgn[x] = x/|x| and sgn[y] = y/|y| into logic 1 if the
signal is positive and logic 0 if the signal is negative. The time delay τ
between the two signals is achieved by using a digital shift register in
which the product of the number of preceding shift register stages and the
sample clock period P defines a particular value of delay.
Multiplication is performed by the Boolean coincidence function, EX-NOR,
whose output is 1 only if the inputs are equal. If time-successive values of
the coincidence function are summed in a digital counting circuit for a
period T seconds, where T = NP, then the contents of the counter at the
end of the period will be proportional to the relevant value of the
correlation function.
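The comparator, shift-register delay and coincidence counter described above can be sketched in software as follows. This is an illustrative model only: the function names are hypothetical, the delay is expressed in whole sample periods, and the count of coincidences c over n samples is mapped to the average product as 2c/n - 1.

```python
def polarity(samples):
    """Comparator: logic 1 for positive samples, logic 0 otherwise."""
    return [1 if s > 0 else 0 for s in samples]

def coincidence_count(x_bits, y_bits, delay):
    """Count EX-NOR coincidences between x delayed by `delay` samples and y."""
    return sum(
        1 for k in range(delay, len(y_bits)) if x_bits[k - delay] == y_bits[k]
    )

def polarity_correlation(x, y, delay):
    """Average product of the signal polarities at the given delay."""
    x_bits, y_bits = polarity(x), polarity(y)
    n = len(y_bits) - delay                 # number of samples compared
    c = coincidence_count(x_bits, y_bits, delay)
    return 2 * c / n - 1                    # (+1 per match, -1 per mismatch) / n

x = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0]
y = [0.0] + x[:-1]                          # y is x delayed by one sample
print(polarity_correlation(x, y, delay=1))  # polarities always agree
print(polarity_correlation(x, [-v for v in x], delay=0))  # never agree
```

Only the sign of each sample is used, which is why the counter needs no multiplier and why a longer integration time is required for a given variance.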
Polarity correlation methods minimize the complexity of the computational
elements by discarding the magnitude information of the input sequences.
Digital design techniques can then be employed to realize a more economical
and compact implementation than could otherwise be achieved. The penalty is
the increased integration time needed to obtain a correlation function with
acceptable variance [6]. The polarity correlation function is nonlinearly
related to the (direct) correlation function by the Van Vleck arc sine
relation [7] for input sequences with Gaussian statistics.
Previously reported techniques [8,9] for obtaining the polarity correlation
function have included parallel counters [10,11], which are not directly
cascadable and hence nonoptimal for VLSI implementation. Our interpretation
of the polarity correlation function permits elimination of parallel
counters and results in a highly regular correlator structure amenable to
VLSI implementation. As a consequence, correlator stages can be directly
cascaded to any arbitrary level.
The structure is based on an integrating overloading counter technique
[12,13] in which the correlation function is computed by using the number of
input samples needed to reach overload count conditions in a given
integrating counter. Under such conditions, an overload flag bit for each
counter is used instead of the counter contents. This reduces the complexity
of the structure to bit-serial input and output. The number of input samples
m can be related to the polarity correlation function by

$$ r(\tau) = \frac{2N}{m} - 1 \quad \text{for } m \ge N \qquad (2) $$

where N is the capacity of the integrating counter and m is the number
of samples required to achieve overload conditions in the integrating
counter corresponding to time delay τ [14]. An overload occurs after m = N
samples when correlation is maximum and positive. In the case of zero
correlation, an overload occurs after m = 2N samples, and after an infinite
number of samples when the correlation is maximum and negative. An overload
cannot occur until m ≥ N.
Figure 1 shows a polarity correlator that uses the overloading counter
technique. It consists of a delaying shift register connected to a parallel
array of coincidence detectors and integrating counters. An overload pattern
shift register inspects the overload condition of the counters. The evolving
pattern of overload states defines the correlation function shape, and the
time-delay position of the first integrating counter to overload defines
the position of the most significant peak of the function. A sample counter
is included to count the number of input samples m, so that the value of
the correlation function can be computed for any overloaded integrating
counter. If the maximum capacity of the sample counter is set to be twice
the capacity of the integrating counters, the significance range is limited
to 1 ≥ r ≥ 0. If it is required to cover the range 1 ≥ r ≥ -1, two
correlator circuits working in parallel can be used, one to cover the
positive range and one to cover the negative range.

Figure 1. Layout diagram of digital polarity correlator that uses the
overloading integrating counter technique. Key: OSR, overload shift
register; D, delay element; C, coincidence function; L, one-bit latch.
Blocks shown: sample counter (samples, m), overloading integrating
counters, preset counter capacity.

Correlator architecture for self test and self repair

The VLSI architecture considered here consists of a long series connection
of identical correlation stages. If any one of these stages suffers faults
during manufacture or becomes faulty during service, the whole chip will
fail.
We have devised a self-test and self-repair structure to overcome this
problem. The self-test sequence is initiated each time the chip is switched
on; any faulty stages discovered as a result of these tests are
automatically bypassed. This reconfigures the working stages into a
continuous serial connection. Faults developing during the working life of
the chip are thus automatically eliminated every time the chip is switched
on.
Since the self-test control circuit must offer high reliability, it employs
redundant circuit techniques. Assuming fault conditions to be evenly
distributed, the majority of faults are likely to occur in the large area
occupied by the integrating counters.
Figure 2a shows the basic correlator stage; Figure 2b shows the principal
additions that allow it to perform built-in self test and self repair. The
delay shift register, or DSR, and the overload pattern shift register, or
OSR, each have a two-to-one multiplexer. They also have a multiplexer
control register, or MCR, for storing the control information for these
multiplexers.
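As a concrete check on relation (2) and the overload behaviour described for Figure 1, the following sketch counts coincidences until an integrating counter of capacity N overloads and then converts the sample count m into a correlation value. The function names and the small capacity N = 8 are illustrative assumptions; the chip's counters are far larger.

```python
def samples_to_overload(x_bits, y_bits, capacity):
    """Return the sample count m at which the coincidence counter overloads.

    Returns None if the counter has not overloaded by the end of the data.
    """
    count = 0
    for m, (x, y) in enumerate(zip(x_bits, y_bits), start=1):
        count += int(x == y)          # coincidence (EX-NOR) of the two inputs
        if count == capacity:
            return m
    return None

def correlation_from_overload(m, capacity):
    """Relation (2): r = 2N/m - 1, valid for m >= N."""
    if m < capacity:
        raise ValueError("an overload cannot occur before m reaches N")
    return 2 * capacity / m - 1

N = 8
always_equal = samples_to_overload([1] * 32, [1] * 32, N)     # overload at m = N
half_equal = samples_to_overload([1] * 32, [0, 1] * 16, N)    # overload at m = 2N
never_equal = samples_to_overload([1] * 32, [0] * 32, N)      # no overload

print(always_equal, correlation_from_overload(always_equal, N))
print(half_equal, correlation_from_overload(half_equal, N))
print(never_equal)
```

The three cases reproduce the limits quoted in the text: overload at m = N for complete positive correlation, at m = 2N for zero correlation, and no overload at all for complete negative correlation.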
Close linking of design and test 
makes full functional testing possi-
ble. The design incorporates a high 
degree of circuit partitioning. The partitioning, coupled with the DSR,
OSR, and MCR shift registers, which act as scan paths [15], allows all the
internal stages to be controlled and observed.
The multiplexer control register is 
the key feature in the self-repair 
mechanism. After the self-test se-
quence, the MCR contains the pass/ 
fail status for each stage. Figure 3 
shows a circuit schematic of the MCR 
and one multiplexer. 
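The bypass mechanism can be pictured as a chain of one-bit registers, each followed by a two-to-one multiplexer steered by its MCR bit. The sketch below is a behavioural model under that assumption, with hypothetical function names; it measures the delay of the chain by clocking a single 1 through it.

```python
def clock(regs, mcr, bit_in):
    """Apply one clock edge; return (new register contents, chain output)."""
    sig = bit_in
    new_regs = list(regs)
    for i, bypass in enumerate(mcr):
        if bypass:
            continue            # multiplexer routes the signal around stage i
        new_regs[i] = sig       # register i captures its input on this edge
        sig = regs[i]           # downstream sees the bit stored before the edge
    return new_regs, sig

def measure_delay(mcr):
    """Clock periods taken by a single 1 to travel through the chain."""
    regs = [0] * len(mcr)
    for t in range(len(mcr) + 1):
        regs, out = clock(regs, mcr, 1 if t == 0 else 0)
        if out:
            return t
    raise RuntimeError("impulse never reached the output")

full = measure_delay([0] * 28)
repaired = measure_delay([0] * 26 + [1] + [0])   # bypass stage 27 only
print(full, repaired)
```

Setting n bits of the MCR shortens the measured delay by n, which is the property exercised when the effect of the MCR on the DSR and OSR is tested.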
Figure 2. Basic correlator stage (a); correlator stage with built-in self-test
and self-repair mechanisms (b).
Figure 4. NMOS layout of two stages of correlation.
Figure 5. Chip plot of digital correlator featuring self-test and self-repair
mechanisms.
In the case of a failure, the input and output registers of the correlator
stage are bypassed via the multiplexers, so the malfunctioning stage is
short-circuited. The number of functioning stages on the chip can be read
out serially from the MCR by reconfiguring it as a shift register. This
parameter represents the maximum attainable correlation delay and can
be used for chip reject/accept decisions in production test. The self-test
and self-repair sequence can be repeated as required during the service
life of the chip.
Test strategy

The correlator operates in three distinct modes: initial test, self test
and repair, and run.
Initial test. During the initial test period, three simple tests are carried
out on the most basic elements of the design, the scan-path registers. These
registers (DSR, MCR, and OSR) and their various control functions are tested
to determine whether a chip is acceptable immediately after fabrication.
The initial test sequence is as follows:
(1) Test DSR, OSR, and MCR as shift registers and measure their delay.
(2) Test the effect of the MCR on the DSR and OSR. This is done by
    shifting n ones into the MCR and then measuring the delay of the DSR
    and OSR, which should each be reduced by n.
(3) Test the parallel load facilities of the DSR, OSR, and MCR.
Self test and repair. The self-test period occurs when the chip effectively
tests itself and reconfigures its registers so that all working stages are
connected in series. In this test, the following sequence is repeated four
times, once for each possible combination of the two binary input signals,
x and y.
(1) Reset latches and integrating counters. The counters are loaded with
    4000-hex, a number corresponding to the maximum integration time of
    2^15 - 2 = 32,766 sample clock cycles.
(2) Set up the input conditions (x and y) by setting or clearing the DSR
    as required. Shift x and y through the correlator for 32,766 clock cycles.
(3) Parallel load latches into OSR. The overload pattern can be shifted
    out for observation.
The self-repair sequence follows the self-test sequence. During the
self-test sequence, the overload signal is compared with the expected value
of the overload signal. Any deviation from the expected signal results in a
logic 1, which is stored in the corresponding latch. Thus, when the
self-test sequence has finished, the logic 1s and 0s stored in the latches
are the results of the self test; a logic 1 indicates a faulty stage.
The self-repair operation essentially transfers this information to the
MCR, which in turn causes the faulty stages to be bypassed. The net effect
is a series connection of correctly operating correlation stages.
Run. The run period automatically follows the self-test and repair
sequence. After the test, the contents of the MCR can be inspected to ensure
that the number of working correlator stages meets the requirements of
the system in which the chip is to be installed.

Prototype design

A prototype digital correlator featuring self test and self repair has
been fabricated on a five-micron N-channel MOS process. The prototype
design contains 28 parallel stages of correlation, each of which
implements the block diagram in Figure 2b. The area of the chip is 5.08
mm by 5.08 mm.
Figure 4 shows, with annotations, the layout of the two parallel stages
of correlation. Each stage consists of cells that can be repeated by
abutting in the y direction. The largest cell is the presettable PRBS
counter, which has 15 shift register stages and thus a maximum count of
approximately 32K samples.
The presettable counter is laid out in the form of a ring to minimize the
circuit delays between each shift register stage.
The correlator design is semistatic throughout. This means that the
clock frequency, and thus the sampling frequency of the correlator, can
range from dc to 4 MHz (for this fabrication process).
Figure 5 shows a plot of the complete chip area. The presettable
counter occupies most of the active area of the chip. Figure 5, of course,
includes the self-test and repair circuitry; the overhead for self test and
self repair is approximately six percent.

Test results

The correlator chip has been functionally tested with a Tektronix DAS
9100 digital analysis system coupled to a Teledyne probe station. Ten
packaged chips that had passed a visual inspection were functionally
tested. Because many more samples were required to demonstrate the
yield enhancement capability of this design, the remaining wafers were
probe-tested. Unfortunately, only 130 candidates were available for
testing, since the chip was fabricated as part of a multiproject wafer.
More wafers are to be processed.
Figure 6 shows some of the input and output waveforms from two
correlator chips. They occurred during the self-test and repair period. For
display purposes, the integration time of the correlator has been reduced
to just 15 clock cycles. Figure 6a shows the correlation output of a
"golden chip," while Figure 6b shows the output of a chip with one failed
stage.
The top four traces in each figure represent the inputs to the device. In
each figure, the x and y inputs sequence through their four possible
combinations in accordance with the test strategy described above. For
clarity, we have omitted some control signals, such as those that cause
parallel load OSR or reset counters.
Figure 6. Test results from two correlator chips: a "golden chip" (a) and a
chip with one faulty stage (b). Logic-analyzer timing diagrams; the traces
include OSR IP, MCR IP, X OP, OSR OP, MCR OP, and the overload output.

Figure 7. Number of chips vs. number of working stages.
Table 1.
Test results for each wafer.

                             Without self repair    With self repair
                             (28 stages working)    (>20 stages working)
Packaged (10 candidates)              0                      2
Wafer 1 (24 candidates)               1                      5
Wafer 2 (24 candidates)               0                      5
Wafer 3 (24 candidates)               0                      6
Wafer 4 (24 candidates)               0                      0
Wafer 5 (24 candidates)               2                      9
TOTALS (130 candidates)               3                     27

YIELD with no yield enhancement:  2.3 percent
YIELD with yield enhancement:    20.7 percent
The significant points to note in 
Figure 6 are the multiplexer control 
register input—MCR IP—and the 
Dverload shift register output—OSR 
DP. All other shown signals are the 
;ame for both chips. 
Moving left to right from the cur-
;or in Figure 6, the overload out-
put—OVRFLO—has changed from 
ogic I to 0. This indicates that at least 
ne of the integrating counters has 
)verloaded after the prescribed pen-
)d of 15 clock cycles. This result is ex-
)ected, since the inputs have been 
qual (x=O, y =O) over this period. 
When OVRFLO next goes high, 
the correlator has been reset and the 
next correlation test (x=0, y= 1) 
begins. Also at this time, the overload 
pattern—that is, the contents of the 
latches—is transferred to the OSR 
and shifted out for display. 
Now we can see the difference be-
tween the golden chip, Figure 6a, and 
the faulty chip, Figure 6b. The OSR 
should contain a series of 28 logic Is; 
in Figure 6b, a logic 0 is in position 
number 27, indicating a fault in stage 
27. The correlation test is repeated 
for the remaining combinations of x  
and y: the fault is again exposed on the 
OSR output in the case wherex=y = I. 
Self repair is then carried Out on 
the faulty chip. A single logic 1 is 
shifted into bit position 27 of the 
MCR, causing stage 27 to be by-
passed. The correlation test, with 
x—y= 1, is repeated several times at a 
period of 27 rather than 28 to elimi-
nate the incorrect logic 0 on the OSR 
output. The result is a golden chip 
containing 27 stages of correlation. 
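The test-and-repair sequence described above can be sketched in software. The Python model below is only an illustration under stated assumptions: the names `N_STAGES`, `self_test`, and `self_repair` are hypothetical, each stage is reduced to a pass/fail flag, and the OSR and MCR are modelled as plain lists rather than shift registers. It is not the chip's logic design.

```python
N_STAGES = 28  # stages per chip, as stated in the text


def self_test(stage_ok, mcr):
    """Return the 1-based numbers of active stages whose OSR bit reads logic 0.

    stage_ok[i] is True when stage i+1 overloads as expected (an assumed
    pass/fail abstraction); mcr[i] == 1 means stage i+1 is bypassed.
    """
    return [i + 1
            for i, (ok, bypassed) in enumerate(zip(stage_ok, mcr))
            if not bypassed and not ok]


def self_repair(stage_ok):
    """Bypass every stage that fails the test; return (mcr, working_stages)."""
    mcr = [0] * len(stage_ok)       # all stages initially in circuit
    for stage in self_test(stage_ok, mcr):
        mcr[stage - 1] = 1          # a logic 1 in the MCR bypasses the stage
    return mcr, mcr.count(0)


# Example from the text: a single fault in stage 27.
stage_ok = [True] * N_STAGES
stage_ok[27 - 1] = False
mcr, working = self_repair(stage_ok)
print("failing stages before repair:", self_test(stage_ok, [0] * N_STAGES))
print("failing stages after repair: ", self_test(stage_ok, mcr))
print("working stages after repair: ", working)
```

In this simplified model a repaired chip is one whose remaining active stages all pass, which corresponds to the "golden chip containing 27 stages" outcome in the text.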
The yield enhancement results are
preliminary, and the sample is small:
130 processed chips. Figure 7 charts
the number of chips against number 
of working stages. It shows that 29 of 
the 130 candidates passed the initial 
test and that 27 of these yielded more 
than 20 stages of correlation. 
Table 1 lists test results for each 
wafer. The multiproject wafers each 
contained 24 correlator chips. 
The VLSI structure offers an attractive digital implementation
of a high-speed polarity correlator. 
Individual chips can be directly cas-
caded to realize a correlator with 
arbitrary resolution or delay, in con-
trast to other digital correlator cir-
cuits—particularly those using the 
parallel counter technique—which 
cannot be easily cascaded and do not 
render regular VLSI structures. Furthermore,
the operating speed of
parallel counter-based correlators is 
limited by carry signal propagation 
delays. The correlator described here, 
which is composed mainly of simple 
shift register stages, can approach the 
maximum clocking rate of a chosen 
VLSI fabrication process. 
Functional testing of the first batch 
of processed chips has demonstrated 
that yield can be improved con-
siderably at a very low cost in circuit 
overhead; the initial sample's yield 
enhancement factor was 9.0 for 130 
chips. In addition, any of these chips 
can be given an exhaustive functional 
test in less than 150 ms at 1 MHz. 
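The yield figures follow directly from the totals in Table I. The short Python check below reproduces them; the variable names are illustrative, and the only inputs are the candidate and pass counts reported in the table and the 150 ms at 1 MHz test-time bound.

```python
# (candidates, passing without self repair, passing with self repair), from Table I
results = {
    "Packaged": (10, 0, 2),
    "Wafer 1": (24, 1, 5),
    "Wafer 2": (24, 0, 5),
    "Wafer 3": (24, 0, 6),
    "Wafer 4": (24, 0, 0),
    "Wafer 5": (24, 2, 9),
}

candidates = sum(c for c, _, _ in results.values())
good_without = sum(w for _, w, _ in results.values())
good_with = sum(r for _, _, r in results.values())

yield_without = 100.0 * good_without / candidates   # about 2.31 percent
yield_with = 100.0 * good_with / candidates         # about 20.77 percent
enhancement = good_with / good_without              # yield enhancement factor

# Upper bound on clock cycles in an exhaustive test: 150 ms at 1 MHz.
test_time_ms = 150
clock_hz = 1_000_000
max_test_cycles = test_time_ms * clock_hz // 1000

print(f"candidates: {candidates}")
print(f"yield without repair: {yield_without:.2f} percent")
print(f"yield with repair:    {yield_with:.2f} percent")
print(f"enhancement factor:   {enhancement:.1f}")
print(f"test cycles (bound):  {max_test_cycles}")
```

The computed 20.77 percent differs in the last digit from the 20.7 percent printed in the table, which appears to be truncated rather than rounded.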
The time taken in linking design 
and test has proved to be time well 
spent. 
IEEE DESIGN & TEST 
Acknowledgments
This work was carried out under a
United Kingdom Science and Engineering
Research Council grant.
References
T. W. Williams, "Design for Test-
ability: What's the Motivation?" 
VLSI Design, Vol. 4, No. 6, Oct. 
1983, pp. 21-23. 
E. B. Eichelberger and E. Lind-
bloom, "Trends in VLSI Testing," 
in VLSI 83, F. Anceau and E. J. Aas, 
eds., Elsevier Science Publishers
B.V. (North Holland), Amsterdam,
1983, pp. 339-348. 
J. S. Bendat and A. G. Piersol, 
Engineering Applications of Correla-
tion and Spectral Analysis, John 
Wiley and Sons, Chichester, U.K., 
1980. 
Y. W. Lee, T. P. Cheatham, Jr., and 
J. B. Wiesner, "Application of Cor- 
relation Analysis to the Detection of 
Periodic Signals in Noise," Proc. 
IRE, Vol. 38, Oct. 1950, pp. 1165-
1171.
J. E. Tanner and C. Mead, "A Cor-
relating Optical Motion Detector," 
Proc. Conf. Advanced Research 
VLSI, MIT, Cambridge, Mass., Jan. 
1984, pp. 57-64. 
A. M. Hayes and G. Musgrave, 
"Correlator Design for Flow Mea-
surement," The Radio & Electronic 
Engineer, Vol. 43, No. 6, June 1973, 
pp. 363-368. 
J. H. Van Vleck and D. Middleton, 
"The Spectrum of Clipped Noise," 
Proc. IEEE, Vol. 54, No. 1, Jan. 
1966, pp. 2-19. 
J. Eldon, "Correlation—A Powerful 
Technique for Digital Signal Process-
ing," Application Notes, Vol. TP-17, 
TRW LSI Products, Los Angeles, 
Calif., 1981, pp. 1-22. 
K. W. Current, "A High Data-Rate 
Digital Output Correlator Design," 
IEEE Trans. Computers, Vol. C-29, 
No. 5, May 1980, pp. 403-405. 
E. E. Swartzlander, Jr., "Parallel 
Counters," IEEE Trans. Com-
puters, Vol. C-22, No. 11, Nov. 1973, 
pp. 1021-1024. 
L. Dadda, "Composite Parallel 
Counters," IEEE Trans. Computers,
Vol. C-29, No. 10, Oct. 1980,
pp. 942-946. 
J. R. Jordan and M. S. Beck, "Cor-
relation Function Display and Peak 
Detection," Electronics Letters, Vol. 
8, No. 24, Nov. 1972, pp. 602-604. 
W. S. Blackley, M. A. Jack, and J. 
R. Jordan, "Digital Polarity Cor-
relator," U.K. Patent Application 
Nos. 8306797 and 8300699, Mar. 11, 
1983. 
W. S. Blackley, M. A. Jack, and J. 
R. Jordan, "VLSI Digital Polarity 
Correlator Based on an Overloading 
Counter Technique," Electronics 
Letters, Vol. 19, No. 19, Sept. 1983, 
pp. 761-762. 
T. W. Williams and K. P. Parker, 
"Design for Testability—A Survey," 
IEEE Trans. Computers, Vol. C-31, 
No. 1, Jan. 1982, pp. 2-15. 
William Blackley is a member of the In-
tegrated Systems Group in the Department 
of Electrical Engineering, University of 
Edinburgh. His current research interests 
include custom and semicustom integrated 
circuit design for digital signal processing 
and testability and yield enhancement. 
Blackley received his BSc in engineering 
science (electrical) from the University of 
Edinburgh in 1979. After working briefly 
for Racal Microwave and Electronic Sys-
tems, Ltd., he returned to the University 
of Edinburgh in 1980 as a research associ-




Mervyn A. Jack is a lecturer in the Depart-
ment of Electrical Engineering, University 
of Edinburgh, where he has taught since 
1979. From 1975 until 1979, he was a re-
search fellow of the university, studying 
the design and application of Fourier 
transform processors based on surface 
acoustic wave and charge-coupled de-
vices. From 1971 to 1975, he worked as a 
project engineer with Microwave and 
Electronic Systems, Ltd., Edinburgh. 
Jack received his BSc and MSc in elec-
tronic engineering from Heriot-Watt Uni-
versity, Edinburgh, in 1971 and 1975, re-
spectively. He received his PhD from the 
University of Edinburgh in 1978. Jack is a 
member of the IEE.
James R. Jordan joined the Department 
of Electrical Engineering, University of 
Edinburgh, in 1969 after industrial engi-
neering experience with EMI Electronics, 
Ltd., and teaching experience at Teeside 
Polytechnic. He is now a senior lecturer 
specializing in teaching system theory and 
electronic instrumentation to undergradu-
ate students and reliability and fault detec-
tion methods to postgraduates. His prin-
cipal research interest is the application of 
LSI circuits and microelectronic fabrica-
tion techniques to electronic instrumenta-
tion and transducers. 
Jordan received his MSc from the Uni-
versity of Surrey in 1967 and PhD from 
the University of Bradford in 1973. 
The authors' address is Department of Electrical Engineering, University of Edinburgh, King's Buildings, Mayfield Rd., Edinburgh, 
EH9 3JL Scotland. 
References
T. W. Williams, "Design for Testability: What's the 
Motivation?," VLSI Design, Vol. 4, pp. 21-23 (Oct., 
1983) 
W. S. Blackley, M. A. Jack, and J. R. Jordan, "VLSI
Digital Polarity Correlator Based on an Overloading 
Counter Technique," Electronics Letters, Vol. 19, 
pp. 761-762 (Sept., 1983). 
J. S. Bendat and A. G. Piersol, Engineering Applica-
tions of Correlation and Spectral Analysis, John 
Wiley and Sons, Chichester (1980). A Wiley Intersci-
ence Publication 
P. P. Roth, "Effective Measurements Using Digital 
Signal Analysis," IEEE Spectrum, Vol. 8, pp. 62-70 
(Apr., 1971). 
D. A. Gandolfo, J. P. Tower, L. D. Elliott, E. J. 
Nossen, and L. W. Martinson, "CCD's for Spread Spec-
trum Applications," pp. 90-96 in Proc. International 
Specialist Seminar on Case Studies in Advanced Signal 
Processing, IEE (Sept., 1979).
W. B. Allen and E. C. Westerfield, "Digital
Compressed-Time Correlators and Matched Filters for
Active Sonar," J. Acoustical Society of America,
Vol. 36, pp. 121-139 (1964).
S. Cacopardi, "Applicability of the Relay Correlator 
to Radar Signal Processing," Electronics Letters, 
Vol. 19, 	pp. 722-723 (Sept., 1983). 
J. P. Forrest and D. J. Price, "Digital Correlation 
for Noise Radar Systems," Electronics Letters, Vol. 
14, 	pp. 581-582 (Aug., 1978). 
D. J. Price, "Correlation Processing in Noise Radar," 
pp. 8/1 - 8/4 in Colloquium on Correlation Processing,
IEE Colloquium digest No. 1979/32, Savoy Place,
London (May, 1979). 
F. J. Taylor, V. Shenoy, C. P. Olinger, and F. 
Wasserman, "Aneurysm Detection Using One-Bit Correla-
tion," Medical and Biological Engineering and Comput-
ing, Vol. 17, pp. 443-448 (July, 1979). 
F. J. Looft, III and W. J. Heetderks, "Real Time 
Correlator for Detecting Single Units in Peripheral 
Nerve," IEEE Trans. Biomedical Engineering, Vol. 
BME-25, pp. 564-567 (Nov., 1978). 
S. E. Fu and J. S. Lee, "A Video System for Measuring 
the Blood Flow Velocity in Microvessels," IEEE Trans. 
Biomedical Engineering, Vol. BME-25, 	pp. 295-297 
(May, 1978). 
H. Ekre, "Polarity Coincidence Correlation Detection 
of a Weak Noise Source," IEEE Trans. Information 
Theory, Vol. IT-9, pp. 18-23 (Jan., 1963). 
Y. W. Lee, T. P. Cheatham Jr., and J. B. Wiesner, 
"Application of Correlation Analysis to the Detection 
of Periodic Signals in Noise," Proc. IRE, Vol. 38, 
pp. 1165-1171 (Oct., 1950). 
C. M. Rader, "An Improved Algorithm for High Speed 
Auto Correlation with Applications to Spectral Esti-
mation," IEEE Trans. Audio Electroacoustics, Vol. 
AU-18, pp. 439-441 (Dec., 1970). 
B. W. Finnie, "Digital Correlation Techniques for 
Identifying Dynamic Systems," Ph.D. Thesis, Univer-
sity of Edinburgh (May, 1965). 
D. Bassi, "Pseudorandom Digital Cross Correlator for 
Impulse Response Measurements," Review of Scientific 
Instruments, Vol. 51, pp. 795-798 (June, 1980). 
R. J. Polge and E. M. Mitchell, "Impulse Response 
Determination by Cross Correlation," IEEE Trans. 
Aerospace and Electronic Systems, Vol. AES-6, pp.
91-97 (Jan., 1970). 
J. A. M. McDonnell and J. Forrester, "Polarity Coin-
cidence Techniques for Correlation Function Measure-
ment and System Response Evaluation," The Radio and 
Electronic Engineer, Vol. 40, 	pp. 165-172 (Oct., 
1970) 
L. P. Horwitz and C. L. Shelton, Jr., "Pattern Recog-
nition Using Autocorrelation," Proc. IRE, Vol. 49, 
pp. 175-185 (Jan., 1961). 
J. E. Tanner and C. A. Mead, "A Correlating Optical 
Motion Detector," pp. 57-64 in Proc. Conf. Advanced 
Research in VLSI, MIT, Cambridge, MA. (Jan., 1984). 
D. I. Barnea and H. F. Silverman, "A Class of Algo-
rithms for Fast Digital Image Registration," IEEE 
Trans. Computers, Vol. C-21, 	pp. 179-186 (Feb., 
1972) 
H. Murakami and B. V. K. Vijaya Kumar, "Correlation
of Binarized Images," IEEE Trans. Aerospace and Elec-
tronic Systems, Vol. AES-19, 	pp. 322-328 (Mar., 
1983). 
J. S. Boland, L. J. Pinson, E. G. Peters, G. P. Kane, 
and W. W. Malcolm, "Design of a Correlator for Real 
Time Video Comparisons," IEEE Trans. Aerospace and 
Electronic Systems, Vol. AES-15, pp. 11-19 (Jan., 
1979) 
M. Azaria and D. Hertz, "Time Delay Estimation by 
Generalized Cross Correlation Methods," IEEE Trans. 
Acoustics, Speech, and Signal Processing, Vol. ASSP-
32, pp. 280-285 (Apr., 1984). 
J. P. Ianniello, "Time Delay Estimation Via Cross-
Correlation in the Presence of Large Estimation
Errors," IEEE Trans. Acoustics, Speech, and Signal 
Processing, Vol. ASSP-30, pp. 998-1003 (Dec., 1982). 
Special Issue, "Time Delay Estimation," IEEE Trans. 
Acoustics, Speech, and Signal Processing, Vol. ASSP-
29, (June, 1981). Edited by G. C. Carter 
J. R. Jordan and P. C. Kelly, "Integrated Circuit 
Correlator for Flow Measurement," Measurement and 
Control, Vol. 9, pp. 267-270 (July, 1976). 
T. S. Durrani and C. A. Greated, Laser Systems in 
Flow Measurement, Plenum Press, New York and London 
(1977). 
F. Boonstoppel, B. Veitmen, and F. Vergouwen, "The 
Measurement of Flow by Cross Correlation Techniques," 
pp. 110-124 in Proc. Conf. Industrial measurement 
techniques for on-line computers, IEE Conf. Publication
No. 43, London (June, 1968).
W. Matthes, W. Riebold, and E. De Cooman, "Measure-
ment of the Velocity of Gas Bubbles in Water by a 
Correlation Method," Review of Scientific Instru-
ments, Vol. 41, pp. 843-845 (June, 1970). 
M. Intaglietta and W. P. Tompkins, "System for Meas-
urement of Velocity of Microscopic Particles in 
Liquids," IEEE Trans. Biomedical Engineering, Vol. 
BME-18, pp. 376-377 (Sept., 1971). 
W. R. Tompkins, P. Monti, and N. Intaglietta, "Velo-
city Measurement by Self Tracking Correlator," Review 
of Scientific Instruments, Vol. 45, 	pp. 647-649 
(May, 1974). 
D. Jones, "An On-Board Digital Correlator for Spacecraft
VLF Radio-Wave Studies," IEEE Trans. Geoscience
Electronics, Vol. GE-12, pp. 9-18 (1974). 
P. Jones, "The Single-Clipped Digital Malvern Corre-
lator," pp. 7/1 - 7/4 in Colloquium on Correlation 
Processing, IEE Colloquium digest No. 1979/32, Savoy
Place, London (May, 1979). 
M. Corti, A. De Agostini, and V. Degiorgio, "Fast 
Digital Correlator for Weak Optical Signals," Review 
of Scientific Instruments, Vol. 45, 	pp. 888-893 
(July, 1974) 
P. C. Egau, "Correlation Systems in Radio Astronomy 
and Related Fields," IEE Proc. Part F, Vol. 131, pp.
32-39 (Feb., 1984). 
L. P. Allen and B. H. Frater, "Wideband Multiplier 
Correlator," IEE Proc., Vol. 117, pp. 1603-1608
(Aug., 1970). 
P. J. Kindlmann and E. B. Hooper, Jr., "High Speed 
Correlator," Review of Scientific Instruments, Vol. 
39, pp. 864-872 (June, 1968). 
M. Fukao, "A Wide Band Correlator," Review of Scien-
tific Instruments, Vol. 42, 	pp. 783-788 (June, 
1971) 
J. C. Brenot, J. A. Fayeton, and J. C. Houver, "Fast 
Multichannel Time Correlator for Coincidence Experi-
ments in Atomic Physics," Review of Scientific 
Instruments, Vol. 51, pp. 1623-1629 (Dec., 1980). 
B. B. Lee and E. S. Furgason, "An Evaluation of 
Ultrasound NDE Correlation Flaw Detection Systems," 
IEEE Trans. Sonics and Ultrasonics, Vol. SU-29, pp.
359-369 (Nov., 1982) 
C. M. Beck, P. N. Henry, B. T. Lowe, and A. 
Plaskowski, "Autocorrelation Function Parameters Used 
to Indicate Incipient Blockage in a Pneumatic Tran-
sport System," Electronics Letters, Vol. 18, pp. 
705-706 (Aug., 1982). 
H. Meyr and G. Spies, "The Structure and Performance 
of Estimators for Real-Time Estimation of Randomly 
Varying Time Delay," IEEE Trans. Acoustics, Speech, 
and Signal Processing, Vol. ASSP-32, 	pp. 81-94 
(Feb., 1984). 
C. H. Knapp and G. C. Carter, "The Generalized Correlation
Method for Estimation of Time Delay," IEEE
Trans. Acoustics, Speech, and Signal Processing, Vol. 
ASSP-24, pp. 320-327 (Aug., 1976). 
J. N. Bradley and R. L. Kirlin, "Delay Estimation by 
Expected Value," IEEE Trans. Acoustics, Speech, and 
Signal Processing, Vol. ASSP-32, 	pp. 19-27 (Feb., 
1984). 
J. C. Hassab and R. E. Boucher, "Optimum Estimation 
of Time Delay by a Generalized Correlator," IEEE 
Trans. Acoustics, Speech, and Signal Processing, Vol. 
ASSP-27, pp. 373-380 (Aug., 1979). 
H. Meyr, "Application of Digital Signal Processing in 
Measuring," pp. 431-438 in Signal Processing II: 
Theories and Applications, ed. H. W. Schussler, 
Elsevier Science Publishers B.V. (North Holland) 
(1983). 
H. Meyr, "Delay-Lock Tracking of Stochastic Signals," 
IEEE Trans. Communications, Vol. COM-24, pp. 331-339 
(Mar., 1976). 
A. W. Lohmann and B. Wirnitzer, "Triple Correla-
tions," Proc. IEEE, Vol. 72, pp. 889 - 901 (July, 
1984). Invited Paper 
P. W. Cheney, "A Digital Correlator Based on the 
Residue Number System," IRE Trans. Electronic Comput-
ers, Vol. 10, 	pp. 63-70 (Mar., 1961). 
F. H. Lange, Correlation Techniques, Iliffe Books 
Limited, London (1967). 
F. E. Brooks and H. W. Smith, "A Computer for Corre-
lation Functions," Review of Scientific Instruments, 
Vol. 23, 	pp. 121-126 (Mar., 1952). 
W. R. Bennett, "The Correlatograph. A Machine for 
Continuous Display of Short Term Correlation," Bell 
Systems Tech. J., Vol. 32, 	pp. 	1173-1185 (Sept., 
1953). 
Y. W. Lee, Statistical Theory of Communication, John 
Wiley and Sons Inc., New York (1960). 
A. B. Carlson, Communication Systems, McGraw-Hill 
Kogakusha Ltd., Tokyo (1975). 
R. P. Keech, "The KPC Multichannel Correlation Signal 
Processor for Velocity Measurement," Trans. Inst. 
Measurement and Control, Vol. 4, pp. 43-52 (Jan. - 
Mar., 1982). 
J. R. Jordan and B. A. Manook, "Correlation-Function
Peak Detector," IEE Proc., Vol. 128, Part E, pp.
74-78 (Mar., 1981). 
J. Coulthard and R. P. Keech, 	"A 	Six-Channel 
Microprocessor Controlled Correlator," pp. 4/1 - 4/6 
in Colloquium on Correlation Processing, IEE Colloquium
digest No. 1979/32, Savoy Place, London (May,
1979). 
A. M. Hayes and G. Musgrave, "The Variance of Time 
Delay Estimates from Cross Correlation Functions," 
pp. 2/1 - 2/3 in Colloquium on Correlation Processing,
IEE Colloquium digest No. 1979/32, Savoy Place,
London (May, 1979). 
S. M. Kay, "The Effect of Sampling Rate on Autocorrelation
Estimation," IEEE Trans. Acoustics, Speech,
and Signal Processing, Vol. ASSP-29, 	pp. 859-867 
(Aug., 1981). 
F. K. Bowers, D. A. Whyte, T. L. Landecker, and B. J. 
Klingler, "A Digital Correlation Spectrometer Employ-
ing Multiple-Level Quantization," Proc. IEEE, Vol. 
61, 	pp. 1339-1343 (Sept., 1973). 
D. A. Gandolfo, J. B. Tower, J. I. Pridgen, and S. C. 
Munroe, "Analog-Binary CCD Correlator: A VLSI Signal 
Processor," IEEE Trans. Electronic Devices, Vol. ED-
26, 	pp. 596-603 (Apr., 1979). 
D. Lagoyannis, "Stieltjes-Type Correlator Based on 
Delta-Sigma Modulation," IEE Proc., Vol. 128, Part G,
pp. 9-14 (Feb., 1981). 
R. S. Miller and N. B. Berry, "A Merged Pipe Organ 
Binary-Analog Correlator," IEEE J. Solid-State Cir-
cuits, Vol. SC-17, pp. 20-27 (Feb., 1982). 
A. Gersho, "Principles of Quantization," IEEE Trans. 
Circuits and Systems, Vol. CAS-25, pp. 427-36 (July, 
1978). 
K. Y. Chang and A. D. Moore, "Modified Digital Corre-
lator and its Estimation Errors," IEEE Trans. Information
Theory, Vol. IT-16, pp. 699-706 (Nov., 1970).
J. J. Freeman, "The Action of Dither in a Polarity 
Coincidence Correlator," IEEE Trans. Communications, 
Vol. COM-22, pp. 857-862 (June, 1974). 
L. Cheded, P. A. Payne, and S. M. Jawad, "High Speed 
Digital Cross-correlator Design for Multifrequency 
Response Analysis," 	The Radio and Electronic 
Engineer, Vol. 53, pp. 229-234 (June, 1983). 
D. Lagoyannis, "Correlator Based on Delta-Sigma Modu-
lation," Electronics Letters, Vol. 12, pp. 253-254 
(May, 1976). 
S. Nakamura, "A Digital Correlator Using Delta Modu-
lation," IEEE Trans. Acoustics, Speech, and Signal 
Processing, Vol. ASSP-24, pp. 238-243 (June, 1976). 
W. N. Cheung, "Correlation Measurement by Delta Sigma 
Modulation," IEEE Trans. Indust. Electron. Contr. 
Instrum., Vol. IECI-26, pp. 88-92 (May, 1979). 
R. E. H. Bywater, W. Matley, and D. Brock, "Design of 
a flexible phase reversal modulation correlator," The 
Radio and Electronic Engineer, Vol. 46, pp. 129-135
(Mar., 1976). 
L. F. Rocha, B. Cernuschi-Frias, and C. Orda, "Convo-
lution and Correlation Using Delta Modulators," Proc. 
IEEE, Vol. 68, pp. 1024-1026 (Aug., 1980). 
D. G. Watts, "A General Theory of Amplitude Quantiza-
tion with Applications to Correlation Determination," 
IEE Proc., Vol. 109, Part C, pp. 209-18 (1962).
B. Widrow, "A Study of Rough Amplitude Quantization 
by Means of Nyquist Sampling Theory," IRE Trans. Cir-
cuit Theory, Vol. CT-3, pp. 266-276 (Dec., 1956). 
L. C. Andrews, "Analysis of a Cross Correlator with a 
Clipper in One Channel," IEEE Trans. Information 
Theory, Vol. IT-26, pp. 743-746 (Nov., 1980). 
W. F. Sheppard, "On the Calculation of the Most Prob-
able Values of Frequency - Constants, for Data 
Arranged According to Equidistant Divisions of a 
Scale," Proc. London Mathematical Society, Vol. 29, 
p. 353 (1898). 
J. G. Ables, B. F. C. Cooper, A. J. Hunt, G. G. 
Moorey, and J. W. Brooks, "A 1024-Channel Digital 
Correlator," Review of Scientific Instruments, Vol. 
46, pp. 284-295 (Mar., 1975). 
G. C. Anderson and M. A. Perry, "A Calibrated Real 
Time 	Correlator/Averager/Probability 	Analyser," 
Hewlett Packard J., Vol. 21, pp. 9-15 (1969). 
P. E. Dewdney, "Product Transition 	Correlator," 
Review of Scientific Instruments, Vol. 51, pp.
1548-1552 (Nov., 1980). 
J. H. Van Vleck and D. Middleton, "The Spectrum of
Clipped Noise," Proc. IEEE, Vol. 54, pp. 2-19 (Jan., 
1966) 
L. C. Andrews, "The Output PDF of a Polarity Coin-
cidence Correlation Detector," IEEE Trans. Aerospace 
and Electronic Systems, Vol. AES-10, 	pp. 712-714 
(Sept., 1974). 
H. Berndt, "Correlation Function Estimation by a 
Polarity Method Using Stochastic Reference Signals," 
IEEE Trans. Information Theory, Vol. IT-14, pp.
796-801 (Nov., 1968). 
D. Landsberg and A. Cohen, "Fast Correlation Estima-
tion by Random Reference Correlator," IEEE Trans. 
Instrumentation and Measurement, Vol. IM-32, 	pp. 
438-442 (Sept., 1983). 
P. G. A. Jespers, M. G. Windal, and T. Watteyne, "An 
Integrated Binary Correlator Module," IEEE J. Solid-
State Circuits, Vol. SC-18, 	pp. 286-290 (June, 
1983) 
C. P. Cahn, "Performance of Digital Matched Filter 
Correlator with Unknown Interference," IEEE Trans.
Communication Technology, Vol. COM-19, pp. 1163-1172
(Dec., 1971).
A. M. Hayes and G. Musgrave, "Correlator Design for 
Flow Measurement," The Radio and Electronic Engineer, 
Vol. 43, pp. 363-368 (June, 1973). 
J. A. Eldon and J. D. Haight, "New CMOS Chip Facilitates
Multibit Correlation," pp. 44.2 in Proc. International
Conf. Acoustics, Speech, and Signal Processing
(ICASSP), IEEE, San Diego, CA. (1984).
J. A. Eldon, "Digital Correlators Suit Military 
Applications," Electronic Design News (EDN), pp.
148-160 (Aug., 1984). 
J. A. Eldon, "Correlation - A Powerful Technique for 
Digital Signal Processing," pp. 1-22 in Application 
Notes, TRW LSI Products, La Jolla, CA. (1981). 
K. W. Current, "A High Data-Rate Digital Output 
Correlator Design," IEEE Trans. Computers, Vol. C-29, 
pp. 403-405 (May, 1980). 
K. W. Current and D. A. Mow, "Digital Correlator 
Design with Four-Valued Threshold Logic," pp. 237-241 
in Digest of Papers, International Symp. Circuits and 
Systems (ISCAS), (1978). 
E. E. Swartzlander, Jr., "Parallel Counters," IEEE 
Trans. Computers, Vol. C-22, pp. 1021-1024 (Nov., 
1973) 
L. Dadda, "Composite Parallel Counters," IEEE Trans. 
Computers, Vol. C-29, pp. 942-946 (Oct., 1980). 
K. W. Current, "Pipelined Binary Parallel Counters 
Employing Latched Quaternary Logic Full Adders," IEEE 
Trans. Computers, Vol. C-29, 	pp. 400-403 (May, 
1980). 
K. W. Current and D. A. Mow, "Implementing Parallel 
Counters with Four-Valued Threshold Logic," IEEE 
Trans. Computers, Vol. C-28, 	pp. 200-204 (Mar., 
1979) 
J. R. Jordan and M. S. Beck, "Correlation Function
Display and Peak Detection," Electronics Letters,
Vol. 8, 	pp. 602-604 (Nov., 1972). 
W. S. Blackley, M. A. Jack, and J. R. Jordan, "Digital
Polarity Correlator," UK Patent Application Nos.
8306797 and 8300699 (Mar., 1983). 
W. S. Blackley, "Digital Polarity Correlator
Integrated Circuit Featuring Built-In Self Repair
Structures for Yield Enhancement and High Reliability,"
pp. 12/1 - 12/10 in Proc. 4th International
Conference on Custom and Semi-Custom ICs, Prodex Seminars
Ltd., in association with the IEE, London
(Nov., 1984).
W. S. Blackley, M. A. Jack, and J. R. Jordan, "A
Digital Polarity Correlator with Built-In Self Test 
and Self Repair," IEEE Design and Test of Computers, 
Vol. 1, pp. 42-49 (May, 1984). 
W. S. Blackley, M. A. Jack, and J. R. Jordan, 
"Built-In Test and Self Repair Mechanisms in a Digi-
tal Correlator Integrated Circuit," pp. 12/1 - 12/10 
in Proc. Conf. Design for Tactical Avionics Maintai-
nability, 	North Atlantic Treaty Organisation, 
Advisory Group for Aerospace Research and Development 
(NATO - AGARD), AGARD Conf. Preprint No. 	361, 
Brussels, Belgium (May, 1984). 
W. S. Blackley, M. A. Jack, and J. R. Jordan, "A 
Digital Polarity Correlator Featuring Built-In Self 
Test and Self Repair Mechanisms," pp. 289-294 in Dig-
est of papers, International Test Conf., IEEE, Phi-
ladelphia, PA. (Oct., 1983). 
M. A. Monahan, K. Bromley, and P. P. 	Docker, 
"Incoherent Optical Correlators," Proc. IEEE, Vol. 
65, 	pp. 121-129 (Jan., 1977). 
D. Casasent, "Coherent Optical Pattern Recognition," 
Proc. IEEE, Vol. 67, pp. 813-825 (May, 1979). 
T. M. Turpin, "Spectrum Analysis Using Optical Pro-
cessing," Proc. IEEE, Vol. 69, 	pp. 79-92 (Jan., 
1981). Invited Paper 
T. W. Cole, "New Class of One-Bit Digital Auto Corre-
lator," Electronics Letters, Vol. 16, 	pp. 86-88 
(Jan., 1980). 
W. T. Rhodes, "Acousto-Optical Signal Processing: 
Convolution and Correlation," Proc. IEEE, Vol. 69, 
pp. 65-79 (Jan., 1981). Invited Paper 
A. Korpel, "Acousto-Optics - A Review of Fundamen-
tals," Proc. IEEE, Vol. 69, pp. 48-53 (Jan., 1981). 
Invited Paper 
R. A. Decker, R. W. Ralston, and P. V. Wright, 
"Wide-Band Monolithic Acoustoelectric Memory Correla-
tors," IEEE Trans. Sonics and Ultrasonics, Vol. SU-
29, pp. 289-298 (Nov., 1982). 
C. M. Elias, "An Ultrasonic Pseudorandom Signal-
Correlation System," IEEE Trans. Sonics and Ultrason-
ics, Vol. SU-27, pp. 1-7 (Jan., 1980). 
C. M. Verber, R. P. Kenan, and J. R. Busch, "Design 
and Performance of an Integrated Optical Digital 
Correlator," J. Lightwave Technology, Vol. LT-1, pp. 
256-261 (Mar., 1983). 
G. Comoretto, "A Microprocessor-Controlled Multichan-
nel Counter for a Digital Autocorrelator," J. Phys. 
E: Sci. Instrum., Vol. 16, pp. 836-839 (1983).
L. Basano, P. Ottonello, and E. Schiavi, "Improvements
in the Design of Time-Delay Correlators," J.
Phys. E: Sci. Instrum., Vol. 16, pp. 840-843 (1983). 
B. Wenk, "Aspects of Correlator Design for Industrial 
Applications," pp. 89-92 in Signal Processing II: 
Theories and Applications, ed. H. W. Schussler, 
Elsevier Science Publishers B.V. (North Holland) 
(1983). 
R. M. Henry, "An Improved Algorithm Allowing Fast
On-Line Polarity Correlation by Microprocessor or Minicomputer,"
pp. 3/1 - 3/4 in Colloquium on Correlation
Processing, IEE Colloquium digest No. 1979/32,
Savoy Place, London (May, 1979).
R. Fell, 	"Microprocessor-Based 	Cross-Correlators 
Using the "Skip" Algorithm," pp. 25-32 in Proc. Conf. 
the Influence of Microelectronics on Measurements 
Instruments and Transducer Design, IEE Conf. Publication
No. 55, Manchester, UK. (June, 1982).
J. P. Jump and S. P. Ahuja, "Effective Pipelining of 
Digital Systems," IEEE Trans. Computers, Vol. C-27, 
pp. 855-865 (Sept., 1978). 
A. L. Fisher and H. T. Kung, "Synchronizing Large 
Systolic Arrays," pp. 44-52 in Proc. SPIE Vol. 341, 
Real Time Signal Processing V, The Society of Photo-
Optical Instrumentation Engineers, 	Arlington, VA. 
(May, 1982). 
S. Y. Kung and P. J. Gal-Ezer, "Synchronous versus 
Asynchronous Computation in Very Large Scale
Integrated (VLSI) Array Processors," pp. 53-65 in 
Proc. SPIE Vol. 341, Real Time Signal Processing V, 
The Society of 	Photo-Optical 	Instrumentation 
Engineers, Arlington, VA. (May, 1982). 
C. A. Mead and L. A. Conway, Introduction to VLSI 
Systems, Addison-Wesley, Reading, MA. (1980). 
H. T. Kung, "Why Systolic Architectures?," IEEE Com-
puter, pp. 37-46 (Jan., 1982). 
E. G. Magill, D. M. Grieco, P. H. Dyck, and P. C. Y. 
Chen, "Charge-Coupled Device Pseudo-Noise Matched 
Filter Design," Proc. IEEE, Vol. 67, 	pp. 50-60 
(Jan., 1979). 
B. E. Burke, D. L. Smythe, D. J. Silversmith, W. H. 
McGonagle, R. W. Mountain, and B. J. Felton, "A 
10MHz. CCD Time-Integrating Correlator," pp. 256-257 
in Digest of Papers, International Solid State Cir-
cuits Conf., IEEE, New York, NY. (1983). 
P. B. Denyer, J. Mavor, and J. W. Arthur, "Miniature
Programmable Transversal Filter Using CCD/MOS Technology,"
Proc. IEEE, Vol. 67, pp. 42-50 (Jan.,
1979). 
J. Mavor, J. W. Arthur, and P. B. Denyer, "Analogue
CCD Correlator Using Monolithic MOST Multipliers," 
Electronics Letters, Vol. 13, 	pp. 373-374 (June, 
1977) 
E. P. Herrmann and D. A. Gandolfo, "Programmable CCD 
Correlator," IEEE Trans. Electronic Devices, Vol. 
ED-26, 	pp. 117-122 (Feb., 1979). 
J. R. Jordan, "Integrated Circuit Relay Correlator
for Measurement System Applications," Electronics 
Letters, Vol. 15, pp. 366-367 (June, 1979). 
W. D. Pritchard and J. N. Gooding, "Design and Appli-
cation of a Cascadable Binary Weighted Analogue 
Correlator," pp. 241-246 in Proc. 5th International 
Conf. Charge Coupled Devices (CCD'79), ed. J. Mavor,
University of Edinburgh, Centre for Industrial Consultancy
and Liaison, Edinburgh (1979).
Y. A. Elaque and M. A. Copeland, "Design and Charac-
terization of a Real-Time Correlator," IEEE J. 
Solid-State Circuits, Vol. SC-12, pp. 642-649 (Dec., 
1977). 
J. L. Buie and D. P. Breuer, 	"A 	Large-Scale 
Integrated Correlator," IEEE J. Solid-State Circuits, 
Vol. SC-7, 	pp. 357-363 (Oct., 1972). 
N. A. Saethermoen, B. Skeie, and S. Prytz, "Digital 
SOS/MOS Correlator: Basic System Component in Experi-
mental Army Spread Spectrum Radio," pp. 73-78 in 
Proc. Conf. The Impact of High Speed and VLSI
Technology on Communications Systems, IEE Conf. Publication
No. 230, London (Dec., 1983).
C. C. Foster and F. D. Stockton, "Counting Responders 
in an Associative Memory," IEEE Trans. Computers, 
Vol. C-20, 	pp. 1580-1583 (Dec., 1971). 
D. D. Gajski, "Parallel Compressors," IEEE Trans. 
Computers, Vol. C-29, pp. 393-398 (May, 1980). 
P. P. Cappello and K. Steiglitz, "A Fast Tally Struc-
ture and Applications to Signal Processing," pp. 
25A.4 in Proc. International Conf. Acoustics, Speech, 
and Signal Processing (ICASSP), IEEE, San Diego, CA. 
(1984). 
K. W. Current, "High Density Integrated Computing 
Circuitry with Multiple Valued Logic," IEEE J. 
Solid-State Circuits, Vol. SC-15, pp. 127-131 (Feb., 
1980) 
J. C. White, J. M. Keen, M. F. Hamer, D. V. 
McCaughan, and J. P. Hill, "A Fast 32 Point Analogue
Correlator," pp. 237-240 in Proc. 5th International
Conf. Charge Coupled Devices (CCD'79), ed. J. Mavor,
University of Edinburgh, Centre for Industrial Consultancy
and Liaison, Edinburgh (1979).
P. A. 	Elaken, 	"An 	Electronically 	Programmable 
Transversal Input Filter," IEEE J. Solid-State Cir-
cuits, Vol. SC-17, pp. 34-39 (Feb., 1982). 
P. T. F. Williams, "Correlators Using M.O.S.T.'s in
Sonar Applications," J. Sound and Vibration, Vol. 9, 
pp. 161-168 (1969). 
W. M. Gentleman and H. T. Kung, "Matrix Triangulari-
sation by Systolic Arrays," pp. 19-26 in Proc. SPIE 
Vol. 298, Real Time Signal Processing IV, The Society 
of Photo-Optical Instrumentation Engineers (1981). 
J. M. Jover and T. Kailath, "Design Framework for 
Systolic-Type Arrays," pp. 8.5 in Proc. International 
Conf. Acoustics, Speech, and Signal Processing 
(ICASSP), IEEE, San Diego, CA. (1984). 
H. Barral and N. Moreau, "Circuits for Digital Signal 
Processing," pp. 44.9 in Proc. International Conf. 
Acoustics, Speech, and Signal Processing (ICASSP), 
IEEE, San Diego, CA. (1984). 
W. E. Snelling and J. E. Penn, "A Fully Pipelined, 
Bit-Sliced, VLSI Correlator," pp. 313-320 in Proc. 
Digital Signal Processing - 84, ed. V. Cappellini and 
A. G. Constantinides, Elsevier Science Publishers 
B.V. (North Holland) (1984). 
S. K. Kawahara, R. P. O'Connell, and J. C. Peterson, 
"A One-Micron Bipolar VLSI Convolver," pp. 226-227 in 
Proc. International Solid-State Circuits Conf. 
(ISSCC), IEEE (Feb., 1981). 
J. G. McWhirter, J. V. McCanny, and K. W. Wood,
"Novel Multibit Convolver/Correlator Chip Design 
Based on Systolic Array Principles," pp. 66-73 in 
Proc. SPIE Vol. 341, Real Time Signal Processing V, 
The Society of 	Photo-Optical 	Instrumentation 
Engineers, Arlington, VA. (May, 1982). 
R. A. Evans, D. Wood, K. Wood, J. V. McCanny, J. G. 
McWhirter, and A. P. H. McCabe, "A CMOS Implementa-
tion of a Systolic Multi-Bit Convolver Chip," pp. 
227-235 in VLSI 83, ed. F. Anceau and E. J. Aas, 
Elsevier Science Publishers B.V. (North Holland) 
(1983) 
A. G. Corry and K. Patel, "Architecture of a CMOS 
Correlator," GEC J. Research, Vol. 1, pp. 35-38 
(1983). 
T. W. Williams and K. P. Parker, "Design for Testa-
bility - A Survey," Proc. IEEE, Vol. 71, pp. 98-112 
(Jan., 1983). Invited paper. 
E. J. McCluskey and S. Bozorgui-Nesbat, "Design for
Autonomous Test," IEEE Trans. Computers, Vol. C-30, 
pp. 866-875 (Nov., 1981). 
C. H. Chen, "Designing Testable Synchronous Logic,"
pp. 89-94 in Digest of papers, International Test 
Conf., IEEE, Philadelphia, PA. (1981). 
P. K. Lala, "Current Problems in VLSI Testing and 
Testability," The Radio and Electronic Engineer, Vol. 
54, 	pp. 415-423 (Oct., 1984). 
M. T. M. R. Segers, "The Impact of Testing on VLSI 
Design Methods," IEEE J. Solid-State Circuits, Vol. 
SC-17, pp. 481-486 (June, 1982). Invited paper.
T. E. Mangir and A. Avizienis, "Fault Tolerant Design 
for VLSI: Effect of Interconnect Requirements on 
Yield Improvement of VLSI Designs," IEEE Trans. Com-
puters, Vol. C-31, pp. 609-615 (July, 1982). 
P. Banerjee and J. A. Abraham, "Generating Tests for 
Physical Failures in MOS Logic Circuits," pp. 554-559 
in Digest of papers, International Test Conf., IEEE, 
Philadelphia, PA. (1983). 
S. G. Papaioannou, "Optimal Test Generation in Combi-
national Networks by Pseudo Boolean Programming," 
IEEE Trans. Computers, Vol. C-26, pp. 553-560 (June, 
1977).
J. Savir and P. H. Bardell, "On Random Pattern Test 
Length," pp. 95-106 in Digest of papers, Interna-
tional Test Conf., IEEE, Philadelphia, PA. (1983). 
G. Grassl, "Design for Testability," pp. 1-36 in 
Proc. NATO Advanced Study Institute on VLSI Design, 
North Atlantic Treaty Organisation, Louvain,
Belgium. (1980).
R. G. Bennetts, Design of Testable Logic Circuits,
Addison-Wesley Publishing Company, London (1984). 
M. J. Y. Williams and J. B. Angell, "Enhancing Testa-
bility of Large Scale Integrated Circuits via Test 
Points and Additional Logic," IEEE Trans. Computers, 
Vol. C-22, pp. 46-60 (Jan., 1973). 
E. I. Muehldorf, "Designing LSI Logic for Testabil-
ity," pp. 45-49 in Proc. IEEE Semiconductor Test 
Conf. (1976).
J. B. Grierson, "The UK5000 Project," pp. 1/1 - 1/4 
in Proc. 3rd International Conference on Custom and
Semi-Custom ICs, Prodex Seminars Ltd., in association
with the IEE, London (Nov., 1983).
R. A. Frohwerk, "Signature Analysis: A New Digital 
Field Service Method," Hewlett-Packard J., pp. 2-8 
(May, 1977). 
H. J. Nadig, "Signature Analysis - Concepts, Exam-
ples, and Guidelines," Hewlett-Packard J., pp. 15-21 
(May, 1977). 
B. Konemann, J. Mucha, and G. Zwiehoff, "Built-In 
Logic Block Observation Techniques," pp. 37-41 in 
Digest of papers, International Test Conf., IEEE, 
Philadelphia, PA. (1979). 
B. Konemann, J. Mucha, and G. Zwiehoff, "Built-In
Test for Complex Digital Integrated Circuits," IEEE 
J. Solid-State Circuits, Vol. SC-15, pp. 315-318
(June, 1980). 
N. Benowitz, D. F. Calhoun, G. E. Alderson, J. E. 
Bauer, and C. T. Joeckel, "An Advanced Fault Isola-
tion System for Digital Logic," IEEE Trans. Comput-
ers, Vol. C-24, pp. 489-497 (May, 1975). 
D. K. Bhavsar and R. W. Heckelman, "Self Testing by 
Polynomial Division," pp. 208-216 in Digest of
papers, International Test Conf., IEEE, Philadelphia,
PA. (1981).
B. T. Murphy, "Cost Size Optima of Monolithic
Integrated Circuits," Proc. IEEE, Vol. 52, pp.
1537-1545 (Dec., 1964).
J. E. Price, "A New Look at Yield of Integrated Cir-
cuits," Proc. IEEE, Vol. 58, pp. 1290-1291 (Aug., 
1970).
A. Gupta and J. W. Lathrop, "Yield Analysis of Large 
Integrated Circuit Chips," IEEE J. Solid-State Cir-
cuits, Vol. SC-7, pp. 389-395 (Oct., 1972). 
C. H. Stapper, "Defect Density Distribution for LSI 
Yield Calculations," IEEE Trans. Electron Devices,
Vol. ED-20, pp. 655-657 (July, 1973). 
R. S. Hemmert, "Poisson Process and Integrated Cir-
cuit Yield Prediction," Solid-State Electron., Vol. 
24, pp. 511-515, Pergamon Press Ltd. (1981).
K. Saito and E. Arai, "Experimental Analysis and New 
Modelling of MOS LSI Yield Associated with the Number 
of Elements," IEEE J. Solid-State Circuits, Vol.
SC-17, pp. 28-33 (Feb., 1982).
S. C. Seth and V. D. Agrawal, "Characterizing the LSI 
Yield Equation from Wafer Test Data," IEEE Trans. 
Computer-Aided Design, Vol. CAD-3, pp. 123-126
(Apr., 1984). 
C. H. Stapper, A. N. McLaren, and M. Dreckmann, 
"Yield Model for Productivity Optimization of VLSI 
Memory Chips with Redundancy and Partially Good Pro-
duct," IBM J. Research and Development, Vol. 24, pp. 
398-409 (May, 1980). 
B. F. Fitzgerald and E. P. Thoma, "Circuit
Implementation of Fusible Redundant Addresses on RAMs for
Productivity Enhancement," IBM J. Research and
Development, Vol. 24, pp. 291-298 (May, 1980).
S. E. Schuster, "Multiple Word/Bit Line Redundancy 
for Semiconductor Memories," IEEE J. Solid-State Cir-
cuits, Vol. SC-13, pp. 698-703 (Oct., 1978). 
E. Tammaru and J. B. Angell, "Redundancy for LSI 
Yield Enhancement," IEEE J. Solid-State Circuits, 
Vol. SC-2, pp. 172-182 (Dec., 1967).
I. Koren and M. A. Breuer, "On Area and Yield
Considerations for Fault Tolerant VLSI Processor
Arrays," IEEE Trans. Computers, Vol. C-33, pp. 21-27 
(Jan., 1984). 
C. H. Stapper, F. M. Armstrong, and K. Saji,
"Integrated Circuit Yield Statistics," Proc. IEEE,
Vol. 71, pp. 453-470 (Apr., 1983).
J. Bernard, "The IC Yield Problem: A Tentative 
Analysis for MOS/SOS Circuits," IEEE Trans. Electron
Devices, Vol. ED-25, pp. 939-944 (Aug.,
1978).
W. R. Moore, "A Review of Fault-Tolerant Techniques 
for the Enhancement of Integrated Circuit Yield," GEC 
J. Research, Vol. 2, pp. 1-15 (1984). 
B. T. Murphy, "Comments on 'A New Look at Yield of 
Integrated Circuits'," Proc. IEEE, Vol. 59, p. 1128 
(July, 1971). 
G. E. Moore, "What Level of LSI is Best for You?," 
Electronics, Vol. 43, pp. 126-130 (Feb., 1970). 
T. Yanagawa, "Yield Degradation of Integrated Cir-
cuits Due to Spot Defects," IEEE Trans. Electron Dev-
ices, Vol. ED-19, pp. 190-197 (Feb., 1972). 
A. Gupta, W. A. Porter, and J. W. Lathrop, "Defect 
Analysis and Yield Degradation of Integrated Cir-
cuits," IEEE J. Solid-State Circuits, Vol. SC-9, pp. 
96-102 (June, 1974).
W. R. Moore and M. J. Day, "Yield Enhancement of a 
Large Systolic Array Chip," Microelectronics
Reliability, Vol. 24, pp. 511-526 (1984).
P. L. Meyer, Introductory Probability and Statistical 
Applications, Addison-Wesley Publishing Co., Reading, 
MA. (1970). 
R. M. Sedmak, "Implementation Techniques for Self 
Verification," pp. 267-278 in Digest of papers, 
International Test Conf., IEEE, Philadelphia, PA.
(1980). 
H. T. Kung and M. S. Lam, "Fault Tolerance and Two 
Level Pipelining in VLSI Systolic Arrays," pp. 74-83 
in Proc. Conf. Advanced Research in VLSI, MIT,
Cambridge, MA. (Jan., 1984).
P. W. Linderman and W. H. Ku, "A Three Dimensional 
Systolic Array Architecture for Fast Matrix Multipli-
cation," pp. 34A.6 in Proc. International Conf. 
Acoustics, Speech, and Signal Processing (ICASSP), 
IEEE, San Diego, CA. (1984). 
J. V. McCanny and J. G. McWhirter, "Yield Enhancement
of Bit Level Systolic Array Chips Using Fault 
Tolerant Techniques," Electronics Letters, Vol. 19, 
pp. 525-527 (July, 1983). 
I. Kale, "A CMOS Digital Polarity Correlator with 
Built-In Self-Test and Self-Repair," MSc. Project 
Report MSP26, University of Edinburgh (Sept., 1984). 
