Periodic Application of Concurrent Error Detection in Processor Array Architectures by Chen, Paul Peichuan
April 1993 UILU-ENG-93-2214 
CRHC-93-08 
Center for Reliable and High-Performance Computing 
PERIODIC APPLICATION 
OF CONCURRENT 
ERROR DETECTION 
IN PROCESSOR 
ARRAY ARCHITECTURES 
Paul Peichuan Chen 
(NASA-CR-193211) PERIODIC
APPLICATION OF CONCURRENT ERROR
DETECTION IN PROCESSOR ARRAY
ARCHITECTURES Ph.D. Thesis
(Illinois Univ. at
Urbana-Champaign) 114 p
N93-21238
Unclas
G3/62 0166345
Coordinated Science Laboratory
College of Engineering 
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
Approved for Public Release. Distribution Unlimited.
https://ntrs.nasa.gov/search.jsp?R=19930018049 2020-03-17T06:09:00+00:00Z
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS: None
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): UILU-ENG-93-2214, CRHC-93-08
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: Coordinated Science Lab, University of Illinois
6b. OFFICE SYMBOL (If applicable): N/A
6c. ADDRESS (City, State, and ZIP Code): 1308 W. Main St., Urbana, IL 61801
7a. NAME OF MONITORING ORGANIZATION: NASA
7b. ADDRESS (City, State, and ZIP Code): Moffett Field, CA
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: 7a
8b. OFFICE SYMBOL (If applicable):
8c. ADDRESS (City, State, and ZIP Code): 7b
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10. SOURCE OF FUNDING NUMBERS (PROGRAM ELEMENT NO., PROJECT NO., TASK NO., WORK UNIT ACCESSION NO.):
11. TITLE (Include Security Classification): Periodic Application of Concurrent Error Detection in Processor Array Architectures
12. PERSONAL AUTHOR(S): CHEN, Paul Peichuan
13a. TYPE OF REPORT: Technical
13b. TIME COVERED: FROM    TO
14. DATE OF REPORT (Year, Month, Day): 1993 April 22
15. PAGE COUNT: 110
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES (FIELD, GROUP, SUB-GROUP):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): modularity, high parallelism, VLSI/WSI, real-time signal processing
19. ABSTRACT (Continue on reverse if necessary and identify by block number)
Processor arrays can provide an attractive architecture for some applications. Featuring modularity, reg-
ular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and 
applications with high computational requirements, such as real-time signal processing. 
Preserving the integrity of results can be of paramount importance for certain applications. In these 
cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault 
tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer 
the advantage that transient and intermittent faults may be detected with greater probability than with 
off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. 
However, most time-redundant CED techniques degrade a system's performance. 
20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: [x] UNCLASSIFIED/UNLIMITED  [ ] SAME AS RPT.  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL:
22b. TELEPHONE (Include Area Code):
22c. OFFICE SYMBOL:
DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted. All other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED
PERIODIC APPLICATION OF CONCURRENT ERROR DETECTION 
IN PROCESSOR ARRAY ARCHITECTURES 
BY 
PAUL PEICHUAN CHEN 
B.S., Stanford University, 1984 
M.S., University of Illinois at Urbana-Champaign, 1987 
THESIS 
Submitted in partial fulfillment of the requirements 
for the degree of Doctor of Philosophy in Electrical Engineering 
in the Graduate College of the 
University of Illinois at Urbana-Champaign, 1993
Urbana, Illinois
© Copyright by Paul Peichuan Chen, 1993 
PERIODIC APPLICATION OF CONCURRENT ERROR DETECTION 
IN PROCESSOR ARRAY ARCHITECTURES 
Paul Peichuan Chen, Ph.D. 
Department of Electrical and Computer Engineering 
University of Illinois at Urbana-Champaign, 1993
Prof. W. Kent Fuchs, Advisor 
Processor arrays can provide an attractive architecture for some applications. Featuring 
modularity, regular interconnection and high parallelism, such arrays are well-suited for 
VLSI/WSI implementations, and applications with high computational requirements, such as 
real-time signal processing. 
Preserving the integrity of results can be of paramount importance for certain applications. 
In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. 
One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detec-
tion (CED) techniques offer the advantage that transient and intermittent faults may be detected 
with greater probability than with off-line diagnostic tests. Applying time-redundant CED tech-
niques can reduce hardware redundancy costs. However, most time-redundant CED techniques 
degrade a system's performance. 
Periodic Application of Concurrent Error Detection (PACED) is a technique introduced in 
this thesis to reduce the performance costs incurred through the use of time-redundant CED in 
processor array architectures. To check computations periodically instead of continuously, 
PACED varies the application of such CED techniques to a processor array in both time and 
space. The purpose of PACED is to provide probabilistic detection of transient, intermittent,
and permanent failures in processor arrays while reducing the overhead of performing detection.
Since CED is not performed continuously when PACED is used, undetected errors may 
occur prior to an error indication. Therefore, upon error detection, not only the current outputs 
of the array but both recent and subsequent outputs may also be erroneous. This thesis investi-
gates the confidence to place on system outputs when PACED is applied, deriving formulae to 
predict the amount of output to suspect as possibly erroneous for single processors, linear unidi-
rectional and two-dimensional mesh-connected processor arrays. The error coverage afforded 
by PACED in these architectures is also studied. Finally, the performance impact of using 
PACED in each array type is studied using both an array simulation model that gives estimates 
of application completion times with low computational cost and results of experiments using an 
Intel iPSC/2 hypercube to simulate a 16-node unidirectional linear array and a 4x4
two-dimensional mesh array.
ACKNOWLEDGMENTS 
I give sincere thanks and appreciation to my advisor Professor W. Kent Fuchs for his guid-
ance and support during the course of my graduate studies. I also thank Professors Prithviraj 
Banerjee, Ravishanker K. Iyer and Michael C. Loui for serving on my committee. 
I thank Antoine Mourad for his invaluable assistance with the confidence analyses pre-
sented in Sections 3.2 and 4.3 and the error coverage analysis of Section 3.3. I also thank 
Robert Dimpsey, Kumar Goswami, Inhwan Lee, and Dong Tang for their help with the curve fit-
ting performed in Section 3.1. 
I thank my parents for their support and understanding these past years. 
Thanks to all of my friends in and out of the Center for Reliable and High-Performance 
Computing and special thanks to the following people. I wouldn't have done it without them -
Dan Bailey, Kate Baumgartner, Jeff Baxter, John Bentrup, Randy Brouwer, Robert Dimpsey, Pat 
Duba, John Fu, Kumar Goswami, John Holm, Sabrina Hwu, Bob Janssens, Andrew Jeter, Ralph 
Kling, Suzanne Kuo, Marc Levitt, Jim Li, Robert and Dorothy Long, Matt Lowrie, Vicki
McDaniel, Maria Mendez, Antoine Mourad, Robert Mueller-Thuns, Tom Niermann, Michael
Peercy, Stony Peng, Paul Ryan, Joe Scanlon, Dale Schouten, Jude Shavlik, Jonathan Simonson,
Craig Stunkel, Kurt Thearling, Paul Tobin, Nancy Warter, Vickie Willis, and Kun-Lung Wu.
I gratefully acknowledge the support provided by the Office of Naval Research (Contract 
No. N00014-89-K-0070). 
Finally, thanks to Bach, Beethoven, Brahms, Brubeck, Costello, Gould, Jarrett, Haydn,
Mozart, and Shostakovich for the soundtrack, and a very special thank-you to Melody for mak-
ing it fun when it wasn't. 
TABLE OF CONTENTS
CHAPTER
1. INTRODUCTION
2. THE PACED TECHNIQUE
3. PACED IN A SINGLE PROCESSOR
    3.1. Error Arrival Model
    3.2. Confidence Analysis
        3.2.1. Fault-active intervals
        3.2.2. Undetected-errors intervals
    3.3. Error Coverage
4. PACED IN A LINEAR ARRAY
    4.1. Error Detection Latency
    4.2. Error Propagation Distance
    4.3. Suspected Outputs
    4.4. Error Coverage
    4.5. Performance
        4.5.1. Simulation model
        4.5.2. Hypercube simulations
5. PACED IN A TWO-DIMENSIONAL ARRAY
    5.1. Error Detection Latency
    5.2. Suspected Outputs
    5.3. Error Coverage
    5.4. Performance
        5.4.1. Simulation model
        5.4.2. Hypercube simulations
6. SUMMARY
REFERENCES
VITA
LIST OF TABLES
TABLE
1.1. EXAMPLE TIME-REDUNDANT CED TECHNIQUES.
1.2. OVERHEADS OF TIME-REDUNDANT CED IN PROCESSOR ARRAYS.
3.1. OBSERVED SEU RATES.
4.1. COMPUTATION CYCLE TIMES, EDGE DETECTION PEs.
5.1. NUMBER OF SUSPECTED PREVIOUS OUTPUTS, 2-D ARRAY.
5.2. NUMBER OF SUSPECTED FUTURE OUTPUTS, 2-D ARRAY.
5.3. TASK AND CHECK TIMES, ADAPTIVE BEAMFORMING PEs.
LIST OF FIGURES
FIGURE
2.1. PACED parameters M and N.
2.2. PACED in a 10x10 mesh-connected array.
3.1. TBE histogram and fitted pdf for VAX "Earth."
3.2. TBE histogram and fitted pdf for Pioneer.
3.3. Outputs to suspect in fault-active intervals of length K.
3.4(a). Fault-active intervals, C vs. λ (0 ≤ C ≤ 1).
3.4(b). Fault-active intervals, C vs. λ (C ≥ 0.95).
3.5(a). Fault-active intervals, C vs. N (0 ≤ C ≤ 1).
3.5(b). Fault-active intervals, C vs. N (C ≥ 0.95).
3.6(a). Fault-active intervals, C vs. q (0 ≤ C ≤ 1).
3.6(b). Fault-active intervals, C vs. q (C ≥ 0.95).
3.7. Outputs to suspect in undetected-errors intervals of length L.
3.8. Time savings using L instead of K.
3.9(a). Undetected-errors intervals, C vs. λ (0 ≤ C ≤ 1).
3.9(b). Undetected-errors intervals, C vs. λ (C ≥ 0.95).
3.10(a). Undetected-errors intervals, C vs. N (0 ≤ C ≤ 1).
3.10(b). Undetected-errors intervals, C vs. N (C ≥ 0.95).
3.11(a). Undetected-errors intervals, C vs. q (0 ≤ C ≤ 1).
3.11(b). Undetected-errors intervals, C vs. q (C ≥ 0.95).
3.12. Single processor estimated error coverage, q = 1, μ = 11.1 min/err.
4.1. A V-PE unidirectional linear processor array.
4.2. Checking pattern in a 7-PE array.
4.3. Error propagation in a 10-PE array.
4.4. Suspected previously produced outputs, 10-PE array.
4.5. Suspected future outputs, 10-PE array.
4.6. Estimated error coverage for a 16-PE linear array.
4.7. Simulated linear array performance, edge detection.
4.8. Sample input and output, edge detection algorithm.
4.9. Linear array performance, edge detection.
4.10. Linear array performance, edge detection, no communication.
5.1. A UxV 2-D mesh processor array.
5.2. Error detection latency.
5.3. Suspected previously produced outputs, 10x10 array.
5.4. Suspected future outputs, 10x10 array.
5.5. Estimated error coverage for a 4x4 mesh array.
5.6. Triangular array for adaptive digital beamforming.
5.7. Adaptive beamforming array. (a) Performance degradation. (b) Checking overhead.
5.8. Mesh array performance, matrix multiply.
CHAPTER 1.
INTRODUCTION 
Processor arrays can provide an attractive architecture for some applications. Featuring 
modularity, regular interconnection, and high parallelism, such arrays are well-suited for 
VLSI/WSI implementations and applications with high computational requirements, such as
real-time signal processing. 
Preserving the integrity of results can be of paramount importance for certain applications. 
In these cases, fault tolerance features should be used to handle component failures that could 
upset reliable delivery of a system's service. One aspect of fault tolerance is the detection of
errors caused by faults. Techniques for error detection may be classified as either off-line, in
which diagnostic tests are applied to the system, or concurrent, in which normal system operations
are checked for errors. Concurrent error detection (CED) techniques offer the advantage
that transient and intermittent faults may be detected with greater probability than with off-line
methods.
This thesis considers the application of CED techniques to processor array architectures.
To minimize the overhead caused by fault tolerance, both hardware and time redundancies
should be minimized. Applying time-redundant CED techniques can reduce the hardware costs.
Table 1.1 lists some examples of such techniques, which are described below. 
TABLE 1.1.
EXAMPLE TIME-REDUNDANT CED TECHNIQUES.
Alternating logic                                              [1,2]
Recomputing with shifted operands (RESO)                       [3]
Comparison with concurrent redundant computation (CCRC)        [4]
Recomputing by alternate path                                  [5]
Data redundancy                                                [6]
Triple time redundancy                                         [7,8]
Algorithm-based fault tolerance                                [9,10]
Saturation                                                     [11]
Spare capacity                                                 [12]
Time-redundant CED techniques have been used to detect faults in digital circuits. For 
example, in alternating logic, both the true and complemented values of a circuit's inputs are 
applied serially to produce two versions of the output [1,2]. The two error-free versions are 
complementary for self-dual functions. Although all faults which manifest themselves as single 
stuck-at faults can be detected, this technique could require hardware modification to create self-
dual functions from non-self-dual ones, as well as extra flip-flops for the sequential parts of the 
circuit. Recomputing with shifted operands (RESO) [3] also achieves error detection by com-
paring two results. Each computation is followed by a similar one which uses bit-shifted ver-
sions of the operands; the bit-shifted result is then shifted back and compared with the original 
result. RESO can detect all errors in ripple-carry and carry-lookahead adders due to one faulty
bit slice, and all errors in array multipliers and array dividers due to one faulty cell.
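The RESO comparison described above can be illustrated with a small software model. The sketch below is a toy under stated assumptions, not the design from [3]: `faulty_add` is a hypothetical adder whose sum output at one bit position can be forced to 0, the shift amount is 1, and the 16-bit word is assumed wide enough that the shifted operands do not overflow.

```python
WIDTH = 16  # assumed word width, wide enough for the shifted operands used here
MASK = (1 << WIDTH) - 1


def faulty_add(x, y, stuck_bit=None):
    """Add two words; optionally force one sum bit to 0 (toy stuck-at-0 fault)."""
    total = (x + y) & MASK
    if stuck_bit is not None:
        total &= ~(1 << stuck_bit) & MASK
    return total


def reso_add(x, y, stuck_bit=None, shift=1):
    """Return (result, mismatch) using recomputation with shifted operands."""
    first = faulty_add(x, y, stuck_bit)
    # Recompute on left-shifted operands, then shift the result back.
    second = faulty_add(x << shift, y << shift, stuck_bit) >> shift
    return first, first != second


print(reso_add(3, 1))               # fault-free adder: results agree
print(reso_add(3, 1, stuck_bit=2))  # sum bit 2 stuck at 0: results disagree
```

Because the shifted recomputation exercises the faulty bit position with different data, the two results disagree whenever the fault corrupts either of them, which is the property the comparison relies on.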
At a higher architectural level, the method called comparison with concurrent redundant
computation (CCRC) [4] compares the results of two identical computations performed concurrently
on different processors. Similar to CCRC are the recomputing by alternate path method,
designed specifically for use in an FFT processor array [5], and the data redundancy technique,
which uses idle processors to perform duplicate computations [6]. All three techniques can
detect any errors caused by faults confined to a single processor.
Many CED techniques have been applied to processor arrays (see Table 1.2). Some examples
include: alternating logic in divider arrays [14], RESO in linear logic arrays [15] and
matrix-multiply arrays [16], CCRC in divider and bidirectional systolic arrays [4], data redundancy
in both linear and mesh matrix-multiply arrays [6], triple time redundancy in linear systolic
arrays [8], and algorithm-based fault tolerance in FFT arrays [9] and matrix operations
arrays [10]. In triple time redundancy, adjacent triples of processors perform triple modular
redundancy (TMR), a standard error-masking technique. Triple time redundancy can detect up
to $\lceil n/3 \rceil$ faulty cells before reconfiguration is necessary, where n is the size of the array. However,
two extra processing elements (PEs) are required, as well as additional interconnect and
switches throughout the array. An earlier version of triple time redundancy used even more
hardware: approximately 3n/2 cells were required, as well as increased complexity of the PEs
and the interconnect [7].
TABLE 1.2. OVERHEADS OF TIME-REDUNDANT CED
IN PROCESSOR ARRAYS.
Technique                            CED overhead
Algorithm-based fault tolerance      < 50%    [13]
Alternating logic                    ≥ 100%   [14]
RESO                                 ≥ 100%   [15, 16]
CCRC                                 ≥ 100%   [4]
Data redundancy                      ≥ 100%   [6]
Triple time redundancy               ≥ 200%   [8]
r-processes                          ≥ 200%   [17]
Overlapping H-processes              ≥ 300%   [18]
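The majority vote at the heart of TMR, which triple time redundancy builds on, can be sketched in a few lines. This is a generic illustration, not the array design of [7] or [8]; the function name `tmr_vote` and the convention of returning `None` when no majority exists are assumptions made for the example.

```python
def tmr_vote(a, b, c):
    """Majority vote over three redundant results.

    Returns (value, disagreement). value is None when all three differ,
    since no majority exists in that case.
    """
    disagreement = not (a == b == c)
    if a == b or a == c:
        return a, disagreement
    if b == c:
        return b, disagreement
    return None, True


print(tmr_vote(7, 7, 7))  # all copies agree
print(tmr_vote(7, 9, 7))  # one faulty copy is masked, and the fault is flagged
print(tmr_vote(1, 2, 3))  # no majority
```

A single faulty copy is both masked (the majority value is returned) and detected (the disagreement flag is set), which is why three copies of each computation imply at least three times the work.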
Algorithm-based fault tolerance is a technique which, by modifying an algorithm to operate
on specially encoded data, can provide both error detection and location. Though it is not as
generally applicable as other CED techniques, it can achieve extremely low performance cost,
since the fault-tolerance scheme is tailored to the specific application. Error coverages ranging
from 85% to 100% have been reported with less than 10% performance degradation [13].
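As one concrete instance of "specially encoded data," the sketch below uses the well-known checksum encoding for matrix multiplication: a column-checksum matrix times a row-checksum matrix yields a product whose last row and column must equal the sums of the others. The text above does not say which encoding the cited arrays use, so this choice, the helper names, and the integer-only data (floating-point data would need a tolerance) are all assumptions.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksum(a):
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to each row the sum of that row."""
    return [row + [sum(row)] for row in b]


def checksums_consistent(c):
    """Check a full-checksum matrix: last row and column must equal the sums."""
    body = [row[:-1] for row in c[:-1]]
    rows_ok = all(sum(row[:-1]) == row[-1] for row in c[:-1])
    cols_ok = all(sum(col) == last for col, last in zip(zip(*body), c[-1][:-1]))
    return rows_ok and cols_ok


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(with_column_checksum(a), with_row_checksum(b))
print(checksums_consistent(c))  # encoded product passes the check
c[0][1] += 1                    # inject a single erroneous element
print(checksums_consistent(c))  # the inconsistency is detected
```

The failing row and column checks intersect at the corrupted element, which is how this kind of encoding can locate an error as well as detect it.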
The r-processes [17] and overlapping H-processes [18] techniques employ different patterns
of neighboring PEs within mesh-connected two-dimensional processor arrays to perform
redundant computations. The r-processes technique can detect errors caused by faults confined
to one of every three PEs, but requires an extra row and an extra column of PEs. Designed for
algorithms whose main PE computation is of the form (a · b) * (c · d) (where · and * represent
general binary operators), overlapping H-processes can detect any errors from one PE of any
4x4 subarray of an array.
The problem with most time-redundant CED techniques is that their use may degrade a
system's performance. If PE utilization is less than 100% in an array, then such techniques may
be applied with very little performance cost. Data redundancy and CCRC rely on idle
cycles at PEs within an array, caused by a 50% PE utilization inherent to the algorithm, in which
to launch redundant computations. Though the completion time of a single problem is unaffected,
the array loses its ability to interleave problems: its throughput is cut 50%. These techniques
would incur an overhead of 100% if used in algorithms in which the PEs of the array
were used continuously. The application of RESO to band matrix multiplication also relies on
idle cycles. Of three designs proposed [16], two had PE utilizations of 50% or less (33% and 50%),
enabling application of RESO in the idle cycles. The third design's utilization was 100% until
RESO was added: the data rate was halved to create artificial idle cycles between computations.
Without resorting to such measures, RESO can incur a time overhead of 100% or more, since
shifting of operands is required in addition to the replicated computation. When alternating
logic is applied to divider arrays, at least 100% overhead results since the complemented versions
of the inputs are applied interleaved with the actual inputs [14]. Both triple time redundancy
and r-processes require every PE to perform three times as much work, which causes an
overhead of at least 200%, ignoring the overhead due to the increased message traffic. Overlapping
H-processes can reduce the throughput of a mesh array by 75%, a time overhead greater
than 300%.
Periodic Application of Concurrent Error Detection (PACED) is a technique introduced in 
this thesis to reduce the performance degradation incurred through the use of time-redundant 
CED in processor array architectures. To check computations periodically instead of continu-
ously, PACED varies the application of time-redundant CED techniques to a processor array in 
both time and space. The purpose of PACED is to provide probabilistic detection of transient, 
intermittent, and permanent failures in processor arrays, while reducing the overhead of per-
forming detection. Error recovery is not provided by PACED. Other techniques, such as roll-
back or forward recovery, are necessary to handle recovery from detected errors. 
Since CED is not performed continuously when PACED is used, undetected errors may
occur prior to an error indication. Therefore, when an error is detected, not only the current outputs
of the array but both recent and subsequent outputs may also be erroneous. This thesis first
investigates the confidence to place on a single processor's outputs when PACED is applied,
deriving formulae to predict the amount of output to suspect as possibly erroneous. In linear
processor arrays, checking patterns are created when constituent PEs perform PACED at different
times; optimal scheduling of these patterns to minimize the error detection latency has been
studied [19]. If errors can be propagated by PEs, these checking patterns limit the amount of
output to suspect as possibly erroneous upon error detection. It is then
shown that high confidence in most linear array outputs can be achieved using CED applied relatively
infrequently. Similar patterns of checking are then studied in two-dimensional
mesh-connected processor arrays, to determine which outputs from the array to suspect as possibly
erroneous upon error detection. The error coverages afforded by PACED in the single processor,
linear array, and two-dimensional mesh array are also studied.
Finally, the performance impact of using PACED in each array type is studied using both
an array simulation model that gives estimates of application completion times with low computational
cost and results of experiments using an Intel iPSC/2 hypercube to simulate a 16-node
unidirectional linear array and a 4x4 two-dimensional mesh array.
This thesis focuses on the use of PACED in processor array architectures. The idea of periodic
checking has previously been applied to multicomputer systems: saturation [11] and spare
capacity [12] use idle processors in large-grain parallel architectures to perform redundant
copies of other processors' processes. Error detection is achieved by voting at each processor
on the process results. The performance is affected only by the increased message traffic, which
can be negligible when certain specific protocols are used [11].
Other architectures may profit from the application of PACED, for example, fine-grained
parallel architectures that use very long instruction words (VLIW) to address multiple functional
units (FUs). In the FUs of the CRAY-1 scalar unit, idle cycles have reduced the performance
cost of using RESO to check computations to the range 0.2% to 17.3% for the Livermore Fortran
kernels [20]. A similar result was obtained through simulations with the IMPACT VLIW
machine model [21]. From those simulations, it was found that idle cycles in a 4-FU architecture
running a set of integer Unix utilities limited the performance penalties to a range from almost nil
(0.2% for wc) to quite significant (161% for grep) [22]. In both of these studies, however, checking
was performed for every checkable computation, and the performance costs were dependent
upon the chance coincidences of redundant computations with idle functional units. A form of
PACED in which compile-time information is used to schedule redundant computations only
during idle slots could reduce the performance costs. This technique has been employed with
good results for control-flow checking on the Multiflow TRACE 14/300 [23]. Since 100% of
the checking operations used otherwise idle resources, there was no estimated performance
penalty (neglecting increased memory traffic), and greater than 99% of all control-flow errors
were detected in the benchmarks tested. A compiler-assisted PACED scheme to provide data
integrity could meet with similar success.
The contributions of this thesis are as follows. A method is introduced to reduce the performance
costs of using time-redundant CED through periodic application. An analysis is provided
to determine, upon error detection at a single processor using PACED, the confidence to
place on that processor's outputs. Similar analyses are performed for the linear unidirectional
and two-dimensional mesh-connected processor array architectures, assuming that errors can be
propagated through the array. The error coverage afforded by PACED in each architecture is
also studied. A PACED checking-pattern simulator and analyzer are described that facilitate
choosing PACED parameter values in the two-dimensional array to minimize the error detection
latency and the amount of suspected output at error detection time. A performance simulation
model is described that estimates the performance costs of PACED applied to the linear and
two-dimensional arrays; results of experiments using the simulation model are also given.
Finally, empirical data collected from experiments using the Intel iPSC/2 hypercube are provided
that show PACED in linear and two-dimensional arrays can reduce the performance
degradation incurred through the use of CED.
The organization of this thesis is as follows. Chapter 2 outlines the PACED technique. In 
Chapter 3, the confidence and error coverage analyses of a single processor using PACED are 
described, and similar analyses of PACED applied to linear unidirectional arrays and two-
dimensional mesh arrays are presented in Chapters 4 and 5, respectively. Those chapters also 
discuss the performance of the array architectures using PACED. Finally, Chapter 6 summarizes 
and presents conclusions. 
CHAPTER 2. 
THE PACED TECHNIQUE 
This thesis considers processor array architectures in which the constituent processing ele-
ments (PEs) are regularly interconnected and each PE communicates only with its local neigh-
bors. The computational activity at each PE, called a computation cycle, consists of receiving 
input, performing a task with or without applying CED, and sending output. A task is a fine-
grained set of data manipulations, such as a multiply-accumulate operation. "Fine-grained" 
means that many such tasks are required to complete a problem execution. 
When PACED is applied to one PE of an array, it can be parameterized as follows. Let M
be the period of CED application and let N be the duration of CED application, where
$0 \le N \le M$. The parameters M and N govern the time distribution of CED at the processor: in
any period of M computation cycles, N tasks are checked and M - N tasks are unchecked. As a
mathematical abstraction to facilitate analysis of the PACED technique, let the checking
sequence, $CS_{M,N}$, be an array of M values as follows:
$CS_{M,N}[r] = 1$, for $0 \le r \le N - 1$,
$CS_{M,N}[r] = 0$, for $N \le r \le M - 1$.
EXAMPLE 2.1: The checking sequence for M = 13 and N = 5 is
$CS_{13,5} = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)$. □
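The definition of the checking sequence translates directly into code. The following is a minimal sketch; the function name `checking_sequence` is illustrative and not taken from the thesis.

```python
def checking_sequence(m, n):
    """Return CS_{M,N}: N ones (checked cycles) followed by M - N zeros."""
    if not 0 <= n <= m:
        raise ValueError("require 0 <= N <= M")
    return [1] * n + [0] * (m - n)


print(checking_sequence(13, 5))  # the sequence of Example 2.1
```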
Each value in the checking sequence represents the checking activity at a PE during one
computation cycle. The entire sequence represents one M-computation cycle (M-cycle) period.
The N checked computation cycles are represented by the N consecutive "1"s in $CS_{M,N}$. The
M - N unchecked computation cycles are represented by the M - N subsequent "0"s in $CS_{M,N}$.
The checking activity at a PE over time may be represented as a cyclic reading of the checking
sequence array. Note that the definition of the checking sequence gives but one possible way to
perform N checks in M cycles; there are a maximum of $\binom{M}{N}$ different ways to perform
N-out-of-M checking (some combinations are simply shifts of other patterns). In the remainder of this
thesis, only checking sequences as defined above are considered; a value from a $CS_{M,N}$ array
will represent the checking activity at a particular PE at a particular computation cycle. Figure
2.1 shows a portion of the activity at a processor using PACED with M = 5 and N = 2. When
N/M is small, less performance degradation can usually be expected, but small N/M also reduces
the probability of error detection.
When PACED is applied to the constituent PEs of a processor array, M and N may in general
vary at each PE in the array. A third parameter, the PE checking offset O, determines the
initialization of each PE's first M-cycle period; O is an offset into the checking sequence $CS_{M,N}$.
By varying O at each PE in an array, checking is performed at different times at different PEs.
Snapshots of the checking activity in the array then reveal patterns of checking.
Figure 2.1. PACED parameters M and N. (Timeline of computation cycles over a period of M cycles, with cycles labeled "task & check" or "task".)
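The cyclic reading of the checking sequence with a per-PE offset can be sketched as follows. This is an illustration under a stated assumption: the offset O is taken to be the index into $CS_{M,N}$ at cycle 0, advancing by one each computation cycle. The text does not spell out this convention, and the name `is_checked` is hypothetical.

```python
def is_checked(cycle, M, N, offset=0):
    """True if a PE applies CED during `cycle`.

    Assumption (not stated explicitly in the text): the offset O is the index
    into CS_{M,N} at cycle 0, and the index advances by one each cycle.
    """
    r = (offset + cycle) % M   # Python's % is nonnegative for M > 0
    return r < N               # CS_{M,N}[r] = 1 exactly when r <= N - 1

# Two PEs with the same M and N but different offsets check at different times.
for offset in (0, 3):
    row = "".join("x" if is_checked(c, 5, 2, offset) else "-" for c in range(10))
    print(offset, row)
```

With M = 5 and N = 2, each PE checks during two of every five cycles; only the phase differs between the two rows.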
EXAMPLE 2.2: Given a U×V two-dimensional processor array using PACED, let the slope
of the checking pattern be given by RISE/RUN and let the checking pattern be set by $O_{i,j}$ =
$(M_{i,j} + i + j - (U - 1 - i)\,RUN - (V - 1 - j)\,RISE) \bmod M_{i,j}$ at each $PE_{i,j}$.¹ Figure 2.2 shows
snapshots of a 10×10 mesh-connected array using PACED with $M_{i,j}$ = 12, $N_{i,j}$ = 4, and
RISE/RUN = 1/3, where each snapshot shows the checking activity in the array during one
computation cycle. This checking pattern sets up waves of checking which advance upstream
through the array, "catching," in effect, errors propagating downstream. □
The parameters described in this chapter are but one possible way to define PACED. As 
noted above, there are a maximum of $\binom{M}{N}$ different ways to perform N-out-of-M checking; this
thesis only considers N consecutive checked cycles followed by M - N unchecked cycles. A 
variation of PACED could be designed for arrays running algorithms with inherent idle cycles, 
so that CED would only be performed during PE idle times. Although the arrival of the idle 
cycles may be periodic, instead of a strict N-out-of-M schedule they may follow a more compli-
cated pattern involving several different N and M values that change value in a periodic manner. 
1 Here and in the remainder of this thesis, the binary mod function is assumed to return a positive integer. To 
ensure this condition, multiples of the modulus can be added until the result is nonnegative while still less than the 
modulus. 
            j                          j                          j
 i  0 1 2 3 4 5 6 7 8 9     i  0 1 2 3 4 5 6 7 8 9     i  0 1 2 3 4 5 6 7 8 9
 0  x x x x - - - - - -     0  x x x - - - - - - -     0  x x - - - - - - - -
 1  x - - - - - - - - x     1  - - - - - - - - x x     1  - - - - - - - x x x
 2  - - - - - - x x x x     2  - - - - - x x x x -     2  - - - - x x x x - -
 3  - - - x x x x - - -     3  - - x x x x - - - -     3  - x x x x - - - - -
 4  x x x x - - - - - -     4  x x x - - - - - - -     4  x x - - - - - - - -
 5  - - - - - - - - - x     5  - - - - - - - - x x     5  - - - - - - - x x x
 6  - - - - - - x x x x     6  - - - - - x x x x -     6  - - - - x x x x - -
 7  - - - x x x x - - -     7  - - x x x x - - - -     7  - x x x x - - - - -
 8  x x x x - - - - - -     8  x x x - - - - - - -     8  x x - - - - - - - -
 9  x - - - - - - - - x     9  - - - - - - - - x x     9  - - - - - - - x x x
    computation cycle c        computation cycle c + 1    computation cycle c + 2
Figure 2.2. PACED in a 10×10 mesh-connected array.
In another variation, M and N values could be dynamic, assuming different values according
to the particular application under execution, time of day, system workload fluctuations, or
even the presence of detected errors. For example, normal PACED could prevail until an error
is detected, to which the system might respond by increasing N or setting N = M for some
predetermined length of time. If no other errors are detected in this interval, then normal PACED
would be resumed. This scheme could give assurance that an error arrival process has become
inactive.
Finally, another variation might allow each PE to perform CED at its discretion, based on
conditions such as individual workload or input data, thereby having no fixed values of M and N
at all. This method of applying PACED could have potentially greater savings in performance
costs than the type of PACED considered in this thesis, especially if CED can be scheduled to
occur during idle cycles. If errors produced at PEs that are not checking can be propagated
through the array, error coverage could still be very high. The use of these PACED variations,
though not considered in this thesis, certainly merits further investigation.
CHAPTER 3. 
PACED IN A SINGLE PROCESSOR 
In this chapter, an analysis of PACED applied to a single processor will determine the con-
fidence to place on that processor's outputs upon error detection. Because a processor using 
PACED does not perform CED continuously and because it is possible that the CED method
employed does not have perfect detection (cannot detect all possible errors), there is a probabil-
ity that some outputs produced prior and subsequent to an error indication may be erroneous. In 
some applications, e.g., image edge detection and image smoothing, a small number of errors 
may be tolerable. In other applications, however, high confidence in array outputs may be 
desired. For these cases, when an error is detected, it is important to know what confidence to 
place on outputs: which outputs to trust, and with what probability, and which outputs to suspect 
as possibly incorrect. Following the confidence analysis, the error coverage that can be 
expected when using PACED in a single processor will be investigated. 
3.1. Error Arrival Model 
Faults are generally characterized as one of three types: transient, intermittent, or perma-
nent. Much work has been done in modeling the behavior of intermittent faults [24-27]. 
Because the primary interest of this study lies in the correctness of outputs, this thesis concen-
trates on errors; no assumptions are made concerning either the types or the distributions of 
faults that cause the errors. It has been shown that errors often arrive in clusters or 
bursts [28,29], perhaps caused by "incomplete fixes" in which repairs after an error detection
insufficiently address the cause of the error, or error propagation, which can cause additional
errors to appear after an initial detection. Thus, it is assumed that errors arrive in clusters (of
one or more errors), that error clusters follow a Poisson arrival process with a constant mean
arrival rate, and that the errors within clusters themselves follow a Poisson distribution. The
following examples demonstrate that errors may arrive either clustered or singly. These examples
confirm that the Poisson distribution serves as a good approximation to the error arrival process.
EXAMPLE 3.1: A Poisson arrival process was fitted to actual error arrivals measured on one 
machine of a "VAXcluster" distributed system. The system was composed of seven machines 
and four mass storage controllers, interconnected by the Computer Interconnect (CI) bus. The
data were collected by the VAX/VMS operating system during normal operation of the machine
"Earth," from 8 December 1987 to 14 August 1988 [29]. 
The SAS procedure NLIN (nonlinear regression) [30] was used to fit a two-phase
hyperexponential function to the data for the machine "Earth" because a single exponential could not be
found to fit the data well. The density of the fitted distribution f(t) is
$$f(t) = 0.88\,(0.829\, e^{-0.829t}) + 0.12\,(0.012\, e^{-0.012t}),$$
where t is measured in minutes. Figure 3.1 shows $\Delta f(t)$ superimposed upon the histogram of
the time-between-error (TBE) data, where the bin size $\Delta$ = 5 min. Note that the ordinate axis is
shown on a log scale as the values quickly become very small. The sample mean and sample
standard deviation for the data are also given in the figure. The fit was tested using the
chi-square test and could not be rejected at the 0.28 significance level, with $r^2$ = 0.99997. The error
arrival process is thus approximated by two homogeneous Poisson processes. Approximately
88% of the errors arrive in clusters with interarrival time 1.21 min [1/(0.829 error/min)] while
approximately 12% of the arrivals signal new clusters with interarrival time 80.5 min [1/(0.012
error/min)]. □
Figure 3.1. TBE histogram and fitted pdf for VAX "Earth." (Relative frequency, log scale, vs. t in min. Mean = 8.8; Std. Dev. = 37.5; $f(t) = \alpha_1 \lambda_1 e^{-\lambda_1 t} + \alpha_2 \lambda_2 e^{-\lambda_2 t}$ with $\alpha_1$ = 0.88, $\lambda_1$ = 0.83, $\alpha_2$ = 0.12, $\lambda_2$ = 0.012; $\chi^2$ p-value = 0.28.)
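The fitted two-phase hyperexponential density is easy to evaluate numerically. The sketch below uses the rounded coefficients quoted in Example 3.1; the function names are hypothetical, and the code only evaluates the fitted curve (it does not reproduce the NLIN fit or the measured data).

```python
import math

def hyperexp_pdf(t, weights, rates):
    """Density of a hyperexponential mixture: sum_i w_i * r_i * exp(-r_i * t)."""
    return sum(w * r * math.exp(-r * t) for w, r in zip(weights, rates))

def hyperexp_cdf(t, weights, rates):
    """Distribution function of the same mixture."""
    return sum(w * (1.0 - math.exp(-r * t)) for w, r in zip(weights, rates))

# Rounded fitted values quoted in Example 3.1 (t in minutes).
WEIGHTS = (0.88, 0.12)
RATES = (0.829, 0.012)

bin_width = 5.0  # histogram bin size Delta, in minutes
for t in (0.0, 30.0, 60.0):
    # Delta * f(t) is the quantity overlaid on the TBE histogram.
    print(t, bin_width * hyperexp_pdf(t, WEIGHTS, RATES))

# Fraction of interarrival times the fitted model places within 5 minutes.
print(hyperexp_cdf(5.0, WEIGHTS, RATES))
```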
Two-phase hyperexponential density distributions have also been used to model software
error interarrivals on the VAXcluster taken as a whole and on a Tandem Cyclone
multiprocessor [31]. In the following distributions, t is measured in days.
$$f_{VAX}(t) = 0.67\,(0.20\, e^{-0.20t}) + 0.33\,(2.75\, e^{-2.75t})$$
$$f_{Cyclone}(t) = 0.87\,(0.10\, e^{-0.10t}) + 0.13\,(2.78\, e^{-2.78t})$$
These distributions also can be interpreted as modeling errors that arrive in clusters. The VAX
system has an intracluster rate of 2.75 error/day with new clusters arriving at a rate of 0.20/day;
the Cyclone has an intracluster rate of 2.78 error/day and a cluster arrival rate of 0.10/day.
EXAMPLE 3.2: Single event upsets (SEUs) in spacecraft electronics have been studied
extensively to develop techniques to estimate the rate at which such errors might occur [32].
Table 3.1 summarizes some observed SEU rates from various spacecraft. The wide range in
SEU rates can be attributed to both the dependence of the error rate on the orbital environment
and the sensitivity of the circuitry to the ionizing particles [33]. Again using NLIN, a single
exponential was fit to the data from Pioneer, collected 10 January 1979 to 16 August 1990. The
first row of Table 3.1 shows the mean arrival rate of the data. The density of the fitted
distribution f(t) is
$$f(t) = 0.039\, e^{-0.039t},$$
where t is measured in days. Figure 3.2 shows the histogram of the data overlaid with $\Delta f(t)$,
using a bin size $\Delta$ of 15.8 days. The figure shows the sample mean and sample standard
deviation for the data as well. Using the chi-square test, the fit could not be rejected at the 0.11
significance level with $r^2$ = 0.9958. This exponential distribution models the arrival of clusters
of errors which are composed of only single errors and that have mean interarrival time of 25.6
days [1/(0.039 error/day)]. □
TABLE 3.1.
OBSERVED SEU RATES.
satellite        observed SEU rate, errors/bit/day    source
Pioneer          3.3 × 10^-2                          [34]
Hughes Leasat    1.26 × 10^-4                         [32]
Hughes Leasat    2.44 × 10^-4                         [32]
Hughes Leasat    2.71 × 10^-4                         [32]
Pioneer          < 2 × 10^-5                          [35]
Unspecified      4.1 × 10^-6                          [36]
Unspecified      7.5 × 10^-6                          [37]
Unspecified      2.07 × 10^-7                         [38]
Voyager          < 2.6 × 10^-9                        [35]
3.2. Confidence Analysis 
Suppose that a processor, perhaps a constituent of a processor array, is using PACED. 
When an error is detected, the current outputs of the processor should be suspected as being pos-
sibly erroneous. Use of PACED implies that checking may not have been performed continu-
ously; this casts some doubt on both the recent and future outputs of the processor. 
In the following sections, formulae are derived for determining how much output to
suspect as possibly erroneous when an error is detected at a processor employing PACED. Given
the evanescent nature of transient and intermittent faults, it is assumed that the error arrival
process dies after a certain time; however, while active, the process is assumed to behave as a
Poisson process. Assuming an error arrival distribution like that discussed in Section 3.1 in
which errors arrive in clusters, the intracluster arrival rate is used as the parameter of the
Poisson distribution (0.829 error/min from Example 3.1). By ensuring that the time to perform M
cycles is small compared to the intercluster arrival time (80.5 min in Example 3.1), it can be
further assumed that the detection of an error is independent of whether any other error is detected.
Figure 3.2. TBE histogram and fitted pdf for Pioneer. (Relative frequency vs. t in days. Mean = 29.9; Std. Dev. = 35.8; $f(t) = 0.039\, e^{-0.039t}$; $\chi^2$ p-value = 0.11.)
With these assumptions, it is first shown that detected error arrivals also follow a Poisson
process. Let $E_t$ represent the number of error arrivals in a time interval of length t. Since it is
assumed that error arrivals follow a Poisson process, their interarrival times are
exponentially distributed. Let $D_t$ represent the number of detected error arrivals in a time interval of
length t. If the detected error interarrival time is exponentially distributed, this implies that
detected error arrivals also follow a Poisson process. The following lemma introduces the variable q, the
detection probability, which is the probability that when a particular CED technique is applied
(e.g., RESO), it will detect an error if one exists.
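LEMMA 3.1: In a single processor using PACED with $1 \le N \le M$ and where the CED
technique has detection probability $q \le 1$, detected error interarrival times are exponentially
distributed (that is, detected error arrivals follow a Poisson process).
PROOF:
$$\begin{aligned}
\Pr\{D_t = k\} &= \sum_{n=k}^{\infty} \Pr\{E_t = n\} \binom{n}{k} \left(q\frac{N}{M}\right)^{k} \left(1 - q\frac{N}{M}\right)^{n-k} \\
&= \left(\frac{q\frac{N}{M}}{1 - q\frac{N}{M}}\right)^{k} \sum_{n=k}^{\infty} \frac{(\lambda t)^{n}}{n!}\, e^{-\lambda t}\, \frac{n!}{k!\,(n-k)!} \left(1 - q\frac{N}{M}\right)^{n} \\
&= \frac{\left(\lambda q \frac{N}{M} t\right)^{k}}{k!}\, e^{-\lambda q \frac{N}{M} t}
\end{aligned}$$
This is a Poisson distribution, with modified error arrival rate $\lambda' = \lambda q N/M$. □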
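The identity in the proof of Lemma 3.1 can be checked numerically by truncating the infinite sum. This is a sanity check under illustrative parameter values (q = 1, N = 5, M = 10, t = 10 min are assumptions chosen for the example), not part of the original derivation; the helper names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Pr{X = k} for a Poisson random variable with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def detected_pmf_by_sum(k, lam, t, p, n_max=150):
    """Sum over n of Pr{E_t = n} * C(n, k) * p^k * (1 - p)^(n - k), truncated at n_max."""
    return sum(
        poisson_pmf(n, lam * t) * math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for n in range(k, n_max + 1)
    )

lam, t = 0.829, 10.0   # error/min (Example 3.1) and an illustrative interval in minutes
p = 1.0 * 5 / 10       # q * N / M with the illustrative values q = 1, N = 5, M = 10
for k in range(6):
    # Left: the truncated sum. Right: Poisson pmf with rate lam * q * N / M.
    print(k, detected_pmf_by_sum(k, lam, t, p), poisson_pmf(k, lam * p * t))
```

The two columns should agree to within floating-point rounding, since the neglected tail of the sum beyond n = 150 is negligible for a mean of 8.29.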
When an error is detected at a processor running PACED, some of the previously produced 
unchecked outputs may be erroneous. Also, some of the previously produced checked outputs 
may be erroneous, if the CED technique employed does not have perfect detection (i.e., q < 1).
In addition, future outputs from the processor should also be suspected, since the detected error 
signals that a fault process is active and may be producing errors. 
The next two subsections determine, when an error is detected, the intervals of time during 
which outputs should be suspected as possibly erroneous, given the desired level of confidence 
to place on unsuspected outputs. For each detected error, two time intervals in which to suspect 
outputs are found: one interval prior, and one subsequent, to the detected error. The lengths of 
these intervals are determined using two different criteria. 1) Fault-Active Intervals: Suspect all 
output produced in time intervals in which the fault was probably active. 2) Undetected-Errors 
Intervals: Suspect all output produced in time intervals that start from the time of the current 
detected error and extend backward to include, with a desired probability, the first undetected 
error, and forward to include, with a desired probability, the last undetected error. The length of 
the time intervals found using the first criterion is called K and is derived in the following sub-
section; that found using the second criterion is called L and is derived in Section 3.2.2. 
3.2.1. Fault-active intervals 
Let K be the length of a time interval such that the probability that a detected error arrives 
within K is greater than some desired value C, where C, the confidence, is set arbitrarily close to 
1. When an error is detected, if no other errors were detected within a time interval of length K 
prior to the detection, outputs produced earlier than K units of time before the time of the 
detected error can be trusted with confidence C (i.e., are correct with probability C): had the
fault that caused the detected error been active K units of time previously, another detected error 
should have been observed, with probability C. With no other detected errors observed, the 
fault was probably (with probability C) inactive and outputs produced before that time are cor-
rect (can be trusted) with probability C. All outputs produced within K time units of the time of 
the detected error should be suspected as possibly erroneous: the outputs may be used, but the 
user should be aware that some of this suspected output may be incorrect.
In addition, outputs produced in the time interval of length K after the time of the detected 
error should also be suspected. If no other errors are detected in a time interval of length K after 
the time of the detection, then outputs produced later than K units of time after the time of the 
detected error can be trusted (are correct) with confidence C. Figure 3.3 illustrates the time 
intervals of length K. Theorem 3.1 gives an expression for K in terms of C, the PACED parame-
ters, and the parameters of the detected-error arrival process. 
Figure 3.3. Outputs to suspect in fault-active intervals of length K. (Timeline: trust output; suspect output for an interval of length K before the detected error; suspect output for an interval of length K after it; trust output.)
THEOREM 3.1: Let a processor use PACED where $1 \le N \le M$ and the CED technique has
detection probability $q \le 1$. Upon error detection, outputs produced prior to a time interval of
length K before the time of the detected error, or after an interval of length K subsequent to the
time of the detected error, can be trusted with confidence C. The length K satisfies
$$K \ge -\frac{M}{N}\,\frac{1}{\lambda q}\,\ln(1 - C).$$
Outputs produced within K time units before or after the time of the detected error should be
suspected as possibly erroneous, since the fault was active with probability C in those intervals.
PROOF: Let D represent the detected error interarrival time. Since detected errors follow a
Poisson process with parameter $\lambda q N/M$, $\Pr\{D > t\} = e^{-\lambda q \frac{N}{M} t}$ and $\Pr\{D \le t\} = 1 - e^{-\lambda q \frac{N}{M} t}$.
Let K be the length of a time interval such that $\Pr\{D \le K\} \ge C$. Then,
$$K \ge -\frac{M}{N}\,\frac{1}{\lambda q}\,\ln(1 - C).$$
□
This expression for K can be used to predict, at error detection time, how much of both the 
most recent outputs and the subsequent future outputs to suspect as possibly erroneous. Con-
versely, given the length of time between the initial detected error and any preceding (or subse-
quent) detection, the level of confidence to place on outputs produced before (or after) that time 
interval may be determined, using Theorem 3.1.
For multiple detections, time intervals of length K are simply taken about each detection 
with no special significance attached to overlaps. Hence, if a second error detection occurs 
within K of a first, then the following outputs should be suspected: those produced within K 
before the first detection, those produced between the two detections and those produced within 
K after the second detection. 
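The rule for multiple detections amounts to taking the union of the windows of length K around each detection. A minimal sketch follows; the function name and the detection times are hypothetical, chosen only for illustration.

```python
def suspect_intervals(detection_times, K):
    """Merge the windows [t - K, t + K] around each detection into disjoint intervals."""
    merged = []
    for t in sorted(detection_times):
        lo, hi = t - K, t + K
        if merged and lo <= merged[-1][1]:
            # Overlapping windows are fused; overlaps carry no special significance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

# Hypothetical detection times in minutes: the first two fall within K of each other,
# so the outputs between them are suspected along with K before and K after.
print(suspect_intervals([100.0, 105.0, 200.0], 10.0))
```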
EXAMPLE 3.3: Let q = 1, N/M = 0.5, and C = 0.99. Using $\lambda$ = 0.829 error/min from
Example 3.1, if an error is detected, then by Theorem 3.1, outputs generated earlier than 11.1 min
prior to the detected error, or later than 11.1 min after the detected error, can be trusted with a
confidence of 0.99, provided no other errors were detected in those time intervals. All outputs
produced less than 11.1 min before or less than 11.1 min after the detected error should be
suspected as possibly erroneous. □
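The bound of Theorem 3.1 can be evaluated directly. The sketch below takes the bound with equality to obtain the smallest admissible K; the function name is hypothetical, and N = 5, M = 10 is one choice of parameters giving N/M = 0.5.

```python
import math

def fault_active_interval(C, lam, q, N, M):
    """Smallest K allowed by Theorem 3.1: K = -(M / N) * ln(1 - C) / (lam * q)."""
    if not (0.0 < C < 1.0 and lam > 0.0 and 0.0 < q <= 1.0 and 1 <= N <= M):
        raise ValueError("need 0 < C < 1, lam > 0, 0 < q <= 1, 1 <= N <= M")
    return -(M / N) * math.log(1.0 - C) / (lam * q)

K = fault_active_interval(C=0.99, lam=0.829, q=1.0, N=5, M=10)
print(f"K = {K:.1f} min")   # agrees with the 11.1 min quoted in Example 3.3
```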
Figure 3.4(a) shows how the confidence is affected by the error arrival rate $\lambda$ and the time
interval length K, given a constant CED detection probability q = 1 and a constant amount of
checking N/M = 0.5. The confidence varies from 0 to 1 on the z (vertical) axis; K varies on the
x-axis (increasing to the right) and $\lambda$ varies on the y-axis (increasing into the page). Figure
3.4(b) shows a zoom of Figure 3.4(a), focusing on confidences greater than 0.95 (the grid on the
floor of the plot is not part of the function). As can be seen, K has to be larger if the error arrival
rate $\lambda$ is smaller, to achieve a given level of confidence. The assumption that errors arrive in
clusters allows small values of K to reach high confidence levels: when $\lambda \ge 0.5$ error/min,
confidences greater than 0.95 can be achieved with K > 12 min when the checking ratio N/M is just
0.5.
Figure 3.4(a). Fault-active intervals, C vs. $\lambda$ ($0 \le C \le 1$). (Panel: N=5, M=10, q=1; axes: Conf, lambda.)
Figure 3.4(b). Fault-active intervals, C vs. $\lambda$ ($C \ge 0.95$). (Panel: N=5, M=10, q=1; axes: Conf, lambda.)
Figure 3.5 shows how the confidence is affected by varying amounts of checking, given a
constant error arrival rate $\lambda$ = 0.829 error/min and a constant CED detection probability q = 1.
As on the previous plot, confidence is shown on the z-axis; here, however, K increases into the
page (though still on the x-axis) and N increases to the left on the y-axis. Given a time interval
K of just 12 min, a checking ratio value N/M greater than 0.3 will suffice to give greater than
0.95 confidence in outputs. Figure 3.5(b), a zoom of Figure 3.5(a) for $C \ge 0.95$, shows this
clearly. This is an encouraging result as it allows designers utilizing PACED in a processor to
use less than continuous checking (the goal of PACED, after all) and still achieve high
confidence in outputs produced near a detected error.
Figure 3.6 shows how the confidence is affected by the CED detection probability q, given
a constant error arrival rate $\lambda$ = 0.829 error/min and a constant checking ratio N/M = 1. The
axes are similar to Figure 3.5 except q replaces N on the y-axis. If K > 12 min is used,
confidences over 0.95 can be achieved for any $q \ge 0.3$ (Figure 3.6(b)). This, too, is an encouraging
result as it may be difficult in practice to estimate q accurately. This result shows that the
precise value of q is not critical, for large enough K.
The similarity of Figures 3.5(a) and (b) to Figures 3.6(a) and (b), respectively, is not
coincidental. From Theorem 3.1, $C \le 1 - e^{-\lambda q \frac{N}{M} K}$. Holding q constant at 1 while varying N/M from
0 to 1 (Figures 3.5(a), (b)) is equivalent to holding N/M constant at 1 while varying q from 0 to
1 (Figures 3.6(a), (b)). Thus, Figures 3.5 and 3.6 show identical plots but were both included
and shown from different viewpoints to simplify the exposition.
Figure 3.5(a). Fault-active intervals, C vs. N ($0 \le C \le 1$). (Panel: $\lambda$ = 0.829 error/min, M=10, q=1; axes: Conf, K.)
Figure 3.6(a). Fault-active intervals, C vs. q ($0 \le C \le 1$). (Panel: $\lambda$ = 0.829 error/min, N=10, M=10; axis: Conf.)
Figure 3.6(b). Fault-active intervals, C vs. q ($C \ge 0.95$). (Panel: $\lambda$ = 0.829 error/min, N=10, M=10; axis: Conf.)
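Inverting Theorem 3.1 gives the confidence attainable for a given K, and makes the symmetry between q and N/M explicit. A minimal sketch, with a hypothetical function name and `ratio` standing for N/M:

```python
import math

def confidence_from_K(K, lam, q, ratio):
    """Upper bound on C from Theorem 3.1: 1 - exp(-lam * q * (N/M) * K)."""
    return 1.0 - math.exp(-lam * q * ratio * K)

lam, K = 0.829, 12.0   # error/min, minutes
# Only the product q * N/M enters, so swapping the two values changes nothing.
print(confidence_from_K(K, lam, q=1.0, ratio=0.3))
print(confidence_from_K(K, lam, q=0.3, ratio=1.0))
# The case lambda = 0.5 error/min, N/M = 0.5, K = 12 min discussed with Figure 3.4.
print(confidence_from_K(12.0, 0.5, q=1.0, ratio=0.5))
```

At exactly K = 12 min the first two values sit right at the 0.95 threshold, which is why the text asks for K greater than 12 min.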
When using the time intervals of length K to determine the confidence to place on outputs, 
it is assumed that if no other detected errors are found in the intervals, then with a certain proba-
bility the fault has become inactive. By using the times between the detected error and the first 
undetected error (looking backward) or the last undetected error (looking forward), time inter-
vals of length L < K can be obtained and fewer outputs need be suspected as possibly erroneous. 
The next subsection derives L using two different approaches: one to determine L looking back-
ward in time from a detected error and the second to determine L looking forward in time from a 
detected error. 
3.2.2. Undetected-errors intervals 
To begin, it is shown that undetected errors, like detected errors, arrive following a Poisson 
distribution. Let $E_t$ represent the number of error arrivals in a time interval of length t and $U_t$
represent the number of undetected error arrivals in a time interval of length t. The proof of the
following lemma is substantially similar to that of Lemma 3.1. 
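LEMMA 3.2: In a processor using PACED with $1 \le N \le M$ and where the CED technique
has detection probability $q \le 1$, undetected error arrivals are Poisson distributed.
PROOF:
$$\begin{aligned}
\Pr\{U_t = k\} &= \sum_{n=k}^{\infty} \Pr\{E_t = n\} \binom{n}{k} \left(1 - q\frac{N}{M}\right)^{k} \left(q\frac{N}{M}\right)^{n-k} \\
&= \left(\frac{1 - q\frac{N}{M}}{q\frac{N}{M}}\right)^{k} \sum_{n=k}^{\infty} \frac{(\lambda t)^{n}}{n!}\, e^{-\lambda t}\, \frac{n!}{k!\,(n-k)!} \left(q\frac{N}{M}\right)^{n} \\
&= \frac{\left(\lambda \left(1 - q\frac{N}{M}\right) t\right)^{k}}{k!}\, e^{-\lambda \left(1 - q\frac{N}{M}\right) t}
\end{aligned}$$
This is a Poisson distribution, with modified error arrival rate $\lambda'' = \lambda(1 - qN/M)$. □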
Lemma 3.3 establishes that the detected and undetected error Poisson processes are
independent. This result will be used in Theorem 3.2 to form a joint pdf.
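LEMMA 3.3:
$$\Pr\{U_t = k \,\&\, D_t = l\} = \Pr\{U_t = k\} \cdot \Pr\{D_t = l\}$$
PROOF:
$$\begin{aligned}
\frac{\Pr\{U_t = k \,\&\, D_t = l\}}{\Pr\{D_t = l\}}
&= \frac{\Pr\{E_t = k + l\} \binom{k+l}{k} \left(1 - q\frac{N}{M}\right)^{k} \left(q\frac{N}{M}\right)^{l}}{\dfrac{\left(\lambda q \frac{N}{M} t\right)^{l}}{l!}\, e^{-\lambda q \frac{N}{M} t}} \\
&= \frac{\dfrac{(\lambda t)^{k+l}}{(k+l)!}\, e^{-\lambda t} \binom{k+l}{k} \left(1 - q\frac{N}{M}\right)^{k} \left(q\frac{N}{M}\right)^{l}}{\dfrac{\left(\lambda q \frac{N}{M} t\right)^{l}}{l!}\, e^{-\lambda q \frac{N}{M} t}} \\
&= \frac{\left(\lambda \left(1 - q\frac{N}{M}\right) t\right)^{k}}{k!}\, e^{-\lambda \left(1 - q\frac{N}{M}\right) t} \\
&= \Pr\{U_t = k\}
\end{aligned}$$
Thus,
$$\Pr\{U_t = k \,\&\, D_t = l\} = \Pr\{U_t = k\} \cdot \Pr\{D_t = l\}.$$
□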
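The factorization in Lemma 3.3 can also be checked numerically: the joint probability of k undetected and l detected errors should equal the product of two Poisson probabilities with rates $\lambda(1 - qN/M)$ and $\lambda qN/M$. The sketch below uses illustrative parameter values and hypothetical helper names.

```python
import math

def poisson_pmf(k, mean):
    """Pr{X = k} for a Poisson random variable with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def joint_pmf(k, l, lam, t, p):
    """Pr{U_t = k and D_t = l}: k + l errors arrive, and exactly l of them are detected."""
    return poisson_pmf(k + l, lam * t) * math.comb(k + l, k) * (1.0 - p) ** k * p ** l

lam, t, p = 0.829, 10.0, 0.5   # p stands for q * N / M; the values are illustrative
for k, l in [(0, 0), (2, 3), (5, 1)]:
    product = poisson_pmf(k, lam * (1.0 - p) * t) * poisson_pmf(l, lam * p * t)
    print((k, l), joint_pmf(k, l, lam, t, p), product)
```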
In the previous section, Theorem 3.1 determined the length K of time intervals during
which the fault was probably active. Outputs from intervals of length K backward and forward
from the time of an error detection were then suspected as possibly erroneous. The following
two theorems use a less stringent criterion: looking backward from a detected error, only those
outputs produced since the first undetected error need be suspected; or, looking forward, only
those outputs produced up to the last undetected error need be suspected. These two intervals
have length L: Theorem 3.2 determines L for the backward case and Theorem 3.3 determines L
for the forward case. As will be shown, $L \le K$. Figure 3.7 shows the relationship between the
time intervals of lengths K and L, as well as which outputs to suspect and which to trust when
using the L-length intervals.
THEOREM 3.2: Let a processor use PACED where $1 \le N \le M$ and the CED technique has
detection probability $q \le 1$. Upon error detection, outputs produced prior to a time interval of
length L before the detected error can be trusted with confidence C, where the time interval
extends backward from the time of the detected error to reach the first undetected error with
probability C. The length L satisfies
$$L \ge -\frac{M}{N}\,\frac{1}{\lambda q}\,\ln\!\left(\frac{1 - C}{1 - q\frac{N}{M}}\right).$$
Outputs produced within length of time L before the detected error should be suspected as
possibly erroneous.
Figure 3.7. Outputs to suspect in undetected-errors intervals of length L. (Timeline centered on the detected error: intervals of length K and the shorter intervals of length L extend backward toward the first undetected error and forward toward the last undetected error; output inside the L intervals is suspected and output outside them is trusted.)
PROOF: Let D and U represent the detected and undetected error interarrival times,
respectively. From Lemmas 3.1 and 3.2, both random variables are exponentially distributed with
parameters $\lambda' = \lambda q N/M$ and $\lambda'' = \lambda(1 - qN/M)$, respectively.
The quantity D - U represents the time between the first undetected error and the first
detected error. The probability Pr{D - U > t} is now determined using a joint probability
distribution.
$$\begin{aligned}
\Pr\{D - U > t\} &= \int_0^{\infty} \int_{x+t}^{\infty} \lambda'' e^{-\lambda'' x} \cdot \lambda' e^{-\lambda' y}\, dy\, dx \\
&= \int_0^{\infty} \lambda'' e^{-\lambda'' x} \cdot e^{-\lambda'(x+t)}\, dx \\
&= \frac{\lambda''}{\lambda'' + \lambda'}\, e^{-\lambda' t}
\end{aligned}$$
It follows that $\Pr\{D - U \le t\} = 1 - \frac{\lambda''}{\lambda'' + \lambda'}\, e^{-\lambda' t}$. Let L be the length of a time interval
such that $\Pr\{D - U \le L\} \ge C$. Then,
$$1 - \frac{\lambda''}{\lambda'' + \lambda'}\, e^{-\lambda' L} \ge C.$$
Because $\lambda' + \lambda'' = \lambda$, the fraction $\lambda''/(\lambda'' + \lambda')$ equals $1 - qN/M$, and solving for L gives the
bound stated in the theorem.
Hence, with confidence C, the first undetected error occurred within a time interval of
length L before the time of the detected error. Outputs produced prior to L time units before the
time of the detected error can be trusted with confidence C and outputs produced within L units
of time prior to the time of the detected error should be suspected as possibly erroneous. □
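The closed form for Pr{D - U > t} can be compared against a simple Monte Carlo estimate. This is a sketch under the proof's own assumption that D and U are independent exponential random variables; the parameter values, sample size, and seed are arbitrary choices, and the function name is hypothetical.

```python
import math
import random

def prob_gap_exceeds(t, lam_det, lam_undet, samples=200_000, seed=1):
    """Monte Carlo estimate of Pr{D - U > t} for independent exponential D and U."""
    rng = random.Random(seed)
    hits = sum(
        rng.expovariate(lam_det) - rng.expovariate(lam_undet) > t
        for _ in range(samples)
    )
    return hits / samples

lam, q, ratio = 0.829, 1.0, 0.5       # illustrative values; ratio stands for N/M
lam_det = lam * q * ratio             # lambda'  (detected errors)
lam_undet = lam * (1.0 - q * ratio)   # lambda'' (undetected errors)
t = 2.0                               # minutes
closed_form = lam_undet / (lam_undet + lam_det) * math.exp(-lam_det * t)
print(prob_gap_exceeds(t, lam_det, lam_undet), closed_form)
```

The estimate and the closed form should agree to within sampling error, which is on the order of 0.001 for this sample size.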
THEOREM 3.3: Let a processor use PACED where $1 \le N \le M$ and the CED technique has
detection probability $q \le 1$. Upon error detection, outputs produced subsequent to a time
interval of length L after the detected error can be trusted with confidence C, where the time interval
extends forward from the time of the detected error to reach the last undetected error with
probability C. The length L satisfies
$$L \ge -\frac{M}{N}\,\frac{1}{\lambda q}\,\ln\!\left(\frac{1 - C}{1 - q\frac{N}{M}}\right).$$
Outputs produced within length of time L after the detected error should be suspected as
possibly erroneous.
PROOF: Let U represent the time to the last undetected error before the next detected error,
and V, the time to the next detected error. From Lemmas 3.1 and 3.2, detected and undetected
errors are exponentially distributed with parameters $\lambda' = \lambda q N/M$ and $\lambda'' = \lambda(1 - qN/M)$,
respectively.
First, the probability is determined that the last undetected error occurs in some
infinitesimal time slice du at time u while the next detected error occurs in some infinitesimal time slice
dv at time v, where $v \ge u$. (If it were known that v < u, i.e., no undetected errors occur before
the next detected error, then none of the outputs produced between the two error detections
would have to be suspected.)
The expression below has a term for each of the following conditions: 1) no errors are 
detected in a time interval of length u starting from the time of the current error detection; 2) at 
least one error is undetected in an interval of length du; 3) no errors occur in an interval v - u; 
and 4) at least one error is detected in an interval dv. (The variable U should be defined as the 
time of the last undetected error before the fault becomes inactive, but since the distribution of 
fault lifetimes is unknown, U is predicated instead on the next error detection. In the derivation, 
then, the next detection is allowed to take place at any time slice dv from u to infinity, in effect 
allowing the fault to become inactive.) 
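$$\Pr\{U \in du,\; V \in dv\} = e^{-\lambda' u} \cdot \lambda''\, du \cdot e^{-\lambda (v - u)} \cdot \lambda'\, dv$$
The terms $1 - e^{-\lambda'' du}$ and $1 - e^{-\lambda' dv}$ have been simplified using the approximation $1 - e^{-x} =
x + o(x)$ as $x \to 0$.
Now, the probability that U is greater than some L is determined, using the joint probability
just derived.
$$\begin{aligned}
\Pr\{U \ge L\} &= \int_L^{\infty} \int_u^{\infty} \lambda' \lambda'' e^{-\lambda' u} e^{\lambda u} e^{-\lambda v}\, dv\, du \\
&= \left(1 - q\frac{N}{M}\right) e^{-\lambda' L}
\end{aligned}$$
L is determined such that $\Pr\{U \ge L\} \le 1 - C$, where C, the confidence, is set arbitrarily close
to 1.
$$\left(1 - q\frac{N}{M}\right) e^{-\lambda' L} \le 1 - C$$
Solving for L gives the bound stated in the theorem. □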
Note that Theorems 3.2 and 3.3 arrive at the same expression for L. It is attractive that the
same time interval L is used both looking backward and forward from an error detection, as in
the case using K. It is also reasonable that the interval from the first undetected error to the first
detected error should be the same length as that from the last detected error (the current
detection can be considered the "last") to the last undetected error, given that the error distributions
used in each case are the same. Also, it can be seen that the expression for L leads to smaller
values than that for K: they differ only in the natural logarithm term. Both the numerator and
denominator of this term in the expression for L are less than one. Hence, the quantity 1 - C is
increased closer to 1, reducing the absolute value of the natural logarithm of the fraction and
making L smaller than K for equal values of the other parameters.
EXAMPLE 3.4: If a single processor uses PACED with the same parameter values as in
Example 3.3, viz., q = 1, $\lambda$ = 0.829 error/min, N/M = 0.5, and C = 0.99, then when an error is
detected, using Theorem 3.2, the outputs generated prior to 9.4 min before, or subsequent to 9.4
min after, the detected error can be trusted with a confidence of 0.99, as long as no other errors
are detected in those time intervals. All outputs produced less than 9.4 min before the detected
error or less than 9.4 min after should be suspected as possibly erroneous. □
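The bound of Theorems 3.2 and 3.3 can be evaluated in the same way as the bound for K. The sketch below takes the bound with equality; the function name is hypothetical, and N = 5, M = 10 is one choice of parameters giving N/M = 0.5.

```python
import math

def undetected_errors_interval(C, lam, q, N, M):
    """Smallest L allowed by Theorems 3.2 and 3.3.

    L = -(M / N) / (lam * q) * ln((1 - C) / (1 - q * N / M)),
    which requires q * N / M < 1 so that the logarithm is defined.
    """
    p = q * N / M
    if not (0.0 < C < 1.0 and lam > 0.0 and 0.0 < p < 1.0):
        raise ValueError("need 0 < C < 1, lam > 0, 0 < q * N / M < 1")
    return -(M / N) / (lam * q) * math.log((1.0 - C) / (1.0 - p))

L = undetected_errors_interval(C=0.99, lam=0.829, q=1.0, N=5, M=10)
print(f"L = {L:.1f} min")   # agrees with the 9.4 min quoted in Example 3.4
```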
In Example 3.3, outputs produced 11.1 min before and after the detected error had to be
suspected, since Theorem 3.1 suspects all outputs generated while the fault was probably active.
By suspecting only those outputs generated since the first or before the last undetected error, a
time savings of about 15% can be realized in this case. Figure 3.8 plots the time savings
(K − L)/K of using L instead of K as a function of N/M, when q = 1, λ = 0.829 error/min, and
C = 0.99.
[Figure 3.8: plot of time savings, L vs. K (%), from 0 to 100, against N/M from 0 to 1; λ = 0.829, q = 1, C = 0.99.]
Figure 3.8. Time savings using L instead of K.
Figures 3.9-3.11 show the effect of L on the confidence. These graphs use the same axes
and scales as Figures 3.4-3.6, respectively, and can be compared therewith directly. (As with
Figures 3.5 and 3.6, Figures 3.10 and 3.11 show identical plots from different points of view.)
For each figure a zoom plot shows confidences over 0.95.
By Theorems 3.2 and 3.3, C ≤ 1 − (1 − qN/M) e^{−λq(N/M)L}. As L → 0, the confidence C
becomes bounded above by qN/M. In Figure 3.9(a), qN/M = 0.5, so the plot never falls below
0.5; in Figures 3.10(a) and 3.11(a), qN/M varies from 0 to 1 and bounds C when L is close to 0.
All the figures show that for a given set of parameter values, a desired confidence level can
be achieved with a value of L smaller than the necessary value of K. Figure 3.10 shows that the
confidence, as in the case for K, is relatively insensitive to the checking ratio N/M, given L > 10
min; likewise, Figure 3.11 shows that for large enough L (> 10 min) the confidence is relatively
unaffected by q. These results again indicate that high confidence can be achieved without the
need for either precise values of q or high checking ratios. Yet marked improvement on the
amounts of output to suspect upon error detection will be made in the following two chapters, in
which PACED is applied to processor array architectures. It will become evident that through
cooperation among the constituent PEs, the amounts of output to suspect can be significantly
reduced.

[Figure 3.9(a): plot of Conf vs. lambda; N = 5, M = 10, q = 1.]
Figure 3.9(a). Undetected-errors intervals, C vs. λ (0 ≤ C ≤ 1).
[Figure 3.9(b): plot of Conf vs. lambda; N = 5, M = 10, q = 1.]
Figure 3.9(b). Undetected-errors intervals, C vs. λ (C ≥ 0.95).
[Figure 3.10(a): plot of Conf vs. L (0 to 30) and N; λ = 0.829 error/min, M = 10, q = 1.]
Figure 3.10(a). Undetected-errors intervals, C vs. N (0 ≤ C ≤ 1).
[Figure 3.10(b): plot of Conf vs. L (0 to 30) and N; λ = 0.829 error/min, M = 10, q = 1.]
Figure 3.10(b). Undetected-errors intervals, C vs. N (C ≥ 0.95).
[Figure 3.11(a): plot of Conf vs. q; λ = 0.829 error/min, N = 10, M = 10.]
Figure 3.11(a). Undetected-errors intervals, C vs. q (0 ≤ C ≤ 1).
[Figure 3.11(b): plot of Conf vs. q; λ = 0.829 error/min, N = 10, M = 10.]
Figure 3.11(b). Undetected-errors intervals, C vs. q (C ≥ 0.95).
3.3. Error Coverage 
The error coverage of the PACED technique is the probability that if an error occurs then it
will be detected. In a single processor using PACED, the error coverage can be estimated as
qN/M: the processor performs checks N/M of the time, and each check has detection probability
q. Even with perfect detection (q = 1), it is clear that low values of the checking ratio N/M
would have low error coverage.
Consider, however, the undetected-errors intervals of length L calculated in the previous 
section. When an error is detected, the backward interval will, in effect, "detect" all the unde-
tected errors in that interval, by casting them under suspicion; the forward interval would simi-
larly "detect" any future undetected errors. Hence, an error can only escape "detection" if no 
other errors are actually detected in the time intervals of length L before and after it. 
The following theorem determines an expression for the estimated error coverage of a single
processor using PACED. It first finds the probability that an error goes undetected; this
probability depends on the average number of errors in an interval of length L. As an
approximation, there will be an average of L/μ errors in an L-length interval, where μ is the mean
interarrival time and L is given by Theorem 3.2 or 3.3. If error arrivals are modeled by a two-phase
hyperexponential distribution (as in Example 3.1) of the form f(t) = α λ_1 e^{−λ_1 t} +
(1 − α) λ_2 e^{−λ_2 t}, the mean interarrival time μ can be found as follows.

$$\mu = \int_0^\infty t\, f(t)\, dt = \int_0^\infty t \left( \alpha \lambda_1 e^{-\lambda_1 t} + (1 - \alpha) \lambda_2 e^{-\lambda_2 t} \right) dt = \frac{\alpha}{\lambda_1} + \frac{1 - \alpha}{\lambda_2}$$

If error arrivals are modeled by a simple exponential distribution of the form f(t) = λ e^{−λt}, the
mean interarrival time is μ = 1/λ.
THEOREM 3.4: Let a single processor use PACED where 1 ≤ N ≤ M and the CED technique
has detection probability q ≤ 1. The estimated error coverage of the processor is given by

$$1 - \left(1 - q\frac{N}{M}\right)^{\frac{2L}{\mu} + 1}$$
PROOF: The probability that an error goes undetected is the probability that the error itself
is not detected, and that no other errors are detected in the time intervals of length L before and
after the error. The following expression for the probability of an undetected error has terms for
each of the following conditions: 1) no errors are detected in a time interval of length L prior to
an undetected error; 2) the error is itself undetected; and 3) no errors are detected in an interval
of length L after an undetected error.

$$\Pr\{\text{error undetected}\} = \left(1 - q\frac{N}{M}\right)^{\frac{L}{\mu}} \left(1 - q\frac{N}{M}\right) \left(1 - q\frac{N}{M}\right)^{\frac{L}{\mu}} = \left(1 - q\frac{N}{M}\right)^{\frac{2L}{\mu} + 1}$$

If an error is detected in either of the two L-length intervals, then when the two L-length
intervals are taken around the error detection according to Theorems 3.2 and 3.3, the error in
question will be "detected" in the sense that the outputs it could have corrupted will be
suspected as possibly erroneous. Since Pr{error detected} = 1 − Pr{error undetected}, then

$$\text{estimated error coverage} = 1 - \left(1 - q\frac{N}{M}\right)^{\frac{2L}{\mu} + 1}$$ □
EXAMPLE 3.5: For a single processor using PACED, let the CED technique have perfect
detection (q = 1), N/M = 0.1, C = 0.99, and the error interarrival time be modeled by the
two-phase hyperexponential distribution given in Example 3.1, viz., f(t) = 0.88(0.829 e^{−0.829t}) +
0.12(0.012 e^{−0.012t}). This gives μ = 11.1 min. From Theorems 3.2 and 3.3, L = 86.9 min, so the
expected number of error arrivals in time L is L/μ = 7.8. The estimated error coverage is then
1 − (1 − 0.5)^{16.6} = 0.99993. Hence, with only 10% checking, an estimated error coverage
greater than 99% can be achieved.
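The quantities in Theorem 3.4 can be evaluated with a short script. This is a sketch under two assumptions that the text above does not state explicitly: L is taken from the bound C ≤ 1 − (1 − qN/M) e^{−λq(N/M)L}, and for the hyperexponential case the rate λ in that bound is replaced by 1/μ. All function names are illustrative.

```python
import math

def mean_interarrival(alpha, lam1, lam2):
    """Mean of the two-phase hyperexponential: alpha/lam1 + (1 - alpha)/lam2."""
    return alpha / lam1 + (1.0 - alpha) / lam2

def interval_length(mu, q, ratio, conf):
    """L from the confidence bound, with the arrival rate assumed to be 1/mu."""
    miss = 1.0 - q * ratio
    if miss <= 1.0 - conf:
        return 0.0
    return -math.log((1.0 - conf) / miss) * mu / (q * ratio)

def estimated_coverage(mu, q, ratio, conf):
    """Theorem 3.4: 1 - (1 - q*N/M) ** (2L/mu + 1)."""
    L = interval_length(mu, q, ratio, conf)
    return 1.0 - (1.0 - q * ratio) ** (2.0 * L / mu + 1.0)

mu = mean_interarrival(0.88, 0.829, 0.012)
for ratio in (0.1, 0.5):
    L = interval_length(mu, 1.0, ratio, 0.99)
    cov = estimated_coverage(mu, 1.0, ratio, 0.99)
    print(f"N/M = {ratio}: L/mu = {L / mu:.1f}, coverage = {cov:.5f}")
```

With these assumptions μ comes out at about 11.1 min, the ratio L/μ is about 7.8 for N/M = 0.5, and the coverage for N/M = 0.1 is about 0.99993. Note that 2L/μ does not depend on μ under the 1/μ assumption, so the coverage is a function of q, N/M, and C only.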
Figure 3.12 plots the estimated error coverage for the above error arrival distribution as a
function of N/M when q = 1, M = 10, and C = 0.99. It can be seen that the coverage is very
high: over 99.99% for all values of N/M ≥ 0.1. This makes sense for values of N/M close to 1,
since an error is more likely to be detected when more checking is performed. When N/M is
small, high coverage is still obtained because the length L of the undetected-errors interval is
very long. Thus, many error arrivals would be expected to occur in a time interval of length 2L.
For any given error, then, it would be quite likely that at least one error in an interval of length
2L would be detected, leading to the "detection" of the error by casting suspicion on outputs
produced at the time of the error.
[Figure 3.12: plot of estimated error coverage (%), from 99.990 to 100.000, against N/M (M = 10) from 0 to 1.]
Figure 3.12. Single processor estimated error coverage. q = 1, μ = 11.1 min/err. □
CHAPTER 4. 
PACED IN A LINEAR ARRAY 
This chapter considers a unidirectional linear processor array composed of V linearly
connected PEs (Figure 4.1). Inputs enter at the top and left; outputs are produced at the bottom and
right. Data flow only from left to right and from top to bottom. Such arrays have been used to
implement algorithms such as FFT processing [39], matrix computations [40], and image edge
detection [41]. For two PEs in the array, PE_i and PE_j, if i < j, then PE_i is upstream of PE_j and
PE_j is downstream from PE_i.
When PACED is used in this array, checking patterns can be designed so that PEs check
the unchecked computations of upstream PEs. Each PE_i in the array may have its own separate
values of M and N: M_i and N_i. The offset parameter O_i, introduced in Chapter 2, determines
the pattern of checking that appears in the array. It is implemented as an offset into a PE_i's
CS_{M,N} array and governs at what point in its M_i-cycle checking sequence to begin. With
PACED applied to the linear array, this chapter will study the confidence to place on array
outputs upon error detection, the error coverage, and the performance of the array.

[Figure 4.1: PEs 0, 1, 2, ..., V − 1 connected in a line.]
Figure 4.1. A V-PE unidirectional linear processor array.
Having investigated the confidence in a single processor's outputs under PACED, the
confidence in the outputs from a unidirectional linear processor array using PACED upon error
detection will be examined first. The confidence analysis is based on three assumptions. 1) All
communication channels in the array are fault-free. 2) If an erroneous array output is produced
by a PE, an erroneous propagating output will also be produced and sent downstream (e.g., by
using the AN-code [42]; see Section 4.5.1). 3) PEs are code-disjoint: use of erroneous inputs or
state values causes erroneous PE outputs to propagate.
Assumption 2) ensures that no erroneous array outputs can be produced without the possi-
bility that a downstream PE will detect a propagated error. To ensure that errors are propagated, 
each PE may produce an additional propagating check output, generated from all of its array 
outputs using some code-preserving operation (e.g., the sum of all its array outputs if the AN-
code is used). This additional check output can be piggy-backed onto an existing data message 
to avoid increasing the message traffic. Any downstream PE that is checking will check this
output as well, and then clear it; any downstream PE that is not checking will simply include the 
output when calculating its own check output to send further downstream. Since errors are of 
interest only if they affect PE outputs, Assumption 3) ensures that detecting errors at PE outputs 
will catch input and state errors as well. 
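The propagating check output can be illustrated with a small sketch. It assumes an AN-code with a hypothetical base A = 7 (of the form 2^c − 1 with c = 3) and uses the sum of a PE's array outputs as the code-preserving operation, as suggested above. The function names, the base, and the choice to reset the outgoing check to zero after a checked cycle are illustrative assumptions, not the thesis implementation.

```python
A = 7  # hypothetical AN-code base of the form 2**c - 1 (here c = 3)

def encode(x):
    """AN-encode an integer operand."""
    return A * x

def is_codeword(value):
    """A valid AN-coded value is 0 modulo A."""
    return value % A == 0

def forward_check(incoming_check, own_outputs, checking):
    """Return (error_detected, outgoing_check) for one PE in one cycle."""
    if checking:
        # A checking PE checks the propagated check output and its own outputs, then clears it.
        detected = not is_codeword(incoming_check) or not all(map(is_codeword, own_outputs))
        return detected, 0
    # A non-checking PE folds the incoming check output into its own check output.
    return False, incoming_check + sum(own_outputs)

# PE_0 produces one corrupted output while it is not checking.
_, chk = forward_check(0, [encode(3), encode(10) + 1], checking=False)
# PE_1 is not checking either and passes the check output along.
_, chk = forward_check(chk, [encode(5)], checking=False)
# PE_2 is checking and catches the propagated error.
detected, chk = forward_check(chk, [encode(2)], checking=True)
print(detected, chk)
```

An error whose magnitude is itself a multiple of A would leave the sum a valid codeword and would not be caught by this check.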
Two time intervals are determined in which to suspect linear array outputs upon error
detection. The analysis begins with a discussion of the error detection latency in the array and
the error propagation distance. This distance is used in the determination of the backward time
interval. After the forward time interval is determined, the error coverage in the linear array is
examined, followed by a study of the performance of the linear array using both simulations and
experiments.
4.1. Error Detection Latency 
If an error occurs, of interest is the error detection latency in the array. The error detection
latency, L, is the number of computation cycles through which an erroneous output is propagated
until it is detected. The maximal value of L is denoted by L_max.
Lemma 4.1 determines the detection latency for errors created at any given PE_i for each of
its M_i − N_i unchecked cycles in one M_i-cycle period. Here and in the remainder of this chapter,
the CED scheme in all PEs is assumed to have perfect detection (i.e., q = 1) and the checking
pattern is assumed to be set by O_i = (N_i · i) mod M_i. This choice of checking pattern has been
shown to minimize L_max in linear arrays [19].
LEMMA 4.1: Given a V-PE unidirectional linear processor array using PACED with perfect
detection (q = 1), let M_i = M, N_i = N, and 1 ≤ N ≤ M. Using O_i = (Ni) mod M, the detection
latency of an error created in the unchecked cycle r at PE_i, L_r, is ⌈(M − r)/N⌉, where N ≤ r ≤
M − 1 and i < V − L_max. The maximum error detection latency in the array, L_max, is
⌈(M − N)/N⌉, for all PE_i such that i < V − L_max.
PROOF: By design of the checking pattern, if CS_{M,N}[r] is the checking activity at PE_i in
some computation cycle c, then CS_{M,N}[(r + y(N − 1) + z) mod M] is the checking activity at
PE_{i+y} in cycle c + z. With perfect detection, errors only propagate through unchecked cycles, so
the proof only considers N ≤ r ≤ M − 1.
If an error occurs at PE_i during its Nth cycle, it will go undetected: this cycle is unchecked
(CS_{M,N}[N] = 0). In the next cycle, the error will propagate to PE_{i+1} and be detected if
CS_{M,N}[(2N) mod M] = 1 (i.e., if PE_{i+1} is checking). If CS_{M,N}[(2N) mod M] = 0, then the error
will propagate to PE_{i+2} in the next cycle, where it will be detected if CS_{M,N}[(3N) mod M] = 1,
and so on.
The latency of detection of this error, L_N, is the number of computation cycles required for
the error to reach a checked cycle. In terms of the checking sequence, L_N is the smallest integer
number of N-bit hops needed to reach s such that CS_{M,N}[s] = 1 (i.e., 0 ≤ s ≤ N − 1) from N,
where CS_{M,N}[N] = 0. This is a distance of M − N bits.

$$L_N \cdot N \ge M - N$$
$$L_N = \lceil (M - N)/N \rceil$$

Similarly, L_{N+1}, the latency of an error created during the (N + 1)st cycle (an unchecked
cycle, since CS_{M,N}[N + 1] = 0), is ⌈(M − N − 1)/N⌉. In general, an error created during cycle r
(an unchecked cycle: CS_{M,N}[r] = 0) will have latency L_r = ⌈(M − N − (r − N))/N⌉ =
⌈(M − r)/N⌉, N ≤ r ≤ M − 1. Clearly, L_N ≥ L_{N+1} ≥ ... ≥ L_{M−1}. Therefore, the maximum error
detection latency, L_max, is L_N: L_max = L_N = ⌈(M − N)/N⌉.
This analysis applies to all PEs in the array except the end elements, PE_i where i ≥
V − L_max. At these PE_i, an error may propagate undetected out of the array since for these PE_i
there are fewer than L_max PEs downstream. □
EXAMPLE 4.1: Figure 4.2 shows the checking pattern in a 7-PE unidirectional array as it
begins work on a problem, with M_i = 5, N_i = 2, and O_i = (2i) mod 5; CS_{5,2} = (1, 1, 0, 0, 0).
Computation cycles are shown on the vertical axis; each row shows the checking activity in the
array during a cycle. Notice that the checking pattern sets up waves of checked cycles that
advance upstream over time to catch propagating errors.
In the figure, L_2 for an error created at PE_2 in cycle 10 (marked by *) is ⌈(5 − 2)/2⌉ = 2: the
error would be detected two cycles later, by PE_4 in cycle 12 (labeled L2). For an error created at
PE_2 in cycle 11 (marked by o), L_3 = ⌈(5 − 3)/2⌉ = 1: the error would be detected by PE_3 in cycle
12 (labeled L3). For an error created at PE_2 in cycle 12 (*), L_4 = ⌈(5 − 4)/2⌉ = 1, since the error
would be detected by PE_3 in cycle 13 (labeled L4). Finally, L_max = L_2 = 2. □
computation                 PE
cycle        0    1    2    3    4    5    6
   0         x    I    I    I    I    I    I
   1         x    -    I    I    I    I    I
   2         -    -    -    I    I    I    I
   3         -    -    x    x    I    I    I
   4         -    x    x    -    -    I    I
   5         x    x    -    -    -    x    I
   6         x    -    -    -    x    x    -
   7         -    -    -    x    x    -    -
   8         -    -    x    x    -    -    -
   9         -    x    x    -    -    -    x
  10         x    x    *    -    -    x    x
  11         x    -    o    -    x    x    -
  12         -    -    *    L3   L2   -    -
  13         -    -    x    L4   -    -    -
  14         -    x    x    -    -    -    x
  15         x    x    -    -    -    x    x
I = PE idle    - = PE doing task    x = PE doing checked task
Figure 4.2. Checking pattern in a 7-PE array.
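Lemma 4.1 can be checked against a direct simulation of the checking pattern. The sketch below assumes, as Figure 4.2 and the proof of the lemma indicate, that PE_i starts work in cycle i and indexes CS_{M,N} at (c − i + N·i) mod M in cycle c, with indices 0 to N − 1 being the checked ones. The function names are illustrative.

```python
def is_checking(pe, cycle, M, N):
    """True if PE `pe` performs a checked task in absolute cycle `cycle`."""
    # PE i starts in cycle i with offset (N*i) mod M, so its index into CS_{M,N}
    # in cycle c is (c - i + N*i) mod M; the first N indices are checked.
    return (cycle + (N - 1) * pe) % M < N

def formula_latency(r, M, N):
    """Lemma 4.1: L_r = ceil((M - r) / N) for an unchecked index N <= r <= M - 1."""
    return -(-(M - r) // N)

def simulated_latency(pe, cycle, M, N, V):
    """Cycles until an error created at (pe, cycle) reaches a checking PE, or None if it escapes."""
    latency = 0
    while True:
        pe, cycle, latency = pe + 1, cycle + 1, latency + 1
        if pe >= V:
            return None  # the error left the array undetected
        if is_checking(pe, cycle, M, N):
            return latency

# Example 4.1: errors created at PE_2 in cycles 10, 11, and 12 (M = 5, N = 2, V = 7).
print([simulated_latency(2, c, 5, 2, 7) for c in (10, 11, 12)])
```

For the three errors of Example 4.1 the simulation returns latencies 2, 1, and 1, in agreement with ⌈(M − r)/N⌉ for r = 2, 3, 4.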
To prevent errors from propagating out of the linear array and escaping detection, a
modification to PACED can be applied in which the last PE in the array, PE_{V−1}, performs 100%
checking. This variation of PACED, PACED', can be implemented by duplicating PE_{V−1} in hardware;
this can prevent PE_{V−1} from becoming a performance bottleneck. Only the normal PACED
performance costs would then be incurred. A less hardware-expensive implementation of PACED'
might monitor the outputs of PE_{V−1} using a hardware code checker.
4.2. Error Propagation Distance 
The following lemma gives an expression for the maximum number of unchecked cycles 
through which a detected error could have propagated. This result will be used in Theorem 4.1 
to determine the amount of previously produced output to suspect as possibly erroneous from 
each PE in the linear array, upon error detection. 
LEMMA 4.2: Given a V-PE unidirectional linear processor array using PACED with perfect
detection (q = 1), let M_i = M, N_i = N, and 1 ≤ N ≤ M. Using O_i = (Ni) mod M, an error detected
by CS_{M,N}[r] at PE_i, 0 ≤ r ≤ N − 1, propagated through at most D_r unchecked cycles, where D_r =
min(i, ⌈(M + r + 1)/N⌉ − 2).
PROOF: Let CS_{M,N}[0] at PE_i detect an error in computation cycle c. The checking activity
at PE_{i−1} during cycle c − 1 is CS_{M,N}[(−N) mod M]. The maximum number of unchecked cycles
through which the detected error may have propagated, D_0, is the number of computation cycles
required to reach a checked cycle, minus 1, counting backwards in time. In terms of the
checking sequence, D_0 + 1 is the smallest integer number of N-bit hops needed to reach CS_{M,N}[r], 0 ≤
r ≤ N − 1, from CS_{M,N}[0]. This is a distance of M − N + 1 bits.

$$(D_0 + 1) N \ge M - N + 1$$
$$D_0 = \lceil (M - N + 1)/N \rceil - 1 = \lceil (M + 1)/N \rceil - 2$$

Similarly, D_1 = ⌈(M + 2)/N⌉ − 2. In general, D_r = ⌈(M + r + 1)/N⌉ − 2, 0 ≤ r ≤ N − 1. For
PEs near the beginning of the array, there may be fewer than D_r PEs through which the error
propagated. Hence, at PE_i, D_r = min(i, ⌈(M + r + 1)/N⌉ − 2), for 0 ≤ r ≤ N − 1. □
EXAMPLE 4.2: Using the array of Example 4.1 (Figure 4.2), D_0 for an error detected at PE_3
at computation cycle 12 is ⌈(5 + 0 + 1)/2⌉ − 2 = 1, because PE_1 checked computation cycle 10.
For an error detected at PE_3 at computation cycle 13, D_1 = ⌈(5 + 1 + 1)/2⌉ − 2 = 2, because PE_0
checked computation cycle 10. □
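The propagation distance of Lemma 4.2 can be compared with a backward walk through the checking pattern. As before, the sketch assumes that PE_i indexes CS_{M,N} at (c − i + N·i) mod M in absolute cycle c; the function names are illustrative.

```python
def is_checking(pe, cycle, M, N):
    """True if PE `pe` performs a checked task in absolute cycle `cycle` (assumed indexing)."""
    return (cycle + (N - 1) * pe) % M < N

def propagation_distance(i, r, M, N):
    """Lemma 4.2: D_r = min(i, ceil((M + r + 1) / N) - 2)."""
    return min(i, -(-(M + r + 1) // N) - 2)

def simulated_distance(pe, cycle, M, N):
    """Count the unchecked cycles met walking upstream and back in time from a detection."""
    distance = 0
    pe, cycle = pe - 1, cycle - 1
    while pe >= 0 and not is_checking(pe, cycle, M, N):
        distance += 1
        pe, cycle = pe - 1, cycle - 1
    return distance

# Example 4.2: detections at PE_3 in cycle 12 (check r = 0) and cycle 13 (check r = 1).
print(simulated_distance(3, 12, 5, 2), propagation_distance(3, 0, 5, 2))
print(simulated_distance(3, 13, 5, 2), propagation_distance(3, 1, 5, 2))
```

Both detections of Example 4.2 give matching values from the formula and from the walk: 1 for r = 0 and 2 for r = 1.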
4.3. Suspected Outputs 
Upon error detection, outputs produced both in the recent past and the near future should 
be suspected as possibly erroneous. The following theorem determines which of the previously 
produced outputs to suspect when an error is detected by a PE in the linear array; Theorem 4.2 
considers which of the future outputs to suspect.
THEOREM 4.1: Given a V-PE unidirectional linear array using PACED with perfect
detection (q = 1), let M_i = M, N_i = N, 1 ≤ N ≤ M, and O_i = (Ni) mod M. If PE_i detects an error at its
rth checked cycle in computation cycle c, 0 ≤ r ≤ N − 1, then the output from PE_i in c should be
suspected as possibly erroneous. In addition, the outputs produced by PE_{i−k} in cycle c − k, for 1
≤ k ≤ D_r, should be suspected. All other unsuspected, previously produced outputs can be
trusted with a confidence of 1, unless a later error detection makes it necessary to suspect them.
PROOF: By Lemma 4.2, the detected error propagated through at most D_r unchecked
cycles to reach PE_i. Thus, the error was created at some PE_{i−k} in a cycle c − (k + y), where 1 ≤
k ≤ D_r and y = 1, 2, 3, ....
Figure 4.3 shows the checking activity in a 10-PE array in the midst of a problem, with
M = ∞ and N = 2. The X marks an error detection at PE_5 in cycle c and the *s mark the D_r
cycles through which an error may have propagated to reach PE_5.
Suppose that the error had occurred at PE_4 in cycle c − 2, c − 3, or c − 4. The error would
have been detected by PE_6 in cycle c or c − 1, or by PE_7 in cycle c − 1, respectively. Suppose the
error had occurred at PE_3 in cycle c − 3, c − 4, or c − 5. This error would have been detected by
PE_6 in cycle c or c − 1, or by PE_7 in cycle c − 1, respectively.
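computation              PE
cycle      0  1  2  3  4  5  6  7  8  9
c-12       -  -  -  -  -  -  -  -  -  -
c-11       -  -  -  -  -  -  -  -  -  -
c-10       -  -  -  -  -  -  -  -  -  -
c-9        -  -  -  -  -  -  -  -  -  -
c-8        -  -  -  -  -  -  -  -  -  -
c-7        -  -  -  -  -  -  -  -  -  -
c-6        -  -  -  -  -  -  -  -  -  -
c-5        *  -  -  -  -  -  -  -  -  -
c-4        -  *  -  -  -  -  -  -  -  x
c-3        -  -  *  -  -  -  -  -  x  x
c-2        -  -  -  *  -  -  -  x  x  -
c-1        -  -  -  -  *  -  x  x  -  -
c          -  -  -  -  -  X  x  -  -  -
Figure 4.3. Error propagation in a 10-PE array.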
In general, any error created at PE_{i−k} before cycle c − k would either have been detected by
cycle c (and the appropriate outputs already suspected), or gone undetected (if the error
propagated out of the array). This is a result of the checking pattern, in which each PE_i performs its
last checked cycle (CS_{M,N}[N − 1]) during the same computation cycle that PE_{i−1} performs its
first checked cycle (CS_{M,N}[0]). Hence, only the outputs from PE_{i−k} in cycles c − k need be
suspected, 1 ≤ k ≤ D_r, as well as that from PE_i in c. All other unsuspected, previously produced
outputs can be trusted with a confidence of 1, unless a later error detection makes it necessary to
suspect them. □
EXAMPLE 4.3: Figure 4.4 shows the checking pattern in a 10-PE unidirectional linear array
in the midst of a problem, with M_i = 13, N_i = 3, and O_i = (3i) mod 13. Let PE_8 detect an error
in cycle c (X in the figure) by check CS_{13,3}[2]. The output from PE_8 in cycle c should be
suspected. Also, since D_2 = ⌈(13 + 2 + 1)/3⌉ − 2 = 4 (Lemma 4.2), the outputs of PE_7, PE_6, PE_5,
and PE_4 in cycles c − 1, c − 2, c − 3, and c − 4, respectively, (marked by *) should be suspected
as possibly erroneous, by Theorem 4.1. All other outputs generated up through cycle c can be
trusted with a confidence of 1, unless a later error detection makes it necessary to suspect them. □
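Theorem 4.1 translates directly into a list of (PE, cycle offset) pairs. The following sketch returns that list for a detection by check r at PE_i, with cycle offsets given relative to the detection cycle c; the function names are illustrative.

```python
def propagation_distance(i, r, M, N):
    """Lemma 4.2: D_r = min(i, ceil((M + r + 1) / N) - 2)."""
    return min(i, -(-(M + r + 1) // N) - 2)

def suspected_previous(i, r, M, N):
    """Theorem 4.1: the output of PE_i in cycle c plus those of PE_{i-k} in cycle c - k."""
    d = propagation_distance(i, r, M, N)
    # Each pair is (PE index, cycle offset relative to the detection cycle c).
    return [(i - k, -k) for k in range(d + 1)]

# Example 4.3: PE_8 detects an error with check r = 2 (M = 13, N = 3).
for pe, offset in suspected_previous(8, 2, 13, 3):
    print(f"PE_{pe} in cycle c{offset:+d}" if offset else f"PE_{pe} in cycle c")
```

For Example 4.3 this yields PE_8 in cycle c followed by PE_7, PE_6, PE_5, and PE_4 in cycles c − 1 through c − 4.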
Section 4.1 mentioned a modification of PACED, PACED', which eliminates the possibility
that errors escape undetected from the linear array. Besides this boon, PACED' also has the
advantage that, upon error detection, only outputs produced just prior to the detection need be
suspected. Since all errors are eventually detected, there is no need to suspect outputs produced
after an error detection unless a later error detection warrants it. However, future outputs have
to be suspected if normal PACED is used in the linear array and an error is detected at one of the
end elements PE_i, where i ≥ V − L_max. Theorem 4.2 determines which future outputs to suspect
if an error is detected at one of these PEs.

computation              PE
cycle      0  1  2  3  4  5  6  7  8  9
c-12       x  x  -  -  -  -  -  x  -  -
c-11       x  -  -  -  -  -  x  x  -  -
c-10       x  -  -  -  -  -  x  -  -  -
c-9        -  -  -  -  -  x  x  -  -  -
c-8        -  -  -  -  -  x  -  -  -  -
c-7        -  -  -  -  x  x  -  -  -  -
c-6        -  -  -  -  x  -  -  -  -  -
c-5        -  -  -  x  x  -  -  -  -  -
c-4        -  -  -  x  *  -  -  -  -  x
c-3        -  -  x  x  -  *  -  -  -  x
c-2        -  -  x  -  -  -  *  -  x  x
c-1        -  x  x  -  -  -  -  *  x  -
c          -  x  -  -  -  -  -  x  X  -
Figure 4.4. Suspected previously produced outputs, 10-PE array.
THEOREM 4.2: Given a V-PE unidirectional linear array using PACED with perfect
detection (q = 1), let M_i = M, N_i = N, 1 ≤ N ≤ M, and O_i = (Ni) mod M. If PE_{V−L_max+i} detects an
error at its rth checked cycle in computation cycle c, where 0 ≤ r ≤ N − 1 and 0 ≤ i ≤ L_max − 1,
then the following outputs should be suspected as possibly erroneous.
a) If (r + (L_max − 1 − i)N + k) mod M ≥ N, then the outputs from PE_{V−L_max+i+j} in cycle
c + j + k should be suspected, where 0 ≤ j ≤ L_max − 1 − i; if r < N − 1, then k = 0, otherwise 0 ≤ k
≤ M − N (r = N − 1).
b) All output from PE_{V−1} in cycles c + L_max − 1 − i until its next checked cycle should be
suspected.
All other unsuspected, future outputs can be trusted with a confidence of 1, unless a future
error detection makes it necessary to suspect them.
PROOF: By use of O_i = (Ni) mod M in the linear array, when PE_i in cycle c performs its rth
checked task (CS_{M,N}[r] = 1), then PE_{i+y} in cycle c + z will perform CS_{M,N}[(r + y(N − 1) + z)
mod M].
Now, let PE_{V−L_max+i} detect an error in cycle c by CS_{M,N}[r], for 0 ≤ i ≤ L_max − 1. These
PE_{V−L_max+i} are those PEs that could create errors that propagate undetected out of the array. The
detected error will propagate to PE_{V−1} in cycle c + L_max − 1 − i. In that cycle, if PE_{V−1} is not
checking (i.e., (r + (L_max − 1 − i)N) mod M ≥ N), then this error will propagate out of the array
and outputs from all PEs and cycles through which the error propagated should be suspected as
possibly erroneous. That is, if (r + (L_max − 1 − i)N) mod M ≥ N, then the output from
PE_{V−L_max+i+j} in cycle c + j should be suspected, 0 ≤ j ≤ L_max − 1 − i. If PE_{V−L_max+i} will check at
the next cycle c + 1, then this gives part a) when r < N − 1 (k = 0).
If r = N − 1 (PE_{V−L_max+i} won't check in cycle c + 1), then as in the above case when r <
N − 1, if (r + (L_max − 1 − i)N + k) mod M ≥ N, then the output from PE_{V−L_max+i+j} in cycle
c + j + k should be suspected, where 0 ≤ j ≤ L_max − 1 − i and k = 0. In addition, for each of the
next M − N unchecked cycles, errors may propagate out of the array. This is likely since an
error has already been detected at PE_{V−L_max+i} and the fault may still be active while that PE is not
checking. The additional outputs to suspect depend upon whether PE_{V−1} is not checking when
the errors arrive there. That is, for each cycle c + k, 1 ≤ k ≤ M − N, if (r + (L_max − 1 − i)N + k)
mod M ≥ N, then the output from PE_{V−L_max+i+j} in cycle c + j + k should be suspected, for 0 ≤ j ≤
L_max − 1 − i. This completes part a) when r = N − 1.
Once an error propagates to PE_{V−1} while it is not checking, all of its outputs until its next
checked cycle should be suspected as possibly erroneous since its outputs are not checked by
any other PE. Hence, all of the outputs from PE_{V−1} in cycles c + L_max − 1 − i (the earliest that
the error, first detected at PE_{V−L_max+i} in cycle c, could corrupt PE_{V−1}) until its next checked cycle
should be suspected as possibly erroneous. This gives part b) in the statement of the theorem.
All other unsuspected, future outputs from the array can be trusted with a confidence of 1,
unless a future error detection makes it necessary to suspect them. □
EXAMPLE 4.4: Figure 4.5 shows the 10-PE linear array of Example 4.3, in which M_i = 13,
N_i = 3, and O_i = (3i) mod 13; by Lemma 4.1, L_max = 4. PE_6 has detected an error at check r = 2
in cycle c (marked X in the figure). Using Theorem 4.2, the future outputs to suspect will be
determined. Since PE_9 will not check cycle c + 3 (since (2 + (4 − 1 − 0)3) mod 13 ≥ 3), the
outputs from the following PEs should be suspected: PE_7 in cycle c + 1, PE_8 in cycle c + 2, and
PE_9 in cycle c + 3.
As PE_6 detected the error at its Nth check (r = N − 1), its next M − N cycles may also
create undetected errors. But PE_9 begins checking in cycle c + 5, so only the outputs from the
following PEs need be suspected: PE_6 in cycle c + 1, PE_7 in cycle c + 2, PE_8 in cycle c + 3, and
PE_9 in cycle c + 4. In addition, the outputs from PE_9 in cycles c + 3 to c + 4 should be suspected
(both already are), since PE_9 doesn't begin checking again until cycle c + 5. The outputs to
suspect are marked * in the figure, plus the site of the detection (X). All other unsuspected,
future outputs can be trusted with a confidence of 1, unless a later error detection makes it
necessary to suspect them. □
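The future outputs to suspect can also be found by following each possible error path through the checking pattern. The sketch below is a path simulation rather than a literal transcription of the condition in Theorem 4.2: an error leaving the detecting PE in cycle c + k is assumed to move one PE downstream per cycle, and its whole path is suspected only if no PE after the detector is checking when the error arrives. The function names and this stopping rule are assumptions made for illustration.

```python
def is_checking(pe, z, detector, r, M, N):
    """Checking activity of `pe` in cycle c + z, given that `detector` ran check r in cycle c."""
    return (r + (pe - detector) * (N - 1) + z) % M < N

def suspected_future(detector, r, M, N, V):
    """Set of (pe, z) outputs to suspect, z being the offset from the detection cycle c."""
    suspects = set()
    last_k = M - N if r == N - 1 else 0
    for k in range(last_k + 1):
        # Path of an error that leaves the detecting PE in cycle c + k.
        path = [(detector + j, k + j) for j in range(V - detector)]
        # The error escapes only if no PE after the detector checks it on arrival.
        if not any(is_checking(pe, z, detector, r, M, N) for pe, z in path[1:]):
            suspects.update(path)
    # Part b): the last PE stays suspect until its next checked cycle.
    hits = [z for pe, z in suspects if pe == V - 1]
    if hits:
        z = min(hits)
        while not is_checking(V - 1, z, detector, r, M, N):
            suspects.add((V - 1, z))
            z += 1
    return suspects

# Example 4.4: PE_6 detects an error with check r = 2 (M = 13, N = 3, V = 10).
print(sorted(suspected_future(6, 2, 13, 3, 10)))
```

For the parameters of Example 4.4 the simulation marks the detection site and the seven outputs listed in the example: PE_6 in c + 1, PE_7 in c + 1 and c + 2, PE_8 in c + 2 and c + 3, and PE_9 in c + 3 and c + 4.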
The detection of an error by one of a PE's N checks leads to two static patterns of outputs
to suspect as possibly erroneous: one for the previously produced outputs and one for the future
outputs. For example, Figure 4.4 shows the pattern of previous outputs to suspect if M_i = 13,
N_i = 3, O_i = (3i) mod 13, and CS_{13,3}[2] detects an error. Figure 4.5 shows the pattern of future
outputs to suspect for the same parameter values when the error is detected at PE_{V−L_max}. For
these parameter values, there would be a total of 3(L_max + 1) possible patterns of outputs to
suspect: three patterns for previous outputs (one for each check CS_{13,3}[r], 0 ≤ r ≤ 2) and 3L_max
patterns for future outputs (one for each check CS_{13,3}[r], 0 ≤ r ≤ 2, at each PE_{V−L_max+i}, 0 ≤ i ≤
L_max − 1). These patterns can be computed once for the array and stored, indexed by r and i.
Upon error detection, given the PE and which of its checks detected the error, the outputs to
suspect can be determined with no extra computation by simply using the template stored for that
check.

computation              PE
cycle      0  1  2  3  4  5  6  7  8  9
c-6        -  -  x  -  -  -  -  -  x  x
c-5        -  x  x  -  -  -  -  -  x  -
c-4        -  x  -  -  -  -  -  x  x  -
c-3        x  x  -  -  -  -  -  x  -  -
c-2        x  -  -  -  -  -  x  x  -  -
c-1        x  -  -  -  -  -  x  -  -  -
c          -  -  -  -  -  x  X  -  -  -
c+1        -  -  -  -  -  x  *  *  -  -
c+2        -  -  -  -  x  x  -  *  *  -
c+3        -  -  -  -  x  -  -  -  *  *
c+4        -  -  -  x  x  -  -  -  -  *
c+5        -  -  -  x  -  -  -  -  -  x
c+6        -  -  x  x  -  -  -  -  -  x
c+7        -  -  x  -  -  -  -  -  x  x
c+8        -  x  x  -  -  -  -  -  x  -
c+9        -  x  -  -  -  -  -  x  x  -
c+10       x  x  -  -  -  -  -  x  -  -
c+11       x  -  -  -  -  -  x  x  -  -
Figure 4.5. Suspected future outputs, 10-PE array.
The amount of output to suspect upon error detection in the linear array is much less than 
that necessary upon error detection in a single processor using PACED. Example 3.4 showed 
that using the undetected-errors intervals in the single processor, 18.8 min worth of outputs (9.4 
min both prior and subsequent to an error detection) should be suspected. In the linear array, the 
outputs from perhaps only a few tens of computation cycles need be suspected; with cycle
times in the range of 15 μs to 20 ms in VLSI array implementations [41], this means on the
order of just one second's output need be suspected. By using the ability of PEs to check other
PE outputs, PACED can give high confidence in most array outputs upon error detection with 
less than continuous checking. 
4.4. Error Coverage 
If it is assumed that errors occur uniformly distributed among the constituent PEs of the
linear array, an estimate of the error coverage can be made. This assumption can be valid in
arrays with homogeneous PEs running the same algorithm: no PE would be more or less
susceptible to errors than any other PE. In any M consecutive cycles in the array, each PE will have
completed one pass through its CS_{M,N}[] array. Hence, one M-cycle period has the same
coverage as any other M-cycle period, so it suffices to examine a single such period.
In one M-cycle period in a V-PE linear array, there are MV potential sites at which errors
may occur: one for each PE of the array, in each cycle. Since it is assumed that errors propagate
through the array and are not masked, when normal PACED is applied to the array, only some of
these sites could lead to the propagation of undetected errors out of the array, if an error were to
occur. (When PACED' is used, the estimated coverage is 100% for all values of N/M, since no
errors can escape undetected from the array.) By counting these sites and dividing by the total
number of potential sites, the fraction of uncovered sites, and hence an estimate of the error
coverage, can be obtained.
Figure 4.6 shows the estimated error coverage for a 16-PE linear array as a function of
N/M, when M = 10 and q = 1. It can be seen that even for small values of N/M, the error
coverage is quite high (greater than 70% for N/M = 0.1). The coverage climbs quickly as N/M
increases, so that any checking ratio greater than 0.4 will have an estimated error coverage
greater than 95%. The cooperation among the PEs that allows propagated errors to be detected
causes this rise in coverage for small N/M. Hence, low values of the checking ratio can yield
high error coverage. This result is promising, as it allows the possibility of low checking ratios,
and thus, low performance cost, while still maintaining good error coverage.
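The site-counting estimate can be sketched as follows. This is one plausible reading of the counting argument, not a reproduction of the method used to draw Figure 4.6: a site (PE_i, unchecked index r) is counted as uncovered when the latency of Lemma 4.1, ⌈(M − r)/N⌉, would carry the error past the last PE. The function name is illustrative.

```python
def estimated_coverage(V, M, N):
    """Fraction of the M*V error sites in one period from which an error cannot escape."""
    escapes = 0
    for pe in range(V):
        for r in range(N, M):  # unchecked indices of CS_{M,N}
            latency = -(-(M - r) // N)  # Lemma 4.1: ceil((M - r) / N)
            if pe + latency > V - 1:    # the detecting PE would lie beyond PE_{V-1}
                escapes += 1
    return 1.0 - escapes / (M * V)

for n in (1, 2, 4, 5, 10):
    print(f"N/M = {n / 10:.1f}: coverage = {estimated_coverage(16, 10, n):.1%}")
```

For a 16-PE array with M = 10, this count gives about 72% at N/M = 0.1, 95% at N/M = 0.4, and about 97% at N/M = 0.5, which is in line with the values quoted above.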
4.5. Performance 
The performance of linear processor arrays using PACED was studied in two ways. First,
the performance costs were estimated by using an algorithm-independent, simulation-based
analysis model that was written in C to study the effect of PACED when used in linear, square,
and triangular processor array architectures. The simulator uses mean execution times required
by basic arithmetic operations, so that the activity in the array PEs can be simulated without
actually being performed. This event-driven, reduced simulation gives an estimate of the
completion times for algorithms with and without the use of PACED, allowing the PACED overhead
to be determined.
Second, full simulations of a linear array performing an image processing edge-detection
algorithm were run on an Intel iPSC/2 hypercube, to obtain more accurate values of the
performance costs to expect when using PACED. The results from these two performance analyses
are presented in the following two subsections.

[Figure 4.6: plot of estimated error coverage (%) against N/M (M = 10) from 0 to 1.]
Figure 4.6. Estimated error coverage for a 16-PE linear array.
4.5.1. Simulation model 
The inputs to the simulation model include the dimension(s) of one of the modeled
architectures, the mean task and check times for the PEs, the values of the M_i, N_i, and O_i PACED
parameters, and the desired length of simulation (the number of computational cycles required
by the PE producing the final output). Given these inputs, the model can estimate the
performance costs incurred through the use of PACED in the array.
The task time is the user's estimate of the mean time that a PE requires to complete a task. 
Similarly, the check time is the mean time that a PE requires to complete the CED for a task. It 
is assumed that the deviations from these means are small. (This assumption has been verified 
from actual simulations, described in the next section. However, in cases where the assumption 
is not valid, the simulation results will be more inaccurate.) These times are determined by ana-
lyzing the implementations of the task and check algorithms. If the array consists of more than 
one type of PE, task and check times for each type of PE require specification. Communication 
costs are not explicitly modeled; they can, however, be incorporated into the mean task and 
check times. 
The model uses these parameters to simulate the activity of the array PEs without performing
the computations specified by the algorithm. This partial simulation saves time and makes
the model independent of the algorithm. Though not as accurate as a full simulation, this reduced
simulation model is intended to provide good results at a low computation cost.
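
A minimal sketch of such a reduced simulation for a unidirectional linear array is given below, in Python rather than the C of the original model. The function names, the rule that a PE checks whenever (cycle + offset) mod M < N, and the dependency rule (a PE starts a cycle once it is free and its upstream neighbor has finished the same cycle) are assumptions made for illustration; they are not details taken from the model itself.

```python
def is_checked(cycle, offset, M, N):
    """Assumed schedule: N checked cycles in every period of M cycles."""
    return (cycle + offset) % M < N


def completion_time(num_pes, cycles, task, check, M, N, offsets=None):
    """Estimate when the last PE of a linear array finishes its last cycle.

    Only mean task and check times are accumulated; no computation
    of the modeled algorithm is actually performed.
    """
    offsets = offsets or [0] * num_pes
    finish = [0.0] * num_pes  # finish time of each PE's most recent cycle
    for c in range(cycles):
        for i in range(num_pes):
            # PE i waits for its own previous cycle and for PE i-1's
            # result of the current cycle (already updated in this pass).
            ready = finish[i] if i == 0 else max(finish[i], finish[i - 1])
            cost = task + (check if is_checked(c, offsets[i], M, N) else 0)
            finish[i] = ready + cost
    return finish[-1]


def overhead(num_pes, cycles, task, check, M, N):
    """Relative increase in completion time caused by checking."""
    base = completion_time(num_pes, cycles, task, check, M, 0)
    with_ced = completion_time(num_pes, cycles, task, check, M, N)
    return (with_ced - base) / base
```

Calling `overhead(16, 1024, 2, 2, 10, n)` for n from 0 to 10 traces how the estimated cost grows with the checking ratio under these assumptions.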
EXAMPLE 4.5: The simulation model was used to estimate the performance costs of using
PACED in a linear unidirectional array running an image edge-detection algorithm [41]. Two
CED schemes were considered in determining the mean task and CED times: RESO and
AN-coding. Briefly, in RESO-k, each arithmetic operation is performed twice: the first normally, the
second using k-bit arithmetic-shifted operands to produce a bit-shifted result. Different amounts
of shifting can be used, depending on the operation, to obtain maximum error coverage. For
these experiments, the basic RESO recommendations were employed: RESO-2 for addition and
multiplication [3], and RESO-2,3 for division (i.e., 2-bit shift of numerator and 3-bit shift for
denominator) [43].
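
The RESO-k idea for addition can be sketched in software as follows. This is only an illustration: the function name `reso_add` and the injectable `adder` argument (used here to mimic a faulty ALU) are hypothetical, and Python's unbounded integers hide the extra operand width that a hardware implementation would need.

```python
def reso_add(x, y, k=2, adder=lambda a, b: a + b):
    """Add x and y, recompute with k-bit left-shifted operands, and compare.

    For a fault-free adder, adder(x << k, y << k) equals (x + y) << k,
    so any disagreement signals a possible fault.
    """
    normal = adder(x, y)              # first pass: normal operands
    shifted = adder(x << k, y << k)   # second pass: shifted operands
    if shifted != normal << k:
        raise ArithmeticError("RESO mismatch: possible fault in the adder")
    return normal
```

A fault that always affects the same bit position (for example, a stuck-at bit) corrupts different bits of the true result in the two passes, which is why the comparison can expose it.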
In AN-coding, every operand is encoded by multiplying by the base A. All results,
intermediate as well as final, must be 0 modulo A or an error has occurred [42]. For a low-cost
encoding, the base A should be $2^c - 1$, where c is the number of bits needed to represent A [44].
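
As a small illustration of the AN-code check, the sketch below uses A = 255 = 2^8 - 1. The helper names `encode`, `is_valid`, and `decode` are hypothetical and are not taken from the thesis implementation.

```python
A = 255  # base of the AN code, of the form 2**c - 1 with c = 8


def encode(x):
    """Encode an integer operand by multiplying it by the base A."""
    return A * x


def is_valid(v):
    """A code word (operand or result) must be 0 modulo A."""
    return v % A == 0


def decode(v):
    """Recover the plain value, raising if the code check fails."""
    if not is_valid(v):
        raise ArithmeticError("AN-code check failed: result is not 0 mod A")
    return v // A
```

Sums of encoded operands remain multiples of A, so `decode(encode(3) + encode(4))` yields 7, whereas a single flipped bit changes the value by a power of two, which is never a multiple of 255, and is therefore caught by `is_valid`.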
Table 4.1 shows how the approximate mean task and CED times were determined. The
first column in the table shows the different types of basic arithmetic operations that were
counted for one computation cycle in each of three versions of the algorithm: the basic
algorithm, one using RESO, and a third using AN-coding. The operations are integer add, integer
multiply, modulo, arithmetic shift left, and two types of compare: compare register-with-memory
and compare register-with-immediate. The second column shows the number of clock
cycles required to perform each operation. These were taken from the Intel 80386 instruction
timing data as an example ALU [45]. The columns headed "Basic" show, for the basic
algorithm, how many of each operation and how many clock cycles are required for one
computation cycle. The columns headed "RESO" and "AN-coding" show the same information for each
of the CED versions. The penultimate row of the table shows the total clock cycles required per
computation cycle of each algorithm. Since the simulator uses these numbers to control the
number of time-slice iterations performed for each computation cycle, these totals were reduced
and rounded to small, whole integers, and are given in the last row.
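
The arithmetic behind the last two rows of Table 4.1 can be retraced as follows. The per-row cycle totals are copied from the table; the scaling unit (half of the Basic total) is an assumption chosen because it reproduces the reduced values, since the text does not state how the reduction was done.

```python
# Per-operation "tot cycles" entries of Table 4.1 for each version.
TOT_CYCLES = {
    "Basic":     [378, 459, 2404.5],         # +, x, mod
    "RESO":      [728, 918, 4351, 540, 54],  # +, x, mod, <<, cmp w/mem
    "AN-coding": [378, 459, 7099, 82],       # +, x, mod, cmp w/imm
}

totals = {name: sum(rows) for name, rows in TOT_CYCLES.items()}

# Assumed scaling: express every total in units of half the Basic total.
unit = totals["Basic"] / 2
reduced = {name: round(total / unit) for name, total in totals.items()}
```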
Using the reduced clock cycles, the simulator was run for 1024 computation cycles (to 
simulate processing an image of 1024 rows) with M = 10, N varied from 0 to 10, and the detec-
tion probability q = 1. Figure 4.7 shows the relative completion times to be expected from run-
ning the three versions when the amount of checking is varied from 0% to 100%. The simula-
tion predicts that the performance cost for RESO should be approximately linear with the 
amount of checking performed. The slight deviation from linearity arises from the initialization 
of the array, during which no checking is performed in many of the PEs, thereby slightly reduc-
ing the overhead due to CED. 
The simulation also predicts that the cost of using AN-coding will be higher than that of
RESO. This can be attributed to the large number of clock cycles required to perform a modulo
operation, which forms the crux of the checking in the AN-coding technique. All of the data for
the graph were obtained from 22 runs of the simulator, which required less than 8 min. Hence,
the simulator can be a valuable tool to estimate the performance costs of different CED
techniques when used with PACED in the linear array. □

TABLE 4.1.
COMPUTATION CYCLE TIMES, EDGE DETECTION PEs.

operation       cycles         Basic              RESO            AN-coding
type            per op      #   tot cycles     #   tot cycles     #   tot cycles
+                  7       54      378       104      728        54      378
x                 17       27      459        54      918        27      459
mod              114.5     21     2404.5      38     4351        62     7099
<<                 5                         108      540
cmp w/mem          5                           9       54
cmp w/imm          2                                             41       82
total cycles                      3241.5             6591               8018
reduced cycles                       2                  4                  5

Figure 4.7. Simulated linear array performance, edge detection. (Left axis: mean execution
time, cycles; right axis: performance overhead, %; horizontal axis: N/M, M = 10; curves: AN,
RESO, and No PACED.)

4.5.2. Hypercube simulations 
A simulation of a linear processor array was performed on an Intel iPSC/2 hypercube using
the nodes as PEs and the shortest internode connections to minimize the communication
overhead. The application was the image edge-detection algorithm modeled in Section 4.5.1. Using
this algorithm, a 1-by-V/3 array of homogeneous PEs can process a UxV image. Figure 4.8
shows an example input image to the array and its corresponding output. (Due to the data delay
through the array, the first row of the output image is blank, and the last row is absent.) The
algorithm repeatedly convolves a 3x3 mask with 3x3 windows of the image. First, the mask is sent
by the host to each PE; three columns of image are then sent to each PE, row by row. As the
data are processed, intermediate results are sent by each PE to its predecessor and outputs are
sent back to the host row by row, three columns at a time from each PE.
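
The core operation, applying a 3x3 mask to every 3x3 window of the image, can be sketched sequentially as below. The Laplacian mask shown is a hypothetical choice for illustration (the mask actually used is not given in this section), border pixels are simply left at zero, and the mask is applied without flipping, which is equivalent to convolution for a symmetric mask such as this one.

```python
# Hypothetical edge-detection mask, used only for illustration.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def convolve3x3(image, mask):
    """Apply a 3x3 mask to each interior pixel of a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Weighted sum over the 3x3 window centered on (r, c).
            out[r][c] = sum(mask[dr][dc] * image[r - 1 + dr][c - 1 + dc]
                            for dr in range(3) for dc in range(3))
    return out
```

In the array implementation described above, each PE would perform this computation only for its own three columns, exchanging intermediate results with its neighbor.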
The first simulation used all 16 nodes of the hypercube to process a 1024x48 image. Two
CED techniques were employed: RESO and AN-coding. The base A of 255 = $2^8 - 1$ was used
in the experiments, so that the largest encoded numbers generated by the application would still
fit in 32 bits, the size of an integer on the hypercube.

Figure 4.8. Sample input and output, edge detection algorithm. (Panels: input image; output image.)

Figure 4.9, constructed from completion times of the three versions of the algorithm,
shows how the performance was degraded by the use of CED in varying checking ratios N/M.
The completion times do not include the initializations of either the host programs or the
individual node programs. For each run, the individual completion times of each of the 16 nodes
were averaged together. The averages from five runs were then averaged to obtain each data
point on the graph. Just five runs were deemed sufficient for two reasons: 1) the greatest
standard deviation for the individual node completion times was less than 0.15% of the average
node completion time, and 2) the greatest standard deviation for the run averages was less than
1.1% of the average run completion times. The figure displays the 95% confidence interval for
each datum as a set of vertical bars above and below the point: these intervals are quite small.

Figure 4.9. Linear array performance, edge detection. (Left axis: mean execution time, sec;
right axis: performance overhead, %; horizontal axis: N/M, M = 10.)

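
For reference, a mean and 95% confidence half-width for a data point built from five run averages could be computed as sketched below. The use of the Student t distribution with 4 degrees of freedom (critical value 2.776) is an assumption; the text does not say how the intervals were obtained, and the function name is hypothetical.

```python
import statistics

T_95_DF4 = 2.776  # two-sided 95% Student t critical value, 4 degrees of freedom


def mean_ci95(run_averages, t_crit=T_95_DF4):
    """Return (mean, half-width of the 95% confidence interval).

    The default critical value is only appropriate for five samples.
    """
    mean = statistics.mean(run_averages)
    std_err = statistics.stdev(run_averages) / len(run_averages) ** 0.5
    return mean, t_crit * std_err
```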
From the figure, it can be seen that the use of CED, either AN-coding or RESO, in any
checking ratio had little effect on the completion time. When the algorithm was checked using
AN-coding, a very slight increase in the completion times was observed (≈ 0.25%), but this slight
difference is probably spurious, due to slight differences in the operating conditions of the
hypercube when the separate experiments were performed. It was hypothesized that communi-
cation costs in the hypercube were much larger than anticipated. Since VLSI processor arrays 
were developed in part to achieve great processing speed, the communication costs in such 
arrays should be quite small. Apparently, in this experiment, communication costs dominated 
the computation time so both the RESO and AN-coding results showed very little overhead. 
To test this hypothesis and to obtain more accurate results of a simulated processor array 
using the hypercube, all inter-PE communication was removed from the algorithm's implemen-
tation and the experiments were repeated. As expected, the completion times of the application 
were very much smaller than when the PEs performed communication, even when a larger input
image (16384x48) was used.
The results are shown in Figure 4.10. The RESO and AN-coding performances are shown
as dotted lines; the right vertical side of the graph shows the performance scale. These results
were closer to expectation: the performance exhibited gradual degradation as the checking ratio
N/M increased, with very little degradation at N = 0 and rising linearly to just over 100%
degradation for AN-coding and just about 100% degradation for RESO. (The 95% confidence
interval bars are too small to be seen on the graph.) The RESO curve exhibits a slight
degradation even when no checking is performed (N = 0). This is due to a slight increase in code size,
as modifications were made to some operations that would normally destroy an operand needed
to perform RESO.

Figure 4.10. Linear array performance, edge detection, no communication. (Left axis:
estimated error coverage, %; right axis: performance overhead, %; horizontal axis: N/M, M = 10.)

Figure 4.10 also overlays on the same axes the estimated error coverage as a function of
N/M from Figure 4.6. The left vertical side of the graph shows the scale for the error coverage
in percent. Coverages over 95% can be achieved with N/M ≥ 0.4: fairly low values of the
checking ratio can yield good error coverage, for which the performance penalty can be under
50%.
From these experiments it can be concluded that the use of PACED can reduce the
performance costs incurred through the use of CED in a linear processor array, while still maintaining
good error coverage. A designer of such an array can trade off between performance and the
amount of output to suspect when an error is detected (and thereby, the error coverage) by
choosing the checking ratio N/M, provided a coding technique is used to allow error propagation
between the PEs in the array (e.g., the AN-code for integer applications).
These experiments also validate the simulation model described in Section 4.5.1. There, 
the simulator had predicted that RESO would perform better under PACED than AN-coding, on 
a 16-node linear array running the edge detection algorithm to process a 1024-row image. The 
overhead at 0% checking of the CED versions was not predicted by the simulator since the 
mean task and check times used in the model did not reflect the code expansion required by the 
RESO and AN-coding versions. Also, the simulator predicted a higher-than-realized overhead 
for the AN-coding technique when applied continuously. However, considering the short time 
required to generate the simulation results, in this example, the simulator provided a fast and 
fairly accurate estimate of the performance costs to expect when using PACED in a linear array. 
CHAPTER 5.
PACED IN A TWO-DIMENSIONAL ARRAY 
The two-dimensional (2-D) processor array considered in this chapter is composed of UxV
mesh-connected PEs (Figure 5.1), which accept data at their top and left inputs and send data
through their right and bottom outputs. The PEs on the top and left edges of the array accept
external inputs; PEs on the right and bottom edges produce external outputs. Data may only
flow from left to right and from top to bottom. Note that at the onset of problem execution some
PEs may be idle until their input data arrive. Such arrays have been used to implement
algorithms to perform matrix operations [46], image processing [47], digital filtering [48], and
polynomial evaluation [49]. For two PEs in the array, $PE_{i,j}$ and $PE_{k,l}$, if $i < k$ or $j < l$, then $PE_{i,j}$ is
upstream of $PE_{k,l}$ and $PE_{k,l}$ is downstream from $PE_{i,j}$.
Checking patterns in these arrays can be designed so that PEs check the unchecked
computations of upstream PEs. As in the linear array, each $PE_{i,j}$ may have its own distinct $M_{i,j}$ and
$N_{i,j}$ values. The offset $O_{i,j}$ creates checking patterns in the array and is determined by two
parameters called RISE and RUN: RISE/RUN gives the slope of the waves of checking in the
checking pattern. With PACED applied to the 2-D array, this chapter will investigate the
confidence to place on array outputs at error detection time, the error coverage, and the performance
of the array.

Figure 5.1. A UxV 2-D mesh processor array.

First, the use of PACED in a 2-D array will be analyzed to determine which outputs to
suspect upon error detection. The confidence analysis is based on three assumptions similar to
those used in Chapter 4. 1) All communication channels in the array are fault-free. 2) If an 
erroneous output is produced by a PE, it will be propagated downstream both rightward and 
downward. 3) PEs are code-disjoint: use of erroneous inputs or state values causes erroneous 
PE outputs to propagate both rightward and downward. 
5.1. Error Detection Latency 
In order to alert the external world of an error detection in the array, an error signal must
reach a PE that produces external outputs. The error detection latency does not include this
signal delay. Upon error detection, a message is sent by the detecting $PE_{i,j}$ downstream with its
output data indicating the PE and computation cycle of the detection. The time for a user to
become aware of an error detection at $PE_{i,j}$ is proportional to $\min(U - i - 1,\ V - j - 1)$.
In 2-D arrays, an algorithm is used to determine $L_{max}$ and $L_r$, the latency of an error
created in an unchecked computation cycle $r$ of $PE_{i,j}$, $N_{i,j} \le r \le M_{i,j} - 1$. When PACED is applied to
a 2-D array, the checking pattern is set by

$$O_{i,j} = \bigl(M_{i,j} + i + j - (U - 1 - i)\,\mathrm{RUN} - (V - 1 - j)\,\mathrm{RISE}\bigr) \bmod M_{i,j}.$$

This particular $O_{i,j}$ was derived empirically, based on the shape of
the optimal checking pattern for linear arrays: since errors propagate downstream in the array,
waves of checking that proceed upstream in time were desired to reduce the detection latency.
The algorithm propagates an error from $PE_{i,j}$ in cycle $c$ downstream through the array until it is
detected in cycle $c + z$, giving $L_r = z$. The algorithm uses the fact that when the checking
activity at $PE_{i,j}$ in cycle $c$ is $CS_{M,N}[r]$, the checking activity at $PE_{i+y,j+x}$ in cycle $c + z$ is
$CS_{M,N}[(r + x\,\mathrm{RISE} + y\,\mathrm{RUN} + z) \bmod M_{i,j}]$.
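
The offset formula and the latency search can be sketched as follows. The activity-index relation is the one stated above; the assumptions that an error advances exactly one PE rightward and one PE downward per cycle, and that activity indices below N denote checked cycles, are made for this illustration, as are the function names.

```python
def offset(i, j, U, V, M, rise, run):
    """Checking-pattern offset O[i][j] for a UxV array with period M."""
    return (M + i + j - (U - 1 - i) * run - (V - 1 - j) * rise) % M


def latency(r, i, j, U, V, M, N, rise, run):
    """Cycles until an error created at PE(i, j) with activity index r
    is first detected downstream, or None if it leaves the array undetected.
    """
    z = 1
    while True:
        # PEs reached after z cycles: x steps right and y steps down, x + y = z.
        front = [(x, z - x) for x in range(z + 1)
                 if j + x < V and i + (z - x) < U]
        if not front:
            return None  # the error has propagated out of the array
        if any((r + x * rise + y * run + z) % M < N for x, y in front):
            return z
        z += 1
```

For a 10x10 array with M = 10 and RISE/RUN = 2/1, `offset` reduces to (2i + 3j - 17) mod 10, and with N = 3 an error created at PE(2, 5) when its activity index is 5 is first detected two cycles later under these assumptions.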
As in the linear processor array, errors may be created that propagate undetected out of the 
array. However, in the linear array, the checking pattern was designed such that only a few of 
the endmost PEi could create undetected errors. Such is not the case for the 2-D array, in which 
RISE and RUN can be chosen to create a variety of checking patterns. Therefore, Lmax for the 
2-D array is defined as the largest finite error detection latency. 
EXAMPLE 5.1: Figure 5.2 shows several snapshots of a 10x10 array in the midst of some
computation, with $M_{i,j} = 10$, $N_{i,j} = 3$, RISE/RUN = 2/1, and $O_{i,j} = (2i + 3j - 17) \bmod 10$. The
detection latency for an error created at $PE_{2,5}$ in cycle $c$ (marked in the figure by e), when the
checking activity at $PE_{2,5}$ is $CS_{10,3}[5]$, is called $L_5$ and equals 2, since both $PE_{3,6}$ and $PE_{2,7}$
detect the error in cycle $c + 2$. The figure shows how the error propagates through the array (*
in the figure) until detection (in the figure, X). For this array, $L_{max} = L_N = L_3 = 3$. □
5.2. Suspected Outputs
This section considers which outputs to suspect as possibly erroneous when an error is
detected at $PE_{i,j}$ in a 2-D processor array. As in the single-processor and linear-array
discussions, outputs produced both prior to the detection as well as subsequent thereto are considered.
For the first case, a simple algorithm works backwards in time from the point of detection, to
determine through which upstream PEs the error could have propagated; the outputs from those
PEs should be suspected. The algorithm runs in $O(UV \cdot N_{i,j})$ time, assuming $N_{i,j}$ is constant for
all $PE_{i,j}$.

computation cycle c
       j  0 1 2 3 4 5 6 7 8 9
  i 0     - - - - x - - - - x
    1     - - - x x - - - x x
    2     - - - x - e - - x -
    3     - - x x - - - x x -
    4     - - x - - - - x - -
    5     - x x - - - x x - -
    6     - x - - - - x - - -
    7     x x - - - x x - - -
    8     x - - - - x - - - -
    9     x - - - x x - - - x

computation cycle c + 1
       j  0 1 2 3 4 5 6 7 8 9
  i 0     - - - x x - - - x x
    1     - - - x - - - - x -
    2     - - x x - - * x x -
    3     - - x - - * - x - -
    4     - x x - - - x x - -
    5     - x - - - - x - - -
    6     x x - - - x x - - -
    7     x - - - - x - - - -
    8     x - - - x x - - - x
    9     - - - - x - - - - x

computation cycle c + 2
       j  0 1 2 3 4 5 6 7 8 9
  i 0     - - - x - - - - x -
    1     - - x x - - - x x -
    2     - - x - - - - X - -
    3     - x x - - - X x - -
    4     - x - - - * x - - -
    5     x x - - - x x - - -
    6     x - - - - x - - - -
    7     x - - - x x - - - x
    8     - - - - x - - - - x
    9     - - - x x - - - x x

Figure 5.2. Error detection latency. (x: checked task; -: unchecked task; e: error created;
*: error propagated; X: error detected.)

EXAMPLE 5.2: Figure 5.3 shows five snapshots of a 10x10 processor array using standard
PACED with $M_{i,j} = 13$, $N_{i,j} = 5$, RISE/RUN = 3/1, and $O_{i,j} = (2i + 4j - 23) \bmod 13$. Each grid
represents the checking activity in the array in one computation cycle. The outputs to suspect
are marked either as @ (where the error was detected) or * (from where the error might have
propagated).
If an error is detected at $PE_{9,9}$ in cycle $c$, its output should be suspected as possibly
erroneous. Also, the outputs from the following PEs should be suspected as possibly erroneous:
$PE_{9,8}$ and $PE_{8,9}$ in cycle $c - 1$; $PE_{9,7}$, $PE_{8,8}$, and $PE_{7,9}$ in cycle $c - 2$; $PE_{7,8}$ and $PE_{6,9}$ in cycle
$c - 3$; and $PE_{4,9}$ in cycle $c - 4$. All other unsuspected, previously produced outputs can be
trusted with a confidence of 1, unless a later error detection makes it necessary to suspect them. □

Figure 5.3. Suspected previously produced outputs, 10x10 array. (Snapshots for computation
cycles c through c - 4.)

The PACED' modification in the 2-D array performs 100% checking at $PE_{U-1,V-1}$, either by
duplicating $PE_{U-1,V-1}$ or by monitoring its outputs with a hardware code checker. With PACED'
in use, errors cannot escape undetected from the array. As in the linear array case, this
modification obviates the need to suspect any future outputs from the array: all errors are eventually
detected, so only previously produced outputs have to be suspected at error detection time.
However, in the standard PACED implementation, some detected errors may propagate
downstream, corrupting other outputs before escaping the array.
An algorithm similar to that used to find the suspected previous outputs first works
backwards from each unchecked cycle of $PE_{U-1,V-1}$ to determine from which upstream PEs, in earlier
checked cycles, undetected errors may have propagated. Also, potential sites of suspected
outputs are marked in this step. From these detection sites, errors are then propagated forward
retracing the paths found in the first step; errors on paths that do not lead to subsequent
detections are marked suspect. This algorithm runs in $O(UV \cdot N_{i,j})$ time, assuming $N_{i,j}$ is constant
for all $PE_{i,j}$.
EXAMPLE 5.3: Figure 5.4 shows three snapshots of a 10x10 processor array using standard
PACED with $M_{i,j} = 10$, $N_{i,j} = 3$, RISE/RUN = 2/1, and $O_{i,j} = (2i + 3j - 17) \bmod 10$. The figure
is notated as in Figure 5.3.
If an error is detected at $PE_{8,8}$ in cycle $c$ (marked @ in the figure), its output should be
suspected as possibly erroneous. Furthermore, the outputs from the following $PE_{i,j}$ should also be
suspected as possibly erroneous: $PE_{8,9}$ and $PE_{9,8}$ in cycle $c + 1$, and $PE_{9,9}$ in cycle $c + 2$ (all
marked by *). All other unsuspected, future outputs can be trusted with a confidence of 1 (until,
of course, the next error detection). □
The detection of an error by one of the $N_{i,j}$ checks at $PE_{i,j}$ leads to static patterns of
previous and future outputs to suspect as possibly erroneous. For example, Figure 5.3 is the pattern
of previous outputs to suspect if $M_{i,j} = 13$, $N_{i,j} = 5$, RISE/RUN = 3/1 (giving $O_{i,j} =
(2i + 4j - 23) \bmod 13$), and $CS_{13,5}[0]$ detects an error; Figure 5.4 is the pattern of future outputs
to suspect if $M_{i,j} = 10$, $N_{i,j} = 3$, RISE/RUN = 2/1 (giving $O_{i,j} = (2i + 3j - 17) \bmod 10$), and
$PE_{U-2,V-2}$ detects an error by $CS_{10,3}[1]$.

Figure 5.4. Suspected future outputs, 10x10 array. (Snapshots begin at computation cycle c;
the legend distinguishes checked tasks, unchecked tasks, the detected error (suspect), and
suspected outputs.)

For given values of $M_{i,j}$, $N_{i,j}$, RISE, and RUN, there are a fixed number of possible
patterns of suspected outputs: one for each $CS_{13,5}[r]$, $0 \le r \le 4$, for the previously produced
outputs, and a variable number of patterns generated from each $CS_{13,5}[r]$, $5 \le r \le 12$, for the future
outputs. Because the PACED parameter values are known, these patterns can be computed once
for the array using the algorithms described and stored, indexed by $r$, $i$, and $j$. Upon error
detection, given which check detected the error (the index of $CS_{M,N}$) at which $PE_{i,j}$, the outputs to
suspect can be determined with no extra computations by recalling the appropriate template.
In the linear array case, it was possible to determine analytically which checking pattern, 
for a given $M_{i,j}$ and $N_{i,j}$, would lead to the minimal maximum error detection latency [19].
Such an analytical treatment is less tractable for the 2-D array, so a pattern generator program
and analyzer program were written in C to examine the search space. The pattern generator
program takes as input the architecture of the array (linear, 2-D, or triangular), the dimensions of
the array, values of the PACED parameters M, N, and O (for the linear array case) or RISE and
RUN (for the other architectures), and the number of computation cycles to generate. It
produces a series of snapshots of the array for the requisite number of computation cycles, showing
the checking pattern generated by the PACED parameter values. The analyzer program takes
the output of the pattern generator as input and determines $L_{max}$, as well as the number of
outputs to suspect, both forward and backward, for the particular PACED parameter values.
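
A snapshot generator in the spirit of the pattern generator can be sketched as below, in Python rather than C. It relies on the relation from Section 5.1 that, within one snapshot, the activity index changes by RISE per column and by RUN per row. The `phase` parameter (the activity index of PE(0, 0) in the cycle being drawn) and the function name are hypothetical, and indices below N are assumed to denote checked cycles.

```python
def snapshot(U, V, M, N, rise, run, phase):
    """Return one row string per array row: 'x' = checked, '-' = unchecked."""
    rows = []
    for i in range(U):
        rows.append("".join(
            "x" if (i * run + j * rise + phase) % M < N else "-"
            for j in range(V)))
    return rows


def snapshots(U, V, M, N, rise, run, phase, cycles):
    """Generate consecutive snapshots; the pattern advances one index per cycle."""
    return [snapshot(U, V, M, N, rise, run, phase + t) for t in range(cycles)]
```

Printing `"\n".join(snapshot(10, 10, 10, 3, 2, 1, 3))` draws a wave pattern with slope RISE/RUN = 2/1; varying `rise` and `run` changes the slope of the waves of checking.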
A 20x20 array was tested, setting $M_{i,j} = 15$, $N_{i,j} = 1, 2, \ldots, 15$, $O_{i,j} = (15 + i + j
- (19 - i)\,\mathrm{RUN} - (19 - j)\,\mathrm{RISE}) \bmod 15$, and $q = 1$. By varying RISE and RUN, patterns with
waves of different slopes were generated. These patterns were then analyzed to determine their
maximum error detection latency, as well as the pattern and number of both previous outputs
(Table 5.1) and future outputs (Table 5.2) to suspect when an error is detected.
For each row of the tables, the first two columns give the checking ratio and percentage, 
and the third column gives the particular RISE and RUN values used to obtain the other values 
in that row. The fourth column, Lmax , gives the minimal maximum error detection latency that 
achieves the minimum number of previous outputs (Table 5.1) or future outputs (Table 5.2) that 
should be suspected as possibly erroneous (sixth column). The fifth column gives the number of 
computation cycles that these suspected outputs span. 
TABLE 5.1.
NUMBER OF SUSPECTED PREVIOUS OUTPUTS, 2-D ARRAY.

          %                                               min # bwd   min # fwd
N/M    checking   RISE/RUN          Lmax    # cycles      susp. o/p   susp. o/p
1/15     6.7      0/0 1/1 3/3        14        15            120         679
2/15     13       2/2                 4         5             30          68
3/15     20       2/2                 4         5             45         102
4/15     27       4/4                 2         3             24          36
5/15     33       4/4                 2         3             30          45
6/15     40       -7/8 5/5 8/8        2         3             27          36
7/15     47       -8/7 6/6 7/7        2         3             24          27
8/15     53       <8 options>         1         2             22          21
9/15     60       <32 options>        1         2             21          18
10/15    67       <72 options>        1         2             20          15
11/15    73       <128 options>       1         2             19          12
12/15    80       <200 options>       1         2             18           9
13/15    87       <325 options>       1         2             17           6
14/15    93       <544 options>       1         2             16           3
15/15   100       <all options>       0         0              0           0

It is interesting to note that particular patterns that work well in the backward interval do
not generally work well in the forward interval. The last columns in each table are provided for
the purposes of comparison. For example, with N = 1, Table 5.1 suggests that using RISE/RUN
= 0/0, 1/1, or 3/3 gives a minimum of 120 previously produced outputs to suspect, but when
these slopes are used, 679 outputs must be suspected subsequent to certain error detections. The
reverse is also true: with N = 5, Table 5.2 suggests using RISE/RUN = -1/4 to limit the amount
of future suspected outputs to 25, yet 205 previously produced outputs should also be suspected
upon error detection. Clearly, the search space is large and complex; use of the pattern
generator and analysis programs can aid a designer of such a system to choose the best RISE/RUN for
a desired checking ratio, to minimize the amount of output to suspect.
TABLE 5.2.
NUMBER OF SUSPECTED FUTURE OUTPUTS, 2-D ARRAY.

          %                                                        min # fwd   min # bwd
N/M    checking   RISE/RUN                    Lmax    # cycles     susp. o/p   susp. o/p
1/15      7       -1/4                          2        3             5           41
2/15     13       -1/4                          2        3            10           82
3/15     20       -1/4                          2        3            15          123
4/15     27       -1/4                          2        3            20          164
5/15     33       -1/4                          2        3            25          205
6/15     40       -1/5 -1/8                     2        3            21          186
7/15     47       -1/6 -1/7                     2        3            17          167
8/15     53       -1/6 -1/7                     1        2            14          148
9/15     60       -1/5 -1/6 -1/7 -1/8           1        2            12          129
10/15    67       -1/4 -1/5 -1/6,               1        2            10          110
                  -1/7 -1/8 -1/9
11/15    73       <32 options>                  1        2             8           11
12/15    80       <40 options>                  1        2             6           12
13/15    87       <64 options>                  1        2             4           13
14/15    93       <83 options>                  1        2             2           14
15/15   100       <all>                         0        0             0            0

The tables show that only about one second's worth of output (on the order of 10 cycles' 
worth with cycle times less than 100 ms) need be suspected, either forward or backward in the 
2-D array, upon error detection. As in the linear array case, this is a great improvement over the 
amount of suspected output in the single processor case and shows again how PACED utilizes 
the cooperation of PEs checking other PE outputs to afford high confidence in outputs with only 
periodic checking. 
5.3. Error Coverage 
As in the linear array case, the error coverage in the 2-D array can be estimated if it is
assumed that errors occur uniformly distributed in space among the PEs in the array. Again,
only one M-cycle period has to be examined, as all other M-cycle periods are identical and have
the same coverage.
In one M-cycle period in a UxV 2-D mesh array, there are MUV potential sites at which
errors may occur: one for each PE of the array, in each cycle. Since it is assumed that errors
propagate through the array and are not masked, only a fraction of the potential sites can lead to
the propagation of undetected errors out of the array, if an error occurs. (This is when normal
PACED is applied; use of PACED' would result in 100% error coverage, as no errors can escape
from the array undetected.) The estimated error coverage is then one minus the fraction of
potential sites from which undetected errors may propagate out of the array, i.e., the number of
sites whose errors are detected divided by the total number of potential sites.
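
The site-counting estimate can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimator used for Figure 5.5: an error is taken to advance one PE rightward and downward per cycle, a site counts as covered if the creating PE or any downstream PE is in a checked cycle (activity index below N) when the error reaches it, and the detection probability q is taken as 1. The function name is hypothetical.

```python
def estimated_coverage(U, V, M, N, rise, run):
    """Fraction of the M*U*V potential error sites whose errors are detected."""
    covered = 0
    for i in range(U):
        for j in range(V):
            for r in range(M):  # activity index at the time the error occurs
                # The error reaches PE(i + y, j + x) after z = x + y cycles,
                # when that PE's activity index is r + x*RISE + y*RUN + z.
                if any((r + x * rise + y * run + x + y) % M < N
                       for y in range(U - i) for x in range(V - j)):
                    covered += 1
    return covered / (M * U * V)
```

Evaluating `estimated_coverage(4, 4, 10, n, rise, run)` for n from 0 to 10 gives a coverage curve as a function of N/M for a chosen slope.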
Figure 5.5 shows the estimated error coverage for a 4x4 PE mesh array as a function of
N/M, when M = 10 and q = 1. When N/M is small, the error coverage is low; but the coverage
increases quickly as N/M increases: greater than 95% coverage can be achieved with N/M just
0.5 or greater. As in the linear array case, low values of the checking ratio can yield high error
coverage, and low checking ratios can lead to reduced performance cost of applying CED.
5.4. Performance 
As in the linear array case, the performance of 2-D processor arrays was studied in two
ways: from reduced simulations using the C simulation model, and from full simulations on the
Intel iPSC/2 hypercube running a matrix-multiply algorithm. The results from these two
performance analyses are presented in the following two subsections.

Figure 5.5. Estimated error coverage (%) for a 4x4 mesh array, as a function of N/M (M = 10).

5.4.1. Simulation model 
The simulation-based analysis model introduced in Section 4.5.1 was also used to estimate 
the performance of PACED when applied to square and triangular processor array architectures. 
One array investigated was a triangular array running an adaptive beamforming algorithm. 
EXAMPLE 5.4: Digital adaptive beamforming is a signal processing algorithm that
optimizes the reception of a desired signal received at an antenna array. A triangular processor
array has been designed for high-performance, adaptive, digital beamforming [50], and is shown
in Figure 5.6. The triangular array consists of four types of PEs: boundary, internal, Y-column,
and a residual former. During each computation cycle, PEs of each of the first three types
compute outputs and update an internal state variable; the residual former does not maintain any
state, and only computes an output.

Figure 5.6. Triangular array for adaptive digital beamforming. (Legend: boundary PE,
internal PE, Y-column PE, residual former PE.)

The modeled CED scheme replicated the computations at each PE using duplicate data. A
full simulation using the OODRA (Object-Oriented Design of Reliable/reconfigurable
Architectures) workbench [51] was used to determine the mean task and check times for each type of PE
in the array, for one computation cycle. The mean task times are given in the second column of 
Table 5.3; the units in the table are defined such that three units equal the average time required 
for the residual former to complete one task. 
The table shows that the boundary PE task required at least an order of magnitude more
time than any of the other PE tasks, because of its costly state update computation (involving a
square root). Therefore, five different variations of the CED scheme were considered. In each
variation, only a subset of the computations performed at each PE in a computation cycle was
checked whenever the CED technique was performed.

TABLE 5.3.
TASK AND CHECK TIMES,
ADAPTIVE BEAMFORMING PEs.

                           check time using CED scheme:
PE type      task time      I      II     III     IV      V
Boundary        104        106     17      17     88     88
Internal         15         16     16       8      8      0
Y-column         15         16     16       8      8      0
Residual          3          4      4       4      0      0

I)   All output and state computations at each PE were checked. This provided the greatest
     probability of detecting an error, if one were to occur.
II)  All output and state computations except the boundary PE state update computation were
     checked. This scheme attempted to check as many of the computations as possible, while
     saving the most time by not replicating the longest operation.
III) Only output computations at each PE were checked.
IV)  Only state update computations at each PE were checked.
V)   Only the boundary PE state update computation was checked. This scheme covered the
     most time at the boundary PEs while trying to minimize the number of computations to
     replicate.
The last five columns of Table 5.3 show the mean check times for each PE type, for each of the
five CED schemes; again, the units are relative to the residual former task time.
88 
The simulation-based analysis model determined the performance of a 4x4 triangular array
running the adaptive digital beamforming algorithm using PACED with M_{i,j} = M, N_{i,j} = N,
and q = 1. The simulation was run for 500 computation cycles. For each of the five CED
schemes, five different checking patterns were applied, in which different subsets of the PEs in
the array were checking at any particular computation cycle: the entire array, a row, a column, a
forward wavefront with slope 1, and a backward wavefront with slope 1. (These simulations
were performed before the 2-D array was analyzed. Hence, the formula given for O_{i,j} in Section
5.1 was not used; other formulas for O_{i,j} were derived to fit the desired PE subsets.) If T_o and
T_c represent the time units estimated by the model to run an algorithm without and with using
CED, respectively, then the degraded performance is T_o/T_c and the checking overhead is
(T_c - T_o)/T_o. Figure 5.7(a) shows the performances, and Figure 5.7(b), the checking
overheads, of each of the five CED schemes as a function of N/M.
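As a minimal illustration of the two metrics just defined, the following Python sketch computes the degraded performance and the checking overhead from a pair of run times. The function names and the sample times are hypothetical and are not taken from the simulation.

```python
def degraded_performance(t_o: float, t_c: float) -> float:
    """Fraction of the unchecked performance that is retained: T_o / T_c."""
    return t_o / t_c


def checking_overhead(t_o: float, t_c: float) -> float:
    """Extra time spent because of CED, relative to the unchecked run: (T_c - T_o) / T_o."""
    return (t_c - t_o) / t_o


# Hypothetical run times in model time units (illustrative only).
t_o, t_c = 1000.0, 1250.0
print(f"performance = {100 * degraded_performance(t_o, t_c):.1f}%")
print(f"overhead    = {100 * checking_overhead(t_o, t_c):.1f}%")
```

Note that the two quantities are related by performance = 1 / (1 + overhead), so an overhead just above 100% corresponds to a performance just below 50%.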
It was found that the performance degradations resulting from the five checking patterns
were practically identical, for any of the CED schemes employed: the performance impact of
PACED depended only upon M and N. Therefore, each curve in Figure 5.7(a) represents the
(identical) performances using the five checking patterns considered, and each curve in Figure
5.7(b) represents the (identical) overheads of those patterns.
For N = 0, the modeled PACED system suffers no performance degradation, regardless of
the CED scheme used. Since checks involve a replicated computation plus a comparison, for
N/M = 1, the checking overhead for CED scheme I exceeds 100% and the performance is
slightly less than 50% of the basic performance. The pair of CED schemes II and III have the
same performance and overhead, as do the pair IV and V. This means that even though CED
scheme II replicates the state update computations (which scheme III does not), these extra
computations can be done essentially with no added cost, because the large boundary task time
forces the other PEs in the array to wait and it is in these idle times that the checking of schemes
II and III is performed. Since no extra wait states are propagated to the residual former, and
since the residual former performs the same amount of checking in the two schemes, no
performance difference is observed. For the same reason, if the boundary PE state update
computations are checked (scheme V), then all PE state update computations can be checked with no
extra performance cost (scheme IV). Hence, of these five CED schemes, I, II and IV represent
the most intelligent options.

Figure 5.7. Adaptive beamforming array.
(a) Performance degradation. (b) Checking overhead.
(Both panels are plotted against N/M with M = 10; the vertical axes are performance (%) and
overhead (%), and the curves are labeled by CED scheme.)
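The pairing of schemes II with III, and IV with V, can be illustrated with a simplified bottleneck model built from the Table 5.3 values. The sketch below is not the OODRA simulation: it assumes (as a deliberate simplification) that each computation cycle lasts as long as the slowest PE's task plus check time, and that N out of every M cycles are checked. The names `TASK`, `CHECK`, `cycle_time`, and `overhead` are introduced here for illustration only.

```python
# Mean task and check times from Table 5.3 (three units = one residual-former task).
TASK = {"boundary": 104, "internal": 15, "y_column": 15, "residual": 3}
CHECK = {
    "I":   {"boundary": 106, "internal": 16, "y_column": 16, "residual": 4},
    "II":  {"boundary": 17,  "internal": 16, "y_column": 16, "residual": 4},
    "III": {"boundary": 17,  "internal": 8,  "y_column": 8,  "residual": 4},
    "IV":  {"boundary": 88,  "internal": 8,  "y_column": 8,  "residual": 0},
    "V":   {"boundary": 88,  "internal": 0,  "y_column": 0,  "residual": 0},
}


def cycle_time(scheme: str, checking: bool) -> int:
    """Assumed cycle length: the slowest PE (task, plus check if checking) sets the pace."""
    return max(TASK[pe] + (CHECK[scheme][pe] if checking else 0) for pe in TASK)


def overhead(scheme: str, n: int, m: int) -> float:
    """Checking overhead (T_c - T_o) / T_o when n of every m cycles are checked."""
    t_o = m * cycle_time(scheme, False)
    t_c = n * cycle_time(scheme, True) + (m - n) * cycle_time(scheme, False)
    return (t_c - t_o) / t_o


for scheme in CHECK:
    print(scheme, f"{100 * overhead(scheme, 10, 10):.1f}% overhead at N/M = 1")
```

Under these assumptions the boundary PE is always the bottleneck, so only its check time matters: schemes II and III give the same overhead, as do IV and V, and scheme I slightly exceeds 100% at N/M = 1, in line with the behavior described above.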
In this example the performance degradation and overhead were constant for any PACED
checking pattern chosen, given a particular CED scheme. This has been shown to be true for the
linear array [19] and should be true in general, since the O_{i,j} parameter only affects the
initialization of PACED at each PE in the array: the performance depends only upon the checking ratio
N/M.
5.4.2. Hypercube simulations 
A simulation of a 2-D mesh processor array was performed on an Intel iPSC/2 hypercube,
with the nodes serving as the PEs and using the shortest internode connections to minimize the 
communication overhead. A matrix-multiply algorithm was implemented in C in which rows 
and columns of each input matrix were distributed to the PEs through the top and left edges of 
the array and sent thereon through the array. Each PE computed a submatrix of the final matrix 
result; the final result was collected by the host at the end of computation. 
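The data flow described above can be sketched in Python as a sequential simulation of a p x p mesh. This is one plausible reading of the description, not the original C program: it assumes that in computation cycle k each PE receives a segment of column k of the first matrix from the left and a segment of row k of the second matrix from the top, and accumulates their outer product into its own submatrix. The function name `mesh_matmul` and the block layout are assumptions made for illustration.

```python
import random


def mesh_matmul(a, b, p):
    """Multiply square matrices a and b on a simulated p x p mesh of PEs."""
    n = len(a)
    blk = n // p  # submatrix edge per PE; assumes p divides n
    # c_sub[i][j] is the blk x blk piece of the result owned by PE (i, j).
    c_sub = [[[[0.0] * blk for _ in range(blk)] for _ in range(p)] for _ in range(p)]
    for k in range(n):  # one computation cycle per index k
        for i in range(p):
            # Segment of column k of a that enters mesh row i from the left edge.
            a_seg = [a[i * blk + r][k] for r in range(blk)]
            for j in range(p):
                # Segment of row k of b that enters mesh column j from the top edge.
                b_seg = [b[k][j * blk + c] for c in range(blk)]
                sub = c_sub[i][j]
                for r in range(blk):
                    for c in range(blk):
                        sub[r][c] += a_seg[r] * b_seg[c]  # one multiply, one add
    # The host gathers the submatrices into the final result.
    return [[c_sub[r // blk][c // blk][r % blk][c % blk] for c in range(n)]
            for r in range(n)]


if __name__ == "__main__":
    random.seed(0)
    n, p = 8, 4  # small stand-in for the 136x136 problem on 16 nodes
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    print(mesh_matmul(a, b, p)[0][:3])
```

With n = 136 and p = 4, each simulated PE would own a 34x34 submatrix and the outer loop would run for 136 computation cycles.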
The simulations used all 16 nodes of the hypercube to multiply together two 136x136
matrices of random floating-point numbers. With M_{i,j} = M and N_{i,j} = N, two CED techniques
were employed: RESO and neighbor-assist. (AN-coding only applies to integer applications; to
date, there are no known arithmetic codes for floating-point numbers.) The RESO employed
was the same as that used in the simulations of the linear array performing image edge detection
(Section 4.5.2). In the neighbor-assist technique, each PE_{i,j} requests a recomputation of N of its
computations from a nearest neighbor PE, which then sends back the results. Both PEs perform
a comparison of the two sets of results and any discrepancy greater than an error tolerance
(1.5 x 10^-15 times the value of a result) triggers an error detection. The neighbor-assist
technique is patterned after CORP (concurrent retry procedure) of Manolakos et al. [52], which
is used by NEAR (neighbor-assisted recovery) [53], the f-processes [17], and the overlapping
H-processes [18] for 2-D processor arrays. In this implementation of NEAR, to reduce the
number of CED-related messages, each PE_{i,j} saves N-out-of-M sets of its operands and requests
CED assistance only once every M computation cycles.
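The comparison step of the neighbor-assist technique can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis implementation: the saved operand sets are passed as a list, the neighbor is modeled as a plain function call rather than a message exchange, and the names `results_agree`, `neighbor_assist_check`, and `multiply_accumulate` are hypothetical. Only the relative tolerance of 1.5 x 10^-15 comes from the text.

```python
REL_TOL = 1.5e-15  # from the text: 1.5 x 10^-15 times the value of a result


def results_agree(own: float, recomputed: float, rel_tol: float = REL_TOL) -> bool:
    """True if the discrepancy is within the relative error tolerance."""
    return abs(own - recomputed) <= rel_tol * abs(own)


def neighbor_assist_check(saved_operands, own_results, neighbor_compute):
    """Return the indices of saved computations whose recomputation disagrees.

    saved_operands   -- the N operand sets saved during the last M cycles
    own_results      -- the results this PE produced from those operands
    neighbor_compute -- stand-in for the neighbor PE recomputing one result
    """
    recomputed = [neighbor_compute(*ops) for ops in saved_operands]
    return [i for i, (own, other) in enumerate(zip(own_results, recomputed))
            if not results_agree(own, other)]


def multiply_accumulate(acc: float, a: float, b: float) -> float:
    """The basic matrix-multiply step: one multiply and one add."""
    return acc + a * b


# N = 3 operand sets saved out of M = 10 cycles (illustrative values).
operands = [(0.0, 1.5, 2.0), (3.0, 0.25, 4.0), (4.0, 1.0, 1.0)]
results = [multiply_accumulate(*ops) for ops in operands]
results[1] *= 1.0001  # inject a fault into one of this PE's own results
print(neighbor_assist_check(operands, results, multiply_accumulate))
```

Batching the N saved operand sets into one request per M cycles is what keeps the number of CED-related messages independent of N.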
Figure 5.8 shows the performance cost of using CED in varying checking ratios N/M, by
comparing the completion times of the different versions of the algorithm. The completion
times do not include the initializations of the host or node programs. For each run, the
individual completion times of each of the 16 nodes were averaged together. The averages from five
runs were then averaged to obtain each data point on the graph. Just five runs were deemed
sufficient for two reasons: 1) the greatest standard deviation for the individual node completion
times was less than 3.4% of the average node completion time, and 2) the greatest standard
deviation for the run averages was less than 0.15% of the average run completion times. The
graph does show the 95% confidence intervals for each data point but they are too small to be
seen.

Figure 5.8. Mesh array performance, matrix multiply.
(Left axis: estimated error coverage (%). Right axis: performance overhead (%). Horizontal
axis: N/M, M = 10. A "No PACED" level is marked on the graph.)
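The averaging procedure described above can be sketched in Python as follows. The timing data are synthetic placeholders, not measurements from the thesis, and the use of a Student-t interval with four degrees of freedom is an assumption about how the 95% confidence intervals might be formed; the text does not state the method.

```python
import math
import statistics

T_CRIT_95_DF4 = 2.776  # two-sided 95% Student-t critical value for 4 degrees of freedom


def data_point(runs):
    """Summarize five runs, each a list of per-node completion times.

    Returns (mean of run averages, standard deviation of run averages,
    half-width of the assumed 95% confidence interval).
    """
    assert len(runs) == 5, "the t critical value above is for exactly five runs"
    run_averages = [statistics.mean(node_times) for node_times in runs]
    mean = statistics.mean(run_averages)
    sd = statistics.stdev(run_averages)
    half_width = T_CRIT_95_DF4 * sd / math.sqrt(len(run_averages))
    return mean, sd, half_width


# Synthetic completion times (seconds) for 5 runs x 16 nodes, for illustration only.
runs = [[10.0 + 0.01 * run + 0.001 * node for node in range(16)] for run in range(5)]
mean, sd, half_width = data_point(runs)
print(f"mean = {mean:.4f} s, sd = {100 * sd / mean:.3f}% of mean, 95% CI = +/-{half_width:.4f} s")
```

When the standard deviation of the run averages is a fraction of a percent of the mean, as reported above, the resulting interval is far narrower than a plotted marker.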
Unlike the linear array performance results, the use of RESO or neighbor-assist clearly 
degrades the performance of the 2-D mesh array, in an almost linear fashion. This is probably 
because each computation cycle in the matrix-multiply algorithm has but four data 
sends/receives, compared with ten in the edge detection algorithm, and only 136 iterations of the 
computation cycle were required for the 136x136 matrix multiply, whereas 1024 iterations were 
performed in each run of the edge detection algorithm: overall, the matrix-multiply algorithm 
required far less communication than the edge detection algorithm so that the effects of CED 
were more pronounced on the completion time of the matrix-multiply algorithm. 
Both the neighbor-assist and RESO curves show a significant amount of overhead at N/M =
0 since both techniques replace each operand-destroying assignment statement with two
statements and a temporary variable, resulting in code expansion. However, the short overall
execution time of the matrix-multiply algorithm amplifies the apparent overhead introduced by CED.
The absolute time overhead for RESO at N/M = 0 is about 1 second; this is approximately the
same absolute time overhead exhibited by RESO in the edge detection array. Since the
matrix-multiply execution time is smaller, the percent overhead is much larger.
The figure shows that at 100% checking, both the RESO and neighbor-assist techniques 
display almost 250% overhead. In the basic version of the algorithm, the main computation 
consists of one floating-point multiply and one floating-point add. Use of RESO adds six extra 
floating-point multiplies as well as one extra floating-point add for each checked computation 
cycle. Though this more than triples the original amount of computation, more overhead is not 
apparent since the extra work incurred by RESO is a smaller proportion of the total amount of 
computation performed in each computation cycle. 
The basic version of the algorithm performs two data receives and two data sends each 
computation cycle. In the neighbor-assist case, every M cycles, two extra CED messages are 
both sent and received. This is the cause of the jump in execution time exhibited between N/M
= 0 and N/M = 0.1. Thereafter, the extra-message overhead remains constant, and the increase
of overhead with increased N/M comes from the extra computations each node performs as CED
for a neighbor. The slope of the curve from N/M = 0.1 to N/M = 1 is gentler than that of the
RESO case, as less extra computation is performed: just one extra floating-point multiply and 
add, for each checked computation cycle, in addition to copying the operands and partial prod-
ucts for its own neighbor assistant. 
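The difference in slope between the two curves can be made concrete by counting floating-point operations, using only the counts stated above: a basic cycle performs one multiply and one add, RESO adds six multiplies and one add per checked cycle, and neighbor-assist adds one multiply and one add per checked cycle. The short Python sketch below is an arithmetic illustration only; it ignores messages, operand copying, and comparisons, and the names `BASE`, `EXTRA`, and `mean_flops_per_cycle` are hypothetical.

```python
BASE = {"mul": 1, "add": 1}  # main computation of one basic cycle
EXTRA = {                    # extra work per checked cycle
    "RESO": {"mul": 6, "add": 1},
    "neighbor-assist": {"mul": 1, "add": 1},
}


def mean_flops_per_cycle(technique: str, n: int, m: int) -> float:
    """Average floating-point operations per cycle when n of every m cycles are checked."""
    base = sum(BASE.values())
    extra = sum(EXTRA[technique].values())
    return base + extra * n / m


for technique in EXTRA:
    full = mean_flops_per_cycle(technique, 10, 10)
    print(f"{technique}: {full:.0f} flops per cycle at N/M = 1 "
          f"({full / sum(BASE.values()):.1f}x the basic count)")
```

At N/M = 1 this gives 9 operations per cycle for RESO (4.5 times the basic count) against 4 for neighbor-assist (2 times), which is consistent with the gentler neighbor-assist slope.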
From these experiments, it can be concluded that, as expected, use of PACED can reduce
the performance costs incurred through the use of CED in a 2-D processor array. A designer of
such an array can trade off between performance and the number of outputs to suspect (and
thereby, the error coverage) by choosing appropriate levels of the checking ratio N/M, provided
a coding technique is used that facilitates error propagation in the array.
CHAPTER 6. 
SUMMARY 
In this thesis, it was shown that the periodic application of concurrent error detection
(PACED) can be an attractive alternative to the continuous use of CED in linear and
two-dimensional VLSI processor array architectures.
It was shown that for PACED applied in a single processor, high confidence can be 
achieved when only a small amount of output is suspected as possibly erroneous. This is possi-
ble assuming that errors arrive in clusters, with a fairly high arrival rate occurring for intraclus-
ter errors and a very small arrival rate for clusters themselves. 
For PACED applied in a unidirectional linear or two-dimensional mesh-connected proces-
sor array, even fewer of the array's previous outputs have to be suspected upon error detection, 
if a suitable coding scheme can be found to ensure the propagation of errors. Then, PEs in such 
arrays can cooperate to check the unchecked outputs of other PEs. Furthermore, future outputs 
have to be suspected only for PEs near the ends of linear arrays, since only these PEs can create 
errors that could possibly propagate undetected from the array. Any PE in a two-dimensional 
array can create an undetected error, and in these cases, somewhat more output has to be sus-
pected, depending on the position of the PE in the array. However, the sum total of outputs and 
the time interval which they encompass are smaller than those required for PACED in a single 
processor. 
For each possible error detection site in the linear or two-dimensional array, a static pattern 
of outputs to suspect can be predetermined and stored. Upon error detection, knowledge of the 
particular check and PE that detected the error can be used to retrieve an error pattern that deter-
mines which outputs to suspect. Therefore, very little run-time overhead is required at error 
detection time to determine which outputs should be suspected. 
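The run-time side of this scheme amounts to a table lookup, which the following Python sketch illustrates. The table contents are invented placeholders, not patterns derived from the confidence theorems, and the representation (a dictionary keyed by detecting PE and check, holding offsets relative to the detection cycle) is an assumption made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

PE = Tuple[int, int]         # (row, column) of a PE in the array
Site = Tuple[PE, int]        # (detecting PE, index of the check that fired)
Suspect = Tuple[PE, int]     # (producing PE, cycles back from the detection cycle)

# Hypothetical precomputed patterns; real entries would be generated offline.
SUSPECT_PATTERNS: Dict[Site, FrozenSet[Suspect]] = {
    ((1, 1), 0): frozenset({((1, 1), 0), ((0, 1), 1), ((1, 0), 1)}),
    ((0, 0), 0): frozenset({((0, 0), 0)}),
}


def outputs_to_suspect(pe: PE, check: int, detection_cycle: int) -> List[Tuple[PE, int]]:
    """Translate the stored static pattern into absolute (PE, cycle) outputs to suspect."""
    pattern = SUSPECT_PATTERNS[(pe, check)]
    return sorted((src, detection_cycle - back) for src, back in pattern)


print(outputs_to_suspect((1, 1), 0, detection_cycle=100))
```

Because the pattern is static, the only work done at error detection time is one lookup and an offset by the current cycle number.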
For all three of the architectures considered, the error coverage was found to be quite high 
even for low values of the checking ratio N/M. In the single processor case, this was due to the
ability of the undetected-errors intervals to, in effect, "detect" errors that would otherwise have 
gone undetected, by casting suspicion on outputs that may have been corrupted. In the array 
cases, high coverage is achieved by the cooperation of the constituent PEs in the arrays to check 
the unchecked outputs of other PEs. 
In empirical studies of the performance cost of PACED in linear and two-dimensional 
arrays, it was found that performance was degraded approximately linearly with the amount of 
checking performed. Hence, PACED can reduce the performance cost of performing CED in 
such architectures by performing CED periodically instead of continuously. Coupled with the 
potentially high confidence that can be placed on most outputs at error detection time as well as 
the high error coverages possible even with infrequent checking, PACED can be an attractive 
alternative to continuous CED for some applications. 
This thesis has also described a simulation model that can estimate the performance cost of
PACED in unidirectional linear, two-dimensional mesh and triangular processor arrays. This
model, plus the confidence theorems and algorithms as well as the error coverage estimates
presented in this thesis, form a powerful package that can aid a designer in choosing the PACED
parameter values to trade off the performance cost of using CED for a minimal error detection
latency, minimal number of outputs to suspect, and high error coverage.
REFERENCES 
[1] H. Yamamoto, T. Watanabe, and Y. Urano, "Alternating logic and its application to
fault detection," Proc. 1970 IEEE Int. Computer Group Conf., pp. 220-228, June 1970.
[2] D. A. Reynolds and G. Metze, "Fault detection capabilities of alternating logic," IEEE
Trans. Computers, vol. C-27, no. 12, pp. 1093-1098, Dec. 1978.
[3] J. H. Patel and L. Y. Fung, "Concurrent error detection in ALU's by recomputing with
shifted operands," IEEE Trans. Computers, vol. C-31, no. 7, pp. 589-595, July 1982.
[4] R. K. Gulati and S. M. Reddy, "Concurrent error detection in VLSI array structures,"
Proc. IEEE Int. Conf. Computer Design, pp. 488-491, Oct. 1986.
[5] Y. H. Choi and M. Malek, "A fault-tolerant FFT processor," IEEE Trans. Computers,
vol. 37, no. 5, pp. 617-621, May 1988.
[6] F. T. Luk and E. K. Torng, "Fault tolerance techniques for systolic arrays," Proc. SPIE,
Vol. 827, Real-Time Signal Processing X, pp. 30-36, 1987.
[7] J. H. Kim and S. M. Reddy, "A fault-tolerant systolic array design using TMR
method," Proc. Int. Conf. Computer Design, pp. 769-773, 1985.
[8] A. Majumdar, C. S. Raghavendra, and M. A. Breuer, "Fault tolerance in linear systolic
arrays using time redundancy," IEEE Trans. Computers, vol. 39, no. 2, pp. 269-276,
Feb. 1990.
[9] J. Y. Jou and J. A. Abraham, "Fault-tolerant FFT networks," Proc. 15th Int. Symp.
Fault-Tolerant Computing, pp. 338-343, 1985.
[10] K. H. Huang and J. A. Abraham, "Algorithm-based fault tolerance for matrix
operations," IEEE Trans. Computers, vol. C-33, no. 6, pp. 518-528, June 1984.
[11] J. C. Fabre, Y. Deswarte, J. C. Laprie, and D. Powell, "Saturation: reduced idleness for
improved fault-tolerance," Proc. 18th Int. Symp. Fault-Tolerant Computing, pp.
200-205, 1988.
[12] A. T. Dahbura, K. K. Sabnani, and W. J. Hery, "Spare capacity as a means of fault
detection and diagnosis in multiprocessor systems," IEEE Trans. Computers, vol. 38,
no. 6, pp. 881-891, June 1989.
[13] P. Banerjee, J. T. Rahmeh, C. B. Stunkel, V. S. S. Nair, K. Roy, and J. A. Abraham,
"An evaluation of system-level fault tolerance on the Intel Hypercube multiprocessor,"
Proc. 18th Int. Symp. Fault-Tolerant Computing, pp. 362-367, June 1988.
[14] C. L. Wey, "Concurrent error detection in array dividers by alternating input data," IEE
Proceedings-E, vol. 139, no. 2, pp. 123-130, Mar. 1992.
[15] W. T. Cheng and J. H. Patel, "Concurrent error detection in iterative logic arrays,"
Proc. 14th Int. Symp. Fault-Tolerant Computing, pp. 10-15, June 1984.
[16] S. W. Chan and C. L. Wey, "The design of concurrent error diagnosable systolic arrays 
for band matrix multiplications," IEEE Trans. Computer-Aided Design, vol. 7, no. 1, 
pp. 21-37, Jan. 1988. 
[17] E. S. Manolakos and M. Bletsas, "The r-process: A time redundancy mechanism for
concurrent error diagnosis in wavefront arrays," submitted to 23rd Int. Symp.
Fault-Tolerant Computing, 1993.
[18] E. Manolakos, D. Dakhil, and M. Vai, "Concurrent error diagnosis in mesh array archi-
tectures based on overlapping H-processes," Proc. IEEE Workshop on Defect and 
Fault Tolerance in VLSI Systems, pp. 139-152, Nov. 1991. 
[19] Y. M. Wang, P. Y. Chung, and W. K. Fuchs, "Design and scheduling for periodic
concurrent error detection and recovery in processor arrays," Technical Report
CRHC-92-08, Center for Reliable and High-Performance Computing, Univ. of Illinois,
Urbana, IL, May 1992.
[20] G. S. Sohi, M. Franklin, and K. K. Saluja, "A study of time-redundant fault tolerance 
techniques for high-performance pipelined computers," Proc. 19th Int. Symp. Fault-
Tolerant Computing, pp. 436-443, 1989. 
[21] P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, "IMPACT: An 
architectural framework for multiple-instruction-issue processors," Proc. 18th Int. 
Symp. Computer Architecture, pp. 266-273, 1991. 
[22] J. Holm and P. Banerjee, "Low cost concurrent error detection in a VLIW architecture
using replicated instructions," Proc. 1992 Int. Conf. Parallel Processing, vol. I, pp.
192-195, Aug. 1992.
[23] M. A. Schuette and J. P. Shen, "Exploiting instruction-level resource parallelism for 
transparent, integrated control-flow monitoring," ONR 2d Annual Review & Workshop, 
Nov. 1991. 
[24] M. A. Breuer, "Testing for intermittent faults in digital circuits," IEEE Trans. Comput-
ers, vol. C-22, no. 3, pp. 241-246, Mar. 1973. 
[25] S. Kamal and C. V. Page, "Intermittent faults: A model and a detection procedure," 
IEEE Trans. Computers, vol. C-23, no. 7, pp. 713-719, July 1974. 
[26] S. Y. H. Su, I. Koren, and Y. K. Malaiya, "A continuous-parameter Markov model and
detection procedures for intermittent faults," IEEE Trans. Computers, vol. C-27, no. 6,
pp. 567-570, June 1978.
[27] I. Koren and S. Y. H. Su, "Reliability analysis of N-modular redundancy systems with
intermittent and permanent faults," IEEE Trans. Computers, vol. C-28, no. 7, pp.
514-520, July 1979.
[28] R. K. Iyer and P. Velardi, "Hardware-related software errors: Measurement and
analysis," IEEE Trans. Software Engineering, vol. SE-11, no. 2, pp. 223-231, Feb. 1985.
[29] D. Tang and R. K. Iyer, "Dependability measurement and modeling of a multicomputer 
system," IEEE Trans. Computers, vol. 42, no. 1, pp. 62-75, Jan. 1993. 
[30] SAS user's guide: Statistics. Cary, NC: SAS Institute, Inc., 1985. 
[31] I. Lee, D. Tang, R. K. Iyer, and M. C. Hsueh, "Measurement-based evaluation of
operating system fault tolerance," to appear, IEEE Trans. Reliability, vol. 42, no. 6, June
1993.
[32] M. Shoga, P. Adams, D. L. Chenette, R. Koga, and E. C. Smith, "Verification of single 
event upset rate estimation methods with on-orbit observations," IEEE Trans. Nuclear 
Science, vol. NS-34, no. 6, pp. 1256-1261, Dec. 1987. 
[33] D. Chlouber, P. O'Neill, and J. Pollock, "General upper bound on single-event upset 
rate," IEEE Trans. Nuclear Science, vol. 37, no. 2, pp. 1065-1071, Apr. 1990. 
[34] A. Patterson-Hine, personal communication, Oct. 1990.
[35] L. L. Sivo, J. C. Peden, M. Brettschneider, W. Price, and P. Pentecost, "Cosmic
ray-induced soft errors in static MOS memory cells," IEEE Trans. Nuclear Science, vol.
NS-26, no. 6, pp. 5042-5047, Dec. 1979.
[36] D. Binder, E. C. Smith, and A. B. Holman, "Satellite anomalies from galactic cosmic
rays," IEEE Trans. Nuclear Science, vol. NS-22, no. 6, pp. 2675-2680, Dec. 1975.
[37] J. C. Pickel and J. T. Blandford, Jr., "Cosmic-ray-induced errors in MOS devices," 
IEEE Trans. Nuclear Science, vol. NS-27, no. 2, pp. 1006-1015, Apr. 1980. 
[38] J. B. Blake and R. Mandel, "On-orbit observations of single event upset in Harris
HM-6508 1K RAMs," Report SD-TR-86-89, Space Division, Air Force Systems
Command, Los Angeles, Feb. 1987.
[39] E. Swartzlander, Jr., "Systolic FFT processors," pp. 133-140 in Systolic Arrays. Ed. W. 
Moore, A. McCabe, R. Urquhart. Bristol: Adam Hilger, 1987. 
[40] V. K. P. Kumar and Y. C. Tsai, "Synthesizing optimal family of linear systolic arrays 
for matrix computations," pp. 51-60 in Systolic Array Processors. Ed. J. McCanny, J. 
McWhirter, E. Swartzlander, Jr. New York: Prentice Hall, 1988.
[41] J. A. Vlontzos and S. Y. Kung, "A wavefront-array processor using dataflow processing
elements," Proc. 1st Int. Conf. Supercomputing (Lecture Notes in Computer Science
297), pp. 744-767, Springer-Verlag, 1987.
[42] J. F. Wakerly, Error Detecting Codes, Self-Checking Circuits, and Applications. New
York: North-Holland, 1978.
[43] J. H. Patel and L. Y. Fung, "Concurrent error detection in multiply and divide arrays,"
IEEE Trans. Computers, vol. C-32, no. 4, pp. 417-422, Apr. 1983.
[44] V. Piuri, "Fault-tolerant array processors: An approach based upon A*N codes," Proc.
IEEE Int. Symp. Circuits and Systems (ISCAS), pp. 199-203, June 1988.
[45] J. Crawford and P. Gelsinger, Programming the 80386. San Francisco: Sybex, 1987. 
[46] S. Y. Kung, VLSI Array Processors. Englewood Cliffs: Prentice Hall, 1988. 
[47] R. Bayford, "The bit-serial systolic back-projection engine (BSSBPE)," pp. 43-54 in
Application Specific Array Processors. Ed. S. Y. Kung, E. Swartzlander, Jr., J. A. B.
Fortes, K. W. Przytula. Los Alamitos: IEEE Computer Society Press, 1990.
[48] X. H. Wu and Z. Y. He, "Efficient systolic arrays for transform domain adaptive digital
filters," pp. 23-32 in Systolic Array Processors. Ed. J. McCanny, J. McWhirter, E.
Swartzlander, Jr. New York: Prentice Hall, 1989.
[49] O. Menzilcioglu and H. T. Kung, "A highly configurable architecture for systolic
arrays of powerful processors," pp. 156-165 in Systolic Array Processors. Ed. J.
McCanny, J. McWhirter, E. Swartzlander, Jr. New York: Prentice Hall, 1989.
[50] C. R. Ward, P. J. Hargrave, and J. G. McWhirter, "A novel algorithm and architecture
for adaptive digital beamforming," IEEE Trans. Antennas and Propagation, vol.
AP-34, no. 3, pp. 338-346, Mar. 1986.
[51] D. K. Hwang, T. L. Wernimont, and W. K. Fuchs, "Evaluation of a reconfigurable
architecture for digital beamforming using the OODRA workbench," Proc. 26th
ACM/IEEE Design Automation Conf., pp. 614-617, June 1989.
[52] E. S. Manolakos and S. Y. Kung, "CORP - A new recovery procedure for VLSI pro-
cessor arrays," 1988 IEEE Symp. Engineering in Computer-Based Medical Systems 
(EMB-88), June 1988. 
[53] E. S. Manolakos and S. Y. Kung, "Neighbor assisted recovery in VLSI processor 
arrays," Signal Processing IV: Theories and Applications, pp. 1245-1249, Sept. 1988. 
VITA 
Paul Peichuan Chen was born in  on  He earned his B.S. 
degree in Electrical Engineering from Stanford University, California, in 1984. For his work 
with the Tau Beta Pi Engineering Course Evaluation project at Stanford, Mr. Chen received the 
Dean's Award for Service, given by the Stanford School of Engineering, in 1984. Also in 1984, 
he was a Rhodes Scholar State Finalist for the state of Arizona. 
In 1987, Mr. Chen obtained the M.S. degree, also in Electrical Engineering, from the
University of Illinois, where he was employed first as a teaching assistant for the Department of
Electrical and Computer Engineering from 1984 to 1986 and subsequently as a research assis-
tant at the Center for Reliable and High-Performance Computing from 1986 to 1990. He 
received the Harold L. Olesen Award for Excellence in Undergraduate Teaching by a Graduate 
Student in 1986, and consulted for the School of Veterinary Biosciences on a project involving 
the cycle of the seminiferous epithelium from 1990 to 1992. 

